By Patricia Waldron

Native speakers often dominate the discussion in multilingual online meetings, but adding an automated participant that periodically interrupts the conversation can help nonnative speakers get a word in edgewise, according to new research at Cornell.

Xiaoyan Li, a doctoral student in the field of information science, used multilingual groups to test out the helpful bot – called a conversational agent – which was programmed to intervene after native speakers took six consecutive turns. The agent enabled nonnative speakers to break into the conversation, increasing their participation from 12% to 17% of all words spoken. 

While people who did not have English as a first language generally found the agent to be helpful, native speakers thought the intrusions were distracting and unnecessary.

“Nonnative speakers appreciated having a gap to reflect on the conversation and the opportunity to ask questions,” said Li. “Also, being invited to speak, they felt like their communication partners were valuing their perspectives.”

Li presented the study, “Improving Nonnative Speakers’ Participation with an Automatic Agent in Multilingual Groups,” Jan. 9 at the Association for Computing Machinery (ACM) International Conference on Supporting Group Work. The paper is published in Proceedings of the ACM on Human-Computer Interaction.

The inspiration for the study struck Li when she was a new student at Cornell, trying to contribute to group discussions in her communications seminar. Despite being fluent in English, Li struggled to identify natural gaps in the discussion and to beat native speakers to the openings.

“When the nonnative speakers don’t speak up in class, people assume that it’s just because they had nothing to say,” said co-author Susan Fussell, professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science, and of communication in the College of Agriculture and Life Sciences. “Nobody ever thinks it is because they have problems getting the floor.”

For the study, Li recruited 48 volunteers and placed them into groups of three, with two native English speakers and a native Japanese speaker meeting in a videoconference. The groups completed three survival exercises, which involved discussing imaginary disaster scenarios and ranking which items salvaged from a boat, plane or spaceship (such as an ax, compass or newspaper) would be useful for survival.

One exercise included the automated agent; in another, the groups were on their own. In a third exercise, nonnative speakers could secretly activate the agent when they wanted to speak, instead of waiting for it to intervene. The Japanese speakers rarely used this option, however, for fear of interrupting the conversation at the wrong time.

The agent used IBM Watson automatic speech recognition software to track who was speaking, and would blink and wave to allow another participant a chance to talk. Co-author Naomi Yamashita, a distinguished researcher at the Nippon Telegraph and Telephone Corporation (NTT), built the agent.
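The turn-counting rule described in the study (intervene after native speakers take six consecutive turns) can be illustrated with a minimal sketch. This is not the researchers' implementation: the class and method names are hypothetical, the speech recognition step that identifies each speaker is left out, and resetting the count after an intervention is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TurnTracker:
    """Hypothetical helper that counts consecutive native-speaker turns."""

    threshold: int = 6  # six consecutive turns, as reported in the article
    streak: int = 0     # current run of native-speaker turns

    def observe(self, speaker_is_native: bool) -> bool:
        """Record one turn; return True if the agent should intervene now."""
        if not speaker_is_native:
            # A nonnative speaker took the floor, so the run is broken.
            self.streak = 0
            return False
        self.streak += 1
        if self.streak >= self.threshold:
            # Assumption: the count starts over after each intervention.
            self.streak = 0
            return True
        return False


if __name__ == "__main__":
    tracker = TurnTracker()
    # Six native-speaker turns in a row: only the sixth triggers the agent.
    print([tracker.observe(True) for _ in range(6)])
```

In this sketch, a single turn by the nonnative speaker resets the count, so the agent stays quiet as long as that speaker gets the floor at least once in every six turns.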

Previous efforts to overcome language barriers – such as providing meeting transcripts, automatic language translation and graphics showing everyone’s participation level – have failed. In contrast, the agent proved remarkably successful, increasing participation from nonnative speakers by about 40% in relative terms, from 12% to 17% of all words spoken.

In interviews after the survival exercises, nonnative speakers said the agent didn’t always interrupt at a good time, but being put on the spot forced them to be less worried about their grammar, so they could focus on getting their ideas across.

Native speakers, however, had a less positive view of the agent. “Nonnative speakers spoke a lot less, but the native speakers were not aware of that,” Li said. “So they blamed the agent for interrupting when they thought the conversation was equal.”

Fussell’s group has recently developed its own agent and has several proposed improvements to test out.

“It’d be nice if the agent only intervened when the nonnative speaker had something they wanted to say, as opposed to just putting them on the spot,” Fussell said.

The researchers may employ more subtle signals that it’s time to yield the floor, such as private messages to the native speakers, or they could use artificial intelligence or biosensors to determine when a nonnative speaker is ready for a gap.

Wen Duan, Ph.D. ’22, now a postdoctoral fellow at Clemson University, and Yoshinari Shirai of NTT, are co-authors on the paper.

Patricia Waldron is a writer for the Cornell Ann S. Bowers College of Computing and Information Science.