June 1, 2020 feature

ConvoKit: An open-source toolkit to aid the analysis of conversations

by Ingrid Fadelli, Tech Xplore

In recent years, researchers have developed increasingly advanced natural language processing (NLP) techniques that can be trained to process, interpret and respond to sentences in human languages. In addition, some have developed toolkits that can guide researchers who are developing, training and evaluating NLP techniques.

Researchers at Cornell University have recently put together a new toolkit, dubbed ConvoKit, containing existing tools, methods and data that are ideal for developing and training NLP models designed to analyze human conversations and social interactions. The toolkit, described in a paper set to be presented at the SIGDIAL conference next month, makes a variety of cutting-edge techniques accessible to users with different levels of technical expertise.

"Through conversations, we discuss, collaborate, empathize and make our voices heard," Caleb Chiam, one of the researchers who developed the toolkit, told TechXplore. "Existing NLP toolkits, however, are not designed to work directly with conversational structures. ConvoKit fills that gap, as it is designed to make computational tools for conversational analysis accessible to users—no matter their technical background."

ConvoKit presents conversational data in a simple, user-friendly format. This basic format allows both expert and non-expert developers to explore and annotate the data, and to run computations on it.

"Every conversation is about some set of individuals speaking to each other, saying certain things, in a specific order," Chiam explained. "We might typically record those conversations as transcripts; think, for example, of the transcripts we have of every 'Friends' episode or every Supreme Court session (both of which are available in ConvoKit format, among many others). ConvoKit represents a set of such conversations as a 'corpus.'"

In ConvoKit, every corpus of conversations has three main elements or components, namely speakers (i.e., who is speaking), conversations (i.e., the overall exchange between two or more speakers) and utterances (i.e., what was said by a speaker at different points during a conversation). These three elements are considered 'first-class objects,' which means that the toolkit enables their use as primary units of analysis.
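
To make the three-part structure concrete, here is a minimal Python sketch of a corpus in which speakers, conversations and utterances can each serve as the starting point for a query. The class and method names (`ToyCorpus`, `utterances_by`, `conversations_started_by` and so on) are hypothetical and chosen for illustration only; they are not ConvoKit's actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Utterance:
    utterance_id: str
    speaker: str        # who is speaking
    conversation: str   # which exchange this utterance belongs to
    text: str           # what was said


@dataclass
class ToyCorpus:
    """Hypothetical stand-in for a corpus; not ConvoKit's own class."""
    # Utterances are kept in the order in which they were spoken.
    utterances: List[Utterance] = field(default_factory=list)

    def speakers(self) -> List[str]:
        # Unique speakers, in order of first appearance.
        return list(dict.fromkeys(u.speaker for u in self.utterances))

    def conversations(self) -> List[str]:
        # Unique conversations, in order of first appearance.
        return list(dict.fromkeys(u.conversation for u in self.utterances))

    def utterances_by(self, speaker: str) -> List[Utterance]:
        return [u for u in self.utterances if u.speaker == speaker]

    def utterances_in(self, conversation: str) -> List[Utterance]:
        return [u for u in self.utterances if u.conversation == conversation]

    def speakers_in(self, conversation: str) -> List[str]:
        return list(dict.fromkeys(
            u.speaker for u in self.utterances_in(conversation)))

    def conversations_started_by(self, speaker: str) -> List[str]:
        # Assumption: whoever speaks first "started" the conversation.
        return [c for c in self.conversations()
                if self.utterances_in(c)[0].speaker == speaker]


corpus = ToyCorpus([
    Utterance("u1", "alice", "c1", "Did you read the paper?"),
    Utterance("u2", "bob", "c1", "Not yet, is it long?"),
    Utterance("u3", "bob", "c2", "Lunch at noon?"),
    Utterance("u4", "alice", "c2", "Sure, see you then."),
])

# Start from a speaker and walk to their utterances.
for utt in corpus.utterances_by("bob"):
    print(utt.conversation, utt.text)
```

In this toy version everything is derived from a flat list of utterances, which keeps the example short; a real toolkit would more likely keep dedicated speaker and conversation objects with their own metadata.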

A user could, for example, use ConvoKit to predict which speakers are more likely to mimic the linguistic style of other speakers, what conversations are more likely to become 'toxic' based on how they started off, or which utterances are polite and which ones are rude. This makes it ideal for conducting analyses that focus on specific aspects of conversations.

"ConvoKit's structure makes it easy to explore conversations," Chiam said. "For example, with these data structures, it is straightforward to pick any speaker in the dataset and go through the utterances made by that speaker and the conversations they have started. Similarly, you could choose any conversation in the dataset and iterate through the utterances that form the conversation or the speakers that were involved."

The new toolkit developed by Chiam and his colleagues also has a variety of transformers built into it, which enable more in-depth analyses. Transformers are modules that can be easily run on a conversational corpus, analyzing it using sophisticated machine learning and NLP methods.
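
The idea of a transformer as a module that takes a corpus and hands it back with annotations can be sketched in Python as follows. Everything here is an assumption made for illustration: the `ToyTransformer` base class, the `CourtesyMarker` example and the dict-based corpus are hypothetical, and the keyword check is a deliberately crude heuristic rather than any of the methods that ship with ConvoKit.

```python
import re
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Toy corpus: a list of utterance records, each a plain dict.
Corpus = List[Dict[str, Any]]


class ToyTransformer(ABC):
    """Hypothetical base class: takes a corpus, returns it annotated."""

    @abstractmethod
    def transform(self, corpus: Corpus) -> Corpus:
        ...


class CourtesyMarker(ToyTransformer):
    """Flags utterances that contain a courtesy word (keyword match only)."""

    def __init__(self, markers=("please", "thanks", "thank")):
        self.markers = {m.lower() for m in markers}

    def transform(self, corpus: Corpus) -> Corpus:
        for utt in corpus:
            # Split into lowercase words so "displeased" does not match "please".
            words = set(re.findall(r"[a-z']+", utt["text"].lower()))
            meta = utt.setdefault("meta", {})
            meta["has_courtesy_marker"] = bool(words & self.markers)
        return corpus


corpus = [
    {"id": "u1", "speaker": "alice", "text": "Could you please send the file?"},
    {"id": "u2", "speaker": "bob", "text": "Send it yourself."},
    {"id": "u3", "speaker": "alice", "text": "I am displeased."},
]

annotated = CourtesyMarker().transform(corpus)
for utt in annotated:
    print(utt["id"], utt["meta"]["has_courtesy_marker"])
```

Because every transformer in this sketch shares the same `transform` method, several of them could be applied to the same corpus one after another, each adding its own annotation.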

"These computational methods can be adapted and applied to any given conversational corpus," Chiam said. "Moreover, ConvoKit users can design their own transformers for their own custom analysis. One can find examples of customized transformer features listed on convokit.cornell.edu. These include things like linguistic coordination, politeness strategies, prompt types, and much more."

The new toolkit could prove extremely valuable for both developers and non-expert tech enthusiasts who are trying to create tools for the automatic analysis of conversations. ConvoKit is very easy to use and highly customizable, which makes it ideal for a variety of NLP applications.

"ConvoKit is in active development," Chiam said. "While much of the codebase is stable at this point, we have in the works many more methods and datasets that are currently being developed as part of our other active research. Also, since this is an open-source effort, we expect external contributions as well. Follow our GitHub page for the latest updates."

More information: ConvoKit: A Toolkit for the analysis of conversations. arXiv:2005.04246 [cs.CL]. arxiv.org/abs/2005.04246

github.com/CornellNLP/Cornell- … nal-Analysis-Toolkit

Citation: ConvoKit: An open-source toolkit to aid the analysis of conversations (2020, June 1) retrieved 19 April 2024 from https://techxplore.com/news/2020-06-convokit-open-source-toolkit-aid-analysis.html

