Humans are highly adaptable creatures. Whether we are learning from past experience or reading social expectations, we move from one situation to another with ease. For artificial intelligence, adapting to new situations is harder. Although AI models can hold enormous quantities of knowledge and learn from past mistakes, they lack the general grasp of implicit information and common sense that often informs human decision making.
In order to test AI's ability to master decision making skills in diverse settings and contexts, Jonathan May, ISI researcher and research assistant professor of computer science at Viterbi, teamed up with ISI Senior Supervisory Computer Scientist Ralph Weischedel and Ph.D. student Xusen Yin to create an intricate training process for AI models.
Previously, May had led a study exploring how AI chatbots could incorporate improv into conversation. Building on the "yes-and" approach commonly used in improv, May and his team created SpolinBot, a chatbot able to generate engaging conversation that goes beyond simply reacting to a message.
Whereas that earlier project centered on creating fun and engaging conversation, May's newer work explores the human-like capabilities of AI further. It does so through Deep Reinforcement Learning, a process in which deep neural networks help models learn from their mistakes and make decisions that lead to a better outcome.
"We could make dialogs to be fluent, engaging, and even empathetic, given a different training corpus. But most dialog agents couldn't stick to a problem to solve, especially in a long conversation," said Yin.
In this study, the challenge was for AI to master text-based games with a "choose-your-own-adventure" structure. Specifically, the researchers used a series of cooking games to train BERT, a well-known language-processing model originally developed by Google. Because each decision in the game leads to either a positive or a negative outcome, the AI model eventually learns which decisions are beneficial and which are not. Without common sense, however, AI models tend to exhaust all options before arriving at the best decision.
"If the agent has common sense, it would save a lot of searching time and concentrate on the more important task-specific knowledge," explained Yin.
Through Deep Reinforcement Learning, May and his team were able not only to train BERT with the decision-making skills needed to achieve a desirable outcome on unseen cooking games, but also to generalize these skills to novel games in a completely unseen treasure-hunting domain.
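The trial-and-error learning described above, in which the outcome of a game gradually teaches the agent which earlier decisions were worthwhile, can be illustrated with a toy sketch. This is not the researchers' method: they trained BERT with deep reinforcement learning, whereas the code below swaps in plain tabular Q-learning on a three-step cooking game. The game, its action names, and the parameter values (alpha, gamma, epsilon) are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy game (not from the study): state -> {action: (next_state, reward)}.
# A next_state of None ends the episode. Only the final "cook carrot" step pays off.
GAME = {
    "start": {"leave kitchen": (None, -1.0), "take carrot": ("has_carrot", 0.0)},
    "has_carrot": {"eat carrot raw": (None, -1.0), "chop carrot": ("chopped", 0.0)},
    "chopped": {"drop carrot": (None, -1.0), "cook carrot": (None, 1.0)},
}


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in GAME.items() for a in acts}
    for _ in range(episodes):
        state = "start"
        while state is not None:
            actions = list(GAME[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # occasionally try something else
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward = GAME[state][action]
            # Value of the best follow-up decision; zero once the game is over.
            future = 0.0
            if next_state is not None:
                future = max(q[(next_state, a)] for a in GAME[next_state])
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_path(q):
    """Follow the highest-valued action from the start until the game ends."""
    path, state = [], "start"
    while state is not None:
        action = max(GAME[state], key=lambda a: q[(state, a)])
        path.append(action)
        state = GAME[state][action][0]
    return path


q = train()
print(greedy_path(q))
```

In this sketch, "take carrot" earns no reward of its own. Its estimated value rises only after the reward for "cook carrot" has worked its way back through the intermediate step over repeated games, which is the sense in which early decisions are judged by the eventual outcome.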
"Each micro-decision you make may not teach you whether you're on the right path, but eventually you'll learn this, and that'll help you the next time you have to make decisions," explained May about the objective of the project.
Sequential decision-making skills matter for AI models because they allow for more contextually flexible interaction. If modern dialog and assistant bots could adopt complex decision-making skills, our interactions with them would be much more efficient and helpful.
Moving forward, May and his team are looking to combine the improv abilities of SpolinBot with the decision-making skills of this new venture. The main obstacle is that the current bot is conditioned to choose from a given set of decisions; to combine the two projects, the AI model would have to learn to balance creativity and decision-making at once.
With the success of studies like this one, AI is coming closer to exhibiting characteristics once thought exclusive to humans. This study and others like it push the field toward AI that better understands the ins and outs of being human.
More information: Learning to Generalize for Sequential Decision Making: arXiv:2010.02229v1 [cs.CL] arxiv.org/abs/2010.02229
Provided by University of Southern California