April 24, 2018 weblog

Researchers explore working up cartoons using text descriptions

by Nancy Owano, Tech Xplore

What if you were told you could create cartoons just by working from text descriptions?

Reports are in that a group of researchers has unveiled an AI capable of making original videos of "The Flintstones" from text descriptions.

Yes, these are scenes created by an artificial intelligence. Consider a scene description such as: Fred is wearing a blue hat and talking to Wilma in the living room. Wilma then sits down on a couch.

Composition, Retrieval and Fusion Network, or CRAFT, is the name of their model. The authors noted that they demonstrated CRAFT on FLINTSTONES, a dataset of over 25,000 videos, each 75 frames long.

They have written a paper, titled "Imagine This! Scripts to Compositions to Videos," which is on arXiv. The five researchers are Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem and Aniruddha Kembhavi. Author affiliations include The Allen Institute for Artificial Intelligence (AI2), The University of Illinois Urbana-Champaign and The University of Washington.

The authors said that once it is given a novel description, "Craft sequentially composes a scene layout and retrieves entities from a video database to create complex scene videos."

Tristan Greene, The Next Web, explained how the technology works: "Craft uses the annotations from videos to determine how the original images correspond to the words used to describe them. Eventually it builds up a set of parameters that enables it to 'understand' what makes individual characters and objects from the cartoon match their plain-language counterparts. Once it understands this relation, it's able to generate video clips based on novel text inputs that look a lot like the cartoon it was trained on."

The authors also described how their text-driven model differs from pixel generation approaches:

"Unlike pixel generation approaches, our appearance model is based on text to entity segment retrieval from a video database. Spatio-temporal segments are extracted from the retrieved videos and fused together to generate the final video. The layout composition and entity retrieval work in a sequential manner which is determined by the language input."

The authors stated that "CRAFT outperforms direct pixel generation approaches."
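
The compose, retrieve and fuse sequence the authors describe can be illustrated with a toy sketch. This is not the authors' implementation: the `Segment` class, the tiny `DATABASE`, the word-overlap scoring and the fixed left-to-right slots are all assumptions made for illustration. CRAFT itself learns layout and retrieval from video-caption data and works on spatio-temporal video segments rather than single characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    description: str  # text annotation attached to a stored entity segment
    sprite: str       # one-character stand-in for the segment's pixels


# Hypothetical toy database; entries are invented for this example.
DATABASE = [
    Segment("fred wearing a blue hat", "F"),
    Segment("fred in a red shirt", "f"),
    Segment("wilma sitting on a couch", "W"),
    Segment("couch in the living room", "C"),
]


def retrieve(query: str) -> Segment:
    """Return the stored segment whose description shares the most words with the query."""
    words = set(query.lower().split())
    return max(DATABASE, key=lambda s: len(words & set(s.description.split())))


def compose(entity_queries: List[str], width: int = 12) -> str:
    """Handle entities one at a time, in the order the text mentions them."""
    canvas = ["."] * width
    step = width // (len(entity_queries) + 1)
    for i, query in enumerate(entity_queries, start=1):
        segment = retrieve(query)          # retrieval: text to entity segment
        canvas[i * step] = segment.sprite  # layout and fusion: paste into an assumed slot
    return "".join(canvas)


frame = compose(["fred blue hat", "wilma couch"])
print(frame)
```

The point of the sketch is the ordering: each entity named in the text is looked up and placed in turn, and the final frame is assembled from retrieved pieces rather than generated pixel by pixel.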

Interestingly, viewers of the videos posted responses ranging from wowed to tepid to confused.

Several thought it was awesome; one remarked that it was "more advanced than I would have imagined," and another said, "it still looks like if someone tried to animate for the first time on demo software. It looks like it has potential, though."

Another observer was more confused than startled. "I'm confused. My understanding is that the AI learned 25k fully annotated cartoons. And then the researchers typed in a text scenario, and the AI just found images that matched it? Isn't that just a simple retrieval of the corresponding video snippet based on a text lookup from the annotated database? What am I missing?"

Writers on tech sites offered their perspectives on this research. Referring to the videos, Tristan Greene of The Next Web conceded that the result is a "glitchy little clip." All the same, he added, "Today's glitchy little clip, generated from simple text phrases, could lead to tomorrow's entertainment being created from scratch by AI instead of studios full of people."

Andrew Liszewski in Gizmodo similarly found that the quality of the animations that were generated was "awful at best" and "no one's going to be fooled into thinking these are the Hanna-Barbera originals." Nonetheless, he added, seeing an AI generate a cartoon, with iconic characters, all by itself, was "a fascinating sneak peek at how some films and TV shows might be made one day."

Lucy Black wrote Sunday in I Programmer that "This is more than just another clever trick with neural networks. It is a sign that AI is moving towards larger systems where deep neural networks do different jobs and work together to create the solution. You could call it the second stage of deep neural networks."

OK, an unanswered question: Would animators lose their jobs? Black said, "Yes I suppose given time and effort something like CRAFT could be developed into a cartoon generator and throw thousands of animators out of a job, but computer graphics is already chipping away at that job market."

More information: Imagine This! Scripts to Compositions to Videos, arXiv:1804.03608 [cs.CV] arxiv.org/abs/1804.03608

Abstract
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval, and Fusion Network (CRAFT), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. CRAFT explicitly predicts a temporal-layout of mentioned entities (characters and objects), retrieves spatio-temporal entity segments from a video database and fuses them to generate scene videos. Our contributions include sequential training of components of CRAFT while jointly modeling layout and appearances, and losses that encourage learning compositional representations for retrieval. We evaluate CRAFT on semantic fidelity to caption, composition consistency, and visual quality. CRAFT outperforms direct pixel generation approaches and generalizes well to unseen captions and to unseen video databases with no text annotations. We demonstrate CRAFT on FLINTSTONES, a new richly annotated video-caption dataset with over 25000 videos.

Citation: Researchers explore working up cartoons using text descriptions (2018, April 24) retrieved 31 August 2024 from https://techxplore.com/news/2018-04-explore-cartoons-text-descriptions.html

