Google StyleDrop generates images from text
It took Da Vinci 16 years to paint the Mona Lisa. Some say he needed 12 years just to paint her lips.
There is no truth to the rumors that slow Internet was the cause.
But Da Vinci, a polymath who dabbled in botany, engineering, science, sculpture, and geology as well as painting, surely would have appreciated a new text-to-image generative vision transformer developed by Google Research.
StyleDrop returns images reflecting the user's specifications in about three minutes.
"The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects," Google said in its report "StyleDrop: Text-to-Image Generation in Any Style."
StyleDrop also creates typography that faithfully incorporates stylistic features of images.
For example, users could request an image of a bridge or a letter and then specify a drawing style. Such styles could be "melting golden rendering," "wooden sculpture," "3D rendering," "cartoon drawing" or any other preferred style. One's imagination is the only limit.
StyleDrop will then generate impressive renderings of those objects, such as a Dalí-like dripping bridge or a cartoon-like version, along with letters that share the same characteristics.
StyleDrop works in conjunction with Google's Muse, a generative vision transformer unveiled earlier this year that offers a remarkable degree of photorealism. Muse has 3 billion parameters, giving it ample capacity for high-quality image generation.
Researchers evaluated the accuracy and quality of StyleDrop's output using industry-standard CLIP text and style scores as well as user feedback. The evaluations found that StyleDrop "convincingly outperforms" other leading image- and text-generation methods, including DreamBooth, Imagen and Stable Diffusion.
The developers see the program, which has not yet been released to the public, as an invaluable aid to art directors and graphic designers, who could use it to create photorealistic imagery of designated products or themes, accompanied by text reflecting the same colors, structure and style.
For a new product campaign, say for a new soda brand, an artist could describe in just a few words a sleek glass bottle nestled amid thousands of tulips in a Dutch field, with accompanying text featuring letters made of 3D-rendered glass, all in the style of the Impressionist Monet. In three minutes, with the right wording, a new ad campaign featuring a warm, brightly colored, scenic skyscape could be born.
The renowned typographer Helmut Schmid once said, "Typography needs to be felt. Typography needs to be experienced." StyleDrop may well help designers bring a greater degree of intimacy and connectedness to their work.
The report acknowledges, however, that copyright protection is a concern.
"We recognize potential pitfalls such as the ability to copy individual artists' styles without their consent, and urge the responsible use of our technology," the report stated.
And just what instructions would Da Vinci have given StyleDrop? "Draw a picture of an attractive noblewoman, kind of smiling but not too much, sitting outdoors with mountains in the background. Draw in the style of … Da Vinci." With the job done in three minutes instead of 16 years, Leonardo, who loved botany, would have had plenty more time to go out and smell the roses.
More information: Kihyuk Sohn et al., StyleDrop: Text-to-Image Generation in Any Style, arXiv (2023). DOI: 10.48550/arxiv.2306.00983
© 2023 Science X Network