December 20, 2021
One-size-fits-all image descriptions on the web don't meet the needs of blind people
Online image descriptions—or "alt-text"—help people who are blind or have low vision easily access information by providing the context and detail needed to interact with websites meaningfully, securely and efficiently.
However, researchers at the University of Colorado Boulder recently published findings suggesting that much work remains for creators across numerous platforms, first to generate these descriptions and then to improve them. The work, published at the 23rd International ACM SIGACCESS Conference on Computers and Accessibility, aims to help close that gap by exploring ways to create training materials that humans and artificial intelligence can use to author more useful image descriptions.
The research was led by CU Boulder alumna Abigale Stangl along with Assistant Professor Danna Gurari, who recently joined the College of Engineering and Applied Science. Stangl earned her Ph.D. in Technology, Media and Society from the ATLAS Institute in 2019. She currently works remotely for the National Science Foundation as a Computing Research Association Computing Innovation Fellow (CI-Fellow) at the University of Washington.
She said the goal of the work was to investigate how to quickly create image descriptions that are responsive to the context in which they are found—no matter the platform or situation.
"We presented 28 people who are blind with as much information as possible about five images and then asked them to specify what information they would like about the image for the different scenarios," Stangl said. "Each scenario contained a media source in which an image is found and a predetermined information goal. For instance, we considered a person visiting a shopping website to find a gift for a friend as a potential scenario."
Stangl said the work provided several key findings. One was that the information blind people want in an image description changes based on the scenario in which they are encountering the image.
"For alt-text to be accurate, both human and AI systems will need training to author image descriptions that are responsive or context-aware to the user's information goal along with where the image is found," she said.
Other findings suggest that blind people want some types of information about an image in every scenario, so it may be possible to determine what image content should always be included in a description.
During her Ph.D. studies, Stangl volunteered with the Anchor Center for the Blind, the Colorado Center for the Blind and the National Federation of the Blind to better understand the barriers blind people face in gaining access to information and becoming artists and designers themselves. She said she has always been motivated to make sure that end-users and stakeholders are involved in the design process.
"My research with Professor Gurari was essentially a proof of concept that one-size-fits-all image descriptions do not meet the access needs of blind people. In it, we provide reflections and guidance for how our experimental approach may be used and scaled by others interested in creating user-centered training materials for context-aware image descriptions—or at least minimum viable image descriptions," she said. "I am looking forward to continuing it and exploring new approaches and problems in the near future."
Co-authors of the new study include Nitin Verma and Kenneth Fleischmann of The University of Texas at Austin and Meredith Ringel Morris of Microsoft Research.
More information: Abigale Stangl et al., Going Beyond One-Size-Fits-All Image Descriptions to Satisfy the Information Wants of People Who are Blind or Have Low Vision, The 23rd International ACM SIGACCESS Conference on Computers and Accessibility (2021). DOI: 10.1145/3441852.3471233