
Consumer Electronics

Watch How Caption Crawler Helps the Visually Impaired Interpret Online Images

13 February 2018

Source: University of Colorado Boulder

Computer users who are visually impaired rely on text descriptions of images, and the quality of those descriptions varies with website authors’ practices. Some web developers include image descriptions in their markup (called “alt text”), often because doing so improves their search engine rankings. However, there is no mechanism for ensuring these descriptions are accurate or informative. As a result, developers often enter one-word placeholders such as “image” or “photo,” leaving visually impaired users with no useful information about the image.
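The problem described above can be made concrete with a small sketch that flags alt text too short or generic to help a screen-reader user. The word list and function name are illustrative assumptions, not taken from the research:

```typescript
// Sketch: flag alt text that is too generic to help a screen-reader
// user. The placeholder-word list is an illustrative assumption.
const GENERIC_ALT = new Set(["image", "photo", "picture", "img", "graphic"]);

function isUninformativeAlt(alt: string | null): boolean {
  if (alt === null) return true;        // no alt attribute at all
  const text = alt.trim().toLowerCase();
  if (text.length === 0) return true;   // empty alt
  return GENERIC_ALT.has(text);         // one-word placeholder
}

console.log(isUninformativeAlt("photo"));                       // true
console.log(isUninformativeAlt("A red-tailed hawk in flight")); // false
```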

Research conducted at the University of Colorado Boulder’s ATLAS Institute, with collaborators from Microsoft Research, has produced a system that collects the captions and alt text attached to other instances of the same photo elsewhere online, making human-authored descriptions available on any website where the photo appears. The Caption Crawler image captioning system compiles these descriptions in a database: if a photo has never been queried, alt text is offered in about 20 seconds; if the photo has been processed before, alt text is available almost immediately.
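The caching behavior described above can be sketched as a simple keyed store: a repeat query returns instantly, while a miss signals that a slow crawl (roughly 20 seconds, per the article) must be started. The names and structure here are assumptions for illustration, not the authors’ code:

```typescript
// Sketch of the caching idea: captions found for a photo are stored
// under a key for that photo, so repeat queries return immediately.
const captionCache = new Map<string, string[]>();

// Returns cached captions, or null to signal that a (slow) crawl
// for this photo still needs to run.
function lookupCaptions(imageKey: string): string[] | null {
  return captionCache.get(imageKey) ?? null;
}

function storeCaptions(imageKey: string, captions: string[]): void {
  const existing = captionCache.get(imageKey) ?? [];
  captionCache.set(imageKey, [...existing, ...captions]);
}
```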

To replace poor-quality alt text, Caption Crawler users press a keyboard shortcut to request a replacement. The screen reader automatically speaks the new caption, which is the longest caption found for that photo. A second shortcut reads out any additional captions that were found.
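The selection rule in the paragraph above, longest caption first with the rest held in reserve for the second shortcut, might look like this (function name assumed for illustration):

```typescript
// Sketch: rank candidate captions so the longest is spoken first and
// the remainder stay available for a second keyboard shortcut.
function rankCaptions(captions: string[]): { primary: string | null; extras: string[] } {
  const sorted = [...captions].sort((a, b) => b.length - a.length);
  return { primary: sorted[0] ?? null, extras: sorted.slice(1) };
}
```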

Caption Crawler only works with images that appear on multiple websites, but the approach is effective because about half of website administrators provide informative photo descriptions. The scheme combines a Google Chrome browser extension with a Node.js cloud server. The browser extension searches the Document Object Model (DOM) of the active web page for image tags and background images, which are then sent to the server for caption retrieval. When Caption Crawler finds a caption for an image, the caption is streamed back to the browser extension, which then associates the caption with the image.
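A minimal sketch of the collection step described above: the real extension walks the live DOM, but here a regular expression over an HTML string stands in so the idea is self-contained. The function name and patterns are assumptions, not the extension’s actual code:

```typescript
// Sketch: gather candidate image URLs from page markup, covering both
// <img> tags and CSS background images in inline styles.
function collectImageUrls(html: string): string[] {
  const urls: string[] = [];
  // <img src="..."> tags
  for (const m of html.matchAll(/<img[^>]*\bsrc="([^"]+)"/g)) {
    urls.push(m[1]);
  }
  // background-image: url(...) in inline styles
  for (const m of html.matchAll(/background-image:\s*url\(['"]?([^'")]+)['"]?\)/g)) {
    urls.push(m[1]);
  }
  return urls;
}
```

In the actual system these URLs would be posted to the Node.js server, which looks up or crawls captions for each image.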

Research shows that humans produce higher-quality captions than automated computer-vision and machine-learning approaches. Caption Crawler therefore uses a hybrid system that captures both, prioritizing human-authored captions over machine-generated ones. If no human-authored caption can be found, a computer-generated caption from Microsoft’s CaptionBot is used to describe the image. When the text from CaptionBot is read aloud, the screen reader first speaks the word “CaptionBot,” so that the user is aware that the caption is not human-authored.
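The prioritization just described can be sketched as a simple fallback: prefer a human-authored caption, and otherwise prefix the machine caption so the screen reader announces its origin. The function name is an assumption, and the human-caption list is assumed to be ordered longest-first, as the article describes:

```typescript
// Sketch: prefer human-authored captions; fall back to a machine
// caption prefixed with "CaptionBot" so its origin is announced.
function chooseCaption(humanCaptions: string[], machineCaption: string | null): string | null {
  if (humanCaptions.length > 0) {
    return humanCaptions[0]; // list assumed ordered longest-first
  }
  return machineCaption === null ? null : `CaptionBot: ${machineCaption}`;
}
```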

The research, which merges the benefits of a fully automated system with the quality of human-authored content, will be presented at the Association for Computing Machinery’s (ACM) 2018 Conference on Human Factors in Computing Systems (CHI) in Montreal in April.
