Using Artificial Intelligence to Generate Alt Text on Images

Web developers and content editors alike often forget or ignore one of the most important parts of making a website accessible and SEO performant: image alt text. You know, that seemingly small image attribute that describes an image:

<img src="/cute/sloth/image.jpg" alt="A brown baby sloth staring straight into the camera with a tongue sticking out." >

A brown baby sloth staring straight into the camera with a tongue sticking out. — 📷 Credit: Huffington Post

If you regularly publish content on the web, then you know it can be tedious trying to come up with descriptive text. Sure, 5-10 images is doable. But what if we are talking about hundreds or thousands of images? Do you have the resources for that?

Let’s look at some possibilities for automatically generating alt text for images with the use of computer vision and image recognition services from the likes of Google, IBM, and Microsoft. They have the resources!

Reminder: What is alt text good for?

Often overlooked during web development and content entry, the alt attribute is a small bit of HTML code that describes an image that appears on a page. It’s so inconspicuous that it may not appear to have any impact on the average user, but it has very important uses indeed:

Web Accessibility for Screen Readers: Imagine a page with lots of images where not a single one has alt text. A user browsing with a screen reader would only hear the word “image” blurted out, and that’s not very helpful. Great, there’s an image, but what is it? Including alt enables screen readers to help the visually impaired “see” what’s there and better understand the content of the page. They say a picture is worth a thousand words, so that’s a thousand words of context a user could be missing.
Display text if an image does not load: The World Wide Web seems infallible and, like New York City, never seems to sleep, but flaky and faulty connections are a real thing. When that happens, images tend not to load properly and “break.” Alt text is a safeguard: it displays on the page in place of the “broken” image, providing users with content as a fallback.
SEO performance: Alt text on images contributes to SEO performance as well. Though it doesn’t exactly help a site or page skyrocket to the top of the search results, it is one factor to keep in mind for SEO performance.

Knowing how important these things are, hopefully you’ll be able to include proper alt text during development and content entry. But are your archives in good shape? Trying to come up with a detailed description for a large backlog of images can be a daunting task, especially if you’re working on tight deadlines or have to squeeze it in between other projects.

What if there was a way to apply alt text as an image is uploaded? And! What if there was a way to check the page for missing alt attributes and automagically fill them in for us?

There are available solutions!

Computer vision (or image recognition) has actually been offered for quite some time now. Companies like Google, IBM and Microsoft have their own APIs publicly available so that developers can tap into those capabilities and use them to identify images as well as the content in them.

There are developers who have already utilized these services and created their own plugins to generate alt text. Take Sarah Drasner’s generator, for example, which demonstrates how Azure’s Computer Vision API can be used to create alt text for any image via upload or URL. Pretty awesome!

See the Pen Dynamically Generated Alt Text with Azure's Computer Vision API by Sarah Drasner (@sdras) on CodePen.

There’s also Automatic Alternative Text by Jacob Peattie, which is a WordPress plugin that uses the same Computer Vision API. It’s basically an addition to the workflow: the user uploads an image and the plugin generates alt text automatically.

Tools like these generally help speed up the process of content management, editing and maintenance. Even the effort of thinking up descriptive text has been minimized and passed to the machine!

Getting Your Hands Dirty With AI

I have played around with a few AI services and am confident in saying that Microsoft Azure’s Computer Vision produces the best results. The services offered by Google and IBM certainly have their perks and can still identify images and return proper results, but Microsoft’s is so good and so accurate that it’s not worth settling for something else, at least in my opinion.

Creating your own image recognition plugin is pretty straightforward. First, head over to Microsoft Azure Computer Vision. You’ll need to log in or create an account in order to grab an API key for the plugin.

Once you’re on the dashboard, search for and select Computer Vision, then fill in the necessary details.

Wait for the platform to finish spinning up your Computer Vision instance. The API keys for development will be available once it’s done.

Keys: Also known as the Subscription Key in the official documentation

Let the interesting and tricky parts begin! I will be using vanilla JavaScript for the sake of demonstration. For other languages, you can check out the documentation. Below is code you can copy and paste as-is; just replace the placeholders with your own values.

var request = new XMLHttpRequest();
request.open('POST', 'https://[LOCATION]/vision/v1.0/describe?maxCandidates=1&language=en', true);
request.setRequestHeader('Content-Type', 'application/json');
request.setRequestHeader('Ocp-Apim-Subscription-Key', '[SUBSCRIPTION_KEY]');

// Attach the handlers before sending the request
request.onload = function () {
    var resp = request.responseText;
    if (request.status >= 200 && request.status < 400) {
        // Success! The body is JSON containing the generated description
        console.log('Success!', JSON.parse(resp));
    } else {
        // We reached our target server, but it returned an error
        console.error('Error!', request.status, resp);
    }
};

// The request never reached the server (e.g. a network failure)
request.onerror = function (e) {
    console.error(e);
};

request.send(JSON.stringify({ "url": "[IMAGE_URL]" }));

Alright, let’s run through some key terminology of the AI service.

Location: This is the subscription location of the service that was selected prior to getting the subscription keys. If you can’t remember the location for some reason, you can go to the Overview screen and find it under Endpoint.

Overview > Endpoint : To get the location value

Subscription Key: This is the key that unlocks the service for our plugin and can be obtained under Keys. There are two of them, but it doesn’t really matter which one is used.
Image URL: This is the path for the image that’s getting the alt text. Take note that the images that are sent to the API must meet specific requirements:
- File type must be JPEG, PNG, GIF, or BMP
- File size must be less than 4MB
- Dimensions should be greater than 50px by 50px
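
If you want to catch images that break those rules before spending an API call on them, a small pre-flight check could look something like the sketch below. It is only an illustration: the `checkImageRequirements` name and the shape of the `image` object are made up for this example, and I'm assuming that 4MB means 4 × 1024 × 1024 bytes. The official documentation remains the authority on the actual limits.

```javascript
// Limits as listed above. ASSUMPTION: "4MB" is read as 4 * 1024 * 1024 bytes.
var ALLOWED_TYPES = ['image/jpeg', 'image/png', 'image/gif', 'image/bmp'];
var MAX_BYTES = 4 * 1024 * 1024;
var MIN_DIMENSION = 50;

// `image` is a hypothetical plain object: { type, sizeBytes, width, height }.
// Returns a list of problems; an empty list means the image looks acceptable.
function checkImageRequirements(image) {
    var problems = [];
    if (ALLOWED_TYPES.indexOf(image.type) === -1) {
        problems.push('File type must be JPEG, PNG, GIF, or BMP');
    }
    if (image.sizeBytes >= MAX_BYTES) {
        problems.push('File size must be less than 4MB');
    }
    if (image.width <= MIN_DIMENSION || image.height <= MIN_DIMENSION) {
        problems.push('Dimensions should be greater than 50px by 50px');
    }
    return problems;
}
```

An image would only be sent to the API when the returned list is empty; otherwise the messages can be shown to whoever is uploading.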

Easy peasy

Thanks to big companies opening their services and APIs to developers, it’s now relatively easy for anyone to utilize computer vision. As a simple demonstration, I uploaded the image below to Microsoft Azure’s Computer Vision API.

Possible alt text: a hand holding a cellphone

The service returned the following details:

{
    "description": {
        "tags": [
            "person",
            "holding",
            "cellphone",
            "phone",
            "hand",
            "screen",
            "looking",
            "camera",
            "small",
            "held",
            "someone",
            "man",
            "using",
            "orange",
            "display",
            "blue"
        ],
        "captions": [
            {
                "text": "a hand holding a cellphone",
                "confidence": 0.9583763512737793
            }
        ]
    },
    "requestId": "31084ce4-94fe-4776-bb31-448d9b83c730",
    "metadata": {
        "width": 920,
        "height": 613,
        "format": "Jpeg"
    }
}

From there, you can pick out the caption to use as the alt text for an image. How you build upon this capability is your business:

You could create a CMS plugin and add it to the content workflow, where the alt text is generated when an image is uploaded and saved in the CMS.
You could write a JavaScript plugin that adds alt text on-the-fly, after an image has been loaded with notably missing alt text.
You could author a browser extension that adds alt text to images on any website when it finds images with it missing.
You could write code that scours your existing database or repo of content for any missing alt text and updates them or opens pull requests for suggested changes.
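
To give a rough idea of the on-the-fly option, here is a minimal sketch. The function names are made up for illustration, and `describeImage` is a hypothetical helper (not part of any API) that takes an image URL and calls back with a caption string, for example a wrapper around the request shown earlier. Note that it only targets images with no alt attribute at all; an empty `alt=""` usually marks an image as decorative on purpose, so those are left alone.

```javascript
// Collects <img> elements that have no alt attribute at all.
// Images with alt="" are skipped on purpose (conventionally decorative).
function findImagesMissingAlt(root) {
    return Array.prototype.slice.call(root.querySelectorAll('img:not([alt])'));
}

// `describeImage(url, callback)` is a HYPOTHETICAL helper that calls back
// with a caption string, or null when no usable caption came back.
function fillMissingAlt(root, describeImage) {
    findImagesMissingAlt(root).forEach(function (img) {
        describeImage(img.src, function (caption) {
            if (caption) {
                img.setAttribute('alt', caption);
            }
        });
    });
}
```

In a browser, this could be kicked off with `fillMissingAlt(document, describeImage)` once the page has loaded.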

Take note that these services are not 100% accurate. They do sometimes return a low confidence rating and a description that is not at all aligned with the subject matter. But these platforms are constantly learning and improving. After all, Rome wasn’t built in a day.
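
Since every caption comes back with a confidence score, one way to guard against the weaker descriptions is to only accept captions above a cutoff. The sketch below assumes a response shaped like the one shown earlier; the `pickAltText` name and the cutoff values are my own picks for illustration, not recommendations from the API.

```javascript
// Returns the text of the highest-confidence caption, or null when no
// caption reaches `minConfidence` (an ASSUMED cutoff; tune it yourself).
function pickAltText(response, minConfidence) {
    var captions = (response && response.description && response.description.captions) || [];
    var best = null;
    captions.forEach(function (caption) {
        if (caption.confidence >= minConfidence &&
            (best === null || caption.confidence > best.confidence)) {
            best = caption;
        }
    });
    return best ? best.text : null;
}

// Trimmed-down version of the response shown earlier
var sample = {
    description: {
        captions: [
            { text: 'a hand holding a cellphone', confidence: 0.9583763512737793 }
        ]
    }
};

console.log(pickAltText(sample, 0.8));  // a hand holding a cellphone
console.log(pickAltText(sample, 0.99)); // null
```

A `null` result is a good cue to leave the image for a human to describe instead of publishing a shaky guess.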

Michael Crenshaw

# February 1, 2019

Cool approach. This will never reach the necessary scale, but wouldn’t it be funny if AI-generated alt texts created a feedback loop making AIs VERY opinionated about what’s in images? Funny…….. or terrifying.

Jens Oliver Meiert

What this can really mean is that screen readers could implement this so that @alt, one day, maybe, becomes obsolete.

Carlos

Awesome article Niño!

Josh H

# February 2, 2019

I tried it three times with varying degrees of success. This photo of a model, for instance, says “A woman talking on a cellphone”. Only, there is no cellphone and clearly she’s not talking. Otherwise I could see this being useful for making suggestions, but only if the API could return multiple possible options. Can it do that?

Gift

# February 3, 2019

This was an interesting read. I think if every developer or content creator integrated this into their platform, it would take out most of the work needed and they’d end up following good practice. It’s a win-win for everyone.

Eric Bailey

# February 4, 2019

I’m cautiously optimistic about this technology, but I don’t think it’s ready for prime time.

Facebook, who arguably has some of the best image recognition tech out there, has some pretty uninspiring automatically generated alt descriptions. At best, they might clue you into what is in the image, but only at surface level. There’s a pretty wide gulf between “dog indoors”, and “A golden retriever puppy staring up at you with a single feather sticking out of its mouth. There is a torn up pillow in the background.” At worst, they’ll mislabel the image.

There’s also the issue of what these kinds of libraries won’t describe, which can lead to an unintentional infantilizing effect. There have been scenarios where sculpture isn’t described because the technology thinks it is nudity, and therefore disallowed.

Both scenarios get into an incredibly troubling area, namely a tier of experience that is lower than what someone browsing under ideal circumstances may be experiencing.

Writing effective alt text can be a creative challenge. For example, Apple does an incredible job with the stock wallpapers iOS ships with.

Puddingsan

# February 5, 2019

Interesting notion, though I think it might end up providing humorous entertainment along the lines of smartphone autocorrect …

Adrian Roselli

# February 12, 2019

I have done some accessibility review work around auto-generated images. I also built a browser plug-in (private deploy only) for blind / low-vision users who wanted to at least get something in cases of missing alt text.

These approaches should almost never render final user-facing alt text. Icons performed poorly. For photos, it is not uncommon for an author to use the same image in more than one context, warranting different alt text. These tools cannot understand author intent nor parse surrounding context.

They can be great for priming an image library prior to human review or, as the author suggests, providing a stop-gap tool for users to lessen the impact of missing alts. I caution all readers against thinking this approach will ever replace the need for human-written text. See Eric’s reference above to Facebook’s ineffective effort.

David

# March 4, 2019

To your point:
https://medium.com/@amyalexandraleak/should-you-use-alt-text-or-a-caption-48311e259ded

Chris Coyier

# July 28, 2019

Just dumping out some notes I had saved on this subject.

iOS app aiPicture
Chrome Plugin Auto Alt Text
