Using Artificial Intelligence to Generate Alt Text on Images


Web developers and content editors alike often forget or ignore one of the most important parts of making a website accessible and SEO-friendly: image alt text. You know, that seemingly small image attribute that describes an image:

<img src="/cute/sloth/image.jpg" alt="A brown baby sloth staring straight into the camera with a tongue sticking out.">

Image: A brown baby sloth staring straight into the camera with a tongue sticking out. (Credit: Huffington Post)

If you regularly publish content on the web, then you know it can be tedious trying to come up with descriptive text. Sure, 5-10 images is doable. But what if we are talking about hundreds or thousands of images? Do you have the resources for that?

Let’s look at some possibilities for automatically generating alt text for images using computer vision and image recognition services from the likes of Google, IBM, and Microsoft. They have the resources!

Reminder: What is alt text good for?

Often overlooked during web development and content entry, the alt attribute is a small bit of HTML that describes an image on a page. It’s so inconspicuous that it may not seem to have any impact on the average user, but it has very important uses indeed:

  • Web Accessibility for Screen Readers: Imagine a page with lots of images and not a single one has alt text. A user browsing with a screen reader would only hear the word “image” blurted out, and that’s not very helpful. Great, there’s an image, but what is it? Including alt text lets screen readers help visually impaired users “see” what’s there and better understand the content of the page. They say a picture is worth a thousand words; that’s a thousand words of context a user could be missing.
  • Display text if an image does not load: The web may seem infallible, a place that, like New York City, never sleeps, but flaky and faulty connections are a real thing. When they happen, images tend not to load properly and “break.” Alt text is a safeguard: it is displayed in place of the “broken” image, giving users the content as a fallback.
  • SEO performance: Alt text on images contributes to SEO performance as well. Though it won’t exactly send a site or page skyrocketing to the top of the search results, it is one factor to keep in mind.

Knowing how important these things are, hopefully you’ll be able to include proper alt text during development and content entry. But are your archives in good shape? Coming up with detailed descriptions for a large backlog of images can be a daunting task, especially if you’re working on tight deadlines or have to squeeze it in between other projects.

What if there was a way to apply alt text as an image is uploaded? And! What if there was a way to check the page for missing alt attributes and automagically fill them in for us?

There are available solutions!

Computer vision (or image recognition) has actually been offered as a service for quite some time now. Companies like Google, IBM, and Microsoft have their own publicly available APIs so that developers can tap into those capabilities and use them to identify images as well as the content in them.

There are developers who have already used these services to create their own plugins that generate alt text. Take Sarah Drasner’s generator, for example, which demonstrates how Azure’s Computer Vision API can be used to create alt text for any image via upload or URL. Pretty awesome!

See the Pen “Dynamically Generated Alt Text with Azure's Computer Vision API” by Sarah Drasner (@sdras) on CodePen.

There’s also Automatic Alternative Text by Jacob Peattie, a WordPress plugin that uses the same Computer Vision API. It’s basically an addition to the workflow that lets the user upload an image and have alt text generated automatically.

Tools like these generally help speed up content management, editing, and maintenance. Even the effort of thinking up descriptive text is minimized and handed off to the machine!

Getting Your Hands Dirty With AI

I have played around with a few AI services and am confident in saying that Microsoft Azure’s Computer Vision produces the best results. The services offered by Google and IBM certainly have their perks and can still identify images and return proper results, but Microsoft’s is so accurate that, at least in my opinion, it’s not worth settling for anything else.

Creating your own image recognition plugin is pretty straightforward. First, head over to Microsoft Azure Computer Vision. You’ll need to log in or create an account in order to grab an API key for the plugin.

Once you’re on the dashboard, search for and select Computer Vision and fill in the necessary details.

Starting out

Wait for the platform to finish spinning up your Computer Vision instance. The API keys for development will be available once it’s done.

Keys: Also known as the Subscription Key in the official documentation

Let the interesting and tricky parts begin! I will be using vanilla JavaScript for the sake of demonstration; for other languages, check out the documentation. Below is code you can copy and paste as-is, replacing the placeholders with your own values.

var request = new XMLHttpRequest();
// [LOCATION] is the endpoint host shown on the Overview screen
request.open('POST', 'https://[LOCATION]/vision/v1.0/describe?maxCandidates=1&language=en', true);
request.setRequestHeader('Content-Type', 'application/json');
request.setRequestHeader('Ocp-Apim-Subscription-Key', '[SUBSCRIPTION_KEY]');

// Attach the handlers before sending the request
request.onload = function () {
    var resp = request.responseText;
    if (request.status >= 200 && request.status < 400) {
        // Success! The body is JSON describing the image
        console.log('Success!');
    } else {
        // We reached our target server, but it returned an error
        console.error('Error!');
    }

    console.log(JSON.parse(resp));
};

request.onerror = function (e) {
    // Network-level failure: the request never got a response
    console.log(e);
};

// Send the URL of the image we want described
request.send(JSON.stringify({ "url": "[IMAGE_URL]" }));
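For comparison, here is a minimal sketch of the same request using the Fetch API and async/await instead of XMLHttpRequest. The function name `describeImage`, its options object, and the injectable `fetchFn` parameter are hypothetical choices made for this sketch; the endpoint path, query parameters, and `Ocp-Apim-Subscription-Key` header are the same ones used in the code above. It assumes an environment with a global `fetch` (modern browsers or Node 18+).

```javascript
// A minimal sketch of the Describe request above, rewritten with fetch.
// Hypothetical helper: `endpoint` stands in for [LOCATION] (the host shown
// under Overview > Endpoint) and `key` stands in for [SUBSCRIPTION_KEY].
async function describeImage(imageUrl, {
  endpoint,
  key,
  maxCandidates = 1,
  language = 'en',
  fetchFn = globalThis.fetch, // injectable so the request can be tested offline
}) {
  const params = new URLSearchParams({
    maxCandidates: String(maxCandidates),
    language,
  });
  const url = `https://${endpoint}/vision/v1.0/describe?${params}`;

  const response = await fetchFn(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': key,
    },
    body: JSON.stringify({ url: imageUrl }),
  });

  // fetch only rejects on network failures, so HTTP errors
  // (bad key, unsupported image, etc.) must be checked explicitly.
  if (!response.ok) {
    throw new Error(`Computer Vision request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `describeImage('[IMAGE_URL]', { endpoint: '[LOCATION]', key: '[SUBSCRIPTION_KEY]' })` returns a Promise that resolves to the parsed JSON response, or rejects if the service answers with an error status.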

Alright, let’s run through the placeholders in that code.

  • Location: This is the subscription location of the service that was selected before getting the subscription keys. If you can’t remember the location for some reason, go to the Overview screen and find it under Endpoint.
  • Subscription Key: This is the key that unlocks the service for our plugin and can be found under Keys. There are two of them, but it doesn’t really matter which one is used.
  • Image URL: This is the URL of the image that needs alt text. Note that images sent to the API must meet specific requirements:
    • File type must be JPEG, PNG, GIF, or BMP
    • File size must be less than 4MB
    • Dimensions should be greater than 50px by 50px
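Since the API may reject images that break these rules, it can help to check an image before sending it. The sketch below is a hypothetical pre-flight check (`checkImageRequirements` is not part of any Azure SDK) that encodes the three requirements listed above. It assumes you already know the image’s MIME type, file size in bytes, and pixel dimensions, and it reads “4MB” as 4 × 1024 × 1024 bytes, which is an assumption.

```javascript
// Hypothetical pre-flight check mirroring the requirements listed above.
const ALLOWED_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/bmp']);
const MAX_BYTES = 4 * 1024 * 1024; // assumption: "4MB" read as 4 MiB
const MIN_DIMENSION = 50; // pixels; images must be larger than 50 x 50

// Returns a list of human-readable problems; an empty list means the
// image appears to meet the requirements.
function checkImageRequirements({ type, sizeBytes, width, height }) {
  const problems = [];
  if (!ALLOWED_TYPES.has(type)) {
    problems.push(`unsupported file type: ${type}`);
  }
  if (sizeBytes >= MAX_BYTES) {
    problems.push(`file is ${sizeBytes} bytes; it must be under ${MAX_BYTES}`);
  }
  if (width <= MIN_DIMENSION || height <= MIN_DIMENSION) {
    problems.push(`image is ${width}x${height}; it must be larger than ${MIN_DIMENSION}x${MIN_DIMENSION}`);
  }
  return problems;
}
```

In a browser, `type` and `size` are available on a `File` object from an upload input, and `naturalWidth`/`naturalHeight` on a loaded `<img>` element.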

Easy peasy

Thanks to big companies opening their services and APIs to developers, it’s now relatively easy for anyone to use computer vision. As a simple demonstration, I uploaded the image below to Microsoft Azure’s Computer Vision API.

Possible alt text: a hand holding a cellphone

The service returned the following details:

{
    "description": {
        "tags": [
            "person",
            "holding",
            "cellphone",
            "phone",
            "hand",
            "screen",
            "looking",
            "camera",
            "small",
            "held",
            "someone",
            "man",
            "using",
            "orange",
            "display",
            "blue"
        ],
        "captions": [
            {
                "text": "a hand holding a cellphone",
                "confidence": 0.9583763512737793
            }
        ]
    },
    "requestId": "31084ce4-94fe-4776-bb31-448d9b83c730",
    "metadata": {
        "width": 920,
        "height": 613,
        "format": "Jpeg"
    }
}
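To pull the suggested alt text out of a response like this, you only need the `description.captions` array. Here is a minimal sketch of a hypothetical `topCaption` helper that returns the highest-confidence caption, or `null` if there is none; it assumes the response shape shown above.

```javascript
// Hypothetical helper: return the highest-confidence caption from a
// Describe response shaped like the JSON above, or null if none exists.
function topCaption(response) {
  const captions = response?.description?.captions ?? [];
  let best = null;
  for (const caption of captions) {
    if (best === null || caption.confidence > best.confidence) {
      best = caption;
    }
  }
  return best; // { text, confidence } or null
}
```

With `maxCandidates=1`, as in the request above, there will usually be at most one caption, so the loop mainly matters if you ask the service for more candidates.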

From there, you could pick out the caption to use as an image’s alt text. How you build on this capability is up to you:

  • You could create a CMS plugin and add it to the content workflow, so that alt text is generated when an image is uploaded and saved in the CMS.
  • You could write a JavaScript plugin that adds alt text on the fly to images that load without it.
  • You could author a browser extension that adds alt text to images on any website wherever it finds it missing.
  • You could write code that scours your existing database or repository of content for missing alt text and fills it in or opens pull requests with suggested changes.
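The on-the-fly JavaScript idea, for example, boils down to finding images that lack an alt attribute and asking the service about each one. Here is a minimal sketch; `fillMissingAlt` and the `describe` callback are hypothetical names, and `describe` is assumed to be an async function that takes an image URL and resolves to a caption string, or `null` when no good caption is available.

```javascript
// Hypothetical helper: fill in alt text for images that lack it.
// `images` is any iterable of elements (e.g. document.querySelectorAll('img')).
// `describe` is an assumed async function: image URL -> caption string or null.
async function fillMissingAlt(images, describe) {
  let filled = 0;
  for (const img of images) {
    // Skip images that already have an alt attribute, including alt="",
    // which intentionally marks an image as decorative.
    if (img.hasAttribute('alt')) continue;

    // One request at a time: slower, but gentler on API rate limits.
    const caption = await describe(img.src);
    if (caption) {
      img.setAttribute('alt', caption);
      filled += 1;
    }
  }
  return filled; // number of images that received alt text
}
```

In a browser you might pass `document.querySelectorAll('img')` as `images` and implement `describe` by wrapping the XMLHttpRequest call shown earlier in a Promise. Images with an empty `alt=""` are deliberately left alone, because an empty alt is the standard way to tell screen readers that an image is purely decorative.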

Keep in mind that these services are not 100% accurate. They sometimes return a low confidence rating and a description that is not at all aligned with the subject matter. But these platforms are constantly learning and improving. After all, Rome wasn’t built in a day.
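Because of that, it is wise not to apply every generated caption blindly. One hedged approach, sketched below, is to apply only captions whose confidence clears a threshold and queue the rest for a human to review. The `triageCaptions` name, the `{ src, text, confidence }` input shape, and the 0.8 default threshold are arbitrary assumptions for illustration, not values recommended by Microsoft.

```javascript
// Hypothetical triage step: split generated captions into those confident
// enough to apply automatically and those that need a human to review.
// Each item is assumed to look like { src, text, confidence }.
function triageCaptions(items, threshold = 0.8) {
  const apply = [];
  const review = [];
  for (const item of items) {
    // Missing text or a confidence below the threshold goes to review.
    if (item.text && item.confidence >= threshold) {
      apply.push(item);
    } else {
      review.push(item);
    }
  }
  return { apply, review };
}
```

Tune the threshold against your own images: a higher value sends more captions to review but lets fewer bad descriptions slip through.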