How I Used AI to Generate Alt Text for My Website

May 20th, 2024
Banner image: A humanoid robot facing a display of space-related images and information with the moon in the background.

I enjoy writing, but sometimes it can become tedious. Like writing alt text for images - who really wants to take the time to describe the content of an image? Not me. I recently got tired of writing alt text for the images on my blog and figured there must be a better way to do this. Writing quality alt text is a task that often gets put on the back burner. Why describe a photo as "A futuristic humanoid robot with various displays and a moon background" when you could just write "Robot"? 🙃 The good news is our A.I. overlords are more than happy to write this quality alt text for us! Here's how I implemented OpenAI's new GPT-4o model on my blog to auto-generate alt text for me.

To start, you'll need an OpenAI account and an API key. Once you've got these set up, you can begin making API requests and using their models in your applications.
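If you're on Next.js like I am, the simplest way to wire up the key is an environment file. Here's a minimal sketch - the key value is just a placeholder:

# .env.local (Next.js loads this automatically; keep it out of source control)
OPENAI_API_KEY=sk-your-key-here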

On my site, each blog post is created as an MDX file - essentially a Markdown file that lets you drop in React components. For almost all of my blog posts, I have a banner image at the top of the page, which I usually source from unsplash.com. Here's my banner image React component:

import Image from "next/image";

const PostBannerImg = async ({ url, alt }) => {
  return (
    <div
      style={{
        position: "relative",
        width: "100%",
        aspectRatio: "3 / 2",
        marginBottom: "2.5rem",
      }}
    >
      <Image
        src={url}
        alt={alt}
        priority
        fill
        quality={80}
        style={{ borderRadius: "4px", objectFit: "cover" }}
        sizes="800px"
      />
    </div>
  );
};

export default PostBannerImg;
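For reference, here's roughly how I use it in a post's MDX file today (the import path, image URL, and alt text below are just placeholders):

import PostBannerImg from "../components/PostBannerImg";

<PostBannerImg
  url="https://images.unsplash.com/photo-example"
  alt="A futuristic humanoid robot with various displays and a moon background."
/>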

This component works great, but as you can see it takes a prop called "alt", which means any time I use it I have to manually write out some alt text to pass in.

Here's what I want to do:

const PostBannerImg = async ({ url }) => {
  const altText = await getAltText(url);
  return (...);
};

Since this is a server component, I can make network requests right in the component and don't have to worry about exposing API keys. I'll just add the function to get the alt text like so:

import Image from "next/image";
import OpenAI from "openai";

const getAltText = async (url) => {
  // Don't waste my money on OpenAI requests when running locally
  if (process.env.NODE_ENV === "development") {
    return "DEV alt text here";
  }

  let altText = "";
  try {
    const openai = new OpenAI();
    const chatCompletion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "In 20 words or less, tell me what's in this image.",
            },
            { type: "image_url", image_url: { url, detail: "low" } },
          ],
        },
      ],
      max_tokens: 25,
    });

    altText = chatCompletion.choices[0].message.content;
  } catch (err) {
    altText = "wilsonstaley.dev blog post";
  }

  return altText;
};

const PostBannerImg = async ({ url }) => {
  const altText = await getAltText(url);
  return (...);
};

Let's break this down a bit...

The first thing you might notice is that I'm using the OpenAI Node.js library to interact with the API. This is a great way to use OpenAI's models in a Node runtime. When I initialize the client with const openai = new OpenAI(), this is normally where you would provide your API key. However, the library will look for an environment variable called OPENAI_API_KEY and use that by default. Since I have that variable set up in my environment, I can omit it entirely.
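If you'd rather be explicit about where the key comes from, you can pass it to the constructor yourself. A quick sketch - this is equivalent to the default behavior when OPENAI_API_KEY is set:

import OpenAI from "openai";

// Same as `new OpenAI()` when OPENAI_API_KEY is set in the environment
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});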

The next thing I'll call out is the early escape if the environment is "development". When developing locally, Next.js and many other frameworks use hot module reloading, so if you make changes to the code, it will automatically update in your browser. This causes many rerenders during development. While the OpenAI API is relatively inexpensive, triggering a ton of requests during development can make the costs add up. When you create a new OpenAI account, you should get some free credits to try out the API, and those might actually last you quite a while. But if you're like me, you'll eventually have to add money to your account to keep using the API. And if you don't want to quickly use up your funds, you'll want to cut out unnecessary requests.

There are a few other things I'm doing to reduce costs as well. The new GPT-4o model is actually cheaper than the previous GPT-4 Turbo. I'm using GPT-4o here, but you can refer to the full list of models to determine which one is best for your use case.

When I provide the image URL as part of the message, I also specify that the detail should be "low".

💡 From the OpenAI docs: low will enable the "low res" mode. The model will receive a low-res 512px x 512px version of the image, and represent the image with a budget of 65 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that do not require high detail.

I also tell the model to give me a response in 20 words or less and set the max_tokens parameter to 25. This essentially limits the size of the response I'll get back. All of these tweaks serve as guardrails to keep me from spamming the API and to keep the model from giving me a huge response. Of course, it's also wise to wrap the call to OpenAI in a try/catch block. In the event that OpenAI is down, or the funds in my account are depleted, I don't want my site build to break. I'd rather have it fall back to some generic alt text. Then once the issue is resolved, a subsequent build should get those alt tags back up!
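With that in place, adding a banner to a new post is just a matter of passing the image URL - no more hand-written alt text. A rough sketch of what the top of an MDX post looks like now (same placeholders as before, minus the alt prop):

import PostBannerImg from "../components/PostBannerImg";

<PostBannerImg url="https://images.unsplash.com/photo-example" />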