Image Analysis Using Claude 3.5 Sonnet Model

In my article on Image Analysis Using OpenAI GPT-4o Model, I explained how GPT-4o model allows you to analyze images and answer questions related images precisely.

In this article, I will show you how to analyze images with the Anthropic Claude 3.5 Sonnet model, which has shown state-of-the-art performance for many text and vision problems. I will also share my insights on how Claude 3.5 Sonnet compares with GPT-4o for image analysis tasks. So, let's begin without ado.

Importing Required Libraries

You will need to install the anthropic Python library to access the Claude 3.5 Sonnet model in this article. In addition, you will need the Anthropic API key, which you can obtain here.

The following script installs the Anthropic Python library.


!pip install anthropic

The script below imports all the Python modules you will need to run scripts in this article.


import os
import base64
from IPython.display import display, HTML
from IPython.display import Image
from anthropic import Anthropic

General Image Analysis

Let's first perform a general image analysis. We will analyze the following image and ask Claude 3.5 Sonnet if it shows any potentially dangerous situation.


# image source: https://healthier.stanfordchildrens.org/wp-content/uploads/2021/04/Child-climbing-window-scaled.jpg

image_path = r"D:\Datasets\sofa_kid.jpg"
img = Image(filename=image_path, width=600, height=600)
img

Output:

Note: For comparison, the images we will analyze in this article are the same as those we analyzed with GPT-4o.

Next, we will define a method that converts an image into Base64 format. The Claude 3.5 Sonnet model expects image inputs to be in Base64 format.

We also define an object of the Anthropic client. We will call the Claude 3.5 Sonnet model using this client object.


def encode_image64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image64(image_path)
image1_media_type = "image/jpeg"

client = Anthropic(api_key = os.environ.get('ANTHROPIC_API_KEY'))

We will define a helper function analyze_image() that accepts text query as a parameter. Inside the image function, we call the message.create() method of the Anthropic client object. We set the model value to claude-3-5-sonnet-20240620, which is the id for the Claude 3.5 Sonnet model. The temperature is set to 0 since we want a fair comparison with the GPT-4o model. Finally, We set the system prompt and then pass the image and the text query to the messages list.

We ask the Claude 3.5 Sonnet model to identify any dangerous situation in the image.


def analyze_image(query):
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        temperature = 0,
        max_tokens=1024,
        system="You are a baby sitter.",
        messages=[
             {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image1_media_type,
                            "data":  base64_image,
                        },
                    },
                    {
                        "type": "text",
                        "text": query
                    }
                ],
            }
        ],
    )
    return message

response_content = analyze_image("Do you see any dangerous situation in the image? If yes, how to prevent it?")
print(response_content.content[0].text)

Output:

The above output shows that the Claude 3.5 Sonnet model has identified a dangerous situation and provided some suggestions.

Compared to GPT-4o, which gave five suggestions, Claude 3.5 Sonnet provided seven suggestions and a more detailed response.

Graph Analysis

Next, we will perform the graph Analysis task using Claude 3.5 Sonnet and summarize the following graph.


# image path: https://globaleurope.eu/wp-content/uploads/sites/24/2023/12/Folie2.jpg

image_path = r"D:\Datasets\Folie2.jpg"
img = Image(filename=image_path, width=800, height=800)
img

Output:


base64_image = encode_image64(image_path)

def analyze_graph(query):
    message = client.messages.create(
        model = "claude-3-5-sonnet-20240620",
        temperature = 0,
        max_tokens = 1024,
        system = "You are a an expert graph and visualization expert",
        messages = [
             {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image1_media_type,
                            "data":  base64_image,
                        },
                    },
                    {
                        "type": "text",
                        "text": query
                    }
                ],
            }
        ],
    )
    return message.content[0].text

response_content = analyze_graph("Can you summarize the graph?")
print(response_content)

Output:

The above output shows the graph's summary. Though Claude 3.5 Sonnet here is more elaborative than GPT-4o, I found the GPT-4o summary better as it categorized the countries into high, moderate, and lower debt levels.

Next, I asked Claude 3.5 Sonnet to create a table showing countries against their debts.


response_content = analyze_graph("Can you convert the graph to table such as Country -> Debt?")
print(response_content)

Output:

The results obtained with Claude 3.5 Sonnet were astonishingly accurate compared to GPT-4o. For example, GPT-4o showed Estonia having a debt of 10% of its GDP, whereas Claude 3.5 Sonnet depicted Estonia as having a debt of 19.2%. If you look at the Graph, you will see that Claude 3.5 Sonnet is extremely accurate here.

Claude 3.5 Sonnet is a clear winner for Graph Analysis.

Image Sentiment Prediction

Next, we will predict facial sentiment using Claude 3.5 Sonnet. Here is the sample image.


# image path: https://www.allprodad.com/the-3-happiest-people-in-the-world/

image_path = r"D:\Datasets\happy_men.jpg"
img = Image(filename=image_path, width=800, height=800)
img

Output:


base64_image = encode_image64(image_path)

def predict_sentiment(query):
    message = client.messages.create(
        model = "claude-3-5-sonnet-20240620",
        temperature = 0,
        max_tokens = 1024,
        system = "You are helpful psychologist.",
        messages = [
             {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image1_media_type,
                            "data":  base64_image,
                        },
                    },
                    {
                        "type": "text",
                        "text": query
                    }
                ],
            }
        ],
    )
    return message.content[0].text

response_content = predict_sentiment("Can you predict facial sentiment from the input image?")
print(response_content)

Output:

The above output shows that Claude 3.5 Sonnet provided detailed information about the sentiment expressed in the image. GPT-4o, on the other hand, was more precise.

I will again go with Claude 3.5 Sonnet here as the first choice for sentiment classification.

Analyzing Multiple Images

Finally, let's see how Claude 3.5 Sonnet fairs for analyzing multiple images.
We will compare the following two images for sentiment predictions.


from PIL import Image
import matplotlib.pyplot as plt

# image1_path: https://www.allprodad.com/the-3-happiest-people-in-the-world/
# image2_path: https://www.shortform.com/blog/self-care-for-grief/

image_path1 = r"D:\Datasets\happy_men.jpg"
image_path2 = r"D:\Datasets\sad_woman.jpg"


# Open the images using Pillow
img1 = Image.open(image_path1)
img2 = Image.open(image_path2)

# Create a figure to display the images side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

# Display the first image
axes[0].imshow(img1)
axes[0].axis('off')  # Hide axes

# Display the second image
axes[1].imshow(img2)
axes[1].axis('off')  # Hide axes

# Show the plot
plt.tight_layout()
plt.show()

Output:


base64_image1 = encode_image64(image_path1)
base64_image2 = encode_image64(image_path2)


def predict_sentiment(query):
    message = client.messages.create(
        model = "claude-3-5-sonnet-20240620",
        temperature = 0,
        max_tokens = 1024,
        system = "You are helpful psychologist.",
        messages = [
             {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image1_media_type,
                            "data":  base64_image1,
                        },
                    },
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": image1_media_type,
                            "data":  base64_image2,
                        },
                    },
                    {
                        "type": "text",
                        "text": query
                    }
                ],
            }
        ],
    )
    return message.content[0].text

response_content = predict_sentiment("Can you explain all the differences in the two images?")
print(response_content)

Output:

The above output shows the image comparison results achieved via Claude 3.5 Sonnet. I found that for image comparison, the results I obtained with GPT-4o in my previous article were better than those of Claude 3.5 Sonnet. In my opinion, GPT-4o is a better model for image comparison tasks.

Conclusion

The Claude 3.5 Sonnet Model is a state-of-the-art model for text and vision tasks. In this article, I explained how to analyze images with Claude 3.5 Sonnet. Compared with GPT-4o, I found Claude 3.5 Sonnet better for general image and graph analysis tasks. On the contrary, GPT-4o achieved better results for image summarization and comparison tasks. I urge you to test both these models and share your results.