Image Description, Category, and Tag

Learn how to get the description, category, and tags of an image.

In this lesson, we’ll go over three image analysis features and compare how they work on the same image. We’ll use the same reference image, provided via its URL in each code example, throughout this lesson.

Describe an image

An exciting feature of Microsoft Computer Vision is that it can describe an entire image in human-readable language using complete sentences. The algorithm works like this:

  1. It generates various descriptions based on the objects identified in the image.
  2. It evaluates and assigns a confidence score to each description.
  3. Finally, it returns a list of descriptions in descending order of confidence score.

To get the description of an image, we use the describe_image method. Let’s use it on our reference image.

Note: The image URL is assigned to the remote_image_url variable in the code below. Feel free to change it and test the API on other images.

# Import the client library and the credentials helper
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
# Authenticate the client
computervision_client = ComputerVisionClient("{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))
# Provide image URL
remote_image_url = "https://images.pexels.com/photos/356065/pexels-photo-356065.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260"
# Call API
description_results = computervision_client.describe_image(remote_image_url)
# Get the captions (descriptions) from the response, with confidence level
print("Description of remote image: ")
if len(description_results.captions) == 0:
    print("No description detected.")
else:
    for caption in description_results.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))

For our reference image, the algorithm returns only one description with a confidence of 47.91%. The confidence level might seem low, but the description is pretty accurate.
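By default, describe_image returns a single caption. The Python SDK also exposes a max_candidates parameter, so a quick way to compare several candidate descriptions is to raise that limit. The following is a minimal sketch, assuming the same client and image URL as above; the value 3 is an arbitrary choice for illustration:

# A minimal sketch: request up to three candidate captions for the same image.
# Assumes computervision_client and remote_image_url are defined as in the snippet above.
multi_results = computervision_client.describe_image(remote_image_url, max_candidates=3)
for caption in multi_results.captions:
    print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))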

Apply content tags to images

Microsoft Computer Vision returns tags based on the objects, living beings, and actions identified in the image. Tagging is not limited to the main subject, such as a person in the foreground, but includes background details like the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, and gadgets.

Tags are not organized into a taxonomy, and no inheritance hierarchies exist. Content tags form the foundation for an image description: they are used to create the human-understandable descriptions returned when we call the describe_image method.

Note: At this point, English is the only supported language for the image description feature.

To get the tags of an image, we use the tag_image method. Let’s use it on our reference image.

Note: The image URL is assigned to the remote_image_url variable in the code below. Feel free to change it and test the API on other images.

# Import the client library and the credentials helper
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
# Authenticate the client
computervision_client = ComputerVisionClient("{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))
# Provide image URL
remote_image_url = "https://images.pexels.com/photos/356065/pexels-photo-356065.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260"
# Call API with remote image
tags_result_remote = computervision_client.tag_image(remote_image_url)
# Print results with confidence score
print("Tags in the remote image: ")
if len(tags_result_remote.tags) == 0:
    print("No tags detected.")
else:
    for tag in tags_result_remote.tags:
        print("'{}' with confidence {:.2f}%".format(tag.name, tag.confidence * 100))

The above code returns twelve tags. Out of those twelve, six tags have a confidence rating greater than 90%.
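Because each tag carries its own confidence score, it's easy to keep only the high-confidence results. Here is a minimal sketch that reuses the tags_result_remote response from the snippet above; the 0.9 threshold is an arbitrary choice for illustration:

# A minimal sketch: keep only tags with at least 90% confidence.
# Assumes tags_result_remote is the response from the tag_image call above.
high_confidence_tags = [tag.name for tag in tags_result_remote.tags if tag.confidence >= 0.9]
print("High-confidence tags:", ", ".join(high_confidence_tags))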

Categorize images by subject matter

In addition to the description, Microsoft Computer Vision can categorize an image broadly or specifically. These categories are organized in a parent/child hierarchy. Unlike the thousands of available tags, there are only 86 categories for the algorithm to use.

Note: Category names are only in English.

The complete list of 86 categories is available in the Microsoft Computer Vision category taxonomy documentation.

To get the category of an image, we use the analyze_image method. In addition, we have to specify the visual feature we want to analyze, which in this case is "categories". Let’s use it on our reference image.

Note: The image URL is assigned to the remote_image_url variable in the code below. Feel free to change it and test the API on other images.

# Import the client library and the credentials helper
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
# Authenticate the client
computervision_client = ComputerVisionClient("{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))
# Provide image URL
remote_image_url = "https://images.pexels.com/photos/356065/pexels-photo-356065.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260"
# Select the visual feature(s) you want
remote_image_features = ["categories"]
# Call API with URL and features
categorize_results_remote = computervision_client.analyze_image(remote_image_url, remote_image_features)
# Print results with confidence score
print("Categories from remote image: ")
if len(categorize_results_remote.categories) == 0:
    print("No categories detected.")
else:
    for category in categorize_results_remote.categories:
        print("'{}' with confidence {:.2f}%".format(category.name, category.score * 100))
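The analyze_image method isn’t limited to one feature at a time: you can request the description, tags, and categories in a single call by listing all three visual features. The following is a minimal sketch, assuming the same client, image URL, and lowercase feature-name strings used in the snippets above:

# A minimal sketch: request description, tags, and categories in one call.
# Assumes computervision_client and remote_image_url are defined as in the snippets above.
features = ["description", "tags", "categories"]
analysis = computervision_client.analyze_image(remote_image_url, features)
# The combined response nests the captions under the description attribute
for caption in analysis.description.captions:
    print("Caption: '{}' ({:.2f}%)".format(caption.text, caption.confidence * 100))
print("Tags:", ", ".join(tag.name for tag in analysis.tags))
for category in analysis.categories:
    print("Category: '{}' ({:.2f}%)".format(category.name, category.score * 100))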