How to analyze text using Amazon Comprehend

Key takeaways:

  • Amazon Comprehend is an AWS service for text analysis.

  • It supports sentiment analysis, entity recognition, key phrase detection, language detection, and topic modeling.

  • Comprehend uses prebuilt machine learning algorithms to extract insights from data; however, we can also use custom models tailored to our specific use case.

  • AWS Comprehend can identify and remove Personally Identifiable Information (PII) from datasets.

Amazon Comprehend is a text analysis service provided by AWS that uses multiple ML models to extract insights from text. In this Answer, we’ll examine how to use a few of the built-in models.

How does Amazon Comprehend work?

Amazon Comprehend’s built-in models are pretrained on large volumes of text, so we do not have to train a model before using the service. However, we can also train custom Comprehend models to perform analysis tailored to our requirements.

What are the uses of AWS Comprehend?

Amazon Comprehend is used for a range of text-related tasks. Let’s explore these tasks and learn which operations we can use to analyze text with Comprehend:

1. Language detection

Comprehend reports detected languages using identifiers from RFC 5646 and can detect a wide range of languages. Separately, its syntax analysis can break down a sentence to identify the parts of speech.

The DetectDominantLanguage operation (detect-dominant-language in the AWS CLI) detects the dominant language of a piece of text.
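
As a minimal sketch, the same operation can be called from Python. It assumes the client argument is a boto3 Comprehend client (for example, one created with boto3.client("comprehend")); the helper name detect_language is hypothetical and chosen only for illustration.

```python
def detect_language(client, text):
    """Return (language_code, score) for the most likely language of `text`.

    Assumption: `client` is a boto3 Comprehend client, or any object that
    exposes a compatible detect_dominant_language method.
    """
    response = client.detect_dominant_language(Text=text)
    # The response lists candidate languages, each with an RFC 5646
    # language code and a confidence score between 0 and 1.
    best = max(response["Languages"], key=lambda lang: lang["Score"])
    return best["LanguageCode"], best["Score"]
```

With credentials configured, calling detect_language(client, "some text") returns the highest-scoring language code together with its confidence score.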

2. Sentiment analysis

We can use Amazon Comprehend to find out the sentiment of the text. It groups sentiments into the following categories:

  • Positive

  • Negative

  • Mixed

  • Neutral

The DetectSentiment operation (detect-sentiment in the AWS CLI) determines the overall sentiment of a piece of text.
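
A rough Python sketch of a sentiment call follows. It assumes a boto3 Comprehend client is passed in and that the text is English; the helper name get_sentiment and the default language code are illustrative choices, not part of the service.

```python
def get_sentiment(client, text, language_code="en"):
    """Return (sentiment_label, scores) for `text`.

    Assumption: `client` is a boto3 Comprehend client, or any object that
    exposes a compatible detect_sentiment method.
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    # "Sentiment" is one of POSITIVE, NEGATIVE, NEUTRAL, or MIXED, and
    # "SentimentScore" holds a confidence score for each category.
    return response["Sentiment"], response["SentimentScore"]
```

The returned scores dictionary makes it possible to see how close the text was to another category, which helps when a label such as MIXED needs a second look.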

3. Entity detection

We can use Amazon Comprehend to get information about entities detected in the provided text, such as people, organizations, locations, and dates. Its built-in entity types include the following:

  • COMMERCIAL_ITEM

  • DATE

  • EVENT

  • LOCATION

  • ORGANIZATION

  • PERSON

  • TITLE

The DetectEntities operation (detect-entities in the AWS CLI) detects the entities in a piece of text.
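
A minimal Python sketch of entity detection, assuming a boto3 Comprehend client is passed in and the text is English. The helper name list_entities is hypothetical.

```python
def list_entities(client, text, language_code="en"):
    """Return a list of (text, type, score) tuples for entities in `text`.

    Assumption: `client` is a boto3 Comprehend client, or any object that
    exposes a compatible detect_entities method.
    """
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    # Each entity carries the matched text, its type (for example LOCATION),
    # and a confidence score between 0 and 1.
    return [(e["Text"], e["Type"], e["Score"]) for e in response["Entities"]]
```

Each tuple pairs the matched span with its entity type and confidence, which is usually enough for quick inspection of a result.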

When we run the command, we get two important fields in the response: Score and Type. The type identifies the kind of entity detected. For example, in text that mentions Mexico, Mexico is an entity of type LOCATION. The score associated with each detected entity represents the confidence Amazon Comprehend has in the detection. We can use this score to filter out low-confidence detections.
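
The score-based filtering described above can be sketched as a small helper. It works on the entity dictionaries found under the "Entities" key of a DetectEntities response; the helper name and the 0.8 threshold are assumptions for illustration and should be tuned for the use case.

```python
def filter_entities(entities, min_score=0.8):
    """Keep only the entities whose confidence score is at least `min_score`.

    `entities` is a list of dictionaries with at least a "Score" key, such
    as the "Entities" list in a DetectEntities response.
    """
    return [entity for entity in entities if entity["Score"] >= min_score]
```

A higher threshold trades recall for precision: fewer entities are returned, but those that remain are less likely to be misclassified.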

We can leverage Amazon Comprehend’s entity recognition capabilities to identify PII entities within the text and then implement a process to redact or remove those entities from the file.
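
One possible redaction process is sketched below. It assumes a boto3 Comprehend client and uses its detect_pii_entities call, then replaces each detected span with its PII type in square brackets. The helper name redact_pii, the 0.5 threshold, and the bracket format are illustrative assumptions.

```python
def redact_pii(client, text, language_code="en", min_score=0.5):
    """Return `text` with detected PII spans replaced by their type label.

    Assumptions: `client` exposes a boto3-compatible detect_pii_entities
    method, and BeginOffset/EndOffset are character offsets where
    EndOffset is exclusive.
    """
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    entities = [e for e in response["Entities"] if e["Score"] >= min_score]
    # Replace from the end of the text backwards so that the offsets of
    # earlier entities stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + "[" + entity["Type"] + "]" + text[end:]
    return text
```

Replacing spans with a type label rather than deleting them keeps the sentence readable while still removing the sensitive value.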

4. Topic modeling

Topic modeling can categorize multiple documents based on their topics. For example, we can use it to categorize news articles into nature, politics, medicine, etc. Amazon Comprehend analyzes the words in each document. A set of words that frequently appear together in a particular context makes up a topic.

In Amazon Comprehend, topic modeling is an asynchronous process. We pass the location of the documents stored in an S3 bucket to the StartTopicsDetectionJob operation, which writes its results to an output S3 bucket.

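Let’s see how we can create a Comprehend topic detection job using the AWS CLI. First, set up input and output S3 buckets and add documents to the input S3 bucket. Also, create an IAM role that gives our topic detection job permission to read from and write to the S3 buckets. The job is then started with the start-topics-detection-job command, which can take its configuration from a file through the --cli-input-json option.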
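
As a rough Python equivalent of starting the job, the sketch below calls StartTopicsDetectionJob through a boto3 Comprehend client and checks the job status afterwards, since the job runs asynchronously. The helper names, the default job name, the topic count, and the ONE_DOC_PER_FILE input format are assumptions for illustration; the bucket URIs and role ARN must come from your own setup.

```python
def start_topics_job(client, input_s3_uri, output_s3_uri, role_arn,
                     job_name="topics-demo", number_of_topics=10):
    """Start a topic detection job and return (job_id, job_status).

    Assumption: `client` is a boto3 Comprehend client, or any object that
    exposes a compatible start_topics_detection_job method.
    """
    response = client.start_topics_detection_job(
        InputDataConfig={"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        OutputDataConfig={"S3Uri": output_s3_uri},
        DataAccessRoleArn=role_arn,
        JobName=job_name,
        NumberOfTopics=number_of_topics,
    )
    return response["JobId"], response["JobStatus"]


def get_topics_job_status(client, job_id):
    """Return the current status of a topic detection job, e.g. COMPLETED."""
    response = client.describe_topics_detection_job(JobId=job_id)
    return response["TopicsDetectionJobProperties"]["JobStatus"]
```

Because the job is asynchronous, a caller would typically poll get_topics_job_status until it reports COMPLETED or FAILED before reading the output bucket.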

Here, the cli-input-json file contains the JSON configuration of the input and output S3 buckets, as well as the role with permission to access the buckets.
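
The shape of such a configuration file can be sketched by generating it from Python. The keys mirror the parameters of the StartTopicsDetectionJob operation; the helper name, the default job name, the topic count, and the ONE_DOC_PER_FILE input format are assumptions, and the bucket URIs and role ARN are placeholders to be supplied by the caller.

```python
import json


def topics_job_config_json(input_s3_uri, output_s3_uri, role_arn,
                           job_name="topics-demo", number_of_topics=10):
    """Return the topic detection job configuration as a JSON string."""
    config = {
        # Where Comprehend reads the documents from, and how they are laid out.
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        # Where Comprehend writes the results.
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        # IAM role that allows Comprehend to read and write the buckets.
        "DataAccessRoleArn": role_arn,
        "JobName": job_name,
        "NumberOfTopics": number_of_topics,
    }
    return json.dumps(config, indent=2)
```

Saving the returned string to a file gives a document that can be passed to the AWS CLI with the --cli-input-json option using a file:// path.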

We’ll pass the ARN of the IAM role with read-and-write access as the DataAccessRoleArn in this configuration file.

Hands-on exercise

Configure your AWS access_key_id and secret_access_key before running any commands. If you don’t have these keys, follow the steps in this documentation to generate them.

Note: The IAM user whose credentials are used must have permission to perform all the required actions.

After configuring AWS successfully, try out the “language detection,” “sentiment analysis,” and “entity detection” commands above in the terminal.

Copyright ©2026 Educative, Inc. All rights reserved