Try this Cloud Lab to learn to use Amazon Textract and Amazon Comprehend to build an end-to-end text analysis pipeline.
Key takeaways:
Amazon Comprehend is a fully managed natural language processing (NLP) service AWS provides. It uses machine learning (ML) and deep learning techniques to analyze and understand large volumes of text data.
Amazon Comprehend can detect sentiment in text, whether in a single document, a batch of documents, or a large-scale asynchronous job.
We can perform real-time sentiment analysis by integrating Amazon Comprehend with other AWS services.
Do you know what your customers think about your product? Extracting meaningful insights can be challenging with their voices lost in mountains of unstructured text—tweets, reviews, and support emails. Businesses must turn this heap of text into actionable intelligence to make informed decisions. This is exactly where sentiment analysis can help.
Sentiment analysis is a natural language processing (NLP) technique for determining and extracting a text's emotional tone or sentiment. It helps businesses analyze written content—such as customer reviews and social media posts—and allows organizations to gauge how people feel about a topic, product, service, or brand.
For businesses, sentiment analysis, or opinion mining, is like a secret decoder ring for customer feedback. It helps them transform user feedback into actionable insights and enhance their products and services. For instance, as seen in the customer reviews above, a clothing store can assess the quality of its customer service or identify shortcomings in the premium product, highlighting areas for improvement.
Typically, the process of sentiment analysis has two main steps:
Preprocessing: This process involves breaking down the sentences into tokens, converting words into their root form using lemmatization (e.g., the root form of “ran” is “run”), and removing the stop words.
Keyword analysis: This step involves analyzing the tokens and assigning a sentiment score. The sentiment or score is a relative measure of a text's positivity, negativity, or neutrality.
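The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the stop-word list, the lemma table, and the word scores are tiny hypothetical stand-ins for what an NLP library and a full lexicon would provide.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "i", "it", "to"}
LEMMAS = {"ran": "run", "loved": "love", "delays": "delay"}
WORD_SCORES = {"love": 1.0, "great": 1.0, "delay": -1.0, "terrible": -1.0}


def preprocess(text):
    """Step 1: tokenize, drop stop words, and map tokens to their root form."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]


def score(tokens):
    """Step 2: average the scores of known tokens (0.0 = neutral or unknown)."""
    hits = [WORD_SCORES[t] for t in tokens if t in WORD_SCORES]
    return sum(hits) / len(hits) if hits else 0.0


tokens = preprocess("I loved the great service")
print(tokens, score(tokens))
```

A positive result leans toward positive sentiment and a negative result toward negative sentiment; how to map the number onto labels is a design choice left open here.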
There are three common approaches used to design a system for sentiment analysis.
Rule-based approach: This approach utilizes lexicons to classify every word and assign it a score. Lexicons are predefined dictionaries or lists of words and phrases associated with specific sentiments—typically positive, negative, or neutral. Each word in the lexicon is assigned a sentiment score or polarity, which quantifies its emotional tone. These scores are used to analyze text and determine its overall sentiment. This approach, though effective, is difficult to scale because lexicons require frequent updates as new terms emerge and industry-specific vocabulary expands.
ML approach: This approach trains machine learning models, such as neural networks, on labeled text to classify sentiment and assign scores. It is effective as long as the training data is representative and accurately labeled.
Hybrid approach: It combines ML and rule-based approaches to improve the system's accuracy and speed up sentiment analysis.
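One possible way to combine the approaches above is to trust the lexicon when its score is decisive and defer to a trained model otherwise. The sketch below assumes a hypothetical four-word lexicon and treats the model as any callable that returns a label; the threshold of 0.5 is an arbitrary choice for illustration.

```python
# Hypothetical lexicon: word -> polarity in [-1, 1].
LEXICON = {"good": 0.8, "excellent": 1.0, "bad": -0.8, "awful": -1.0}


def lexicon_score(text):
    """Rule-based step: average polarity of the words found in the lexicon."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def hybrid_sentiment(text, model, threshold=0.5):
    """Use the lexicon when it is decisive; otherwise ask the model.

    `model` is any callable returning "POSITIVE", "NEGATIVE", or "NEUTRAL".
    """
    s = lexicon_score(text)
    if s >= threshold:
        return "POSITIVE"
    if s <= -threshold:
        return "NEGATIVE"
    return model(text)


def fallback_model(text):
    # Stand-in for a trained classifier; it always answers NEUTRAL here.
    return "NEUTRAL"


print(hybrid_sentiment("excellent food", fallback_model))
```

The cheap lexicon lookup handles clear-cut text quickly, and the slower model is only consulted for ambiguous input.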
Implementing accurate sentiment analysis at scale can be challenging due to the nuances of human language and the need for robust machine-learning models. To address these challenges, AWS offers powerful tools and services, like Amazon Comprehend, that simplify and enhance sentiment analysis.
Amazon Comprehend is a fully managed natural language processing (NLP) service that Amazon Web Services (AWS) provides. It uses machine learning (ML) and deep learning techniques to help businesses extract insights from large volumes of text data. It is typically used to perform the following tasks:
Entity recognition: It is a technique for identifying and classifying specific entities in text into predefined categories. These entities typically include names of people, organizations, locations, dates, percentages, monetary values, and more. Amazon Comprehend also supports custom entity recognition, allowing businesses to detect domain-specific terms like product names, brands, or social security numbers (SSNs).
Language detection: Amazon Comprehend can detect the dominant language of a given text through its detect_dominant_language function (via an API call).
Sentiment analysis: Used to identify the general sentiment in a text, such as positive, negative, or neutral. We can also use Comprehend for targeted sentiment analysis to understand sentiments toward certain entities more granularly. For example, what do people think about the latest iPhone design or the new Marvel movie released this holiday season?
PII identification and redaction: Personally identifiable information refers to any information that can be used to identify an individual. Amazon Comprehend helps detect and remove PII from datasets and text.
Toxicity detection: Comprehend can detect toxicity in text-based documents using simple NLP-based techniques.
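As an example of calling one of these tasks from code, the language detection item above could look roughly like this in Python. The sketch assumes `client` is a boto3 Comprehend client, created with something like `boto3.client("comprehend")`; the helper name `dominant_language` is made up for illustration.

```python
def dominant_language(client, text):
    """Return the highest-scoring language code, or None if none is detected."""
    response = client.detect_dominant_language(Text=text)
    languages = response.get("Languages", [])
    if not languages:
        return None
    # Each entry carries a LanguageCode and a confidence Score.
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]
```

The returned code can then be passed as the language code to the sentiment operations described next, provided the language is one Comprehend supports for sentiment analysis.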
Amazon Comprehend simplifies sentiment analysis by providing an API endpoint that can easily be integrated into applications. We can use the following APIs to perform sentiment analysis on a given simple sentence or a set of documents.
The DetectSentiment operation identifies the sentiment of a single piece of text as Positive, Negative, Neutral, or Mixed. We can invoke the API using the following command:
aws comprehend detect-sentiment \
  --text "I am extremely happy with the service!" \
  --language-code "en"
Here, the --text flag is used to provide the input text for analysis, and --language-code defines the language of the text. The output of the above command will be as shown below:
{
  "Sentiment": "POSITIVE",
  "SentimentScore": {
    "Positive": 0.95,
    "Negative": 0.01,
    "Neutral": 0.02,
    "Mixed": 0.02
  }
}
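The same operation is available from Python. The sketch below assumes `client` is a boto3 Comprehend client (for example `boto3.client("comprehend")`); the helper name `analyze` is hypothetical. It returns the label together with the confidence score reported for that label.

```python
def analyze(client, text, language_code="en"):
    """Call DetectSentiment and return (label, confidence for that label)."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    # Score keys are capitalized ("Positive"), while labels are upper case.
    confidence = response["SentimentScore"][label.capitalize()]
    return label, confidence
```

Returning the score alongside the label lets the caller decide how much to trust a borderline result.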
The BatchDetectSentiment operation analyzes sentiments for multiple texts in a single API call. We can invoke the API using the command:
aws comprehend batch-detect-sentiment \
  --text-list file://input_texts.json \
  --language-code "en"
Here, the --text-list flag defines the path to the JSON file containing the texts to analyze. An example of the input_texts.json file is shown below:
["I love this product!","The service was terrible.","I feel okay about the purchase."]
The command will return an output similar to the one shown below:
{
  "ResultList": [
    {"Index": 0, "Sentiment": "POSITIVE"},
    {"Index": 1, "Sentiment": "NEGATIVE"},
    {"Index": 2, "Sentiment": "NEUTRAL"}
  ]
}
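When there are more texts than fit in one request, the batch call can be wrapped in a small chunking helper. This sketch assumes `client` is a boto3 Comprehend client and relies on the documented limit of 25 documents per BatchDetectSentiment call; the helper name `batch_sentiments` is hypothetical.

```python
MAX_BATCH = 25  # BatchDetectSentiment accepts at most 25 documents per call.


def batch_sentiments(client, texts, language_code="en"):
    """Return one sentiment label per input text, in input order."""
    labels = [None] * len(texts)  # None marks documents the service rejected
    for start in range(0, len(texts), MAX_BATCH):
        chunk = texts[start:start + MAX_BATCH]
        response = client.batch_detect_sentiment(
            TextList=chunk, LanguageCode=language_code
        )
        for result in response["ResultList"]:
            # "Index" is relative to the chunk, so shift it back.
            labels[start + result["Index"]] = result["Sentiment"]
    return labels
```

Documents that fail are reported by the service in a separate error list and stay `None` here, so the output always lines up with the input.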
The StartSentimentDetectionJob operation is used for asynchronous sentiment analysis on large datasets stored in Amazon S3. The API can be invoked using the command given below:
aws comprehend start-sentiment-detection-job \
  --input-data-config S3Uri="s3://your-bucket/input/",InputFormat="ONE_DOC_PER_LINE" \
  --output-data-config S3Uri="s3://your-bucket/output/" \
  --data-access-role-arn "arn:aws:iam::*:role/ComprehendDataAccessRole" \
  --language-code "en" \
  --job-name "SentimentJob"
The flags are used to specify the following:
--input-data-config: Specifies the input S3 URI and the input format (ONE_DOC_PER_LINE or ONE_DOC_PER_FILE).
--output-data-config: This is the S3 URI where results will be saved.
--data-access-role-arn: This is the ARN of an IAM role that grants Amazon Comprehend access to the input and output data.
--language-code: This is the language of the input data (e.g., “en”).
--job-name: This is a unique name for the job.
If successfully executed, the output of the command is as follows:
{"JobId": "1234567890abcdef","JobStatus": "SUBMITTED"}
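Because the job runs asynchronously, the caller has to check back for the result. A minimal polling sketch in Python, assuming `client` is a boto3 Comprehend client and `job_id` is the JobId returned above; the helper name and the 30-second interval are arbitrary choices for illustration.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll DescribeSentimentDetectionJob until the job stops running."""
    while True:
        response = client.describe_sentiment_detection_job(JobId=job_id)
        props = response["SentimentDetectionJobProperties"]
        if props["JobStatus"] in TERMINAL_STATES:
            return props
        sleep(poll_seconds)
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting. For a completed job, the returned properties should include the S3 location of the results.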
The operations above cover sentiment analysis on individual requests and in offline or batch processing mode.
Traditionally, businesses have taken the “store now, analyze later” approach to customer feedback. They build architectures to collect and stash feedback in a data warehouse, leaving the heavy lifting of analysis to data scientists.
This approach is also used for sentiment analysis, but it is slow. Data can take days to flow through the pipeline to the data warehouse and then wait its turn to be analyzed in batches. By the time insights about customer sentiment make their way into a report, hours or even days have passed.
In today’s fast-moving world, delayed insights might as well be no insights. Businesses need real-time answers, not yesterday’s news.
Companies can gain a competitive edge in a fast-paced business environment by analyzing real-time customer feedback. For instance, customer service teams can quickly identify dissatisfaction and take immediate action, improving customer experience and strengthening brand loyalty.
Amazon Comprehend enables this by seamlessly integrating into business workflows to extract sentiment insights from text. Connecting with services like Amazon Kinesis for real-time data streaming and AWS Lambda for serverless processing enables instant analysis without complex infrastructure.
Consider a restaurant that uses the Comprehend API to analyze user reviews as they come in, as illustrated below:
Users post their reviews on the restaurant’s mobile application, and Amazon Kinesis Data Streams captures them in real time. New records in the stream trigger a Lambda function, which performs sentiment analysis on each review and pushes the result to an S3 bucket.
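The Lambda step in this flow could be sketched as follows. This is an illustration under assumptions, not the restaurant's actual code: `comprehend` and `s3` are expected to be boto3 clients, and the bucket name and the `reviews/` key prefix are hypothetical. The event shape follows the standard Kinesis-to-Lambda record format, where each payload arrives base64-encoded.

```python
import base64
import json


def process_records(event, comprehend, s3, bucket):
    """Score each Kinesis record and store the result as a JSON object in S3."""
    stored = []
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded.
        review = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        result = comprehend.detect_sentiment(Text=review, LanguageCode="en")
        key = f"reviews/{record['kinesis']['sequenceNumber']}.json"
        body = json.dumps({
            "review": review,
            "sentiment": result["Sentiment"],
            "scores": result["SentimentScore"],
        })
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
        stored.append(key)
    return stored
```

Inside a real function, a `lambda_handler(event, context)` entry point would create the two clients once at module load and pass them to `process_records`.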
Beyond analyzing overall sentiment, we can use targeted sentiment analysis to detect customer opinions on specific menu items or branch locations.
For example, a customer might leave a review such as:
“I enjoyed the Penne Arrabbiata. It was flavorful. However, the Chicken Kiev was bland.”
Or a customer could leave a review like:
“My food at the Seattle restaurant was really good, but the service was slow.”
Here, targeted sentiment analysis can help us identify in real time whether the Chicken Kiev has been bland lately or whether the Seattle restaurant experienced service issues last night.
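To act on targeted sentiment, the response has to be grouped by the thing each opinion is about. The sketch below assumes `response` comes from a boto3 call such as `client.detect_targeted_sentiment(Text=review, LanguageCode="en")`; the sample response is hand-written to show the shape and is not real service output.

```python
from collections import defaultdict


def sentiment_by_target(response):
    """Map each mentioned target text to the list of sentiments it received."""
    summary = defaultdict(list)
    for entity in response["Entities"]:
        for mention in entity["Mentions"]:
            label = mention["MentionSentiment"]["Sentiment"]
            summary[mention["Text"]].append(label)
    return dict(summary)


# Hand-written sample in the shape of a targeted sentiment response.
sample = {"Entities": [
    {"Mentions": [{"Text": "Penne Arrabbiata",
                   "MentionSentiment": {"Sentiment": "POSITIVE"}}]},
    {"Mentions": [{"Text": "Chicken Kiev",
                   "MentionSentiment": {"Sentiment": "NEGATIVE"}}]},
]}
print(sentiment_by_target(sample))
```

Aggregating these per-target labels over time is what would reveal a trend for a specific dish or branch.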
This architecture can be extended with additional services. For example, by integrating Amazon SNS and Amazon SQS, we can immediately alert management about reviews that mention slow service.
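The alerting idea could be sketched with a single SNS publish call. This covers only the SNS half and assumes `sns` is a boto3 SNS client, `topic_arn` points to an existing topic, and `result` is a DetectSentiment response; the helper name and the 0.8 threshold are hypothetical.

```python
def alert_if_negative(sns, topic_arn, review, result, threshold=0.8):
    """Publish an alert when the Negative score meets the threshold."""
    negative = result["SentimentScore"]["Negative"]
    if negative < threshold:
        return False
    sns.publish(
        TopicArn=topic_arn,
        Subject="Negative review received",
        Message=f"Negative score {negative:.2f}: {review}",
    )
    return True
```

An SQS queue subscribed to the same topic could buffer these alerts for downstream systems that process them at their own pace.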
We can also add an Amazon QuickSight dashboard to provide visualizations and color-coded trends, enhancing your analysis.
Zillow is a leading online real estate marketplace in the United States, offering services for buying, selling, renting, and financing homes. They wanted to build an application to analyze customer sentiment in support calls.
To achieve this, Zillow used Amazon Transcribe to convert audio files into text and Amazon Comprehend to perform sentiment analysis on the result. The diagram below illustrates their infrastructure:
Let’s briefly look at how everything comes together to make the application work:
S3 bucket: Recorded phone calls are stored in an S3 bucket, which triggers an AWS Step Functions workflow through an Amazon EventBridge rule.
Step Functions: Orchestrates the workflow by invoking two key services:
Amazon Transcribe: Converts audio into text and removes PII (Personally Identifiable Information) for customer privacy.
Amazon Comprehend: Performs sentiment analysis on the transcribed text and stores the results in an S3 bucket.
Amazon S3 (intermediate storage): Stores sentiment analysis results, enriching them with call metadata (e.g., location, service type).
Elasticsearch: Stores the enriched data, enabling internal teams to search by fields like phone numbers.
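The Transcribe-to-Comprehend hand-off in the steps above might look roughly like the sketch below. It is not Zillow's code. It assumes `transcribe_output` is the JSON document a transcription job writes to S3, `comprehend` is a boto3 Comprehend client, and the metadata fields are hypothetical. DetectSentiment limits its input size, so long transcripts would need to be split first, which is omitted here.

```python
import json


def enrich_call(transcribe_output, comprehend, metadata):
    """Score a Transcribe result and merge the sentiment with call metadata."""
    document = json.loads(transcribe_output)
    # Transcribe stores the full text under results.transcripts[0].transcript.
    transcript = document["results"]["transcripts"][0]["transcript"]
    result = comprehend.detect_sentiment(Text=transcript, LanguageCode="en")
    return {
        **metadata,
        "transcript": transcript,
        "sentiment": result["Sentiment"],
    }
```

The returned dictionary is the kind of enriched record that could then be indexed for search.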
This workflow demonstrates how AWS services can be combined to build end-to-end speech analytics solutions. Whether using legacy monolithic applications or serverless architectures, you can easily integrate sentiment analysis into your systems by invoking the Amazon Comprehend API in just a few steps.
Sentiment analysis is a transformative tool for businesses, allowing them to extract actionable insights from unstructured text data. By leveraging techniques like natural language processing and tools like Amazon Comprehend, organizations can effectively analyze customer feedback, social media posts, and reviews to gauge public sentiment. This enables better decision-making, improved customer experiences, and the development of more targeted strategies, making sentiment analysis an essential component of modern business intelligence.
Ready to see sentiment analysis in action? Try the Educative Cloud Lab and start building your sentiment analysis pipeline today!
What kind of text can Amazon Comprehend analyze for sentiment?
How accurate is sentiment analysis with Amazon Comprehend?
Can I customize sentiment analysis for my specific needs?
Can we perform sentiment analysis on audio files in AWS?