How to find similarity between two words using NLP


In this shot, we are going to build a small NLP pipeline that computes the similarity between two given words.

For this, we are going to use Gensim’s word2vec model. Gensim provides an optimized implementation of both word2vec architectures: the CBOW (continuous bag-of-words) model and the Skip-Gram model.

Similarity between two words

Before moving on, you need to download the word2vec vectors.

The file we need is the pre-trained GoogleNews-vectors-negative300 archive; remember that it is ~1.5 GB.

We suggest working in Google Colab for this, as the file is very large.

Open your Google Colab and run the command below to get your word vectors.

!wget -P /root/input/ -c "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"

This command downloads the file directly onto Google’s servers, which saves a lot of time compared to uploading it from your own machine.
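If you want to confirm that the download completed, you can list the directory as a simple sanity check (the path below matches the wget command above).

!ls -lh /root/input/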

Now, let’s install the packages we require.

pip install gensim
pip install scikit-learn

You can run the above commands both in Google Colab and on your local machine (if you’re using one).

Let’s move on to the coding part by first importing the packages, as shown below.

from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
print('Imported Successfully!')

We imported two packages. These packages will be used in the following way:

  • The gensim package will be used to load the word vectors that we downloaded.
  • KeyedVectors essentially contains the mapping between words and embeddings. After training, it can be used to directly query those embeddings in various ways.
  • We will use scikit-learn's cosine similarity to calculate the distance between two word vectors. This metric is commonly used and provides good results for many types of problems; a short sketch of what it computes follows this list.
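To make the metric concrete, here is a minimal sketch of what cosine_similarity() computes: the dot product of the two vectors divided by the product of their norms. The toy vectors below are made up purely for illustration.

import numpy as np

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional vectors, invented for this example
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine(a, b))  # 1.0, because b points in the same direction as a

Now, let’s load the vectors and compute the similarity between two words.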
word_vectors = KeyedVectors.load_word2vec_format('/root/input/GoogleNews-vectors-negative300.bin.gz', binary=True)
v_banana = word_vectors['banana']
v_mango = word_vectors['mango']
cosine_similarity([v_banana], [v_mango])

Explanation:

  • In line 1, we loaded the word2vec model. This is the word2vec model that was trained on the Google News dataset; it produces vectors of 300 dimensions.
  • In lines 2 and 3, we fetched the word vectors for banana and mango.
  • In line 4, we passed the two vectors to the cosine_similarity() function to compute their similarity.

You will see output similar to the following.

array([[0.63652116]], dtype=float32)

The above means that the two words have a cosine similarity of about 0.64; in other words, they are roughly 64% similar.
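As a side note, you don’t strictly need scikit-learn for this comparison: gensim’s KeyedVectors exposes convenience methods of its own. The snippet below uses similarity() and most_similar(), both part of gensim’s API; the first should agree with the scikit-learn result above.

# Gensim can compute the same cosine similarity directly
print(word_vectors.similarity('banana', 'mango'))

# It can also list a word's nearest neighbours in the vector space
print(word_vectors.most_similar('banana', topn=3))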

Note: If you try to get the vector for a word that is not in the vocabulary, you will get a KeyError. You can avoid this by checking for the word first, or solve it by training your own word2vec model on a dataset that contains the words you need.
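Here is a minimal sketch of such a check, using Python’s in operator (which KeyedVectors supports); the out-of-vocabulary word is made up for illustration.

word = 'qwertyzzz'  # a made-up word, assumed to be out of vocabulary
if word in word_vectors:
    print(word_vectors[word])
else:
    print(f"'{word}' is not in the vocabulary")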

Using pre-trained embeddings like this is how transfer learning is applied in NLP. If you want to learn more about transfer learning, check out the shots below:

  1. What is transfer learning and why is it needed?
  2. What are the strategies for using transfer learning?
  3. How to use a pre-trained deep learning model