What Is Natural Language Processing?
Learn about NLP and the challenges involved in designing NLP systems.
We'll cover the following
According to DOMO, an analytics company, there were 1.7 MB for every person on earth every second by 2020, with a staggering 4.6 billion active users on the internet. This includes roughly 500,000 tweets sent and 306 billion emails circulated every day. These figures are only going in one direction as this course is being written, and that is up! Of all this data, a large fraction is unstructured text and speech because there are billions of emails and social media content created and phone calls made every day.
NLP in daily life
These statistics provide a good basis for us to define what NLP is. Simply put, the goal of NLP is to make machines understand our spoken and written languages. Moreover, NLP is ubiquitous and is already a large part of human life. Virtual assistants (VAs), such as Google Assistant, Cortana, Alexa, and Apple Siri, are largely NLP systems. Numerous NLP tasks take place when we ask a VA, “Can you show me a good Italian restaurant nearby?” First, the VA needs to convert the utterance to text (that is, speech-to-text). Next, it must understand the semantics of the request (for example, identify the most important keywords like “restaurant” and “Italian”) and formulate a structured request (for example, cuisine = Italian, rating = 3 – 5, distance < 10 mi). Then, the VA must search for restaurants, filtering by the location and cuisine and then rank the restaurants by the ratings received. To calculate an overall rating for a restaurant, a good NLP system may look at both the rating and text description provided by each user. Finally, once the user is at the restaurant, the VA might assist the user by translating various menu items from Italian to English. This example shows that NLP has become an integral part of human life.
Complexity of NLP
It should be understood that NLP is an extremely challenging field of research because words and semantics have a highly complex nonlinear relationship, and it’s even more difficult to capture this information as a robust numerical representation. To make matters worse, each language has its own grammar, syntax, and vocabulary. Therefore, processing textual data involves various complex tasks, such as text parsing (for example, tokenization and stemming), morphological analysis, word sense disambiguation, and understanding the underlying grammatical structure of a language. For example, in these two sentences, “I went to the bank” and “I walked along the river bank,” the word “bank” has two entirely different meanings due to the context it’s used in. To distinguish or (disambiguate) the word “bank,” we need to understand the context in which the word is being used. Machine learning has become a key enabler for NLP, helping to accomplish the aforementioned tasks through machines. Below, we discuss some of the important tasks that fall under NLP.