Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interaction between computers and humans in natural language. The ultimate goal of NLP is to help computers understand language as well as we do. It is the driving force behind technologies like virtual assistants, speech recognition, sentiment analysis, automatic text summarization, and machine translation. In this post, we'll cover the basics of natural language processing, dive into some of its techniques, and learn how NLP has benefited from recent advances in deep learning.
Introduction
Natural language processing (NLP) is the intersection of computer science, linguistics, and machine learning. The field focuses on communication between computers and humans in natural language: making computers understand and generate human language. Applications of NLP techniques include voice assistants like Amazon's Alexa and Apple's Siri, but also machine translation and text filtering.
Is NLP Artificial Intelligence, Machine Learning, or Deep Learning?
The question itself is not quite right! People sometimes use the terms AI, ML, and DL incorrectly, so let's clarify them first and then come back to NLP.
Clearing the Confusion: AI vs. Machine Learning vs. Deep Learning
The commencement of modern AI can be traced to classical philosophers’ attempts to describe human thinking as a symbolic system. But the field of AI wasn’t formally founded until 1956, at a conference at Dartmouth College, in Hanover, New Hampshire, where the term “artificial intelligence” was coined.
How Does NLP Fit into the AI World?
With a basic understanding of artificial intelligence, machine learning, and deep learning, let's revisit our very first question: is NLP artificial intelligence, machine learning, or deep learning? The terms AI, NLP, and ML (machine learning) are sometimes used almost interchangeably. However, there is an order to the madness of their relationship. Hierarchically, natural language processing and machine learning both fall under the larger category of artificial intelligence, and modern NLP relies heavily on machine learning techniques.
Natural Language Processing combines Artificial Intelligence (AI) and computational linguistics so that computers and humans can talk seamlessly.
NLP endeavors to bridge the divide between machines and people by enabling a computer to analyze what a user said (speech recognition) and process what the user meant. This task has proven quite complex.
To converse with humans, a program must understand syntax (grammar), semantics (word meaning), morphology (word forms, such as tense), and pragmatics (conversational context). The number of rules to track can seem overwhelming, which explains why earlier attempts at NLP led to disappointing results. With a different system in place, NLP slowly improved, moving from a cumbersome rule-based methodology to a pattern-learning one. Siri appeared on the iPhone in 2011, and in 2012 the adoption of graphics processing units (GPUs) gave a major boost to deep neural networks, and with them NLP.
NLP empowers computer programs to comprehend unstructured content by utilizing AI and machine learning to make inferences and give context to language, much as human brains do. It is a tool for revealing and analyzing the “signals” hidden in unstructured information. Organizations can then gain a deeper understanding of public perception around their products, services, and brand, as well as those of their rivals.
Over the past few years, deep learning (DL) architectures and algorithms have made impressive advances in fields such as image recognition and speech processing. Their application to natural language processing (NLP) was less impressive at first, but deep learning now makes significant contributions, yielding state-of-the-art results for some common NLP tasks. Named entity recognition (NER), part-of-speech (POS) tagging, and sentiment analysis are some of the problems where neural network models have outperformed traditional approaches. The progress in machine translation is perhaps the most remarkable of all: Google has released its own neural-net-based engine for eight language pairs, closing much of the quality gap between its old system and a human translator and fuelling increasing interest in the technology. Fed with the appropriate material, computers today can already produce an eerie echo of human language.
NLP Is Not Just About Creating Intelligent Bots
NLP is a tool for computers to analyze, comprehend, and derive meaning from natural language in an intelligent and useful way. This goes way beyond the most recently developed chatbots and smart virtual assistants. In fact, natural language processing algorithms are everywhere: search, online translation, spam filtering, spell checking, and more.
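For instance, a spam filter is just a text classifier under the hood. Here is a minimal sketch, assuming scikit-learn is available; the tiny training set is invented for illustration:

```python
# A toy spam filter: bag-of-words features plus a naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "win a free prize now", "limited offer click here",        # spam
    "are we still meeting tomorrow", "see the attached report", # ham
]
train_labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # word-count features

model = MultinomialNB()
model.fit(X_train, train_labels)

X_test = vectorizer.transform(["click here to win a prize"])
print(model.predict(X_test))  # expected: ['spam']
```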
Components of NLP
NLP can be divided into two basic components:
Natural Language Understanding
Natural Language Generation
Natural Language Understanding (NLU)
NLU is naturally harder than NLG. Really? Let's look at the challenges a machine faces while trying to understand language. There is a lot of ambiguity involved in interpreting a language:
Lexical Ambiguity occurs when a word carries more than one sense, so the sentence containing it can be interpreted differently depending on which sense is intended. Lexical ambiguity can be resolved to some extent using part-of-speech tagging techniques, as illustrated in the sketch after this list.
Syntactic Ambiguity arises when a sequence of words admits more than one grammatical reading. It is also termed grammatical ambiguity. A classic example is “I saw the man with the telescope”: did I use the telescope, or was the man holding it?
Referential Ambiguity: Very often a text mentions an entity (something or someone) and then refers to it again, possibly in a different sentence, using another word. A pronoun causes ambiguity when it is not clear which noun it is referring to.
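To make the lexical case concrete, here is a minimal sketch that uses NLTK's part-of-speech tagger to separate the verb and noun senses of “book” (assuming nltk is installed and its tokenizer/tagger data has been downloaded, e.g. via nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')):

```python
# POS tags resolve the lexical ambiguity of "book": verb vs. noun.
import nltk

for sentence in ["I will book a table", "I read a good book"]:
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
# "book" is tagged VB (verb) in the first sentence and NN (noun) in the
# second, which selects the intended sense of the ambiguous word.
```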
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in natural language from some internal representation. It typically involves the three stages below (see the toy sketch that follows them).
Text planning
It includes retrieving the relevant content from the knowledge base.
Sentence planning
It includes choosing required words, forming meaningful phrases, and setting the tone of the sentence.
Text Realization
It is the mapping of sentence plans into the final sentence structure.
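Here is a toy end-to-end illustration of the three stages. The “knowledge base” and wording choices are invented for this sketch; real NLG systems are far more sophisticated:

```python
# A toy sketch of the three NLG stages: planning what to say, how to say it,
# and producing the final surface string.
knowledge_base = {"city": "Paris", "temp_c": 23, "sky": "clear"}

def text_planning(kb):
    # Text planning: retrieve the relevant content from the knowledge base.
    return {"city": kb["city"], "temp_c": kb["temp_c"], "sky": kb["sky"]}

def sentence_planning(content):
    # Sentence planning: choose words and group the facts into phrases.
    return {
        "subject": f"the weather in {content['city']}",
        "verb": "is",
        "complements": [f"{content['temp_c']} degrees Celsius",
                        f"with {content['sky']} skies"],
    }

def text_realization(plan):
    # Text realization: map the sentence plan onto a grammatical string.
    sentence = " ".join([plan["subject"], plan["verb"]] + plan["complements"]) + "."
    return sentence[0].upper() + sentence[1:]

print(text_realization(sentence_planning(text_planning(knowledge_base))))
# -> The weather in Paris is 23 degrees Celsius with clear skies.
```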
Levels of NLP
In the previous sections, we discussed different problems associated with NLP. Now let us look at the typical steps involved in performing NLP tasks. Keep in mind that the section below describes a standard workflow; in real-life implementations it may differ drastically depending on the problem statement or requirements.
Phonological Analysis:
This level is applied only if the text originates from speech. It deals with the interpretation of speech sounds within and across words; speech sounds can give a big hint about the meaning of a word or a sentence.
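For a feel of what phonological information looks like, here is a minimal sketch using the CMU Pronouncing Dictionary through NLTK (assuming nltk is installed and the corpus has been fetched with nltk.download('cmudict')). Homophones such as “night” and “knight” show why sound alone is not enough and context is needed:

```python
# Look up phoneme sequences for two homophones in the CMU dictionary.
from nltk.corpus import cmudict

pronunciations = cmudict.dict()
print(pronunciations["night"][0])   # ['N', 'AY1', 'T']
print(pronunciations["knight"][0])  # ['N', 'AY1', 'T'] -- same sounds,
# different spelling and meaning, so speech input needs context to decide.
```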
Morphological Analysis:
Deals with understanding distinct words according to their morphemes (the smallest units of meaning). Take, for example, the word “unhappiness”. It can be broken down into three morphemes (prefix, stem, and suffix), each conveying some form of meaning: the prefix un- refers to “not being”, while the suffix -ness refers to “a state of being”. The stem happy is considered a free morpheme since it is a “word” in its own right. Bound morphemes (prefixes and suffixes) require a free morpheme to which they can be attached, and can therefore not appear as a “word” on their own.
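As a toy illustration of that breakdown, the following naive affix-stripping sketch (with invented prefix and suffix lists) segments “unhappiness”. Note that the stem surfaces as “happi” because of the y-to-i spelling rule, which real morphological analyzers handle explicitly:

```python
# Naive morpheme segmentation: strip one known prefix and one known suffix.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["ness", "ing", "ly"]

def segment(word):
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    suffix = next((s for s in SUFFIXES if word.endswith(s)), "")
    stem = word[len(prefix):len(word) - len(suffix) or None]
    return prefix, stem, suffix

print(segment("unhappiness"))  # ('un', 'happi', 'ness')
```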
Lexical Analysis:
It involves identifying and analyzing the structure of words. The lexicon of a language is its collection of words and phrases. Lexical analysis divides the whole chunk of text into paragraphs, sentences, and words. To deal with lexical variation, we often need to perform lexicon normalization; the two most common techniques are described below and sketched in code after them.
Stemming: Stemming is a rudimentary rule-based process of stripping suffixes (“ing”, “ly”, “es”, “s”, etc.) from a word.
Lemmatization: Lemmatization, on the other hand, is an organized, step-by-step procedure for obtaining the root form of a word; it makes use of vocabulary (the dictionary meaning of words) and morphological analysis (word structure and grammatical relations).
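A minimal sketch contrasting the two with NLTK (assuming nltk is installed and the WordNet data has been fetched with nltk.download('wordnet')):

```python
# Stemming vs. lemmatization on the same words.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops suffixes by rule and may leave a non-word:
print(stemmer.stem("multiplying"))                   # 'multipli'
# Lemmatization consults a vocabulary and returns a valid root form:
print(lemmatizer.lemmatize("multiplying", pos="v"))  # 'multiply'
print(lemmatizer.lemmatize("better", pos="a"))       # 'good'
```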
Syntactic Analysis:
Deals with analyzing the words of a sentence so as to uncover its grammatical structure. For example, “Colourless green ideas sleep furiously” is syntactically well-formed, even though it would be rejected by semantic analysis because “colourless green” makes no sense. Syntactic parsing involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationships among them. Dependency grammar and part-of-speech tags are important attributes of text syntax.
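Here is a minimal sketch of syntactic parsing with a toy context-free grammar in NLTK; the grammar is invented for illustration and covers only this one sentence:

```python
# Parse a sentence with a tiny hand-written context-free grammar.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'ball'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased a ball".split()):
    tree.pretty_print()  # prints the grammatical structure of the sentence
```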
Semantic Analysis:
Determines the possible meanings of a sentence by focusing on the interactions among word-level meanings in the sentence. Some people think this is the level that determines meaning, but in fact all of the levels contribute. The semantic analyzer disregards sentences such as “hot ice cream”.
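As a small illustration of word-level semantics, the classic Lesk algorithm picks a word's dictionary sense by measuring overlap between the sentence and each sense's gloss. A minimal sketch with NLTK (assuming nltk is installed and the WordNet corpus has been downloaded):

```python
# Word sense disambiguation for "bank" with the classic Lesk algorithm.
from nltk.wsd import lesk

sentence = "I went to the bank to deposit my money".split()
sense = lesk(sentence, "bank", pos="n")
print(sense, "->", sense.definition())
# Lesk chooses the WordNet sense whose gloss overlaps most with the context
# words; it is a simple overlap heuristic, so the chosen sense is not
# always the right one.
```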
Discourse Integration:
Focuses on the properties of the text as a whole that convey meaning by making connections between component sentences; in other words, a sense of context. The meaning of any single sentence depends upon the sentences that precede it and may influence the meaning of the sentences that follow. For example, the word “that” in the sentence “He wanted that” depends upon the prior discourse context.
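As a deliberately naive sketch of referential resolution across a discourse, the following heuristic links a pronoun to the most recent preceding noun. Its failure on this invented example shows why real discourse analysis needs more than recency:

```python
# Toy pronoun resolution: link each pronoun to the nearest preceding noun.
discourse = [
    ("Monkeys", "NOUN"), ("eat", "VERB"), ("bananas", "NOUN"),
    ("when", "ADV"), ("they", "PRON"), ("wake", "VERB"), ("up", "PART"),
]

def resolve(tokens):
    antecedent = None
    links = {}
    for word, tag in tokens:
        if tag == "NOUN":
            antecedent = word
        elif tag == "PRON" and antecedent:
            links[word] = antecedent  # recency heuristic: nearest noun wins
    return links

print(resolve(discourse))  # {'they': 'bananas'} -- wrong! "they" refers to
# the monkeys, which is why world knowledge matters at the next level.
```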
Pragmatic Analysis:
Explains how extra meaning is read into texts without actually being encoded in them. This requires much world knowledge, including the understanding of intentions, plans, and goals. Consider the following two sentences:
The city police refused the demonstrators a permit because they feared violence.
The city police refused the demonstrators a permit because they advocated revolution.
The meaning of “they” in the two sentences is different, and to figure out the difference, world knowledge in knowledge bases and inference modules must be utilized. Pragmatic analysis helps users discover this intended effect by applying a set of rules that characterize cooperative dialogues. For example, “Close the window?” should be interpreted as a request rather than an order.
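As a toy illustration of one small slice of pragmatics, the following sketch classifies the intended speech act of an utterance with hand-written rules. The cue phrases are invented for illustration; real systems learn dialogue acts from annotated corpora:

```python
# Toy speech-act classification: is the utterance a request or a statement?
REQUEST_CUES = ("can you", "could you", "would you", "please")

def speech_act(utterance):
    text = utterance.lower().rstrip("?!. ")
    if any(cue in text for cue in REQUEST_CUES) or utterance.strip().endswith("?"):
        return "request"
    return "statement"

print(speech_act("Close the window?"))      # 'request', despite imperative form
print(speech_act("He closed the window."))  # 'statement'
```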