What is NLP?

Natural Language Processing (NLP) is an area of computer science and a subfield of artificial intelligence, closely tied to machine learning, that explores how computers can process and interpret the natural language used in interactions between humans and computers. Understanding human language is a difficult task for computers because of its complexity. For example, the words in a sentence can be arranged in several different ways, and a single word can have several meanings, so contextual information is necessary to interpret sentences correctly. Every language also has its own rules and ambiguities, which makes human language a complex system to interpret in computer terms.

The ultimate goal of NLP is to help computers understand natural language, in any language, as humans do. It is the driving force behind applications such as virtual assistants, speech recognition, sentiment analysis, automatic text summarization, and machine translation, to name a few.

What is Nuix NLP?

Nuix NLP (Natural Language Processing) integrates with Nuix Workstation and the Nuix Engine so that you can send text items for NLP analysis and automatically retrieve the enriched results in a case.

At its core, NLP analysis is a two-step process:

Detection of the entities in the text.

Classification of entities into categories.

For a definition of 'entity', see the following Key terms related to NLP section.
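The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not how Nuix NLP is implemented: the regular expression, the category names (EMAIL, PHONE), and the function names are all assumptions made for the example. Step 1 finds candidate spans without deciding what they are; step 2 assigns each span to a category.

```python
import re

# Hypothetical candidate pattern for illustration only (not a Nuix NLP rule):
# anything containing "@", or a run of digits with spaces or hyphens.
CANDIDATE = re.compile(r"\S*@\S+|\+?\d[\d -]{6,}\d")


def detect(text):
    """Step 1: detection. Return (start, end, text) for each candidate span."""
    return [(m.start(), m.end(), m.group()) for m in CANDIDATE.finditer(text)]


def classify(span_text):
    """Step 2: classification. Assign a detected span to a category."""
    if "@" in span_text:
        return "EMAIL"
    digits = re.sub(r"\D", "", span_text)
    if 7 <= len(digits) <= 15:
        return "PHONE"
    return "UNKNOWN"


def extract_entities(text):
    """Run detection, then classification, and return one record per entity."""
    return [
        {"text": span, "start": start, "end": end, "category": classify(span)}
        for start, end, span in detect(text)
    ]


if __name__ == "__main__":
    sample = "Contact jane@example.com or call +61 2 5550 1234 today."
    for entity in extract_entities(sample):
        print(entity)
```

Keeping detection and classification in separate functions mirrors the two-step description: the detector can be broad, and the classifier decides which category, if any, a span belongs to.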

Nuix NLP in Nuix Workstation provides an automated content analytics platform combining artificial intelligence (AI) and machine learning-driven natural language processing (NLP) technologies with a proprietary pre-trained language model for reading, interpreting, and analyzing unstructured text.

When searching for content risks related to data privacy, sensitive and proprietary information, and security and compliance liabilities, using Nuix NLP to analyze selected Nuix Workstation items saves an enormous amount of effort through its:

No-code user interface that empowers non-technical domain experts to build and modify models without engineering or data science skills

Built-in models for text classifications and extractions

User-driven risk engine

What are the benefits of integrating Nuix NLP in Nuix Workstation?

Nuix NLP relies on a document and word embedding methodology. Representing documents and words as points in a semantic space provides an easy and efficient way to:

Accelerate your content intelligence with automated clustering, that is, grouping items based on content patterns.

Identify risky content by combining AI with internal weighting based on what is important to your organization. This is helpful for a wide range of use cases including compliance, insider threats, fraud, intellectual property loss, litigation, regulatory action, or any other content liabilities.

Minimize labor-intensive reading and analysis by applying accurate, explainable, and validated AI models that dramatically reduce false positives, human error, and bias.
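To make the idea of documents as points in a semantic space more concrete, the following Python sketch groups documents whose vectors point in a similar direction. It is a minimal illustration under stated assumptions, not the Nuix NLP algorithm: the three-dimensional vectors are hand-made (real embeddings typically have hundreds of dimensions and come from a trained model), and the document names, the greedy grouping strategy, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(embeddings, threshold=0.9):
    """Greedy grouping: a document joins the first cluster whose seed
    (first member) is at least `threshold` similar; otherwise it starts
    a new cluster."""
    clusters = []
    for doc_id, vector in embeddings.items():
        for members in clusters:
            if cosine(vector, embeddings[members[0]]) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


# Hand-made toy vectors, assumed for illustration only.
DOCS = {
    "invoice_a": [0.9, 0.1, 0.0],
    "invoice_b": [0.8, 0.2, 0.1],
    "medical_a": [0.1, 0.9, 0.2],
    "medical_b": [0.0, 0.8, 0.3],
}

if __name__ == "__main__":
    for group in cluster(DOCS):
        print(group)
```

With these toy vectors, the two invoice documents land in one group and the two medical documents in another, because documents with similar content sit close together in the space.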

Key terms related to NLP

NLP uses Compound Lexemes (CLs) to identify, among other things, PII and PHI.

Lexeme: A fundamental unit of the lexicon (or word stock) of a language, and often an individual word (a simple lexeme or dictionary word). A single dictionary word (for example, talk) may have a number of inflectional forms (talks, talked, talking).

Compound lexeme: A multiword or composite lexeme that comprises more than one word, such as a phrasal verb (for example, speak up; pull through) or an open compound (for example, fire engine; couch potato).

Entity: The thing that is consistently talked about or referred to in the text, such as a person, an organization, or a place. Named Entity Recognition (NER) is the form of NLP that detects and classifies entities.

PII: Personally Identifiable Information, which is any type of data that can be used to identify someone, such as their name, address, phone number, or passport number.

PHI: Protected Health Information, which is data collected by health professionals during medical visits.

A compound lexeme can also expose subfields for the items it groups. For example, when matching a "Credit Card Number" compound lexeme, Nuix NLP also provides the following subfields for the related named items: the individual credit card number, the CVV code, and the expiration date. This allows users to query 'credit card numbers', 'expiration dates', and 'cvv codes'.
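The credit card example can be pictured as one match that carries several named subfields. The Python sketch below shows that idea with a regular expression that uses named groups. It is an assumption-laden illustration, not the Nuix NLP definition of the "Credit Card Number" compound lexeme: the pattern, the subfield names (number, expiration, cvv), and the function name are hypothetical, and the pattern does no checksum validation.

```python
import re

# Hypothetical pattern for illustration only; not the Nuix NLP rule.
CREDIT_CARD = re.compile(
    r"(?P<number>\d{4}(?:[ -]?\d{4}){3})"            # 16 digits, optionally in groups of four
    r"\D+?(?P<expiration>(?:0[1-9]|1[0-2])/\d{2})"   # expiration date as MM/YY
    r"\D+?(?P<cvv>\d{3,4})\b"                        # 3- or 4-digit security code
)


def match_credit_cards(text):
    """Return one dict of subfields per compound match found in `text`."""
    results = []
    for m in CREDIT_CARD.finditer(text):
        fields = m.groupdict()
        # Normalize the card number by removing spaces and hyphens.
        fields["number"] = re.sub(r"[ -]", "", fields["number"])
        results.append(fields)
    return results


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, exp 09/27, CVV 123"
    for fields in match_credit_cards(sample):
        print(fields)
```

Because each subfield is captured under its own name, the card number, expiration date, and security code from a single match can be queried separately, which is the behavior the credit card example describes.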