Create Compound Lexemes and Regex

Simple Item Types

Compound lexemes are built from seven basic item types that the system can detect:

Keyword

Named Entity

Regex

Lexeme

Geolocation

Topic

Dictionary

Keywords

The following table describes the Keyword settings.

Number in image Description
1 Keyword - This field takes any one-word string to match.
2 Optional - This field appears on all elements. Selecting it makes the element optional, which relaxes the requirements for a Compound Lexeme match.

Named Entity

The system detects and identifies named entities based on the syntax of the word within the surrounding text.

The following table describes the Named Entity settings.

Number in image Description
1 There are 18 detectable Named Entity types the user can pick from:

PERSON

GROUP

FACILITY

ORGANIZATION

GEO_POLITICAL_ENTITY

LOCATION

PRODUCT

EVENT

WORK_OF_ART

LAW

LANGUAGE

DATE

TIME

PERCENT

MONEY

QUANTITY

ORDINAL

CARDINAL

2 The Contains field allows the user to specify one or more strings that must be part of the detected Named Entity for the element match to be considered valid.
3 The Does not Contain field allows the user to specify one or more strings which, when found as part of the detected Named Entity, invalidate the element match.
4 Optional - Select this to make the element optional.
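
The Contains and Does not Contain settings above can be sketched as a simple filter. This is only an illustration, not the product's implementation: the class name, the case-insensitive comparison, and the rule that every Contains string must be present are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class NamedEntityElement:
    """Hypothetical model of a Named Entity element (names are illustrative)."""
    entity_type: str
    contains: list = field(default_factory=list)
    does_not_contain: list = field(default_factory=list)

    def matches(self, detected_type, detected_text):
        # The detected entity must be of the selected type.
        if detected_type != self.entity_type:
            return False
        text = detected_text.lower()  # assumption: comparison ignores case
        # Assumption: every Contains string must appear in the entity.
        if not all(s.lower() in text for s in self.contains):
            return False
        # Any Does not Contain string invalidates the match.
        if any(s.lower() in text for s in self.does_not_contain):
            return False
        return True


element = NamedEntityElement("ORGANIZATION", contains=["bank"], does_not_contain=["food"])
print(element.matches("ORGANIZATION", "First National Bank"))
print(element.matches("ORGANIZATION", "City Food Bank"))
```

In this sketch the first call is accepted and the second is rejected because of the excluded string.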

 

Regex

The Regex element allows users to add a regex or reference an existing regex.

The following table describes the Regex settings.

Number in image Description
1 ID - This allows the user to reference the Regex using its name.
2 Label - This allows the user to reference one or more Regex entries at once: any detected Regex with the specified label is considered a match for this element.
3 String - This option allows the user to input a regex string which must be detected in order for the element to match.
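
The three ways of pointing a Regex element at an expression (by ID, by Label, or by an inline String) might be pictured as follows. The registry contents, field names, and function names are hypothetical and exist only for this sketch; the expressions are ordinary Python regular expressions.

```python
import re

# Hypothetical registry of saved Regex entries (IDs, labels and expressions are made up).
REGISTRY = [
    {"id": "us_phone", "labels": ["contact"], "expression": r"\b\d{3}-\d{3}-\d{4}\b"},
    {"id": "email", "labels": ["contact"], "expression": r"\b[\w.]+@[\w.]+\.\w+\b"},
]


def resolve_patterns(regex_id=None, label=None, string=None):
    """Return the compiled patterns a Regex element refers to."""
    if string is not None:
        # String: an inline expression that must itself be detected.
        return [re.compile(string)]
    return [
        re.compile(entry["expression"])
        for entry in REGISTRY
        # ID selects one entry; Label selects every entry carrying that label.
        if entry["id"] == regex_id or (label is not None and label in entry["labels"])
    ]


def element_matches(text, **reference):
    return any(p.search(text) for p in resolve_patterns(**reference))


print(element_matches("call 555-123-4567", label="contact"))
print(element_matches("call 555-123-4567", regex_id="email"))
```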

 

Lexeme

The Lexeme element allows the user to specify a lexeme which, when detected with at least the minimum relevance, triggers a match for the element.

The following table describes the Lexeme settings.

Number in image Description
1 Pick Lexeme - This is where the user finds the specific lexeme to match. When selecting the lexeme the user navigates through the dictionary and topic. This means that if there are multiple lexemes with the same name, only the one within the Dictionary/Topic selection triggers an element match.
2 Min. Relevance - This is the Minimum Relevance that the detected lexeme must have within the text in order to trigger the element match.

Geolocations

Geolocations allow users to specify a location in the world which when detected triggers an element match.

The following table describes the Geolocations settings.

Number in image Description
1 Pick Geolocation - This allows the user to pick a geolocation (either a country or county).
2 Min. Relevance - This is the minimum relevance of the detected geolocation in order to trigger the element match.
3 Include children - If the specified geolocation has any children (for example, cities within a country), selecting this option allows them to trigger an element match as well.
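
As a rough illustration of how Min. Relevance and Include children interact, consider the sketch below. The parent-to-children table, the function name, and the 0 to 1 relevance scale are assumptions made for the example only.

```python
# Hypothetical parent-to-children lookup; the real hierarchy lives in the product.
CHILDREN = {"France": {"Paris", "Lyon"}}


def geolocation_matches(picked, detected, relevance, min_relevance, include_children=False):
    """Return True if a detected geolocation triggers the element match."""
    # Assumption: relevance is a number and must reach the configured minimum.
    if relevance < min_relevance:
        return False
    if detected == picked:
        return True
    # Children only count when Include children is selected.
    return include_children and detected in CHILDREN.get(picked, set())


print(geolocation_matches("France", "Paris", 0.8, 0.5))
print(geolocation_matches("France", "Paris", 0.8, 0.5, include_children=True))
```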

Topic

Selecting a topic. 

The following table describes the Topic settings.

Number in image Description
1 Pick Topic - This is where the user selects a Topic. If any of the lexemes within the Topic are detected with the Minimum Relevance then an element match is triggered.
2 Min. Relevance - This allows the user to specify the Minimum Relevance necessary for the element match.
3 Pick Lexeme - This allows the user to specify Lexemes within the Topic which will not trigger an element match.

Dictionaries

Selecting a dictionary.

The following table describes the Dictionary settings.

Number in image Description
1 Pick Dictionary - This is where the user selects a Dictionary. If any of the lexemes within the Dictionary are detected with the Minimum Relevance then an element match is triggered.
2 Min. Relevance - This allows the user to specify the Minimum Relevance necessary for the element match.
3 Pick Lexeme - This allows the user to specify Lexemes within the Dictionary which will not trigger an element match.
4 Pick Topic - This allows the user to specify Topics within the Dictionary whose Lexemes will not trigger an element match.
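
The Dictionary settings above combine a relevance threshold with two kinds of exclusion. A minimal sketch, assuming each detection is available as a record with dictionary, topic, lexeme, and relevance fields (a made-up shape used only here):

```python
def dictionary_matches(detections, dictionary, min_relevance,
                       excluded_lexemes=(), excluded_topics=()):
    """Return True if any detected lexeme triggers the Dictionary element."""
    for d in detections:
        if d["dictionary"] != dictionary:
            continue
        if d["relevance"] < min_relevance:
            continue
        # Excluded Lexemes and the Lexemes of excluded Topics never trigger a match.
        if d["lexeme"] in excluded_lexemes or d["topic"] in excluded_topics:
            continue
        return True
    return False


# Hypothetical detections for illustration.
detections = [
    {"dictionary": "Finance", "topic": "Fraud", "lexeme": "kickback", "relevance": 0.9},
    {"dictionary": "Finance", "topic": "Banking", "lexeme": "deposit", "relevance": 0.4},
]
print(dictionary_matches(detections, "Finance", 0.5))
print(dictionary_matches(detections, "Finance", 0.5, excluded_topics={"Fraud"}))
```

The same shape, without the Topic exclusion, would apply to the Topic element.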

Complex Item Types or Blocks

The Complex Item Types are used as part of logic components called Blocks. All required items within a Block must be matched to trigger a Block match. Items within a Block can be another Block or a Simple Item Type. All required Blocks must be matched to trigger a Compound Lexeme match.

Or Block

The Or Block is the simplest of the Blocks and takes at least one item. The Name field lets the user name the Block so that the results show which Block within the Compound Lexeme was matched. Only the highest-level Block’s name is returned.

Ordered Block

The Ordered Block expects the matched items to appear in a one-directional sequence. Every Ordered Block item has a required proximity attribute, which sets the maximum distance within which the next item must be found.

The following table describes the Ordered Block settings.

Number in image Description
1 Name - The name of the Block.
2 Min. Match - The minimum number of item matches in order to trigger a Block match.
3 Items - The items (at least one) which can be matched.
4 Exclude - Any items which will invalidate the Block match if found within the bounds of the block.
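
To make the Ordered Block behaviour more concrete, here is a small sketch that works on a list of word tokens. It assumes that items are plain keywords, that proximity is counted in tokens, and that every item is required (Min. Match is left out for brevity). None of these details are confirmed by the product; they only serve the illustration.

```python
def ordered_block_matches(tokens, items, exclude=()):
    """items: list of (keyword, proximity); proximity is the assumed maximum
    token distance to the next item. The last item's proximity is unused."""

    def search(index, prev_pos, first_pos):
        if index == len(items):
            # All items found in order: reject if an excluded word lies within the bounds.
            span = tokens[first_pos:prev_pos + 1]
            return not any(word in span for word in exclude)
        if index == 0:
            candidates = range(len(tokens))
        else:
            max_gap = items[index - 1][1]
            last = min(prev_pos + max_gap, len(tokens) - 1)
            candidates = range(prev_pos + 1, last + 1)
        for pos in candidates:
            if tokens[pos] == items[index][0]:
                start = pos if index == 0 else first_pos
                if search(index + 1, pos, start):
                    return True
        return False

    return search(0, -1, 0)


tokens = "wire the money to the offshore account".split()
items = [("wire", 3), ("money", 4), ("account", 0)]
print(ordered_block_matches(tokens, items))
print(ordered_block_matches(tokens, items, exclude=("offshore",)))
```

The second call fails in this sketch because the excluded word falls between the first and last matched items.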

Any Order Block

The following table describes the Any Order Block settings.

Number in image Description
1 Name - The name of the Block.
2 Block Proximity - This is the maximum distance between any two matched items in order to trigger a block match.
3 Min. Match - This is the minimum number of matches in order to trigger a block match.
4 Items - These are the items that must be matched.
5 Exclude - These are the items that will invalidate the block match if found.
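
The Any Order Block settings can be pictured with a brute-force check over token positions. As before, this is a sketch under assumptions: items are distinct keywords, Block Proximity is counted in tokens, and Min. Match is at least 1.

```python
from itertools import combinations, product


def any_order_block_matches(tokens, items, block_proximity, min_match, exclude=()):
    """Return True if at least min_match items occur within block_proximity tokens."""
    min_match = max(1, min_match)  # assumption: at least one item must match
    positions = {item: [i for i, t in enumerate(tokens) if t == item] for item in items}
    found = [item for item in items if positions[item]]
    # Checking groups of exactly min_match items is enough: a larger valid
    # group always contains a smaller valid one.
    for chosen in combinations(found, min_match):
        for combo in product(*(positions[item] for item in chosen)):
            low, high = min(combo), max(combo)
            if high - low > block_proximity:
                continue
            # Excluded words inside the bounds invalidate this candidate.
            if any(word in tokens[low:high + 1] for word in exclude):
                continue
            return True
    return False


tokens = "the account was used to wire money offshore".split()
items = ["wire", "money", "account"]
print(any_order_block_matches(tokens, items, block_proximity=5, min_match=3))
print(any_order_block_matches(tokens, items, block_proximity=4, min_match=3))
```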

Alt Proximity Block

The following table describes the Alt Proximity Block settings.

Number in image Description
1 Name - The Block’s name.
2 Target Item - This is the first item that needs to be matched in order to check for the next.
3 Primary Item - This is the item that may be found. If there is a match on this item as well as the first, a Compound Lexeme match is triggered and the process is over.
4 Alt. Item - If the “Primary Item” is not found, a match on the “Alt. Item” will trigger a Compound Lexeme match.

Alt Proximity Primary and Alt Items

The following table describes the Alt Proximity Primary and Alt Items settings.

Number in image Description
1 Keyword - This field is dependent on the type of item chosen by the user.
2 Direction - This field specifies the direction from the Target Item in which this item must be found. Options: RIGHT, LEFT, or ANY.
3 Proximity - This field specifies the distance from the Target Item.
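
The Alt Proximity logic (Target Item first, then Primary Item, then Alt. Item, each with its own Direction and Proximity) might look like the sketch below. The function names, the keyword-only items, and the token-based distance are assumptions, and the sketch evaluates each occurrence of the target independently.

```python
def item_near_target(tokens, target_pos, keyword, direction, proximity):
    """Check whether keyword occurs within proximity tokens of the target,
    on the side given by direction (RIGHT, LEFT, or ANY)."""
    for pos, token in enumerate(tokens):
        if token != keyword or pos == target_pos:
            continue
        offset = pos - target_pos
        if abs(offset) > proximity:
            continue
        if direction == "RIGHT" and offset < 0:
            continue
        if direction == "LEFT" and offset > 0:
            continue
        return True
    return False


def alt_proximity_matches(tokens, target, primary, alt):
    """Return 'primary', 'alt', or None. primary and alt are dicts with
    keyword, direction and proximity keys (a hypothetical shape)."""
    for target_pos, token in enumerate(tokens):
        if token != target:
            continue
        if item_near_target(tokens, target_pos, **primary):
            return "primary"
        # The Alt. Item is only consulted when the Primary Item is not found.
        if item_near_target(tokens, target_pos, **alt):
            return "alt"
    return None


tokens = "please transfer funds to the account today".split()
primary = {"keyword": "funds", "direction": "RIGHT", "proximity": 2}
alt = {"keyword": "account", "direction": "ANY", "proximity": 5}
print(alt_proximity_matches(tokens, "transfer", primary, alt))
```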

Adding Regex

There are two tabs within the Compound Lexeme page. One of them allows a user to add a regex to the system so that it can be referenced within a Compound Lexeme and appear in the results under the name it was created with.

The following table describes the Regex settings.

Number in image Description
1 Enable Regex - There are two ways to enable or disable a regex. One way is from the menu which does not require the user to view the complete Regex, and another way is from within the view. This only affects whether the regex name appears in the results. Block Matches are unaffected.
2 Labels - The user can add any number of labels that can be used to reference the Regex within the Regex Simple Item.
3 Expression - This is where the Regex string is added.
4 Code Tag - The Code Tag currently recognizes only two inputs:

CC_num - This validates that the matched string is a credit card number. If it is not, the match is invalidated.

NotAWord - This checks whether the matched string is a word (as defined by Nuix NLP).
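
The documentation does not say how CC_num validates a credit card number. A common approach for this kind of check is the Luhn checksum, so the sketch below uses it purely as an assumption; the function name and the accepted separators are also illustrative.

```python
def looks_like_credit_card(matched):
    """Assumed stand-in for a CC_num style check using the Luhn checksum."""
    # Only ASCII digits, spaces and hyphens are accepted in this sketch.
    if any(c not in "0123456789 -" for c in matched):
        return False
    digits = [int(c) for c in matched if c in "0123456789"]
    # Typical card numbers have between 13 and 19 digits.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            # Double every second digit from the right; subtract 9 if above 9.
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(looks_like_credit_card("4111 1111 1111 1111"))
print(looks_like_credit_card("4111 1111 1111 1112"))
```

With a check like this, a regex that merely matches 16 digits would have its match invalidated when the checksum fails.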

Adding Compound Lexemes

When on the Compound Lexemes page, the left tab allows users to add Compound Lexemes. The name of the Compound Lexeme is what appears in the result when all required blocks find a match.

The Compound Lexeme is empty when it is first created. The first Item the user adds must be a Block but all other Add Item buttons allow for Simple Item Types as well.

Labels

Labels can be added to both Compound Lexemes and Regex. The user can also filter entities by their label. The dash in the label list resets the filter so that all the entities can be viewed again.