Create Compound Lexemes and Regex
Simple Item Types
Within Compound Lexemes, the system detects seven basic item types:
Keyword
Named Entity
Regex
Lexeme
Geolocation
Topic
Dictionary
Keywords
The following table describes the Keyword settings.
Number in image | Description |
1 | Keyword - This field takes any one-word string to match. |
2 | Optional - This option appears on all elements. Select it to make the element optional, relaxing the requirements for a Compound Lexeme match. |
Named Entity
The system detects and identifies named entities based on the syntax of the word within the text in which it is found.
The following table describes the Named Entity settings.
Number in image | Description |
1 | The user can pick from 18 detectable Named Entity types: PERSON, GROUP, FACILITY, ORGANIZATION, GEO_POLITICAL_ENTITY, LOCATION, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE, DATE, TIME, PERCENT, MONEY, QUANTITY, ORDINAL, CARDINAL. |
2 | Contains - The user can specify one or more strings that must be part of the detected Named Entity for the element match to be considered valid. |
3 | Does Not Contain - The user can specify one or more strings which, when found in the detected Named Entity, invalidate the element match. |
4 | Optional - Select to make the element optional. |
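One way to read the Contains and Does Not Contain rules is as a filter applied to each detected entity. The sketch below is a hypothetical illustration, not the product's implementation: `named_entity_matches` is an invented name, and it assumes that every Contains string must be present, that any single Does Not Contain string invalidates the match, and that both checks are plain case-sensitive substring tests.

```python
from typing import Iterable

# The 18 Named Entity types listed in the table above.
ENTITY_TYPES = frozenset({
    "PERSON", "GROUP", "FACILITY", "ORGANIZATION", "GEO_POLITICAL_ENTITY",
    "LOCATION", "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
    "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
})


def named_entity_matches(
    entity_type: str,
    entity_text: str,
    wanted_type: str,
    contains: Iterable[str] = (),
    not_contains: Iterable[str] = (),
) -> bool:
    """Hypothetical check of one detected entity against a Named Entity element."""
    if wanted_type not in ENTITY_TYPES:
        raise ValueError(f"unknown Named Entity type: {wanted_type}")
    if entity_type != wanted_type:
        return False
    # Assumption: every Contains string must be part of the entity text.
    if not all(s in entity_text for s in contains):
        return False
    # Any Does Not Contain string found in the entity text invalidates the match.
    return not any(s in entity_text for s in not_contains)


print(named_entity_matches("ORGANIZATION", "Acme Holdings Ltd", "ORGANIZATION",
                           contains=["Acme"], not_contains=["Bank"]))  # True
```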
Regex
The Regex element allows users to add a regex or reference an existing regex.
The following table describes the Regex settings.
Number in image | Description |
1 | ID - This allows the user to reference the Regex using its name. |
2 | Label - This allows the user to reference one or more Regexes by label: any detected Regex with the specified label is considered a match for this element. |
3 | String - This option allows the user to input a regex string which must be detected in order for the element to match. |
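The three ways of configuring a Regex element can be sketched with Python's `re` module. `StoredRegex` and `regex_element_matches` are hypothetical stand-ins for a regex added on the Regex tab and for the element check; the sketch assumes each element uses exactly one of ID, Label, or String, and that the product's regex dialect behaves like Python's for these simple patterns.

```python
import re
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class StoredRegex:
    """Hypothetical stand-in for a regex added on the Regex tab."""
    name: str
    expression: str
    labels: FrozenSet[str] = field(default_factory=frozenset)


def regex_element_matches(
    text: str,
    stored: List[StoredRegex],
    regex_id: Optional[str] = None,
    label: Optional[str] = None,
    string: Optional[str] = None,
) -> bool:
    """Check a Regex element configured with exactly one of ID, Label or String."""
    if sum(opt is not None for opt in (regex_id, label, string)) != 1:
        raise ValueError("set exactly one of regex_id, label or string")
    if string is not None:
        # String: an inline pattern that must be detected in the text.
        return re.search(string, text) is not None
    # Stored regexes that are detected somewhere in the text.
    detected = [r for r in stored if re.search(r.expression, text)]
    if regex_id is not None:
        # ID: one specific stored regex, referenced by its name.
        return any(r.name == regex_id for r in detected)
    # Label: any detected regex carrying the label counts as a match.
    return any(label in r.labels for r in detected)


stored = [
    StoredRegex("Email", r"[\w.]+@[\w.]+", frozenset({"contact"})),
    StoredRegex("Phone", r"\+?\d[\d -]{7,}\d", frozenset({"contact"})),
]
print(regex_element_matches("Write to jo@example.com", stored, label="contact"))  # True
```

The label lookup is what lets one element cover several stored regexes at once: here both `Email` and `Phone` carry the hypothetical `contact` label, so either being detected satisfies the element.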
Lexeme
The Lexeme element allows the user to specify a lexeme which, when detected with at least the minimum relevance, triggers a match for the element.
The following table describes the Lexeme settings.
Number in image | Description |
1 | Pick Lexeme - This is where the user finds the specific lexeme to match. The user selects the lexeme by navigating through its Dictionary and Topic, so if multiple lexemes share the same name, only the one within the selected Dictionary/Topic triggers an element match. |
2 | Min. Relevance - This is the minimum relevance that the detected lexeme must have within the text to trigger the element match. |
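Because a lexeme is identified by its Dictionary and Topic as well as its name, an element match can be modelled as a lookup on that full key followed by a relevance threshold. This is a hypothetical sketch: `LexemeRef`, `lexeme_element_matches`, and relevance scores expressed as floats are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class LexemeRef:
    """A lexeme identified by its Dictionary, Topic and name (hypothetical model)."""
    dictionary: str
    topic: str
    name: str


def lexeme_element_matches(
    detections: Dict[LexemeRef, float],
    picked: LexemeRef,
    min_relevance: float,
) -> bool:
    """True if the picked lexeme was detected with at least min_relevance."""
    # The lookup uses the full Dictionary/Topic/name key, so a lexeme with the
    # same name in another Dictionary or Topic never satisfies this element.
    return picked in detections and detections[picked] >= min_relevance


detections = {
    LexemeRef("Food", "Fruit", "apple"): 0.8,
    LexemeRef("Tech", "Companies", "apple"): 0.3,
}
print(lexeme_element_matches(detections, LexemeRef("Tech", "Companies", "apple"), 0.5))  # False
```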
Geolocations
Geolocations allow users to specify a location in the world which when detected triggers an element match.
The following table describes the Geolocations settings.
Number in image | Description |
1 | Pick Geolocation - This allows the user to pick a geolocation (either a country or county). |
2 | Min. Relevance - This is the minimum relevance the detected geolocation must have to trigger the element match. |
3 | Include children - If the selected geolocation has children (for example, cities within a country), selecting this option allows them to trigger an element match as well. |
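The Include children option amounts to walking up a place hierarchy. In this minimal sketch, `geolocation_matches` and the `parents` mapping are hypothetical, relevance is a float, and any descendant of the picked geolocation (not only a direct child) is assumed to count.

```python
from typing import Dict, Optional


def geolocation_matches(
    detections: Dict[str, float],
    picked: str,
    min_relevance: float,
    include_children: bool,
    parents: Dict[str, Optional[str]],
) -> bool:
    """Hypothetical Geolocation element check.

    detections maps each detected place to its relevance; parents maps a place
    to the place that contains it (None at the top of the hierarchy).
    """
    for place, relevance in detections.items():
        if relevance < min_relevance:
            continue
        if place == picked:
            return True
        if include_children:
            # Walk up the hierarchy: a city inside the picked country matches.
            ancestor = parents.get(place)
            while ancestor is not None:
                if ancestor == picked:
                    return True
                ancestor = parents.get(ancestor)
    return False


parents = {"Lyon": "France", "Paris": "France", "France": None}
print(geolocation_matches({"Lyon": 0.7}, "France", 0.5, True, parents))  # True
```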
Topic
The Topic element lets the user select a Topic whose lexemes trigger an element match.
The following table describes the Topic settings.
Number in image | Description |
1 | Pick Topic - This is where the user selects a Topic. If any of the lexemes within the Topic are detected with at least the Minimum Relevance, an element match is triggered. |
2 | Min. Relevance - This allows the user to specify the Minimum Relevance necessary for the element match. |
3 | Pick Lexeme - This allows the user to specify Lexemes within the Topic which will not trigger an element match. |
Dictionaries
The Dictionary element lets the user select a Dictionary whose lexemes trigger an element match.
The following table describes the Dictionary settings.
Number in image | Description |
1 | Pick Dictionary - This is where the user selects a Dictionary. If any of the lexemes within the Dictionary are detected with at least the Minimum Relevance, an element match is triggered. |
2 | Min. Relevance - This allows the user to specify the Minimum Relevance necessary for the element match. |
3 | Pick Lexeme - This allows the user to specify Lexemes within the Dictionary which will not trigger an element match. |
4 | Pick Topic - This allows the user to specify Topics within the Dictionary whose Lexemes will not trigger an element match. |
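The Dictionary settings combine a relevance threshold with two kinds of exclusion. The hypothetical sketch below models them; a Topic element can be read as the same check restricted to one Topic, with only the lexeme exclusion. `Lexeme` and `dictionary_element_matches` are illustrative names, not part of the product, and relevance is assumed to be a float.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict


@dataclass(frozen=True)
class Lexeme:
    dictionary: str
    topic: str
    name: str


def dictionary_element_matches(
    detections: Dict[Lexeme, float],
    dictionary: str,
    min_relevance: float,
    excluded_lexemes: AbstractSet[Lexeme] = frozenset(),
    excluded_topics: AbstractSet[str] = frozenset(),
) -> bool:
    """True if any non-excluded lexeme of the Dictionary meets min_relevance."""
    for lexeme, relevance in detections.items():
        if lexeme.dictionary != dictionary or relevance < min_relevance:
            continue
        if lexeme in excluded_lexemes or lexeme.topic in excluded_topics:
            continue  # excluded via Pick Lexeme or Pick Topic
        return True
    return False


refund = Lexeme("Finance", "Payments", "refund")
invoice = Lexeme("Finance", "Billing", "invoice")
detections = {refund: 0.6, invoice: 0.9}
```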
Complex Item Types or Blocks
Complex Item Types are logic components called Blocks. All required items within a Block must be matched to trigger a Block match. Items within a Block can be another Block or a Simple Item Type. All required Blocks must be matched to trigger a Compound Lexeme match.
Or Block
The Or Block is the simplest of the Blocks and takes at least one item. The Name field lets the user name the Block, so that the results show which Block within the Compound Lexeme was triggered by a match. Only the name of the highest-level Block is returned.
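The Block rules above (every required item must match, Blocks can nest, and only top-level Block names are reported) can be sketched as a small recursive evaluator. This is a hypothetical model, not the product's implementation: `SimpleItem`, `Block`, `keyword` and `compound_lexeme_matches` are illustrative names, a simple item is reduced to a predicate over the text, and Or-Block-specific behaviour, proximity and Min. Match are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class SimpleItem:
    """Hypothetical stand-in for a Simple Item Type (Keyword, Regex, ...)."""
    test: Callable[[str], bool]
    optional: bool = False


@dataclass
class Block:
    name: str
    items: List["Item"]
    optional: bool = False


Item = Union[SimpleItem, Block]


def item_matches(item: Item, text: str) -> bool:
    if isinstance(item, SimpleItem):
        return item.test(text)
    # A Block matches when all of its required (non-optional) items match;
    # items may themselves be Blocks, so this recurses.
    return all(item_matches(child, text) for child in item.items if not child.optional)


def compound_lexeme_matches(blocks: List[Block], text: str) -> List[str]:
    """Return matched top-level Block names, or [] if a required Block fails."""
    if not all(item_matches(b, text) for b in blocks if not b.optional):
        return []
    # Only top-level Block names are reported, not the names of nested Blocks.
    return [b.name for b in blocks if item_matches(b, text)]


def keyword(word: str, optional: bool = False) -> SimpleItem:
    """Hypothetical Keyword item: a whole-word, lower-case match."""
    return SimpleItem(lambda text: word in text.lower().split(), optional)


refund_request = Block(
    "RefundRequest",
    [keyword("refund"), Block("Tone", [keyword("urgent")], optional=True)],
)
print(compound_lexeme_matches([refund_request], "Please refund me"))  # ['RefundRequest']
```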
Ordered Block
The Ordered Block expects the matched items to appear in sequence, in one direction. Every item in an Ordered Block has a required proximity attribute, which sets the maximum distance within which the next item must be found.
The following table describes the Ordered Block settings.
Number in image | Description |
1 | Name - The name of the Block. |
2 | Min. Match - The minimum number of item matches in order to trigger a Block match. |
3 | Items - The items (at least one) which can be matched. |
4 | Exclude - Any items which will invalidate the Block match if found within the bounds of the block. |
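A minimal sketch of the Ordered Block check over token offsets follows. Its names are hypothetical, and it assumes that distances are counted in tokens, that a chain starts at the first item, that Min. Match counts consecutive items matched from the first one, and that the bounds of the Block run from the first to the last matched item; the product may interpret these settings differently.

```python
from typing import List, Sequence, Tuple


def ordered_chains(
    positions: Sequence[Sequence[int]], proximities: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) for every in-order chain starting at item 0.

    positions[i] holds the token offsets where item i was detected;
    proximities[i] is the maximum distance from item i to item i + 1.
    """
    chains: List[Tuple[int, int, int]] = []

    def extend(i: int, start: int, pos: int, length: int) -> None:
        chains.append((start, pos, length))
        if i + 1 >= len(positions):
            return
        for nxt in positions[i + 1]:
            # One-directional: the next item must come after this one,
            # no further away than this item's proximity.
            if pos < nxt <= pos + proximities[i]:
                extend(i + 1, start, nxt, length + 1)

    if positions:
        for start in positions[0]:
            extend(0, start, start, 1)
    return chains


def ordered_block_matches(
    positions: Sequence[Sequence[int]],
    proximities: Sequence[int],
    min_match: int,
    excluded: Sequence[int] = (),
) -> bool:
    """True if some chain reaches min_match items with no excluded item inside it."""
    for start, end, length in ordered_chains(positions, proximities):
        if length >= min_match and not any(start <= x <= end for x in excluded):
            return True
    return False
```

For example, items found at offsets 0, 2 and 5 with proximities of 3 form a full chain, but an excluded item at offset 4 invalidates that chain because it falls inside its bounds.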
Any Order Block
The following table describes the Any Order Block settings.
Number in image | Description |
1 | Name - The name of the Block. |
2 | Block Proximity - This is the maximum distance between any two matched items in order to trigger a block match. |
3 | Min. Match - This is the minimum number of matches in order to trigger a block match. |
4 | Items - These are the items that must be matched. |
5 | Exclude - These are the items that will invalidate the block match if found. |
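Because "any two matched items within Block Proximity" is the same as "the first and last matched items within Block Proximity", the Any Order Block can be sketched as a sliding window over detected positions. The function name is hypothetical, and the sketch assumes token offsets, that Min. Match counts distinct items, and that an excluded item only invalidates a window it falls inside.

```python
from typing import Sequence


def any_order_block_matches(
    positions: Sequence[Sequence[int]],
    block_proximity: int,
    min_match: int,
    excluded: Sequence[int] = (),
) -> bool:
    """Hypothetical Any Order Block check over token offsets.

    positions[i] holds the offsets where item i was detected. The Block
    matches when at least min_match distinct items fall inside a window no
    wider than block_proximity that contains no excluded item.
    """
    events = sorted(
        (pos, item) for item, found in enumerate(positions) for pos in found
    )
    for i, (start, _) in enumerate(events):
        seen = set()
        for pos, item in events[i:]:
            if pos - start > block_proximity:
                break  # later events are even further away
            seen.add(item)
            # The window spans start..pos; an excluded item inside it invalidates it.
            if len(seen) >= min_match and not any(start <= x <= pos for x in excluded):
                return True
    return False
```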
Alt Proximity Block
The following table describes the Alt Proximity Block settings.
Number in image | Description |
1 | Name - The Block’s name. |
2 | Target Item - This is the first item that needs to be matched in order to check for the next. |
3 | Primary Item - This is the item searched for once the Target Item is matched. If both this item and the Target Item match, a Compound Lexeme match is triggered and the process is over. |
4 | Alt. Item - If the “Primary Item” is not found, a match on the “Alt Item” will trigger a Compound Lexeme match. |
Alt Proximity Primary and Alt Items
The following table describes the Alt Proximity Primary and Alt Items settings.
Number in image | Description |
1 | Keyword - This field is dependent on the type of item chosen by the user. |
2 | Direction - This field specifies the direction from the Target Item in which this item must be found: RIGHT, LEFT, or ANY. |
3 | Proximity - This field specifies the distance from the Target Item. |
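The Alt Proximity logic described in the two tables above can be sketched as follows, assuming that positions and distances are token offsets. `RelativeItem`, `within`, `found_near` and `alt_proximity_match` are hypothetical names; the sketch checks the Primary Item first and falls back to the Alt Item only when no Primary Item is found near any Target Item.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RelativeItem:
    """Primary or Alt Item: where it was detected, plus its Direction and Proximity."""
    positions: Sequence[int]
    direction: str  # "RIGHT", "LEFT" or "ANY"
    proximity: int


def within(target: int, pos: int, direction: str, proximity: int) -> bool:
    offset = pos - target  # positive means to the right of the Target Item
    if direction == "RIGHT":
        return 0 < offset <= proximity
    if direction == "LEFT":
        return 0 < -offset <= proximity
    if direction == "ANY":
        return 0 < abs(offset) <= proximity
    raise ValueError(f"unknown direction: {direction}")


def found_near(targets: Sequence[int], item: RelativeItem) -> bool:
    return any(
        within(t, p, item.direction, item.proximity)
        for t in targets
        for p in item.positions
    )


def alt_proximity_match(
    targets: Sequence[int], primary: RelativeItem, alt: RelativeItem
) -> Optional[str]:
    """Return "primary", "alt" or None, checking the Primary Item first."""
    if found_near(targets, primary):
        return "primary"  # match triggered; the Alt Item is never checked
    if found_near(targets, alt):
        return "alt"
    return None
```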
Adding Regex
The Compound Lexemes page has two tabs. One of them lets the user add a regex to the system so that it can be referenced within a Compound Lexeme and appear in the results under the name it was created with.
The following table describes the Regex settings.
Number in image | Description |
1 | Enable Regex - A regex can be enabled or disabled in two ways: from the menu, which does not require the user to view the complete Regex, or from within the Regex view. This setting only affects whether the regex name appears in the results; Block matches are unaffected. |
2 | Labels - The user can add any number of labels that can be used to reference the Regex within the Regex Simple Item. |
3 | Expression - This is where the Regex string is added. |
4 | Code Tag - The Code Tag currently recognizes only two inputs: CC_num, which validates that the matched string is a credit card number and invalidates the match if it is not; and NotAWord, which checks whether the matched string is a word (as defined by Nuix NLP). |
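The table does not say how CC_num decides that a string is a credit card number. A common technique for this kind of check is the Luhn checksum, so the sketch below pairs a hypothetical card-number expression with a Luhn test to show how a code tag can invalidate regex matches after the fact. `CARD_PATTERN`, `luhn_valid` and `card_number_matches` are illustrative names, and Luhn is an assumed stand-in, not necessarily what the product uses.

```python
import re
from typing import List

# Hypothetical expression: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, a common way to validate payment card numbers."""
    digits = re.sub(r"[ -]", "", candidate)
    if not digits.isdecimal():
        return False
    total = 0
    # Double every second digit, counting from the rightmost (check) digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_number_matches(text: str) -> List[str]:
    """Regex matches that survive the CC_num-style validation step."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


print(card_number_matches("card 4111 1111 1111 1111 and 4111 1111 1111 1112"))
```

Both numbers in the example match the expression, but only the first passes the checksum, so the second match is invalidated.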
Adding Compound Lexemes
On the Compound Lexemes page, the left tab lets users add Compound Lexemes. The name of the Compound Lexeme is what appears in the results when all required Blocks find a match.
A new Compound Lexeme is empty. The first item the user adds must be a Block, but all other Add Item buttons also allow Simple Item Types.
Labels
Labels can be added to both Compound Lexemes and Regex. Users can also filter entities by label; selecting the dash in the label list resets the filter so that all entities are shown again.