Configure List types for repetitive processes
List types are 'external artifacts'
Nuix Workstation lets you create the following four List types, import them into the application, and export them from it:
Digest Lists
Shingle Lists
Word Lists
Fuzzy Hash Lists
You configure these List types when you want to reuse them repeatedly when processing data.
Access the settings for these external artifacts from the second panel of the Global Options window by single-clicking their icon.
Common actions relating to all List types
For Digest Lists, Shingle Lists, Word Lists and Fuzzy Hash Lists, you can do the following:
Import a list - once you do, you must rename the file, effectively 'creating' a new list.
Export a list - to one of two kinds of database.
Duplicate a list - into another case.
Move a list - to share it with other users, or place on another device or network.
All List types also contain a default configuration setting that you can use as a basis for creating other lists. For details on how to import any List type, see Import data.
Configure a Digest List
A Digest List is a list of hashes for a collection of files, which you can use as a filter when performing a search.
Select the Digest Lists option to import digest lists from third-party sources or create a new list to assist with tasks such as duplicate identification.
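The idea behind a Digest List can be illustrated outside the application. The following is a minimal Python sketch, assuming MD5 as the digest algorithm; it does not read or write the Nuix Digest List file format, and the function names are hypothetical.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """Return the hex MD5 digest of a file's content."""
    return hashlib.md5(data).hexdigest()


def build_digest_set(contents):
    """Collect the unique digests for an iterable of byte strings."""
    return {md5_digest(content) for content in contents}


def is_known(data: bytes, digests) -> bool:
    """Check whether an item's digest appears in the digest set."""
    return md5_digest(data) in digests


# Two of the three sample items have identical content, so they share a digest.
known = build_digest_set([b"report v1", b"report v2", b"report v1"])
print(len(known))                     # 2
print(is_known(b"report v2", known))  # True
```

Because identical content always produces the same digest, membership in the set is enough to flag an item as a duplicate of something already listed.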
Create a Digest List for use in Nuix Workstation
To create a Digest List for use in Nuix Workstation:
In the Results pane, select the items in the result set you want to include in the Digest List.
Click Export and select Export Digest List.
The Export Digest List dialog opens showing the number of unique digests to export.
Select one of the following options:
Name a new Digest List and add hashes to it.
Select an existing Digest List and import additional hashes into it.
See Import a Digest List in Import data for how to do this and for further information about Digest Lists.
Deduplicate emails using EDRM MIH hashing
To deduplicate emails using EDRM MIH hashing:
On the Data Processing Settings tab of the Edit Processing Profile window, under Digest Settings > Email Digest Settings, enable the following to maximize the uniqueness of the MD5 values returned:
Use EDRM MIH check box
Include Communication Date check box (optional)
This is similar to enabling the MD5 check box under Digest to compute, which calculates new hash values during ingestion.
Note: Enabling or disabling the Include Communication Date option has no effect on the EDRM MIH hash values produced, because these are always based only on the Message-ID. However, when this option is enabled, the EDRM MIH and MD5 hash values differ, because the MD5 values are based on the Message-ID together with the date. (When Include Communication Date is off, the EDRM MIH and MD5 hash values are identical.)
Then search using text-custom-metadata:"edrm-mih:*" or "edrm-mih:12345" to find targeted results in the Results view. You can then see more details on the Metadata tab of the Preview pane.
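The relationship between the two hash values described in the note can be illustrated with a short Python sketch. This is an assumption-laden illustration only: the exact normalization and input format used by the EDRM MIH specification and by Nuix Workstation are not described here, and the function names and sample values are hypothetical.

```python
import hashlib
from typing import Optional


def mih_style_hash(message_id: str) -> str:
    """Hash based only on the Message-ID (illustrative, not the EDRM spec)."""
    return hashlib.md5(message_id.encode("utf-8")).hexdigest()


def email_md5(message_id: str, communication_date: Optional[str] = None) -> str:
    """Hash of the Message-ID, optionally combined with the communication date.

    Simple concatenation is an assumption made for illustration.
    """
    basis = message_id if communication_date is None else message_id + communication_date
    return hashlib.md5(basis.encode("utf-8")).hexdigest()


message_id = "<abc123@example.com>"     # made-up sample value
sent = "2024-01-01T00:00:00Z"           # made-up sample value

# Date excluded: both values are computed from the same input, so they match.
print(mih_style_hash(message_id) == email_md5(message_id))        # True
# Date included: the MD5 input changes, the Message-ID-only hash does not.
print(mih_style_hash(message_id) == email_md5(message_id, sent))  # False
```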
Configure a Shingle List
Select the Shingle Lists option to import a shingle list to search for similar documents. Shingle Lists you create in Nuix Workstation use the .shlist format.
In Nuix Workstation, you can create a Shingle List from a set of key documents to find similar items or near-duplicates among other items in your case. You can then import this list into other cases to use against other datasets.
Note: When using a Shingle List in Filtered Items, the items returned reflect the resemblance threshold you set in the Search option. That is, if you set it lower than the default threshold of 0.85 (85%), your result set includes more items that bear less resemblance to those in your Shingle List.
You can only import a Shingle List in Nuix's own binary .shlist format. See Import a Shingle List in Import data for how to do this.
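To make the effect of the resemblance threshold concrete, here is a minimal Python sketch of word shingling with Jaccard resemblance. It is a generic textbook technique, not the algorithm Nuix Workstation uses and not the .shlist format; the shingle size of three words, the function names, and the sample text are all assumptions.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ('shingles') in a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(a: str, b: str, size: int = 3) -> float:
    """Jaccard resemblance: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similar_items(key_doc: str, candidates, threshold: float = 0.85):
    """Keep only candidates whose resemblance meets the threshold."""
    return [c for c in candidates if resemblance(key_doc, c) >= threshold]


key = "the quick brown fox jumps over the lazy dog"
near = "the quick brown fox jumps over the lazy cat"
other = "an entirely unrelated sentence about spreadsheets"

print(resemblance(key, near))                             # 0.75
print(len(similar_items(key, [key, near, other], 0.85)))  # 1
print(len(similar_items(key, [key, near, other], 0.70)))  # 2
```

Lowering the threshold from 0.85 to 0.70 admits the near-duplicate as well, which mirrors the behavior the note describes: a lower threshold returns more items with less resemblance.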
Import and rename a Shingle List
You create a Shingle List by selecting items in a result set and exporting them to a Shingle List, which you use to find similar items inside a collection of evidence.
Nuix Workstation saves Shingle Lists to the Nuix\Shingle Lists directory and not to a directory of your choice. You then manage these lists from File > Global Options > Shingle Lists and display them in the Filtered Items pane for use with review tasks. Here you can add selected items to an existing Shingle List or create a new Shingle List with selected items.
To create a Shingle List for use in Nuix Workstation:
In the Results pane, select the items in the result set you want to include in the Shingle List.
Click Import and browse to select a Shingle List.
Rename the Shingle List to effectively create a new one.
Then import additional shingles into it.
Click OK. The Exporting dialog opens while the Shingle List is being created; close it when the task is complete.
Configure a Word List
A Word List is a list of keywords or simple phrases you can use as a filter against a dataset. Use one in the Filtered Items pane to return items that include the words from your selected list.
Select the Word Lists option to import Word Lists for performing a search.
In a Word List text file, enter each word or simple phrase on a separate line. Be aware that:
There is no limit to the number of words that you can include in the Word List, but the greater the number of lines in a list, the longer the search takes to return all matches.
The text file must be encoded in UTF-8, which is important for languages that do not use the Latin alphabet.
Multiple words on a single line are treated as an exact phrase (for example, Dog Cat Mouse is treated as a search for "dog cat mouse"). Quotes are unnecessary and are stripped.
Word Lists do not support:
Boolean or other searches or queries, so "(classification OR maxim)" is not valid.
To perform a series of Boolean or complex searches in the dataset, the scripting interface provides a means of automatically executing queries, and applying classifications to the result set. If you require complex queries or reporting, refer to the Scripting Reference Guide for more information.
Wildcards (special characters used to substitute characters in a term to allow searching for one or more words that share some of the same characters).
See also Word List tab for details.
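The Word List rules above can be checked before import with a small script. The following Python sketch is illustrative only: the helper names are hypothetical, and treating uppercase AND, OR, and NOT as Boolean operators and * and ? as wildcards is an assumption about which characters to reject.

```python
import re
from pathlib import Path

# Assumption: uppercase operators and these two characters are what to reject.
BOOLEAN_OPERATORS = re.compile(r"\b(AND|OR|NOT)\b")
WILDCARDS = set("*?")


def clean_entry(line: str) -> str:
    """Trim whitespace and drop quotes, which are unnecessary in a Word List."""
    return line.strip().replace('"', "").strip()


def check_entry(entry: str) -> list:
    """Return the reasons an entry is unsuitable for a Word List."""
    problems = []
    if BOOLEAN_OPERATORS.search(entry):
        problems.append("boolean operator")
    if any(ch in WILDCARDS for ch in entry):
        problems.append("wildcard")
    return problems


def write_word_list(entries, path) -> list:
    """Write one word or phrase per line, encoded as UTF-8."""
    cleaned = []
    for raw in entries:
        entry = clean_entry(raw)
        if not entry:
            continue  # skip blank lines
        problems = check_entry(entry)
        if problems:
            raise ValueError(f"{entry!r}: {', '.join(problems)}")
        cleaned.append(entry)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


print(check_entry("(classification OR maxim)"))  # ['boolean operator']
print(check_entry("Dog Cat Mouse"))              # []
```

Calling write_word_list(['"Dog Cat Mouse"', "straße"], "keywords.txt") would produce a two-line UTF-8 text file; the file name here is only an example.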
Nuix Workstation stores word lists as .list files in the following directories:
Windows: %AppData%\Nuix\Word Lists
macOS: /Users/<user>/Library/Nuix/Word Lists
Linux: /home/<user>/.nuix/Word Lists/
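If you manage Word Lists with a script, the directories listed above can be built per platform. This is a minimal Python sketch under stated assumptions: the function name and the platform labels are hypothetical, and the caller supplies the home directory or the value of %AppData% rather than the script detecting them.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def word_list_dir(platform: str, home: Optional[str] = None,
                  appdata: Optional[str] = None):
    """Build the Word Lists directory for a platform from the paths listed above."""
    if platform == "windows":
        if not appdata:
            raise ValueError("appdata (the value of %AppData%) is required")
        return PureWindowsPath(appdata) / "Nuix" / "Word Lists"
    if not home:
        raise ValueError("home directory is required")
    if platform == "macos":
        return PurePosixPath(home) / "Library" / "Nuix" / "Word Lists"
    if platform == "linux":
        return PurePosixPath(home) / ".nuix" / "Word Lists"
    raise ValueError(f"unsupported platform: {platform}")


# Sample user names and locations are made up for illustration.
print(word_list_dir("linux", home="/home/alice"))
print(word_list_dir("macos", home="/Users/alice"))
print(word_list_dir("windows", appdata=r"C:\Users\alice\AppData\Roaming"))
```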
Import Fuzzy Hash Lists
A Fuzzy Hash List is a list of hashes in which each hash value is used to search for and identify files that are similar to the originally hashed file, based on a percentage of similarity.
Select the Fuzzy Hash Lists option to import Fuzzy Hash Lists for use as a filter when performing a search. You import Fuzzy Hash Lists in the plain text (.txt) SSDeep format.
See Search using Fuzzy hashing for more information.
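Before importing a Fuzzy Hash List, it can help to check that the text file is well formed. The sketch below parses lines in the layout commonly written by the ssdeep tool, blocksize:hash:hash,"filename", with an optional header line starting with "ssdeep,". Whether Nuix Workstation requires the header or the file name is not stated here, so treat this as an assumption; the names and sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class FuzzyHash(NamedTuple):
    block_size: int
    chunk_hash: str
    double_chunk_hash: str
    filename: Optional[str]


def parse_ssdeep_line(line: str) -> Optional[FuzzyHash]:
    """Parse one line of an ssdeep-style text file; return None for header or blank lines."""
    line = line.strip()
    if not line or line.startswith("ssdeep,"):
        return None
    # The signature contains no commas, so the first comma starts the file name.
    signature, _, name = line.partition(",")
    parts = signature.split(":")
    if len(parts) != 3 or not parts[0].isdigit():
        raise ValueError(f"not an ssdeep signature: {line!r}")
    filename = name.strip().strip('"') or None
    return FuzzyHash(int(parts[0]), parts[1], parts[2], filename)


def parse_ssdeep_text(text: str) -> list:
    """Parse every signature line in the given text."""
    parsed = (parse_ssdeep_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None]


# Placeholder hash strings, not real ssdeep output.
sample = 'ssdeep,1.1--blocksize:hash:hash,filename\n96:abcDEF123:xyz789,"/evidence/a.docx"\n'
for entry in parse_ssdeep_text(sample):
    print(entry.block_size, entry.filename)
```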