Process structured and semi-structured data
In Nuix Workstation, ingested data is either of the following types:
Structured, where the ingested data type is known or fixed. For example, a Microsoft SQL Server (MSSQL) database contains structured data.
Semi-structured, where the ingested data has no fixed schema (data type), so you must infer how it breaks down.
For example, a Comma Separated Value (CSV) file contains semi-structured data: you know it will have columns of data, but the columns will not have the same properties or children every time.
This data matches a particular extraction algorithm; however, you need to determine the final data types and structure so that the engine can process the data properly.
In a CSV file, for example, the column names and the data type of each column are unknown until you examine the file.
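The inference step described above can be sketched in plain Ruby, outside Nuix. The `infer_csv_schema` helper and its typing rules below are hypothetical illustrations for this article, not part of the Nuix API:

```ruby
require 'csv'

# A minimal sketch of schema inference for a CSV sample: read the header
# row, then guess each column's type from the first data row.
# The regex-based rules here are illustrative assumptions only.
def infer_csv_schema(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  first = rows.first
  rows.headers.map do |col|
    value = first[col]
    type =
      if value =~ /\A-?\d+\z/
        :integer
      elsif value =~ /\A\d{4}-\d{2}-\d{2}\z/
        :date
      else
        :string
      end
    [col, type]
  end.to_h
end

sample = "id,created,comment\n1,2024-05-01,hello\n2,2024-05-02,world\n"
infer_csv_schema(sample)
# => {"id"=>:integer, "created"=>:date, "comment"=>:string}
```

A real ingestion engine would sample many rows and handle conflicting guesses, but the principle is the same: the structure is discovered from the data rather than declared up front.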
In Nuix Workstation, use a metadata import profile to import metadata into the case when the ingested data (evidence) matches a defined parameter. For example, when a CSV file is ingested, a column containing a date may be interpreted as a string. Using the metadata import profile, you can convert this string to a Java date object and store it as such in the repository.
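The string-to-date conversion that a metadata import profile performs can be illustrated in plain Ruby. The date format string below is an assumption for this sketch and is not a Nuix API:

```ruby
require 'date'

# The raw CSV cell arrives as a string; a metadata import profile
# converts it to a real date object before storing it.
raw = "2024-05-01 13:45:00"
parsed = DateTime.strptime(raw, "%Y-%m-%d %H:%M:%S")
parsed.class  # => DateTime
parsed.year   # => 2024
```

Once stored as a date rather than a string, the field supports date-range filtering and chronological sorting in the case.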
Use the following example to process structured and semi-structured data with a metadata import profile through the scripting API:
caseName = 'case9999'
caseDir = '/cases/' + caseName
importProfileName = "importProfile"
kase = utilities.caseFactory.create(caseDir, name: caseName)
begin
  # Create a processor
  processor = kase.createProcessor
  # Create an evidence container and apply the metadata import profile
  evidenceContainer = processor.newEvidenceContainer("Evidence 5")
  evidenceContainer.setMetadataImportProfileName(importProfileName)
  evidenceContainer.addFile("/opt/data/test-data/quotes1.csv")
  evidenceContainer.addFile("/opt/data/test-data/quotes2.csv")
  evidenceContainer.save
  # Start processing
  processor.process
ensure
  kase.close
end