CliParameters JobFile structure

This section describes the settings available for configuring the following kinds of jobs:

File collections, volatile information collections and RAM image collections

File extractions

File deletions

File surveys

Settings for all the above job types are saved as child elements within the <CliParameters> element of a JobFile.

For disk image collections, see topic DiskImageParams JobFile Structure.

Titan element

The Titan element specifies general settings for the collection or extraction. These settings reside within one set of <Titan>..</Titan> tags.

<Titan>

<!-- various "child" elements go here:

<RetryCount/>

<TemplatePath/>

...etc -->

</Titan>

The <Titan> element has no attributes or text. All settings are stored in "child" elements, described below.

Threads element

<Titan>

<Threads>1</Threads>

</Titan>

The value of this element must be set to 1 for all jobs in Nuix Collector 7.8 and ECC 7.8. This element is deprecated but still required; it may be removed in a future release.

RetryCount element

<Titan>

<RetryCount>2</RetryCount>

</Titan>

The <RetryCount> element specifies the number of additional attempts made to collect or extract failed items (such as files that failed to collect on the first attempt because they were open). The value must be an integer of 0 or greater.

Nuix Collector (and ECC Client Service) will log items that failed to be collected or extracted and then move on to the next file. After all the files have been processed, the program will go back through the failed items and retry them up to the specified number of times. The above example setting would instruct the program to make up to two additional attempts to process any failed files.

CloudRetryCount element

<Titan>

<CloudRetryCount>7</CloudRetryCount>

</Titan>

The <CloudRetryCount> element specifies the maximum number of retries when saving data to a Cloud destination such as AWS S3 or Azure. The value must be an integer of 0 or greater. If not specified, the default value is 7.

CloudRetryTimeout element

<Titan>

<CloudRetryTimeout>300</CloudRetryTimeout>

</Titan>

The <CloudRetryTimeout> element specifies the total time allowed (in seconds) to retry writing to a Cloud destination such as AWS S3 or Azure. The value must be an integer of 0 or greater. If not specified, the default value is 300.

TemplatePath element

<Titan>

<TemplatePath>..\Templates\CliStatus.db</TemplatePath>

</Titan>

The <TemplatePath> element specifies the empty database file used as a model for each new crawl database.

The <TemplatePath> element also implicitly specifies the Templates folder, which contains the CliStatus.db file as well as CSS style sheets used for formatting reports.

The above example specifies a relative path to CliStatus.db. This path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder, e.g.:

Modules folder (default for an installation on 64-bit Windows):

C:\Program Files (x86)\Nuix\Nuix Collector\Modules

Templates folder implicitly specified by TemplatePath in above example...

C:\Program Files (x86)\Nuix\Nuix Collector\Templates

...which can also be expressed as:

C:\Program Files (x86)\Nuix\Nuix Collector\Modules\..\Templates

Note: When specifying the Template Path via the Collector Wizard, only specify the folder (e.g.: ..\Templates). When the Wizard saves the <TemplatePath> element to the JobFile, it will append \CliStatus.db to the folder you specified. On Linux and macOS, paths are specified using forward slashes.

The specified <TemplatePath> value may include environment variables, e.g.:

<TemplatePath>%USERPROFILE%\Templates\CliStatus.db</TemplatePath>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.:

C:\Users\Linda\Templates\CliStatus.db

NistDirectory element

<Titan>

<NistDirectory UseToEliminateMatches="No">

..\..\Nuix Collector Evidence Browser\Hash\NIST

</NistDirectory>

</Titan>

The <NistDirectory> element specifies the directory that contains the NIST hash database. For Nuix Collector, the default path is the hash\NIST database folder, located within the Nuix Collector Evidence Browser installation folder. This allows both Nuix Collector and Nuix Collector Evidence Browser to share the same NIST hash database. For ECC Client, the default path is ..\Hash\NIST.

The <NistDirectory> element value may be a relative path, as shown in the example, above. The path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder.

The specified <NistDirectory> value may include environment variables, e.g.:

<NistDirectory UseToEliminateMatches="No">

%USERPROFILE%\Hash\NIST

</NistDirectory>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Hash\NIST.

The UseToEliminateMatches attribute can be "Yes" or "No":

Yes

The NIST hash database will be used to determine files that are to be excluded from processing due to being included in the NIST hash set. The determination of what is a NIST known operating system or application file is based on the MD5 hash of the data (data stream) of each individual file. The file name and other metadata are not used in this determination.

No

The NIST hash database will not be used to exclude files from the collection.

StdDirectory element

<Titan>

<StdDirectory UseToEliminateMatches="No">

..\..\Nuix Collector Evidence Browser\Hash\Standard

</StdDirectory>

</Titan>

The <StdDirectory> element specifies the directory that contains the "Standard Files" hash database. The default path is the hash\Standard database folder, located within the Nuix Collector Evidence Browser installation folder. This allows both Nuix Collector and Nuix Collector Evidence Browser to share the same Standard hash database. For ECC Client, the default path is ..\Hash\Standard.

The <StdDirectory> element value may be a relative path, as shown in the example, above. The path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder.

The specified <StdDirectory> value may include environment variables, e.g.

<StdDirectory UseToEliminateMatches="No">

%USERPROFILE%\Hash\Standard

</StdDirectory>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Hash\Standard.

The UseToEliminateMatches attribute can be "Yes" or "No":

Yes

The Standard hash database will be used to determine files that are to be excluded from processing due to being included in the Standard hash set. The determination of what is a Standard known operating system or application file is based on the MD5 hash of the data (data stream) of each individual file. The file name and other metadata are not used in this determination.

No

The Standard hash database will not be used to exclude files from processing.

DupDirectory element

<Titan>

<DupDirectory UseToEliminateMatches="No">

..\..\Nuix Collector Evidence Browser\Hash\Duplicate

</DupDirectory>

</Titan>

The <DupDirectory> element specifies the directory that contains the "Duplicate Files" hash database to be used for this collection or extraction. Typically, this directory is empty; however, it can contain a Duplicate Files hash database from a previous run.

The same Duplicate Files hash database can be used for multiple processing runs. With each run, the duplicate database accumulates more entries. This allows a file processed in a previous run to be recognized as a duplicate when performing subsequent processing runs days or weeks later. To disregard previous processing runs for duplicate detection, the Duplicate Files hash database within the <DupDirectory> should be deleted. The database filename is Temp.dat.

The <DupDirectory> element value may be a relative path, as shown in the example, above. The path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder. The default path is the Hash\Duplicate database folder, located within the Nuix Collector Evidence Browser installation folder. This allows both Nuix Collector and Nuix Collector Evidence Browser to share the same Duplicate hash database. For ECC Client, the default path is ..\Hash\Duplicate.

The specified <DupDirectory> value may include environment variables, e.g.

<DupDirectory UseToEliminateMatches="No">

%USERPROFILE%\Hash\Duplicate

</DupDirectory>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Hash\Duplicate.

The UseToEliminateMatches attribute can be "Yes" or "No":

Yes

The Duplicate Files hash database in the specified folder will be used to determine files that are to be excluded from processing. The determination of what is a duplicate file is based on the MD5 hash of the content of each individual file. The file name and other metadata are not used in this determination.

No

The Duplicate Files hash database in the specified folder will not be used to exclude files from processing.

Signatures element

<Titan>

<Signatures>..\Signatures\File Headers.xml</Signatures>

</Titan>

The <Signatures> element specifies the File Headers XML file used by the signature analysis feature to determine file types.

The <Signatures> element value may be a relative path, as shown in the example, above. The path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder.

Search element

<Titan>

<Search MinWordSize="3" MaxWordSize="32">

<!-- various "child" elements go here:

<AllowableChars/>

<AllowableDigits/>

<AllowableSpecials/>

...etc -->

</Search>

</Titan>

The <Search> element contains settings for the index-based search engine. This search engine allows text files to be included only if they contain certain words or phrases. For details on the search feature, see the topics covering the <Extension> element's CollectAll attribute and the <Keyword> element, below.

The MinWordSize attribute is used to set the minimum size that a word can be. By default, this is set to 3 characters. This means that one and two character words are not recognized by the search engine.

The MaxWordSize attribute is used to set the maximum size that a word can be. By default, this is set to 32 characters.

AllowableChars element

<Titan>

<Search MinWordSize="3" MaxWordSize="32">

<AllowableChars>ABCDEFGHIJKL…</AllowableChars>

</Search>

</Titan>

The <AllowableChars> element is used to define the standard characters that can be included within a word. Note that the value of this element is case-sensitive. If appropriate, include characters in both upper and lower case.

AllowableDigits element

<Titan>

<Search MinWordSize="3" MaxWordSize="32">

<AllowableDigits>0123456789</AllowableDigits>

</Search>

</Titan>

The <AllowableDigits> element is used to define the digits that can be included within a word.

AllowableSpecials element

<Titan>

<Search MinWordSize="3" MaxWordSize="32">

<AllowableSpecials>'_-@.</AllowableSpecials>

</Search>

</Titan>

The <AllowableSpecials> element is used to define the special characters that can be included within a word. The default special characters can be edited to add other characters of specific interest, such as the comma (,) and dollar sign ($), so that dollar amounts will be detected as words by the indexing engine.
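For example, extending the specials with the comma and dollar sign allows an amount such as $1,500 to be indexed as a word. This is a sketch; the character set shown is illustrative and the defaults shipped with your version may differ:

```xml
<Titan>

<Search MinWordSize="3" MaxWordSize="32">

<!-- Illustrative specials plus comma and dollar sign, so dollar amounts index as words -->

<AllowableSpecials>'_-@.,$</AllowableSpecials>

</Search>

</Titan>
```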

TempFolder element

<Titan>

<TempFolder UseDefault="No">C:\Example Temp Path

</TempFolder>

</Titan>

The <TempFolder> element specifies the folder where the program saves temporary files it needs to create during processing. These files will be deleted when the program finishes.

The UseDefault attribute can be set to "Yes" or "No":

Yes

Any temporary files are saved in the folder specified by the TMP or TEMP environment variable.

No

Any temporary files are saved in the folder specified by the <TempFolder> element.

Note: When the Nuix Collector Wizard is used to configure a Portable Collection, the <TempFolder> is usually configured to save temporary files on the external drive, to avoid creating temporary files on the custodian's computer. The <TempFolder> element can also be used to save logs and reports; see the <Location> element's UseTemp attribute for details.

Restart element

<Titan>

<Restart Mode="IgnorePrevious"></Restart>

</Titan>

The <Restart> element is for support purposes. It should be used only under the direction of a Nuix Technical Support staff member.

The Mode attribute can be set as follows:

IgnorePrevious

This is the normal operating mode. Any previous results database is ignored. In certain support scenarios, you may be asked by a Nuix Technical Support staff member to change the value of this element.

ForensicSnapshot element

<Titan>

<ForensicSnapshot Apply="None"></ForensicSnapshot>

</Titan>

The <ForensicSnapshot> element configures the ForensicSnapshot feature, for processing files that are open and locked by applications or services. When enabled, the ForensicSnapshot feature will create a snapshot of an entire local NTFS volume. This allows most open or locked files to be collected; however, creating the snapshot requires some processing time and temporary disk space.

The Apply attribute allows you to configure the ForensicSnapshot feature according to your priorities:

None

The ForensicSnapshot feature will not be used. No additional processing time or disk space will be required.

Always

A ForensicSnapshot will be created at the start of a job, which will crawl and collect from that snapshot only. Once the job is complete the snapshot will be released.

IfNeeded

A ForensicSnapshot will be created at the start of the job, but the job will then crawl and collect files normally. If a file cannot be collected, then the program will collect that file from the ForensicSnapshot. All files that are not locked or challenging will be processed normally, and only locked or challenging files will be processed from the snapshot. Once the job is complete the snapshot will be released.

Note: The ForensicSnapshot feature is available only when processing local NTFS volumes. The Volume Shadow Copy Service (included with Windows) must be available. The ForensicSnapshot feature is integrated with the Volume Shadow Copy Service because a wide range of applications allow their open and locked files to be accessed via this service.
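For example, to favor normal processing while still capturing open or locked files, the Apply attribute can be set to IfNeeded:

```xml
<Titan>

<!-- Crawl and collect normally; fall back to the volume snapshot only for locked files -->

<ForensicSnapshot Apply="IfNeeded"></ForensicSnapshot>

</Titan>
```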

CollectionType element

<Titan>

<CollectionType IsPortable="No">Local</CollectionType>

</Titan>

The <CollectionType> element specifies the type of collection, extraction, deletion or survey. This element's value can be set to one of the following:

Local

A collection or deletion from a local volume on the computer running Collector or ECC Client. Can also collect from attached storage devices, UNC paths and mapped network drives & shares.

Remote

A portable collection or deletion run from a remote machine. Can include files from a local volume on the computer running Portable Collector. Can also collect from attached storage devices, UNC paths and mapped network drives & shares.

NetworkShare

A portable collection or deletion run from a network share. Can include files from a local volume on the computer running Portable Collector. Can also collect from attached storage devices, UNC paths and mapped network drives & shares.

Production

Extracts files from a forensics image, evidence file or disk image.

SharePoint

Collects items from within Microsoft SharePoint.

See also the topic for the <Target> element, and the CrawlOnly attribute of the <Input> element, which further define the job as a collection, deletion, extraction or survey.

The IsPortable attribute specifies whether the collection will be configured to run from a Portable Collection Device. This optional attribute is referenced only by the Nuix Collector Wizard; it can be set to "Yes" or "No":

Yes

The collection will be configured as a Portable Device, i.e. run via Portable Collector.

No

The collection will be configured to run as a regular collection.

ExaminerName element

<Titan>

<ExaminerName>Joe Examiner</ExaminerName>

</Titan>

The <ExaminerName> element is used to document the name of the examiner who configured the job. The element must be present within the JobFile; however, its value can be left empty.

CaseName element

<Titan>

<CaseName>The Big One</CaseName>

</Titan>

The <CaseName> element is used to document the name of the case or project associated with this collection. The element must be present within the JobFile; however, its value can be left empty.

BatchCommitSize element

<Titan>

<BatchCommitSize

UseShadowCrawl="Yes"

CommitSizeMB="100">100</BatchCommitSize>

</Titan>

The <BatchCommitSize> element value specifies the number of responsive file records to hold in the write cache before committing the records to the crawl database. The value defaults to 100, which provides optimal performance on a typical PC.

If a catastrophic failure (crash) occurs during a file collection or survey, this value can be lowered to 1 before rerunning the collection to possibly identify the issue (this will cause processing to slow down). Values higher than 100 may consume excessive amounts of memory. Please change this value only under the direction of a Nuix Technical Support staff member.

The UseShadowCrawl attribute determines whether the crawl database (generated during the crawl phase of a file collection or survey) will be a SQL database. This, in turn, determines whether the crawl phase of a job can be resumed after a failure. Can be set to "Yes" or "No":

Yes (or omitted)

The crawl database is generated as a SQL database, allowing some file collections and surveys which fail during the crawl phase to be resumed from the point where they left off.

No

Suppress the creation of the SQL database, and disable the ability to resume from a failed crawl.

The CommitSizeMB attribute specifies the number of megabytes of data to hold in the write cache before writing to the crawl database. If this attribute value is larger than 0, a database write will take place whenever the amount of unwritten data exceeds the CommitSizeMB threshold or whenever the unwritten file count exceeds the BatchCommitSize threshold, whichever comes first. The default value is 100 megabytes if this attribute is omitted or empty. Set the value to 0 to disable the size-based threshold entirely.
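As a sketch, a troubleshooting configuration (to be used only under the direction of Nuix Technical Support) might commit after every record and disable the size-based threshold:

```xml
<Titan>

<!-- Commit each record individually; 0 disables the megabyte threshold -->

<BatchCommitSize UseShadowCrawl="Yes" CommitSizeMB="0">1</BatchCommitSize>

</Titan>
```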

Description element

<Titan>

<Description>A collection of data files from Server X.</Description>

</Titan>

The <Description> element describes the file collection job. The element is optional. The value is limited to 512 characters and can be left empty.

CollectionName element

<Titan>

<CollectionName>My File Collection</CollectionName>

</Titan>

The <CollectionName> element specifies a name for the file collection. The element is optional. The value is limited to 256 characters of any kind and can be left empty. For Nuix Collector users, this name is shown in the Collector Wizard.

EvidenceNumber element

<Titan>

<EvidenceNumber>9999</EvidenceNumber>

</Titan>

The <EvidenceNumber> element holds an evidence number for the file collection job. The element is optional. The value is limited to 256 characters of any kind and can be left empty.

End of General Settings, closed by the </Titan> tag.

Input element

The Input element describes the file locations or containers to be crawled and collected (or extracted). These settings reside within one set of <Input>..</Input> tags.

<Input PreHash="No"

CrawlOnly="No"

ConvertToUNC="No"

MachineSummary="No"

DeletedFilesOnly="No" >

<!-- Various "child" elements go here,

typically one of the following:

<Directories/>

<Files/>

<FileSafes/>

<LogicalEvidenceFiles/>

<SharePoints/>

<FileLists/>

<FileSystemFeatures/>

...etc -->

</Input>

The <Input> element is used to define the input information for a collection, extraction or other job.

The PreHash attribute specifies whether the program will calculate the hash values of all files specified in the input settings (i.e. all the files being crawled). This attribute can be set to "Yes" or "No":

Yes

Hash values will be calculated for each file specified in the input settings (i.e. all the files being crawled).

Note: Setting PreHash="Yes" within the <Input> element overrides any PreHash="No" attribute within a child element, such as a <Directory> element. Setting PreHash="Yes" requires additional processing time.

No

Hash values will not be calculated for each file specified in the input settings (i.e. all the files being crawled) – unless a child element (e.g.: a <Directory> element) specifies a PreHash="Yes" attribute.

The CrawlOnly attribute specifies whether the program will crawl the input sources and process files which meet the selection criteria, or merely crawl the input sources without performing any processing. This attribute can be set to "Yes" or "No":

Yes

Input sources will be crawled, but no file processing will take place. Logs and reports will be generated. This is also known as a "Survey Only" job.

No

Input sources will be crawled and processing will take place. Logs and reports will be generated.

The ConvertToUNC attribute specifies whether paths beginning with mapped drive letters are converted to UNC paths when selected via the Nuix Collector Wizard. Converting mapped drive letter paths to UNC paths ensures the specified path is unambiguous and not dependent on a transitory mapping. ECC ignores this attribute; it only impacts the Nuix Collector Wizard. The ConvertToUNC attribute can be set to "Yes" or "No":

Yes

Paths referencing mapped drives will be converted to their equivalent UNC path when selected or specified via the Collector Wizard.

No

Paths referencing mapped drives will remain unchanged when selected or specified via the Collector Wizard.

The MachineSummary attribute specifies whether to collect a summary of machine information (CPU name, amount of memory installed, network adapter name, etc.) from the computer which will run the collection. When set to "Yes", the information is saved within the FileSafe in a special named file called "MachineSummary", but only if the collection is being saved to a FileSafe (i.e. the CreateFileSafe attribute within the <Target> element is set to "Yes"). The MachineSummary attribute can be set to "Yes" or "No":

Yes

Will collect a summary of machine information from the computer on which the collection is run.

No

No machine information will be collected.

The DeletedFilesOnly attribute specifies whether the survey or collection will be restricted to deleted files only. Valid only for collections and surveys running on Windows computers. Can be set to "Yes" or "No":

Yes

The survey or collection will be limited to deleted files only. Supported on Windows only. Requires FileSystemFeatures and CollectDeletedFiles to be enabled.

No

The survey or collection will not be limited to deleted files. Whether deleted files are included depends on whether the computer is running Windows, and the FileSystemFeatures and CollectDeletedFiles settings.

The <Input> element for an Enterprise Collection Center job may contain additional attributes intended for internal use by the New Collection Wizard in ECC Admin Console.

All other Input element settings are stored as "child" elements within the <Input> element, described below.

Directories element

<Input PreHash="No" CrawlOnly="No">

<Directories

PreHash="No"

PreserveAccessDate="Yes"

ExcludeADS="No"

AutoDetectVolumes="None">

<!-- optional <Note> element goes here -->

<!-- one or more <Directory> elements go here -->

</Directories>

</Input>

The <Directories> element specifies the folders to be crawled. Individual directory entries are made using one or more <Directory> elements, which are contained within the <Directories> element.

The PreHash attribute can be set to "Yes" or "No":

Yes

During the crawl phase, the program will calculate the hash values of each file contained within any child <Directory> elements.

Note: Setting PreHash="Yes" requires additional processing time.

No

During the crawl phase, the program will not calculate the hash values of each file contained within any child <Directory> elements, unless the <Input> element has a PreHash="Yes" attribute.

The PreserveAccessDate attribute is deprecated (see note below). When Nuix Collector hashes a file's contents or copies the file, it will need to access the file. This access will trigger Windows to automatically update the file's Last Access date and time.

This attribute can be set to "Yes" or "No":

Yes (deprecated; see note)

The Last Access dates and times will be reset back to the exact time they were prior to the program touching the files. This is the default setting for JobFiles created via the Nuix Collector Wizard.

No

The Last Access dates and times will be updated on all files crawled.

Note: In versions of Nuix Collector and Nuix ECC 9.10 or newer, the PreserveAccessDate attribute has no impact: the Last Access dates and times of accessed files are not preserved. In versions of Nuix Collector and Nuix ECC 9.8 or older, when PreserveAccessDate is set to Yes, the Windows API is used to reset the Last Access date and time, which updates information in the Master File Table. This will change other metadata properties within the file system. The user-level metadata will be left as it was prior to the program being run.

In versions of Nuix Collector and Nuix ECC 9.8 or older, setting PreserveAccessDate to No can speed processing times when crawling READ-ONLY volumes, or volumes where the user account running Nuix Collector or ECC Client does not have rights to update the Last Access Date and Time file attribute.

When extracting items from an evidence file container, the Last Accessed dates and times for items within the source container do not change as the items are crawled.

The ExcludeADS attribute specifies whether Alternate Data Streams will be processed:

Yes

Alternate Data Streams will be processed when processing directories which support them, such as directories on NTFS volumes.

No (or omitted)

Alternate Data Streams will be ignored when processing directories.

The AutoDetectVolumes attribute allows the program to automatically detect the available volumes on the host machine and add them to the Source Directories to be crawled (searched for files matching the collection criteria). This attribute can be set to one of the following values:

None

Volumes will not be auto-detected.

AddLocalVolumes

Will detect and add all available local volumes. Excludes network share volumes (network mapped drives).

AddAllVolumes

Will detect and add all available volumes, including network share volumes (network mapped drives).

AddLocalVolumesExcludeCli

Will detect and add all available local volumes, except the volume that the program is running from. Excludes network share volumes (network mapped drives).

AddAllVolumesExcludeCli

Will detect and add all available volumes, except the volume that the program is running from. Includes network share volumes (network mapped drives).
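As a sketch, a job that auto-detects all local volumes except the one the program is running from (for example, a portable collection drive) might be configured as follows. Whether explicit <Directory> entries are also required alongside auto-detection depends on the job; this fragment assumes they may be omitted:

```xml
<Input PreHash="No" CrawlOnly="No">

<!-- Detected local volumes are added to the crawl automatically;
     the volume the program is running from is skipped -->

<Directories AutoDetectVolumes="AddLocalVolumesExcludeCli">

</Directories>

</Input>
```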

Note element

<Input PreHash="No" CrawlOnly="No">

<Directories

PreHash="No"

PreserveAccessDate="Yes"

ExcludeADS="No"

AutoDetectVolumes="None">

<Note>Optional notes go here.</Note>

<Directory>C:\Files</Directory>

<Directory>E:\Docs</Directory>

<Directory>F:\Notes</Directory>

</Directories>

</Input>

The <Note> element, if present, contains text notes which appear on the Summary Report.

Directory element

<Input PreHash="No" CrawlOnly="No">

<Directories

PreHash="No"

PreserveAccessDate="Yes"

ExcludeADS="No"

AutoDetectVolumes="None">

<Directory>C:\Files</Directory>

<Directory>E:\Docs</Directory>

<Directory>F:\Notes</Directory>

</Directories>

</Input>

One or more <Directory> elements are used to identify individual directories that are to be crawled. One directory may be specified per <Directory> element, using Uniform Naming Convention (UNC) or local file system paths to any available resource. Multiple <Directory> elements can be specified to crawl multiple directories in a single run.

Each <Directory> element resides within a single <Directories> parent element, as shown in the example, above.

The specified <Directory> value may include environment variables, e.g.:

<Directory>

%USERPROFILE%\Documents

</Directory>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Documents.

For a portable collection – i.e. where the <Titan> <CollectionType> value is Remote – the specified <Directory> value may be a single question mark, which will prompt the custodian to specify additional directories to collect when the portable collection is run.
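For example, a portable collection sketch that always collects one fixed folder (the path shown is illustrative) and also prompts the custodian for additional directories at run time:

```xml
<Directories>

<!-- Fixed location collected on every run (illustrative path) -->

<Directory>C:\Users\Public\Documents</Directory>

<!-- A single question mark prompts the custodian for more directories -->

<Directory>?</Directory>

</Directories>
```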

<Directory> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <Directory> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

The following example collects from the \\MyServer\MyShare path, using login credentials for domain user Alice (on the MYCOMPANY domain):

<Directory

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">\\MyServer\MyShare

</Directory>

Note: JobFiles containing user IDs and passwords should be encrypted and stored carefully to reduce their exposure to unauthorized users. For details see topic Customizing JobFiles.

FileLists and Files elements under Directory

A <FileLists> element or <Files> element may be included as a child of the <Directory> element to further specify the files to be collected from a particular <Directory>. The following example shows a <FileLists> element holding one <FileList>:

<Input PreHash="No" CrawlOnly="No">

<FileLists PreHash="No" PreserveAccessDate="Yes" />

<Directories>

<Directory>C:\Users\George\Documents

<FileLists>

<FileList>E:\FileLists\List1.Txt</FileList>

</FileLists>

</Directory>

</Directories>

</Input>

In the above example, the first <FileLists> element appears above the <Directories> element. This first element specifies the PreHash and PreserveAccessDate attributes but contains no child elements. A second <FileLists> element appears within the particular <Directory> element; this second <FileLists> element includes a <FileList> child element that names a text file listing the files to collect from that <Directory>.

The following example shows a <Files> element holding two <File> elements to restrict the files to collect from a particular <Directory>:

<Input PreHash="No" CrawlOnly="No">

<Directories

PreHash="No"

PreserveAccessDate="Yes"

AutoDetectVolumes="None">

<Directory>C:\Users\George\Documents

<Files>

<File>

C:\Users\George\Documents\Contract.doc

</File>

<File>

C:\Users\George\Contract revisions.doc

</File>

</Files>

</Directory>

</Directories>

</Input>

The specified <FileList> and <File> values may include environment variables, e.g.:

<FileList>%USERPROFILE%\FileLists\FileList.txt</FileList>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileLists\FileList.txt. For ECC Client, this requires configuring the collection to run with impersonation enabled.

ExcludeDir element

Each <Directory> element may have one or more optional <ExcludeDir> elements, which further refine the file selection criteria. Each <ExcludeDir> element specifies the relative path of a subfolder within the <Directory> to be excluded from processing.

For example, the following would collect everything from a user's Documents folder, except for files in the Pictures subfolder:

<Directory>C:\Users\Alice\Documents

<ExcludeDir>Pictures</ExcludeDir>

</Directory>

ExcludeFile element

Each <Directory> element may have one or more optional <ExcludeFile> elements, which further refine the file selection criteria. Each <ExcludeFile> element specifies the name of a file contained within the <Directory> to be excluded from processing. The file specified can be a relative path to a specific file, or a wildcard pattern in the form *.extension.

For example, the following would collect everything from a user's Documents folder, except for the file C:\Users\Alice\Documents\Personal\Resume.docx or any PDF files found anywhere within the C:\Users\Alice\Documents folder or sub-folders:

<Directory>C:\Users\Alice\Documents

<ExcludeFile>Personal\Resume.docx</ExcludeFile>

<ExcludeFile>*.pdf</ExcludeFile>

</Directory>

Note: Wildcard entries such as *.pdf must not contain any volume or folder. Wildcard entries apply to the <Directory> and any of its sub-directories.
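The exclusion rules above can be approximated in a short sketch. The function name and matching logic here are illustrative assumptions, not Collector's actual implementation; the sketch shows how a relative-path entry matches exactly while a wildcard entry matches file names in the directory and all of its sub-directories:

```python
import fnmatch
import ntpath

def is_excluded(rel_path, exclude_files):
    """Approximate <ExcludeFile> matching for a path relative to <Directory>.

    rel_path uses Windows separators, e.g. r"Personal\Resume.docx".
    """
    name = ntpath.basename(rel_path)  # bare file name, Windows-style path
    for pattern in exclude_files:
        if "*" in pattern or "?" in pattern:
            # Wildcard entries apply to the file name in any sub-directory.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                return True
        elif rel_path.lower() == pattern.lower():
            # Relative-path entries must match the path exactly.
            return True
    return False

excludes = [r"Personal\Resume.docx", "*.pdf"]
print(is_excluded(r"Personal\Resume.docx", excludes))    # True
print(is_excluded(r"Reports\Q3\summary.pdf", excludes))  # True
print(is_excluded(r"Reports\Q3\notes.docx", excludes))   # False
```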

Files element

<Input PreHash="No" CrawlOnly="No">

<Files>

<!-- one or more <File> elements go here -->

<File>C:\Data\Docs\report.docx</File>

</Files>

</Input>

The <Files> element contains one or more <File> child elements which specify individual files to collect or process. The <Files> element can also be specified under a <Directories><Directory> element, as mentioned above.

Each <Files> element may contain an optional Yes/No excludeADS attribute, which specifies whether the Alternate Data Streams (ADS) of the specified files (i.e. the child <File> elements) will be excluded from processing (applies to NTFS file systems only):

Yes

Any Alternate Data Streams for the specified files will be EXCLUDED from collection or processing.

No (or omitted):

Any Alternate Data Streams for the specified files WILL be collected or processed.

File element

<Input PreHash="No" CrawlOnly="No">

<Files>

<!-- one or more <File> elements go here -->

<File>C:\Data\Docs\report.docx</File>

</Files>

</Input>

The value of each <File> element specifies the full path to an individual file to collect or process. These values can contain environment variables. If the number of <File> elements exceeds 100, it is recommended to instead specify a separate FileList file containing the full path to each file to collect. For details see the FileLists element.
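When a job would otherwise need hundreds of <File> elements, a FileList can be generated programmatically. The following sketch (the helper name is our own) writes one full path per line in UTF-8, matching the FileList format described later in this section:

```python
import os

def write_filelist(root_dir, out_path):
    """Write a FileList file: one full file path per line, UTF-8 encoded."""
    with open(out_path, "w", encoding="utf-8", newline="\r\n") as out:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + "\n")
```

The resulting file can then be referenced from a single <FileList> element instead of many individual <File> elements.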

Each <File> element may contain an optional Imported attribute which indicates whether the <File> element was imported:

Yes

This File element was imported from a FileList or Log file, via a Wizard in ECC or Collector.

No (or omitted):

This File element was manually entered by a user.

FileSafes element

<Input PreHash="No" CrawlOnly="No">

<FileSafes>

<!-- one or more <FileSafe> elements go here -->

</FileSafes>

</Input>

Within the <FileSafes> element, one or more <FileSafe> elements define the individual FileSafe files that are to be crawled and extracted from.

FileSafe element

<Input PreHash="No" CrawlOnly="No">

<FileSafes>

<FileSafe>E:\Tom Smith Laptop.mfs01</FileSafe>

</FileSafes>

</Input>

One or more <FileSafe> elements may be specified. Specify one .mfs01 file per <FileSafe> element. For FileSafe file sets which span multiple segments, only one <FileSafe> element specifying the first segment (the file ending in .mfs01) is required.

The specified <FileSafe> may include environment variables, e.g.:

<FileSafe>%USERPROFILE%\FileSafes\%USERNAME%.mfs01</FileSafe>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileSafes\Linda.mfs01.

<FileSafe> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <FileSafe> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

The following example extracts files from a FileSafe in the \\FileSrvr\Test share, using login credentials for local user Joe (LoginDomain is left empty when the machines are on a workgroup):

<FileSafe

LoginUser="Joe"

Password="xt@987dGH"

LoginDomain="">\\FileSrvr\Test\SampleFileSafe.mfs01</FileSafe>

Note: JobFiles containing passwords should be saved as encrypted JobFiles to help keep this information confidential.

A <FileLists> element or <Files> element may be included to further specify the files to be extracted from a particular <FileSafe>. The following example shows a <FileLists> element holding one <FileList>:

<Input PreHash="No" CrawlOnly="No">

<FileLists PreHash="No" PreserveAccessDate="Yes"></FileLists>

<FileSafes>

<FileSafe>E:\MyFileSafe.mfs01

<FileLists>

<FileList>E:\FileLists\List2.Txt

</FileList>

</FileLists>

</FileSafe>

</FileSafes>

</Input>

In the above example, the first <FileLists> element appears above the <FileSafes> element. This first element specifies the PreHash and PreserveAccessDate attributes, but contains no child elements. A second <FileLists> element appears within the <FileSafe> element. This second element has no attributes, but holds a <FileList> "child element" which points to a FileList.

<FileList> elements which specify a UNC path may include login credentials for accessing the path. For details, refer to the <FileList> element topic, below.

Each FileList is a simple text file (ASCII, UTF-8 or UTF-16 LE encoded), containing the full file specification – one per line – for each file to be collected. When specifying FileSafe items within a FileList, use the following format:

FileSafe Filename \ OriginalLocation \ Drive Letter or Server Name \ Folder \ Filename

The following is an example of a FileList specifying files within a FileSafe:

MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Contract.doc

MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Contract revisions.doc

MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Time\Billable Hours.xls
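Entries in this format can be assembled mechanically. The helper below is a hypothetical illustration of how the components join with backslashes (here the folder and file name are passed together as one relative path):

```python
def filesafe_entry(filesafe_name, original_location, drive, rel_path):
    """Join the components of a FileSafe FileList entry with backslashes."""
    return "\\".join([filesafe_name, original_location, drive, rel_path])

line = filesafe_entry("MyFileSafe.mfs01", "Georges Laptop", "C",
                      r"Users\George\Documents\Contract.doc")
print(line)
# MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Contract.doc
```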

The following example shows a <Files> element used to specify two files to be extracted from the <FileSafe>:

<Input PreHash="No" CrawlOnly="No">

<FileSafes>

<FileSafe>E:\MyFileSafe.mfs01

<Files>

<File>

MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Contract.doc

</File>

<File>

MyFileSafe.mfs01\Georges Laptop\C\Users\George\Documents\Contract revisions.doc

</File>

</Files>

</FileSafe>

</FileSafes>

</Input>

The specified <FileList> and <File> values may include environment variables, e.g.

<FileList>%USERPROFILE%\FileLists\FileList.txt</FileList>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileLists\FileList.txt.

LogicalEvidenceFiles and LogicalEvidence elements

<Input PreHash="No" CrawlOnly="No">

<LogicalEvidenceFiles>

<LogicalEvidence>E:\Bill Jones Files.L01

</LogicalEvidence>

</LogicalEvidenceFiles>

</Input>

Within the <LogicalEvidenceFiles> element, one or more <LogicalEvidence> elements define the individual EnCase Logical Evidence Files (LEFs) that are to be crawled and extracted from. Specify one .L01 file per <LogicalEvidence> element.

For EnCase Logical Evidence Files which span multiple segments, only one <LogicalEvidence> element specifying the first segment (the one ending in .L01) is required.

The specified <LogicalEvidence> file may include environment variables, e.g.:

<LogicalEvidence>%USERPROFILE%\EvidenceFiles\%USERNAME%.L01</LogicalEvidence>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\EvidenceFiles\Linda.L01.

<LogicalEvidence> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <LogicalEvidence> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

The following example extracts files from a Logical Evidence file in the \\FileSrvr\Test share, using login credentials for local user Joe (LoginDomain is left empty when the machines are on a workgroup):

<LogicalEvidence

LoginUser="Joe"

Password="xt@987dGH"

LoginDomain="">\\FileSrvr\Test\LEFileSample.L01</LogicalEvidence>

Note: JobFiles containing passwords should be saved as encrypted JobFiles to help keep this information confidential.

A <FileLists> element or <Files> element may be included to further specify the files to be extracted from a <LogicalEvidence> file. For details, see the examples under the topics for the <Directory> element and the <FileSafe> element.

SharePoints and SharePoint elements

<Input PreHash="No" CrawlOnly="No">

<SharePoints>

<SharePoint

UserName="Admin"

Password="MyPwd555#"

Domain="ORGDOMAIN"

IgnoreOlderVersions="No"

MaxTasks="10">

http://SharePointServer

<!-- one or more <Location> elements go here: -->

</SharePoint>

<Owners>

<!-- zero or more <Name> elements go here -->

</Owners>

</SharePoints>

</Input>

The <SharePoints> element contains one or more <SharePoint> child elements. The <SharePoint> element value specifies a SharePoint server to connect to. The UserName, Password and Domain attributes specify the login credentials for accessing SharePoint content (as shown in the example, above).

Note: JobFiles containing user login credentials should be saved as encrypted JobFiles to help keep this information confidential.

The IgnoreOlderVersions attribute specifies which versions of items in a <SharePoint> are collected:

Yes

Only the latest version of each SharePoint item is collected.

No (or omitted):

All available versions of each SharePoint item are collected.

The MaxTasks attribute specifies the maximum number of simultaneous downloads to perform from this <SharePoint> server. The minimum value is 1 and the maximum value is 500. If not specified, the default value of 100 is used.

Location element – child element of SharePoint

<Input PreHash="No" CrawlOnly="No">

<SharePoints>

<SharePoint

UserName="Admin"

Password="MyPwd555#"

Domain="ORGDOMAIN">

http://SharePointServer

<Location ExcludeCrawl="Yes"/>

<Location ExcludeCrawl="Yes">

Records

</Location>

<Location>Reports</Location>

</SharePoint>

</SharePoints>

</Input>

Each <SharePoint> element may contain one or more <Location> child elements. Each <Location> element value specifies one site within the specified SharePoint server.

The ExcludeCrawl attribute, when specified and set to "Yes", prevents this location from being included in the collection. If this attribute is omitted or set to "No", the location will be considered for inclusion in the collection.

In the example, above:

The first <Location> element has an empty value and an ExcludeCrawl attribute set to "Yes". This will implicitly exclude all SharePoint sites which are not otherwise included by another <Location> element.

The second <Location> element explicitly excludes the Records site, i.e. the URL http://SharePointServer/Records, from collection.

The third <Location> element includes the Reports site, i.e. the URL http://SharePointServer/Reports, for collection (subject to additional selection criteria).
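The include/exclude behaviour of these three <Location> elements can be modelled as a small predicate. This is only a sketch of the selection logic as described above, not Collector's implementation; the function and variable names are our own:

```python
def location_filter(locations):
    """Approximate the <Location>/ExcludeCrawl selection logic.

    `locations` is a list of (value, exclude_crawl) pairs taken from the
    <Location> elements. Returns a predicate deciding whether a site
    (named relative to the server URL) is considered for collection.
    """
    includes = {v for v, ex in locations if v and not ex}
    excludes = {v for v, ex in locations if v and ex}
    # An empty-valued <Location ExcludeCrawl="Yes"/> implicitly excludes
    # anything not explicitly included by another <Location>.
    exclude_rest = any(not v and ex for v, ex in locations)

    def considered(site):
        if site in excludes:
            return False
        if exclude_rest:
            return site in includes
        return True

    return considered

f = location_filter([("", True), ("Records", True), ("Reports", False)])
print(f("Reports"))  # True
print(f("Records"))  # False
print(f("Archive"))  # False (not explicitly included)
```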

Owners element – child element of SharePoints under Input

<Input PreHash="No" CrawlOnly="No">

<SharePoints>

<Owners>

<!-- one or more <Name> elements go here -->

<Name OwnerType="UserName">Jane Smith</Name>

</Owners>

</SharePoints>

</Input>

The <Owners> element specifies file owners whose files will be collected from the specified SharePoint server URLs. One or more <Name> child elements specify the user names of the individual file owners.

This <Owners> element is for SharePoint collections only; for regular file collections, specify file owners via the <Owners> element under the <SelectionSet> parent element.

Note: If no <Name> elements are specified, then SharePoint collection will not be limited to a given set of file owners.

Name element

<Input PreHash="No" CrawlOnly="No">

<SharePoints>

<Owners>

<Name OwnerType="UserName">Jane Smith</Name>

</Owners>

</SharePoints>

</Input>

Each <Name> element specifies the Name of a SharePoint user who owns items (files in SharePoint) to be collected. Multiple <Name> elements may be specified to identify multiple item owners; specify one SharePoint user Name per <Name> element.

Attribute OwnerType specifies whether the <Name> value represents a SharePoint user Name or Account. Possible attribute values are:

UserName

The <Name> value represents a SharePoint user Name, as shown in SharePoint. This is the only supported value for the OwnerType attribute as of Collector and ECC versions 7.8 and newer.

Account

The <Name> value represents a SharePoint user Account, as shown in SharePoint. This value is for Nuix internal use only.

FileLists element

<Input PreHash="No" CrawlOnly="No">

<FileLists

PreHash="No"

PreserveAccessDate="Yes"

ExcludeADS="No">

<!-- one or more <FileList>, <XmlFileList> and/or <ExtendedFileList> elements go here -->

</FileLists>

</Input>

The <FileLists> element contains one or more <FileList>, <XmlFileList> and/or <ExtendedFileList> child elements. Each of these child elements allow you to specify files using various file lists (described on the following pages).

The PreHash attribute can be set to "Yes" or "No":

Yes

During the crawl phase, the program will calculate the hash values of all files specified by any child <FileList> elements.

Note: Setting PreHash="Yes" requires additional processing time.

No

During the crawl phase, the program will not calculate hash values for the files specified by any child <FileList> elements, unless the <Input> element has a PreHash="Yes" attribute.

The PreserveAccessDate attribute will control whether or not the program will reset the Last Access dates and times for any files that are touched during the crawl and collection process.

When the program hashes a file’s contents or copies the file, it will need to access the file. This access will trigger the Last Access date and time to be updated automatically by Windows.

Note: If you specify the PreserveAccessDate attribute in both the <Directories> element and the <FileLists> element, the value of the attribute from the <Directories> element will take precedence.

This attribute can be set to "Yes" or "No":

Yes

The program will reset the Last Access dates and times back to the exact values they were set to prior to the program touching each file.

Note: The process of resetting the Last Access date and time will update information in the Master File Table, which will change other metadata properties within the file system. The user-level metadata will be left as it was prior to the program being run.

No

The Last Access dates and times will be updated on all files crawled.

The ExcludeADS attribute specifies whether Alternate Data Streams will be excluded from processing:

Yes

Alternate Data Streams will be ignored when processing files listed in a FileList.

No (or omitted)

Alternate Data Streams will be processed when processing files listed in a FileList, provided the files reside on a volume which supports Alternate Data Streams, such as an NTFS volume.

FileList element

<Input PreHash="No" CrawlOnly="No">

<FileLists PreHash="No"

PreserveAccessDate="Yes"

ExcludeADS="No">

<FileList>

E:\FileLists\List1.txt

</FileList>

</FileLists>

</Input>

Each <FileList> element specifies a single FileList. There can be one or more <FileList> elements within the single <FileLists> element.

The specified <FileList> may include environment variables, e.g.:

<FileList>%USERPROFILE%\FileLists\%USERNAME%.txt</FileList>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.:

C:\Users\Linda\FileLists\Linda.txt.

<FileList> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <FileList> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

Example:

<FileList

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare\FileLists\List1.Txt

</FileList>

Note: JobFiles containing passwords should be saved as encrypted JobFiles to help keep this information confidential.

A FileList file is a simple text file (ASCII, UTF-8 or UTF-16 LE encoded), containing the full file specification – one per line – for each file to be collected, e.g.:

C:\Users\George\Documents\Contract.doc

C:\Users\George\Documents\Contract revisions.doc

C:\Users\George\Documents\Time\Billable Hours.xls

Each line within a FileList file may include environment variables, e.g.:

%USERPROFILE%\Documents\Letter.doc

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Documents\Letter.doc.
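Reading a FileList therefore amounts to taking one path per non-empty line and expanding any %ENVVAR% references. A minimal sketch (the function name is assumed; Python's Windows-path helper is used so %VAR% expansion works on any platform):

```python
import ntpath

def read_filelist(path):
    """Read a FileList: one full path per non-empty line, with
    %ENVVAR% references expanded from the current environment."""
    entries = []
    with open(path, encoding="utf-8") as fl:
        for line in fl:
            line = line.strip()
            if line:
                entries.append(ntpath.expandvars(line))
    return entries
```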

XmlFileList element

<Input PreHash="No" CrawlOnly="No">

<FileLists PreHash="No" PreserveAccessDate="Yes">

<XmlFileList>

E:\FileLists\List2.xml

</XmlFileList>

</FileLists>

</Input>

An XmlFileList file is an XML file containing a list of files to be processed by the job. Each <XmlFileList> element specifies a single XmlFileList file. There can be one or more <XmlFileList> elements within the single <FileLists> element.

The specified <XmlFileList> may include environment variables, e.g.:

<XmlFileList>%USERPROFILE%\FileLists\%USERNAME%.xml</XmlFileList>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileLists\Linda.xml.

<XmlFileList> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <XmlFileList> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

Example:

<XmlFileList

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare\FileLists\List2.xml

</XmlFileList>

The content of an XmlFileList file is plain text, encoded in ASCII, UTF-8 or UTF-16 LE. The XML structure (or schema) for an XmlFileList file is apparent in the following example:

<File FilePath="C:\Movie Scripts\Annie Hall.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="177032" MD5Hash="65246F43C9C33F2B61798874DFEED291"/>

<File FilePath="C:\Movie Scripts\Apocalypse Now.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="188300" MD5Hash="351F4AB053140C244E85733C0E8B0597"/>

<File FilePath="C:\Movie Scripts\As Good As It Gets.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="134201" MD5Hash="142680713F89D22D6148C060F4FB67BD"/>

Note: Each <File> element within an XmlFileList must occur on a single line. Each line must end with a carriage return + line feed pair.

Each <File> element within an XmlFileList specifies the path to a file to be processed. The FilePath attribute holds this value.

When performing a Deletion, the following additional attributes may be specified to ensure the file being deleted is the intended file:

Attribute

Description

CreationDate

The date (and time) the file was created

ModificationDate

The date (and time) the file was last modified

LastAccessDate

The date (and time) the file was last accessed

FileSize

The size of the file (in bytes)

MD5Hash

The MD5 hash checksum for the content of the file

If any of the above optional attributes are specified, Nuix Collector's Deletion feature will check the corresponding value in the file specified by FilePath. If the values all match, the deletion will proceed. But if there is any discrepancy between the specified attributes and the corresponding value in the file, then an error will be logged and the file will not be deleted.
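The size and hash portion of that safety check can be sketched as follows. This is a simplified illustration (the helper name is our own, and the real feature also compares the three date attributes):

```python
import hashlib
import os

def verify_before_delete(file_path, expected_size=None, expected_md5=None):
    """Return True only if every supplied value matches the file on disk,
    mirroring the pre-deletion check: any mismatch means no deletion."""
    if expected_size is not None and os.path.getsize(file_path) != expected_size:
        return False
    if expected_md5 is not None:
        with open(file_path, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest().upper()
        if actual != expected_md5.upper():
            return False
    return True
```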

An alternative to the XmlFileList is available: the ExtendedFileList. It is similar to an XmlFileList, but avoids using XML elements.

ExtendedFileList element

<Input PreHash="No" CrawlOnly="No">

<FileLists PreHash="No" PreserveAccessDate="Yes">

<ExtendedFileList>

E:\FileLists\ExtendedFileList3.txt

</ExtendedFileList>

</FileLists>

</Input>

An ExtendedFileList file is a text file containing a list of files to be processed by the job. Each <ExtendedFileList> element specifies a single Extended FileList file. There can be one or more <ExtendedFileList> elements within the single <FileLists> element.

The specified <ExtendedFileList> may include environment variables, e.g.

<ExtendedFileList>%USERPROFILE%\FileLists\%USERNAME%.txt</ExtendedFileList>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileLists\Linda.txt.

<ExtendedFileList> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <ExtendedFileList> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the network path.

The Password attribute specifies the password for the above User ID.

The optional LoginDomain attribute specifies the login domain for the above User ID. Omit this attribute when accessing workgroup shares via local users.

Example:

<ExtendedFileList

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare\FileLists\ExtendedFileList3.txt

</ExtendedFileList>

The content of an ExtendedFileList file is plain text, encoded in ASCII, UTF-8 or UTF-16 LE. The structure for an ExtendedFileList file is apparent in the following example:

FilePath="C:\Movie Scripts\Annie Hall.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="177032" MD5Hash="65246F43C9C33F2B61798874DFEED291"

FilePath="C:\Movie Scripts\Apocalypse Now.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="188300" MD5Hash="351F4AB053140C244E85733C0E8B0597"

FilePath="C:\Movie Scripts\As Good As It Gets.txt" CreationDate="2012-12-22T10:48:36-08:00" ModificationDate="2007-08-08T23:59:00-07:00" LastAccessDate="2012-12-22T10:48:36-08:00" FileSize="134201" MD5Hash="142680713F89D22D6148C060F4FB67BD"

Note: The above ExtendedFileList example contains only three lines; each entry occupies a single line, even though the long lines wrap when displayed here.

Each line within an ExtendedFileList specifies a single file to be processed. The FilePath value is the full file specification for the file to be processed.
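Because each line is a flat sequence of Name="Value" pairs, it can be parsed without an XML parser. A hypothetical sketch (the regular expression assumes values contain no embedded double quotes, as in the example above):

```python
import re

_PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_extended_line(line):
    """Parse one ExtendedFileList line of Name="Value" pairs into a dict."""
    return dict(_PAIR.findall(line))

rec = parse_extended_line(
    r'FilePath="C:\Movie Scripts\Annie Hall.txt" '
    r'FileSize="177032" MD5Hash="65246F43C9C33F2B61798874DFEED291"'
)
print(rec["FilePath"])  # C:\Movie Scripts\Annie Hall.txt
```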

When performing a Deletion, the following additional values may be specified to ensure the file being deleted is the intended file:

Attribute

Description

CreationDate

The date (and time) the file was created

ModificationDate

The date (and time) the file was last modified

LastAccessDate

The date (and time) the file was last accessed

FileSize

The size of the file (in bytes)

MD5Hash

The MD5 hash checksum for the content of the file

If any of the above optional values are specified, Nuix Collector's Deletion feature (or ECC's Delete task) will check the corresponding value in the file specified by FilePath. If the values all match, the deletion will proceed. If there is any discrepancy between the values in the ExtendedFileList and the corresponding values in the specified file, an error will be logged and the file will not be deleted.

An alternative to the ExtendedFileList is available: the XmlFileList. It is similar to an ExtendedFileList, but uses XML elements and attributes.

FileSystemFeatures element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<!-- <CollectDeletedFiles/>

<CollectMFTs/>

<CollectFATs/>

<CollectUnallocated/>

etc. -->

</FileSystemFeatures>

</Input>

Specify optional features which require direct access to the file system, including the survey or collection of deleted files, unallocated clusters and file system tables. For NTFS volumes, alternate MAC times can also be obtained from the file system. These features are only available on computers running Windows, and only for local volumes formatted with the NTFS or FAT file systems (including FAT-32, FAT-16 and FAT-12). Refer to the child elements for details regarding specific features.

The Enabled attribute specifies whether any of the file system features are enabled:

Yes

Directly read the file system tables and take advantage of file system-related features, as specified in the <FileSystemFeatures> child elements.

No

Ignore the file system tables and instead use Windows to obtain file information. All <FileSystemFeatures> child elements are ignored.

The UseMFTIndex attribute specifies whether to access the master file table via its index on NTFS-formatted logical volumes:

Yes

Utilize the MFT index, which generally speeds up the crawl stage of the collection or survey when dealing with LESS THAN 40,000 or so file or directory entries on an NTFS volume.

No

Read the entire MFT, which generally speeds up the crawl stage of the collection or survey when dealing with MORE THAN 40,000 or so file or directory entries on an NTFS volume. Also, allows the survey or collection of deleted files and unallocated space.

The MFTRecordCacheSize attribute specifies the size of the cache for reading the MFT, in MFT records, on an NTFS volume. If not specified, the default is 20000, which translates roughly into a 20 MB chunk of the MFT being read into memory at a time.
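The rough arithmetic behind that figure, assuming the common 1,024-byte NTFS MFT record size (record size can differ on some volumes):

```python
MFT_RECORD_SIZE = 1024          # bytes; the usual NTFS default
cache_records = 20000           # MFTRecordCacheSize default
cache_bytes = cache_records * MFT_RECORD_SIZE
print(cache_bytes)              # 20480000 bytes, i.e. roughly 20 MB
```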

CollectDeletedFiles element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<CollectDeletedFiles>Yes</CollectDeletedFiles>

</FileSystemFeatures>

</Input>

Whether to survey or collect deleted files.

Yes

Survey or collect deleted files. Paths to be scanned for deleted files are specified via one or more <Directory> entries (within the <Input> section of the JobFile).

No

Do not survey or collect deleted files.

Note: When collecting deleted files, the file's original path is noted, and corresponding paths are established within the destination folder or FileSafe. When collecting deleted files from an NTFS-formatted volume, the UseMFTIndex attribute value of the <FileSystemFeatures> element will be forced to No as the collection takes place.

Warning: It is not always possible to gather the complete file and path names of deleted files. In some cases, underscores will be substituted for filename or path characters which have been lost. It may not be possible to successfully collect all deleted files, because deleted items are subject to being overwritten by other data. When collecting deleted files, files which retain 100% of their original content will be saved to a sub-folder called DELETED (within the destination folder or FileSafe), while files whose content has been partially or fully overwritten will be saved to a sub-folder called DELETED+OVERWRITTEN.

Tip: You can reduce further overwrites of deleted files by minimizing writes to the volume(s) where the deleted files reside. To minimize further writes during a collection or survey, set the following JobFile values to paths which reside on a volume where deleted file processing is not taking place: 
<Titan><TempFolder> (the UseDefault attribute must be set to No)
<Titan><NistDirectory> (only if de-NISTing)
<Titan><StdDirectory> (only if excluding known files)
<Titan><DupDirectory> (only if de-duplicating)
<Target><ExtractPath> (only if saving to native/raw files)
<Target><FileSafePath> (only if saving to a FileSafe)
<Logs><Location>

CollectMFTs element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<CollectMFTs>Yes</CollectMFTs>

</FileSystemFeatures>

</Input>

Whether to survey or collect the Master File Table files on an NTFS-formatted volume.

Yes

Survey or collect the Master File Table files. Volumes are specified via one or more <Directory> entries, or via the file paths within one or more <FileList>, <XmlFileList> or <ExtendedFileList> entries (within the <Input> section of the JobFile).

No

Do not survey or collect the Master File Table files.

CollectFATs element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<CollectFATs>Yes</CollectFATs>

</FileSystemFeatures>

</Input>

Whether to survey or collect the File Allocation Table(s) on a FAT-formatted volume.

Yes

Survey or collect the File Allocation Table(s). Volumes are specified via one or more <Directory> entries, or via the file paths within one or more <FileList>, <XmlFileList> or <ExtendedFileList> entries (within the <Input> section of the JobFile).

No

Do not survey or collect the File Allocation Table(s).

CollectUnallocated element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<CollectUnallocated>Yes</CollectUnallocated>

</FileSystemFeatures>

</Input>

Whether to survey or collect the unallocated clusters on a logical volume.

Yes

Survey or collect all unallocated clusters on the specified volume(s). These clusters are treated as a single file. Volumes are specified via one or more <Directory> entries, or via the file paths within one or more <FileList>, <XmlFileList> or <ExtendedFileList> entries (within the <Input> section of the JobFile).

No

Do not survey or collect the unallocated clusters.

Note: When collecting unallocated clusters from an NTFS-formatted volume, the UseMFTIndex attribute value of the <FileSystemFeatures> element will be forced to No as the collection takes place.

GetAlternateMACTimes element

<Input PreHash="No" CrawlOnly="No">

<FileSystemFeatures Enabled="Yes"

UseMFTIndex="No"

MFTRecordCacheSize="20000">

<GetAlternateMACTimes>Yes</GetAlternateMACTimes>

</FileSystemFeatures>

</Input>

Whether to obtain last Modified/Accessed/Created dates and times (MAC times) directly from the master file table on an NTFS-formatted volume. These alternate MAC times can be more reliable than the standard MAC times, particularly if malware or some other software product has manipulated the standard MAC times.

Yes

Obtain alternate MAC times directly from the MFT on an NTFS-formatted volume.

No

Obtain standard MAC times via Windows.

VolatileInfo element

<Input PreHash="No" CrawlOnly="No">

<VolatileInfo>

<!-- the <CollectVolatile>, <CollectHandles> and <CollectScreenShots> child elements go here -->

</VolatileInfo>

Contains child elements which specify whether (and how) volatile information will be collected from the target computer. Volatile information includes operating system details. For details see descriptions of the child elements <CollectVolatile>, <CollectHandles> and <CollectScreenShots>, below.

CollectVolatile element

<Input PreHash="No" CrawlOnly="No">

<VolatileInfo>

<CollectVolatile>Yes</CollectVolatile>

<CollectHandles>Yes</CollectHandles>

<CollectScreenShots ImageType="PNG">Yes</CollectScreenShots>

</VolatileInfo>

The <CollectVolatile> element specifies whether volatile information will be collected from the target computer. The text value can be set to Yes or No:

Yes

Collect volatile information from the target computer.

No

Do not collect any volatile information from the target computer.

If the <CollectVolatile> element is not specified, a value of No is assumed: no volatile information will be collected from the target computer.

CollectHandles element

<Input PreHash="No" CrawlOnly="No">

<VolatileInfo>

<CollectVolatile>Yes</CollectVolatile>

<CollectHandles>Yes</CollectHandles>

<CollectScreenShots ImageType="PNG">Yes</CollectScreenShots>

</VolatileInfo>

The <CollectHandles> element specifies whether a list of open handles will be collected from the target computer. The text value can be set to Yes or No:

Yes

Collect a list of open handles associated with running processes on the target computer. The <CollectVolatile> element must also have a value of Yes.

No

Do not collect a list of open handles associated with running processes on the target computer.

CollectScreenShots element

<Input PreHash="No" CrawlOnly="No">

<VolatileInfo>

<CollectVolatile>Yes</CollectVolatile>

<CollectHandles>Yes</CollectHandles>

<CollectScreenShots ImageType="PNG">Yes</CollectScreenShots>

</VolatileInfo>

The <CollectScreenShots> element specifies whether images of open applications and the Windows Desktop will be collected from the current user's session on the target computer. The text value can be set to Yes or No:

Yes

Collect images of open applications and the Windows Desktop from the current user's session on the target computer. The <CollectVolatile> element must also have a value of Yes.

No

Do not collect any images of open applications or the Windows Desktop from the current user's session on the target computer.

The <CollectScreenShots> element includes an ImageType attribute, which specifies the image file format to use. This value can be either PNG or JPG.

RAMCapture element

<Input PreHash="No" CrawlOnly="No">

<RAMCapture>

<!-- the <CaptureRAM>, <EnableLogging> and <CaptureEngine> child elements go here -->

</RAMCapture>

Contains child elements which specify whether (and how) system RAM will be captured and collected from the target computer. For Nuix Collector Network or Portable editions, the target computer is the computer that is running Nuix Collector. For Enterprise Collection Center, the target computers are the computers running ECC Client.

For details see descriptions of the child elements <CaptureRAM>, <EnableLogging> and <CaptureEngine>, below.

CaptureRAM element

<Input PreHash="No" CrawlOnly="No">

<RAMCapture>

<CaptureRAM OutputFormat="RAW">Yes</CaptureRAM>

<EnableLogging>Yes</EnableLogging>

<CaptureEngine>PMEM</CaptureEngine>

</RAMCapture>

The <CaptureRAM> element specifies whether system RAM will be collected from the target computer. The text value can be set to Yes or No:

Yes

Collect system RAM from the target computer.

No

Do not collect system RAM from the target computer.

If the <CaptureRAM> element is not specified, a value of No is assumed: system RAM will not be collected from the target computer.

The OutputFormat attribute must be RAW for Nuix Collector Suite 7.2 and newer, and for Nuix Enterprise Collection Center 7.2 and newer. RAM capture is not supported on older versions of these products.

EnableLogging element

<Input PreHash="No" CrawlOnly="No">

<RAMCapture>

<CaptureRAM OutputFormat="RAW">Yes</CaptureRAM>

<EnableLogging>Yes</EnableLogging>

<CaptureEngine>PMEM</CaptureEngine>

</RAMCapture>

The <EnableLogging> element specifies whether the RAM capture process will be logged:

Yes

Generate a log file to log the RAM capture process. The log file is named RAMCaptureLog.txt and is saved in a folder named RAMCapture, located under the folder specified by the <Logs><Location> element. To generate this log file, the <CaptureRAM> element must also have a value of Yes.

No

Do not generate a log file for RAM capture.

CaptureEngine element

<Input PreHash="No" CrawlOnly="No">

<RAMCapture>

<CaptureRAM OutputFormat="RAW">Yes</CaptureRAM>

<EnableLogging>Yes</EnableLogging>

<CaptureEngine>PMEM</CaptureEngine>

</RAMCapture>

The <CaptureEngine> element specifies the system used to capture system RAM. The value of this element must be PMEM for Nuix Collector Suite 7.2 and newer, and for Nuix Enterprise Collection Center 7.2 and newer. The <CaptureRAM> element must also have a value of Yes. RAM capture is not supported on older versions of these products.

End of Input element, closed by the </Input> tag.

SelectionSet element

Selection Set settings are used to identify the criteria to use to identify specific files to be collected or extracted. These settings reside within one set of <SelectionSet>..</SelectionSet> tags.

<SelectionSet>

<!-- various "child" elements go here:

<Owners/>

<Extensions/>

<Dates/>

<DateCriteria/>

<SaveFilesPath/>

<Hashes/>

<Advanced/>

</SelectionSet>

The <SelectionSet> element has no attributes or text. All settings are stored in "child" elements, described below.

Owners element – child element of SelectionSet

<SelectionSet>

<Owners>

<!-- various "child" elements go here:

<Name></Name> can be more than one

<SID></SID> can be more than one -->

</Owners>

..

</SelectionSet>

The <Owners> element specifies file owners whose files will be collected. Child elements specify the user names or user SIDs of the file owners.

This <Owners> element is for regular file collections only; for SharePoint collections, specify file owners via the <Owners> element under the <Input><SharePoints> parent element.

Note: The "owner" is the user who last saved the file according to the file system. It is not necessarily the user who first created the file.
To detect file ownership, the file must reside on an NTFS volume or CIFS share and have valid ownership metadata.
If no <Name> or <SID> elements are specified, then file collection will not be limited to a given set of file owners.

Name element

<SelectionSet>

<Owners>

<Name>rzeigler</Name><!-- zero or more Name elements allowed -->

</Owners>

</SelectionSet>

Each <Name> element specifies the username of a file owner. Multiple <Name> elements may be specified to identify multiple users; specify one user per <Name> element.

When specifying Owner names, the computer name or Active Directory domain name may be prefaced, e.g.: <Name>ORGDOMAIN\jsmith</Name>

SID element

<SelectionSet>

<Owners>

<SID>S-1-5-21-1957994488-2025429265-682003330-1003</SID>

<!-- zero or more SID elements allowed -->

</Owners>

</SelectionSet>

Each <SID> element specifies the Security Identifier (SID) of a file owner (each user has a distinct SID). Multiple <SID> elements may be specified to identify multiple owners; specify one SID per <SID> element.

Extensions element (file types)

<SelectionSet>

<Extensions AnalyzeSignatures="No" Mode="Include">

<!-- "child" elements go here:

<Extension></Extension> can be zero or more <Extension> elements -->

</Extensions>

</SelectionSet>

The AnalyzeSignatures attribute can be set to "Yes" or "No":

Yes

The program will collect all files whose filename extension matches one of the specified <Extension> child elements. In addition, the program will perform signature analysis on all files which have a filename extension not listed in one of the <Extension> child elements, to ensure that files with an incorrect filename extension are considered for collection. If signature analysis identifies a file as having the same intrinsic type as one of the specified <Extension> child elements, the file will be included for collection.

Note: Signature analysis is only available for file types which:

Are listed in the <Extension> child elements, and

Have a corresponding file signature within the signature header file.

For details see topic File Signature Analysis within the Nuix Collector Suite User Guide.

No

The program will process all files whose filename extension matches one of the specified <Extension> child elements. Signature analysis will not be performed; files are selected for processing based on their filename extension. Files must also meet any other specified selection criteria.

The Mode attribute can be set to "Include" or "Exclude":

Include

The program will collect only files whose filename extension matches (or, if signature analysis is enabled, whose intrinsic file type matches) one of the extensions specified in the <Extension> child elements.

Exclude

The program will collect all files except those whose filename extension matches one of the extensions specified in the <Extension> child elements. If signature analysis is enabled, the program will also exclude files whose intrinsic file type matches one of the specified extensions.

Extension element

<SelectionSet>

<Extensions AnalyzeSignatures="No" Mode="Include">

<Extension CollectAll="No">pdf</Extension>

</Extensions>

</SelectionSet>

Each <Extension> element specifies a filename extension for files to be considered for collection. Multiple <Extension> elements may be specified to identify multiple file extensions to be considered. To collect or extract all files regardless of filename extension, omit all <Extension> elements (the parent <Extensions> element is still required).

If the <Extensions> element's AnalyzeSignatures attribute is set to Yes, and if the specified extension (e.g.: pdf) has a corresponding entry in the signatures header file, then the <Extension> element also specifies an intrinsic file type. In this case, signature analysis will be applied to files whose filename extension has no corresponding <Extension> element. Any files matching the intrinsic file type signature (e.g.: pdf) will be collected – regardless of their filename extension.
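As an illustrative sketch (the extension list here is hypothetical), the following fragment collects pdf and docx files by extension; because AnalyzeSignatures is Yes, files whose intrinsic type is pdf or docx would also be considered even when they carry a different filename extension, provided both types have entries in the signatures header file:

```xml
<SelectionSet>
  <Extensions AnalyzeSignatures="Yes" Mode="Include">
    <Extension CollectAll="No">pdf</Extension>
    <Extension CollectAll="No">docx</Extension>
  </Extensions>
</SelectionSet>
```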

The CollectAll attribute can be set to "Yes" or "No":

Yes

Files with this extension (or of this file type) will be automatically collected (known as "auto-collect"), and will not be searched for any full-text search terms specified via the Advanced element.

No

Files with this extension (or of this file type) will be searched for any full-text search terms specified via the Advanced element.

Dates element

<SelectionSet>

<Dates UTC-Mode="No">

<!-- "child" elements go here:

<CreationDate></CreationDate>

<ModificationDate></ModificationDate>

<AccessDate></AccessDate> -->

</Dates>

</SelectionSet>

The <Dates> element restricts processing to files whose creation dates, modification dates and/or last access dates fall within the specified criteria. Child elements contain the actual criteria. One or more <Dates> elements are permitted.

The more child elements contained within a <Dates> element, the narrower the file selection criteria becomes. For example, specifying both <CreationDate> and <ModificationDate> elements within a <Dates> element will select only files which fall within both the specified creation date range and the specified modification date range.

Specifying multiple <Dates> elements broadens the file selection criteria. For example, given two <Dates> elements – one with a <CreationDate> element and one with a <ModificationDate> element – files will be selected which fall within either the creation date range from the first <Dates> element or the modification date range from the second <Dates> element.
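To make this concrete, the following sketch (with hypothetical date ranges) uses two <Dates> elements so that files falling within either range are selected:

```xml
<SelectionSet>
  <!-- Selects files created in 2012 OR modified in 2012 -->
  <Dates UTC-Mode="No">
    <CreationDate>
      <StartDate>2012-01-01T00:00:00</StartDate>
      <EndDate>2012-12-31T23:59:59</EndDate>
    </CreationDate>
  </Dates>
  <Dates UTC-Mode="No">
    <ModificationDate>
      <StartDate>2012-01-01T00:00:00</StartDate>
      <EndDate>2012-12-31T23:59:59</EndDate>
    </ModificationDate>
  </Dates>
</SelectionSet>
```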

Note: The Nuix Collector Suite Wizard saves date criteria under this <Dates> element. An alternate element called <DateCriteria> can also be used to specify date criteria.

The UTC-Mode attribute can be set to "Yes" or "No":

Yes

File dates and times are interpreted in Coordinated Universal Time (UTC/GMT).

Note: If you are processing data collected from multiple time zones, it is recommended that you set the UTC-Mode attribute to Yes.

No

File dates and times are interpreted according to the configured time zone of the computer running Nuix Collector or ECC Client.

CreationDate element

<SelectionSet>

<Dates UTC-Mode="No">

<CreationDate>

<StartDate>2012-03-11T03:11:12</StartDate>

<EndDate>2012-03-11T17:24:00</EndDate>

</CreationDate>

</Dates>

</SelectionSet>

The <CreationDate> element restricts the collection to files whose Creation Date falls within the specified date range.

The <StartDate> child element specifies the starting Creation Date (inclusive).

The <EndDate> child element specifies the ending Creation Date (inclusive).

The <StartDate> and <EndDate> values can be expressed in the format "yyyy-MM-ddThh:mm:ss", where yyyy is the year, MM is the month, dd is the day, T is a literal constant, hh is the hour, mm is the minute and ss is the second.

The <StartDate> and <EndDate> values can also be expressed using the {today} variable, e.g.:

<CreationDate>

<StartDate>{today-7}</StartDate>

<EndDate>{today}</EndDate>

</CreationDate>

The above example would act on files created in the past 7 days, up through the current day. Note that "today" is defined as the day when the JobFile is executed. As a value in a <StartDate> element, {today} or {today-n} begins at 00:00:00 (midnight). As a value in an <EndDate> element, {today} or {today-n} ends at 23:59:59 (the last second of the day).

ModificationDate element

<SelectionSet>

<Dates UTC-Mode="No">

<ModificationDate>

<StartDate>2012-03-11T03:11:12</StartDate>

<EndDate>2012-03-11T17:24:00</EndDate>

</ModificationDate>

</Dates>

</SelectionSet>

The <ModificationDate> element restricts the collection to files whose Modification Date falls within the specified date range.

The <StartDate> child element specifies the starting Modification Date (inclusive).

The <EndDate> child element specifies the ending Modification Date (inclusive).

The <StartDate> and <EndDate> values can be expressed in the format "yyyy-MM-ddThh:mm:ss", where yyyy is the year, MM is the month, dd is the day, T is a literal constant, hh is the hour, mm is the minute and ss is the second.

The <StartDate> and <EndDate> values can also be expressed using the {today} variable, e.g.:

<ModificationDate>

<StartDate>{today-7}</StartDate>

<EndDate>{today}</EndDate>

</ModificationDate>

The above example would act on files modified in the past 7 days, up through the current day. Note that "today" is defined as the day when the JobFile is executed. As a value in a <StartDate> element, {today} or {today-n} begins at 00:00:00 (midnight). As a value in an <EndDate> element, {today} or {today-n} ends at 23:59:59 (the last second of the day).

AccessDate element

<SelectionSet>

<Dates UTC-Mode="No">

<AccessDate>

<StartDate>2012-03-11T03:11:12</StartDate>

<EndDate>2012-03-11T17:24:00</EndDate>

</AccessDate>

</Dates>

</SelectionSet>

The <AccessDate> element restricts the collection to files whose Last Access Date falls within the specified date range.

The <StartDate> child element specifies the starting Last Access Date (inclusive).

The <EndDate> child element specifies the ending Last Access Date (inclusive).

The <StartDate> and <EndDate> values can be expressed in the format "yyyy-MM-ddThh:mm:ss", where yyyy is the year, MM is the month, dd is the day, T is a literal constant, hh is the hour, mm is the minute and ss is the second.

The <StartDate> and <EndDate> values can also be expressed using the {today} variable, e.g.:

<AccessDate>

<StartDate>{today-7}</StartDate>

<EndDate>{today}</EndDate>

</AccessDate>

The above example would act on files last accessed in the past 7 days, up through the current day.

Note: "today" is defined as the day when the JobFile is executed. As a value in a <StartDate> element, {today} or {today-n} begins at 00:00:00 (midnight). As a value in an <EndDate> element, {today} or {today-n} ends at 23:59:59 (the last second of the day).

DateCriteria element – (an alternate to the <Dates> element)

<SelectionSet>

<DateCriteria

StartDate="2012-01-01T00:00:00"

EndDate="2012-12-31T23:59:59"

CheckCreation="Yes"

CheckModification="Yes"

CheckAccess="No"

Mode="Or"

UTC-Mode="No"/>

</SelectionSet>

The <DateCriteria> element restricts processing to files whose creation dates, modification dates or last access dates fall within the specified criteria. Tag attributes contain the actual criteria. One or more <DateCriteria> elements are permitted.

Note: The Collection Wizard within Nuix Enterprise Collection Center - Admin Console saves date criteria under this <DateCriteria> element. An alternate element called <Dates> can also be used to specify date criteria.

The <DateCriteria> element attributes can be set as follows:

Attribute

Value

Description

StartDate

A date+time, in the format "yyyy-MM-ddThh:mm:ss"

Specifies the starting date+time (inclusive).

EndDate

A date+time, in the format "yyyy-MM-ddThh:mm:ss"

Specifies the ending date+time (inclusive).

CheckCreation

"Yes" or "No"

If "Yes", the specified date+time applies to each file's Creation date+time.

CheckModification

"Yes" or "No"

If "Yes", the specified date+time applies to each file's Last Modified date+time.

CheckAccess

"Yes" or "No"

If "Yes", the specified date+time applies to each file's Last Access date+time.

Mode

"And" or "Or"

If the Mode is set to "And", then files will be included in the collection only if they meet all the checks which are set to "Yes".

If the Mode is set to "Or", then files will be included in the collection if they meet any of the checks which are set to "Yes".

Specifying And is more restrictive; specifying Or is more inclusive. The following two examples illustrate:

If CheckModification and CheckAccess are both "Yes", and Mode is "And", then only files whose Modification Date falls within the specified date range and whose Last Access Date falls within the specified date range will be selected.

If CheckModification and CheckAccess are both "Yes", but Mode is "Or", then files whose Modification Date falls within the specified date range or whose Last Access Date falls within the specified date range will be selected.

UTC-Mode

"Yes" or "No"

When "Yes", file dates and times are interpreted in Universal Time Conversion (UTC/GMT).

When "No", file dates and times are interpreted according to the configured time zone of the computer running Nuix Collector or ECC Client.

If you are processing data collected from multiple time zones, setting UTC-Mode="Yes" is recommended.
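As a sketch of the attributes described above (the dates are hypothetical), the following <DateCriteria> selects only files whose Modification Date and Last Access Date both fall within 2012:

```xml
<SelectionSet>
  <DateCriteria
    StartDate="2012-01-01T00:00:00"
    EndDate="2012-12-31T23:59:59"
    CheckCreation="No"
    CheckModification="Yes"
    CheckAccess="Yes"
    Mode="And"
    UTC-Mode="Yes"/>
</SelectionSet>
```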

Keywords element

<SelectionSet>

<Keywords>

<Keyword>sample</Keyword>

<Keyword>sample phrase</Keyword>

</Keywords>

</SelectionSet>

Within the <Keywords> element, one or more <Keyword> elements define the keywords or phrases used to determine which text files to process.

Note: Only plain text files encoded in ASCII or Unicode can be searched using <Keywords>.

The <Keywords> element is no longer accessible via the Wizards in Collector and ECC as of version 7.8. Instead, the Advanced element can be used to define a full-text search. <Advanced> searches can search the contents of text files, Word documents, Excel workbooks, PDF files, binary files and dozens of other common file formats.

When searching a set of text files, an <Advanced> search may run slower than a search based on <Keywords>.

Keyword element – Begins With

<Keyword>product_id*</Keyword>

Using "*" at the end of a keyword will identify all files that contain any instances of a word that begins with that string of characters.

Keyword element – Ends With

<Keyword>*product_id</Keyword>

Using "*" at the beginning of a keyword will identify all files that contain any instances of a word that ends with that string of characters.

Keyword element – Contains

<Keyword>*product_id*</Keyword>

Using "*" at the beginning and the end of a keyword will identify all files that contain any instances of a word that contains that string of characters.

Keyword element – AND

<Keyword>(*product_id*) & (*invoice_id*)</Keyword>

Using "&" in between keywords will identify all files that contain one or more instances of both keywords.

Keyword element – OR

<Keyword>(product_id*) | (product-id*) | (productid*)</Keyword>

Using "|" in between keywords will identify all files that contain any instances of any of the keywords.

Keyword element – NOT

<Keyword>(*product_id*) ! (*invoice_id*)</Keyword>

Using "!" in between keywords will identify all files that contain one or more instances of the keyword on the left (*product_id*) but not any instances of the keyword on the right (*invoice_id*).

Keyword element – Within (proximity search)

<Keyword>allen w/2 Smith</Keyword>

Using a "w/{N}" in between keywords will identify all files that contain either keyword within {N} words of the other keyword. Substitute a positive whole number for {N}.

Hashes element

<SelectionSet>

<Hashes HashType="MD5" UseToIncludeMatches="Yes">

%temp%\Include_Db{DateTime}

<!-- child elements go here:

<DBDirPath/>

<Hash/>

<HashLists/>

-->

</Hashes>

</SelectionSet>

The optional <Hashes> element contains specifications for selecting files based on the MD5 hash value of each file's content. The ability to select files based on hash values can be helpful for locating and processing sensitive files or known malware files.

Any other file selection criteria specified in the JobFile also apply; i.e., the only files which will be selected for processing are the files:

whose MD5 hash value is one of the specified hash values, and

where all other specified file selection criteria match (e.g.: file extension, modification date, folder, etc.).

The text value of the <Hashes> element can be used to specify the folder for the select-by-hash database. This database stores file hash values during processing. If the select-by-hash database already exists, any hash values already present in the database will also be processed. Alternately, this path can be specified using the <DBDirPath> child element.

Actual file hash values are specified via one or more <Hash> child elements, and/or a <HashLists> child element.

The <Hashes> element contains three attributes:

The HashType attribute must be set to MD5.

The UseToIncludeMatches attribute can be set to Yes or No:

Yes

Include the file(s) whose hash values match the hash values specified in the <Hash> child element(s).

No

Do not use the hash values in this element or child elements to include files for processing.

The Source attribute references the file or folder which the hash was generated from. This attribute is optional.

DBDirPath element

<SelectionSet>

<Hashes HashType="MD5" UseToIncludeMatches="Yes">

<DBDirPath>C:\Data\HashFolder\Include_Db{DateTime}</DBDirPath>

</Hashes>

</SelectionSet>

Specifies the directory which contains the select-by-hash database. File hash values specified via this JobFile will be added to this database. Hash values from the select-by-hash database will then be read and processed. If the select-by-hash database already exists, any hash values already present in the database will also be processed.

To ensure a unique, empty select-by-hash database is used for a given job, specify a {DateTime} variable as part of the folder name.

You can omit the <DBDirPath> element and instead specify the folder for the select-by-hash database by setting the value of the <Hashes> parent element.

Hash element

<SelectionSet>

<Hashes HashType="MD5" UseToIncludeMatches="Yes">

%temp%\Include_Db{DateTime}

<Hash

MicroHash="B6F0A3848CEEFC81455C015520CC60DE"

FileSize="34620124"

Source="C:\Docs\Letter.docx">

7D7D868CA5D5590E2FD52A5FD9E67493

</Hash>

</Hashes>

</SelectionSet>

The text value of each <Hash> child element holds the MD5 hash value of one file to be selected for processing. In the example above, the <Hash> element specifies a file hash value of 7D7D868CA5D5590E2FD52A5FD9E67493.

Each <Hash> element can contain the following optional attributes:

The MicroHash attribute contains the MD5 hash value of the first 64 KB of the file (or the MD5 hash value of the entire file, if the file size is 64 KB or less).

The FileSize attribute contains the size of the file in bytes.

The Source attribute is an optional attribute which references the original file or folder which the hash value was calculated from. This value may be displayed in the Nuix Collector Wizard, but has no impact on processing.

Note: If any of the <Hash> elements lack both a MicroHash attribute and a FileSize attribute, it will take the collection engine longer to identify files by hash value. The Nuix Collector Wizard includes these optional attribute values automatically whenever you select files (or folders of files) for inclusion on the Collect by Hash screen.

HashLists element

<SelectionSet>

<Hashes HashType="MD5" UseToIncludeMatches="Yes">

%temp%\Include_Db{DateTime}

<HashLists>

<HashList>C:\Data\MyHashList.txt</HashList>

</HashLists>

</Hashes>

</SelectionSet>

The <HashLists> element holds one or more <HashList> child elements. Each <HashList> child element specifies the full filename of a Hash List File containing MD5 hash values of files to process. It can be more convenient to specify MD5 file hash values via a Hash List File than by using a <Hash> element for each MD5 hash value.

HashList element

<SelectionSet>

<Hashes HashType="MD5" UseToIncludeMatches="Yes">

%temp%\Include_Db{DateTime}

<HashLists>

<HashList>C:\Data\MyHashList.txt</HashList>

</HashLists>

</Hashes>

</SelectionSet>

The text value of each <HashList> child element holds the full file specification of a Hash List File. A Hash List File is a simple text file containing one or more MD5 hash values: each line must contain exactly one MD5 hash value (32 hexadecimal characters), with no spaces or delimiters. The Hash List File should be an ordinary text file, encoded as ANSI, UTF-8 or UTF-16 LE. The latter two Unicode text file encodings require that the Hash List File be saved with a corresponding Byte Order Mark.
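A Hash List File would therefore look like the following (these values are reused from the <Hash> example above, purely for illustration):

```
7D7D868CA5D5590E2FD52A5FD9E67493
B6F0A3848CEEFC81455C015520CC60DE
```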

SaveFilesPath element – Saved Selection Sets

<SelectionSet>

<SaveFilesPath AllowSave="Yes">../Sets</SaveFilesPath>

</SelectionSet>

The <SaveFilesPath> element specifies the folder containing Saved Selection Sets. These Saved Selection Sets define the default list of selection sets available from within the Collector Wizard. This element is typically used within a JobFile Template, which the user will load into the Collector Wizard.

The <SaveFilesPath> element value may be a relative path, as shown in the example, above. The path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder.

The AllowSave attribute can be set to "Yes" or "No":

Yes

When running a Collector Wizard, the user can save changes or additions to the Saved Selection Sets.

No

When running a Collector Wizard, the user cannot save changes or additions to the Saved Selection Sets. Existing sets can still be selected.

Advanced element

<SelectionSet>

<Advanced

Active="Yes"

Timeout="14400"

ServerURL="https://eccserver.mydomain.com"

Formula="1 AND (2 OR 3)"

JavaHeapSpace="3072m"

ProcessContainerItems="Yes"

CaseSensitive="No"

RegularExpression="No">

<Text ExpressionName="Sample Search Term">Williams</Text>

</Advanced>

</SelectionSet>

The <Advanced> element specifies search criteria for a full-text search for a file collection or survey. The content of each file encountered during the collection or survey is searched for any of the specified terms. Files which contain any one of the specified terms will be processed (i.e. collected or included in the survey). If a file cannot be searched successfully by the full-text search feature, it will still be processed.

Numerous file types are supported for full-text search, including Microsoft Office documents, PDF files containing text, text files, binary application files and dozens of other common file types. Files stored within PST files and ZIP files can also be searched with the full-text search feature: if any file within the PST or ZIP contains a search term, the entire PST or ZIP file is collected or included in the survey.

For ECC, Network Collector and Portable Collector, full-text searches require more memory and processing time than other collection jobs. For ECC, such searches can run only on ECC Client computers with the full-text search feature installed. Full-text search is not available for SharePoint collections. See the corresponding User Guide for memory requirements and other details.

The Active attribute specifies whether to use full-text searching. It can be set to "Yes" or "No":

Yes

Employ full-text searching against various data file types.

No

Do not use full-text searching.

The Timeout attribute specifies the maximum amount of time, in seconds, to allow the full-text search engine to complete searching each specified directory. If not specified, the default value is 14400, i.e. 14400 seconds or 4 hours.

The ServerURL attribute specifies the ECC Server URL which each ECC Client will access to obtain text parsing utilities for certain kinds of data files. This attribute is not required for Nuix Collector.

The Formula attribute specifies an optional way to combine multiple full-text search expressions using the terms AND, OR and NOT. Parentheses can also be used for grouping. An expression is referenced in the formula by its number. For example: 1 AND (2 OR 3) AND (4 OR NOT (5)).

Before a formula is evaluated, each expression will be searched separately and resolve to TRUE or FALSE. The whitespace setting in each search expression will be honored. See the ECC User Guide or the Collector Suite User Guide for further examples.

Defaults and shortcuts:

Set the formula to just one word, OR, to join all expressions by OR. When the formula is blank, OR is implied.

Set the formula to just one word, AND, to join all expressions by AND.

You may also specify &amp; for AND, | for OR, ! for NOT. For example 1 &amp; (2|3) &amp; (4|!5) is equivalent to the earlier example 1 AND (2 OR 3) AND (4 OR NOT (5)).
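As a sketch (the search terms are hypothetical), the formula 1 AND (2 OR 3) selects files containing the first term together with either of the other two; each expression is referenced by the Number attribute of its <Text> element:

```xml
<Advanced
  Active="Yes"
  Formula="1 AND (2 OR 3)"
  CaseSensitive="No"
  RegularExpression="No">
  <Text Number="1">contract</Text>
  <Text Number="2">Williams</Text>
  <Text Number="3">Smith</Text>
</Advanced>
```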

The JavaHeapSpace attribute specifies the maximum amount of memory to reserve for executing the full-text search process. This value corresponds to the -Xmx command line parameter in Java 8. For example, to reserve 3 gigabytes of memory specify "3g" or "3072m" for the JavaHeapSpace attribute value. "3072m" is a suggested minimum when running on a 64-bit operating system having at least 4 GB of memory. For 32-bit operating systems, a value of "1400m" is suggested.

The ProcessContainerItems attribute specifies how a full-text search applies to a container file, such as a PST or ZIP file. Can be set to "Yes" or "No":

Yes

If an item found within a container file meets the full-text search criteria, then the individual item is processed or collected.

No

If an item found within a container file meets the full-text search criteria, then the entire container file is processed or collected.

The CaseSensitive attribute specifies whether the search terms for full-text search – the <Text> elements – are case-sensitive or not. This applies only to <Text> child elements which lack their own CaseSensitive attribute. Can be set to "Yes" or "No":

Yes

A case-sensitive search will be performed using the search term within each <Text> element, i.e. letter case is honored.

No

A case-insensitive search will be performed using the search term within each <Text> element, i.e. letter case is ignored.

The RegularExpression attribute specifies whether the search terms for full-text search – the <Text> elements – are Regular Expressions or literal search terms. This applies only to <Text> child elements which lack their own RegularExpression attribute. Can be set to "Yes" or "No":

Yes

Each <Text> element specifies a Regular Expression.

No

Each <Text> element specifies a literal search term.

Text element – child element of Advanced

<SelectionSet>

<Advanced

Active="Yes"

Timeout="14400"

ServerURL="https://eccserver.mydomain.com"

Formula="1 AND (2 OR 3)"

JavaHeapSpace="3072m"

CaseSensitive="No"

RegularExpression="No">

<Text

ExpressionName="Sample Search Term"

Number="1"

CaseSensitive="Yes"

RegularExpression="No"

WhitespaceBetween="None"

UserInput="Williams"

UserInputType="Plain">Williams</Text>

</Advanced>

</SelectionSet>

The <Text> element specifies an individual search expression for a full-text search. Multiple <Text> elements may be specified, i.e. one <Text> element per search term. Files containing ANY of the specified search expressions will be collected or processed (this behavior can be overridden; see the Formula attribute).

The value of this <Text> element is a regular expression generated by the Wizards in Nuix Collector and Nuix ECC. The regular expression is based on UserInput and other attributes.

The search processor ignores all attributes, except for:

RegularExpression: since version 8.2 this attribute is deprecated and is always set to 'Yes' in new JobFiles. For backward compatibility, when it is 'No' the search expression is treated as plain text.

CaseSensitive: the search processor sets the corresponding flag when compiling the regular expression.

Each <Text> element can include the following attributes:

The ExpressionName attribute specifies a human-readable name for the search term within the <Text> element. Used by the ECC and Collector Wizards for named Regular Expression search terms. This attribute is optional.

The Number attribute assigns a number to this search expression. An optional search formula can refer to the expression by this number (see the Formula attribute under the parent <Advanced> element).

The CaseSensitive attribute specifies whether this full-text search term is case-sensitive or not. Can be set to 'Yes' or 'No':

Yes

Case-sensitive full-text searching.

No

Case-insensitive full-text searching.

The RegularExpression attribute specifies whether this full-text search term is interpreted as a Regular Expression or as Plain Text. Can be set to 'Yes' or 'No':

Yes

This search term is interpreted as a Regular Expression.

No

This search term is interpreted literally.

Note: The RegularExpression attribute is deprecated, but maintained for compatibility with older JobFiles. The Wizards in Nuix Collector and Nuix ECC versions 8.2 and newer always set this attribute value to 'Yes'. These newer versions track the search text or pattern that the user originally specified in the UserInput and UserInputType attributes.

Note: If an individual <Text> element lacks a CaseSensitive attribute or RegularExpression attribute, then these missing attribute values will be read from the <Advanced> parent element and applied to the <Text> element.

The WhitespaceBetween attribute specifies whether additional whitespace and newlines are permitted within a literal search term. This attribute applies only when RegularExpression="No".

Allowing for whitespace and newlines between words or characters helps ensure search terms are found even within poorly formatted documents (including text files generated by OCR processing of image files, or text extracted from certain kinds of PDF documents). Can be set to "None", "Words" or "Characters":

None

No allowance is made for additional whitespace; the literal search term is used exactly as specified.

Words

The specified literal search term is converted into a Regular Expression. Each space is converted to \s+ in the generated regular expression. This allows any number of spaces, tabs and newlines between each word in the original search expression.

Characters

The specified literal search term is converted into a Regular Expression, with \s+ specified (1) in place of each space in the original search term, and (2) between each character in the original search term. This allows any number of spaces, tabs and newlines between the characters of the original literal search term.

Tip: Specifying WhitespaceBetween="Characters" provides the widest assurance of identifying and selecting files containing a literal search term. There is no performance penalty for using this option.
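The Words conversion can be sketched as follows (hypothetical Python; the actual patterns generated by the product are internal, and the function name words_mode_regex is invented for this illustration):

```python
import re

def words_mode_regex(term: str) -> str:
    # WhitespaceBetween="Words" sketch: each space in the literal term
    # becomes \s+, allowing any run of spaces, tabs and newlines between words.
    return r"\s+".join(re.escape(word) for word in term.split())

pattern = re.compile(words_mode_regex("quarterly sales report"))
# Matches OCR-style text where the words are split across lines and tabs
ocr_text = "quarterly\n    sales\treport"
print(bool(pattern.search(ocr_text)))  # True
```

Because \s+ requires at least one whitespace character between words, the Words mode still distinguishes separate words; it simply tolerates any amount of whitespace between them.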

The UserInput attribute holds the search text or pattern as entered by the user. This value is interpreted as either Plain Text or a Regular Expression, as specified in the UserInputType attribute.

The UserInputType attribute specifies whether the user intended this search term to be interpreted as a Regular Expression or as Plain Text. Can be set to 'RegEx' or 'Plain':

RegEx

This search expression is intended to be interpreted as a Regular Expression.

Plain

This search expression is intended to be interpreted literally.

Note: If the user chose an option to allow for extraneous whitespace, then a regular expression will be generated based on the UserInput, even if the user intended the search expression to be Plain Text.

End of SelectionSet element, closed by the </SelectionSet> tag.

Target element

Settings in the <Target> element identify the location where the collected or extracted files will be placed, and whether a Deletion will occur. For Collection jobs, the file type to save (Native or FileSafe) can also be specified. These settings reside within one set of <Target>..</Target> tags.

<Target

HashOnly="No"

CreateFileSafe="No"

DoExtract="Yes"

DoDelete="No"

DeleteOnly="No"

PostValidate="No"

HashBlocks="No"

SuppressDuplicates="No"

SeparateAltDataStreams="No">

<!-- various "child" elements go here:

<FileSafePath/> ...destination when collecting to a FileSafe

<ExtractPath/> ...destination when collecting to native files

<Trash/> ...when deleting files

<OriginalLocation/> -->

</Target>

The <Target> element is used to define the settings for the output (processing results).

The HashOnly attribute can be set to "Yes" or "No":

Yes

Each responsive file will be hashed, but no files will be collected, extracted or deleted. The attributes CreateFileSafe and DoExtract must be set to "No" when HashOnly is set to "Yes".

When the <DupDirectory> element's UseToEliminateMatches attribute is set to "Yes", these input file hash values will be added to the Duplicates Database.

No

This is the typical value. Each responsive file will be processed.

The CreateFileSafe attribute can be set to "Yes" or "No":

Yes

The collected or extracted files will be saved to a FileSafe file. The attributes HashOnly and DoExtract must be set to "No" when CreateFileSafe is set to "Yes".

No

The collected or extracted files will not be saved to a FileSafe.

The DoExtract attribute can be set to "Yes" or "No":

Yes

Specified files will be saved as native copies. The attributes HashOnly and CreateFileSafe must be set to "No" when DoExtract is set to "Yes".

No

Specified files will not be saved as native copies.

The DoDelete attribute can be set to "Yes" or "No":

Yes

Specified files will be deleted (requires a Deletion feature license for Nuix Collector Suite). The attribute HashOnly must also be set to "No" for Deletions to occur.

Warning: JobFiles and JobFile Templates configured with DoDelete="Yes" can irrevocably delete files without a confirmation prompt. Treat such files with care, especially if you are scripting Nuix Collector.

No

No deletions will take place.

The DeleteOnly attribute can be set to "Yes" or "No":

Yes

File deletion will occur without collecting any files.

No

File deletion will occur after each file has been successfully collected.

The PostValidate attribute can be set to "Yes" or "No":

Yes

The collected or extracted files will be re-hashed at the end of the collection, to ensure that all files on the destination media have the same hash values as their corresponding source files. Enabling this feature requires more processing time.

No

The collected or extracted files will not be re-hashed.

Note: Nuix Collector and ECC Client always use hashing to verify file integrity during a file save. Post-validation is a secondary check which re-reads and re-hashes the newly saved files to ensure no media errors have corrupted these files.

When saving to a FileSafe, you can set the PostValidate attribute to No, then later run the Verify FileSafe Utility to double-check the new FileSafe. Using the utility allows validation to occur on a different computer at a different time, and provides a detailed validation report.

The HashBlocks attribute specifies whether each chunk of data is hashed when being saved to a cloud destination, such as AmazonS3 or Azure Blob Storage. This attribute can be set to Yes or No:

Yes

As data is moved to the destination, each block of data is hashed before and after it is written. Enabling this feature requires more processing time.

No

The data is not hashed during the writing.

The SuppressDuplicates attribute can be set to "Yes" or "No":

Yes

The files will be de-duplicated and only unique instances of each file will be extracted/collected.

No

Files will be collected without regard to whether a file with the same hash value had previously been processed.

The SeparateAltDataStreams attribute can be set to "Yes" or "No":

Yes

Alternate data streams for any collected or extracted files will be saved as separate files.

No

Alternate data streams for any collected or extracted files will be saved as NTFS alternate data streams. The association between the alternate data stream and its parent file is maintained. This is the default value whenever the SeparateAltDataStreams attribute is not specified.

Note: Alternate Data Streams are collected from NTFS-formatted volumes only if the ExcludeADS attribute is set to "No" or omitted for any input sources.

When files collected from NTFS-formatted volumes are saved as native files to a FAT32 volume, any alternate data streams are always saved as separate files, regardless of the SeparateAltDataStreams attribute.

ExtractPath element

<Target

HashOnly="No"

CreateFileSafe="No"

DoExtract="Yes"

PostValidate="No"

SuppressDuplicates="No">

<ExtractPath

OverWrite="No"

CheckExistsOnOverWrite="No"

CopyMode="No"

NumConcurrent="5"

Suffix="MyCase\MyCollection\MyCustodian\MyTarget\Native Copies">

E:\Destination\MyCase\MyCollection\MyCustodian\MyTarget\Native Copies</ExtractPath>

</Target>

The <ExtractPath> element specifies the location for saving native copies of each file which will be collected or extracted. The specified path may include environment variables, e.g.:

<ExtractPath OverWrite="No" CopyMode="No">

%USERPROFILE%\Desktop\Extraction

</ExtractPath>

In the above example, the ExtractPath would be translated according to the currently logged in user, e.g.:

C:\Users\Linda\Desktop\Extraction

Note: The specified <ExtractPath> may also be a URL pointing to an S3 bucket on Amazon Web Services or a URL pointing to Azure Blob Storage. Saving to Azure Blob Storage is only available on PCs running Windows 7 / Windows Server 2008 R2 or newer, Linux 64-bit, or macOS.

Guidelines for Amazon S3 bucket URLs

The S3 bucket must already exist. If the bucket resides in the default us-east-1 region, then the <ExtractPath> value would look like:

https://s3.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}

For S3 buckets in other AWS regions, the region is part of the URL:

https://s3-myregion.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}

or with a dot after 's3':

https://s3.myregion.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}

or using 'virtual hosted' style URLs, e.g.:

https://mybucketname.s3.myregion.amazonaws.com/MyCase/MyCollection {DateTime}

where

'myregion' is the AWS region where the S3 bucket resides, e.g. 'us-west-1'.

'mybucketname' is the name of the existing S3 bucket.

'MyCase/' is an optional folder for organizing collections for a given case.

'MyCollection {DateTime}' is an optional folder for this collection's items. In this example, the {DateTime} variable will be converted to the date and time when the collection runs, to ensure that a secondary run of the collection will not overwrite files saved in a previous run.

Guidelines for Azure Blob Storage URLs

The Azure Storage Account must already exist, and must be of type BlobStorage, Storage (general purpose v1), or StorageV2 (general purpose v2). Files uploaded to Azure are always uploaded as Blob types. Given a Storage Account named myblobstorage, and a desired path of mycase/mycollection {DateTime} within the storage account, the URL would look like this:

https://myblobstorage.blob.core.windows.net/mycase/mycollection {DateTime}

The Azure Account (within which the Storage Account resides) is not part of the URL, nor is the Region where the account resides. Only the Storage Account is used in the URL.

The first folder in the URL after "windows.net/" is a special object in Azure, called a Container ("mycase" in the example, above). A Container must be part of the URL. Files cannot be placed in the root of the Storage Account. If the Container specified in the URL does not exist, it will be created for you. Consider the following valid URL for a native collection:

https://myblobstorage.blob.core.windows.net/mycase/mycollection {DateTime}

Storage Account: myblobstorage

Container Name: mycase

Additional subfolders: mycollection {DateTime}

Now consider the following invalid URL:

https://myblobstorage.blob.core.windows.net

Storage Account: myblobstorage

Container Name: no container - invalid URL

Additional subfolders: no additional subfolders - okay

Containers also have naming restrictions - they can only be named with lowercase letters, numbers, and dashes (but not consecutive dashes). If the Container name contains illegal characters, it is automatically modified to create a valid container name. The following rules are used to modify an illegal container name:

Upper case is changed to lower case

Consecutive dashes are reduced down to a single dash

Spaces are removed

All illegal characters (non-alphanumeric, excepting spaces) are replaced with a dash if the dash does not create consecutive dashes. If consecutive dashes would result, then the illegal character is removed with no replacement.

For example, the following container name: My New Case #4 -- (from Joe)

is modified to: mynewcase-4-fromjoe-
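The renaming rules above can be expressed as a short sketch (hypothetical Python reproducing the documented behavior, not the product's code; the function name sanitize_container_name is invented here):

```python
def sanitize_container_name(name: str) -> str:
    # Lower-case the name and remove spaces entirely (spaces are not dashed)
    name = name.lower().replace(" ", "")
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif not out or out[-1] != "-":
            # A dash or any other illegal character becomes a single dash,
            # but never two dashes in a row
            out.append("-")
    return "".join(out)

print(sanitize_container_name("My New Case #4 -- (from Joe)"))  # mynewcase-4-fromjoe-
```

Running this on the documented example reproduces the documented result, including the trailing dash left by the closing parenthesis.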

ExtractPath attributes

The OverWrite attribute can be set to "Yes" or "No":

Yes

Any files that exist at the destination that have the same name as the files being extracted or moved will be overwritten by the files being extracted.

No

Any files being extracted or moved that have the same name as any files already existing at the destination will be re-named with a sequential number appended to the end of the file name. This is the default value if this attribute is missing, empty or has an invalid value.

The CheckExistsOnOverWrite attribute is intended for Move jobs in Nuix ECC, but also can impact a collection (in ECC or Collector) which saves to native files. This attribute can be set to "Yes" or "No":

Yes

If a file exists at the destination that has the same name as the file being extracted or moved, this file will be overwritten by the file being extracted or moved. But if there is no corresponding file that already exists at the destination, then no file is saved to the destination and an error is logged. This setting is applicable only when the OverWrite attribute is also set to "Yes".

No

Files will be extracted or moved regardless whether a same-named file already exists at the destination. This setting is applicable only when the OverWrite attribute is also set to "Yes". This is the default value if this attribute is missing, empty, or has an invalid value.

The CopyMode attribute can be set to "Yes", "No" or "Single":

Yes

Only the portion of the path beneath the original crawl directory will be maintained at the destination. Corresponds to Wizard setting Maintain Full Path unchecked.

No

The entire original directory path will be maintained at the destination. Corresponds to Wizard setting Maintain Full Path checked.

For an example with CopyMode set to "No", see topic Network Collection Wizard Page "Collection Settings" in the Nuix Collector Suite User Guide.

Single

Files will be saved in the destination in a flat directory structure, within subfolders such as Files_1-20, Files_21-40, Files_41-60, etc. The actual ranges depend on the FilesPerFolder attribute, described below.

The NumConcurrent attribute specifies the maximum number of concurrent file upload sessions to allow. This attribute is optional, and applies only when the <ExtractPath> is a URL to an Amazon S3 bucket. A value of 10 is suggested. The minimum value is 1. NumConcurrent does not apply when saving to Azure Blob Storage.

The Suffix attribute specifies the portion of the <ExtractPath> value that represents the additional sub-folders to be created inside the destination folder specified by the user. When using ECC, a typical value for Suffix is "Case\Collection\Custodian\Target\Native Copies", where each segment is a placeholder for the specified case, collection, custodian and target. The New Collection Wizard in ECC Admin Console obtains the default Suffix value from the Job XML template.

The optional FilesPerFolder attribute will collect files into a flat directory structure (ignoring the original file path), provided the CopyMode attribute is set to "Single". The FilesPerFolder attribute value can be set to a positive whole number to determine the number of files that will be collected into a single folder. For example, if FilesPerFolder is set to 1000, then within the Destination directory there will be a separate sub-folder for every 1000 files.
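Assuming the Files_1-20-style ranges shown above, the destination sub-folder for a given file can be derived as follows (a hypothetical sketch based on the documented naming; the function name folder_for is invented here):

```python
def folder_for(file_index: int, files_per_folder: int) -> str:
    # file_index is the 1-based position of the file in collection order
    start = ((file_index - 1) // files_per_folder) * files_per_folder + 1
    end = start + files_per_folder - 1
    return f"Files_{start}-{end}"

print(folder_for(1, 20))       # Files_1-20
print(folder_for(21, 20))      # Files_21-40
print(folder_for(1500, 1000))  # Files_1001-2000
```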

Note: The original path for each file can be found in the Collection Log, along with the path that each file was collected to (the ItemID column).

<ExtractPath> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <ExtractPath> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the <ExtractPath> when it resides on a network path, an Amazon S3 bucket, or Azure Blob Storage. For an Amazon S3 bucket, this LoginUser attribute corresponds to the Access Key ID. For Azure, the LoginUser is the name of the Storage Account.

The Password attribute specifies the password for the above LoginUser. For an Amazon S3 bucket, this Password attribute corresponds to the Secret Access Key. For Azure, the Password attribute is the Access Key, which is generated at the time the Storage Account is created.

The optional LoginDomain attribute specifies the login domain for the above LoginUser. Omit this attribute when accessing workgroup shares via local users.

The following example saves a collection as native files on the \\MyServer\MyShare path, using login credentials for domain user Alice (on the MYCOMPANY domain):

<ExtractPath

OverWrite="No"

CopyMode="No"

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare

</ExtractPath>

Note: JobFiles containing user IDs and passwords should be encrypted and stored carefully to reduce their exposure to unauthorized users. For details see topic Customizing JobFiles.

FileSafePath element

<Target

HashOnly="No"

CreateFileSafe="Yes"

DoExtract="No"

PostValidate="No"

SuppressDuplicates="No">

<FileSafePath

FileSafeName="MyFileSafe"

FileSafePassword=""

SegmentSizeMagnitude="GB"

SegmentSize="2"

NumConcurrent="10"

Suffix="MyCase\MyCollection\MyCustodian\MyTarget\FileSafe"

Compression="None"

QuickValidate="No">

E:\Destination\MyCase\MyCollection\MyCustodian\MyTarget\FileSafe\MyFileSafe</FileSafePath>

</Target>

The <FileSafePath> element specifies the path and filename of the target FileSafe to be created, as well as various FileSafe options.

Note: The .mfs01 filename extension is omitted.

A FileSafePath of E:\FileSafes\MyFileSafe would create one or more target FileSafe files named as follows:

E:\FileSafes\MyFileSafe.mfs01

E:\FileSafes\MyFileSafe.mfs02

E:\FileSafes\MyFileSafe.mfs03

Note: To save a collection or extraction as a new FileSafe the CreateFileSafe attribute of the parent <Target> element must be set to "Yes".

The specified <FileSafePath> value may include environment variables, e.g.

<FileSafePath>

%USERPROFILE%\FileSafes

</FileSafePath>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\FileSafes.

Note: The specified <FileSafePath> may also be a URL pointing to an S3 bucket on Amazon Web Services or a URL pointing to Azure Blob Storage. Saving to Azure Blob Storage is only available on PCs running Windows 7 / Windows Server 2008 R2 or newer, Linux 64-bit, or macOS.

Guidelines for Amazon S3 bucket URLs

The S3 bucket must already exist. If the bucket resides in the default us-east-1 region, then the <FileSafePath> value would look like:

https://s3.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}/MyFileSafe

For S3 buckets in other AWS regions, the region is part of the URL:

https://s3-myregion.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}/MyFileSafe

or with a dot after 's3':

https://s3.myregion.amazonaws.com/mybucketname/MyCase/MyCollection {DateTime}/MyFileSafe

or using 'virtual hosted' style URLs, e.g.:

https://mybucketname.s3.myregion.amazonaws.com/MyCase/MyCollection {DateTime}/MyFileSafe

where

myregion is the AWS region where the S3 bucket resides, e.g. 'us-west-1'

mybucketname is the name of the existing S3 bucket

MyCase is an optional folder for organizing collections for a given case

MyCollection {DateTime} is an optional folder for this collection's items. In this example, the {DateTime} variable will be converted to the date and time when the collection runs, to ensure that a secondary run of the collection will not overwrite FileSafes saved in a previous run.

MyFileSafe is the base filename for the FileSafe, without the .mfs01 filename extension.

Guidelines for Azure Blob Storage URLs

The Azure Storage Account must already exist, and must be of type BlobStorage, Storage (general purpose v1), or StorageV2 (general purpose v2). Files uploaded to Azure are always uploaded as Blob types. Given a Storage Account named myblobstorage, a desired path of mycase/MyCollection {DateTime} within the storage account, and the first FileSafe file named MyFileSafe.mfs01, the URL would look like this:

https://myblobstorage.blob.core.windows.net/mycase/MyCollection {DateTime}/MyFileSafe

The Azure Account (within which the Storage Account resides) is not part of the URL, nor is the Region where the account resides. Only the Storage Account is used in the URL.

The first folder in the URL after "windows.net/" is a special object in Azure, called a Container ("mycase" in the example, above). A Container must be part of the URL. Files cannot be placed in the root of the Storage Account. If the Container specified in the URL does not exist, it will be created for you. Consider the following valid URL for a collection to a FileSafe:

https://myblobstorage.blob.core.windows.net/mycase/MyCollection {DateTime}/MyFileSafe

Storage Account: myblobstorage

Container Name: mycase

Additional subfolders: MyCollection {DateTime}

FileSafe File(s): MyFileSafe.mfs01; MyFileSafe.mfs02; MyFileSafe.mfs03; etc.

Note that the .mfs01 filename extension is omitted from the URL.

Now consider the following invalid URL:

https://myblobstorage.blob.core.windows.net

Storage Account: myblobstorage

Container Name: no container - invalid URL

Additional subfolders: no additional subfolders - okay

FileSafe File(s): not specified - invalid URL

Containers also have naming restrictions - they can only be named with lowercase letters, numbers, and dashes (but not consecutive dashes). If the Container name contains illegal characters, it is automatically modified to create a valid container name. The following rules are used to modify an illegal container name:

Upper case is changed to lower case

Consecutive dashes are reduced down to a single dash

Spaces are removed

All illegal characters (non-alphanumeric, excepting spaces) are replaced with a dash if the dash does not create consecutive dashes. If consecutive dashes would result, then the illegal character is removed with no replacement.

For example, the following container name: My New Case #4 -- (from Joe)

is modified to: mynewcase-4-fromjoe-

FileSafePath attributes

The FileSafeName attribute specifies the portion of the <FileSafePath> value that represents the "name" of the FileSafe. If the AlternateDestination path is used for the collection destination, a FileSafe named according to this attribute value will be created in the AlternateDestination path.

The FileSafePassword attribute specifies the password for accessing the FileSafe itself. This attribute is optional; if present with a non-empty value, the FileSafe to be created will be password protected.

Tip: Use the Collector Wizard (in Nuix Collector Suite) or the New Collection Wizard (in ECC Admin Console) to specify the FileSafe password. These wizards save the password in the JobFile as a salted hash, which is more secure than editing JobFiles manually and typing plain-text password values.

FileSafes can become huge, so several attributes are available allowing you to create a set of smaller FileSafe files which, taken together, form a single FileSafe repository. Each file in the set is considered a "segment".

The SegmentSizeMagnitude attribute specifies the unit of measurement for calculating the maximum size of each file in a multi-segment FileSafe. This attribute can be set to one of the following:

KB

(Kilobyte)

MB

(Megabyte)

GB

(Gigabyte)

The SegmentSize attribute specifies a positive whole number to be applied to the SegmentSizeMagnitude attribute to specify the size of each segment in a multi-segment FileSafe.

For example, SegmentSizeMagnitude="GB" SegmentSize="2" will create one or more files – each no greater than 2 GB. The FileSafe repository is "spanned" across the set of files.
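For planning purposes, the number of segment files a collection will produce can be estimated as follows (a sketch; whether the product uses binary or decimal units is an assumption here, and compression and container overhead are ignored):

```python
import math

UNIT_BYTES = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units

def estimated_segments(total_bytes: int, segment_size: int, magnitude: str) -> int:
    # e.g. SegmentSizeMagnitude="GB" SegmentSize="2" -> 2 GiB per .mfsNN file
    segment_bytes = segment_size * UNIT_BYTES[magnitude]
    return math.ceil(total_bytes / segment_bytes)

# A 5 GiB collection with 2 GB segments spans three files:
# MyFileSafe.mfs01, MyFileSafe.mfs02, MyFileSafe.mfs03
print(estimated_segments(5 * 1024**3, 2, "GB"))  # 3
```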

The Compression attribute specifies the compression level for a FileSafe. This setting can be set to one of the following:

None

No compression — largest FileSafe, with fewest CPU cycles

Medium

Medium compression — smaller FileSafe

High

High compression — smallest FileSafe, but highest CPU cycles (possibly slower, depending on CPU speed)


Note: High compression collections may run slower or faster than with medium or no compression, depending on the relative speeds of your network, hard disk(s) and CPU.

The NumConcurrent attribute specifies the maximum number of concurrent file upload sessions to allow. This attribute is optional, and applies only when the <FileSafePath> is a URL to an Amazon S3 bucket. A value of 10 is suggested. The minimum value is 1. NumConcurrent does not apply when saving to Azure Blob Storage.

The Suffix attribute specifies the portion of the <FileSafePath> value that represents the additional sub-folders to be created inside the destination folder specified by the user. When using ECC, a typical value for Suffix is "Case\Collection\Custodian\Target\FileSafe", where each segment is a placeholder for the specified case, collection, custodian and target. The New Collection Wizard in ECC Admin Console obtains the default Suffix value from the Job XML template.

The <FileSafePath> element accepts an optional DestDrive attribute for use with Portable Collector, which can be set as follows:

RelativeDrive

The specified <FileSafePath> value will be adjusted so the FileSafe will be saved to the same volume which Portable Collector is running from. This ensures the FileSafe will be saved onto the same external hard disk where Portable Collector resides, without having to know in advance what drive letter will be assigned to the external hard disk.

?

The end-user or custodian is prompted for a location to save the resulting FileSafe. This prompt will occur when the collection is run.

The following example uses DestDrive="RelativeDrive" to save one or more FileSafe files to the Collection folder on the volume where Portable Collector was launched from.

<FileSafePath

SegmentSizeMagnitude="GB"

SegmentSize="2"

Compression="None"

DestDrive="RelativeDrive">

Collection\MyFileSafe</FileSafePath>

The above example would create one or more target FileSafe files named:

X:\Collection\MyFileSafe.mfs01

X:\Collection\MyFileSafe.mfs02

X:\Collection\MyFileSafe.mfs03

...etc., where X: is the drive letter from which Portable Collector was launched.

The QuickValidate attribute specifies whether to do a "Quick Validate" after the FileSafe is closed. A Quick Validate verifies that the FileSafe can be opened, and checks the number of files contained in it, but does not hash each file by reading the FileSafe contents. Quick Validate is a good option when the destination is a remote file share or cloud destination, where Post Validate would incur a significant performance penalty.

Yes

Perform a Quick Validate.

No

Perform a standard Post Validate, which re-reads and re-hashes the saved data and therefore requires more processing time.

<FileSafePath> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <FileSafePath> element to enable access to network shares accessed via UNC paths:

The LoginUser attribute specifies the user ID for accessing the <FileSafePath> when it resides on a network path, an Amazon S3 bucket, or Azure Blob Storage. For an Amazon S3 bucket, this LoginUser attribute corresponds to the Access Key ID. For Azure, the LoginUser is the name of the Storage Account.

The Password attribute specifies the password for the above LoginUser. For an Amazon S3 bucket, this Password attribute corresponds to the Secret Access Key. For Azure, the Password attribute is the Access Key, which is generated at the time the Storage Account is created.

The optional LoginDomain attribute specifies the login domain for the above LoginUser. Omit this attribute when accessing workgroup shares via local users.

The following example saves a collection as a FileSafe on the \\MyServer\MyShare path, using login credentials for domain user Alice (on the MYCOMPANY domain):

<FileSafePath

SegmentSizeMagnitude="GB"

SegmentSize="2"

Compression="None"

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare

</FileSafePath>

Note: JobFiles containing user IDs and passwords should be encrypted and stored carefully to reduce their exposure to unauthorized users. For details see topic Customizing JobFiles.

AlternateDestination element

<Target HashOnly="No" . . . >

<FileSafePath . . . >

<AlternateDestination

Force="No"

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\OtherServer\AlternateShare

</AlternateDestination>

</Target>

Optional. Specifies an alternate target folder path for the collection. This path does not specify any filename. It is used only if the primary destination specified in the <ExtractPath> or <FileSafePath> is not reachable.

Note: AlternateDestination, if present, can specify a local path, a network path, a URL for an Amazon S3 bucket, or a URL for a storage container within Azure Blob Storage. Saving to Azure Blob Storage is only available on PCs running Windows 7 / Windows Server 2008 R2 or newer, Linux 64-bit, or macOS. For details see the guidelines for Amazon S3 and Azure Blob Storage URLs under the topics ExtractPath element and FileSafePath element.

AlternateDestination attributes

The Force attribute, if set to "Yes", ensures the AlternateDestination is used regardless of the accessibility of the primary destination. Otherwise, the AlternateDestination is used only if the primary destination is not accessible or not writable.

The LoginUser attribute specifies the user ID for accessing the <AlternateDestination> folder when it resides on a network path, an Amazon S3 bucket, or Azure Blob Storage. For an Amazon S3 bucket, this LoginUser attribute corresponds to the Access Key ID. For Azure, the LoginUser is the name of the Storage Account.

The Password attribute specifies the password for the above LoginUser. For an Amazon S3 bucket, this Password attribute corresponds to the Secret Access Key. For Azure, the Password attribute is the Access Key, which is generated at the time the Storage Account is created.

The optional LoginDomain attribute specifies the login domain for the above LoginUser. Omit this attribute when accessing workgroup shares via local users.

MaxBandWidthMbps element

<Target HashOnly="No" . . . >

<FileSafePath . . . >

<AlternateDestination . . .>

<MaxBandWidthMbps ThrottleLocal="No">500</MaxBandWidthMbps>

</Target>

Optional. Specifies a limit on the rate of saving collected data for file or disk image collections. This allows collection jobs to be "throttled" to reduce their performance impact on the network or the destination drive. The value of <MaxBandWidthMbps> must be a positive integer specifying the maximum rate in megabits per second. If the value is 0 then no limit is set (i.e. saving of collected data is limited only by hardware constraints and contention from concurrent activity).

The ThrottleLocal attribute can be set to Yes or No:

Yes

Throttling applies to data saved to a network destination or local volume.

No

Throttling applies only to data saved to a network destination. This is the default value if this attribute is missing, empty or has an invalid value.

Note: Throttling is ineffective on network paths which use mapped drive letters.
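The throttle amounts to pacing writes against a byte-rate budget derived from the Mbps value (a 500 Mbps limit works out to 62,500,000 bytes per second). The following Python sketch illustrates the idea only; the function names are ours and not part of Nuix Collector:

```python
import time

def bytes_per_second(max_mbps: int) -> float:
    """Convert the MaxBandWidthMbps value (megabits per second) to a
    byte rate; a value of 0 means unthrottled."""
    return max_mbps * 1_000_000 / 8 if max_mbps > 0 else float("inf")

def write_throttled(chunks, limit_bps, write,
                    sleep=time.sleep, now=time.monotonic):
    """Pace writes so the average rate stays at or below limit_bps."""
    start = now()
    written = 0
    for chunk in chunks:
        write(chunk)
        written += len(chunk)
        # Earliest moment by which this many bytes are permitted.
        allowed_at = start + written / limit_bps
        delay = allowed_at - now()
        if delay > 0:
            sleep(delay)
```

After each chunk, the writer sleeps just long enough that the cumulative average never exceeds the limit.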

Trash element

<Target HashOnly="No" . . . >

<FileSafePath . . . >

<AlternateDestination . . .>

<MaxBandWidthMbps . . .>

<Trash

OverwriteCycles="1"

ScrambleCreationDates="Yes"

ScrambleModificationDates="Yes"

ScrambleLastAccessDates="Yes"

ScrambleName="Yes"

DeleteFolders="Yes">

</Trash>

</Target>

The <Trash> element specifies options for scrambling and deleting files and folders. This element is ignored unless two conditions are met:

The <Target> element's DoDelete attribute is set to "Yes".

A separate Deletion License is activated (required for either Nuix Collector Suite or Nuix Enterprise Collection Center).

After a file is deleted, the file can often be undeleted using various undelete utilities. These utilities must read residual data within the file system to recover the deleted file. The various attributes in the <Trash> element allow you to randomize this data, to prevent successful undeletion.

The OverwriteCycles attribute specifies the number of times to overwrite the previously allocated sectors (the content) of each deleted file with random data. Valid values are 0 to 7.

Each overwrite cycle takes time. A value of 0 (no overwrite) runs fastest but leaves open the possibility of undeleting the file's content. A value of 1 makes the file content unrecoverable by ordinary undelete utilities; however, extremely sophisticated recovery equipment may still detect traces of the old data patterns. Various military standards require from 2 to 7 overwrite cycles to securely wipe data.
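The effect of OverwriteCycles can be pictured with a short Python sketch. This is an illustration of the concept only, not the product's implementation, and it ignores file-system details (journaling, compression, SSD wear-levelling) that real wiping tools must handle:

```python
import os

def overwrite_and_delete(path: str, cycles: int) -> None:
    """Overwrite a file's allocated content with random data the
    given number of times (0 = delete without overwriting), then
    remove the file."""
    size = os.path.getsize(path)
    for _ in range(cycles):
        with open(path, "r+b") as f:
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # force each pass onto the device
    os.remove(path)
```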

The ScrambleCreationDates, ScrambleModificationDates and ScrambleLastAccessDates attributes specify whether to scramble the Created / Modified / Last Access dates, respectively, of each file being deleted. This affects the file allocation table.

The ScrambleName attribute specifies whether to scramble the name of each deleted file with random data. This affects the file allocation table.

The DeleteFolders attribute specifies whether to delete any folders which were emptied by the current deletion job. Can be Yes or No:

Yes

Deletes any folders which were emptied by the current deletion job.

No

Any folders emptied by the program will remain.

OriginalLocation element

<Target

HashOnly="No"

CreateFileSafe="No"

DoExtract="Yes"

PostValidate="No"

SuppressDuplicates="No">

<OriginalLocation>George's Laptop</OriginalLocation>

</Target>

The <OriginalLocation> element holds a description for the original location of the files collected. The contents of this element will be used to create a folder (directory) with this name at the root level of the destination location.

In the example above, the root folder within the resulting FileSafe or native extraction would be \Georges Laptop (note that the apostrophe does not appear in the folder name). If <OriginalLocation> is left blank, then no additional directory will be created at the root of the destination.

Note: The <OriginalLocation> element corresponds to the "Description" field on the first page of the Collector Wizard.

End of Target element, closed by the </Target> tag.

Logs element

Settings in the <Logs> element specify the output folders where logs and reports are saved, and which specific logs and reports are generated. These settings reside within one set of <Logs>..</Logs> tags.

<Logs UTC-Mode="No"

GenerateXML="No"

IncludeJobInfo="Yes"

MaxRowsAllowedInReport="5000"

MaxRowsAllowedInLog="500000"

WriteBOM="Yes"

EncodingType="UTF-16"

GenerateLocalLogs="Yes" >

<!-- various "child" elements go here:

<Location/>

<BaseName/>

<CollectionReport/>

<CrawlReport/>

...etc -->

</Logs>

The UTC-Mode attribute can be set to "Yes" or "No":

Yes

File dates and times within logs and reports are interpreted in Coordinated Universal Time (UTC/GMT).

If you are processing data collected from multiple time zones, it is recommended that you set the UTC-Mode attribute to Yes.

No

File dates and times within logs and reports are interpreted according to the configured time zone of the computer running Nuix Collector or ECC Client.

The GenerateXML attribute allows XML log files to be produced, in addition to any CSV log files. Once generated, these XML files can be manually edited and converted into XML FileLists, or transformed via XSL templates into custom HTML reports. The GenerateXML attribute can be set to "Yes" or "No":

Yes

An XML log file will be generated for each Log file specified in the JobFile.

Note: XML log file generation is available only on Windows. There is a performance cost to generating XML log files. XML log files feature only a subset of the fields available on the standard CSV log files.

No

No XML files will be generated. This is more efficient and will speed up the generation of logs.

The IncludeJobInfo attribute specifies whether job details are added to the Summary Report HTML file. The IncludeJobInfo attribute can be set to "Yes" or "No":

Yes

A Job Info element containing job details will be added to the start of the Summary Report. This is the default setting if the IncludeJobInfo attribute is not present.

No

The Job Info element will be omitted from the Summary Report.

The MaxRowsAllowedInReport attribute specifies the maximum number of rows that can be included in an HTML format report. If the report to be generated will exceed this number of rows, then (1) the report will contain only a message indicating the row limit has been reached, and (2) the corresponding CSV log file will be generated in full. The MaxRowsAllowedInReport default value is 5000 if this attribute is omitted. Setting this value to 0 will generate HTML reports regardless of size.

The MaxRowsAllowedInLog attribute specifies the maximum number of rows that can be included in a .csv log file (examples: a crawl or collection log). If this attribute is not present the default maximum row limit is 500000. Setting this value to 0 will generate .csv log files with an unlimited number of rows.
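The interaction of the report row limit with the 0-means-unlimited rule can be summarized in a small sketch (illustrative only; the function name is ours, not a product API):

```python
def html_report_rows_allowed(row_count: int, max_rows: int = 5000) -> bool:
    """True if the HTML report may contain all of its rows; a limit
    of 0 means unlimited. When False, the HTML report holds only a
    row-limit message, while the CSV log is still written in full
    (subject to its own MaxRowsAllowedInLog limit)."""
    return max_rows == 0 or row_count <= max_rows
```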

The WriteBOM attribute specifies whether to include a byte order mark at the beginning of each report and log file. The WriteBOM attribute can be set to "Yes" or "No":

Yes

A byte order mark will be written. This is the default setting if the WriteBOM attribute is not present or empty.

No

No byte order mark will be written.

The EncodingType attribute specifies the text encoding to use for each report and log file. The EncodingType attribute can be set to "UTF-8" or "UTF-16":

UTF-8

Writes a UTF-8 encoded file.

UTF-16

Writes a UTF-16LE encoded file. This is the default setting if the EncodingType attribute is not present or empty.

The GenerateLocalLogs attribute specifies whether logs, reports and crawl database files (.mcd01 and .cdb files) that are to be saved to a remote destination (e.g. a network share or cloud destination) are first written locally, and then copied to the remote destination when the job is finished. Can be Yes or No:

Yes

Logs, reports and crawl database files are written locally first, before being copied to the remote destination. Writing log, report and crawl database files across a network is slow and susceptible to network outages, so writing these files locally first improves both performance and reliability. This is the default behavior if the GenerateLocalLogs attribute is missing or empty and the logs destination is a remote destination.

No

Logs, reports and crawl database files are written directly to the remote destination. Specify No if logs should not be written locally first. Has no effect if the logs destination is a local path.

Location element

<Logs UTC-Mode="No">

<Location>C:\Test\Output\Logs</Location>

</Logs>

The <Location> element specifies the location for databases, reports and log files – provided the <OutputDirectory> element value begins with the special sequence $\.

Note: The <Location> element can specify a local path, a network path, a URL for an Amazon S3 bucket, or a URL for a storage container within Azure Blob Storage. Saving to Azure Blob Storage is only available on PCs running Windows 7 / Windows Server 2008 R2 or newer, Linux 64-bit, or macOS. For details see the guidelines for Amazon S3 and Azure Blob Storage URLs under the topic ExtractPath element.

The specified <Location> value may include environment variables, e.g.:

<Location>

%USERPROFILE%\Desktop\CollectionLogs

</Location>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Desktop\CollectionLogs. For ECC, this requires configuring the job to run with impersonation.
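The expansion behaves like ordinary Windows environment-variable substitution. In Python terms (the value assigned to USERPROFILE below is illustrative):

```python
import ntpath
import os

# ntpath.expandvars understands the Windows %VAR% syntax on any platform.
os.environ["USERPROFILE"] = r"C:\Users\Linda"  # illustrative value
location = ntpath.expandvars(r"%USERPROFILE%\Desktop\CollectionLogs")
# location is now C:\Users\Linda\Desktop\CollectionLogs
```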

The <BaseName> element, described below, defines a further sub-folder within which logs and reports are placed.

Location attributes

The <Location> element accepts an optional UseTemp attribute:

Yes

Use the folder specified by the <TempFolder> element (a child element of <Titan>) for saving reports and log files.

No

Use the <Location> and <BaseName> elements to determine the output folder for reports and log files. This is the default setting if the UseTemp attribute is not specified.

The <Location> element also accepts an optional DestDrive attribute for use with Portable Collector, which can be set as follows:

RelativeDrive

The specified <Location> value will be adjusted so it resides on the same volume which Portable Collector is running from. This ensures the resulting logs and reports will be saved onto the same external hard disk where Portable Collector resides, without having to know in advance what drive letter will be assigned to the external hard disk.

?

When the DestDrive attribute value is a question mark, the end-user or custodian is prompted for a location to save log files to. This prompt will occur when the collection is run.

For example, the following would save logs to the \Output\Logs folder on the volume where Portable Collector was launched from.

<Logs UTC-Mode="No">

<Location DestDrive="RelativeDrive">

Output\Logs

</Location>

..

</Logs>

Other elements also impact the location where logs and reports are saved; see the <BaseName> element, immediately below, as well as the <OutputDirectory> element, within the Reports and Logs Subsection which follows.

<Location> elements which specify a UNC path generally require login credentials to access the path. The following attributes are available for the <Location> element to enable access to network shares and cloud destinations:

The LoginUser attribute specifies the user ID for accessing the <Logs><Location> folder when it resides on a network path, an Amazon S3 bucket, or Azure Blob Storage. For an Amazon S3 bucket, this LoginUser attribute corresponds to the Access Key ID. For Azure, the LoginUser is the name of the Storage Account.

The Password attribute specifies the password for the above LoginUser. For an Amazon S3 bucket, this Password attribute corresponds to the Secret Access Key. For Azure, the Password attribute is the Access Key, which is generated at the time the Storage Account is created.

The optional LoginDomain attribute specifies the login domain for the above LoginUser. Omit this attribute when accessing workgroup shares via local users.

The following example saves collection logs and reports on the \\MyServer\MyShare\Logs\MyCollection path, using login credentials for domain user Alice (on the MYCOMPANY domain):

<Logs UTC-Mode="No">

<Location

LoginUser="Alice"

Password="G1sv4zz2#"

LoginDomain="MYCOMPANY">

\\MyServer\MyShare\Logs

</Location>

<BaseName>MyCollection</BaseName>

..

</Logs>

Note: JobFiles containing user IDs and passwords should be encrypted and stored carefully to reduce their exposure to unauthorized users. For details see topic Customizing JobFiles.

The Suffix attribute specifies the portion of the <Location> value that represents the additional sub-folders to be created inside the logs destination folder specified by the user.

<Logs UTC-Mode="No">

<Location

Suffix="MyCase\MyCollection\MyCustodian\MyTarget\Logs">

D:\MyDestination\MyCase\MyCollection\MyCustodian\MyTarget\Logs

</Location>

<BaseName>MyCollection</BaseName>

..

</Logs>

When using ECC a typical value for Suffix is "Case\Collection\Custodian\Target\Logs", where each segment is a placeholder for the specified case, collection, custodian and target. The New Collection Wizard in ECC Admin Console obtains the default Suffix value from the Job XML template.

BaseName element

<Logs UTC-Mode="No">

<Location>C:\CollectionResults\Logs</Location>

<BaseName>MyCollection</BaseName>

</Logs>

The <BaseName> element is used to define a sub-folder containing the crawl database for a collection or extraction. The value of the <BaseName> element is appended to the value of the <Location> element.

In the above example, the crawl database folder would be C:\CollectionResults\Logs\MyCollection. Generally, the reports and logs folder is a subfolder of this folder; for details see the <OutputDirectory> element, below.

Available logs and reports

Nuix Collector and ECC Client can generate numerous logs and reports after each collection, extraction or deletion. To configure which reports and logs to generate, include the following child elements within the <Logs> element of the JobFile:

Log/Report Tag

Description

<SummaryReport>

A summary of all job processing activity.

<CrawlReport>

Lists the files accessed during the crawl phase.

<FolderReport>

Lists the folders accessed during the crawl phase.

<ResponsiveReport>

Lists files which met all the applicable selection criteria. This is the primary report for a Survey.

<UnresponsiveReport>

Lists files that did not meet all the collection criteria and were excluded.

<UndeterminedReport>

Lists files for which responsiveness could not be determined (in some cases the Advanced Search feature cannot search the text within a given file). These files are collected in case they are responsive.

<CollectionReport>

Lists files collected.

Also produces a CollectedWithErrors report or log, listing files that were collected but experienced an error during collection.

<UncollectedReport>

Lists files which should have been collected based on the selection criteria, but could not be collected due to one or more errors.

<DuplicateReport>

Lists files which were excluded from processing due to being internal or external duplicates.

<NistReport>

Lists files that were excluded from processing due to being recognized as belonging to the set of files in the NIST hash database.

<StdReport>

Lists files excluded from processing due to being part of the "Standard" hash database.

<DeletionReport>

Lists files which were deleted.

<NotDeletedReport>

Lists files which should have been deleted based on the selection criteria, but could not be deleted due to one or more errors.

<VolatileInfoReport>

A summary of the volatile information collected from the target computer.

<VolatileInfoWarningReport>

Lists warnings detected during volatile information collection. Numerous warnings are typical due to access restrictions and other limitations imposed by the operating system.

<WarningReport>

Lists any warnings which occurred during file processing.

<ErrorReport>

Lists any errors which occurred during file processing, including errors during re-tries.

Log/Report child elements

Each of the above Log/Report elements requires a common set of "child" elements for configuring the log or report file. For example, here are two <CollectionReport> entries — one for the Collection Log (CSV) and one for the Collection Report (HTML):

<Logs UTC-Mode="No">

<Location>C:\CollectionResults\Logs</Location>

<BaseName>MyCollection</BaseName>

<CollectionReport name="Collection Log">

<OutputDirectory>$\Collection</OutputDirectory>

<ReportFileExtension>csv</ReportFileExtension>

</CollectionReport>

<CollectionReport name="Collection Report">

<OutputDirectory>$\Collection</OutputDirectory>

<ReportFileExtension>htm</ReportFileExtension>

<CssTemplate>..\Templates\CollectionReport.css</CssTemplate>

</CollectionReport>

</Logs>

OutputDirectory element – a child element for each report or log

<OutputDirectory>$\Responsive</OutputDirectory>

The <OutputDirectory> element specifies the destination directory for each report or log file.

If the first two characters of the <OutputDirectory> element value are the special sequence $\ (as in the example above), then these two characters will be replaced with the value of the <Location> element, followed by the value of <BaseName> (if specified). Using the $\ sequence is optional. For example:

<Logs UTC-Mode="No">

<Location>C:\Test\Output\Logs</Location>

<BaseName>MyCollection</BaseName>

<ResponsiveReport name="Responsive Log">

<OutputDirectory>$\Responsive</OutputDirectory>

<ReportFileExtension>csv</ReportFileExtension>

</ResponsiveReport>

</Logs>

The above JobFile settings would generate a responsive log named:

C:\Test\Output\Logs\MyCollection\Responsive\Responsive Log.csv

The log filename and path are built from the following elements:

Path Element

Derives From

C:\Test\Output\Logs

The first two characters of the <OutputDirectory>, $\, translate to the <Location> value.

\

Added by the program

MyCollection

The $\ sequence in the <OutputDirectory> also appends any <BaseName> value.

\

Added by the program

Responsive

Any remaining characters in the <OutputDirectory> are appended.

\

Added by the program

Responsive Log

Filename taken from <ResponsiveReport> element's "name" attribute value.

(period)

Added by the program

csv

Filename extension taken from <ReportFileExtension> value.
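The assembly described in the table above can be sketched as a small Python function (illustrative only; build_log_path is our name, not a product API):

```python
import ntpath

def build_log_path(location, base_name, output_directory, name, extension):
    r"""Build a report/log path: a leading $\ in OutputDirectory
    expands to the <Location> value plus any <BaseName> value."""
    if output_directory.startswith("$\\"):
        prefix = ntpath.join(location, base_name) if base_name else location
        output_directory = ntpath.join(prefix, output_directory[2:])
    return ntpath.join(output_directory, name + "." + extension)
```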

The specified <OutputDirectory> value may include environment variables, e.g.

<OutputDirectory>

%USERPROFILE%\Desktop\CollectionReports

</OutputDirectory>

In the above example, the %USERPROFILE% environment variable would be translated according to the user who is currently logged in and running Nuix Collector or ECC Client, e.g.: C:\Users\Linda\Desktop\CollectionReports.

ReportFileExtension element – a child element for each report or log

<ReportFileExtension>htm</ReportFileExtension>

The <ReportFileExtension> element specifies the filename extension to append to the report or log file. This extension can be either csv or htm:

csv

Generate a log file as a tab-separated text file. This file can be opened in Microsoft Excel.

htm

Generate a report file as an HTML text file. This file can be opened in a web browser. Report formatting can be altered using a corresponding CSS template (see below).

CssTemplate element – a child element for each report

<CssTemplate>..\Templates\ResponsiveReport.css</CssTemplate>

The <CssTemplate> element specifies the cascading style sheet file used to format the content of the resulting HTML report. This element is optional. The value of <CssTemplate> may be an absolute path, or a relative path as shown in the example, above. A relative path is relative to the Modules folder within the Nuix Collector or ECC Client installation folder.

Note: When generating log files, no CSS is utilized (omit the <CssTemplate> element).

Fields element – an optional child element for each report or log

<Fields>

</Fields>

The <Fields> element contains one or more <FieldName> child elements which specify customizations to the report or log. See the <FieldName> element, below, for details.

FieldName element

<CollectionReport name="Collection Report">

<OutputDirectory>$\Collection</OutputDirectory>

<ReportFileExtension>htm</ReportFileExtension>

<CssTemplate>..\Templates\</CssTemplate>

<Fields>

<FieldName Include="Yes"

ColumnHeading="Status"

InsertBefore="ReportCreationDate">ReportStatus

</FieldName>

</Fields>

Each <FieldName> element specifies a column (a field) in the report or log file to customize. The text value of this element specifies the field to be customized. In the example above, the ReportStatus field within the Collection Report is being customized. The customizations are specified in the following attributes:

The Include attribute specifies whether the field should be included or excluded from the report. The value of this attribute can be "Yes" or "No":

Yes

Include this field in the report.

No

Exclude this field from the report.

The ColumnHeading attribute specifies the field's column heading within the report or log file. If omitted, the default column heading for this field will appear in the report or log.

The InsertBefore attribute specifies the order in which this field appears in the report or log file. The value specifies the name of the existing report field which this field will be inserted before. If omitted, and if the Include attribute is set to Yes, then the field will be appended to the end of the report.

A subsequent <FieldName> element without an InsertBefore attribute can append another field to the end of the report.

Each report will include only the default fields for that particular report type, unless one or more <FieldName> elements customize this behavior. It is not necessary to specify a <FieldName> element to include a field that is already included by default. Such fields would only require a <FieldName> element if you wished to exclude the field, alter its column heading, or reorder the field.

It is not necessary to specify a <FieldName> element to exclude a field that is not already included by default.
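The Include/InsertBefore semantics can be modeled with a short sketch (illustrative only; the field names below are examples, not an exhaustive list):

```python
def apply_field_customizations(default_fields, customizations):
    """Apply <FieldName> customizations to a report's default field
    list. Each customization is (field, include, insert_before);
    insert_before=None appends the field to the end."""
    fields = list(default_fields)
    for field, include, insert_before in customizations:
        if field in fields:
            fields.remove(field)          # Include="No" or a re-order
        if not include:
            continue
        if insert_before in fields:
            fields.insert(fields.index(insert_before), field)
        else:
            fields.append(field)
    return fields
```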

Refer to Appendix C: Report & Log Customization for a list of available reports, logs, field names, default fields and default column headings.

End of <Logs> element, closed by the </Logs> tag. This is the final element within a pair of <CliParameters></CliParameters> tags.