Appendix A: Glossary
Alternate Data Stream
On a volume formatted for the NTFS file system, an alternate data stream is a secondary set of data associated with a file.
On Windows, alternate data streams can store extended file properties, such as Author and Title. Downloaded files may contain an alternate data stream containing details regarding the security zone from which the file was downloaded.
Nuix Collector Suite preserves any alternate data streams when collecting or extracting files; however, it is possible to configure a JobFile to ignore alternate data streams.
Case
A name given to a file collection project. A Case may represent a legal matter, a phase of litigation, or other file collection project.
An organization may have only a single Case, or numerous Cases.
Collection
A set of files copied from one or more input sources. Input sources can include local volumes, local folders, mapped network drives, network shares, SharePoint sites and scopes, and evidence files.
Original files in each input source location are left unaltered by collection activity.
See also: Extraction, Survey, JobFile, FileList
Crawl
A stage during the collection process which examines the input sources searching for files which meet the selection criteria. The crawl process notes each file and sub-folder to be collected, along with statistics such as the total number of files and total bytes to be collected.
Crawls of drives, folders and network shares can also be used to generate logs of what would be collected – without actually gathering any files. This is referred to as a "survey".
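The crawl/survey idea can be illustrated with a minimal Python sketch. It is not Nuix Collector's implementation: it walks a folder tree, notes each responsive file and accumulates the file count and byte total, without copying anything. The is_responsive callback is a hypothetical stand-in for a job's selection criteria.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SurveyResult:
    # (path, size in bytes) for each responsive file noted during the crawl
    files: List[Tuple[str, int]] = field(default_factory=list)
    total_files: int = 0
    total_bytes: int = 0


def survey(root: str, is_responsive: Callable[[str], bool] = lambda path: True) -> SurveyResult:
    """Crawl root and all its sub-folders, noting responsive files; nothing is copied."""
    result = SurveyResult()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_responsive(path):
                continue
            size = os.path.getsize(path)
            result.files.append((path, size))
            result.total_files += 1
            result.total_bytes += size
    return result
```

For example, survey(r"C:\Data", lambda p: p.lower().endswith(".pdf")) would report how many PDF files, and how many bytes, a collection of that folder would gather.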
See also: Crawl Database, Survey, Responsive File
Crawl Database
A database, consisting of two files, created by Nuix Collector and saved in the \Logs sub-folder for each collection.
The database contains the list of files and folders detected during the Collection's crawl process, along with statistics such as the total number of files and total bytes to be collected.
The database also stores log entries which record the status of the full collection process, including a list of files successfully collected, the total number of files and total bytes collected, plus any errors or warnings that occurred.
The \Logs sub-folder and the database files within it should be retained for as long as the collected files are retained.
See also: Crawl, Examining Collection Logs & Reports
Custodian
People or entities responsible for files located on specific computers or network shares. Custodians are typically designated managers and staff who keep files relevant to a Case or project.
De-duping
A feature to ensure no duplicate files are collected. Results in somewhat smaller collections, at the expense of added processing time.
When enabled, Nuix Collector checks the hash value of each responsive file against the hash values in the Duplicates Database. Only files which have not previously been added to this database are collected.
Duplicate detection can span several distinct collection runs, if the same duplicates database folder is specified for each collection.
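A minimal Python sketch of hash-based de-duping, assuming MD5 hash values and an in-memory set standing in for the Duplicates Database (the real database format is not described in this glossary). Files are passed as (path, content) pairs for brevity. The sketch also separates duplicates found earlier in the same run from those recorded by a previous run, mirroring the Internal Duplicate and External Duplicate entries.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (path, "internal" or "external", path of the original file, if known)
LogEntry = Tuple[str, str, Optional[str]]


def dedupe(files: Iterable[Tuple[str, bytes]], duplicates_db: Set[str]) -> Tuple[List[str], List[LogEntry]]:
    """Return (paths to collect, duplicate log); adds this run's new hashes to duplicates_db."""
    collected: List[str] = []
    log: List[LogEntry] = []
    first_seen: Dict[str, str] = {}  # hash -> first path collected in this run
    for path, content in files:
        digest = hashlib.md5(content).hexdigest().upper()
        if digest in first_seen:
            # Internal duplicate: the original was collected earlier in this run.
            log.append((path, "internal", first_seen[digest]))
        elif digest in duplicates_db:
            # External duplicate: only the hash is known, not the original file.
            log.append((path, "external", None))
        else:
            collected.append(path)
            first_seen[digest] = path
    # Record this run's hashes so later runs sharing the database skip these files.
    duplicates_db.update(first_seen.keys())
    return collected, log
```

Passing the same set to successive calls plays the role of specifying the same duplicates database folder for several collections.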
See also: Hash Value, De-NISTing, Responsive File
De-NISTing
A feature to ensure that no files are collected which are listed in the NIST database. This database contains a set of known operating system and application file hash values. Results in somewhat smaller collections, at the expense of added processing time during the collection process.
When enabled, Nuix Collector checks the hash value of each responsive file against the hash values in the NIST Database. Only files which are not in this database are collected.
See also: Hash Value, De-duping, Responsive File
Deletion
A type of Nuix Collector processing job where the input source files are deleted, and the metadata for each deleted file is logged. Options allow the deletion to be done securely, so that subsequent recovery of deleted files is virtually impossible.
Because a deletion job destroys data, a degree of caution is in order. The "Collect and Delete" option will collect each file (saved at the specified target location) before deleting the file from its original location. Deletions can also be driven by an XML FileList, based on data collected in a previous survey. Such jobs will delete files only if they are listed in the XML FileList and only if the file metadata in the XML FileList matches the metadata of the actual file.
Deletion jobs are configured in the Collector Wizard.
See also: File Collection, Extraction, Pre Hash, Survey
Destination
Also referred to as the "target" in Nuix Collector. Destination is a folder or network share where collected or extracted files are stored. The Destination can be a folder on a portable hard disk, but is more often a network share on a server volume, NAS device or SAN device.
Care must be taken to specify Destinations with sufficient free disk space to store the collected files. If multiple Collections will store files at Destinations residing on the same physical disk or array, take care to:
Ensure free disk space is sufficient to store files from all the Collections
Ensure Collections are scheduled to run in a staggered fashion, if necessary, so that disk I/O at the Destination does not become a performance bottleneck.
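Checking free space before a run can be scripted. The sketch below uses Python's standard shutil.disk_usage to compare the Destination's free space with the combined size of every Collection that will share it. The byte estimates might come from a survey of each Collection; the 10% headroom is an arbitrary assumption, not a Nuix recommendation, and the function names are hypothetical.

```python
import shutil
from typing import Iterable


def required_bytes(collection_sizes: Iterable[int], headroom: float = 0.10) -> int:
    """Total bytes needed for all Collections, plus a safety margin for overhead."""
    return int(sum(collection_sizes) * (1 + headroom))


def destination_has_room(destination: str, collection_sizes: Iterable[int], headroom: float = 0.10) -> bool:
    """True if the volume holding destination can store every listed Collection."""
    free = shutil.disk_usage(destination).free
    return free >= required_bytes(collection_sizes, headroom)
```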
See also: Destination Type, FileSafe, Native Files, MHTML
Destination Type
The kind of files that a collection or extraction will save at the specified Destination.
Destination Types (formats) include:
Native Files: Copies of the original files
FileSafe: An evidence file format, containing multiple files and folders with metadata intact
MHTML: For storing dynamic content collected from SharePoint servers
See also: Destination, FileSafe, Native Files, MHTML
Disk Image Collection
A complete copy of a logical volume or an entire physical disk. Disk images include all the readable data for the volume or disk, including any unallocated sectors (which can contain the content of deleted files).
Because disk images are much larger than a "targeted file collection", they take more time to capture and transmit, and require more disk space to store. Furthermore, any subsequent indexing and searching of disk image data (using a program such as Nuix Workbench, licensed separately) requires more time and disk space. For these reasons, disk images are generally used only for critical matters such as criminal investigations, or when mandated by a court order.
See also: Physical Disk, Logical Volume, Targeted File Collection
Duplicate Files Hash Database
The Duplicate Files hash database contains the hashes for all files that have previously been extracted or collected with Nuix Collector Evidence Browser or Nuix Collector. This hash database can be used to de-duplicate files against those previously extracted or collected files. These duplicates are considered "external" duplicates, as they come from previous extraction or collection runs. Any duplicate file identified using this hash database is logged as a duplicate, but the log entry does not reference the original file.
See also: Standard Files Hash Database, NIST Hash Database, Hash Value
EFS
Encrypting File System: On PCs running Microsoft Windows 2000 or newer, folders and files on NTFS-formatted volumes can be EFS-encrypted from Windows Explorer. EFS-encrypted files can only be decrypted while logged in as (1) the user who performed the encryption, or (2) a user who is a designated Data Recovery Agent (the DRA user must have been designated a DRA before the files were originally encrypted).
Nuix Collector can collect EFS-encrypted files when it is running as the user who encrypted the files. The collected files are saved in unencrypted form at the Collection's specified destination when collected as native files. When collected to a FileSafe, the FileSafe itself is encrypted, but the collected files within the FileSafe can be extracted as unencrypted files.
EnCase
A suite of digital forensics products by Guidance Software.
EnScript
An EnCase script which automates EnCase activities.
Also, the scripting language incorporated into the EnCase product suite.
Evidence File
A file which contains other files. Similar to a ZIP file. May also contain metadata for each file, such as last access date, owner, hash value, etc.
May also be referred to as an evidence container, evidence file container or evidence repository.
See also: Extraction
Examiner
The person responsible for defining and/or reviewing a given collection.
External Duplicate
An external duplicate is a duplicate file where the original file was first encountered in a previous collection or extraction run.
See also: Internal Duplicate
Extraction
A kind of collection where the input source is an evidence file container or disk image. Example input sources for an extraction (and their filename extensions) include the following:
A FileSafe File (.MFS01)
An EnCase Logical Evidence File (.L01)
See also: File Collection, Deletion, Survey, Evidence File
File Collection
A type of Collection where files in specified input sources are crawled (i.e. examined) and then selected files are copied to a specific Destination path. There is a corresponding JobFile containing settings for such collections.
See also: JobFile Template, Survey, Crawl
File System
A scheme for storing files and folders on a volume, such as a hard disk partition. Various operating systems support different file systems. The most common file system on a Windows computer is NTFS. Each file system offers different features and size limits.
File Type
Each file on a computer is of a particular type. Examples include Word documents, Excel spreadsheets, application files and PST files.
File type is usually apparent from the file's filename extension, e.g.:
Sample.doc......Microsoft Word document
Sample.xls......Microsoft Excel spreadsheet
Sample.pdf......Adobe Acrobat document
Sample.exe......Application
Using filename extension to identify file types can be limited, because:
Different programs may save their documents using the same filename extension.
Different versions of a given program may use the same filename extension, but the actual file type may be different.
Users can save or rename files using an alternate filename extension.
Nuix Collector's signature analysis feature can often overcome these limitations.
See also: Intrinsic File Type, Filename Extension, Signature Analysis, Signature Header File
FileList
A simple text file containing a list of files to be processed. Each line within the FileList specifies a single file.
See also: Survey, XML FileList, Hash List File
Filename Extension
The last several characters of a file's name, preceded by a period. The filename extension is used to identify a file's type, though this method has limitations.
Files with no period in their filename have no filename extension.
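Python's standard os.path.splitext function follows the same rule and serves as an illustration (the wrapper name filename_extension is hypothetical): only the text after the last period counts, a name with no period has an empty extension, and a name that merely begins with a period, such as .profile, is likewise treated as having none.

```python
import os


def filename_extension(filename: str) -> str:
    """Return the filename extension without its period, or "" if there is none."""
    return os.path.splitext(filename)[1][1:]


for name in ["Sample.doc", "archive.tar.gz", "README", "MyLetter.ltr"]:
    print(name, "->", repr(filename_extension(name)))
```

As the File Type entry notes, the result says nothing about the file's actual content: MyLetter.ltr may still be an Acrobat document.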
See also: File Type, Intrinsic File Type, Signature Analysis, Signature Header File
FileSafe
An evidence file format which preserves files and metadata information in a forensically-sound manner. FileSafe files:
Contain a set of collected or extracted files
Can span multiple files to break up large collections into manageable pieces.
Are designed to be written to only once.
Can be accessed and searched using Nuix Workstation and certain other Nuix eDiscovery applications (licensed separately).
The files and folders contained within a FileSafe can be extracted using Nuix Collector or Nuix Collector Evidence Browser.
See also: Native Files
Forensic Snapshot
A feature of Nuix Collector which allows the program to make sound copies of files even when they are open and locked by an application. Enabling the Forensic Snapshot feature ensures most or all open files can be successfully collected, at the expense of added processing time.
The Forensic Snapshot feature relies on Microsoft's Volume Shadow Copy Service. Forensic Snapshots can only be employed on local NTFS volumes with sufficient free disk space.
See also: Volume Shadow Copy Service
Hash List File
A text file containing a list of file hashes to be processed.
Hash List Files contain a single line for each MD5 hash value. No hyphens or delimiters are permitted; only the hexadecimal numerals 0 – 9 and A – F.
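A Hash List File can be checked with a short Python sketch that applies the rule above literally: each line must be exactly 32 characters drawn from 0 – 9 and A – F. Skipping blank lines and trimming surrounding whitespace are assumptions of this sketch, not documented behavior. Python's hashlib returns lowercase digits from hexdigest(), so such values need .upper() before being written to a list checked this way.

```python
import re
from typing import List

# Exactly 32 hexadecimal digits (one MD5 value) and nothing else on the line.
MD5_LINE = re.compile(r"[0-9A-F]{32}")


def parse_hash_list(text: str) -> List[str]:
    """Return the hash values in a Hash List File; raise ValueError on a malformed line."""
    hashes: List[str] = []
    for number, line in enumerate(text.splitlines(), start=1):
        value = line.strip()
        if not value:
            continue  # assumption: blank lines are ignored
        if not MD5_LINE.fullmatch(value):
            raise ValueError(f"line {number}: not a bare MD5 hash value: {value!r}")
        hashes.append(value)
    return hashes
```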
See also: Deletion, Survey, Hash Value, Pre Hash, FileList, XML FileList
Hash Only
A setting within a Nuix Collector processing job which determines the hash value for the content of each responsive file. The hash value is noted in the logs, but no further processing is done. Refer to the HashOnly parameter of the <Target> tag, within the accompanying Nuix Collector & ECC JobFile Reference.
See also: Responsive File
Hash Value
A fixed-length value calculated from the byte contents of a file. If a file's content changes, recalculating its hash value will, with overwhelming likelihood, produce a different value than the earlier calculation. Thus, hash values are used to validate that the contents of a file have not changed.
Hash values can also be used to identify files. Nuix Collector's de-duping and de-NISTing features rely on hash values to determine whether files are duplicates (of previously collected files), or whether files are listed in the NIST database or a custom database.
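For illustration, a file's MD5 hash value can be computed with Python's standard hashlib module. This minimal sketch reads the file in chunks, so large files need not fit in memory, and returns uppercase hexadecimal to match the Hash List File convention; it is not how Nuix Collector itself is implemented.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 hash value of a file's content as 32 uppercase hex digits."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()
```

Changing even one byte of the file (for example, "abc" to "abd") produces a completely different value.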
See also: De-duping, De-NISTing, MD5, Hash List File, Standard Files Hash Database
Image File
A binary file containing an image of a physical disk, logical disk partition or folder tree. .DD, .E01 and Symantec Ghost .GHO files are image files.
The contents of system memory (RAM) can also be captured and saved as an image file. Various tools are available for analyzing RAM image files; disk image files are typically analyzed with different tools.
See also: Evidence File, Extraction, Virtual Hard Disk File
Internal Duplicate
An internal duplicate is a duplicate file where the original file was first encountered earlier in the same collection or extraction run. Logging of internal duplicates will include information regarding the original file.
See also: External Duplicate
Intrinsic File Type
A file's type, as determined by its content, rather than by the filename extension.
For example, it is possible to rename an Acrobat document from MyLetter.pdf to MyLetter.ltr. The file's content has not changed, so it remains intrinsically an Acrobat document.
Nuix Collector features file signature analysis, which can collect files based on their intrinsic file type.
See also: File Type, Filename Extension, Signature Analysis, Signature Header File.
JobFile
A JobFile contains all the settings for running a Nuix Collector job, including:
General settings:
the name of the case or project
the type of collection or extraction
Input settings:
the input sources (data locations) to collect
File selection settings:
File selection criteria
Output (target) settings:
whether the job will perform deletions
the destination path to save collected files to
Log and report settings:
which reports and logs to produce, and where to place them
The Collector Wizard provides screens for configuring the primary settings in a JobFile.
JobFiles can be edited manually to configure additional collection settings, to refine the behavior of collection jobs. For details, see the accompanying Nuix Collector & ECC JobFile Reference.
JobFile template
A JobFile whose settings are used as a model for configuring a new JobFile. JobFile templates are no different from regular JobFiles; only the manner in which they are used differs.
For details on editing JobFiles, see topic Customizing JobFiles in the Nuix Collector & ECC JobFile Reference.
Local Administrator
On a Windows PC, a user who is a member of the local security Group 'Administrators'. Such users typically have access to every folder and file on the system, and can install and uninstall applications.
Logical Volume
A portion of the storage space on a physical disk which is treated as a distinct disk volume by the operating system. A logical volume can consist of one or more disk partitions. Nuix Collector Suite's Portable Collector program supports disk image collections of logical volumes on supported Windows, Linux and macOS computers (physical disk image collection is also supported).
See also: Physical Disk, Disk Image Collection
M.A.C. Times
M.A.C. stands for "Modified, Accessed, Created". M.A.C. Times are the dates and times (a.k.a. timestamps) when a given file was last modified, accessed or created.
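As an illustration, Python's standard os.stat can read these timestamps; the function name mac_times is hypothetical. Creation time is platform-dependent: st_birthtime exists on macOS and BSD (and on Windows from Python 3.12), older Python versions on Windows report creation time as st_ctime, and on Linux st_ctime is the metadata change time, so this sketch reports no creation time there.

```python
import os
import sys
from datetime import datetime, timezone
from typing import Dict


def _utc(seconds: float) -> datetime:
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def mac_times(path: str) -> Dict[str, datetime]:
    """Return the Modified, Accessed and (where available) Created times of a file, in UTC."""
    st = os.stat(path)
    times = {"modified": _utc(st.st_mtime), "accessed": _utc(st.st_atime)}
    birth = getattr(st, "st_birthtime", None)
    if birth is not None:
        times["created"] = _utc(birth)
    elif sys.platform == "win32":
        times["created"] = _utc(st.st_ctime)  # creation time on Windows in older Pythons
    # On Linux, st_ctime is the metadata change time, so no creation time is reported.
    return times
```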
See also: Metadata
Mapped Drive Letter
A computer may "map" a shared folder residing on another computer and assign that shared folder to a drive letter. For example, the UNC path \\myserver\myshare can be "mapped" to drive letter M: and subsequently referred to as M:.
Mapping a shared folder to a drive letter is typically done as a convenience, so the shared folder can be referred to concisely. However, a mapped drive letter may also be required so that programs lacking UNC path support can access files and folders on a shared folder. Nuix Collector Suite supports both UNC paths and mapped drive letters.
Note: Mapped drive letters may be associated with a particular user's Active Directory profile or network login script. Users can also manually establish drive letter mappings to shared folders.
Mapped drive letters are arbitrary – one user's mapped S: drive may point to a different shared folder than another user's S: drive.
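Because mappings are per user, a tool given a drive-letter path needs that user's mappings to find the underlying share. This sketch uses Python's standard pathlib.PureWindowsPath, which parses Windows paths on any platform, to split a UNC path into host, share and remainder, and to rewrite a mapped-drive path as a UNC path. The drive_map dictionary is a hypothetical stand-in for one user's mappings, and both function names are illustrative.

```python
from pathlib import PureWindowsPath
from typing import Dict, Tuple


def split_unc(path: str) -> Tuple[str, str, str]:
    r"""Split \\host\share\rest into (host, share, rest)."""
    p = PureWindowsPath(path)
    if not p.drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    host, share = p.drive[2:].split("\\", 1)  # drive is \\host\share
    rest = "\\".join(p.parts[1:])             # parts[0] is the \\host\share\ anchor
    return host, share, rest


def to_unc(path: str, drive_map: Dict[str, str]) -> str:
    r"""Rewrite a mapped-drive path such as M:\docs as a UNC path, using one user's mappings."""
    p = PureWindowsPath(path)
    share = drive_map.get(p.drive.upper())
    if share is None:
        return str(p)  # not a mapped drive letter for this user: leave unchanged
    return str(PureWindowsPath(share, *p.parts[1:]))
```

With another user's drive_map, the same M: path could resolve to a different share, which is why drive-letter paths alone are ambiguous.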
MD5
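Message-Digest Algorithm 5: A hash algorithm which calculates a 128-bit hash value (usually written as 32 hexadecimal digits) from a file's content; this value is treated as effectively unique to that content (see Note, below). Used by Nuix Collector Suite and several other forensic tools for identifying files.
Note: Uniqueness is not guaranteed under MD5. In theory, two different files can produce the same MD5 hash value, although accidental matches are extremely rare in practice. Modifying an existing file while keeping its MD5 hash value unchanged remains a technically challenging undertaking, with significant restrictions as to the kinds of edits which can be made. However, techniques for deliberately creating two different files which share an MD5 hash value are publicly known, so MD5 matches should be treated with caution where files may have been crafted intentionally.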
See also: Hash Value
Metadata
Data associated with a data file or record. For example, Creation Date is a metadata property of a file or folder stored on an NTFS volume. Such data is stored in the volume's Master File Table (MFT), rather than within the file itself. There are numerous other metadata properties for files and folders.
Metadata can also exist within a file. For example, a Microsoft Word document may track Author, Company Name, number of revisions, last modification date and other details. Such data is not presented to the user directly (although Word can reveal most of the metadata it stores in a document). Metadata can also include the full path to the file, previous edits and revision marks – even text deleted in previous sessions.
MHTML
MIME HTML: A file format which stores a web page (HTML) in a single file, together with any graphic files, style sheets and JavaScript files needed for the page to display properly.
Native Files
Copies of original files. Also referred to as "Native Copies" or "Exact Native Copies".
Note: Collected or extracted files can be saved as either Native Files, or as a FileSafe.
See also: FileSafe
Owner
Each file and folder on an NTFS partition has an "owner", which is a specific User ID or Security Group on an Active Directory domain or local computer. Most CIFS shares also contain file owner information for each file.
Nuix Collector Suite can restrict a collection to only files owned by a specified list of users, security groups or SIDs.
See also: Custodian, Owner SID
Owner SID
The Windows Security Identifier used to identify the owner of a file or folder on an NTFS volume or CIFS share.
See also: SID
Physical Disk
A traditional hard disk, solid-state drive (SSD), flash drive or other storage device. When selecting a physical disk as the source for a disk imaging job, all the readable data on the physical disk will be saved within a disk image file. If the physical disk contains multiple logical volumes, data from all these volumes will be saved in the resulting disk image file.
See also: Logical Volume, Disk Image Collection
Pre Hash
A setting within a Nuix Collector processing job. When set to "Yes", Nuix Collector will calculate the hash value for the content of each input file crawled. The hash value is noted in the Crawl log/report. Refer to the PreHash parameter of the <Input> tag, within the accompanying Nuix Collector & ECC JobFile Reference.
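The Pre Hash safeguard can be illustrated with a minimal Python sketch of the general technique (an assumption, not Nuix Collector's code): a hash is recorded at crawl time, and the later copy hashes exactly the bytes it writes. If the two values differ, the file changed after it was crawled, so the copy is discarded and the file is not processed. All function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, Iterable


def md5_file(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crawl_with_prehash(paths: Iterable[str]) -> Dict[str, str]:
    """Crawl stage: note each file's hash value, as a crawl log would."""
    return {path: md5_file(path) for path in paths}


def copy_if_unchanged(src: str, dst: str, prehash: str, chunk_size: int = 1 << 20) -> bool:
    """Copy stage: hash the bytes while copying them; keep the copy only if it matches prehash."""
    digest = hashlib.md5()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
            fout.write(chunk)
    if digest.hexdigest() != prehash:
        os.remove(dst)  # content changed since the crawl: discard the copy
        return False
    return True
```

Hashing during the copy, rather than re-hashing before it, means the comparison covers exactly the bytes that reached the destination.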
See also: Survey
Production Environment
The network or computing resources which are actively being used in an organization. Programs deployed in the production environment are said to be "in production".
See also: Test Environment
RAM Capture
An image of the system memory of a computer running Nuix Collector or Portable Collector. The resulting "dump file" can be processed with various tools.
See also: Volatile Information
Regular Expression
A search pattern which uses special character sequences to match a "string" or byte sequence. Provides more powerful and flexible search capabilities than wildcard searches using * and ?.
Regular expressions are supported by numerous computer programming languages and utilities, and are employed by Nuix Collector's file signature detection engine.
See also: Intrinsic File Type, Signature Analysis.
Responsive File
A file within an input source which meets all the criteria for a given job. As Nuix Collector crawls the input sources, each responsive file is noted in the crawl database. Subsequent logs and reports can list these files.
Responsive files are generally processed; however, a responsive file may not be processed for various reasons:
The file is currently open and locked, and cannot be accessed via Volume Shadow Copy Service
The file exists in the Duplicate Files List (i.e. the file has previously been collected), and the De-Duplicate Collection feature is enabled
The file exists in the NIST hash database, and the Ignore files in the NIST hash database feature is enabled
The file exists in the Known Files Hash Database, and the Ignore Known Files feature is enabled
An I/O error occurs during processing.
The PreHash feature is enabled, and a change is noted in the file's hash value (between the time the file is first crawled and the time an attempt is made to copy the file to the destination). This can occur if the file is modified during the collection.
The job is cancelled or terminates abnormally.
Samba
A set of software tools which allows macOS (OS X) and Linux computers to share folders on a network so that Windows PCs can access the files. Shares made available via Samba are typically CIFS compliant, and include file ownership information for each file.
The Samba project also hosts Rsync, a utility for copying files and folders across a network and across computer platforms.
Segment Size
A size limit for spanning a collection across multiple FileSafe files. Necessary when creating large FileSafes on volumes where the underlying file system imposes a restrictive limit on the maximum size of each file (e.g. the 4 GB limit on FAT32-formatted volumes).
See also: Spanning
Selection Criteria
Settings specifying which files will be processed from all the input sources in a collection, extraction, deletion or survey job. Selection criteria include:
Input directories, FileSafes, SharePoint sites and queries, Evidence Files and other data sources.
Filename extensions, such as doc, docx, xls, xlsx, pdf, etc.
Date ranges for file metadata, including Creation Date, Last Modification Date and Last Access Date.
File header inspection (signature analysis), to determine file type regardless of the filename extension.
Owner name or SID
Keyword search criteria (for text files)
Whether known files exclusions, such as de-NISTing and de-duping, are active.
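Two of these criteria, filename extension and a Last Modification Date range, can be combined as in this Python sketch. The function and parameter names are hypothetical and the logic is a simplified illustration; a real job also applies the remaining criteria listed above.

```python
import os
from datetime import datetime, timezone
from typing import Set


def matches_criteria(path: str, extensions: Set[str],
                     modified_after: datetime, modified_before: datetime) -> bool:
    """True if the file's extension is selected and its Last Modification Date is in range.

    extensions: lowercase extensions without the period, e.g. {"doc", "pdf"}.
    modified_after / modified_before: timezone-aware bounds (start inclusive, end exclusive).
    """
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in extensions:
        return False
    modified = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
    return modified_after <= modified < modified_before
```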
Selection Set
Selection sets are groups of settings which cover the following Selection Criteria only:
Filename extensions, such as doc, docx, xls, xlsx, pdf, etc.
Date ranges for file metadata, including Creation Date, Last Modification Date and Last Access Date.
File header inspection (signature analysis), to determine file type regardless of the filename extension.
Selection sets can be named and saved, then later applied to new jobs.
SID
Security Identifier: a unique value used to identify a Windows user or a group of users, usually displayed as a string of hyphen-separated numbers beginning with "S".
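SIDs are commonly written in the string form S-R-I-S-S..., where R is the revision, I is the identifier authority and the remaining numbers are sub-authorities (the last usually being the relative identifier, or RID). This Python sketch parses that form. It handles only a decimal identifier authority, which covers common SIDs such as S-1-5-32-544 (the local Administrators group), and is an illustration rather than a substitute for the Windows security APIs.

```python
import re
from typing import List, NamedTuple


class Sid(NamedTuple):
    revision: int
    authority: int
    sub_authorities: List[int]


# "S", revision, identifier authority, then zero or more "-number" sub-authorities.
_SID_RE = re.compile(r"S-(\d+)-(\d+)((?:-\d+)*)", re.IGNORECASE)


def parse_sid(text: str) -> Sid:
    """Parse a SID string such as S-1-5-32-544 into its components."""
    match = _SID_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a SID string: {text!r}")
    subs = [int(part) for part in match.group(3).split("-")[1:]]
    return Sid(int(match.group(1)), int(match.group(2)), subs)
```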
Signature Analysis
A feature of Nuix Collector, whereby the content of a file is examined to determine the file's Intrinsic File Type. This allows specific types of files to be collected, even if the files are named with an incorrect or inconsistent filename extension.
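Signature analysis can be sketched in Python with regular expressions matched against a file's leading bytes. The format of Nuix Collector's Signature Header File is not reproduced in this glossary, so the table below is hypothetical; it uses widely documented "magic numbers" for PDF, ZIP-based (e.g. .docx), OLE2 compound (e.g. .doc) and Windows executable files.

```python
import re
from typing import Optional

# Hypothetical signature table: (file type, regex matched at the start of the file).
SIGNATURES = [
    ("pdf", re.compile(rb"%PDF-")),
    ("zip container (e.g. docx, xlsx)", re.compile(rb"PK\x03\x04")),
    ("ole2 compound file (e.g. doc, xls)", re.compile(rb"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")),
    ("windows executable", re.compile(rb"MZ")),
]


def intrinsic_file_type(path: str, header_size: int = 512) -> Optional[str]:
    """Identify a file's type from its leading bytes, ignoring the filename extension."""
    with open(path, "rb") as f:
        header = f.read(header_size)
    for file_type, pattern in SIGNATURES:
        if pattern.match(header):  # match() anchors the pattern at the first byte
            return file_type
    return None
```

Applied to the renamed MyLetter.ltr from the Intrinsic File Type entry, the sketch still reports "pdf".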
See also: File Type, Filename Extension, Intrinsic File Type, Signature Header File.
Signature Header File
A list of file types, including the corresponding filename extensions and character sequences found within such files.
The Signature header file is used by Nuix Collector to perform file signature analysis (i.e. file type identification by examining the file's content). The file can be modified to extend Nuix Collector's ability to determine a file's type; for details, see topic File Signature Analysis.
See also: File Type, Filename Extension, Intrinsic File Type, Signature Analysis.
Spanning
The process of writing an output file to multiple files, so that a file system-imposed limit on maximum file size is never reached.
Spanning also allows huge collections to be subsequently copied to removable media, such as a set of DVD discs, provided the Segment Size is set no greater than what the individual media can hold.
Nuix Collector can span large FileSafes across multiple FileSafe files.
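The arithmetic behind spanning is a ceiling division, sketched here in Python. The function names are hypothetical, and the sketch ignores any per-segment container overhead a real FileSafe may add; the 4,700,000,000-byte figure is the nominal capacity of a single-layer DVD.

```python
from typing import Iterator, Tuple


def segment_count(total_bytes: int, segment_size: int) -> int:
    """Number of segments needed; integer ceiling division avoids float rounding."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return -(-total_bytes // segment_size)


def segment_ranges(total_bytes: int, segment_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets of each segment; end is exclusive."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for start in range(0, total_bytes, segment_size):
        yield start, min(start + segment_size, total_bytes)


# Example: a 10 GiB collection spanned onto single-layer DVDs.
DVD_BYTES = 4_700_000_000
print(segment_count(10 * 1024**3, DVD_BYTES))  # 3
```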
See also: Segment Size
Standard Files Hash Database
Contains hash values for a set of files that are known files within a particular environment. A collection or extraction configured to skip "Known Files" will skip any file whose hash value is found in this database.
Nuix Collector Evidence Browser can be used to create this hash database by crawling known images or directory structures containing known files. For details, see the Evidence Browser topic Using a Known Files Hash Database.
See also: Hash Value, De-duping, De-NISTing, MD5
Survey
A collection, extraction or deletion which crawls its input sources (including sub-folders and files), and which generates log files, but does not actually process any files.
The purpose of Survey-only jobs is to gather file lists and statistics, and to test file selection criteria. The Survey-only setting can be configured in the Collector Wizard, or edited directly in the JobFile. For details, see topic Using the Collector Wizard, or refer to the CrawlOnly parameter of the <Input> tag, within the accompanying Nuix Collector & ECC JobFile Reference.
See also: Pre Hash, JobFile Template, File Collection, Crawl
Test Environment
One or more computers where a program can be executed without impacting network or computing resources which are actively being used by others. Sometimes a test environment resides on an isolated network workgroup or segment.
See also: Production Environment
UNC
Universal Naming Convention: a way to specify a file or folder residing on a particular computer. UNC names begin with two backslashes, followed by a hostname, another backslash, and a share name corresponding to a shared folder or volume.
Access to a file residing on a UNC path may require specifying a UserID, Domain and Password. JobFiles can be manually edited to include these settings, if needed, and then encrypted.
URI
Uniform Resource Identifier: a way to specify the name or location of a local or network resource, such as a file, folder, web page or other data. URIs can refer to resources on a given computer, or to resources across a local network or the Internet. URL is a narrower term, though it is often used synonymously with URI.
URL
Uniform Resource Locator: a way to specify the location of a file, folder, web page or other data. URLs can refer to resources on a given computer, or to resources across a local network or the Internet. URI is a broader term, though it is often used synonymously with URL. Website addresses are examples of URLs.
Virtual Hard Disk File
A binary file containing the entire contents of a virtual hard disk. Virtual hard disk files are mounted by virtualization hosts and made accessible to the virtual machines running on the host.
See also: Image File
Volatile Information
Specific data held in the computer's memory, including operating system details and network information. Details on running processes, such as file handles and process handles, may also be available, depending on the operating system. This data is captured as a set of log file entries saved in plain text.
On Windows computers, screenshots of the Windows Desktop and each open window or dialog can also be captured.
Volatile information can be helpful when auditing which programs are running on each computer, and can be especially helpful in certain cyber security incident response scenarios.
See also: RAM Capture
Volume Shadow Copy Service
A service included with Microsoft Windows which allows a program to make a sound copy of an open or locked file. Nuix Collector — and several leading backup programs — can utilize this service.
See also: Forensic Snapshot
XML FileList
An XML text file containing a list of files to be processed.
XML FileLists may optionally contain file metadata: Creation Time, Last Access Time, Last Modification Time, File Size and the MD5 Hash Value of the file content. When performing a deletion job using an XML FileList, any specified metadata values will be used to ensure the file that is about to be deleted has not changed since the metadata was gathered.
A collection job configured for "Survey only" (i.e. JobFile setting CrawlOnly="Yes" – and optionally PreHash="Yes" – within the <Input> tag) can be used to generate a crawl log (or crawl report) containing file metadata. This log file can be reformatted for use as an XML FileList for a subsequent deletion job.
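The metadata check described for deletions can be sketched in Python with the standard xml.etree.ElementTree module. The XML FileList schema is not reproduced in this glossary, so the element and attribute names here (FileList, File, Path, Size, MD5) are hypothetical, and time comparisons are omitted because their format is not specified. Only the metadata present in an entry is checked, and the sketch returns the paths that would be safe to delete rather than deleting anything.

```python
import hashlib
import os
import xml.etree.ElementTree as ET
from typing import List


def _md5(path: str) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def entry_still_matches(entry: ET.Element) -> bool:
    """True if the file exists and matches every metadata value the entry specifies."""
    path = entry.get("Path")
    if path is None or not os.path.isfile(path):
        return False
    size = entry.get("Size")
    if size is not None and int(size) != os.path.getsize(path):
        return False
    md5 = entry.get("MD5")
    if md5 is not None and md5.upper() != _md5(path):
        return False
    return True


def paths_safe_to_delete(xml_text: str) -> List[str]:
    """List the FileList entries whose files are unchanged since the survey."""
    root = ET.fromstring(xml_text)
    return [entry.get("Path") for entry in root.iter("File") if entry_still_matches(entry)]
```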
See also: Deletion, Survey, Pre Hash, FileList, Hash List File