Review and search for irregular file types
During ingestion, Nuix Workstation flags irregular files and presents them in the Statistics View. You can open the items in each row from that view or find them through a query. Review irregular files every time you process data to confirm that all of the data was processed correctly. Nuix Workstation records an item failure the same way for a *.txt file as for a PST file, so a single failure can represent very different amounts of data. Review any questionable items and reprocess them if necessary.
This section describes how to search for each of the following types of irregular file ("questionable items"):
Items with bad extensions
Corrupted items
Deleted items
Empty items
Encrypted items
Non-searchable PDFs
Text-stripped items
Unrecognized items
Unsupported items
Note: Nuix Workstation presents only the irregular file types that occur in the current case, so the types shown vary from case to case.
Search for items with bad extensions
Bad Extension indicates items whose MIME-type is inconsistent with their file extension.
In the preceding example image, the Family.jpeg file is not an image but is actually a Microsoft Word document.
To search for files with improper extensions, use the following search syntax: flag:irregular_file_extension
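Conceptually, a bad extension is a mismatch between what a file's content says it is and what its name says it is. The following minimal Python sketch illustrates the idea with a handful of well-known file signatures (magic numbers). The SIGNATURES table and the has_bad_extension function are hypothetical and far smaller than the type identification Nuix Workstation actually performs.

```python
# Hypothetical illustration of bad-extension detection: compare a file's
# leading "magic" bytes with its extension. Not Nuix Workstation's logic.
from pathlib import Path
from typing import Optional

# A few well-known signatures mapped to the extensions consistent with them.
SIGNATURES = {
    b"%PDF-": {"pdf"},
    b"\xff\xd8\xff": {"jpg", "jpeg"},
    b"\x89PNG\r\n\x1a\n": {"png"},
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": {"doc", "xls", "ppt", "msg"},  # OLE2
    b"PK\x03\x04": {"zip", "docx", "xlsx", "pptx"},  # ZIP-based formats
}

def has_bad_extension(name: str, header: bytes) -> Optional[bool]:
    """Return True if the header contradicts the extension, False if it
    matches, or None if the header is not in this small signature table."""
    ext = Path(name).suffix.lower().lstrip(".")
    for magic, extensions in SIGNATURES.items():
        if header.startswith(magic):
            return ext not in extensions
    return None
```

For example, has_bad_extension("Family.jpeg", header) returns True when header begins with the OLE2 signature used by legacy Word .doc files.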
Note: During an export, Nuix Workstation sets each native file's extension to the value of its File Extension (Corrected) property. It records the exported item's definitive metadata in the Export Item Summary, the per-item XHTML report files, or the load file.
Search for corrupted items
Corrupted items are those that Nuix Workstation has been unable to process. These items may also be referred to as evidence containers, evidence file containers, or evidence repositories. Nuix Workstation marks an item as corrupted if:
It is unable to open the file.
Opening the file results in some type of failure.
It is otherwise unable to process the file.
For items listed as Corrupted, the File Type property displays the type of corruption. Nuix Workstation may also record two metadata properties: FailureDetail and FailureMessage. Reviewing these properties, either item by item or through a Metadata Profile that includes them, gives you insight into why the failures occurred. The cause can be as simple as a file being locked by an external process. Hover over a FailureDetail value to display its full details.
To search for corrupted items, use the following search syntax: properties:FailureDetail
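If you export the metadata of the corrupted items to a CSV file through a Metadata Profile that includes FailureMessage, a short script can tally the most common causes. The following minimal Python sketch assumes a CSV with a header row containing a FailureMessage column; the column name, the summarize_failures function, and the sample rows are hypothetical illustrations, not real Nuix output.

```python
import csv
import io
from collections import Counter

def summarize_failures(csv_text: str, column: str = "FailureMessage") -> Counter:
    """Count how often each failure message occurs in an exported CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        message = (row.get(column) or "").strip()
        if message:  # skip rows with no recorded failure message
            counts[message] += 1
    return counts

# Made-up sample export; real columns depend on your Metadata Profile.
SAMPLE = """Name,FailureMessage
report.docx,File is locked by another process
budget.xls,File is locked by another process
archive.zip,Unexpected end of stream
"""

if __name__ == "__main__":
    # Print the causes from most to least frequent.
    for message, count in summarize_failures(SAMPLE).most_common():
        print(f"{count:>4}  {message}")
```

Grouping by message rather than by file type makes a single recurring cause, such as locked files, stand out quickly.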
Search for deleted items
Deleted items are items that Nuix Workstation has extracted from the slack space of Microsoft mailbox files.
Deleted email messages are not the messages listed in the Deleted Items folder. Instead, they are messages that have been "permanently deleted" from within Outlook or Outlook Express. During processing, Nuix Workstation extracts as many fragments as it can and attempts to reconstitute complete messages. If only part of a message still exists, Nuix Workstation extracts the available portion.
To search for deleted items, use the following search syntax: flag:deleted
Search for empty items
Empty items are items that are zero (0) bytes in size.
To search for empty items, use the following search syntax: mime-type:application/x-empty
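Before ingestion, you can cross-check the source data for zero-byte files on disk. This minimal Python sketch walks a folder with os.walk and lists regular files whose size is 0 bytes. The find_empty_files name is hypothetical, and the results cover only loose files on disk, not empty items embedded in containers such as PST or zip files, so the count may not match the mime-type:application/x-empty search.

```python
import os
import tempfile
from pathlib import Path
from typing import List

def find_empty_files(root: str) -> List[Path]:
    """Return every regular file under root whose size is zero bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            # is_file() skips broken symlinks and special files such as FIFOs
            if path.is_file() and path.stat().st_size == 0:
                empty.append(path)
    return sorted(empty)

if __name__ == "__main__":
    # Demo on a throwaway folder: one empty file and one non-empty file.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "empty.txt").touch()
        Path(folder, "notes.txt").write_text("not empty")
        print([p.name for p in find_empty_files(folder)])  # ['empty.txt']
```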
Note: Nuix Workstation classifies exceptions based on its knowledge of file types. Save the diagnostics information to a file so that you can review the exceptions later.
Search for encrypted items
Encrypted items are those that Nuix Workstation has determined contain encrypted content. Nuix Workstation still extracts the metadata and as much other information as possible from an encrypted file, but it cannot index all of the content.
To search for encrypted files, use the following search syntax: flag:encrypted
Identify encrypted files in a decrypted zip file
Sometimes, after you supply a password that decrypts most of the files in an encrypted zip file, one or more of the files it contains remain encrypted. To identify those files, that is, to generate a list of the encrypted files that belong to a decrypted parent file, run the following search:
flag:encrypted AND NOT flag:decrypted AND NOT content:*
This query returns the items that are flagged as encrypted, have not been decrypted, and contain no text. You can then find them in the Document Navigator’s No text folder.
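To make the logic of that query concrete, here is a minimal Python sketch that applies the same three conditions to a list of simplified item records. The Item class and the still_encrypted function are hypothetical stand-ins, not part of any Nuix API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    # Hypothetical, simplified stand-in for an item; not a Nuix API type.
    name: str
    encrypted: bool
    decrypted: bool
    text: str

def still_encrypted(items: List[Item]) -> List[str]:
    """Mirror of: flag:encrypted AND NOT flag:decrypted AND NOT content:*"""
    return [
        item.name
        for item in items
        if item.encrypted and not item.decrypted and not item.text
    ]
```

For example, given one item that was encrypted and then decrypted with text, one that was encrypted and never decrypted with no text, and one unencrypted empty item, only the second is returned.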
Search for non-searchable PDFs
Non-searchable PDFs are items that Nuix Workstation identifies as PDFs through header recognition but that contain no indexable text. Most of these items are image-only PDFs, such as scanned documents. They warrant further investigation because their content is not text-indexed and therefore cannot be searched in Nuix Workstation.
To search for non-searchable PDFs, use the following search syntax: mime-type:application/pdf AND NOT content:*
You can export these items, run them through a third-party OCR tool (for example, to OCR PDF, TIFF, or PNG images), and import the resulting searchable text and PDFs back into Nuix Workstation.
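The two conditions that define a non-searchable PDF can be expressed as a small Python predicate. This sketch checks only for the %PDF- signature at the very start of the file (PDF readers are often more lenient about where the header appears) and treats whitespace-only text as no text. The is_non_searchable_pdf function is hypothetical.

```python
def is_non_searchable_pdf(header: bytes, extracted_text: str) -> bool:
    """True when the header says PDF but no indexable text was extracted.

    Mirrors the query: mime-type:application/pdf AND NOT content:*
    """
    looks_like_pdf = header.startswith(b"%PDF-")  # PDF files begin with %PDF-
    has_text = bool(extracted_text.strip())       # whitespace-only counts as empty
    return looks_like_pdf and not has_text
```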
Search for text-stripped items
Text-stripped items are items whose file type Nuix Workstation can identify but for which it has no routine to extract all text and metadata cleanly in accordance with the file type's API. The result is an item that is searchable, but whose text may be garbled or poorly formatted.
Note: When stripping text, Nuix Workstation extracts only US-ASCII characters (punctuation, 0-9, A-Z, and a-z). It also reads the data as UTF-16LE (a Unicode encoding used by Microsoft) to potentially extract more text.
To search for text-stripped items, use the following search syntax: flag:text_stripped
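Conceptually, text stripping resembles the Unix strings utility. The following minimal Python sketch approximates the behavior described in the note: it pulls runs of printable US-ASCII bytes and, separately, runs of the same characters encoded as UTF-16LE. The strip_text function and the four-character minimum run length are assumptions for illustration, not Nuix Workstation's actual parameters.

```python
import re
from typing import List, Tuple

MIN_RUN = 4  # assumed minimum run length, chosen for illustration only

# Runs of printable US-ASCII bytes (space through tilde).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_RUN)
# Runs of printable ASCII characters encoded as UTF-16LE (each byte followed by 0x00).
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_RUN)

def strip_text(data: bytes) -> Tuple[List[str], List[str]]:
    """Return (ASCII strings, UTF-16LE strings) found in raw bytes."""
    ascii_strings = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    utf16_strings = [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
    return ascii_strings, utf16_strings
```

Running the UTF-16LE pass in addition to the ASCII pass matters because a UTF-16LE string such as "Secret" is stored as S, 0x00, e, 0x00, and so on, which an ASCII-only pass breaks into single characters and discards.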
Types of text-stripped items
Text-stripped file types include the following (list is subject to change):
image/vnd.corel-draw
image/vnd.micrografx-designer
image/x-pict
application/vnd.adobe-photoshop
application/vnd.ms-shortcut
application/vnd.lotus-freelance
application/vnd.lotus-wordpro
application/vnd.borland-paradox
image/vnd.autocad-dwg
image/cgm
application/vnd.myob
application/x-js-taro
application/vnd.lotus-123
application/vnd.ms-works-ss
application/vnd.ms-works-wp
application/vnd.corel-slideshow
application/vnd.ms-visio
application/vnd.corel-quattro
application/vnd.corel-wordperfect
application/vnd.stardivision.calc
application/vnd.stardivision.draw
application/vnd.stardivision.impress
application/vnd.stardivision.math
application/vnd.stardivision.writer
application/x-hwp
application/octet-stream
Search for unrecognized items
Unrecognized items are items whose header Nuix Workstation did not recognize, so it was unable to assign a MIME-type. Nuix Workstation tags these items as application/octet-stream and strips their text. In addition to extracting the ASCII text, Nuix Workstation extracts all recognizable system metadata.
Note: As with text-stripped items, Nuix Workstation extracts only US-ASCII characters (punctuation, 0-9, A-Z, and a-z) and also reads the data as UTF-16LE (a Unicode encoding used by Microsoft) to potentially extract more text.
Unrecognized MIME-types
There are 4 potential unrecognized MIME-types:
XML
OLE2
TXT
Unknown binary
To search for unrecognized files, use the following search syntax: kind:unrecognized
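To illustrate how unrecognized content might fall into those four groups, the following Python sketch applies a simple heuristic: an OLE2 signature, printable text that starts with an angle bracket, other printable text, or anything else. The guess_unrecognized_kind function and its rules are hypothetical; Nuix Workstation's own classification may differ.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# Printable ASCII plus tab, line feed, and carriage return.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def guess_unrecognized_kind(data: bytes) -> str:
    """Sort an unrecognized item into XML, OLE2, TXT, or Unknown binary.

    Illustrative heuristic only.
    """
    if data.startswith(OLE2_MAGIC):
        return "OLE2"
    sample = data[:4096]  # inspect only the first 4 KiB
    if sample and all(b in PRINTABLE for b in sample):
        if sample.lstrip().startswith(b"<"):
            return "XML"
        return "TXT"
    return "Unknown binary"
```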
Search for unsupported items
Unsupported items are those from which Nuix Workstation was unable to extract any content or text. To search for unsupported items, use the following search syntax:
( has-embedded-data:0 AND has-text:0 AND has-image:0 AND NOT kind:multimedia ) OR ( mime-type:application/vnd.lotus-notes AND has-embedded-data:0 )
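The query has two branches joined by OR. The following minimal Python sketch spells out that logic against a hypothetical ItemProps record (not a Nuix API type) whose fields stand in for the has-embedded-data, has-text, has-image, kind, and mime-type search fields.

```python
from dataclasses import dataclass

@dataclass
class ItemProps:
    # Hypothetical, simplified view of the properties the query inspects.
    mime_type: str
    kind: str
    has_embedded_data: bool
    has_text: bool
    has_image: bool

def is_unsupported(item: ItemProps) -> bool:
    # Branch 1: nothing at all was extracted, and the item is not multimedia.
    nothing_extracted = (
        not item.has_embedded_data
        and not item.has_text
        and not item.has_image
        and item.kind != "multimedia"
    )
    # Branch 2: a Lotus Notes item with no embedded data, whatever its text or images.
    empty_notes_item = (
        item.mime_type == "application/vnd.lotus-notes"
        and not item.has_embedded_data
    )
    return nothing_extracted or empty_notes_item
```

The multimedia exclusion in the first branch keeps audio and video files, which often yield no text, from being reported as unsupported.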
See the Nuix Supported File Types document for the most current list of file types supported in Nuix Workstation.