OCR data

Optical Character Recognition (OCR) is a technology that identifies letters, numbers, and other characters, converting images or scanned paper documents into searchable electronic text. Using OCR, Nuix Workstation can extract text from PDF documents (including text in email PDF attachments, scanned documents, and pictures embedded in PDF documents) and from picture artifacts. This lets you view and search text that is normally locked inside images, which improves investigative capability for these media types.

Like processing and exporting, OCR is most effectively performed in Nuix Workstation using Worker Servers in a distributed network that hosts multiple Workers, where a Worker is one instance of the Nuix engine. If you run out of resources on your current machine, you can connect to a Worker Server so that remote Workers (that is, distributed Workers) join your job. This architecture requires at least two licenses with Nuix Worker capabilities. For how to configure and maintain such a network, see:

Configure settings for parallel processing

Nuix Workstation Guide to Configuring Distributed Workers
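
The idea of several Workers sharing one OCR job can be pictured with a small, generic sketch. It is not Nuix code and does not use the Nuix API: the function names (fake_ocr, run_ocr_job), the item names, and the use of local threads in place of remote Workers are all assumptions made purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import threading


def fake_ocr(item_name):
    """Hypothetical stand-in for an OCR call; returns placeholder text."""
    return f"text extracted from {item_name}"


def run_ocr_job(items, worker_count):
    """Share the items of one job among worker_count workers.

    Returns the extracted text per item and a count of how many
    items each worker handled.
    """
    handled_by = Counter()
    lock = threading.Lock()

    def work(item_name):
        text = fake_ocr(item_name)
        # Record which worker processed this item (guarded by a lock).
        with lock:
            handled_by[threading.current_thread().name] += 1
        return item_name, text

    # Each thread plays the role of one worker joined to the job.
    with ThreadPoolExecutor(max_workers=worker_count,
                            thread_name_prefix="worker") as pool:
        results = dict(pool.map(work, items))
    return results, handled_by


# Hypothetical item names standing in for scanned documents.
items = [f"scan_{i:03}.pdf" for i in range(8)]
results, handled_by = run_ocr_job(items, worker_count=3)
```

In this sketch, raising worker_count plays the same role as having more Workers join a job: the items are the same, but they are spread across more processing instances.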

Nuix Workstation Guide to OCR Processing

The Nuix Workstation Guide to OCR Processing explains how to perform OCR in Nuix Workstation. It covers:

An overview of how OCR works in Nuix Workstation

How to perform OCR during ingestion - from Data Processing Settings

How to perform OCR after ingestion - from the Results view

How to customize an OCR Profile

How to set OCR cache options

How to check the success or failure of OCRed items

Miscellaneous post-OCRing tasks

Worker Side Scripts for performing OCR

OCR license types and OCR Addon versions