OCR data
Optical Character Recognition (OCR) is a technology that identifies letters, numbers, and other characters in images or scanned paper documents and converts them into searchable electronic text. Using OCR, Nuix Workstation can extract text from PDF documents and from image items. In PDFs, this includes text in email PDF attachments, text in scanned documents, and text in pictures embedded in PDF documents. As a result, you can view and search text that would otherwise be locked inside images, which improves investigative capability for image-based media.
Like processing and exporting, OCR is performed most effectively in Nuix Workstation by using Worker Servers in a distributed network that hosts multiple Workers, where a Worker is one instance of the Nuix engine. If your current machine runs out of resources, you can connect to a Worker Server so that remote (distributed) Workers join your job. This architecture requires at least two licenses with Nuix Worker capabilities. For information on configuring and maintaining such a network, see:
Configure settings for parallel processing
Nuix Workstation Guide to Configuring Distributed Workers
Nuix Workstation Guide to OCR Processing
The Nuix Workstation Guide to OCR Processing describes OCR in Nuix Workstation in detail. It covers:
An overview of how OCR works in Nuix Workstation
How to perform OCR during ingestion (from the Data Processing Settings)
How to perform OCR after ingestion (from the Results view)
How to customize an OCR Profile
How to set OCR cache options
How to check the success or failure of OCRed items
Miscellaneous post-OCR tasks
Worker Side Scripts for performing OCR
OCR license types and OCR Addon versions