Perform OCR on worker items

For WorkerItems, you can use the following methods:

isPerformOCR() : Gets whether to perform OCR for the current item.

getOcrProfileName() : Gets the name of the OCR profile to use, if it has been set.

setPerformOcr(Boolean performOcr, String ocrProfileName) : Sets whether to perform OCR on the current item. These values override any other OCR processing settings that may be set in the processing task.

Note: Ensure the default OCR profile used in the script exists before running the script.
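The methods above can be combined. The following is a minimal sketch, not a documented Nuix recipe, that enables OCR while keeping any profile already assigned to the item and falling back to 'Default' otherwise. It assumes that getOcrProfileName() returns nil or an empty string when no profile has been set, which the descriptions above do not confirm. StubOcrItem and FALLBACK_PROFILE are hypothetical names introduced only so the sketch can be run outside Nuix.

```ruby
# Hypothetical stand-in for a Nuix worker item, so the sketch runs outside Nuix.
class StubOcrItem
  def initialize(profile_name = nil)
    @perform_ocr = false
    @ocr_profile_name = profile_name
  end

  def getOcrProfileName
    @ocr_profile_name
  end

  def setPerformOcr(perform_ocr, ocr_profile_name)
    @perform_ocr = perform_ocr
    @ocr_profile_name = ocr_profile_name
  end

  # Demo helper only; not part of the Nuix API.
  def ocr_enabled?
    @perform_ocr
  end
end

# Assumed fallback profile; it must exist before the script runs.
FALLBACK_PROFILE = 'Default'

def nuix_worker_item_callback(worker_item)
  profile = worker_item.getOcrProfileName()
  # Assumption: an unset profile is reported as nil or an empty string.
  profile = FALLBACK_PROFILE if profile.nil? || profile.empty?
  worker_item.setPerformOcr(true, profile)
end

# Dry run against the stub.
item = StubOcrItem.new
nuix_worker_item_callback(item)
puts item.getOcrProfileName   # prints "Default"
```

In a real worker side script, only the constant and the callback would be supplied; the stub class and the dry run are for experimentation.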

Set OCR profile for GIF file types

Use the following example to set a specific OCR profile for items of file type GIF (the example list also includes TIFF):

    # Define the MIME types you want to include in the OCR process
    $mimeTypesToOcr = ["image/gif", "image/tiff"]

    def nuixWorkerItemCallback(workerItem)
      # If the item being processed has a MIME type that matches our criteria.
      # getName() returns the MIME type (for example, "image/gif");
      # getLocalisedName() returns the display name and would not match the list.
      if $mimeTypesToOcr.include?(workerItem.sourceItem.type.getName())
        # Use the OCR profile titled 'Default' to run the OCR
        workerItem.setPerformOcr(true, 'Default')
      end
    end
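If different file types need different OCR profiles, the same idea can be expressed as a lookup table. This is a hedged sketch under stated assumptions: the profile name 'High Quality Scans' is hypothetical, the type's getName() is assumed to return the MIME type, and the Stub classes exist only so the sketch can be run outside Nuix.

```ruby
# MIME type => OCR profile name. 'High Quality Scans' is a hypothetical
# profile; every profile listed here must exist before the script runs.
OCR_PROFILE_BY_MIME_TYPE = {
  'image/gif'  => 'Default',
  'image/tiff' => 'High Quality Scans'
}.freeze

def nuix_worker_item_callback(worker_item)
  # Assumption: getName() on the item type returns the MIME type.
  mime_type = worker_item.getSourceItem().getType().getName()
  profile = OCR_PROFILE_BY_MIME_TYPE[mime_type]
  # Items whose type is not in the table keep their existing OCR settings.
  worker_item.setPerformOcr(true, profile) unless profile.nil?
end

# Hypothetical stand-ins for the Nuix objects, for a dry run outside Nuix.
class StubType
  def initialize(name)
    @name = name
  end

  def getName
    @name
  end
end

class StubSourceItem
  def initialize(mime_type)
    @type = StubType.new(mime_type)
  end

  def getType
    @type
  end
end

class StubImageItem
  attr_reader :ocr_call # records the arguments of the last setPerformOcr call

  def initialize(mime_type)
    @source_item = StubSourceItem.new(mime_type)
  end

  def getSourceItem
    @source_item
  end

  def setPerformOcr(perform_ocr, ocr_profile_name)
    @ocr_call = [perform_ocr, ocr_profile_name]
  end
end

tiff = StubImageItem.new('image/tiff')
nuix_worker_item_callback(tiff)
p tiff.ocr_call
```

A table keeps the type-to-profile decisions in one place, so adding a file type is a one-line change.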

Set script to display text in Text tab and Navigation pane

If an OCRed item displays its text in the Preview pane's Text tab, but the Navigation pane reports the item as having no available or stored text, use the following script to display the results correctly in the Navigation pane:

    def nuix_worker_item_callback(worker_item)
      stored_item = worker_item.getStoredItem()
      if stored_item.nil? || !stored_item.getTextObject().isAvailable()
        worker_item.addTag("WSS|TEXT|UNAVAILABLE")
      else
        worker_item.addTag("WSS|TEXT|AVAILABLE")
        item_text = stored_item.getTextObject().toString()
        if !item_text.nil? && !item_text.empty?
          worker_item.addTag("WSS|TEXT|FOUND")
        else
          worker_item.addTag("WSS|TEXT|NONE")
        end
      end
    end

Retrieve OCRed text if No Text found in Preview pane

When OCRed items (particularly images) are processed using a worker side script (WSS), the WSS can report that no text is available or stored even though the text is visible in the Text tab of the Preview pane. Conversely, an item selected in the Results pane can show "No text found" in the Text tab of the Preview pane even though text exists and the WSS reports it as "Text found".

When OCR is performed, any extracted text is saved with the stored item. The source item is not modified to contain or be associated with any text. Therefore, when you reload items in the case and want to view or use any text associated with an item, you must use the stored item. The stored item, unlike the source item, contains any text that was applied to the item (see Source items and stored items in the introduction to the Guide to Worker Side Scripting for details). Use the following script to retrieve the text correctly:

    def nuix_worker_item_callback(worker_item)
      stored_item = worker_item.getStoredItem()
      if stored_item.nil? || !stored_item.getText().isAvailable()
        worker_item.addTag("WSS|TEXT|UNAVAILABLE")
      else
        worker_item.addTag("WSS|TEXT|AVAILABLE")
        item_text = stored_item.getText().toString()
        if !item_text.nil? && !item_text.empty?
          worker_item.addTag("WSS|TEXT|FOUND")
        else
          worker_item.addTag("WSS|TEXT|NONE")
        end
      end
    end

Note that worker_item.getStoredItem() returns null (nil in Ruby) if there is no associated stored item, which is why the script checks stored_item.nil? before reading any text.
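To make the missing stored item case easy to check, the tagging decision can be separated from the callback. The following is a minimal sketch under stated assumptions, not the documented approach: text_status_tags is a hypothetical helper, it additionally treats a nil text object as unavailable, and the Stub classes are stand-ins so the logic can be run outside Nuix.

```ruby
# Returns the tags the scripts above would apply, without touching the item.
# text_status_tags is a hypothetical helper name, not part of the Nuix API.
def text_status_tags(stored_item)
  # No stored item at all: report the text as unavailable.
  return ['WSS|TEXT|UNAVAILABLE'] if stored_item.nil?

  text = stored_item.getText()
  # Assumption: a nil text object is treated the same as unavailable text.
  return ['WSS|TEXT|UNAVAILABLE'] if text.nil? || !text.isAvailable()

  content = text.toString()
  found = !content.nil? && !content.empty?
  ['WSS|TEXT|AVAILABLE', found ? 'WSS|TEXT|FOUND' : 'WSS|TEXT|NONE']
end

def nuix_worker_item_callback(worker_item)
  text_status_tags(worker_item.getStoredItem()).each do |tag|
    worker_item.addTag(tag)
  end
end

# Hypothetical stand-ins for the Nuix objects, for a dry run outside Nuix.
class StubText
  def initialize(content, available = true)
    @content = content
    @available = available
  end

  def isAvailable
    @available
  end

  def toString
    @content
  end
end

class StubStoredItem
  def initialize(text)
    @text = text
  end

  def getText
    @text
  end
end

class StubTaggedItem
  attr_reader :tags

  def initialize(stored_item)
    @stored_item = stored_item
    @tags = []
  end

  def getStoredItem
    @stored_item
  end

  def addTag(tag)
    @tags << tag
  end
end

# An item with no stored item receives only the UNAVAILABLE tag.
orphan = StubTaggedItem.new(nil)
nuix_worker_item_callback(orphan)
p orphan.tags
```

Keeping the decision in a function that returns tags means each case can be checked with plain objects before the script is used in a processing task.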