Run Worker Side Scripts to perform OCR
This section covers the following example Ruby scripts you can use to:
Perform OCR on Worker items
Retrieve OCRed text if 'No text found' appears in the Preview pane
Perform OCR on Worker items
For WorkerItems, you can use the following methods:
isPerformOcr(): Returns whether OCR will be performed on the current item.
getOcrProfileName(): Returns the name of the OCR Profile to use, if one has been set.
setPerformOcr(Boolean performOcr, String ocrProfileName): Sets whether to perform OCR on the current item using the named OCR Profile, overriding any other OCR settings in the processing task.
Before you run the script, ensure the OCR Profile you use in the script exists.
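As a minimal sketch of how the two getters and the setter can work together, the callback below leaves an item alone when OCR is already enabled for it and otherwise turns OCR on with the 'Default' profile. It assumes isPerformOcr() reflects the OCR setting already in effect for the item. StubWorkerItem is a hypothetical stand-in that implements only these three methods so the logic can be tried in plain Ruby; in a real WSS, Nuix passes its own WorkerItem to nuixWorkerItemCallback.

```ruby
# Hypothetical stand-in for nuix.WorkerItem: implements only the three
# OCR methods so this sketch runs in plain Ruby outside Nuix.
class StubWorkerItem
  def initialize(perform_ocr = false, profile_name = nil)
    @perform_ocr = perform_ocr
    @profile_name = profile_name
  end

  def isPerformOcr
    @perform_ocr
  end

  def getOcrProfileName
    @profile_name
  end

  def setPerformOcr(perform_ocr, profile_name)
    @perform_ocr = perform_ocr
    @profile_name = profile_name
  end
end

def nuixWorkerItemCallback(workerItem)
  # Skip items for which OCR is already enabled (assumption: isPerformOcr
  # reports the setting currently in effect for this item)
  return if workerItem.isPerformOcr()
  # Otherwise enable OCR with the 'Default' profile, which must exist
  workerItem.setPerformOcr(true, 'Default')
end

# Usage with the stand-in
preset = StubWorkerItem.new(true, 'Custom')
nuixWorkerItemCallback(preset)
p preset.getOcrProfileName   # prints "Custom" (left unchanged)

fresh = StubWorkerItem.new
nuixWorkerItemCallback(fresh)
p fresh.getOcrProfileName    # prints "Default"
```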
The following example applies the OCR Profile named 'Default' to GIF and TIFF images only:
# Define the MIME types you want to include in the OCR process
$mimeTypesToOcr = ["image/gif", "image/tiff"]

def nuixWorkerItemCallback(workerItem)
  # getName() returns the item's MIME type; getLocalisedName() returns a display name
  mimeType = workerItem.sourceItem.type.getName()
  # If the item being processed has a MIME type that matches our criteria
  if $mimeTypesToOcr.include?(mimeType)
    # Use the OCR Profile titled 'Default' to run the OCR
    workerItem.setPerformOcr(true, 'Default')
  end
end
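Matching on MIME types requires ItemType.getName(), which returns the MIME type (for example image/gif), rather than getLocalisedName(). The sketch below is one way to dry-run such a filter before using it in a processing job. StubType, StubSourceItem and StubWorkerItem are hypothetical stand-ins that implement only the getters the callback calls; the callback uses the explicit Java getter names, which JRuby also accepts on real Nuix objects.

```ruby
require 'set'

# MIME types to send to OCR; a Set gives constant-time lookups
MIME_TYPES_TO_OCR = Set["image/gif", "image/tiff"].freeze

# Hypothetical stand-ins for nuix.ItemType, nuix.SourceItem and nuix.WorkerItem
class StubType
  def initialize(mime_type)
    @mime_type = mime_type
  end

  def getName
    @mime_type
  end
end

class StubSourceItem
  def initialize(mime_type)
    @type = StubType.new(mime_type)
  end

  def getType
    @type
  end
end

class StubWorkerItem
  attr_reader :ocr_request

  def initialize(mime_type)
    @source_item = StubSourceItem.new(mime_type)
    @ocr_request = nil # stays nil unless setPerformOcr is called
  end

  def getSourceItem
    @source_item
  end

  def setPerformOcr(perform_ocr, profile_name)
    @ocr_request = [perform_ocr, profile_name]
  end
end

def nuixWorkerItemCallback(workerItem)
  mime_type = workerItem.getSourceItem().getType().getName()
  workerItem.setPerformOcr(true, 'Default') if MIME_TYPES_TO_OCR.include?(mime_type)
end

# Dry run: only the GIF and TIFF items should request OCR
%w[image/gif image/tiff image/jpeg].each do |mime_type|
  item = StubWorkerItem.new(mime_type)
  nuixWorkerItemCallback(item)
  puts "#{mime_type}: #{item.ocr_request.inspect}"
end
```

The JPEG item prints nil because the callback never calls setPerformOcr for it. An Array with include?, as in the example above, behaves the same; a Set only matters for long type lists.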
Retrieve OCRed text if 'No text found' appears in the Preview pane
When items (particularly images) are OCRed during processing with a worker-side script (WSS), the WSS can report that no text was available or stored, even though the text appears in the Text tab of the Preview pane. Conversely, the Text tab of the Preview pane can show "No text found" for an item selected in the Results pane, even though there is text that should display and the WSS reported it as "Text found".
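Use the following script to retrieve the text from the stored item rather than the source item. (See What's the difference between source items and stored items? in the introduction to the Guide to Worker Side Scripting for details.) The script calls worker_item.stored_item, the JRuby form of WorkerItem.getStoredItem(), to access the nuix.Item stored for the current nuix.WorkerItem, which becomes available on reload. However, getStoredItem() may return null (nil in Ruby) if there is no associated stored item, so the script checks for nil before reading the text. Each item is tagged WSS|TEXT|UNAVAILABLE or WSS|TEXT|AVAILABLE, and available items are additionally tagged WSS|TEXT|FOUND or WSS|TEXT|NONE depending on whether the text is empty.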
def nuix_worker_item_callback(worker_item)
  # Read from the stored item, which may be nil if none is associated
  stored_item = worker_item.stored_item
  if stored_item.nil? || !stored_item.text_object.isAvailable()
    worker_item.addTag("WSS|TEXT|UNAVAILABLE")
  else
    worker_item.addTag("WSS|TEXT|AVAILABLE")
    item_text = stored_item.text_object.toString()
    # Distinguish real text from an empty text object
    if !item_text.nil? && !item_text.empty?
      worker_item.addTag("WSS|TEXT|FOUND")
    else
      worker_item.addTag("WSS|TEXT|NONE")
    end
  end
end
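One way to check this tagging logic outside Nuix is to move the decisions into a plain Ruby function that returns the tags, so the callback only applies them. This is a sketch of that refactoring, not a Nuix-provided pattern: text_tags_for is a hypothetical helper name, and StubText and StubItem are hypothetical stand-ins for nuix.Text and nuix.Item that implement only the calls the helper makes.

```ruby
# Hypothetical helper: decides which WSS|TEXT tags apply to a stored item.
# stored_item is a nuix.Item, or nil when there is no associated stored item.
def text_tags_for(stored_item)
  return ["WSS|TEXT|UNAVAILABLE"] if stored_item.nil? || !stored_item.text_object.isAvailable()

  item_text = stored_item.text_object.toString()
  found = !item_text.nil? && !item_text.empty?
  ["WSS|TEXT|AVAILABLE", found ? "WSS|TEXT|FOUND" : "WSS|TEXT|NONE"]
end

# The callback only applies the tags the helper chose
def nuix_worker_item_callback(worker_item)
  text_tags_for(worker_item.stored_item).each { |tag| worker_item.addTag(tag) }
end

# Hypothetical stand-ins for nuix.Text and nuix.Item, for trying the helper outside Nuix
StubText = Struct.new(:available, :text) do
  def isAvailable
    available
  end

  def toString
    text
  end
end
StubItem = Struct.new(:text_object)

p text_tags_for(nil)
# prints ["WSS|TEXT|UNAVAILABLE"]
p text_tags_for(StubItem.new(StubText.new(true, "")))
# prints ["WSS|TEXT|AVAILABLE", "WSS|TEXT|NONE"]
p text_tags_for(StubItem.new(StubText.new(true, "Invoice 42")))
# prints ["WSS|TEXT|AVAILABLE", "WSS|TEXT|FOUND"]
```

Keeping the helper free of side effects means each of the four outcomes (no stored item, unavailable text, empty text, real text) can be checked without a processing job.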