Match on Word Boundaries with FindWord

Keystroke and clipboard event data typically contains sentences or paragraphs of text written in whatever language is in use on the endpoint. In rules that match keywords against the data fields of these events, the strstr function can be imprecise because it searches for a matching substring anywhere within the data, without regard to word boundaries. For example, consider the two sentences below.

“That's why he wants to quit.”

“That's quite a result.”

The rule

alert when strstr(keystroke.keydata, "quit", false);

will match both sentences because the character sequence “quit” appears in each of them. The findword function can produce more precise matches than strstr on natural-language text because it matches only along word boundaries. The rule

alert when findword(keystroke.keydata, "quit", false);

will match only sentence one above, because there “quit” appears as a bounded word, preceded by a space and followed by a period. It will not match sentence two, in which “quit” appears only as a substring of another word. The word boundary analysis performed by findword relies on ICU (International Components for Unicode), the Unicode Consortium’s open-source library, which implements robust, language-specific rules for identifying word boundaries.
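The difference between the two kinds of matching can be reproduced outside the rule engine. The Python sketch below is only an illustration, not how strstr or findword are implemented: the helper names substring_match and word_match are hypothetical, the regular-expression \b boundary is a much simpler stand-in for ICU's language-specific word-break rules, and the third argument of the rule functions is not modeled.

```python
import re

# The two example sentences from the text above.
SENTENCES = [
    "That's why he wants to quit.",
    "That's quite a result.",
]


def substring_match(text: str, keyword: str) -> bool:
    # Substring search: true if keyword occurs anywhere in text,
    # regardless of what characters surround it.
    return keyword in text


def word_match(text: str, keyword: str) -> bool:
    # Whole-word search: the keyword must start and end on a regex
    # word boundary. This approximates word-boundary matching; it is
    # not ICU and will not handle every language correctly.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text) is not None


for sentence in SENTENCES:
    print(substring_match(sentence, "quit"), word_match(sentence, "quit"), sentence)
# True True That's why he wants to quit.
# True False That's quite a result.
```

The substring search reports a hit in both sentences, while the whole-word search rejects “quite” because the “t” is followed by another word character rather than a boundary.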