Search query syntax for Document Content

Document Content is text that is extracted and indexed by dtSearch.

One way to search Document Content is to build an advanced search on the Search page. For the Document Content field, the available expression operators are contains, does not contain, and has a value. This method uses dtSearch syntax, which provides many additional options, as described in the table later in this topic.

Search query on the Search page showing the Document Content field and the search expression operators.

A second way to search Document Content is to use the quick search box on the navigation bar, which includes an option to search Document Content.

Quick search menu showing all options and Document Content selected.

The syntax in the following table applies to both of these methods of searching Document Content. For a full explanation of the functionality in the quick search box, see “Perform a quick search” in the Nuix Discover help.

Document Content search operators

The following table describes the syntax and operators that you can use to query Document Content. Nuix Discover supports Unicode characters.

Note: Because dtSearch does not index punctuation, punctuation is not searchable unless your administrator changes the alphabet file. The same applies to the following characters, which are reserved as operators: %, #, &, ?, =, and ~.
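To picture why punctuation and the reserved characters cannot be found, the following Python sketch models an index that stores only runs of letters and digits. The word rule is an assumption standing in for the default alphabet file, not dtSearch's actual configuration, and `index_words` is a hypothetical helper name.

```python
import re

# Assumed word rule: a word is a run of Unicode letters or digits.
# Everything else (punctuation and %, #, &, ?, =, ~) acts as a word break,
# so it never reaches the index. A real alphabet file can change this.
WORD = re.compile(r"[^\W_]+")


def index_words(text: str) -> list[str]:
    """Return the lowercase words that this model would store in the index."""
    return [match.group(0).lower() for match in WORD.finditer(text)]


print(index_words("Q&A: e-mail costs $5.00 (50% off)?"))
# ['q', 'a', 'e', 'mail', 'costs', '5', '00', '50', 'off']
```

In this model, e-mail is indexed as the two words e and mail, and the ampersand and percent sign disappear entirely, so a query can only ever match the letter and digit runs.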

Operator

Description

Example query and results

AND

Requires all terms connected by AND to be present in the document.

A search for apple pie AND poached pear finds documents that contain both apple pie and poached pear.

OR

Requires at least one of the terms connected by OR to be present in the document.

A search for apple OR pear finds documents that contain the word apple, documents that contain the word pear, and documents that contain both words.

NOT

Excludes documents that contain the term that follows NOT.

A search for NOT banana finds documents that do not contain the word banana.

A search for apple AND NOT pear finds documents that contain the word apple and do not contain the word pear.

not w/0

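Searches for a word or phrase, excluding any match that is also another specified word or phrase (w/0 means within zero words). This is useful for removing specific words from a wildcard search.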

A search for Word?? not w/0 Word04 finds documents that include Word01, Word02, or Word03, but does not treat Word04 as a match.

Nuix Discover finds all words in the index that match Word?? and then excludes the word that follows the not w/0 operator (in this case, Word04).

NEAR

The word near is treated as a search term, not as an operator. To find terms that are near each other, use a proximity search (w/n or pre/n).

Not applicable.

Words and phrases

Quotation marks are not required when searching a phrase.

Noise words, such as if and the, are treated as placeholders that match any word.

To search for an exact phrase that includes the words and, or, or not, enclose the phrase in quotation marks.

Punctuation inside a search word is treated as a space.

A search for tart apple pie finds the phrase tart apple pie, but not documents that contain only apple pie.

A search for bill of sale finds documents containing bill, any intervening word, and sale.

A search for "apple and pie" (in quotation marks) finds the exact phrase apple and pie, but not apple pie.

( )

Use parentheses with searches that have two or more connectors. If you do not use parentheses, dtSearch evaluates OR operators before AND operators.

A search for apple AND (pear OR orange) finds documents that contain apple and either pear or orange. Because OR is evaluated before AND, the search apple AND pear OR orange, without parentheses, returns the same result.

?

Wildcard that matches any single character.

A search for appl? finds apple and apply.

*

Wildcard that matches any number of characters. Use at the beginning or end of a search term.

A search for *ppl* finds application and supply.

=

Wildcard that matches a single digit.

Use one equals sign (=) for each digit that you want to match.

A search for ==== finds four-digit numbers such as 1234.

~

Stemming search: Finds grammatical variations of a word.

A search for click~ finds clicked and clicking.

%

Fuzzy search is useful to find misspelled words or to search faulty text generated by optical character recognition (OCR).

Each percent sign (%) in a search term allows one character to differ.

Characters before the first percent sign must match exactly.

A search for capit%al finds capital, capitol, and capita.

A search for int%%ernet finds internet and intranet.

 

x w/n y

Proximity search:

In a content search, x w/n y finds the term (x) within a specified number of words (n) of another term (y).

In a coding search (database search), w/n is treated as a proximity search when n is 50 or less. When n is greater than 50, w/n is treated as an AND operator, so the search finds documents in which both words exist in the text.

At least one of the two expressions connected by w/n must be a word, a phrase, or a group of words and phrases connected by OR.

The x NOT w/n y operator finds the term x except where it occurs within n words of y.

A search for apple w/5 pear finds apple and pear where apple appears within five words of pear.

A search for (apple and banana) w/5 (pear or cherry) finds documents that contain both apple and banana within five words of either pear or cherry.

A search for apple NOT w/5 pear finds apple, except where apple is within five words of pear.

x pre/n y

Proximity search:

Finds the term (x) within a specified number of words (n) of another term (y). The first term must occur before the second term.

At least one of the two expressions connected by pre/n must be a word, a phrase, or a group of words and phrases connected by OR.

A search for apple pre/5 pear finds documents in which apple occurs no more than five words before pear.

A search for (apple and banana) pre/5 (pear or cherry) finds documents that contain both apple and banana within five words before either pear or cherry.

xfirstword

xlastword

Built-in search words that indicate the beginning or end of a document, as follows:

xfirstword: Marks the beginning of a document.

xlastword: Marks the end of a document.

Combine xfirstword and xlastword with proximity operators to limit a search to the beginning or end of a document.

A search for apple w/5 xfirstword finds apple when it appears within five words of the beginning of a document.
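The term-matching and proximity rules in the table can be approximated in a few lines of Python. This is a minimal model under stated assumptions, not dtSearch's implementation: a document is a list of lowercase words, distance is the difference between word positions (so w/0 means the same word), xfirstword and xlastword are virtual positions just before the first word and just after the last word, and % is modeled as edit distance on the part of the word after the exact-match prefix. Stemming (~) and grouped expressions such as (apple AND banana) are not modeled, and all function names are hypothetical.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def term_matches(term: str, word: str) -> bool:
    """Does one search term (plain, ?, *, =, or %) match one indexed word?"""
    term, word = term.lower(), word.lower()
    if "%" in term:
        # Assumed model: characters before the first % must match exactly;
        # each % allows one differing character in the rest of the term.
        prefix = term[: term.index("%")]
        rest = term[len(prefix):].replace("%", "")
        return word.startswith(prefix) and (
            edit_distance(word[len(prefix):], rest) <= term.count("%")
        )
    # ? = any single character, * = any run of characters, = = one digit.
    wildcards = {"?": ".", "*": ".*", "=": "[0-9]"}
    regex = "".join(wildcards.get(ch, re.escape(ch)) for ch in term)
    return re.fullmatch(regex, word) is not None


def positions(term: str, words: list[str]) -> list[int]:
    """Word positions that match term; xfirstword/xlastword are virtual."""
    if term == "xfirstword":
        return [-1]
    if term == "xlastword":
        return [len(words)]
    return [i for i, w in enumerate(words) if term_matches(term, w)]


def within(x: str, y: str, n: int, words: list[str]) -> bool:
    """x w/n y: some x is at most n positions from some y, in either order."""
    ys = positions(y, words)
    return any(abs(i - j) <= n for i in positions(x, words) for j in ys)


def precedes(x: str, y: str, n: int, words: list[str]) -> bool:
    """x pre/n y: some x comes before some y, at most n positions earlier."""
    ys = positions(y, words)
    return any(0 < j - i <= n for i in positions(x, words) for j in ys)


def not_within(x: str, y: str, n: int, words: list[str]) -> bool:
    """x NOT w/n y: some occurrence of x has no y within n positions."""
    ys = positions(y, words)
    return any(all(abs(i - j) > n for j in ys) for i in positions(x, words))


doc = "a tart apple pie with a poached pear and a bowl of cherries".split()
print(term_matches("capit%al", "capitol"))      # True
print(term_matches("int%%ernet", "intranet"))   # True
print(within("apple", "pear", 5, doc))           # True (5 positions apart)
print(precedes("pear", "apple", 5, doc))         # False (pear comes after)
print(within("apple", "xfirstword", 5, doc))     # True
```

Measuring distance as a position difference is only one reading of "within n words"; if the real count is based on intervening words instead, each comparison shifts by one.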

 

