Building Lifecycle Applications for v5.0
The Discovery DocScript functions work with the Discovery module and are typically used within the Classification Rules and Indexing Rules.
Understanding Search Results Blocks
DocuNECT has a powerful document architecture called DocInfo that stores different characteristics of the document. Discovery functions return a Search Result Block, a DocScript array that is returned by the document indexing functions and has the following elements:
- Position 0 - Extracted value from the text.
- Position 1 - The page number the value was extracted from.
- Position 2 - OCR block x position.
- Position 3 - OCR block y position.
- Position 4 - OCR block width.
- Position 5 - OCR block height.
These values are part of the information displayed against the field and its zoning, as shown in the screenshot:

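The element positions listed above can be illustrated with a short sketch. It is written in Python rather than DocScript, purely to show how the six positions relate to each other; the helper name `describe_block` and the sample values are hypothetical and not part of DocuNECT.

```python
# Index positions of a Search Result Block, as listed above.
VALUE, PAGE, X, Y, WIDTH, HEIGHT = range(6)


def describe_block(block):
    """Return a readable summary of a Search Result Block-style list.

    Hypothetical helper for illustration; not a DocScript function.
    """
    # The far corner of the OCR block is the origin plus its size.
    x2 = block[X] + block[WIDTH]
    y2 = block[Y] + block[HEIGHT]
    return (f"'{block[VALUE]}' on page {block[PAGE]}, "
            f"zone ({block[X]}, {block[Y]}) to ({x2}, {y2})")


# Made-up sample values, for illustration only.
sample = ["LN-004512", 1, 120, 340, 85, 14]
print(describe_block(sample))
```

The x and y positions together with the width and height describe the rectangle that the extracted value occupies on the page.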
Classification Functions
The following table details the functions related to the automated Classification and Indexing rules:
Function | Description |
[Classification.ApplyRules](pagetext) |
This command is run automatically when Execute Classification Rules is set to Yes in the lifecycle. To control when the rules are run, disable the flag and call this command in the DocScript. Parameters
|
Indexing Functions (Finding Information in Documents)
The following table details the functions related to the automated Classification and Indexing rules:
Function | Description |
[Indexing.ApplyRules]() | This command is run automatically when Execute Indexing Rules is set to Yes in the lifecycle. To control when the rules are run, disable the flag and call this command in the DocScript. |
[Indexing.FindText](string, mode, page) |
Finds text within a specific page. Returns the Search Result Block array for the text found. Parameters
|
[Indexing.FindAround](tag, above, below, left, right, removetag, includeintersections) |
Finds the text around an anchor. For example, to find a Loan No., you can search for the label "Loan No" and then look at the text above, below, to the left, and to the right of the label to extract the value. Returns the Search Result Block array for the text found. Parameters
|
[Indexing.FindBetween](string1,string2) |
Finds text between two anchors and returns the Search Result Block array for the text found. Parameters
|
[Indexing.FindValue](anchordictionary, formatdictionary, samplevalue, wholepage) |
This function is designed to get specific values from documents, such as Invoice Nos, Purchase Order Nos, and Order Nos. For more unstructured text extraction scenarios, use the FindText, FindBetween, or FindAround functions instead. Returns the Search Result Block array for the text found. The anchor used is positioned at the end of the array. Parameters
This parameter is optional, but is useful to target specific text blocks in the document. If this value is not specified then the whole page text is used.
|
[Indexing.GetOCRBlocksInRegion](pagenumber, rectangle, includeintersections) |
Returns text found on pagenumber at the specified rectangle. Parameters
Best used in conjunction with one of the Find functions |
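The anchor-based and region-based lookups described in this table can be pictured as rectangle tests over OCR blocks. The sketch below is in Python, not DocScript, and does not reproduce the DocuNECT implementation. The `Block` type, both function names, the reading of above/below/left/right as distances, and the convention that y grows down the page are all assumptions made for illustration.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical OCR block: text plus its rectangle on the page."""
    text: str
    x: float
    y: float
    width: float
    height: float


def blocks_in_region(blocks, rect, include_intersections=False):
    """Return blocks fully inside rect = (x, y, width, height).

    With include_intersections=True, blocks that only overlap rect are kept too.
    """
    rx, ry, rw, rh = rect
    found = []
    for b in blocks:
        inside = (b.x >= rx and b.y >= ry and
                  b.x + b.width <= rx + rw and b.y + b.height <= ry + rh)
        overlaps = (b.x < rx + rw and b.x + b.width > rx and
                    b.y < ry + rh and b.y + b.height > ry)
        if inside or (include_intersections and overlaps):
            found.append(b)
    return found


def find_around(blocks, anchor, above=0, below=0, left=0, right=0,
                include_intersections=False):
    """Grow the anchor rectangle by the given margins and return nearby text.

    Assumes y increases down the page, so "above" extends toward smaller y.
    The anchor block itself is left out of the result.
    """
    rect = (anchor.x - left, anchor.y - above,
            anchor.width + left + right, anchor.height + above + below)
    hits = blocks_in_region(blocks, rect, include_intersections)
    return [b.text for b in hits if b is not anchor]


# Made-up layout: a label, its value to the right, and two other blocks.
label = Block("Loan No", 100, 200, 60, 12)
value = Block("LN-004512", 170, 200, 80, 12)
partial = Block("Branch 7", 240, 200, 60, 12)
other = Block("Date", 100, 400, 40, 12)
page_blocks = [label, value, partial, other]

print(find_around(page_blocks, label, right=100))
print(find_around(page_blocks, label, right=100, include_intersections=True))
```

In this sample layout the first call picks up only the block that lies completely inside the search area, while the second also picks up the block that merely crosses its edge, which is the kind of distinction an includeintersections flag suggests.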
Page Functions
The following table details the functions related to document pages:
Function | Description |
[Page.GetSize](page) |
Returns the page size as an array with two values: Position 1: Page Width Parameters
|
[Page.GetOrientation](page) |
Returns the page orientation in degrees. Parameters
|
Match Functions
The following table details the functions related to (fuzzy) matching text within a string.
Function | Description |
[Match.FuzzySearch](searchvalue, intext, difference, ignorecase) |
This function performs a fuzzy text search that tolerates a given number of differing characters. For example, when a poor-quality document is OCR'd, some text values may be misread: the phrase "The dog is in the bed" could be mis-OCR'd as "The dog is in the bel". A search for the word "bed" would then fail. With a difference of 1, the search also accepts a match that differs from the word by one character. Another use of this function is searching for a name that may be written slightly differently, for example searching for "John Smith" when the name appears in the document as "John A. Smith". Parameters
|
[Match.GetValueRegEx](text) |
This will convert the text passed in to a regular expression. Parameters
|
[Match.GetDateRegEx](datetext, delimiter) |
This will convert the date text passed in to a regular expression. In poor-quality documents, dates can be misread. For example, with dates formatted as "01/01/2019", the "/" can be misread as "1", which invalidates the date. This function builds a regular expression that helps find and extract date values. Parameters
|
[Match.GetDateRegEx](sampledate, dateextracted) |
This validates a date extracted against a sample date. Parameters
|
[Match.WordToNumber](string) |
Converts a number written as a word to its numeric value, e.g. eight -> 8. Parameters
|
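The fuzzy search and date expression ideas in this table can be sketched as follows. This is an illustrative Python sketch, not DocScript and not the DocuNECT implementation: the function names `fuzzy_search` and `date_regex`, the use of edit distance as the measure of "difference", and the rule that each delimiter may also be read as "1" are assumptions based on the descriptions above.

```python
import re


def fuzzy_search(search_value, in_text, difference=0, ignore_case=True):
    """Return True if some part of in_text is within `difference` edits
    (insertions, deletions, substitutions) of search_value."""
    if ignore_case:
        search_value, in_text = search_value.lower(), in_text.lower()
    # prev[i]: fewest edits matching search_value[:i] against text
    # ending at the previous position.
    prev = list(range(len(search_value) + 1))
    best = prev[-1]
    for ch in in_text:
        cur = [0]  # a match may start anywhere in the text
        for i, pc in enumerate(search_value, start=1):
            cur.append(min(prev[i] + 1,                 # extra character in the text
                           cur[i - 1] + 1,              # character missing from the text
                           prev[i - 1] + (pc != ch)))   # match or substitution
        prev = cur
        best = min(best, prev[-1])
    return best <= difference


def date_regex(date_text, delimiter="/", misreads="1"):
    """Build a pattern from a sample date: any digit matches any digit,
    and each delimiter also matches its assumed OCR misreads."""
    parts = []
    for ch in date_text:
        if ch == delimiter:
            parts.append("[" + re.escape(delimiter + misreads) + "]")
        elif ch.isdigit():
            parts.append(r"\d")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


print(fuzzy_search("bed", "The dog is in the bel", difference=0))
print(fuzzy_search("bed", "The dog is in the bel", difference=1))
print(fuzzy_search("John Smith", "Signed by John A. Smith", difference=3))

pattern = date_regex("01/01/2019", "/")
print(bool(re.fullmatch(pattern, "01101/2019")))
```

Under this measure, "John A. Smith" needs a difference of 3 to match "John Smith", because the three characters "A. " have to be skipped. A larger difference finds more variants but also raises the chance of false matches, so it is usually worth keeping the value as small as the document quality allows.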