Lifecycle Definitions - Process Tab in v5.2


The Process tab works on documents with a status of Created. Make sure that both Upload Status and Status After Capture are set to Created. The settings on this tab drive the Document Processing connector.

[Screenshot: Process tab (processtab.png)]

Settings Sub-Tab

The following table provides a description of the fields in the Process > Settings sub-tab.

Field Description
Separate Document

Indicates whether the document needs to be separated. The following methods are supported:

  • No - No document separation is required.
  • With DocuNECT Barcode Separator Sheets - Detects and separates using DocuNECT Barcode Separator Sheets.
  • With T-Patch Separator Sheets - Detects and separates using T-Patch sheets.
  • With DocuNECT Barcode or Separator Sheets - Detects and separates using DocuNECT Barcode or T-Patch sheets.
  • With DocuNECT Barcode with Sticky Field T-Patch Separator Sheets - This detects the barcode and applies the sticky fields if required.
Leave Separator Sheet with Document

Determines whether a separator sheet found in the document is left in the document or removed.
Remove Blank Pages

Removes blank pages from image-based documents.

  • No - Does not remove blank pages.
  • Yes - Removes blank pages.
Classify Document

Determines whether the document needs to be classified before indexing and if so what method of classification is required:

  • No - No classification is required.
  • Manual Classification - Manual classification by the user is required.
  • Auto-Classification - Documents are classified automatically, with no user intervention. However, if a document falls below the Classification Confidence, the batch will need to be reviewed by a user.
  • Auto-Classification followed by user validation - Auto classification is executed and the batch then goes into user validation regardless of the confidence values.
Classification Confidence

Used in conjunction with the auto-classification rules to set the success threshold. A classification confidence below this threshold requires user intervention.
Index Document

Determines whether the document requires indexing:

  • No - Document indexing is not required.
  • Manual Indexing - Manual indexing is required.
  • Auto-Indexing - Documents are indexed automatically, with no user intervention. However, if a document falls below the Auto-Index Confidence, the batch will need to be reviewed by a user.
  • Auto-Indexing followed by user verification - Auto indexing is executed and the batch then goes into user validation regardless of the confidence values.
Auto-Index Confidence

Used in conjunction with the auto-indexing rules to set the success threshold. An index extraction confidence below this threshold requires user intervention.
Verify Document

Determines whether the batch is placed in user verification after indexing:

  • No - No user verification is required.
  • Yes - User verification is required.
Document indexes will be extracted from text

Determines whether the index values will be extracted from the actual text of the document. This setting is similar to Document Is Text Searchable. Together, the two settings control when the extraction/OCR process takes place in the lifecycle.

  • No - No index extraction is required.
  • Yes - Index extraction is required. All text will be extracted and images (TIFF or Adobe PDF) will be OCR'd.
  • First Page - Indexes are extracted from the first page only, which reduces unnecessary OCR'ing.
  • First X Pages - Indexes are extracted from the first X pages. When this option is selected, the X: field becomes enabled, allowing you to define the number of pages that include the indexes. This setting can also help reduce unnecessary OCR'ing.
Document Is Text Searchable

Determines whether the document is text-searchable.

  • No - No full-text search is required.
  • Yes - Full-text search is required. All text will be extracted and images (TIFF or Adobe PDF) will be OCR'd.
  • First Page - Indicates that the relevant information is on the first page, which reduces unnecessary OCR'ing.
Document Requires a PDF Rendition

Determines whether the document requires a PDF rendition.
OCR Engine

There are currently two OCR engines used by DocuNECT.

  • Standard - Standard OCR engine.
  • Enhanced - Higher-quality engine that comes with the Discovery module.
Status After Processing

Determines the final status:

  • Lifecycle Complete - This is typically used for DocuNECT Capture.
  • Distributed - This is typically used for lifecycle applications where the documents reside in DocuNECT.
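
As an illustration of how the Classify Document and Index Document options interact with their confidence thresholds, and how the page options limit OCR, the following Python sketch may help. It is not DocuNECT code: the Mode enum and the needs_user_review and pages_to_ocr functions are hypothetical names, and the sketch assumes that the confidence and the threshold are expressed on the same numeric scale.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-ins for the Classify Document / Index Document options."""
    NO = "No"
    MANUAL = "Manual"
    AUTO = "Auto"
    AUTO_THEN_USER = "Auto followed by user validation"


def needs_user_review(mode, confidence, threshold):
    """Return True when the batch should go to a user for this step."""
    if mode is Mode.NO:
        return False  # the step is not required at all
    if mode in (Mode.MANUAL, Mode.AUTO_THEN_USER):
        return True  # always reviewed, regardless of confidence
    # Mode.AUTO: review only when confidence is below the threshold
    return confidence < threshold


def pages_to_ocr(option, page_count, x=1):
    """Return the zero-based page indexes that would be read for index extraction."""
    if option == "No":
        return []
    if option == "Yes":
        return list(range(page_count))
    if option == "First Page":
        return list(range(min(1, page_count)))
    if option == "First X Pages":
        return list(range(min(max(x, 0), page_count)))
    raise ValueError(f"unknown option: {option!r}")
```

For example, under these assumptions needs_user_review(Mode.AUTO, 70, 80) is True because 70 is below the threshold, while a confidence equal to the threshold does not trigger a review.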

Separation Rules Sub-Tab

The rules in this section are run per document. They allow custom document separation rules to be defined when the separation you need is not included in the Process tab.


Auto-Classification Rules Sub-Tab

This sub-tab allows the auto-classification rules to be defined. The rules in this section are run per document.

For the majority of Discovery systems, enable the Use Classification Rules option. Documents are then classified using the rules defined in Classification Rules.

If you still need to process the document after the rules have been executed, you can add DocScript.


Ready for Classification Rules Sub-Tab

This table allows business rules to be entered for the ready for classification rules in the process stage. The rules are run at the batch level. Business rules have two elements: Tags and DocScript.


Post Classification Rules Sub-Tab

This table allows business rules to be entered for the post classification rules in the process stage. The rules are run at the batch level. Business rules have two elements: Tags and DocScript.


Auto Indexing Rules Sub-Tab

This table allows business rules to be entered for the auto-indexing rules in the process stage. Business rules have two elements: Tags and DocScript.

For the majority of Discovery systems, enable the Use Indexing Rules option. Index values are then extracted using the rules defined in Indexing Rules.

If you still need to process the document after the rules have been executed, you can add DocScript.


Ready for Indexing Rules Sub-Tab

This table allows business rules to be entered for the ready for indexing rules in the process stage. The rules are run at the batch level. Business rules have two elements: Tags and DocScript.


Ready for Verification Rules Sub-Tab

This table allows business rules to be entered for the ready for verification rules in the process stage. The rules are run at the batch level. Business rules have two elements: Tags and DocScript.


OCR Settings Sub-Tab

Portford has pre-set these values based on internal testing and implementation experience. However, some settings can be adjusted for specific document types. If OCR is not working as expected, please contact our support team and we will assist with these settings.

OCR Setting Description
WORK DEPTH Default is 100. Range is 0 to 255. Higher values are slower but achieve better results on poor-quality documents.
REMOVE EXISTING TEXT Default is TRUE. Removes the existing text from a PDF when OCR'ing, i.e. the PDF text is ignored.
AUTO ROTATE Default is FALSE. If TRUE, pages will be rotated.
DESKEW Default is FALSE. If TRUE, corrects up to 10% of page skew.
BINARIZE Default is TRUE. Converts color images to bi-tonal before OCR'ing. Note that color images must be converted to bi-tonal in order to be OCR'd.
PDF TO IMAGE DPI Default is 200. Valid values are 72, 100, 150, 200, 300, 400, 500, and 600. If an invalid value is used, it will default to 200.
EXTRACT METHOD Default is CONVERT TO TIFF. Valid values are CONVERT TO TIFF or NATIVE. If an invalid value is used, it will default to CONVERT TO TIFF.
DESPECKLE Default is 0. Valid values are 0 to 20; 0 disables despeckling, and higher values increase it. If an invalid value is used, it will default to 20.
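
The defaults and fallback behavior listed above can be summarized in a short Python sketch. This is only an illustration, not how DocuNECT stores or validates these values: the DEFAULTS dictionary and the normalize_ocr_settings function are hypothetical. The table does not say what happens when WORK DEPTH is out of range, so raising an error for that case is an assumption made here.

```python
# Documented defaults from the OCR settings table.
DEFAULTS = {
    "WORK DEPTH": 100,
    "REMOVE EXISTING TEXT": True,
    "AUTO ROTATE": False,
    "DESKEW": False,
    "BINARIZE": True,
    "PDF TO IMAGE DPI": 200,
    "EXTRACT METHOD": "CONVERT TO TIFF",
    "DESPECKLE": 0,
}
VALID_DPI = {72, 100, 150, 200, 300, 400, 500, 600}
VALID_EXTRACT_METHODS = {"CONVERT TO TIFF", "NATIVE"}


def _is_int_in_range(value, low, high):
    """True for a real integer (not a bool) within [low, high]."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and low <= value <= high
    )


def normalize_ocr_settings(overrides=None):
    """Merge overrides with the defaults and apply the documented fallbacks."""
    settings = {**DEFAULTS, **(overrides or {})}
    # Assumption: the table gives no fallback for WORK DEPTH, so reject it.
    if not _is_int_in_range(settings["WORK DEPTH"], 0, 255):
        raise ValueError("WORK DEPTH must be an integer from 0 to 255")
    # Documented: an invalid DPI falls back to 200.
    if settings["PDF TO IMAGE DPI"] not in VALID_DPI:
        settings["PDF TO IMAGE DPI"] = 200
    # Documented: an invalid extract method falls back to CONVERT TO TIFF.
    if settings["EXTRACT METHOD"] not in VALID_EXTRACT_METHODS:
        settings["EXTRACT METHOD"] = "CONVERT TO TIFF"
    # Documented: an invalid DESPECKLE value falls back to 20, not to 0.
    if not _is_int_in_range(settings["DESPECKLE"], 0, 20):
        settings["DESPECKLE"] = 20
    return settings
```

Note the asymmetry the sketch makes visible: DESPECKLE defaults to 0 (disabled) when left alone, but an invalid value results in 20, the strongest despeckling.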