Multiple layouts for a single document type
Documents of a single type often vary widely in appearance. Invoices from different vendors, for example, contain similar information, but the format of that information, the layout of the page, and the labels may all differ. When this happens, first assess the content of the samples. If the text on the documents is very similar, it may not be necessary to include samples of each format for classification. Similarly, if the field labels are consistent across formats, it may not be necessary to create new extraction rules for the content.
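One rough way to judge whether the text of two samples is "very similar" is to compare the words they share. The sketch below is only an illustration and is not part of any product: the sample strings, the function names, and the idea of using Jaccard similarity on word tokens are all assumptions made for this example.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two texts' token sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical recognized text from two vendors' invoices.
vendor_a = "Invoice Number 1001 Invoice Date 2020-01-05 Total Amount 250.00"
vendor_b = "Invoice No 77 Date 2020-02-11 Amount Due 99.50"

score = jaccard(vendor_a, vendor_b)
print(f"similarity: {score:.2f}")
```

A score near 1 suggests the formats share most of their wording, so one set of samples may be enough; a low score suggests each format needs its own samples. Any cutoff between the two is a judgment call to tune against your own documents.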
When analyzing new documents of similar types, you must also decide whether it is better to create a new document type (for instance, a copy of an existing document type) or to include the document in an existing type. Base this decision on how you want to maintain the rules and on whether the extraction rules for the different layouts would interfere with one another.
Let's consider a case where the content is so different that we must both train...