Identify document type using Document Intelligence

Question

Paul Spinks 0

I’m currently using Azure Document Intelligence to process PDFs and extract invoice data.

One challenge I’m facing is determining the document type (for example invoice vs credit note vs other). Using the prebuilt invoice model, I’ve noticed that credit notes are often still identified as invoices since the fields are very similar.

Is there any out-of-the-box model within Azure Document Intelligence that can reliably classify document types like this?

Or is the recommended approach to train a custom document classification model for distinguishing between invoices, credit notes, statements, etc?

Any guidance or best practices would be appreciated.

2 answers

Answer 1

Yutaka_K_JP 1,645

I don't think Document Intelligence has an out-of-the-box document-type model at all; invoices and credit notes fall into the same layout space. Maybe train a small custom classifier on a few of your real files first, and if it still misclassifies, add one or two fixed cues so documents route cleanly.
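As a rough illustration of the "fixed cues" idea, here is a minimal Python sketch that overrides a classifier's label when the page text contains an obvious phrase. The cue phrases, the label names (`invoice`, `credit_note`), and the `apply_cues` helper are all assumptions for illustration; they are not part of Document Intelligence.

```python
import re

# Hypothetical cue phrases; adjust these to wording that appears in your own documents.
CREDIT_NOTE_CUES = re.compile(r"\bcredit\s+(note|memo)\b", re.IGNORECASE)


def apply_cues(predicted_type: str, page_text: str) -> str:
    """Override an 'invoice' prediction when the text clearly says credit note.

    predicted_type: label returned by whatever classifier you use (assumed names).
    page_text: plain text extracted from the document.
    """
    if predicted_type == "invoice" and CREDIT_NOTE_CUES.search(page_text):
        return "credit_note"
    # No cue matched, or the prediction was not 'invoice': keep the classifier's label.
    return predicted_type
```

The override only fires on the one confusion described in the question (credit notes labelled as invoices), so other predictions pass through unchanged.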

Answer 2

Azure Document Intelligence doesn’t provide an out-of-the-box prebuilt model that reliably distinguishes between closely related financial document types such as invoices vs credit notes vs statements. Prebuilt models are document-specific (for example, the prebuilt invoice model) and are designed primarily for field extraction from that document type, not for fine-grained multi-class classification across similar types.

For scenarios where multiple document types share similar structure and fields (like invoices and credit notes), the recommended approach is:

1. Use a custom classification model to identify the document type first.
   - Custom classification is explicitly intended for “at least two different types of documents” such as different forms or letters.
   - Once trained, it can “identify the document type before you invoke the extraction model,” which fits the need to route invoices vs credit notes vs other documents.
2. After classification, route each document type to the appropriate extraction model:
   - Use prebuilt models (for example, prebuilt invoice) where they match the classified type.
   - Use custom extraction models (template or neural) for document types that don’t map cleanly to a prebuilt model or need custom fields.
3. Best practices for the custom classifier in this scenario:
   - Ensure at least two classes and a minimum of five samples per class; in practice, more samples per class and coverage of layout variations improve accuracy.
   - Include representative examples of invoices, credit notes, statements, and any “other” category that appears in the workload.
   - If documents appear in multiple formats (PDF, images, Word, Excel, PowerPoint), include at least one sample of each format per class to improve robustness.
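
The sample-count guidance above can be checked locally before you start training. The following is a minimal sketch, assuming your training files are grouped by class label in a plain dictionary; the `check_training_set` helper and the label names are hypothetical and not part of any Azure SDK.

```python
from pathlib import Path
from typing import Dict, List, Optional, Set

MIN_CLASSES = 2             # classifier needs at least two classes
MIN_SAMPLES_PER_CLASS = 5   # minimum of five samples per class


def check_training_set(
    samples: Dict[str, List[str]],
    expected_formats: Optional[Set[str]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks pass.

    samples: maps a class label to the file paths collected for that class.
    expected_formats: optional file suffixes (e.g. {".pdf", ".png"}) that each
        class should contain at least once. This is an assumption about your workload.
    """
    problems = []
    if len(samples) < MIN_CLASSES:
        problems.append(f"need at least {MIN_CLASSES} classes, found {len(samples)}")
    for label, files in sorted(samples.items()):
        if len(files) < MIN_SAMPLES_PER_CLASS:
            problems.append(
                f"class '{label}' has {len(files)} samples, "
                f"needs at least {MIN_SAMPLES_PER_CLASS}"
            )
        if expected_formats:
            seen = {Path(f).suffix.lower() for f in files}
            missing = sorted(expected_formats - seen)
            if missing:
                problems.append(f"class '{label}' has no sample for: {', '.join(missing)}")
    return problems
```

Passing these checks only confirms the stated minimums; it says nothing about whether the samples cover your real layout variations.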

This pattern—custom classification first, then conditional routing to prebuilt or custom extraction models—is the intended design for document-centric solutions in Document Intelligence and is the recommended way to reliably distinguish between similar financial document types.
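
The classify-then-route pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the official implementation: `classify` and `extract` are placeholder callables that you would back with your own Document Intelligence calls, the class labels and the two custom model IDs are hypothetical, and the 0.8 confidence threshold is an arbitrary starting point. Only `prebuilt-invoice` is a real prebuilt model ID.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Maps classifier labels to extraction model IDs.
# The labels and the two custom IDs are hypothetical examples.
ROUTES: Dict[str, str] = {
    "invoice": "prebuilt-invoice",
    "credit_note": "my-credit-note-model",
    "statement": "my-statement-model",
}


@dataclass
class Classification:
    doc_type: str
    confidence: float


def pick_model(result: Classification, min_confidence: float = 0.8) -> Optional[str]:
    """Return the extraction model ID, or None if the document needs manual review."""
    if result.confidence < min_confidence:
        return None
    # Unknown labels (for example an "other" class) have no route and return None.
    return ROUTES.get(result.doc_type)


def process_document(
    content: bytes,
    classify: Callable[[bytes], Classification],
    extract: Callable[[str, bytes], Any],
) -> Dict[str, Any]:
    """Classify first, then call the extraction model chosen for that type."""
    result = classify(content)
    model_id = pick_model(result)
    if model_id is None:
        return {"status": "needs_review", "doc_type": result.doc_type}
    return {
        "status": "extracted",
        "doc_type": result.doc_type,
        "model_id": model_id,
        "fields": extract(model_id, content),
    }
```

Keeping the routing table separate from the service calls means a new document type only needs a new entry in `ROUTES` once its classifier label and extraction model exist.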
