The Document Ingestion API can detect signatures in documents in two ways:
- Get the bounding boxes of detected signatures in the document.
- Get the context of the signatures detected in the document.
Getting Bounding Boxes of Signatures
Bounding boxes of signatures can be detected by setting signature_detection to true in the parse_options JSON object when calling the parse API.
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models.options import ParsingOptions

doc_ai = DocumentAI(api_key="YOUR_API_KEY")

parsing_options = ParsingOptions(
    signature_detection=True,
)

parse_id = doc_ai.parse(
    file="tensorlake-XXX",  # Replace with your file ID or URL
    parsing_options=parsing_options,
)
Response
The bounding boxes of signatures are present in the Document object returned by the parse API. This is a JSON object which contains all the detected objects in the document, such as tables, figures, charts, and signatures.
# parse_id is the ID returned by doc_ai.parse() in the previous snippet
results = doc_ai.get_job(parse_id)
# There is a signature on page 10 of this document
# results.outputs.document.pages[10].page_fragments[0]
# PageFragment(fragment_type=<PageFragmentType.SIGNATURE: 'signature'>, content=Text(content='Signature detected'), reading_order=-1, page_number=None, bbox={'x1': 79.0, 'x2': 200.0, 'y1': 812.0, 'y2': 855.0})
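Once a signature fragment has been located, its bbox can be used to size or crop the signed region. The following is a minimal sketch, assuming the bbox is a plain dict with x1, x2, y1, and y2 keys as in the output above. The units and origin of the coordinates are not stated here, so the result is simply in whatever units the API reports. bbox_size is a hypothetical helper written for illustration, not part of the tensorlake SDK.

```python
from typing import Dict, Tuple


def bbox_size(bbox: Dict[str, float]) -> Tuple[float, float]:
    """Return (width, height) of a bounding box given as x1/x2/y1/y2."""
    width = abs(bbox["x2"] - bbox["x1"])
    height = abs(bbox["y2"] - bbox["y1"])
    return width, height


# Bounding box copied from the signature fragment shown above.
signature_bbox = {"x1": 79.0, "x2": 200.0, "y1": 812.0, "y2": 855.0}
width, height = bbox_size(signature_bbox)
print(f"signature region: {width} x {height}")
```

The same dict can be passed to an image library's crop call if the page has been rendered at a matching resolution.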
Getting Context of Signatures
The context of signatures can be extracted using the Structured Extraction API.
You can specify a schema that captures the context, such as whether the signee has actually signed and the name of the person signing.
A sample schema for signature context is shown below:
from typing import List, Optional

from pydantic import BaseModel, Field

from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models.options import (
    StructuredExtractionOptions,
    ParsingOptions,
)


class Signature(BaseModel):
    has_signed: Optional[str] = Field(
        None, description="Has the signee signed the signature"
    )
    name_signee: Optional[str] = Field(None, description="Name of the signee")


class Signatures(BaseModel):
    signatures: List[Signature]


signatures_extraction = StructuredExtractionOptions(
    schema_name="signatures",
    json_schema=Signatures,
)

parsing_options = ParsingOptions(
    signature_detection=True,
)

doc_ai = DocumentAI(api_key="YOUR_API_KEY")

parse_id = doc_ai.parse(
    file="tensorlake-XXX",  # Replace with your file ID or URL
    parsing_options=parsing_options,
    structured_extraction_options=[signatures_extraction],
)

results = doc_ai.wait_for_completion(parse_id)
print(results)
# {
#   "signatures": [
#     {
#       "has_signed": "Yes",
#       "name_signee": "John Doe"
#     }
#   ]
# }
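A common follow-up is to flag signature blocks that were left unsigned. The sketch below works on a plain dict shaped like the structured output above. It assumes has_signed is a free-form string such as "Yes" or "No", since the schema declares it as an optional string, and it treats anything other than an affirmative value as unsigned. unsigned_entries and the second sample entry are hypothetical, added only for illustration.

```python
from typing import Any, Dict, List


def unsigned_entries(extracted: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return signature entries whose has_signed value is not affirmative."""
    affirmative = {"yes", "true"}
    return [
        entry
        for entry in extracted.get("signatures", [])
        if str(entry.get("has_signed") or "").strip().lower() not in affirmative
    ]


# Sample data mirroring the structured output shown above, plus one
# hypothetical unsigned entry added for illustration.
extracted = {
    "signatures": [
        {"has_signed": "Yes", "name_signee": "John Doe"},
        {"has_signed": "No", "name_signee": "Jane Roe"},
    ]
}

for entry in unsigned_entries(extracted):
    print("Missing signature from:", entry["name_signee"])
```

If stricter handling is needed, declaring has_signed as an Optional[bool] in the schema would avoid string matching, at the cost of relying on the extraction step to map the document's wording to a boolean.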