The most basic use cases of the Document Ingestion API are:
  • Converting a document to Markdown for feeding into an LLM.
  • Extracting structured data, defined by a JSON schema, from a document.
You will learn how to convert a real estate agreement to Markdown chunks, and how to extract structured data from it using a schema.

Prerequisites

  • A Tensorlake API key
  • [Optional] Tensorlake SDK for Python

Convert to Markdown

Step 1: Install the SDK

pip install tensorlake

Step 2: Set your API key

Export your API key as an environment variable. The SDK reads it from TENSORLAKE_API_KEY:
export TENSORLAKE_API_KEY=your-api-key-here
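If you want a missing key to be reported clearly before any request is sent, you can check for it yourself. The sketch below is a minimal example that assumes only what is stated above, namely that the SDK reads TENSORLAKE_API_KEY from the environment. The helper `require_api_key` is hypothetical and is not part of the Tensorlake SDK.

```python
import os


def require_api_key(var_name: str = "TENSORLAKE_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: the Tensorlake SDK reads this variable on its own;
    this check only surfaces a missing key before any request is made.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; run `export {var_name}=your-api-key-here` first."
        )
    return key
```

Call `require_api_key()` at the top of quickstart.py, before `DocumentAI()` is created.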

Step 3: Parse a document

quickstart.py
from tensorlake.documentai import (
  DocumentAI,
  ParsingOptions,
  ChunkingStrategy,
)

doc_ai = DocumentAI()

# Use a publicly accessible URL or upload a file to Tensorlake and use the file ID.
file_url = "https://tlake.link/docs/real-estate-agreement"

# In this example, we are using the PAGE chunking strategy, which means that each page of the document will be a separate chunk.
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE,
)

# Submit the parse job. read() returns a parse ID right away; the next step waits for the result.
parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",
    parsing_options=parsing_options,
)

Step 4: Wait for the job to complete

quickstart.py
result = doc_ai.wait_for_completion(parse_id)
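wait_for_completion blocks until the parse job finishes. This guide does not describe how the SDK does this internally. The usual way to wait on an asynchronous job, though, is a poll-until-done loop, and the sketch below illustrates that pattern only. `poll_until_done`, the `get_status` callable, and the "done"/"failed" status strings are all hypothetical and are not Tensorlake APIs.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    terminal: tuple[str, ...] = ("done", "failed"),
) -> str:
    """Call get_status() until it returns a terminal status or the timeout expires.

    Hypothetical sketch: the status names are placeholders, not Tensorlake's.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        time.sleep(interval_s)  # back off before asking again
```

The fixed interval keeps the sketch simple. A real client might prefer exponential backoff so that long jobs send fewer requests.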

Step 5: Use the results

quickstart.py
for chunk in result.chunks:
    print(f"## Page {chunk.page_number}\n\n")
    print(f"{chunk.content}\n\n")

Output

When the parsing is complete, you will see output like the following:
  • markdown_chunks.md
Markdown Chunks
## Page 9

relationships in accordance with any agreement(s) made with licensed real estate agent(s). Seller has read and acknowledges receipt of a copy of this Agreement and authorizes any licensed real estate agent(s) to deliver a signed copy to the Buyer.
Delivery may be in any of the following: (i) hand delivery; (ii) email under the condition that the Party transmitting the email receives electronic confirmation that the email was received to the intended recipient; and (iii) by facsimile to the other Party or the other Party’s licensee, but only if the transmitting fax machine prints a confirmation that the transmission was successful.
XXX. LICENSED REAL ESTATE AGENT(S). If Buyer or Seller have hired the services of licensed real estate agent(s) to perform representation on their behalf, he/she/they shall be entitled to payment for their services as outlined in their separate written agreement.

XXXI. DISCLOSURES. It is acknowledged by the Parties that: (check one) 

- There are no attached addendums or disclosures to this Agreement. 



 - The following addendums or disclosures are attached to this Agreement: (check all that apply) 

- 


Lead-Based Paint Disclosure Form [ ]



- [ ]



- [ ]



- [ ]


- 

- 

- 
XXXII. ADDITIONAL TERMS AND CONDITIONS.
None
XXXIII. ENTIRE AGREEMENT. This Agreement together with any attached addendums or disclosures shall supersede any and all other prior understandings and agreements, either oral or in writing, between the Parties with respect to the subject matter hereof and shall constitute the sole and only agreements between the Parties with respect to the said Property. All prior negotiations and agreements between the Parties with respect to the Property hereof are merged into this Agreement. Each Party to this Agreement acknowledges that no representations, inducements, promises, or agreements, orally or otherwise, have been made by any Party or by anyone acting on behalf of any Party, which are not embodied in this Agreement and that any agreement, statement or promise that is not contained in this Agreement shall not be valid or binding or of any force or effect.
e


Buyer's Initials NE 

-
Seller's Initials JV. 

Page 9 of 10


## CHUNK NUMBER 1

## Page 10

XXXIV. EXECUTION.



|                                                                                  |                    |
|----------------------------------------------------------------------------------|--------------------|
| Buyer Signature: Nova Ellison Date: Print Name: Nova Ellison                     | September 10, 2025 |
| Buyer Signature: Date: Print Name:                                               |                    |
| Seller Signature: Juno Vegi Date: Print Name: J uno Vega                         | September 10, 2025 |
| Seller Signature: Date: Print Name:                                              |                    |
| Agent Signature: Aster Polaris Date: Print Name: Aster Polaris Polaris Group LLC | September 10, 2025 |
| Agent Signature: Date: Print Name:                                               |                    |


e
Page 10 of 10
The chunks contain the document in Markdown format. All elements on the pages, including text, tables, and figures, are available in the chunks. They are ordered by their natural reading order, which makes the chunks easier to use in document pre-processing pipelines. The bounding box of every element in the document is also available. Learn more about the output in detail here.
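To feed the parsed document into an LLM or keep it for inspection, you can join the chunks into a single Markdown file. The sketch below is minimal and assumes that each chunk exposes the `page_number` and `content` attributes used in the loop above. The `Chunk` dataclass is a hypothetical stand-in that lets the example run without an API call, and `chunks_to_markdown` and `write_markdown` are not SDK functions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsed chunk (page number + Markdown content)."""
    page_number: int
    content: str


def chunks_to_markdown(chunks) -> str:
    """Join page chunks into one Markdown string with a '## Page N' section per chunk."""
    sections = [f"## Page {c.page_number}\n\n{c.content.strip()}" for c in chunks]
    return "\n\n".join(sections) + "\n"


def write_markdown(chunks, path: str = "markdown_chunks.md") -> Path:
    """Write the joined Markdown to disk and return the file path."""
    out = Path(path)
    out.write_text(chunks_to_markdown(chunks), encoding="utf-8")
    return out
```

With a real parse result, you would call `write_markdown(result.chunks)` in place of the print loop.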

Extract Structured Data

Step 1: Parse a document

quickstart.py
import json
from typing import Optional

from pydantic import BaseModel, Field
from tensorlake.documentai import (
  DocumentAI,
  StructuredExtractionOptions,
)

doc_ai = DocumentAI()

# Use a publicly accessible URL or upload a file to Tensorlake and use the file ID.
file_url = "https://tlake.link/docs/real-estate-agreement"

# Define a JSON schema using Pydantic
# Our structured extraction model will identify the properties we want to extract from the document.
# In this case, we are extracting the names and signature dates of the buyer and seller.
class Signers(BaseModel):
    buyer_name: Optional[str] = Field(
        default=None, description="The name of the buyer, do not extract initials"
    )
    buyer_signature_date: Optional[str] = Field(
        default=None, description="Date and time that the buyer signed."
    )
    seller_name: Optional[str] = Field(
        default=None, description="The name of the seller, do not extract initials"
    )
    seller_signature_date: Optional[str] = Field(
        default=None, description="Date and time that the seller signed."
    )

# Create a structured extraction options object with the schema
#
# You can send as many schemas as you want, and the API will return structured data for each schema
# indexed by the schema name.
real_estate_agreement_extraction_options = StructuredExtractionOptions(
    schema_name="Signers",
    json_schema=Signers,
)

# Submit the extraction job. extract() returns a parse ID right away; the next step waits for the result.
parse_id = doc_ai.extract(
    file_url=file_url,
    page_range="9-10",
    structured_extraction_options=[real_estate_agreement_extraction_options],
)

Step 2: Wait for the job to complete

quickstart.py
result = doc_ai.wait_for_completion(parse_id)

Step 3: Use the results

quickstart.py
print(json.dumps(result.structured_data[0].data, indent=4))

Output

When the parsing is complete, the script prints the data object of the first extraction result to the console. The full structured data, grouped by schema name, looks like this:
  • structured_data.json
{
  "Signers": [
    {
      "data": {
        "buyer_name": "Nova Ellison",
        "buyer_signature_date": "September 10, 2025",
        "seller_name": "Juno Vega",
        "seller_signature_date": "September 10, 2025"
      },
      "page_numbers": [
        9,
        10
      ],
      "schema_name": "Signers"
    }
  ]
}

Next Steps

I