- Convert a document to Markdown to feed into an LLM.
- Extract structured data from a document, as specified by a JSON schema.
Prerequisites
- A Tensorlake API key
- [Optional] Tensorlake SDK for Python
Convert to Markdown
Python SDK
1. Install the SDK
```shell
pip install tensorlake
```
2. Set your API key
Export the variable. The SDK reads your API key from the `TENSORLAKE_API_KEY` environment variable:
```shell
export TENSORLAKE_API_KEY=your-api-key-here
```
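You may want the script to fail fast with a clear message when the variable is missing. To do that, check for it yourself before constructing `DocumentAI()`. The following is a minimal sketch. The helper `require_api_key` is hypothetical and is not part of the Tensorlake SDK. It only looks at the `TENSORLAKE_API_KEY` variable named above:

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Tensorlake API key from env, or raise if it is missing or blank.

    Hypothetical helper, not part of the Tensorlake SDK.
    """
    key = env.get("TENSORLAKE_API_KEY", "").strip()
    if not key:
        # Point the user at the export command from the step above.
        raise RuntimeError(
            "TENSORLAKE_API_KEY is not set; run "
            "'export TENSORLAKE_API_KEY=your-api-key-here' first"
        )
    return key
```

Call `require_api_key()` at the top of `quickstart.py`. A missing key then stops the script immediately instead of surfacing later as a failed request.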
3. Parse a document
quickstart.py
```python
from tensorlake.documentai import (
    DocumentAI,
    ParsingOptions,
    ChunkingStrategy,
)

# DocumentAI() reads your API key from the TENSORLAKE_API_KEY environment variable.
doc_ai = DocumentAI()

# Use a publicly accessible URL or upload a file to Tensorlake and use the file ID.
file_url = "https://tlake.link/docs/real-estate-agreement"

# Use the PAGE chunking strategy so that each page of the document becomes a separate chunk.
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE,
)

# Submit the parse operation for pages 1-3. This returns a parse ID right away;
# the job itself runs asynchronously.
parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",
    parsing_options=parsing_options,
)
```
4. Wait for the job to complete
quickstart.py
```python
# Block until the parse job finishes, then return its result.
result = doc_ai.wait_for_completion(parse_id)
```
5. Use the results
quickstart.py
```python
# Print each page chunk as Markdown, with a heading per page.
for chunk in result.chunks:
    print(f"## Page {chunk.page_number}\n\n")
    print(f"{chunk.content}\n\n")
```
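You might want to hand the whole document to an LLM as a single input rather than printing it. In that case, join the chunks into one Markdown string and write it to a file. The following is a minimal sketch that relies only on the `page_number` and `content` attributes used above. The `Chunk` dataclass and both helper functions are hypothetical. `Chunk` is a stand-in for the SDK's chunk objects, so the example runs without an API call:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsed chunk; only the two fields used above."""
    page_number: int
    content: str


def chunks_to_markdown(chunks: Iterable[Chunk]) -> str:
    """Join page chunks into one Markdown document with a heading per page."""
    parts = [f"## Page {c.page_number}\n\n{c.content}\n" for c in chunks]
    return "\n".join(parts)


def write_markdown(chunks: Iterable[Chunk], path: str) -> Path:
    """Write the combined Markdown to path (UTF-8) and return the Path."""
    out = Path(path)
    out.write_text(chunks_to_markdown(chunks), encoding="utf-8")
    return out
```

With the SDK result, `write_markdown(result.chunks, "markdown_chunks.md")` should behave the same way, since the functions only read those two attributes.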
REST API
1. Parse a document
parseFileUrl.js
```javascript
// Submit a document for parsing and return its parse ID.
async function parseFileUrl(fileUrl, tensorlakeApiKey) {
  // Use the page chunking strategy so that each page becomes a separate chunk.
  const parsingOptions = {
    chunking_strategy: 'page',
  };
  const body = {
    file_url: fileUrl,
    page_range: '1-3',
    parsing_options: parsingOptions,
  };
  const options = {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${tensorlakeApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
  const response = await fetch(
    'https://api.tensorlake.ai/documents/v2/read',
    options
  );
  if (!response.ok) {
    throw new Error(`Parse request failed: ${response.status} ${response.statusText}`);
  }
  const result = await response.json();
  console.log('result:', JSON.stringify(result, null, 2));
  return result.parse_id;
}

// fetch is built into Node.js 18+; top-level await requires an ES module (e.g. a .mjs file).
const fileUrl = 'https://tlake.link/docs/real-estate-agreement';
const tensorlakeApiKey = 'your-tensorlake-api-key-here';
const parseId = await parseFileUrl(fileUrl, tensorlakeApiKey);
```
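For reference, the same request can be built with Python's standard library. This sketch mirrors the endpoint, headers, and body in `parseFileUrl.js` but only constructs the request; it does not send it. `build_read_request` is a hypothetical helper name, not part of any Tensorlake SDK:

```python
import json
import urllib.request

READ_URL = "https://api.tensorlake.ai/documents/v2/read"


def build_read_request(
    file_url: str, api_key: str, page_range: str = "1-3"
) -> urllib.request.Request:
    """Build (but do not send) the POST request that parseFileUrl.js sends."""
    body = {
        "file_url": file_url,
        "page_range": page_range,
        # Same chunking choice as the JavaScript example: one chunk per page.
        "parsing_options": {"chunking_strategy": "page"},
    }
    return urllib.request.Request(
        READ_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)` and read the parse ID from the JSON response, as `parseFileUrl.js` does.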
2. Wait for the job to complete
getResults.js
```javascript
// Print each page chunk as Markdown, with a heading per page.
function writeParseResults(jobResult) {
  let markdownContent = '';
  jobResult.chunks.forEach((chunk) => {
    markdownContent += `## Page ${chunk.page_number}\n\n`;
    markdownContent += `${chunk.content}\n\n`;
  });
  console.log(markdownContent);
}

// Poll the parse job every 5 seconds until it leaves the pending/processing states.
async function getParseResults(parseId, tensorlakeApiKey) {
  while (true) {
    const response = await fetch(
      `https://api.tensorlake.ai/documents/v2/parse/${parseId}`,
      {
        method: 'GET',
        headers: {
          Authorization: `Bearer ${tensorlakeApiKey}`,
          'Content-Type': 'application/json',
        },
      }
    );
    if (!response.ok) {
      console.error(`Error fetching job: ${response.status} ${response.statusText}`);
      return;
    }
    const result = await response.json();
    if (result.status === 'pending' || result.status === 'processing') {
      console.log(`job status: ${result.status}; waiting 5s...`);
      await new Promise((resolve) => setTimeout(resolve, 5000));
    } else if (result.status === 'successful') {
      writeParseResults(result);
      return result;
    } else {
      console.error(`Job finished with status: ${result.status}`);
      return result;
    }
  }
}

// Use the parse ID returned by parseFileUrl.js.
const parseId = 'your-parse-id-here';
const tensorlakeApiKey = 'your-tensorlake-api-key-here';
await getParseResults(parseId, tensorlakeApiKey);
```
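The loop above keeps polling for as long as the job reports `pending` or `processing`, with no upper bound. The following is a minimal Python sketch of the same pattern with an overall timeout. `poll_until_done` is hypothetical. `fetch_status` stands for any function that returns the job JSON from `GET https://api.tensorlake.ai/documents/v2/parse/{parse_id}`. Injecting it, and `sleep`, keeps the loop testable without network access:

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job leaves 'pending'/'processing'.

    Returns the final job payload (successful or not) and raises
    TimeoutError if the job is still running after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status not in ("pending", "processing"):
            return job  # finished: the caller inspects job["status"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        print(f"job status: {status}; waiting {interval_s}s...")
        sleep(interval_s)
```

Returning the payload for any terminal status, rather than raising, matches the JavaScript version. That version also hands non-successful results back to the caller.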
Output
When parsing is complete, you will see Markdown chunks like the following. This sample shows pages 9 and 10 of the agreement; your output will contain the pages in the page_range you requested (1-3 in the code above).
markdown_chunks.md
```markdown
## Page 9
relationships in accordance with any agreement(s) made with licensed real estate agent(s). Seller has read and acknowledges receipt of a copy of this Agreement and authorizes any licensed real estate agent(s) to deliver a signed copy to the Buyer.
Delivery may be in any of the following: (i) hand delivery; (ii) email under the condition that the Party transmitting the email receives electronic confirmation that the email was received to the intended recipient; and (iii) by facsimile to the other Party or the other Party’s licensee, but only if the transmitting fax machine prints a confirmation that the transmission was successful.
XXX. LICENSED REAL ESTATE AGENT(S). If Buyer or Seller have hired the services of licensed real estate agent(s) to perform representation on their behalf, he/she/they shall be entitled to payment for their services as outlined in their separate written agreement.
XXXI. DISCLOSURES. It is acknowledged by the Parties that: (check one)
- There are no attached addendums or disclosures to this Agreement.
- The following addendums or disclosures are attached to this Agreement: (check all that apply)
-
Lead-Based Paint Disclosure Form [ ]
- [ ]
- [ ]
- [ ]
-
-
-
XXXII. ADDITIONAL TERMS AND CONDITIONS.
None
XXXIII. ENTIRE AGREEMENT. This Agreement together with any attached addendums or disclosures shall supersede any and all other prior understandings and agreements, either oral or in writing, between the Parties with respect to the subject matter hereof and shall constitute the sole and only agreements between the Parties with respect to the said Property. All prior negotiations and agreements between the Parties with respect to the Property hereof are merged into this Agreement. Each Party to this Agreement acknowledges that no representations, inducements, promises, or agreements, orally or otherwise, have been made by any Party or by anyone acting on behalf of any Party, which are not embodied in this Agreement and that any agreement, statement or promise that is not contained in this Agreement shall not be valid or binding or of any force or effect.
e
Buyer's Initials NE
-
Seller's Initials JV.
Page 9 of 10
## Page 10
XXXIV. EXECUTION.
| | |
|----------------------------------------------------------------------------------|--------------------|
| Buyer Signature: Nova Ellison Date: Print Name: Nova Ellison | September 10, 2025 |
| Buyer Signature: Date: Print Name: | |
| Seller Signature: Juno Vegi Date: Print Name: J uno Vega | September 10, 2025 |
| Seller Signature: Date: Print Name: | |
| Agent Signature: Aster Polaris Date: Print Name: Aster Polaris Polaris Group LLC | September 10, 2025 |
| Agent Signature: Date: Print Name: | |
e
Page 10 of 10
```
Extract Structured Data
Python SDK
1. Parse a document
quickstart.py
```python
import json
from typing import Optional

from pydantic import BaseModel, Field
from tensorlake.documentai import (
    DocumentAI,
    StructuredExtractionOptions,
)

# DocumentAI() reads your API key from the TENSORLAKE_API_KEY environment variable.
doc_ai = DocumentAI()

# Use a publicly accessible URL or upload a file to Tensorlake and use the file ID.
file_url = "https://tlake.link/docs/real-estate-agreement"


# Define the JSON schema with Pydantic. Each field is a property the structured
# extraction model looks for in the document; here, the names and signature
# dates of the buyer and seller.
class Signers(BaseModel):
    buyer_name: Optional[str] = Field(
        default=None, description="The name of the buyer, do not extract initials"
    )
    buyer_signature_date: Optional[str] = Field(
        default=None, description="Date and time that the buyer signed."
    )
    seller_name: Optional[str] = Field(
        default=None, description="The name of the seller, do not extract initials"
    )
    seller_signature_date: Optional[str] = Field(
        default=None, description="Date and time that the seller signed."
    )


# Wrap the schema in a structured extraction options object.
# You can send several schemas in one request; the API returns structured data
# for each schema, identified by its schema name.
real_estate_agreement_extraction_options = StructuredExtractionOptions(
    schema_name="Signers",
    json_schema=Signers,
)

# Submit the extraction for pages 9-10. This returns a parse ID right away;
# the job itself runs asynchronously.
parse_id = doc_ai.extract(
    file_url=file_url,
    page_range="9-10",
    structured_extraction_options=[real_estate_agreement_extraction_options],
)
```
2. Wait for the job to complete
quickstart.py
```python
result = doc_ai.wait_for_completion(parse_id)
```
3. Use the results
quickstart.py
```python
# Print the data extracted for the first (and only) schema, Signers.
print(json.dumps(result.structured_data[0].data, indent=4))
```
REST API
Prerequisites
- A Tensorlake API key
1. Parse a document
parseFileUrl.js
```javascript
// Submit a document for structured extraction and return its parse ID.
async function parseFileUrl(fileUrl, tensorlakeApiKey) {
  // JSON schema equivalent to the Pydantic Signers model. The property names
  // match the keys in the extracted data (buyer_name, seller_name, ...).
  const signersSchema = {
    title: 'Signers',
    type: 'object',
    properties: {
      buyer_name: {
        type: 'string',
        description: 'The name of the buyer, do not extract initials',
        title: 'Buyer Name',
      },
      buyer_signature_date: {
        type: 'string',
        description: 'Date and time that the buyer signed.',
        title: 'Buyer Signature Date',
      },
      seller_name: {
        type: 'string',
        description: 'The name of the seller, do not extract initials',
        title: 'Seller Name',
      },
      seller_signature_date: {
        type: 'string',
        description: 'Date and time that the seller signed.',
        title: 'Seller Signature Date',
      },
    },
  };
  const realEstateAgreementExtractionOptions = {
    schema_name: 'Signers',
    json_schema: signersSchema,
  };
  const body = {
    file_url: fileUrl,
    page_range: '9-10',
    structured_extraction_options: [realEstateAgreementExtractionOptions],
  };
  const options = {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${tensorlakeApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
  const response = await fetch(
    'https://api.tensorlake.ai/documents/v2/extract',
    options
  );
  if (!response.ok) {
    throw new Error(`Extract request failed: ${response.status} ${response.statusText}`);
  }
  const result = await response.json();
  console.log('result:', JSON.stringify(result, null, 2));
  return result.parse_id;
}

// fetch is built into Node.js 18+; top-level await requires an ES module (e.g. a .mjs file).
const fileUrl = 'https://tlake.link/docs/real-estate-agreement';
const tensorlakeApiKey = 'your-tensorlake-api-key-here';
const parseId = await parseFileUrl(fileUrl, tensorlakeApiKey);
```
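If you generate schemas from code rather than writing them by hand, a small helper can produce the same shape as `signersSchema` above. The following is a minimal stdlib-only sketch. `string_schema` and `build_extract_body` are hypothetical helpers. Reusing the schema `title` as the `schema_name` is a design choice of this sketch, not an API requirement:

```python
import json
from typing import Any


def string_schema(title: str, fields: dict[str, str]) -> dict[str, Any]:
    """Build a JSON Schema object whose properties are all plain strings.

    `fields` maps each property name to its description. Property titles are
    derived from the names, e.g. "buyer_name" -> "Buyer Name".
    """
    return {
        "title": title,
        "type": "object",
        "properties": {
            name: {
                "type": "string",
                "description": description,
                "title": name.replace("_", " ").title(),
            }
            for name, description in fields.items()
        },
    }


def build_extract_body(
    file_url: str, schema: dict[str, Any], page_range: str
) -> dict[str, Any]:
    """Assemble a request body shaped like the one in parseFileUrl.js."""
    return {
        "file_url": file_url,
        "page_range": page_range,
        "structured_extraction_options": [
            {"schema_name": schema["title"], "json_schema": schema}
        ],
    }


signers = string_schema(
    "Signers",
    {
        "buyer_name": "The name of the buyer, do not extract initials",
        "buyer_signature_date": "Date and time that the buyer signed.",
        "seller_name": "The name of the seller, do not extract initials",
        "seller_signature_date": "Date and time that the seller signed.",
    },
)
body = build_extract_body(
    "https://tlake.link/docs/real-estate-agreement", signers, "9-10"
)
print(json.dumps(body, indent=2))
```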
2. Wait for the job to complete
getResults.js
```javascript
// Print the structured data in full (console.log alone truncates nested objects).
function writeParseResults(jobResult) {
  const structuredData = jobResult.structured_data;
  console.log(JSON.stringify(structuredData, null, 2));
}

// Poll the job every 5 seconds until it leaves the pending/processing states.
async function getParseResults(parseId, tensorlakeApiKey) {
  while (true) {
    const response = await fetch(
      `https://api.tensorlake.ai/documents/v2/parse/${parseId}`,
      {
        method: 'GET',
        headers: {
          Authorization: `Bearer ${tensorlakeApiKey}`,
          'Content-Type': 'application/json',
        },
      }
    );
    if (!response.ok) {
      console.error(`Error fetching job: ${response.status} ${response.statusText}`);
      return;
    }
    const result = await response.json();
    if (result.status === 'pending' || result.status === 'processing') {
      console.log(`job status: ${result.status}; waiting 5s...`);
      await new Promise((resolve) => setTimeout(resolve, 5000));
    } else if (result.status === 'successful') {
      writeParseResults(result);
      return result;
    } else {
      console.error(`Job finished with status: ${result.status}`);
      return result;
    }
  }
}

// Use the parse ID returned by parseFileUrl.js.
const parseId = 'your-parse-id-here';
const tensorlakeApiKey = 'your-tensorlake-api-key-here';
await getParseResults(parseId, tensorlakeApiKey);
```
Output
When the job is complete, you will see the structured data in the console:
structured_data.json
```json
{
  "Signers": [
    {
      "data": {
        "buyer_name": "Nova Ellison",
        "buyer_signature_date": "September 10, 2025",
        "seller_name": "Juno Vega",
        "seller_signature_date": "September 10, 2025"
      },
      "page_numbers": [
        9,
        10
      ],
      "schema_name": "Signers"
    }
  ]
}
```
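The Signers fields are optional (each defaults to `None` in the Pydantic model), so a field the model cannot find may come back empty. The following is a minimal sketch for consuming a saved copy of this output. It assumes the shape shown above: the schema name maps to a list of entries, each with `data`, `page_numbers`, and `schema_name`. All three helper names are hypothetical:

```python
import json
from typing import Any


def load_structured_data(path: str) -> dict[str, Any]:
    """Read a saved structured_data.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def records_for_schema(payload: dict[str, Any], schema_name: str) -> list[dict[str, Any]]:
    """Return the extracted `data` objects for one schema name (empty if absent)."""
    return [entry["data"] for entry in payload.get(schema_name, [])]


def missing_fields(record: dict[str, Any], expected: list[str]) -> list[str]:
    """List expected fields that are absent, null, or empty in a record."""
    return [name for name in expected if record.get(name) in (None, "")]
```

For the sample above, `records_for_schema(load_structured_data("structured_data.json"), "Signers")` returns a one-element list. Calling `missing_fields` on that record with the four Signers field names returns an empty list.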