LangChain is a framework for building LLM-powered applications. When combined with Tensorlake, you can create agents that automatically parse complex documents during conversations—no manual preprocessing needed. This integration is essential for building financial analysts, research assistants, and document QA agents that need to process files on-the-fly.
Run this end-to-end in Colab with the notebook linked under Complete Example below.

Why Use Tensorlake + LangChain?

The Problem:
  • Agents need to process documents mid-conversation but parsing happens outside the workflow
  • Manual file preprocessing breaks agentic automation
  • Agents can’t extract structured data, tables, or figures without custom code
  • No way to handle document parsing as a tool in agent toolchains
The Solution: Tensorlake’s LangChain tool enables agents to parse documents on-demand. When an agent encounters a file URL, it automatically calls Tensorlake to extract text, tables, and summaries.

Key Benefits:
  • Automatic parsing - Agents parse documents when needed, no preprocessing
  • Tool integration - Document parsing becomes a native agent capability
  • Structured extraction - Pull metadata, tables, and figures in agent workflows
  • Production-ready - Handle financial reports, research papers, and contracts in conversational AI

Installation

pip install langchain-tensorlake
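
The quick start below also uses LangGraph and the OpenAI chat integration. If your environment doesn't already have them (for example, as dependencies pulled in by langchain-tensorlake), install them as well:

pip install langgraph langchain-openai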

Quick Start

Step 1: Set API Keys

export TENSORLAKE_API_KEY="your-tensorlake-api-key"
export OPENAI_API_KEY="your-openai-api-key"
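
In a notebook environment (such as Colab) you can set the same keys from Python instead; fill in your own values:

import os

# Equivalent to the shell exports above, for notebook environments
os.environ["TENSORLAKE_API_KEY"] = "your-tensorlake-api-key"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"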

Step 2: Create Agent with Document Parsing Tool

Build a LangGraph agent that can parse documents automatically:
from langchain_tensorlake import document_markdown_tool
from langgraph.prebuilt import create_react_agent

# Create agent with document parsing capability
agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[document_markdown_tool],
    prompt=(
        """
        I have a document that needs to be parsed. Please parse this document and answer the question about it.
        """
    ),
    name="financial-analyst",
)

# Agent automatically parses documents when needed
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "What is the quarterly revenue of Apple based on this file? https://www.apple.com/newsroom/pdfs/fy2025-q2/FY25_Q2_Consolidated_Financial_Statements.pdf"
    }]
})

print(result["messages"][-1].content)
Output:
Based on the financial statements from Apple's second quarter of FY2025, the quarterly revenue figures are as follows:

- **Total Net Sales** for the quarter ended March 29, 2025: **$95,359 million**.

This total includes revenue from both products and services:
- **Products Revenue**: $68,714 million
- **Services Revenue**: $26,645 million
The agent automatically:
  1. Detected the PDF URL in the query
  2. Called Tensorlake to parse the financial statement
  3. Extracted revenue data from tables
  4. Answered the question with specific figures

How Agent-Based Parsing Works

Traditional document pipelines require upfront processing. Agents can’t adapt to new files during conversations. This integration changes the workflow:
  1. During conversation: User mentions a file URL
  2. Tool invocation: Agent recognizes it needs document content and calls the Tensorlake tool
  3. Parsing: Tensorlake parses the document and extracts text, tables, and data
  4. Context injection: Parsed content returns to the agent’s context window
  5. Response generation: Agent answers using the parsed document
The key insight: Parsing happens on-demand as part of the agent’s reasoning loop, not as a separate preprocessing step.
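
To make the tool boundary concrete, here is a minimal sketch of what a hand-rolled version of such a tool could look like, wiring the Tensorlake SDK calls shown later in this guide into a LangChain tool. This is illustrative only: parse_document_url is a hypothetical name, it assumes default parsing options suffice, and the shipped document_markdown_tool is the supported implementation.

from langchain_core.tools import tool
from tensorlake.documentai import DocumentAI, ParsingOptions
import urllib.request

@tool
def parse_document_url(url: str) -> str:
    """Parse the document at a URL and return its text content."""
    # Download to a local temp file, since upload() (as used later in
    # this guide) takes a file path.
    local_path, _ = urllib.request.urlretrieve(url)
    doc_ai = DocumentAI()
    file_id = doc_ai.upload(local_path)
    result = doc_ai.parse_and_wait(file_id, ParsingOptions())
    return "\n\n".join(chunk.content for chunk in result.chunks)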

Use Cases

Financial Analysis Agents

Build analysts that parse earnings reports, balance sheets, and regulatory filings on-demand. Extract revenue, expenses, and key metrics without manual preprocessing.

Research Assistants

Create agents that read research papers mid-conversation. Automatically extract abstracts, methodologies, and experimental results when users ask questions.

Legal Document Analysis

Build agents that analyze contracts and legal briefs. Parse clause content, extract key terms, and compare documents during conversations.

Customer Support Automation

Enable support agents to parse product manuals, warranty documents, and technical specs when helping customers.

Compliance Monitoring

Create agents that review regulatory filings and compliance documents. Extract required disclosures and flag missing information.

Best Practices

1. Design Clear Agent Prompts

Help agents understand when to use document parsing:
agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[document_markdown_tool],
    name="analyst",
    prompt="""You are a financial analyst. When users provide
    PDF links to financial documents, use the document parsing tool to
    extract and analyze the content.""",
)

2. Handle Multiple Documents Efficiently

Process documents in parallel when comparing multiple files:
# Agent can process multiple documents in one turn
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": """Compare these three quarterly reports:
        Q1: https://example.com/q1.pdf
        Q2: https://example.com/q2.pdf
        Q3: https://example.com/q3.pdf"""
    }]
})
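
When the comparisons are independent conversations, you can also fan out whole invocations: the object returned by create_react_agent is a standard LangChain Runnable, so batch() can run several inputs concurrently. A minimal sketch, reusing the agent defined in the Quick Start:

# Run independent document questions as concurrent agent invocations
questions = [
    "Summarize https://example.com/q1.pdf",
    "Summarize https://example.com/q2.pdf",
]
results = agent.batch(
    [{"messages": [{"role": "user", "content": q}]} for q in questions]
)
for res in results:
    print(res["messages"][-1].content)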

3. Validate Tool Usage

Monitor agent behavior to ensure proper tool usage:
result = agent.invoke({"messages": conversation})

# Inspect the tool calls recorded on the agent's AI messages
for message in result["messages"]:
    for tool_call in getattr(message, "tool_calls", None) or []:
        print(f"Tool used: {tool_call['name']}")
        print(f"Arguments: {tool_call['args']}")

Using the Python SDK Directly

For non-agentic workflows, use the Tensorlake Python SDK directly in LangChain pipelines:
from tensorlake.documentai import DocumentAI, ParsingOptions, ChunkingStrategy
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Parse with Tensorlake
doc_ai = DocumentAI()
file_id = doc_ai.upload("contract.pdf")

parse_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.SECTION
)

result = doc_ai.parse_and_wait(file_id, parse_options)

# Use in LangChain
documents = [chunk.content for chunk in result.chunks]

# Create embeddings and vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(documents, embeddings)

# Build retrieval chain
retriever = vectorstore.as_retriever()
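
From here the retriever drops into any standard LangChain retrieval flow. A short usage sketch (the query string is illustrative):

# Retrieve the parsed chunks most relevant to a question
docs = retriever.invoke("What are the termination terms of the contract?")
for doc in docs:
    print(doc.page_content[:200])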

Complete Example

Try the full working example with the financial analysis agent:

Agentic Document Parsing Notebook

Complete code walkthrough.

What’s Next?

Build advanced agents by combining document parsing with vector databases, as in the Python SDK example above.

Need Help?

Join our community to discuss agentic workflows.