In this tutorial you will extract contextual information from documents containing signatures using Tensorlake, LangChain, and OpenAI.

A full, runnable example of an already built agent using both the CLI and a Streamlit app is available in the Tensorlake GitHub repository.

Closing Deals Faster with Signature Detection and LangGraph

Let’s set the context for this example: you will build a LangGraph agent for a real estate company to help track who has signed property documents, when they signed, and who still needs to sign.

You’ll learn how to:

  • Use Tensorlake’s Signature Detection SDK
  • Extract and summarize signature status per property
  • Create a LangGraph agent that uses the structured data to answer questions like:
    • How many signatures were detected in this document and who are the parties involved?
    • What contextual information can you extract about any signatures?
    • Are there any missing signatures on any pages?

This is perfect for automating due diligence and compliance tracking across large sets of signature-heavy documents.

Prerequisites

Step 0: Set up your environment

pip install openai tensorlake langchain langgraph langchain-community python-dotenv

In .env, set your API keys:

OPENAI_API_KEY=your_openai_api_key
TENSORLAKE_API_KEY=your_tensorlake_api_key

Step 1: Upload and parse documents with Tensorlake

For this tutorial, create a file called signature_detection_langgraph_agent.py. In this file, you will extract data from documents with Tensorlake and define your LangGraph agent.

1.1. Prepare your imports

At the top, make sure you’ve imported all of the necessary Tensorlake, LangGraph, LangChain, and helper packages. Then, load your environment variables from .env:

# helper packages
import os
import time
import json
import logging
from typing import Optional, Type, Annotated, TypedDict, Union, List
from pydantic import Field, BaseModel, Json
import asyncio
from pathlib import Path
from dotenv import load_dotenv

# LangGraph and LangChain imports
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.tools import StructuredTool

# TensorLake imports
from tensorlake.documentai import DocumentAI, ParsingOptions
from tensorlake.documentai.parse import ExtractionOptions

load_dotenv()

# Load environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
TENSORLAKE_API_KEY = os.getenv("TENSORLAKE_API_KEY")

1.2. Use Tensorlake to extract signatures

Create a LangChain tool called detect_signatures_in_document that takes the file path of the document to be parsed as a string. This tool will handle everything related to data extraction, including:

  • Upload the document to Tensorlake
  • Specify parsing options so that we extract the specific information we’re looking for
  • Initiate the parsing job with Tensorlake
  • Query the job until it completes; if successful, the results will be returned

def detect_signatures_in_document(path: str) -> str:
    """
    Uploads a document to TensorLake, triggers parsing, including signature detection, and returns the parsed result.
    """
    if not Path(path).exists():
        return f"File not found: {path}"

    if not TENSORLAKE_API_KEY:
        return "Error: TENSORLAKE_API_KEY environment variable is not set"

    # Initialize DocumentAI client
    doc_ai = DocumentAI(api_key=TENSORLAKE_API_KEY)
    
    # Upload document to TensorLake
    file_id = doc_ai.upload(path=path)

    parsing_options = ParsingOptions(
        detect_signature=True,  # this needs to be True
        extraction_options=ExtractionOptions(skip_ocr=True),
    )
    # Start parsing job
    job_id = doc_ai.parse(file_id, options=parsing_options)

    # Poll for completion with configurable timeout
    start_time = time.time()
    max_wait_time = 300  # or set as a constant or parameter (here, we will wait for 5 min max)

    while time.time() - start_time < max_wait_time:
        # Signature detection result after parsing the document
        result = doc_ai.get_job(job_id)  # this may take 2-3 minutes

        if result.status in ["pending", "processing"]:
            time.sleep(5)  # Wait 5 seconds before checking again
        elif result.status == "successful":

            # Optional: save the parsed result in a file for referring to it later
            with open(f"parsed_result_{file_id}.json", "w") as f:
                json.dump(result.model_dump(), f, indent=2)

            # Return parsed result
            return str(result)
        else:
            return f"Document parsing failed with status: {result.status}"

    # Timeout reached
    return f"Document processing timeout after {max_wait_time} seconds. Job ID: {job_id}"

This synchronous function runs the core Signature Detection logic using Tensorlake. To integrate it with async-compatible agent frameworks like LangGraph, wrap it in an asynchronous version using asyncio.to_thread().

async def detect_signatures_in_document_async(path: str) -> str:
    """Asynchronous version of detecting signatures from document."""
    return await asyncio.to_thread(detect_signatures_in_document, path)

With the signature detection function defined, we wrap it as a LangChain tool using StructuredTool, so it can be invoked by agents.

# Create the LangChain tool using StructuredTool
signature_detection_tool = StructuredTool.from_function(
    func=detect_signatures_in_document,
    coroutine=detect_signatures_in_document_async,
    name="SignatureDetectionTool",
    description="Detect signatures from any document (PDF, Markdown, Docx, etc.)",
    return_direct=False,
    handle_tool_error="Document parsing failed. Please verify the file path and your Tensorlake API key."
)
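
If you want to confirm the tool works before wiring it into an agent, you can invoke it directly. This is an optional sanity check rather than part of the final script, and the file path below is a placeholder:

# Optional sanity check: call the tool directly, outside of any agent.
# "contracts/sample_agreement.pdf" is a placeholder; replace it with a real document on disk.
raw_result = signature_detection_tool.invoke({"path": "contracts/sample_agreement.pdf"})
print(raw_result[:500])  # preview the beginning of the parsed output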

1.3. Understand the parsed result

The result of the parsing job will contain structured data about the document, including pages, fragments, and detected signatures. Find a full example of what the JSON output might look like in this gist.

It’s important to understand how this structure relates to signatures so that you can extract the relevant information for the agent. The result contains a list of pages, each with its own fragments. Each fragment has a bounding box, content, and a type, such as text, key-value pair, or signature.

Below is an example from the first page of the document, which contains an initials signature at the bottom of the page. The parsed output reports a signature page fragment towards the bottom left corner:

{
    "pages": [
      {
        "dimensions": [
          1584,
          1224
        ],
        "layout": {},
        "page_fragments": [
          {
            "bbox": {
              "x1": 207,
              "x2": 250,
              "y1": 730,
              "y2": 756
            },
            "content": {
              "content": "Signature detected"
            },
            "fragment_type": "signature",
            "reading_order": null
          }
        ],
        "page_number": 1
      }
    ]
}

If you were to view the bounding boxes on the Tensorlake Playground, you would see the signature fragment highlighted in the bottom left corner of the page.

Now that you know how the data is structured and how the structured data relates to the document, you can extract only the relevant data for this particular agent. For this example, we will proceed with the full extracted data.
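
If you later want to hand the agent a smaller payload, a sketch like the following pulls out only the signature fragments from the saved result file. It assumes the JSON layout shown above (top-level pages, each with page_fragments); adjust the keys if your output differs:

import json

def extract_signature_fragments(result_path: str) -> list:
    """Collect only the signature fragments: page number, bounding box, and content."""
    with open(result_path) as f:
        parsed = json.load(f)

    signatures = []
    for page in parsed.get("pages", []):
        for fragment in page.get("page_fragments", []):
            if fragment.get("fragment_type") == "signature":
                signatures.append({
                    "page_number": page.get("page_number"),
                    "bbox": fragment.get("bbox"),
                    "content": fragment.get("content", {}).get("content"),
                })
    return signatures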

Step 2: Create the signature query LangGraph agent

Now that we have a tool that can extract signature data from a document, we want to enable users to ask natural language questions about it. Instead of manually opening JSON files, we’ll build a conversational agent using LangGraph, a framework for building stateful, tool-using agents that run on top of language models. This agent will:

  • Use the signature_detection_tool to extract signature data from the document using Tensorlake’s Contextual Signature Detection
  • Interpret user questions (e.g. “Which pages are missing signatures?”)
  • Return structured, accurate answers

2.1. Define the LangGraph agent prompt and behavior

First, define how the agent should think. To do this, build a dynamic system prompt that includes the parsed result and questions the agent should answer. This prompt is injected at runtime and defines the agent’s behavior.

def build_document_analysis_prompt(parsed_result: str, questions: Union[str, List[str]]) -> str:
    # Normalize single question to list
    if isinstance(questions, str):
        questions = [questions]

    question_block = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))

    system_prompt = f"""You are a helpful assistant that answers questions about documents with signature detection data.

Your responsibilities:
1. Answer questions based on the loaded signature detection data
2. Help users understand the signature analysis results

You can answer questions like:
- How many signatures were found?
- Which pages contain signatures?
- Who signed the document?
- What does the content say around signatures?
- What type of document is this?
- Who are the parties involved?
- What is the date of the signature?
- Did each party sign the document?
- Are there any missing signatures on any pages?
- Which property is missing signatures?
- Who is the agent for the properties missing signatures?

I've processed a document and got this result:
{parsed_result}

Please analyze the above parsed output and answer the following:
{question_block}
"""
    return system_prompt
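
As a quick illustration of the question normalization, calling the builder with a single question string produces the same numbered-list format as a list of questions. The parsed result here is just a placeholder string:

# Example: a single question is normalized into a numbered list inside the prompt.
sample_prompt = build_document_analysis_prompt(
    parsed_result="<parsed Tensorlake output goes here>",
    questions="Are there any missing signatures on any pages?",
)
print(sample_prompt)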

2.2. Define the LangGraph agent workflow

Next, define the LangGraph state machine that controls how the agent operates. In this setup:

  • The agent always starts by reasoning over the user’s input.
  • If the model chooses to call a tool (e.g., to load saved signature data), the graph transitions to the tool node.
  • Once the tool executes, control is returned to the agent to continue the conversation.

This loop continues until no further tool calls are made, and the conversation ends. LangGraph makes this flow explicit, predictable, and production-safe.

# Define the agent state
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]


# Agent node - decides whether to use tools
async def agent_node(state: AgentState):
    model = ChatOpenAI(
        model="gpt-4o",
        temperature=0.1
    ).bind_tools([signature_detection_tool])

    response = await model.ainvoke(state["messages"])
    return {"messages": [response]}


# Conditional Logic for Tool Use
def should_continue(state: AgentState):
    last_message = state["messages"][-1]
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tools"
    return END

# LangGraph Workflow Setup
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", ToolNode([signature_detection_tool]))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")
app = workflow.compile()

StateGraph is the core abstraction in LangGraph. It defines how state flows between nodes (e.g., agent reasoning and tool execution). Each node processes and updates the state (in this case, the conversation history), and the graph determines which node runs next based on tool usage or end conditions. You can read more about this in the LangGraph announcement blog post.

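If you want to double-check the wiring, LangGraph can render the compiled graph. This is optional; depending on your langgraph version, the ASCII renderer may require the grandalf package:

# Optional: print the graph topology (the agent -> tools -> agent loop, ending at END).
# May require the `grandalf` package for ASCII rendering.
print(app.get_graph().draw_ascii())
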
# Document + Agent Pipeline
async def analyze_signatures_agents(
        path: str,
        questions: List[str]
) -> str:
    """Invoke the tool with parsing options, then use agent for analysis."""

    print("🔍 Processing document with signature detection...")

    # Pass parsing options and run the tool
    parsed_output = await signature_detection_tool.ainvoke({
        "path": path
    })

    # Build prompt
    prompt = build_document_analysis_prompt(parsed_output, questions)

    # Run agent on prompt
    final_state = await app.ainvoke({
        "messages": [HumanMessage(content=prompt)]
    })

    return final_state["messages"][-1].content

2.3. Run the LangGraph agent

Once built, run the agent with the following code:

async def example_signature_detection_real_estate():

    # change to your own file path
    path = "path/to/your/document.pdf"

    analysis_questions = [
        "How many signatures were detected in this document and who are the parties involved?",
        "What contextual information can you extract about any signatures?",
        "Are there any missing signatures on any pages?"
    ]

    result = await analyze_signatures_agents(
        path=path,
        questions=analysis_questions
    )

    print("Analysis Result:\n\n", result)


if __name__ == "__main__":
    # run the example
    asyncio.run(example_signature_detection_real_estate())

Step 3: Test the Tensorlake-powered LangGraph agent in the CLI

Finally, run the script to see the agent in action. It will:

  • Parse the document using Tensorlake’s signature detection
  • Build a dynamic prompt based on the parsed data
  • Use the LangGraph agent to answer questions about the signatures

(venv) % python3 signature_detection_langgraph_agent.py
🔍 Processing document with signature detection...
Analysis Result:

 Based on the parsed output from the document, here are the answers to your questions:

1. **How many signatures were detected in this document and who are the parties involved?**

   - A total of 20 signatures were detected in the document. 
   - The parties involved in the document are:
     - **Buyer:** Nova Ellison
     - **Seller:** Juno Vega
     - **Agent:** Aster Polaris from Polaris Group LLC

2. **What contextual information can you extract about any signatures?**

   - The document is a "Residential Real Estate Purchase Agreement" made on September 20, 2025.
   - The signatures are associated with the execution of the agreement, indicating acceptance of the terms by the Buyer, Seller, and Agent.
   - The document includes specific sections where signatures are required, such as the execution section on page 10, where the Buyer, Seller, and Agent have signed and dated the document on September 10, 2025.
   - The signatures are detected on each page, indicating that initials or signatures are required throughout the document to confirm agreement to various sections.

3. **Are there any missing signatures on any pages?**

   - The document does not explicitly indicate missing signatures. However, there are placeholders for initials on several pages (e.g., "Buyer's Initials __________ - _______ Seller's Initials __________ - _______"), which suggest that initials might be required but are not filled in the parsed output.
   - The final execution page (page 10) shows that the Buyer, Seller, and Agent have signed, which is crucial for the document's validity.
   - Without the actual document to verify, it's unclear if these placeholders were intended to be filled or if they are optional. The presence of detected signatures suggests that the main required signatures are present.

Don’t forget to deactivate your virtual environment (run deactivate) when you’re done testing the agent.

Step 4: Build a Tensorlake-backed LangGraph agent yourself

You can start using Signature Detection today in the Tensorlake Playground or via our Python SDK. When you sign up, you get 100 free credits, enough to process about 100 pages.

If you want to run an already built agent, you can check out this full example using both the CLI and a Streamlit app in the Tensorlake GitHub repository.

We built Tensorlake to empower developers and product teams to do more with documents, faster and with less complexity.

We’d love to see what you build with this. Share it with us or give us feedback in our Slack Community.