Extract Structured Data from Images

This example demonstrates how to build a Tensorlake application that extracts structured data from driver’s license images using OpenAI’s vision model.

What you’ll learn:

Processing multimodal data (images) in Tensorlake
Using custom dependencies with Docker images
Managing API secrets securely
Extracting structured data with Pydantic models

structured_extraction.py

import os
import base64

import requests
from pydantic import BaseModel
from tensorlake.applications import application, function, Image, RequestError

# Install dependencies for your application
image = Image().run("pip install openai pydantic requests")
# List of secrets required by the application.
# The application expects to find these secrets in the environment.
secrets = ["OPENAI_API_KEY"]

class DrivingLicense(BaseModel):
    name: str
    date_of_birth: str
    address: str
    license_number: str
    license_expiration_date: str

@application()
@function(image=image, secrets=secrets)
def extract_driving_license_data(url: str) -> DrivingLicense:
    from openai import OpenAI

    # Download the image from the URL (the timeout avoids hanging on a slow host)
    http_response = requests.get(url, timeout=30)
    http_response.raise_for_status()

    # Encode image as base64
    image_base64 = base64.b64encode(http_response.content).decode("utf-8")

    # Determine the image format from the Content-Type header
    content_type = http_response.headers.get("content-type", "").lower()
    if "jpeg" in content_type or "jpg" in content_type:
        image_format = "jpeg"
    elif "png" in content_type:
        image_format = "png"
    else:
        # Default to JPEG if the format can't be determined
        image_format = "jpeg"

    # Extract structured data using OpenAI's vision model
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": "Extract the personal information from the driving license image.",
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/{image_format};base64,{image_base64}"
                        },
                    }
                ],
            },
        ],
        response_format=DrivingLicense,
    )

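    # `parsed` is None if the model refused or returned no structured output
    license_data = completion.choices[0].message.parsed
    if license_data is None:
        raise RequestError("Could not extract driving license data from the image.")
    return license_data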
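The format detection and base64 encoding steps above can be factored into a small helper that is easy to test without network access. The sketch below is hypothetical (`image_subtype` and `to_data_url` are not part of Tensorlake or the OpenAI SDK). It parses the Content-Type value instead of doing substring checks, falls back to JPEG like the function above, and also recognizes WebP and GIF, which OpenAI's vision models accept as well. It assumes the server's Content-Type header is trustworthy.

```python
import base64

# Hypothetical mapping from a Content-Type subtype to the subtype used in the
# data URL. Mirrors the jpeg/png checks in the function above, plus webp and gif.
_SUPPORTED_SUBTYPES = {
    "jpeg": "jpeg",
    "jpg": "jpeg",
    "png": "png",
    "webp": "webp",
    "gif": "gif",
}


def image_subtype(content_type: str, default: str = "jpeg") -> str:
    """Return the image subtype for a Content-Type value, or `default`."""
    # "image/PNG; charset=binary" -> "image/png" -> "png"
    mime = content_type.split(";", 1)[0].strip().lower()
    subtype = mime.rsplit("/", 1)[-1]
    return _SUPPORTED_SUBTYPES.get(subtype, default)


def to_data_url(content: bytes, content_type: str) -> str:
    """Encode raw image bytes as a base64 data URL for the vision model."""
    encoded = base64.b64encode(content).decode("utf-8")
    return f"data:image/{image_subtype(content_type)};base64,{encoded}"
```

For example, `to_data_url(b"abc", "image/png")` returns `"data:image/png;base64,YWJj"`. Inside the function, the whole detection block would collapse to `to_data_url(http_response.content, http_response.headers.get("content-type", ""))`.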

Building a custom image lets you install almost any dependency your function needs; see the Dependency management guide to learn more. Before deploying this application to Tensorlake, make sure the function can access the OpenAI API key as a secret. Set it with the tensorlake secrets command:

tensorlake secrets set OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
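Because the function reads the key with `os.getenv`, a missing secret is only detected when the OpenAI client is constructed, after the image has already been downloaded, and an empty value is not detected until OpenAI rejects the request. To fail before doing any work, you could call a check like the hypothetical `require_secret` below at the top of the function body. It assumes, as the example above does, that each secret listed in `secrets` is exposed as an environment variable of the same name.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name: str) -> str:
    """Return the secret's value, failing fast if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(
            f"{name} is not set; run `tensorlake secrets set {name}=...` before deploying"
        )
    return value
```

Inside `extract_driving_license_data`, you would then construct the client with `OpenAI(api_key=require_secret("OPENAI_API_KEY"))`.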

If you want to learn more about how we manage secrets, take a look at the Secrets management guide.

Now that you’ve defined your custom image and set the API key secret, deploy the application:

tensorlake deploy structured_extraction.py

You should see tensorlake stream the build logs while your image is being built. Once the build finishes, you can invoke the application as before:

curl -N -X POST https://api.tensorlake.ai/applications/extract_driving_license_data \
  -H "Authorization: Bearer $TENSORLAKE_API_KEY" \
  --json '"https://tlake.link/dl"'
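If you would rather call the application from Python than from curl, the sketch below builds the same request with the standard library's `urllib`. The endpoint, bearer token, and JSON-string body come from the curl command above; the Content-Type and Accept headers are the ones curl's `--json` option sets. It assumes `TENSORLAKE_API_KEY` is set in your environment, and it reads the whole response at once instead of streaming it as `curl -N` does. `build_request` and `invoke` are illustrative names, not part of a Tensorlake SDK.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.tensorlake.ai/applications/extract_driving_license_data"


def build_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl --json invocation."""
    # The request body is a bare JSON string: "https://..."
    body = json.dumps(image_url).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def invoke(image_url: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(image_url, os.environ["TENSORLAKE_API_KEY"])
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `invoke("https://tlake.link/dl")` should return a dict with the fields of `DrivingLicense`, assuming the endpoint replies with a plain JSON body like the sample response; if it streams incremental output instead, the body needs different handling.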

The response will contain the extracted structured data:

{
  "name": "John Doe",
  "date_of_birth": "1990-01-15",
  "address": "123 Main St, City, State 12345",
  "license_number": "D1234567",
  "license_expiration_date": "2025-01-15"
}
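All five fields are typed as plain `str` in the `DrivingLicense` model, so nothing forces the model to return dates in the ISO 8601 form shown in the sample. A client that needs real dates should validate them itself. The sketch below uses only the standard library; `LicenseRecord` and `parse_license` are hypothetical names, and it assumes `YYYY-MM-DD` date strings, raising `ValueError` for anything else.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LicenseRecord:
    """Client-side mirror of the DrivingLicense model, with parsed dates."""

    name: str
    date_of_birth: date
    address: str
    license_number: str
    license_expiration_date: date


def parse_license(body: str) -> LicenseRecord:
    """Parse the application's JSON response; raises ValueError on bad dates."""
    data = json.loads(body)
    return LicenseRecord(
        name=data["name"],
        date_of_birth=date.fromisoformat(data["date_of_birth"]),
        address=data["address"],
        license_number=data["license_number"],
        license_expiration_date=date.fromisoformat(data["license_expiration_date"]),
    )
```

Parsing the sample response above yields a record whose `date_of_birth` is `date(1990, 1, 15)`, while a value such as `"01/15/1990"` is rejected.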
