Analyzing AI Risk Disclosures in SEC Filings with Tensorlake & Databricks
Track how AI risk disclosures evolved across major tech companies from 2021 to 2025 by parsing SEC filings, extracting structured risk data, and running SQL analytics in the Databricks Data Intelligence Platform.
Track AI Risk Evolution Across Tech Companies
Let’s set the context for this example: you’ll build a document analytics pipeline that processes SEC filings from major tech companies to track how AI risk disclosures have evolved from 2021 to 2025. You’ll learn how to:
- Use Tensorlake’s Page Classification to identify risk factor pages with VLMs
- Extract structured data from only relevant pages using Pydantic schemas
- Deploy serverless applications on Tensorlake’s platform to run your entire pipeline
- Load parsed document data into Databricks for SQL analytics
- Query trends, compare companies, and discover emerging risk patterns
The Challenge
Major tech companies file lengthy SEC reports (100-200+ pages) quarterly. AI-related risk disclosures are scattered throughout these documents, making manual analysis time-consuming and prone to missing critical information.
Our Solution
We’ll analyze 3 SEC filings from Microsoft, Google, and Meta spanning 2024-2025 to:
- Use VLMs to identify pages containing AI risk factors (reducing processing from ~200 pages to ~20 per document)
- Extract structured risk data from only relevant pages
- Deploy the entire pipeline as serverless applications on Tensorlake
- Store and analyze trends in Databricks SQL Warehouse
- Uncover emerging AI risk patterns and regulatory concerns
Prerequisites
- Python 3.11+
- A Tensorlake API key
- Databricks SQL Warehouse credentials:
- Server Hostname
- HTTP Path
- Access Token
- [Optional] A virtual Python environment to keep dependencies isolated
Getting Started
Databricks Setup
You need access to a Databricks SQL Warehouse. Find your connection details in the Databricks workspace under SQL Warehouses → Connection Details.
Local Testing
1. Install Dependencies
2. Set Environment Variables
Create a .env file with these values.
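For example (the variable names below are assumptions; use whatever names your scripts actually read):

```
TENSORLAKE_API_KEY=tl_your_key_here
DATABRICKS_SERVER_HOSTNAME=your-workspace.cloud.databricks.com
DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/your_warehouse_id
DATABRICKS_ACCESS_TOKEN=dapi_your_token_here
```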
Build Your Document Processing Application
We’ll create a Tensorlake application that extracts AI risk data from SEC filings and stores it in Databricks. This application demonstrates a complete document processing pipeline using Tensorlake Applications with parallel processing via .map().
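On the Databricks side, reads and writes go through a SQL Warehouse connection. A minimal connection helper sketch, assuming the `databricks-sql-connector` package and the environment variable names from the setup step (both assumptions):

```python
import os

def databricks_params() -> dict:
    """Collect SQL Warehouse credentials from the environment."""
    return {
        "server_hostname": os.environ["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": os.environ["DATABRICKS_HTTP_PATH"],
        "access_token": os.environ["DATABRICKS_ACCESS_TOKEN"],
    }

def run_query(statement: str):
    """Open a short-lived connection, run one statement, return all rows."""
    from databricks import sql  # imported lazily so the helper above works standalone
    with sql.connect(**databricks_params()) as conn:
        with conn.cursor() as cursor:
            cursor.execute(statement)
            return cursor.fetchall()
```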
Pipeline Architecture
The application follows this flow:
- @application(): Marks the entry point of your application
- @function(): Makes functions distributed and executable in the cloud or locally
- .map(): Enables parallel execution across multiple items
- Image: Defines the Docker container environment with dependencies
- secrets: Securely injects environment variables at runtime
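As a mental model for what .map() gives you, the fan-out/fan-in shape can be sketched with plain `concurrent.futures` (this is not the Tensorlake API, just an illustration of the pattern):

```python
from concurrent.futures import ThreadPoolExecutor

def classify_page(page_number: int) -> dict:
    # Stand-in for a @function that a VLM-backed page classifier would implement.
    return {"page": page_number, "has_ai_risk": page_number % 10 == 0}

def process_document(page_numbers: list[int]) -> list[dict]:
    # Equivalent in spirit to classify_page.map(page_numbers) on Tensorlake:
    # each item is handled independently and results are collected in order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(classify_page, page_numbers))
```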
Define Your Extraction Schemas
First, define the Pydantic models that describe the data structure you want to extract:
Create the Document Processing Application
Create a file called process-sec.py:
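The file would start with the extraction schemas described above. A sketch of what they might look like (all class and field names here are illustrative, not the tutorial’s actual schema):

```python
from pydantic import BaseModel, Field

class AIRiskFactor(BaseModel):
    """One AI-related risk disclosure extracted from a filing page."""
    risk_category: str = Field(description="e.g. operational, regulatory, competitive")
    summary: str = Field(description="Short description of the disclosed risk")
    page_number: int

class FilingRisks(BaseModel):
    """All AI risk factors found in a single SEC filing."""
    company: str
    filing_year: int
    risks: list[AIRiskFactor]
```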
Test Locally
Run the processing script to extract data from a test SEC filing. It will:
- Classify pages to find AI risk factors using VLMs
- Extract structured data from those pages in parallel via .map()
- Initialize the Databricks table schema
- Load the extracted data into your Databricks tables in parallel via .map()
Build Your Query Application
Now create a separate application for querying the extracted data. Create a file called query-sec.py:
Test Queries Locally
Query the extracted data (replace 5 with any query number 0-5):
- 0: Risk category distribution
- 1: Operational AI risks (most detailed per company)
- 2: Emerging risks in 2025
- 3: Risk timeline analysis
- 4: Company risk profiles
- 5: Company summary statistics
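One simple way to structure query-sec.py is a dictionary mapping each query number to its SQL. A sketch, where the `ai_risks` table and its columns are assumptions carried over from the extraction schema:

```python
# Hypothetical table and column names; adjust to match your Databricks schema.
QUERIES: dict[int, str] = {
    0: "SELECT risk_category, COUNT(*) AS n FROM ai_risks GROUP BY risk_category ORDER BY n DESC",
    1: """SELECT company, summary FROM ai_risks
          WHERE risk_category = 'operational'
          QUALIFY ROW_NUMBER() OVER (PARTITION BY company ORDER BY LENGTH(summary) DESC) = 1""",
    2: "SELECT DISTINCT risk_category FROM ai_risks WHERE filing_year = 2025",
    3: "SELECT filing_year, COUNT(*) AS mentions FROM ai_risks GROUP BY filing_year ORDER BY filing_year",
    4: "SELECT company, risk_category, COUNT(*) AS n FROM ai_risks GROUP BY company, risk_category",
    5: "SELECT company, COUNT(*) AS total_risks FROM ai_risks GROUP BY company",
}

def get_query(number: int) -> str:
    """Look up the SQL for a query number, rejecting anything outside 0-5."""
    if number not in QUERIES:
        raise ValueError(f"query number must be 0-5, got {number}")
    return QUERIES[number]
```

Query 1 uses Databricks SQL’s QUALIFY clause to keep the longest operational-risk summary per company.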
Deploy to Tensorlake Cloud
Now that you’ve tested locally, deploy your applications to run as serverless functions in the cloud.
1. Verify Tensorlake Connection
2. Set Secrets
Store your credentials securely in Tensorlake:
3. Verify Secrets
4. Deploy Applications
Deploy the processing application:
5. Run the Full Pipeline
Create a script called process-sec-remote.py to process all SEC filings using your deployed application:
6. Query from the Deployed Application
Create a script called query-sec-remote.py:
Analyze Your Results
Let’s examine what insights we can extract from the data.
Query 1: Risk Category Distribution
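To make the category-distribution query concrete, here is a self-contained version of the same aggregate run in SQLite on made-up sample rows (the data is fabricated purely to show the query shape; the SQL is the same shape you would run against Databricks):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ai_risks (company TEXT, filing_year INT, risk_category TEXT)")
# Made-up sample rows purely to demonstrate the query shape.
conn.executemany(
    "INSERT INTO ai_risks VALUES (?, ?, ?)",
    [
        ("Microsoft", 2024, "operational"),
        ("Microsoft", 2025, "regulatory"),
        ("Google", 2025, "operational"),
        ("Meta", 2025, "reputational"),
        ("Meta", 2024, "operational"),
    ],
)
rows = conn.execute(
    "SELECT risk_category, COUNT(*) AS n FROM ai_risks "
    "GROUP BY risk_category ORDER BY n DESC"
).fetchall()
```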
See which types of AI risks are most common:
Query 2: Most Detailed Operational Risks
Find the most comprehensive operational risk description from each company:
Query 3: Timeline Analysis
Track how risk mentions evolved over time:
Query 4: Company Risk Profiles
Compare risk category frequencies across companies:
Key Insights
Through this analysis pipeline, you can uncover:
- Risk Category Trends: Operational and regulatory risks dominate across all companies
- Disclosure Evolution: Risk mention frequency increases in more recent filings
- Company Differences: Each company emphasizes different risk categories based on their AI strategy
- Emerging Patterns: New risk categories appear over time (liability, IP concerns, energy dependencies)
Architecture Benefits
This Tensorlake + Databricks integration provides:
- Serverless Execution: No infrastructure to manage; applications scale automatically
- Parallel Processing: Multiple documents processed simultaneously via .map() at both the extraction and database write stages
- Separation of Concerns: Document processing and querying are independent applications
- Reusable Components: Each function can be called independently or composed into larger pipelines
- Secret Management: Credentials stored securely and injected at runtime
- Fault Tolerance: Functions wrapped in try-except ensure the pipeline continues even if individual items fail
Adapt This Pipeline
This pipeline can be adapted for any document analysis use case:
- ESG Disclosures: Track sustainability commitments across annual reports
- Financial Metrics Tracking: Extract KPIs from earnings reports over time
- Competitive Intelligence: Monitor competitor product launches and strategies
- Regulatory Compliance: Alert on new compliance requirements in legal documents
- Contract Analysis: Extract key terms and obligations from agreements
Clean Up
When you’re done with this example:
Next Steps
Now that you have the basics down, explore these resources:
- Python SDK and API Docs
- Applications Documentation
- Page Classification Guide
- Structured Extraction Guide
- Blog
- Community Slack