Analyzing AI Risk Disclosures in SEC Filings with Tensorlake & MotherDuck
Track how AI risk disclosures evolved across Microsoft, Google, and Meta from 2021 to 2025 by parsing 40 SEC filings, extracting structured risk data, and running SQL analytics in MotherDuck.
Track AI Risk Evolution Across Tech Companies
Let’s set the context for this example: you’ll build a document analytics pipeline that processes SEC filings from major tech companies to track how AI risk disclosures evolved from 2021 to 2025. You’ll learn how to:
- Use Tensorlake’s Page Classification with VLMs to identify risk factor pages
- Extract structured data from only relevant pages using Pydantic schemas
- Load parsed document data into MotherDuck for SQL analytics
- Query trends, compare companies, and discover emerging risk patterns
The Challenge
Major tech companies file lengthy SEC reports (often 100-200+ pages) every quarter. AI-related risk disclosures are scattered throughout these documents, so manual analysis is time-consuming and it is easy to miss critical information.
Our Solution
We’ll analyze 40 SEC filings from Microsoft, Google, and Meta spanning 2021 to 2025 to:
- Use VLMs to identify pages containing AI risk factors (reducing the pages processed from ~200 to ~20 per document)
- Extract structured risk data from only relevant pages
- Store and analyze trends in MotherDuck’s cloud data warehouse
- Uncover emerging AI risk patterns and regulatory concerns
Prerequisites
- Python 3.10+
- A Tensorlake API key
- A MotherDuck token
- SEC filing PDFs (we provide sample URLs)
- [Optional] A virtual Python environment to keep dependencies isolated
Build Your Document Analytics Pipeline
Set up your environment
The tensorlake package includes DocumentAI for parsing, while duckdb provides the MotherDuck connector.
Install necessary packages
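The install command itself is not reproduced in this section. Suppose the packages were installed with something like pip install tensorlake duckdb pydantic. In that case, the sketch below uses importlib.util.find_spec to confirm each module can be found before you run the pipeline. The module list is an assumption; pydantic is included because it is used later for the extraction schema. Extend the list with anything else you import.

```python
import importlib.util

# Assumed module names for the pipeline's dependencies; adjust as needed.
REQUIRED_MODULES = ("tensorlake", "duckdb", "pydantic")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All pipeline dependencies are installed.")
```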
Configure your API keys
Set environment variables for authentication, for example in a .env file.
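Here is a minimal sketch of reading the keys. It assumes they are named TENSORLAKE_API_KEY and MOTHERDUCK_TOKEN; check the Tensorlake and MotherDuck docs for the exact names your SDK versions read. The sketch parses simple KEY=VALUE lines using only the standard library. In a real project, load_dotenv() from the python-dotenv package does the same job more thoroughly.

```python
import os

# Assumed variable names for the Tensorlake API key and MotherDuck token.
REQUIRED_KEYS = ("TENSORLAKE_API_KEY", "MOTHERDUCK_TOKEN")


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and junk."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes: KEY="v" or KEY='v'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_keys(text, environ=os.environ):
    """Combine .env values with the environment and require both keys.

    Variables already set in `environ` take precedence over the file.
    """
    merged = {**parse_dotenv(text), **environ}
    missing = [key for key in REQUIRED_KEYS if not merged.get(key)]
    if missing:
        raise KeyError(f"Missing required keys: {', '.join(missing)}")
    return {key: merged[key] for key in REQUIRED_KEYS}
```

Letting exported shell variables win over the file means a key set in your shell or CI environment is never silently overwritten by a stale .env.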
Prepare your imports
Configure target documents
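One way to keep the target list tidy is to use a small dataclass per filing. This is a sketch only. The Filing class, its fields, and the example.com URLs are hypothetical placeholders, not the actual filing links used in this example.

```python
from collections import Counter
from dataclasses import dataclass

VALID_FORMS = {"10-K", "10-Q"}  # annual and quarterly reports


@dataclass(frozen=True)
class Filing:
    company: str      # e.g. "Microsoft", "Alphabet", "Meta"
    form: str         # "10-K" or "10-Q"
    fiscal_year: int  # year the filing covers
    url: str          # publicly reachable PDF URL

    def __post_init__(self):
        # Reject typos early, before any documents are sent for parsing.
        if self.form not in VALID_FORMS:
            raise ValueError(f"Unsupported form type: {self.form}")
        if not 2021 <= self.fiscal_year <= 2025:
            raise ValueError(f"Year outside study window: {self.fiscal_year}")


def filings_per_company(filings):
    """Count filings by company to sanity-check the target list."""
    return Counter(filing.company for filing in filings)


# Placeholder URLs only; substitute the real filing links.
FILINGS = [
    Filing("Microsoft", "10-K", 2024, "https://example.com/msft-10k.pdf"),
    Filing("Alphabet", "10-Q", 2025, "https://example.com/googl-10q.pdf"),
    Filing("Meta", "10-K", 2023, "https://example.com/meta-10k.pdf"),
]
```

Validating in __post_init__ means a mistyped form type or an out-of-range year fails when the list is defined, rather than partway through a batch of parsing jobs.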
We’ll analyze SEC filings from three AI leaders, covering both 10-K (annual) and 10-Q (quarterly) reports.
Step 1: Classify Risk Factor Pages with VLMs
Using Tensorlake’s Vision Language Models (VLMs), we’ll scan all filings to identify pages containing AI-related risk factors. This typically reduces processing from ~200 pages to ~20-30 relevant pages per document.
Review classification results
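Once classification finishes, it helps to check how much of each filing survived before paying for extraction. The sketch below assumes the classification output has already been reduced to a hypothetical {filing_id: [page numbers]} mapping, plus each document’s total page count. The real Tensorlake response shape may differ.

```python
def summarize_classification(relevant_pages, total_pages):
    """Return (filing_id, kept, total, fraction) for each filing.

    relevant_pages: hypothetical {filing_id: [1-based page numbers]} built
                    from the classification output
    total_pages:    {filing_id: page count of the full document}
    """
    rows = []
    for filing_id, pages in relevant_pages.items():
        kept = len(set(pages))  # de-duplicate pages matched more than once
        total = total_pages[filing_id]
        rows.append((filing_id, kept, total, kept / total))
    return rows


def overall_fraction(rows):
    """Share of all pages, across filings, that goes on to extraction."""
    kept = sum(row[1] for row in rows)
    total = sum(row[2] for row in rows)
    return kept / total if total else 0.0


if __name__ == "__main__":
    # Toy inputs for illustration only.
    pages = {"filing-a": [12, 13, 14, 40], "filing-b": [7, 8]}
    totals = {"filing-a": 120, "filing-b": 90}
    rows = summarize_classification(pages, totals)
    for filing_id, kept, total, frac in rows:
        print(f"{filing_id}: {kept}/{total} pages kept ({frac:.0%})")
    print(f"overall: {overall_fraction(rows):.0%}")
```

For a 200-page filing with 30 flagged pages, the fraction is 30 / 200 = 15%, in line with the reduction this example aims for.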
Let’s examine which pages were identified as containing AI risk factors.
Step 2: Define Extraction Schema
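Here is a sketch of an extraction schema as Pydantic models. The field names (category, description, severity_indicators, page) and the category labels are assumptions drawn from the risk types discussed in this example. They are not the exact schema behind this example’s results, and how the model is handed to Tensorlake’s structured extraction is not shown here.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field

# Assumed label set, based on the risk categories discussed in this example.
RiskCategory = Literal[
    "Operational", "Regulatory", "Competitive", "Ethical",
    "Liability", "Intellectual Property", "Energy", "Other",
]


class AIRisk(BaseModel):
    category: RiskCategory = Field(description="Primary type of AI risk")
    description: str = Field(description="Risk text as disclosed in the filing")
    severity_indicators: List[str] = Field(
        default_factory=list,
        description="Phrases signalling severity, e.g. 'material adverse effect'",
    )
    page: Optional[int] = Field(default=None, description="Source page number")


class AIRiskFactors(BaseModel):
    company: str
    filing_type: str = Field(description="10-K or 10-Q")
    filing_date: str = Field(description="ISO date, YYYY-MM-DD")
    risks: List[AIRisk] = Field(default_factory=list)
```

Typing category as a Literal makes validation reject labels outside the list instead of silently accepting them. If your extraction call expects a JSON Schema rather than a model class, Pydantic v2 produces one with AIRiskFactors.model_json_schema().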
We’ll extract structured data about AI risks, including categories (Operational, Regulatory, Competitive, etc.), descriptions, and severity indicators.
Step 3: Extract Structured Risk Data
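To restrict extraction to the classified pages, the page numbers usually need to be expressed as a selection. The helper below collapses a list of pages into a compact "3-5,9,12-13" style string. The hypothetical to_page_range name and that string format are assumptions, so check which page-selection format your Tensorlake SDK version accepts.

```python
def to_page_range(pages):
    """Collapse page numbers into a range string such as "3-5,9,12-13".

    Duplicates are ignored and the input may be unsorted. The output format
    is an assumption about how the parser expects page selections.
    """
    ordered = sorted(set(pages))
    if not ordered:
        return ""
    parts = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            prev = page  # still inside a consecutive run
            continue
        # Close the current run and start a new one at `page`.
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = page
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


if __name__ == "__main__":
    print(to_page_range([9, 3, 4, 5, 12, 13]))
```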
Now we extract detailed AI risk information from only the classified pages. This targeted approach processes roughly 15% of each filing’s pages while still covering every page the classifier flagged as relevant.
Step 4: Save Extracted Data to JSON
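Here is a minimal sketch of the export, using only the standard library. It assumes each risk is already a plain dict; with Pydantic v2 models, model_dump() produces one. The sketch writes newline-delimited JSON, one row per risk, and copies the filing metadata onto each row. The function name, file naming scheme, and risk_data directory are hypothetical.

```python
import json
from pathlib import Path


def save_filing_risks(filing, risks, out_dir="risk_data"):
    """Write one filing's extracted risks as newline-delimited JSON.

    filing: dict with company, filing_type, filing_date (hypothetical keys)
    risks:  list of dicts, e.g. with category and description
    Returns the path of the file written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    name = f"{filing['company']}_{filing['filing_type']}_{filing['filing_date']}.jsonl"
    path = out / name
    with path.open("w", encoding="utf-8") as fh:
        for risk in risks:
            # Flatten filing metadata onto every risk row for easy SQL loading.
            row = {**filing, **risk}
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path
```

Keeping one flat row per risk means each file maps directly onto a single table, with no nested arrays to unpack at load time.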
Export each filing’s risk data to JSON files for loading into MotherDuck.
Step 5: Load Data into MotherDuck
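The loader below is a sketch that creates an ai_risks table and inserts the rows from the JSON files through the DB-API calls execute and executemany. It runs against sqlite3 here as a local stand-in. With MotherDuck, you would instead pass a DuckDB connection such as duckdb.connect("md:my_db") with your MotherDuck token in the environment; the database name is hypothetical. DuckDB can also load the files directly with CREATE TABLE ai_risks AS SELECT * FROM read_json_auto('risk_data/*.jsonl'), which infers the column types for you.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical table layout matching the exported JSON rows.
COLUMNS = ("company", "filing_type", "filing_date", "category", "description")

CREATE_SQL = """
CREATE TABLE IF NOT EXISTS ai_risks (
    company     VARCHAR,
    filing_type VARCHAR,
    filing_date VARCHAR,
    category    VARCHAR,
    description VARCHAR
)
"""

INSERT_SQL = (
    f"INSERT INTO ai_risks ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join('?' for _ in COLUMNS)})"
)


def load_jsonl_dir(con, data_dir="risk_data"):
    """Create ai_risks and insert every row from data_dir/*.jsonl.

    `con` is any connection exposing execute/executemany with "?"
    placeholders (sqlite3 here; a duckdb connection also fits).
    Returns the number of rows inserted.
    """
    con.execute(CREATE_SQL)
    rows = []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    # Missing keys become NULL rather than raising.
                    rows.append(tuple(record.get(col) for col in COLUMNS))
    if rows:
        con.executemany(INSERT_SQL, rows)
    return len(rows)


if __name__ == "__main__":
    # Local stand-in; for MotherDuck pass a duckdb "md:" connection instead.
    con = sqlite3.connect(":memory:")
    print(load_jsonl_dir(con), "rows loaded")
```

With a file-backed sqlite3 database, call con.commit() after loading so the inserts persist.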
Create a cloud-based data warehouse table in MotherDuck to enable fast SQL analytics across all filings.
Step 6: Analyze Risk Trends with SQL
Now the real power emerges: run SQL analytics on your document data to uncover insights.
Query 1: Risk Category Distribution
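Here is a sketch of the category breakdown as a plain GROUP BY over the hypothetical ai_risks table (columns company, category, description). The SQL is standard enough to run unchanged on DuckDB/MotherDuck. The demo uses an in-memory sqlite3 database with a few illustrative rows, not data from the filings.

```python
import sqlite3

# Mentions per (company, category); ties broken alphabetically by category.
CATEGORY_DISTRIBUTION_SQL = """
SELECT company,
       category,
       COUNT(*) AS mentions
FROM ai_risks
GROUP BY company, category
ORDER BY company, mentions DESC, category
"""


def category_distribution(con):
    """Return (company, category, mentions) rows from the ai_risks table."""
    return con.execute(CATEGORY_DISTRIBUTION_SQL).fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE ai_risks (company VARCHAR, category VARCHAR, description VARCHAR)"
    )
    # Illustrative rows only; these are not results from the filings.
    demo.executemany(
        "INSERT INTO ai_risks VALUES (?, ?, ?)",
        [
            ("Meta", "Regulatory", "placeholder text"),
            ("Meta", "Operational", "placeholder text"),
            ("Meta", "Regulatory", "placeholder text"),
        ],
    )
    for company, category, mentions in category_distribution(demo):
        print(f"{company:<10} {category:<12} {mentions}")
```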
Understand the breakdown of AI risk categories across all companies.
Query 2: Deep Dive into Operational Risks
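"Most detailed" has to be made concrete somehow. This sketch assumes it means the longest description. It uses a ROW_NUMBER() window partitioned by company to keep the top n operational risks per company. Table and column names follow the same hypothetical ai_risks layout. DuckDB also offers QUALIFY for filtering window results, but the subquery form used here works in both DuckDB and SQLite.

```python
import sqlite3

# Rank each company's operational risks by description length (a proxy for
# "most detailed") and keep the top n; newer filings win ties.
LONGEST_OPERATIONAL_SQL = """
SELECT company, filing_date, description
FROM (
    SELECT company,
           filing_date,
           description,
           ROW_NUMBER() OVER (
               PARTITION BY company
               ORDER BY LENGTH(description) DESC, filing_date DESC
           ) AS rn
    FROM ai_risks
    WHERE category = 'Operational'
) AS ranked
WHERE rn <= ?
ORDER BY company, rn
"""


def longest_operational_risks(con, n=1):
    """Return up to n (company, filing_date, description) rows per company."""
    return con.execute(LONGEST_OPERATIONAL_SQL, (n,)).fetchall()
```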
Extract the most detailed operational risk descriptions from each company.
Key Insights Discovered
Through this analysis pipeline, we’ve:
- Processed 40 SEC filings (~6,000+ total pages)
- Identified and extracted AI risk disclosures from relevant pages
- Built a queryable database of AI risk evolution from 2022 to 2025
Emerging Trends:
- Operational risks dominate (37 mentions) - All three companies express concerns about AI infrastructure costs, development challenges, and potential misuse of AI systems
- Ethical considerations intensifying (28 mentions) - Growing focus on bias, harmful content, and societal impact, particularly around generative AI
- Regulatory landscape evolving rapidly - 2025 filings show increased mentions of specific regulations (EU AI Act, US AI Executive Order)
- New risk categories emerging in 2025:
  - Liability risks - Meta explicitly discussing third-party misuse of open-source AI
  - Intellectual property concerns - Copyright and training data issues becoming prominent
  - Energy dependencies - Companies highlighting reliance on computing power
- Risk disclosure volume increasing - Average risk mentions per filing grew from 2.0 in 2022 to 7.0 in 2024
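Averages such as mentions per filing depend on what counts as a filing. This sketch treats each distinct (company, filing_type, filing_date) combination as one filing. It groups by the year prefix of an ISO filing_date string. Both choices are assumptions about the table layout. Filings with no extracted risks never appear in ai_risks, so this query only averages over filings with at least one mention, which can overstate the figure.

```python
import sqlite3

# A "filing" is one distinct (company, filing_type, filing_date) combination.
# Years come from the first four characters of an ISO "YYYY-MM-DD" string.
MENTIONS_PER_FILING_SQL = """
SELECT SUBSTR(filing_date, 1, 4) AS filing_year,
       COUNT(*) AS mentions,
       COUNT(DISTINCT company || '|' || filing_type || '|' || filing_date) AS filings,
       ROUND(
           COUNT(*) * 1.0
           / COUNT(DISTINCT company || '|' || filing_type || '|' || filing_date),
           1
       ) AS avg_per_filing
FROM ai_risks
GROUP BY filing_year
ORDER BY filing_year
"""


def mentions_per_filing(con):
    """Return (filing_year, mentions, filings, avg_per_filing) rows."""
    return con.execute(MENTIONS_PER_FILING_SQL).fetchall()
```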
Company-Specific Patterns:
- Microsoft: Most comprehensive risk disclosures (55 total mentions), heavy focus on operational (19) and ethical (17) risks
- Meta: Balanced concern across operational (16) and regulatory (16) risks, unique focus on open-source AI liability
- Alphabet: More measured disclosures (10 total), but showing acceleration in 2025
Adapt This Pipeline for Your Use Case
This pipeline can be adapted for any document analysis need:
- ESG disclosures - Track sustainability commitments and progress
- Financial metrics tracking - Extract KPIs across earnings reports
- Competitive intelligence - Monitor competitor product launches and strategies
- Regulatory compliance monitoring - Alert on new compliance requirements