Overview
Charts in PDFs and documents are static images. Traditional parsers either skip them entirely or return a generic figure fragment with no underlying data. Tensorlake's Agentic Chart Extraction turns those images into structured JSON: it detects the chart type, extracts data series and axis information, and produces output that can be fed directly into analytics and BI tools or plotted programmatically. Enable it with chart_extraction=True in your EnrichmentOptions.
Enabling Chart Extraction
Set chart_extraction=True in your EnrichmentOptions.
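A minimal sketch of the options object follows. Only the chart_extraction flag is taken from this page; the tensorlake.documentai import path is an assumption, and the stand-in dataclass is hypothetical, included only so the snippet runs without the SDK installed.

```python
from dataclasses import dataclass

try:
    # Assumed import path; check it against your installed SDK version.
    from tensorlake.documentai import EnrichmentOptions
except ImportError:
    # Hypothetical stand-in so the sketch runs without the SDK installed.
    @dataclass
    class EnrichmentOptions:
        chart_extraction: bool = False

# Turn on chart extraction; this object accompanies the parse request.
enrichment_options = EnrichmentOptions(chart_extraction=True)
print(enrichment_options.chart_extraction)
```

How the options object is passed to a parse call depends on the SDK version, so that step is left out here.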
How It Works
For each chart detected in the document, the system:
- Identifies the chart type (bar, line, scatter, or pie)
- Extracts axis definitions, series names, data points, and rendering hints (colors, markers, legend position)
- Outputs a standardized JSON object conforming to the schema for that chart type
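Because every chart comes back as a typed JSON object, downstream code can branch on the chart type. The sketch below counts data points per chart. The key names chart_type, series, values, and slices are illustrative assumptions rather than the documented schema, and the numbers are made up.

```python
import json

# Hypothetical payload: "chart_type", "series", "values", and "slices"
# are assumed key names for illustration, not the documented schema.
raw = """
[
  {"chart_type": "bar", "series": [{"name": "A", "values": [1, 2, 3]}]},
  {"chart_type": "scatter",
   "series": [{"name": "B", "x_data": [0.1, 0.2], "y_data": [5, 6]}]},
  {"chart_type": "pie",
   "slices": [{"label": "X", "value": 60}, {"label": "Y", "value": 40}]}
]
"""

def count_points(chart: dict) -> int:
    """Count the data points in one extracted chart object."""
    kind = chart["chart_type"]
    if kind in ("bar", "line"):
        return sum(len(s["values"]) for s in chart["series"])
    if kind == "scatter":
        return sum(len(s["x_data"]) for s in chart["series"])
    if kind == "pie":
        return len(chart["slices"])
    raise ValueError(f"unsupported chart type: {kind}")

charts = json.loads(raw)
summary = [(c["chart_type"], count_points(c)) for c in charts]
print(summary)
```

Raising on an unknown type, rather than silently returning zero, makes it obvious when the extractor emits a chart kind the consumer does not yet handle.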
Supported Chart Types
| Chart type | Schema highlights |
|---|---|
| Bar | orientation (vertical/horizontal), named series for grouped/stacked bars, x_axis.categories, optional axis bounds and per-bar display flags |
| Line | x/y axis definitions, explicit values arrays (numeric or categorical), multiple series with color, line_style, and marker styling |
| Scatter | Per-series x_data/y_data arrays, marker styling (size, alpha, edge_color), and axis bounds |
| Pie | Slice-centric schema with label, value, optional percentage, colors, and display flags |
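As an illustration of how the bar schema might be consumed, the sketch below flattens a grouped bar chart into one row per category and series, then writes the rows as CSV. The orientation, series, and x_axis.categories fields come from the table above; the name and values keys inside each series are assumptions, and the figures are invented sample data.

```python
import csv
import io

# Hypothetical bar-chart payload with made-up numbers. The "name" and
# "values" keys inside each series are assumed, not documented.
bar_chart = {
    "orientation": "vertical",
    "x_axis": {"categories": ["Q1", "Q2", "Q3"]},
    "series": [
        {"name": "Revenue", "values": [10.0, 12.5, 15.0]},
        {"name": "Cost", "values": [7.0, 8.0, 9.5]},
    ],
}

def bar_chart_to_rows(chart: dict) -> list[dict]:
    """Flatten a grouped bar chart into one row per (category, series)."""
    categories = chart["x_axis"]["categories"]
    rows = []
    for series in chart["series"]:
        for category, value in zip(categories, series["values"]):
            rows.append(
                {"category": category, "series": series["name"], "value": value}
            )
    return rows

rows = bar_chart_to_rows(bar_chart)

# Serialize the rows to CSV in memory for a BI tool or spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["category", "series", "value"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

A long (tidy) layout like this is usually easier to aggregate or pivot than one column per series.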
Output Examples
Bar chart
Scatter plot
Line chart
Common Use Cases
- Financial reports — extract revenue, cost, and margin trends from bar and line charts without manual transcription
- Scientific papers — recover experimental data points from scatter plots for further analysis or comparison
- Business presentations — pull KPI charts into structured data for dashboards and reporting pipelines
- RAG pipelines — surface chart data as structured context so LLMs can answer quantitative questions about visuals
- BI and analytics — re-plot or aggregate extracted series directly using the output JSON without rebuilding data manually
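For the RAG use case, one possible approach is to render the extracted JSON as short text that can be placed in an LLM prompt. The sketch below does this for a pie chart. The label, value, and optional percentage fields follow the schema table; the title and slices keys are assumptions, and the values are invented.

```python
# Hypothetical pie-chart payload with made-up numbers. The "title" and
# "slices" keys are assumed names, not the documented schema.
pie_chart = {
    "title": "Example share by segment",
    "slices": [
        {"label": "A", "value": 30, "percentage": 60.0},
        {"label": "B", "value": 20},  # no percentage supplied
    ],
}

def pie_to_context(chart: dict) -> str:
    """Render a pie chart as plain text suitable for an LLM prompt."""
    total = sum(s["value"] for s in chart["slices"])
    lines = [f"Pie chart: {chart.get('title', 'untitled')}"]
    for s in chart["slices"]:
        pct = s.get("percentage")
        if pct is None:
            # Fall back to computing the share from the raw values.
            pct = 100.0 * s["value"] / total if total else 0.0
        lines.append(f"- {s['label']}: {s['value']} ({pct:.1f}%)")
    return "\n".join(lines)

context = pie_to_context(pie_chart)
print(context)
```

Computing a missing percentage from the raw values keeps the text complete even when the optional field is absent.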