Resume parsing is crucial in modern hiring workflows, where recruiters deal with hundreds or thousands of resumes. Automating the extraction of key information (skills, experience, education) saves time and enables efficient candidate screening. It also powers recommendation engines and applicant tracking systems (ATS), and helps maintain structured databases of talent profiles. The following example shows how to use the Tensorlake Python SDK to extract structured data when parsing candidate resumes.
Prerequisites
- Get your Tensorlake API key
- Install the Tensorlake SDK with `pip install tensorlake`
Import packages, set up the client, and define the file path
```python
# Import libraries
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import (
    ParsingOptions,
    StructuredExtractionOptions,
    ParseStatus,
)
from tensorlake.documentai.models.enums import ChunkingStrategy
import time
import json

# Create a Tensorlake client
doc_ai = DocumentAI()

# Reference to a resume that you want to parse
file_path = 'https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/jakes-resume.pdf'
```
Define the schema
Define a JSON schema to extract relevant information from the resume.
```python
structured_schema = {
    "title": "ResumeInfo",
    "type": "object",
    "properties": {
        "candidateName": { "type": "string" },
        "email": { "type": "string" },
        "phone": { "type": "string" },
        "address": { "type": "string" },
        "professionalSummary": { "type": "string" },
        "skills": {
            "type": "array",
            "items": { "type": "string" }
        },
        "workExperience": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "jobTitle": { "type": "string" },
                    "companyName": { "type": "string" },
                    "location": { "type": "string" },
                    "startDate": { "type": "string" },
                    "endDate": { "type": "string" },
                    "description": { "type": "string" }
                }
            }
        },
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "degree": { "type": "string" },
                    "fieldOfStudy": { "type": "string" },
                    "institution": { "type": "string" },
                    "location": { "type": "string" },
                    "graduationDate": { "type": "string" }
                }
            }
        }
    }
}
```
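Every leaf in this schema is typed as `"string"`, yet the sample output further down returns `null` for fields the resume does not contain, such as `address` and `professionalSummary`. If you validate results yourself, a minimal hand-rolled checker like the one below (covering only the `object`, `array`, `string`, and `null` types used here; it is not the `jsonschema` package and not how Tensorlake validates) shows why declaring such fields as `["string", "null"]` may be preferable, assuming the extraction service accepts type unions, which is not confirmed here. The `check` function is a hypothetical helper.

```python
from typing import Any

# Type tests for the subset of JSON Schema types used in the resume schema
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "null": lambda v: v is None,
}


def check(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return type mismatches between value and a tiny JSON Schema subset.

    Handles "type" (a string or a list of strings), "properties" for
    objects, and "items" for arrays. Unknown type names and all other
    keywords are ignored.
    """
    errors: list[str] = []
    declared = schema.get("type")
    if declared is not None:
        allowed = [declared] if isinstance(declared, str) else list(declared)
        known = [t for t in allowed if t in TYPE_CHECKS]
        if known and not any(TYPE_CHECKS[t](value) for t in known):
            errors.append(f"{path}: expected {allowed}, got {type(value).__name__}")
            return errors
    if isinstance(value, dict):
        # Missing properties are not reported; the resume schema has no "required"
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(check(value[key], sub_schema, f"{path}.{key}"))
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(check(item, schema["items"], f"{path}[{i}]"))
    return errors


strict = {"type": "object", "properties": {"address": {"type": "string"}}}
nullable = {"type": "object", "properties": {"address": {"type": ["string", "null"]}}}
print(check({"address": None}, strict))    # one mismatch reported for $.address
print(check({"address": None}, nullable))  # no mismatches
```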
Parse the document with the Python SDK
```python
# Configure parsing: split the parsed document into one chunk per page
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE
)

# Attach the JSON schema used for structured extraction
structured_extraction_options = StructuredExtractionOptions(
    schema_name="Candidate Resume",
    json_schema=structured_schema  # schema for structured extraction
)

# Parse the document with the specified extraction options for structured data
parse_id = doc_ai.parse(
    file_path,
    parsing_options=parsing_options,
    structured_extraction_options=[structured_extraction_options],
)
print(f"Parse job submitted with ID: {parse_id}")

# Wait for completion
result = doc_ai.wait_for_completion(parse_id)
```
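`wait_for_completion` blocks until the parse job is finished. If you need your own timeout or polling interval instead, a generic loop like the one below is one option. It is only a sketch: `poll_until_done` is a hypothetical helper, `fetch` stands in for whichever SDK call returns the job's current state, and the status strings in the usage lines are placeholders rather than the SDK's actual `ParseStatus` members.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_done(result) is true or timeout_s elapses.

    Hypothetical helper: fetch would wrap an SDK call that returns the
    current job state, and is_done decides when that state is terminal.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)


# Usage with a fake status sequence (placeholder strings, not real ParseStatus values)
fake_states = iter(["pending", "processing", "successful"])
final = poll_until_done(
    fetch=lambda: next(fake_states),
    is_done=lambda s: s in {"successful", "failure"},
    interval_s=0.01,
)
print(final)
```

Injecting `sleep` keeps the loop testable without real waiting, and using `time.monotonic()` avoids surprises if the wall clock changes while polling.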
Review the output
The result includes the extracted structured data, all of the markdown chunks, and the entire document layout. Print the structured data and each markdown chunk with:
```python
# Print the structured data output
print(json.dumps(result.structured_data[0].data, indent=2))

# Print the markdown content of each chunk
for index, chunk in enumerate(result.chunks):
    print(f"Chunk {index}:")
    print(chunk.content)
```
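To store results in an applicant tracking system or a talent database, you typically flatten the nested arrays first. The sketch below turns the `workExperience` array into one CSV row per role using Python's standard `csv` module. The column selection, the helper names `work_experience_rows` and `to_csv`, and the trimmed `resume` sample are illustrative assumptions; with a real result you would pass `result.structured_data[0].data` instead.

```python
import csv
import io

FIELDS = ["candidateName", "jobTitle", "companyName", "location", "startDate", "endDate"]


def work_experience_rows(resume: dict) -> list[dict]:
    """Flatten one extracted resume into one row per work-experience entry."""
    name = resume.get("candidateName")
    rows = []
    for job in resume.get("workExperience") or []:
        row = {field: job.get(field) for field in FIELDS[1:]}
        row["candidateName"] = name  # repeat the candidate on every row as a join key
        rows.append(row)
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text; None values are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A trimmed sample in the same shape as result.structured_data[0].data
resume = {
    "candidateName": "Jake Ryan",
    "workExperience": [
        {
            "jobTitle": "Undergraduate Research Assistant",
            "companyName": "Texas A&M University",
            "location": "College Station, TX",
            "startDate": "June 2020",
            "endDate": None,
        },
    ],
}
print(to_csv(work_experience_rows(resume)))
```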
Structured Data Outputs
```json
{
  "data": {
    "address": null,
    "candidateName": "Jake Ryan",
    "education": [
      {
        "degree": "Bachelor of Arts",
        "fieldOfStudy": "Computer Science, Minor in Business",
        "graduationDate": "May 2021",
        "institution": "Southwestern University",
        "location": "Georgetown, TX"
      },
      {
        "degree": "Associate's in Liberal Arts",
        "fieldOfStudy": null,
        "graduationDate": "May 2018",
        "institution": "Blinn College",
        "location": "Bryan, TX"
      }
    ],
    "email": "jake@su.edu",
    "phone": "123-456-7890",
    "professionalSummary": null,
    "skills": [
      "Java",
      "Python",
      "C/C++",
      "SQL (Postgres)",
      "JavaScript",
      "HTML/CSS",
      "R",
      "React",
      "Node.js",
      "Flask",
      "JUnit",
      "WordPress",
      "Material-UI",
      "FastAPI",
      "Git",
      "Docker",
      "TravisCI",
      "Google Cloud Platform",
      "VS Code",
      "Visual Studio",
      "PyCharm",
      "IntelliJ",
      "Eclipse",
      "pandas",
      "NumPy",
      "Matplotlib"
    ],
    "workExperience": [
      {
        "companyName": "Texas A&M University",
        "description": "• Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems • Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data • Explored ways to visualize GitHub collaboration in a classroom setting",
        "endDate": null,
        "jobTitle": "Undergraduate Research Assistant",
        "location": "College Station, TX",
        "startDate": "June 2020"
      },
      {
        "companyName": null,
        "description": "• Explored methods to generate video game dungeons based off of The Legend of Zelda Georgetown, TX • Developed a game in Java to test the generated dungeons • Contributed 50K+ lines of code to an established codebase via Git • Conducted a human subject study to determine which video game dungeon generation technique is enjoyable • Wrote an 8-page paper and gave multiple presentations on-campus • Presented virtually to the World Conference on Computational Intelligence",
        "endDate": "July 2019",
        "jobTitle": "Artificial Intelligence Research Assistant",
        "location": "Georgetown, TX",
        "startDate": "May 2019"
      },
      {
        "companyName": "Georgetown, TX",
        "description": "• Communicate with managers to set up campus computers used on campus • Assess and troubleshoot computer problems brought by students, faculty and staff • Maintain upkeep of computers, classroom equipment, and 200 printers across campus",
        "endDate": null,
        "jobTitle": "Information Technology Support Specialist",
        "location": "Georgetown, TX",
        "startDate": "Sep. 2018"
      }
    ]
  },
  "page_numbers": [1],
  "schema_name": "Candidate Resume"
}
```
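In this output, `endDate` is `null` for the two roles that the markdown lists as ongoing ("June 2020 – Present" and "Sep. 2018 – Present"), so a null end date appears to mean "Present" here, although in general it could also mean the date was simply not found. The sketch below computes whole months per role under that assumption. It only parses the "June 2020" and "Sep. 2018" styles seen in this sample, and `parse_month_year` and `months_in_role` are hypothetical helpers.

```python
from datetime import date
from typing import Optional

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_month_year(text: str) -> tuple[int, int]:
    """Parse strings such as 'June 2020' or 'Sep. 2018' into (year, month)."""
    month_part, year_part = text.replace(".", "").split()
    return int(year_part), MONTHS[month_part[:3].lower()]


def months_in_role(start: str, end: Optional[str], today: date) -> int:
    """Whole months from start to end; a None end is treated as ongoing (assumption)."""
    start_y, start_m = parse_month_year(start)
    end_y, end_m = parse_month_year(end) if end else (today.year, today.month)
    return (end_y - start_y) * 12 + (end_m - start_m)


jobs = [
    {"jobTitle": "Undergraduate Research Assistant", "startDate": "June 2020", "endDate": None},
    {"jobTitle": "Artificial Intelligence Research Assistant", "startDate": "May 2019", "endDate": "July 2019"},
]
for job in jobs:
    # A fixed "today" keeps the example reproducible
    months = months_in_role(job["startDate"], job["endDate"], today=date(2021, 5, 1))
    print(f'{job["jobTitle"]}: {months} months')
```

Values such as `"companyName": "Georgetown, TX"` for the IT support role (a location rather than a company) are a reminder to spot-check extracted fields before building metrics on them.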
Markdown Chunks
```text
Chunk 0:
## Jake Ryan
123-456-7890 | \underline{jake@su.edu} | \underline{linkedin.com/in/jake} | \underline{github.com/jake}
## EDUCATION
Southwestern University
Bachelor of Arts in Computer Science, Minor in Business
Blinn College Associate's in Liberal Arts
Georgetown, TX
Aug. 2018 – May 2021
Bryan, TX
Aug. 2014 – May 2018
## EXPERIENCE
## Undergraduate Research Assistant
## Texas A&M University
June 2020 – Present
College Station, TX
• Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems
• Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data
• Explored ways to visualize GitHub collaboration in a classroom setting
## Information Technology Support Specialist
Sep. 2018 – Present
Georgetown, TX
• Communicate with managers to set up campus computers used on campus
• Assess and troubleshoot computer problems brought by students, faculty and staff
• Maintain upkeep of computers, classroom equipment, and 200 printers across campus
## Artificial Intelligence Research Assistant
May 2019 – July 2019
• Explored methods to generate video game dungeons based off of The Legend of Zelda
Georgetown, TX
• Developed a game in Java to test the generated dungeons
• Contributed 50K+ lines of code to an established codebase via Git
• Conducted a human subject study to determine which video game dungeon generation technique is enjoyable
• Wrote an 8-page paper and gave multiple presentations on-campus
• Presented virtually to the World Conference on Computational Intelligence
## PROJECTS
Gitlytics | Python, Flask, React, PostgreSQL, Docker June 2020 – Present
• Developed a full-stack web application using with Flask serving a REST API with React as the frontend
• Implemented GitHub OAuth to get data from user’s repositories
• Visualized GitHub data to show collaboration
• Used Celery and Redis for asynchronous tasks
Simple Paintball | Spigot API, Java, Maven, TravisCI, Git
May 2018 – May 2020
• Developed a Minecraft server plugin to entertain kids during free time for a previous job
• Published plugin to websites gaining 2K+ downloads and an average 4.5/5-star review
• Implemented continuous delivery using TravisCI to build the plugin upon new a release
• Collaborated with Minecraft server administrators to suggest features and get feedback about the plugin
## TECHNICAL SKILLS
Languages: Java, Python, C/C++, SQL (Postgres), JavaScript, HTML/CSS, R Frameworks: React, Node.js, Flask, JUnit, WordPress, Material-UI, FastAPI Developer Tools: Git, Docker, TravisCI, Google Cloud Platform, VS Code, Visual Studio, PyCharm, IntelliJ, Eclipse Libraries: pandas, NumPy, Matplotlib
```
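Chunk 0 uses `##` headings both for resume sections (EDUCATION, EXPERIENCE, PROJECTS, TECHNICAL SKILLS) and for the candidate name and job titles. To work section by section, for example to keyword-search only the skills section, you could split a chunk's content on all-caps headings, as in the sketch below. Treating all-caps headings as section boundaries is a heuristic drawn from this one sample, and `split_sections` is a hypothetical helper; with a real result you would pass `result.chunks[0].content`.

```python
import re

# An all-caps "##" heading such as "## EDUCATION" or "## TECHNICAL SKILLS"
SECTION_HEADING = re.compile(r"^##\s+([A-Z][A-Z &/]+)\s*$")


def split_sections(markdown: str) -> dict[str, str]:
    """Group lines under the most recent all-caps '##' heading.

    Lines before the first such heading are stored under 'HEADER'.
    Mixed-case headings (names, job titles) stay inside their section.
    """
    sections: dict[str, list[str]] = {"HEADER": []}
    current = "HEADER"
    for line in markdown.splitlines():
        match = SECTION_HEADING.match(line)
        if match:
            current = match.group(1).strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


# A trimmed excerpt of Chunk 0
chunk = r"""## Jake Ryan
123-456-7890 | \underline{jake@su.edu}
## EDUCATION
Southwestern University
## TECHNICAL SKILLS
Languages: Java, Python"""

sections = split_sections(chunk)
print(sections["TECHNICAL SKILLS"])
```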