1
Prerequisites
- Get your Tensorlake API key
- Install the Tensorlake SDK with: pip install tensorlake
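Before creating the client, it can help to confirm that the API key is actually available to your script. The following is a minimal sketch, assuming the key is supplied through an environment variable named `TENSORLAKE_API_KEY` (that name is an assumption not stated on this page; check the SDK documentation for your version). `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(env=os.environ, name="TENSORLAKE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    The variable name is an assumption; adjust it if your setup differs.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing an authentication error later in the parse call.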
2
Import packages, set up the client, and define the file path
# Import libraries
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import (
    ParsingOptions,
    StructuredExtractionOptions,
    ParseStatus
)
from tensorlake.documentai.models.enums import ChunkingStrategy
import time
import json
# Create a Tensorlake Client
doc_ai = DocumentAI()
# Reference to a resume that you want to parse
file_path = 'https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/jakes-resume.pdf'
3
Define the schema
Define a JSON schema to extract relevant information from the resume.
structured_schema = {
    "title": "ResumeInfo",
    "type": "object",
    "properties": {
        "candidateName": { "type": "string" },
        "email": { "type": "string" },
        "phone": { "type": "string" },
        "address": { "type": "string" },
        "professionalSummary": { "type": "string" },
        "skills": {
            "type": "array",
            "items": { "type": "string" }
        },
        "workExperience": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "jobTitle": { "type": "string" },
                    "companyName": { "type": "string" },
                    "location": { "type": "string" },
                    "startDate": { "type": "string" },
                    "endDate": { "type": "string" },
                    "description": { "type": "string" }
                }
            }
        },
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "degree": { "type": "string" },
                    "fieldOfStudy": { "type": "string" },
                    "institution": { "type": "string" },
                    "location": { "type": "string" },
                    "graduationDate": { "type": "string" }
                }
            }
        }
    }
}
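With nested arrays of objects, it is easy to lose track of which fields the schema actually asks for. As a rough sketch (not part of the SDK), the hypothetical helper `list_field_paths` below walks a JSON schema and lists every leaf field as a dotted path. It only handles the `object`, `array`, and scalar types used in this tutorial, and the schema shown is an abbreviated copy for illustration.

```python
def list_field_paths(schema, prefix=""):
    """Yield dotted paths for every leaf field in a JSON schema.

    Array fields are marked with "[]" so nested items are easy to spot.
    """
    if schema.get("type") == "object":
        for name, sub in schema.get("properties", {}).items():
            path = f"{prefix}.{name}" if prefix else name
            yield from list_field_paths(sub, path)
    elif schema.get("type") == "array":
        yield from list_field_paths(schema.get("items", {}), prefix + "[]")
    else:
        yield prefix


# Abbreviated copy of the resume schema, for illustration only
sample_schema = {
    "title": "ResumeInfo",
    "type": "object",
    "properties": {
        "candidateName": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "degree": {"type": "string"},
                    "institution": {"type": "string"},
                },
            },
        },
    },
}

paths = list(list_field_paths(sample_schema))
print(paths)
# ['candidateName', 'skills[]', 'education[].degree', 'education[].institution']
```

Running the same helper on the full `structured_schema` gives a quick checklist to compare against the extracted output.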
4
Parse the document with the Python SDK
# Configure parsing with structured schema
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE
)
structured_extraction_options = StructuredExtractionOptions(
    schema_name="Candidate Resume",
    json_schema=structured_schema  # schema for structured extraction
)
# Parse the document with the specified extraction options for structured data
parse_id = doc_ai.parse(
    file_path,
    parsing_options=parsing_options,
    structured_extraction_options=[structured_extraction_options],
)
print(f"Parse job submitted with ID: {parse_id}")
# Wait for completion
result = doc_ai.wait_for_completion(parse_id)
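`wait_for_completion` blocks until the job finishes. If you want your own timeout or progress logging, a generic polling loop is one option. The sketch below is not how the SDK implements waiting: `fetch_status` is a hypothetical callable you would supply, and the status strings are placeholders rather than the SDK's `ParseStatus` values.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal state or time runs out.

    fetch_status is a hypothetical callable returning a status string;
    the state names below are placeholders, not the SDK's enum values.
    """
    terminal = {"successful", "failure"}
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


# Usage with a fake status source that finishes on the third call
fake_statuses = iter(["pending", "processing", "successful"])
final = poll_until_done(lambda: next(fake_statuses), interval=0.0)
print(final)  # successful
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.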
5
Review the output
The result includes the extracted structured data, all of the markdown chunks, and the entire document layout. The following code prints the structured data and the markdown chunks:
# Print the structured data output
print(json.dumps(result.structured_data[0].data, indent=2))
# Get the markdown from extracted data
for index, chunk in enumerate(result.chunks):
    print(f"Chunk {index}:")
    print(chunk.content)
Structured Data Outputs
{
  "data": {
    "address": null,
    "candidateName": "Jake Ryan",
    "education": [
      {
        "degree": "Bachelor of Arts",
        "fieldOfStudy": "Computer Science, Minor in Business",
        "graduationDate": "May 2021",
        "institution": "Southwestern University",
        "location": "Georgetown, TX"
      },
      {
        "degree": "Associate's in Liberal Arts",
        "fieldOfStudy": null,
        "graduationDate": "May 2018",
        "institution": "Blinn College",
        "location": "Bryan, TX"
      }
    ],
    "email": "[email protected]",
    "phone": "123-456-7890",
    "professionalSummary": null,
    "skills": [
      "Java",
      "Python",
      "C/C++",
      "SQL (Postgres)",
      "JavaScript",
      "HTML/CSS",
      "R",
      "React",
      "Node.js",
      "Flask",
      "JUnit",
      "WordPress",
      "Material-UI",
      "FastAPI",
      "Git",
      "Docker",
      "TravisCI",
      "Google Cloud Platform",
      "VS Code",
      "Visual Studio",
      "PyCharm",
      "IntelliJ",
      "Eclipse",
      "pandas",
      "NumPy",
      "Matplotlib"
    ],
    "workExperience": [
      {
        "companyName": "Texas A&M University",
        "description": "• Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems • Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data • Explored ways to visualize GitHub collaboration in a classroom setting",
        "endDate": null,
        "jobTitle": "Undergraduate Research Assistant",
        "location": "College Station, TX",
        "startDate": "June 2020"
      },
      {
        "companyName": null,
        "description": "• Explored methods to generate video game dungeons based off of The Legend of Zelda Georgetown, TX • Developed a game in Java to test the generated dungeons • Contributed 50K+ lines of code to an established codebase via Git • Conducted a human subject study to determine which video game dungeon generation technique is enjoyable • Wrote an 8-page paper and gave multiple presentations on-campus • Presented virtually to the World Conference on Computational Intelligence",
        "endDate": "July 2019",
        "jobTitle": "Artificial Intelligence Research Assistant",
        "location": "Georgetown, TX",
        "startDate": "May 2019"
      },
      {
        "companyName": "Georgetown, TX",
        "description": "• Communicate with managers to set up campus computers used on campus • Assess and troubleshoot computer problems brought by students, faculty and staff • Maintain upkeep of computers, classroom equipment, and 200 printers across campus",
        "endDate": null,
        "jobTitle": "Information Technology Support Specialist",
        "location": "Georgetown, TX",
        "startDate": "Sep. 2018"
      }
    ]
  },
  "page_numbers": [1],
  "schema_name": "Candidate Resume"
}
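Several fields in the output above come back as `null` (for example `address` and `professionalSummary`), so downstream code should not assume every schema field is populated. The following is a small sketch of one way to list the missing fields; `find_missing_fields` is a hypothetical helper, and the dictionary is an abbreviated sample of the output above, not a live API response.

```python
def find_missing_fields(value, path=""):
    """Return dotted paths of every field whose value is None (JSON null)."""
    missing = []
    if isinstance(value, dict):
        for key, sub in value.items():
            child = f"{path}.{key}" if path else key
            missing.extend(find_missing_fields(sub, child))
    elif isinstance(value, list):
        for index, sub in enumerate(value):
            missing.extend(find_missing_fields(sub, f"{path}[{index}]"))
    elif value is None:
        missing.append(path)
    return missing


# Abbreviated sample taken from the structured data output
extracted = {
    "address": None,
    "candidateName": "Jake Ryan",
    "education": [
        {"degree": "Bachelor of Arts", "fieldOfStudy": "Computer Science, Minor in Business"},
        {"degree": "Associate's in Liberal Arts", "fieldOfStudy": None},
    ],
    "professionalSummary": None,
}

print(find_missing_fields(extracted))
# ['address', 'education[1].fieldOfStudy', 'professionalSummary']
```

A check like this makes it easier to decide which records need manual review before they are stored.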
Markdown Chunks
Chunk 0:
## Jake Ryan
123-456-7890 | \underline{[email protected]} | \underline{linkedin.com/in/jake} | \underline{github.com/jake}
## EDUCATION
Southwestern University
Bachelor of Arts in Computer Science, Minor in Business
Blinn College Associate's in Liberal Arts
Georgetown, TX
Aug. 2018 – May 2021
Bryan, TX
Aug. 2014 – May 2018
## EXPERIENCE
## Undergraduate Research Assistant
## Texas A&M University
June 2020 – Present
College Station, TX
• Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems
• Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data
• Explored ways to visualize GitHub collaboration in a classroom setting
## Information Technology Support Specialist
Sep. 2018 – Present
Georgetown, TX
• Communicate with managers to set up campus computers used on campus
• Assess and troubleshoot computer problems brought by students, faculty and staff
• Maintain upkeep of computers, classroom equipment, and 200 printers across campus
## Artificial Intelligence Research Assistant
May 2019 – July 2019
• Explored methods to generate video game dungeons based off of The Legend of Zelda
Georgetown, TX
• Developed a game in Java to test the generated dungeons
• Contributed 50K+ lines of code to an established codebase via Git
• Conducted a human subject study to determine which video game dungeon generation technique is enjoyable
• Wrote an 8-page paper and gave multiple presentations on-campus
• Presented virtually to the World Conference on Computational Intelligence
## PROJECTS
Gitlytics | Python, Flask, React, PostgreSQL, Docker June 2020 – Present
• Developed a full-stack web application using with Flask serving a REST API with React as the frontend
• Implemented GitHub OAuth to get data from user’s repositories
• Visualized GitHub data to show collaboration
• Used Celery and Redis for asynchronous tasks
Simple Paintball | Spigot API, Java, Maven, TravisCI, Git
May 2018 – May 2020
• Developed a Minecraft server plugin to entertain kids during free time for a previous job
• Published plugin to websites gaining 2K+ downloads and an average 4.5/5-star review
• Implemented continuous delivery using TravisCI to build the plugin upon new a release
• Collaborated with Minecraft server administrators to suggest features and get feedback about the plugin
## TECHNICAL SKILLS
Languages: Java, Python, C/C++, SQL (Postgres), JavaScript, HTML/CSS, R Frameworks: React, Node.js, Flask, JUnit, WordPress, Material-UI, FastAPI Developer Tools: Git, Docker, TravisCI, Google Cloud Platform, VS Code, Visual Studio, PyCharm, IntelliJ, Eclipse Libraries: pandas, NumPy, Matplotlib
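Because each chunk is plain markdown, it can be post-processed with ordinary string handling. As an illustrative sketch (not an SDK feature), the hypothetical helper `split_by_heading` groups the lines of a chunk under their `## ` headings. The sample text is a shortened excerpt of the chunk above.

```python
def split_by_heading(markdown):
    """Group the lines of a markdown string under their '## ' headings.

    Returns a list of (heading, body_lines) tuples in document order.
    Lines before the first heading are grouped under an empty heading.
    """
    sections = []
    heading, body = "", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if heading or body:
                sections.append((heading, body))
            heading, body = line[3:].strip(), []
        elif line.strip():
            body.append(line.strip())
    if heading or body:
        sections.append((heading, body))
    return sections


sample = """## Jake Ryan
123-456-7890
## EDUCATION
Southwestern University
Bachelor of Arts in Computer Science, Minor in Business
"""

for title, lines in split_by_heading(sample):
    print(title, len(lines))
```

Note that in the chunk above, job titles are also emitted as `## ` headings, so the resulting sections are finer-grained than the resume's top-level sections.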