Parse Resumes with Tensorlake

Resume parsing is crucial in modern hiring workflows, where recruiters deal with hundreds or thousands of resumes. Automating the extraction of key information (skills, experience, education) saves time and enables efficient candidate screening. It also powers recommendation engines and applicant tracking systems (ATS), and helps maintain structured databases of talent profiles.

Here is an example of how to use the Tensorlake Python SDK to extract structured data when parsing candidate resumes.

Prerequisites

Get your Tensorlake API key
Install the Tensorlake SDK with pip install tensorlake
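
Before creating the client, it can help to confirm that the API key is actually available. The sketch below assumes the key is supplied through an environment variable named `TENSORLAKE_API_KEY`; that variable name and the helper `require_api_key` are assumptions for illustration, so check the SDK documentation for the exact mechanism.

```python
import os


def require_api_key(env=None, name="TENSORLAKE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `name` is an assumed variable name; adjust it to match your setup.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key immediately instead of at the first parse request.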

Import packages, set up the client, and define the file path

# Import libraries
from tensorlake.documentai import DocumentAI
from tensorlake.documentai.models import (
    ParsingOptions,
    StructuredExtractionOptions,
    ParseStatus
)
from tensorlake.documentai.models.enums import ChunkingStrategy
import time
import json

# Create a Tensorlake Client
doc_ai = DocumentAI()

# Reference to a resume that you want to parse
file_path = 'https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/jakes-resume.pdf'

Define the schema

Define a JSON schema to extract relevant information from the resume.

structured_schema = {
  "title": "ResumeInfo",
  "type": "object",
  "properties": {
    "candidateName":   { "type": "string" },
    "email":           { "type": "string" },
    "phone":           { "type": "string" },
    "address":         { "type": "string" },
    "professionalSummary": { "type": "string" },
    "skills": {
      "type": "array",
      "items": { "type": "string" }
    },
    "workExperience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "jobTitle":     { "type": "string" },
          "companyName":  { "type": "string" },
          "location":     { "type": "string" },
          "startDate":    { "type": "string" },
          "endDate":      { "type": "string" },
          "description":  { "type": "string" }
        }
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "degree":        { "type": "string" },
          "fieldOfStudy":  { "type": "string" },
          "institution":   { "type": "string" },
          "location":      { "type": "string" },
          "graduationDate":{ "type": "string" }
        }
      }
    }
  }
}
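
A schema with nested arrays is easy to get wrong, so it can be useful to list every field it will ask the extractor for. The following is a minimal sketch using only the standard library; `list_schema_fields` is a hypothetical helper, not part of the Tensorlake SDK, and the trimmed-down `example_schema` only mirrors the shape of the schema above.

```python
def list_schema_fields(schema, prefix=""):
    """Return dotted paths for every leaf property in a JSON schema."""
    paths = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("type") == "object":
            # Nested object: recurse into its properties
            paths.extend(list_schema_fields(spec, prefix=f"{path}."))
        elif spec.get("type") == "array":
            items = spec.get("items", {})
            if items.get("type") == "object":
                # Array of objects: recurse, marking the array with []
                paths.extend(list_schema_fields(items, prefix=f"{path}[]."))
            else:
                paths.append(f"{path}[]")
        else:
            paths.append(path)
    return paths


# A trimmed-down schema in the same shape as the resume schema
example_schema = {
    "type": "object",
    "properties": {
        "candidateName": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "education": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"degree": {"type": "string"}},
            },
        },
    },
}

fields = list_schema_fields(example_schema)
print(fields)  # ['candidateName', 'skills[]', 'education[].degree']
```

Passing `structured_schema` instead of `example_schema` gives a quick checklist of the fields to look for in the extraction output.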

Parse the document with the Python SDK

# Configure parsing to chunk the document by page
parsing_options = ParsingOptions(
    chunking_strategy=ChunkingStrategy.PAGE
)

structured_extraction_options = StructuredExtractionOptions(
    schema_name="Candidate Resume",
    json_schema=structured_schema  # schema for structured extraction
)

# Parse the document with the specified extraction options for structured data
parse_id = doc_ai.parse(
    file_path,
    parsing_options=parsing_options,
    structured_extraction_options=[structured_extraction_options],
)

print(f"Parse job submitted with ID: {parse_id}")

# Wait for completion
result = doc_ai.wait_for_completion(parse_id)
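
`wait_for_completion` blocks until the job finishes. If you would rather control the polling interval and timeout yourself, a generic loop like the one below is one option. This is a sketch, not the SDK's API: `poll_until`, `fetch`, and `is_done` are hypothetical names, and `fetch` stands in for whichever call returns the job's current state in your SDK version.

```python
import time


def poll_until(fetch, is_done, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"Job did not finish within {timeout} seconds")
        sleep(interval)


# Usage with a stand-in for a real status call (hypothetical states)
states = iter(["pending", "processing", "successful"])
final_state = poll_until(
    fetch=lambda: next(states),
    is_done=lambda state: state == "successful",
    sleep=lambda _seconds: None,  # skip real sleeping in this demo
)
print(final_state)  # successful
```

The `sleep` and `clock` parameters are injectable only so the loop can be exercised without real delays.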

Review the output

The result will include the extracted data, all of the markdown chunks, and the entire document layout.

# Print the structured data output
print(json.dumps(result.structured_data[0].data, indent=2))

# Print each markdown chunk from the parsed document
for index, chunk in enumerate(result.chunks):
    print(f"Chunk {index}:")
    print(chunk.content)

The structured data output will be:

{
  "data": {
    "address": null,
    "candidateName": "Jake Ryan",
    "education": [
      {
        "degree": "Bachelor of Arts",
        "fieldOfStudy": "Computer Science, Minor in Business",
        "graduationDate": "May 2021",
        "institution": "Southwestern University",
        "location": "Georgetown, TX"
      },
      {
        "degree": "Associate's in Liberal Arts",
        "fieldOfStudy": null,
        "graduationDate": "May 2018",
        "institution": "Blinn College",
        "location": "Bryan, TX"
      }
    ],
    "email": "jake@su.edu",
    "phone": "123-456-7890",
    "professionalSummary": null,
    "skills": [
      "Java",
      "Python",
      "C/C++",
      "SQL (Postgres)",
      "JavaScript",
      "HTML/CSS",
      "R",
      "React",
      "Node.js",
      "Flask",
      "JUnit",
      "WordPress",
      "Material-UI",
      "FastAPI",
      "Git",
      "Docker",
      "TravisCI",
      "Google Cloud Platform",
      "VS Code",
      "Visual Studio",
      "PyCharm",
      "IntelliJ",
      "Eclipse",
      "pandas",
      "NumPy",
      "Matplotlib"
    ],
    "workExperience": [
      {
        "companyName": "Texas A&M University",
        "description": "• Developed a REST API using FastAPI and PostgreSQL to store data from learning management systems • Developed a full-stack web application using Flask, React, PostgreSQL and Docker to analyze GitHub data • Explored ways to visualize GitHub collaboration in a classroom setting",
        "endDate": null,
        "jobTitle": "Undergraduate Research Assistant",
        "location": "College Station, TX",
        "startDate": "June 2020"
      },
      {
        "companyName": null,
        "description": "• Explored methods to generate video game dungeons based off of The Legend of Zelda Georgetown, TX • Developed a game in Java to test the generated dungeons • Contributed 50K+ lines of code to an established codebase via Git • Conducted a human subject study to determine which video game dungeon generation technique is enjoyable • Wrote an 8-page paper and gave multiple presentations on-campus • Presented virtually to the World Conference on Computational Intelligence",
        "endDate": "July 2019",
        "jobTitle": "Artificial Intelligence Research Assistant",
        "location": "Georgetown, TX",
        "startDate": "May 2019"
      },
      {
        "companyName": "Georgetown, TX",
        "description": "• Communicate with managers to set up campus computers used on campus • Assess and troubleshoot computer problems brought by students, faculty and staff • Maintain upkeep of computers, classroom equipment, and 200 printers across campus",
        "endDate": null,
        "jobTitle": "Information Technology Support Specialist",
        "location": "Georgetown, TX",
        "startDate": "Sep. 2018"
      }
    ]
  },
  "page_numbers": [1],
  "schema_name": "Candidate Resume"
}
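
Several fields in this output are `null` (for example `address` and one `companyName`), so a review step that flags missing values may be worthwhile before the data reaches a downstream system. The following is a minimal sketch; `find_missing_fields` is a hypothetical helper, and the `record` below is a shortened copy of the output above.

```python
def find_missing_fields(value, path=""):
    """Return paths of fields whose value is None, recursing into dicts and lists."""
    missing = []
    if value is None:
        missing.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            missing.extend(find_missing_fields(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            missing.extend(find_missing_fields(child, f"{path}[{index}]"))
    return missing


# Shortened copy of the extracted data (JSON null becomes None in Python)
record = {
    "address": None,
    "candidateName": "Jake Ryan",
    "education": [
        {"degree": "Bachelor of Arts", "fieldOfStudy": "Computer Science, Minor in Business"},
        {"degree": "Associate's in Liberal Arts", "fieldOfStudy": None},
    ],
}

missing = find_missing_fields(record)
print(missing)  # ['address', 'education[1].fieldOfStudy']
```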

With the Tensorlake parse output, you have detailed, structured data that you can use to filter candidates quickly.
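
As an illustration of that kind of filtering, the sketch below keeps only candidates whose `skills` list contains every required skill. `has_required_skills` is a hypothetical helper, and the second candidate record is made up for the example.

```python
def has_required_skills(record, required):
    """True when every required skill appears in the record, ignoring case."""
    skills = {skill.strip().lower() for skill in record.get("skills") or []}
    return all(item.strip().lower() in skills for item in required)


candidates = [
    {"candidateName": "Jake Ryan", "skills": ["Python", "Docker", "React"]},
    {"candidateName": "Example Candidate", "skills": ["Java"]},  # hypothetical record
]

matches = [
    candidate["candidateName"]
    for candidate in candidates
    if has_required_skills(candidate, ["python", "docker"])
]
print(matches)  # ['Jake Ryan']
```

The comparison is an exact, case-insensitive match, so an entry such as "SQL (Postgres)" would not satisfy a requirement of "SQL"; normalizing skill names first is a reasonable next step.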
