Parse any PDF, Word document, or presentation and perform post-processing steps like chunking. Document Ingestion preserves the reading order and layout of the document so that an LLM can read it as a human would. It can also extract information from charts, complex tables, and handwritten notes.

Getting Started

With the Python SDK, you can easily set up a client to interact with the Document Ingestion API. Install the SDK with pip:

pip install tensorlake

Once the SDK is installed, create a client instance with your API key, which authenticates your requests to Tensorlake Cloud. You can find your API key in Tensorlake Cloud. The following example creates a client, uploads a file, starts a parse job, and retrieves the job:

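from tensorlake.documentai import DocumentAI, ParsingOptions

# Replace the placeholder with your own key from Tensorlake Cloud.
API_KEY = "tl__apiKey_xxxx"
doc_ai = DocumentAI(api_key=API_KEY)

# Upload the file, then start a parse job with the default options.
file_id = doc_ai.upload(path="/path/to/file.pdf")
job_id = doc_ai.parse(file_id, options=ParsingOptions())

# Retrieve the job by its ID.
data = doc_ai.get_job(job_id=job_id)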

Make sure you have an API key. We recommend storing it in a .env file for that specific project.
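
One way to follow the .env recommendation is sketched below using only the Python standard library. The helper names `load_env_file` and `get_api_key`, and the variable name `TENSORLAKE_API_KEY`, are assumptions made for this illustration and are not part of the SDK as described here. A package such as python-dotenv would be a common alternative to the hand-written loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Variables already set in the environment are not
    overwritten.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def get_api_key(var_name="TENSORLAKE_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    The variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return api_key
```

With a .env file containing a line such as TENSORLAKE_API_KEY=tl__apiKey_xxxx, you could call load_env_file() once at startup and pass get_api_key() as the api_key argument when creating the DocumentAI client. This keeps the key out of source code, and adding .env to your ignore file keeps it out of version control.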

Next Steps

Once you have the basic setup working, you can explore the other features of the Document Ingestion API. Here are some places to get started: