Create Datasets
Keeping all of your documents in a single Tensorlake Dataset makes it easy to build a knowledge base from a corpus of documents and to keep it continuously updated as documents are added, changed, or removed.
At the moment, datasets can apply Document Ingestion actions, such as document parsing, chunking, or structured extraction, to newly added documents.
All the code examples on this page use the official Tensorlake Python SDK. For other languages, please consult our API Reference.
Quick Start
The example can be run in a Google Colab notebook.
Create a dataset
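A minimal sketch of this step, assuming the SDK exposes a DocumentAI client and a create_dataset method, and that the API key is read from an environment variable; the import path, method name, and parameters are assumptions, so check the API Reference for the exact signatures:

```python
import os

def create_dataset():
    # Assumed import path; install the SDK with `pip install tensorlake`.
    from tensorlake.documentai import DocumentAI

    # Placeholder: read the API key from the environment rather than
    # hard-coding it.
    doc_ai = DocumentAI(api_key=os.environ["TENSORLAKE_API_KEY"])

    # Hypothetical call: every file later ingested into this dataset is
    # processed with the dataset's parsing configuration.
    return doc_ai.create_dataset(name="my-knowledge-base")
```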
Parse a file
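A sketch of ingesting a single file, assuming the dataset object exposes a parse method (the method name and return value are inferred from this page's step names, not confirmed signatures):

```python
def parse_file(dataset, path):
    # Hypothetical method: ingest one file into the dataset. The dataset's
    # parsing configuration is applied automatically, and the returned id
    # identifies the parse job for later retrieval.
    parse_id = dataset.parse(path)
    return parse_id
```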
Retrieve outputs
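A sketch of retrieving results through the wait_for_completion method mentioned on this page; whether the method lives on the dataset object or the client is an assumption, so verify against the API Reference:

```python
def retrieve_outputs(dataset, parse_id):
    # wait_for_completion blocks until the parsing job is complete and
    # returns the result. Its placement on the dataset object here is an
    # assumption for illustration.
    return dataset.wait_for_completion(parse_id)
```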
The wait_for_completion method will block until the parsing job is complete and return the result.
With datasets, you can ingest as many files as you want, and the same parsing configuration is applied to all of them. You can also create a dataset with structured extraction options, which lets you extract structured data from related documents.
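For example, a dataset configured for structured extraction might attach a schema that is applied to every ingested document. The sketch below assumes a JSON-Schema-style definition and a structured_extraction_options parameter; both the parameter name and the schema format are assumptions to check against the API Reference:

```python
import os

def create_extraction_dataset():
    # Assumed import path; install the SDK with `pip install tensorlake`.
    from tensorlake.documentai import DocumentAI

    # Hypothetical JSON-Schema-style description of the fields to extract
    # from every document ingested into the dataset.
    invoice_schema = {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
            "due_date": {"type": "string"},
        },
    }

    doc_ai = DocumentAI(api_key=os.environ["TENSORLAKE_API_KEY"])
    # Hypothetical parameter name: structured_extraction_options.
    return doc_ai.create_dataset(
        name="invoices",
        structured_extraction_options={"schema": invoice_schema},
    )
```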