POST /documents/v1/extract_async

Submit a file for structured extraction. If you have configured a webhook, we will notify you when the job is complete.

The API call returns a job ID (the jobId field of the response), which you can use with your API key to retrieve the results of the job.

For the file, you can use either a managed file ID of the form tensorlake-<file_id>, or a pre-signed URL or any other HTTP URL from which the file can be downloaded.

Provide the JSON Schema that describes the data you want to extract in the jsonSchema field.

Choosing the tensorlake model provider ensures that your data is not sent to a third-party service (OpenAI, Anthropic, etc.); extraction runs on our own models.
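As a rough illustration of the flow described above, the following Python sketch builds (but does not send) a request for this endpoint using only the standard library. The base URL, the token value, the example document URL, and the helper name build_extract_request are placeholders invented for this example; only the path, the header format, and the body field names come from this page.

```python
import json
import urllib.request

# Assumption: placeholder base URL, not taken from this page.
BASE_URL = "https://api.example.com"


def build_extract_request(token: str, file_ref: str, schema: dict,
                          model_provider: str = "tensorlake",
                          deliver_webhook: bool = False) -> urllib.request.Request:
    """Build, without sending, a POST request for /documents/v1/extract_async."""
    body = {
        "file": file_ref,
        # jsonSchema is a string field, so the schema is serialized first.
        "jsonSchema": json.dumps(schema),
        "modelProvider": model_provider,
        "deliverWebhook": deliver_webhook,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/documents/v1/extract_async",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical inputs, for illustration only.
request = build_extract_request(
    token="YOUR_API_TOKEN",
    file_ref="https://example.com/invoice.pdf",
    schema={"type": "object", "properties": {"total": {"type": "number"}}},
)
# To actually submit the job: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before any document leaves your machine.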

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
file
string
required

The document to be processed. Provide one of the following:

  • A publicly available URL or a presigned S3 URL
  • A tensorlake- prefixed id obtained from the /files endpoint after directly uploading a document
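The two accepted forms of the file value can be told apart on the client before a request is made. The sketch below is a hypothetical pre-flight check, not part of the API: the function name and the exact validation rules are assumptions, and the server may accept or reject values differently.

```python
from urllib.parse import urlparse

MANAGED_PREFIX = "tensorlake-"


def classify_file_reference(file_ref: str) -> str:
    """Return "managed" for a tensorlake- prefixed ID, "url" for an HTTP(S) URL."""
    # A managed reference is the prefix followed by a non-empty file ID.
    if file_ref.startswith(MANAGED_PREFIX) and len(file_ref) > len(MANAGED_PREFIX):
        return "managed"
    parsed = urlparse(file_ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    raise ValueError(f"not a managed file ID or an HTTP URL: {file_ref!r}")


# Hypothetical values, for illustration only.
kind_a = classify_file_reference("tensorlake-abc123")
kind_b = classify_file_reference("https://example.com/report.pdf")
```

A local path such as invoice.pdf raises ValueError here, which catches the common mistake of passing a filename instead of uploading the file first.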
jsonSchema
string
required

The JSON schema to guide structured data extraction from the file. Encode the JSON schema as a string.
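Because jsonSchema is a string rather than a nested object, the schema is serialized once on its own and then again as part of the request body. A minimal Python sketch of that double encoding follows; the invoice schema and its field names are made up for illustration.

```python
import json

# Hypothetical schema, for illustration only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}

# First pass: turn the schema into a string for the jsonSchema field.
body = {
    "file": "tensorlake-<file_id>",
    "jsonSchema": json.dumps(invoice_schema),
}

# Second pass: serialize the whole body; the inner quotes get escaped.
encoded_body = json.dumps(body)
```

Passing the schema as a plain dict instead would produce a nested JSON object, which does not match the string type documented for this field.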

deliverWebhook
boolean

Whether to deliver a webhook when the job is completed.

modelProvider
enum<string>
default: tensorlake

The model provider to use for structured data extraction. Specifying tensorlake will use a private model, which runs on our servers.

Available options: tensorlake, claude-3-5-sonnet-latest, gpt-4o-mini
prompt
string | null

Override the prompt to customize structured extraction. Use this to extract data from a file with a prompt other than our default.

structuredDataset
string | null

The dataset to insert the extracted data into.

Response

200 - application/json
jobId
string
required
status
string
required
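A client might read the two required response fields along these lines. This is a sketch only: the ExtractJob class and parse_extract_response helper are hypothetical names, and the sample jobId and status values are invented, since this page does not list the possible status strings.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractJob:
    job_id: str
    status: str


def parse_extract_response(raw: str) -> ExtractJob:
    """Parse a 200 response body and check that both required fields exist."""
    data = json.loads(raw)
    missing = [key for key in ("jobId", "status") if key not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    return ExtractJob(job_id=data["jobId"], status=data["status"])


# Sample body with made-up values, for illustration only.
sample = '{"jobId": "job-123", "status": "pending"}'
job = parse_extract_response(sample)
```

The returned job_id is the value to keep for retrieving the results of the job later.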