POST /documents/v1/files_large
curl --request POST \
  --url https://api.tensorlake.ai/documents/v1/files_large \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "file_size": 123,
  "filename": "<string>",
  "mime_type": "<string>",
  "sha256_checksum": "<string>"
}'
{
  "id": "<string>",
  "presigned_url": "<string>"
}

This API call initiates the upload process for large files. It returns a presigned URL that can be used to upload the file in chunks.

It also returns a temporary identifier for the file, which identifies the file during the upload process. The identifier has the form tl-presigned-<temporary_file_id>.

The presigned URL is valid for 1 hour. After the file is uploaded, you must call the "Finalize the upload process for large files" API to complete the upload.

Files that have not been finalized are not available for parsing or structured extraction.
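
As a rough illustration of the initiation call, the sketch below builds the POST request with Python's standard library and parses the response. The helper names build_initiate_request and parse_initiate_response are hypothetical, and sending sha256_checksum as a hex string is an assumption, since this page does not state the expected encoding.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

API_URL = "https://api.tensorlake.ai/documents/v1/files_large"


def build_initiate_request(path: str, mime_type: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a large-file upload."""
    file_path = Path(path)

    # Hash the file in chunks so it is never fully loaded into memory.
    hasher = hashlib.sha256()
    with file_path.open("rb") as f:
        while chunk := f.read(8192):
            hasher.update(chunk)

    payload = {
        "file_size": file_path.stat().st_size,
        "filename": file_path.name,
        "mime_type": mime_type,
        # Assumption: hex encoding; the reference does not specify it.
        "sha256_checksum": hasher.hexdigest(),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_initiate_response(body: bytes) -> tuple[str, str]:
    """Return (temporary file id, presigned URL) from the JSON response body."""
    data = json.loads(body)
    return data["id"], data["presigned_url"]
```

To send the request, pass it to urllib.request.urlopen and hand the response body to parse_initiate_response. Because the presigned URL is valid for 1 hour, the upload should start soon after this call returns.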

Uploading to a presigned URL

The presigned URL is an AWS S3 presigned URL. You can use any HTTP client to upload the file to it. The file must be uploaded with a PUT request that includes the following headers:

  • Content-Type: The content type of the file being uploaded. This is required. For example, application/pdf for PDF files, image/jpeg for JPEG images, etc.
  • Content-Length: The size of the file being uploaded. This is required. This value must be set to the size of the file in bytes.
  • x-amz-sdk-checksum-algorithm: The checksum algorithm used to calculate the SHA256 hash. This is required. This value must be set to SHA256.
  • x-amz-checksum-sha256: The SHA256 hash of the file being uploaded. This is required. This value must be set to the base64 encoding of the raw SHA256 digest bytes (not the base64 encoding of the hex string).
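
The four headers above can be assembled as in the following sketch, which builds the PUT request with Python's standard library. The helper name build_upload_request is hypothetical, and the checksum header is set to the base64 encoding of the raw digest bytes, which is the form S3 expects for x-amz-checksum-sha256. For simplicity the sketch takes the file contents as bytes; a client handling very large files would more likely stream from disk.

```python
import base64
import hashlib
import urllib.request


def build_upload_request(presigned_url: str, data: bytes, mime_type: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request for the presigned URL."""
    # Raw 32-byte SHA256 digest of the file contents.
    digest = hashlib.sha256(data).digest()
    headers = {
        "Content-Type": mime_type,
        "Content-Length": str(len(data)),
        "x-amz-sdk-checksum-algorithm": "SHA256",
        # Base64 of the raw digest bytes, not of the hex string.
        "x-amz-checksum-sha256": base64.b64encode(digest).decode("ascii"),
    }
    return urllib.request.Request(
        presigned_url, data=data, headers=headers, method="PUT"
    )
```

The returned request can be sent with urllib.request.urlopen; any other HTTP client works equally well as long as it sends the same four headers.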

This is an example of how to calculate the SHA256 hash of a file in Python:

import base64
import hashlib

# Hash the file in chunks so large files are not loaded into memory at once.
hasher = hashlib.sha256()
with open('file.pdf', 'rb') as f:
    while chunk := f.read(8192):
        hasher.update(chunk)

# SHA256 digest as a hex string.
checksum_hex = hasher.hexdigest()

# Value for the x-amz-checksum-sha256 header: base64 of the raw digest bytes.
checksum_encoded = base64.b64encode(bytes.fromhex(checksum_hex)).decode()

Here is an example of how to convert a hex-encoded SHA256 checksum into the x-amz-checksum-sha256 header value in bash:

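# Hex-encoded SHA256 of the file (on macOS, use: shasum -a 256 file.pdf)
checksum=$(sha256sum file.pdf | cut -d ' ' -f 1)
# Convert the hex digest to raw bytes, then base64-encode it
printf "%s" "$checksum" | xxd -r -p | base64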

Note: The official Document AI Python SDK handles the large file upload process for you. It automatically calculates the SHA256 hash and uploads the file to the presigned URL.

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Response

200
application/json
Response object with the temporary File ID and the presigned URL.

The response is of type object.