# Classify Document

Submit an uploaded file, an internet-reachable URL, or raw text for page classification. If you have configured a webhook,
we will notify you when the job completes, whether it succeeds or fails.

Once submitted, the API will return a parse response with a `parse_id` field. You can query the status and results of the parse operation
with the [Get Parse Result](./get) endpoint.
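
The submit step can be sketched with the Python standard library alone. The base URL, the paths, and the bearer scheme are taken from the OpenAPI spec on this page; the helper names `build_classify_request`, `extract_parse_id`, and `result_url` are illustrative assumptions, not part of any Tensorlake SDK. The request is only built here, not sent, so the snippet runs without credentials.

```python
import json
import urllib.request

API_BASE = "https://api.tensorlake.ai"  # from the `servers` entry in the spec


def build_classify_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /documents/v2/classify request."""
    return urllib.request.Request(
        url=f"{API_BASE}/documents/v2/classify",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # bearerAuth security scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_parse_id(response_body: bytes) -> str:
    """Read `parse_id` from a ParseCreatedResponse JSON body."""
    return json.loads(response_body)["parse_id"]


def result_url(parse_id: str) -> str:
    """URL of the Get Parse Result endpoint for a given parse job."""
    return f"{API_BASE}/documents/v2/parse/{parse_id}"
```

To send the request, pass it to `urllib.request.urlopen`, feed the response body to `extract_parse_id`, and then issue a `GET` against `result_url(parse_id)` with the same `Authorization` header.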

## Using page classes

For this operation, you must pass an array of page classes in the `page_classifications` field. Each page class has a
name and a description that guides the classifier. The API will return the page class for each page of the document.

Each page class name must be unique within the document, and should be descriptive enough to convey the content of the page.
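
As a rough illustration of these rules, the sketch below builds a request body with two page classes and checks name uniqueness on the client before anything is sent. The helper `build_page_classifications`, the class names, and the `example.com` URL are hypothetical; the check is a convenience and is not claimed to mirror the server's own validation.

```python
from collections import Counter


def build_page_classifications(classes: list[tuple[str, str]]) -> list[dict]:
    """Turn (name, description) pairs into PageClassConfig objects.

    Raises ValueError if a name repeats or a field is blank.
    """
    names = [name for name, _ in classes]
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate page class names: {duplicates}")
    for name, description in classes:
        if not name.strip() or not description.strip():
            raise ValueError("each page class needs a name and a description")
    return [{"name": n, "description": d} for n, d in classes]


body = {
    "file_url": "https://example.com/contract.pdf",  # placeholder URL
    "page_classifications": build_page_classifications([
        ("signature_page", "Pages containing signature blocks, signed names, or signing dates."),
        ("terms", "Pages listing contractual terms and conditions."),
    ]),
}
```

Descriptions that say what to look for on the page (for example, "signature blocks" rather than "legal stuff") give the classifier more to work with than a bare label.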


## OpenAPI

````yaml post /documents/v2/classify
openapi: 3.1.0
info:
  title: Tensorlake API
  description: >-
    Tensorlake Cloud APIs for Sandboxes, Document Ingestion, and Serverless
    Workflows
  license:
    name: ''
  version: 0.1.0
servers:
  - url: https://api.tensorlake.ai/
security:
  - bearerAuth: []
tags:
  - name: Tensorlake Cloud API
    description: >-
      Tensorlake Cloud APIs for Sandboxes, Document Ingestion, and Serverless
      Workflows
paths:
  /documents/v2/classify:
    post:
      tags:
        - classify
      operationId: post_classify
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ClassificationRequest'
        required: true
      responses:
        '200':
          description: Created parse job details
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ParseCreatedResponse'
        '400':
          description: Invalid request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ApiError'
        '401':
          description: Unauthorized. Invalid or missing credentials
        '403':
          description: Forbidden. You do not have permission to access this resource
        '404':
          description: Resource not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ApiError'
        '422':
          description: Invalid properties in request body
          content:
            text/plain: {}
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ApiError'
components:
  schemas:
    ClassificationRequest:
      allOf:
        - $ref: '#/components/schemas/RequestFileInfo'
        - $ref: '#/components/schemas/ClassificationRequestConfiguration'
        - type: object
          properties:
            labels:
              type:
                - object
                - 'null'
              description: >-
                Additional metadata to identify the classification request. The
                labels are returned in the classification response.
              additionalProperties: {}
              propertyNames:
                type: string
              example:
                priority: high
                source: email
    ParseCreatedResponse:
      type: object
      required:
        - parse_id
        - created_at
      properties:
        parse_id:
          type: string
          description: >-
            The unique identifier for the parse job.

            This is the ID that can be used to track the status of the parse
            job. It is used in the `GET /documents/v2/parse/{parse_id}` endpoint
            to retrieve the status and results of the parse job.
        created_at:
          type: string
          description: |-
            The creation date and time of the parse job.

            The date is in RFC 3339 format.
    ApiError:
      type: object
      required:
        - message
        - code
        - timestamp
      properties:
        message:
          type: string
          description: A human-readable error message
        code:
          $ref: '#/components/schemas/ApiErrorCode'
          description: The error code, which can be used to programmatically handle errors
        timestamp:
          type: integer
          format: int64
          description: Milliseconds since the Unix epoch; easy to parse in every language
        trace_id:
          type:
            - string
            - 'null'
          description: Optional request correlation-id for distributed tracing
        details:
          description: Optional field-level validation errors, etc.
    RequestFileInfo:
      allOf:
        - type: object
          properties:
            page_range:
              type: string
              description: >-
                Comma-separated list of page numbers or ranges to parse (e.g.,
                '1,2,3-5'). Default: all pages.
              examples:
                - 1-5,8,10
            file_name:
              type: string
              description: Name of the file. Only populated when using file_id.
              examples:
                - document.pdf
        - oneOf:
            - type: object
              title: file_id
              required:
                - file_id
              properties:
                file_id:
                  type: string
                  description: >-
                    ID of the file previously uploaded to Tensorlake. Has
                    tensorlake- (V1) or file_ (V2) prefix.
                  examples:
                    - file_abc123xyz
                mime_type:
                  type: string
                  enum:
                    - application/pdf
                    - >-
                      application/vnd.openxmlformats-officedocument.wordprocessingml.document
                    - application/msword
                    - >-
                      application/vnd.openxmlformats-officedocument.presentationml.presentation
                    - application/vnd.ms-powerpoint
                    - application/vnd.apple.keynote
                    - image/jpeg
                    - image/tiff
                    - text/plain
                    - text/html
                    - text/markdown
                    - text/x-markdown
                    - >-
                      application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                    - application/vnd.ms-excel.sheet.macroenabled.12
                    - application/vnd.ms-excel
                    - text/xml
                    - text/csv
                    - image/png
                    - text/rtf
                    - application/rtf
                    - application/octet-stream
                    - application/pkcs7-mime
                    - application/x-pkcs7-mime
                    - application/pkcs7-signature
            - type: object
              title: file_url
              required:
                - file_url
              properties:
                file_url:
                  type: string
                  format: uri-template
                  description: >-
                    External URL of the file to parse. Must be publicly
                    accessible.
                  examples:
                    - >-
                      https://pub-226479de18b2493f96b64c6674705dd8.r2.dev/real-estate-purchase-all-signed.pdf
                mime_type:
                  type: string
                  enum:
                    - application/pdf
                    - >-
                      application/vnd.openxmlformats-officedocument.wordprocessingml.document
                    - application/msword
                    - >-
                      application/vnd.openxmlformats-officedocument.presentationml.presentation
                    - application/vnd.ms-powerpoint
                    - application/vnd.apple.keynote
                    - image/jpeg
                    - image/tiff
                    - text/plain
                    - text/html
                    - text/markdown
                    - text/x-markdown
                    - >-
                      application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                    - application/vnd.ms-excel.sheet.macroenabled.12
                    - application/vnd.ms-excel
                    - text/xml
                    - text/csv
                    - image/png
                    - text/rtf
                    - application/rtf
                    - application/octet-stream
                    - application/pkcs7-mime
                    - application/x-pkcs7-mime
                    - application/pkcs7-signature
            - type: object
              title: raw_text
              required:
                - raw_text
                - mime_type
              properties:
                raw_text:
                  type: string
                  description: The raw text content to parse.
                  examples:
                    - This is the document content...
                mime_type:
                  type: string
                  enum:
                    - application/pdf
                    - >-
                      application/vnd.openxmlformats-officedocument.wordprocessingml.document
                    - application/msword
                    - >-
                      application/vnd.openxmlformats-officedocument.presentationml.presentation
                    - application/vnd.ms-powerpoint
                    - application/vnd.apple.keynote
                    - image/jpeg
                    - image/tiff
                    - text/plain
                    - text/html
                    - text/markdown
                    - text/x-markdown
                    - >-
                      application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
                    - application/vnd.ms-excel.sheet.macroenabled.12
                    - application/vnd.ms-excel
                    - text/xml
                    - text/csv
                    - image/png
                    - text/rtf
                    - application/rtf
                    - application/octet-stream
                    - application/pkcs7-mime
                    - application/x-pkcs7-mime
                    - application/pkcs7-signature
          description: 'File source - must be exactly one of: file_id, file_url, or raw_text'
    ClassificationRequestConfiguration:
      type: object
      properties:
        page_classifications:
          type: array
          items:
            $ref: '#/components/schemas/PageClassConfig'
          description: |-
            The page classes that configure page classification.

            If this array is present, the API will perform page classification
            on the document.
      additionalProperties: false
    ApiErrorCode:
      oneOf:
        - type: string
          enum:
            - QUOTA_EXCEEDED
        - type: string
          enum:
            - INVALID_JSON_SCHEMA
        - type: string
          enum:
            - INVALID_CONFIGURATION
        - type: string
          enum:
            - INVALID_PAGE_CLASSIFICATION
        - type: string
          enum:
            - ENTITY_NOT_FOUND
        - type: string
          enum:
            - ENTITY_ALREADY_EXISTS
        - type: string
          enum:
            - INVALID_FILE
        - type: string
          enum:
            - INVALID_PAGE_RANGE
        - type: string
          enum:
            - INVALID_MIME_TYPE
        - type: string
          enum:
            - INVALID_DATASET_NAME
        - type: string
          enum:
            - INVALID_JOB_STATE
        - type: string
          enum:
            - INTERNAL_ERROR
        - type: string
          enum:
            - INVALID_MULTIPART
        - type: string
          enum:
            - MULTIPART_STREAM_END
        - type: string
          enum:
            - CLIENT_DISCONNECT
        - type: string
          enum:
            - INVALID_ID
        - type: object
          required:
            - INVALID_QUERY_PARAMS
          properties:
            INVALID_QUERY_PARAMS:
              type: object
              required:
                - property
              properties:
                property:
                  type: string
                message:
                  type:
                    - string
                    - 'null'
    PageClassConfig:
      type: object
      required:
        - name
        - description
      properties:
        name:
          type: string
          description: The name of the page class.
        description:
          type: string
          description: |-
            The description of the page class to guide the model to classify the
            pages. Describe what the model should look for in the page to
            classify it.
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer

````
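
The `ApiError` schema above can be decoded on the client roughly as follows. This is a sketch under stated assumptions: the dataclass and the `parse_api_error` helper are illustrative names, and the handling of the object-shaped `INVALID_QUERY_PARAMS` code (using its single key as the code name) is one possible reading of the `ApiErrorCode` schema. It applies to the `400`, `404`, and `500` responses; the `422` response is declared as `text/plain` and should not be parsed as JSON.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ApiError:
    message: str
    code: str
    timestamp: datetime
    trace_id: Optional[str] = None
    details: Any = None


def parse_api_error(raw: bytes) -> ApiError:
    """Decode an ApiError JSON body into a dataclass."""
    payload = json.loads(raw)
    code = payload["code"]
    details = payload.get("details")
    if isinstance(code, dict):
        # Object-shaped codes such as {"INVALID_QUERY_PARAMS": {...}}:
        # use the single key as the code and keep its value as details
        # when the body carries no separate `details` field.
        code, code_details = next(iter(code.items()))
        if details is None:
            details = code_details
    return ApiError(
        message=payload["message"],
        code=code,
        # `timestamp` is milliseconds since the Unix epoch.
        timestamp=datetime.fromtimestamp(payload["timestamp"] / 1000, tz=timezone.utc),
        trace_id=payload.get("trace_id"),
        details=details,
    )
```

Branching on `code` (for example, `INVALID_PAGE_CLASSIFICATION` or `INVALID_PAGE_RANGE`) rather than on `message` keeps error handling stable if the human-readable text changes.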