Video RAG
Video Extraction and Understanding
Indexify supports video, and you can build complex pipelines that extract information from videos. Some of the capabilities provided by Indexify extractors are:
- Key Frame Extraction
- Audio Extraction from videos
- Tracking objects in a video
- Video Description using Visual LLMs
Video RAG Example
To demonstrate these video understanding capabilities, we will build a RAG application that answers questions about the contents of a video. We will ingest a video of President Biden's State of the Union address and build a Q&A bot on top of it.
At the end of the tutorial your application will be able to answer the following questions.
We will extract the audio from the video, transcribe it, and index it. Then we will use the extracted information to answer questions about the video.
We will be using the following extractors:
- Audio Extractor - It will extract audio from ingested videos.
- Whisper Extractor - It will extract transcripts of the audio.
- Chunk Extractor - It will chunk the transcripts into smaller parts.
- Mini LM L6 Extractor - A Sentence Transformer model that will extract embeddings from the transcript chunks.
The Q&A will be powered by Langchain and OpenAI. We will pass the Indexify Retriever to Langchain so that it can retrieve the text most relevant to each question using semantic search.
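To make the chunking step concrete, here is a minimal sketch of splitting a transcript into overlapping word-based chunks. It is an illustration only, not the implementation used by the chunk extractor; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears intact in at least one chunk.
    Hypothetical helper; not part of Indexify.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    transcript = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(transcript, chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise retrieval hits, while the overlap reduces the chance that an answer is split across two chunks.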
Download Indexify and Extractors
Start Indexify and Extractors in Terminal
We need to use 2 terminals to start the Indexify server and the extractors.
Download the Libraries
Download the Video
Create the Extraction Policies
Next, we create an extraction graph with 4 extraction policies:
- We extract the audio from every ingested video using the tensorlake/audio-extractor.
- The extracted audio is passed through the tensorlake/whisper-asr extractor to be transcribed.
- We pass the transcripts to the tensorlake/chunk-extractor to split them into smaller parts.
- We process the transcript chunks through the tensorlake/minilm-l6 extractor to compute vector embeddings and index them.
The content_source parameter specifies the source of the content for an extraction policy. Typically, when creating a pipeline of multiple extractors, the output of one extractor is used as the input for the next extractor.
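The chaining described above can be sketched in plain Python. The snippet below builds the four policies, checks that every content_source points at an earlier policy, and renders a YAML-style specification. The graph name, the policy names (`audio_clips`, `transcripts`, and so on) and the exact YAML field layout are assumptions for illustration; only the extractor names and the content_source parameter come from this tutorial. The ordering check is a sanity check for this sketch, not a documented Indexify requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    name: str                              # hypothetical policy name
    extractor: str                         # extractor named in this tutorial
    content_source: Optional[str] = None   # None means "raw ingested content"


POLICIES = [
    Policy("audio_clips", "tensorlake/audio-extractor"),
    Policy("transcripts", "tensorlake/whisper-asr", "audio_clips"),
    Policy("transcript_chunks", "tensorlake/chunk-extractor", "transcripts"),
    Policy("chunk_embeddings", "tensorlake/minilm-l6", "transcript_chunks"),
]


def validate_chain(policies: list[Policy]) -> None:
    """Raise ValueError if a content_source does not name an earlier policy."""
    seen: set[str] = set()
    for policy in policies:
        source = policy.content_source
        if source is not None and source not in seen:
            raise ValueError(f"{policy.name}: unknown content_source {source!r}")
        seen.add(policy.name)


def to_yaml(graph_name: str, policies: list[Policy]) -> str:
    """Render the policies as a YAML-style spec (assumed field layout)."""
    lines = [f"name: '{graph_name}'", "extraction_policies:"]
    for policy in policies:
        lines.append(f"  - extractor: '{policy.extractor}'")
        lines.append(f"    name: '{policy.name}'")
        if policy.content_source is not None:
            lines.append(f"    content_source: '{policy.content_source}'")
    return "\n".join(lines)


if __name__ == "__main__":
    validate_chain(POLICIES)
    print(to_yaml("video_knowledge_base", POLICIES))
```

Only the first policy omits content_source, so it is the only one that runs directly on the uploaded video; each later policy consumes the output of the one before it.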
Upload the Video
You do not need to trigger anything after the upload. Indexify evaluates every piece of uploaded data against the extraction graph and automatically starts the extraction process when the data matches the graph specification.
Perform RAG
Create the Indexify Langchain Retriever. The index it queries is created automatically by the MiniLM-L6 extractor; once the extraction process has completed, we can use the retriever to fetch the text relevant to each question.
Create the Langchain Q&A chain to ask questions about the video.
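Conceptually, a Q&A chain of this kind retrieves passages, places them in a prompt and sends the prompt to the language model. The sketch below shows that flow with plain Python callables standing in for the retriever and the model. It is not the Langchain API; `build_prompt`, `answer`, the prompt wording and the stand-in functions are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt wording; any template with these two slots would work.
PROMPT_TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {question}\n"
)


def build_prompt(question: str, passages: list[str]) -> str:
    """Join the retrieved passages and place them in the prompt template."""
    context = "\n\n".join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(question: str,
           retrieve: Callable[[str], list[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve passages for the question, then ask the model."""
    passages = retrieve(question)
    return llm(build_prompt(question, passages))


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any external service.
    def fake_retrieve(question: str) -> list[str]:
        return ["(retrieved transcript chunk would appear here)"]

    def fake_llm(prompt: str) -> str:
        return f"(model reply to a prompt of {len(prompt)} characters)"

    print(answer("What topics does the speech cover?", fake_retrieve, fake_llm))
```

Keeping retrieval and generation behind separate callables makes it easy to swap in a real retriever and model later without changing the surrounding logic.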
Now, we can ask questions related to the video.
Answer: