Partnerships
Building a Semantic Video Search Workflow with TwelveLabs and Qdrant


James Le
We demonstrate how to build a semantic video search workflow by combining TwelveLabs’ multimodal embedding capabilities with Qdrant’s vector search engine.


Feb 24, 2025
8 Min
Big thanks to the Qdrant team (David Myriel and Anush Shetty) for collaborating with us on this tutorial.
Introduction
In today’s data-driven world, video content is a rich source of information that combines multiple modalities, including visuals, audio, and text. However, extracting meaningful insights from videos and enabling semantic search across them can be challenging due to their complexity. This is where the integration of the TwelveLabs Embed API and Qdrant comes into play.
The TwelveLabs Embed API empowers developers to create multimodal embeddings that capture the essence of video content, including visual expressions, body language, spoken words, and contextual cues. These embeddings are optimized for a unified vector space, enabling seamless cross-modal understanding. On the other hand, Qdrant is a powerful vector similarity search engine that allows you to store and query these embeddings efficiently.
In this tutorial, we’ll demonstrate how to build a semantic video search workflow by combining TwelveLabs’ multimodal embedding capabilities with Qdrant’s vector search engine. By the end of this guide, you’ll be able to:
Generate multimodal embeddings for videos using the TwelveLabs Embed API.
Store and manage these embeddings in Qdrant.
Perform semantic searches across video content using text or other modalities.
This workflow is ideal for applications like video indexing, content recommendation systems, and contextual search engines.

1 - Setting Up The Environment
Before diving into the implementation, let’s set up the necessary tools and libraries. For this tutorial, we’ll use Python in a Colab notebook environment.
Step 1: Install Required SDKs
Run the following command in your Colab notebook to install the TwelveLabs and Qdrant SDKs:
!pip install twelvelabs qdrant-client
Step 2: Configure API Clients
Next, configure the TwelveLabs and Qdrant clients by importing their respective libraries. The TwelveLabs client is initialized with your API key, while the Qdrant client runs in in-memory mode for this tutorial (no key required).
from twelvelabs import TwelveLabs
from qdrant_client import QdrantClient

# Get your API keys from: https://playground.twelvelabs.io/dashboard/api-key
from google.colab import userdata

TL_API_KEY = userdata.get('TL_API_KEY')

twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)
qdrant_client = QdrantClient(":memory:")
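The in-memory client above is convenient for notebooks but does not persist data between sessions. If you prefer a persistent setup, you can point QdrantClient at a running Qdrant instance instead. A minimal sketch, where the URL and API-key values are placeholders rather than part of the original tutorial:

# Local Qdrant instance (e.g., started with `docker run -p 6333:6333 qdrant/qdrant`)
qdrant_client = QdrantClient(url="http://localhost:6333")

# Qdrant Cloud (placeholder cluster URL and key)
# qdrant_client = QdrantClient(
#     url="https://YOUR-CLUSTER-URL.cloud.qdrant.io",
#     api_key=userdata.get("QDRANT_API_KEY"),
# )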
2 - Generating Multimodal Embeddings with TwelveLabs

The TwelveLabs Embed API allows you to generate multimodal embeddings that capture the essence of video content across modalities like visuals, audio, and text. These embeddings are represented as high-dimensional vectors, enabling seamless semantic search and cross-modal understanding. In this section, we’ll demonstrate how to use the Marengo-retrieval-2.7 engine to create embeddings for a video.
Step 1: Understanding the Embedding Process
The Marengo-retrieval-2.7 engine is optimized for video-native embeddings with a dimensionality of 1024. It supports cosine similarity for vector comparisons, making it suitable for tasks like semantic search or retrieval. You can also use this engine to embed audio, text, and images into the same vector space, enabling cross-modality searches. For context, the engine is built on top of TwelveLabs’ state-of-the-art video embedding model Marengo 2.7.
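Qdrant performs the cosine-similarity computation for you at query time, but for intuition, here is a minimal NumPy sketch (our own illustration, not part of the original workflow) of the comparison applied to two embedding vectors:

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration; real Marengo-retrieval-2.7 embeddings have 1024 dimensions.
print(cosine_similarity([0.1, 0.3, 0.5], [0.2, 0.1, 0.6]))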
Step 2: Embedding a Video
To generate embeddings for a video, follow these steps:
Specify the Video URL: Provide the URL of the video you want to process.
Create an Embedding Task: Use the TwelveLabs client to initiate an embedding task.
Wait for Task Completion: Monitor the task status until it’s complete.
Retrieve the Embeddings: Once completed, retrieve the vector embeddings from the task results.
Here’s the implementation in Python:
# Step 1: Create an embedding task
task = twelvelabs_client.embed.task.create(
    model_name="Marengo-retrieval-2.7",  # Specify the model
    video_url="https://sample-videos.com/video321/mp4/720/big_buck_bunny_720p_2mb.mp4"  # Video URL
)

# Step 2: Wait for the task to complete
task.wait_for_done(sleep_interval=3)  # Check every 3 seconds

# Step 3: Retrieve the embeddings
task_result = twelvelabs_client.embed.task.retrieve(task.id)

# Display the embedding results for the first video segment
first_segment = task_result.video_embedding.segments[0]
print("Embedding Vector (First 10 Dimensions):", first_segment.embeddings_float[:10])
print("Embedding Dimensionality:", len(first_segment.embeddings_float))
Step 3: Cross-Modality Embeddings (Optional)
The same engine can embed other modalities (e.g., text, audio, or images) into a unified vector space. For example:
Text: Input descriptive text such as "a person riding a bike".
Audio: Use an audio file URL to extract its semantic representation.
Image: Provide an image URL for embedding.
This flexibility allows you to perform cross-modality searches, such as querying a video collection using text or audio descriptions.
Step 4: Next Steps
Once you’ve generated embeddings, they can be stored in a vector database like Qdrant for efficient similarity search. In the next section, we’ll explore how to prepare these embeddings for insertion into Qdrant and perform semantic searches across your video content.
3 - Preparing Data for Qdrant
Once you’ve generated multimodal embeddings using the TwelveLabs Embed API, the next step is to prepare these embeddings for insertion into Qdrant. Qdrant operates on points, which consist of a vector, an optional unique ID, and a payload with additional metadata. For this tutorial, we’ll map each video segment’s embedding into Qdrant’s PointStruct format.
Step 1: Extract Embedding Segments
The TwelveLabs Embed API generates embeddings for video segments, each containing:
Vector: The high-dimensional embedding.
Metadata: Including start and end timestamps (start_offset_sec and end_offset_sec) and the embedding scope.
Step 2: Convert to Qdrant Points
We’ll loop through the video embedding segments and convert them into Qdrant-compatible points. The metadata will be stored in the payload for each point.
Here’s the Python code to achieve this:
from qdrant_client.models import PointStruct

# Convert embedding segments to Qdrant points
points = [
    PointStruct(
        id=idx,  # Unique identifier for each vector
        vector=v.embeddings_float,  # Embedding vector
        payload={
            "start_offset_sec": v.start_offset_sec,  # Start time of the segment
            "end_offset_sec": v.end_offset_sec,  # End time of the segment
            "embedding_scope": v.embedding_scope,  # Scope of the embedding
        },
    )
    for idx, v in enumerate(task_result.video_embedding.segments)
]

print(f"Prepared {len(points)} points for insertion into Qdrant.")
At this stage, your data is ready to be inserted into a Qdrant collection.

4 - Setting Up a Qdrant Collection
Qdrant organizes vectors into collections, which are named sets of points. Each collection has specific parameters such as vector dimensionality and a distance metric (e.g., cosine similarity). Let’s create a collection to store the prepared points.
Step 1: Define Collection Parameters
For this tutorial:
Vector Size: 1024 (matches the dimensionality of embeddings from TwelveLabs).
Distance Metric: Cosine similarity (optimal for comparing normalized vectors).
Step 2: Create a Collection
Use the following code to create a collection in Qdrant:
from qdrant_client.models import VectorParams, Distance

# Define collection name
collection_name = "twelve_labs_collection"

# Create a collection with specified parameters
qdrant_client.create_collection(
    collection_name,
    vectors_config=VectorParams(
        size=1024,  # Dimensionality of vectors
        distance=Distance.COSINE,  # Similarity metric
    ),
)

print(f"Collection '{collection_name}' created successfully.")
Step 3: Insert Points into the Collection
Now that the collection is set up, insert the prepared points:
# Insert points into the collection
qdrant_client.upsert(collection_name, points)

print(f"Inserted {len(points)} points into '{collection_name}'.")
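As an optional sanity check (our addition, assuming the qdrant-client count API), you can confirm how many points the collection now holds:

# Optional: verify the upsert by counting points in the collection
count_result = qdrant_client.count(collection_name=collection_name, exact=True)
print(f"Collection '{collection_name}' now holds {count_result.count} points.")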
Summary
At this point:
The embeddings have been converted into Qdrant-compatible points.
A new collection has been created with appropriate parameters.
The points have been successfully inserted into the collection.
In the next section, we’ll demonstrate how to query this collection to perform semantic searches across your video content.
5 - Performing Semantic Searches
With the embeddings stored in Qdrant, you can now perform semantic searches across different modalities, such as text, audio, and images. This section demonstrates how to query the Qdrant collection using embeddings generated by the TwelveLabs Embed API.
Step 1: Querying with Text
Text-based queries allow you to search for video segments that semantically match a given textual description. For example, let’s search for segments related to "A white rabbit."
# Generate text embedding
text_segment = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="A white rabbit",  # Input query
).text_embedding.segments[0]

# Perform semantic search in Qdrant
text_results = qdrant_client.query_points(
    collection_name=collection_name,
    query=text_segment.embeddings_float,  # Use the embedding vector
)

print("Text Query Results:", text_results)
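The query_points response wraps a list of scored points whose payloads carry the timestamps we stored earlier. A short sketch (assuming the response exposes matches under .points, as in recent qdrant-client versions) that prints each match as a time range; the same pattern applies to the audio and image queries below:

# Print each matching segment with its similarity score and time range
for point in text_results.points:
    payload = point.payload
    print(
        f"score={point.score:.3f} | "
        f"{payload['start_offset_sec']:.1f}s - {payload['end_offset_sec']:.1f}s | "
        f"scope={payload['embedding_scope']}"
    )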
Step 2: Querying with Audio
Audio-based queries allow you to search for video segments that match the semantic content of an audio clip. For instance, here’s how to use an audio file as a query:
# Generate audio embedding
audio_segment = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_url="https://codeskulptor-demos.commondatastorage.googleapis.com/descent/background%20music.mp3",  # Audio file URL
).audio_embedding.segments[0]

# Perform semantic search in Qdrant
audio_results = qdrant_client.query_points(
    collection_name=collection_name,
    query=audio_segment.embeddings_float,  # Use the embedding vector
)

print("Audio Query Results:", audio_results)
Step 3: Querying with an Image
Image-based queries enable you to find video segments that are semantically similar to a given image. For example:
# Generate image embedding
image_segment = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    image_url="https://gratisography.com/wp-content/uploads/2024/01/gratisography-cyber-kitty-1170x780.jpg",  # Image URL
).image_embedding.segments[0]

# Perform semantic search in Qdrant
image_results = qdrant_client.query_points(
    collection_name=collection_name,
    query=image_segment.embeddings_float,  # Use the embedding vector
)

print("Image Query Results:", image_results)
Summary
By leveraging the TwelveLabs Embed API and Qdrant:
You can perform cross-modal searches using text, audio, or images as queries.
The unified vector space ensures that embeddings from different modalities are comparable, enabling seamless multimodal retrieval.
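To tie the pieces together, here is a hedged end-to-end helper that wraps the text-query path from this section into one function. The function name and its limit parameter are our own additions for illustration; the calls themselves reuse embed.create and query_points exactly as shown above:

def search_videos_by_text(query_text: str, limit: int = 5):
    """Embed a text query with TwelveLabs and return matching video segments from Qdrant."""
    # 1. Embed the query text into the shared vector space
    text_segment = twelvelabs_client.embed.create(
        model_name="Marengo-retrieval-2.7",
        text=query_text,
    ).text_embedding.segments[0]

    # 2. Search the collection for the most similar video segments
    results = qdrant_client.query_points(
        collection_name=collection_name,
        query=text_segment.embeddings_float,
        limit=limit,
    )

    # 3. Return (score, start, end) tuples for each match
    return [
        (p.score, p.payload["start_offset_sec"], p.payload["end_offset_sec"])
        for p in results.points
    ]

# Example usage
print(search_videos_by_text("A white rabbit"))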
6 - Conclusion and Next Steps
In this tutorial, we explored how to build a semantic video search workflow by integrating the TwelveLabs Embed API with Qdrant. By generating multimodal embeddings from video content and leveraging Qdrant’s vector database, we created a powerful system capable of performing semantic searches across modalities such as text, audio, and images. This workflow demonstrates the potential of combining advanced AI models with scalable vector search technology to unlock new possibilities in video understanding and retrieval.
Key Takeaways
The TwelveLabs Embed API provides state-of-the-art multimodal embeddings that capture the essence of video content across visual, audio, and textual modalities.
Qdrant enables efficient storage and similarity search of these embeddings using its flexible collection structure and high-performance query capabilities.
The unified vector space created by TwelveLabs’ models allows for seamless cross-modal searches, making it possible to query videos using text descriptions, audio clips, or images.
Call to Action
As we move forward, we encourage developers and businesses to explore the combined power of TwelveLabs and Qdrant for building next-generation AI applications. Whether it’s semantic video search, personalized recommendations, or innovative RAG workflows, this partnership is poised to redefine how we interact with multimodal data.