Short Summary
Twelve Labs Embed API enables developers to get multimodal embeddings that power advanced video understanding use cases, from semantic video search and data curation to content recommendation and video RAG systems.
Introduction
With Twelve Labs, contextual vector representations can be generated that capture the relationship between visual expressions, body language, spoken words, and overall context within videos. Databricks Mosaic AI Vector Search provides a robust, scalable infrastructure for indexing and querying high-dimensional vectors.
This blog post will guide you through harnessing these complementary technologies to unlock new possibilities in video AI applications.
Big thanks to Nina Williams, Austin Zaccor, Fernanda Heredia, and Emily Hutson from Databricks for collaborating with us on this tutorial!
Why Integrate Twelve Labs Embed API with Databricks Mosaic AI Vector Search?
Integrating Twelve Labs Embed API with Databricks Mosaic AI Vector Search addresses key challenges in video AI, such as efficient processing of large-scale video datasets and accurate multimodal content representation. This integration reduces development time and resource needs for advanced video applications, enabling complex queries across vast video libraries and enhancing overall workflow efficiency.
The unified approach to handling multimodal data is particularly noteworthy. Instead of juggling separate models for text, image, and audio analysis, users can work with a single representation that covers all of these modalities in a video. This not only simplifies deployment architecture but also enables more nuanced and context-aware applications, from content recommendation systems to video search engines and automated content moderation tools.
Moreover, this integration extends the capabilities of the Databricks ecosystem, allowing video understanding to be incorporated into existing data pipelines and machine learning workflows. Whether companies are developing real-time video analytics, building large-scale content classification systems, or exploring novel applications in generative AI, this combined solution provides a strong foundation for applications in industries ranging from media and entertainment to security and healthcare.
Understanding Twelve Labs Embed API
Twelve Labs' Embed API represents a significant advancement in multimodal embedding technology, specifically designed for video content. Unlike traditional approaches that rely on frame-by-frame analysis or separate models for different modalities, this API generates contextual vector representations that capture the intricate interplay of visual expressions, body language, spoken words, and overall context within videos. It is powered by our state-of-the-art multimodal foundation model Marengo-2.6.
The Embed API offers several key features that make it particularly powerful for AI engineers working with video data. First, it provides flexibility for any modality present in videos, eliminating the need for separate text-only or image-only models. Second, it employs a video-native approach that accounts for motion, action, and temporal information, ensuring a more accurate and temporally coherent interpretation of video content. Lastly, it creates a unified vector space that integrates embeddings from all modalities, facilitating a more holistic understanding of the video content.
For AI engineers, the Embed API opens up new possibilities in video understanding tasks. It enables more sophisticated content analysis, improved semantic search capabilities, and enhanced recommendation systems. The API's ability to capture subtle cues and interactions between different modalities over time makes it particularly valuable for applications requiring a nuanced understanding of video content, such as emotion recognition, context-aware content moderation, and advanced video retrieval systems.
Prerequisites
Before integrating Twelve Labs Embed API with Databricks Mosaic AI Vector Search, be sure you have the following prerequisites:
Note: The Embed API is currently in private beta, but any user can request access by filling out this form. You will usually receive a confirmation email within a few hours letting you know that you can start using the Embed API.
Setting Up the Databricks Environment
To begin, set up the Databricks environment and install the necessary libraries:
1 - Create a new Databricks workspace:
2 - Create a new cluster or connect to an existing cluster:
Almost any ML cluster will work for this application. The settings below are provided for those seeking optimal price-performance.
3 - Create a new notebook in your Databricks workspace:
4 - Install the Twelve Labs and Mosaic AI Vector Search SDKs:
In the first cell of your notebook, run the following command:
5 - Set up Twelve Labs authentication:
In the next cell, add the following code:
Note: For enhanced security, it's recommended to use Databricks secrets to store your API key rather than hardcoding it or using environment variables.
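The helper below is a minimal sketch of how the key could be resolved. It prefers Databricks secrets and falls back to an environment variable outside Databricks. It assumes a secret scope named twelvelabs with a key named api_key; both names are hypothetical. dbutils.secrets.get(scope=..., key=...) is the Databricks utility for reading secrets. The notebook's dbutils object is passed in as an argument so the function can also run elsewhere.

```python
import os
from typing import Any, Optional


def resolve_twelve_labs_api_key(
    dbutils: Optional[Any] = None,
    scope: str = "twelvelabs",   # hypothetical secret scope name
    key: str = "api_key",        # hypothetical secret key name
    env_var: str = "TWELVE_LABS_API_KEY",
) -> str:
    """Return the Twelve Labs API key, preferring Databricks secrets.

    Pass the notebook's dbutils object to read from a secret scope;
    without it, fall back to an environment variable.
    """
    if dbutils is not None:
        # Read the key from the Databricks secret scope.
        return dbutils.secrets.get(scope=scope, key=key)
    value = os.environ.get(env_var)
    if not value:
        raise ValueError(
            f"No API key found: pass dbutils or set the {env_var} environment variable."
        )
    return value
```

In a notebook you could then build the client with something like TwelveLabs(api_key=resolve_twelve_labs_api_key(dbutils)), assuming the TwelveLabs client class from the twelvelabs SDK.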
Generating Multimodal Embeddings
Use the provided generate_embedding function to generate multimodal embeddings using Twelve Labs Embed API. This function is designed as a Pandas user-defined function (UDF) to work efficiently with Spark DataFrames in Databricks. It encapsulates the process of creating an embedding task, monitoring its progress, and retrieving the results.
Next, create a process_url function, which takes the video URL as a string input, invokes a wrapper call to the Twelve Labs Embed API, and returns an array<float>.
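The create, monitor, and retrieve pattern can be sketched independently of the SDK. This version makes several assumptions:

- The three callables (create_task, get_status, get_embedding) are hypothetical stand-ins for the corresponding Twelve Labs SDK calls.
- The status strings "ready" and "failed" are also assumptions. Check the SDK reference for the real method names and task states.

A process_url function could wrap this helper with the real SDK calls bound in. The helper returns a plain list of floats, which Spark can store as array<float>.

```python
import time
from typing import Callable, List


def wait_for_embedding(
    video_url: str,
    create_task: Callable[[str], str],             # hypothetical: starts a task, returns its ID
    get_status: Callable[[str], str],              # hypothetical: returns the task state
    get_embedding: Callable[[str], List[float]],   # hypothetical: fetches the finished vector
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Create an embedding task for one video, poll until it finishes, and return the vector."""
    task_id = create_task(video_url)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status == "ready":
            # Cast to float so the result maps cleanly onto Spark's array<float>.
            return [float(x) for x in get_embedding(task_id)]
        if status == "failed":
            raise RuntimeError(f"Embedding task {task_id} failed for {video_url}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"Embedding task {task_id} did not finish within {timeout}s")
        sleep(poll_interval)
```

The sleep function is injectable so the polling loop can be exercised without real waiting.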
Here's how to implement and use it:
Define the UDF:
Create a sample DataFrame with video URLs:
Apply the UDF to generate embeddings:
Display the results:
This process generates an embedding for each video URL in the DataFrame, capturing the multimodal content of each video, including visual, audio, and textual information.
Remember that generating embeddings can be computationally intensive and time-consuming for large video datasets. Consider implementing batching or distributed processing strategies for production-scale applications. Additionally, ensure that you have appropriate error handling and logging in place to manage potential API failures or network issues.
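One way to add retries, logging, and bounded concurrency is sketched below with the standard library only. The embed function passed in is assumed to be something like the process_url wrapper described earlier, and the retry limits and worker count are illustrative. A video that still fails after all retries returns None, so one bad URL does not abort the batch. You can filter those rows out before writing to Delta. Inside a Pandas UDF, a small thread pool mainly helps overlap time spent waiting on the API.

```python
import logging
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger("video_embeddings")


def with_retries(
    fn: Callable[[str], List[float]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], Optional[List[float]]]:
    """Wrap fn so failures are retried with exponential backoff plus jitter.

    After max_attempts failures the error is logged and None is returned.
    """
    def wrapped(url: str) -> Optional[List[float]]:
        for attempt in range(1, max_attempts + 1):
            try:
                return fn(url)
            except Exception as exc:  # in practice, catch the SDK's specific error types
                if attempt == max_attempts:
                    logger.error("Giving up on %s after %d attempts: %s", url, attempt, exc)
                    return None
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
                logger.warning("Attempt %d for %s failed (%s); retrying in %.1fs",
                               attempt, url, exc, delay)
                sleep(delay)
        return None  # only reached if max_attempts < 1
    return wrapped


def embed_all(
    urls: Sequence[str],
    embed_one: Callable[[str], Optional[List[float]]],
    max_workers: int = 4,
) -> List[Optional[List[float]]]:
    """Embed URLs concurrently with a bounded thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_one, urls))
```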
Creating a Delta Table for Video Metadata and Embeddings
Now, create a source Delta Table to store video metadata and the embeddings generated by Twelve Labs Embed API. This table will serve as the foundation for a Vector Search index in Databricks Mosaic AI Vector Search.
First, create a source DataFrame with video URLs and metadata:
Next, declare the schema for the Delta table using SQL:
Note that Change Data Feed is enabled on the table. Delta Sync Indexes rely on it to create the Vector Search index and keep it up to date.
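The DDL for such a table can be built as a string and run with spark.sql(...) in a notebook. This is a sketch, not the tutorial's exact schema. The url and title metadata columns are assumptions. The id identity column, the ARRAY<FLOAT> embedding column, and the delta.enableChangeDataFeed table property are the parts the rest of the workflow depends on.

```python
def videos_table_ddl(table_name: str = "videos_source_embeddings") -> str:
    """Build a CREATE TABLE statement for video metadata plus embeddings.

    id is an identity column, so Delta generates it automatically.
    Change Data Feed is enabled so a Delta Sync index can track changes.
    The url and title columns are illustrative metadata fields.
    """
    return f"""
CREATE TABLE IF NOT EXISTS {table_name} (
  id BIGINT GENERATED BY DEFAULT AS IDENTITY,
  url STRING,
  title STRING,
  embedding ARRAY<FLOAT>
) TBLPROPERTIES (delta.enableChangeDataFeed = true)
""".strip()
```

In Databricks you would run spark.sql(videos_table_ddl()), optionally passing a fully qualified catalog.schema.table name.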
Now, generate embeddings for your videos using the get_video_embeddings function defined earlier:
This step may take some time, depending on the number and length of your videos.
With your embeddings generated, you can now write the data to your Delta table:
Finally, verify your data by displaying the DataFrame with embeddings:
This step creates a robust foundation for Vector Search capabilities. Once you create a Delta Sync Index on this table, the index stays in sync with it, so updates or additions to your video dataset are reflected in your search results.
Some key points to remember:
- The id column is auto-generated, providing a unique identifier for each video.
- The embedding column stores the high-dimensional vector representation of each video, generated by Twelve Labs Embed API.
Setting Up Mosaic AI Vector Search
In this step, set up Databricks Mosaic AI Vector Search to work with video embeddings. This involves creating a Vector Search endpoint and a Delta Sync Index that will automatically stay in sync with your videos_source_embeddings Delta table.
First, create a Vector Search endpoint:
This code creates a new Vector Search endpoint or replaces an existing one with the same name. The endpoint will serve as the access point for your Vector Search operations.
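A common alternative is to reuse an existing endpoint instead of recreating it. The sketch below makes two assumptions about the client, modeled on VectorSearchClient from databricks-vectorsearch:

- It exposes list_endpoints() and create_endpoint(name=..., endpoint_type=...).
- list_endpoints() returns a dict with an "endpoints" list.

Verify both against the version you install. The client is passed in, so any object with those methods works.

```python
from typing import Any


def get_or_create_endpoint(vsc: Any, endpoint_name: str, endpoint_type: str = "STANDARD") -> bool:
    """Create the Vector Search endpoint only if it does not exist yet.

    Returns True if a new endpoint was created, False if an existing one was reused.
    Assumes vsc.list_endpoints() returns {"endpoints": [{"name": ...}, ...]}.
    """
    existing = {ep.get("name") for ep in vsc.list_endpoints().get("endpoints", [])}
    if endpoint_name in existing:
        return False
    vsc.create_endpoint(name=endpoint_name, endpoint_type=endpoint_type)
    return True
```

Usage might look like get_or_create_endpoint(VectorSearchClient(), "twelve_labs_video_endpoint"), where the endpoint name is hypothetical.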
Next, create a Delta Sync Index that will automatically stay in sync with your videos_source_embeddings Delta table:
This code creates a Delta Sync Index linked to your source Delta table. With pipeline_type="TRIGGERED", the index updates only when you trigger a sync. If you want the index to update automatically within seconds of changes to the source table (keeping your Vector Search results up to date), set pipeline_type="CONTINUOUS" instead.
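The index settings can be gathered in one place, as in the sketch below. It passes the parameters discussed in this section (primary_key, embedding_dimension, embedding_vector_column, pipeline_type) to a client method named create_delta_sync_index, modeled on VectorSearchClient in databricks-vectorsearch; check the exact signature for your version. Defaulting to TRIGGERED is a choice made for this sketch. The table and index names in the usage note are hypothetical.

```python
from typing import Any, Dict

EMBEDDING_DIM = 1024  # dimension of the Twelve Labs embeddings used in this tutorial


def create_video_index(
    vsc: Any,
    endpoint_name: str,
    source_table_name: str,
    index_name: str,
    pipeline_type: str = "TRIGGERED",
) -> Any:
    """Create a Delta Sync Index over precomputed embeddings stored in the source table."""
    if pipeline_type not in ("TRIGGERED", "CONTINUOUS"):
        raise ValueError("pipeline_type must be 'TRIGGERED' or 'CONTINUOUS'")
    kwargs: Dict[str, Any] = dict(
        endpoint_name=endpoint_name,
        source_table_name=source_table_name,
        index_name=index_name,
        pipeline_type=pipeline_type,
        primary_key="id",                    # unique ID column in the source table
        embedding_dimension=EMBEDDING_DIM,   # must match the embedding length
        embedding_vector_column="embedding", # column holding the video vectors
    )
    return vsc.create_delta_sync_index(**kwargs)
```

For example: create_video_index(vsc, "twelve_labs_video_endpoint", "main.default.videos_source_embeddings", "main.default.videos_index").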
To check the status of the index and trigger a sync manually, use the following code:
This code allows you to check the status of your index and manually trigger a sync if needed. In production, you may prefer to set the pipeline to sync automatically based on changes to the source Delta table.
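Here is a small sketch of that logic. It issues a manual sync only for TRIGGERED pipelines, since CONTINUOUS ones pick up source-table changes on their own, and then returns the index status. It assumes an index object like the one returned by vsc.get_index(endpoint_name=..., index_name=...), exposing sync() and describe(). It also assumes describe() returns a dict containing a "status" entry; confirm the response shape for your databricks-vectorsearch version.

```python
from typing import Any, Dict


def sync_if_triggered(index: Any, pipeline_type: str) -> Dict[str, Any]:
    """Trigger a sync for TRIGGERED indexes and return the index's status block.

    CONTINUOUS indexes pick up source-table changes automatically,
    so no manual sync is issued for them.
    """
    if pipeline_type == "TRIGGERED":
        index.sync()
    # Assumes describe() returns a dict with a "status" entry.
    return index.describe().get("status", {})
```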
Key points to remember:
- embedding_dimension should match the dimension of the embeddings generated by Twelve Labs' Embed API (1024).
- primary_key is set to "id", which should correspond to the unique identifier in your source table.
- embedding_vector_column is set to "embedding", which should match the name of the column in your source table that contains the video embeddings.
Implementing Similarity Search
The next step is to implement similarity search functionality using your configured Mosaic AI Vector Search index and Twelve Labs Embed API. This will allow you to find videos similar to a given text query by leveraging the power of multimodal embeddings.
First, define a function to get the embedding for a text query using Twelve Labs Embed API:
This function takes a text query and returns its embedding using the same model as video embeddings, ensuring compatibility in the vector space.
Next, implement the similarity search function:
This function takes a text query and the number of results to return. It generates an embedding for the query and then uses the Mosaic AI Vector Search index to find similar videos.
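Here is a sketch of such a function. embed_text stands in for get_text_embedding, and the returned columns ("id", "url") are assumptions about the source table. It calls the index's similarity_search method with query_vector, columns, and num_results, as in databricks-vectorsearch. It also checks the query vector's length, which catches a model or dimension mismatch before the query reaches the index.

```python
from typing import Any, Callable, Dict, List, Sequence

EMBEDDING_DIM = 1024  # must match the index's embedding_dimension


def similarity_search(
    index: Any,
    embed_text: Callable[[str], List[float]],
    query: str,
    num_results: int = 5,
    columns: Sequence[str] = ("id", "url"),
) -> Dict[str, Any]:
    """Embed a text query and run a nearest-neighbour search against the index."""
    query_vector = embed_text(query)
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(f"Expected a {EMBEDDING_DIM}-d query vector, got {len(query_vector)}")
    return index.similarity_search(
        query_vector=query_vector,
        columns=list(columns),
        num_results=num_results,
    )
```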
To parse and display the search results, use the following helper function:
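Here is a sketch of what such a helper might look like. It assumes the raw response has the shape {"manifest": {"columns": [{"name": ...}, ...]}, "result": {"data_array": [[...], ...]}}. In that shape, each row lists values in the manifest's column order, with the similarity score typically appearing as an extra column. Check a real response from your index before relying on this.

```python
from typing import Any, Dict, List


def parse_search_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a similarity_search response into a list of {column: value} dicts."""
    names = [col["name"] for col in response.get("manifest", {}).get("columns", [])]
    # data_array may be missing when there are no hits.
    rows = response.get("result", {}).get("data_array", []) or []
    return [dict(zip(names, row)) for row in rows]
```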
Now, put it all together and perform a sample search:
This code uses the similarity_search function to find videos related to the query "A dragon", then parses and displays the results in a user-friendly format.
Key points to remember:
- The get_text_embedding function uses the same Twelve Labs model as your video embeddings, ensuring compatibility.
- The similarity_search function combines text-to-embedding conversion with Vector Search to find similar videos.
- The parse_search_results function converts the raw API response into a more usable format.
- You can adjust the num_results parameter in the similarity_search function to control the number of results returned.
This implementation enables powerful semantic search capabilities across your video dataset. Users can now find relevant videos using natural language queries, leveraging the rich multimodal embeddings generated by Twelve Labs Embed API.
Building a Video Recommendation System
Now, it’s time to create a basic video recommendation system using the multimodal embeddings generated by Twelve Labs Embed API and Databricks Mosaic AI Vector Search. This system will suggest videos similar to a given video based on their embedding similarities.
First, implement a simple recommendation function:
This implementation does the following:
- The get_video_recommendations function takes a video ID and the number of recommendations to return.
- The display_recommendations helper function formats and prints the recommendations in a user-friendly manner.
To use this recommendation system:
- Make sure your videos_source_embeddings table is populated with valid embeddings.
- Call the get_video_recommendations function with a valid video ID from your dataset.
This basic recommendation system demonstrates how to leverage multimodal embeddings for content-based video recommendations. It can be extended and improved in several ways:
Remember that the quality of recommendations depends on the size and diversity of your video dataset, as well as the accuracy of the embeddings generated by Twelve Labs Embed API. As you add more videos to your system, the recommendations should become more relevant and diverse.
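The ranking logic can be made concrete without a live endpoint. The sketch below ranks videos by cosine similarity in memory, standing in for the Vector Search query; it illustrates the idea and is not the tutorial's get_video_recommendations implementation. It also handles one detail every content-based recommender needs: the query video is excluded, since it would otherwise be its own top match. Against a real index, you could get the same effect by requesting one extra result and dropping the query video's ID.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_in_memory(
    video_id: int,
    embeddings: Dict[int, Sequence[float]],
    num_recommendations: int = 5,
) -> List[Tuple[int, float]]:
    """Return (id, score) pairs for the videos most similar to video_id, excluding itself."""
    if video_id not in embeddings:
        raise KeyError(f"Unknown video id: {video_id}")
    query = embeddings[video_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != video_id  # a video is always most similar to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_recommendations]
```

For example, with {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}, recommending for video 1 ranks video 2 ahead of video 3.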
Keeping the Index Up to Date
As your video library grows and evolves, it's crucial to keep your Vector Search index up to date. Mosaic AI Vector Search synchronizes a Delta Sync Index with its source Delta table, either continuously or whenever you trigger a sync, so recommendations and search results can reflect the latest data.
Key considerations for index updates and synchronization:
By mastering these techniques, you'll ensure that your Twelve Labs video embeddings are always current and readily available for advanced search and recommendation use cases.
Optimizing Performance and Scaling
As your video analysis pipeline grows, it is important to continue optimizing performance and scaling your solution. Distributed computing capabilities from Databricks, combined with efficient embedding generation from Twelve Labs, provide a robust foundation for handling large-scale video processing tasks.
Consider these strategies for optimizing and scaling your solution:
- … num_results and implementing efficient filtering techniques.
By implementing these optimization techniques, you'll be well-equipped to handle growing video libraries and increasing user demands while maintaining high performance and cost efficiency.
Monitoring and Analytics
Implementing robust monitoring and analytics is essential to ensuring the ongoing success of your video understanding pipeline. Databricks provides powerful tools for tracking system performance, user engagement, and business impact.
Key areas to focus on for monitoring and analytics:
By implementing a comprehensive monitoring and analytics strategy, you'll gain valuable insights into your video understanding pipeline's performance and impact. This data-driven approach will enable continuous improvement and help you demonstrate the value of integrating advanced video understanding capabilities from Twelve Labs with the Databricks Data Intelligence Platform.
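As one starting point for tracking system performance, a decorator can record per-call latency for functions such as similarity_search. This is a generic Python sketch, not a Databricks monitoring API. The in-memory metrics dict is a stand-in; in practice you might periodically write the collected timings to a Delta table and build dashboards on top of it.

```python
import functools
import logging
import time
from typing import Any, Callable, Dict, List

logger = logging.getLogger("video_search.metrics")


def track_latency(metrics: Dict[str, List[float]]) -> Callable:
    """Decorator that records each call's wall-clock latency (seconds) under the function name."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even if the call raises, so failures still show up in timings.
                elapsed = time.perf_counter() - start
                metrics.setdefault(fn.__name__, []).append(elapsed)
                logger.info("%s took %.3fs", fn.__name__, elapsed)
        return wrapper
    return decorator
```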
Conclusion
Twelve Labs and Databricks Mosaic AI provide a robust framework for advanced video understanding and analysis. This integration leverages multimodal embeddings and efficient Vector Search capabilities, enabling developers to construct sophisticated video search, recommendation, and analysis systems.
This tutorial has walked through the technical steps of setting up the environment, generating embeddings, configuring Vector Search, and implementing basic search and recommendation functionalities. It also covered key considerations for scaling, optimizing, and monitoring your solution.
In the evolving landscape of video content, the ability to extract precise insights from this medium is critical. This integration equips developers with the tools to address complex video understanding tasks. We encourage you to explore the technical capabilities, experiment with advanced use cases, and contribute to the community of AI engineers advancing video understanding technology.
Additional Resources
To further explore and leverage this integration, consider the following resources:
These resources will help you stay at the forefront of video AI technology and continue to build innovative solutions using Twelve Labs and Databricks.