We're excited to introduce our new Embed API in Open Beta, enabling customers to generate state-of-the-art multimodal embeddings.
Some of the highlights are:
Powered by our state-of-the-art multimodal model Marengo 2.6, which outperforms Gemini Multimodal Embeddings and Google VideoPrism on text-to-video retrieval benchmarks (MSR-VTT, ActivityNet) and the text-to-image retrieval benchmark MS-COCO
Up to 70% cheaper than other solutions, including CLIP-based models, on a cost/performance basis
Spatial-temporal understanding that identifies and localizes objects, actions, or events in both space (where they occur in the frame) and time (when they happen across multiple frames) within a video
To start generating embeddings with the Embed API, just follow the Documentation and the Quickstart cookbook. Embed API pricing is now available here.
Why Multimodal Embeddings?
Applications often need to work with diverse content formats. The Twelve Labs Embed API supports embedding content across multiple modalities, allowing users to create connections between text, images, video, and audio data.
With Embed API, developers can perform any-to-any search and retrieval tasks, including:
Text-to-Video: Search video libraries using natural language queries.
Image-to-Video: Retrieve relevant video content based on an image.
Text-to-Audio and Audio-to-Video: Efficiently link and search across audio and video data.
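To make this concrete, the sketch below shows how a text-to-video lookup can be scored with cosine similarity over embedding vectors, using NumPy. It assumes you already have a query embedding and a list of per-segment video embeddings; query_embedding and segment_embeddings are hypothetical placeholders shaped like the output of the code later in this post:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_segments(query_embedding, segment_embeddings):
    # Rank video segments by similarity to the query embedding (highest first)
    scored = [
        {
            "start_offset_sec": seg["start_offset_sec"],
            "end_offset_sec": seg["end_offset_sec"],
            "score": cosine_similarity(np.array(query_embedding), np.array(seg["embedding"])),
        }
        for seg in segment_embeddings
    ]
    return sorted(scored, key=lambda s: s["score"], reverse=True)

# Hypothetical usage: a 1024-dimensional text query embedding against video segment embeddings
# top_matches = rank_segments(query_embedding, segment_embeddings)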
Additionally, multimodal embeddings can be used for a wide number of AI use cases, including:
Retrieval Augmented Generation (RAG)
Hybrid search
Training models
Creating high-quality training data
Anomaly detection
The following interactive visualization shows embeddings of video samples from the Kinetics 400 dataset.
Building Video-RAG with Twelve Labs Embed API
One of the use cases for multimodal embeddings is building a hybrid search. You can now build a Video Retrieval-Augmented Generation (Video-RAG) system using the Twelve Labs Embed API and popular vector databases. Check out our deep-dive tutorial blogs on building a Video-RAG application with vector databases such as ApertureDB and LanceDB.
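As a rough, database-agnostic sketch of the indexing step in such a pipeline, the per-segment embeddings returned by the Embed API can be stored as vectors with their time offsets kept as metadata. The InMemoryVectorStore class below is a hypothetical stand-in for whichever vector database you choose, and index_video_segments assumes the output format of the generate_embedding helper shown later in this post:

class InMemoryVectorStore:
    # Hypothetical minimal store; replace with your vector database's client
    def __init__(self):
        self.records = []

    def upsert(self, vector, metadata):
        # Keep the embedding alongside its video metadata
        self.records.append({"vector": vector, "metadata": metadata})

def index_video_segments(store, video_url, embeddings):
    # `embeddings` is the list of segment dicts returned by generate_embedding()
    for segment in embeddings:
        store.upsert(
            vector=segment["embedding"],
            metadata={
                "video_url": video_url,
                "start_offset_sec": segment["start_offset_sec"],
                "end_offset_sec": segment["end_offset_sec"],
            },
        )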
Generate Video Embeddings with Twelve Labs Embed API
The following table provides details about the embeddings generated by the Twelve Labs Embed API.
| Model | Modalities | Dimensions | Similarity Metric |
| --- | --- | --- | --- |
| Marengo-retrieval-2.6 | Video, Audio, Image, Text (all in the same latent space) | 1024 | Cosine similarity |
The following code snippet shows an example of generating embeddings using the Twelve Labs Embed API.
Twelve Labs multimodal embeddings can be accessed via our APIs and SDKs (Python, Node.js). We will be using the Python SDK for the code snippets. The first step is to install the necessary libraries:
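A minimal install command, assuming the Python SDK is published on PyPI under the name twelvelabs:

pip install twelvelabs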
After running the above code, you will have all the necessary libraries installed and ready for use in the subsequent steps. The code snippet below shows how to create video embeddings:
import os

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask

# Initialize the Twelve Labs client (assumes the TL_API_KEY environment variable is set)
TL_API_KEY = os.environ["TL_API_KEY"]
twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)

def generate_embedding(video_url):
    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name="Marengo-retrieval-2.6",
        video_url=video_url
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Define a callback function to monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    # Wait for the task to complete
    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve the task result
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract and return the embeddings
    embeddings = []
    for v in task_result.video_embeddings:
        embeddings.append({
            'embedding': v.values,
            'start_offset_sec': v.start_offset_sec,
            'end_offset_sec': v.end_offset_sec,
            'embedding_scope': v.embedding_scope
        })
    return embeddings, task_result

# Example usage
video_url = "https://storage.googleapis.com/ad-demos-datasets/videos/Ecommerce%20v2.5.mp4"

# Generate embeddings for the video
embeddings, task_result = generate_embedding(video_url)

print(f"Generated {len(embeddings)} embeddings for the video")
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1}:")
    print(f"  Scope: {emb['embedding_scope']}")
    print(f"  Time range: {emb['start_offset_sec']} - {emb['end_offset_sec']} seconds")
    print(f"  Embedding vector (first 5 values): {emb['embedding'][:5]}")
    print()
Core Research, Product, Design, Engineering and GTM
Jeff Kim, Jenna Kang, Sean Barclay, Sunny Nguyen, Meryl Hu, Ryan Won, Esther Kim, Wade Jeong, SJ Kim, Henry Choi, Maninder Saini, James Le, Aiden Lee, Soyoung Lee, Jae Lee