We're excited to introduce our new Embed API in Open Beta, enabling customers to generate state-of-the-art multimodal embeddings.
Some of the highlights are:
Powered by our state-of-the-art multimodal model Marengo 2.6, which outperforms Gemini Multimodal Embeddings and Google VideoPrism on text-to-video retrieval benchmarks (MSR-VTT, ActivityNet) and the text-to-image retrieval benchmark MS-COCO
Up to 70% cheaper than other solutions, including CLIP-based models, on a cost/performance basis
Spatial-temporal understanding that identifies and localizes objects, actions, or events in both space (where they occur in the frame) and time (when they happen across multiple frames) within a video
To start generating embeddings with the Embed API, just follow the Documentation and the Quickstart cookbook. Embed API pricing is now available here.
Why Multimodal Embeddings?
Applications often need to work with diverse content formats. The Twelve Labs Embed API supports embedding content across multiple modalities, allowing users to create connections between text, images, video, and audio data.
With Embed API, developers can perform any-to-any search and retrieval tasks, including:
Text-to-Video: Search video libraries using natural language queries.
Image-to-Video: Retrieve relevant video content based on an image.
Text-to-Audio and Audio-to-Video: Efficiently link and search across audio and video data.
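To make this concrete, the sketch below shows how a text-to-video lookup can be scored with cosine similarity over embedding vectors, using NumPy. It assumes you already have a query embedding and a list of per-segment video embeddings; query_embedding and segment_embeddings are hypothetical placeholders shaped like the output of the code later in this post:

import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two 1-D vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_segments(query_embedding, segment_embeddings):
    # Rank video segments by similarity to the query embedding (highest first)
    scored = [
        {
            "start_offset_sec": seg["start_offset_sec"],
            "end_offset_sec": seg["end_offset_sec"],
            "score": cosine_similarity(np.array(query_embedding), np.array(seg["embedding"])),
        }
        for seg in segment_embeddings
    ]
    return sorted(scored, key=lambda s: s["score"], reverse=True)

# Hypothetical usage: a 1024-dimensional text query embedding against video segment embeddings
# top_matches = rank_segments(query_embedding, segment_embeddings)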
Additionally, multimodal embeddings can be used for a wide number of AI use cases, including:
Retrieval Augmented Generation (RAG)
Hybrid search
Training models
Creating high-quality training data
Anomaly detection
The following interactive visualization shows embeddings of video samples from the Kinetics 400 dataset.
Building Video-RAG with Twelve Labs Embed API
One of the use cases for multimodal embeddings is building a hybrid search. You can now build a Video Retrieval-Augmented Generation (Video-RAG) system using the Twelve Labs Embed API and popular vector databases. Check out our deep-dive tutorial blogs on building a Video-RAG application with vector databases such as ApertureDB and LanceDB.
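As a rough, database-agnostic sketch of the indexing step in such a pipeline, the per-segment embeddings returned by the Embed API can be stored as vectors with their time offsets kept as metadata. The InMemoryVectorStore class below is a hypothetical stand-in for whichever vector database you choose, and index_video_segments assumes the output format of the generate_embedding helper shown later in this post:

class InMemoryVectorStore:
    # Hypothetical minimal store; replace with your vector database's client
    def __init__(self):
        self.records = []

    def upsert(self, vector, metadata):
        # Keep the embedding alongside its video metadata
        self.records.append({"vector": vector, "metadata": metadata})

def index_video_segments(store, video_url, embeddings):
    # `embeddings` is the list of segment dicts returned by generate_embedding()
    for segment in embeddings:
        store.upsert(
            vector=segment["embedding"],
            metadata={
                "video_url": video_url,
                "start_offset_sec": segment["start_offset_sec"],
                "end_offset_sec": segment["end_offset_sec"],
            },
        )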
Generate Video Embeddings with Twelve Labs Embed API
The following table provides details about the embeddings generated by the Twelve Labs Embed API.
| Model | Modalities | Dimensions | Similarity Metric |
| --- | --- | --- | --- |
| Marengo-retrieval-2.6 | Video, Audio, Image, Text (all in the same latent space) | 1024 | Cosine similarity |
The following code snippet shows an example of generating embeddings using the Twelve Labs Embed API.
Twelve Labs multimodal embeddings can be accessed via our APIs and SDKs (Python, Node.js). We will be using the Python SDK for the code snippets. The first step is to install the necessary libraries:
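A minimal install command, assuming the Python SDK is published on PyPI under the name twelvelabs:

pip install twelvelabs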
After running the above code, you will have all the necessary libraries installed and ready for use in the subsequent steps. The code snippet below shows how to create video embeddings:
import os

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask

# Initialize the Twelve Labs client (assumes the TL_API_KEY environment variable is set)
TL_API_KEY = os.environ["TL_API_KEY"]
twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)

def generate_embedding(video_url):
    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name="Marengo-retrieval-2.6",
        video_url=video_url
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Define a callback function to monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    # Wait for the task to complete
    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve the task result
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract and return the embeddings
    embeddings = []
    for v in task_result.video_embeddings:
        embeddings.append({
            'embedding': v.values,
            'start_offset_sec': v.start_offset_sec,
            'end_offset_sec': v.end_offset_sec,
            'embedding_scope': v.embedding_scope
        })
    return embeddings, task_result

# Example usage
video_url = "https://storage.googleapis.com/ad-demos-datasets/videos/Ecommerce%20v2.5.mp4"

# Generate embeddings for the video
embeddings, task_result = generate_embedding(video_url)

print(f"Generated {len(embeddings)} embeddings for the video")
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1}:")
    print(f"  Scope: {emb['embedding_scope']}")
    print(f"  Time range: {emb['start_offset_sec']} - {emb['end_offset_sec']} seconds")
    print(f"  Embedding vector (first 5 values): {emb['embedding'][:5]}")
    print()
Core Research, Product, Design, Engineering and GTM
Jeff Kim, Jenna Kang, Sean Barclay, Sunny Nguyen, Meryl Hu, Ryan Won, Esther Kim, Wade Jeong, SJ Kim, Henry Choi, Maninder Saini, James Le, Aiden Lee, Soyoung Lee, Jae Lee