Author
Manish Maheshwari
Date Published
11/13/2024
Tags
Embeddings
Foundation models
Developers
Embed API
Playground
Vector Database

We’re excited to introduce our new Embed API in Open Beta, enabling customers to generate state-of-the-art multimodal embeddings.

Some of the highlights are:

  • Powered by Marengo 2.6, our state-of-the-art multimodal model, which outperforms Gemini Multimodal embeddings and Google VideoPrism on Text-to-Video Retrieval benchmarks (MSR-VTT, ActivityNet) and Text-to-Image Retrieval benchmarks (MS-COCO).
  • Up to 70% cheaper than other solutions, including CLIP-based models, on a cost/performance basis.
  • Spatial-temporal understanding to identify and localize objects, actions, or events in both space (where it occurs in the frame) and time (when it happens over multiple frames) within a video.
  • Integrations with MongoDB, Pinecone, Databricks Mosaic AI, Milvus, LanceDB, and ApertureDB for easy vector storage.
  • Retrieve embeddings for your indexed videos through the API and visualize video embeddings in the Twelve Labs Playground.
  • Get started with APIs and SDKs (Python, NodeJS).

To start generating embeddings with the Embed API, follow the Documentation and the Quickstart cookbook. Embed API pricing is now available as well.

Why Multimodal Embeddings?

Applications often need to work with diverse content formats. The Twelve Labs Embed API supports embedding content across multiple modalities, allowing users to create connections between text, images, video, and audio data.

With the Embed API, developers can perform any-to-any search and retrieval tasks, including:

  • Text-to-Video: Search video libraries using natural language queries.
  • Image-to-Video: Retrieve relevant video content based on an image.
  • Text-to-Audio and Audio-to-Video: Efficiently link and search across audio and video data.
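
Because all modalities share one embedding space, any-to-any search comes down to ranking stored vectors by their similarity to a query vector. The following is a minimal sketch of that idea, assuming tiny hand-made 3-dimensional vectors in place of real embeddings. The `rank_by_similarity` helper, the file names, and the vector values are hypothetical and are not part of the Twelve Labs SDK.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, items):
    """Return (name, modality, score) tuples, most similar first."""
    scored = [
        (name, modality, cosine_similarity(query_vec, vec))
        for name, modality, vec in items
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)

# Toy stand-ins for real embeddings (hypothetical values).
library = [
    ("cooking_clip.mp4", "video", [0.9, 0.1, 0.0]),
    ("podcast.mp3", "audio", [0.1, 0.9, 0.1]),
    ("kitchen.jpg", "image", [0.8, 0.2, 0.1]),
]
text_query_vec = [1.0, 0.0, 0.0]  # pretend this came from a text query

ranked = rank_by_similarity(text_query_vec, library)
for name, modality, score in ranked:
    print(f"{name} ({modality}): {score:.3f}")
```

The same function works whether the query vector came from text, an image, or audio, since the ranking only depends on the vectors themselves.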

Additionally, multimodal embeddings can be used for a wide range of AI use cases, including:

  • Retrieval Augmented Generation (RAG)
  • Hybrid search
  • Training models
  • Creating high-quality training data
  • Anomaly detection
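
As one illustration of the use cases above, anomaly detection can be approximated by flagging embeddings that sit far from the centroid of a collection. This is a rough sketch under stated assumptions, not a Twelve Labs feature: the `find_anomalies` helper, the 2-dimensional toy vectors, and the 0.5 threshold are all hypothetical choices made for readability.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def find_anomalies(vectors, threshold):
    """Indices of vectors whose distance from the centroid exceeds threshold."""
    center = centroid(vectors)
    return [
        i for i, vec in enumerate(vectors)
        if cosine_distance(vec, center) > threshold
    ]

# Toy clip embeddings (hypothetical); the last one points elsewhere.
clip_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.95, 0.05],
    [0.0, 1.0],
]
anomalies = find_anomalies(clip_embeddings, threshold=0.5)
print(f"Anomalous clip indices: {anomalies}")
```

In practice the threshold would need tuning on real data, and a more robust method may be preferable when outliers are numerous enough to shift the centroid.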

The following interactive visualization shows embeddings of video samples from the Kinetics 400 dataset.

Building Video-RAG with Twelve Labs Embed API

One of the use cases for multimodal embeddings is building a hybrid search. You can now build a Video Retrieval Augmented Generation (Video-RAG) system using Twelve Labs Embed API and popular vector databases. Check out the following blogs where we give you a deep-dive tutorial on how to build a Video-RAG application using:

Generate Video Embeddings with Twelve Labs Embed API

The following table summarizes the embeddings produced by the Twelve Labs Embed API.

Model | Modalities | Dimensions | Similarity Metric
Marengo-retrieval-2.6 | Video, Audio, Image, Text (all in the same latent space) | 1024 | Cosine similarity
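
To make the similarity metric concrete, the sketch below computes cosine similarity on 1024-dimensional vectors by L2-normalizing them first, after which a plain dot product gives the same result. The vectors are synthetic placeholders rather than real embeddings, and the helper names are illustrative assumptions.

```python
import math

EMBEDDING_DIM = 1024  # dimensionality listed in the table above

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity as the dot product of unit-length vectors."""
    return dot(l2_normalize(a), l2_normalize(b))

# Synthetic vectors (placeholders, not real embeddings).
vec_a = [1.0] * EMBEDDING_DIM
vec_b = [2.0] * EMBEDDING_DIM  # same direction, different magnitude
vec_c = [1.0 if i % 2 == 0 else -1.0 for i in range(EMBEDDING_DIM)]  # orthogonal to vec_a

sim_ab = cosine_similarity(vec_a, vec_b)
sim_ac = cosine_similarity(vec_a, vec_c)
print(f"same direction: {sim_ab:.3f}, orthogonal: {sim_ac:.3f}")
```

Normalizing once at write time is a common design choice, because a store that only supports dot-product search then behaves like a cosine-similarity search.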

Twelve Labs multimodal embeddings can be accessed via our APIs and SDKs (Python, NodeJS). The examples below use the Python SDK. The first step is to install the necessary libraries and import the required modules:

# Install necessary libraries and dependencies
!pip install twelvelabs
# Import required modules
import twelvelabs

After running the above code, you will have all the necessary libraries installed and ready for use in the subsequent steps. The code snippet below shows how to create video embeddings:

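import os

from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask

# Read the API key from the environment.
# Assumption: the key is stored in a variable named TL_API_KEY; adapt to your setup.
TL_API_KEY = os.environ["TL_API_KEY"]

# Initialize the Twelve Labs client
twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)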

def generate_embedding(video_url):
    # Create an embedding task
    task = twelvelabs_client.embed.task.create(
        engine_name="Marengo-retrieval-2.6",
        video_url=video_url
    )
    print(f"Created task: id={task.id} engine_name={task.engine_name} status={task.status}")

    # Define a callback function to monitor task progress
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")

    # Wait for the task to complete
    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")

    # Retrieve the task result
    task_result = twelvelabs_client.embed.task.retrieve(task.id)

    # Extract and return the embeddings
    embeddings = []
    for v in task_result.video_embeddings:
        embeddings.append({
            'embedding': v.values,
            'start_offset_sec': v.start_offset_sec,
            'end_offset_sec': v.end_offset_sec,
            'embedding_scope': v.embedding_scope
        })

    return embeddings, task_result

# Example usage
video_url = "https://storage.googleapis.com/ad-demos-datasets/videos/Ecommerce%20v2.5.mp4"

# Generate embeddings for the video
embeddings, task_result = generate_embedding(video_url)

print(f"Generated {len(embeddings)} embeddings for the video")
for i, emb in enumerate(embeddings):
    print(f"Embedding {i+1}:")
    print(f"  Scope: {emb['embedding_scope']}")
    print(f"  Time range: {emb['start_offset_sec']} - {emb['end_offset_sec']} seconds")
    print(f"  Embedding vector (first 5 values): {emb['embedding'][:5]}")
    print()
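
Once a video has segment-level embeddings, a query vector can be compared against each segment to find the matching time range. The sketch below assumes the list-of-dicts shape returned by `generate_embedding` above, but uses hand-made 3-dimensional vectors and offsets so it runs without an API key. The `top_segments` helper and all values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_segments(query_vec, segments, k=2):
    """Return the k (score, segment) pairs most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, seg["embedding"]), seg)
        for seg in segments
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Mock segments in the same shape as the dicts built above (values are made up).
segments = [
    {"embedding": [0.1, 0.9, 0.0], "start_offset_sec": 0.0,
     "end_offset_sec": 6.0, "embedding_scope": "clip"},
    {"embedding": [0.9, 0.1, 0.1], "start_offset_sec": 6.0,
     "end_offset_sec": 12.0, "embedding_scope": "clip"},
    {"embedding": [0.5, 0.5, 0.5], "start_offset_sec": 12.0,
     "end_offset_sec": 18.0, "embedding_scope": "clip"},
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for a text or image query embedding

results = top_segments(query_vec, segments, k=2)
for score, seg in results:
    print(f"{seg['start_offset_sec']}-{seg['end_offset_sec']} sec: {score:.3f}")
```

A linear scan like this is fine for a handful of videos; for larger libraries, a vector database would typically handle the nearest-neighbor search.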

The following code snippet shows you how to create text embeddings given a text query:

def create_text_embedding(
    twelvelabs_client: TwelveLabs,
    text: str,
    engine_name: str = "Marengo-retrieval-2.6",
    verbose: bool = True
):
    """
    Create a text embedding using Twelve Labs Embed API.
    
    Example:
        twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)
        text_embedding = create_text_embedding(twelvelabs_client, "Your text here")
    """
    # Create embedding
    text_embedding = twelvelabs_client.embed.create(
        engine_name=engine_name,
        text=text
    )
    
    # Print information if verbose is True
    if verbose:
        print("Created a text embedding")
        print(f" Engine: {text_embedding.engine_name}")
        print(f" Embedding: {text_embedding.text_embedding}")
    
    return text_embedding

The following code snippet shows you how to create audio embeddings given an audio file:

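# Path and Union are used in the type hints below (and in the image example)
from pathlib import Path
from typing import Union

def create_audio_embedding(
    twelvelabs_client: TwelveLabs,
    audio_file: Union[str, Path],
    engine_name: str = "Marengo-retrieval-2.6",
    verbose: bool = True
):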
    """
    Create an audio embedding using Twelve Labs Embed API.
    
    Example:
        twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)
        audio_embedding = create_audio_embedding(twelvelabs_client, "path/to/audio.mp3")
    """
    # Create embedding
    audio_embedding = twelvelabs_client.embed.create(
        engine_name=engine_name,
        audio_file=audio_file,
    )
    
    # Print information if verbose is True
    if verbose:
        print("Created an audio embedding")
        print(f" Engine: {audio_embedding.engine_name}")
        
        if audio_embedding.audio_embedding.segments:
            print("Segments:")
            for i, segment in enumerate(audio_embedding.audio_embedding.segments, 1):
                print(f" Segment {i}:")
                print(f" Start Offset (sec): {segment.start_offset_sec}")
                if segment.values:
                    print(f" Values: {segment.values[:5]}... (truncated)")
    
    return audio_embedding
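
When an audio file yields several segment embeddings, some applications want a single vector per file. One common option, shown here as a sketch and not as something the Embed API does for you, is to mean-pool the segment vectors and re-normalize. The `Segment` dataclass is a hypothetical stand-in for the SDK's segment objects, mirroring only the `start_offset_sec` and `values` fields used above.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    """Hypothetical stand-in for an SDK segment object."""
    start_offset_sec: float
    values: List[float]

def mean_pool(segments):
    """Average segment vectors, then scale the result to unit length."""
    if not segments:
        raise ValueError("need at least one segment")
    dim = len(segments[0].values)
    sums = [0.0] * dim
    for seg in segments:
        if len(seg.values) != dim:
            raise ValueError("all segments must share one dimension")
        for i, value in enumerate(seg.values):
            sums[i] += value
    mean = [s / len(segments) for s in sums]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm else mean

# Toy 2-dimensional segments (made-up values).
audio_segments = [
    Segment(start_offset_sec=0.0, values=[1.0, 0.0]),
    Segment(start_offset_sec=6.0, values=[0.0, 1.0]),
]
pooled = mean_pool(audio_segments)
print(f"Pooled vector: {pooled}")
```

Pooling trades temporal detail for a simpler index, so it suits file-level search better than locating a moment inside a recording.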

The following code snippet shows you how to create image embeddings given an image file:

def create_image_embedding(
    twelvelabs_client: TwelveLabs,
    image_file: Union[str, Path],
    engine_name: str = "Marengo-retrieval-2.6",
    verbose: bool = True
):
    """
    Create an image embedding using Twelve Labs Embed API.
    
    Example:
        twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)
        image_embedding = create_image_embedding(twelvelabs_client, "path/to/image.jpg")
    """
    # Create embedding
    image_embedding = twelvelabs_client.embed.create(
        engine_name=engine_name,
        image_file=image_file
    )
    
    # Print information if verbose is True
    if verbose:
        print("Created an image embedding")
        print(f" Engine: {image_embedding.engine_name}")
        print(f" Embedding: {image_embedding.image_embedding.values[:5]}... (truncated)")
    
    return image_embedding

Get Started with Twelve Labs Embed API

Twelve Labs Embeddings are now available through our APIs and the Playground. To quickly start using the Embed API, here are a few resources:

Authors

Leads

Lucas Lee, Yeonhoo Park, Manish Maheshwari

Core Research, Product, Design, Engineering and GTM

Jeff Kim, Jenna Kang, Sean Barclay, Sunny Nguyen, Meryl Hu, Ryan Won, Esther Kim, Wade Jeong, SJ Kim, Henry Choi, Maninder Saini, James Le, Aiden Lee, Soyoung Lee, Jae Lee
