MARENGO 3.0

Find anything. In any video.

A video embedding model that processes visual, audio, dialogue, and motion across 36 languages, returning a 512-dimensional vector ready for production search.


4hrs

Continuous video and audio, processed with full temporal context.

36+

Languages for query and retrieval. Native, not translated.

512d

Embedding dimensions, 6× smaller than the 3,072-dimension defaults of competing models.

30×

Faster video indexing than competing models, while leading on accuracy.

Things only Marengo does.

Other foundation models stop at text-to-clip retrieval on short videos in English. Marengo starts there, then does the work that production systems actually need.


512 dimensions. Same understanding.

Marengo encodes video into 512-dimensional embeddings, cutting storage 6× versus competitors' 3,072-dimension defaults and speeding up search without losing accuracy.
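The 6× figure follows directly from the dimension counts. The sketch below works through the raw storage arithmetic, assuming each dimension is stored as an uncompressed 32-bit float (real vector databases may quantize or add index overhead). The 10-million-vector workload is a hypothetical example, not a measured deployment.

```python
# Assumption: embeddings stored as uncompressed float32 (4 bytes per dimension).
BYTES_PER_FLOAT32 = 4

def index_size_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage for num_vectors embeddings of the given dimensionality."""
    return num_vectors * dims * BYTES_PER_FLOAT32

per_vector_512 = index_size_bytes(1, 512)    # 2,048 bytes
per_vector_3072 = index_size_bytes(1, 3072)  # 12,288 bytes
print(per_vector_3072 / per_vector_512)      # 6.0

# Hypothetical workload: 10 million segment embeddings.
n = 10_000_000
print(index_size_bytes(n, 512) / 1e9, "GB at 512 dims")    # 20.48 GB
print(index_size_bytes(n, 3072) / 1e9, "GB at 3072 dims")  # 122.88 GB
```

The ratio holds at any scale because storage grows linearly with dimension count; smaller vectors also mean less data to scan per similarity comparison.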

An image plus a sentence, in one query.

Drop a player photo, add a sentence or two, and Marengo merges them into one embedding. Mix image, text, and audio in a single query.

36 languages, one space.

Search across 36 languages in Marengo's unified vector space. No translation step, no accuracy loss between languages.
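A unified vector space means a query is matched against video segments by vector similarity, whatever language the query was written in. The minimal sketch below ranks segments by cosine similarity. The 4-dimensional vectors, segment IDs, and query embeddings are all hypothetical stand-ins for real 512-dimensional Marengo embeddings, chosen so that the English and Spanish phrasings of the same query land close together, as a shared space is meant to ensure.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_segments(query_vec, segments):
    """Return (segment_id, score) pairs, best match first."""
    scored = [(seg_id, cosine(query_vec, vec)) for seg_id, vec in segments.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings (4 dims instead of 512) for three video segments.
segments = {
    "goal_celebration": [0.9, 0.1, 0.0, 0.1],
    "press_conference": [0.1, 0.8, 0.3, 0.0],
    "crowd_wide_shot": [0.4, 0.2, 0.1, 0.9],
}

# Hypothetical embeddings of the same query in two languages.
query_en = [0.85, 0.15, 0.05, 0.10]  # "players celebrating a goal"
query_es = [0.80, 0.12, 0.02, 0.15]  # "jugadores celebrando un gol"

print(rank_segments(query_en, segments)[0][0])  # goal_celebration
print(rank_segments(query_es, segments)[0][0])  # goal_celebration
```

Because both queries are compared in the same space, no translation step sits between the query and the index; production systems would typically replace the linear scan with an approximate nearest-neighbor index.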

Search by example frame.

Use an image query refined by text to find matching video moments.

From signup to first result in 5 minutes.

Same model, prompts, and JSON output. Choose the surface for your team.


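```python
# Step 1: Import the HTTP client library
import requests

# Step 2: Define the API URL and the specific endpoint
API_URL = "https://api.twelvelabs.io/v1.3"
INDEXES_URL = f"{API_URL}/indexes"

# Step 3: Create the necessary headers for authentication
headers = {
    "x-api-key": "<YOUR_API_KEY>"
}

# Step 4: Prepare the data payload for your API request
INDEX_NAME = "<YOUR_INDEX_NAME>"
data = {
    "index_name": INDEX_NAME,
    "models": [
        {
            "model_name": "marengo3.0",
            "model_options": ["visual", "audio"]
        }
    ]
}

# Step 5: Send the request to create the index and print the JSON response
response = requests.post(INDEXES_URL, headers=headers, json=data)
response.raise_for_status()  # raise an error on a non-2xx status
print(response.json())
```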

Quick Start

Index your first video and run a search in under 5 minutes.

Sample Apps

Production repos: video search, RAG over video, highlight reel generator, compliance scanner.

MCP Server

Connect Claude, Cursor, or any MCP client to your video index.

SDKs & API

Python, Node, REST. Full reference, typed responses, streaming.


Faster, smaller, more accurate.

Marengo was designed against production workloads, not benchmarks, demos, or three-minute clips. Here's what that means in practice.


73%

Composite video-retrieval performance. Google Vertex scores 52%, Amazon Nova 55%.

Composite video-retrieval performance


| Capability | Marengo 3.0 | Gemini Embedding 2 | Nova Multimodal Embeddings |
|---|---|---|---|
| Embedding dimensions | 512 | 3072 (default; 1536 / 768 via Matryoshka) | 3072 (default; 1024 / 384 / 256 via Matryoshka) |
| Max video length | 4 hrs (continuous) | 120 sec per request | 30 sec per segment (chunked) |
| Multimodal composite queries | Yes (single embedding for image + text + audio query) | No native composite query API | No native composite query API |
| Sports recognition | 5 sports | Not a claimed capability | Not a claimed capability |
| Latency | 0.05 | ~0.50 | 1.50+ |
| Embedding confidence scores | Yes | Not exposed | Not exposed |
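The max-video-length row is easier to compare as a count of pieces. The sketch below computes how many fixed-length requests or segments it would take to cover a 4-hour video at the per-request and per-segment limits in the table, assuming non-overlapping pieces (real chunking pipelines often overlap segments, which would raise these counts). The 95-minute broadcast is a hypothetical example that shows the rounding up of a partial final piece.

```python
# Durations from the comparison table, in seconds.
FOUR_HOURS = 4 * 60 * 60      # 14,400 s
GEMINI_MAX_PER_REQUEST = 120  # "120 sec per request"
NOVA_SEGMENT = 30             # "30 sec per segment (chunked)"

def pieces_needed(total_s: int, piece_s: int) -> int:
    """Ceiling division: how many non-overlapping pieces cover total_s seconds."""
    return -(-total_s // piece_s)

print(pieces_needed(FOUR_HOURS, GEMINI_MAX_PER_REQUEST))  # 120
print(pieces_needed(FOUR_HOURS, NOVA_SEGMENT))            # 480

# Hypothetical 95-minute broadcast: the final partial piece still counts.
broadcast = 95 * 60  # 5,700 s
print(pieces_needed(broadcast, GEMINI_MAX_PER_REQUEST))   # 48
print(pieces_needed(broadcast, NOVA_SEGMENT))             # 190
```

Each split boundary is a point where temporal context can be lost, which is the practical difference between chunked processing and a model that takes the full 4 hours as one continuous input.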