MARENGO 3.0
Find anything. In any video.
4 hrs
Continuous video and audio, processed with full temporal context.
36+
Languages for query and retrieval. Native, not translated.
512d
Embedding dimensions. 6× smaller than competitors.
30×
Faster than others at video indexing, while leading on accuracy.
Things only Marengo does.

512 dimensions. Same understanding.
Marengo encodes video into 512-dimensional embeddings, cutting storage 6× versus competitors' 3072-dimensional defaults and speeding up search without losing accuracy.
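
The 6× figure follows from the dimension counts alone. As a rough sketch, assuming float32 storage (4 bytes per value) and a hypothetical index of one million vectors, neither of which is stated above, the arithmetic looks like this:

```python
# Back-of-the-envelope storage comparison.
# Assumptions (not from the source): float32 values, 1,000,000 vectors.
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dims: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of a dense vector index, ignoring metadata and index overhead."""
    return num_vectors * dims * bytes_per_value


num_vectors = 1_000_000  # hypothetical corpus size
small = index_size_bytes(num_vectors, 512)    # 512-dimensional embeddings
large = index_size_bytes(num_vectors, 3072)   # 3072-dimensional embeddings
ratio = large / small

print(f"512-d:  {small / 1e9:.2f} GB")
print(f"3072-d: {large / 1e9:.2f} GB")
print(f"ratio:  {ratio:.0f}x")
```

The ratio does not depend on the corpus size or the numeric precision, as long as both indexes use the same ones.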

An image plus a sentence, in one query.
Drop a player photo, add a sentence or two, and Marengo merges them into one embedding. Mix image, text, and audio in a single query.

36 languages, one space.
Search across 36 languages in Marengo's unified vector space. No translation step, no accuracy loss between languages.

Search by example frame.
Use an image query refined by text to find matching video moments.
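
Once a query has been turned into an embedding, finding matching moments is typically a nearest-neighbor lookup. The sketch below is a generic illustration of that step, not Marengo's API: the 3-dimensional vectors, the clip names, and the `top_k` helper are all made up for the example, and cosine similarity is an assumed ranking metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (clip_id, score) pairs most similar to the query."""
    scored = [(clip_id, cosine_similarity(query, vec))
              for clip_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: real embeddings would have 512 dimensions.
index = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]  # stands in for an image-plus-text query embedding

results = top_k(query, index, k=2)
for clip_id, score in results:
    print(clip_id, round(score, 3))
```

A brute-force scan like this is only practical for small collections; larger ones would normally sit behind a vector database or an approximate nearest-neighbor index.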

From signup to first result in 5 minutes.

Faster, smaller, more accurate.
| CAPABILITY | MARENGO 3.0 | Gemini Embedding 2 | Nova Multimodal Embeddings |
| --- | --- | --- | --- |
| Embedding dimensions | 512 | 3072 (default; 1536 / 768 via Matryoshka) | 3072 (default; 1024 / 384 / 256 via Matryoshka) |
| Max video length | 4 hrs (continuous) | 120 sec per request | 30 sec per segment (chunked) |
| Multimodal composite queries | Yes (single embedding for image + text + audio query) | No native composite query API | No native composite query API |
| Sports recognition | 5 sports | Not a claimed capability | Not a claimed capability |
| Latency | 0.05 | ~0.50 | 1.50+ |
| Embedding confidence scores | Yes | Not exposed | Not exposed |
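
To put the "Max video length" row in concrete terms, here is a small sketch that counts how many pieces a 4-hour video would have to be split into under the 120-second and 30-second limits listed above. It assumes plain back-to-back chunks with no overlap, which is a simplification; the `chunks_needed` helper is illustrative only.

```python
import math

VIDEO_SECONDS = 4 * 60 * 60  # a 4-hour video, as in the table


def chunks_needed(video_seconds: int, max_chunk_seconds: int) -> int:
    """Number of non-overlapping chunks needed to cover the whole video."""
    return math.ceil(video_seconds / max_chunk_seconds)


per_request_120 = chunks_needed(VIDEO_SECONDS, 120)  # 120 sec per request
per_segment_30 = chunks_needed(VIDEO_SECONDS, 30)    # 30 sec per segment

print("120-second limit:", per_request_120, "requests")
print("30-second limit: ", per_segment_30, "segments")
```

Each chunk boundary is also a point where temporal context is cut, which is the practical difference from processing the video continuously.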



