Understanding videos is much more than extracting objects from images
A video contains rich information such as movement, objects, sound, text on screen, and speech. In order for an AI to contextually understand videos, it must extract all of this information as well as understand the complex relations between objects and connections between past and present.
How our AI works
Our AI extracts multiple features from video, such as time, objects, on-screen text, conversation, people, and actions, and synthesizes this vast amount of information into vectors. Vectors enable fast, semantic, and scalable search.
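To illustrate why vectors make semantic search possible, here is a minimal sketch of nearest-neighbor search by cosine similarity. It is not the Twelve Labs implementation: the three-dimensional vectors, the clip names, and the `search` helper are hypothetical and chosen only to keep the example small.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for three video segments (assumed values).
segment_vectors = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.8, 0.3],
    "clip_c": [0.0, 0.2, 0.9],
}


def search(query_vector, index, top_k=2):
    """Rank indexed segments by similarity to the query vector."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embedding of a text query, in the same vector space.
query = [0.2, 0.9, 0.2]
results = search(query, segment_vectors)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because the query and the video segments live in the same vector space, ranking by similarity returns the segments closest in meaning rather than those sharing a keyword. A production system would typically replace the linear scan with an approximate nearest-neighbor index.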
Why developers and product managers love building with Twelve Labs
Rich Understanding
Powerful AI delivers context-specific search and insights, replacing ineffective keyword tagging.
Multimodal
Search anything within your video: visuals, conversations, logos, and text.
Easy Integration
End-to-end infrastructure to make all of your videos searchable. Start building with just a few API calls.
State-of-the-art Accuracy
The AI models behind Twelve Labs outperform even the strongest open-source and commercial models in both research and industry.
Our Strength
State-of-the-art performance
Ranked #1 in the video retrieval track of the 2021 ICCV VALUE Challenge hosted by Microsoft, outperforming the giants in cost as well as performance.
2021 VALUE Challenge Record (2021.11)
Rank  Model                   Team                 Mean-Rank  Ave-Score  TVR    How2R  YC2R   VATEX-EN
1     ViSeRet (ensemble)      Twelve Labs & KAIST  1.75       35.67      14.18  7.74   62.72  58.03
2     craig.starr (ensemble)  Kakao Brain          2.25       35.32      15.41  6.75   66.04  53.07
3     hgzjy25                 Tencent OVBU         4          32.79      15.78  5.56   60.89  48.94
4     ViSeRet (single)        Twelve Labs & KAIST  4.25       32.17      9.77   7.74   55.73  55.46
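The Ave-Score column in the leaderboard above appears to be the arithmetic mean of the four per-task scores (TVR, How2R, YC2R, VATEX-EN). That reading is an inference from the published numbers, not a definition stated on this page. The short check below recomputes the mean for each entry and compares it with the reported value; a tolerance of 0.01 is assumed because the published scores are rounded to two decimals.

```python
# Per-task scores copied from the leaderboard: TVR, How2R, YC2R, VATEX-EN.
task_scores = {
    "ViSeRet (ensemble)": [14.18, 7.74, 62.72, 58.03],
    "craig.starr (ensemble)": [15.41, 6.75, 66.04, 53.07],
    "hgzjy25": [15.78, 5.56, 60.89, 48.94],
    "ViSeRet (single)": [9.77, 7.74, 55.73, 55.46],
}

# Ave-Score values as reported on the leaderboard.
reported_ave_score = {
    "ViSeRet (ensemble)": 35.67,
    "craig.starr (ensemble)": 35.32,
    "hgzjy25": 32.79,
    "ViSeRet (single)": 32.17,
}

# Assumption: Ave-Score is the plain mean of the four task scores.
computed_ave_score = {
    model: sum(scores) / len(scores) for model, scores in task_scores.items()
}

for model, computed in computed_ave_score.items():
    reported = reported_ave_score[model]
    matches = abs(computed - reported) < 0.01
    print(f"{model}: computed {computed:.4f}, reported {reported}, match={matches}")
```

The Mean-Rank column cannot be checked the same way from these four rows alone, since per-task ranks depend on every entry in the full challenge leaderboard.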
4th Workshop on Closing the Loop Between Vision and Language
Watch video
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation
Read report
Our video-first approach
End-to-end system born to process videos
Semantic search on complex queries
Easy finetuning & domain adaptation
Existing solutions
A collage of image and speech APIs, resulting in a brittle system
Rule-based or simple tag-based search
No finetuning or domain adaptation functionality
Interested in making your videos searchable?
Next-generation video understanding technology at your fingertips