Understanding videos is much more than extracting objects from images
A video contains rich information such as movement, objects, sound, text on screen, and speech. In order for an AI to contextually understand videos, it must extract all of this information as well as understand the complex relations between objects and connections between past and present.
How our AI works
Our comprehensive AI extracts multiple features from video, such as time, objects, on-screen text, conversation, people, and actions, and synthesizes this vast amount of information into vectors. Vectors enable fast, semantic, and scalable search.
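To make the idea of vector search concrete, here is a minimal sketch of ranking clips by cosine similarity between a query vector and stored clip vectors. The clip names, the three-dimensional vectors, and the `search` helper are all hypothetical and chosen for illustration; they do not describe the actual model, index, or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the ids of the top_k clips most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), clip_id)
        for clip_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [clip_id for _, clip_id in scored[:top_k]]


# Hypothetical toy embeddings; real embeddings would have far more dimensions.
index = {
    "clip_dog_park": [0.9, 0.1, 0.0],
    "clip_cooking": [0.0, 0.8, 0.6],
    "clip_dog_beach": [0.8, 0.0, 0.3],
}

# Hypothetical embedding of a text query such as "a dog playing outside".
query = [1.0, 0.0, 0.1]

results = search(query, index)
print(results)
```

A linear scan like this is only practical for small collections; at scale, an approximate nearest-neighbor index would typically replace the loop.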
Crafted carefully with 4 core design principles
Multimodality
Integrating both visuals and audio allows our AI to understand the context of a given video
Time
Our AI forms temporal connections between frames to understand time and mimic human memory
Vectors
All information within a video is transformed into vectors to enable lightning-fast and scalable search
Zero-shot Performance
Our state-of-the-art video understanding AI can be easily finetuned to solve domain-specific problems
Our Strength
State-of-the-Art performance
Ranked #1 in the video retrieval track of the 2021 ICCV VALUE Challenge, hosted by Microsoft, outperforming industry giants in both cost and performance
2021 VALUE Challenge Record (2021.11)
| Rank | Model | Mean-Rank | Ave-Score | TVR | How2R | YC2R | VATEX-EN |
|------|-------|-----------|-----------|-----|-------|------|----------|
| 1 | ViSeRet (ensemble), Twelve Labs & KAIST | 1.75 | 35.67 | 14.18 | 7.74 | 62.72 | 58.03 |
| 2 | craig.starr (ensemble), Kakao Brain | 2.25 | 35.32 | 15.41 | 6.75 | 66.04 | 53.07 |
| 3 | hgzjy25, Tencent OVBU | 4 | 32.79 | 15.78 | 5.56 | 60.89 | 48.94 |
| 4 | ViSeRet (single), Twelve Labs & KAIST | 4.25 | 32.17 | 9.77 | 7.74 | 55.73 | 55.46 |
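As a quick sanity check on the record above, the Ave-Score figures appear to be the unweighted mean of the four per-task scores (TVR, How2R, YC2R, VATEX-EN). This is an inference from the published numbers, not something the page states, and the sketch below only verifies it for the four listed entries. Mean-Rank is not recomputed here because it presumably depends on per-task ranks across the full leaderboard, which is not shown.

```python
# Per-task scores copied from the 2021 VALUE Challenge record above,
# in the order TVR, How2R, YC2R, VATEX-EN.
scores = {
    "ViSeRet (ensemble)": (14.18, 7.74, 62.72, 58.03),
    "craig.starr (ensemble)": (15.41, 6.75, 66.04, 53.07),
    "hgzjy25": (15.78, 5.56, 60.89, 48.94),
    "ViSeRet (single)": (9.77, 7.74, 55.73, 55.46),
}

# Ave-Score values as reported in the record.
reported_ave = {
    "ViSeRet (ensemble)": 35.67,
    "craig.starr (ensemble)": 35.32,
    "hgzjy25": 32.79,
    "ViSeRet (single)": 32.17,
}


def ave_score(task_scores):
    """Unweighted mean of the per-task scores (assumed definition)."""
    return sum(task_scores) / len(task_scores)


averages = {model: ave_score(s) for model, s in scores.items()}

for model, value in averages.items():
    # Allow for rounding to two decimals in the reported figures.
    matches = abs(value - reported_ave[model]) < 0.01
    print(f"{model}: computed {value:.4f}, reported {reported_ave[model]}, match={matches}")
```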
4th Workshop on Closing the Loop Between Vision and Language
Watch video
ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation
Read report
Our video-first approach
End-to-end system born to process videos
Semantic search on complex queries
Easy finetuning & domain adaptation
Existing solutions
A collage of image and speech APIs, resulting in a brittle system
Rule/simple tag-based search
No finetuning or domain adaptation functionality
Interested in making your videos searchable?
The world’s most powerful video understanding technology at your fingertips
Get started
Request a demo