Author
The Chosun Daily
Date Published
Apr 8, 2024
Tags
Generative AI
Video understanding
Startup
Multimodal AI

Twelve Labs, a South Korean generative artificial intelligence (AI) startup, made headlines last year after securing investment from U.S. tech giant Nvidia. Based in Seoul and San Francisco, the company specializes in AI technology that analyzes and understands video. Nvidia, Intel and two other companies jointly invested $10 million in Twelve Labs last October.

“Just like OpenAI’s ChatGPT pioneered the realm of text-based generative AI, Twelve Labs aims to pave the way for the advancement of video AI,” said Twelve Labs co-founder and CEO Lee Jae-sung, 30, in a video interview with the Chosunilbo on April 8.

Twelve Labs is developing multimodal AI that understands video. The company’s AI model analyzes all the images and sounds in a video and matches them with human language. For instance, the model can identify a scene of “a man holding a pen in the office” in an hour-long video within seconds.
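To give a concrete sense of what such text-to-video-moment search looks like from a developer’s perspective, here is a minimal Python sketch. The endpoint URL, parameter names, and response shape are assumptions made for illustration only; they are not Twelve Labs’ documented API.

```python
import requests

# Hypothetical endpoint and credentials -- illustrative only, not the actual Twelve Labs API.
API_URL = "https://api.example.com/v1/search"
API_KEY = "your-api-key"

def search_video_moments(index_id: str, query: str) -> list[dict]:
    """Send a natural-language query and return matching video segments."""
    response = requests.post(
        API_URL,
        headers={"x-api-key": API_KEY},
        json={
            "index_id": index_id,                    # the pre-indexed video collection
            "query_text": query,                     # plain-language description of the scene
            "search_options": ["visual", "audio"],   # match against both modalities
        },
        timeout=30,
    )
    response.raise_for_status()
    # Each hit is assumed to carry a video id, start/end timestamps, and a relevance score.
    return response.json().get("data", [])

if __name__ == "__main__":
    for hit in search_video_moments("my-index", "a man holding a pen in the office"):
        print(hit)
```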

When Lee founded Twelve Labs in 2020, the burgeoning AI market was focused mainly on text and images. “AI startups were receiving astronomical funding for developing large language models like ChatGPT,” said Lee. “We believed video was a field where we could make a difference even with limited investment.”

Lee, who majored in computer science at UC Berkeley and interned at Samsung Electronics and Amazon, returned to Korea to fulfill his mandatory military service, where he met his future Twelve Labs co-founders. While serving in the Ministry of National Defense’s Cyber Operations Command, Lee and his colleagues, equally passionate about AI, spent their time discussing research papers and exploring AI technologies before starting Twelve Labs together in 2020.

“My co-founder, who was the first to finish military service, was so dedicated that he regularly visited us to study AI together,” Lee reflected. “Starting this company based on passion without worrying too much about the future turned out to be a good idea.”

Twelve Labs currently operates Pegasus, a video-language foundation model that can summarize long videos into text and answer users’ questions about them, and Marengo, a multimodal AI model that understands videos, images, and audio. Over 30,000 developers and companies are using these AI models. One of the company’s most prominent partnerships is with the National Football League (NFL).
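To make the description of Pegasus more concrete, here is a rough sketch of how a developer might request a summary of a long video and ask a follow-up question about it. The endpoints, field names, and response format below are illustrative assumptions, not the documented Twelve Labs API.

```python
import requests

# Hypothetical base URL and auth header -- illustrative only, not the documented Twelve Labs API.
BASE_URL = "https://api.example.com/v1"
HEADERS = {"x-api-key": "your-api-key"}

def summarize_video(video_id: str) -> str:
    """Ask a Pegasus-style model for a text summary of a long video."""
    resp = requests.post(
        f"{BASE_URL}/summarize",
        headers=HEADERS,
        json={"video_id": video_id, "type": "summary"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("summary", "")

def ask_about_video(video_id: str, question: str) -> str:
    """Ask a free-form question about the video's content."""
    resp = requests.post(
        f"{BASE_URL}/generate",
        headers=HEADERS,
        json={"video_id": video_id, "prompt": question},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("data", "")

if __name__ == "__main__":
    print(summarize_video("game-highlights-001"))
    print(ask_about_video("game-highlights-001", "Who scored the first touchdown?"))
```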

“Organizations like the NFL have amassed a treasure trove of video content that spans over a century, but monetizing such content requires advanced video search technology,” Lee said. “Companies with extensive data archives are seeking out Twelve Labs’ AI technology.”

By Park Ji-min, Lee Jae-eun

Published 2024.04.08. 16:04


