Twelve Labs, a South Korean generative artificial intelligence (AI) startup, made headlines last year after securing investment from U.S. tech giant Nvidia. Based in Seoul and San Francisco, the company specializes in AI technology that analyzes and understands video. Nvidia, Intel and two other companies jointly invested $10 million in Twelve Labs last October.
“Just like OpenAI’s ChatGPT pioneered the realm of text-based generative AI, Twelve Labs aims to pave the way for the advancement of video AI,” said Twelve Labs co-founder and CEO Lee Jae-sung, 30, in a video interview with the Chosunilbo on April 8.
Twelve Labs is developing multimodal AI that understands video. The company’s AI model analyzes all the images and sounds in a video and matches them with human language. For instance, the model can locate a scene of “a man holding a pen in the office” in an hour-long video within seconds.
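Conceptually, this kind of retrieval can be pictured as cross-modal embedding search: video segments and a text query are mapped into a shared vector space and ranked by similarity. The Python sketch below is a minimal illustration of that idea only; the `embed_clip` and `embed_text` encoders are hypothetical stand-ins, not Twelve Labs code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder encoders: in a real system these would be multimodal models
# mapping video frames + audio, and text, into one shared embedding space.
# Here they return random vectors so the sketch runs end to end (the
# resulting ranking is meaningless without real encoders).
def embed_clip(clip: str, dim: int = 512) -> np.ndarray:
    return rng.standard_normal(dim)

def embed_text(query: str, dim: int = 512) -> np.ndarray:
    return rng.standard_normal(dim)

def find_scene(clips: list[str], query: str, top_k: int = 3) -> list[tuple[float, str]]:
    """Rank video clips by cosine similarity to a natural-language query."""
    q = embed_text(query)
    q /= np.linalg.norm(q)
    scored = []
    for clip in clips:
        v = embed_clip(clip)
        v /= np.linalg.norm(v)
        scored.append((float(q @ v), clip))
    return sorted(scored, reverse=True)[:top_k]

if __name__ == "__main__":
    # e.g. an hour-long video split into 120 thirty-second segments
    clips = [f"clip_{i:03d}.mp4" for i in range(120)]
    for score, clip in find_scene(clips, "a man holding a pen in the office"):
        print(f"{score:+.3f}  {clip}")
```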
When Lee founded Twelve Labs in 2020, the burgeoning AI market was mainly focused on text or images. “AI startups were receiving astronomical funding for developing large language models like ChatGPT,” said Lee. “We believed video was a field where we could make a difference even with limited investment.”
Lee, who majored in computer science at UC Berkeley and interned at Samsung Electronics and Amazon, returned to Korea to fulfill his mandatory military service, where he met his future Twelve Labs co-founders. While serving in the Ministry of National Defense’s Cyber Operations Command, Lee and his colleagues, who were equally passionate about AI, spent their time discussing research papers and exploring AI technologies, eventually founding Twelve Labs together in 2020.
“My co-founder, who was the first to finish military service, was so dedicated that he regularly visited us to study AI together,” Lee reflected. “Starting this company based on passion without worrying too much about the future turned out to be a good idea.”
Twelve Labs currently operates Pegasus, a video-language foundation model that can summarize long videos into text and answer users’ questions about video content, and Marengo, a multimodal AI model that understands video, images and audio. Over 30,000 developers and companies use these AI models. One of the company’s most prominent partnerships is with the National Football League (NFL).
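Developers reach these models through an API. As a rough illustration of what a Pegasus-style summarization call and a Marengo-style search call might look like over HTTP, the sketch below uses a placeholder base URL; the endpoint paths, header name, and JSON fields are assumptions for illustration, not the documented Twelve Labs API.

```python
import requests

API_KEY = "YOUR_API_KEY"                 # assumed key-based auth header
BASE_URL = "https://api.example.com/v1"  # placeholder, not the real endpoint

def summarize_video(video_id: str) -> str:
    """Ask a Pegasus-style video-language model for a text summary."""
    resp = requests.post(
        f"{BASE_URL}/summarize",
        headers={"x-api-key": API_KEY},
        json={"video_id": video_id, "type": "summary"},
    )
    resp.raise_for_status()
    return resp.json()["summary"]

def search_videos(index_id: str, query: str) -> list[dict]:
    """Run a Marengo-style multimodal search over an indexed video library."""
    resp = requests.post(
        f"{BASE_URL}/search",
        headers={"x-api-key": API_KEY},
        json={"index_id": index_id, "query_text": query, "options": ["visual", "audio"]},
    )
    resp.raise_for_status()
    return resp.json()["data"]
```

In this pattern, the summarization endpoint returns generated text for a single video, while the search endpoint ranks previously indexed videos against a natural-language query across both visual and audio channels.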
“Organizations like the NFL have amassed a treasure trove of video content that spans over a century, but monetizing such content requires advanced video search technology,” Lee said. “Companies with extensive data archives are seeking out Twelve Labs’ AI technology.”
Published 2024.04.08. 16:04