Reasoning: Inducing perceptual reasoning through video-language alignment
True video understanding requires the ability to reason about what is perceived. This is where our video-language model, Pegasus, comes into play. Pegasus merges the reasoning skills learned from large language models (text data) with the perceptual understanding gained from our video encoder model (video data). By aligning these two modalities, Pegasus can perform cross-modal reasoning, inferring meaning and intent from Marengo's rich, multimodal representations.
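This post does not detail Pegasus's internals, so the sketch below illustrates one common alignment recipe for models of this kind: projecting video-encoder tokens into the language model's embedding space so both modalities share a single sequence the LLM can reason over. Every class name, dimension, and tensor shape here is an illustrative assumption, not the actual Pegasus architecture.

```python
import torch
import torch.nn as nn

class VideoLanguageAligner(nn.Module):
    """Hypothetical sketch of video-language alignment: map video-encoder
    embeddings into an LLM's token-embedding space so the LLM can attend
    over video and text together. Names and sizes are illustrative only."""

    def __init__(self, video_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # A small MLP projects each video token into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(video_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, video_tokens: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # video_tokens: (batch, n_video_tokens, video_dim) from the video encoder
        # text_embeds:  (batch, n_text_tokens, llm_dim) from the LLM's embedding table
        projected = self.projector(video_tokens)  # (batch, n_video_tokens, llm_dim)
        # Prefix the projected video tokens to the text sequence; the LLM
        # then attends across both modalities and reasons in language space.
        return torch.cat([projected, text_embeds], dim=1)

# Toy usage with random tensors standing in for real encoder outputs.
aligner = VideoLanguageAligner()
video = torch.randn(1, 256, 1024)  # e.g., 256 tokens from a Marengo-like video encoder
text = torch.randn(1, 32, 4096)    # 32 prompt tokens already embedded by the LLM
fused = aligner(video, text)
print(fused.shape)                 # torch.Size([1, 288, 4096])
```

In recipes like this, the projector (and sometimes the LLM itself) is trained on paired video-text data, which lets the system inherit the language model's reasoning while grounding it in the video encoder's perception.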
Relevant Achievements
Pegasus-1-Alpha (80B): 2023.08 - The world's first video-language model deployment (before the release of OpenAI's GPT-4V)
Pegasus-1-Beta (17B): 2024.03 - A state-of-the-art video-language model that outperforms Gemini 1.5
The synergy between Marengo and Pegasus is the key to inducing perceptual reasoning capabilities in our AI systems. By leveraging the strengths of both models, we can develop systems that not only perceive and understand the visual world but also reason about it in a way that resembles human cognition.
At Twelve Labs, we are committed to pushing the boundaries of intelligence with our focus on perceptual reasoning. Our research is not just about developing state-of-the-art models, but about fundamentally rethinking how AI systems can learn and reason about the world. Join us on this exciting journey as we pioneer the future of video understanding and unlock the full potential of artificial intelligence.