Author
Aiden Lee
Date Published
September 1, 2024
Tags
Research
Conference
Video understanding
Video Language Models
Multimodal AI

🎉 Twelve Labs to Host First-Ever Workshop on Video-Language Models at NeurIPS 2024!

We are thrilled to announce that Twelve Labs will host the inaugural "Workshop on Video-Language Models" at NeurIPS 2024! This event will bring together AI researchers to explore the latest advancements in video-language models, and it marks the first time NeurIPS has featured a workshop dedicated solely to this critical area of research.

🗓️ Save the Date: December 14, 2024!

Led by our CTO, Aiden Lee, this workshop promises to be an exciting convergence of leading researchers and cutting-edge research. We're partnering with renowned institutions, including the Allen Institute for AI, Amazon AGI, Microsoft, Apple, NAVER AI Lab, KAIST, and the University of North Carolina at Chapel Hill, to foster discussions at the forefront of video-language models.

📢 Call For Papers

We invite researchers to submit their original work on topics related to video-language models. Papers may address, but are not limited to:

  • Video Question Answering and Visual Dialogue Systems
  • Long-form Video Understanding and Summarization
  • Ethical Considerations in Video AI
  • Multimodal Fusion and Cross-modal Retrieval
  • Video-Text Alignment, Generation, and Temporal Reasoning

Submission instructions can be found here: https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/Video-Langauge_Models#tab-recent-activity

Submission Tracks:

  • Short Track: Up to 3 pages for early, novel ideas
  • Long Track: Up to 9 pages (excluding references) for comprehensive papers

Outstanding papers will be recognized with awards, including a Best Paper Award and two Runner-Up Awards.

🌟 Featured Speakers

We are honored to host an outstanding lineup of speakers, each of whom has made significant contributions to the field of video-language models and AI:

Kristen Grauman - UT Austin

Kristen Grauman, Professor at UT Austin and Research Scientist at Facebook AI Research, is renowned for her work in computer vision. She led the development of the Ego4D dataset, a crucial resource in video-language research.

Jianwei Yang - Microsoft

Jianwei Yang, Senior Researcher at Microsoft, is recognized for advancing visual recognition through his contributions to video understanding, notably the development of Phi-3-Vision and Set-of-Mark.

Gedas Bertasius - UNC Chapel Hill

Gedas Bertasius, Assistant Professor at UNC Chapel Hill, has made significant strides in video AI. His work on TimeSformer and VindLU has set new benchmarks in video understanding.

Dima Damen - University of Bristol, Google DeepMind

Dima Damen, Professor at the University of Bristol and Researcher at Google DeepMind, is a leading expert in egocentric vision. She's best known for creating the EPIC-KITCHENS dataset, which has been pivotal for video-language research.

Doyup Lee - RunwayML

Doyup Lee, Senior Researcher at RunwayML, has pioneered video generation technology. His work on Gen-3 is transforming video content creation and editing through advanced AI tools.

Ishan Misra - Meta GenAI Research

Ishan Misra, Research Scientist at Meta GenAI Research, is known for his work on Emu Video, a text-to-video generation model, and for his research on self-supervised learning.

☺️ Why Attend?

This workshop offers a unique opportunity to engage with cutting-edge research, exchange ideas with leading experts, and shape the future of video-language models. Whether you're a researcher, developer, or enthusiast, you'll find this event to be fertile ground for inspiration and collaboration.

We're excited to welcome you to NeurIPS 2024! Stay tuned for more updates, and don't miss your chance to be part of this great event.
