Tutorial
Building a Video Highlight Generator with Twelve Labs


Hrishikesh Yadav
The YouTube Chapter Highlight Generator is a tool developed to automatically generate chapter timestamps for YouTube videos. By analyzing the video's content, it identifies key segments and creates timestamps that can be used to create chapters for better video navigation and user experience.


Oct 25, 2024
13 Min
Introduction
🎬 Tired of manually creating chapter timestamps for your YouTube videos? Imagine automatically generating engaging video highlights and saving hours of tedious work.
In this tutorial, we'll explore the YouTube Chapter Highlight Generator with Twelve Labs, a powerful tool that revolutionizes how content creators approach video highlights. This application tackles one of the most time-consuming aspects of video production: creating accurate and meaningful chapter timestamps with highlights.
Whether you're an established YouTuber or just starting your content creation journey, this tool will streamline your workflow. It automatically analyzes video content, generates precise timestamps with highlights, and even creates segmented video clips. The best part? Creators can use it for both short-form content and longer podcast-style videos. Let's dive into how this application works and how you can build it using the TwelveLabs Python SDK to suit your specific needs.
You can explore the demo of the application here: Video Highlight Chapter Generation
If you want to access the code and experiment with the app directly, you can use this Replit Template.
Prerequisites
Generate an API key by signing up at the Twelve Labs Playground.
Find the notebooks and the application code in the Video Highlight Chapter Generator GitHub repository.
You should already be familiar with Python, HTML, and Markdown.
Working of the Application
This section outlines the application flow for developing chapter highlights for YouTube videos, which saves time and simplifies the process of adding highlights to YouTube content.
For podcast videos, the system manages and combines timestamps from different video chunks. Users can also retrieve existing videos from the Index by selecting a previously indexed video, fetching its URL, and generating highlight timestamps using the video ID.
Here's a stepwise breakdown of the process:
User Interface
The application features two main tabs for user interaction.
The first tab allows users to upload new videos for highlight generation.
The second tab enables users to fetch and view previously indexed videos.
Video Upload Options
Users can upload two types of video content.
Basic videos are those less than 30 minutes in duration, while podcast-style videos can be up to an hour long.
Processing Workflow
The system processes basic videos directly.
Podcast-style videos are first split into manageable chunks for efficient handling.
Highlight Generation
Once indexing is complete, the system generates a unique video ID.
This video ID is then used as a parameter with the generate.summarize function.
The Pegasus 1.1 Generative Engine creates highlight-based timestamps for the video.
Output
The final result is a set of timestamps marking key moments in the video.
These timestamps serve as chapter markers or highlights for easy navigation.
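For example, the generated chapter text follows an mm:ss-title format that can be dropped into a YouTube description. The chapter titles below are hypothetical:

00:00-Introduction
02:15-Getting Started with the Dataset
07:42-Model Walkthrough
12:30-Results and Takeaways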

Video clip segmentation is accomplished using moviepy.editor, which works from the video URL and the highlight timestamps. To support content creation, the video segments are generated in MP4 format.
Here's an overview of the application's components and their interactions:
User Interface - Manages user interactions and displays processed video segments.

Video Processing - Provides core functionality for video segmentation and processing using moviepy, including:
Segment Creator - Generates video segments.
Video Processor - Trims videos and parses segments.
Utilities - Offers helper functions for tasks such as fetching videos and generating timestamps.
API Integration - Interfaces with Twelve Labs services for indexing, including:
Task creation and management.
Generation of Gist objects containing highlight chapter information.
Now that we have a comprehensive understanding of the application's workflow for YouTube video creators, our next step is to prepare for the building process.
Preparation Steps
Sign up and create an Index on the Twelve Labs Playground.
Obtain your API Key from the Twelve Labs Playground.
Enable the following video understanding engines for generation:
Marengo 2.6 (Embedding Engine) for video search and classification
Pegasus 1.1 (Generative Engine) for video-to-text generation
These engines provide a robust foundation for video understanding.

Retrieve your INDEX_ID by opening the Index created in step 1; the ID appears in the URL: https://playground.twelvelabs.io/indexes/{index_id}. Then set up the .env file, alongside the main file, with your API Key and INDEX_ID. The variable names below match what the utility code reads:

API_KEY=your_api_key_here
INDEX_ID=your_index_id_here
If you prefer the code-based approach, follow these steps:
Obtain your API Key from the Twelve Labs Playground and prepare the environment variable.
Import the Twelve Labs SDK and the environment variables, then initialize the SDK client with the Twelve Labs API Key from the environment:

import os

from twelvelabs import TwelveLabs
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("API_KEY")
client = TwelveLabs(api_key=API_KEY)
Specify the desired engines for the generation task:
engines = [
    {
        "name": "marengo2.6",
        "options": ["visual", "conversation", "text_in_video", "logo"]
    },
    {
        "name": "pegasus1.1",
        "options": ["visual", "conversation"]
    }
]
Create a new index by calling client.index.create with the Index name and engine configuration. Use a unique, identifiable name for the Index.
index = client.index.create(
    name="<YOUR_INDEX_NAME>",
    engines=engines
)
print(f"A new index has been created: Index id={index.id} name={index.name} engines={index.engines}")
The index.id field is the unique identifier of your new index. You'll need this identifier to index videos into the correct location.
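To confirm the setup end to end, you can index a sample video against the new index. Below is a minimal sketch using the same SDK calls the application relies on; the file path is a placeholder:

# Index a local sample video (path is a placeholder)
task = client.task.create(index_id=index.id, file="sample_video.mp4")
task.wait_for_done(sleep_interval=5)  # Poll until indexing finishes
print(f"Status: {task.status}, video_id: {task.video_id}")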
With these steps completed, you're now ready to dive in and develop the application!
Walkthrough for Video Highlight Generator
In this tutorial, we will build a Streamlit application with a minimal frontend. Below is the directory structure:
.
├── app.py
├── requirements.txt
├── utils.py
├── .env
└── .gitignore
1 - Creating the Streamlit Application
Now that you've completed all the above steps, it's time to build the Streamlit application. This app provides a simple way for you to upload a video, generate highlight chapters, and create segmented video clips. The application consists of two main files:
app.py: Contains the application flow with a minimal page layout
utils.py: Houses all the essential utility functions for the application
You can find the required dependencies to set up a virtual environment in the requirements.txt file.
To get started, create a virtual Python environment and configure it for the application:
pip install -r requirements.txt
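If you are assembling requirements.txt yourself, the dependencies implied by the imports used throughout this tutorial look roughly like this (versions left unpinned here; pin them as needed):

twelvelabs
streamlit
moviepy
m3u8
yt-dlp
python-dotenv
requests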
2 - Setting up the Utility Functions
In this section, we'll explore how to generate highlight chapters and handle longer videos for indexing (section 2.1). We'll also cover creating video segments from the highlight chapters in section 2.2, applying the results to the video indexed in section 2.1.
2.1 - Generating Highlight Chapters and Handling Video Processing
First, we'll import the necessary libraries: moviepy.editor, m3u8, io, urllib.parse, yt_dlp, and the Twelve Labs SDK. We'll also load the API Key and Index ID environment variables. The role of each library is discussed along with the functions below.
import os
import requests
from moviepy.editor import VideoFileClip
from twelvelabs import TwelveLabs
from dotenv import load_dotenv
import io
import m3u8
from urllib.parse import urljoin
import yt_dlp

# Load environment variables
load_dotenv()
API_KEY = os.getenv("API_KEY")
INDEX_ID = os.getenv("INDEX_ID")

def seconds_to_mmss(seconds):
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"

def mmss_to_seconds(mmss):
    minutes, seconds = map(int, mmss.split(':'))
    return minutes * 60 + seconds

def generate_timestamps(client, video_id, start_time=0):
    try:
        gist = client.generate.summarize(video_id=video_id, type="chapter")
        chapter_text = "\n".join([
            f"{seconds_to_mmss(chapter.start + start_time)}-{chapter.chapter_title}"
            for chapter in gist.chapters
        ])
        return chapter_text, gist.chapters[-1].start + start_time
    except Exception as e:
        raise Exception(f"An error occurred while generating timestamps: {str(e)}")

# Utility function to trim the video based on the timestamps
def trim_video(input_path, output_path, start_time, end_time):
    with VideoFileClip(input_path) as video:
        new_video = video.subclip(start_time, end_time)
        new_video.write_videofile(output_path, codec="libx264", audio_codec="aac")

# Fetch all video_ids for the specific INDEX_ID
def fetch_existing_videos():
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos?page=1&page_limit=10&sort_by=created_at&sort_option=desc"
    headers = {"accept": "application/json", "x-api-key": API_KEY, "Content-Type": "application/json"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.json()['data']
    else:
        raise Exception(f"Failed to fetch videos: {response.text}")

# Utility function to retrieve the URL of the video with video_id
def get_video_url(video_id):
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos/{video_id}"
    headers = {"accept": "application/json", "x-api-key": API_KEY}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['hls']['video_url'] if 'hls' in data and 'video_url' in data['hls'] else None
    else:
        raise Exception(f"Failed to get video URL: {response.text}")

# Utility function to handle and process video clips longer than 30 mins
def process_video(client, video_path, video_type):
    with VideoFileClip(video_path) as clip:
        duration = clip.duration
    if duration > 3600:
        raise Exception("Video duration exceeds 1 hour. Please upload a shorter video.")
    if video_type == "Basic Video (less than 30 mins)":
        task = client.task.create(index_id=INDEX_ID, file=video_path)
        task.wait_for_done(sleep_interval=5)
        if task.status == "ready":
            timestamps, _ = generate_timestamps(client, task.video_id)
            return timestamps, task.video_id
        else:
            raise Exception(f"Indexing failed with status {task.status}")
    elif video_type == "Podcast (30 mins to 1 hour)":
        trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_1.mp4")
        trim_video(video_path, trimmed_path, 0, 1800)
        task1 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
        task1.wait_for_done(sleep_interval=5)
        os.remove(trimmed_path)
        if task1.status != "ready":
            raise Exception(f"Indexing failed with status {task1.status}")
        timestamps, end_time = generate_timestamps(client, task1.video_id)
        if duration > 1800:
            trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_2.mp4")
            trim_video(video_path, trimmed_path, 1800, int(duration))
            task2 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
            task2.wait_for_done(sleep_interval=5)
            os.remove(trimmed_path)
            if task2.status != "ready":
                raise Exception(f"Indexing failed with status {task2.status}")
            timestamps_2, _ = generate_timestamps(client, task2.video_id, start_time=end_time)
            timestamps += "\n" + timestamps_2
        return timestamps, task1.video_id

# Utility function to render the video on the UI
def get_hls_player_html(video_url):
    return f"""
    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
    <style>
        #video-container {{
            position: relative;
            width: 100%;
            padding-bottom: 56.25%;
            overflow: hidden;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }}
        #video {{
            position: absolute;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            object-fit: contain;
        }}
    </style>
    <div id="video-container">
        <video id="video" controls></video>
    </div>
    <script>
        var video = document.getElementById('video');
        var videoSrc = "{video_url}";
        if (Hls.isSupported()) {{
            var hls = new Hls();
            hls.loadSource(videoSrc);
            hls.attachMedia(video);
            hls.on(Hls.Events.MANIFEST_PARSED, function() {{ video.pause(); }});
        }} else if (video.canPlayType('application/vnd.apple.mpegurl')) {{
            video.src = videoSrc;
            video.addEventListener('loadedmetadata', function() {{ video.pause(); }});
        }}
    </script>
    """
A. Input: Video Content Handling
This application relies heavily on the process_video function. It supports two types of video content: short videos under 30 minutes and longer podcast-style videos up to an hour. Longer videos are split into manageable chunks with the trim_video utility function, ensuring accurate analysis even for lengthy recordings.
B. Processing: Video Indexing, Analysis, and Chapter Generation
For short videos under 30 minutes, indexing begins with Marengo 2.6 (Embedding Engine), which produces the video ID. The application then calls the summarize function of the TwelveLabs SDK, backed by Pegasus 1.1 (Generative Engine), to analyze the video content and generate timestamp-based chapters for the indexed video.
For longer videos, 30-minute chunks are created, indexed one by one, and then analyzed to generate timestamp-based chapters. The end timestamp of the first chunk becomes the start of the next chunk.
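To make the offset concrete, suppose the last chapter of the first chunk starts at 1,750 seconds. A chapter that Pegasus reports at 20 seconds into the second chunk is shifted by that offset before formatting. A small illustration with hypothetical values:

end_time = 1750              # offset returned for chunk 1 (hypothetical value)
chunk2_chapter_start = 20    # seconds, relative to the start of chunk 2
print(seconds_to_mmss(chunk2_chapter_start + end_time))  # "29:30"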
The get_video_url function retrieves the URL of the indexed video so it can be rendered in the application; the get_hls_player_html(video_url) function renders the player itself.
C. Return Values: Highlights with Timestamps
The generation process produces timestamps in seconds, but YouTube descriptions expect highlights in minutes-and-seconds (mm:ss) format. seconds_to_mmss handles that conversion, while mmss_to_seconds converts mm:ss strings back to seconds, which is what the chapter-based video trimming uses.
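A quick illustration of the two helpers:

print(seconds_to_mmss(754))      # "12:34"
print(mmss_to_seconds("12:34"))  # 754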
All user-uploaded videos are indexed under the Index ID created earlier. Users can access indexed videos with fetch_existing_videos(), which provides the video ID that is passed directly to the generative engine via generate_timestamps.
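As a minimal sketch, listing the indexed videos looks like this, assuming the _id and metadata.filename response fields used elsewhere in the application code:

for video in fetch_existing_videos():
    print(video['_id'], video['metadata']['filename'])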
2.2 - Retrieve Video from Index and Segment Based on Results
This section focuses on trimming video segments based on highlight timestamps. Utility functions work together to download, parse, and create segments based on timestamps.
# Utility function to download the indexed video from the URL obtained via video_id
def download_video(url, output_filename):
    ydl_opts = {
        'format': 'best',
        'outtmpl': output_filename,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

# Utility function to parse the segments
def parse_segments(segment_text):
    lines = segment_text.strip().split('\n')
    segments = []
    for i, line in enumerate(lines):
        start, description = line.split('-', 1)
        start_time = mmss_to_seconds(start.strip())
        if i < len(lines) - 1:
            end = lines[i+1].split('-')[0]
            end_time = mmss_to_seconds(end.strip())
        else:
            end_time = None
        segments.append((start_time, end_time, description.strip()))
    return segments

# Utility function to create the video segments
def create_video_segments(video_url, segment_info):
    full_video = "full_video.mp4"
    segments = parse_segments(segment_info)
    try:
        # Download the full video clip
        download_video(video_url, full_video)
        for i, (start_time, end_time, description) in enumerate(segments):
            output_file = f"{i+1:02d}_{description.replace(' ', '_').lower()}.mp4"
            trim_video(full_video, output_file, start_time, end_time)
            yield output_file, description
        os.remove(full_video)
    except yt_dlp.utils.DownloadError as e:
        raise Exception(f"An error occurred while downloading: {str(e)}")
    except Exception as e:
        raise Exception(f"An unexpected error occurred: {str(e)}")

# Function to download a video segment after the trimming is done
def download_video_segment(video_id, start_time, end_time=None):
    video_url = get_video_url(video_id)
    if not video_url:
        raise Exception("Failed to get video URL")
    playlist = m3u8.load(video_url)
    start_seconds = mmss_to_seconds(start_time)
    end_seconds = mmss_to_seconds(end_time) if end_time else None
    total_duration = 0
    segments_to_download = []
    for segment in playlist.segments:
        if total_duration >= start_seconds and (end_seconds is None or total_duration < end_seconds):
            segments_to_download.append(segment)
        total_duration += segment.duration
        if end_seconds is not None and total_duration >= end_seconds:
            break
    buffer = io.BytesIO()
    for segment in segments_to_download:
        segment_url = urljoin(video_url, segment.uri)
        response = requests.get(segment_url)
        if response.status_code == 200:
            buffer.write(response.content)
        else:
            raise Exception(f"Failed to download segment: {segment_url}")
    buffer.seek(0)
    return buffer.getvalue()
The download_video function uses the yt-dlp library to download the entire video; the full file is downloaded so that multiple segments can be cut from a single source.
The parse_segments function converts the generated timestamp text into a programmatically usable format. It breaks the chapter information down into start times, end times, and descriptions in preparation for video segmentation.
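For instance, given the chapter text below (hypothetical titles), parse_segments returns start/end pairs in seconds, with None as the final end time meaning "to the end of the video":

chapter_text = "00:00-Introduction\n02:15-Main Topic\n10:40-Conclusion"
print(parse_segments(chapter_text))
# [(0, 135, 'Introduction'), (135, 640, 'Main Topic'), (640, None, 'Conclusion')]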
create_video_segments ties everything together. After downloading the entire video, it parses the segment information and creates an individual clip for each chapter, yielding each segment along with its description.
Individual segments can also be downloaded with the download_video_segment function. For creating chapter previews or extracting specific parts of longer videos, it uses the HLS (HTTP Live Streaming) playlist to download only the required media segments.
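A minimal usage sketch, with a hypothetical video ID and mm:ss times; note that the returned bytes are concatenated HLS media segments (typically MPEG-TS), so a .ts extension is safer than .mp4 unless you remux:

segment_bytes = download_video_segment(video_id, "02:15", "10:40")
with open("chapter_preview.ts", "wb") as f:
    f.write(segment_bytes)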
This process offers content creators the ability to not only generate timestamps but also create tangible video segments for use in their production workflows and editing process 🎬✂️️.
3 - Instruction Flow of the Streamlit Application
This section focuses on the main application function, designed for a minimal UI and a streamlined instruction flow built on the key utility functions.
The application begins by configuring the Streamlit page and applying custom CSS for an attractive, themed interface. It then initializes session state variables to maintain data consistency across reruns. For the complete code, see app.py. Essential utility functions are discussed below.
# Imports and setup at the top of app.py (shared by the snippets in this section)
import os
import tempfile
import uuid

import streamlit as st
from dotenv import load_dotenv
from twelvelabs import TwelveLabs

from utils import (create_video_segments, fetch_existing_videos,
                   generate_timestamps, get_video_url, process_video)

load_dotenv()
API_KEY = os.getenv("API_KEY")

# Uploading feature and the processing of the video
def upload_and_process_video():
    video_type = st.selectbox("Select video type:", ["Basic Video (less than 30 mins)", "Podcast (30 mins to 1 hour)"])
    uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "mov", "avi"])
    if uploaded_file and st.button("Process Video", key="process_video_button"):
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp_file:
            tmp_file.write(uploaded_file.read())
            video_path = tmp_file.name
        try:
            with st.spinner("Processing video..."):
                client = TwelveLabs(api_key=API_KEY)
                timestamps, video_id = process_video(client, video_path, video_type)
                st.success("Video processed successfully!")
                st.session_state.timestamps = timestamps
                st.session_state.video_id = video_id
                st.session_state.video_url = get_video_url(video_id)
                if st.session_state.video_url:
                    st.video(st.session_state.video_url)
                else:
                    st.error("Failed to retrieve video URL.")
        except Exception as e:
            st.error(str(e))
        finally:
            os.unlink(video_path)

# Selecting an existing video from the Index and generating timestamp highlights
def select_existing_video():
    try:
        existing_videos = fetch_existing_videos()
        video_options = {f"{video['metadata']['filename']} ({video['_id']})": video['_id'] for video in existing_videos}
        if video_options:
            selected_video = st.selectbox("Select a video:", list(video_options.keys()))
            video_id = video_options[selected_video]
            st.session_state.video_id = video_id
            st.session_state.video_url = get_video_url(video_id)
            if st.session_state.video_url:
                st.markdown(f"### Selected Video: {selected_video}")
                st.video(st.session_state.video_url)
            else:
                st.error("Failed to retrieve video URL.")
            if st.button("Generate Timestamps", key="generate_timestamps_button"):
                with st.spinner("Generating timestamps..."):
                    client = TwelveLabs(api_key=API_KEY)
                    timestamps, _ = generate_timestamps(client, video_id)
                    st.session_state.timestamps = timestamps
        else:
            st.warning("No existing videos found in the index.")
    except Exception as e:
        st.error(str(e))
The main interface features two tabs: one for uploading new videos and another for selecting existing ones. The app handles errors gracefully and provides user feedback throughout. During lengthy video segmentation processes, progress bars and status messages keep users informed. A feature to clear all segments helps manage storage and allows users to start fresh.
upload_and_process_video(): Handles video file uploads and processing. It manages both regular videos (under 30 minutes) and longer videos, splitting longer ones into chunks before indexing, and finally invokes the generative engine.
select_existing_video(): Allows users to choose from previously uploaded videos stored in the TwelveLabs index.
The application uses st.session_state to store and retrieve crucial information such as video URLs, IDs, and generated timestamps. This approach persists data across app re-runs, ensuring a seamless user experience for multi-step operations. By maintaining this state, the app can display processed videos, use them for further operations, and work with generated timestamps without requiring users to re-upload or reprocess data.
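A minimal sketch of that initialization, assuming these key names match app.py:

# Initialize session state keys once, before the tabs render
for key, default in [("timestamps", None), ("video_id", None),
                     ("video_url", None), ("video_segments", [])]:
    if key not in st.session_state:
        st.session_state[key] = default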
# Function to display a segment, with a download button
def display_segment(file_name, description, segment_index):
    if os.path.exists(file_name):
        st.write(f"### {description}")
        st.video(file_name)
        with open(file_name, "rb") as file:
            file_contents = file.read()
        unique_key = f"download_{segment_index}_{uuid.uuid4()}"
        st.download_button(
            label=f"Download: {description}",
            data=file_contents,
            file_name=file_name,
            mime="video/mp4",
            key=unique_key
        )
        st.markdown("---")
    else:
        st.warning(f"File {file_name} not found. It may have been deleted or moved.")

# Function to process the segments
def process_and_display_segments():
    if not st.session_state.video_url:
        st.error("Video URL not found. Please reprocess the video.")
        return
    segment_generator = create_video_segments(st.session_state.video_url, st.session_state.timestamps)
    progress_bar = st.progress(0)
    status_text = st.empty()
    st.session_state.video_segments = []  # Reset video segments
    total_segments = len(st.session_state.timestamps.split('\n'))
    for i, (file_name, description) in enumerate(segment_generator, 1):
        st.session_state.video_segments.append((file_name, description))
        display_segment(file_name, description, i-1)  # Pass the index here
        progress = i / total_segments
        progress_bar.progress(progress)
        status_text.text(f"Processing segment {i}/{total_segments}...")
    progress_bar.progress(1.0)
    status_text.text("All segments processed!")

# Function to display the timestamps and the segments
def display_timestamps_and_segments():
    if st.session_state.timestamps:
        st.subheader("YouTube Chapter Timestamps")
        st.write("Copy the timestamp description and add it to the YouTube video description")
        st.code(st.session_state.timestamps, language="")
        if st.button("Create Video Segments", key="create_segments_button"):
            try:
                process_and_display_segments()
            except Exception as e:
                st.error(f"Error creating video segments: {str(e)}")
                st.exception(e)  # This will display the full traceback
    if st.session_state.video_segments:
        st.subheader("Video Segments")
        for index, (file_name, description) in enumerate(st.session_state.video_segments):
            display_segment(file_name, description, index)
        if st.button("Clear all segments", key="clear_segments_button"):
            for file_name, _ in st.session_state.video_segments:
                if os.path.exists(file_name):
                    os.remove(file_name)
            st.session_state.video_segments = []
            st.success("All segment files have been cleared.")
            st.experimental_rerun()
process_and_display_segments(): Manages the creation and display of video segments based on the generated timestamps.
display_timestamps_and_segments(): Presents the generated timestamps in a copyable format and provides options to create and display video segments.
display_segment(): Renders an individual video segment, complete with a download button.
The display_segment function takes a file name, description, and segment index as inputs. It checks that the video file exists, displays the segment's description, plays the video using Streamlit's st.video function, and provides a uniquely keyed download button. If the file is not found, it shows a warning message.
The process_and_display_segments function creates and displays all video segments. It first checks for a video URL in the session state, showing an error message if absent. Otherwise, it uses create_video_segments to generate segments from the provided timestamps, displaying a progress bar and status text during processing. Each segment is added to the session state and displayed with the display_segment function.
The display_timestamps_and_segments function integrates all the components. It displays YouTube chapter timestamps if available and provides a button to create video segments; clicking it triggers the process_and_display_segments function. For existing segments, it displays them and offers an option to clear all segments, removing the files and resetting the application state.
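Putting it together, the top-level flow might look like the minimal sketch below; the tab labels and title text are assumptions, and app.py has the full version:

def main():
    st.title("YouTube Chapter Highlight Generator")  # Title text is an assumption
    tab1, tab2 = st.tabs(["Upload New Video", "Select Existing Video"])
    with tab1:
        upload_and_process_video()
    with tab2:
        select_existing_video()
    display_timestamps_and_segments()

if __name__ == "__main__":
    main()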
Below is the Demo Application Example:

As the demo shows, the video is uploaded and indexed to generate the highlights with timestamps, which can easily be formatted to fit a YouTube description. The next steps of the demo show how the highlights are applied and how segments are generated from them.

To explore Twelve Labs in diverse contexts, try applying the chapter use cases to various sectors such as editing, education, or any other area that piques your interest.
More Ideas to Experiment with the Tutorial
Understanding how an application works and its development process empowers you to implement innovative ideas and create products that meet users' needs. Here are some potential use cases for video content creators, similar to the one discussed in the tutorial:
📽️ YouTube Content Creators: Generate chapter highlight markers to enhance video navigation.
🎓 Educational Videos: Enable students to easily select specific sections of interest in long tutorial videos.
🎥 Content Segmentation: Effortlessly create video clips ready for upload.
Conclusion
This blog post provides a detailed explanation of the video highlight chapter generation process developed with Twelve Labs to cater to content creators' needs. Thank you for following along with the tutorial. We look forward to your ideas on improving the user experience and solving various challenges.
Additional Resources
Learn more about the engines used for the generation task, Marengo 2.6 (Embedding Engine) and Pegasus 1.1 (Generative Engine). To further explore Twelve Labs and deepen your understanding of video content analysis, check out these resources:
Discord Community: Join our vibrant community of developers and enthusiasts to discuss ideas, ask questions, and share your projects. Join the Twelve Labs Discord
Sample Applications: Explore a variety of sample applications to inspire your next project or learn new implementation techniques.
Explore Tutorials: Dive deeper into Twelve Labs capabilities with our comprehensive tutorials.
We encourage you to leverage these resources to expand your knowledge and create innovative applications using Twelve Labs video understanding technology.
Introduction
🎬 Tired of manually creating chapter timestamps for your YouTube videos? Imagine automatically generating engaging video highlights and saving hours of tedious work.
In this tutorial, we'll explore the YouTube Chapter Highlight Generator with Twelve Labs, a powerful tool that revolutionizes how content creators approach video highlights. This application tackles one of the most time-consuming aspects of video production: creating accurate and meaningful chapter timestamps with highlights.
Whether you're an established YouTuber or just starting your content creation journey, this tool will streamline your workflow. It automatically analyzes video content, generates precise timestamps with highlights, and even creates segmented video clips. The best part? Creators can use it for both short-form content and longer podcast-style videos. Let's dive into how this application works and how you can build it using the TwelveLabs Python SDK to suit your specific needs.
You can explore the demo of the application here: Video Highlight Chapter Generation
If you want to access the code and experiment with the app directly, you can use this Replit Template.
Prerequisites
Generate an API key by signing up at the Twelve Labs Playground.
Find the repository for the notebooks and this application on Video Highlight Chapter Generator Github
There are several things you should already be familiar with - Python, HTML, Markdown.
Working of the Application
This section outlines the application flow for developing chapter highlights for YouTube videos, which saves time and simplifies the process of adding highlights to YouTube content.
For podcast videos, the system manages and combines timestamps from different video chunks. Users can also retrieve existing videos from the Index by selecting a previously indexed video, fetching its URL, and generating highlight timestamps using the video ID.
Here's a stepwise breakdown of the process:
User Interface
The application features two main tabs for user interaction.
The first tab allows users to upload new videos for highlight generation.
The second tab enables users to fetch and view previously indexed videos.
Video Upload Options
Users can upload two types of video content.
Basic videos are those less than 30 minutes in duration, while podcast-style videos can be up to an hour long.
Processing Workflow
The system processes basic videos directly.
Podcast-style videos are first split into manageable chunks for efficient handling.
Highlight Generation
Once indexing is complete, the system generates a unique video ID.
This video ID is then used as a parameter with the generate.summarize function.
The Pegasus 1.1 Generative Engine creates highlight-based timestamps for the video (a minimal call is sketched just after this breakdown).
Output
The final result is a set of timestamps marking key moments in the video.
These timestamps serve as chapter markers or highlights for easy navigation.
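For orientation, here is a minimal sketch of that call; the API key and video ID are placeholders, and the chapter fields (start, chapter_title) are the ones this tutorial relies on later:

from twelvelabs import TwelveLabs

client = TwelveLabs(api_key="<YOUR_API_KEY>")

# Ask Pegasus for chapter-style highlights of an already indexed video
gist = client.generate.summarize(video_id="<YOUR_VIDEO_ID>", type="chapter")
for chapter in gist.chapters:
    print(chapter.start, chapter.chapter_title)  # start time in seconds and chapter title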

Video clip segmentation is accomplished using moviepy.editor, which accesses the video URL and the highlight timestamps. To facilitate content creation, the video segments are generated in MP4 format.
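As a minimal sketch of that idea (the file names and times are hypothetical; this is the same pattern the trim_video utility later in this tutorial implements):

from moviepy.editor import VideoFileClip

# Cut 00:30-01:45 out of a local file and save it as an MP4 clip
with VideoFileClip("input.mp4") as video:
    clip = video.subclip(30, 105)
    clip.write_videofile("highlight_01.mp4", codec="libx264", audio_codec="aac")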
Here's an overview of the application's components and their interactions:
User Interface - Manages user interactions and displays processed video segments.

Video Processing - Provides core functionality for video segmentation and processing using moviepy, including:
Segment Creator - Generates video segments.
Video Processor - Trims videos and parses segments.
Utilities - Offers helper functions for tasks such as fetching videos and generating timestamps.
API Integration - Interfaces with Twelve Labs services for indexing, including:
Task creation and management.
Generation of Gist objects containing highlight chapter information.
Now that we have a comprehensive understanding of the application's workflow for YouTube video creators, our next step is to prepare for the building process.
Preparation Steps
Sign up and create an Index on the Twelve Labs Playground.
Obtain your API Key from the Twelve Labs Playground.
Enable the following video understanding engines for generation:
Marengo 2.6 (Embedding Engine) for video search and classification
Pegasus 1.1 (Generative Engine) for video-to-text generation
These engines provide a robust foundation for video understanding.

Retrieve your INDEX_ID by opening the Index created in step 1; the ID appears in the URL: https://playground.twelvelabs.io/indexes/{index_id}. Then set up the .env file with your API Key and INDEX_ID alongside the main application file:
API_KEY=your_api_key_here
INDEX_ID=your_index_id_here
If you prefer the code-based approach, follow these steps:
Obtain your API Key from the Twelve Labs Playground and prepare the environment variable.
Import the Twelve Labs SDK and the environment variables, then initialize the SDK client using the Twelve Labs API Key from the environment:
import os

from twelvelabs import TwelveLabs
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv("API_KEY")
client = TwelveLabs(api_key=API_KEY)
Specify the desired engines for the generation task:
engines = [
    {
        "name": "marengo2.6",
        "options": ["visual", "conversation", "text_in_video", "logo"]
    },
    {
        "name": "pegasus1.1",
        "options": ["visual", "conversation"]
    }
]
Create a new index by calling client.index.create with the Index name and engine configuration parameters. Use a unique and identifiable name for the Index.
index = client.index.create(
    name="<YOUR_INDEX_NAME>",
    engines=engines
)
print(f"A new index has been created: Index id={index.id} name={index.name} engines={index.engines}")
The index.id field represents the unique identifier of your new index. This identifier is crucial for indexing videos in the correct location.
With these steps completed, you're now ready to dive in and develop the application!
Walkthrough for Video Highlight Generator
In this tutorial, we will build a Streamlit application with a minimal frontend. Below is the directory structure:
.
├── app.py
├── requirements.txt
├── utils.py
├── .env
└── .gitignore
1 - Creating the Streamlit Application
Now that you've completed all the above steps, it's time to build the Streamlit application. This app provides a simple way for you to upload a video, generate highlight chapters, and create segmented video clips. The application consists of two main files:
app.py: Contains the application flow with a minimal page layout
utils.py: Houses all the essential utility functions for the application
You can find the required dependencies to set up a virtual environment in the requirements.txt file.
To get started, create a virtual Python environment and configure it for the application:
pip install -r requirements.txt
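Based on the imports used throughout this tutorial, requirements.txt should list at least the following packages (exact version pins may differ):

streamlit
twelvelabs
moviepy
m3u8
yt-dlp
python-dotenv
requests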
2 - Setting Up the Utility Functions
In this section, we'll explore how to generate highlight chapters and handle longer videos for indexing. We'll also cover creating video segments from the highlight chapters by applying the results to the indexed video, as shown in section 2.2.
2.1 - Generating the Highlight Chapter and Handling the Video Processing
First, we'll import the necessary libraries: moviepy.editor, m3u8, io, urllib.parse, yt_dlp, and the Twelve Labs SDK. We'll also set up the API Key and Index ID environment variables. The role of each of these libraries is discussed below.
import os
import io
from urllib.parse import urljoin

import requests
import m3u8
import yt_dlp
from moviepy.editor import VideoFileClip
from twelvelabs import TwelveLabs
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
API_KEY = os.getenv("API_KEY")
INDEX_ID = os.getenv("INDEX_ID")

def seconds_to_mmss(seconds):
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"

def mmss_to_seconds(mmss):
    minutes, seconds = map(int, mmss.split(':'))
    return minutes * 60 + seconds

def generate_timestamps(client, video_id, start_time=0):
    try:
        gist = client.generate.summarize(video_id=video_id, type="chapter")
        chapter_text = "\n".join([f"{seconds_to_mmss(chapter.start + start_time)}-{chapter.chapter_title}" for chapter in gist.chapters])
        return chapter_text, gist.chapters[-1].start + start_time
    except Exception as e:
        raise Exception(f"An error occurred while generating timestamps: {str(e)}")

# Utility function to trim the video based on the timestamps
def trim_video(input_path, output_path, start_time, end_time):
    with VideoFileClip(input_path) as video:
        new_video = video.subclip(start_time, end_time)
        new_video.write_videofile(output_path, codec="libx264", audio_codec="aac")

# Based on the specific INDEX_ID, fetch all the video IDs
def fetch_existing_videos():
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos?page=1&page_limit=10&sort_by=created_at&sort_option=desc"
    headers = {"accept": "application/json", "x-api-key": API_KEY, "Content-Type": "application/json"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.json()['data']
    else:
        raise Exception(f"Failed to fetch videos: {response.text}")

# Utility function to retrieve the URL of the video with video_id
def get_video_url(video_id):
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos/{video_id}"
    headers = {"accept": "application/json", "x-api-key": API_KEY}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data['hls']['video_url'] if 'hls' in data and 'video_url' in data['hls'] else None
    else:
        raise Exception(f"Failed to get video URL: {response.text}")

# Utility function to handle and process video clips longer than 30 minutes
def process_video(client, video_path, video_type):
    with VideoFileClip(video_path) as clip:
        duration = clip.duration
    if duration > 3600:
        raise Exception("Video duration exceeds 1 hour. Please upload a shorter video.")
    if video_type == "Basic Video (less than 30 mins)":
        task = client.task.create(index_id=INDEX_ID, file=video_path)
        task.wait_for_done(sleep_interval=5)
        if task.status == "ready":
            timestamps, _ = generate_timestamps(client, task.video_id)
            return timestamps, task.video_id
        else:
            raise Exception(f"Indexing failed with status {task.status}")
    elif video_type == "Podcast (30 mins to 1 hour)":
        # Index the first 30 minutes as its own chunk
        trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_1.mp4")
        trim_video(video_path, trimmed_path, 0, 1800)
        task1 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
        task1.wait_for_done(sleep_interval=5)
        os.remove(trimmed_path)
        if task1.status != "ready":
            raise Exception(f"Indexing failed with status {task1.status}")
        timestamps, end_time = generate_timestamps(client, task1.video_id)
        if duration > 1800:
            # Index the remainder, offsetting its chapters by the first chunk's end time
            trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_2.mp4")
            trim_video(video_path, trimmed_path, 1800, int(duration))
            task2 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
            task2.wait_for_done(sleep_interval=5)
            os.remove(trimmed_path)
            if task2.status != "ready":
                raise Exception(f"Indexing failed with status {task2.status}")
            timestamps_2, _ = generate_timestamps(client, task2.video_id, start_time=end_time)
            timestamps += "\n" + timestamps_2
        return timestamps, task1.video_id

# Utility function to render the video on the UI
def get_hls_player_html(video_url):
    return f"""
    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
    <style>
        #video-container {{
            position: relative;
            width: 100%;
            padding-bottom: 56.25%;
            overflow: hidden;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }}
        #video {{
            position: absolute;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            object-fit: contain;
        }}
    </style>
    <div id="video-container">
        <video id="video" controls></video>
    </div>
    <script>
        var video = document.getElementById('video');
        var videoSrc = "{video_url}";
        if (Hls.isSupported()) {{
            var hls = new Hls();
            hls.loadSource(videoSrc);
            hls.attachMedia(video);
            hls.on(Hls.Events.MANIFEST_PARSED, function() {{
                video.pause();
            }});
        }} else if (video.canPlayType('application/vnd.apple.mpegurl')) {{
            video.src = videoSrc;
            video.addEventListener('loadedmetadata', function() {{
                video.pause();
            }});
        }}
    </script>
    """
A. Input: Video Content Handling
This application relies heavily on the process_video function. It supports two types of video content: short videos under 30 minutes and longer podcast-style videos up to an hour. Longer videos are split into manageable chunks with the trim_video utility function, ensuring accurate analysis even for lengthy recordings.
B. Processing: Video Indexing, Analysis, and Chapter Generation
For short videos under 30 minutes, indexing begins with Marengo 2.6 (Embedding Engine), which yields the video ID. The application then calls the SDK's generate.summarize function, backed by Pegasus 1.1 (Generative Engine), to analyze the video content and produce timestamp-based chapters for the indexed video.
For longer videos, 30-minute chunks are created, indexed one by one, and analyzed to generate timestamp-based chapters. The end timestamp of the first chunk becomes the start offset for the chapters of the next chunk.
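To make the offset arithmetic concrete, here is a small illustration (the chapter time is hypothetical) using the seconds_to_mmss helper defined above:

chunk_offset = 1800           # the first chunk covers 0-30 minutes
chapter_start_in_chunk = 95   # a chapter detected 95 s into the second chunk
absolute_start = chunk_offset + chapter_start_in_chunk
print(seconds_to_mmss(absolute_start))  # -> "31:35"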
The get_video_url function retrieves the URL of the indexed video so it can be rendered in the application, and get_hls_player_html(video_url) embeds an HLS player to play it.
C. Return Values: Highlights with Timestamps
The generation process produces timestamps in seconds, but YouTube descriptions expect highlights in minutes-and-seconds (mm:ss) format. seconds_to_mmss handles this conversion, while mmss_to_seconds converts mm:ss strings back to seconds, which is needed when trimming video clips into chapters.
All user-uploaded videos are indexed under the Index ID created earlier. Users can list indexed videos with fetch_existing_videos(), which provides the video ID that is passed directly to the generative engine via generate_timestamps.
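As a quick sketch of how these utilities fit together (assuming the index already contains at least one video; the printed chapter titles are hypothetical):

client = TwelveLabs(api_key=API_KEY)

videos = fetch_existing_videos()   # newest first, sorted by created_at
video_id = videos[0]['_id']        # '_id' as returned by the videos endpoint
timestamps, _ = generate_timestamps(client, video_id)
print(timestamps)
# 00:00-Introduction
# 02:15-Setting Up the Project
# 07:40-Demo and Wrap-Up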
2.2 - Retrieve Video from Index and Segment Based on Results
This section focuses on trimming video segments based on highlight timestamps. Utility functions work together to download, parse, and create segments based on timestamps.
# Utility function to download the indexed video from the URL for a video_id
def download_video(url, output_filename):
    ydl_opts = {
        'format': 'best',
        'outtmpl': output_filename,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

# Utility function to parse the generated segments
def parse_segments(segment_text):
    lines = segment_text.strip().split('\n')
    segments = []
    for i, line in enumerate(lines):
        start, description = line.split('-', 1)
        start_time = mmss_to_seconds(start.strip())
        if i < len(lines) - 1:
            end = lines[i + 1].split('-')[0]
            end_time = mmss_to_seconds(end.strip())
        else:
            end_time = None
        segments.append((start_time, end_time, description.strip()))
    return segments

# Utility function to create the video segments
def create_video_segments(video_url, segment_info):
    full_video = "full_video.mp4"
    segments = parse_segments(segment_info)
    try:
        # Download the full video clip
        download_video(video_url, full_video)
        for i, (start_time, end_time, description) in enumerate(segments):
            output_file = f"{i+1:02d}_{description.replace(' ', '_').lower()}.mp4"
            trim_video(full_video, output_file, start_time, end_time)
            yield output_file, description
        os.remove(full_video)
    except yt_dlp.utils.DownloadError as e:
        raise Exception(f"An error occurred while downloading: {str(e)}")
    except Exception as e:
        raise Exception(f"An unexpected error occurred: {str(e)}")

# Function to download a video segment after the trimming is done
def download_video_segment(video_id, start_time, end_time=None):
    video_url = get_video_url(video_id)
    if not video_url:
        raise Exception("Failed to get video URL")
    playlist = m3u8.load(video_url)
    start_seconds = mmss_to_seconds(start_time)
    end_seconds = mmss_to_seconds(end_time) if end_time else None
    total_duration = 0
    segments_to_download = []
    for segment in playlist.segments:
        if total_duration >= start_seconds and (end_seconds is None or total_duration < end_seconds):
            segments_to_download.append(segment)
        total_duration += segment.duration
        if end_seconds is not None and total_duration >= end_seconds:
            break
    buffer = io.BytesIO()
    for segment in segments_to_download:
        segment_url = urljoin(video_url, segment.uri)
        response = requests.get(segment_url)
        if response.status_code == 200:
            buffer.write(response.content)
        else:
            raise Exception(f"Failed to download segment: {segment_url}")
    buffer.seek(0)
    return buffer.getvalue()
The download_video function uses the yt-dlp library to download the entire video. The full video file is downloaded so that multiple segments can be cut from a single local copy.
The parse_segments function converts the generated timestamp text into a programmatically usable format. It breaks the chapter information down into start times, end times, and descriptions in preparation for video segmentation.
create_video_segments ties everything together. After downloading the entire video, it parses the segment information to create an individual video clip for each chapter, yielding each segment along with its description.
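Because create_video_segments is a generator, each clip can be consumed as soon as it is ready. A minimal usage sketch, where video_url and timestamps come from the earlier steps:

for file_name, description in create_video_segments(video_url, timestamps):
    print(f"Created {file_name}: {description}")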
Video segments can also be downloaded with the download_video_segment function. For creating chapter previews or extracting specific parts of longer videos, it uses the HLS (HTTP Live Streaming) playlist to fetch only the required media segments.
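A short sketch of that use, with hypothetical chapter times; note that the returned bytes are concatenated HLS media segments (typically MPEG-TS):

data = download_video_segment(video_id, "02:15", "04:30")
with open("chapter_preview.ts", "wb") as f:  # HLS segments are MPEG-TS, not MP4
    f.write(data)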
This process offers content creators the ability to not only generate timestamps but also create tangible video segments for use in their production workflows and editing process 🎬✂️️.
3 - Instruction Flow of the Streamlit Application
This section focuses on the main application function, designed for a minimal UI and a streamlined instruction flow built on the key utility functions.
The application begins by configuring the Streamlit page and applying custom CSS for an attractive, themed interface. It then initializes session state variables to maintain data consistency across reruns. For the complete code, see app.py. Essential utility functions are discussed below.
# Imports used by the app.py excerpts in this section
import os
import tempfile
import uuid

import streamlit as st
from twelvelabs import TwelveLabs
from dotenv import load_dotenv

from utils import (process_video, get_video_url, fetch_existing_videos,
                   generate_timestamps, create_video_segments)

load_dotenv()
API_KEY = os.getenv("API_KEY")

# Uploading feature and the processing of the video
def upload_and_process_video():
    video_type = st.selectbox("Select video type:", ["Basic Video (less than 30 mins)", "Podcast (30 mins to 1 hour)"])
    uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "mov", "avi"])
    if uploaded_file and st.button("Process Video", key="process_video_button"):
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp_file:
            tmp_file.write(uploaded_file.read())
            video_path = tmp_file.name
        try:
            with st.spinner("Processing video..."):
                client = TwelveLabs(api_key=API_KEY)
                timestamps, video_id = process_video(client, video_path, video_type)
                st.success("Video processed successfully!")
                st.session_state.timestamps = timestamps
                st.session_state.video_id = video_id
                st.session_state.video_url = get_video_url(video_id)
                if st.session_state.video_url:
                    st.video(st.session_state.video_url)
                else:
                    st.error("Failed to retrieve video URL.")
        except Exception as e:
            st.error(str(e))
        finally:
            os.unlink(video_path)

# Select an existing video from the Index and generate timestamp highlights
def select_existing_video():
    try:
        existing_videos = fetch_existing_videos()
        video_options = {f"{video['metadata']['filename']} ({video['_id']})": video['_id'] for video in existing_videos}
        if video_options:
            selected_video = st.selectbox("Select a video:", list(video_options.keys()))
            video_id = video_options[selected_video]
            st.session_state.video_id = video_id
            st.session_state.video_url = get_video_url(video_id)
            if st.session_state.video_url:
                st.markdown(f"### Selected Video: {selected_video}")
                st.video(st.session_state.video_url)
            else:
                st.error("Failed to retrieve video URL.")
            if st.button("Generate Timestamps", key="generate_timestamps_button"):
                with st.spinner("Generating timestamps..."):
                    client = TwelveLabs(api_key=API_KEY)
                    timestamps, _ = generate_timestamps(client, video_id)
                    st.session_state.timestamps = timestamps
        else:
            st.warning("No existing videos found in the index.")
    except Exception as e:
        st.error(str(e))
The main interface features two tabs: one for uploading new videos and another for selecting existing ones. The app handles errors gracefully and provides user feedback throughout. During lengthy video segmentation processes, progress bars and status messages keep users informed. A feature to clear all segments helps manage storage and allows users to start fresh.
upload_and_process_video(): Handles video file uploads and processing. It manages both regular videos (under 30 minutes) and longer videos, converting longer ones into chunks before indexing, then invokes the generative engine.
select_existing_video(): Allows users to choose from previously uploaded videos stored in the Twelve Labs index.
The application uses st.session_state to store and retrieve crucial information such as video URLs, IDs, and generated timestamps. This approach enables data persistence across app reruns, ensuring a seamless user experience for multi-step operations. By maintaining this state, the app can display processed videos, use them for further operations, and work with generated timestamps without requiring users to re-upload or reprocess data.
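One way that initialization might look, using the same keys referenced throughout this section (a sketch; the full version lives in app.py):

# Create each session-state key once so Streamlit reruns keep earlier results
for key, default in [("timestamps", None), ("video_id", None),
                     ("video_url", None), ("video_segments", [])]:
    if key not in st.session_state:
        st.session_state[key] = default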
# Function to display a segment and offer it for download
def display_segment(file_name, description, segment_index):
    if os.path.exists(file_name):
        st.write(f"### {description}")
        st.video(file_name)
        with open(file_name, "rb") as file:
            file_contents = file.read()
        unique_key = f"download_{segment_index}_{uuid.uuid4()}"
        st.download_button(
            label=f"Download: {description}",
            data=file_contents,
            file_name=file_name,
            mime="video/mp4",
            key=unique_key
        )
        st.markdown("---")
    else:
        st.warning(f"File {file_name} not found. It may have been deleted or moved.")

# Function to process the segments
def process_and_display_segments():
    if not st.session_state.video_url:
        st.error("Video URL not found. Please reprocess the video.")
        return
    segment_generator = create_video_segments(st.session_state.video_url, st.session_state.timestamps)
    progress_bar = st.progress(0)
    status_text = st.empty()
    st.session_state.video_segments = []  # Reset video segments
    total_segments = len(st.session_state.timestamps.split('\n'))
    for i, (file_name, description) in enumerate(segment_generator, 1):
        st.session_state.video_segments.append((file_name, description))
        display_segment(file_name, description, i - 1)  # Pass the index here
        progress = i / total_segments
        progress_bar.progress(progress)
        status_text.text(f"Processing segment {i}/{total_segments}...")
    progress_bar.progress(1.0)
    status_text.text("All segments processed!")

# Function to display the timestamps and the segments
def display_timestamps_and_segments():
    if st.session_state.timestamps:
        st.subheader("YouTube Chapter Timestamps")
        st.write("Copy the timestamp description and add it to the YouTube video description")
        st.code(st.session_state.timestamps, language="")
        if st.button("Create Video Segments", key="create_segments_button"):
            try:
                process_and_display_segments()
            except Exception as e:
                st.error(f"Error creating video segments: {str(e)}")
                st.exception(e)  # This will display the full traceback
    if st.session_state.video_segments:
        st.subheader("Video Segments")
        for index, (file_name, description) in enumerate(st.session_state.video_segments):
            display_segment(file_name, description, index)
        if st.button("Clear all segments", key="clear_segments_button"):
            for file_name, _ in st.session_state.video_segments:
                if os.path.exists(file_name):
                    os.remove(file_name)
            st.session_state.video_segments = []
            st.success("All segment files have been cleared.")
            st.experimental_rerun()
process_and_display_segments(): Manages the creation and display of video segments based on generated timestamps.
display_timestamps_and_segments(): Presents generated timestamps in a copyable format and provides options to create and display video segments.
display_segment(): Renders individual video segments, complete with a download button for each.
The display_segment function takes a file name, description, and segment index as inputs. It checks that the video file exists, displays the segment's description, plays the video using Streamlit's st.video function, and provides a uniquely keyed download button. If the file is not found, it shows a warning message.
The process_and_display_segments function creates and displays all video segments. It first checks for a video URL in the session state, showing an error message if it is absent. Otherwise, it uses create_video_segments to generate segments from the provided timestamps, displaying a progress bar and status text during processing. Each segment is added to the session state and displayed with the display_segment function.
The display_timestamps_and_segments function integrates all the components. It displays the YouTube chapter timestamps when available and provides a button to create video segments; clicking it triggers process_and_display_segments. For existing segments, it displays them and offers an option to clear all segments, removing the files and resetting the application state.
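Putting it all together, the entry point in app.py might wire the two tabs and the shared display logic roughly as follows (a sketch; the tab labels are illustrative):

def main():
    st.title("YouTube Chapter Highlight Generator")
    tab1, tab2 = st.tabs(["Upload New Video", "Select Existing Video"])
    with tab1:
        upload_and_process_video()
    with tab2:
        select_existing_video()
    display_timestamps_and_segments()

if __name__ == "__main__":
    main()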
Below is the Demo Application Example:

As the demo shows, the video is uploaded and indexed to generate the highlights with timestamps, which can easily be copied into a YouTube description. The next step of the demo shows how the highlights are applied and how segments are generated from them.

To explore Twelve Labs in diverse contexts, try applying the chapter use cases to various sectors such as editing, education, or any other area that piques your interest.
More Ideas to Experiment with the Tutorial
Understanding how an application works and its development process empowers you to implement innovative ideas and create products that meet users' needs. Here are some potential use cases for video content creators, similar to the one discussed in the tutorial:
📽️ YouTube Content Creators: Generate chapter highlight markers to enhance video navigation.
🎓 Educational Videos: Enable students to easily select specific sections of interest in long tutorial videos.
🎥 Content Segmentation: Effortlessly create video clips ready for upload.
Conclusion
This blog post provides a detailed explanation of the video highlight chapter generation process developed with Twelve Labs to cater to content creators' needs. Thank you for following along with the tutorial. We look forward to your ideas on improving the user experience and solving various challenges.
Additional Resources
Learn more about the engines used for the generation task, Marengo 2.6 (Embedding Engine) and Pegasus 1.1 (Generative Engine). To further explore Twelve Labs and enhance your understanding of video content analysis, check out these valuable resources:
Discord Community: Join our vibrant community of developers and enthusiasts to discuss ideas, ask questions, and share your projects. Join the Twelve Labs Discord
Sample Applications: Explore a variety of sample applications to inspire your next project or learn new implementation techniques.
Explore Tutorials: Dive deeper into Twelve Labs capabilities with our comprehensive tutorials.
We encourage you to leverage these resources to expand your knowledge and create innovative applications using Twelve Labs video understanding technology.