Introduction
🎬 Tired of manually creating chapter timestamps for your YouTube videos? Imagine automatically generating engaging video highlights and saving hours of tedious work.
In this tutorial, we'll explore the YouTube Chapter Highlight Generator with Twelve Labs, a powerful tool that revolutionizes how content creators approach video highlights. This application tackles one of the most time-consuming aspects of video production: creating accurate and meaningful chapter timestamps with highlights.
Whether you're an established YouTuber or just starting your content creation journey, this tool will streamline your workflow. It automatically analyzes video content, generates precise timestamps with highlights, and even creates segmented video clips. The best part? Creators can use it for both short-form content and longer podcast-style videos. Let's dive into how this application works and how you can build it using the TwelveLabs Python SDK to suit your specific needs.
Before starting, you should be familiar with Python, HTML, and Markdown.
Working of the Application
This section outlines the application flow for developing chapter highlights for YouTube videos, which saves time and simplifies the process of adding highlights to YouTube content.
For podcast videos, the system manages and combines timestamps from different video chunks. Users can also retrieve existing videos from the Index by selecting a previously indexed video, fetching its URL, and generating highlight timestamps using the video ID.
Here's a stepwise breakdown of the process:
User Interface
The application features two main tabs for user interaction.
The first tab allows users to upload new videos for highlight generation.
The second tab enables users to fetch and view previously indexed videos.
Video Upload Options
Users can upload two types of video content.
Basic videos are those less than 30 minutes in duration, while podcast-style videos can be up to an hour long.
Processing Workflow
The system processes basic videos directly.
Podcast-style videos are first split into manageable chunks for efficient handling.
Highlight Generation
Once indexing is complete, the system generates a unique video ID.
This video ID is then used as a parameter with the generate.summarize function.
The Pegasus 1.1 Generative Engine creates highlight-based timestamps for the video.
Output
The final result is a set of timestamps marking key moments in the video.
These timestamps serve as chapter markers or highlights for easy navigation.
Video clip segmentation is done with moviepy.editor, which uses the video URL and the highlight timestamps to cut the clips. To make them easy to reuse in content creation, the video segments are written in MP4 format.
Here's an overview of the application's components and their interactions:
User Interface - Manages user interactions and displays processed video segments.
Video Processing - Provides core functionality for video segmentation and processing using moviepy, including:
Segment Creator - Generates video segments.
Video Processor - Trims videos and parses segments.
Utilities - Offers helper functions for tasks such as fetching videos and generating timestamps.
API Integration - Interfaces with Twelve Labs services for indexing, including:
Task creation and management.
Generation of Gist objects containing highlight chapter information.
Now that we have a comprehensive understanding of the application's workflow for YouTube video creators, our next step is to prepare for the building process.
Create a new index by calling client.index.create with the index name and engine configuration as parameters. Use a unique and identifiable name for the index.
# `client` and `engines` are assumed to be defined in the earlier setup steps
index = client.index.create(
    name="<YOUR_INDEX_NAME>",
    engines=engines
)
print(f"A new index has been created: Index id={index.id} name={index.name} engines={index.engines}")
The index.id field represents the unique identifier of your new index. This identifier is crucial for indexing videos in their correct location.
With these steps completed, you're now ready to dive in and develop the application!
Walkthrough for Video Highlight Generator
In this tutorial, we will build a Streamlit application with a minimal frontend. Below is the directory structure:
Now that you've completed all the above steps, it's time to build the Streamlit application. This app provides a simple way for you to upload a video, generate highlight chapters, and create segmented video clips. The application consists of two main files:
main.py: Contains the application flow with a minimal page layout
utils.py: Houses all the essential utility functions for the application
You can find the required dependencies to set up a virtual environment in the requirements.txt file.
To get started, create a virtual Python environment and configure it for the application:
pip install -r requirements.txt
2 - Setting up the utility function for the operation
In this section, we'll explore how to generate highlight chapters and handle longer videos for indexing. In section 2.2, we'll also cover creating video segments by applying the highlighted chapters to the indexed video.
2.1 - Generating the Highlight Chapter and Handling the Video Processing
First, we'll import the necessary libraries: moviepy.editor, m3u8, io, urllib.parse, yt_dlp, requests, dotenv, and the Twelve Labs SDK. We'll also load the API Key and Index ID from environment variables. The role of each library is covered alongside the functions that use it.
import io
import os
from urllib.parse import urljoin

import m3u8
import requests
import yt_dlp
from dotenv import load_dotenv
from moviepy.editor import VideoFileClip
from twelvelabs import TwelveLabs

# Load environment variables
load_dotenv()
API_KEY = os.getenv("API_KEY")
INDEX_ID = os.getenv("INDEX_ID")

# Convert a number of seconds to an mm:ss string
def seconds_to_mmss(seconds):
    minutes, seconds = divmod(int(seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"

# Convert an mm:ss string back to a number of seconds
def mmss_to_seconds(mmss):
    minutes, seconds = map(int, mmss.split(':'))
    return minutes * 60 + seconds

# Generate chapter timestamps for an indexed video; start_time offsets them for later chunks
def generate_timestamps(client, video_id, start_time=0):
    try:
        gist = client.generate.summarize(video_id=video_id, type="chapter")
        chapter_text = "\n".join([f"{seconds_to_mmss(chapter.start + start_time)}-{chapter.chapter_title}" for chapter in gist.chapters])
        # Return the end of the last chapter so the next chunk can continue from there
        return chapter_text, gist.chapters[-1].end + start_time
    except Exception as e:
        raise Exception(f"An error occurred while generating timestamps: {str(e)}") from e

# Utility function to trim the video based on the timestamps
def trim_video(input_path, output_path, start_time, end_time):
    with VideoFileClip(input_path) as video:
        new_video = video.subclip(start_time, end_time)
        new_video.write_videofile(output_path, codec="libx264", audio_codec="aac")

# Fetch the most recent videos stored under the configured INDEX_ID
def fetch_existing_videos():
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos?page=1&page_limit=10&sort_by=created_at&sort_option=desc"
    headers = {"accept": "application/json", "x-api-key": API_KEY, "Content-Type": "application/json"}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.json()['data']
    else:
        raise Exception(f"Failed to fetch videos: {response.text}")

# Utility function to retrieve the URL of the video with video_id
def get_video_url(video_id):
    url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos/{video_id}"
    headers = {"accept": "application/json", "x-api-key": API_KEY}
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        # Return None when the response carries no HLS playlist URL
        return data['hls']['video_url'] if 'hls' in data and 'video_url' in data['hls'] else None
    else:
        raise Exception(f"Failed to get video URL: {response.text}")

# Utility function to index a video, splitting videos longer than 30 mins into chunks
def process_video(client, video_path, video_type):
    with VideoFileClip(video_path) as clip:
        duration = clip.duration
    if duration > 3600:
        raise Exception("Video duration exceeds 1 hour. Please upload a shorter video.")
    if video_type == "Basic Video (less than 30 mins)":
        task = client.task.create(index_id=INDEX_ID, file=video_path)
        task.wait_for_done(sleep_interval=5)
        if task.status == "ready":
            timestamps, _ = generate_timestamps(client, task.video_id)
            return timestamps, task.video_id
        else:
            raise Exception(f"Indexing failed with status {task.status}")
    elif video_type == "Podcast (30 mins to 1 hour)":
        # Index the first 30 minutes (or the whole video if it is shorter)
        trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_1.mp4")
        trim_video(video_path, trimmed_path, 0, min(1800, duration))
        task1 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
        task1.wait_for_done(sleep_interval=5)
        os.remove(trimmed_path)
        if task1.status != "ready":
            raise Exception(f"Indexing failed with status {task1.status}")
        timestamps, end_time = generate_timestamps(client, task1.video_id)
        if duration > 1800:
            # Index the remainder and offset its chapters by the end of the first chunk
            trimmed_path = os.path.join(os.path.dirname(video_path), "trimmed_2.mp4")
            trim_video(video_path, trimmed_path, 1800, int(duration))
            task2 = client.task.create(index_id=INDEX_ID, file=trimmed_path)
            task2.wait_for_done(sleep_interval=5)
            os.remove(trimmed_path)
            if task2.status != "ready":
                raise Exception(f"Indexing failed with status {task2.status}")
            timestamps_2, _ = generate_timestamps(client, task2.video_id, start_time=end_time)
            timestamps += "\n" + timestamps_2
        # Note: only the first chunk's video ID is returned
        return timestamps, task1.video_id
    else:
        raise ValueError(f"Unsupported video type: {video_type}")

# Utility function to render the video on the UI
def get_hls_player_html(video_url):
    return f"""
    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
    <style>
        #video-container {{
            position: relative;
            width: 100%;
            padding-bottom: 56.25%;
            overflow: hidden;
            border-radius: 10px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }}
        #video {{
            position: absolute;
            top: 0;
            left: 0;
            width: 100%;
            height: 100%;
            object-fit: contain;
        }}
    </style>
    <div id="video-container">
        <video id="video" controls></video>
    </div>
    <script>
        var video = document.getElementById('video');
        var videoSrc = "{video_url}";
        if (Hls.isSupported()) {{
            var hls = new Hls();
            hls.loadSource(videoSrc);
            hls.attachMedia(video);
            hls.on(Hls.Events.MANIFEST_PARSED, function() {{
                video.pause();
            }});
        }}
        else if (video.canPlayType('application/vnd.apple.mpegurl')) {{
            video.src = videoSrc;
            video.addEventListener('loadedmetadata', function() {{
                video.pause();
            }});
        }}
    </script>
    """
A. Input: Video Content Handling
This application heavily relies on the process_video function. It supports two types of video content: short videos under 30 minutes and longer podcast-style videos up to an hour. For longer videos, further processing is carried out by splitting the content into manageable chunks using the trim_video utility function, ensuring accurate analysis even for lengthy recordings.
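The duration check and the 30-minute split can be sketched without any video library. The helper below, `plan_chunks`, is hypothetical and not part of the tutorial's code; it assumes the same limits that `process_video` uses (a one-hour cap and 1800-second chunks) and only computes the time ranges that would be passed to `trim_video`.

```python
def plan_chunks(duration, chunk_len=1800, max_len=3600):
    """Return (start, end) second ranges covering a video of the given duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if duration > max_len:
        raise ValueError("Video duration exceeds the maximum supported length")
    chunks = []
    start = 0
    while start < duration:
        # The last chunk may be shorter than chunk_len
        end = min(start + chunk_len, duration)
        chunks.append((start, end))
        start = end
    return chunks


print(plan_chunks(1200))  # [(0, 1200)]
print(plan_chunks(2700))  # [(0, 1800), (1800, 2700)]
```

Each returned pair maps onto one `trim_video` call followed by one indexing task.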
B. Processing: Video Indexing, Analysis, and Chapter Generation
For short videos under 30 minutes, the video is first indexed with Marengo 2.6 (Embedding Engine), and the completed indexing task provides the video ID. That ID is then passed to the generate.summarize function in the TwelveLabs SDK, backed by Pegasus 1.1 (Generative Engine), which analyzes the video content and generates timestamp-based chapters for the indexed video.
For longer videos, 30-minute chunks are created, indexed one by one, and then analyzed to generate timestamp-based chapters. The end timestamp of the first chunk becomes the start of the next chunk.
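To make the offset step concrete, here is a minimal sketch of merging chapters from several chunks into one description. It uses plain tuples as stand-ins for the SDK's chapter objects, and `merge_chunk_chapters` is a hypothetical helper. As an assumption, it offsets each chunk by its known start time rather than by the last chapter of the previous chunk.

```python
def seconds_to_mmss(seconds):
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_chunk_chapters(chunks):
    """chunks: list of (chunk_start_seconds, [(chapter_start, title), ...])."""
    lines = []
    for chunk_start, chapters in chunks:
        for chapter_start, title in chapters:
            # Chapter times are relative to the chunk, so shift them to the full video
            lines.append(f"{seconds_to_mmss(chunk_start + chapter_start)}-{title}")
    return "\n".join(lines)


# Made-up chapter data for a 45+ minute podcast split at the 30-minute mark
first_half = (0, [(0, "Intro"), (600, "Guest background")])
second_half = (1800, [(0, "Deep dive"), (900, "Q&A")])
print(merge_chunk_chapters([first_half, second_half]))
# 00:00-Intro
# 10:00-Guest background
# 30:00-Deep dive
# 45:00-Q&A
```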
The get_video_url function retrieves the video URL of the indexed video to render it in the application. The get_hls_player_html(video_url) function is used to render the video.
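The URL lookup inside `get_video_url` comes down to reading a nested field that may be missing. A small sketch of that step, assuming the response shape the tutorial's code reads (`hls` containing `video_url`); the helper name `extract_hls_url` and the sample dictionaries are made up for illustration.

```python
def extract_hls_url(video_info):
    """Return the HLS playlist URL from a video-info dict, or None if absent."""
    hls = video_info.get("hls") or {}
    return hls.get("video_url")


print(extract_hls_url({"hls": {"video_url": "https://example.com/stream.m3u8"}}))
# https://example.com/stream.m3u8
print(extract_hls_url({"_id": "abc123"}))  # None
```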
C. Return Values: Highlights with Timestamps
The generation process produces timestamps in seconds. However, YouTube descriptions require highlights in minutes and seconds (mm:ss) format, so seconds_to_mmss is used for that conversion. The reverse helper, mmss_to_seconds, converts an mm:ss string back to seconds, which is the form needed to trim video clips based on chapters.
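A quick round trip shows how the two helpers behave. The functions are repeated here so the snippet runs on its own; the sample values are arbitrary. Note that fractional seconds are truncated, so the round trip is exact only for whole seconds.

```python
def seconds_to_mmss(seconds):
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def mmss_to_seconds(mmss):
    minutes, secs = map(int, mmss.split(":"))
    return minutes * 60 + secs


for value in (0, 75.9, 1800, 3599):
    label = seconds_to_mmss(value)
    print(value, "->", label, "->", mmss_to_seconds(label))
# 0 -> 00:00 -> 0
# 75.9 -> 01:15 -> 75
# 1800 -> 30:00 -> 1800
# 3599 -> 59:59 -> 3599
```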
All user-uploaded videos are indexed under the Index ID created earlier. Users can list the indexed videos using fetch_existing_videos(). This provides the video ID, which is sent directly to the generative engine by calling generate_timestamps.
2.2 - Retrieve Video from Index and Segment Based on Results
This section focuses on trimming video segments based on highlight timestamps. Utility functions work together to download, parse, and create segments based on timestamps.
# Utility function to download the indexed video from the URL associated with its video_id
def download_video(url, output_filename):
    ydl_opts = {
        'format': 'best',
        'outtmpl': output_filename,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

# Utility function to parse "mm:ss-title" lines into (start, end, description) tuples
def parse_segments(segment_text):
    lines = segment_text.strip().split('\n')
    segments = []
    for i, line in enumerate(lines):
        start, description = line.split('-', 1)
        start_time = mmss_to_seconds(start.strip())
        if i < len(lines) - 1:
            # A segment ends where the next one starts
            end = lines[i+1].split('-')[0]
            end_time = mmss_to_seconds(end.strip())
        else:
            # The last segment runs to the end of the video
            end_time = None
        segments.append((start_time, end_time, description.strip()))
    return segments

# Utility function to create the video segments
def create_video_segments(video_url, segment_info):
    full_video = "full_video.mp4"
    segments = parse_segments(segment_info)
    try:
        # Download the full video clip
        download_video(video_url, full_video)
        for i, (start_time, end_time, description) in enumerate(segments):
            output_file = f"{i+1:02d}_{description.replace(' ', '_').lower()}.mp4"
            trim_video(full_video, output_file, start_time, end_time)
            yield output_file, description
    except yt_dlp.utils.DownloadError as e:
        raise Exception(f"An error occurred while downloading: {str(e)}") from e
    except Exception as e:
        raise Exception(f"An unexpected error occurred: {str(e)}") from e
    finally:
        # Remove the downloaded source even if a segment fails
        if os.path.exists(full_video):
            os.remove(full_video)

# Function to download only the HLS segments that fall within a time range
def download_video_segment(video_id, start_time, end_time=None):
    video_url = get_video_url(video_id)
    if not video_url:
        raise Exception("Failed to get video URL")
    playlist = m3u8.load(video_url)
    start_seconds = mmss_to_seconds(start_time)
    end_seconds = mmss_to_seconds(end_time) if end_time else None
    total_duration = 0
    segments_to_download = []
    for segment in playlist.segments:
        # Keep segments whose start time falls inside the requested range
        if total_duration >= start_seconds and (end_seconds is None or total_duration < end_seconds):
            segments_to_download.append(segment)
        total_duration += segment.duration
        if end_seconds is not None and total_duration >= end_seconds:
            break
    buffer = io.BytesIO()
    for segment in segments_to_download:
        segment_url = urljoin(video_url, segment.uri)
        response = requests.get(segment_url)
        if response.status_code == 200:
            buffer.write(response.content)
        else:
            raise Exception(f"Failed to download segment: {segment_url}")
    buffer.seek(0)
    return buffer.getvalue()
The download_video function uses the yt-dlp library to download entire videos. To create multiple segments from a single video, the full video file is downloaded.
The parse_segments function converts the generated timestamp information into a programmatically usable format. It breaks down chapter information into start times, end times, and descriptions, preparing for video segmentation.
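To see what the parsing step produces, here is a self-contained copy of the function run on made-up chapter text. As an assumption beyond the tutorial's version, it skips blank lines and splits the following line on the first hyphen only, so titles that themselves contain hyphens are preserved.

```python
def mmss_to_seconds(mmss):
    minutes, seconds = map(int, mmss.split(":"))
    return minutes * 60 + seconds


def parse_segments(segment_text):
    # Ignore blank lines so a trailing newline does not break the split
    lines = [line for line in segment_text.strip().split("\n") if line.strip()]
    segments = []
    for i, line in enumerate(lines):
        start, description = line.split("-", 1)
        start_time = mmss_to_seconds(start.strip())
        if i < len(lines) - 1:
            end_time = mmss_to_seconds(lines[i + 1].split("-", 1)[0].strip())
        else:
            end_time = None  # last chapter runs to the end of the video
        segments.append((start_time, end_time, description.strip()))
    return segments


chapters = "00:00-Intro\n02:30-Setup - part one\n10:05-Wrap-up"
print(parse_segments(chapters))
# [(0, 150, 'Intro'), (150, 605, 'Setup - part one'), (605, None, 'Wrap-up')]
```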
create_video_segments ties everything together. After downloading the entire video, it parses segment information to create individual video clips for each chapter. This function returns each segment along with its description.
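Chapter titles can contain characters that are awkward or invalid in file names, such as slashes, colons, or question marks. One possible way to build safer output names is sketched below; `safe_segment_filename` is a hypothetical helper, not part of the tutorial's code, and it keeps the same numbered-prefix naming scheme.

```python
import re


def safe_segment_filename(index, description, extension="mp4"):
    """Build a file name like '01_intro.mp4' from a chapter title."""
    # Collapse every run of characters other than a-z and 0-9 into one underscore
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{index:02d}_{slug or 'segment'}.{extension}"


print(safe_segment_filename(1, "Intro: What is AI/ML?"))
# 01_intro_what_is_ai_ml.mp4
```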
The download_video_segment function downloads only part of a video, which is useful for creating chapter previews or extracting specific parts of longer videos. It reads the HLS (HTTP Live Streaming) playlist and fetches only the media segments that fall within the requested time range.
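The loop in `download_video_segment` keeps playlist segments whose start time lies inside the requested window, so a clip can begin up to one segment late when the start time falls mid-segment. An alternative, sketched here on plain duration values, is to keep every segment that overlaps the window. The helper name and the 6-second durations are assumptions for illustration; because HLS segments are fetched whole, the result may cover slightly more than the requested range.

```python
def select_segment_indices(durations, start_seconds, end_seconds=None):
    """Return indices of playlist segments that overlap [start_seconds, end_seconds)."""
    selected = []
    position = 0.0
    for index, duration in enumerate(durations):
        segment_start, segment_end = position, position + duration
        before_end = end_seconds is None or segment_start < end_seconds
        if segment_end > start_seconds and before_end:
            selected.append(index)
        position = segment_end
        if end_seconds is not None and position >= end_seconds:
            break  # everything after this point lies past the window
    return selected


durations = [6.0] * 10  # ten 6-second segments, 60 seconds in total
print(select_segment_indices(durations, 15, 31))  # [2, 3, 4, 5]
```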
This process offers content creators the ability to not only generate timestamps but also create tangible video segments for use in their production workflows and editing process 🎬✂️.
3 - Instruction Flow of the Streamlit Application
This section focuses on the main application function, designed for a minimal UI and streamlined instructions flow by utilizing key utility functions.
The application begins by configuring the Streamlit page and applying custom CSS for an attractive, themed interface. It then initializes session state variables to maintain data consistency across reruns. For the complete code, see app.py. Essential utility functions are discussed below.
# Imports assumed at the top of the file (the helper functions are assumed to live in utils.py)
import os
import tempfile
import uuid

import streamlit as st
from twelvelabs import TwelveLabs
from utils import (API_KEY, process_video, get_video_url, fetch_existing_videos,
                   generate_timestamps, create_video_segments)

# Uploading feature and the processing of the video
def upload_and_process_video():
    video_type = st.selectbox("Select video type:", ["Basic Video (less than 30 mins)", "Podcast (30 mins to 1 hour)"])
    uploaded_file = st.file_uploader("Choose a video file", type=["mp4", "mov", "avi"])
    if uploaded_file and st.button("Process Video", key="process_video_button"):
        # Persist the upload to a temporary file so it can be read by path
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp_file:
            tmp_file.write(uploaded_file.read())
            video_path = tmp_file.name
        try:
            with st.spinner("Processing video..."):
                client = TwelveLabs(api_key=API_KEY)
                timestamps, video_id = process_video(client, video_path, video_type)
            st.success("Video processed successfully!")
            st.session_state.timestamps = timestamps
            st.session_state.video_id = video_id
            st.session_state.video_url = get_video_url(video_id)
            if st.session_state.video_url:
                st.video(st.session_state.video_url)
            else:
                st.error("Failed to retrieve video URL.")
        except Exception as e:
            st.error(str(e))
        finally:
            # Always remove the temporary upload
            os.unlink(video_path)

# Selecting an existing video from the Index and generating highlight timestamps
def select_existing_video():
    try:
        existing_videos = fetch_existing_videos()
        video_options = {f"{video['metadata']['filename']} ({video['_id']})": video['_id'] for video in existing_videos}
        if video_options:
            selected_video = st.selectbox("Select a video:", list(video_options.keys()))
            video_id = video_options[selected_video]
            st.session_state.video_id = video_id
            st.session_state.video_url = get_video_url(video_id)
            if st.session_state.video_url:
                st.markdown(f"### Selected Video: {selected_video}")
                st.video(st.session_state.video_url)
            else:
                st.error("Failed to retrieve video URL.")
            if st.button("Generate Timestamps", key="generate_timestamps_button"):
                with st.spinner("Generating timestamps..."):
                    client = TwelveLabs(api_key=API_KEY)
                    timestamps, _ = generate_timestamps(client, video_id)
                    st.session_state.timestamps = timestamps
        else:
            st.warning("No existing videos found in the index.")
    except Exception as e:
        st.error(str(e))
The main interface features two tabs: one for uploading new videos and another for selecting existing ones. The app handles errors gracefully and provides user feedback throughout. During lengthy video segmentation processes, progress bars and status messages keep users informed. A feature to clear all segments helps manage storage and allows users to start fresh.
upload_and_process_video(): Handles video file uploads and processing. It manages both regular videos (under 30 minutes) and longer videos, splitting longer ones into chunks before indexing. Finally, it calls the generative engine to produce the chapter timestamps.
select_existing_video(): Allows users to choose from previously uploaded videos stored in the TwelveLabs index.
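The select box in `select_existing_video` is driven by a label-to-ID mapping. The sketch below builds that mapping outside Streamlit, assuming the entry shape the tutorial's code reads (`_id` and `metadata.filename`). The helper name, the sample entries, and the `untitled` fallback for a missing filename are assumptions for illustration.

```python
def build_video_options(videos):
    """Map a display label to a video ID for each entry in the listing response."""
    options = {}
    for video in videos:
        video_id = video["_id"]
        # Fall back to a placeholder when the entry has no filename
        filename = video.get("metadata", {}).get("filename", "untitled")
        options[f"{filename} ({video_id})"] = video_id
    return options


sample = [
    {"_id": "abc123", "metadata": {"filename": "podcast_ep1.mp4"}},
    {"_id": "def456", "metadata": {}},
]
print(build_video_options(sample))
# {'podcast_ep1.mp4 (abc123)': 'abc123', 'untitled (def456)': 'def456'}
```

Including the ID in the label keeps entries distinct when two uploads share a file name.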
The application uses st.session_state to store and retrieve crucial information such as video URLs, IDs, and generated timestamps. This approach enables data persistence across app re-runs, ensuring a seamless user experience for multi-step operations. By maintaining this state, the app can display processed videos, use them for further operations, and work with generated timestamps without requiring users to re-upload or reprocess data.
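The initialization pattern can be sketched with a plain dictionary standing in for `st.session_state`, which supports the same membership test and item assignment. The key names match those used in the tutorial's code; the default values and the `init_state` helper are assumptions.

```python
DEFAULTS = {"timestamps": None, "video_id": None, "video_url": None, "video_segments": []}


def init_state(state, defaults=DEFAULTS):
    """Fill in missing keys without overwriting values from an earlier run."""
    for key, value in defaults.items():
        if key not in state:
            # Copy lists so separate sessions never share one mutable default
            state[key] = list(value) if isinstance(value, list) else value
    return state


state = {"video_id": "abc123"}  # value kept from a previous rerun
init_state(state)
print(state)
# {'video_id': 'abc123', 'timestamps': None, 'video_url': None, 'video_segments': []}
```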
# Function to display a segment and offer it for download
def display_segment(file_name, description, segment_index):
    if os.path.exists(file_name):
        st.write(f"### {description}")
        st.video(file_name)
        with open(file_name, "rb") as file:
            file_contents = file.read()
        # A random suffix keeps widget keys unique when a segment is rendered twice
        unique_key = f"download_{segment_index}_{uuid.uuid4()}"
        st.download_button(
            label=f"Download: {description}",
            data=file_contents,
            file_name=file_name,
            mime="video/mp4",
            key=unique_key
        )
        st.markdown("---")
    else:
        st.warning(f"File {file_name} not found. It may have been deleted or moved.")

# Function to create the segments and display them as they are produced
def process_and_display_segments():
    if not st.session_state.video_url:
        st.error("Video URL not found. Please reprocess the video.")
        return
    segment_generator = create_video_segments(st.session_state.video_url, st.session_state.timestamps)
    progress_bar = st.progress(0)
    status_text = st.empty()
    st.session_state.video_segments = []  # Reset video segments
    total_segments = len(st.session_state.timestamps.split('\n'))
    for i, (file_name, description) in enumerate(segment_generator, 1):
        st.session_state.video_segments.append((file_name, description))
        display_segment(file_name, description, i-1)  # Pass the index here
        # Clamp to 1.0 so the progress bar never receives an out-of-range value
        progress = min(i / total_segments, 1.0)
        progress_bar.progress(progress)
        status_text.text(f"Processing segment {i}/{total_segments}...")
    progress_bar.progress(1.0)
    status_text.text("All segments processed!")

# Function to display the timestamps and the segments
def display_timestamps_and_segments():
    if st.session_state.timestamps:
        st.subheader("YouTube Chapter Timestamps")
        st.write("Copy the Timestamp description and add it to the YouTube Video Description")
        st.code(st.session_state.timestamps, language="")
        if st.button("Create Video Segments", key="create_segments_button"):
            try:
                process_and_display_segments()
            except Exception as e:
                st.error(f"Error creating video segments: {str(e)}")
                st.exception(e)  # This will display the full traceback
        if st.session_state.video_segments:
            st.subheader("Video Segments")
            for index, (file_name, description) in enumerate(st.session_state.video_segments):
                display_segment(file_name, description, index)
            if st.button("Clear all segments", key="clear_segments_button"):
                for file_name, _ in st.session_state.video_segments:
                    if os.path.exists(file_name):
                        os.remove(file_name)
                st.session_state.video_segments = []
                st.success("All segment files have been cleared.")
                # Use st.experimental_rerun() on Streamlit versions older than 1.27
                st.rerun()
process_and_display_segments(): Manages the creation and display of video segments based on generated timestamps.
display_timestamps_and_segments(): Presents generated timestamps in a copyable format and provides options to create and display video segments.
display_segment(): Renders individual video segments, complete with a download button for each.
The display_segment function takes a file name, description, and segment index as inputs. It checks for the video file's existence, displays the segment's description, plays the video using Streamlit's st.video function, and provides a uniquely keyed download button. If the file is not found, it shows a warning message.
The process_and_display_segments function creates and displays all video segments. It first checks for a video URL in the session state, showing an error message if absent. Otherwise, it uses create_video_segments to generate segments based on provided timestamps, displaying a progress bar and status text during processing. Each segment is added to the session state and displayed using the display_segment function.
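The progress value passed to `st.progress` as a float must stay between 0.0 and 1.0, and the total is derived by counting timestamp lines. A minimal sketch of both calculations follows; the helper names are hypothetical, and skipping blank lines when counting is an assumption beyond the tutorial's code.

```python
def count_segments(timestamps):
    """Count non-empty timestamp lines."""
    return sum(1 for line in timestamps.split("\n") if line.strip())


def progress_fraction(done, total):
    """Return a progress value clamped to the 0.0-1.0 range."""
    if total <= 0:
        return 1.0
    return min(done / total, 1.0)


timestamps = "00:00-Intro\n02:30-Setup\n\n10:05-Wrap-up\n"
total = count_segments(timestamps)
print(total)                        # 3
print(progress_fraction(3, total))  # 1.0
print(progress_fraction(5, total))  # 1.0
```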
The display_timestamps_and_segments function integrates all components. It displays YouTube chapter timestamps if available and provides a button to create video segments. When clicked, this button triggers the process_and_display_segments function. For existing segments, it displays them and offers an option to clear all segments, removing files and resetting the application state.
Below is the Demo Application Example:
As can be seen in the above demo, the video is uploaded and indexed to generate the highlight with timestamp, which can be easily configured to fit into a YouTube description. You can observe how the highlight is implemented in the next step of the demo, as well as how segments are generated from the highlight.
To explore Twelve Labs in diverse contexts, try applying the chapter use cases to various sectors such as editing, education, or any other area that piques your interest.
More Ideas to Experiment with the Tutorial
Understanding how an application works and its development process empowers you to implement innovative ideas and create products that meet users' needs. Here are some potential use cases for video content creators, similar to the one discussed in the tutorial:
📽️ YouTube Content Creators: Generate chapter highlight markers to enhance video navigation.
Educational Videos: Enable students to easily select specific sections of interest in long tutorial videos.
🎥 Content Segmentation: Effortlessly create video clips ready for upload.
Conclusion
This blog post provides a detailed explanation of the video highlight chapter generation process developed with Twelve Labs to cater to content creators' needs. Thank you for following along with the tutorial. We look forward to your ideas on improving the user experience and solving various challenges.
Additional Resources
Learn more about the engines used for the generation task, Marengo 2.6 (Embedding Engine) and Pegasus 1.1 (Generative Engine). To further explore Twelve Labs and enhance your understanding of video content analysis, check out these valuable resources:
Discord Community: Join our vibrant community of developers and enthusiasts to discuss ideas, ask questions, and share your projects. Join the Twelve Labs Discord
Sample Applications: Explore a variety of sample applications to inspire your next project or learn new implementation techniques.
Explore Tutorials: Dive deeper into Twelve Labs capabilities with our comprehensive tutorials.
We encourage you to leverage these resources to expand your knowledge and create innovative applications using Twelve Labs video understanding technology.