Introduction
Ever dreamed of winning a medal for organizing Olympic video footage? 🥇 Well, now's your chance to stand on the podium!
The Olympics Video Clips Classification Application aims to streamline the tedious process of categorizing sports footage. Using Twelve Labs' Marengo 2.6 Embedding Model, this app swiftly classifies Olympic sports from video clips.
By analyzing on-screen text, conversations, and visuals, the application categorizes video clips with ease. This tutorial will guide you through creating a Streamlit application that revolutionizes how scholars, sports enthusiasts, and broadcasters interact with Olympic content. You'll learn to build an app that classifies videos based on user-defined categories. A detailed demonstration video of the application is provided below.
Find the repository for the notebooks and this application on GitHub.
The Streamlit application uses Python, HTML, and JavaScript.
How the Application Works
This section outlines the application flow for classifying Olympic video clips.
The application uses a classification search engine, which has various potential uses. In this task, we'll focus on classifying and retrieving Olympic sports video clips. The first crucial step is creating an index.
Create a New Index and select Marengo 2.6 (Embedding Engine), which is suitable for indexing and classification.
Once all sports videos are uploaded to this index, we can proceed.
The following section details how to create and implement an Index ID step by step. Here's an overview of the general application workflow:
Users have two options: they can either select the type of sport they want to view from a multi-select menu or add their own custom classes.
The selected class is then sent to the classify endpoint, along with the INDEX ID, to retrieve and classify all relevant video clips. Additional parameters, such as include_clips, are also sent to this endpoint.
The response includes the Video ID, class, confidence level, start and end times, thumbnail URL, and more.
To obtain the video URL associated with the resulting Video ID, we need to access the video info endpoint.
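The two-step flow described above can be sketched as small helpers. The classify payload shape here is an assumption for illustration; the video info URL mirrors the pattern used later in the application code:

```python
# Sketch of the two-step flow: classify first, then resolve each
# returned video_id to its playable URL. The payload shape is assumed
# for illustration; the video info URL mirrors the one used in app.py.
BASE_URL = "https://api.twelvelabs.io/v1.2"

def build_classify_payload(index_id, classes, include_clips=True):
    # Parameters sent to the classify endpoint (see the SDK call later)
    return {
        "index_id": index_id,
        "options": ["visual"],
        "classes": classes,
        "include_clips": include_clips,
    }

def video_info_url(index_id, video_id):
    # Video info endpoint that returns, among other fields, the HLS URL
    return f"{BASE_URL}/indexes/{index_id}/videos/{video_id}"
```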
Choose the appropriate option for classification: Marengo 2.6 (Embedding Engine). This engine excels in video search and classification, providing a robust foundation for video understanding.
Upload sports video clips. If needed, use the sample clips provided for this tutorial: Sports Clips.
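If you prefer to upload the clips programmatically rather than through the Playground, a minimal sketch using the SDK's task API might look like the following. The folder path is illustrative, and the exact task-API signature should be checked against the SDK version you use:

```python
import glob

def list_video_files(folder):
    # Collect the .mp4 clip paths to upload (sorted for reproducibility)
    return sorted(glob.glob(f"{folder}/*.mp4"))

def upload_videos(client, index_id, folder):
    # `client` is a TwelveLabs client instance; create one indexing
    # task per clip and wait for each to finish before moving on
    for path in list_video_files(folder):
        task = client.task.create(index_id=index_id, file=path)
        task.wait_for_done(sleep_interval=5)
        print(f"Uploaded {path}: status={task.status}")
```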
1 - Importing, Validating, and Setting Up the Client
We begin by importing the necessary libraries and setting up environment variables to create the correct environment. To get started:
import os
import glob
import requests

from twelvelabs import TwelveLabs
from twelvelabs.models.task import Task
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

API_KEY = os.getenv("Twelvelabs_API")
API_URL = os.getenv("API_URL")
INDEX_ID = os.getenv("INDEX_ID")

client = TwelveLabs(api_key=API_KEY)

# URL of the /indexes endpoint
INDEXES_URL = f"{API_URL}/indexes"

# Setting headers variables
default_header = {
    "x-api-key": API_KEY
}
The dependencies have been imported, environment variables loaded correctly, and the API client initialized with the proper credentials. The base URL for accessing indexes and default headers for API requests have also been set up.
With this foundation in place, you're now ready to proceed with more specific classification tasks, such as creating indexes.
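Although this tutorial creates the index through the Playground UI, the same step can be sketched with the SDK. The index name is illustrative, and the exact engine-spec shape should be verified against your SDK version; the options mirror the visual, conversation, and on-screen-text signals mentioned earlier:

```python
def marengo_engine_config():
    # Marengo 2.6 with the visual, conversation, and on-screen-text
    # signals this tutorial relies on (engine spec shape assumed)
    return [{
        "name": "marengo2.6",
        "options": ["visual", "conversation", "text_in_video"],
    }]

def create_olympics_index(client, index_name="olympics-clips"):
    # `client` is a TwelveLabs client instance, as initialized above;
    # the index name is a placeholder, not from the tutorial
    return client.index.create(name=index_name, engines=marengo_engine_config())
```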
2 - Setting up the Classes
Below is the list of classes for classifying Olympic sports clips:
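The categories below are the same ones returned by get_initial_classes() in the application code later in this post:

```python
# Olympic sports categories: each class pairs a name with the prompts
# used to detect content belonging to that category
CLASSES = [
    {"name": "AquaticSports", "prompts": ["swimming competition", "diving event", "water polo match", "synchronized swimming", "open water swimming"]},
    {"name": "AthleticEvents", "prompts": ["track and field", "marathon running", "long jump competition", "javelin throw", "high jump event"]},
    {"name": "GymnasticsEvents", "prompts": ["artistic gymnastics", "rhythmic gymnastics", "trampoline gymnastics", "balance beam routine", "floor exercise performance"]},
    {"name": "CombatSports", "prompts": ["boxing match", "judo competition", "wrestling bout", "taekwondo fight", "fencing duel"]},
    {"name": "TeamSports", "prompts": ["basketball game", "volleyball match", "football (soccer) match", "handball game", "field hockey competition"]},
    {"name": "CyclingSports", "prompts": ["road cycling race", "track cycling event", "mountain bike competition", "BMX racing", "cycling time trial"]},
    {"name": "RacquetSports", "prompts": ["tennis match", "badminton game", "table tennis competition", "squash game", "tennis doubles match"]},
    {"name": "RowingAndSailing", "prompts": ["rowing competition", "sailing race", "canoe sprint", "kayak event", "windsurfing competition"]}
]
```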
3 - Utility Function to Classify All Videos from a Particular Index
# Utility function to print a page of classification results
def print_page(page):
    for data in page:
        print(f"video_id={data.video_id}")
        for cl in data.classes:
            print(
                f"  name={cl.name} score={cl.score} duration_ratio={cl.duration_ratio} clips={cl.clips.model_dump_json(indent=2)}"
            )

result = client.classify.index(
    index_id=INDEX_ID,
    options=["visual"],
    classes=CLASSES,
    include_clips=True
)
print_page(result)
The client.classify.index() method is called to perform the classification. It uses the previously defined INDEX_ID, focuses on visual classification (options=["visual"]), and includes clip information in the results. The CLASSES parameter specifies the classes to be used for classification.
The print_page(result) function displays the classification results in a readable format. It iterates through the classification response, printing the video ID and detailed information about each classified class, including its name, score, duration ratio, and associated clips. This approach enables easy analysis of the classification output across all videos in the specified index.
4 - Function to Print the Result in a Structured and Clear Way
The print_classification_result() function displays TwelveLabs API classification results in a structured, readable format. It processes each video in the result set, presenting key information such as video ID, detected classes, and associated scores. For each class, it shows the overall score and duration ratio, followed by details of the top five clips, sorted by score in descending order. These clip details include start and end times, as well as the prompt from the class definition that matched the clip. The function also notes if there are additional clips beyond the top five.
def print_classification_result(result) -> None:
    for video_data in result.data:
        print(f"Video ID: {video_data.video_id}")
        print("=" * 50)
        for class_data in video_data.classes:
            print(f"  Class: {class_data.name}")
            print(f"  Score: {class_data.score:.2f}")
            print(f"  Duration Ratio: {class_data.duration_ratio:.2f}")
            print("  Clips:")
            sorted_clips = sorted(class_data.clips, key=lambda x: x.score, reverse=True)
            for i, clip in enumerate(sorted_clips[:5], 1):  # Print top 5 clips
                print(f"    {i}. Score: {clip.score:.2f}")
                print(f"       Start: {clip.start:.2f}s, End: {clip.end:.2f}s")
                print(f"       Prompt: {clip.prompt}")
            if len(sorted_clips) > 5:
                print(f"    ... and {len(sorted_clips) - 5} more clips")
            print("-" * 40)
        print("\n")
    print(f"Total results: {result.page_info.total_results}")
    print(f"Page expires at: {result.page_info.page_expired_at}")
    print(f"Next page token: {result.page_info.next_page_token}")
The sample output for the above call is:
Video ID: 66c9b03be53394f4aaed82c1
==================================================
Class: AquaticSports
Score: 96.08
Duration Ratio: 0.90
Clips:
1. Score: 85.74
Start: 19.30s, End: 42.00s
Prompt: water polo match
2. Score: 85.38
Start: 56.30s, End: 160.41s
Prompt: water polo match
3. Score: 85.25
Start: 19.30s, End: 124.07s
Prompt: synchronized swimming
4. Score: 85.13
Start: 0.00s, End: 24.83s
Prompt: swimming competition
5. Score: 85.08
Start: 124.10s, End: 160.41s
Prompt: synchronized swimming
... and 19 more clips
----------------------------------------
5 - Specific Class Categorization from All Videos in Index
This section focuses on the specific class categorization for "AquaticSports" across all videos in a given index. The process involves three main steps: defining a specific class with associated prompts, performing the classification, and displaying the results.
This targeted approach enables more accurate classification of videos containing aquatic sports content. The print_page() function serves as a utility tool, providing a straightforward output of the classification results.
CLASS = [
    {
        "name": "AquaticSports",
        "prompts": [
            "swimming competition",
            "diving event",
            "water polo match",
            "synchronized swimming",
            "open water swimming"
        ]
    }
]

def print_page(page):
    for data in page:
        print(f"video_id={data.video_id}")
        for cl in data.classes:
            print(
                f"  name={cl.name} score={cl.score} duration_ratio={cl.duration_ratio} detailed_scores={cl.detailed_scores.model_dump_json(indent=2)}"
            )

result = client.classify.index(
    index_id=INDEX_ID,
    options=["visual"],
    classes=CLASS,
    include_clips=True,
    show_detailed_score=True
)
print_classification_result(result)
The output of this snippet lists the videos that fall under the specified CLASS:
Video ID: 66c9b03be53394f4aaed82c1
==================================================
Class: AquaticSports
Score: 96.08
Duration Ratio: 0.90
Clips:
1. Score: 85.74
Start: 19.30s, End: 42.00s
Prompt: water polo match
2. Score: 85.38
Start: 56.30s, End: 160.41s
Prompt: water polo match
3. Score: 85.25
Start: 19.30s, End: 124.07s
Prompt: synchronized swimming
4. Score: 85.13
Start: 0.00s, End: 24.83s
Prompt: swimming competition
5. Score: 85.08
Start: 124.10s, End: 160.41s
Prompt: synchronized swimming
... and 19 more clips
----------------------------------------
Now that you know how to implement and interact with the classify endpoint using the Twelve Labs SDK, we can move forward with developing the full Streamlit application for users.
Creating the Streamlit Application
We chose to build the application on Streamlit due to its simplicity and rapid prototyping capabilities, which make it ideal for the initial stages of development. Streamlit lets us create interactive web interfaces with minimal code, perfect for showcasing classification results in a user-friendly manner, while the Twelve Labs SDK handles the video understanding behind the scenes.
To get started with Streamlit, ensure you've set up your virtual environment and install the required packages (inferred from the imports in app.py):

pip install streamlit twelvelabs requests python-dotenv

These dependencies cover the core functionality needed for the Olympic video clip classification application.
The main app.py file contains the following functionality:
get_initial_classes(): Sets up predefined Olympic sports categories.
get_custom_classes() and add_custom_class(): Manage user-defined categories to provide more flexibility.
classify_videos(): Interacts with the TwelveLabs API to perform video classification based on selected categories.
get_video_urls(): Retrieves video URLs for classified content.
render_video(): Creates an embedded video player using HLS.js for smooth playback.
# Import Necessary Dependencies
import streamlit as st
from twelvelabs import TwelveLabs
import requests
import os
from dotenv import load_dotenv
load_dotenv()
# Get the API Key from the Dashboard - https://playground.twelvelabs.io/dashboard/api-key
API_KEY = os.getenv("API_KEY")
# Create the INDEX ID as specified in the README.md and get the INDEX_ID
INDEX_ID = os.getenv("INDEX_ID")
client = TwelveLabs(api_key=API_KEY)
# Background Setting of the Application
page_element = """
<style>
[data-testid="stAppViewContainer"] {
background-image: url("https://wallpapercave.com/wp/wp3589963.jpg");
background-size: cover;
}
[data-testid="stHeader"] {
background-color: rgba(0,0,0,0);
}
[data-testid="stToolbar"] {
right: 2rem;
background-image: url("");
background-size: cover;
}
</style>
"""
st.markdown(page_element, unsafe_allow_html=True)
# Classes to classify the videos into: each entry has a category name
# and the prompts used to detect content belonging to that category
@st.cache_data
def get_initial_classes():
    return [
        {"name": "AquaticSports", "prompts": ["swimming competition", "diving event", "water polo match", "synchronized swimming", "open water swimming"]},
        {"name": "AthleticEvents", "prompts": ["track and field", "marathon running", "long jump competition", "javelin throw", "high jump event"]},
        {"name": "GymnasticsEvents", "prompts": ["artistic gymnastics", "rhythmic gymnastics", "trampoline gymnastics", "balance beam routine", "floor exercise performance"]},
        {"name": "CombatSports", "prompts": ["boxing match", "judo competition", "wrestling bout", "taekwondo fight", "fencing duel"]},
        {"name": "TeamSports", "prompts": ["basketball game", "volleyball match", "football (soccer) match", "handball game", "field hockey competition"]},
        {"name": "CyclingSports", "prompts": ["road cycling race", "track cycling event", "mountain bike competition", "BMX racing", "cycling time trial"]},
        {"name": "RacquetSports", "prompts": ["tennis match", "badminton game", "table tennis competition", "squash game", "tennis doubles match"]},
        {"name": "RowingAndSailing", "prompts": ["rowing competition", "sailing race", "canoe sprint", "kayak event", "windsurfing competition"]}
    ]

# Session State for the custom classes
def get_custom_classes():
    if 'custom_classes' not in st.session_state:
        st.session_state.custom_classes = []
    return st.session_state.custom_classes

# Utility function to add custom classes in the app
def add_custom_class(name, prompts):
    custom_classes = get_custom_classes()
    custom_classes.append({"name": name, "prompts": prompts})
    st.session_state.custom_classes = custom_classes
    st.session_state.new_class_added = True

# Utility function to classify all the videos in the specified Index
def classify_videos(selected_classes):
    return client.classify.index(
        index_id=INDEX_ID,
        options=["visual"],
        classes=selected_classes,
        include_clips=True
    )
# To get the video URLs from the resulting video IDs
def get_video_urls(video_ids):
    base_url = f"https://api.twelvelabs.io/v1.2/indexes/{INDEX_ID}/videos/{{}}"
    headers = {"x-api-key": API_KEY, "Content-Type": "application/json"}
    video_urls = {}
    for video_id in video_ids:
        try:
            response = requests.get(base_url.format(video_id), headers=headers)
            response.raise_for_status()
            data = response.json()
            if 'hls' in data and 'video_url' in data['hls']:
                video_urls[video_id] = data['hls']['video_url']
            else:
                st.warning(f"No video URL found for video ID: {video_id}")
        except requests.exceptions.RequestException as e:
            st.error(f"Failed to get data for video ID: {video_id}. Error: {str(e)}")
    return video_urls
# Utility function to render the video from the resulting video URL
def render_video(video_url):
    hls_player = f"""
    <script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
    <div style="width: 100%; border-radius: 10px; overflow: hidden; box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);">
        <video id="video" controls style="width: 100%; height: auto;"></video>
    </div>
    <script>
        var video = document.getElementById('video');
        var videoSrc = "{video_url}";
        if (Hls.isSupported()) {{
            var hls = new Hls();
            hls.loadSource(videoSrc);
            hls.attachMedia(video);
            hls.on(Hls.Events.MANIFEST_PARSED, function() {{
                video.pause();
            }});
        }} else if (video.canPlayType('application/vnd.apple.mpegurl')) {{
            video.src = videoSrc;
            video.addEventListener('loadedmetadata', function() {{
                video.pause();
            }});
        }}
    </script>
    """
    st.components.v1.html(hls_player, height=300)
The get_initial_classes() function returns a static list of predefined categories for Olympic sports. Streamlit caches this data, ensuring the function is executed only once and its result is saved in memory. Subsequent calls to this function return the cached result, avoiding repeated executions.
The get_initial_classes() function is cached, while get_custom_classes() is not. Custom classes, stored in Streamlit's session state, are expected to change during the application's runtime.
get_video_urls() takes a list of video IDs and makes API calls to TwelveLabs to retrieve corresponding HLS (HTTP Live Streaming) URLs, handling potential errors and missing URLs.
render_video() creates an HTML5 video player with HLS.js support, compatible with multiple browsers and devices. It falls back to native HLS if necessary.
In classify_videos(), user-selected classes are used to classify videos in the specified index using the TwelveLabs client. This function focuses on visual classification and includes clip information in the results.
# Main Function
def main():
    # Basic Markdown Setup for the Application
    st.markdown("""
    <style>
    .big-font {
        font-size: 40px !important;
        font-weight: bold;
        color: #000000;
        text-align: center;
        margin-bottom: 30px;
    }
    .subheader {
        font-size: 24px;
        font-weight: bold;
        color: #424242;
        margin-top: 20px;
        margin-bottom: 10px;
    }
    .stButton>button {
        width: 100%;
    }
    .video-info {
        background-color: #f0f0f0;
        border-radius: 10px;
        padding: 15px;
        margin-bottom: 20px;
    }
    .custom-box {
        background-color: #f9f9f9;
        border-radius: 10px;
        padding: 20px;
        margin-bottom: 20px;
        box-shadow: 0 2px 4px rgba(0,0,0,0.1);
    }
    .stTabs [data-baseweb="tab-list"] {
        gap: 24px;
    }
    .stTabs [data-baseweb="tab"] {
        height: 50px;
        white-space: pre-wrap;
        background-color: #f0f2f6;
        border-radius: 4px 4px 0px 0px;
        gap: 1px;
        padding-top: 10px;
        padding-bottom: 10px;
    }
    .stTabs [data-baseweb="tab-list"] button[aria-selected="true"] {
        background-color: #e8eaed;
    }
    </style>
    """, unsafe_allow_html=True)
    st.markdown('<p class="big-font">Olympics Classification with Twelve Labs</p>', unsafe_allow_html=True)

    # Update the class list with any custom classes
    CLASSES = get_initial_classes() + get_custom_classes()

    # Nav Tabs Creation
    tab1, tab2 = st.tabs(["Select Classes", "Add Custom Class"])
    with tab1:
        st.markdown('<p class="subheader">Select Classes</p>', unsafe_allow_html=True)
        with st.container():
            class_names = [cls["name"] for cls in CLASSES]
            # Multiselect option from the CLASSES
            selected_classes = st.multiselect("Choose one or more Olympic sports categories:", class_names)
            if st.button("Classify Videos", key="classify_button"):
                if selected_classes:
                    with st.spinner("Classifying videos..."):
                        selected_classes_with_prompts = [cls for cls in CLASSES if cls["name"] in selected_classes]
                        res = classify_videos(selected_classes_with_prompts)
                        video_ids = [data.video_id for data in res.data]
                        # Retrieve the video URLs for the videos matching the selected CLASSES
                        video_urls = get_video_urls(video_ids)
                        st.markdown('<p class="subheader">Classified Videos</p>', unsafe_allow_html=True)
                        # Iterate over the results to show information for every resulting video
                        for i, video_data in enumerate(res.data, 1):
                            video_id = video_data.video_id
                            video_url = video_urls.get(video_id, "URL not found")
                            st.markdown(f"### Video {i}")
                            st.markdown('<div class="video-info">', unsafe_allow_html=True)
                            st.markdown(f"**Video ID:** {video_id}")
                            for class_data in video_data.classes:
                                st.markdown(f"""
                                **Class:** {class_data.name}
                                - Score: {class_data.score:.2f}
                                - Duration Ratio: {class_data.duration_ratio:.2f}
                                """)
                            if video_url != "URL not found":
                                render_video(video_url)
                            else:
                                st.warning("Video URL not available. Unable to render video.")
                            st.markdown("---")
                        st.success(f"Total videos classified: {len(res.data)}")
                else:
                    st.warning("Please select at least one class.")
            st.markdown('</div>', unsafe_allow_html=True)
    # Nav Tab for adding the Custom Classes to select from
    with tab2:
        st.markdown('<p class="subheader">Add Custom Class</p>', unsafe_allow_html=True)
        with st.container():
            custom_class_name = st.text_input("Enter custom class name")
            custom_class_prompts = st.text_input("Enter custom class prompts (comma-separated)")
            if st.button("Add Custom Class"):
                if custom_class_name and custom_class_prompts:
                    add_custom_class(custom_class_name, custom_class_prompts.split(','))
                    st.success(f"Custom class '{custom_class_name}' added successfully!")
                    st.experimental_rerun()
                else:
                    st.warning("Please enter both class name and prompts.")
            st.markdown('</div>', unsafe_allow_html=True)

    if st.session_state.get('new_class_added', False):
        st.session_state.new_class_added = False
        st.experimental_rerun()
if __name__ == "__main__":
main()
The main function of the application handles the CSS, the layout, and the sequence of interface steps. Users can add new classification categories through the "Add Custom Class" tab, while the "Select Classes" tab allows them to choose from both predefined and custom categories.
When users click the "Classify Videos" button, the application:
Retrieves the selected classes.
Calls the TwelveLabs API to classify videos (classify_videos() function).
Fetches video URLs for the classified videos.
Displays the classification results and renders the videos.
The final application results:
More Ideas to Experiment with the Tutorial
Understanding how the application works and how it was developed prepares you to implement your own ideas. Here are some use cases, similar to this tutorial, that you can build on:
🔍 Video Search Engine: Create a searchable database of video content, allowing users to find specific scenes or topics within large video collections.
🎥 Security Footage Analyzer: Detect and categorize specific events or behaviors in security camera footage.
💃 Dance Move Classifier: Identify and categorize different dance styles or specific moves from dance videos.
Conclusion
This blog post provided a detailed explanation of how the application works and how it was built using the classification task with Twelve Labs. Thank you for following along with the tutorial. We look forward to your ideas for improving the user experience and solving new problems.