Introduction
Video classification refers to the process of automatically assigning one or more predefined categories or labels to a video based on its content. This task involves analyzing the video's visual and sometimes audio information to recognize and understand the events, actions, objects, or other characteristics present in the video. Video classification is an important research area in computer vision and has numerous practical applications, such as video indexing, content-based video retrieval, video recommendation, video surveillance, and human activity recognition.
In the past, video classification was limited to predefined categories or labels, focusing on identifying events, actions, objects, and other features. Customizing classification criteria, or updating them without retraining a model, seemed like a distant dream. But here's where the Twelve Labs Classification API enters the scene and saves the day by effortlessly letting us classify videos based on our own custom criteria, all in near real-time and without the fuss of training any models. Talk about a game-changer!
Twelve Labs Classification API - Conceptual overview
The Twelve Labs Classification API is designed to label indexed videos based on the duration a class label occupies within each video. If that duration is less than 50% of the video's total length, the class label won't apply. Therefore, it's important to carefully design classes and their prompts, especially when uploading long videos. The API can accommodate any number of classes, and you can add as many prompts within a class as you'd like.
For example, let's say you have a collection of hilarious videos featuring your dog, Bruno, and your cat, Karla, engaged in various antics. You can upload these videos to Twelve Labs' platform and create custom classification criteria tailored to the amusing escapades of your furry friends:
"classes": [
    {
        "name": "Doge_Bruno",
        "prompts": [
            "playing with my dog",
            "my dog doing funny things",
            "dog playing with water"
        ]
    },
    {
        "name": "Kitty_Karla",
        "prompts": [
            "cat jumping",
            "cat playing with toys"
        ]
    }
]
With just one API call, you can classify your uploaded videos using the criteria you've created. If you happen to forget a few prompts or wish to introduce new classes, you can easily do so by adding more classes and prompts to your JSON. There's no need to train a new model or retrain an existing one, making the whole process hassle-free.
Quick Overview
Prerequisites: To smoothly navigate this tutorial, sign up for the Twelve Labs API suite and install the required packages. It's recommended to read the first and second tutorials to familiarize yourself with the basics.
Video Upload: Send your videos to the Twelve Labs platform, which effortlessly indexes them, enabling you to add custom classification criteria and manage your content on the fly! And guess what? You don't even need to train an ML model.
Video Classification: Get ready for the real fun! We'll create our own custom classes and a range of prompts within each class. Once we've defined our criteria, we can use them right away to fetch the results. No delays, straight to the goodies!
Crafting a Demo App: We will create a Flask-based app to harness the results from the Classification API, access videos stored in a local folder on our computer, and then render a custom-designed, sleek HTML page to stylishly showcase the classification results.
Prerequisites
In the first tutorial, I covered the basics of using simple natural language queries to find specific moments within your videos. To keep things simple, I uploaded only one video to the platform and covered essential concepts such as creating an index, configuring the index, defining the task API, basic monitoring of video indexing tasks, and step-by-step explanations of creating a Flask-based demo app.
The second tutorial went a step further, exploring the combination of multiple search queries to create more precise and targeted searches. I uploaded multiple videos asynchronously, created multiple indexes, implemented additional code for monitoring video indexing tasks and fetching details like estimated time for task completion. I also configured the Flask app to accommodate multiple videos and display them using an HTML template.
Continuing on this streak, the current tutorial will cover concurrent video uploads using Python's built-in concurrent.futures library. We will monitor the indexing statuses of the videos and record them in a CSV file. Additionally, we will surface the input classification criteria and the relevant Classification API response fields in the HTML template, making it easier to interpret the results.
If you encounter any difficulties while reading this or any of the previous tutorials, don't hesitate to reach out for help! We pride ourselves on providing quick support through our Discord server, with response times faster than a speeding train. Alternatively, you can also reach me via email. Twelve Labs is currently in Open Beta, so you can create a Twelve Labs account and access the API Dashboard to obtain your API key. With your free credits, you'll be able to classify up to 10 hours of your video content.
%env API_KEY=<your_API_key>
%env API_URL=https://api.twelvelabs.io/v1.1
!pip install requests
!pip install flask
import os
import requests
import glob
from pprint import pprint
#Retrieve the URL of the API and the API key
API_URL = os.getenv("API_URL")
assert API_URL
API_KEY = os.getenv("API_KEY")
assert API_KEY
Video Upload
Creating an index and configuring it for video upload:
# Construct the URL of the `/indexes` endpoint
INDEXES_URL = f"{API_URL}/indexes"

# Set the header of the request
default_header = {
    "x-api-key": API_KEY
}

# Define a function to create an index with a given name
def create_index(index_name, index_options, engine):
    # Declare a dictionary named data
    data = {
        "engine_id": engine,
        "index_options": index_options,
        "index_name": index_name,
    }
    # Create an index
    response = requests.post(INDEXES_URL, headers=default_header, json=data)
    # Store the unique identifier of your index
    INDEX_ID = response.json().get('_id')
    # Check if the status code is 201 and print success
    if response.status_code == 201:
        print(f"Status code: {response.status_code} - The request was successful and a new index was created.")
    else:
        print(f"Status code: {response.status_code}")
    pprint(response.json())
    return INDEX_ID

# Create the indexes
index_id_content_classification = create_index(index_name="insta+tiktok", index_options=["visual", "conversation", "text_in_video", "logo"], engine="marengo2.5")

# Print the created index IDs
print(f"Created index IDs: {index_id_content_classification}")
Status code: 201 - The request was successful and a new index was created.
{'_id': '64544b858b1dd6cde172af77'}
Created index IDs: 64544b858b1dd6cde172af77
Writing the upload function
This time I've cooked up code that automatically scoops up all the videos from a designated folder, names each one after its video file, and uploads them to the platform, running the uploads concurrently with Python's concurrent.futures library. Just pop all the videos you want to index into a single folder, and you're good to go! The whole indexing process will take about 40% of the longest video's duration. Need to add more videos to the same index later? Easy peasy! No need for a new folder, just toss them into the existing one. The code's got your back: it checks for any indexed videos with the same name or pending indexing tasks before starting the process. This way, you'll dodge any pesky duplicates. Pretty convenient, huh?
import os
import requests
from concurrent.futures import ThreadPoolExecutor
TASKS_URL = f"{API_URL}/tasks"
TASK_ID_LIST = []
video_folder = 'classify' # folder containing the video files
INDEX_ID = '64544b858b1dd6cde172af77'
def upload_video(file_name):
    # Validate if a video already exists in the index
    task_list_response = requests.get(
        TASKS_URL,
        headers=default_header,
        params={"index_id": INDEX_ID, "filename": file_name},
    )
    if "data" in task_list_response.json():
        task_list = task_list_response.json()["data"]
        if len(task_list) > 0:
            if task_list[0]['status'] == 'ready':
                print(f"Video '{file_name}' already exists in index {INDEX_ID}")
            else:
                print("task pending or validating")
            return
    # Proceed further to create a new task to index the current video if the video didn't exist in the index already
    print("Entering task creation code for the file: ", file_name)
    if file_name.endswith('.mp4'):  # Make sure the file is an MP4 video
        file_path = os.path.join(video_folder, file_name)  # Get the full path of the video file
        with open(file_path, "rb") as file_stream:
            data = {
                "index_id": INDEX_ID,
                "language": "en"
            }
            # The video will be indexed on the platform using the same name as the video file itself
            file_param = [
                ("video_file", (file_name, file_stream, "application/octet-stream")),
            ]
            response = requests.post(TASKS_URL, headers=default_header, data=data, files=file_param)
            TASK_ID = response.json().get("_id")
            TASK_ID_LIST.append(TASK_ID)
            # Check if the status code is 201 and print success
            if response.status_code == 201:
                print(f"Status code: {response.status_code} - The request was successful and a new resource was created.")
            else:
                print(f"Status code: {response.status_code}")
            print(f"File name: {file_name}")
            pprint(response.json())
            print("\n")

# Get list of video files
video_files = [f for f in os.listdir(video_folder) if f.endswith('.mp4')]

# Create a ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
    # Use executor to run upload_video in parallel for all video files
    executor.map(upload_video, video_files)
Monitoring the indexing process
Similar to the upload function, I've designed a monitoring function that keeps track of all tasks happening concurrently. It prints the estimated time remaining and the upload percentage for each video being indexed, and records each task's final status in a tidy CSV file. This function continues to execute until every video in your folder has been indexed. To cap it off, it displays the total time taken for the concurrent indexing process, conveniently measured in seconds. Pretty efficient, right?
import time
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed
def monitor_upload_status(task_id):
    TASK_STATUS_URL = f"{API_URL}/tasks/{task_id}"
    while True:
        response = requests.get(TASK_STATUS_URL, headers=default_header)
        STATUS = response.json().get("status")
        if STATUS == "ready":
            return task_id, STATUS
        remain_seconds = response.json().get('process', {}).get('remain_seconds', None)
        upload_percentage = response.json().get('process', {}).get('upload_percentage', None)
        if remain_seconds is not None:
            print(f"Task ID: {task_id}, Remaining seconds: {remain_seconds}, Upload Percentage: {upload_percentage}")
        else:
            print(f"Task ID: {task_id}, Status: {STATUS}")
        time.sleep(10)

# Define starting time
start = time.time()
print("Starting to monitor...")

# Monitor the indexing process for all tasks
with ThreadPoolExecutor() as executor:
    futures = {executor.submit(monitor_upload_status, task_id) for task_id in TASK_ID_LIST}
    with open('upload_status.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(["Task ID", "Status"])
        for future in as_completed(futures):
            task_id, status = future.result()
            writer.writerow([task_id, status])

# Define ending time
end = time.time()
print("Monitoring finished")
print("Time elapsed (in seconds): ", end - start)
Output:
Starting to monitor...
Monitoring finished
Time elapsed (in seconds): 253.00311
List all videos in the index:
To make sure we've got all the necessary videos indexed, let's do a thorough double-check by listing all the videos within the index. On top of that, we'll create a handy list containing all video IDs and their corresponding names. This list will come in useful later when we need to fetch the appropriate video names for video clips (those segments that match the classification criteria) returned by the classification API.
Just a heads-up, I've tweaked the page limit to 20, since we're dealing with 11 indexed videos. By default, the API returns 10 results per page, so if we don't update the limit, one sneaky result will slip onto page 2 and won't be included in the response_json we're using to create our video_id_name_list. So, let's keep it all on one page!
# List all the videos in an index
INDEX_ID='64544b858b1dd6cde172af77'
default_header = {
"x-api-key": API_KEY
}
INDEXES_VIDEOS_URL = f"{API_URL}/indexes/{INDEX_ID}/videos?page_limit=20"
response = requests.get(INDEXES_VIDEOS_URL, headers=default_header)
response_json = response.json()
pprint(response_json)
video_id_name_list = [{'video_id': video['_id'], 'video_name': video['metadata']['filename']} for video in response_json['data']]
pprint(video_id_name_list)
Before diving into the code, let's breeze through the theory behind it. Feel free to skim over and jump to the code if that's more your cup of tea. When it comes to classification, you can control how it works with the following parameters:
classes: An array of objects that outline the names and definitions of the entities or actions the platform needs to identify. Each object includes these fields:
name: A string that represents the name you want to assign to this class.
prompts: An array of strings that describe what the class contains. The platform relies on the values you provide here to classify your videos.
threshold: Use the threshold parameter to filter results by confidence against the prompts in your request. It ranges from 0 to 100, and a default value of 75 applies when left unset. Raising it narrows the response to only the most pertinent results.
Let's set our classification criteria and use the Twelve Labs classify API to make a classification request; we'll stick with the default threshold setting for this demo:
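The snippet below is a minimal sketch of that request. It assumes the bulk classification endpoint lives at {API_URL}/classify/bulk and accepts index_id, options, and classes fields (double-check the exact names in the API reference), and the classes shown here are placeholders, so swap in prompts that match your own videos. We keep the raw JSON response in duration_data so we can reuse it later:
# NOTE: the endpoint path and field names below are assumptions; consult the API reference for the exact schema
CLASSIFY_BULK_URL = f"{API_URL}/classify/bulk"

# Placeholder classification criteria; replace the class names and prompts with your own
classify_request = {
    "index_id": INDEX_ID,
    "options": ["visual", "conversation"],
    "classes": [
        {
            "name": "Doge_Bruno",
            "prompts": [
                "playing with my dog",
                "my dog doing funny things",
                "dog playing with water"
            ]
        },
        {
            "name": "Kitty_Karla",
            "prompts": [
                "cat jumping",
                "cat playing with toys"
            ]
        }
    ]
    # No "threshold" key, so the default value of 75 applies
}

# Classify all the videos in the index against the criteria above
response = requests.post(CLASSIFY_BULK_URL, headers=default_header, json=classify_request)
duration_data = response.json()
pprint(duration_data)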
Class labels are assigned to the overall video according to the prompts present within the class. To pinpoint the appropriate video segments (clips relating to the prompts) and achieve precise video labeling, it's vital to supply numerous relevant prompts. Keep in mind, a class label is assigned only if the video duration matching the class label surpasses 50% of the video's total length, and this duration is established by combining video clips that align with the prompts.
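To make that rule concrete, here's a tiny illustrative sketch (not the platform's actual implementation) of how such a duration ratio could be computed: merge the clips matched by a class's prompts into non-overlapping intervals, add up their lengths, and compare the total against half of the video's duration:
def label_applies(clips, video_length, min_ratio=0.5):
    # clips: list of (start, end) timestamps in seconds, matched by a class's prompts
    merged = []
    for start, end in sorted(clips):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping clips only count once
        else:
            merged.append([start, end])
    duration_ratio = sum(end - start for start, end in merged) / video_length
    return duration_ratio, duration_ratio > min_ratio

# Example: clips covering 35 seconds of a 60-second video -> (0.58..., True)
print(label_applies([(0, 10), (5, 25), (40, 50)], 60))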
Here's the outcome of the classification API call we executed. The "duration ratio" represents the proportion of video segments to the entire video length, "score" indicates the model's confidence, "name" refers to the class label, and all matched videos are showcased in descending order based on their confidence scores:
Now let's rewrite the same code and call the classification API, but with a small twist: we'll set include_clips to True. By doing this, we'll fetch all the relevant video clips along with their metadata that match the prompts provided within our classes:
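Again, this is a minimal sketch under the same endpoint assumption as before; the only change is the include_clips flag (field name assumed, see the API reference), and we keep the response in clips_data for later:
# Ask the API to also return the clips that matched each prompt
classify_request["include_clips"] = True
response = requests.post(CLASSIFY_BULK_URL, headers=default_header, json=classify_request)
clips_data = response.json()
pprint(clips_data)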
To maintain succinctness, I've trimmed the output. Note how the output now displays the clip data for each video, including start and end timestamps, as well as the confidence score for the specific clip and its related prompt. We're diligently revamping the API endpoint to integrate the corresponding classification option tied to each prompt (e.g., visual and conversation, where visual represents an audio-visual match and conversation refers to a dialogue match).
Now it's time to store both JSON results by serializing (pickling) them, along with the video_id_name_list we created earlier, to files:
import pickle

with open('video_id_name_list.pickle', 'wb') as handle:
    pickle.dump(video_id_name_list, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('duration_data.pickle', 'wb') as handle:
    pickle.dump(duration_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('clips_data.pickle', 'wb') as handle:
    pickle.dump(clips_data, handle, protocol=pickle.HIGHEST_PROTOCOL)
Crafting a demo app
As with our previous tutorials, we'll be crafting a Flask-based demo app that hosts a web page and makes use of the serialized data. By applying this data to the videos retrieved from our local drive, we'll create a visually appealing classification results web page. This way, we can experience firsthand how our video classification API can supercharge our applications and deliver impressive results.
In this tutorial, a slight variation is introduced in how to serve video files from a local directory and play specific segments using the HTML5 video player. The serve_video function employed in the Flask application serves video files from the classify directory, which sits in the same directory as your Flask application script. The url_for('serve_video', filename=video_mapping[video.video_id]) expression in the HTML template generates the URL for the served video file.
As you may have noticed from the output of the classification API when we set include_clips to True, the API returned numerous clips along with their metadata. For simplicity's sake, and to demonstrate results that include these clips, I included a get_top_clips function. For each video, it picks up to three unique prompts and returns the first matching clip for each, giving a quick snapshot of the classification results.
from flask import Flask, render_template, send_from_directory
import pickle
import os
from collections import defaultdict

app = Flask(__name__)

# Replace the following dictionaries with your data
with open('video_id_name_list.pickle', 'rb') as handle:
    video_id_name_list = pickle.load(handle)
with open('duration_data.pickle', 'rb') as handle:
    duration_data = pickle.load(handle)
with open('clips_data.pickle', 'rb') as handle:
    clips_data = pickle.load(handle)

VIDEO_DIRECTORY = os.path.join(os.path.dirname(os.path.realpath(__file__)), "classify")

@app.route('/<path:filename>')
def serve_video(filename):
    print(VIDEO_DIRECTORY, filename)
    return send_from_directory(directory=VIDEO_DIRECTORY, path=filename)

def get_top_clips(clips_data, num_clips=3):
    top_clips = defaultdict(list)
    for video in clips_data['data']:
        video_id = video['video_id']
        unique_prompts = set()
        for clip_class in video['classes']:
            for clip in clip_class['clips']:
                if clip['prompt'] not in unique_prompts and len(unique_prompts) < num_clips:
                    top_clips[video_id].append(clip)
                    unique_prompts.add(clip['prompt'])
    return top_clips

@app.route('/')
def home():
    video_id_name_dict = {video['video_id']: video['video_name'] for video in video_id_name_list}
    top_clips = get_top_clips(clips_data)
    return render_template('index.html', video_mapping=video_id_name_dict, duration_data=duration_data, top_clips=top_clips)

if __name__ == '__main__':
    app.run(debug=True)
HTML template
Here's a sample Jinja2-based HTML template that integrates code within the HTML file to iterate using fields from the serialized data we prepared and passed earlier. This template fetches the required videos from the local drive and displays the results in response to our classification criteria:
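What follows is a minimal sketch of such a template (saved as templates/index.html). It is written against the variables passed in by the Flask app (video_mapping, duration_data, top_clips) and the serve_video route; the styling is bare-bones and the response field names (score, duration_ratio, start, end, prompt) are assumptions based on the output described earlier, so adjust them to match your own data:
<!DOCTYPE html>
<html>
<head>
    <title>Video Classification Results</title>
</head>
<body>
    <h1>Classification results</h1>
    {% for video in duration_data['data'] %}
    <div class="video-card">
        <h2>{{ video_mapping[video['video_id']] }}</h2>
        <!-- Serve the video file from the local classify folder via the serve_video route -->
        <video width="480" controls>
            <source src="{{ url_for('serve_video', filename=video_mapping[video['video_id']]) }}" type="video/mp4">
        </video>
        <ul>
        {% for cls in video['classes'] %}
            <li>{{ cls['name'] }}: score {{ cls['score'] }}, duration ratio {{ cls['duration_ratio'] }}</li>
        {% endfor %}
        </ul>
        <h3>Top clips</h3>
        <ul>
        {% for clip in top_clips[video['video_id']] %}
            <li>"{{ clip['prompt'] }}": {{ clip['start'] }}s to {{ clip['end'] }}s (score {{ clip['score'] }})</li>
        {% endfor %}
        </ul>
    </div>
    {% endfor %}
</body>
</html>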
Ready to take the fun to the next level? Here are some exciting ideas to try out:
Implement pagination and lazy loading to display all clips instead of picking three distinct prompts. This way, you could explore a broader range of classified clips and get a more comprehensive view of the video classification results.
Experiment with short-, medium-, and long-tail prompts, fine-tune thresholds, and share your analysis with fellow multimodal enthusiasts on our Discord channel.
Outro
Stay tuned for the forthcoming excitement! If you haven't joined already, I invite you to join our vibrant Discord community, where you can connect with other like-minded individuals who are passionate about multimodal AI.