Premise
As a movie aficionado and content creator, I have created my own Plex server to house my cherished film collection. Often, I like to use movie scenes as anecdotes to enhance my storytelling and create more engaging content. For example, when making a video about motivation, willpower, and overcoming odds, I might showcase relevant moments from a favorite anime, such as the exhilarating Super Saiyan transformation scenes from the Dragon Ball Super saga, or a workout and training scene from one of my favorite movies, like Never Back Down. Alternatively, directors or writers who are developing new movie scripts may want to analyze a set of similar movies to identify common themes or patterns, such as the number of comedy scenes, their duration, the occurrence of drift races, or the frequency of muscle cars being shown. Finding particular scenes within a vast array of movies or even within a single movie can be quite challenging, even for those with impressive memory skills. This is where video understanding comes to the rescue.
Twelve Labs Search API offers a flexible video search solution that enables users to write simple natural language queries and combine them ingeniously, helping to uncover the relevant video segments. For instance, one could craft a combined query to reveal the specific drift scenes where the lead actor drives a red Mitsubishi. Alternatively, users might search for the thrilling moment when their favorite Formula One car crosses the finish line victoriously.
Search result from the indexed Tokyo Drift movie for the combined query - 'drift' (search option: visual) AND 'Mitsubishi' (search option: logo)
Introduction
In the first part of this tutorial series, we explored how to perform searches within videos using simple search, where we used only one query at a time in our search requests. To make the most of this follow-up tutorial, I highly recommend reviewing the previous one to understand the basics of the Twelve Labs Search API. Assuming you have a good grasp of the basics, this tutorial will introduce more advanced concepts. We'll dive into the combined queries feature offered by the Twelve Labs API, which enables us to flexibly and conveniently locate specific moments of interest within indexed videos. To showcase this, I will create two separate indexes: one for Formula One races, and another for a well-known full-length movie, "Tokyo Drift" from the Fast and the Furious franchise. I'll then demonstrate how to use various operators to combine search queries, allowing us to identify the intriguing moments we're looking for. With that said, let's proceed to the tutorial overview and concretely outline what you can expect to learn throughout this guide.
Quick overview
Prerequisites: Sign up for the Twelve Labs API suite and install the required packages to breeze through this tutorial. Don't forget to check out the first tutorial!
Video Upload: Send your videos over to the Twelve Labs platform and watch how it effortlessly indexes them, so you can throw in your complex combined queries and fetch those moments you're after!
Combined Queries: This is where the real excitement begins! Combined queries can be succinctly defined as two or more simple search queries merged together to form a single, unified query using operators such as "or", "and", "not", and "then". We'll briefly review the theoretical aspects of these operators, and then dive into some practical examples of using them to effectively combine two or more natural language queries, aiding us in finding top moments within the indexed videos that semantically match the combined query.
Crafting a Demo App: Craft a nifty Flask-based app to harness the results from the search API, access the videos stored in a local folder on your computer, and then render a custom-designed, sleek HTML page to showcase the search results in style.
By the way, if you're reading this article and you're not a developer, fear not! I've included a link to a ready-made Jupyter notebook. You can easily tweak the queries and operators, then run the entire process to obtain the results. Enjoy!
Prerequisites
The previous tutorial is the only prerequisite to this one. If you hit any snags while reading this or the previous one, don't hesitate to give me a shout for help! We have super quick response times on our Discord server. If Discord isn't your vibe, feel free to reach out to me via email. After creating a Twelve Labs account, you can access the API Dashboard and obtain your API key. For the purpose of this demo, I'll be using my existing account:
!pip install requests
!pip install flask
import os
import requests
import glob
from pprint import pprint
# Retrieve the URL of the API and my API key
API_URL = os.getenv("API_URL")
assert API_URL
API_KEY = os.getenv("API_KEY")
assert API_KEY
Video upload
This is the initial step, where I'll create two indexes using our latest state-of-the-art video understanding engine, "Marengo 2.5," but with distinct indexing options. For the index focused on Formula One racing, enabling the text-in-video and logo options (in addition to visual and conversation) is beneficial, because Formula One events are packed with logos on vehicles, tracks, and fences, and a significant amount of on-screen text appears during the awards ceremony. For the Tokyo Drift movie index, however, enabling the text-in-video option may not add much value. This is where the flexibility of creating indexes with different options comes into play: by customizing the indexing options to suit your specific needs, you can optimize the use of compute resources and ultimately save on costs.
# Construct the URL of the `/indexes` endpoint
INDEXES_URL = f"{API_URL}/indexes"
# Set the header of the request
default_header = {
    "x-api-key": API_KEY
}
# Define a function to create an index with a given name
def create_index(index_name, index_options, engine):
    # Declare a dictionary named data
    data = {
        "engine_id": engine,
        "index_options": index_options,
        "index_name": index_name,
    }
    # Create an index
    response = requests.post(INDEXES_URL, headers=default_header, json=data)
    # Store the unique identifier of your index
    INDEX_ID = response.json().get('_id')
    # Check if the status code is 201 and print success
    if response.status_code == 201:
        print(f"Status code: {response.status_code} - The request was successful and a new index was created.")
    else:
        print(f"Status code: {response.status_code}")
    pprint(response.json())
    return INDEX_ID
# Specify the names of the indexes
index_names = ["formula_one", "tokyo_drift"]
# Create the indexes
index_id_formula_one = create_index(index_name = "formula_one", index_options=["visual", "conversation", "text_in_video", "logo"], engine = "marengo2.5")
index_id_tokyo_drift = create_index(index_name = "tokyo_drift", index_options=["visual", "conversation", "logo"], engine = "marengo2.5")
# Print the created index IDs
print(f"Created index IDs: {index_id_formula_one}, {index_id_tokyo_drift}")
Output:
Status code: 201 - The request was successful and a new resource was created.
{'_id': '##38fb695b65d57eaecaf8##'}
Status code: 201 - The request was successful and a new resource was created.
{'_id': '##38fb695b65d57eaecaf8##'}
Created index IDs: ##38fb695b65d57eaecaf8##, ##38fb695b65d57eaecaf8##
Initiating video indexing tasks
I've set up the code to automatically take in all videos from a specific folder, name each upload after its video file, and submit them to the platform one after another. Just make sure to place all the videos you want to include in an index within a single folder. The total indexing time will be approximately 40% of the longest video's duration because, even though the 'for' loop uploads the videos one at a time without spawning threads for parallel uploads, the platform indexes them concurrently once they arrive. If you want to index more videos within the same index later, no problem! There's no need to create a new folder for new video files. Just add them to the existing folder, and the code will check whether there's already an indexed video with the same name, or a pending indexing task for one, before initiating indexing. This way, you can avoid duplicates. Pretty cool, right?
TASKS_URL = f"{API_URL}/tasks"
TASK_ID_LIST = []
video_folder = 'static' # folder containing the video files
INDEX_ID = index_id_formula_one # switch this to index_id_tokyo_drift when indexing the movie
# INDEX_ID = '##38d9c4e4225d1c0eb1e8##'
# Iterate through all the video files in the folder
for file_name in os.listdir(video_folder):
    # Validate if a video already exists in the index
    task_list_response = requests.get(
        TASKS_URL,
        headers=default_header,
        params={"index_id": INDEX_ID, "filename": file_name},
    )
    if "data" in task_list_response.json():
        task_list = task_list_response.json()["data"]
        if len(task_list) > 0:
            if task_list[0]['status'] == 'ready':
                print(f"Video '{file_name}' already exists in index {INDEX_ID}")
            else:
                print("task pending or validating")
            continue
    # Proceed further to create a new task to index the current video if the video didn't exist in the index already
    print("Entering task creation code for the file: ", file_name)
    if file_name.endswith('.mp4'):  # Make sure the file is an MP4 video
        file_path = os.path.join(video_folder, file_name)  # Get the full path of the video file
        with open(file_path, "rb") as file_stream:
            data = {
                "index_id": INDEX_ID,
                "language": "en"
            }
            file_param = [
                ("video_file", (file_name, file_stream, "application/octet-stream")),
            ]  # The video will be indexed on the platform using the same name as the video file itself
            response = requests.post(TASKS_URL, headers=default_header, data=data, files=file_param)
            TASK_ID = response.json().get("_id")
            TASK_ID_LIST.append(TASK_ID)
            # Check if the status code is 201 and print success
            if response.status_code == 201:
                print(f"Status code: {response.status_code} - The request was successful and a new resource was created.")
            else:
                print(f"Status code: {response.status_code}")
            print(f"File name: {file_name}")
            pprint(response.json())
            print("\n")
Output:
Entering task creation code for the file: 20211113T190000Z.mp4
Status code: 201 - The request was successful and a new resource was created.
File name: 20211113T190000Z.mp4
{'_id': '6438fb6e5b65d57eaecaf8bc'}
Entering task creation code for the file: 20211113T193300Z.mp4
Status code: 201 - The request was successful and a new resource was created.
File name: 20211113T193300Z.mp4
{'_id': '##38fb755b65d57eaecaf8##'}
Monitoring the indexing process
I designed the monitoring function to display the estimated time remaining for the current video being indexed. Once that indexing task is complete, the monitoring moves on to the next task, which is typically already well underway because the platform indexes the uploads in parallel. This continues until all the videos within your folder are indexed. Finally, the total time taken by the whole indexing run is printed in seconds.
import time
def monitor_upload_status(task_id):
    TASK_STATUS_URL = f"{API_URL}/tasks/{task_id}"
    while True:
        response = requests.get(TASK_STATUS_URL, headers=default_header)
        STATUS = response.json().get("status")
        if STATUS == "ready":
            print(f"Task ID: {task_id}, Status code: {STATUS}")
            break
        remain_seconds = response.json().get('process', {}).get('remain_seconds', None)
        upload_percentage = response.json().get('process', {}).get('upload_percentage', None)
        if remain_seconds is not None:
            print(f"Remaining seconds: {remain_seconds}, Upload Percentage: {upload_percentage}")
        else:
            print('.', end='')
        time.sleep(10)
# Define starting time
start = time.time()
print("Starting to monitor...")
# Monitor the indexing process for all tasks
for task_id in TASK_ID_LIST:
    print("Current Task being monitored: ", task_id)
    monitor_upload_status(task_id)
# Define ending time
end = time.time()
print("Uploading finished")
print("Time elapsed (in seconds): ", end - start)
To ensure that all the required videos have been indexed, let's double-check by listing all the videos present within the index. Additionally, I'm creating a list containing all video IDs and their corresponding names, as this will later be used to fetch the corresponding video name for the video segment, which will be displayed along with the start and end timestamps.
# List all the videos in an index
INDEX_ID='##38d9c4e4225d1c0eb1e8##'
INDEXES_VIDEOS_URL = f"{API_URL}/indexes/{INDEX_ID}/videos"
response = requests.get(INDEXES_VIDEOS_URL, headers=default_header)
response_json = response.json()
pprint(response_json)
video_id_name_list = [{'video_id': video['_id'], 'video_name': video['metadata']['filename']} for video in response_json['data']]
print(video_id_name_list)
Once the system finishes indexing the videos and generating video embeddings, we will be ready to find the topmost semantically matching moments using the search API. We've already explored how to use simple queries in the previous tutorial; here, we will focus on formulating useful combined queries.
The search API enables constructing combined queries using the following operators:
AND: This operator represents the intersection of simple queries. For example, combining two simple queries, "a red car" and "a blue car," with the 'and' operator will fetch all the scenes where both red and blue cars are present.
OR: This operator is used for the union of simple queries. For our running example of two simple queries, "a red car" and "a blue car", combining them with the 'or' operator will fetch all the scenes where either a red or a blue car is present.
NOT: To use this operator, we will need to create a dictionary where the key is the $not string, and the value is a dictionary composed of two queries named origin and sub. The API will return video segments that match the origin query but do not match the sub query. For our existing example, using "a red car" as the origin and "a blue car" as the sub, the system will fetch segments where a red car is present, but a blue car is not. Note that both origin and sub queries can include any number of subqueries.
THEN: This operator can be used by creating a dictionary where the key is the $then string, and the value is an array of objects. Each object represents a subquery. The API will return results only when the order of the matching video fragments corresponds to the order of the subqueries. So, in the case of our example, video segments where a red car is seen followed by a blue car, in that definitive sequence, will be returned (a sketch of these payloads appears right after this list).
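To make these operators more concrete before we run real searches, here is a minimal sketch of what the query portion of the request body could look like for 'or', 'not', and 'then', reusing the red car and blue car example. The field layout simply follows the descriptions above and mirrors the '$and' payloads used later in this tutorial, so treat it as an illustration rather than a copy-and-paste recipe:
# Illustrative sketches only - the "$and" examples later in this tutorial are the ones I actually run.
# OR: union of the two simple queries
or_query = {
    "$or": [
        {"text": "a red car", "search_options": ["visual"]},
        {"text": "a blue car", "search_options": ["visual"]}
    ]
}
# NOT: segments that match the `origin` query but not the `sub` query
not_query = {
    "$not": {
        "origin": {"text": "a red car", "search_options": ["visual"]},
        "sub": {"text": "a blue car", "search_options": ["visual"]}
    }
}
# THEN: a red car followed by a blue car, in that order
then_query = {
    "$then": [
        {"text": "a red car", "search_options": ["visual"]},
        {"text": "a blue car", "search_options": ["visual"]}
    ]
}
# Any of these dictionaries would go under the "query" key of the search request body.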
That was quite a bit of theory; now let's dive into the more exciting application aspect by performing our first search using a combined query. This combined query consists of two simple queries combined together using the 'AND' operator, with each query having different search options. The first query is to look for scenes that are semantically similar to the concept of "winning a trophy" in both audio and visuals, while the second query is to look for scenes containing text or a logo that reads "crypto.com." By combining these queries, we can find video segments that satisfy both criteria simultaneously:
# Construct the URL of the `/search` endpoint
SEARCH_URL = f"{API_URL}/search/"
# Declare a dictionary named `data`
data = {
    "index_id": INDEX_ID,
    "search_options": ["visual"],
    "query": {
        "$and": [
            {
                "text": "winning trophy",
                "search_options": ["visual"]
            },
            {
                "text": "crypto.com",
                "search_options": ["text_in_video"]
            }
        ]
    }
}
# Make a search request
response = requests.post(SEARCH_URL, headers=default_header, json=data)
if response.status_code == 200:
    print(f"Status code: {response.status_code} - Success")
else:
    print(f"Status code: {response.status_code}")
pprint(response.json())
This part exhilarates me as it highlights the presence of intelligence. The model exhibits a human-like understanding of the video content. As you can see in the above screenshot, the system nails it by pinpointing the exact moments I wanted to extract.
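Since we'll build on the raw JSON response in a moment, it helps to know its rough shape: each entry in the response's data array carries at least the ID of the matched video and the start and end offsets of the segment in seconds (real responses also include additional fields). A single entry, trimmed to just the fields we use later and filled with placeholder values, looks roughly like this:
{
    "start": 123.0,            # segment start time, in seconds (placeholder value)
    "end": 131.5,              # segment end time, in seconds (placeholder value)
    "video_id": "<video id>"   # ID of the indexed video the segment belongs to
}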
Let's give it another shot by combining a set of simple queries to search specifically, but this time for the second index containing the entire Tokyo Drift Movie:
# Construct the URL of the `/search` endpoint
SEARCH_URL = f"{API_URL}/search/"
# Declare a dictionary named `data`
data = {
    "index_id": INDEX_ID,  # make sure INDEX_ID now points to the tokyo_drift index before running this search
    "search_options": ["visual"],
    "query": {
        "$and": [
            {
                "text": "drift",
                "search_options": ["visual"]
            },
            {
                "text": "mitsubishi",
                "search_options": ["logo"]
            }
        ]
    }
}
# Make a search request
response = requests.post(SEARCH_URL, headers=default_header, json=data)
if response.status_code == 200:
    print(f"Status code: {response.status_code} - Success")
else:
    print(f"Status code: {response.status_code}")
pprint(response.json())
Bingo! Once again, the system pinpointed the perfect moment spot-on. The scene features Sean (Lucas Black), the lead actor, skillfully drifting a red Mitsubishi.
Let's prepare a Python dictionary that, for each video ID, holds the corresponding title along with the start and end timestamps of the matching segments. We'll pass this to the Flask app in the next step, allowing us to display our search results on a webpage:
response_data = response.json()
# Extract unique video IDs
unique_video_ids = list(set([item['video_id'] for item in response_data['data']]))
# Create empty start and end instances lists for each video ID
video_start_end_lists = {video_id: {'starts': [], 'ends': []} for video_id in unique_video_ids}
def find_video_name(video_id, video_id_name_list):
    for video in video_id_name_list:
        if video['video_id'] == video_id:
            return video['video_name']
    return None
# Append start and end instances to their respective lists
for item in response_data['data']:
    video_id = item['video_id']
    video_start_end_lists[video_id]['starts'].append(item['start'])
    video_start_end_lists[video_id]['ends'].append(item['end'])
for video_id, timestamps in video_start_end_lists.items():
    video_name = find_video_name(video_id, video_id_name_list)
    if video_name:
        timestamps['video_name'] = video_name
    else:
        print(f"No video name found for ID '{video_id}'")
# Print the result
pprint(video_start_end_lists)
To save these results for later use in the Flask app, we can serialize (pickle) them to a file:
import pickle
with open("lists.pkl", "wb") as f:
    pickle.dump(video_start_end_lists, f)
Crafting a demo app
We've arrived at the final step, where we'll leverage the JSON responses we received to efficiently retrieve and display video segments without having to manually identify the start and end points. To achieve this, we'll host a web page that can utilize these timestamps and apply them to the videos retrieved from our local drive. As a result, we will have visually appealing video segments that match our search, all seamlessly displayed on our web page.
from flask import Flask, render_template
import pickle
with open("lists.pkl", "rb") as f:
    video_start_end_lists = pickle.load(f)
app = Flask(__name__)
@app.route("/")
def index():
    return render_template("index_local.html", video_start_end_lists=video_start_end_lists)
if __name__ == "__main__":
    app.run(debug=True)
HTML template
Below is a sample Jinja2-based HTML template that incorporates code within the HTML file to iterate through the list we prepared earlier, fetch the required videos from the local drive, and display the results of our combined query:
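Template layout is a matter of taste, so treat the following as a minimal illustrative sketch: it assumes the video files still sit in the static folder they were uploaded from, that the file is saved as templates/index_local.html, and it relies on the HTML5 media-fragment syntax (#t=start,end) so the browser jumps straight to each matched segment:
<!DOCTYPE html>
<html>
<head>
    <title>Combined query search results</title>
</head>
<body>
    <h1>Search results</h1>
    {% for video_id, info in video_start_end_lists.items() %}
        <h2>{{ info.video_name }}</h2>
        {% for start in info.starts %}
            {% set end = info.ends[loop.index0] %}
            <p>Segment: {{ start }}s to {{ end }}s</p>
            <!-- The #t=start,end media fragment asks the browser to play only this segment -->
            <video width="480" controls preload="metadata"
                   src="{{ url_for('static', filename=info.video_name) }}#t={{ start }},{{ end }}">
            </video>
        {% endfor %}
    {% endfor %}
</body>
</html>
If a video name ever turns out to be missing (the "No video name found" case from the previous step), you may want to guard the inner loop with a {% if info.video_name is defined %} check so the template doesn't try to render a video without a file name.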
Perfect! Let's just run the last cell of our Jupyter notebook to launch our Flask app:
%run app.py
You should see an output similar to the one below, which indicates that everything is going according to our expectations:
Once you click on the URL link http://127.0.0.1:5000, the rendered page will show the matching video segments for your combined search query:
Here's the Jupyter Notebook containing the complete code that we've put together throughout this tutorial - https://tinyurl.com/combinedQueries
Fun activities for you to explore
Experiment with various permutations and combinations of search options and operators, and share your analysis with fellow multimodal enthusiasts on our Discord channel.
Experiment with variations in the wording of the simple queries and verify whether the results remain consistent or differ. Explore how subtle changes in language within the queries, when combined together, affect the search outcome and the accuracy of matching video segments.
Showcase your developer skills by enhancing the code! Develop a mechanism to upload all the videos at once in parallel (concurrently, rather than one at a time in a loop) and modify the code to monitor the indexing process accordingly; a minimal starting-point sketch follows this list.
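If you take on that last activity, here's one minimal starting point. It assumes the variables from the indexing section (video_folder, INDEX_ID, TASKS_URL, default_header) are already defined, skips the duplicate check for brevity, and simply fans the uploads out over a small thread pool:
from concurrent.futures import ThreadPoolExecutor

def create_task(file_name):
    # Reuses the task-creation request from the indexing section above
    # (the duplicate check is omitted here for brevity)
    file_path = os.path.join(video_folder, file_name)
    with open(file_path, "rb") as file_stream:
        data = {"index_id": INDEX_ID, "language": "en"}
        file_param = [("video_file", (file_name, file_stream, "application/octet-stream"))]
        response = requests.post(TASKS_URL, headers=default_header, data=data, files=file_param)
    return response.json().get("_id")

video_files = [f for f in os.listdir(video_folder) if f.endswith('.mp4')]
# Fire off all uploads at once instead of one after another
with ThreadPoolExecutor(max_workers=4) as executor:
    TASK_ID_LIST = [task_id for task_id in executor.map(create_task, video_files) if task_id]
print(f"Created {len(TASK_ID_LIST)} indexing tasks")
The monitoring loop from earlier works unchanged on the resulting TASK_ID_LIST, since it simply polls each task until its status becomes ready.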
Upcoming post
In the upcoming post, we'll dive into the Classification API and will develop classification criteria on the fly to effectively classify a set of videos. Stay tuned for the forthcoming excitement and don't forget to join our Discord community to engage with other like-minded individuals who are passionate about multimodal foundation models.