Introduction
Have you ever wanted to pinpoint specific color shades in a video, perhaps to find a product or a particular moment that features your favorite hues? Recently, I received a personal color consulting service and discovered that berry shades suit me best.
As I combed through my collection of archived YouTube videos, I wished there was an easy way to identify products in those exact shades. Fortunately, with the power of Twelve Labs' image-to-video search technology, I was able to create an app that does just that.
In this tutorial, I'll walk you through how I built this "Shade Finder" app using the Twelve Labs API. Whether you're looking to find the perfect berry-toned lipstick or just curious about spotting specific colors in your videos, this guide will help you leverage cutting-edge AI to do so effortlessly. Let’s dive in!
Before you start, create a Twelve Labs index and upload your videos to it. Once that's done, you're ready to dive into video searching!
The app was built with JavaScript and Node.
The repository containing all the files for this app is available on GitHub.
The structure of the app is straightforward and easy to follow. At a high level, it consists of three main components: index.html, script.js, and server.js.
We'll begin with a quick overview of index.html, then dive into the flow for both the server and client sides, covering how to get videos, retrieve a single video, perform image-based searches, and paginate search results using a page token.
HTML
The index.html file acts as the skeleton of the app, providing the basic structure and layout. The server.js file manages all calls to the Twelve Labs API through the SDK and returns the relevant data to the client. The script.js file holds the client-side logic: it handles user interactions, makes requests to the server, and triggers the search operations.
Below is the body of index.html, which lays out the core components of the app:
An image carousel for displaying query images
A search button to initiate queries
A video list that presents videos from a given index
A search results section that displays the results after the search button is clicked
server.js manages all the calls to the Twelve Labs API. It defines four routes: get/paginate videos, get a single video, image-to-video search, and get search results by page token.
Four Requests for Twelve Labs API
💡 Twelve Labs provides SDKs that let you integrate the platform into your application. This app uses the JavaScript SDK (version 0.2.5).
Setup
1 - Store your Twelve Labs API Key and Index Id in .env
Inside the backend folder, you will find a .env file with the keys commented out. Uncomment them and update the values.
.env
TWELVE_LABS_API_KEY=<YOUR_API_KEY>
TWELVE_LABS_INDEX_ID=<YOUR_INDEX_ID>
2 - Install and Import Twelve Labs SDK
First, install the twelvelabs-js package.
yarn add twelvelabs-js # or npm i twelvelabs-js
Then, import the required packages into your application. I imported them in server.js, which runs on Node.js.
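The original import snippet is not reproduced in this text, so here is a minimal sketch of what the top of server.js could look like. It assumes CommonJS modules, that the values from .env are already present in process.env (for example through a loader such as dotenv), and that twelvelabs-js exports a TwelveLabs class constructed with an apiKey option. The helper names loadConfig and createClient are hypothetical and not taken from the repository.

```javascript
// Hypothetical helper: read and validate the two values stored in .env.
function loadConfig(env = process.env) {
  const apiKey = env.TWELVE_LABS_API_KEY;
  const indexId = env.TWELVE_LABS_INDEX_ID;
  if (!apiKey || !indexId) {
    throw new Error("Set TWELVE_LABS_API_KEY and TWELVE_LABS_INDEX_ID in .env");
  }
  return { apiKey, indexId };
}

// Hypothetical helper: create the SDK client.
// Assumes twelvelabs-js exports a TwelveLabs class that accepts { apiKey }.
function createClient(config) {
  const { TwelveLabs } = require("twelvelabs-js"); // loaded only when called
  return new TwelveLabs({ apiKey: config.apiKey });
}
```

Failing fast on a missing key makes a misconfigured .env show up at startup instead of on the first API request.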
To retrieve videos by page, you can use client.index.video.listPagination and pass in the index id along with the desired page. Optionally, you can include the pageLimit parameter to control how many videos are returned per page.
💡 Tip: Parameters listed in the API documentation (for all requests) can be used in the JavaScript SDK by converting their names to camelCase.
After receiving the videos, I extract the id and metadata for later use and return them along with the pageInfo.
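As a rough sketch of this request, the code below wraps the listPagination call and trims the response. It assumes the response exposes data (an array of videos) and pageInfo, mirroring the search response shown later in this post. The names shapeVideoPage and listVideos are hypothetical, and client is passed in as a parameter.

```javascript
// Keep only the fields the client needs from one page of videos.
// Assumes `page.data` is an array of videos and `page.pageInfo` exists.
function shapeVideoPage(page) {
  const videos = page.data.map((video) => ({
    id: video.id,
    metadata: video.metadata,
  }));
  return { videos, pageInfo: page.pageInfo };
}

// Hypothetical wrapper around the SDK call described above.
// `client` is the Twelve Labs client; `pageLimit` is optional.
async function listVideos(client, indexId, page, pageLimit) {
  const response = await client.index.video.listPagination(indexId, {
    page,
    pageLimit,
  });
  return shapeVideoPage(response);
}
```

An Express route would call listVideos with the page taken from the request and send the result with res.json.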
💡 Check out the Guides for details on paginating videos. The official example code is super helpful as well!
Similar to getting/paginating videos, you can get the details of a single video using client.index.video.retrieve by passing in the index ID and the video ID.
After receiving the video, I extract only the necessary information - metadata, hls, and source - and return them. Specifically, I later use the video title from the metadata, thumbnailUrls from the HLS, and the url from the source.
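A minimal sketch of that extraction, assuming the retrieved video object carries metadata, hls, and source as top-level properties. The helper names shapeVideoDetail and getVideoDetail are hypothetical.

```javascript
// Return only the three parts of the video that the client uses later.
function shapeVideoDetail(video) {
  const { metadata, hls, source } = video;
  return { metadata, hls, source };
}

// Hypothetical wrapper around client.index.video.retrieve.
async function getVideoDetail(client, indexId, videoId) {
  const video = await client.index.video.retrieve(indexId, videoId);
  return shapeVideoDetail(video);
}
```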
Now comes the exciting part! You can perform an image-to-video search using the client.search.query method by passing in four required parameters: indexId, queryMediaFile, queryMediaType, and options.
Passing in queryMediaFile correctly takes a few steps:
Path Construction: First, you need to construct the full path to the image file. In this app, all the images are already stored in the images folder. So this is done using the current directory (__dirname), a relative path which is ../frontend/public/images for this app, and the image filename (imageSrc).
Existence Check: After constructing the path, you check if the image file exists at that location. If the file is not found, a 404 error response is returned to the client.
Read Stream Creation: If the file does exist, a readable stream is created from the image file. This stream is then passed to the Twelve Labs API, which avoids reading the whole file into memory first.
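The three steps above can be sketched with Node's built-in fs and path modules. The function name openQueryImage is hypothetical, and the path.basename call is an extra precaution that the post does not describe.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Resolve an image filename inside the images folder and open it as a stream.
// Returns null when the file does not exist, so the route can answer with 404.
function openQueryImage(imagesDir, imageSrc) {
  // basename() drops any directory parts from the requested name.
  // This is an added precaution, not something the post describes.
  const imagePath = path.join(imagesDir, path.basename(imageSrc));
  if (!fs.existsSync(imagePath)) {
    return null;
  }
  return fs.createReadStream(imagePath);
}

// In this app, imagesDir would be:
//   path.join(__dirname, "../frontend/public/images")
```

In the route handler, a null return value would translate into a 404 response, and a stream would be handed to the search call as queryMediaFile.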
In this app, I also included optional parameters such as threshold, pageLimit, and adjustConfidenceLevel. You can find the full list of parameters in the API documentation.
After receiving the search response, I extract the data and pageInfo for later use and return them to the client.
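Here is a hedged sketch of how the search call could be assembled. The four required keys and the three optional ones come from the description above. The concrete values ("image", ["visual"], "medium", 12, 0.6) are illustrative assumptions, so check the API documentation for the accepted values. buildImageSearchParams and searchByImageStream are hypothetical names.

```javascript
// Build the argument object for client.search.query.
// Key names follow the post; the values are illustrative assumptions.
function buildImageSearchParams(indexId, imageStream, overrides = {}) {
  return {
    indexId,
    queryMediaFile: imageStream,
    queryMediaType: "image",
    options: ["visual"],
    threshold: "medium",
    pageLimit: 12,
    adjustConfidenceLevel: 0.6,
    ...overrides, // lets a caller change any of the defaults above
  };
}

// Hypothetical wrapper: run the search and return what the client needs.
async function searchByImageStream(client, indexId, imageStream) {
  const response = await client.search.query(
    buildImageSearchParams(indexId, imageStream)
  );
  return { searchResults: response.data, pageInfo: response.pageInfo };
}
```

Keeping the parameter object in its own function makes it easy to tune the optional settings in one place.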
💡 Be sure to check out the Guides for more details on image query search. The official example code is very helpful as well!
Searching by page token is pretty straightforward. You can use client.search.byPageToken, passing in the pageToken you obtained from the previous request (Request 3). The response has the same shape as the one returned by the initial search request.
/** Get search results of a specific page */
app.get(
  "/search/:pageToken",
  asyncHandler(async (req, res, next) => {
    const { pageToken } = req.params;
    const searchByPageResponse = await client.search.byPageToken(pageToken);
    res.json({
      searchResults: searchByPageResponse.data,
      pageInfo: searchByPageResponse.pageInfo,
    });
  })
);
Client
With the server fully set up, including all the necessary API endpoints, we can now focus on the client-side code. This part of the application is responsible for making requests to the server and handling the data received.
Following the flow established on the server side, we’ll first explore how the app initially displays videos from an index. Afterward, we’ll dive into the functionality for searching videos and paginating the search results.
1 - Showing videos of an index
showVideos function
One of the first functions that run when the page renders is showVideos.
getVideos makes a request to the server to fetch videos for the specified page (as implemented in Request 1 of the Server section).
If videos are found, the function loops over each video, calling getVideo to fetch the details (implemented in Request 2 of the Server section).
The details are then cached using a simple caching mechanism to optimize subsequent requests.
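The post does not show the cache itself. One simple way to get this behavior is to memoize the request per video ID, as sketched below. The name createVideoCache is hypothetical, and fetchVideo stands for whatever function requests a single video from the server.

```javascript
// Wrap a fetch function so each video id is requested at most once.
// The promise itself is cached, so concurrent calls share one request.
function createVideoCache(fetchVideo) {
  const cache = new Map();
  return function getCachedVideo(videoId) {
    if (!cache.has(videoId)) {
      const pending = fetchVideo(videoId).catch((error) => {
        cache.delete(videoId); // do not keep failed requests in the cache
        throw error;
      });
      cache.set(videoId, pending);
    }
    return cache.get(videoId);
  };
}
```

With this in place, paging back to a list of videos that was already shown would reuse the stored details instead of calling the server again.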
Creating Video Containers
Once showVideos has the details for each video, it creates a video container based on these details, including the video URL, thumbnail URL, and video title.
Pagination Buttons
Finally, pagination buttons are created based on the total number of pages obtained from getVideos. Each button is set up with an event listener that calls showVideos for the respective page.
2 - Searching videos by image
Once a user clicks the “Search” button, the handleSearchButtonClick function runs and sends the search request to the server. Let’s take a look at how it works step by step.
First, the search button is disabled for the duration of the search.
Next, the function resets nextPageToken to null and clears any existing content in the searchResultContainer and searchResultList. It also hides the videoListContainer and ensures the searchResultContainer is visible.
A loading spinner is then created and displayed to indicate that the search is in progress.
The search is executed by calling searchByImage, which makes the search request to the server.
After the search is complete, the loading spinner is removed, and the search results are displayed using showSearchResults, or a message is shown if no results are found.
Finally, the search button is re-enabled, allowing the user to perform another search if desired.
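The flow above can be sketched without any DOM code by passing the page-specific operations in as callbacks. Everything named here (runSearch, the ui object, and its methods) is hypothetical. The try/finally block is a design choice of this sketch so that the spinner is removed and the button is re-enabled even when the request fails. It also assumes pageInfo carries a nextPageToken field.

```javascript
// Sketch of the click flow with the page-specific pieces passed in.
// `ui` is a hypothetical object bundling the DOM operations listed above;
// `searchByImage` is expected to resolve to { searchResults, pageInfo }.
async function runSearch(ui, searchByImage) {
  ui.setSearchEnabled(false); // disable the button while searching
  ui.resetResults(); // clear old results and the stored page token
  ui.showSpinner();
  try {
    const { searchResults, pageInfo } = await searchByImage();
    if (searchResults.length > 0) {
      ui.showSearchResults(searchResults);
    } else {
      ui.showMessage("No results found");
    }
    // Assumption: pageInfo carries nextPageToken when more pages exist.
    return pageInfo?.nextPageToken ?? null;
  } finally {
    // Runs on success and on failure, so the button never stays disabled.
    ui.hideSpinner();
    ui.setSearchEnabled(true);
  }
}
```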
3 - Show Search Results by Page Token
If there is more than one page of search results (i.e., if there is a nextPageToken), the createShowMoreButton function runs and shows the user a “Show More” button. This button retrieves and displays the next page of search results. Let's break down how it works, step by step.
First, any existing "Show More" button is removed to prevent duplicate buttons.
Next, the function creates a container and the "Show More" button to be displayed.
An event listener is added to the button to handle clicks: it shows a loading spinner, retrieves the next page of search results, removes the spinner, displays the new results, and updates the nextPageToken.
Finally, the "Show More" button and its container are appended to the searchResultContainer, making it visible to the user.
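The token handling behind the button's click handler can be sketched as follows. loadNextPage and the state object are hypothetical names, and fetchPage stands for a function that calls the server's /search/:pageToken route and resolves to the same { searchResults, pageInfo } shape as the initial search.

```javascript
// Fetch the next page of results and advance the stored token.
// `state.nextPageToken` is null when there are no more pages.
async function loadNextPage(state, fetchPage) {
  if (!state.nextPageToken) {
    return []; // nothing left to load
  }
  const { searchResults, pageInfo } = await fetchPage(state.nextPageToken);
  // Assumption: pageInfo.nextPageToken is absent or null on the last page.
  state.nextPageToken = pageInfo?.nextPageToken ?? null;
  return searchResults;
}
```

After each call, the caller can check state.nextPageToken to decide whether to show the "Show More" button again.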
Conclusion
I hope this post has offered insights into Twelve Labs' recent image-to-video search API and its practical applications. Thank you for reading, and I look forward to seeing how you leverage these advancements in your own projects!