Tutorial
Building a Video Multilingual Transcriber App with Twelve Labs


Hrishikesh Yadav
The MultiLingual Video Transcriber leverages the power of Twelve Labs models to automatically transcribe videos in multiple languages. Whether you're a content creator, educator, or professional working with international audiences, this tool will save time and improve accessibility by accurately converting spoken language into text across various languages.


Nov 19, 2024
10 Min
Introduction
Are you struggling to understand video content in different languages? Or perhaps finding it difficult to make your content accessible to a global audience? 🌍
In this tutorial, we'll introduce you to the MultiLingual Video Transcriber Application and explain how it was developed as a solution. This application uses AI models from Twelve Labs to understand videos and provide seamless transcription across multiple languages.
What sets this program apart is its ability to adjust transcriptions based on user-selected proficiency levels—beginner, intermediate, and advanced. Users get transcriptions or translations tailored to their chosen level. Additionally, the application provides accurate timestamps, allowing users to track spoken words with their transcriptions. This feature enables easy navigation and comprehension of the content. Let's explore how this application works and how you can build similar solutions using the TwelveLabs Python SDK.
You can explore the demo of the application here: Video Multilingual Transcriber
If you want to access the code and experiment with the app directly, you can use this Replit Template.
Prerequisites
Generate an API key by signing up at the Twelve Labs Playground.
Find the repository for this application here: Video Multilingual Transcriber.
You should already be familiar with Flask, HTML, CSS, and JavaScript.
Working of the Application
This section outlines the application flow for developing the MultiLingual Video Transcriber Application. The process involves obtaining a video and generating transcriptions in various languages and proficiency levels based on user preferences. This application goes beyond simple transcription, offering a more comprehensive solution.
The system architecture comprises four main components: the Frontend Layer, Backend Layer, Storage Layer, and TwelveLabs Service. Here's how it works: users upload a video (potentially in a foreign language) and select their desired transcript language and proficiency level (beginner, intermediate, or advanced).

Upon clicking the submit button after video upload, an Index ID is generated and stored in the session state for future use. Next, the system creates a Task ID by uploading the video, which then yields a Video ID once indexing is complete. The indexing process utilizes Marengo 2.6 as the embedding engine, while the Pegasus 1.1 Engine (Generative Engine) handles transcript generation through the Generate API.
To enhance user accessibility, the application generates timestamps alongside the transcript. This feature allows for interactive synchronization between the transcript and the video playback.
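To make this flow concrete before building the full app, here is a minimal sketch of the same sequence using the TwelveLabs Python SDK. The engine names and SDK calls mirror the ones used later in this tutorial; the index name, file name, and prompt are placeholders.

# Minimal sketch of the indexing-and-generation flow described above.
import os

from dotenv import load_dotenv
from twelvelabs import TwelveLabs

load_dotenv()
client = TwelveLabs(api_key=os.getenv("API_KEY"))

# 1. Create an index backed by Marengo 2.6 (embedding) and Pegasus 1.1 (generative)
index = client.index.create(
    name="transcriber-demo",
    engines=[
        {"name": "pegasus1.1", "options": ["visual", "conversation"]},
        {"name": "marengo2.6", "options": ["visual", "conversation"]},
    ],
)

# 2. Upload the video; once indexing finishes, the task exposes a video_id
task = client.task.create(index_id=index.id, file="sample_video.mp4")
task.wait_for_done(sleep_interval=5)
if task.status != "ready":
    raise RuntimeError(f"Indexing failed with status {task.status}")

# 3. Generate a timestamped transcript in the target language
res = client.generate.text(
    video_id=task.video_id,
    prompt="Provide a French transcript of the video with timestamps.",
)
print(res.data)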
Preparation Steps
Obtain your API Key from the Twelve Labs Playground and prepare the environment variable.
Clone the project from GitHub or use the Replit Template.
Set up the .env file, alongside the main application file, with your API Key:
API_KEY=your_api_key_here
With these steps completed, you're now ready to dive in and develop the application!
Walkthrough for VidScribe - Video Multilingual Transcriber
In this tutorial, we'll build a Flask application with a minimal frontend. Here's the directory structure:
.
├── app.py
├── requirements.txt
├── static
│   ├── style.css
│   └── main.js
├── templates
│   └── index.html
└── uploads
Creating the Flask Application
Now that you've completed the previous steps, it's time to build the Flask application. This will provide a simple way for you to upload videos and generate transcripts in various languages based on user preferences.
You can find the required dependencies here: requirements.txt
Create a Python virtual environment, then install the dependencies into it with the following command:
pip install -r requirements.txt
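For reference, the dependency list likely includes at least the following packages, inferred from the imports used in this tutorial; check the linked requirements.txt for the exact pinned versions.

flask
twelvelabs
python-dotenv
werkzeug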
1 - Setting up the Main Application
This section focuses on the main application utility function, which contains the crucial logic and instruction flow. We'll break down the main application into various sections:
Creation of the Index
Generation of the Result
Uploading Component
1.1 - Creation of the Index
Here, we'll discuss how the index is configured using the Twelve Labs SDK. This Flask application allows users to upload and process videos for various purposes. It employs a secure filename handling system and session management to ensure reliable operation in a production environment.
# Importing the necessary modules
from flask import Flask, render_template, request, jsonify, send_from_directory, session
from werkzeug.utils import secure_filename
import os
import uuid
from twelvelabs import TwelveLabs
from twelvelabs.models.task import Task
from dotenv import load_dotenv

load_dotenv()

# Loading the Twelve Labs API key from the environment
API_KEY = os.getenv("API_KEY")

app = Flask(__name__)
app.secret_key = os.urandom(24)

UPLOAD_FOLDER = 'uploads'
ALLOWED_EXTENSIONS = {'mp4', 'avi', 'mov'}
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024

# Initializing the Twelve Labs SDK client
client = TwelveLabs(api_key=API_KEY)

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/')
def index():
    return render_template('index.html')

# Utility function to check the indexing status
def on_task_update(task: Task):
    print(f"Status={task.status}")

def process_video(filepath, language, difficulty):
    try:
        if 'index_id' not in session:
            # Index name
            index_name = f"Translate{uuid.uuid4().hex[:8]}"
            # Defining the engines
            engines = [
                {
                    "name": "pegasus1.1",
                    "options": ["visual", "conversation"]
                },
                {
                    "name": "marengo2.6",
                    "options": ["visual", "conversation", "text_in_video", "logo"]
                }
            ]
            # Creation of the index with the config
            index = client.index.create(
                name=index_name,
                engines=engines
            )
            # Storing the Index ID in the session to reuse it later
            session['index_id'] = index.id
            print(f"Created new index with ID: {index.id}")
        else:
            print(f"Using existing index with ID: {session['index_id']}")

        # Creation of the indexing task
        task = client.task.create(index_id=session['index_id'], file=filepath)
        task.wait_for_done(sleep_interval=5, callback=on_task_update)
        if task.status != "ready":
            raise RuntimeError(f"Indexing failed with status {task.status}")
        print(f"The unique identifier of your video is {task.video_id}.")
We manage the session state to handle the creation of the Index ID robustly. A random UUID-based suffix is appended to the index name, ensuring a unique name for each new session. The new Index ID is saved in the session for future reference.
The index configuration defines two engines: Marengo 2.6 (Embedding Engine) for indexing the video, and Pegasus 1.1 (Generative Engine) for accessing the indexed video and generating content based on open-ended prompts or other parameters.
To create a task, we provide the Index ID and video file path. We can track the task's processing status using task.status. Once indexing is complete, the resulting Video ID is used in the next step.
1.2 - Generation of the Result
This section covers the generation of text from the indexed video based on user-selected difficulty levels and language preferences. Our prompt engineering system adapts the video transcript's complexity and detail level to match the user's comprehension level while maintaining accurate timestamps and translations.
        # Prompt templates keyed by difficulty level
        difficulty_prompts = {
            "beginner": "Provide a simplified and easy-to-understand",
            "intermediate": "Provide a moderately detailed",
            "advanced": "Provide a comprehensive and detailed"
        }
        # Pick the template matching the difficulty level from the user form
        base_prompt = difficulty_prompts.get(difficulty, difficulty_prompts["intermediate"])

        # Open-ended prompt for generation
        prompt = f"Provide the Only Transcript in the Translated {language.capitalize()} Language, {base_prompt} level with the timestamp duration (in the format of ss : ss) of the Indexed Video Content."

        res = client.generate.text(video_id=task.video_id, prompt=prompt, temperature=0.25)
        print(res)

        return {
            'status': 'ready',
            'message': 'File processed successfully',
            'transcript': res.data,
            'video_path': f'/uploads/{os.path.basename(filepath)}'
        }
    except Exception as e:
        print(f"Error processing video: {str(e)}")
        return {'status': 'error', 'message': str(e)}
We use a dictionary-based approach to map difficulty levels to appropriate prompt templates, with a default fallback to the intermediate level. The system maintains a low temperature value of 0.25 to ensure consistent and reliable output generation. The response includes the processed transcript with timestamps.
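The exact wording depends on the video and prompt, but the data field of a successful response is expected to contain timestamped lines roughly in the shape below. These are purely illustrative placeholders, not actual model output; this is the format the front-end parser in section 2.2 looks for.

00:00 - 00:07 : "First translated sentence of the video."
00:07 - 00:15 : "Second translated sentence of the video."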
1.3 - Uploading Component
In this section, we'll explore how we generate and manage secure file upload endpoints and handle user file submissions in our Flask application. This component manages the file upload process, validates file types, handles language and difficulty preferences, and ensures secure storage of uploaded videos while maintaining proper error handling throughout the workflow.
# Route to handle file uploads - only accepts POST requests
@app.route('/upload', methods=['POST'])
def upload_file():
    # Check if a file was included in the request
    if 'file' not in request.files:
        return jsonify({'status': 'error', 'message': 'No file part'}), 400

    # Get the file from the request
    file = request.files['file']
    # Extract language and difficulty preferences from form data, with defaults
    language = request.form.get('language', 'german')
    difficulty = request.form.get('difficulty', 'intermediate')

    # Validate that a file was actually selected
    if file.filename == '':
        return jsonify({'status': 'error', 'message': 'No selected file'}), 400

    # Check if the file extension is allowed and proceed with processing
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(filepath)

        # Process the video with the specified language and difficulty
        result = process_video(filepath, language, difficulty)
        # Return the processing result as JSON
        return jsonify(result)

    # Return error if the file type is not allowed
    return jsonify({'status': 'error', 'message': 'File type not allowed'}), 400

# To serve uploaded files
@app.route('/uploads/<filename>')
def uploaded_file(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)
We provide an upload endpoint that processes incoming video files and their associated metadata. The upload route uses Werkzeug's secure_filename utility for safe filename handling and implements robust error checking with appropriate HTTP status codes. If no language or difficulty preference is set, the system defaults to German and intermediate difficulty. The file serving route allows secure access to uploaded files from a designated directory, ensuring proper access control and file system security.
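Once the server is running, you can exercise the endpoint directly. A quick way to test it, assuming the app runs locally on Flask's default port 5000 and using a hypothetical sample_video.mp4, is:

curl -X POST http://localhost:5000/upload \
  -F "file=@sample_video.mp4" \
  -F "language=french" \
  -F "difficulty=beginner"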
2 - Handling the Components with JavaScript
This section explores the key utility functions in the application's JavaScript file, which manages file uploads, form submissions, video playback synchronization with timestamped transcripts, error handling, and other application behavior. We'll break it down into two sections:
File Upload and Validation
Transcript Processing
2.1 - File Upload and Validation
This section provides an overview of a comprehensive client-side file validation and submission system. It implements essential functionality for validating video files, handling file removals, and managing asynchronous form submissions throughout the upload and processing workflow.
// Validates the uploaded file
function validateFile(file) {
    const validTypes = ['video/mp4', 'video/avi', 'video/quicktime'];
    const maxSize = 100 * 1024 * 1024; // 100MB

    if (!validTypes.includes(file.type)) {
        updateStatus('Please select a valid video file (MP4, AVI, or MOV)', 'error');
        return false;
    }
    if (file.size > maxSize) {
        updateStatus('File size must be less than 100MB', 'error');
        return false;
    }
    return true;
}

// Utility function to handle the removal of a selected file
function handleFileRemove(e) {
    e.preventDefault();
    fileInput.value = '';
    selectedFile.classList.add('hidden');
    uploadPrompt.textContent = 'Choose a video or drag it here';
    updateStatus('', '');
}

// Form submission handling
async function handleFormSubmit(e) {
    e.preventDefault();
    const formData = new FormData(e.target);

    if (!fileInput.files || !fileInput.files[0]) {
        updateStatus('Please select a file first', 'error');
        return;
    }

    const loadingOverlay = document.getElementById('loading-overlay');
    loadingOverlay.classList.remove('hidden');
    updateStatus('Uploading file and processing...', 'loading');

    // Hide the previous results
    hideResult();

    try {
        // Send POST request to the upload endpoint with the form data
        const response = await fetch('/upload', {
            method: 'POST',
            body: formData
        });

        // Parse the JSON response
        const data = await response.json();
        console.log('Server response:', data);

        // Check if upload and processing were successful
        if (response.ok && data.status === 'ready') {
            // Update status and display results
            updateStatus('Processing complete!', 'success');
            showResult();
            displayTranscript(data.transcript);
            displayVideo(data.video_path);
        } else {
            // Throw an error if processing failed
            throw new Error(data.message || 'An error occurred during processing');
        }
    } catch (error) {
        // Log and display errors
        console.error('Error:', error);
        updateStatus(`Error: ${error.message}`, 'error');
    } finally {
        // Hide the loading overlay regardless of outcome
        loadingOverlay.classList.add('hidden');
    }
}
The validateFile function ensures only supported video formats and file sizes are accepted, while handleFileRemove resets the file selection state upon removal. The handleFormSubmit function handles asynchronous uploads, managing errors, loading states, and user feedback comprehensively.
The process features appropriate UI updates—including loading overlays, success messages, and error notifications—at various stages. To maintain a smooth user experience, the implementation ensures consistent error handling and validation throughout the uploading and processing phases.
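Note that the snippet above relies on a few small helpers (updateStatus, showResult, hideResult, and displayVideo) that live elsewhere in the JavaScript file. A minimal sketch of what they might look like is shown below; the element IDs ('status', 'result-section', 'video-player') are assumptions for illustration, not taken from the actual markup.

// Minimal sketches of the helper functions referenced above.
// The element IDs ('status', 'result-section', 'video-player') are assumptions.
function updateStatus(message, type) {
    const statusEl = document.getElementById('status');
    statusEl.textContent = message;
    statusEl.className = type ? `status ${type}` : 'status';
}

function showResult() {
    document.getElementById('result-section').classList.remove('hidden');
}

function hideResult() {
    document.getElementById('result-section').classList.add('hidden');
}

function displayVideo(videoPath) {
    const player = document.getElementById('video-player');
    player.src = videoPath;
    player.load();
}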
2.2 - Transcript Parsing Utility for the Open-Ended Prompt Result
This section focuses on developing a robust JavaScript transcript parsing system. The code handles various timestamp formats and text patterns common in video transcripts, ensuring reliable extraction and structuring of time-coded content.
The parser converts raw transcript data into a format suitable for display and synchronization with video playback. It also manages multiple edge cases and format variations that may arise from the transcription process.
// Parses a transcript string into structured data with timestamps and text
function parseTranscript(transcript) {
    console.log('Raw transcript:', transcript);

    // Initialize a Map to store unique entries (prevents duplicates)
    const entries = new Map();

    if (!transcript) {
        console.error('Empty transcript received');
        return [];
    }

    // Extract data from the JSON response and handle escapes
    let transcriptText = transcript;
    try {
        if (typeof transcript === 'string' && (transcript.includes('"id":') || transcript.includes("'id':"))) {
            const dataMatch = transcript.match(/['"]data['"]\s*:\s*['"]([^]+?)['"]\s*$/);
            if (dataMatch && dataMatch[1]) {
                transcriptText = dataMatch[1]
                    .replace(/\\n/g, '\n')
                    .replace(/\\'/g, "'")
                    .replace(/\\"/g, '"')
                    .replace(/\\\\/g, '\\');
            }
        }
    } catch (e) {
        console.error('Error parsing JSON response:', e);
    }

    // Different timestamp patterns
    const patterns = [
        // MM:SS - MM:SS : "text"
        /(\d{2}):(\d{2})\s*-\s*(\d{2}):(\d{2})\s*:\s*["']([^"']+)["']/g,
        // MM:SS - MM:SS: text
        /(\d{2}):(\d{2})\s*-\s*(\d{2}):(\d{2})\s*:\s*([^"\n]+)/g,
        // Simple format with quotes
        /(\d{2}):(\d{2})\s*-\s*(\d{2}):(\d{2})\s*["']([^"']+)["']/g
    ];

    // Try each pattern against the transcript text for parsing
    for (const pattern of patterns) {
        let match;
        while ((match = pattern.exec(transcriptText)) !== null) {
            try {
                const [_, startMin, startSec, endMin, endSec, text] = match;

                // Convert timestamps to seconds
                const startTime = parseInt(startMin) * 60 + parseInt(startSec);
                const endTime = parseInt(endMin) * 60 + parseInt(endSec);

                // Skip invalid timestamps
                if (isNaN(startTime) || isNaN(endTime)) continue;

                // Clean up the transcript text, if needed
                const cleanText = text
                    .replace(/^["'\s]+|["'\s]+$/g, '')
                    .replace(/\\n/g, ' ')
                    .replace(/\*\*/g, '')
                    .replace(/\\'/g, "'")
                    .replace(/\\"/g, '"')
                    .replace(/\\\\/g, '\\')
                    .replace(/\s+/g, ' ')
                    .trim();

                if (cleanText && !cleanText.includes('Note:')) {
                    const key = `${startTime}-${cleanText.substring(0, 50)}`;
                    entries.set(key, { start: startTime, end: endTime, text: cleanText });
                }
            } catch (e) {
                console.error('Error processing match:', e);
                continue;
            }
        }
    }

    // Convert the Map to an Array and sort by start time
    const sortedEntries = Array.from(entries.values())
        .sort((a, b) => a.start - b.start);

    // Log the processed result
    console.log('Parsed entries:', sortedEntries);
    return sortedEntries;
}
For consistent processing, the system employs regular expressions to match different timestamp patterns, converts time markers to seconds, and performs comprehensive text cleaning to remove unwanted artifacts. A Map structure prevents duplicate entries and maintains chronological order. Detailed logging is implemented throughout the parsing process for debugging purposes.
The final output is a list of transcript entries containing start time, end time, and cleaned text content, ready for integration with video players and transcript display components.
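To tie the parser back to the player, a displayTranscript implementation could render the parsed entries and keep the active line highlighted as the video plays. The sketch below shows one way to do this, again assuming hypothetical 'video-player' and 'transcript' element IDs; the actual implementation in the repository may differ.

// Sketch: render parsed entries and keep the active line in sync with playback.
// Assumes a <video id="video-player"> and a <div id="transcript"> container.
function displayTranscript(rawTranscript) {
    const entries = parseTranscript(rawTranscript);
    const container = document.getElementById('transcript');
    const video = document.getElementById('video-player');
    container.innerHTML = '';

    entries.forEach((entry) => {
        const line = document.createElement('p');
        line.className = 'transcript-line';
        line.textContent = `${formatTime(entry.start)} - ${formatTime(entry.end)}  ${entry.text}`;
        // Clicking a line seeks the video to that entry's start time
        line.addEventListener('click', () => {
            video.currentTime = entry.start;
        });
        container.appendChild(line);
    });

    // Highlight the entry that matches the current playback time
    video.addEventListener('timeupdate', () => {
        const t = video.currentTime;
        container.querySelectorAll('.transcript-line').forEach((line, i) => {
            const e = entries[i];
            line.classList.toggle('active', t >= e.start && t < e.end);
        });
    });
}

// Formats a number of seconds as MM:SS
function formatTime(totalSeconds) {
    const minutes = Math.floor(totalSeconds / 60);
    const seconds = Math.floor(totalSeconds % 60);
    return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}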
You can find the complete version of the JavaScript file containing the code discussed above in this app.js file.
Demo Application
First, the user selects the target language for translation and the desired difficulty level. Then, they upload the video. Once the upload is complete, the indexing process begins.

After indexing and task creation are finished, the video ID is used to generate the transcription based on the user's preferences. The demo below showcases this generation process using the Twelve Labs SDK.

To explore Twelve Labs further, try building video-based applications for other sectors such as content creation, education, or any other area that interests you.
More Ideas to Experiment with the Tutorial
Understanding how multilingual video transcription applications work and how they're developed allows you to implement innovative ideas and create products that serve a wide variety of video use cases. Here are some use cases that could benefit users working with video content:
🌍 Global Content Creators: Generate transcriptions in multiple languages simultaneously, enabling instant content localization for video content.
🎓 International Education: Make educational content accessible by automatically transcribing lectures into various languages and difficulty levels. This can also be leveraged for language learning from video content where resources are limited.
💼 Cross-Cultural Business: Facilitate communication in multinational settings by generating meeting transcripts.
Conclusion
Thank you for following along with this tutorial on the development and functionality of the Video Multilingual Transcriber application with Twelve Labs. We hope this guide sparks ideas for combining video understanding with your own application workflows. We welcome your ideas on how to enhance the user experience and address any challenges.
Additional Resources
Learn more about the engines used in this tutorial: Marengo 2.6 (Embedding Engine) and Pegasus 1.1 (Generative Engine). To further explore Twelve Labs and enhance your understanding of video content analysis, check out these valuable resources:
Discord Community: Join our vibrant community of developers and enthusiasts to discuss ideas, ask questions, and share your projects. Join the Twelve Labs Discord
Sample Applications: Explore a variety of sample applications to inspire your next project or learn new implementation techniques.
Explore Tutorials: Dive deeper into Twelve Labs capabilities with our comprehensive tutorials.
We encourage you to leverage these resources to expand your knowledge and create innovative applications using Twelve Labs video understanding technology.