Artificial intelligence (AI) workflows of all kinds typically require massive amounts of data—often unstructured data such as video and images—to adequately train models and generate accurate insights. But getting all that data into a cloud bucket or other storage, so you can run AI workflows on it, can be challenging:
- Adding large datasets to Amazon S3, for example, can be a technical process involving multipart upload using a command line interface (CLI) tool.
- Natively uploading large datasets to S3 is often slow and unreliable.
These issues can significantly impact time-to-market for AI companies hoping to reap a competitive advantage.
MASV, however, simplifies data ingestion for AI processing, which shortens the time AI companies need to operationalize and monetize their models. One such company is Twelve Labs, which recently presented a custom AI workflow alongside MASV using S3 object storage.
Note: This workflow was presented by MASV CTO Majed Alhajry and Twelve Labs Head of Growth Maninder Saini at IBC 2024. Here’s a link to the video of their presentation.
Easily Ingest Big Datasets for AI Workloads
Enjoy secure, reliable, and fast transfer of massive 4K, 8K, or 12K video files and other big datasets with MASV.
What Can AI Workflows Do for Video?
At this point, explaining all the benefits of AI tools in video production and other applications may be a waste of time—most of us have heard much of it before. But the technology moves so fast that I’ll go ahead and explain it anyway. When it comes to video productions, AI can:
- Perform complex analysis on footage to generate transcriptions, tagging, and other context, such as Twelve Labs’ video understanding and Strada AI video search.
- Generate realistic original video, as HeyGen does: it can produce studio-quality video in more than 170 languages using generative AI.
- Perform text-to-speech, audio cleanup, and AI-based dubbing into other languages, such as what’s offered by companies like ElevenLabs.
- Speed up post-production and repetitive tasks using AI automation, such as iZotope Neutron for audio mixing and Runway for masking.
- Extend beyond professional video production: Companies like SpatialData.ai use optical sensor data and machine learning to better assess the health and risk of critical assets.
But at the end of the day, all these applications face a similar challenge: Corralling massive amounts of data and getting it into storage for AI automation, model training, and execution.
Read more: Scaling Strada AI Video Search With MASV
The Hidden Challenge of AI Workflows: Data Ingest
“I have the model—I just need to get the data into the model.”
Sounds simple, right? But how many times has anyone who regularly works with or develops complex AI algorithms heard the above?
The answer: Plenty. That’s because one of the biggest challenges of working with AI remains getting the massive amounts of data needed to train and run these models into one place, such as an S3 bucket.
- Uploading large datasets to S3 is typically a time-consuming process that’s hindered by strict file size limitations and often requires technical expertise, such as multipart upload via CLI.
- Many third-party applications designed to get data to the cloud, like Cyberduck, rely on protocols such as the file transfer protocol (FTP), which is slow, insecure, and much more technical to use than managed file transfer.
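To give a sense of why multipart upload gets technical, here is a minimal Python sketch that plans the part size and part count for a large file. The limits in the constants (5 MiB minimum part, 5 GiB maximum part, 10,000 parts, 5 TiB object) are the figures AWS has documented for S3 multipart upload; treat them as assumptions and check the current AWS documentation. The function name `plan_multipart_upload` and the 64 MiB default are our own illustrative choices, not part of any AWS or MASV tool.

```python
# Amazon S3 multipart upload limits (assumed from AWS documentation; verify before use).
MIN_PART_SIZE = 5 * 1024**2      # 5 MiB minimum for every part except the last
MAX_PART_SIZE = 5 * 1024**3      # 5 GiB maximum per part
MAX_PARTS = 10_000               # maximum number of parts in one upload
MAX_OBJECT_SIZE = 5 * 1024**4    # 5 TiB maximum object size


def plan_multipart_upload(file_size: int, preferred_part_size: int = 64 * 1024**2):
    """Return (part_size, part_count) in bytes/parts for a file of file_size bytes.

    Hypothetical helper for illustration only.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if file_size > MAX_OBJECT_SIZE:
        raise ValueError("file exceeds the assumed 5 TiB object limit")

    # Start from the preferred size, clamped to the allowed range.
    part_size = min(max(preferred_part_size, MIN_PART_SIZE), MAX_PART_SIZE)

    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    smallest_allowed = (file_size + MAX_PARTS - 1) // MAX_PARTS  # integer ceiling
    part_size = max(part_size, smallest_allowed)

    part_count = (file_size + part_size - 1) // part_size  # integer ceiling
    return part_size, part_count


# A 1 GiB file with the default 64 MiB parts needs 16 parts.
print(plan_multipart_upload(1024**3))
```

Even this planning step, before any bytes move, is something a non-technical contributor should not have to think about.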
To run AI workloads and similar jobs requiring cloud compute, users need to get data from the right side of the above diagram (machines, people, and apps) into services provided by cloud services providers on the left.
This inherently creates challenges in terms of accounts that must be created, keys that must be generated, and access that needs to be managed for multiple users—along with the performance, reliability, and security issues mentioned above.
But putting MASV in the middle of this workflow provides a single point of access to all these services from all your people, apps, and machines.
And this is the crux of the conversation: How do AI companies empower their users to get big datasets into cloud storage quickly, securely, and without running into technical issues, so they can run AI algorithms and generative AI on the data without backbreaking delays?
The MASV-S3-Twelve Labs Model Workflow
MASV’s simple yet powerful file transfer technology helps solve this problem, which is why other AI firms such as ElevenLabs, SpatialData.ai, and HeyGen also use MASV to get data to the cloud.
This AI workflow automation can save significant time and frustration, and it removes much of the friction your users face when uploading large datasets to cloud storage.
💡 Note: This workflow can be replicated with other AI tools that have an API and other cloud storage platforms integrated with MASV. The full list of MASV integrations can be found here.
Tools we used
- MASV Portals. A web-based, customizable MASV Portal file uploader requires no software or plugin installations and provides global file transfer acceleration and relentless reliability. It also helps get data into the cloud quickly to run AI workloads.
- MASV integrations and automations. No-code integrations and file-transfer automations can be configured in minutes to automatically ingest files to cloud storage, kind of like an AI workflow automation tool.
- Amazon S3. A highly scalable cloud object storage service offering high availability and performance that’s used by many AI companies, along with AWS Lambda serverless functions.
- Twelve Labs multimodal video understanding AI capabilities. Can analyze terabytes or petabytes of video for AI search, classification, and other functions, eliminating repetitive tasks associated with manual search.
Here’s a step-by-step guide to our automated workflow:
- Sign up for a MASV account. It’s free.
- Connect S3 to your account via MASV no-code integrations.
- Set up a custom or instant MASV Portal: Configure your Portal’s name and notification settings, and set up an automated Portal download to your integrated cloud storage.
- You can also automate file uploads to your MASV Portal via MASV Watch Folders.
- You can also set up custom file upload workflows via the MASV Transfer Agent or API, but that’s beyond the scope of this article.
- Create a Lambda function that uses the Twelve Labs API and is triggered upon object creation in your cloud storage.
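For the last step, here is a rough sketch of what the receiving end of that Lambda function might look like in Python. It only covers reading the S3 event notification: pulling the bucket and object key out of each record and skipping anything that isn't a newly created video. The `VIDEO_EXTENSIONS` filter and the `extract_new_videos` helper are hypothetical names for illustration, and this is not the code used in the presentation.

```python
from urllib.parse import unquote_plus

# Hypothetical filter: adjust to the formats your contributors upload.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".mxf")


def extract_new_videos(event: dict) -> list:
    """Pull the bucket and key out of each object-creation record in an S3 event."""
    videos = []
    for record in event.get("Records", []):
        # Only react to object-creation events (for example "ObjectCreated:Put").
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces sent as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(VIDEO_EXTENSIONS):
            videos.append({"bucket": bucket, "key": key})
    return videos


def lambda_handler(event, context):
    """Lambda entry point: collect new videos from the triggering event."""
    videos = extract_new_videos(event)
    # Placeholder: hand each video to the video-indexing API at this point.
    return {"queued": videos}
```

Keeping the event parsing in its own function makes it easy to test locally with a sample event before deploying anything.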
You’re now ready to upload files to your MASV Portal:
- Drag-and-drop your files to upload them to the MASV Portal.
- The preconfigured MASV automation will then automatically send the files to your S3 bucket. You can upload files as large as 5TB to S3 using MASV. Easy!
- The upload triggers the Lambda function, which generates a JSON payload and sends it to the Twelve Labs API.
- Twelve Labs begins indexing your files using video understanding AI, allowing you to run AI or other high-performance computing workloads on the files in the cloud.
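The payload step above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, and it builds the request without sending it. Everything specific here is an assumption: the endpoint URL is a placeholder, and the header name, the `index_id` and `video_uri` field names, and the `s3://` form of the video location are hypothetical, so replace them with whatever the current Twelve Labs API documentation specifies.

```python
import json
import urllib.request

# Hypothetical values: take the real endpoint and header from the API documentation.
INDEXING_ENDPOINT = "https://api.example.com/v1/index-tasks"
API_KEY_HEADER = "x-api-key"


def build_indexing_request(bucket: str, key: str, index_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request describing one uploaded video."""
    payload = {
        "index_id": index_id,                  # hypothetical field name
        "video_uri": f"s3://{bucket}/{key}",   # hypothetical field name and URI form
    }
    return urllib.request.Request(
        INDEXING_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )
```

Inside the Lambda function, the request would then be sent with something like `urllib.request.urlopen(request, timeout=30)`, with the API key read from an environment variable or a secrets manager rather than hard-coded.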
The output
From there, you can use Twelve Labs’ AI technology to perform a range of actions, such as semantic search in place of manual search and tagging.
- Users can prompt the system to create a highlight video of all of a certain player or team’s goals out of hundreds or thousands of hours of footage, for example, or find the most important narrative moments for fans.
- Or you can ask the system to separate long video clips into chapters for better organization.
MASV: The Ideal Solution to Ingest Big Data to S3 for AI Workloads
MASV’s cloud-based large file transfer platform makes it easy for users to upload large files to S3 for AI workflows, speeding up time-to-market for AI companies in a hypercompetitive (and fast-moving) industry.
MASV provides all the tools and capabilities to power hands-free cloud, on-prem, or hybrid cloud upload and file management workflows by facilitating the easy transfer of large media assets with:
- A simplified, web-based, plugin-free, reliable, and universal uploader that’s turnkey and fully customizable.
- A suite of no-code file transfer automation tools.
- Developer documentation and tools, including the MASV API and cloud/networked on-prem integrations that allow you to build sophisticated automated workflows, like delivering files to cloud buckets without friction to your end users.
- Unmatched file transfer performance that can keep up with pipes optimized up to 10Gbps.
- Relentless file transfer reliability: MASV recovers from network instability and automatically retries all transfers, even in the case of network outages, until they’re complete.
- No file size limits on file package uploads.
- Enterprise-grade security tools and compliance with major data privacy regulations out of the box.
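To put the 10Gbps figure in perspective, here is a quick back-of-the-envelope calculation in Python. It is a best-case estimate only: it assumes decimal units (1 TB = 10^12 bytes), a fully saturated link, and no protocol overhead unless you lower the `efficiency` parameter. The function name and parameters are our own, chosen for illustration.

```python
def transfer_time_seconds(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal transfer time: bits to send divided by usable bits per second."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency must be in (0, 1]")
    bits = size_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second


TB = 10**12  # decimal terabyte (assumption)

# 1 TB over a saturated 10 Gbps link: 800 seconds, a little over 13 minutes.
print(transfer_time_seconds(1 * TB, 10))

# The same 1 TB over a 1 Gbps link takes ten times as long.
print(transfer_time_seconds(1 * TB, 1))
```

Real-world transfers will be slower than these numbers, but the comparison shows how much link speed matters once datasets reach the terabyte range.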
Sign up for MASV today and get 10GB of free transfer credits every month to try out this AI automated workflow (or any other file transfer workflow that makes sense for your business).
Transfer Files With No Limitations
Big files? No worries. MASV has no limits on file packages and handles files up to 15TB.