OpenCV founders launch AI video startup to take on OpenAI and Google

A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that generates realistic, human-centered videos up to five minutes long – a dramatic leap beyond the capabilities of competitors including OpenAI’s Sora and Google’s Veo.
CraftStory, which launched Tuesday with $2 million in funding, introduces Model 2.0, a video generation system that addresses one of the most significant limitations facing the nascent AI video industry: length. While OpenAI’s Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory’s system can produce continuous, consistent video performances that last as long as a typical YouTube tutorial or product demonstration.
This advancement could generate substantial business value for companies struggling to scale their video production for training, marketing, and customer education purposes – markets where brief AI-generated clips have proven inadequate despite their visual quality.
"If you really try to create a video with one of these video generation systems, you’ll find that most of the time you want to implement a certain creative vision, and no matter how detailed the instructions are, the systems basically ignore some part of your instructions," said Victor Erukhimov, founder and CEO of CraftStory, in an exclusive interview with VentureBeat. "We have developed a system that can generate videos for as long as you need them."
How parallel processing solves the problem of long-form video
CraftStory’s advance relies on what the company describes as a parallelized diffusion architecture: a fundamentally different approach to how AI models generate video compared with the sequential methods most competitors employ.
Traditional video generation models work by running diffusion on increasingly large three-dimensional volumes, where time is the third axis. To generate a longer video, these models require proportionately larger networks, more training data, and significantly more computing resources.
CraftStory instead runs multiple smaller diffusion processes simultaneously over the entire duration of the video, with bidirectional constraints connecting them. "The last part of the video can also influence the first part of the video," Erukhimov explained. "And this is quite important, because if you do it one by one, then an artifact that appears in the first part spreads into the second, and then accumulates."
Rather than generating eight seconds and then stitching on additional segments, CraftStory’s system processes all five minutes simultaneously through interconnected diffusion processes.
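CraftStory has not published its method, but the general idea Erukhimov describes can be illustrated with a minimal, purely conceptual NumPy sketch (the function names, the toy denoising step, and the boundary-blending scheme below are illustrative assumptions, not the company’s actual algorithm): a long video is split into chunks that are all updated in the same step, with the boundary frames of neighboring chunks blended in both directions so that no chunk’s artifacts can propagate forward unchecked.

```python
import numpy as np

def denoise_step(chunk, noise_scale=0.1):
    # Stand-in for one diffusion/denoising update on a chunk of frames.
    # A real model would run a learned network here.
    return chunk * (1.0 - noise_scale)

def parallel_chunked_denoise(video, n_chunks=5, steps=10, couple=0.1):
    # Split the full video into chunks that are denoised together,
    # rather than generating one chunk and then stitching on the next.
    chunks = [c.copy() for c in np.array_split(video, n_chunks)]
    for _ in range(steps):
        # Every chunk is updated in the same step ("in parallel").
        chunks = [denoise_step(c) for c in chunks]
        # Bidirectional constraint: blend each chunk boundary with its
        # neighbor on both sides, so a late chunk can also pull earlier
        # chunks toward consistency instead of errors accumulating
        # in one direction only.
        for i in range(1, len(chunks)):
            avg = 0.5 * (chunks[i][0] + chunks[i - 1][-1])
            chunks[i][0] = (1 - couple) * chunks[i][0] + couple * avg
            chunks[i - 1][-1] = (1 - couple) * chunks[i - 1][-1] + couple * avg
    return np.concatenate(chunks)

video = np.random.rand(300, 4)  # 300 toy "frames", 4 features each
out = parallel_chunked_denoise(video)
print(out.shape)  # (300, 4): full duration emerges in one pass
```

Because the coupling runs in both directions, a constraint at the end of the sequence nudges earlier chunks as well, matching the "last part can influence the first part" behavior described above.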
Importantly, CraftStory trained its model on proprietary footage rather than relying solely on videos scraped from the internet. The company hired studios to film actors using high-frame-rate camera systems that capture sharp detail even in fast-moving elements like fingers, avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips.
"What we’ve shown is that you don’t need a lot of data or a big training budget to create high-quality videos," said Erukhimov. "You just need high-quality data."
Model 2.0 currently functions as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will reproduce. CraftStory offers pre-made driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.
The system generates low-resolution 30-second clips in approximately 15 minutes. An advanced lip sync system synchronizes mouth movements with scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.
A $2 million war chest versus billions
CraftStory’s funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions invested in competing efforts – OpenAI has raised more than $6 billion in its latest funding round alone.
Erukhimov pushed back against the idea that massive capital is a prerequisite for success. "I don’t necessarily subscribe to the thesis that compute is the path to success," he said. "It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors."
Filev defended the David versus Goliath approach. "When you invest in startups, you are fundamentally betting on people," he said in an interview with VentureBeat. "To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build."
He argued that CraftStory benefits from a focused strategy. "Major labs are engaged in an arms race to build general-purpose video base models," said Filev. "CraftStory is riding this wave and diving deeper into a specific format: long-form, engaging, human-centered video."
Why Computer Vision Expertise Matters in Generative AI Video
Erukhimov’s credibility comes from his deep roots in computer vision rather than the transformer architectures that have dominated recent advances in AI. He was one of the first contributors to OpenCV — the Open Source Computer Vision library which has become the de facto standard for computer vision applications, with over 84,000 stars on GitHub.
When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the explicit goal of maintaining and advancing the library. The company significantly expanded OpenCV and moved into automotive safety systems before Intel acquired it in 2016.
Filev said this background is precisely what makes Erukhimov well positioned for video generation. "What people sometimes forget is that generative AI video is not just about the generative part. It’s about understanding movement, facial dynamics, temporal coherence, and how humans actually move," said Filev. "Victor has spent his career mastering exactly these problems."
Enterprise focus targets training videos and product demos
While much of the public excitement around AI video generation has focused on consumer-facing creative tools, CraftStory is pursuing a decidedly business-centric strategy.
"We definitely think more about B2B than the consumer," said Erukhimov. "We’re thinking of companies, especially software companies, that could create interesting training videos, product videos, and launch videos."
The logic is simple: corporate training, product tutorials, and customer training videos are often several minutes long and require consistent quality. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.
"If you need a longer video, you should come to us," said Erukhimov. "We can create up to five minutes of cohesive, high-quality video."
Filev echoed this assessment. "A huge gap in this market is the lack of models that can generate consistent video over longer sequences – and that’s extremely important for real-world use," he said. "If you’re creating an ad for your business, a 10-second video, no matter how good it looks, isn’t enough. You need 30 seconds, you need two minutes – you need more."
The company anticipates cost savings for customers. Filev suggested that "a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce."
CraftStory is also courting creative agencies that produce video content for corporate clients, with a value proposition focused on cost and speed: agencies can record an actor on camera and turn that footage into a finished AI video, rather than managing costly multi-day shoots.
The next major development on CraftStory’s roadmap is a text-to-video model that would let users generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the "walk and talk" format common in high-end advertising.
Where CraftStory fits into a fragmented competitive landscape
CraftStory is entering a crowded and rapidly changing market. OpenAI’s Sora 2, although not yet available to the public, has generated significant buzz. Google’s Veo models are advancing quickly. Runway, Pika, and Stability AI all offer video generation tools with different capabilities.
Erukhimov acknowledged competitive pressure, but emphasized that CraftStory serves a distinct niche focused on human-centered videos. He positioned rapid innovation and market capture as the company’s core strategy rather than relying on technical moats.
Filev sees the market fragmenting into distinct layers, with large tech companies acting as "providers of powerful, general-purpose base model APIs" while specialized players like CraftStory focus on specific use cases. "If the big players build the engines, CraftStory builds the production studio and assembly line on top," he said.
Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to users and businesses interested in testing the technology. Whether a modestly funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is characteristically confident about the opportunities ahead.
"AI-generated video will soon become the primary way businesses communicate their stories," he said.




