Alibaba’s new AI model lets you direct a full film — with just a text prompt


HANGZHOU, China (Apr. 2026) — Typing a single prompt and getting back a fully edited, cinematically coherent short film used to sound like science fiction. Alibaba just made it a product feature.

The company has launched Wan2.7-Video, a new AI video generation model that goes well beyond the clip-generation tools most creators are used to. Instead of just producing raw footage, it handles the full pipeline — from storyboarding to post-production editing — using natural language instructions.

The launch follows the debut of Wan2.7-Image just days earlier, marking a rapid push by Alibaba to build out a complete AI-powered multimedia suite.

One prompt, full storyboard

Wan2.7-Video is actually a suite of four models: Wan2.7-t2v (text-to-video), Wan2.7-i2v (image-to-video), Wan2.7-r2v (reference-to-video), and Wan2.7-videoedit. Together, they handle generation, editing, continuation, and referencing across text, image, video, and audio inputs in one unified system.

A single text prompt can produce a fully realized storyboard complete with multi-shot pacing, FPV drone dives, 360-degree orbital camera moves, and context-aware lighting. The model supports video lengths from 2 to 15 seconds and outputs at 720p or 1080p.
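
Alibaba has not published a developer schema for Wan2.7-t2v, but its Model Studio video models have historically been driven through an asynchronous REST API. Purely as illustration, a submission might look like the Python sketch below; the endpoint follows current Model Studio conventions, while the wan2.7-t2v model ID and the duration and resolution parameter names are assumptions based on this announcement, not documented values.

    import os
    import requests

    # Hypothetical sketch only: the endpoint mirrors Alibaba's current Model
    # Studio video-synthesis API, but the wan2.7-t2v model ID and the
    # duration/resolution parameter names are assumptions from this article.
    API_KEY = os.environ["DASHSCOPE_API_KEY"]

    resp = requests.post(
        "https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-DashScope-Async": "enable",  # video jobs run asynchronously
        },
        json={
            "model": "wan2.7-t2v",  # assumed model ID
            "input": {
                "prompt": (
                    "FPV drone dive through a neon-lit Hangzhou street at dusk, "
                    "then a 360-degree orbit around the rider, moody backlighting"
                ),
            },
            "parameters": {
                "duration": 10,         # seconds; the article cites a 2-15s range
                "resolution": "1080P",  # or "720P", per the announcement
            },
        },
        timeout=30,
    )
    resp.raise_for_status()
    task_id = resp.json()["output"]["task_id"]  # job handle to poll later
    print("Submitted video task:", task_id)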

Director-level control, no editing suite required

The more notable capability is how much control creators have after the initial generation. Through natural language alone, users can modify character actions, dialogue, appearance, scenes, visual styles, and camera angles — without touching a timeline editor.

The model also handles dialogue editing by automatically syncing lip movements to rewritten scripts while preserving each character’s vocal signature. It supports cross-video consistency for up to five distinct characters, maintaining unique voice tones and visual identities across complex multi-scene narratives.

For style and tone, the model supports thousands of style combinations and over 50 distinct emotional expressions for character performance.
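
If the editing variant follows the same job-submission pattern, a post-generation revision could plausibly be a second request that pairs the draft clip with a plain-language instruction. Every field below, from the wan2.7-videoedit model ID to the video_url and prompt names, is an assumption for illustration.

    import os
    import requests

    API_KEY = os.environ["DASHSCOPE_API_KEY"]

    # Hypothetical edit request: the model ID and every input field name are
    # assumptions; only the job-submission pattern follows Model Studio norms.
    edit = requests.post(
        "https://dashscope.aliyuncs.com/api/v1/services/aigc/video-generation/video-synthesis",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-DashScope-Async": "enable",
        },
        json={
            "model": "wan2.7-videoedit",  # assumed ID for the editing variant
            "input": {
                "video_url": "https://example.com/draft_scene_03.mp4",  # placeholder clip
                "prompt": (
                    "Change the lead character's line to 'We leave at dawn', "
                    "keep her original voice, and regrade the scene to golden hour"
                ),
            },
        },
        timeout=30,
    )
    edit.raise_for_status()
    print("Edit task:", edit.json()["output"]["task_id"])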

Wan2.7-Image: Color accuracy and personalization

Alongside the video model, Alibaba’s Wan2.7-Image tackles a persistent problem in AI image generation — generic aesthetics and inconsistent color reproduction.

The model includes a deep personalization engine for fine-tuning specific character traits, down to bone structure and eye shape. A “color palette” feature lets users match exact color codes, which makes it practical for brand work. A notable breakthrough is its text rendering capability: using a 3,000-token context window, it can generate print-quality academic text, complex formulas, and tables across 12 languages.

For high-volume production, Wan2.7-Image can process up to nine reference images and output 12 distinct results in a single batch — useful for e-commerce shoots or storyboard production. A “click-to-edit” interface handles pixel-level adjustments for positioning and alignment.
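
Again speaking hypothetically, a batch request built on Model Studio's existing image-synthesis conventions might pass the reference set and output count like this; the ref_images field and an n of 12 assume the announced limits carry over directly to the API, and neither is a documented parameter.

    import os
    import requests

    API_KEY = os.environ["DASHSCOPE_API_KEY"]

    # Hypothetical batch request: the endpoint matches Model Studio's current
    # image-synthesis API, but the wan2.7-image ID, ref_images, and n=12
    # assume the announced limits map onto API parameters.
    batch = requests.post(
        "https://dashscope.aliyuncs.com/api/v1/services/aigc/text2image/image-synthesis",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-DashScope-Async": "enable",
        },
        json={
            "model": "wan2.7-image",  # assumed model ID
            "input": {
                "prompt": (
                    "Studio product shot of a ceramic mug on brand-blue #1B4F9C, "
                    "soft shadows, e-commerce catalog framing"
                ),
                # Up to nine reference images per the announcement (placeholder URLs).
                "ref_images": [f"https://example.com/ref_{i}.png" for i in range(1, 10)],
            },
            "parameters": {"n": 12},  # 12 results per batch, per the announcement
        },
        timeout=30,
    )
    batch.raise_for_status()
    print("Batch task:", batch.json()["output"]["task_id"])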

A Pro version, Wan2.7-Image-Pro, is also available with sharper prompt interpretation, more stable composition, and 4K output.

Availability

Both Wan2.7-Video and Wan2.7-Image are available now on Alibaba Cloud’s Model Studio and the official Wan website, with integration into the Qwen App. Enterprise APIs for batch processing and custom workflows are also available.
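
Model Studio's existing asynchronous APIs hand back a task ID that clients poll for the finished asset. Assuming the Wan2.7 endpoints keep that flow, retrieval would look roughly like the sketch below; the /tasks/{id} endpoint and status values mirror current DashScope behavior, while the video_url field on success is an assumption.

    import os
    import time
    import requests

    API_KEY = os.environ["DASHSCOPE_API_KEY"]

    def wait_for_task(task_id: str, interval: float = 5.0) -> dict:
        """Poll a Model Studio task until it finishes.

        The /tasks/{id} endpoint and the task_status values follow current
        DashScope conventions; whether Wan2.7 jobs reuse them is an assumption.
        """
        url = f"https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"
        while True:
            out = requests.get(
                url, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30
            ).json()["output"]
            if out["task_status"] in ("SUCCEEDED", "FAILED"):
                return out
            time.sleep(interval)

    # Usage, with a task_id from an earlier submission:
    # result = wait_for_task(task_id)
    # print(result.get("video_url", result))  # download link on success (assumed field)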

Philippine pricing and local availability through Alibaba Cloud have not been announced at the time of writing.

