What Makes a Good Video:
Next Practices in Video Generation and Evaluation

Workshop @ NeurIPS 2025

Exploring the challenges and opportunities in video generation and evaluation

Saturday, December 6th at 8:00 AM PST.

Upper Level Ballroom 6B, San Diego Convention Center

About the Workshop

This workshop explores the paradigm evolution toward next-generation video generation models. We focus on core topics including video generation, video understanding, benchmarks, and real-world applications. Through in-depth discussion, we hope both to deepen understanding of the limitations of current video models and to identify promising directions for future research and practice. To this end, we have invited senior experts from academia and industry as speakers; their talks, spanning diverse perspectives, will give participants a comprehensive view of cutting-edge developments in the field.

Topics of Interest

We invite submissions on topics including, but not limited to, the following:

Video Generation Models

  • Prompt-driven controllable synthesis
  • Layout & motion conditioning (spatial layouts, motion trajectories)
  • 3D / 4D / physics-informed priors for geometric and physical realism
  • Narrative & expressive modeling to capture story structure and affect
  • Cross-modal generation from audio, text, or interactive inputs
  • Latent space control and editing for interactive generation

Benchmarks & Evaluation

  • Long-form & multi-shot datasets for task-oriented assessment
  • Temporal-coherence evaluation: consistency over time, scene continuity, cinematic quality
  • Temporal / semantic / causal metrics
  • Human-aligned scoring for spatiotemporal and narrative fidelity
  • LLM-powered automated review with interpretable feedback
  • Reproducible benchmarking protocols

Applications

  • Feedback-in-the-loop frameworks
  • Media & content production (film, short-form video, advertising)
  • Education & training (instructional demos, simulation-based learning)
  • Immersive AR/VR (dynamic scene generation)
  • Robotics & simulation (synthetic video for perception and planning)
  • Interactive entertainment and gaming narratives

Speakers

Ming-Yu Liu

NVIDIA

Vice President of Research

Hao Zhang

UC San Diego

Assistant Professor

Saining Xie

New York University

Assistant Professor

Jiajun Wu

Stanford University

Assistant Professor

Dima Damen

University of Bristol & Google DeepMind

Professor

Yi Jiang

ByteDance

Research Leader

Xun Huang

Stealth Startup

Founder & Chief Scientist

Hengshuang Zhao

The University of Hong Kong

Assistant Professor

Organizers

Xinting Hu

Max Planck Institute for Informatics

Yongliang Wu

Southeast University

Anna Kukleva

Max Planck Institute for Informatics

Zhicai Wang

University of Science and Technology of China

Chenyang Si

Nanjing University

Li Jiang

Chinese University of Hong Kong, Shenzhen

Gang Yu

StepFun

Xu Yang

Southeast University

Ziwei Liu

Nanyang Technological University

Bernt Schiele

Max Planck Institute for Informatics

Workshop Schedule

Morning

8:00 – 8:10 AM
Opening Remarks
8:10 – 8:40 AM
Cosmos World Foundation Model Platform
Ming-Yu Liu
8:40 – 9:15 AM
Video Understanding Out of the Frame: An Egocentric Perspective
Dima Damen
9:15 – 9:35 AM
Morning Coffee Break
9:35 – 10:05 AM
Action-Conditioned Video Generation
Jiajun Wu
10:05 – 10:40 AM
TBD
Hao Zhang
10:40 – 11:40 AM
Oral Talks
11:40 AM – 1:30 PM
Lunch Break

Afternoon

1:30 – 2:00 PM
Interactive World Simulation
Hengshuang Zhao
2:00 – 2:35 PM
From Video Generation to Video World Models
Xun Huang
2:35 – 3:10 PM
Diffusion Transformers with Representation Autoencoders
Saining Xie
3:10 – 3:30 PM
Afternoon Coffee Break
3:30 – 4:05 PM
Towards Autoregressive Modeling for Scalable and Versatile Visual Generation
Yi Jiang
4:05 – 4:40 PM
Panel Discussion
Xun Huang · Hengshuang Zhao · Saining Xie · Ming-Yu Liu · Yi Jiang · Jiajun Wu
4:40 – 4:55 PM
Oral Talk (Cancelled)
4:55 – 5:00 PM
Best Paper Award

Note for Poster Authors: Authors who need to set up posters can do so during the following time windows: 7:30 – 8:00 AM (before the workshop starts), 9:15 – 9:35 AM (Morning Coffee Break), 11:40 AM – 1:30 PM (Lunch Break), and 3:10 – 3:30 PM (Afternoon Coffee Break).

Paper Submission

  • Submissions Due: August 29, 2025 (AoE)
  • Author Notification: September 22, 2025 (AoE)
  • Camera-Ready Due: September 29, 2025 (AoE)

Submission Guidelines

We welcome the following types of submissions:

  • Full Paper: In this track, we welcome submissions that present original research ideas and demonstrate their impact, and that have not been published at other conferences or journals. The main text should be 5-9 pages (excluding references and appendices).
  • Short Paper: In this track, we welcome submissions reporting promising early-stage research, novel ideas, or results not yet developed enough for a full paper. The main text should be 2-4 pages (excluding references and appendices).

All submissions will be featured in the workshop poster session, giving authors the opportunity to present their work, and a subset will be selected for an oral talk session during the workshop. All accepted papers are non-archival.

We employ a double-blind review process conducted through OpenReview, and all papers must adhere to the NeurIPS 2025 submission format and use the NeurIPS 2025 LaTeX template.

Submission Portal: OpenReview

Review Guidelines

Our review process follows the standards of NeurIPS 2025, and reviewers are expected to adhere to the NeurIPS 2025 reviewer guidelines. We sincerely thank all reviewers for their dedication to maintaining the high academic standards of the NextVid Workshop @ NeurIPS 2025!