Oral Papers
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
- MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
- VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
- Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models
Poster Papers
- VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR
- DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion
- Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition
- The Unwinnable Arms Race of AI Image Detection
- ZeroTrail: Zero-Shot Trajectory Control Framework for Video Diffusion Models
- Interaction-Aware Video Narrative Generation for Short-Form Gaming Content
- Petri Net Structure-Driven Video Generation
- Scaling Image and Video Generation via Test-Time Evolutionary Search
- Learning Human-Perceived Fakeness in AI-Generated Videos via Multimodal LLMs
- Confidence Scores for Temporal Properties over Sequences of Predictions
- Reframe Anything: LLM Agent for Open World Video Reframing