
Lin Geng Foo

Max-Planck-Institut für Informatik
Department 6 - Visual Computing and Artificial Intelligence
 office: Campus E1 4, Room 115F
Saarland Informatics Campus
66123 Saarbrücken
Germany
 email: Get my email address via email
 phone: +49 681 9325 4553

Research Interests

  • Video Understanding
  • Video Generation and Editing
  • Diffusion Models

Selected Publications

Please also refer to my Google Scholar or my personal homepage for a full publication list.

Physical Simulator In-the-Loop Video Generation

Lin Geng Foo   Mark He Huang   Alexandros Lattas   Stylianos Moschoglou   Thabo Beeler   Christian Theobalt  

CVPR 2026

Abstract
Recent advances in diffusion-based video generation have achieved remarkable visual realism but still struggle to obey basic physical laws such as gravity, inertia, and collision. Generated objects often move inconsistently across frames, exhibit implausible dynamics, or violate physical constraints, limiting the realism and reliability of AI-generated videos. We address this gap by introducing Physical Simulator In-the-Loop Video Generation (PSIVG), a novel framework that integrates a physical simulator into the video diffusion process. Starting from a template video generated by a pre-trained diffusion model, PSIVG reconstructs the 4D scene and foreground object meshes, initializes them within a physical simulator, and generates physically consistent trajectories. These simulated trajectories are then used to guide the video generator toward physically coherent, spatio-temporally consistent motion. To further improve texture consistency during object movement, we propose a Test-Time Texture Consistency Optimization (TTCO) technique that adapts text and feature embeddings based on pixel correspondences from the simulator. Comprehensive experiments demonstrate that PSIVG produces videos that better adhere to real-world physics while preserving visual quality and diversity.

[pdf], [code], [project page], [arxiv]



Step2Motion: Locomotion Reconstruction from Pressure Sensing Insoles

Jose Luis Ponton   Eduardo Alvarado   Lin Geng Foo   Nuria Pelechano   Carlos Andujar   Marc Habermann  

Eurographics 2026

Abstract
Human motion is fundamentally driven by continuous physical interaction with the environment. Whether we are walking, running, or simply standing, the forces exchanged between our feet and the ground provide crucial insights for understanding and reconstructing human movement. Recent advances in wearable insole devices offer a compelling solution for capturing these forces in diverse, real-world scenarios. Sensor insoles pose no constraint on the users' motion (unlike mocap suits) and are unaffected by line-of-sight limitations (in contrast to optical systems). These qualities make sensor insoles an ideal choice for robust, unconstrained motion capture, particularly in outdoor environments. Surprisingly, leveraging these devices with recent motion reconstruction methods remains largely unexplored. Aiming to fill this gap, we present Step2Motion, the first approach to reconstruct human locomotion from multi-modal insole sensors. Our method utilizes pressure and inertial data, captured by the insoles, to reconstruct human motion. We evaluate the effectiveness of our approach across a range of experiments, demonstrating its versatility for diverse locomotion styles, from simple ones such as walking or jogging to moving sideways, walking on tiptoes, crouching slightly, or dancing.

[pdf], [video], [code], [project page], [arxiv]



Matrix-free Second-order Optimization of Gaussian Splats with Residual Sampling

Hamza Pehlivan   Andrea Boscolo Camiletto   Lin Geng Foo   Marc Habermann   Christian Theobalt  

3DV 2026 (Oral Presentation)

Abstract
3D Gaussian Splatting (3DGS) is widely used for novel view synthesis due to its high rendering quality and fast inference time. However, 3DGS predominantly relies on first-order optimizers such as Adam, which leads to long training times. To address this limitation, we propose a novel second-order optimization strategy based on Levenberg-Marquardt (LM) and Conjugate Gradient (CG), specifically tailored towards Gaussian Splatting. Our key insight is that the Jacobian in 3DGS exhibits significant sparsity since each Gaussian affects only a limited number of pixels. We exploit this sparsity by proposing a matrix-free and GPU-parallelized LM optimization. To further improve its efficiency, we propose sampling strategies for both the camera views and the loss terms, and consequently the normal equations, significantly reducing the computational complexity. In addition, we increase the convergence rate of the second-order approximation by introducing an effective heuristic to determine the learning rate that avoids the expensive computational cost of line search methods. As a result, our method achieves a 4x speedup over standard LM and outperforms Adam by about 5x when the Gaussian count is low, while providing about a 1.3x speedup at moderate counts. In addition, our matrix-free implementation achieves a 2x speedup over the concurrent second-order optimizer 3DGS-LM, while using 3.5x less memory.

[pdf], [code], [project page], [arxiv]
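
To illustrate the general matrix-free idea this abstract refers to, here is a minimal NumPy sketch that solves the damped normal equations (JᵀJ + λI)δ = Jᵀr with conjugate gradient using only Jacobian-vector and vector-Jacobian products, so J is never materialized. This is a generic textbook illustration, not the paper's GPU-parallelized implementation; the function names and the toy linear least-squares problem are assumptions for demonstration only.

```python
import numpy as np

def cg_normal_equations(jvp, vjp, residual, lam, n_params, iters=50, tol=1e-8):
    """Solve (J^T J + lam*I) delta = J^T r with conjugate gradient,
    using only Jacobian-vector products (jvp) and vector-Jacobian
    products (vjp); the Jacobian J itself is never formed."""
    b = vjp(residual)                  # right-hand side J^T r
    x = np.zeros(n_params)
    r = b.copy()                       # residual of the linear system
    p = r.copy()
    rs_old = r @ r
    for _ in range(iters):
        Ap = vjp(jvp(p)) + lam * p     # (J^T J + lam*I) p, matrix-free
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy usage: a linear least-squares problem where J is known explicitly,
# so jvp/vjp are plain matrix products (stand-ins for autodiff products).
rng = np.random.default_rng(0)
J = rng.standard_normal((100, 10))
target = rng.standard_normal(100)
params = np.zeros(10)
res = target - J @ params              # residual at the current parameters
delta = cg_normal_equations(lambda v: J @ v, lambda v: J.T @ v,
                            res, lam=1e-3, n_params=10)
params += delta                        # one damped Gauss-Newton/LM step
```

In a real 3DGS setting the jvp/vjp calls would come from the differentiable rasterizer, which is where the sparsity the abstract mentions pays off.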



VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification

Wanyue Zhang   Lin Geng Foo   Thabo Beeler   Rishabh Dabral   Christian Theobalt  

arXiv 2025

Abstract
Synthesizing realistic human-object interactions (HOI) in video is challenging due to the complex, instance-specific interaction dynamics of both humans and objects. Incorporating controllability in video generation further adds to the complexity. Existing controllable video generation approaches face a trade-off: sparse controls like trajectories are easy to specify but lack instance-awareness, while dense signals such as optical flow, depth maps, or 3D meshes are informative but costly to obtain. We propose VHOI, a two-stage framework that first densifies sparse trajectories into HOI mask sequences, and then fine-tunes a video diffusion model conditioned on these dense masks. We introduce a novel HOI-aware motion representation that uses color encodings to distinguish not only human and object motion, but also body-part-specific dynamics. This design incorporates a human prior into the conditioning signal and strengthens the model's ability to understand and generate realistic HOI dynamics. Experiments demonstrate state-of-the-art results in controllable HOI video generation. VHOI is not limited to interaction-only scenarios and can also generate full human navigation leading up to object interactions in an end-to-end manner.

[pdf], [video], [project page], [arxiv]



OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects

Mark He Huang   Lin Geng Foo   Christian Theobalt   Ying Sun   De Wen Soh  

NeurIPS 2025 (Spotlight Presentation)

Abstract
Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.

[pdf], [video], [project page], [arxiv]



Avatar Concept Slider: Controllable Editing of Concepts in 3D Human Avatars

Lin Geng Foo   Yixuan He   Ajmal Saeed Mian   Hossein Rahmani   Jun Liu   Christian Theobalt  

arXiv 2025

Abstract
Text-based editing of 3D human avatars to precisely match user requirements is challenging due to the inherent ambiguity and limited expressiveness of natural language. To overcome this, we propose the Avatar Concept Slider (ACS), a 3D avatar editing method that allows precise editing of semantic concepts in human avatars towards a specified intermediate point between two extremes of a concept, akin to moving a knob along a slider track. To achieve this, our ACS has three designs: first, a Concept Sliding Loss based on linear discriminant analysis to pinpoint concept-specific axes for precise editing; second, an Attribute Preserving Loss based on principal component analysis for improved preservation of avatar identity during editing; and third, a 3D Gaussian Splatting primitive selection mechanism based on concept sensitivity, which updates only the primitives most sensitive to the target concept, improving efficiency. Results demonstrate that our ACS enables controllable 3D avatar editing, without compromising the avatar quality or its identifying attributes.

[pdf], [arxiv]
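
As a generic illustration of the linear-discriminant-analysis idea named in this abstract, the sketch below computes a Fisher LDA direction between two sets of feature vectors representing the two extremes of a concept, and slides a feature along it. This is not the paper's Concept Sliding Loss or its 3DGS pipeline; the feature dimensions, the slider parameter t, and the random stand-in embeddings are illustrative assumptions.

```python
import numpy as np

def concept_axis_lda(feats_a, feats_b, eps=1e-6):
    """Two-class Fisher LDA direction between feature sets sampled at the
    two extremes of a concept. Returns a unit vector along which the
    concept varies most relative to within-class scatter."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    # Within-class scatter, regularized for numerical stability.
    Sw = np.cov(feats_a, rowvar=False) + np.cov(feats_b, rowvar=False)
    Sw += eps * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_b - mu_a)   # Fisher direction S_w^{-1}(mu_b - mu_a)
    return w / np.linalg.norm(w)

def slide(feature, axis, t):
    """Move a feature vector along the concept axis; t plays the role
    of the slider position between the two extremes."""
    return feature + t * axis

# Toy usage with random stand-in embeddings for the two concept extremes.
rng = np.random.default_rng(0)
feats_a = rng.standard_normal((64, 32)) - 1.0   # extreme A
feats_b = rng.standard_normal((64, 32)) + 1.0   # extreme B
axis = concept_axis_lda(feats_a, feats_b)
edited = slide(feats_a[0], axis, t=0.5)          # half-way toward extreme B
```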

Teaching