Single-View 3D Garment Reconstruction
By Ashlyn Lacovara
Researchers from Carnegie Mellon University, Texas A&M University, and Google AR: Yuanhao Wang, Cheng Zhang, Goncalo Frazao, Jinlong Yang, Alexandru-Eugen Ichim, Thabo Beeler, Fernando De La Torre
GarmentCrafter introduces a new way of creating and editing 3D garments using only a single image, significantly lowering the barrier for non-professional users to work with digital apparel. While recent advances in image generation have made 2D garment design more accessible, producing accurate, editable 3D garments has remained difficult without specialized tools and expertise. The work addresses this gap by proposing a method that allows users to generate and modify garments in three dimensions through simple interactions with a single-view image.
The need for such tools is growing as digital garments become increasingly important across virtual environments, gaming, and personalized digital experiences. Professional fashion designers have long relied on sophisticated 3D software to craft detailed virtual apparel, but these workflows are not accessible to most users. Although modern image generation and editing techniques have enabled high-quality 2D garment design, achieving comparable realism and control in 3D remains a major challenge. Existing single-view 3D garment approaches typically rely on either template-based systems—where garments are deformed or registered onto predefined human body models—or novel view synthesis methods built on pre-trained diffusion models. Both strategies have limitations, often struggling to capture realistic geometry and appearance simultaneously.
Garments themselves introduce unique technical challenges. Their shapes, structures, and textures vary widely, making template-based approaches difficult to generalize across styles. Many methods prioritize either geometry or texture rather than balancing both, leading to incomplete or inconsistent results. At the same time, fine garment details require strong consistency across viewpoints. Existing novel view synthesis techniques frequently fail to maintain these relationships, resulting in mismatches where visual elements do not align correctly between views.
GarmentCrafter addresses these issues through a progressive novel-view synthesis pipeline designed to improve cross-view coherence and structural accuracy. The method begins by estimating depth from a single input image and using image warping to approximate unseen viewpoints. A multi-view diffusion model is then applied to reconstruct occluded or unknown garment regions, guided by an evolving camera pose. By jointly inferring RGB and depth information, the system preserves relationships between views and reconstructs more precise geometry and fine surface details. Depth-based warped imagery is used as an additional condition to guide alignment, allowing the model to refine both geometry and texture progressively along a predefined camera trajectory.
This enables forms of interaction not previously available in single-view garment reconstruction. Users can perform local edits, such as adjusting surface details, or manipulate specific garment parts directly within a single image, and those changes are reflected consistently in the resulting 3D representation. Trained on large-scale 3D garment datasets and evaluated on both curated and real-world clothing images, GarmentCrafter performs strongly in both settings.
Experimental results show that the method outperforms current state-of-the-art 2D-to-3D garment reconstruction techniques in geometric accuracy, visual fidelity, and cross-view consistency. By combining progressive depth prediction, image warping, and multi-view diffusion, GarmentCrafter represents a step toward more practical and accessible 3D garment creation—bringing capabilities once limited to professional design software into workflows that can be used by a much broader audience.
