GenLCA
3D Diffusion for Full-Body Avatars
from In-the-Wild Videos

1Codec Avatars Lab, Meta
2State Key Laboratory of CAD&CG, Zhejiang University
arXiv

If you are interested in academic comparisons, please contact Junxuan Li.

Video preview

Note: This video contains audio narration

TL;DR: GenLCA is a diffusion-based generative model that creates and edits photorealistic full-body avatars from text and image inputs.

Insights

The core idea of GenLCA is a novel paradigm that enables training a full-body 3D diffusion model from partially observable 2D data, allowing the training dataset to scale to millions of real-world videos. This scalability is what drives the superior photorealism and generalizability of GenLCA.
Specifically, we scale up the dataset by repurposing a pretrained feed-forward avatar reconstruction model as an animatable 3D tokenizer. The reconstruction model encodes unstructured video frames of a person into 3D tokens, which can then be decoded into an animatable 3D Gaussian avatar. We then train the flow-based diffusion model, GenLCA, in this 3D token space.
However, most real-world videos provide only partial observations of the body, and the reconstruction model is inherently limited in hallucinating unobserved regions. As a result, blurring or transparency artifacts often appear in the occluded areas. To address this, we propose a novel visibility-aware diffusion training strategy that replaces invalid regions with learnable tokens and computes losses only over valid regions. Our approach effectively enables the use of large-scale real-world video data to train a diffusion model natively in 3D.
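The visibility-aware strategy above can be sketched in two pieces: substituting a learnable placeholder token wherever a region was unobserved, and masking the training loss to valid positions. This is a minimal NumPy illustration, not the paper's implementation; the token shapes, the single shared `learned_token`, and the MSE objective are simplifying assumptions for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_invalid_tokens(tokens, visibility, learned_token):
    """Replace tokens from unobserved regions with a learnable placeholder.

    tokens:       (N, D) array of 3D avatar tokens from the tokenizer.
    visibility:   (N,) boolean mask, True where the region was observed.
    learned_token:(D,) placeholder vector (a trainable parameter in practice).
    """
    out = tokens.copy()
    out[~visibility] = learned_token  # hypothetical: one shared placeholder
    return out

def masked_mse(pred, target, visibility):
    """Compute the reconstruction loss only over valid (observed) positions,
    so unobserved regions never penalize the model's hallucinations."""
    diff = (pred - target)[visibility]
    return float((diff ** 2).mean())

# Toy usage: 6 tokens of dimension 4, two of them unobserved.
tokens = rng.normal(size=(6, 4))
vis = np.array([True, True, False, True, False, True])
learned = np.zeros(4)

masked_input = mask_invalid_tokens(tokens, vis, learned)
loss = masked_mse(tokens + 1.0, tokens, vis)  # error of 1.0 at every valid slot
```

Masking the loss (rather than the gradients post hoc) keeps the objective well-defined regardless of how much of the body each video observes, which is what lets heterogeneous in-the-wild footage enter the same training batch.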

Text-based Generation

GenLCA generates high-quality, animatable 3D avatars from text descriptions. Click on any avatar to explore it in the interactive 3D viewer.

Note: Due to the constraints of the web viewer, we only show static GS results. However, all avatars are fully animatable. Please refer to the videos in Animation Examples.

Animation Examples

Body-part-based Generation

GenLCA generates high-quality, animatable 3D avatars from reference body part images, including face, hair, upper clothes, lower clothes, and shoes. Click on any avatar to explore it in the interactive 3D viewer.

Note: Due to the constraints of the web viewer, we only show static GS results. However, all avatars are fully animatable. Please refer to the videos in Animation Examples.

Animation Examples

Multi-modal Editing

GenLCA supports multi-modal editing operations by leveraging text, RGB images, or scribbles as control signals. Click on any avatar to view it in the 3D viewer.

BibTeX

@misc{wu2026genlca3ddiffusionfullbody,
  title={GenLCA: 3D Diffusion for Full-Body Avatars from In-the-Wild Videos},
  author={Yiqian Wu and Rawal Khirodkar and Egor Zakharov and Timur Bagautdinov
          and Lei Xiao and Zhaoen Su and Shunsuke Saito and Xiaogang Jin
          and Junxuan Li},
  year={2026},
  eprint={2604.07273},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.07273}
}