AnimPortrait3D: Text-based Animatable 3D Avatars with Morphable Model Alignment

Abstract

The generation of high-quality, animatable 3D head avatars from text has enormous potential in content creation applications such as games, movies, and embodied virtual assistants. Current text-to-3D generation methods typically combine parametric head models with 2D diffusion models using score distillation sampling to produce 3D-consistent results. However, they struggle to synthesize realistic details and suffer from misalignments between the appearance and the driving parametric model, resulting in unnatural animation results. We find that these limitations stem from ambiguities in the 2D diffusion predictions during 3D avatar distillation, specifically: (i) the avatar's appearance and geometry are underconstrained by the text input, and (ii) the semantic alignment between the predictions and the parametric head model is insufficient because the diffusion model is unaware of the parametric model.

In this work, we propose AnimPortrait3D, a novel framework for text-based generation of realistic, animatable 3D Gaussian Splatting (3DGS) avatars with morphable model alignment, and introduce two key strategies to address these challenges. First, we tackle appearance and geometry ambiguities by utilizing prior information from a pretrained text-to-3D model to initialize a 3D avatar with robust appearance, geometry, and rigging relationships to the morphable model. Second, we refine the initial 3D avatar for dynamic expressions using a ControlNet that is conditioned on semantic and normal maps of the morphable model to ensure accurate alignment. As a result, our method outperforms existing approaches in terms of synthesis quality, alignment, and animation fidelity. Our experiments show that the proposed method advances the state of the art in text-based, animatable 3D head avatar generation.
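
For readers unfamiliar with score distillation sampling (SDS), its gradient is commonly written in the text-to-3D literature as shown below. This page does not state the exact objective used by AnimPortrait3D, so this is only the generic form, and every symbol is an assumption introduced here: \theta denotes the avatar parameters, x = g(\theta) an image rendered by a differentiable renderer g, y the text prompt, \hat{\epsilon}_\phi the noise prediction of the pretrained 2D diffusion model, and w(t) a timestep-dependent weight.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta), \quad
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \quad
\epsilon \sim \mathcal{N}(0, I)
```

In this form the only supervision comes from \hat{\epsilon}_\phi conditioned on y, so whatever the text leaves open, and any knowledge of the parametric model, must come from elsewhere. This is consistent with the two ambiguities named in the abstract.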

Video

Pipeline

Overview of AnimPortrait3D. Given an input text prompt, the 3D Avatar Initialization stage generates a well-defined initial avatar that provides appearance and geometry priors and is rigged to SMPL-X for animation. During the Dynamic Optimization stage, we optimize the avatar for dynamic poses and expressions using a 2D diffusion model. We first pre-train the eye and mouth regions, then optimize the full avatar, and finally apply a refinement strategy to produce the final result. AnimPortrait3D is able to generate avatars with diverse appearances, ethnicities, and ages.
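
The stage ordering described above can be sketched as a simple schedule. This is an illustration only, not the released implementation: `Stage`, `build_schedule`, `run`, and all stage and region labels are hypothetical names chosen here, and the per-stage optimization itself is left to a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One step of the (hypothetical) staged schedule."""
    name: str
    regions: Tuple[str, ...]  # avatar regions this step updates


def build_schedule() -> List[Stage]:
    # Order follows the overview text; the labels are illustrative assumptions.
    return [
        Stage("avatar_initialization", ("full",)),
        Stage("region_pretraining", ("eyes", "mouth")),
        Stage("full_avatar_optimization", ("full",)),
        Stage("final_refinement", ("full",)),
    ]


def run(schedule: List[Stage], step_fn: Callable[[Stage], object]) -> list:
    """Call step_fn on each stage in order and collect the results."""
    return [step_fn(stage) for stage in schedule]


if __name__ == "__main__":
    for stage in build_schedule():
        print(f"{stage.name}: {', '.join(stage.regions)}")
```

Keeping the schedule as data makes the ordering explicit: the eye and mouth regions are handled before the full avatar, and refinement always runs last.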

Comparison

Qualitative comparison of our method with state-of-the-art approaches: HeadStudio, TADA, HumanGaussian, PortraitGen, GPAvatar, and GAGAvatar. GPAvatar and GAGAvatar take an image as input, while all other methods take a text prompt (shown at the top). The reference images are sourced from the video data in the VFHQ dataset.

Resources

Gallery of Results

We provide a gallery of results generated by AnimPortrait3D on a diverse set of text prompts. Each result is accompanied by the input text prompt and the generated animatable 3D avatar. The preview images and the results can be found on Hugging Face.

Pre-trained ControlNet

This ControlNet generates high-quality RGB images of the face, mouth, and eye regions from their respective conditional inputs (text, normal map, and segmentation map). The pre-trained model can be found on Hugging Face.
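
As a rough illustration of what the three conditional inputs might look like as data, the sketch below encodes surface normals as RGB using the common convention of mapping each component from [-1, 1] to [0, 255], and bundles them with a text prompt and a color-coded segmentation map. It does not reflect the released model's actual input format: `encode_normal`, `build_condition`, the `PALETTE` colors, and the label names are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical label colors; the real segmentation palette is not given here.
PALETTE: Dict[str, RGB] = {
    "background": (0, 0, 0),
    "skin": (255, 0, 0),
    "eye": (0, 255, 0),
    "mouth": (0, 0, 255),
}


def encode_normal(n: Sequence[float]) -> RGB:
    """Normalize a 3D normal and map each component from [-1, 1] to [0, 255]."""
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("zero-length normal")
    return tuple(int(round((c / length * 0.5 + 0.5) * 255)) for c in n)


def build_condition(prompt: str,
                    normals: List[Sequence[float]],
                    labels: List[str]) -> dict:
    """Bundle text, normal map, and segmentation map for the same pixels."""
    if len(normals) != len(labels):
        raise ValueError("normals and labels must cover the same pixels")
    return {
        "text": prompt,
        "normal_map": [encode_normal(n) for n in normals],
        "segmentation_map": [PALETTE[label] for label in labels],
    }


if __name__ == "__main__":
    cond = build_condition("a portrait", [(0, 0, 1), (1, 0, 0)], ["skin", "eye"])
    print(cond["normal_map"], cond["segmentation_map"])
```

Pixels are kept as flat lists here for brevity; a real pipeline would use image arrays of matching resolution for both maps.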

Reconstructed Face Motions for Animation

We provide motion sequences reconstructed from the VFHQ dataset using VHAP; they can be downloaded from Google Drive.

Interactive Rendering

We provide an interactive user interface for animating our generated avatars using motion sequences or custom parameters, with real-time visualization of the underlying mesh. For more details, please visit our GitHub repository.

BibTeX

If you find this project helpful to your research, please consider citing:

@article{AnimPortrait3D_sig25,
      author = {Wu, Yiqian and Prinzler, Malte and Jin, Xiaogang and Tang, Siyu},
      title = {Text-based Animatable 3D Avatars with Morphable Model Alignment},
      year = {2025}, 
      isbn = {9798400715402}, 
      publisher = {Association for Computing Machinery},
      address = {New York, NY, USA},
      url = {https://doi.org/10.1145/3721238.3730680},
      doi = {10.1145/3721238.3730680},
      articleno = {},
      numpages = {11},
      location = {Vancouver, BC, Canada},
      series = {SIGGRAPH '25}
}

Text-based Animatable 3D Avatars
with Morphable Model Alignment

Woman, bold, bright colours, rainbow Mohawk haircut, cyberpunk, wearing jacket, handsome, metallic necklace, realistic, soft lighting, professional Photography.

A handsome young brown lightly wavy Pompadour with fade, muscled sportsman in navy satin large pinstripe double-breasted suit, club tie, pompadour haircut

Woman, white hair and black skin, portrait, Off-shoulder top and wide-leg pants for a rooftop bar night, (high detailed skin:1.2)

A Teen boy, pensive look, dark hair. Preppy sweater, collared shirt, moody room, 80s memorabilia

A sophisticated, silver-haired Caucasian woman in her 60s, with a heart-shaped face, styled in a perfectly coiffed bouffant hairstyle, dressed in a tailored tweed suit, pearl necklace, and carrying a classic leather handbag, strolling through an art gallery, admiring vibrant abstract paintings

The middle-aged Hispanic man had a strong jawline, bushy black mustache, and a receding hairline. He wore a fitted navy blue suit with a crisp white shirt, and a gold watch shone on his wrist as he confidently stood in a bustling city street

A graceful merfolk with scales that glimmer in hues of aquamarine and gold, wearing Flowing garments made of shimmering seaweed or kelp, flowing teal hair, and webbed fingers tipped with pearlescent claws

With a salt and pepper mustache, the 62-year-old African American man had an oval-shaped face, wore a plaid button-up shirt, and rested his hands on his cane as he walked along the beach at sunset

A two-year-old African American girl with chubby cheeks and a round face, has short, curly hair, wears a pink ruffled dress with a small necklace bearing her name, and is seen giggling in a sunlit garden

A weathered 70 year old man with a long gray beard, on a boat in calm water on the louisiana bayou on a hot summer night, t-shirt and overalls

A vivacious 41-year-old Middle Eastern woman with an olive complexion and captivating black-brown eyes. Her shoulder-length jet-black hair is styled in loose curls, adding volume. She wears a patterned silk tunic, paired with dangling silver earrings, confidently navigating a bustling bazaar

An eight-year-old Latina girl with a warm, oval face and almond-shaped eyes, her dark brown hair in a short sleek ponytail, dons a white blouse with lace trimmings; she gazes curiously at the aquarium fish, her silver locket glinting in the light

A gorgeous 1950s americana vintage portrait of a boy, eyeshadow, wearing 1950s americana shirt with a maroon striped sweater vest and (skin-tight purple navy high-waisted bell-bottom pants with a wide belt)), serious expression, subsurface scattering

A woman, long blonde hair styled in a 1950s-inspired look, and is dressed in a pink checkered dress with a white apron, wears pearls around her neck, a retro, pastel-themed kitchen with a whimsical and vintage aesthetic

Dressed in a casual and laid-back manner, a 24-year-old Hispanic young man radiates confidence with his athletic build, tousled dark hair, wearing a plain black t-shirt, ripped jeans, and a worn-out leather jacket, embodying effortless coolness in a vibrant city