Portrait3D

Text-Guided High-Quality 3D Portrait Generation
Using Pyramid Representation and GANs Prior

ACM Transactions on Graphics (Proceedings of SIGGRAPH 2024)

Yiqian Wu¹, Hao Xu¹, Xiangjun Tang¹, Xien Chen², Siyu Tang³, Zhebin Zhang⁴, Chen Li⁴, Xiaogang Jin*¹

¹State Key Lab of CAD&CG, Zhejiang University, ²Yale University, ³ETH Zürich, ⁴OPPO US Research Center


Portrait3D produces high-quality, view-consistent, realistic, and canonical 3D portraits that are in alignment with the input text prompts.

1-year-old baby, brunette, wearing a pink dress and pearl necklace, with a Minnie mouse-themed blurred background, 50 mm, f3.0

A handsome man retro America style 1970-1990, hippie

An India fish seller, behind his counter, sophisticated

Congolese woman, wrinkles, aged, necklace, crowded papers room, neutral colors, barren land

24 y.o man in (casual clothes:1.2), long brown hair, big nose, wet hair, aquiline nose, natural skin, 8k uhd, high quality, film grain, Fujifilm XT3

A Scottish teenage girl dons a teddy bear-themed costume. She meets the camera with a mischievous gaze, her face adorned with playful makeup that accentuates her freckles and adds a touch of whimsy. Her hair is braided with colorful ribbons

55 y.o man, traveler clothes, standing in the forest, natural skin, 8k uhd, high quality, film grain, Fujifilm XT3, 8k uhd, high quality, film grain, Fujifilm XT3, professional Photography

Woman, bold, bright colours, rainbow Mohawk haircut, cyberpunk, wearing jacket, handsome, metallic necklace, realistic, soft lighting, professional Photography

25 y.o man in casual clothes, night, city street, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3

1girl, In the bustling morning market, a Muslim young teenager stands adorned in a vibrant red outfit, her hair modestly covered, Her delicate visage, with perfect pale skin, is illuminated by the gentle sunlight

Male wearing a classic well-fitted crew-neck t-shirt, neutral backdrop, even and flattering lighting, soft diffused lighting, confident and empowering mood, looking at camera, beard epiCPhoto

Dark-skinned HOT 28 y.o African woman, very short haircut style, bangs, wearing black vest, makeup, dark lipstick, mascara, eyeliner, fake eyelashes, dark theme, soothing tones

High resolution photography portrait of a black man, symmetrical face pose, standing outside in the cold. Handsome. Powerful, black hair, professional Photography, Photorealistic

Early 80s poloraid color photo, headshot, medium brown hair with bangs, ((wearing a nurse outfit)), a woman in a hospital hall, bokeh, professional

Man, middle age, long grey hair with man bun, tan skin, oval face, beard, dark eyes, wearing jacket

Woman, serious,short hair,cinematic, Canon powershot, bokeh, filmic grain, motion blur, soft lighting, neon lights, uhd, 8k, ultrasharp, masterpiece, city street, nighttime, wearing a sweater, low brightness, dimly lit

A woman wearing a golden dress by Madame Gres, soft lighting, professional Photography, Photorealistic, detailed

A fat Asian man wearing a gold blue and green rubber apron, realistic skin, fine detail, in a restaurant

French victoria secret model, bangs long dark hair, (((sitting backstage of fashion show in Paris))), nervous, natural aged skin texture, detailed face, 24mm, 4k textures, soft cinematic light, adobe lightroom

A male samurai, ponytail, epic, ruins background, perfect composition, cinematic, moody, rule of thirds, majestic, detailed, sharp details

An asian grandmother with a beautiful smile and delicate wrinkles, wearing floral shirt, 8k resolution, hyperdetailed photograph, beautiful lighting, dark contrast, deep shadows, award winning composition

A Japanese girl, smile, twin braids hair, black hair, brown eyes, pink kimono, best quality, ultra-detailed, extremely delicate and beautiful, Japanese room

A 18yo boy, (sideburns, ginger facial stubble, short hair), skin imperfections, expressive, chill atmosphere, masterpiece, short hair, casual, classroom, shirt, pants

A 30 years old Russian farmer woman, in a rural Russian village in 1988 in autumn, style of wes Anderson

Mr. Bean, short, neatly parted hair, wearing a signature tweed jacket, a thin red tie, 8k uhd, high quality, dramatic, cinematic

A woman,short hair, detailed skin texture,masterpiece, face focus, photorealistic, woman, 4k, HDR, backlighting, bloom, light, RAW color photo,(fully in frame:1.1), (blush:0.5), wearing a blouse, asian. happy, idol

A man, solo, scruffy facial hair, dressed in a long-sleeved shirt, waistcoat, and a beanie, exuding a distinct 'off-the-grid' aura

A graceful ballerina, wavy brunette hair in a neat chignon, captivating almond-shaped hazel eyes, in a rosy tutu

A gorgeous 1950s americana vintage portrait of a boy, eyeshadow, wearing 1950s americana shirt with a maroon striped sweater vest and (skin-tight purple navy high-waisted bell-bottom pants with a wide belt)), serious expression, subsurface scattering, f2, 35mm, film grain

A riya woman wearing elegant high-neck Kalamkari Dress (hand-painted fabric) with Raised chin, brown hair, black eyes, film grain, perfect eyes

Studio photo of an old yet stylish grandpa, detailed, studio lighting, medium format

A middle-aged redhead woman taken at a convenience store during the night,exhibiting a grainy texture, jpg artifacts, film grain, gritty, raw aesthetic

A man Model wearing a pink jumper, Red Autumn Forest Background, Natural Lighting, chaos, minimalist, 1990s, (shot on Portra 400).

A young woman with dark blue bob hair with a steampunk bow, all in a post apocalyptic steampunk farwest decor

Fashion Editorial. Castle edition, analog fashion portrait. Portra 400 high dpi scan. Beautiful American young , twenty years old ,as Disney prince Philip sepia pastel colors , long dark red velvet cape

A black girl about 10 years old with hair in two Afro puffs. She is standing against a colorful graffiti wall. She is wearing a deep red coloured t-shirt

A baby-faced redhead boy taken at a convenience store during the night, exhibiting a grainy texture, jpg artifacts, film grain, gritty, raw aesthetic

A very handsome elegant Asian man, dressed in a black suit and tie with short dark hair

A boy 7 years old, next to an mountains, in the style of portraits with soft lighting, he jiaying, qian xuan

A bearded man in elegant tweed suit, misty november day, gray and orange hues, fog, unusual angle, professional photography, shot with Hasselblad

A blonde woman is wearing a shiny satin shirt, 1980s N.Y the Bronx

A handsome 35-year-old wide mountain man feature with a faded line-up style, light brown hair, bluish-grey eyes, wide barrel chest, wearing shirt, in an office

A weathered 70 year old man with a long gray beard, on a boat in calm water on the louisiana bayou on a hot summer night, t-shirt and overalls, character, digital photo, low light

A Teen boy, pensive look, dark hair. Preppy sweater, collared shirt, moody room, 80s memorabilia

A handsome Japanese young male idol with blue hair wearing a T-shirt and denim shorts in a grocery store in Tokio in the 1970s

Abstract

Existing neural rendering-based text-to-3D-portrait generation methods typically use a human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing.

We present Portrait3D, a neural rendering-based framework with a novel joint geometry-appearance prior for text-to-3D-portrait generation that overcomes these issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN-Pyramid, as a robust prior. This generator produces 360° canonical 3D portraits, which serve as the starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifact caused by high-frequency information in the feature-map-based 3D representations used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN-Pyramid. To generate a 3D portrait from text, we first project a randomly generated image aligned with the given prompt into the latent space of the pre-trained 3DPortraitGAN-Pyramid. The resulting latent code is then used to synthesize a pyramid tri-grid. Starting from this pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into it. We then use the diffusion model to refine the rendered images of the 3D portrait, and use these refined images as training data to further optimize the pyramid tri-grid, which effectively eliminates unrealistic colors and unnatural artifacts.

Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
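To make the pyramid idea in the abstract more concrete, here is a minimal 2D sketch. It is not the tri-grid used by 3DPortraitGAN-Pyramid: it assumes that a "pyramid" means several feature grids at different resolutions whose bilinearly interpolated samples are summed, which this page does not spell out. The function names `bilinear` and `sample_pyramid` are hypothetical.

```python
def bilinear(grid, u, v):
    """Bilinearly sample a 2D grid (list of rows, at least 2x2) at u, v in [0, 1]."""
    h, w = len(grid), len(grid[0])
    x, y = u * (w - 1), v * (h - 1)            # continuous pixel coordinates
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0                    # fractional offsets inside the cell
    top = grid[y0][x0] * (1 - fx) + grid[y0][x0 + 1] * fx
    bot = grid[y0 + 1][x0] * (1 - fx) + grid[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy


def sample_pyramid(levels, u, v):
    """Assumed combination rule: sum the samples from every resolution level."""
    return sum(bilinear(grid, u, v) for grid in levels)


# A coarse 2x2 level (smooth ramp) plus a finer 3x3 level (local detail).
coarse = [[0.0, 2.0],
          [0.0, 2.0]]
fine = [[0.0, 0.0, 0.0],
        [0.0, 4.0, 0.0],
        [0.0, 0.0, 0.0]]
center_feature = sample_pyramid([coarse, fine], 0.5, 0.5)
```

In this toy setup the coarse level can only carry smooth, low-frequency content, while the fine level adds localized detail on top of it.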

Video

Pipeline

The pipeline of Portrait3D. For text-to-3D-portrait generation, given a text prompt, we first randomly generate a portrait image by feeding the prompt into the diffusion model, and then project the generated image into the latent space of our generator. The resulting latent code is used to synthesize the corresponding pyramid tri-grid, which serves as the starting point of the subsequent diffusion-based generation process. Next, we distill the knowledge of the diffusion model into the pyramid tri-grid through score distillation sampling. This produces a 3D portrait that aligns with the input prompt. To further enhance the quality of this 3D portrait, we apply the diffusion model to refine its rendered images. The refined images are then used as training data to optimize the pyramid tri-grid, yielding the final result.
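The score distillation step described above can be illustrated with a toy example. This is a sketch under strong assumptions, not the Portrait3D implementation: the "renderer" is the identity (the parameters are the image), the diffusion model is replaced by a hypothetical `toy_noise_predictor` that behaves as if the prompt described exactly one image, and the weighting w = 1 - alpha_bar is just one common choice. All names and hyperparameters are illustrative.

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a text-conditioned diffusion model.

    It predicts the noise that would explain x_t if the clean image were `target`.
    """
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - s * tg) / n for xt, tg in zip(x_t, target)]


def sds_step(theta, target, rng, lr=0.1):
    """One score-distillation update with an identity renderer (image == theta)."""
    alpha_bar = rng.uniform(0.02, 0.98)              # random noise level
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]       # injected Gaussian noise
    x_t = [s * x + n * e for x, e in zip(theta, eps)]
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar                              # assumed weighting w(t)
    # SDS gradient: w(t) * (predicted noise - injected noise); the Jacobian of
    # the identity renderer is the identity, so it drops out here.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [x - lr * g for x, g in zip(theta, grad)]


def run_sds(theta, target, steps=500, seed=0):
    """Repeat SDS updates starting from an initial guess `theta`."""
    rng = random.Random(seed)
    for _ in range(steps):
        theta = sds_step(theta, target, rng)
    return theta


# `init` plays the role of the starting point obtained from GAN inversion.
init = [0.0, 1.0, -1.0]
result = run_sds(init, target=[0.5, 0.5, 0.5])
```

In this toy setting each update pulls the parameters toward the image the noise predictor "believes in". In the real pipeline the parameters would be the pyramid tri-grid, the renderer would be a differentiable volume renderer, and the gradient would be back-propagated through it.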

Comparison

Qualitative comparison to SOTA text-to-3D approaches: DreamFusion, LucidDreamer, TADA, AvatarCraft, AvatarStudio, HumanGaussian, AvatarVerse, HumanNorm, SEEAvatar, TECA, and our method. The input prompt is presented at the top.

Qualitative comparison to SOTA diffusion-based reconstruction approaches: One-2-3-45, DreamGaussian, Wonder3D, SyncDreamer, TeCH, and our method. The reference prompt is presented at the top, and the reference image (which is also the generated image used for image inversion in our framework) is presented on the left.

Results Gallery

We offer a gallery of 300 3D portraits (with their corresponding prompts) generated by our method, all viewable and accessible on Hugging Face. For visualization, please refer to our GitHub repository. (Below are some examples of the generated 3D portraits.)

BibTeX

If you find this project helpful to your research, please consider citing:

@article{Portrait3D_sig24,
author = {Wu, Yiqian and Xu, Hao and Tang, Xiangjun and Chen, Xien and Tang, Siyu and Zhang, Zhebin and Li, Chen and Jin, Xiaogang},
title = {Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior},
year = {2024},
publisher = {Association for Computing Machinery},
volume = {43},
number = {4},
url = {https://doi.org/10.1145/3658162},
doi = {10.1145/3658162},
journal = {ACM Trans. Graph.},
month = {Jul},
articleno = {45}
}
