Multi-concept 3D customization with MultiDreamer3D. MultiDreamer3D can generate 3D content incorporating multiple input concepts in three cases: 1) multiple subjects, 2) property change, and 3) interaction.
While single-concept customization has been studied in 3D, multi-concept customization remains largely unexplored. To address this, we propose MultiDreamer3D, which generates coherent multi-concept 3D content in a divide-and-conquer manner. First, we generate 3D bounding boxes using an LLM-based layout controller. Next, a selective point cloud generator creates a coarse point cloud for each concept. These point clouds are placed in the 3D bounding boxes and used to initialize a 3D Gaussian Splatting representation with concept labels, enabling precise identification of concept attributions in 2D projections. Finally, we refine the 3D Gaussians via concept-aware interval score matching, guided by concept-aware diffusion. Our experimental results show that MultiDreamer3D not only ensures object presence and preserves the distinct identity of each concept but also successfully handles complex cases such as property change and interaction. To the best of our knowledge, we are the first to address multi-concept customization in 3D.
Overall pipeline of MultiDreamer3D. (a) The 3D layout controller produces 3D bounding boxes from text descriptions. Subsequently, the selective concept point cloud generator outputs coarse concept point clouds and positions them within the 3D bounding boxes. (b) Images and concept masks are rendered from the 3D Gaussian Splatting (3DGS) representation $\Theta$, which is updated with the concept-aware interval score matching (CISM) loss, facilitated by regional concept attention (RCA).
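As a rough illustration of step (a), the sketch below fits each coarse point cloud into its bounding box and attaches a per-point concept label. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `place_in_box` and `build_labeled_cloud`, the axis-aligned box format `(min_corner, max_corner)`, and the uniform-scale fitting rule are all hypothetical choices made for this example.

```python
import numpy as np

def place_in_box(points, box_min, box_max):
    """Uniformly scale and translate a point cloud to fit an axis-aligned box.

    Assumption: boxes are axis-aligned and given as (min_corner, max_corner).
    """
    points = np.asarray(points, dtype=float)
    box_min = np.asarray(box_min, dtype=float)
    box_max = np.asarray(box_max, dtype=float)
    pc_min, pc_max = points.min(axis=0), points.max(axis=0)
    extent = np.maximum(pc_max - pc_min, 1e-8)    # guard against flat clouds
    scale = np.min((box_max - box_min) / extent)  # one scale keeps the aspect ratio
    pc_center = (pc_min + pc_max) / 2
    box_center = (box_min + box_max) / 2
    return (points - pc_center) * scale + box_center

def build_labeled_cloud(clouds, boxes):
    """Place every concept's cloud in its box and tag each point with a concept id."""
    positions, labels = [], []
    for concept_id, (pts, (bmin, bmax)) in enumerate(zip(clouds, boxes)):
        placed = place_in_box(pts, bmin, bmax)
        positions.append(placed)
        labels.append(np.full(len(placed), concept_id, dtype=np.int64))
    return np.concatenate(positions), np.concatenate(labels)
```

The returned positions and labels could then seed the Gaussian centers and their concept labels; how the labels are stored inside the 3DGS representation is not specified here.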
Regional concept attention (RCA) modulates the cross-attention layer in the diffusion model. Individual concept query vectors are computed from the image features and each concept's mask. Subsequently, key and value vectors for each concept are derived using concept-specific LoRAs and prompts. Concept-specific attention features are then computed from each concept's query, key, and value. The final cross-attention features are obtained by aggregating the masked concept-specific attention features.
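The data flow described above can be sketched in NumPy as follows. This is an illustrative single-head sketch under several assumptions that the caption does not confirm: the mask is applied to the image features before the query projection, each LoRA is modeled as a low-rank `(down, up)` pair added to shared key and value weights, and the masked outputs are aggregated by summation. The function name `regional_concept_attention` and all argument names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def regional_concept_attention(feats, masks, prompts, w_q, w_k, w_v, lora_k, lora_v):
    """Single-head sketch of mask-aggregated, per-concept cross-attention.

    feats:   (N, d_img) flattened image features
    masks:   list of (N,) binary arrays, one per concept
    prompts: list of (T_c, d_txt) prompt embeddings, one per concept
    w_q:     (d_img, d) shared query weights
    w_k/w_v: (d_txt, d) shared key/value weights
    lora_k/lora_v: lists of (down, up) pairs, shapes (d_txt, r) and (r, d)
    """
    d = w_q.shape[1]
    out = np.zeros((feats.shape[0], d))
    for mask, prompt, (kd, ku), (vd, vu) in zip(masks, prompts, lora_k, lora_v):
        m = mask[:, None].astype(float)
        q = (feats * m) @ w_q                 # concept query from masked features (assumption)
        k = prompt @ (w_k + kd @ ku)          # concept-specific keys via LoRA update
        v = prompt @ (w_v + vd @ vu)          # concept-specific values via LoRA update
        attn = softmax(q @ k.T / np.sqrt(d))  # attention over this concept's prompt tokens
        out += m * (attn @ v)                 # keep features only inside the concept's mask
    return out
```

In this sketch, locations covered by no concept mask receive zero features; how such regions are handled in the actual method is not stated in the caption.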
Qualitative results. We compare our method with baselines in three cases: multiple subjects, property change, and interaction. Red dashed lines mark objects that are mentioned in the text prompt but missing from the result.
@article{song2025multidreamer3d,
title={MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance},
author={Song, Wooseok and Chang, Seunggyu and Yoo, Jaejun},
journal={arXiv preprint arXiv:2501.13449},
year={2025}
}