DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data

Ruiqi Wu¹,³* Xinjie Wang³ Liu Liu³ Chunle Guo¹,²† Jiaxiong Qiu³ Chongyi Li¹,² Lichao Huang³
Zhizhong Su³ Ming-Ming Cheng¹,²

¹VCIP, CS, Nankai University ²NKIARI, Shenzhen Futian ³Horizon Robotics
*Work done as Research Intern †Corresponding Author

ArXiv Preprint 2025


🔥Controllable Articulated Object Generation from Dual-State Images 🔥

Our DIPO framework generates high-quality articulated 3D objects from dual-state image pairs (resting and articulated states). Unlike single-image methods that struggle with motion ambiguity, our approach leverages explicit articulation information for precise control.

🔧 PM-X Dataset: Building Complex Articulated Objects

Abstract

We present DIPO, a novel framework for the controllable generation of articulated 3D objects from a pair of images: one depicting the object in a resting state and the other in an articulated state. Compared with single-image approaches, the dual-image input adds only modest data-collection overhead while supplying motion information that reliably guides the prediction of kinematic relationships between parts. Specifically, we propose a dual-image diffusion model that captures the relationships between the two images to generate part layouts and joint parameters. In addition, we introduce a Chain-of-Thought (CoT) based graph reasoner that explicitly infers part connectivity. To further improve robustness and generalization on complex articulated objects, we develop a fully automated dataset expansion pipeline, named LEGO-Art, that enriches the diversity and complexity of the PartNet-Mobility dataset. With it, we build PM-X, a large-scale dataset of complex articulated 3D objects accompanied by rendered images, URDF annotations, and textual descriptions. Extensive experiments demonstrate that DIPO significantly outperforms existing baselines in both the resting state and the articulated state, while the proposed PM-X dataset further enhances generalization to diverse and structurally complex articulated objects.

Method Overview

Our DIPO framework consists of three key components:

1. Dual-State Image Conditioning: We condition the denoising process on both resting-state and articulated-state images using DINOv2 features. A Dual-State Injection Module integrates motion-aware cues by performing cross-attention between the two states.

2. Chain-of-Thought Graph Reasoner: This module predicts articulated part connectivity graphs from dual-state images using a step-by-step reasoning paradigm. It identifies candidate parts, estimates spatial layouts, verifies articulation rules, and infers attachment relationships.

3. LEGO-Art Pipeline: A fully automated synthesis pipeline that generates complex articulated 3D assets by assembling part primitives from existing datasets. It includes Description Roller, Layout Builder, Scripting Toolkit, Retrieval & Render, and Visual Filter modules.
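
The cross-attention between the two states mentioned in component 1 can be illustrated with a minimal sketch. This is not the DIPO implementation: it uses plain scaled dot-product attention with no learned projections, the short token vectors merely stand in for DINOv2 features, and the names `cross_attention` and `dual_state_inject` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    output = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        output.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return output


def dual_state_inject(rest_tokens, art_tokens):
    """Assumed fusion: resting tokens attend to articulated tokens, then a residual add."""
    attended = cross_attention(rest_tokens, art_tokens, art_tokens)
    return [[r + a for r, a in zip(rt, at)] for rt, at in zip(rest_tokens, attended)]


# Toy stand-ins for per-patch image features of the two states.
rest_tokens = [[1.0, 0.0], [0.0, 1.0]]
art_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = dual_state_inject(rest_tokens, art_tokens)
print(fused)
```

Each fused token keeps its resting-state content and adds a mixture of articulated-state tokens weighted by similarity, which is one simple way motion cues from the second image could reach the conditioning signal.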

PM-X Dataset & LEGO-Art

We introduce PM-X (PartNet-Mobility-Complex), a large-scale dataset of structurally complex articulated objects built using our LEGO-Art pipeline.

Quantitative Results

We evaluate our method on both the PartNet-Mobility and ACD datasets, where it outperforms the baselines in reconstruction quality and graph prediction accuracy.

Results on PartNet-Mobility Test Set

The six distance columns measure reconstruction quality (lower is better); Graph Acc% measures graph prediction accuracy (higher is better). RS and AS denote the resting and articulated states.

Method	RS-dgIoU ↓	AS-dgIoU ↓	RS-dcDist ↓	AS-dcDist ↓	RS-dCD ↓	AS-dCD ↓	Graph Acc% ↑
URDFormer [6]	1.2327	1.2332	0.2885	0.4403	0.4417	0.6910	6.62
NAP-ICA [18]	0.5706	0.5765	0.0563	0.2547	0.0209	0.3473	25.06
SINGAPO [23]	0.5134	0.5236	0.0487	0.1107	0.0191	0.1270	75.97
DIPO (Ours)	0.4561	0.4683	0.0359	0.0732	0.0132	0.0423	85.06
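
To put the gap to the strongest baseline in relative terms, the short sketch below computes the percentage reduction of each distance metric from SINGAPO to DIPO, plus the absolute gain in graph accuracy. All numbers are copied from the PartNet-Mobility table above; the helper name `relative_reduction` is ours and purely illustrative.

```python
# Values copied from the PartNet-Mobility results table.
METRICS = ["RS-dgIoU", "AS-dgIoU", "RS-dcDist", "AS-dcDist", "RS-dCD", "AS-dCD"]
SINGAPO = [0.5134, 0.5236, 0.0487, 0.1107, 0.0191, 0.1270]
DIPO = [0.4561, 0.4683, 0.0359, 0.0732, 0.0132, 0.0423]


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


reductions = {
    name: round(relative_reduction(b, o), 1)
    for name, b, o in zip(METRICS, SINGAPO, DIPO)
}
# Graph accuracy is higher-is-better, so report the gain in percentage points.
graph_acc_gain = round(85.06 - 75.97, 2)

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower than SINGAPO")
print(f"Graph Acc: +{graph_acc_gain:.2f} percentage points")
```

On these numbers the improvements are larger for the articulated-state metrics than for their resting-state counterparts, with the biggest relative reduction (roughly two thirds) on AS-dCD.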

Qualitative Comparison

Compared with the baselines, our method produces results of higher visual quality and predicts articulation graphs more accurately. Thanks to the large-scale, structurally diverse training data provided by the PM-X dataset, DIPO is also more robust on complex objects and real-world data.

Acknowledgements

This work was done while Ruiqi Wu was a Research Intern with Horizon Robotics. We thank the reviewers for their valuable feedback.

BibTeX

@article{wu2025dipo,
    title={DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data},
    author={Wu, Ruiqi and Wang, Xinjie and Liu, Liu and Guo, Chunle and Qiu, Jiaxiong and Li, Chongyi and Huang, Lichao and Su, Zhizhong and Cheng, Ming-Ming},
    journal={arXiv preprint arXiv:2505.20460},
    year={2025}
}

Contact

Feel free to contact us at wuruiqi@mail.nankai.edu.cn
