The machine learning and computer vision team at Huazhong Agricultural University (HZAU) recently achieved a significant advance in 3D object retrieval. Their paper, "Describe, Adapt and Combine: Empowering CLIP Encoders for Open-set 3D Object Retrieval," was accepted for presentation at ICCV 2025, one of the top international conferences in computer vision.
The research addresses the challenging problem of open-set 3D object retrieval, where models must accurately retrieve objects from categories that were not seen during training. Traditional methods rely on complex multimodal inputs such as point clouds or voxels, which limits their practicality. The team proposed a lightweight adaptation framework called DAC (Describe, Adapt and Combine), which leverages the representation ability of the large-scale pre-trained CLIP model using only multi-view images as input.
Using a low-rank adaptation (LoRA) technique, DAC efficiently fine-tunes CLIP with minimal labeled data, significantly improving retrieval accuracy while keeping the model simple. To prevent overfitting on known categories, DAC introduces a learnable additive bias that is independent of the input, preserving generalization to unknown classes. Additionally, DAC incorporates the text modality by generating semantic descriptions of 3D objects, enhancing recognition and feature matching for unseen objects.
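To make the adaptation idea concrete, here is a minimal sketch of a single linear layer with a LoRA update and an input-independent additive bias. It is not the team's implementation: the class name `LoRALinear`, the layer sizes, the rank, the scaling factor `alpha`, and the point where the bias is added are all assumptions for illustration, and the frozen weight is random rather than taken from CLIP.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical layer: frozen weight W, trainable low-rank update B @ A,
    and a trainable bias that does not depend on the input."""

    def __init__(self, d_in, d_out, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weight (random stand-in for a CLIP projection).
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors. B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        # Trainable additive bias, independent of the input (assumed placement).
        self.bias = [0.0] * d_out
        self.scale = alpha / rank

    def num_trainable(self):
        """Count parameters in A, B, and the bias (W stays frozen)."""
        rank, d_in = len(self.A), len(self.A[0])
        d_out = len(self.B)
        return rank * d_in + d_out * rank + d_out

    def forward(self, x):
        base = matvec(self.W, x)                      # frozen path
        low_rank = matvec(self.B, matvec(self.A, x))  # B @ (A @ x)
        return [b + self.scale * l + c
                for b, l, c in zip(base, low_rank, self.bias)]


layer = LoRALinear(d_in=16, d_out=16, rank=2)
x = [1.0] * 16
frozen_out = matvec(layer.W, x)
adapted_out = layer.forward(x)
full_params = 16 * 16
trainable_params = layer.num_trainable()
```

In this toy setting the adapter trains 80 parameters instead of the 256 in the full weight matrix, and because `B` and the bias start at zero, the adapted output matches the frozen output before any training. The bias shifts every output by the same learned vector regardless of the input, which is the property the article describes.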
Wang Zhichuan, a 2024 graduate student from the School of Information, is the paper's first author. Professors Wang Yulong and He Xinwei serve as corresponding authors. The work received guidance from experts at Shenzhen University, the University of Hong Kong, the University of Louisville, ByteDance AI Lab, and Huazhong University of Science and Technology. Funding came from the National Natural Science Foundation Youth Project and other grants.

This breakthrough marks a significant step toward practical, efficient 3D retrieval systems with strong generalization for real-world applications.