Heterogeneous Pretrained Transformers for Complex Control
Paper code: 1655-ISME2025
Authors
Arian Sardari, Ali Mousavi* (Sharif University of Technology)
Abstract
Transformer-based architectures have recently achieved remarkable success in domains such as language processing, vision, and multi-modal learning. However, most existing approaches rely on homogeneous layers and shared parameters across all modalities, which can be inefficient for the heterogeneous data encountered in complex control environments. This paper introduces Heterogeneous Pretrained Transformers (HPT), a framework in which specialized transformer blocks process distinct input modalities (textual instructions, visual data, and sensor signals) while cross-attention layers fuse information across them. We first present the HPT architecture, including its modality-specific embedding functions and fusion points. We then show how the framework can be adapted to a robotic control scenario in which an agent must combine textual commands, real-time camera feeds, and sensor readings to execute fine-grained actions. Experimental results from simulation studies indicate that HPT can outperform baseline multi-modal transformers in task success rate and data efficiency. This work paves the way for more effective use of diverse data streams in complex control applications.
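The core idea in the abstract, modality-specific embedding functions feeding a shared cross-attention fusion point, can be illustrated with a minimal, dependency-free sketch. All dimensions, names, and the single-head attention below are illustrative assumptions, not the authors' actual HPT implementation.

```python
# Illustrative sketch (assumed design, not the paper's code): each modality
# gets its own embedding function (here, a random linear projection to a
# shared width D), and a single cross-attention step fuses them.
import math
import random

random.seed(0)
D = 8  # shared embedding width (assumed)

def linear(in_dim, out_dim):
    """A random weight matrix standing in for a modality-specific embedding."""
    return [[random.gauss(0, in_dim ** -0.5) for _ in range(out_dim)]
            for _ in range(in_dim)]

def project(tokens, W):
    """Apply the linear embedding W to each token vector."""
    return [[sum(t[i] * W[i][j] for i in range(len(t)))
             for j in range(len(W[0]))] for t in tokens]

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention: queries attend to keys_values."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D)
                  for k in keys_values]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(D)])
    return out

# Toy token sequences for the three modalities (dims assumed).
text    = [[0.1] * 16, [0.2] * 16]   # 2 text tokens, dim 16
vision  = [[0.3] * 32] * 3           # 3 visual tokens, dim 32
sensors = [[0.5] * 4]                # 1 sensor reading, dim 4

# Modality-specific embeddings into the shared width D.
W_text, W_vis, W_sen = linear(16, D), linear(32, D), linear(4, D)
h_text = project(text, W_text)
h_ctx  = project(vision, W_vis) + project(sensors, W_sen)

# Fusion point: text tokens cross-attend to the visual + sensor tokens.
fused = cross_attention(h_text, h_ctx)
print(len(fused), len(fused[0]))  # 2 fused text tokens of width D
```

In the full architecture described above, each modality would pass through its own stack of specialized transformer blocks before fusion; this sketch keeps only the embedding and one fusion step to show how heterogeneous input widths map into a common space.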
Keywords
Complex Control, Heterogeneous Networks, Multi-Modal Learning, Robotics, Transformers
Status: Accepted for oral presentation