Baoru Huang
StereoMamba: Real-time and Robust Intraoperative Stereo Disparity Estimation via Long-range Spatial Dependencies
Stereo disparity estimation is crucial for obtaining depth information in robot-assisted minimally invasive surgery (RAMIS). We propose the StereoMamba architecture, which is specifically designed for stereo disparity estimation in RAMIS. Our approach is based on a novel Feature Extraction Mamba (FE-Mamba) module, which enhances long-range spatial dependencies both within and across stereo images.
Xu Wang, Jialang Xu, Shuai Zhang, Baoru Huang, Danail Stoyanov, Evangelos B Mazomenos
GraspMAS: Zero-Shot Language-Driven Grasp Detection with Multi-Agent System
Language-driven grasp detection has the potential to revolutionize human-robot interaction by allowing robots to understand and execute grasping tasks based on natural language commands. In this paper, we introduce GraspMAS, a new multi-agent system framework for language-driven grasp detection. GraspMAS is designed to reason through ambiguities and improve decision-making in real-world scenarios.
Quang Nguyen, Tri Le, Huy Nguyen, Thieu Vo, Tung D Ta, Baoru Huang, Minh N Vu, Anh Nguyen
SplineFormer: An Explainable Transformer-Based Approach for Autonomous Endovascular Navigation
Endovascular navigation is a crucial aspect of minimally invasive procedures, where precise control of curvilinear instruments such as guidewires is critical for successful interventions. We propose SplineFormer, a new transformer-based architecture designed specifically to predict the continuous, smooth shape of the guidewire in an explainable way.
Tudor Jianu, Shayan Doust, Mengyun Li, Baoru Huang, Tuong Do, Hoan Nguyen, Karl Bates, Tung D Ta, Sebastiano Fichera, Pierre Berthet-Rayne, Anh Nguyen
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Vision-language models have played a key role in extracting meaningful features for various robotic applications. In this paper, we introduce Robotic-CLIP to enhance robotic perception capabilities.
Nghia Nguyen, Minh Nhat Vu, Tung D Ta, Baoru Huang, Thieu Vo, Ngan Le, Anh Nguyen
Tracking everything in robotic-assisted surgery
Accurate tracking of tissues and instruments in videos is crucial for robotic-assisted minimally invasive surgery. We introduce a new annotated surgical tracking dataset, comprising real-world surgical videos with complex tissue and instrument motions, for benchmarking tracking methods in surgical scenarios.
Bohan Zhan, Wang Zhao, Yi Fang, Bo Du, Francisco Vasconcelos, Danail Stoyanov, Daniel S Elson, Baoru Huang
FedEFM: Federated Endovascular Foundation Model with Unseen Data
In endovascular surgery, the precise identification of catheters and guidewires in X-ray images is essential for reducing intervention risks. This paper proposes a new method to train a foundation model in a decentralized federated learning setting for endovascular intervention.
Tuong Do, Nghia Vu, Tudor Jianu, Baoru Huang, Minh Vu, Jionglong Su, Erman Tjiputra, Quang D Tran, Te-Chuan Chiu, Anh Nguyen
Hybrid Deep Reinforcement Learning for Radio Tracer Localisation in Robotic-assisted Radioguided Surgery
This paper presents a learning-based method for autonomous radiotracer detection in robot-assisted surgery, which navigates the probe to the radioactive target. Real-world evaluation on the da Vinci Research Kit (dVRK) further confirms the feasibility of the approach, achieving an 80% success rate in radiotracer detection.
Hanyi Zhang, Kaizhong Deng, Zhaoyang Jacopo Hu, Baoru Huang, Daniel S Elson
SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction
Accurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. We present SurgicalGS, a dynamic 3D Gaussian Splatting framework specifically designed for surgical scene reconstruction with improved geometric accuracy.
Jialei Chen, Xin Zhang, Mobarakol Islam, Francisco Vasconcelos, Danail Stoyanov, Daniel S Elson, Baoru Huang
HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
We present HabiCrowd, a benchmark for crowd-aware visual navigation, integrating diverse human dynamics into photorealistic environments. HabiCrowd achieves state-of-the-art collision avoidance and superior computational efficiency, advancing studies in human-robot interaction and navigation.
An Dinh Vuong, Toan Tien Nguyen, Minh Nhat Vu, Baoru Huang, Dzung Nguyen, Huynh Thi Thanh Binh, Thieu Vo, Anh Nguyen
Language-driven Grasp Detection with Mask-guided Attention
We propose a novel method for language-driven grasp detection that uses mask-guided attention and transformer mechanisms with semantic segmentation features. By integrating visual data and natural language, our approach achieves a 10% improvement in grasp detection accuracy and performs strongly in real-world robotic experiments.
Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen