I am currently an Assistant Professor in the Department of Computer Science at the University of Liverpool. Prior to this, I was a Research Fellow in the Department of Computer Science at University College London, funded by the EU Horizon 2020 project EndoMapper. I completed my PhD at the Hamlyn Centre, Imperial College London. I also worked as a Research Scientist at Reality Labs, Meta (Facebook), where I contributed to eye-tracking research for AR/VR applications.
I earned my BEng degree in Mechanical Engineering with First Class Honours from the University of Birmingham, UK, in 2018, followed by an MRes degree in Medical Robotics and Image-Guided Intervention with Distinction from Imperial College London, UK, in 2019.
PhD in AI, Computer Vision & Medical Robotics, 2023
Imperial College London
MRes in Medical Robotics and Image-Guided Intervention (with Distinction), 2019
Imperial College London
BEng in Mechanical Engineering (with First Class Honours), 2018
University of Birmingham
We introduce Guide3D, a high-resolution bi-planar X-ray dataset for 3D reconstruction. It addresses the limitations of monoplanar fluoroscopic data in endovascular surgical tool navigation and provides a benchmark for advancing segmentation and 3D reconstruction techniques in real-world applications.
We present a novel approach for language-driven 6-DoF grasp detection in cluttered point clouds. We introduce Grasp-Anything-6D, a large-scale dataset, together with a diffusion model that uses negative prompt guidance to let robots grasp objects from natural language commands, surpassing baselines in both benchmarks and real-world trials.
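As a rough illustration of the negative prompt guidance idea, the sketch below shows a classifier-free-guidance-style denoising step that steers samples toward a positive instruction and away from a negative one. The denoiser, embedding dimensions, and guidance weights are placeholder assumptions, not the paper's implementation.

```python
# Minimal sketch of negative-prompt guidance in a diffusion sampling step.
# Everything here (network, dims, weights) is an illustrative stand-in.
import torch
import torch.nn as nn

class DummyDenoiser(nn.Module):
    """Stand-in for a noise-prediction network over 6-DoF grasp poses."""
    def __init__(self, pose_dim=6, text_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + text_dim, 64), nn.ReLU(),
            nn.Linear(64, pose_dim),
        )

    def forward(self, pose, text_emb):
        return self.net(torch.cat([pose, text_emb], dim=-1))

def guided_noise(model, pose, pos_emb, neg_emb, null_emb, w_pos=3.0, w_neg=1.0):
    """Guide toward the positive prompt and away from the negative one
    (a common classifier-free-guidance formulation, assumed here)."""
    eps_null = model(pose, null_emb)
    eps_pos = model(pose, pos_emb)
    eps_neg = model(pose, neg_emb)
    return eps_null + w_pos * (eps_pos - eps_null) - w_neg * (eps_neg - eps_null)

model = DummyDenoiser()
pose = torch.randn(4, 6)       # batch of noisy 6-DoF grasp poses
pos_emb = torch.randn(4, 32)   # e.g. embedding of "grasp the mug handle"
neg_emb = torch.randn(4, 32)   # e.g. embedding of an undesired target
null_emb = torch.zeros(4, 32)  # unconditional embedding
print(guided_noise(model, pose, pos_emb, neg_emb, null_emb).shape)
```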
We propose a method for language-conditioned affordance detection and 6-DoF pose estimation in 3D point clouds, enabling robots to handle diverse affordances beyond predefined sets. Our approach features an open-vocabulary affordance detection branch and a language-guided diffusion model for pose generation. A new dataset supports the task, and experiments show significant performance improvements over baselines. The method demonstrates strong potential in real-world robotic applications.
We introduce an open-vocabulary affordance detection method for 3D point clouds, addressing the challenges of complex object shapes and diverse affordances. Using knowledge distillation and a novel text-point correlation approach, our method improves feature extraction and semantic understanding. It outperforms baselines by 7.96% mIoU and supports real-time inference, making it well suited to robotic manipulation tasks.
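A minimal sketch of the text-point correlation idea, assuming cosine similarity between per-point features and text embeddings; the feature dimensions, label set, and random tensors below are illustrative stand-ins for a real 3D backbone and text encoder.

```python
# Sketch: score per-point features against open-vocabulary affordance
# labels via cosine similarity (illustrative, not the paper's pipeline).
import torch
import torch.nn.functional as F

n_points, feat_dim = 2048, 64
point_feats = torch.randn(n_points, feat_dim)   # from a 3D point backbone
labels = ["grasp", "pour", "cut", "support"]    # arbitrary open-vocab labels
text_embs = torch.randn(len(labels), feat_dim)  # from a text encoder

# Cosine similarity as the text-point correlation, then per-point softmax
sim = F.normalize(point_feats, dim=-1) @ F.normalize(text_embs, dim=-1).T
probs = sim.softmax(dim=-1)                     # (n_points, n_labels)
pred = probs.argmax(dim=-1)                     # per-point affordance index
print(pred.shape, [labels[i] for i in pred[:5]])
```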
Robot grasp detection is a complex challenge with significant industrial relevance. To address this, we present Grasp-Anything++, a new language-driven grasp detection dataset containing 1M samples, over 3M objects, and 10M grasping instructions. Leveraging foundation models, we frame grasp detection as a conditional generation task and propose a novel diffusion model-based method with a contrastive training objective to improve language-guided grasp pose detection. Our approach surpasses state-of-the-art methods, supports real-world robotic grasping, and enables zero-shot grasp detection. The dataset serves as a challenging benchmark, promoting advancements in language-driven robotic grasping research.
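The contrastive training objective can be pictured as a symmetric InfoNCE-style loss that pulls matched grasp-instruction pairs together; this generic formulation is assumed for illustration and is not the exact loss from the paper.

```python
# Sketch of a symmetric InfoNCE loss aligning grasp features with
# instruction embeddings (a generic recipe, assumed here).
import torch
import torch.nn.functional as F

def contrastive_loss(grasp_feats, text_feats, temperature=0.07):
    g = F.normalize(grasp_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    logits = g @ t.T / temperature            # (B, B) pairwise similarity
    targets = torch.arange(g.size(0))         # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```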
We propose a novel Residual Aligner-based Network (RAN) for deformable image registration, addressing the challenge of capturing the separate and sliding motions of organs. By introducing a Motion Separable backbone and a Residual Aligner module, RAN achieves state-of-the-art accuracy in unsupervised registration of abdominal and lung CT scans with reduced model size and computational cost.
We propose a simple regression network to enhance intraoperative gamma activity visualization in endoscopic radio-guided cancer detection and resection. By leveraging high-dimensional image features and probe position data, our method effectively detects sensing areas, outperforming prior geometric approaches.
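A minimal sketch of the fusion idea, assuming pooled image features are concatenated with the tracked probe position and regressed to a 2D sensing-area center; the layer sizes and inputs are illustrative assumptions, not the published architecture.

```python
# Sketch: regression head fusing image features with probe position to
# predict the sensing-area center (illustrative dimensions throughout).
import torch
import torch.nn as nn

class SensingAreaRegressor(nn.Module):
    def __init__(self, img_feat_dim=512, probe_dim=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(img_feat_dim + probe_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2),   # (u, v) sensing-area center in the image
        )

    def forward(self, img_feat, probe_pos):
        return self.head(torch.cat([img_feat, probe_pos], dim=-1))

model = SensingAreaRegressor()
img_feat = torch.randn(4, 512)           # e.g. pooled CNN features of a frame
probe_pos = torch.randn(4, 3)            # tracked 3D probe tip position
print(model(img_feat, probe_pos).shape)  # -> torch.Size([4, 2])
```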