I am a Research Associate at the University of Bristol and a member of the MaVi research group. My current research is on 4D video understanding, aiming to develop systems that can perceive and reason about dynamic 3D scenes over time.
Previously, I was a PhD researcher in Computer Vision at the University of Bristol, supervised by Prof. Dima Damen, where my research focused on leveraging multimodal data for egocentric video understanding. This included topics such as audio-visual deep learning, action recognition and detection, predicting object interactions using eye gaze and 3D annotations, and long-term 3D multi-object tracking. During this time, I was also a PhD intern with the Visual Representation Learning team at NAVER Labs Europe.
Prior to my PhD, I earned a First Class Honours MEng in Computer Science from the University of Bristol, where my dissertation on "Video GANs for Human-Object Interactions" received a high grade. Alongside research, I've gained teaching experience across multiple undergraduate modules, contributing to both coursework design and lab-based support.
My technical strengths lie in deep learning, computer vision, and multimodal modelling, with extensive experience in Python (PyTorch) and working knowledge of C++ and JavaScript.
* denotes equal contribution
arXiv preprint arXiv:2512.16456, 2025
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023