You will build the foundation model layer that makes manual labeling optional for surgical video. By training self-supervised video encoders and spatiotemporal transformers on raw 2-6 hour surgical footage, you will enable machines to understand tool-tissue interactions and procedure progression, directly shaping the future of autonomous surgical robotics and AI-driven operative reporting.
This job is no longer actively hiring.
Foundation Modelling Engineer at Uncovr
Join an EF-backed Paris startup building the foundation model layer for the next generation of surgical robotics. As a Foundation Modelling Engineer, you will tackle the challenge of teaching machines to understand complex 2-6 hour surgical procedures from raw video alone, without relying on brittle, manually labeled steps. Working alongside a world-class team from ETH Zurich and Oxford Dynamics, you'll build self-supervised video encoders and spatiotemporal transformers that turn raw footage into structured intelligence. This is a rare opportunity to join a high-growth AI team of fewer than ten people where your technical decisions will directly shape the future of robotic autonomy.
Location
Paris, France
Compensation
Not Disclosed
Company
Uncovr
Role overview
Build a foundation model for surgery, aimed at surgical efficiency today and surgical robotic autonomy tomorrow.
What you will do
- Train self-supervised video encoders using masked video modeling and temporal contrastive learning on laparoscopic and robotic video datasets.
- Build spatiotemporal transformers to track tool-tissue interactions and construct scene graphs that describe surgical actions independent of tool brands.
- Develop production-grade training infrastructure and data pipelines to turn massive hospital video streams into scalable, high-performance training workflows.
Who this is a fit for
- 5-8 years of hands-on experience training large-scale vision or video models, with deep expertise in representation learning and self-supervised training.
- Proficiency in optimizing distributed PyTorch training for long-context temporal modeling and handling multi-hour video sequences.
- A proven builder mindset with the ability to bridge high-level research and production-grade engineering, ideally supported by publications at CVPR, NeurIPS, or ICLR.
Why this role is remarkable
- Work with a world-class team from ETH Zurich, ESA, and Oxford Dynamics, advised by Prof. Dan Stoyanov, a global leader in surgical computer vision.
- Tackle one of AI's hardest frontiers: training large-scale foundation models on complex, long-context video data where labels are traditionally scarce and brittle.
- Join an early-stage startup as one of the first ten employees, gaining meaningful equity and the autonomy to design technical architectures from scratch.