You will build the foundation model layer that makes manual labeling optional for surgical video. By training self-supervised video encoders and spatiotemporal transformers on raw 2-6 hour surgical footage, you will enable machines to understand tool-tissue interactions and procedure progression, directly shaping the future of autonomous surgical robotics and AI-driven operative reporting.
Foundation Modelling Engineer at Uncovr
Join an EF-backed Paris startup building the foundation model layer for the next generation of surgical robotics. As a Foundation Modelling Engineer, you will tackle the immense challenge of teaching machines to understand complex 2-6 hour surgical procedures from raw video alone, without relying on brittle, manually labeled steps. Working alongside a world-class team from ETH Zurich and Oxford Dynamics, you'll build self-supervised video encoders and spatiotemporal transformers that transform raw footage into structured intelligence. This is a rare opportunity to join a high-growth AI team of fewer than ten people, where your technical decisions will directly shape the future of robotic autonomy.
About the company
Uncovr
EF-backed Paris startup building the intelligence layer for surgical robotics
What you'll do
- Train self-supervised video encoders using masked video modeling and temporal contrastive learning on laparoscopic and robotic video datasets.
- Build spatiotemporal transformers to track tool-tissue interactions and construct scene graphs that describe surgical actions independent of tool brands.
- Develop production-grade training infrastructure and data pipelines that turn massive hospital video streams into scalable, high-throughput training workflows.
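To make the first bullet concrete, masked video modeling can be sketched in plain PyTorch: hide a large fraction of spatiotemporal patch tokens and train a transformer to reconstruct them. The model, names, and dimensions below are illustrative toy choices, not Uncovr's actual architecture.

```python
import torch
import torch.nn as nn

# Toy sketch of masked video modeling: patches stand in for flattened
# spatiotemporal video tokens; masked tokens are replaced by a learned
# mask embedding and the model is trained to reconstruct them.
class TinyMaskedVideoModel(nn.Module):
    def __init__(self, dim=64, patch_dim=3 * 16 * 16, depth=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decode = nn.Linear(dim, patch_dim)

    def forward(self, patches, mask):
        # patches: (B, N, patch_dim); mask: (B, N) bool, True = hidden
        x = self.embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.decode(self.encoder(x))

B, N, P = 2, 32, 3 * 16 * 16
patches = torch.randn(B, N, P)
mask = torch.rand(B, N) < 0.75            # hide ~75% of the tokens
model = TinyMaskedVideoModel()
recon = model(patches, mask)
loss = ((recon - patches)[mask] ** 2).mean()  # reconstruct masked tokens only
loss.backward()
```

At surgical scale the same idea applies to tube-masked clips of real footage; the reconstruction target and masking ratio are the main design knobs.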
Who you are
- 5-8 years of hands-on experience training large-scale vision or video models, with deep expertise in representation learning and self-supervised training.
- Proficiency in optimizing distributed PyTorch training for long-context temporal modeling and handling multi-hour video sequences.
- A proven builder mindset with the ability to bridge high-level research and production-grade engineering, ideally supported by publications at CVPR, NeurIPS, or ICLR.
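On the long-context point above: a standard memory-saving trick when backpropagating through multi-hour frame sequences is activation (gradient) checkpointing. The sketch below uses `torch.utils.checkpoint.checkpoint_sequential`; batch, sequence, and layer sizes are illustrative only.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Illustrative sketch: recompute intermediate activations during the
# backward pass instead of storing them, trading compute for memory so
# long frame-token sequences fit on a single device.
B, T, D = 1, 512, 64                       # toy batch / sequence / embed sizes
frames = torch.randn(B, T, D, requires_grad=True)

blocks = nn.Sequential(*[
    nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
    for _ in range(4)
])

# Split the stack into 4 segments; each segment's activations are
# rebuilt on the fly when gradients flow back through it.
out = checkpoint_sequential(blocks, 4, frames, use_reentrant=False)
out.mean().backward()
```

In a distributed setup the same blocks would typically be wrapped in DDP or FSDP; checkpointing composes with either.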
Why this role
- Work with a world-class team from ETH Zurich, ESA, and Oxford Dynamics, advised by Prof. Dan Stoyanov, a global leader in surgical computer vision.
- Tackle one of AI's hardest frontiers: training large-scale foundation models on complex, long-context video data where labels are traditionally scarce and brittle.
- Join an early-stage startup as one of the first ten employees, gaining meaningful equity and the autonomy to design technical architectures from scratch.