This job is no longer actively hiring. Talk to Jack to find live roles.

Foundation Modelling Engineer at Uncovr

Join an EF-backed Paris startup building the foundation model layer for the next generation of surgical robotics. As a Foundation Modelling Engineer, you will tackle the immense challenge of teaching machines to understand complex 2-6 hour surgical procedures from raw video alone, without relying on brittle, manually labeled steps. Working alongside a world-class team from ETH Zurich and Oxford Dynamics, you will build self-supervised video encoders and spatiotemporal transformers that turn raw footage into structured intelligence. This is a rare opportunity to join a high-growth AI team of fewer than ten people where your technical decisions will directly shape the future of robotic autonomy.

Location

Paris, France

Compensation

Not Disclosed

Company

Uncovr

Role overview

You will build the foundation model layer that makes manual labeling optional for surgical video. By training self-supervised video encoders and spatiotemporal transformers on raw 2-6 hour surgical footage, you will enable machines to understand tool-tissue interactions and procedure progression, directly shaping the future of autonomous surgical robotics and AI-driven operative reporting.

Healthtech, Medtech, AI | 10 employees | Seed (VC-backed)

A foundation model for surgery: building for surgical efficiency today and surgical robotic autonomy tomorrow.

What you will do

  • Train self-supervised video encoders using masked video modeling and temporal contrastive learning on laparoscopic and robotic video datasets.
  • Build spatiotemporal transformers to track tool-tissue interactions and construct scene graphs that describe surgical actions independent of tool brands.
  • Develop production-grade training infrastructure and data pipelines to turn massive hospital video streams into scalable, high-performance training workflows.

Who this is a fit for

  • 5-8 years of hands-on experience training large-scale vision or video models, with deep expertise in representation learning and self-supervised training.
  • Proficiency in optimizing distributed PyTorch training for long-context temporal modeling and handling multi-hour video sequences.
  • A proven builder mindset with the ability to bridge high-level research and production-grade engineering, ideally supported by publications at CVPR, NeurIPS, or ICLR.

Why this role is remarkable

  • Work with a world-class team from ETH Zurich, ESA, and Oxford Dynamics, advised by Prof. Dan Stoyanov, a global leader in surgical computer vision.
  • Tackle one of AI's hardest frontiers: training large-scale foundation models on complex, long-context video data where labels are traditionally scarce and brittle.
  • Join an early-stage startup as one of the first ten employees, gaining meaningful equity and the autonomy to design technical architectures from scratch.

How Jack & Jill work together

Jack: I get to know what you’re great at, then find roles you’d never find yourself.
Jill: I recruit from Jack’s network and make the intro when I spot a great match.

Jack gets to know what you're great at and what you want next, then searches 15 million jobs daily and helps you discover roles at companies like this.

What happens next?

Jack’s an AI agent for job searching and career coaching. He works for you.

Jill is the AI recruiter working for the company. She recruits from Jack’s network.

If your profile’s a match and Uncovr wants to meet, Jill will make the intro. In the meantime, Jack will send you excellent alternatives.

Ready to find your next role?

Talk to Jack for 10 minutes and see your first matches.