Senior ML Infrastructure Engineer at Prior Labs GmbH

Prior Labs GmbH is seeking a Senior ML Infrastructure Engineer in Berlin to lead the compute strategy for its world-class tabular foundation models. With tens of millions in annual GPU spend, you won't just be supporting research; you'll be making high-stakes architectural decisions on cluster scheduling, hardware selection, and provider strategy. This is a unique opportunity to own the full stack, from Slurm and GCP today to multi-provider orchestration tomorrow. If you have deep expertise in distributed training, PyTorch internals, and scaling GPU clusters, join this lean, high-impact team where your infrastructure calls directly enable the next generation of AI.

Location

Berlin, Germany

Compensation

€200k + Equity

Company

Prior Labs GmbH

Role overview

You will own the GPU infrastructure powering foundation model training for structured data, managing tens of millions in compute spend. As the team transitions from Slurm on GCP to multi-provider environments, you'll optimize cluster architecture, scheduling, and distributed training performance. This is a high-leverage role working directly with world-class researchers to build the next generation of AI systems.

Prior Labs builds multimodal tabular foundation models (TFMs), starting with TabPFN, designed to understand tables natively and perform statistical reasoning directly from data. The company says its broader vision is to create agentic AI systems that can understand high-level goals, combine tables, language, and images, reason across modalities, integrate domain knowledge, infer causality, and adapt dynamically.

What you will do

  • Design and evolve multi-cluster GPU infrastructure, moving beyond current Slurm/GCP setups to incorporate multi-provider orchestration and the latest hardware generations.
  • Drive maximum training efficiency by profiling distributed training runs, debugging systems-level bottlenecks, and optimizing GPU utilization to minimize cost-per-FLOP.
  • Build the internal developer platform, including experiment tracking, CI pipelines, and model registries, to ensure the research team maintains high iteration speeds.

Who this is a fit for

  • You have 5+ years of experience operating production-scale GPU infrastructure or distributed training systems at a major AI lab, well-funded startup, or HPC environment.
  • You bring deep expertise in Slurm and systems-level thinking, with a proven ability to profile PyTorch internals and identify hardware-level bottlenecks such as memory bandwidth or communication latency.
  • You have a track record of managing significant compute budgets and making high-stakes infrastructure calls that measurably improve training performance or cost efficiency.

Why this role is remarkable

  • Extreme Compute Ownership: You will manage a GPU budget in the tens of millions, making critical architectural decisions on hardware, scheduling, and provider strategy where a single optimization can save six figures.
  • State-of-the-Art Research: Work at the frontier of AI by building the infrastructure for foundation models specifically designed for structured data, operating in a lean, high-talent environment without corporate overhead.
  • Architectural Freedom: Lead the evolution from a single Slurm/GCP cluster to a multi-provider, multi-cluster infrastructure, evaluating new hardware generations as they come online to maximize training throughput.

How Jack & Jill work together

Jack: I get to know what you’re great at, then find roles you’d never find yourself.
Jill: I recruit from Jack’s network and make the intro when I spot a great match.

What happens next?

Jack’s an AI agent for job searching and career coaching. He works for you.

Jill is the AI recruiter working for the company. She recruits from Jack’s network.

If your profile’s a match and Prior Labs GmbH wants to meet, Jill will make the intro. In the meantime, Jack will send you excellent alternatives.