GPU Optimization Engineer at a high-growth generative AI startup

Are you a GPU expert who knows how to squeeze every last millisecond out of a model? This high-growth generative AI startup is looking for a specialist to own the performance of their real-time speech models. You'll work across CUDA, Triton, and Tinygrad to design custom kernels and optimize model graphs for maximum throughput. If you have a performance-first mindset and want to work on the frontier of AI inference across NVIDIA, AMD, and edge accelerators, this is the role for you. Join a top-tier VC-backed team where your work directly defines the speed of the next generation of audio AI.

Role overview

You will lead the optimization of real-time speech model architectures, ensuring maximum throughput and minimal latency across diverse GPU hardware. This role involves deep architectural tuning, custom kernel development, and hardware-specific profiling to squeeze every millisecond of performance out of state-of-the-art models. You will directly impact the reliability and speed of cutting-edge AI inference stacks.

About the company

High-growth generative AI startup

What you will do

Design and implement custom CUDA, Triton, or Tinygrad kernels for performance-critical model sections.
Profile end-to-end inference workloads using tools like Nsight to identify and resolve memory bandwidth and kernel bottlenecks.
Partner with research and infrastructure teams to perform operator fusion, graph optimization, and kernel-level scheduling.

Who this is a fit for

Holds a Master’s or PhD focused on GPU programming, with 3-5 years of specialized experience in hardware-level optimization.
Demonstrates deep mastery of GPU architecture, including SMs, memory hierarchy, occupancy tuning, and kernel debugging.
Has extensive hands-on experience with PyTorch and TensorRT, and with model architectures such as transformers and diffusion blocks.

Why this role is remarkable

Influence the performance of state-of-the-art real-time audio models at a deep architectural level.
Join a well-funded team backed by top-tier VCs working on the frontier of generative speech technology.
Work across diverse hardware backends, porting models from NVIDIA to AMD and emerging edge accelerators.

How Jack & Jill work together

I get to know what you’re great at, then find roles you’d never find yourself.

Ok, I'll go first. I'm Jack, an AI that gets to know you on a quick call, learning what you're great at and what you want from your career. Then I help you land your dream job by finding unmissable opportunities as they come up, supporting you with applications, interview prep, and moral support.

I recruit from Jack’s network and make the intro when I spot a great match.

And I'm Jill, an AI Recruiter who talks to companies to understand who they're looking to hire. Then I recruit from Jack's network, making an introduction when I spot an excellent candidate.

Meet Jack

Jack gets to know what you're great at and what you want next, then searches 14 million jobs daily and introduces you directly to hiring managers.

How does this work?

Jack's an AI agent for job searching and career coaching. He works for you.

Jill is the AI recruiter working for the company. She recruits from Jack's network.

If it's a match and the company wants to meet you, they'll make the intro. In the meantime, if you'd like, Jack will send you excellent alternatives.
