You will lead the optimization of real-time speech model architectures, ensuring maximum throughput and minimal latency across diverse GPU hardware. This role involves deep architectural tuning, custom kernel development, and hardware-specific profiling to squeeze every millisecond of performance out of state-of-the-art models. You will directly impact the reliability and speed of cutting-edge AI inference stacks.
This job is no longer actively hiring.
GPU Optimization Engineer at High-growth generative AI startup
Are you a GPU expert who knows how to squeeze every last millisecond out of a model? This high-growth generative AI startup is looking for a specialist to own the performance of their real-time speech models. You'll work across CUDA, Triton, and Tinygrad to design custom kernels and optimize model graphs for maximum throughput. If you have a performance-first mindset and want to work on the frontier of AI inference across NVIDIA, AMD, and edge accelerators, this is the role for you. Join a top-tier VC-backed team where your work directly defines the speed of the next generation of audio AI.
About the company
High-growth generative AI startup
What you will do
- Design and implement custom CUDA, Triton, or Tinygrad kernels for performance-critical model sections.
- Profile end-to-end inference workloads with tools such as Nsight to identify and resolve memory-bandwidth and kernel bottlenecks.
- Partner with research and infrastructure teams to perform operator fusion, graph optimization, and kernel-level scheduling.
Who this is a fit for
- Holds a Master’s or PhD with a focus on GPU programming and has 3-5 years of specialized experience in hardware-level optimization.
- Demonstrates deep mastery of GPU architecture, including SMs, memory hierarchy, occupancy tuning, and kernel debugging.
- Has extensive hands-on experience with PyTorch, TensorRT, and model architectures such as transformers and diffusion blocks.
Why this role is remarkable
- Influence the performance of state-of-the-art real-time audio models at a deep architectural level.
- Join a well-funded team backed by top-tier VCs working on the frontier of generative speech technology.
- Work across diverse hardware backends, porting models from NVIDIA to AMD and emerging edge accelerators.