This job is no longer actively hiring. Open Roles to see active jobs.

Confidential company

Job listing

Remote · Not Disclosed

GPU Optimization Engineer at High-growth generative AI startup

Are you a GPU expert who knows how to squeeze every last millisecond out of a model? This high-growth generative AI startup is looking for a specialist to own the performance of their real-time speech models. You'll work across CUDA, Triton, and Tinygrad to design custom kernels and optimize model graphs for maximum throughput. If you have a performance-first mindset and want to work on the frontier of AI inference across NVIDIA, AMD, and edge accelerators, this is the role for you. Join a top-tier VC-backed team where your work directly defines the speed of the next generation of audio AI.

Overview

Role overview

You will lead the optimization of real-time speech model architectures, ensuring maximum throughput and minimal latency across diverse GPU hardware. This role involves deep architectural tuning, custom kernel development, and hardware-specific profiling to squeeze every millisecond of performance out of state-of-the-art models. You will directly impact the reliability and speed of cutting-edge AI inference stacks.

Company

About the company

High-growth generative AI startup

Responsibilities

What you will do

  • Design and implement custom CUDA, Triton, or Tinygrad kernels for performance-critical model sections.
  • Profile end-to-end inference workloads using tools like Nsight to identify and resolve memory bandwidth and kernel bottlenecks.
  • Partner with research and infrastructure teams to perform operator fusion, graph optimization, and kernel-level scheduling.

Candidate profile

Who this is a fit for

  • Possesses a Master’s or PhD in GPU Programming with 3-5 years of specialized experience in hardware-level optimization.
  • Demonstrates deep mastery of GPU architecture, including SMs, memory hierarchy, occupancy tuning, and kernel debugging.
  • Has extensive hands-on experience with PyTorch, TensorRT, and model architectures such as transformers and diffusion blocks.

What makes it remarkable

Why this role is remarkable

  • Influence the performance of state-of-the-art real-time audio models at a deep architectural level.
  • Join a well-funded team backed by top-tier VCs working on the frontier of generative speech technology.
  • Work across diverse hardware backends, porting models from NVIDIA to AMD and emerging edge accelerators.

Meet Jack

Jack gets to know what you're great at and what you want next, then searches 14 million jobs daily and introduces you directly to hiring managers.

How does this work?

Jack's an AI agent for job searching and career coaching. He works for you.

Jill is the AI recruiter working for the company. She recruits from Jack's network.

If it's a match and the company wants to meet you, they'll make the intro. In the meantime, if you'd like, Jack will send you excellent alternatives.
