Tensara | Somesh Kar

Tensara is a competitive GPU kernel optimization platform I'm building with Sarthak Mangla, Soham Jog, and Harmya Bhatt. Users submit CUDA or Triton kernels, and we benchmark them against reference implementations on real GPU hardware.

The Problem

Writing efficient GPU kernels is genuinely hard. Even experienced engineers spend weeks optimizing CUDA code, often without clear benchmarks to guide their decisions. There's no standardized way to measure kernel performance or compare different implementations against each other.

What We've Built

The platform has grown to over 1,300 users with 35,000+ total submissions. We've processed 8,000+ sample submissions and run 1,500+ sandbox executions.

Serverless GPU Pipeline

We built a serverless GPU benchmarking pipeline on Modal that handles the full lifecycle:

Sandboxing: Untrusted CUDA code runs in isolated environments
NVCC Compilation: We compile submitted kernels with the NVIDIA compiler toolchain
Execution: Kernels run on actual GPU hardware with accurate timing
Profiling: We measure execution time, memory bandwidth utilization, occupancy, and other metrics that matter for real-world performance
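Here is a minimal, CPU-only sketch of the timing and bandwidth side of such a pipeline. It is not Tensara's code and does not use Modal's API. The names `benchmark` and `effective_bandwidth_gbps` are hypothetical. It shows two common practices:

- discarding warmup runs and reporting the median of repeated measurements;
- computing effective memory bandwidth as total bytes moved divided by runtime.

On a real GPU, kernel launches are asynchronous. The timer would therefore need to wait for the kernel to finish, for example with CUDA events or a device synchronize.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], None], warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time of fn() in seconds (hypothetical helper).

    Warmup iterations absorb one-time costs such as compilation or cache
    population so they do not skew the measured runs.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to occasional outliers
    # such as scheduler hiccups.
    return statistics.median(samples)


def effective_bandwidth_gbps(bytes_read: int, bytes_written: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s: total bytes moved divided by runtime."""
    return (bytes_read + bytes_written) / seconds / 1e9


if __name__ == "__main__":
    n = 1 << 16
    a = [1.0] * n
    b = [2.0] * n
    out = [0.0] * n

    def vec_add() -> None:
        # Stand-in "kernel": element-wise add on the CPU
        for i in range(n):
            out[i] = a[i] + b[i]

    seconds = benchmark(vec_add)
    # Byte counts assume 4-byte float32 elements (2 inputs read, 1 output
    # written), as a GPU kernel would use; the CPU figure is only illustrative.
    print(f"median {seconds * 1e3:.3f} ms, "
          f"{effective_bandwidth_gbps(2 * n * 4, n * 4, seconds):.3f} GB/s")
```

Consider a float32 vector add over n = 2^20 elements. It reads 2n·4 bytes and writes n·4 bytes, so a hypothetical runtime of 0.1 ms works out to about 126 GB/s. Dividing that by the GPU's peak bandwidth gives the utilization figure.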

Correctness Checking

Beyond performance, we built correctness-checking wrappers that verify submitted kernels produce the right outputs. This catches subtle bugs that might make a kernel fast but wrong.
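Here is one way such a check might look. It is a minimal sketch: the helper `check_outputs` and its default tolerances are assumptions, not Tensara's actual wrapper. It compares a kernel's output to the reference element by element. It uses the same tolerance rule as `numpy.allclose`, |actual − expected| ≤ atol + rtol·|expected|, because floating-point kernels rarely match a reference bit for bit. It also reports the first failing index so the user can see where the kernel went wrong.

```python
import math
from typing import Sequence, Tuple


def check_outputs(actual: Sequence[float], expected: Sequence[float],
                  rtol: float = 1e-5, atol: float = 1e-8) -> Tuple[bool, str]:
    """Hypothetical correctness check: (passed, message)."""
    if len(actual) != len(expected):
        return False, f"length mismatch: got {len(actual)}, expected {len(expected)}"
    for i, (a, e) in enumerate(zip(actual, expected)):
        if math.isnan(a) or math.isnan(e):
            # Only a NaN in both places counts as a match
            if math.isnan(a) and math.isnan(e):
                continue
            return False, f"NaN mismatch at index {i}: got {a}, expected {e}"
        if math.isinf(a) or math.isinf(e):
            # Infinities must match exactly, including sign
            if a != e:
                return False, f"index {i}: got {a}, expected {e}"
            continue
        if abs(a - e) > atol + rtol * abs(e):
            return False, f"index {i}: got {a}, expected {e}"
    return True, "ok"


if __name__ == "__main__":
    expected = [float(i) for i in range(8)]
    # A "fast but wrong" kernel with an off-by-one bounds check that
    # never writes the last element
    buggy = expected[:7] + [0.0]
    print(check_outputs(buggy, expected))
```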

Learning Paths

We've structured problems as learning paths — starting from basic matrix operations and building up to more complex optimizations. Each problem comes with reference implementations and detailed explanations of the optimization techniques involved.

73 problems organized by difficulty and topic

The Editor

The in-browser editor supports CUDA and Triton with syntax highlighting and Vim keybindings. We also show the compiled PTX alongside your source code with line mappings, so you can see exactly what instructions the compiler generates.
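One common way to get source-to-PTX line mappings is to compile with `nvcc -ptx -lineinfo`. This emits `.loc file line column` directives ahead of the instructions generated for each source line. The sketch below assumes that approach; it is not necessarily how Tensara does it. It groups PTX instructions under the most recent `.loc`. The function `map_ptx_to_source` is hypothetical, and `SAMPLE_PTX` is a hand-abbreviated fragment, not real compiler output.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hand-abbreviated PTX fragment for illustration only (not real nvcc output)
SAMPLE_PTX = """
.visible .entry vec_add(
    .param .u64 vec_add_param_0
)
{
    .reg .b32   %r<4>;
    .reg .f32   %f<2>;
    .reg .b64   %rd<5>;
    .loc    1 3 0
    mov.u32     %r1, %ctaid.x;
    mov.u32     %r2, %ntid.x;
    .loc    1 4 0
    ld.global.f32   %f1, [%rd4];
    .loc    1 5 0
    ret;
}
"""


def map_ptx_to_source(ptx: str) -> Dict[Tuple[int, int], List[str]]:
    """Map (file_index, source_line) to the PTX instructions emitted for it."""
    mapping: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    current: Optional[Tuple[int, int]] = None
    for raw in ptx.splitlines():
        line = raw.strip()
        if line.startswith(".loc"):
            # ".loc <file_index> <line> <column> ..." -> keep file and line
            parts = line.split()
            current = (int(parts[1]), int(parts[2].rstrip(",")))
        elif (current is not None and line.endswith(";")
              and not line.startswith((".", "//"))):
            # Instructions end in ';'; directives such as .reg start with '.'
            mapping[current].append(" ".join(line.split()))
    return dict(mapping)


if __name__ == "__main__":
    for (file_idx, src_line), instrs in sorted(map_ptx_to_source(SAMPLE_PTX).items()):
        print(f"file {file_idx}, line {src_line}: {instrs}")
```

Keying on the file index as well as the line number matters once headers or inlined functions contribute instructions. The index resolves to a path through the PTX `.file` directive.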

Code editor with PTX view and LaTeX-rendered math

Current Focus

We're actively working on expanding the problem set, improving the benchmarking infrastructure, and building out features that help users understand why certain optimizations work. If you're interested in GPU programming or kernel optimization, check out tensara.org or follow @tensarahq for updates.