How Compilers Optimize Kernels

When I first learned how a simple kernel like matrix multiplication actually runs on hardware, I was surprised by how much performance comes down to how the loops are structured, not just what the math is doing. At first glance, C = A × B looks harmless. But even a 512×512 multiply involves over 100 million operations (512³ ≈ 134 million multiply-adds). That’s where compilers and kernel engineers step in, rearranging loops, tuning memory access, and squeezing every bit of performance from the hardware....

November 15, 2025 · 3 min

Getting Started with MLIR

Most of my experience so far has been with CPU compilers. I’m used to thinking about traditional compiler pipelines that start from source code, parse it into an AST, lower it to an intermediate representation (IR), optimize it, and then generate machine code. The work usually revolves around control flow, register allocation, and data dependencies. It’s all about turning human-written code into something that runs efficiently on a CPU. But machine learning (ML) compilers operate in a completely different world....

October 22, 2025 · 4 min