» “can you train a model on Apple’s Neural Engine?”

Manjeet Singh:

Apple doesn’t want you to know the answer. They don’t publish the ANE’s ISA. They don’t document its internal architecture. They don’t even give you a way to program it directly — everything goes through CoreML, which adds layers of abstraction, optimization passes, and overhead that make it nearly impossible to understand what the hardware is actually doing.

So we reverse-engineered it.

Over several days, we mapped the entire software stack from CoreML down to the IOKit kernel driver, discovered how to compile and execute programs on the ANE without CoreML, cracked the binary format, measured the true peak performance (spoiler: Apple’s “38 TOPS” number is misleading), and ultimately got a neural network training on a chip designed exclusively for inference.

This is Part 1 of a three-part series. Here we cover the reverse engineering — how we peeled back the layers to understand what the M4 Neural Engine actually is and how to talk to it directly.