▌ GitHub radar

gateGPT: a GPT baked into a chip, 56k tokens/sec

A working transformer that runs entirely in hardware on an FPGA, no CPU or GPU involved. It spits out names at tens of thousands of tokens per second, and it picked up hundreds of stars in days.

01 · fguzman82/gateGPT · 454 stars · Verilog

gateGPT is a complete tiny GPT implemented in Verilog and run on a Xilinx Virtex-5 FPGA, so the model lives in dedicated hardware rather than as software on a CPU or GPU. It is a single transformer block with attention, a key-value cache and fixed-point math, generating human names one character at a time. At an 80 MHz clock it reaches roughly 56,000–69,000 tokens per second, and every stage was verified to match a Python reference exactly.
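To put the throughput figure in perspective, the quoted clock rate and token rates can be turned into a per-token cycle budget. This is back-of-the-envelope arithmetic on the numbers above, not a measurement from the project; the function name `cycles_per_token` is made up for illustration.

```python
# Rough cycle budget per generated token, using only the figures quoted above.
CLOCK_HZ = 80_000_000  # 80 MHz clock, from the text


def cycles_per_token(tokens_per_second: int, clock_hz: int = CLOCK_HZ) -> float:
    """Average number of clock cycles spent on each token (hypothetical helper)."""
    return clock_hz / tokens_per_second


for tps in (56_000, 69_000):
    print(f"{tps} tokens/s -> about {cycles_per_token(tps):.0f} cycles per token")
```

By this estimate the whole forward pass for one character fits in somewhere between about 1,150 and 1,450 clock cycles.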

Why a vibe-coder should care

It is a rare, hands-on look at what a language model really is once you strip away the frameworks: matrix multiplies, attention and sampling turned into logic gates. For anyone curious how AI works under the hood, this is about as concrete as it gets.
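As a rough illustration of what "matrix multiplies turned into logic gates" means, here is a minimal sketch of a fixed-point matrix-vector product that uses only integer multiplies, adds and shifts, the operations that map directly onto hardware. The 8 fractional bits, the helper names and the sample values are all assumptions chosen for the example; the project's actual number format and Verilog structure may differ.

```python
# Fixed-point matrix-vector multiply using integer arithmetic only.
# ASSUMPTION: 8 fractional bits; the real design's format is not given in the text.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a float to a fixed-point integer (hypothetical helper)."""
    return round(x * SCALE)


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / SCALE


def fixed_matvec(weights, vector):
    """Multiply a fixed-point matrix by a fixed-point vector."""
    out = []
    for row in weights:
        acc = 0  # wide accumulator, like a multiply-accumulate register
        for w, v in zip(row, vector):
            acc += w * v  # each product carries 2 * FRAC_BITS fractional bits
        out.append(acc >> FRAC_BITS)  # shift back down to FRAC_BITS fractional bits
    return out


# Illustrative values only, not taken from the model.
W = [[to_fixed(0.5), to_fixed(-1.25)],
     [to_fixed(2.0), to_fixed(0.75)]]
x = [to_fixed(1.5), to_fixed(-0.5)]

y = fixed_matvec(W, x)
print([to_float(q) for q in y])  # [1.375, 2.625]
```

Because every step is integer arithmetic, a software model written this way can in principle be compared bit for bit against a hardware implementation, which is the kind of exact match the project reports against its Python reference.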
