Gentian is a mesh-able custom silicon architecture for transformer inference.
The memory-wall context
Transformer inference is memory-bound during decode. A 70-billion-parameter model with a long-context KV cache moves hundreds of gigabytes across the off-chip memory interface for every generated token. On GPU-class hardware, most of the energy is spent moving state, not computing on it. High-bandwidth memory (HBM) and advanced packaging (2.5D interposers, CoWoS) mitigate the bandwidth cost, but they do not remove the underlying fact that compute and memory sit on separate silicon, connected by a link that costs an order of magnitude more energy per byte than an on-die access.
A distributed inference architecture that keeps parameters and activations where the computation happens — rather than shuttling them across a memory boundary — trades an external bandwidth problem for an internal coordination problem.
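The decode-traffic figure above can be sanity-checked with a back-of-envelope calculation. The sketch below is illustrative only: apart from the 70-billion parameter count, every shape parameter (layer count, grouped-query KV heads, head dimension, FP16 storage, a 128K-token context, batch size 1) is an assumption chosen for the example, not a figure from the Gentian project.

```python
# Back-of-envelope decode traffic for one generated token (batch size 1).
# Every default except n_params is an illustrative assumption.

def decode_bytes_per_token(
    n_params: float = 70e9,      # parameter count (from the text)
    bytes_per_param: int = 2,    # assumed FP16/BF16 weights
    n_layers: int = 80,          # assumed layer count
    n_kv_heads: int = 8,         # assumed grouped-query attention
    head_dim: int = 128,         # assumed head dimension
    bytes_per_kv: int = 2,       # assumed FP16 KV-cache entries
    context_len: int = 131_072,  # assumed 128K-token context
) -> dict:
    # Every weight is read once per generated token.
    weight_bytes = n_params * bytes_per_param
    # Keys and values (factor 2) for every layer, KV head, and cached position.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_kv * context_len
    return {
        "weights": weight_bytes,
        "kv_cache": kv_bytes,
        "total": weight_bytes + kv_bytes,
    }

traffic = decode_bytes_per_token()
for name, value in traffic.items():
    print(f"{name:>8}: {value / 1e9:7.1f} GB")
```

Under these assumptions the weights account for 140 GB and the KV cache for about 43 GB, so each token touches roughly 183 GB. Larger batches, longer contexts, or attention without grouped KV heads push the total higher.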
The architecture
Gentian is a tile-mesh where each tile holds its share of the model state in on-die SRAM and tiles compose via a regular mesh across die boundaries on an ordinary PCB. The design commits to four engineering invariants:
- Regular mesh topology. Nearest-neighbour links only; no crossbar, no global bus.
- Master-free peer-to-peer coordination. No central scheduler or arbiter.
- Stationary data. Weights and activations stay on their tile; only small traversal objects move.
- Clock-velocity interconnect. Inter-die hops at the CPU clock rate, not via packetised SerDes.
Together, the four commitments put wall-clock runtime on the causal lower bound of parallel computation and make the architecture node-agnostic and portable across commercial advanced-node processes.
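To make the stationary-data idea concrete, here is a toy software model of a matrix-vector product on a small tile grid. It is a sketch under stated assumptions, not Gentian's dataflow: the source does not say what a traversal object carries, so treating it as a partial sum that hops west to east is a guess, and the names `Tile`, `build_mesh`, and `mesh_matvec` are hypothetical.

```python
# Toy model of a stationary-data tile mesh (illustrative assumption only).
# Each tile keeps its weight block and activation slice in place; only a
# small partial-sum "traversal object" hops between nearest neighbours.

from dataclasses import dataclass


@dataclass
class Tile:
    weights: list  # block of W held on this tile; never moves
    x_slice: list  # slice of the input vector held on this tile; never moves

    def visit(self, partial):
        # Add this tile's local contribution to the passing partial sum.
        return [p + sum(w * x for w, x in zip(row, self.x_slice))
                for p, row in zip(partial, self.weights)]


def build_mesh(W, x, tile_rows, tile_cols):
    """Partition W (a list of rows) and x across a tile_rows x tile_cols grid."""
    rb, cb = len(W) // tile_rows, len(x) // tile_cols
    return [[Tile([row[c * cb:(c + 1) * cb] for row in W[r * rb:(r + 1) * rb]],
                  x[c * cb:(c + 1) * cb])
             for c in range(tile_cols)]
            for r in range(tile_rows)]


def mesh_matvec(mesh):
    """Send one traversal object west to east along each mesh row."""
    y = []
    for row in mesh:  # rows are independent; hardware would run them in parallel
        partial = [0] * len(row[0].weights)
        for tile in row:  # one nearest-neighbour hop between consecutive tiles
            partial = tile.visit(partial)
        y.extend(partial)
    hops_per_row = len(mesh[0]) - 1  # critical-path hop count
    return y, hops_per_row


W = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 1, 1, 1]]
x = [1, 2, 3, 4]
y, hops = mesh_matvec(build_mesh(W, x, tile_rows=2, tile_cols=2))
print(y, hops)  # [1, 2, 3, 10] 1
```

In this toy the moving object is one partial-sum vector per mesh row, while the weight blocks and input slices never leave their tiles. The number of hops on the critical path grows with the mesh width, not with the size of the model state.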
Project state
| Milestone | State |
|---|---|
| End-to-end execution | A real trained transformer runs on the mesh |
| RTL | Validated against C++ reference |
| Physical design | Place-and-route sign-off reached |
| Architecture paper | Preprint in preparation |
| Simulation archive | Available to partners under NDA |
| Commercial-node tape-out | Foundry partner selection in progress |
Engagement
A technical briefing, the architecture manuscript, PnR reports, and the simulation archive are provided under mutual NDA. A partnership conversation moves directly to engineering review. Broader research context is available at cybiont Research.