FFT Core
Published in Advanced Digital Design, Spring 2026
Design and VLSI implementation of a high-performance, 1024-point radix-2 decimation-in-frequency (DIF) FFT core
Architecture
- 16-bit fixed-point precision
- Data SRAM, twiddle factor ROM
- TSMC 65 nm CMOS process
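For reference, the radix-2 DIF dataflow that the core implements can be sketched in floating-point Python. This is only an illustration of the algorithm, not the RTL or the project's golden model: the function names `fft_dif_radix2` and `dft_naive` are hypothetical, and the twiddle factors are computed on the fly here where the hardware would read them from the ROM.

```python
import cmath


def fft_dif_radix2(x):
    """Radix-2 decimation-in-frequency FFT; returns outputs in natural order."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    span = n // 2
    while span >= 1:
        # Each block of 2*span points is split into a sum half and a
        # twiddled difference half.
        for start in range(0, n, 2 * span):
            for k in range(span):
                # Twiddle factor; the hardware would fetch this from ROM.
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                i, j = start + k, start + k + span
                top, bot = a[i], a[j]
                a[i] = top + bot
                a[j] = (top - bot) * w
        span //= 2
    # DIF leaves the results in bit-reversed order; undo that here.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[r] = a[i]
    return out


def dft_naive(x):
    """Direct O(N^2) DFT, used only as a cross-check."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]
```

A 1024-point transform runs log2(1024) = 10 such stages of 512 butterflies each, so a unit impulse `[1] + [0] * 1023` should come out as 1024 ones.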
Optimization Features
- Four-stage interleaved pipeline: reduces memory idle cycles and access time
- Dual-port SRAM interleaving: an FSM controller coordinates alternating read, execute, and writeback phases to keep the memory busy
- RTNE (round-to-nearest-even) ALU: 3 dB SQNR improvement
- Programmable scaling mask: lets the user trade accuracy against overflow protection depending on the input profile
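The rounding and scaling features can be illustrated with a small integer model. This is a sketch under stated assumptions, not the core's actual datapath: the helper names are hypothetical, the mask is assumed to hold one bit per stage (bit `s` set means stage `s` halves its outputs), and saturation on overflow is an assumption, since the overflow behavior of the hardware is not stated above. Only the add/subtract half of the butterfly on one real or imaginary component is shown; the twiddle multiply is omitted.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def shift_right_rtne(value, shift):
    """Arithmetic right shift that rounds to nearest, ties to even."""
    if shift == 0:
        return value
    floor = value >> shift              # Python's >> floors negative values
    rem = value & ((1 << shift) - 1)    # dropped bits, always non-negative
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (floor & 1)):
        floor += 1                      # round up; on a tie only if floor is odd
    return floor


def saturate16(value):
    """Clamp to the signed 16-bit range (assumed overflow behavior)."""
    return max(INT16_MIN, min(INT16_MAX, value))


def scaled_add_sub(a, b, stage, scale_mask):
    """Butterfly add/subtract with optional per-stage divide-by-two."""
    shift = (scale_mask >> stage) & 1   # assumed: one mask bit per stage
    total = shift_right_rtne(a + b, shift)
    diff = shift_right_rtne(a - b, shift)
    return saturate16(total), saturate16(diff)
```

With `a = 30000` and `b = 10000`, enabling scaling for the stage keeps the sum in range at 20000, while disabling it lets the sum clip at 32767 but preserves one more bit of the difference. That is the accuracy versus overflow trade the mask exposes. Unlike plain truncation, which always rounds toward negative infinity, ties-to-even rounding does not add a systematic bias at each stage.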

Physical Implementation
Full RTL-to-GDSII layout flow via Synopsys Design Compiler, QuestaSim, Innovus, and Virtuoso
- Clock frequency: 400 MHz (limited by the ROM); the underlying logic is capable of 500 MHz
- Throughput: 39.8 MS/s
- Area: 0.148 mm², mostly data SRAM
- Precision: 60 dB SQNR, 0.0067% NRMSE
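The two precision figures are error metrics of the fixed-point output relative to a reference output. A minimal sketch of how such metrics are commonly computed follows. The function names are hypothetical, and the NRMSE normalization (dividing by the peak reference magnitude) is an assumption, because the normalization behind the reported 0.0067% is not stated here.

```python
import math


def sqnr_db(reference, measured):
    """Signal-to-quantization-noise ratio in dB over complex or real samples."""
    signal = sum(abs(r) ** 2 for r in reference)
    noise = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def nrmse_percent(reference, measured):
    """RMS error as a percentage of the peak reference magnitude (assumed)."""
    n = len(reference)
    rmse = math.sqrt(
        sum(abs(r - m) ** 2 for r, m in zip(reference, measured)) / n
    )
    peak = max(abs(r) for r in reference)
    return 100.0 * rmse / peak
```

As a sanity check on the definitions, a reference of constant magnitude 1000 with a constant error of 1 gives a power ratio of 10^6, which is 60 dB.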

Simulation
- Post-APR QuestaSim simulation against a Python floating-point golden model and a bit-accurate C int16 model
- Gate-level simulation and power analysis on a subset of inputs
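Checking simulation output against a bit-accurate model amounts to an exact word-by-word comparison. The sketch below shows one way to do that in Python. It is hypothetical throughout: the helper names, and the assumption that both sides are available as lists of 16-bit words (for example parsed from a simulator dump), are not taken from the project.

```python
def to_int16(word):
    """Interpret the low 16 bits of an integer as a signed two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def first_mismatch(dut_words, model_words):
    """Return (index, dut, model) for the first differing word, or None."""
    if len(dut_words) != len(model_words):
        raise ValueError("output lengths differ")
    for i, (d, m) in enumerate(zip(dut_words, model_words)):
        if to_int16(d) != to_int16(m):
            return i, to_int16(d), to_int16(m)
    return None
```

A result of `None` means the two outputs agree bit for bit. Comparison against a floating-point model cannot be exact and would instead use an error metric such as SQNR.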

