Chaitanyh Singh — AI Semiconductor Engineer

➕ Arithmetic Circuits · 3 modules

100-Bit Binary Adder

Combinational · Ripple Carry

100-bit binary adder using an XOR/AND ripple-carry architecture. Accepts two 100-bit operands and produces a 100-bit sum plus a carry-out (101 bits in total), with the carry propagating through every bit position.

Architecture

Ripple Carry Adder

Data Width

100-bit operands

Logic Style

Combinational (continuous assignment)

Output

100-bit sum + carry-out (101 bits total)

Core Logic Snippet

assign {cout, sum} = a + b;

3-Bit Full Adder

Combinational

Classic 3-bit full adder. Generates sum and carry using XOR and AND gate logic. Building block for wider adder chains.

Inputs

a[2:0], b[2:0], cin

Outputs

sum[2:0], cout
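
One way to read the port list above is as three 1-bit full-adder cells chained carry-to-carry. The sketch below takes that reading; the module names `full_adder` and `adder3`, the internal carry vector `c`, and the use of an OR gate to merge the two carry terms are assumptions for illustration, not the vault's actual source.

```systemverilog
// Hypothetical 1-bit full-adder cell (names are illustrative).
module full_adder (
  input  logic a, b, cin,
  output logic sum, cout
);
  assign sum  = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Three cells chained so each carry-out feeds the next carry-in.
module adder3 (
  input  logic [2:0] a, b,
  input  logic       cin,
  output logic [2:0] sum,
  output logic       cout
);
  logic [3:0] c;              // c[i] is the carry into bit i
  assign c[0] = cin;
  assign cout = c[3];

  for (genvar i = 0; i < 3; i++) begin : g_fa
    full_adder u_fa (
      .a(a[i]), .b(b[i]), .cin(c[i]),
      .sum(sum[i]), .cout(c[i+1])
    );
  end
endmodule
```

Widening the chain only requires changing the loop bound and vector widths, which is how the same cell scales to the 100-bit adder.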

Signed Addition Overflow Detector

Signed Arithmetic · Overflow

Detects signed overflow in two's complement addition. Overflow occurs when adding two same-sign numbers yields an opposite-sign result — critical for ALU correctness.

Detection Method

Carry into MSB XOR carry out of MSB

Application

ALU / Exception Flags
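
The carry-XOR check can be sketched as follows. This is a minimal illustration, assuming a parameter `W` (at least 2) and the hypothetical names `signed_add_overflow`, `low`, and `full`; it compares the carry entering the sign bit with the carry leaving it.

```systemverilog
module signed_add_overflow #(parameter int W = 8) (
  input  logic [W-1:0] a, b,
  output logic [W-1:0] s,
  output logic         overflow
);
  logic [W-1:0] low;    // sum of the W-1 low bits; low[W-1] is the carry into the MSB
  logic [W:0]   full;   // full-width sum; full[W] is the carry out of the MSB

  assign low  = {1'b0, a[W-2:0]} + {1'b0, b[W-2:0]};
  assign full = {1'b0, a} + {1'b0, b};

  assign s        = full[W-1:0];
  assign overflow = low[W-1] ^ full[W];
endmodule
```

With `W = 8`, adding `8'h7F` and `8'h01` gives a carry into the MSB but none out of it, so `overflow` is 1; adding `8'hFF` and `8'hFF` gives both carries, so `overflow` is 0.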

🔢 Counters & Sequences · 6 modules

8-Bit Odd Counter

Sequential · Testbench

Counter that increments by 2 starting from 1, generating the odd number sequence. Verified with a 128-cycle simulation testbench.

Width

8-bit

Step

+2 (odd only)

Reset Value

8'h1

Verification

128-cycle testbench

RTL

always_ff @(posedge clk or posedge reset)
  if (reset) cnt_o <= 8'h1;
  else       cnt_o <= cnt_o + 8'h02;

Load-Enabled Counter

Sequential · Load

Counter with parallel load capability and enable control. When load is asserted the counter captures external data; enable gates counting independently.
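
A minimal sketch of the load/enable behaviour described above. The port names, the synchronous reset, and the choice that `load` takes priority over `en` are assumptions; the original module may order them differently.

```systemverilog
module load_counter #(parameter int W = 8) (
  input  logic         clk, rst,
  input  logic         load, en,
  input  logic [W-1:0] load_val,
  output logic [W-1:0] count
);
  always_ff @(posedge clk) begin
    if (rst)       count <= '0;
    else if (load) count <= load_val;      // assumption: load wins over enable
    else if (en)   count <= count + 1'b1;  // count only while enabled
    // neither asserted: count holds its value
  end
endmodule
```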

Decade Counter (1–10)

MOD-10 · Wrap-around

Counts 1→10 in decimal then auto-wraps back to 1. Used in BCD display drivers and timing circuits.
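
The 1-to-10 wrap can be sketched in a few lines. The module and port names and the synchronous reset are assumptions for illustration.

```systemverilog
module decade_counter (
  input  logic       clk, rst,
  output logic [3:0] q
);
  always_ff @(posedge clk) begin
    if (rst || q == 4'd10) q <= 4'd1;   // reset and wrap both return to 1
    else                   q <= q + 4'd1;
  end
endmodule
```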

Performance Counter

CPU Pipeline · Parameterizable

Hardware performance counter for CPU pipeline event tracking. Parameterizable width, triggered by CPU events, software-readable with reset-on-read semantics.

Trigger

CPU event signals

Read Mode

Reset-on-read

Width

Parameterizable

Application

PMU / Profiling

Perfect Square Sequence Generator

Sequence

Generates 1, 4, 9, 16, 25… (perfect squares) using incremental difference arithmetic — avoids a multiplier in hardware.
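
The difference trick relies on consecutive squares being separated by consecutive odd numbers (3, 5, 7, ...). A minimal sketch, assuming a 32-bit default width and the hypothetical names `square_seq`, `sq`, and `diff`:

```systemverilog
module square_seq #(parameter int W = 32) (
  input  logic         clk, rst,
  output logic [W-1:0] sq
);
  logic [W-1:0] diff;   // next odd difference: 3, 5, 7, ...

  always_ff @(posedge clk) begin
    if (rst) begin
      sq   <= W'(1);
      diff <= W'(3);
    end else begin
      sq   <= sq + diff;      // 1, 4, 9, 16, 25, ...
      diff <= diff + W'(2);
    end
  end
endmodule
```

Only two adders are needed per cycle, which is the saving over multiplying an index by itself.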

↔️ Shift Registers & Barrel Shifters · 10 modules

4-Bit Barrel Shifter (4 Modes)

Shift/Rotate · Combinational

Multi-mode barrel shifter supporting logical left, logical right, rotate left, and rotate right — all in one combinational block.

Mode 00

Logical Left Shift

Mode 01

Logical Right Shift

Mode 10

Rotate Left

Mode 11

Rotate Right
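
The four modes in the table can be sketched with a single case statement. The port names `din`, `shamt`, and `dout`, and the trick of duplicating the input so that a rotate becomes a plain shift, are assumptions for illustration rather than the module's confirmed implementation.

```systemverilog
module barrel4 (
  input  logic [3:0] din,
  input  logic [1:0] shamt,
  input  logic [1:0] mode,
  output logic [3:0] dout
);
  logic [7:0] rl, rr;
  assign rl = {din, din} << shamt;   // upper nibble holds din rotated left
  assign rr = {din, din} >> shamt;   // lower nibble holds din rotated right

  always_comb begin
    unique case (mode)
      2'b00: dout = din << shamt;    // logical left shift
      2'b01: dout = din >> shamt;    // logical right shift
      2'b10: dout = rl[7:4];         // rotate left
      2'b11: dout = rr[3:0];         // rotate right
    endcase
  end
endmodule
```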

32-Bit Galois LFSR

PRNG · Feedback

32-bit Galois-form Linear Feedback Shift Register for pseudo-random sequence generation. XOR feedback taps at bit positions 0, 1, 2, and 22.

Width

32-bit

Reset Seed

32'h0000_0001

Feedback Taps

Bits 0, 1, 2, 22

Application

PRNG / Test Pattern Gen

100-Bit Rotator with Load

100-bit · Rotate

100-bit rotation register with parallel load. Rotate left/right independently via enable signals. Useful for wide data manipulation in DSP pipelines.

PISO / SIPO / SISO Registers

Serial · Parallel

Complete set of shift register variants: Parallel-In Serial-Out, Serial-In Parallel-Out, and Serial-In Serial-Out — both 4-bit and N-bit parameterized versions.

4-bit PISO · N-bit PISO · 4-bit SIPO · 4-bit SISO · Arithmetic Shift Register

📡 Edge Detectors · 6 modules

8-Bit Rising Edge Detector

8-bit · Posedge

Detects rising transitions on individual bits of an 8-bit input. One-cycle delayed comparison via a registered copy of the input.

Core Logic

always_comb
  for (int i = 0; i < 8; i++)
    pedge_int[i] = (q_dly[i] == 0 && in[i] == 1) ? 1'b1 : 1'b0;

8-Bit Any-Edge Detector (Rise + Fall)

Dual-Edge · XOR detect

Detects both rising and falling edges simultaneously on an 8-bit bus using XOR of current and delayed values.

Key Expression

assign edge_detect = in ^ q_dly; // 1 on any transition

32-Bit Edge Detection

32-bit

Scales the same edge-detection pattern to 32-bit width. Per-bit rising edge flag generation for interrupt-controller style event monitoring.

Dual-Edge Flip-Flop

DDR · Both Edges

Flip-flop that captures data on both rising and falling clock edges — fundamental building block for DDR (Double Data Rate) interfaces.
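
Standard cell libraries and most FPGAs have no true dual-edge register, so a common workaround is two single-edge flops plus a mux. The sketch below shows that approach; it is not necessarily the one this module uses, and the names are illustrative.

```systemverilog
module dual_edge_ff (
  input  logic clk, d,
  output logic q
);
  logic q_pos, q_neg;

  always_ff @(posedge clk) q_pos <= d;   // sample on the rising edge
  always_ff @(negedge clk) q_neg <= d;   // sample on the falling edge

  // While clk is high the most recent sample is the rising-edge one,
  // and while it is low the most recent sample is the falling-edge one.
  assign q = clk ? q_pos : q_neg;
endmodule
```

The clock-controlled mux can glitch in real silicon, so this form is usually treated as a behavioural reference rather than a sign-off implementation.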

🔒 Flip-Flops & Latches · 4 modules

JK Flip-Flop

FF · Toggle

Full JK flip-flop with all four states: hold, reset, set, and toggle. Demonstrates all classical FF behaviour in synthesisable SystemVerilog.

J=0, K=0

Hold (no change)

J=0, K=1

Reset (Q→0)

J=1, K=0

Set (Q→1)

J=1, K=1

Toggle (Q→~Q)
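
The truth table above maps directly onto a case statement. A minimal sketch, with assumed port names and no reset:

```systemverilog
module jk_ff (
  input  logic clk, j, k,
  output logic q
);
  always_ff @(posedge clk) begin
    unique case ({j, k})
      2'b00: q <= q;      // hold
      2'b01: q <= 1'b0;   // reset
      2'b10: q <= 1'b1;   // set
      2'b11: q <= ~q;     // toggle
    endcase
  end
endmodule
```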

Simple Latch (Level-Sensitive)

Latch · always_latch

Transparent latch using SystemVerilog's always_latch construct. Passes D to Q while enable is high; holds value when enable de-asserts.

RTL

always_latch if (ena) q <= d;

16-Bit Byte-Enable Flip-Flop

16-bit · Byte Enable

16-bit register with per-byte write enables. Only the enabled byte lanes update on clock edge — as seen in bus-connected register banks.

Width

16-bit (2 bytes)

Enable

be[1:0] per byte
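
A sketch of the per-byte update, assuming `be[0]` gates the low byte and `be[1]` the high byte, and assuming a synchronous active-low reset named `rst_n`; the original may use a different reset style.

```systemverilog
module byte_en_reg (
  input  logic        clk, rst_n,
  input  logic [1:0]  be,
  input  logic [15:0] d,
  output logic [15:0] q
);
  always_ff @(posedge clk) begin
    if (!rst_n) begin
      q <= '0;                         // assumption: synchronous active-low reset
    end else begin
      if (be[0]) q[7:0]  <= d[7:0];    // low byte lane
      if (be[1]) q[15:8] <= d[15:8];   // high byte lane
    end
  end
endmodule
```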

🔍 Pattern Detectors (FSM) · 2 modules

FSM: "1101" Pattern Detector (Non-Overlapping)

FSM · Mealy

Mealy FSM that detects the bit sequence "1101" on a serial input stream. Non-overlapping mode resets the state machine after each detection.

Pattern

"1101"

FSM Type

Mealy (4 states)

Mode

Non-overlapping

States

S0 → S1 → S2 → S3
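
The four states can be read as "nothing matched", "1", "11", and "110". The sketch below follows that reading; the signal names `in`, `found`, `state_q`, and `state_d` and the synchronous reset are assumptions.

```systemverilog
module det_1101 (
  input  logic clk, rst, in,
  output logic found
);
  typedef enum logic [1:0] {S0, S1, S2, S3} state_t;
  state_t state_q, state_d;

  always_comb begin
    state_d = S0;                      // default: fall back to the start state
    unique case (state_q)
      S0: if (in) state_d = S1;        // matched "1"
      S1: if (in) state_d = S2;        // matched "11"
      S2: if (in) state_d = S2;        // extra 1s keep "11" matched
          else    state_d = S3;        // matched "110"
      S3: state_d = S0;                // non-overlapping: restart either way
    endcase
  end

  // Mealy output: asserted in the same cycle as the final 1 arrives.
  assign found = (state_q == S3) && in;

  always_ff @(posedge clk) begin
    if (rst) state_q <= S0;
    else     state_q <= state_d;
  end
endmodule
```

On the input stream 1101101 this version reports one match, because the final 1 of the first match is not reused; an overlapping detector would go from S3 to S1 on a match and report a second one.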

FSM: "111" Pattern Detector (Non-Overlapping)

FSM

Detects three consecutive 1s. Non-overlapping: after a detection the FSM returns to S0 instead of reusing the trailing 1s as the start of the next match, so an input of six 1s produces two detections.

⏱️ Clock Generators & Dividers · 4 modules

Divide-by-7 Clock Generator

Freq Div · 50% Duty

Parameterizable clock divider using a MOD-8 counter. Generates approximately 50% duty-cycle output for odd divider ratios — a common interview topic.

Divider Ratio

7 (parameterizable)

Duty Cycle

~50%

Counter

MOD-8

Technique

Dual-edge toggle

1 Hz from 1000 Hz Frequency Divider

÷1000

Divides a 1000 Hz input clock down to 1 Hz for one-second tick generation. Useful for timer and RTC (Real-Time Clock) modules.
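
One common way to build this is a modulo-1000 counter that emits a single-cycle pulse rather than a derived clock. The sketch below takes that approach as an assumption; the names `clk_1khz`, `tick`, and `cnt` are illustrative.

```systemverilog
module tick_1hz (
  input  logic clk_1khz, rst,
  output logic tick              // one-cycle pulse every 1000 input cycles
);
  logic [9:0] cnt;               // 10 bits cover 0..999

  always_ff @(posedge clk_1khz) begin
    if (rst || cnt == 10'd999) cnt <= '0;
    else                       cnt <= cnt + 10'd1;
  end

  assign tick = (cnt == 10'd999);
endmodule
```

Using `tick` as a clock enable keeps downstream logic in the original clock domain, which avoids creating a second clock tree for a one-second event.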

1000 MHz Clock with 50% Duty Cycle

Testbench

Generates a 1 GHz (1000 MHz) reference clock with an exact 50% duty cycle. Verified with a simulation testbench that checks timing accuracy.

⚖️ Arbiters & Priority Logic · 3 modules

4-Bit Round-Robin Arbiter

Fairness · One-Hot Grant · Testbench

Round-robin arbiter for 4 requesters. Rotating priority pointer guarantees fairness — no single requester can starve others. One-hot grant output. Includes enhanced documentation.

Ports

4 requesters

Grant

One-hot signal

Fairness

Rotating pointer

Files

design + testbench + docs

Static Priority Arbiter

Fixed Priority · Parameterizable

Cascading if-else combinational arbiter with fixed priority ordering. Grants the highest-priority active requester. Parameterizable port count.
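
For a parameterizable port count, the if-else cascade is often written as a loop that the synthesizer unrolls into the same priority chain. A minimal sketch, assuming bit 0 is the highest-priority requester and a one-hot `gnt` output (neither is stated in the description):

```systemverilog
module fixed_prio_arb #(parameter int N = 4) (
  input  logic [N-1:0] req,
  output logic [N-1:0] gnt
);
  always_comb begin
    gnt = '0;
    // Walk from lowest to highest priority so the last hit (lowest index) wins.
    for (int i = N-1; i >= 0; i--)
      if (req[i]) gnt = N'(1) << i;
  end
endmodule
```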

Round-Robin Arbiter: Starvation Debug

Debug · Bug Fix · GTKWave

Debugging exercise fixing starvation in a faulty round-robin arbiter. The bug: pointer advanced even when no request was active. Fix: advance pointer only on active grant. Includes GTKWave waveform captures and debug scripts.

Bug

Pointer advanced on idle

Fix

Guard with req_any signal

Tools

GTKWave, VCS

Artifacts

Waveform + debug script
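
The fix can be sketched as a pointer register whose update is gated by `req_any`. Only the `req_any` guard comes from the description; the module boundary, the 2-bit `ptr` encoding, and the "next slot after the granted one" update are assumptions for illustration.

```systemverilog
module rr_pointer (
  input  logic       clk, rst,
  input  logic [3:0] req,
  input  logic [3:0] gnt,      // one-hot grant from the arbiter's select logic
  output logic [1:0] ptr       // index of the requester with top priority
);
  logic req_any;
  assign req_any = |req;

  always_ff @(posedge clk) begin
    if (rst) begin
      ptr <= '0;
    end else if (req_any) begin
      // Move priority to the slot after the one just granted.
      unique case (gnt)
        4'b0001: ptr <= 2'd1;
        4'b0010: ptr <= 2'd2;
        4'b0100: ptr <= 2'd3;
        4'b1000: ptr <= 2'd0;
        default: ptr <= ptr;
      endcase
    end
    // No request: ptr holds, so an idle cycle never skips a requester's turn.
  end
endmodule
```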

🗄️ FIFO & Memory · 3 modules

Asynchronous FIFO (CDC-Safe)

Async FIFO · Gray Code · CDC · Testbench

Production-grade asynchronous FIFO for clock domain crossing. Gray-coded pointers change only one bit per increment, so a pointer sampled in the other clock domain is never more than one position stale. Two-stage synchronizers protect against metastability. Parameterizable depth and width. Verified across independent 100 MHz / 71 MHz clock domains.

CDC Safety

Gray code + 2-FF sync

Clocks

Independent wclk / rclk

Parameters

DEPTH, WIDTH

Flags

full, empty

Gray Code Conversion

function logic [PTR-1:0] bin2gray(input logic [PTR-1:0] b);
  return b ^ (b >> 1);
endfunction

Synchronous FIFO with Threshold Flags

Sync FIFO · Almost-Full

Single-clock FIFO with almost-full and almost-empty threshold flags. Extra MSB in pointer distinguishes full from empty. Configurable thresholds for backpressure control.

Full Detect

Pointer MSBs differ, lower bits equal

Empty Detect

Pointer equality

Extra Flags

almost_full, almost_empty

Ptr Width

ADDR_WIDTH + 1
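
With pointers that are `ADDR_WIDTH + 1` bits wide, the two flags reduce to a pair of comparisons. A minimal sketch of just the flag logic; the module name and the `wr_ptr` / `rd_ptr` names are assumptions.

```systemverilog
module fifo_flags #(parameter int ADDR_WIDTH = 4) (
  input  logic [ADDR_WIDTH:0] wr_ptr, rd_ptr,   // ADDR_WIDTH + 1 bits each
  output logic                full, empty
);
  // Empty: both pointers are identical, wrap bit included.
  assign empty = (wr_ptr == rd_ptr);

  // Full: same address, but the write pointer has wrapped one more time.
  assign full  = (wr_ptr[ADDR_WIDTH]     != rd_ptr[ADDR_WIDTH]) &&
                 (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);
endmodule
```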

FIFO with Asymmetric Read/Write Width

Asymmetric · Width Mismatch

FIFO where the write-side data width differs from the read-side. Common in protocol bridges (e.g., 32-bit write → 8-bit serial read). Requires careful pointer arithmetic to track occupancy across different granularities.

🔌 Protocols & Interfaces · 4 modules

APB Master Interface

APB · FSM · AMBA · Testbench

Full APB (AMBA Advanced Peripheral Bus) master implementation driven by a 2-bit command input. Three-state FSM: IDLE → SETUP → ACCESS. Supports read-modify-write operations.

Protocol

AMBA APB

FSM

IDLE → SETUP → ACCESS

Cmd 2'b01

Read from 0xDEAD_CAFE

Cmd 2'b10

Inc + write back

Events-to-APB Bridge

APB · Event Counter

Converts three independent hardware event pulses into APB write transactions. Each event has a dedicated counter and a fixed target address.

Event A Addr

0xABBA_0000

Event B Addr

0xBAFF_0000

Event C Addr

0xCAFE_0000

FSM

IDLE → SETUP → ACCESS

Skid Buffer (Valid/Ready Decoupling)

Handshake · Pipeline · Backpressure

Buffer that decouples a producer from a consumer in a valid/ready handshake pipeline. Absorbs one beat of backpressure without stalling the upstream. Enables full-throughput pipelining.

Data Width

8-bit

Protocol

Valid / Ready

Depth

1 skid entry

Application

AXI / AXIS pipelines

Parallel-to-Serial Converter (Valid/Ready)

P2S · FSM · Shift Register

Converts parallel data to a serial bit stream using a valid/ready handshake on both input and output sides. Two-state FSM (ST_RX → ST_TX) with shift register for bit-by-bit serialisation.

FSM

ST_RX → ST_TX

Width

Parameterizable

Protocol

Valid/Ready both sides

Special Case

DATA_W=1 optimised

🧬 Cellular Automata · 2 modules

Rule 90 Cellular Automaton (512-bit)

Parallel Compute · 512-bit

512-cell elementary cellular automaton implementing Rule 90 (next cell = left XOR right neighbour). Generates fractal-like Sierpiński triangle patterns. Loadable initial state; boundary cells treat out-of-bounds as 0.

State Width

512 bits

Rule

next[i] = left XOR right

Boundary

Zero padding

Pattern

Sierpiński triangle

Core Update Rule

for (int i = 0; i < 512; i++) begin
  left    = (i == 0)   ? 1'b0 : current[i-1];
  right   = (i == 511) ? 1'b0 : current[i+1];
  next[i] = left ^ right;
end

Rule 110 Cellular Automaton

Turing Complete

Rule 110 is one of the simplest known Turing-complete systems. Its update rule is more involved than Rule 90's: each cell's next state depends on the full 3-cell neighbourhood (left, centre, right) rather than only on the two neighbours.
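
Because the rule number is itself the 8-entry truth table, the update can be written as a lookup indexed by the neighbourhood. The sketch below assumes a 512-cell default width, zero-padded boundaries, loadable state, and the same left = `i-1`, right = `i+1` convention as the Rule 90 snippet; none of these are confirmed for this module.

```systemverilog
module rule110 #(parameter int N = 512) (
  input  logic         clk, load,
  input  logic [N-1:0] data,
  output logic [N-1:0] q
);
  localparam logic [7:0] RULE = 8'b0110_1110;   // decimal 110
  logic [N-1:0] nxt;

  always_comb begin
    for (int i = 0; i < N; i++) begin
      logic left, right;
      left  = (i == 0)   ? 1'b0 : q[i-1];       // assumed orientation
      right = (i == N-1) ? 1'b0 : q[i+1];
      // The neighbourhood {left, centre, right} selects one bit of the rule.
      nxt[i] = RULE[{left, q[i], right}];
    end
  end

  always_ff @(posedge clk) begin
    if (load) q <= data;
    else      q <= nxt;
  end
endmodule
```

Swapping `RULE` for `8'b0101_1010` (decimal 90) reproduces the left-XOR-right behaviour of Rule 90 with the same structure.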

🔄 Data Converters & Misc Logic · 8 modules

Running / Sliding Window Average

DSP · Circular Buffer

N-entry sliding window average using a circular buffer and accumulator. The oldest value is subtracted as each new one arrives, so no full re-sum is needed each cycle. The output is right-shifted by log2(N) in place of a divider, which assumes N is a power of two.

Core Expression

nxt_acc   = data_i + acc - ({32{count_max}} & mem[ptr]);
average_o = nxt_acc >> $clog2(N);

Binary to One-Hot Converter

Encoding · Testbench

Converts a binary-encoded value to its one-hot representation. Common in decoder circuits, mux selects, and state machine output encoding.
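
A minimal sketch of the conversion, assuming a parameter `BIN_W` for the input width and the port names `bin` and `onehot`:

```systemverilog
module bin2onehot #(parameter int BIN_W = 3) (
  input  logic [BIN_W-1:0]    bin,
  output logic [2**BIN_W-1:0] onehot
);
  always_comb begin
    onehot      = '0;
    onehot[bin] = 1'b1;   // set only the bit selected by the binary value
  end
endmodule
```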

Little-Endian ↔ Big-Endian Converter

Byte Swap

Byte-order reversal for protocol bridging. Converts between little-endian and big-endian representations — used in bus bridges and network packet processors.
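
Byte reversal can be sketched with indexed part-selects. The parameter `BYTES` and the port names are assumptions; the same module converts in either direction because reversing twice restores the original order.

```systemverilog
module endian_swap #(parameter int BYTES = 4) (
  input  logic [8*BYTES-1:0] din,
  output logic [8*BYTES-1:0] dout
);
  always_comb
    for (int i = 0; i < BYTES; i++)
      dout[8*i +: 8] = din[8*(BYTES-1-i) +: 8];   // byte i takes the mirrored byte
endmodule
```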

8-Bit Priority Encoder

Priority · Combinational

Encodes an 8-bit one-hot or priority input to a 3-bit binary output representing the highest-priority active bit. Foundation of interrupt controllers.
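
A minimal sketch of the encoder, assuming that the highest-numbered set bit has the highest priority and adding a hypothetical `valid` output to separate "bit 0 set" from "no bits set"; neither choice is stated in the description.

```systemverilog
module prio_enc8 (
  input  logic [7:0] in,
  output logic [2:0] pos,
  output logic       valid
);
  always_comb begin
    pos   = '0;
    valid = |in;                 // high when any input bit is set
    for (int i = 0; i < 8; i++)
      if (in[i]) pos = 3'(i);    // later (higher) bits overwrite earlier ones
  end
endmodule
```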

K-Map Optimised Logic (6 Variants)

K-Map · Minimization

Six combinational logic problems solved via Karnaugh-map minimisation. Demonstrates POS/SOP reduction and gate-level optimisation from truth tables.

3-Input LUT via Shift Register

LUT · FPGA Primitive

Implements a 3-input Look-Up Table using a shift register — the same principle used inside FPGA LUT primitives. Demonstrates how FPGAs map arbitrary logic.

📚 Design Articles & Deep-Dives · 2 articles

RV32I Single-Cycle Processor Design — YARP

RISC-V · RTL CPU · SystemVerilog

Comprehensive 14-chapter reference covering computer architecture foundations, all six RV32I instruction formats, and a complete SystemVerilog implementation of YARP — Yet Another RISC-V Processor. Includes Fetch, Decode, Register File, ALU, Data Memory, and Control Unit RTL with SVG datapaths and worked examples.

ISA

RISC-V RV32I

Chapters

14 · Basics to Advanced

Language

SystemVerilog RTL

Read Time

~25 min

4-Way Set Associative VIPT Instruction Cache — RISC-V

Cache · VIPT · RISC-V · Architecture

11-chapter deep-dive into a 16 KB, 4-Way Set Associative VIPT Instruction Cache designed for a RISC-V core. Covers the CPU–memory speed gap, parallel TLB+SRAM lookup, aliasing conditions, PLRU replacement policy, and full SoC integration — from first principles to tapeout-ready depth.

Cache Size

16 KB · 4-Way · 64 Sets

Chapters

11 · Basics to SoC

Key Topics

VIPT · PLRU · FENCE.I · TLB

Read Time

~18 min

🖥️ RISC-V CPU Design · Advanced Project

RISC-V Processor Implementation

CPU · ISA · Pipeline · Advanced

Full RISC-V processor design in SystemVerilog. The top-level project in the Leet Silicon vault — demonstrates integration of all fundamental concepts: ALU, register file, instruction fetch/decode, memory interface, and control logic.

ISA

RISC-V (RV32I base)

Language

SystemVerilog

Variants

Base + Modified (RISCV_mod)

Complexity

Full pipeline CPU

Chaitanyh Singh — AI Semiconductor Engineer at AMD

Building Silicon
for the AI Era

Core Skill Matrix

Professional Experience

Awards & Achievements

Training & Certifications

Education

Blogs & Write-ups

VIPT Instruction Cache — From First Principles to Silicon

RV32I Single-Cycle Processor Design — YARP

The Art of SoC Integration — Managing Complexity at 3nm

Low-Power Design with UPF — Lessons from Real Tapeouts

AI Accelerator Architectures — From an RTL Engineer's Lens

Design Vault — Restricted Access

LeetSilicon — Design Vault

67+ RTL Design Problems — Solved & Documented

Restricted — AMD Spotlight
