CHAITANYH
Silicon Design Engineer
Member of Technical Staff · AMD
Chaitanyh Singh
SoC Design

About

Building Silicon
for the AI Era

VLSI professional with 8 years of excellence in RTL design and SoC integration, driving the complete design engineering lifecycle — from RTL through synthesis, timing signoff, and tape-out.

Currently MTS Engineer at AMD on 3nm SoC AI projects, driving UPF development, formal verification, LINT/CDC sign-off, and Conformal flows from non-tiled RTL to final PNR netlist — powering next-generation AI silicon.

Design Compiler · Genus · Tempus · PrimeTime · Formality · VCLP · Magillem · SystemVerilog · UPF / IEEE-1801 · Python · Perl · Vivado
Chaitanyh Singh · MTS · AMD

Expertise

Core Skill Matrix

01🔧
RTL Design & SoC Integration
Verilog · SystemVerilog · VHDL
IP Integration · Tiled/Non-Tiled RTL
02
Synthesis
Design Compiler · Genus
Power-Aware · CGC · FE Synthesis
03⏱️
Static Timing Analysis
Tempus · PrimeTime
Setup/Hold/DRV · Timing Signoff
04🔋
Low Power Design
UPF · VCLP · IEEE-1801
Multi-Voltage Domain · MTCMOS
05
Formal Verification
Formality · LEC · Conformal
RTL vs Netlist · Non-Tiled vs Tiled
06🔀
Clock Domain Crossing
CDC Analysis · Constraint Dev
Violation Debug · SoC CDC Sign-off
07🧹
LINT & Constraints
LINT Reports · Timing Constraints
Partition-Level STA · Constraint Auth
08🤖
AI Scripting & Automation
Python · Perl · Unix Shell
AI-Assisted EDA · Report Generation
09🤝
Leadership & Architecture
Cross-Functional Teams · PPA Strategy
Mentoring · Stakeholder Mgmt

Career

Professional Experience

Apr 2024
Present
AMD
Bengaluru, IN
Member of Technical Staff (MTS)
Strategic Silicon Solutions
  • SoC Integration — review and resolution of LINT and CDC reports
  • IP integration for 3rd Party IPs within complex SoC hierarchies on 3nm TSMC
  • RTL UPF development for Tiles, Partitions, and Top-level; FE VCLP for clean UPF delivery
  • Formal Verification of Non-Tiled RTL vs Tiled RTL using Formality
  • Timing signoff — unconstrained endpoints, unclocked registers, constraints update
  • FE Synthesis for 3rd Party IPs — register count and CGC insertion sanity checks
  • Partner with RTL, PD, Low Power, and Automation teams to optimize PPA
  • Led Conformal Flow initiative from Non-Tiled RTL to Final PNR Netlist
Tempus · PrimeTime · Design Compiler · Formality · VCLP · UPF / IEEE-1801
▸ 3nm SoC Integration & AI Accelerator Projects
Nov 2019
Mar 2024
Qualcomm
Noida, IN
Senior Lead Engineer
SoC Implementation
  • Synthesis with timing violation reporting and resolution (setup, hold, DRVs)
  • Authored timing constraints; ran partition-level STA on 5nm and 8nm projects
  • Power-aware synthesis of tiles, partitions, and block/core levels
  • Cross-team collaboration with RTL, PD, LP, and Automation for PPA optimization
  • Generated, reviewed, and validated CDC constraints for SoC timing closure
Tempus · PrimeTime · Design Compiler · 5nm · 8nm
▸ 5nm & 8nm Mobile SoC Implementation
Apr 2017
Oct 2019
Intel
Bengaluru, IN
Senior Design Engineer
Consultant
  • Assessed CDC violations and reported to IP owners; added constraints to resolve
  • Functional verification and LEC of synthesis RTL vs generated netlist
  • Assisted in FEBE flow — ran DC, Formal Verification stages for releases
  • Python scripting to automate report generation and library file processing
FEBE Flow · Design Compiler · Python · 11nm
▸ 11nm Frontend-to-Backend Handoff Projects
Dec 2016
Mar 2018
Logic Fruit Technologies
Bengaluru, IN
RTL / RND Design Engineer
FPGA
  • RTL development of video tracking system in VHDL on Artix-7
  • Local adaptive contrast enhancement algorithm in VHDL on Kintex-7
  • Canny edge detection algorithm in VHDL on Artix-7
  • 1G MAC Ethernet protocol — TX and RX sections in VHDL & Verilog
Vivado · ISE · VHDL · Verilog
▸ Image Processing FPGA Algorithm Projects
May 2015
Jun 2015
Hewlett-Packard (HP)
Bengaluru, IN
Intern — VLSI & VHDL

Recognition

Awards & Achievements

🥇
Impact Award
Qualcomm
Recognized for successful, zero-bug delivery of timing constraints for SoC — setting a benchmark for constraint quality and execution excellence.
🥈
Silver Medalist
NIT Patna · 2017
Silver Medal in Electronics & Communication Engineering with CGPA 9.0 — top performer of the ECE batch at National Institute of Technology, Patna.

Credentials

Training & Certifications

Industry Certifications
Online Courses & Badges

Academic

Education

NIT Patna
9.0
CGPA out of 10.0
B.Tech — Electronics &
Communication Engineering
National Institute of Technology, Patna
2013 – 2017
🥈 Silver Medalist · Top ECE Graduate

Writing

Blogs & Write-ups

Coming Soon · SoC Design

The Art of SoC Integration — Managing Complexity at 3nm

A deep-dive into how modern SoC integration workflows evolve as process nodes shrink — covering hierarchy planning, IP stitching, and cross-domain sign-off challenges.

Coming Soon · RTL & Verification

Low-Power Design with UPF — Lessons from Real Tapeouts

Practical observations on IEEE 1801 UPF intent, power domain crossings, and common pitfalls that only surface at gate-level simulation or formal verification.

Coming Soon · AI & Silicon

AI Accelerator Architectures — From an RTL Engineer's Lens

What it actually means to build AI silicon — the gap between algorithmic intent and synthesisable RTL, and how design decisions at block-level ripple all the way to PPA.

More write-ups on the way — follow on LinkedIn for updates.

RTL Design Practice

LeetSilicon — Design Vault

67+ RTL Design Problems — Solved & Documented

A personal practice vault of SystemVerilog implementations covering everything from basic flip-flops to async FIFOs, APB protocol masters, cellular automata, and RISC-V CPU design. Each module is synthesisable, verified, and documented.

67+
Problems
14
Categories
SV
Language
Arithmetic Circuits 3 modules
100-Bit Binary Adder
Combinational · Ripple Carry
100-bit binary adder with a ripple-carry architecture (XOR/AND logic per bit). Accepts two 100-bit operands and produces a 100-bit sum plus a carry-out (a 101-bit result), with the carry propagating through every bit position.
Architecture
Ripple Carry Adder
Data Width
100-bit operands
Logic Style
Combinational (always_comb)
Output
100-bit sum + carry-out (101 bits total)
Core Logic Snippet
assign {cout, sum} = a + b;
3-Bit Full Adder
Combinational
3-bit adder built from cascaded full-adder cells. Each cell generates its sum and carry from XOR, AND, and OR gate logic. Building block for wider adder chains.
Inputs
a[2:0], b[2:0], cin
Outputs
sum[2:0], cout
Signed Addition Overflow Detector
Signed Arithmetic · Overflow
Detects signed overflow in two's complement addition. Overflow occurs when adding two same-sign numbers yields an opposite-sign result — critical for ALU correctness.
Detection Method
MSB carry XOR logic
Application
ALU / Exception Flags
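The carry-XOR detection method above can be sketched as follows. This is a minimal illustration, not the vault's actual source: the module name, port names, and the width parameter W (assumed to be at least 2) are hypothetical.

```systemverilog
// Hypothetical module and port names; W >= 2 is assumed.
module signed_add_ovf #(parameter int W = 8) (
  input  logic [W-1:0] a, b,
  output logic [W-1:0] sum,
  output logic         overflow
);
  logic c_in_msb;   // carry into the sign bit
  logic c_out_msb;  // carry out of the sign bit

  // Add the lower W-1 bits; the extra result bit is the carry into the MSB.
  assign {c_in_msb, sum[W-2:0]} = a[W-2:0] + b[W-2:0];
  // Sign-bit stage behaves like a single full adder.
  assign {c_out_msb, sum[W-1]}  = a[W-1] + b[W-1] + c_in_msb;
  // Signed overflow occurs exactly when the two carries disagree.
  assign overflow = c_in_msb ^ c_out_msb;
endmodule
```

An equivalent check compares signs directly: overflow is set when a and b share a sign bit and the sum's sign bit differs from it.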
🔢 Counters & Sequences 6 modules
8-Bit Odd Counter
Sequential · Testbench
Counter that increments by 2 starting from 1, generating the odd number sequence. Verified with a 128-cycle simulation testbench.
Width
8-bit
Step
+2 (odd only)
Reset Value
8'h1
Verification
128-cycle testbench
RTL
always_ff @(posedge clk or posedge reset)
  if (reset) cnt_o <= 8'h01;          // first odd value
  else       cnt_o <= cnt_o + 8'h02;  // step to the next odd value
Load-Enabled Counter
Sequential · Load
Counter with parallel load capability and enable control. When load is asserted the counter captures external data; enable gates counting independently.
Decade Counter (1–10)
MOD-10 · Wrap-around
Counts 1→10 in decimal then auto-wraps back to 1. Used in BCD display drivers and timing circuits.
Performance Counter
CPU Pipeline · Parameterizable
Hardware performance counter for CPU pipeline event tracking. Parameterizable width, triggered by CPU events, software-readable with reset-on-read semantics.
Trigger
CPU event signals
Read Mode
Reset-on-read
Width
Parameterizable
Application
PMU / Profiling
Perfect Square Sequence Generator
Sequence
Generates 1, 4, 9, 16, 25… (perfect squares) using incremental difference arithmetic — avoids a multiplier in hardware.
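The incremental-difference idea relies on consecutive squares differing by consecutive odd numbers (4 - 1 = 3, 9 - 4 = 5, and so on). A minimal sketch under assumed names and an assumed asynchronous active-high reset:

```systemverilog
// Hypothetical module; W and the reset style are assumptions.
module square_gen #(parameter int W = 16) (
  input  logic         clk, reset,
  output logic [W-1:0] sq_o
);
  logic [W-1:0] diff;  // next odd difference: 3, 5, 7, ...

  always_ff @(posedge clk or posedge reset)
    if (reset) begin
      sq_o <= 'd1;          // 1^2
      diff <= 'd3;          // 2^2 - 1^2
    end else begin
      sq_o <= sq_o + diff;  // (n+1)^2 = n^2 + (2n + 1)
      diff <= diff + 2'd2;  // differences grow by 2 each step
    end
endmodule
```

Only two adders are needed, and the sequence wraps silently once the square exceeds W bits.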
↔️ Shift Registers & Barrel Shifters 10 modules
4-Bit Barrel Shifter (4 Modes)
Shift/Rotate · Combinational
Multi-mode barrel shifter supporting logical left, logical right, rotate left, and rotate right — all in one combinational block.
Mode 00
Logical Left Shift
Mode 01
Logical Right Shift
Mode 10
Rotate Left
Mode 11
Rotate Right
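The four modes listed above could be implemented along these lines. The sketch assumes a 2-bit shift amount port named amt, which the description does not specify; all names are hypothetical.

```systemverilog
// Hypothetical ports; the 2-bit shift amount `amt` is an assumption.
module barrel4 (
  input  logic [3:0] din,
  input  logic [1:0] amt,
  input  logic [1:0] mode,
  output logic [3:0] dout
);
  logic [7:0] dbl, rotl, rotr;

  assign dbl  = {din, din};   // doubled word makes rotation a plain shift
  assign rotl = dbl << amt;   // upper nibble holds the rotate-left result
  assign rotr = dbl >> amt;   // lower nibble holds the rotate-right result

  always_comb
    case (mode)
      2'b00:   dout = din << amt;  // logical left shift
      2'b01:   dout = din >> amt;  // logical right shift
      2'b10:   dout = rotl[7:4];   // rotate left
      default: dout = rotr[3:0];   // rotate right (mode 11)
    endcase
endmodule
```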
32-Bit Galois LFSR
PRNG · Feedback
32-bit Galois-form Linear Feedback Shift Register for pseudo-random sequence generation. The output bit (bit 0) is XORed into the shift path at tap positions 1, 2, and 22.
Width
32-bit
Reset Seed
32'h0000_0001
Feedback Taps
Bits 0, 1, 2, 22
Application
PRNG / Test Pattern Gen
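A Galois LFSR of this shape might look like the sketch below. It assumes a right-shifting register whose bit 0 is the feedback bit, a synchronous reset, and the polynomial x^32 + x^22 + x^2 + x + 1; the vault's exact tap wiring is not shown on this page, so treat the tap placement as an assumption.

```systemverilog
// Hypothetical module; tap placement and reset style are assumptions.
module lfsr32_galois (
  input  logic        clk, reset,
  output logic [31:0] q
);
  logic [31:0] nxt;

  always_comb begin
    nxt     = {q[0], q[31:1]};  // shift right; bit 0 wraps into bit 31
    nxt[21] = q[22] ^ q[0];     // tap at position 22
    nxt[1]  = q[2]  ^ q[0];     // tap at position 2
    nxt[0]  = q[1]  ^ q[0];     // tap at position 1
  end

  always_ff @(posedge clk)
    if (reset) q <= 32'h0000_0001;  // non-zero seed; all-zero would lock up
    else       q <= nxt;
endmodule
```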
100-Bit Rotator with Load
100-bit · Rotate
100-bit rotation register with parallel load. Rotate left/right independently via enable signals. Useful for wide data manipulation in DSP pipelines.
PISO / SIPO / SISO Registers
Serial · Parallel
Complete set of shift register variants: Parallel-In Serial-Out, Serial-In Parallel-Out, and Serial-In Serial-Out — both 4-bit and N-bit parameterized versions.
4-bit PISO · N-bit PISO · 4-bit SIPO · 4-bit SISO · Arithmetic Shift Register
📡 Edge Detectors 6 modules
8-Bit Rising Edge Detector
8-bit · Posedge
Detects rising transitions on individual bits of an 8-bit input. One-cycle delayed comparison via a registered copy of the input.
Core Logic
always_comb
  for (int i = 0; i < 8; i++)
    pedge_int[i] = (q_dly[i] == 0 && in[i] == 1) ? 1'b1 : 1'b0;
8-Bit Any-Edge Detector (Rise + Fall)
Dual-Edge · XOR detect
Detects both rising and falling edges simultaneously on an 8-bit bus using XOR of current and delayed values.
Key Expression
assign edge_detect = in ^ q_dly; // 1 on any transition
32-Bit Edge Detection
32-bit
Scales the same edge-detection pattern to 32-bit width. Per-bit rising edge flag generation for interrupt-controller style event monitoring.
Dual-Edge Flip-Flop
DDR · Both Edges
Flip-flop that captures data on both rising and falling clock edges — fundamental building block for DDR (Double Data Rate) interfaces.
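Standard cell flops trigger on a single edge, so dual-edge behaviour is usually emulated. One known technique, shown here as a sketch and not necessarily the one used in the vault, pairs a posedge flop and a negedge flop and combines them with XOR. Names are hypothetical.

```systemverilog
// Hypothetical module; the XOR-pair technique is an assumed choice.
module dual_edge_ff (
  input  logic clk, d,
  output logic q
);
  logic p, n;

  always_ff @(posedge clk) p <= d ^ n;  // after a rising edge:  p ^ n == d
  always_ff @(negedge clk) n <= d ^ p;  // after a falling edge: p ^ n == d

  assign q = p ^ n;  // only one flop changes per edge, so q updates cleanly
endmodule
```

Compared with a clock-selected mux of two flops, this form avoids using the clock as a data select, which can glitch.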
🔒 Flip-Flops & Latches 4 modules
JK Flip-Flop
FF · Toggle
Full JK flip-flop with all four states: hold, reset, set, and toggle. Demonstrates all classical FF behaviour in synthesisable SystemVerilog.
J=0, K=0
Hold (no change)
J=0, K=1
Reset (Q→0)
J=1, K=0
Set (Q→1)
J=1, K=1
Toggle (Q→~Q)
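The truth table above maps directly onto a case statement. A minimal sketch with hypothetical names and no reset:

```systemverilog
// Hypothetical module; reset is omitted for brevity.
module jk_ff (
  input  logic clk, j, k,
  output logic q
);
  always_ff @(posedge clk)
    case ({j, k})
      2'b00:   q <= q;     // hold
      2'b01:   q <= 1'b0;  // reset
      2'b10:   q <= 1'b1;  // set
      default: q <= ~q;    // J=1, K=1: toggle
    endcase
endmodule
```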
Simple Latch (Level-Sensitive)
Latch · always_latch
Transparent latch using SystemVerilog's always_latch construct. Passes D to Q while enable is high; holds value when enable de-asserts.
RTL
always_latch if (ena) q <= d;
16-Bit Byte-Enable Flip-Flop
16-bit · Byte Enable
16-bit register with per-byte write enables. Only the enabled byte lanes update on clock edge — as seen in bus-connected register banks.
Width
16-bit (2 bytes)
Enable
be[1:0] per byte
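Per-byte enables can be sketched as two independent conditional updates. The synchronous active-low reset and all names other than be[1:0] are assumptions.

```systemverilog
// Hypothetical module; the reset style is an assumption.
module dff16_be (
  input  logic        clk, resetn,
  input  logic [1:0]  be,
  input  logic [15:0] d,
  output logic [15:0] q
);
  always_ff @(posedge clk)
    if (!resetn) q <= '0;
    else begin
      if (be[0]) q[7:0]  <= d[7:0];   // lower byte lane
      if (be[1]) q[15:8] <= d[15:8];  // upper byte lane
    end
endmodule
```

A lane whose enable is low simply keeps its previous value, which synthesis tools typically map to enable flops.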
🔍 Pattern Detectors (FSM) 2 modules
FSM: "1101" Pattern Detector (Non-Overlapping)
FSM · Mealy
Mealy FSM that detects the bit sequence "1101" on a serial input stream. Non-overlapping mode resets the state machine after each detection.
Pattern
"1101"
FSM Type
Mealy (4 states)
Mode
Non-overlapping
States
S0 → S1 → S2 → S3
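A four-state Mealy machine for this pattern could be written as below. The state meanings (S1 = seen 1, S2 = seen 11, S3 = seen 110), the port names, and the reset style are assumptions.

```systemverilog
// Hypothetical module; state encoding and reset style are assumptions.
module det1101 (
  input  logic clk, reset, din,
  output logic det
);
  typedef enum logic [1:0] {S0, S1, S2, S3} state_t;
  state_t state, nxt;

  always_ff @(posedge clk or posedge reset)
    if (reset) state <= S0;
    else       state <= nxt;

  always_comb begin
    nxt = S0;  // default: restart the search
    case (state)
      S0: if (din) nxt = S1;                 // seen "1"
      S1: if (din) nxt = S2;                 // seen "11"
      S2: if (din) nxt = S2; else nxt = S3;  // "111..." stays, "110" advances
      S3: nxt = S0;                          // non-overlapping: always restart
    endcase
  end

  // Mealy output: asserted in the same cycle as the final '1' arrives.
  assign det = (state == S3) && din;
endmodule
```

An overlapping detector would instead return to S1 after a match, since the trailing 1 can start the next pattern.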
FSM: "111" Pattern Detector (Non-Overlapping)
FSM
Detects three consecutive 1s. Non-overlapping: after detection the FSM resets to S0 rather than continuing with the third bit as the start of a new match.
⏱️ Clock Generators & Dividers 4 modules
Divide-by-7 Clock Generator
Freq Div · 50% Duty
Parameterizable clock divider using a 3-bit counter that wraps after 7 states (MOD-7). Generates an approximately 50% duty-cycle output for odd divider ratios, a common interview topic.
Divider Ratio
7 (parameterizable)
Duty Cycle
~50%
Counter
MOD-7 (3-bit)
Technique
Dual-edge toggle
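For an odd ratio, a near-50% duty cycle needs both clock edges. The sketch below shows one common way to do it, assuming an odd N and a 50% input clock: a posedge-derived pulse is re-sampled on the negedge and the two are ORed. Names and reset style are hypothetical.

```systemverilog
// Hypothetical module; assumes N is odd and >= 3.
module clk_div_odd #(parameter int N = 7) (
  input  logic clk, reset,
  output logic clk_o
);
  logic [$clog2(N)-1:0] cnt;
  logic hi_p, hi_n;

  // MOD-N counter: 0 .. N-1
  always_ff @(posedge clk or posedge reset)
    if (reset)             cnt <= '0;
    else if (cnt == N - 1) cnt <= '0;
    else                   cnt <= cnt + 1'b1;

  // High for (N-1)/2 input cycles out of every N.
  always_ff @(posedge clk or posedge reset)
    if (reset) hi_p <= 1'b0;
    else       hi_p <= (cnt < N / 2);

  // Same pulse delayed by half an input cycle.
  always_ff @(negedge clk or posedge reset)
    if (reset) hi_n <= 1'b0;
    else       hi_n <= hi_p;

  // OR stretches the high time to N/2 input cycles (3.5 of 7 for N = 7).
  assign clk_o = hi_p | hi_n;
endmodule
```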
1Hz from 1000Hz Frequency Divider
÷1000
Divides a 1000Hz input clock down to 1Hz for real-time / one-second tick generation. Useful for timer and RTC (Real-Time Clock) modules.
1000MHz Clock with 50% Duty Cycle
Testbench
Generates a 1GHz reference clock with precisely 50% duty cycle. Verified with simulation testbench for timing accuracy validation.
⚖️ Arbiters & Priority Logic 3 modules
4-Bit Round-Robin Arbiter
Fairness · One-Hot Grant · Testbench
Round-robin arbiter for 4 requesters. Rotating priority pointer guarantees fairness — no single requester can starve others. One-hot grant output. Includes enhanced documentation.
Ports
4 requesters
Grant
One-hot signal
Fairness
Rotating pointer
Files
design + testbench + docs
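A rotating-pointer arbiter for four requesters might be sketched as follows. Port names and the reset style are assumptions; the pointer only moves when a grant is issued, so an idle cycle does not change the priority order.

```systemverilog
// Hypothetical module; names and reset style are assumptions.
module rr_arb4 (
  input  logic       clk, reset,
  input  logic [3:0] req,
  output logic [3:0] gnt
);
  logic [1:0] ptr;  // requester with highest priority this cycle
  logic [1:0] idx;

  always_comb begin
    gnt = '0;
    idx = '0;
    // Walk from lowest to highest priority; the last match (offset 0) wins.
    for (int i = 3; i >= 0; i--) begin
      idx = ptr + 2'(i);                    // wraps modulo 4
      if (req[idx]) gnt = 4'b0001 << idx;   // one-hot grant
    end
  end

  always_ff @(posedge clk or posedge reset)
    if (reset) ptr <= '0;
    else
      for (int i = 0; i < 4; i++)
        if (gnt[i]) ptr <= 2'(i + 1);  // next priority starts after the winner
endmodule
```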
Static Priority Arbiter
Fixed Priority · Parameterizable
Cascading if-else combinational arbiter with fixed priority ordering. Grants the highest-priority active requester. Parameterizable port count.
Round-Robin Arbiter: Starvation Debug
Debug · Bug Fix · GTKWave
Debugging exercise fixing starvation in a faulty round-robin arbiter. The bug: pointer advanced even when no request was active. Fix: advance pointer only on active grant. Includes GTKWave waveform captures and debug scripts.
Bug
Pointer advanced on idle
Fix
Guard with req_any signal
Tools
GTKWave, VCS
Artifacts
Waveform + debug script
🗄️ FIFO & Memory 3 modules
Asynchronous FIFO (CDC-Safe)
Async FIFO · Gray Code · CDC · Testbench
Production-grade asynchronous FIFO for clock domain crossing. Gray-coded pointers prevent multi-bit transitions. Two-stage synchronizers protect against metastability. Parameterizable depth and width. Verified across independent 100MHz/71MHz clock domains.
CDC Safety
Gray code + 2-FF sync
Clocks
Independent wclk / rclk
Parameters
DEPTH, WIDTH
Flags
full, empty
Gray Code Conversion
function logic [PTR-1:0] bin2gray(input logic [PTR-1:0] b);
  return b ^ (b >> 1);
endfunction
Synchronous FIFO with Threshold Flags
Sync FIFO · Almost-Full
Single-clock FIFO with almost-full and almost-empty threshold flags. Extra MSB in pointer distinguishes full from empty. Configurable thresholds for backpressure control.
Full Detect
Pointer MSBs differ (XOR), lower bits equal
Empty Detect
Pointer equality
Extra Flags
almost_full, almost_empty
Ptr Width
ADDR_WIDTH + 1
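The flag logic summarised above can be sketched from the two (ADDR_WIDTH + 1)-bit pointers. The threshold parameters AF_LEVEL and AE_LEVEL and all port names are hypothetical; only the extra-MSB scheme comes from the description.

```systemverilog
// Hypothetical module; AF_LEVEL / AE_LEVEL are assumed threshold parameters.
module fifo_flags #(
  parameter int ADDR_WIDTH = 4,
  parameter int AF_LEVEL   = 12,  // almost-full when occupancy >= this
  parameter int AE_LEVEL   = 4    // almost-empty when occupancy <= this
) (
  input  logic [ADDR_WIDTH:0] wr_ptr, rd_ptr,
  output logic full, empty, almost_full, almost_empty
);
  logic [ADDR_WIDTH:0] count;

  // Modular subtraction gives the occupancy, even across pointer wrap.
  assign count = wr_ptr - rd_ptr;

  // Identical pointers: nothing stored.
  assign empty = (wr_ptr == rd_ptr);
  // Same address but opposite wrap bit: writer is one full lap ahead.
  assign full  = (wr_ptr[ADDR_WIDTH] != rd_ptr[ADDR_WIDTH]) &&
                 (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);

  assign almost_full  = (count >= AF_LEVEL);
  assign almost_empty = (count <= AE_LEVEL);
endmodule
```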
FIFO with Asymmetric Read/Write Width
Asymmetric · Width Mismatch
FIFO where the write-side data width differs from the read-side. Common in protocol bridges (e.g., 32-bit write → 8-bit serial read). Requires careful pointer arithmetic to track occupancy across different granularities.
🔌 Protocols & Interfaces 4 modules
APB Master Interface
APB · FSM · AMBA · Testbench
Full APB (AMBA Advanced Peripheral Bus) master implementation driven by a 2-bit command input. Three-state FSM: IDLE → SETUP → ACCESS. Supports read-modify-write operations.
Protocol
AMBA APB
FSM
IDLE → SETUP → ACCESS
Cmd 2'b01
Read from 0xDEAD_CAFE
Cmd 2'b10
Inc + write back
Events-to-APB Bridge
APB · Event Counter
Converts three independent hardware event pulses into APB write transactions. Each event has a dedicated counter and a fixed target address.
Event A Addr
0xABBA_0000
Event B Addr
0xBAFF_0000
Event C Addr
0xCAFE_0000
FSM
IDLE → SETUP → ACCESS
Skid Buffer (Valid/Ready Decoupling)
Handshake · Pipeline · Backpressure
Buffer that decouples a producer from a consumer in a valid/ready handshake pipeline. Absorbs one beat of backpressure without stalling the upstream. Enables full-throughput pipelining.
Data Width
8-bit
Protocol
Valid / Ready
Depth
1 skid entry
Application
AXI / AXIS pipelines
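A single-entry skid buffer can be sketched as below. This follows a widely used pass-through form in which only the upstream ready is registered; port names, the reset style, and that design choice are assumptions and not taken from the vault.

```systemverilog
// Hypothetical module; port names and reset style are assumptions.
module skid_buffer #(parameter int W = 8) (
  input  logic         clk, reset,
  input  logic         up_valid,
  output logic         up_ready,
  input  logic [W-1:0] up_data,
  output logic         dn_valid,
  input  logic         dn_ready,
  output logic [W-1:0] dn_data
);
  logic         skid_valid;  // a beat is parked in the skid register
  logic [W-1:0] skid_data;

  always_ff @(posedge clk or posedge reset)
    if (reset)                                  skid_valid <= 1'b0;
    else if (up_valid && up_ready && !dn_ready) skid_valid <= 1'b1;  // park it
    else if (dn_ready)                          skid_valid <= 1'b0;  // drained

  // Capture whenever the upstream side is allowed to send.
  always_ff @(posedge clk)
    if (up_ready) skid_data <= up_data;

  assign up_ready = !skid_valid;                         // registered ready
  assign dn_valid = up_valid || skid_valid;
  assign dn_data  = skid_valid ? skid_data : up_data;    // parked beat first
endmodule
```

Because up_ready comes from a flop, the ready path no longer runs combinationally through the whole pipeline, which is the usual motivation for the structure.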
Parallel-to-Serial Converter (Valid/Ready)
P2S · FSM · Shift Register
Converts parallel data to a serial bit stream using a valid/ready handshake on both input and output sides. Two-state FSM (ST_RX → ST_TX) with shift register for bit-by-bit serialisation.
FSM
ST_RX → ST_TX
Width
Parameterizable
Protocol
Valid/Ready both sides
Special Case
DATA_W=1 optimised
🧬 Cellular Automata 2 modules
Rule 90 Cellular Automaton (512-bit)
Parallel Compute · 512-bit
512-cell elementary cellular automaton implementing Rule 90 (next cell = left XOR right neighbour). Generates fractal-like Sierpiński triangle patterns. Loadable initial state; boundary cells treat out-of-bounds as 0.
State Width
512 bits
Rule
next[i] = left XOR right
Boundary
Zero padding
Pattern
Sierpiński triangle
Core Update Rule
for (int i = 0; i < 512; i++) begin
  left    = (i == 0)   ? 1'b0 : current[i-1];
  right   = (i == 511) ? 1'b0 : current[i+1];
  next[i] = left ^ right;
end
Rule 110 Cellular Automaton
Turing Complete
Rule 110 is the simplest known Turing-complete cellular automaton. Its update rule is more complex than Rule 90: each next state depends on all three cells of the neighbourhood (left, centre, right), not only on the two neighbours.
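Rule 110 maps the neighbourhoods 110, 101, 011, 010, and 001 to 1 and the rest to 0, which reduces to (centre XOR right) OR (NOT left AND centre). A vectorised sketch, assuming left means the higher-indexed neighbour, zero boundaries, and hypothetical port names:

```systemverilog
// Hypothetical module; orientation (left = q[i+1]) is an assumption.
module rule110 #(parameter int N = 512) (
  input  logic         clk, load,
  input  logic [N-1:0] data,
  output logic [N-1:0] q
);
  logic [N-1:0] left, right;

  assign left  = {1'b0, q[N-1:1]};  // left[i]  = q[i+1], zero past the MSB
  assign right = {q[N-2:0], 1'b0};  // right[i] = q[i-1], zero past the LSB

  always_ff @(posedge clk)
    if (load) q <= data;                          // loadable initial state
    else      q <= (q ^ right) | (~left & q);     // Rule 110 for every cell
endmodule
```

Unlike Rule 90, the rule is not left-right symmetric, so the orientation assumption changes the pattern's direction.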
🔄 Data Converters & Misc Logic 8 modules
Running / Sliding Window Average
DSP · Circular Buffer
N-entry sliding window average using a circular buffer and accumulator. The oldest value is subtracted as each new one arrives, so no full re-sum is needed each cycle. Output is right-shifted by log2(N) for division, which assumes N is a power of two.
Core Expression
nxt_acc   = data_i + acc - ({32{count_max}} & mem[ptr]);  // drop oldest sample only once the window is full
average_o = nxt_acc >> $clog2(N);                         // divide by N (power of two)
Binary to One-Hot Converter
Encoding · Testbench
Converts a binary-encoded value to its one-hot representation. Common in decoder circuits, mux selects, and state machine output encoding.
Little-Endian ↔ Big-Endian Converter
Byte Swap
Byte-order reversal for protocol bridging. Converts between little-endian and big-endian representations — used in bus bridges and network packet processors.
8-Bit Priority Encoder
Priority · Combinational
Encodes an 8-bit one-hot or priority input to a 3-bit binary output representing the highest-priority active bit. Foundation of interrupt controllers.
K-Map Optimised Logic (6 Variants)
K-Map · Minimization
Six combinational logic problems solved via Karnaugh-map minimisation. Demonstrates POS/SOP reduction and gate-level optimisation from truth tables.
3-Input LUT via Shift Register
LUT · FPGA Primitive
Implements a 3-input Look-Up Table using a shift register — the same principle used inside FPGA LUT primitives. Demonstrates how FPGAs map arbitrary logic.
🖥️ RISC-V CPU Design Advanced Project
RISC-V Processor Implementation
CPU · ISA · Pipeline · Advanced
Full RISC-V processor design in SystemVerilog. The top-level project in the LeetSilicon vault: it demonstrates integration of all fundamental concepts, including ALU, register file, instruction fetch/decode, memory interface, and control logic.
ISA
RISC-V (RV32I base)
Language
SystemVerilog
Variants
Base + Modified (RISCV_mod)
Complexity
Full pipeline CPU

Contact

Let's build
tomorrow's silicon
together.

Open to discussions on semiconductor design challenges, AI accelerator architecture, and collaboration in the EDA/VLSI space.

// Current Focus · 2025–26

At AMD Strategic Silicon, I work on 3nm AI SoC projects where every transistor matters. The world now runs on AI — and that AI runs on silicon. From RTL to tape-out, I ensure our chips deliver the performance, power, and area targets that keep AMD at the frontier of AI computing.

AI Accelerators · 3nm TSMC N3E · SoC Integration · Low Power UPF · Formal Verification · PPA Optimization