
AI Scale-up Switch System Design Engineer

AMD · New Jersey, United States

Other Engineering · Full-time · Posted about 7 hours ago

About The Role

WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE

We are looking for a hands-on, technically sharp system design engineer to join our growing team and lead the bring-up of cutting-edge scale-up switches at the heart of next-generation AI rack infrastructure. As a key contributor, you will bring deep expertise in high-speed Ethernet, server management, and platform validation to drive switch platforms from initial power-on through full system qualification. In this role, you will take full ownership of bring-up execution, apply your debugging skills to solve complex multi-layer problems, and collaborate closely with hardware, firmware, and software teams to deliver production-ready systems.

THE PERSON

You're a highly motivated team player with a strong development background, a problem-solving mentality, excellent communication skills, the ability to prioritize tasks, and a willingness to learn and adapt. You have excellent teamwork skills and are also capable of working independently.

KEY RESPONSIBILITIES

Lead the system bring-up and validation of state-of-the-art AI scale-up switches purpose-built for high-density GPU compute racks, from initial power-on through full system qualification

Perform high-speed SerDes and link bring-up, including configuring and validating Auto-Negotiation/Link Training (AN/LT), tuning TX equalization, and characterizing signal integrity across 200G/400G/800G interfaces

Execute comprehensive link qualification testing using PRBS (Pseudo-Random Binary Sequence), Snake Traffic loopback testing, and FEC (Forward Error Correction) analysis to validate bit error rate (BER) performance at scale

Utilize LinkCAT and Broadcom SDK tools to characterize port performance, diagnose link failures, and validate PHY configurations across large port counts

Integrate and validate server management infrastructure including BMC/IPMI, Redfish API, and out-of-band management workflows for automated bring-up and health monitoring

Develop and maintain bring-up scripts and test automation (Python) to accelerate validation coverage across chassis configurations

Debug complex system-level failures spanning hardware, firmware, and software, including signal integrity issues, firmware crashes, and management-plane anomalies, and drive issues to root cause

Collaborate with hardware, firmware, and software teams to reproduce failures, document findings, and verify fixes across platform revisions

Maintain detailed bring-up documentation, test reports, and issue tracking throughout the product development lifecycle

PREFERRED EXPERIENCE

  • Extensive hands-on experience in hardware bring-up, platform validation, or high-speed networking silicon characterization
  • Experience with high-speed switch ASICs (Broadcom TH6/Tomahawk series preferred) and familiarity with Broadcom's SDK/DAPI frameworks
  • Deep understanding of high-speed Ethernet standards (400GbE, 800GbE) including AN/LT (IEEE 802.3), RS-FEC / KP4-FEC, and PAM4 SerDes technology
  • Hands-on experience with PRBS testing, BER measurement, eye diagram analysis, and Snake/loopback traffic validation methodologies
  • Familiarity with LinkCAT or equivalent PHY/link characterization tools
  • Experience with server management protocols: IPMI, Redfish/OpenBMC, KCS, IPMB, and PLDM for out-of-band control and telemetry
  • Proficiency in Python for test automation, log parsing, and data analysis
  • Strong debugging skills — comfortable working across hardware (oscilloscope, protocol analyzer), firmware logs, and software traces to isolate root cause
  • Experience reading schematics and PCB layout to correlate signal integrity observations with hardware design
  • Excellent communication skills with the ability to document findings clearly and collaborate across multidisciplinary teams
  • Experience with high-density switch/router platforms or AI/ML fabric infrastructure is a strong plus

ACADEMIC CREDENTIALS

Bachelor’s/Master’s degree in Computer Science or a related field strongly preferred

This role is not eligible for visa sponsorship.


Benefits offered are described in AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD’s “Responsible AI Policy” is available here.

This posting is for an existing vacancy.

This listing was posted by a verified recruiter at AMD.