Job Description:
We are a fast-growing AI infrastructure startup based in Silicon Valley, and we’re looking for an Infra Support Engineer to join our global team.
This is a remote, work-from-home role, but you must be comfortable with 24/7 rotational shifts and on-call duties.
You will provide L1/L2 technical support for NVIDIA GPU clusters and handle system delivery, monitoring, incident triage, and escalation to SRE teams.
Responsibilities:
- Support AI infrastructure (GPU/CPU nodes, networking, storage, orchestration) via tickets, email, and Slack
- Assist with GPU cluster delivery: provisioning, imaging, network validation, BIOS/firmware updates, and GPU driver installation
- Monitor system health (alerts, dashboards) and respond 24/7 as scheduled
- Triage incidents, follow runbo...