🌿 Back to all jobs

🥝 Site Reliability Engineer (HPC)

Microsoft Corporation | Multiple Locations, United States | Posted June 03, 2026

Job Description

**Overview**

As Microsoft continues to push the boundaries of AI, we are on the lookout for passionate individuals to work with us on the most interesting and challenging AI questions of our time. Our vision is bold and broad — to build systems that have true artificial intelligence across agents, applications, services, and infrastructure. It’s also inclusive: we aim to make AI accessible to all — consumers, businesses, developers — so that everyone can realize its benefits.

We’re looking for an experienced HPC **Site Reliability Engineer (SRE)** to join our High Performance Computing (HPC) infrastructure team. In this role, you’ll blend software engineering and systems engineering to keep our large-scale distributed AI infrastructure reliable and efficient. You’ll ensure that AI systems stay efficient and reliable with very high uptimes.

**Microsoft Superintelligence Team**

Microsoft Superintelligence team’s mission is to empower every person and...

Apply for This Position

Submit Application