Job Description
Description
The Software Development Engineer will lead the team in the technical strategy, design, build, and operation of infrastructure services, including the provisioning and availability of AWS Trainium-based AI servers. This role requires expertise in architecting large-scale systems and building microservices, as well as cross-functional collaboration with several other teams, such as the capacity management, hardware engineering, and datacenter teams, to manage AI/ML infrastructure.
Key job responsibilities
- Design and develop innovative technologies that power the infrastructure supporting AI workloads on Ultraservers.
- Lead technical projects establishing EC2 as the pioneer in cloud computing for AI/ML workloads across diverse applications including LLMs, multimodal systems, and emerging model architectures.
- Collaborate with various teams to influence the architecture of provisioning systems and improve them to operate efficiently at scale.
- Build customer relationships by in...