Software Development Manager, AWS Neuron SDK - Distributed Training

Amazon | Cupertino, United States | Posted June 29, 2026

Job Description

AWS Neuron is a software stack for the Annapurna Inferentia and Trainium machine
learning accelerators hosted inside AWS EC2 Trn1/Trn2 and Inf1 servers.

As the Principal Engineer for the Neuron Distributed Training team, you will work hands-on with a strong team of engineers to help design and optimize ML on Neuron devices. Specifically, you will focus on bringing up a coherent solution across the stack to increase training resiliency for ultra clusters with thousands of nodes. You will scale and optimize the application stack for LLMs that use multi-modal input and output generation, such as text, vision, video, and audio. You will be responsible for the full development life cycle of Distributed Training support for multi-modal transformer models such as MM-Llama3.2, DiT/Pixart, and CLIP. You will develop scalability features and performance optimizations in the Neuron ML Framework components to enable them...
