NVIDIA is looking for a talented Senior HPC and AI Networking Performance Research and Analysis Engineer to join our Performance group.
The ideal candidate will profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training, focusing on collective communication and networking.
You will work with many types of hardware and platforms, such as HCAs, switches, CPUs, GPUs, and systems.
You will experiment with and develop performance analysis tools and methodologies to dive deep into the details and understand performance expectations, limitations, and bottlenecks.
What you'll be doing:
Experimenting with and researching AI workloads and DL models tailored for large-scale deep learning LLM training on NVIDIA supercomputers, with a focus on high-performance networking.
Benchmarking, profiling, and analyzing performance to find ...