dadaconsultants pte. ltd. | Singapore, Singapore | Posted June 05, 2026
Job Description
About our client
A technology group establishing a new AI Centre of Excellence in Singapore is looking for an engineer to own the distributed training infrastructure for large-scale AIGC model development.

What you'll work on
- Design and build distributed training toolchains that support ultra-large-scale model training
- Optimise across the compute, communication, and storage layers
- Diagnose and resolve training bottlenecks to improve stability and throughput
- Track frontier distributed training techniques and apply them end-to-end

What we're looking for
- Master's degree or above in CS or a related field
- 2+ years of relevant experience
- Deep hands-on experience with distributed training paradigms: Data / Pipeline / Tensor / Expert Parallelism
- Proficient in PyTorch, DeepSpeed, and Megatron-LM
- Familiar with GPU architecture and CUDA programming; experience with CUDA kernel development and NCCL/cuDNN
- Understanding of AIGC pre-training, Transformer architectures, and Diffusion models (Stable Diffusion, Flux)
About Us Dad...