DADACONSULTANTS PTE. LTD. is seeking an engineer to develop distributed training infrastructure for large-scale AIGC model development. The ideal candidate will design and build distributed training toolchains, optimize across compute, communication, and storage layers, and diagnose training bottlenecks.
Candidates should have a Master's degree in Computer Science and more than 2 years of hands-on experience with distributed training paradigms such as tensor parallelism and pipeline parallelism, along with proficiency in PyTorch and CUDA programming.