Title: AI Infra Engineer
Duration: 10+ Months
Location: Morrisville, NC, 27560
Short Description: This role combines IT operations, hardware troubleshooting, and AI infrastructure expertise. Expect to handle day-to-day system administration, diagnose and resolve issues, and ensure optimal performance for ML workloads.
Key Responsibilities
- Hardware Management and Troubleshooting: Monitor and maintain GPU servers/workstations, including diagnosing and resolving hardware failures (e.g., GPU faults, power issues, cooling problems). Coordinate repairs, replacements, or upgrades as needed to ensure system uptime.
- Software and Driver Management: Install, update, and configure CUDA drivers, Linux operating systems (e.g., Ubuntu or CentOS), and related dependencies. Ensure compatibility across hardware and sof...