🥝 Site Reliability Engineer | $70/hr Remote
Crossing Hurdles | Remote, South Africa | Posted June 06, 2026
Job Description
Responsibilities
- Deploy, monitor, and recover containerized AI training environments.
- Troubleshoot infrastructure bottlenecks and resolve system failures in real time.
- Build and manage resilient systems, optimizing for stability and performance.
- Collaborate with engineering teams to improve CI/CD pipelines and automation.
- Manage filesystem structures, storage, and process scheduling in containerized environments.
- Replan dynamically when runtime issues or system failures occur.
- Document system processes, solutions, and best practices.
Requirements
- Strong experience with terminal-based system administration and troubleshooting.
- Expertise in containerized environments such as Docker or Kubernetes.
- Strong Python skills for scripting, automation, and debugging.
- Proficiency in Bash and familiarity with additional programming languages.