Site Reliability Engineer | $70/hr | Remote

Crossing Hurdles | Remote, South Africa | Posted June 06, 2026

Job Description

Responsibilities

  • Deploy, monitor, and recover containerized AI training environments.
  • Troubleshoot infrastructure bottlenecks and resolve system failures in real time.
  • Build and manage resilient systems optimized for stability and performance.
  • Collaborate with engineering teams to improve CI/CD pipelines and automation.
  • Manage filesystem structures, storage, and process scheduling in containerized environments.
  • Replan dynamically when runtime issues or system failures occur.
  • Document system processes, solutions, and best practices.

Requirements

  • Strong experience with terminal-based system administration and troubleshooting.
  • Expertise in containerized environments such as Docker or Kubernetes.
  • Strong Python skills for scripting, automation, and debugging.
  • Proficiency in Bash and familiarity with additional programming languages.