EPAM Systems, Inc. | Remote, Mexico | Posted June 27, 2026
**Job Description**
We are looking for an experienced **Site Reliability Engineer (SRE)** to take a leadership role in ensuring the stability, scalability, and performance of our cloud infrastructure on **Google Cloud Platform (GCP)**. As an SRE, you will be at the forefront of optimizing system reliability, automating processes, and collaborating with engineering teams to enhance operational excellence. If you're passionate about **infrastructure-as-code, automation, and building resilient systems**, we’d love to hear from you.
**Responsibilities**
- Lead reliability initiatives to optimize system performance, scalability, and cost efficiency
- Manage and participate in on-call rotations, providing 24/7 support for critical infrastructure
- Troubleshoot incidents, conduct root cause analysis (RCA), and implement long-term solutions
- Deploy and manage microservices in alignment with release cycles
- Design and maintain infrastructure-as-code solutions using Terraform
- Collaborate wi...