Lead Site Reliability Engineer
EPAM Systems, Inc. | Remote (work from home), Mexico | Posted June 05, 2026
Job Description
Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems.
**Responsibilities**
- Resolve complex incidents to ensure system availability
- Maintain reliability and performance of Azure-based enterprise infrastructure
- Deploy observability, monitoring, and logging tools
- Automate infrastructure management with Terraform and scripting technologies
- Improve system performance and uptime through centralized monitoring
- Collaborate with multiple teams to enhance service reliability
- Perform root cause analysis and oversee postmortems for incidents
- Configure deployment pipelines in Azure DevOps for secure workflows
- Write and maintain automation scripts for incident recovery and recurring tasks
- Enhance monitoring frameworks with platforms like Prometheus and Grafana
- Respond promptly to incidents to meet SLA expectations
- Facilitate integration of monitoring data from Azur...