Job Description
We are seeking a Senior Technical Lead Site Reliability Engineer to own the reliability, scalability, performance, and operational integrity of critical production services. This role is accountable for the full service lifecycle, from design and deployment readiness through production operations, incident response, and continuous improvement. Reliability is a core engineering responsibility here: the role requires strong software engineering skills and the ability to operate autonomously across AWS, hybrid data centers, and customer-hosted environments.
Roles and Responsibilities
· Own production services end to end, with accountability for their reliability, availability, scalability, performance, and operational health.
· Define and manage SLIs and SLOs, using error budgets to guide delivery decisions.
· Influence service and system design to improve fault tolerance, observability, and operational sustainability.
· Debug complex production issues across application c...