What the Role Entails
You will support the reliability, scalability, and security of Tencent’s business‑critical systems in a cloud‑native environment.

System Monitoring & Incident Response
- Monitor production systems using tools such as Prometheus and Grafana; identify and troubleshoot outages.
- Participate in on‑call rotations to resolve real‑time incidents (with mentor guidance).

Automation & DevOps Practices
- Develop scripts (Python/Shell) to automate deployment, scaling, and recovery tasks.
- Help optimize CI/CD pipelines using GitLab, Docker, and Kubernetes.

Infrastructure Optimization
- Analyze system performance metrics; propose solutions to enhance reliability and cost efficiency.
- Support cloud infrastructure management (Tencent Cloud/AWS/Azure).

Collaboration & Documentation
- Work with cross‑functional teams (Dev, Data, Security) to design SLOs/SLIs f...