System Reliability Engineer (Data Centre)
Centre for Strategic Infocomm Technologies | Singapore, Singapore | Posted June 26, 2026
Job Description
Responsibilities
Oversee and manage IT operations within the data centre, including day-to-day monitoring, incident management, and problem management
Lead the end-to-end incident management lifecycle, which encompasses immediate troubleshooting, root cause identification, and resolution implementation to restore services, followed by comprehensive post-incident analysis
Develop and maintain documentation on IT infrastructure, operations, and procedures within the data centre
Perform capacity planning to ensure IT infrastructure is scalable for future demands
Collaborate and coordinate with Data Centre Facilities teams on matters related to power, cooling, and physical infrastructure
Design and implement a robust observability platform alongside network monitoring tools for performance monitoring and real-time alerting of IT devices and networks
Implement and manage remote management tools for out...