Job Description
Drive AI advancements as a Senior Engineer in observability at TR. In this hybrid role, you will design and operate observability tools for AI and LLM workloads.
As a key member of the Platform Engineering & Enterprise AI Services team, you will oversee the end-to-end observability stack for AI products. Your expertise will empower teams to enhance model quality, reduce latency, and control operational costs. Collaborating with cross-functional stakeholders, you will ensure robust performance monitoring and issue detection in production.
Key Responsibilities:
• Define Kubernetes deployment standards for AI services
• Own the AI observability platform with Braintrust and Langfuse
• Standardize telemetry across AI products for governance
• Build telemetry pipelines and performance dashboards
• Establish monitoring and incident response practices
Requirements:
• 5+ years in SRE or Observability Engineering
...