[Remote] Director of Site Reliability Engineering

Remote-first Full-time Now hiring

Note: This is a remote position open to candidates in the USA. Talently is a cutting-edge organization in the Technology, Information and Media industry, and it is seeking a Director of Site Reliability Engineering. In this role, you will build and lead world-class Site Reliability Engineering practices, drive strategic reliability initiatives, and mentor engineering teams in a remote-first environment.

Responsibilities

  • Define and execute a comprehensive company-wide Site Reliability Engineering strategy, embedding reliability as a core discipline across engineering teams
  • Build, lead, and develop a high-performing SRE organization, including hiring, mentoring, and fostering a reliability-focused culture
  • Establish SLIs, SLOs, KPIs, and error budgets to measure and drive platform reliability and performance improvement
  • Guide architecture decisions and technical roadmaps for highly available, resilient, and scalable distributed systems
  • Drive adoption of observability, monitoring, logging, and incident response solutions across cloud-based microservices environments, primarily on Google Cloud Platform
  • Establish and oversee robust incident response frameworks, operational governance, and post-incident analysis processes
  • Promote and implement best practices for infrastructure automation, cloud-native operations, and cost optimization
  • Lead continuous improvement and innovation initiatives, including exploring AI-driven operations and new SRE methodologies

Skills

  • 12+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or DevOps in high-scale environments
  • 5+ years of proven technical leadership, building and scaling SRE teams and practices
  • Strong expertise with distributed systems, cloud-native infrastructure, and microservices, plus hands-on Google Cloud Platform experience (GKE, Compute Engine, Cloud Functions)
  • Deep proficiency with infrastructure as code, automation frameworks, and CI/CD deployment pipelines
  • Track record of designing large-scale observability and monitoring solutions using tools like Prometheus, Grafana, Datadog, or New Relic
  • Excellent communication, organizational development, and mentorship abilities
  • Strong programming ability in Python, Go, Java, or similar languages
  • Cloud or reliability certifications (e.g., Google Cloud Professional, SRE certifications)
  • Experience implementing AIOps, anomaly detection, predictive analytics, or automated remediation/self-healing infrastructure
  • Familiarity with AI/ML tools for operational intelligence and intelligent alerting
  • Strong database performance tuning and distributed data systems knowledge
  • Comfortable operating in fast-paced, high-growth technology environments
  • Bachelor's degree in Computer Science, Engineering, or related field

Company Overview

  • Talently provides nationwide recruitment services, executive search, and career alignment programs. Founded in 2022 and headquartered in Newport Beach, California, US, it has a workforce of 11-50 employees. Its website is https://www.talently.com/.