Site Reliability Engineer, K8s

Remote-first Full-time Now hiring

WebMD and its affiliates are an Equal Opportunity/Affirmative Action employer and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.

Position Overview

Our BI team runs a set of GCP-based APIs and data services that many internal products depend on. As we've grown, keeping things running has increasingly become a side responsibility for engineers who are primarily building features, and that's not sustainable. We're looking for an SRE to own that space: service health, incident response, infrastructure monitoring, and making sure we're not blindly burning cloud budget.

The Site Reliability Engineer will ensure the availability, performance, and security of the Business Intelligence team's GCP-hosted APIs and data infrastructure. This role is responsible for proactive monitoring, incident response, and continuous improvement of platform reliability across a cloud-native stack. The engineer will work closely with backend and data engineers to maintain service health and drive operational excellence. This position also carries responsibility for GCP cost visibility, helping the team track and optimize cloud spend through structured monitoring and alerting.

Responsibilities

- Monitor and maintain uptime of GCP-hosted APIs and services, keeping performance within agreed targets
- Lead incident response for BI platform services: triage, resolve, and follow up with post-mortems that actually prevent recurrence
- Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services
- Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems
- Review and fix security gaps: IAP configs, service account permissions, API access controls
- Work with data and backend engineers to shore up reliability of data pipelines and BigQuery workflows
- Contribute to infrastructure-as-code and help keep deployments documented and reproducible

Qualifications

- 2+ years in a Site Reliability, DevOps, or Cloud Infrastructure role in a production environment
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent hands-on experience
- Practical experience with GCP, in particular Cloud Run, API Gateway, and BigQuery
- Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar)
- Solid grasp of cloud security fundamentals: IAM, network controls, access management
- Proficiency with Git and version control in a team setting

Preferred Skills

- CI/CD pipelines and deployment automation (GitHub Actions, Cloud Build, or similar)
- Terraform or other infrastructure-as-code tools
- Python for scripting or automation
- MySQL, Spanner, or BigQuery at any meaningful depth
- GCP cost management and spend optimization
- Experience with dbt or Looker
- Comfortable working across CET/EST hours in a distributed team
