[Remote] Manager, Site Reliability Engineering - Fleet Management
Note: This is a remote position open to candidates in the USA. MongoDB is a leading database company that empowers customers to innovate at market speed. The Manager of Site Reliability Engineering for Fleet Management will oversee a team dedicated to managing Kubernetes infrastructure and ensuring operational reliability, while also contributing to strategic technical roadmaps.
Responsibilities
- Manage a team of 6-8 engineers, fostering a positive culture, handling career growth and performance conversations, and proactively removing blockers
- Help develop a clear technical vision and comprehensive roadmap for our runtime environment, balancing long-term strategic infrastructure goals with immediate engineering needs
- Contribute through light hands-on technical work, such as leading architectural design reviews, reviewing PRs, and stepping in to guide the team through complex operational challenges
- Act as the primary liaison for the Fleet Management team, collaborating closely with other engineering leaders to ensure platform alignment and manage stakeholder expectations
Skills
- 10+ years of experience working on software and operating distributed systems, with 2+ years managing engineering teams
- Possess a customer-focused mindset, treating internal developers as your primary users
- Value efficiency in processes and operations, and have a track record of optimizing team workflows
- Prefer automation over manual processes ('allergic to ops work'), fostering a culture of building software solutions to eliminate toil
- Have deep technical familiarity with Kubernetes ecosystems, containerization technologies, and modern IaC tooling (e.g., Terraform, Crossplane, or Operators) so you can effectively guide the team's technical decisions
- Excel at translating complex business and engineering requirements into actionable, phased technical roadmaps
- Have a high level of empathy, responsibility, ownership, and accountability
- Have excellent verbal and written technical communication skills
- Experience leading major architectural shifts, such as migrating teams from traditional IaC to Operator-driven lifecycle management
- Experience managing and scaling infrastructure across multi-cloud environments (AWS, GCP, or Azure)
- Experience designing secure, multi-tenant runtime environments at scale
Benefits
- Equity
- Participation in the employee stock purchase program
- Flexible paid time off
- 20 weeks fully-paid gender-neutral parental leave
- Fertility and adoption assistance
- 401(k) plan
- Mental health counseling
- Access to transgender-inclusive health insurance coverage
- Health benefits offerings
Company Overview
Company H1B Sponsorship