Site Reliability Engineering Manager
This job is brought to you by Jobs/Redefined, the UK's leading over-50s age inclusive jobs board.
Role: Site Reliability Engineering Manager (Website Infrastructure & Operations Manager)
Location: London / Hybrid - 2 days per week in the office
Reporting to: Head of Technology Operations
The Platform and Reliability Engineering Team are responsible for the technology platforms and services that underpin the Rightmove website, ensuring it is available, secure and performing to a world-class standard. We strive to deliver annual availability of at least 99.99% (less than 5 minutes of downtime a month).
The Site Reliability Engineering Manager's focus is to ensure their teams maintain our datacentre and cloud website infrastructure, safely migrate services to Google Cloud, and enable others to easily manage the reliability of production services across the Rightmove Website Estate.
A typical week as the Site Reliability Engineering Manager might involve:
• Ensuring the right people, process and tooling are in place to maintain a healthy, resilient, and secure datacentre and cloud website platform.
• Creating and managing technical plans for the migration of applications and infrastructure to Google Cloud.
• Developing cloud engineering and operations skills within your teams.
• Working through the supplier due diligence process for support contract renewals to ensure key components are kept in support.
• Working with engineering managers, product owners, and engineers to optimise and improve service health.
• Identifying, planning and implementing improvements to the incident management process.
• Reducing handoffs or improving flow/lead times within development teams by providing operational/infrastructure support for the platform.
We're looking for someone who:
• Has previous experience managing engineers who build and run website infrastructure and web services, and of running website technical operations.
• Is highly operationally aware, understanding what it takes to maintain a healthy website infrastructure and services.
• Is an experienced manager who understands how to get the best out of their people and teams.
• Has excellent judgement and can instil this in engineers, leading them to the best outcomes on technical decisions and architecture whilst enabling their development.
• Is happy to dive deep into technical discussions with their team and can surface risks and issues relating to projects.
• Is able to keep calm and work effectively in high-pressure situations.
• Has experience migrating infrastructure and web services from datacentres to cloud.
• Has deep experience and understanding of DevOps and SRE principles and practices.
• Always pushes for continuous improvement and has strong attention to detail.
Relevant technology we use:
• F5, Juniper, Arbor
• VMware, HP 3PAR
• Google Cloud Platform
• Google Kubernetes Engine with Anthos Service Mesh
• Confluent Cloud
• Incident.io
• GitLab
• Jira, Confluence, Slack, Teams
• Elastic APM, Kibana
• Eggplant Monitoring, Xymon
• Java, Node.js, Python, JavaScript, Go