Service Reliability Engineer
Job Description
We have an exciting new opportunity for a Service Reliability Engineer to join our IT team, based in the A&O Shearman Belfast office.
In this role, you will oversee the maintenance of the firm's cloud platform and work closely with the wider Engineering and Architect teams. If you have a background in site reliability engineering and experience with cloud computing platforms, we would love to hear from you!
Apply today via the link below or reach out to me at janet.walsh@aoshearman.com for more information.
Please note this role is based in the A&O Shearman Belfast office, with an on-site presence required in line with our hybrid working policy.
Technology Services
The Technology Services department is responsible for providing world-class support and services to our business across the globe. The Infrastructure & Operations team is part of Technology Services and operates the IT services and infrastructure for the firm.
What you will do
- System Maintenance and Scalability: Ensure high availability and scalability of systems and services, collaborating with Engineering teams and Architects for optimal design and implementation. Deploy and maintain cloud services for business applications.
- Technology Evaluation: Recommend new technologies and tools to enhance the SRE toolkit.
- Performance Monitoring: Monitor and analyze performance of cloud platforms, systems, and applications, identifying and implementing improvements.
- Service Level Management: Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to set targets for response times and Mean Time to Resolve (MTTR).
- Monitoring and Alerting: Maintain real-time monitoring and alerting systems, analyzing metrics to identify trends and recommend improvements.
- Proactive Issue Resolution: Identify and address potential performance and scalability issues with development teams.
- Automation: Develop and maintain automated deployment and testing processes to ensure code changes do not degrade system performance or reliability.
- Collaboration: Work with development teams to design and implement systems that meet performance and reliability requirements, providing feedback and recommendations.
- Incident Response: Lead incident response, troubleshooting, and resolution, conducting postmortems and root cause analysis to prevent future issues.
- Documentation and Procedures: Maintain deployment and support documentation, develop Standard Operating Procedures (SOPs) for common tasks, and create effective response procedures for major incidents.
- Disaster Recovery: Test and improve disaster recovery strategies for critical business services, ensuring alignment with Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
What you will have
- Experience in a Site Reliability Engineering or related role
- Proven examples of reducing toil by eliminating labour-intensive, repetitive, low-value tasks.
- Strong experience with cloud computing platforms, such as AWS or Azure.
- Familiarity with programming and scripting languages, such as Python, .NET, Bash or PowerShell.
- Experience with ELK is advantageous, but experience with other monitoring and alerting tools, such as Prometheus, Grafana, or Azure Monitor and Azure Log Analytics, is acceptable.
- Experience with automation tools, such as Ansible, Azure Bicep or Terraform
- Experience with containerisation technologies, such as Docker or Kubernetes
- Knowledge of networking, storage, and database technologies
- Excellent troubleshooting and problem-solving skills
- Strong communication and collaboration skills
- Experience with agile development methodologies, such as Scrum or Kanban
- Solid understanding of security and governance best practices in cloud computing, including IAM, network security, and data protection.
What we can offer you
At A&O Shearman, we recognise that our people are our most valuable asset, which is reflected in the wide range of benefits that are available to our employees. Some of these benefits include: our occupational pension scheme, group income protection cover, private medical insurance, mental health resources and free apps, health and wellbeing services including GP service, emergency back-up care support, parental and special leave, holiday entitlement increasing with length of service, holiday trading, online discounts and lifestyle management services.
Should you require additional support at any stage of the recruitment process due to a disability or a health condition, please do not hesitate to contact a member of our recruitment team who will work with you to provide any adjustments as required.
We are an equal opportunities recruiter and do not discriminate on the basis of race, colour, sex, religion, sexual orientation, national origin, disability, or any other protected characteristic.
We recognise the value of flexible working and embrace hybrid working, allowing our people to work from home up to 40% of their working time. We do, however, remain committed to working together in person for the remaining 60% of the time so that we can learn, grow and succeed together.
If this role is not of interest, we may have another suitable opportunity for you here at A&O Shearman! Visit our careers portal at A&O Shearman Careers and submit your CV to our Talent Network to be notified when the perfect opportunity becomes available.