
Service Design & Performance Specialist
Hybrid
Mid Level
A key role in Arm’s next‑generation SMO, this position owns service quality and reliability, embedding SRE principles and guiding features into production while eliminating operational risk early.
You will champion resilient, observable, and automated services, defined by meaningful performance and experience metrics. From day one, you will ensure strong support readiness and reliability standards, building stability, transparency, and trust across the ecosystem.
Responsibilities
- Define and govern Service Acceptance Criteria aligned to ITIL v4 and SRE principles.
- Lead risk-based readiness reviews for new services, features, and major releases.
- Govern service transition into production, ensuring operational viability, release integrity, rollback capability, and launch readiness.
- Embed SLOs, SLIs, SLAs, and experience-led XLAs into service design.
- Ensure services have measurable reliability targets, defined error budgets, and defined automation standards, and are supportable within the IT operating model.
- Partner with engineering and operational teams to embed SRE practices early in the lifecycle.
- Validate resilience patterns and address reliability gaps prior to production release.
- Embed observability, monitoring, alerting, and health models into service architecture.
- Ensure performance, resilience, automation, and AI-readiness are built into services by design.
- Maintain service health & performance dashboards and reliability reporting.
- Validate operational documentation, knowledge articles, runbooks, and support models.
- Proactively identify systemic, architectural, and design risks before they materialise as incidents.
Requirements
- Demonstrated ability in IT Service Design, Service Transition, or Reliability Engineering, with at least 5 years in a fast-paced engineering delivery environment.
- ITIL v4 certification, with Foundation as a minimum.
- Experience embedding SRE principles and reliability practices into the end-to-end service lifecycle.
- Experience defining and governing measurable service performance models, including SLIs, SLOs, SLAs, and error budgets, as well as integrating observability, monitoring, and performance telemetry into the build.
- Experience working within DevOps and CI/CD environments, including release and deployment governance.
- Data-driven mentality with the ability to use metrics to guide service quality and performance decisions.
Nice To Have
- Practical familiarity with, and implementation of, Site Reliability Engineering (SRE) operating models.
- Experience designing or supporting AI-enabled or automation-first service workflows.
- Certification in ITIL v4 modules beyond Foundation, ideally at Practice Manager level.
- Experience with ServiceNow.
- Knowledge of SaaS- and PaaS-based services and applications.
With Arm’s growth trajectory, you’ll have clear opportunities to develop your career, take on new challenges, and make a real impact on our continued success.
Skills
ITIL v4
SRE principles
Service design
Service transition
Reliability engineering
DevOps
CI/CD
Observability
Monitoring
Performance metrics
ServiceNow
SaaS
PaaS


