Senior Site Reliability (R-19383)
Dun & Bradstreet · Dublin, Ireland
About The Role
Responsibilities
- Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).
- Participate in incident management, including detection, triage, mitigation, escalation, and recovery.
- Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.
- Design, implement, and operate observability solutions including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).
- Reduce operational toil through automation and engineering-led solutions, proactively introducing and driving SRE best practices.
- Support on-call rotations across multiple time zones, contributing to a sustainable 24/7 support model.
- Define, monitor, and report SLIs, SLOs, and error budgets for critical services.
- Drive and be accountable for best-in-class service availability through SRE principles, automation, and proactive reliability engineering.
Essential skills and/or Certifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Strong experience with cloud-native concepts and technologies, with a strong preference for Google Cloud Platform (GCP) and Kubernetes (GKE).
- Proven experience with Site Reliability Engineering and production incident management, ideally using platforms such as ServiceNow.
- Experience with monitoring and observability tools, including metrics, logs, traces, and synthetics (e.g., Splunk Observability, OpenTelemetry).
- Exposure to reliability testing, resilience engineering, or cost optimisation initiatives.
- Excellent analytical and problem-solving skills, with the ability to diagnose complex production issues quickly.
- Software development or automation experience using Python, shell scripts, or similar languages.
- Hands-on experience operating production cloud infrastructure at scale.
- Experience managing multi-region, high-availability production systems with a focus on scalability, resilience, and minimising service disruption during failures.
- Proficiency in the Microsoft Office suite.
- Show an ownership mindset in everything you do: be a problem solver, be curious and inspired to take action, be proactive, and seek ways to collaborate and connect with people and teams to drive success.
- Maintain a continuous growth mindset: keep learning through social experiences and relationships with stakeholders, experts, colleagues, and mentors, and broaden your competencies through structured courses and programs.
- Where applicable, fluency in English and languages relevant to the working market.
This listing was posted by a verified recruiter at Dun & Bradstreet.