Senior Site Reliability (R-19383)

Dun & Bradstreet · Dublin, Ireland

Software Development · Senior Level · Contract · 1 day ago

About The Role

Responsibilities

  • Own and improve the reliability, availability, and performance of production services in Google Cloud (GCP).
  • Participate in incident management, including detection, triage, mitigation, escalation, and recovery.
  • Use and improve incident workflows and tooling (e.g., ServiceNow) to ensure clear ownership and timely communication.
  • Design, implement, and operate observability solutions including monitoring, logging, tracing, synthetics, and dashboards (e.g., Splunk Observability, OpenTelemetry).
  • Reduce operational toil through automation and engineering-led solutions, proactively introducing and driving SRE best practices.
  • Support on-call rotations across multiple time zones, contributing to a sustainable 24/7 support model.
  • Define, monitor, and report SLIs, SLOs, and error budgets for critical services.
  • Drive and be accountable for best-in-class service availability through SRE principles, automation, and proactive reliability engineering.

Essential Skills and/or Certifications

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Strong experience with cloud-native concepts and technologies, with a strong preference for Google Cloud Platform (GCP) and Kubernetes (GKE).
  • Proven experience with Site Reliability Engineering and production incident management, ideally using platforms such as ServiceNow.
  • Experience with monitoring and observability tools, including metrics, logs, traces, and synthetics (e.g., Splunk Observability, OpenTelemetry).
  • Exposure to reliability testing, resilience engineering, or cost optimisation initiatives.
  • Excellent analytical and problem-solving skills, with the ability to diagnose complex production issues quickly.
  • Software development or automation experience using Python, shell scripts, or similar languages.
  • Hands-on experience operating production cloud infrastructure at scale.
  • Experience managing multi-region, high-availability production systems with a focus on scalability, resilience, and minimising service disruption during failures.
  • Proficiency in Microsoft Office Suite.
  • Show an ownership mindset in everything you do: be a problem solver, be curious and proactive, be inspired to take action, and seek ways to collaborate and connect with people and teams to drive success.
  • Maintain a continuous growth mindset: keep learning through social experiences and relationships with stakeholders, experts, colleagues, and mentors, and broaden your competencies through structured courses and programs.
  • Where applicable, fluency in English and languages relevant to the working market.

This listing was posted by a verified recruiter at Dun & Bradstreet.