- Erlangen - Bavaria - Germany
Site Reliability Engineering Lead (f/m/d)
We are looking for a proactive and experienced Site Reliability Engineering Lead (f/m/d) with proven experience in SRE transformations to drive cloud operations and reliability engineering practices within our cloud ecosystem. This role is essential to ensuring the reliability, availability, security, and performance of our production environments. You will define and drive the CloudOps strategy, manage Level 3 operational support (including incident triage, root cause analysis, and resolution), ensure our products meet all their Service Level Agreements, and establish and monitor key reliability metrics.
What we offer you
- An attractive remuneration package
- Access to Siemens share plans
- 30 days of paid vacation and a variety of flexible work schedules that allow time off for you and your family
- 2 to 3 days of mobile working per week as a global standard
- Flexible training opportunities for both your professional and personal development that you can tailor to your interests
Because each of our more than 300,000 team members values different benefits, and we cannot list our entire benefits portfolio here, you can find more information here.
The individual benefits are subject to regulatory, contractual, or corporate conditions.
How you’ll make an impact
- Collaborate with all stakeholders to define the CloudOps and Site Reliability Engineering strategy and execution for cloud-hosted products.
- Manage Level 3 operational support, including incident triage, root cause analysis and resolution in close collaboration with R&D and service delivery teams (Bx tech support and cloud operation level 01 and 02).
- Ensure the definition and tracking of Service Level Indicators (SLIs), Service Level Objectives (SLOs), Operational Level Agreements (OLAs), and Service Level Agreements (SLAs) to measure and improve system reliability and availability.
- Ensure KPIs such as uptime, latency, error rates, and incident resolution time are established and monitored.
- Operate and optimize cloud infrastructure with a focus on availability, performance, and cost-efficiency.
- Harmonize and enhance observability and alerting systems across the different products (e.g., CloudWatch, Datadog, Prometheus, Grafana).
- Lead Level 3 incident response and on-call coordination using PagerDuty, ensuring rapid mitigation and root cause analysis, and maintain clear processes and escalation paths with L1 and L2.
- Collaborate with development and platform teams to ensure smooth deployment and operations.
- Maintain operational documentation and runbooks.
- Drive automation and continuous improvement in deployment, monitoring, and recovery processes.
Your defining qualities
- Education
- Master's degree in Computer Science, Engineering, or a related field.
- Experience & Skills
- Long-term experience in CloudOps and Site Reliability Engineering.
- Long-term experience in managing operations for a SaaS product.
- Deep understanding of AWS services and cloud-native architectures.
- Profound experience with incident management and escalation processes (PagerDuty or similar).
- Proven experience with monitoring, logging, and alerting tools.
- Solid understanding of networking, security, and system administration in cloud environments.
- Experience with ITIL-based support processes and ticketing systems.
- Strong analytical and problem-solving skills.
- Ways of working
- On-call availability on weekends.
- Languages
- Excellent communication skills in English.
You are much more than your qualifications, and we believe in the potential of every single candidate. We look forward to getting to know you!
Your individual personality and perspective are important to us. We create a working environment that reflects the diversity of society and supports you in your personal and professional development. Bring your authentic personality and create a better future together with us. As an equal-opportunity employer, we are happy to consider applications from individuals with disabilities.
About us
Ready to dive into the future?
At Siemens Smart Infrastructure Buildings, we focus on creating innovative solutions to make buildings smarter, safer, and more sustainable – in other words, simply better. Interested in being part of this journey? Join us and seize the opportunity to shape the future with us.
You can find more information about the department and its products here: https://www.siemens.com/global/en/products/buildings/smart-buildings.html
www.siemens.de/careers – if you would like to find out more about jobs & careers at Siemens.
FAQ – if you need further information on the application process.