As part of the Siemens DISW cloud operations organization, this position makes significant contributions to the delivery of DevOps solutions that support best-in-class, cloud-based microservice applications. Our team is looking for an engineer who is excited about the automation and integration of observability within Datadog. SREs find ways to promote the availability of services and applications, improve processes by eliminating manual and repetitive tasks, and solve complex technical problems in a fast-paced, collaborative, inclusive, and iterative environment.
The candidate will support the Siemens Xcelerator platform and will be responsible for identifying, managing, improving, and reporting on availability, resiliency, reliability, and stability efficiencies. This includes providing technical guidance and leadership to drive solutions and to create and enhance processes that deliver excellence. A strong relationship with the various product teams of the Xcelerator platform is necessary to support core objectives. This role's success will be defined by product teams within DISW business units meeting their SLAs.
• Lead the design, implementation, and automation of observability solutions within Datadog.
• Create and maintain standards for onboarding Xcelerator services to the Datadog platform.
• Collaborate with other technical platforms and partners to engineer automated and integrated solutions between tools, services, and teams that increase availability, reliability, and performance.
• Own the internal and external SLAs and ensure they meet or exceed expectations
• Be part of maintaining a 24x7, global, highly available SaaS environment
• Participate in an on-call rotation that supports our production infrastructure
• Troubleshoot production availability incidents that often span across multiple teams and services.
• Lead production incident post-mortems and contribute to solutions that prevent problem recurrence, with the goal of automated response to all non-exceptional service conditions
• Communicate with business and technical partners about incidents as they occur when they critically impact system performance or availability
Required Knowledge/Skills, Education, and Experience
• Bachelor’s degree with at least 2 years of IT experience, or equivalent experience.
• 4+ years of experience with monitoring tools, specifically Datadog
• 3+ years of experience with automation and integration of tools and services
• 2+ years of experience as a Site Reliability Engineer or in an equivalent role
• 2+ years of experience with containerization, specifically Kubernetes
• 2+ years of experience with Amazon Web Services (AWS) services
• 2+ years of experience with Terraform, CloudFormation, Ansible, or equivalent tools
Qualified applicants must be legally authorized for employment in the United States. Qualified applicants will not require employer-sponsored work authorization now or in the future for employment in the United States.
Preferred Knowledge/Skills, Education, and Experience
• Experience with **Siemens Teamcenter software**
• Desired certifications include Datadog, Kubernetes, and AWS or Azure
• 2+ years of experience with an issue/incident tracking tool (ServiceNow, ServiceDesk, Jira, or equivalent tools)
• 2+ years of experience with open source tools (Linux, Python, Git, Ansible)
• 2+ years of experience in an enterprise IT setting with distributed environments
• Networking concepts, including firewalls, VPN, routing, load balancers, security and DNS
• Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight
Organization: Digital Industries
Company: Siemens Industry Software (India) Private Limited
Experience Level: Experienced Professional
Full / Part time: Full-time