Job Description:
As the LPC/VPC deployment and system reliability engineer, the candidate will be responsible for 24/7 operations support and will be involved in the deployment of MindSphere LPC/VPC, including:
- Act as the first point of contact for MindSphere LPC/VPC alerts and respond in a timely manner
- Use our monitoring platform to troubleshoot failures, perform a preliminary analysis to identify the probable cause, and track and document the observations, root cause analysis, handling process, final solutions and lessons learned
- Work with the LPC/VPC deployment manager to ensure on-time product deployment in customer environments and keep track of ongoing maintenance and operations tasks
- Contribute to the regular Rancher Box maintenance and releases, including MindSphere services and backing services
To be a good fit for this position, the candidate should have the following knowledge and skills:
- Have solid knowledge of common DevOps concepts and practices, including CI/CD, Kubernetes, Docker, Helm and Git/GitLab
- Be familiar with industry-standard monitoring and alerting tools such as Prometheus, Grafana, ELK and PagerDuty
- Be proficient with Linux; scripting skills are preferred
- Be experienced in the 555 workflow and able to handle and resolve system-level or service-level alerts proficiently
- Have basic knowledge of common open source software such as PostgreSQL, Redis, OpenSearch and Kafka
- Working experience with AliCloud or an equivalent cloud provider is preferred
- Be able to work in an English-speaking environment
Organization: Digital Industries
Company: Siemens Ltd., China, Shanghai No. 3 Branch
Experience Level: Mid-level Professional
Job Type: Full-time