Site Reliability Engineer (Monitoring & Incident Response Focus)
About this role
Salary: CHF 65’000 - 75’000 per year
Requirements:
• Must-have skills:
  ○ Strong experience with Linux system administration, including networking, firewalls, and certificate management.
  ○ Hands-on experience with monitoring/observability tools (Splunk, Grafana, Dynatrace).
  ○ Software development skills, ideally in Java (Spring Boot, Maven).
  ○ Scripting expertise (Bash, Ansible) for automation and diagnostics.
  ○ Ability to analyze and resolve complex incidents in enterprise environments.
  ○ An SRE mindset: a focus on reliability, prevention, resilience, and automation.
  ○ Fluency in both Italian and English.
• Nice-to-have skills:
  ○ Knowledge of authentication protocols (SAML, OAuth/OIDC).
  ○ Experience with Jenkins, Docker, Kafka, and ArgoCD.
  ○ Familiarity with distributed and secure system architectures.
  ○ Proficiency in German.
Responsibilities:
• Monitor and ensure the availability and reliability of critical Java-based applications.
• Participate in incident response and root cause analysis to reduce downtime and prevent recurrence.
• Manage and optimize observability platforms (Splunk, Grafana, Dynatrace).
• Automate monitoring and diagnostic tasks using scripting and automation tools (Bash, Ansible).
• Administer Linux environments, including network configuration, firewalls, and certificates.
• During weeks without monitoring duty, contribute to development or operational tasks (Java, Spring Boot, or automation) that improve applications and tooling.
• Act as a technical point of contact for internal and external teams (developers, business stakeholders, infrastructure).
• Propose and implement new monitoring and alerting solutions to improve overall system resilience.
Tags:
- Ansible
- ArgoCD
- Bash
- Docker
- Dynatrace
- Firewall
- Grafana
- Java
- Jenkins
- Kafka
- Linux
- MacBook
- Maven
- Network
- OAuth
- SAML
- Splunk
- Spring
- Spring Boot
About the role
WellD is looking for a Site Reliability Engineer to join a key enterprise project. In this role, you will be responsible for monitoring and ensuring the reliability of mission-critical applications, specifically the authentication and user management systems used by millions of B2C and B2B customers. As an SRE, you will be the technical point of contact for service stability: beyond monitoring system health, you will take ownership of incident response, troubleshooting, and root cause analysis, and contribute automation and improvements that increase resilience and reduce downtime. You will work in a complex enterprise environment with state-of-the-art observability tools and will have the opportunity to propose and implement new monitoring solutions.
What we offer
• Full-time permanent contract with a competitive Swiss salary.
• MacBook Pro and all the equipment you need to work comfortably.
• Swiss public transportation pass.
• Annual training budget for conferences, certifications, and courses.
• Access to our technical library (we’ll order the books you need).
• Hack Days, Tech Lunches, MeetUps, team-building activities – fully supported by WellD.
• A stimulating environment with enterprise-grade monitoring tools and room to grow your skills.
Note: This position is open to Swiss or EU/Schengen citizens with valid work/residency permits.
Last updated: week 41 of 2025