Solace

solace.com

1 Job

654 Employees

About the Company

Solace helps large enterprises become modern and real-time by giving them everything they need to make their business operations and customer interactions event-driven. With PubSub+, the market's first and only event management platform, the company provides a comprehensive way to create, document, discover and stream events from where they are produced to where they need to be consumed, securely, reliably, quickly, and with guaranteed delivery. Behind Solace technology is the world's leading group of data movement experts, with over 20 years of experience helping global enterprises solve some of the most demanding challenges in a variety of industries, from capital markets, retail, and gaming to space, aviation, and automotive. Established enterprises such as SAP, Barclays and the Royal Bank of Canada, multinational automobile manufacturers such as Groupe Renault and Groupe PSA, and industry disruptors such as Jio use Solace's advanced event broker technologies to modernize legacy applications, deploy modern microservices, and build an event mesh to support their hybrid cloud, multi-cloud and IoT architectures. Learn more at solace.com.

Listed Jobs

Company Name
Solace
Job Title
Senior Cloud Site Reliability Engineer
Job Description
**Job Title:** Senior Cloud Site Reliability Engineer

**Role Summary:** Lead the day‑to‑day operation of a multi‑cloud SaaS platform, ensuring reliability, performance, and SLA compliance across AWS, Azure, GCP, and Kubernetes environments. Drive automation, observability, and incident response processes while collaborating directly with customers to resolve high‑impact operational issues.

**Expectations:**
- Maintain continuous availability and performance of the cloud service.
- Deliver incident‑driven improvements and proactive capacity planning.
- Provide 24/7 on‑call support and manage escalations in a mission‑critical context.
- Communicate complex technical findings clearly to both technical and non‑technical stakeholders.

**Key Responsibilities:**
- Operate and monitor SaaS workloads on AWS, Azure, GCP and Kubernetes (EKS, AKS, GKE).
- Design and implement infrastructure tooling, observability dashboards, and automation pipelines.
- Perform root‑cause analysis, post‑mortem documentation, and corrective action for production incidents.
- Process service requests, provisioning, and customer escalations.
- Develop and maintain Terraform/CloudFormation scripts and configuration for cloud resources.
- Manage and fine‑tune monitoring solutions (Datadog, Kibana, Prometheus).
- Participate in on‑call duty and off‑hour support as part of a rotation.
- Collaborate with engineering teams to improve system resilience and deployment practices.

**Required Skills:**
- Expertise with AWS, Azure, and GCP services and features.
- Hands‑on Kubernetes operations across EKS, AKS, and GKE.
- Proficiency with Linux OS debugging and incident management.
- Practical experience with monitoring (Datadog, Kibana, Prometheus).
- Infrastructure automation using Terraform and CloudFormation.
- Scripting/Programming in Groovy, Python, or Go.
- Strong communication and customer‑facing support skills.

**Required Education & Certifications:**
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
- Certified Kubernetes Administrator (CKA).
- Certified Cloud Administrator for AWS, Azure, or GCP (e.g., AWS Certified Solutions Architect, Azure Administrator Associate, or GCP Associate Cloud Engineer).
Ottawa, Canada
Hybrid
Senior
31-10-2025