Job Description

What Is the Opportunity

We are seeking a highly skilled and motivated Technical Engineer with expertise in Site Reliability Engineering (SRE) practices, cloud technologies, and SaaS application deployment. The ideal candidate will have strong development skills, a deep understanding of cloud infrastructure, and hands-on experience setting up and managing SaaS applications. This role requires a proactive individual who can ensure system reliability, scalability, and performance while driving automation and innovation. The candidate should also have experience applying SRE principles, including incident management, observability, and building reliable systems that meet business and user needs.

What Will You Do?

Work closely with Quality Engineering, DevOps, Development, IT, and Cloud teams to align SRE practices with organizational goals.
System Reliability and Performance:
- Design, implement, and maintain reliable and scalable systems to ensure high availability and performance.
- Monitor system health, identify bottlenecks, and proactively resolve issues to minimize downtime.
- Develop and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Apply SRE principles to improve system reliability and reduce operational toil.
Cloud Infrastructure Management:
- Architect, deploy, and manage cloud-based infrastructure (e.g., AWS, Azure, GCP).
- Optimize cloud resources for cost efficiency and performance.
- Implement Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or Pulumi.
SaaS Application Deployment:
- Set up and configure new SaaS applications, ensuring seamless integration with existing systems.
- Automate deployment pipelines using CI/CD tools (e.g., Jenkins, GitHub Actions, GitLab CI/CD).
- Collaborate with cross-functional teams to ensure smooth onboarding of SaaS solutions.
Development and Automation:
- Write clean, efficient, and maintainable code in languages such as Python, Go, Java, or Ruby.
- Develop automation scripts for repetitive tasks, monitoring, and incident response.
- Build and maintain tools to improve developer productivity and system reliability.
Incident Management and Troubleshooting:
- Lead incident response efforts, including root cause analysis and post-mortem reviews.
- Implement robust monitoring and alerting systems using tools like Prometheus, Grafana, or Datadog.
- Ensure effective communication and resolution during critical incidents.
- Establish and refine incident management processes to minimize Mean Time to Recovery (MTTR).
Observability and Monitoring:
- Design and implement observability solutions to provide deep insights into system performance and behavior.
- Utilize tools like Prometheus, Grafana, Datadog, or New Relic to monitor system health and detect anomalies.
- Develop dashboards and alerts to ensure proactive issue detection and resolution.
Security and Compliance:
- Implement security best practices for cloud and SaaS environments.
- Ensure compliance with industry standards and regulations (e.g., GDPR, SOC 2, ISO 27001).
- Conduct regular security audits and vulnerability assessments.
Collaboration and Knowledge Sharing:
- Work closely with development, operations, and product teams to align technical solutions with business goals.
- Document processes, workflows, and best practices to foster knowledge sharing within the team.
- Mentor junior team members and contribute to a culture of continuous learning.

What Do You Need to Succeed?

Must have:

Technical Skills:
- Proficiency in programming languages such as Python, Go, Java, or Ruby.
- Strong understanding of cloud platforms (AWS, Azure, GCP) and their services.
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Hands-on experience with CI/CD pipelines and tools (e.g., Jenkins, GitHub Actions, GitLab CI/CD).
- Knowledge of Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, Pulumi).
SRE Expertise:
- Proven experience in applying SRE principles to improve system reliability and scalability.
- Experience in incident management, root cause analysis, and post-mortem processes.
SaaS Expertise:
- Proven experience in deploying and managing SaaS applications.
- Familiarity with SaaS integration and API management.
Monitoring and Observability:
- Experience with monitoring tools (e.g., Dynatrace).
Automation and Scripting:
- Strong scripting skills in Bash, Python, or similar languages.
- Experience in automating repetitive tasks and workflows.
Soft Skills:
- Excellent problem-solving and troubleshooting abilities.
- Strong communication and collaboration skills.

Nice-to-have:

- Salesforce DevOps: Familiarity with Salesforce and Flosum for managing source-driven development and CI/CD workflows.
- Bachelor’s degree in Computer Science, Engineering, or a field relevant to the role.
- Strategic thinker with excellent interpersonal skills to work across functions and businesses.

What’s in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
- Leaders who support your development through coaching and managing opportunities
- Ability to make a difference and lasting impact
- Work in a dynamic, collaborative, progressive, and high-performing team
- A world-class training program in financial services
- Flexible work/life balance options
- Opportunities to do challenging work

Job Skills

Agile Methodology, Application Infrastructure, Atlassian JIRA, Automation, Cloud Platform, Cloud Technology, DevOps, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, PagerDuty, Production Support, Site Reliability Engineering, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software, Teamwork

Additional Job Details

Address: RBC WATERPARK PLACE, 88 QUEENS QUAY W, TORONTO
City: Toronto
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: TECHNOLOGY AND OPERATIONS
Job Type: Regular
Pay Type: Salaried
Posted Date: 2026-03-04
Application Deadline: 2026-03-27

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.

Inclusion and Equal Opportunity Employment

At RBC, we believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.

RBC is presently inviting candidates to apply for this existing vacancy. Applying to this posting allows you to express your interest in this current career opportunity at RBC. Qualified applicants may be contacted to review their resume in more detail.

Lead Site Reliability Engineer