Comcast

Cloud SRE Engineer - Kubernetes

PA - Philadelphia, 1800 Arch St
Full time
Make your mark at Comcast -- a Fortune 30 global media and technology company. From the connectivity and platforms we provide, to the content and experiences we create, we reach hundreds of millions of customers, viewers, and guests worldwide. Become part of our award-winning technology team that turns big ideas into cutting-edge products, platforms, and solutions that our customers love. We create space to innovate, and we recognize, reward, and invest in your ideas, while ensuring you can proudly bring your authentic self to the workplace. Join us. You’ll do the best work of your career right here at Comcast.

(In most cases, Comcast prefers to have employees on-site collaborating unless the team has been designated as virtual due to the nature of their work. If a position is listed with both office locations and virtual offerings, Comcast may be willing to consider candidates who live greater than 100 miles from the office for the remote option.)

Job Summary

We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our Containers Platform Team. As an SRE on the team, you will play a critical role in ensuring the reliability, scalability, and performance of our containerized infrastructure, enabling our tenants to deploy and manage their applications with confidence. You will work closely with cross-functional teams to design, implement, and maintain complex systems that power our cloud platform.

In addition, the candidate must bring strong technical skills, effective communication, problem-solving abilities, and the ability to work well on a fast-paced team. The candidate is also expected to stay current with the latest trends and advancements in container technology and cloud ecosystems to keep the infrastructure reliable and efficient.

Job Description

Responsibilities:

  • You will be responsible for the deployment of containerized applications across a cluster of bare-metal servers, facilitating automatic scaling of containerized applications based on demand, and managing utilization of compute resources such as CPU, memory, and storage across the cluster.

  • Implementing monitoring solutions (e.g., Prometheus, Grafana) to track the health and performance of bare-metal clusters and infrastructure components. You should be able to set up alerting mechanisms to detect and respond to issues proactively, and to recover from failures by restarting failed containers or reallocating workloads to healthy nodes.

  • Working closely with development and engineering teams to establish CI/CD pipelines for automating the deployment and rollout of Kubernetes services, and supporting seamless rolling updates that allow new versions to be deployed gradually while maintaining application availability.

  • Identifying performance bottlenecks in containerized environments and optimizing resource utilization through capacity planning, auto-scaling, and performance tuning.

  • Documenting processes, procedures, and best practices related to the platform operations and sharing knowledge with team members.

Qualifications

  • Bachelor’s degree in computer science or a related field, or equivalent experience (typically 2 years in a Site Reliability Engineering, DevOps, or Systems Engineering role).

  • Must be familiar with container technologies such as Kubernetes, containerd, and/or nerdctl. This includes the ability to deploy, manage, and scale containerized applications effectively.

  • Intermediate experience implementing continuous integration and continuous delivery (CI/CD) tools and systems.

  • Proficiency in scripting languages such as shell (Bash), and familiarity with YAML/JSON.

  • Experience with automation scripting using tools such as Ansible playbooks or similar imaging solutions.

  • General understanding of networking fundamentals, including TCP, UDP, DNS, IPv4/IPv6 networking, and load balancing. An understanding of traffic scaling is also important.

  • Excellent analytical and problem-solving skills with the ability to effectively communicate complex technical information.

  • Strong written communication skills are essential, as well as the ability to create clear and informative documentation.

  • Ability to work effectively across internal and external organizations.

  • Flexibility to work off-hours for on-call duties. SREs are often responsible for maintaining the reliability and availability of systems outside of standard working hours, so the ability to respond to incidents and perform maintenance tasks as needed is required.

This position is ineligible for visa sponsorship. To be considered for this role, you must be legally authorized to work in the United States and not require sponsorship for employment now or in the future.

Employees at all levels are expected to:

  • Understand our Operating Principles; make them the guidelines for how you do your job.
  • Own the customer experience - think and act in ways that put our customers first, give them seamless digital options at every touchpoint, and make them promoters of our products and services.
  • Know your stuff - be enthusiastic learners, users and advocates of our game-changing technology, products and services, especially our digital tools and experiences.
  • Win as a team - make big things happen by working together and being open to new ideas.
  • Be an active part of the Net Promoter System - a way of working that brings more employee and customer feedback into the company - by joining huddles, making call backs and helping us elevate opportunities to do better for our customers.
  • Drive results and growth.
  • Support a culture of inclusion in how you work and lead.
  • Do what's right for each other, our customers, investors and our communities.

Disclaimer:

  • This information has been designed to indicate the general nature and level of work performed by employees in this role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications.

Skills

AWS Elastic Kubernetes Service (EKS), Communication, DevOps, Kubernetes, Python (Programming Language)

We believe that benefits should connect you to the support you need when it matters most, and should help you care for those who matter most. That's why we provide an array of options, expert guidance and always-on tools that are personalized to meet the needs of your reality—to help support you physically, financially and emotionally through the big milestones and in your everyday life.


Please visit the benefits summary on our careers site for more details.

Education

Bachelor's Degree

While possessing the stated degree is preferred, Comcast also may consider applicants who hold some combination of coursework and experience, or who have extensive related professional experience.

Relevant Work Experience

2-5 Years

Comcast is an equal opportunity workplace. We will consider all qualified applicants for employment without regard to race, color, religion, age, sex, sexual orientation, gender identity, national origin, disability, veteran status, genetic information, or any other basis protected by applicable law.