If you are motivated and believe in the credit union philosophy of "People Helping People," join our team!
Position Overview:
The Senior Site Reliability Engineer mentors more junior SREs and coaches them on process improvement opportunities. This role takes responsibility for the support and stability of many applications. This role develops, maintains, and builds on team standards. This role regularly collaborates with other SREs to review best practices in the development and maintenance of varied technology delivery pipelines. This role actively ensures that application monitoring methods are consistent and optimized.
Essential Responsibilities:
- 40% Maintain, monitor, troubleshoot, and optimize systems for reliability and efficient performance on a 24/7, 365-days-a-year model. Partner with development and other teams to improve services through rigorous testing and release procedures. Participate in system design consulting, platform management, and capacity planning. Balance feature development speed and reliability with well-defined service-level objectives. Support Dev, QA, UAT, Production, and DR environments for many team-supported applications. Follow ITSM processes for incident response, change management, and problem investigation.
- 20% Automate infrastructure and operations tasks, creating sustainable systems and services through automation and uplifts. Leverage data analytics to identify trends, predict and prevent issues, and promote data-driven decision-making. Continuously review processes and procedures for improvement opportunities.
- 10% Ensure systems adhere to relevant security protocols and regulations.
- 10% Ensure certificates are renewed and maintained within expiration windows.
- 10% Prepare disaster recovery plans.
- 10% Provide guidance and coaching to more junior team members.
Required Education & Experience (Knowledge, Skills, & Abilities):
- Strong working knowledge of Windows and Linux distributed environments.
- Proficiency in programming/scripting languages like Python, Go, or Shell.
- Strong knowledge of distributed storage technologies like NFS, HDFS, CephFS, and Amazon S3, as well as dynamic resource management frameworks like Apache Mesos, Kubernetes, or YARN.
- Understanding of system administration, cloud services (e.g., AWS, GCP), and infrastructure automation tools (e.g., Terraform, Ansible).
- Knowledge of networking, security, and database management.
- Proactive approach to triaging and troubleshooting problems, performance bottlenecks, and areas for improvement in a distributed environment.
- Must have professional experience with support and troubleshooting of VPNs, firewalls, networking, cloud infrastructure concepts, SSL, and mTLS.
- Strong knowledge and experience in troubleshooting platforms, applications, or infrastructure with high availability, disaster recovery, load balancing, and clustering concepts.
- Professional experience working within ITSM framework processes (Change and Incident).
- Experience with maintaining support documentation for platforms and/or supported applications.
- Must be passionate about contributing to an organization focused on continuously improving member experiences.
- Professional experience with Jira, Confluence, Lucidchart/Visio, and BMC Helix.
- 5 years of support, delivery, and/or continuous improvement experience.
Preferred Education & Experience (Knowledge, Skills, & Abilities):
- Experience with continuous integration and deployment (CI/CD) practices, monitoring, and incident response.
- Previous success in technical engineering.
- Coding experience beyond simple scripts.
- Bachelor’s degree, associate’s degree, or 5+ years of support, delivery, and/or continuous improvement experience.
Job Environment & Physical Requirements:
- Sitting for prolonged periods
- Using a computer for prolonged periods
- Using a telephone for prolonged periods
SECU provides equal employment opportunity to all qualified persons regardless of race, color, religion, age, sex, sexual orientation, gender identity, national origin, genetic information, disability, veteran status, or other classification protected by law.
Disclaimer:
State Employees' Credit Union reserves the right to fill this role at a higher/lower level based on business need.