Job Summary
Synechron is seeking an experienced Reliability Engineer to strengthen our cloud-native systems’ stability, scalability, and performance. In this role, you will own AWS infrastructure, containerized workloads, CI/CD pipelines, and observability frameworks. Your efforts will ensure high availability, operational efficiency, and security in a complex enterprise environment. Collaborating with DevOps, Development, QA, and Security teams, you will drive automation, incident management, and optimization to support the organization’s digital and operational goals.

Software Requirements

Required: AWS (EC2, ECS/EKS, VPC, IAM, ALB/NLB, Route 53, S3, CloudWatch), Docker, Terraform or CloudFormation, Python/Bash scripting, monitoring tools (CloudWatch, Prometheus, Grafana, ELK/OpenSearch, X-Ray)
Preferred: Kubernetes (EKS or other), Jenkins or Azure DevOps, Blue/Green and Canary deployment tools, Secrets management tools, vulnerability scanning tools
Experience level: Proven hands-on expertise owning cloud infrastructure and automation in production environments

Overall Responsibilities

Define and monitor SLOs, SLIs, error budgets, and reliability strategies that align with organizational targets
Manage and optimize AWS infrastructure including compute, storage, network, and security components
Containerize, orchestrate, and operate services using Docker and Kubernetes or ECS/EKS
Develop and improve CI/CD pipelines to enable continuous delivery with blue/green and canary deployment practices
Implement and maintain observability solutions—metrics, logs, traces—using tools like CloudWatch, Prometheus, Grafana, and ELK stack
Automate operational tasks through Infrastructure as Code (Terraform, CloudFormation) and scripting
Establish incident management practices, including on-call support, runbooks, root cause analysis, and postmortems
Drive system performance tuning, capacity planning, and cost management initiatives
Enforce security best practices such as IAM least privilege, secrets management, SSL/TLS, and vulnerability assessments

Technical Skills (By Category)

Programming Languages:
- Required: Python, Bash scripting
- Preferred: Go, Ruby, or other scripting languages for automation
Databases/Data Management:
- Familiarity with AWS RDS, Redis/ElastiCache, and storage services (S3, EFS)
Cloud Technologies:
- Required: AWS core services, infrastructure deployment, and management
Frameworks and Libraries:
- Experience with cloud-native automation frameworks, monitoring, and logging tools
Development Tools & Methodologies:
- CI/CD pipelines (GitHub Actions, Jenkins, Azure DevOps), version control (Git), containerization (Docker), orchestration (Kubernetes/EKS)
Security Protocols:
- Knowledge of TLS/SSL, IAM policies, service mesh security, secrets management (HashiCorp Vault, AWS Secrets Manager)

Experience Requirements

7+ years in SRE, DevOps, or cloud operations with direct production responsibilities
Hands-on experience managing AWS infrastructure, containerized workloads, and CI/CD pipelines
Demonstrable experience with observability, incident response, and automation in enterprise environments
Industry experience in finance, banking, fintech, or other regulated sectors is preferred
Alternative pathways: Extensive experience in large-scale distributed systems, infrastructure design, or security-focused cloud operations

Day-to-Day Activities

Manage and optimize AWS infrastructure, ensuring stability, resilience, and security
Maintain and enhance CI/CD pipelines and deployment automation
Monitor system health through metrics, logs, and traces, and respond promptly to incidents
Collaborate with development teams to incorporate reliability patterns such as circuit breakers, retries, and health checks
Perform capacity planning and cost optimization activities
Document procedures, runbooks, and post-incident reports
Conduct routine performance tuning, vulnerability scans, and security updates

Qualifications

Bachelor’s degree in Computer Science, Information Technology, or related field; equivalent practical experience accepted
Certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or equivalent are highly desirable
Proven experience leading infrastructure operations and reliability initiatives in cloud environments
A continued commitment to staying current with cloud certifications, security standards, and automation tools

Professional Competencies

Strong analytical skills with a focus on failure prevention and system resilience
Excellent collaboration and communication skills for cross-team coordination
Ability to influence and lead technical initiatives and best practices
Self-motivated with a problem-solving mindset and proactive approach to operational challenges
Capacity to learn continuously and adapt to evolving cloud technologies and best practices
Focus on delivering dependable, scalable, and secure systems

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and abilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.

Reliability Engineer | AWS Cloud, Containerization, CI/CD, Monitoring, Security & Incident Response
