Important Information
Experience: 10+ years
Job Mode: Full-time
Work Mode: Remote
ID: 20192
Job Summary
We are seeking a Senior AWS Infrastructure & Platform Engineer with deep expertise in designing, optimizing, and operating cloud infrastructure on AWS. This role is responsible for improving the reliability, performance, and cost-efficiency of our AWS environments while enabling engineering teams to deploy faster and with greater confidence. The ideal candidate combines hands-on AWS architecture skills with a strategic mindset for infrastructure optimization and a track record of measurable improvements in system performance, deployment velocity, and cloud spend.
Responsibilities and Duties
AWS Infrastructure Strategy & Cost Optimization
- AWS Platform Strategy: Define and implement the strategic vision for LendKey’s AWS infrastructure, ensuring alignment with business goals while continuously optimizing for performance, reliability, and cost.
- Cost Optimization: Proactively analyze and reduce AWS spend using Cost Explorer, Trusted Advisor, Compute Optimizer, and custom reporting. Implement rightsizing, Reserved Instance/Savings Plan strategies, Spot Instance utilization, and architectural efficiencies to drive measurable cost reductions.
- Infrastructure Efficiency: Identify and eliminate waste across compute, storage, networking, and data transfer. Establish tagging strategies and cost allocation models to drive accountability across teams.
Reliability & Performance Engineering
- System Reliability: Design and implement highly available, fault-tolerant architectures across multiple AWS Availability Zones. Drive improvements to meet and exceed a 99.9% SLA through proactive capacity planning, auto-scaling, and chaos engineering practices.
- Performance Optimization: Continuously profile, benchmark, and optimize system performance across compute, networking, storage, and database layers. Reduce latency, improve throughput, and eliminate bottlenecks.
- Disaster Recovery: Design, document, and regularly test disaster recovery procedures to ensure business continuity across all critical systems.
Deployment Speed & Developer Experience
- CI/CD Pipeline Optimization: Design, maintain, and continuously improve CI/CD pipelines to minimize build times, reduce deployment friction, and enable engineering teams to release frequently and safely.
- Deployment Strategies: Implement and manage blue/green, canary, and rolling deployment patterns to minimize downtime and reduce deployment risk.
- Developer Self-Service: Build internal tooling, templates, and self-service capabilities that enable engineering teams to independently provision resources, deploy services, and troubleshoot issues within established guardrails.
- Infrastructure as Code (IaC): Champion Terraform best practices including module development, state management, code review workflows, and automated drift detection to ensure all infrastructure is version-controlled and reproducible.
Monitoring, Observability & Incident Management
- Observability: Implement and maintain comprehensive monitoring, logging, and tracing solutions (e.g., CloudWatch, New Relic, Prometheus/Grafana, OpenTelemetry) to provide full-stack visibility into system health and performance.
- SLI/SLO Framework: Define and instrument Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to create data-driven reliability targets and alerting.
- Incident Response: Lead a rotational on-call schedule. Drive blameless post-incident reviews and implement preventive measures to reduce incident recurrence.
What We Are Looking For
- A hands-on AWS infrastructure engineer who architects and optimizes cloud environments for reliability, speed, and cost-efficiency.
- A systems thinker who proactively identifies bottlenecks—whether in deployment pipelines, cloud costs, or system performance—and drives measurable improvements.
- A builder who creates tools, automation, and self-service platforms that multiply the effectiveness of engineering teams.
- A leader who thrives in collaborative environments and can independently drive complex projects from concept to production.
- Someone who stays current with AWS services, cloud-native architecture patterns, and infrastructure best practices.
- A professional who brings a FinOps mindset—balancing cost, performance, and reliability in every infrastructure decision.
Essential Qualifications
- 10+ years in the technology field, with at least 3–5 years focused specifically on AWS infrastructure engineering, platform engineering, or cloud/site reliability operations.
- Proven track record of managing and optimizing large-scale AWS environments in production, with demonstrated cost savings or performance improvements.
AWS Cloud Services & Architecture
- Deep, hands-on experience with core AWS services: ECS, EC2, EKS, EBS, S3, RDS/Aurora, Lambda, CloudFront, Route 53, ALB/NLB, and VPC networking.
- Demonstrated ability to design highly available, fault-tolerant, and auto-scaling architectures across multiple Availability Zones.
- Hands-on experience with AWS cost management tools (Cost Explorer, Trusted Advisor, Compute Optimizer) and proven implementation of cost reduction strategies including rightsizing, Reserved Instances, Savings Plans, and Spot Instance optimization.
- Working knowledge of the AWS Well-Architected Framework, particularly the Cost Optimization, Reliability, and Performance Efficiency pillars.
Container Orchestration
- Experience managing and scaling Kubernetes clusters (EKS) in production, including cluster lifecycle management, node group optimization, and workload rightsizing.
- Understanding of Kubernetes networking (CNI, service mesh), resource management (requests/limits, HPA/VPA), and container security best practices.
Infrastructure as Code & Automation
- Strong proficiency in Terraform for building, managing, and versioning AWS infrastructure, including module authoring, remote state management, and workspace strategies.
- Experience with configuration management tools (Ansible, Chef, or equivalent) for operational automation.
- Proficiency in scripting languages (Python, Bash, or Go) for building infrastructure automation, tooling, and custom integrations.
CI/CD & Deployment Engineering
- Hands-on experience designing and optimizing CI/CD pipelines (e.g., GitHub Actions, Jenkins, or AWS CodePipeline) for fast, safe, and repeatable deployments.
- Experience implementing and managing deployment strategies such as blue/green, canary, and rolling deployments to minimize risk and downtime.
Monitoring & Observability
- Experience implementing and managing monitoring, logging, and alerting solutions (CloudWatch, New Relic, Prometheus/Grafana, ELK/OpenSearch).
- Demonstrated ability to define SLIs/SLOs and build dashboards and alerts that drive proactive issue detection and resolution.
Documentation & Communication
- Strong capabilities in creating and maintaining comprehensive technical documentation including runbooks, architecture decision records (ADRs), and operational playbooks.
- Ability to communicate complex technical concepts clearly to both engineering teams and non-technical business stakeholders.
Preferred Qualifications
- AWS certifications such as Solutions Architect Professional, DevOps Engineer Professional, or SysOps Administrator.
- Experience with FinOps practices, including building cost visibility dashboards and driving cost accountability across engineering teams.
- Experience with GitOps workflows and tools (ArgoCD, Flux).
- Familiarity with service mesh technologies or API gateway management.
- Experience migrating or re-architecting legacy workloads to cloud-native patterns on AWS.
About Encora
Encora is a global company that offers Software and Digital Engineering solutions. Our practices include Cloud Services, Product Engineering & Application Modernization, Data & Analytics, Digital Experience & Design Services, DevSecOps, Cybersecurity, Quality Engineering, AI & LLM Engineering, among others.
At Encora, we hire professionals based solely on their skills and do not discriminate based on age, disability, religion, gender, sexual orientation, socioeconomic status, or nationality.