Network Optix (Nx) is a global powerhouse in video software development, driven by a mission to empower the creation of intelligent video-based solutions and products capable of converting video into actionable data. Over a decade in the making, the Network Optix Enterprise Video Operating System helps innovative organizations rapidly and affordably build world-class, custom-tailored, enterprise-scale video products and solutions.
Nx is headquartered in Walnut Creek, California, with additional locations in Burbank, California; Portland, Oregon; Belgrade, Serbia; and Taipei, Taiwan, and regional teams distributed across the globe. Recognized on the Inc. 5000 Fastest Growing Companies list for 9 years running, we are committed to expanding our teams cross-functionally and globally.
Network Optix aims to power the world’s most intelligent video solutions, with the ultimate goal of carving a path toward revolutionizing the landscape of video technology and transforming how we perceive the world around us.
Purpose of the role
Lead cloud release execution, production readiness, and incident coordination across the Nx Cloud platform. This role ensures that every release is executed safely, monitored effectively, and communicated clearly across internal and customer-facing teams. Be the operational bridge between Release Engineering, SRE, Support, and Product Engineering, ensuring that cloud deployments are reliable, transparent, and continuously improving.
What you’ll be doing:
- Release Management
Own and coordinate end-to-end cloud release cycles, from release readiness and change approvals to deployment and verification.
Facilitate release sign-off, ensuring quality, stability, and risk mitigation before going live.
Partner with engineering and QA to validate build quality and rollback procedures.
Maintain a central release calendar, checklists, and change logs.
Drive change management governance and ensure compliance with release standards.
- Monitoring & Incident Management
Collaborate with the SRE team to enhance production monitoring, alerting, and observability for critical services.
Lead incident coordination and communication, ensuring timely resolution and clear updates to stakeholders.
Oversee post-incident reviews and RCA documentation, tracking remediation progress.
Proactively identify recurring issues and drive preventive improvements.
- Operational Readiness & Risk Management
Ensure environments (test, staging, production) are release-ready and well-maintained.
Support environment stability by coordinating with SRE and QA for timely updates and access management.
Lead risk assessment for releases, including rollback and contingency plans.
Partner with support and customer success teams to ensure customer impact visibility and communication.
- Tooling & Process Improvement
Collaborate with Release Engineering and SRE on Argo-based deployment tools and CI/CD improvements.
Identify automation opportunities to streamline release and monitoring workflows.
Contribute to documentation, playbooks, and training for new release procedures.
- Cross-Team Communication
Provide real-time release status updates to leadership, engineering, and support teams.
Ensure consistent post-release summaries and stakeholder reporting.
Act as the primary communication bridge between engineering, SRE, and customer-facing teams during releases and incidents.
What we’re looking for:
- Experience
10+ years in Release Management, SRE, or DevOps operations roles for cloud platforms.
Proven experience managing production releases and incident response in a SaaS or distributed environment.
Hands-on experience operating hybrid infrastructures, combining cloud-based Kubernetes (EKS) with on-premises services (e.g., relays and mediators) deployed using Ansible and Docker Compose on providers such as Datapacket.
Hands-on experience with AWS Athena (to analyze ALB logs).
- Technical skills
Kubernetes, GitOps, ArgoCD, Jenkins, Terraform, or similar CI/CD tooling.
Ansible and Docker Compose
Proficiency with Jira
- Knowledge
Strong understanding of AWS (EKS/ECS, IAM, CloudWatch, Lambda, S3, EC2); familiarity with GCP/Azure
Strong understanding of Kubernetes (EKS) and service mesh deployment, operation, and lifecycle management, including zero-downtime upgrades and version rollouts (e.g., Istio, Gloo, Ambassador, or Traefik).
Knowledge of monitoring and alerting systems (Grafana, Prometheus, ELK, NetData, OpsGenie).
- Education
Bachelor’s degree in Computer Science or a related field, or equivalent professional experience
What we offer:
The position is ideally a hybrid role based in one of our offices in Burbank, CA; Walnut Creek, CA; or Portland, OR.
Please note: We do not accept unsolicited resumes from third-party recruiters or staffing agencies. Any unsolicited resumes sent to our employees or submitted to our careers page or job postings without a formal agreement in place will be considered property of Network Optix, and no fees will be paid in the event that candidate is hired by the company.
Network Optix is an equal opportunity employer committed to diversity and inclusion in the workplace. We celebrate the diversity of our workforce, which includes people of all cultural, national, racial, and gender identities, as well as those who have served in the military. We strive for an environment where creativity and collaborative growth thrive. If you have a disability or special need that requires accommodation, please let us know.