Sr SRE to collaborate with IP Network specialists/architects to troubleshoot and resolve issues, deploying automation & reliability initiatives on an i
Is remote work available? Are there any required days in office?
3 days in office preferred but not mandatory (as this is a contractual position).

Responsibilities
The position is for leading delivery and support of a large-scale, Kubernetes-based IP Network Management platform.
Day-to-day responsibilities include:
Collaborating with IP Network specialists/architects to troubleshoot and resolve issues.
Deploying automation and reliability initiatives across an infrastructure of 125+ servers, covering proactive monitoring, issue detection, and self-healing, using the latest SRE toolkit and tech stack.
Working with the vendor to resolve platform-related issues, developing a roadmap for the platform life cycle, and challenging the vendor to mitigate platform risks quickly.
Deploying features and fixes based on network specialists’ needs.
The role also includes participating in a pager rotation for 24/7 support.
Deeply understands business drivers and cross-departmental impacts.
Develops business cases to justify application-related capital investments.
Translates business requirements into technical requirements.
Explains complicated technical issues in simple terms to all levels of the organization.
Leads system requirements gathering for scalable, robust, and optimized designs.
Provides input and direction to vendors to ensure optimal designs.
Provides analysis and recommendations for new software and infrastructure.
Evaluates test results to determine pass/fail status.
Supports the project team with defect resolution during test activities.

Must Haves
Support of Kubernetes-based platforms, with proven experience mitigating critical issues.
Demonstrated experience with monitoring and observability (Zabbix/Dynatrace/Datadog for infrastructure monitoring and the ELK stack for log aggregation, visualization, and analysis).
Fundamental knowledge of TCP/IP networks, ideally in a telco environment.

2 rounds of interviews (in-person preferred but not mandatory, based on candidate location), including an initial screening to gauge past experience and a second technical deep-dive.

What specific projects will be worked on?
24/7 support of an IP Network Management System, plus platform lifecycle initiatives (new infra deployment and management, application patching and upgrades).