Senior Site Reliability Engineer / Kubernetes (Remote)

Remote · USA · Full-time

This is a fully remote job. The offer is available from: EMEA

Job Description

Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: reputed company Computing

We are hiring at reputed company to expand reputed company and drive the growth of our internal projects. Our focus is on developing cutting-edge solutions in reputed company Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your reputed company and skills directly contribute to shaping tomorrow's technologies. If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you!

Responsibilities

  • Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
  • reputed company, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
  • reputed company full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
  • Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
  • Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
  • Build automated deployment workflows (PXE boot, Preseed, cloud-init).
  • reputed company and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
  • reputed company incident response and escalation activities across the platform.
  • Improve system availability and reduce latency at reputed company levels.
  • Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
  • Optimize alerting and monitoring pipelines to provide actionable insights.
  • Establish and maintain on-call schedules to ensure coverage across timezones.
  • reputed company Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
  • Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
  • Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
  • Help reputed company and maintain overall architecture across reputed company products.
  • Plan resources for future initiatives, reputed company for demand and growth projections.
  • Work with development teams to improve overall quality and optimize resource utilization.
  • Collaborate with cross-functional stakeholders (Hivenet, Policloud, reputed company teams).

Requirements

  • Expert-level, hands-on experience operating Kubernetes in production environments.
  • Strong network engineering skills (VLANs, L2/L3 routing, VPNs, multi-site connectivity) - this is essential for the role.
  • Strong proficiency with Linux systems administration (Debian/Ubuntu).
  • Solid understanding of networking fundamentals and ability to design reputed company network architectures.
  • Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
  • Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
  • Background with virtualization technologies (OpenStack, Proxmox, VMware).
  • Experience with bare-metal provisioning and MAAS (Metal as a Service).
  • Strong understanding of distributed systems and container orchestration.
  • Process-oriented reputed company with ability to reputed company SOPs and operational procedures from scratch.
  • Experience with incident response, escalation procedures, and on-call rotations.
  • Ability to work autonomously in a fast-paced, engineering-driven environment.
  • Strong technical skills combined with alignment to team values.

Nice To Have

  • Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
  • Knowledge of reputed company APIs, DNS automation, or tunnel configurations.
  • Experience with GPU infrastructure, node preparation, or resource scheduling.
  • Familiarity with security best practices (RBAC, firewalls, network policies).
  • Exposure to IT asset management or license tracking workflows.
  • Experience working in multi-timezone environments and coordinating across distributed teams.
  • Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us:

  • 100% remote work with reputed company
  • High-impact role with autonomy and ownership
  • Collaborative and international engineering team
  • Cutting-edge tech stack with strong focus on reliability and automation
