Role summary
Posted via Mainder
The Platform/DevOps Engineer II is a key individual contributor on the Platform Engineering team, responsible for designing, building, and maintaining the cloud infrastructure and developer tooling that powers RegalCineworld's global digital products. Operating across Azure and Cloudflare, you will own critical platform components, define platform design patterns, and drive reliability and scalability improvements that directly impact millions of moviegoers across multiple territories.
This role sits within a team managing a multi-cloud platform spanning e-commerce, mobile, loyalty, and content management systems across US (Regal), UK (Cineworld/Picturehouse), and international (CCI) territories.
Responsibilities
Own and operate critical cloud platform components across Azure, including App Services, API Management, Kubernetes (AKS), and serverless workloads.
Define and implement Infrastructure as Code (IaC) patterns using Terraform, ensuring infrastructure is codified, repeatable, and auditable.
Design and maintain CI/CD pipelines (GitHub Actions, TeamCity) for build, test, and deployment of platform and application services.
Manage Cloudflare configurations including DNS, WAF, rate limiting, and edge security rules across multiple brand domains.
Monitor and optimize cloud spend across Azure and Cloudflare; contribute to FinOps practices and cost attribution.
Contribute to disaster recovery planning, failover testing, and backup validation.
Participate in change advisory processes for production deployments, ensuring every change has a risk assessment and a rollback plan.
Drive unified observability strategy across all territories via Dynatrace, including APM instrumentation, synthetic monitoring, and real-time alerting.
Define and implement monitoring solutions for production services, ensuring proactive detection and rapid diagnosis of incidents.
Participate in incident response, lead root cause analysis, and drive remediation actions that permanently improve system reliability.
Maintain and improve the Incident Management Framework, including Jira ITSM workflows, severity classification, and escalation paths.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for platform services.
Participate in a weekly on-call rotation (on call for roughly one week in every 4–6 weeks), providing after-hours incident response for production platform services across all territories, including:
- Production deployments: supporting and monitoring off-hours releases and rollbacks.
- Sev-1 / Sev-2 incidents: first-responder triage, escalation, and coordination for critical outages or degraded service.
- Security events / bad actors: responding to WAF alerts, DDoS attempts, bot attacks, and suspicious traffic patterns.
- Tech debt emergencies: addressing urgent platform stability issues stemming from accumulated technical debt.
Author and maintain operational runbooks to enable efficient on-call triage and escalation.
Continuously improve alerting quality to reduce noise and mean time to resolution (MTTR).
Enforce least-privilege production access models, with access request and approval workflows and audit trails.
Remediate findings from security scans (e.g., TLS configuration, HTTP security headers, WAF rule tuning).
Investigate and respond to security incidents including bot attacks, credential stuffing, and anomalous traffic patterns.
Maintain and tune Cloudflare WAF rules, rate limiting, and bot management policies in response to emerging threats.
Support product engineering teams through the "You Build It, You Run It" operating model, enabling teams to own their services in production safely.
Provide platform tooling and documentation that reduce friction for development teams shipping software.
Handle intake requests from engineering and data teams, translating needs into platform solutions.
Collaborate with Principal Engineers and product teams to define platform working agreements and service ownership boundaries.
Influence technical direction for infrastructure decisions across the organization.
Mentor junior and mid-level engineers, sharing knowledge of cloud architecture and DevOps best practices.
Contribute to release readiness processes, environment promotion models, and deployment notification systems.
Requirements
6-8+ years of experience in platform engineering, DevOps, SRE, or cloud infrastructure roles.
Deep expertise with Microsoft Azure (App Services, AKS, API Management, Azure Functions, SQL/MySQL, Key Vault, Virtual Networks).
Strong experience with Infrastructure as Code (Terraform strongly preferred).
Proficiency with CI/CD systems (GitHub Actions, TeamCity, or equivalent).
Proficiency in scripting/automation languages (Python, Bash, PowerShell, or Go).
Solid understanding of networking (DNS, CDN, load balancing, VPN, private endpoints).
Experience with container orchestration (Kubernetes/AKS) and containerized application deployments.
Hands-on experience with observability platforms (Dynatrace, Datadog, or equivalent) including APM, log management, and synthetic monitoring.
Strong incident management and troubleshooting skills for complex distributed systems.
Experience with security engineering (WAF, SSO/SAML/OIDC, vulnerability scanning, SIEM integration).
Nice to have
Experience with Azure in a multi-cloud environment.
Experience with Cloudflare (DNS, WAF, Workers, rate limiting).
Familiarity with Jira ITSM or similar ITSM and on-call management tools.
Experience in regulated or high-availability e-commerce environments (PCI, payment systems).
Experience migrating CI/CD pipelines (e.g., TeamCity to GitHub Actions).
Experience with Dynatrace.