Tasks

We are seeking a seasoned DevOps Site Reliability Engineer (SRE) with over 7 years of experience to lead and manage our DevOps platform. This role demands a strategic thinker who can ensure high availability, drive tool upgrades, plan long-term roadmaps, and optimize costs across the infrastructure. You will be the go-to expert for platform reliability, scalability, and efficiency.

Key Responsibilities

Platform Ownership: Manage and maintain the DevOps platform, ensuring seamless integration, performance, and scalability.
Tool Lifecycle Management: Lead the upgrade, migration, and deprecation of DevOps tools (CI/CD, monitoring, logging, IaC, etc.).
24/7 Availability: Implement robust monitoring and alerting systems to ensure tools and services are available round-the-clock.
Strategic Roadmapping: Develop and maintain a 5-year roadmap for platform evolution aligned with business goals and technology trends.
Cost Optimization: Analyze infrastructure and tool usage to identify cost-saving opportunities without compromising performance or reliability.
Automation & Efficiency: Drive automation across deployment, monitoring, and incident response to reduce manual effort and improve consistency.
Collaboration: Work closely with engineering, product, and infrastructure teams to align DevOps practices with organizational needs.
Security & Compliance: Ensure the platform adheres to security best practices and compliance requirements.

Platform Operations & Availability

Lead the Operations Team responsible for end-to-end platform availability, performance, and reliability.
Establish and monitor SLAs, KPIs, and incident response processes to ensure uninterrupted service.
Drive automation initiatives to eliminate manual tasks and improve operational efficiency.

Development & Delivery

Translate business needs into clear, actionable requirements for the development team.
Ensure timely delivery of features and enhancements that align with security and compliance standards.
Promote DevSecOps practices and continuous improvement across teams.

Qualifications

Required Skills & Qualifications

Proven experience managing large-scale DevOps platforms and toolchains.
Deep understanding of CI/CD pipelines, container orchestration (Kubernetes, Docker), cloud platforms (AWS, Azure, GCP), and IaC tools (Terraform, Ansible).
Strong scripting and automation skills (Python, Bash, Go).
Experience with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog).
Excellent problem-solving and incident management capabilities.
Ability to create and execute long-term strategic plans.
Strong focus on cost-efficiency and resource optimization.
Exceptional communication and stakeholder management skills.

Preferred Qualifications

Certifications in cloud platforms or DevOps tools are a plus.
Experience with FinOps or cloud cost management platforms.
Exposure to SRE principles and practices (SLIs, SLOs, error budgets).

Our company culture

At Daimler Truck, we promote diversity and foster an inclusive corporate culture. We value the individual strengths of our employees, as these lead to the best team performance and thus to the success of our company. Inclusion and equal opportunities are important to us, regardless of where you come from and who you are. We look forward to receiving applications from people of all cultures and genders, parents, people with disabilities, and people from the LGBTIQ+ community.