Looking for an SRE

POS-307
Location: Remote
Type: Full-time
Seniority: Senior

About Us:

As a Senior SRE at Kenility, you’ll join a tight-knit family of creative developers, engineers, and designers who strive to develop and deliver the highest-quality products to the market.

 

Technical Requirements:

  • Bachelor’s degree in Computer Science, Software Engineering, or a related field.
  • At least five years of experience in Site Reliability Engineering, DevOps, or advanced systems engineering positions.
  • Demonstrated background in establishing or evolving SRE frameworks and reliability practices within an organization.
  • Experience defining, tracking, and analyzing stability and performance metrics to ensure system reliability.
  • Strong hands-on expertise working with Amazon Web Services (AWS).
  • Solid experience with secrets management solutions such as AWS Secrets Manager, HashiCorp Vault, Keeper, Infisical, or similar tools.
  • Familiarity with Atlassian suite products, including Jira, Confluence, and Bitbucket.
  • Practical experience implementing infrastructure-as-code using Terraform and configuration management tools such as Chef, Puppet, or Ansible.
  • Ability to collaborate closely with development teams to identify and address automation and monitoring requirements.
  • Deep knowledge of monitoring, observability practices, incident management, and service reliability engineering.
  • Proven ability to define and implement observability standards, incident response processes, and SLIs/SLOs.
  • AWS certifications are highly valued.
  • Experience with languages and platforms such as Ruby, Python, or .NET to support integrations and legacy systems is a plus.
  • Background in regulated industries such as insurance or fintech is desirable.
  • Familiarity with IT service management platforms such as incident.io, Jira Service Management, or similar tools.
  • Experience working with CI/CD pipelines and modern DevOps methodologies.
  • English at Upper Intermediate (B2) level at minimum; Proficient (C1) is preferred.

 

Tasks and Responsibilities:

  • Establish and shape the SRE function from the ground up, defining standards, processes, and tooling to build a robust reliability practice.
  • Design and implement best practices, frameworks, and automation strategies that support long-term system stability and scalability.
  • Lead the evolution of the existing NOC into a technically empowered, automation-focused reliability team.
  • Oversee and enhance observability by improving monitoring, alerting, and dashboarding capabilities using tools such as Grafana, CloudWatch, and Datadog.
  • Define and track SLIs, SLOs, and error budgets to ensure accountability for uptime and system performance objectives.
  • Partner with DevOps, SysOps, and engineering teams to strengthen reliability standards and mentor team members, fostering skill development.
  • Drive cultural and technical transformation by promoting forward-thinking reliability and automation practices across the organization.

 

Soft Skills:

  • Responsibility
  • Proactivity
  • Flexibility
  • Great communication skills

Join us

Ready to be part of our team?
