Senior Site Reliability Engineer (SRE)

India

Full Time

At Atlan, we are both proud and humbled to build products that power data teams in more than 50 countries, ranging from small startups to Fortune 500 giants. Our products form the backbone of our users' work, have become part of their daily workflows, and directly impact their success.

In the past decade, products like GitHub helped companies build great engineering cultures. At Atlan, we are on a quest to help build winning data teams of the future.

Every line of code we write, every feature we add, every pixel we create—everything we do—is to help data teams around the world do their lives’ best work.

Let’s show the world how it’s done!

(Co-founders of Atlan)

About the Role

  • We are seeking a talented Site Reliability Engineer (SRE) to join our dynamic team and take ownership of platform-first projects that drive scalability, reliability, and reusability of core platform components. As an SRE, you will be responsible for designing, building, scaling, and maintaining the infrastructure that supports our cutting-edge technology platform.

What will you do?

  • Lead platform-first initiatives to ensure the scalability, reliability, and performance of our technology platform.
  • Design, build, and maintain robust infrastructure to support our distributed systems, leveraging technologies such as Kubernetes, Kafka, Postgres, Cassandra, and Redis.
  • Collaborate closely with development teams to ensure seamless integration of platform services and features.
  • Implement monitoring and alerting systems to ensure high availability and performance of the platform, focusing on SLA and availability.
  • Work with CXOs to develop comprehensive cost reporting and cost considerations for each build.
  • Continuously evaluate and recommend improvements to platform infrastructure and processes to enhance efficiency and reliability.
  • Collaborate with other teams to align the platform with customer needs and business goals.
  • Develop and maintain CI/CD pipelines for seamless deployment and release management.

What makes you a match?

  • Proven experience in software development and engineering, with a strong emphasis on building large-scale distributed systems.
  • Proficiency in one of the commonly used programming languages for building distributed systems, such as Java, Python, or Golang.
  • Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP) and developing distributed systems using cloud services.
  • Knowledge of Amazon QuickSight, AWS/Azure Metering, OpenCost, and Kubecost will be advantageous.
  • Strong expertise in container orchestration platforms, specifically Kubernetes, is mandatory. CKA certification is a plus.
  • Familiarity with Agile development methodologies is highly desirable.
  • Exceptional problem-solving skills and a passion for developing robust, scalable, and secure solutions.
  • Excellent communication skills to effectively collaborate with cross-functional teams.
  • Ability to share impactful tech stories, demonstrating the results of your technical contributions.

Can't wait to join us?
Get started on your application.

Apply Now

Do your life's best work with Atlan

Shape the future of our products

As part of the Business Team, you play a pivotal role. You are the voice of our users and inform all the important decisions we make. You will work closely with our users and prospects to understand their pain points and feed this information into our product roadmap.

Ownership and autonomy from day one

At Atlan, business is all about putting the user first and addressing their needs. So the way you set your goals will be informed by the unique challenges that you need to solve. That’s why you have complete ownership of your projects: you set the priorities, decide on the experiments, and choose the best way to reach your goals.
