Nitish Tiwari

SRE (Site Reliability Engineer)

Bengaluru, Karnataka, India · 6 yrs 8 mos experience
Highly Stable

Key Highlights

  • Led the development of a high-performance SRE Data team.
  • Achieved significant cloud cost savings through modernization.
  • Developed automation frameworks enhancing reliability and speed.
Stackforce AI infers this person is a Fintech Infrastructure Engineer with strong expertise in Site Reliability Engineering and cloud cost optimization.

Skills

Core Skills

Site Reliability Engineering · Kubernetes · Data Infrastructure · DevOps Engineering · Cloud Cost Management

Other Skills

API Gateways · Amazon Web Services (AWS) · Apache Airflow · Apache Kafka · Apache Spark · ArgoCD · Bash · Continuous Integration and Continuous Delivery (CI/CD) · Centralized Observability · Cloud Cost · Data Migration · Data Warehousing · Django

About

Site Reliability Engineer with 6+ years of experience designing and scaling reliable, high-performance systems, driving automation-first solutions that improve reliability and developer experience.

Experience

CRED

2 roles

Lead Site Reliability Engineer

Promoted

Apr 2023 - Nov 2025 · 2 yrs 7 mos · On-site

  • Built and scaled the SRE Data team from the ground up; led architecture design, OKR planning, and hiring while spearheading the DevEx charter to improve reliability and developer productivity.
  • Lead contributor to building and scaling CRED's Kubernetes ecosystem using Terraform, Helm, ArgoCD, and Karpenter, powering 50+ data and open-source workloads with built-in observability.
  • Drove open-source adoption by migrating multiple SaaS platforms to an in-house Spark, Flink, Airflow, and JupyterHub setup, reducing infra and licensing costs by ~$100K annually.
  • Spearheaded full-stack infra modernisation and cloud cost optimisation for a CRED subsidiary, driving ~60% cloud savings: moved legacy monoliths to a resilient containerised stack with zero downtime, unified observability, streamlined CI/CD by moving to Caterpillar (CRED's internal platform), standardised developer tooling, and strengthened the org's security posture through improved governance and security controls.
Terraform · Helm Charts · ArgoCD · Kubernetes · Observability · Site Reliability Engineering

Site Reliability Engineer

Apr 2021 - Mar 2023 · 1 yr 11 mos · On-site

  • Owned end-to-end data infrastructure as the single point of contact for 50+ data team members, driving infrastructure design, platform development, and cost optimisation while ensuring compliance across all environments.
  • Drove onboarding and infrastructure enablement for key data platforms, including SaaS solutions (Databricks, Astronomer, MonteCarlo, and Fennel) and internal platforms (Batch, RTP, Recon, and the Kinesis/Firehose/Glue ingestion framework), ensuring secure access, reliable networking, and smooth BAU operations.
  • Oversaw SRE on-call operations, driving incident triage, RCAs, documentation improvements, security patching, readiness drills for high-traffic events (e.g., IPL), and well-architected reviews; collaborated closely with developers to resolve deployment and infrastructure issues.
  • Developed internal automation frameworks to improve reliability and speed:
      • GitOps-driven performance testing framework for cross-account cloning of microservices and datastores, improving test velocity and reliability during peak traffic.
      • IaC-based AWS account provisioning framework enabling one-click environment setup and infra bootstrapping, cutting setup time by ~70%.
      • Python-based Kubernetes upgrade CLI with automated pre-checks and validations, reducing manual effort and improving upgrade velocity.
Data Infrastructure · SaaS Solutions · Kinesis · Firehose · Glue · Site Reliability Engineering

TO THE NEW

2 roles

DevOps Engineer

Promoted

Feb 2019 - Apr 2021 · 2 yrs 2 mos · Noida, Uttar Pradesh, India · On-site

  • Delivered end-to-end infrastructure for multiple fintech, media, insurance, and adtech clients using reusable IaC modules, with automated scaling and load balancing, CI/CD, centralised observability, and security best practices built in.
  • Cloud Cost Management Platform:
      • Contributed to developing and scaling the platform that evolved into the company's flagship product and was later spun out as an independent venture.
      • Enabled onboarding of 100+ Indian and global customers, driving significant cloud cost savings.
      • Built scalable ETL pipelines to ingest and process AWS CUR and pricing data from 100+ customers into Redshift, collaborated with backend teams to integrate custom cost optimisation logic (RI, SP), and delivered SQL-driven dashboards with dynamic filters for granular, actionable cost visibility.
IaC · CI/CD · Centralized Observability · DevOps Engineering

DevOps Engineer

May 2018 - Jul 2018 · 2 mos · Noida, Uttar Pradesh, India · On-site

  • Developed a Django-based self-serve platform to provision ephemeral environments with user-defined configs, reducing the DevOps team's toil and accelerating dev workflows.
Django · DevOps Engineering

Education

Dr. A.P.J. Abdul Kalam Technical University

Bachelor of Technology (B.Tech.) — Computer Science & Engineering

Jan 2015 - Jan 2019
