Pushkar Joshi

SRE (Site Reliability Engineer)

Pune, Maharashtra, India · 10 yrs 3 mos experience
Highly Stable

Key Highlights

  • Designed high-availability cloud architectures with 99.99% uptime.
  • Reduced incident response time by 40% through advanced monitoring.
  • Accelerated deployment cycles from days to hours with CI/CD.
Stackforce AI infers this person is a Cloud Infrastructure Engineer with expertise in Site Reliability Engineering and DevOps in the Fintech sector.

Skills

Core Skills

Cloud Infrastructure, DevOps, Site Reliability Engineering, Cloud Management, Cloud Migration, IT Consulting, System Administration

Other Skills

AWS, Akamai, Amazon CloudWatch, Amazon EC2, Amazon Relational Database Service (RDS), Amazon Route 53, Amazon S3, Aspera, Bash, C, CI/CD, Change Management, Cloud Security, Cloud Storage, Data Center

About

I'm a founding team member of Palindrome, where we've built an AI-native platform that transforms how wealth managers and private banks operate. I focus on creating and maintaining a robust cloud infrastructure, enabling our GenAI solutions to deliver seamless, secure experiences. Our platform at Palindrome eliminates time-consuming manual processes throughout the client journey, transforming everything from client onboarding to review cycles, risk assessments, and regulatory compliance. This technology empowers financial firms to expand their business, enhance client service, and substantially reduce administrative burden. We currently support institutions collectively managing assets exceeding $120B.

Prior to Palindrome, I was a Senior Site Reliability Engineer at DeepIntent, where I spearheaded the implementation of Kubernetes clusters on AWS and reduced incident response time by 40% through advanced monitoring solutions. Before that, at Vuclip India, I managed 40+ GKE Kubernetes clusters and orchestrated a major data center to GCP migration involving over 300TB of data.

Throughout my 9+ year career, I've engineered cloud infrastructure solutions across media, advertising, insurance, and financial sectors. My most significant technical contributions include:

  • Designing high-availability cloud architectures that maintained 99.99% uptime during critical migrations while supporting mission-critical applications.
  • Creating comprehensive monitoring ecosystems with Prometheus, Grafana, and AlertManager that reduced mean time to resolution by 70% and provided actionable insights.
  • Implementing infrastructure as code with Terraform across multiple cloud providers, ensuring consistent, secure, and scalable deployments.
  • Developing automated CI/CD pipelines that accelerated deployment cycles from days to hours while strengthening security posture.

I've earned certifications including AWS Solutions Architect, Google Cloud Professional Architect, Kubernetes Administrator (CKA), Kubernetes Application Developer (CKAD), HashiCorp Certified Terraform Associate, and Red Hat Certified System Administrator.

I thrive on tackling complex infrastructure challenges that deliver tangible business value, combining technical depth with a customer-centric approach to build reliable, efficient, and secure systems that scale.

SRE | DevOps | GCP | AWS | CKA | CKAD | RHCSA | Docker | Kubernetes | CI/CD | ELK | Jenkins | Prometheus | Grafana | New Relic | IBM Aspera

Experience

Palindrome

Lead SRE

Sep 2024 - Present · 1 yr 6 mos · Pune, Maharashtra, India · Hybrid

  • At Palindrome, I design, implement, and maintain a secure and scalable cloud infrastructure for web and mobile applications, powering our AI-native platform for wealth managers and private banks. Collaborating closely with our Product Engineering team, I deploy and manage cutting-edge technologies, ensuring seamless integration with clients' existing workflows while upholding the highest security standards essential for the financial services industry.
  • Key Responsibilities:
  • Infrastructure Management: Leveraging Terraform to provision and manage cloud resources that support our agentic AI solutions.
  • CI/CD Implementation: Developing automated pipelines that facilitate rapid deployment of new features and enhancements.
  • Cloud Security: Enforcing robust security measures in compliance with financial services regulations.
  • Monitoring & Optimization: Utilizing Prometheus and Grafana for real-time system health insights and optimizing resource utilization.
Terraform, CI/CD, Cloud Security, Prometheus, Grafana, Cloud Infrastructure, +1 more

DeepIntent

2 roles

Senior Site Reliability Engineer

Promoted

Feb 2024 - Sep 2024 · 7 mos · Pune, Maharashtra, India · Hybrid

  • DeepIntent is a digital advertising company. It offers its clients the Healthcare Marketing Platform, the industry’s first and only DSP with in-platform optimization toward business outcomes. The DeepIntent platform connects marketers with patients through unique data, premium media partnerships across all devices, and custom integrations.
  • Implementing monitoring and alerting solutions with Prometheus and Grafana, integrated with PagerDuty.
  • Setting up the CI/CD pipeline with GitHub Actions.
  • Creating and maintaining Kubernetes clusters across Production, Dev, Staging, and Release environments.
  • Setting up Aerospike and Airflow on EC2 as well as on Kubernetes.
  • Providing technical support to developers.
  • Troubleshooting issues during and after application deployments on Kubernetes.
  • Managing AWS services, including EC2, RDS, VPC, S3, and Route 53.
  • Configuring VPC peering between different AWS accounts and setting up routing.
  • Taking regular backups of Grafana dashboards and data sources.
  • Setting up an OAuth2 proxy service on Kubernetes so that external-facing URLs authenticate with Google G Suite accounts.
  • Setting up Jenkins on Kubernetes for cost optimization and high availability.
  • Currently working on the Linkerd service mesh.
  • Setting up a logging system with Loki on Kubernetes.
  • Creating Helm charts for different APIs and services.
Prometheus, Grafana, GitHub Actions, Kubernetes, AWS, Site Reliability Engineering, +1 more

Site Reliability Engineer II

Sep 2020 - Feb 2024 · 3 yrs 5 mos · Pune, Maharashtra, India · Hybrid

  • Implementing monitoring and alerting solutions with Prometheus and Grafana, integrated with PagerDuty.
  • Setting up the CI/CD pipeline with GitHub Actions.
  • Creating and maintaining Kubernetes clusters across Production, Dev, Staging, and Release environments.
  • Setting up Aerospike and Airflow on EC2 as well as on Kubernetes.
  • Providing technical support to developers.
  • Troubleshooting issues during and after application deployments on Kubernetes.
  • Managing AWS services, including EC2, RDS, VPC, S3, and Route 53.
  • Configuring VPC peering between different AWS accounts and setting up routing.
  • Taking regular backups of Grafana dashboards and data sources.
  • Setting up an OAuth2 proxy service on Kubernetes so that external-facing URLs authenticate with Google G Suite accounts.
  • Setting up Jenkins on Kubernetes for cost optimization and high availability.
  • Currently working on the Linkerd service mesh.
  • Setting up a logging system with Loki on Kubernetes.
  • Creating Helm charts for different APIs and services.
Prometheus, Grafana, GitHub Actions, Kubernetes, AWS, Site Reliability Engineering, +1 more

Vuclip Inc.

Site Reliability Engineer

Sep 2018 - Sep 2020 · 2 yrs · Pune, Maharashtra, India

  • Deploying and maintaining new k8s clusters.
  • Setting up CD pipelines on Harness for k8s microservices.
  • Creating new infrastructure using Terraform on GCP and AWS.
  • Implementing Consul for service discovery in the k8s cluster.
  • Setting up Route 53 registrations, DNS, and Nginx servers to route traffic.
  • Setting up new Prometheus, Grafana, and Thanos servers.
  • Writing shell and Python scripts for automation and for event handlers, using Jenkins jobs for pod and AWS EBS restarts.
  • Worked on multiple projects, such as data center to GCP migration, server migrations, and integration of new microservices with existing services in GCP.
  • Delivering cost optimization solutions to reduce infrastructure bills.
  • Running POCs on different tools for process improvements.
  • Experience with AWS and GCP services such as Route 53, X-Ray, EC2, S3, ELB, VPC, CodePipeline, GKE, and GCS.
  • Building and developing the SRE culture in the organization to narrow the gap between development and operations and keep downtime to a minimum.
Terraform, Kubernetes, AWS, GCP, Prometheus, Grafana, +2 more

Allstate India

2 roles

IT Consultant

Promoted

Apr 2018 - Sep 2018 · 5 mos

  • Coordinating with Dev/QA teams on CRs and bug fixes for the application on the Allstate Canada project.
  • Supporting development teams and change management.
  • Restoring and cloning different environments as per dev requirements.
  • Handling release management tasks, production deployments, and patches, and coordinating them with the business.
  • Monitoring and managing AWS cloud infrastructure for production applications, with components like EC2, RDS, S3, and IAM.
  • Resolving incident, change, and problem management tickets, user-generated requests, and queries.
  • Package management: installation, upgrade, verification, and troubleshooting of RPM packages.
AWS, Incident Management, Change Management, IT Consulting, Cloud Management

Associate IT Consultant

Jan 2016 - Apr 2018 · 2 yrs 3 mos

  • Performing system administration for HP and Dell machines running RHEL 5/6/7, AIX, and HP-UX, using Icinga, for the Canada project.
  • Creating application-related services, shell scripts, and cron jobs.
RHEL, Shell Scripting, System Administration

Education

MMCOE, PUNE

Engineer’s Degree

Jan 2012 - Jan 2015

T B GIRWALKAR POLYTECHNIC

DIPLOMA IN E&TC

Jan 2008 - Jan 2011
