Rakshit Mathur

Data Scientist

Rajasthan, India · 5 yrs experience

Key Highlights

  • Expert in optimizing data pipelines for performance.
  • Proven track record in AWS and Azure data solutions.
  • Strong background in building reliable ETL processes.
Stackforce AI infers this person is a Data Engineering expert in SaaS with a focus on performance optimization and scalable data solutions.

Skills

Core Skills

Data Architecture · AWS · Data Engineering · ETL

Other Skills

Terraform · RDS · Data Governance · Azure · Snowflake · Python · Power BI · Informatica · Data Quality · PySpark · Airflow · Docker · Git · Apache Kafka · MySQL

About

I’m a Data Engineer with 5+ years of experience turning messy data pipelines into fast, reliable systems. I’ve designed and optimized data platforms on AWS, Azure, and Snowflake, built streaming jobs in PySpark, and managed petabytes of data powering analytics, ML, and business decisions. What I love most is solving the “why is this slow?” kind of problems - the ones that need both engineering and detective skills.

My work has ranged from:

  • Designing end-to-end data architectures (batch + real-time)
  • Building ETL pipelines using Airflow, ADF, and Glue
  • Creating cost-optimized data lakes and warehouses
  • Writing high-performance SQL and PySpark transformations
  • Enabling business teams with reliable dashboards and metrics

I’m deeply curious about performance optimization, scalable design, and bridging the gap between raw data and actionable insight. Always up for discussing data modeling, pipeline design, or just that one Spark job that mysteriously ran forever.

Experience

Hook

Senior Data Engineer

Oct 2025 – Present · 6 mos · Remote

  • Led migration of monolithic materialized views into modular, production-grade stored procedures, improving maintainability and reducing refresh complexity
  • Stabilized critical RDS infrastructure by identifying and resolving high-connection lockups, preventing cascading ingestion failures
  • Redesigned ingestion workflows to eliminate dangling Lambda connections and improve system resilience under traffic spikes
  • Modularized Terraform data-platform repos to improve infra clarity, scalability, and deployment reliability
  • Implemented advanced RDS monitoring and VPC-level diagnostics to proactively detect production bottlenecks
  • Designed structured data layers (stage → transform → facts) to improve data governance and analytics reliability
  • Partner closely with Product to translate PRDs (e.g., Jams community features) into scalable data models and event tracking systems
Terraform · RDS · AWS · Data Governance · Data Architecture

Elevate K-12

Data Engineer II

Feb 2024 – Sep 2025 · 1 yr 7 mos · Remote

  • Designed and scaled Azure + Snowflake data pipelines, processing 5M+ daily records with 35% faster runtimes, ensuring reliable reporting for operations and leadership.
  • Automated job-vacancy extraction from K-12 boards (AppliTrack, HireTrue, TedK12) with Python (Selenium, Scrapy), enabling the business team to identify hiring schools 2 weeks earlier and directly contributing to revenue pipeline growth.
  • Integrated Edlink APIs to sync rosters, schedules, and user data, cutting manual onboarding by 53% and improving teacher deployment efficiency.
  • Built BI dashboards in Power BI to track engagement metrics across 10K+ students and teachers, boosting strategic planning accuracy by 20%.
  • Delivered classroom performance insights (attendance, engagement, teacher effectiveness) that influenced executive decisions on staffing and curriculum design.
Azure · Snowflake · Python · Power BI · Data Engineering · ETL

VMware

Data Engineer II

Jan 2023 – Dec 2023 · 11 mos · Remote

  • Led the design and deployment of an AWS serverless architecture (EMR, S3, MWAA, RDS, Lambda, API Gateway, IAM, Secrets Manager) for a web application, delivering 99.9% uptime and reducing infrastructure costs by 32% compared to EC2-based setups.
  • Engineered automated data ingestion pipelines from Zoom, Slack, and Microsoft 365 sources (OneDrive, SharePoint, Outlook, Teams) using Informatica, ensuring 100% SLA compliance on daily and monthly loads.
  • Developed anomaly-detection scripts to flag issues like missing data or null primary keys, preventing 15+ critical data quality incidents per month and improving trust in downstream analytics.
AWS · Informatica · Data Quality · Data Engineering

ZS

Data Engineer

Jan 2021 – Jan 2023 · 2 yrs · Gurugram, Haryana, India · Hybrid

  • Processed and standardized Real-World Data (RWD) for global pharma leaders (Johnson & Johnson, Sanofi, Gilead), improving downstream analytics accuracy by 25% and accelerating insights for clinical and commercial teams.
  • Architected and optimized data warehouse solutions with PySpark, Hive, MySQL, and Python, reducing query runtime from hours to minutes and enabling faster business reporting.
  • Automated recurring data transformation pipelines using Airflow, which cut manual effort by 40% and reduced delivery time of monthly/quarterly feeds by 3–4 days.
  • Deployed pipelines on AWS (EC2, EMR, S3, Athena) to process billions of healthcare records, achieving 99.9% pipeline reliability while lowering compute costs by ~20%.
PySpark · Airflow · AWS · Data Engineering · Data Architecture

Education

Birla Institute of Technology, Mesra

Bachelor of Engineering - BE — Computer Science

Jan 2017 – Jan 2021

Ryan International School, Mansarovar

Secondary (X) — CBSE

Jan 2004 – Aug 2016
