Rakshit Mathur

Data Scientist

Rajasthan, India · 5 yrs 1 mo experience

Key Highlights

Expert in optimizing data pipelines for performance.
Proven track record in AWS and Azure data solutions.
Strong background in building reliable ETL processes.

Stackforce AI infers this person is a Data Engineering expert in SaaS with a focus on performance optimization and scalable data solutions.

Skills

Core Skills

Data Architecture · AWS · Data Engineering · ETL

Other Skills

Terraform · RDS · Data Governance · Azure · Snowflake · Python · Power BI · Informatica · Data Quality · PySpark · Airflow · Docker · Git · Apache Kafka · MySQL

About

I’m a Data Engineer with 5+ years of experience turning messy data pipelines into fast, reliable systems. I’ve designed and optimized data platforms on AWS, Azure, and Snowflake, built streaming jobs in PySpark, and managed petabytes of data powering analytics, ML, and business decisions.

What I love most is solving the “why is this slow?” kinds of problems - the ones that need both engineering and detective skills.

My work has included:
• Designing end-to-end data architectures (batch + real-time)
• Building ETL pipelines using Airflow, ADF, and Glue
• Creating cost-optimized data lakes and warehouses
• Writing high-performance SQL and PySpark transformations
• Enabling business teams with reliable dashboards and metrics

I’m deeply curious about performance optimization, scalable design, and bridging the gap between raw data and actionable insight. Always up for discussing data modeling, pipeline design, or just that one Spark job that mysteriously ran forever.

Experience

Total Experience: 5 yrs 1 mo

Average Tenure: 1 yr 6 mos

Current Experience: 8 mos

Hook

Senior Data Engineer

Oct 2025 – Present · 8 mos · Remote

Led migration of monolithic materialized views into modular, production-grade stored procedures, improving maintainability and reducing refresh complexity.
Stabilized critical RDS infrastructure by identifying and resolving high-connection lockups, preventing cascading ingestion failures.
Redesigned ingestion workflows to eliminate dangling Lambda connections and improve system resilience under traffic spikes.
Modularized Terraform data-platform repos to improve infra clarity, scalability, and deployment reliability.
Implemented advanced RDS monitoring and VPC-level diagnostics to proactively detect production bottlenecks.
Designed structured data layers (stage → transform → facts) to improve data governance and analytics reliability.
Partner closely with Product to translate PRDs (e.g., Jams community features) into scalable data models and event tracking systems.

Terraform · RDS · AWS · Data Governance · Data Architecture

Elevate K-12

Data Engineer II

Feb 2024 – Sep 2025 · 1 yr 7 mos · Remote

Designed and scaled Azure + Snowflake data pipelines, processing 5M+ daily records with 35% faster runtimes, ensuring reliable reporting for operations and leadership.
Automated job-vacancy extraction from K-12 boards (AppliTrack, HireTrue, TedK12) with Python (Selenium, Scrapy), enabling the business team to identify hiring schools 2 weeks earlier and directly contributing to revenue pipeline growth.
Integrated Edlink APIs to sync rosters, schedules, and user data, cutting manual onboarding by 53% and improving teacher deployment efficiency.
Built BI dashboards in Power BI to track engagement metrics across 10K+ students and teachers, boosting strategic planning accuracy by 20%.
Delivered classroom performance insights (attendance, engagement, teacher effectiveness) that influenced executive decisions on staffing and curriculum design.

Azure · Snowflake · Python · Power BI · Data Engineering · ETL

VMware

Data Engineer II

Jan 2023 – Dec 2023 · 11 mos · Remote

Led the design and deployment of an AWS serverless architecture (EMR, S3, MWAA, RDS, Lambda, API Gateway, IAM, Secrets Manager) for a web application, delivering 99.9% uptime and reducing infrastructure costs by 32% compared to EC2-based setups.
Engineered automated data ingestion pipelines from Zoom, Slack, and Microsoft 365 sources (OneDrive, SharePoint, Outlook, Teams) using Informatica, ensuring 100% SLA compliance on daily and monthly loads.
Developed anomaly-detection scripts to flag issues like missing data or null primary keys, preventing 15+ critical data quality incidents per month and improving trust in downstream analytics.

AWS · Informatica · Data Quality · Data Engineering

ZS

Data Engineer

Jan 2021 – Jan 2023 · 2 yrs · Gurugram, Haryana, India · Hybrid

Processed and standardized Real-World Data (RWD) for global pharma leaders (Johnson & Johnson, Sanofi, Gilead), improving downstream analytics accuracy by 25% and accelerating insights for clinical and commercial teams.
Architected and optimized data warehouse solutions with PySpark, Hive, MySQL, and Python, reducing query runtime from hours to minutes and enabling faster business reporting.
Automated recurring data transformation pipelines using Airflow, which cut manual effort by 40% and reduced delivery time of monthly/quarterly feeds by 3–4 days.
Deployed pipelines on AWS (EC2, EMR, S3, Athena) to process billions of healthcare records, achieving 99.9% pipeline reliability while lowering compute costs by ~20%.

PySpark · Airflow · AWS · Data Engineering · Data Architecture