Gayathri S.

Data Engineer

Albany, New York, United States · 3 yrs 6 mos experience
AI Enabled · AI ML Practitioner

Key Highlights

  • Built scalable data pipelines improving data availability by 35%.
  • Automated data ingestion achieving 99.9% pipeline reliability.
  • Developed dimensional models enhancing query performance by 50%.

Skills

Core Skills

Data Engineering · Cloud Computing · Data Analysis

Other Skills

Apache Spark · Kafka · AWS Glue · Lambda · Step Functions · Delta Lake · SQL · PySpark · GitHub Actions · Terraform · AWS S3 · Python · Snowflake · dbt · Hive

About

I’m a Data Engineer with 3+ years of experience building reliable, scalable data pipelines across insurance, public sector, and ad-tech domains. I specialize in turning raw, messy data into analytics-ready datasets that business and analytics teams can trust.

Currently, I work on batch and streaming pipelines using Spark, Kafka, and AWS (S3, Glue, Lambda, Step Functions), supporting actuarial, risk, and operations teams with high-quality data at scale. I’ve built and maintained data lakes with Delta Lake, developed dimensional models in Snowflake, and improved query performance and pipeline reliability through optimization and automation.

Previously, I worked as a Data Analyst in public sector and ad-tech environments, where I designed ELT workflows with dbt, optimized large SQL workloads, and implemented data validation frameworks to support reporting, compliance, and decision-making. I’ve handled multi-terabyte datasets, integrated data from APIs and SFTP sources, and reduced manual reporting through automation with Airflow and Python.

What I bring:

  • Strong hands-on experience with Python, SQL, Spark, Kafka, and AWS
  • Solid understanding of data modeling, ETL/ELT, and analytics engineering
  • Focus on data quality, reliability, and performance
  • Ability to translate business requirements into scalable data solutions

I enjoy working at the intersection of engineering and analytics, building systems that make data accessible, accurate, and useful. I’m always open to opportunities where I can grow as a data engineer and work on impactful data products.

📫 Feel free to connect if you’d like to talk about data engineering, analytics, or cloud data platforms. 📞 +1 (802)531-0104

Experience

Metlife

Data Analyst/ Analytics Engineer

Aug 2025 - Present · 7 mos

  • Designed and maintained batch and streaming data pipelines using Apache Spark, Kafka, AWS Glue, Lambda, and Step Functions, improving data availability for actuarial and risk teams by 35%.
  • Built and managed an S3-based data lake with Delta Lake tables to centralize structured and semi-structured insurance data, cutting reconciliation efforts by 40%.
  • Automated ingestion from 15+ internal source systems, achieving 99.9% pipeline reliability for BI and analytics consumers.
  • Optimized Spark SQL and PySpark jobs through partitioning and execution plan tuning, reducing runtimes by 30–45% and accelerating dashboard refresh cycles.
  • Developed dimensional data models in Snowflake for policy and claims analytics, improving ad-hoc query performance by 50% for finance and operations teams.
  • Implemented data quality checks using Great Expectations, reducing downstream data issues by 25% and supporting compliance initiatives.
  • Supported CI/CD for data pipelines using GitHub Actions and Terraform, ensuring consistent deployments across environments.
Apache Spark · Kafka · AWS Glue · Lambda · Step Functions · Delta Lake +6
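The data quality checks mentioned above can be illustrated with a minimal, hand-rolled sketch. This is not the actual Great Expectations suite used at MetLife; the column names (`policy_id`, `claim_id`, `claim_amount`) and rules are hypothetical examples of the kind of expectations such checks enforce:

```python
# Minimal data-quality check sketch for claim records before they reach
# downstream consumers. Column names and rules are hypothetical.
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    passed: bool

def run_checks(rows: list[dict]) -> list[CheckResult]:
    results = []
    # Expect no null policy IDs (analogous to expect_column_values_to_not_be_null).
    results.append(CheckResult(
        "policy_id_not_null",
        all(r.get("policy_id") is not None for r in rows),
    ))
    # Expect claim amounts to be non-negative.
    results.append(CheckResult(
        "claim_amount_non_negative",
        all(r.get("claim_amount", 0) >= 0 for r in rows),
    ))
    # Expect claim IDs to be unique across the batch.
    claim_ids = [r.get("claim_id") for r in rows]
    results.append(CheckResult(
        "claim_id_unique",
        len(claim_ids) == len(set(claim_ids)),
    ))
    return results

rows = [
    {"claim_id": 1, "policy_id": "P-100", "claim_amount": 250.0},
    {"claim_id": 2, "policy_id": None, "claim_amount": -10.0},
]
failed = [c.name for c in run_checks(rows) if not c.passed]
print(failed)  # the null policy_id and negative amount checks fail
```

Failing checks like these would typically block a pipeline run or raise an alert before bad records reach BI consumers.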

New York State Division of Criminal Justice Services

Data Analyst

Sep 2024 - May 2025 · 8 mos

  • Built end-to-end analytical pipelines using SQL, Python, Snowflake, and dbt to deliver curated datasets for justice and public safety reporting.
  • Developed modular ELT workflows with dbt, reducing transformation duplication by 35% across arrest, court, and corrections data.
  • Optimized complex SQL queries on multi-terabyte datasets, improving dashboard refresh times by 40% for executive and operational users.
  • Designed incremental and historical data models to support monthly, quarterly, and annual statutory reporting with consistent metrics for audits.
  • Integrated data from APIs and secure SFTP feeds, eliminating 20+ hours/month of manual data preparation.
  • Implemented Python- and SQL-based validation checks, reducing reporting discrepancies by 30% and improving audit readiness.
  • Documented datasets and business definitions to enable self-service analytics and reduce ad-hoc support requests.
SQL · Python · Snowflake · dbt · Data Analysis · Data Engineering
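The incremental data models described above can be sketched in miniature. This is an illustrative Python merge, not the actual dbt models; the key and timestamp field names are hypothetical, and the core idea is the same one dbt incremental materializations implement: apply a batch of new or updated records on top of an existing snapshot, keeping the latest version of each key:

```python
# Sketch of an incremental-model merge: newer rows overwrite older ones by
# key, and replayed batches are a no-op (idempotent). Field names are
# hypothetical illustrations.
def incremental_merge(existing: list[dict], batch: list[dict],
                      key: str = "record_id",
                      updated_at: str = "updated_at") -> list[dict]:
    merged = {r[key]: r for r in existing}
    for row in batch:
        current = merged.get(row[key])
        # Only overwrite when the incoming row is newer.
        if current is None or row[updated_at] > current[updated_at]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

existing = [
    {"record_id": 1, "status": "open", "updated_at": "2025-01-01"},
    {"record_id": 2, "status": "open", "updated_at": "2025-01-01"},
]
batch = [
    {"record_id": 2, "status": "closed", "updated_at": "2025-02-01"},
    {"record_id": 3, "status": "open", "updated_at": "2025-02-01"},
]
result = incremental_merge(existing, batch)
```

Because only rows newer than the snapshot are applied, re-running the same batch after a failure produces the same result, which is what makes monthly and quarterly statutory reports reproducible for audits.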

Tata Consultancy Services

Data Analyst

May 2021 - Aug 2023 · 2 yrs 3 mos

  • Analyzed large-scale mobile advertising data using SQL, Hive, and Python, generating insights from 100M+ daily ad impressions.
  • Built automated ETL pipelines with Hadoop and Apache Spark, reducing reporting latency by 45% for sales and marketing teams.
  • Designed fact and dimension tables to standardize key ad-tech metrics (CTR, CPA, ROAS) across analytics teams.
  • Tuned Spark jobs by optimizing executor settings and partition strategies, lowering compute costs by 20%.
  • Developed Python-based anomaly detection scripts to identify traffic and revenue spikes, enabling faster investigations by fraud and ad quality teams.
  • Partnered with product and business stakeholders to convert requirements into SQL-driven dashboards for campaign optimization.
  • Automated recurring reports using Airflow DAGs, saving 15 hours per week of manual reporting effort.
SQL · Hive · Python · Hadoop · Apache Spark · Data Analysis +1
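The anomaly detection scripts mentioned above can be sketched with a simple z-score spike detector over daily impression counts. This is an assumed illustration of the general technique, not the production logic; the threshold and sample data are invented:

```python
# Sketch of a z-score spike detector over a daily metric series.
# Threshold and data are illustrative, not production values.
from statistics import mean, stdev

def detect_spikes(series: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values deviating more than `threshold` standard
    deviations from the series mean."""
    if len(series) < 2:
        return []
    mu, sigma = mean(series), stdev(series)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(series) if abs(v - mu) / sigma > threshold]

daily_impressions = [100, 102, 98, 101, 99, 500, 100, 97]
print(detect_spikes(daily_impressions, threshold=2.0))  # [5]
```

Flagged indices would then be routed to fraud and ad quality teams for investigation, turning a manual eyeball check into an automated alert.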

Education

University at Albany

Master of Science in Data Science

Aug 2023 - May 2025
