Vamsi Krishna Chandaluri

Data Engineer

Jersey City, New Jersey, United States · 3 yrs 5 mos experience

Highly Stable

Key Highlights

Built scalable ETL pipelines processing terabytes of data daily.
Implemented CI/CD workflows reducing release times by 45%.
Developed ML models achieving up to 93% detection precision.

Stackforce AI infers this person is a Data Engineering and Machine Learning specialist in the SaaS industry.

Skills

Core Skills

Data Engineering, ETL, DevOps, Machine Learning, Data Analysis

Other Skills

Apache Spark, Scala, GCP Dataproc, Apache Airflow, Python, SQL, CI/CD, Git, Cloud Build, Jenkins, AWS, GCP, AWS CloudWatch, Robot Framework, Bayesian Optimization

About

Data Engineer with 5+ years of experience building scalable data platforms and distributed systems using Apache Spark, PySpark, and Apache Kafka. Experienced in designing ETL pipelines, real-time and batch data processing, and implementing data quality frameworks to ensure reliable and accurate data. Strong exposure to DevOps practices including CI/CD, Docker, Kubernetes, Terraform, and cloud deployments on AWS and GCP, along with production monitoring and performance tuning. Proficient in BigQuery, AWS Redshift, PostgreSQL, and Apache Airflow, with knowledge of QA processes such as automated testing and data validation to maintain system reliability.

Experience

Total Experience: 3 yrs 5 mos

Average Tenure: 3 yrs 5 mos

Current Experience

Walmart Global Tech

Data Engineer

May 2025 – Present · 1 yr 1 mo · Arkansas, United States · On-site

● Built distributed ETL pipelines using Scala and Apache Spark on GCP Dataproc and serverless environments, processing ~2–4 TB of structured and semi-structured data daily with ~30% faster batch completion times.
● Orchestrated production data workflows with Apache Airflow, managing 40+ DAGs and achieving ~99% on-time data availability for downstream analytics systems.
● Designed a unified data model integrating eCommerce and in-store datasets, improving cross-channel reporting consistency and reducing data reconciliation effort by ~35%.
● Developed Looker semantic models and dashboards consumed by 25+ business users, enabling faster insight generation and reducing manual reporting turnaround by ~40%.
● Implemented automated data validation and monitoring within Spark pipelines, detecting anomalies early and reducing data quality incidents by ~28%.
● Performed targeted data transformations and exploratory analysis using Python and SQL tools, accelerating root-cause analysis and validation cycles by ~20%.
● Partnered with architects and analysts during requirement gathering and design reviews, contributing to scalable data modeling decisions supporting 10+ downstream use cases.
● Established CI/CD workflows for ETL deployment using Git, Cloud Build, and Jenkins, cutting release rollout time by ~45% and improving version traceability.
● Designed agentic automation workflows to coordinate multi-step data operations such as validation, anomaly triage, and insight extraction, reducing manual investigation effort by ~25%.
● Built LangChain-based LLM utilities for querying pipeline metadata and summarizing dataset characteristics, improving developer productivity in data discovery tasks.
● Developed AI assistants supporting operational analytics and automated pipeline health checks, helping reduce mean time to detection of failures by ~22%.

Apache Spark, Scala, GCP Dataproc, Apache Airflow, Python, SQL (+6 more)

Viasat

2 roles

Data Engineer / DevOps & QA

Apr 2020 – Dec 2022 · 2 yrs 8 mos · India · On-site

● Ingested inflight connectivity databus streams, extracting operational logs and telemetry data, and loaded structured datasets into AWS S3 and GCP Cloud Storage for analytics and ML workflows.
● Developed Python utilities to parse, clean, and transform connectivity and diagnostic logs, enabling structured storage in BigQuery and Redshift, reducing manual preprocessing by ~35%.
● Automated log extraction and dataset preparation across AWS and GCP, accelerating ML experimentation and reporting.
● Built Python and Java automation scripts for CI/CD build, deployment, and environment validation, cutting manual operational effort by ~30%.
● Supported Jenkins-based CI/CD pipelines, improving deployment repeatability and reducing release preparation time by ~25%.
● Assisted with containerized deployments and environment configuration across AWS and GCP, enhancing cross-environment consistency.
● Implemented automated monitoring and log collection using AWS CloudWatch and GCP logging, reducing troubleshooting time by ~20%.
● Collaborated with platform teams to validate configurations, deployment readiness, and rollback procedures for inflight software updates.
● Automated service validation and operational checks using Python and Robot Framework, increasing release confidence and minimizing post-deployment issues.
● Developed Python-based test automation integrated with Robot Framework, boosting regression coverage and reducing manual testing by ~40%.
● Executed functional and integration testing in simulated AWS and GCP environments, enabling earlier defect detection.
● Built reusable automation utilities for log validation, API verification, and environment health checks, speeding defect investigation.
● Participated in defect tracking, test reporting, and release validation, resolving ~80% of issues within sprint timelines.
● Collaborated with development teams to reproduce issues, validate fixes, and confirm that CI/CD quality gates were met prior to deployment.

Python, AWS, GCP, Jenkins, CI/CD, AWS CloudWatch (+3 more)

Machine Learning Engineer

Jun 2019 – Mar 2020 · 9 mos · India · On-site

● Extracted and prepared telemetry data from internal databus streams, creating curated datasets for anomaly detection model training and evaluation across inflight connectivity metrics.
● Developed ML-based anomaly detection models using statistical and machine learning techniques, improving detection precision to ~90–93%.
● Applied Bayesian optimization for hyperparameter tuning, enhancing model performance and stability while reducing manual tuning effort.
● Accelerated model tuning workflows with Python multiprocessing, cutting experimentation runtime by ~40% and enabling faster ML iteration cycles.
● Performed feature engineering and data preprocessing on large-scale metrics, improving signal quality and reducing false positives by ~20%.
● Supported exploratory data analysis and validation of anomaly patterns, helping engineering teams understand connectivity behavior and recurring failure scenarios.
● Leveraged AWS EC2 for scalable model experimentation and batch processing workloads, facilitating efficient training and evaluation.
● Stored processed datasets and intermediate outputs in AWS S3 and GCP Cloud Storage, improving data accessibility and collaboration across ML and analytics teams.
● Utilized BigQuery for structured data warehousing and analytical queries, reducing manual data preparation time for model evaluation by ~30%.
● Developed reusable Python utilities for dataset extraction, transformation, and validation, improving repeatability of ML workflows.
● Collaborated with data engineers and platform teams to ensure reliable data availability and consistency across ML pipelines and warehouse environments.

Python, AWS, GCP, Bayesian Optimization, Machine Learning

Udbhata Technologies Private Limited

Tech Intern/Data Analyst

May 2018 – Aug 2018 · 3 mos · India

● Developed a Python-based text extraction service to parse and process PDF documents, reducing manual document processing effort by ~40%.
● Built reusable parsing utilities and exposed them as a lightweight service, enhancing accessibility of extracted data for downstream analytics workflows.
● Implemented stock price prediction experiments using Monte Carlo simulation in R/RStudio, enabling probabilistic forecasting of market trends for analytical research.
● Performed data preprocessing and statistical analysis for simulation inputs, improving prediction reliability and minimizing data inconsistencies.
● Contributed to ML initiatives for GRC (Governance, Risk, and Compliance) risk prediction, preparing datasets and training models to identify potential organizational risk indicators.
● Leveraged AWS EC2 and S3 for scalable model experimentation, data storage, and batch processing, supporting efficient ML workload execution.
● Conducted exploratory data analysis and feature preparation for GRC datasets, improving signal quality and aiding development of predictive risk scoring models.
● Developed ad-hoc application components using Angular and Java to support internal tooling and data visualization, enhancing usability of analytical outputs.
● Assisted in debugging, testing, and enhancement of application modules, contributing to improved stability and smoother feature integration cycles.

Python, R, AWS, Data Analysis