Sai Sneha Chittiboyina

Software Engineer

United States · 1 yr 5 mos experience
AI Enabled · AI/ML Practitioner

Key Highlights

  • 10+ years in data engineering and database consulting.
  • Expert in scalable cloud-native data architectures.
  • Proven track record in healthcare data compliance.
Stackforce AI infers this person is a Data Engineering expert in the Healthcare and Logistics sectors.

Skills

Core Skills

Data Engineering · MongoDB · Big Data Engineering · AWS · ETL Development

Other Skills

Dashboards · AWS SageMaker · Hadoop · Docker · Google Kubernetes Engine (GKE) · Git · Kinesis Data Streams · AWS Glue · Lambda · Redshift · Aurora PostgreSQL · PySpark · Snowflake · Data Pipeline · AWS Identity and Access Management (AWS IAM)

About

I am a Data Engineer and Database Consultant with 10+ years of experience designing and optimizing scalable, cloud-native data architectures across AWS, Azure, and GCP. I specialize in big data processing, real-time analytics, and ETL development using Apache Spark, Hadoop, Airflow, and Databricks.

As a Database Consultant, I focus on MongoDB administration, schema design, query optimization, replication, sharding, and backup/restore strategies for enterprise-scale healthcare applications. I design, implement, and secure MongoDB databases with advanced authentication, authorization, encryption, and compliance-driven access controls to safeguard PHI and PII under HIPAA, HITRUST, and GDPR standards. I also have a strong background in integrating MongoDB with the AWS and GCP cloud ecosystems, leveraging services such as S3, Glue, EMR, Redshift, BigQuery, Dataflow, and Pub/Sub to deliver hybrid, scalable, cloud-native data solutions.

I have built and optimized data lake and warehouse solutions with Snowflake, Redshift, Synapse, and BigQuery, applying best practices in indexing, partitioning, and query optimization. My expertise extends to real-time streaming with Kafka, Spark Structured Streaming, Flink, and Delta Lake, enabling low-latency, event-driven processing (see the streaming sketch below). On the cloud side, I design scalable data workflows with AWS Glue, ADF, and GCP Dataflow, integrating NoSQL and RDBMS systems, and I implement CI/CD pipelines with Terraform, Jenkins, Docker, and Kubernetes for automated deployments.
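As a concrete illustration of the streaming pattern mentioned above, here is a minimal PySpark Structured Streaming sketch that reads JSON events from Kafka and appends them to a Delta Lake table. The broker address, topic name, schema, and S3 paths are hypothetical placeholders, and the job assumes a Spark runtime with the Kafka and Delta connectors available; it is not taken from any project described in this profile.

```python
# Minimal sketch of a low-latency Kafka -> Delta Lake stream with
# Spark Structured Streaming. Broker, topic, schema, and paths are
# illustrative placeholders, not values from this profile.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("event-stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
    # Kafka delivers bytes; decode and parse the JSON payload.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "s3://bucket/checkpoints/events")  # hypothetical
    .outputMode("append")
    .start("s3://bucket/delta/events")  # hypothetical table path
)
query.awaitTermination()
```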

Experience

Total Experience: 1 yr 5 mos
Average Tenure: 1 yr 5 mos
Current Experience: --

Cigna Healthcare

Senior Data Engineer | MongoDB

Jun 2023 – Present · 2 yrs 11 mos · New Jersey, United States · Remote

Dashboards · AWS SageMaker · Data Engineering · MongoDB

BNY Mellon Investments

Senior Big Data Engineer

Aug 2020 – May 2023 · 2 yrs 9 mos · New York, United States · On-site

Hadoop · AWS SageMaker · Big Data Engineering

Costco Wholesale

Senior Big Data Engineer

Oct 2017 – Jul 2020 · 2 yrs 9 mos · Washington, United States · On-site

Docker · Google Kubernetes Engine (GKE) · Big Data Engineering

UPS

Senior Data Engineer

Dec 2015 – Sep 2017 · 1 yr 9 mos · Atlanta, Georgia, United States · On-site

  • At UPS, I designed and developed real-time and batch ingestion pipelines on AWS, capturing millions of daily package scans, IoT telematics, and GPS events. Real-time pipelines used Kinesis Data Streams, Firehose, and Lambda, landing data into S3, while batch pipelines leveraged AWS Glue (PySpark), Step Functions, and Lambda to consolidate transportation, warehouse, and route-planning data into a centralized S3 data lake with schema-evolution handling (a minimal Lambda ingestion sketch follows this list).
  • I modeled curated datasets in Redshift and Aurora PostgreSQL using star and snowflake schemas, supporting reporting on transportation costs, service-level KPIs, and fleet utilization. Spark ETL pipelines on EMR with Hive applied transformations, custom UDFs in Scala/Python, and enrichment rules to standardize and cleanse delivery event streams. Partitioning, bucketing, ZSTD compression, and Parquet/ORC formats optimized storage and query performance (see the PySpark enrichment sketch below).
  • Workflows were orchestrated using MWAA (Managed Workflows for Apache Airflow) and Step Functions, with DAG dependencies, backfilling, and failure recovery via SNS alerts (see the Airflow DAG sketch below). Data quality frameworks built with PyDeequ, Great Expectations, and Python scripts ensured completeness, timeliness, and anomaly checks. Feature pipelines were delivered into SageMaker Feature Store for ML models predicting delivery ETAs, demand, and route optimization.
  • I built dashboards with Kinesis Data Analytics and QuickSight, providing visibility into deliveries, warehouse bottlenecks, and exceptions. CI/CD pipelines were automated using CodePipeline, CodeBuild, Terraform, and GitHub Actions. Governance was enforced with Lake Formation, IAM roles, Redshift security, and S3 encryption (GDPR/CCPA). EMR clusters were tuned for performance and cost using auto-scaling, spot instances, and Spark optimizations. Together, these efforts delivered scalable, automated, and secure cloud-native data pipelines that improved operational efficiency, analytics reliability, and real-time visibility across UPS logistics operations.
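The following sketch illustrates the real-time ingestion pattern from the first bullet above: an AWS Lambda handler consuming a Kinesis Data Streams batch and landing newline-delimited JSON in S3. The bucket name, key layout, and field names are hypothetical assumptions, not code from the UPS project.

```python
# Sketch of a Lambda consumer for Kinesis Data Streams that lands raw
# package-scan events in S3. Bucket name and key layout are hypothetical;
# this illustrates the pattern, not the actual production code.
import base64
import json
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "example-scan-events-raw"  # hypothetical bucket

def handler(event, context):
    # Kinesis delivers records base64-encoded inside the Lambda event.
    records = [
        json.loads(base64.b64decode(r["kinesis"]["data"]))
        for r in event["Records"]
    ]
    # Write the micro-batch as newline-delimited JSON, keyed by date so
    # downstream Glue/Athena jobs can prune partitions.
    first_ts = records[0].get("scan_ts", "unknown") if records else "unknown"
    key = f"scans/dt={first_ts[:10]}/{uuid.uuid4()}.json"
    body = "\n".join(json.dumps(r) for r in records)
    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
    return {"written": len(records), "key": key}
```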
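Next, a minimal PySpark sketch of the batch enrichment step described in the second bullet: a Python UDF standardizes scanner status codes, and the result is written as date-partitioned, ZSTD-compressed Parquet. Paths, column names, and the code mapping are illustrative assumptions.

```python
# Sketch of a batch enrichment job: a Python UDF maps raw scanner codes
# to a canonical vocabulary, then writes date-partitioned Parquet with
# ZSTD compression. Paths and columns are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col, to_date
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("delivery-etl").getOrCreate()

@udf(returnType=StringType())
def normalize_status(raw):
    # Example enrichment rule: map free-text scanner codes to a small
    # canonical vocabulary (mapping is hypothetical).
    mapping = {"DLVD": "delivered", "OFD": "out_for_delivery", "EXC": "exception"}
    return mapping.get((raw or "").strip().upper(), "unknown")

events = spark.read.json("s3://bucket/raw/scans/")  # hypothetical input path

(
    events
    .withColumn("status", normalize_status(col("raw_status")))
    .withColumn("event_date", to_date(col("scan_ts")))
    .write.mode("overwrite")
    .partitionBy("event_date")               # enables partition pruning
    .option("compression", "zstd")           # ZSTD, as described above
    .parquet("s3://bucket/curated/scans/")   # hypothetical output path
)
```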
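Finally, a rough sketch of the orchestration pattern from the third bullet: an Airflow DAG with ordered task dependencies, catchup enabled for backfilling, and an SNS notification on task failure. The topic ARN, task bodies, and schedule are hypothetical, and the `schedule` argument assumes Airflow 2.4+ (older versions use `schedule_interval`).

```python
# Sketch of an MWAA-style Airflow DAG with ordered dependencies and an
# SNS alert on task failure. Topic ARN, task logic, and schedule are
# hypothetical placeholders.
from datetime import datetime
import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"  # hypothetical

def alert_on_failure(context):
    # Publish a short failure notice so on-call engineers are paged.
    boto3.client("sns").publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Airflow task failed",
        Message=f"{context['task_instance'].task_id} failed in {context['dag'].dag_id}",
    )

def ingest(): ...        # placeholder task bodies
def transform(): ...
def quality_check(): ...

with DAG(
    dag_id="logistics_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=True,  # allows backfilling of past runs
    default_args={"on_failure_callback": alert_on_failure},
) as dag:
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="quality_check", python_callable=quality_check)
    t1 >> t2 >> t3  # DAG dependencies: ingest, then transform, then checks
```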
AWS SageMaker · Git · Data Engineering · AWS

CEVA Logistics

ETL Developer

May 2014 – Oct 2015 · 1 yr 5 mos · India · On-site

  • At CEVA Logistics, I designed and optimized large-scale data engineering solutions on AWS to support logistics and supply chain analytics. I analyzed ETL pipelines, reviewed SQL queries, and tuned Amazon Redshift workload management queues to remove inefficiencies, achieving up to 40% faster runtimes. By strategically applying distribution/sort keys, compression encoding, and partition pruning, I improved scalability and reduced latency for millions of daily package events (see the Redshift table sketch after this list).
  • I built and supported enterprise-grade data warehouses on Redshift and Snowflake, designing dimensional models with star/snowflake schemas and implementing Slowly Changing Dimensions (SCD Type 2) for historical accuracy (see the SCD2 sketch below). Using Amazon S3, AWS Glue (PySpark), and Redshift Spectrum, I developed automated pipelines integrating structured and semi-structured data. Orchestration with AWS Step Functions and Lambda reduced manual effort, while PyDeequ, Great Expectations, and CloudWatch monitoring with SNS/Slack alerts ensured data quality and SLA compliance.
  • I consolidated data ingestion across transactional systems, APIs, and legacy databases with AWS Glue, Informatica, and AWS Data Pipeline, standardizing processes for reliability. Security and governance were enforced using IAM roles, RBAC, and KMS encryption, ensuring compliance with GDPR/CCPA.
  • I also optimized query performance in Redshift and Snowflake through indexing, clustering, and compression, enabling interactive analytics in Amazon QuickSight dashboards with automated refreshes. Automating Informatica sessions and Glue workflows with Lambda and Step Functions improved resilience and scalability, with CloudWatch/X-Ray logging for monitoring.
  • These efforts delivered a secure, high-performance, automated cloud-native data ecosystem that streamlined ingestion, accelerated analytics, reduced costs, and improved operational visibility across the logistics network.
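The sketch below illustrates the Redshift tuning levers named in the first bullet: a fact table declared with a distribution key, a compound sort key, and per-column compression encoding, created over a psycopg2 connection. The cluster endpoint, credentials, and schema are hypothetical placeholders.

```python
# Sketch of Redshift table design for the tuning described above: the
# distribution key co-locates rows for large joins, and the compound
# sort key lets time-range filters prune blocks. All names, the cluster
# endpoint, and credentials are hypothetical.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS fact_package_event (
    event_id     BIGINT,
    package_id   BIGINT,
    event_ts     TIMESTAMP,
    status_code  VARCHAR(16) ENCODE zstd,  -- column compression encoding
    facility_id  INT
)
DISTKEY (package_id)           -- co-locate rows joined on package_id
COMPOUND SORTKEY (event_ts);   -- prune blocks on time-range filters
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
    dbname="analytics", user="etl_user", password="example", port=5439,
)
with conn, conn.cursor() as cur:
    cur.execute(DDL)
```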
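And a minimal PySpark sketch of the SCD Type 2 pattern from the second bullet: rows whose tracked attribute changed are closed out with an end date, and fresh current versions are inserted. Table paths, the tracked column, and the business key are illustrative assumptions, and the staging snapshot is assumed to share its attribute columns with the dimension.

```python
# Sketch of an SCD Type 2 refresh in PySpark: expire changed rows and
# insert new current versions. Paths, keys, and the tracked attribute
# are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_date, lit

spark = SparkSession.builder.appName("scd2").getOrCreate()

dim = spark.read.parquet("s3://bucket/dim_customer/")   # existing dimension
stg = spark.read.parquet("s3://bucket/stg_customer/")   # today's snapshot

current = dim.filter(col("is_current"))
changed_keys = (
    current.alias("d")
    .join(stg.alias("s"), "customer_id")
    .filter(col("d.address") != col("s.address"))  # tracked attribute
    .select("customer_id")
)

# 1) Close out the current rows whose tracked attribute changed.
expired = (
    current.join(changed_keys, "customer_id")
    .withColumn("is_current", lit(False))
    .withColumn("end_date", current_date())
)

# 2) Insert fresh current versions of the changed rows.
inserts = (
    stg.join(changed_keys, "customer_id")
    .withColumn("is_current", lit(True))
    .withColumn("start_date", current_date())
    .withColumn("end_date", lit(None).cast("date"))
)

unchanged = dim.join(changed_keys, "customer_id", "left_anti")
new_dim = unchanged.unionByName(expired).unionByName(inserts)
new_dim.write.mode("overwrite").parquet("s3://bucket/dim_customer_new/")
```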
AWS SageMaker · PySpark · ETL Development · Data Engineering
