Saumya G.

Software Engineer

Seattle, Washington, United States · 8 yrs 7 mos experience

Highly Stable

Key Highlights

6+ years of experience in data engineering.
Expertise in building scalable backend systems and APIs.
Led AI-driven projects with significant customer impact.

Stackforce AI infers this person is a Data Engineer specializing in AI-driven solutions within the SaaS industry.

Contact

sg5290@nyu.edu LinkedIn

Skills

Core Skills

Data Engineering · Cloud Computing · Machine Learning · Software Development

Other Skills

Apache Spark · Python · SQL · Amazon Web Services (AWS) · Agile Methodologies · Java · Redshift · Data Architects · Spark · AWS · Python (Programming Language) · Optical Character Recognition (OCR) · OCR · Shell Scripting · C

About

I’m a Software Engineer specializing in data with 6+ years of experience building scalable backend systems, APIs, data architectures, and AI-driven solutions at Amazon and Morgan Stanley. I’ve led projects in distributed systems, compliance automation, and real-time monitoring, with a strong focus on reliability, cloud-native design, and customer impact. At Alexa Smart Home, I developed large-scale ETL pipelines using Python, SQL, Spark, and AWS, and collaborated with Data Scientists to support LLM-powered features for Alexa Plus. I also built real-time monitoring pipelines to track key business metrics while ensuring compliance. Previously, at Morgan Stanley, I developed automated document workflows utilizing Java and OCR, and built machine learning models to enhance signature verification accuracy. I bring together strong software engineering and machine learning skills with a passion for solving data problems that drive real product impact. I’m especially motivated by roles where I can own systems end-to-end, tackle complex challenges, and collaborate across teams.

Experience

Total Experience: 8 yrs 7 mos

Average Tenure: 2 yrs

Current Experience: 3 mos

Netflix

Engineer

Feb 2026 – Present · 3 mos

Amazon

Engineer

May 2022 – Jan 2026 · 3 yrs 8 mos · Seattle, Washington, United States · On-site

Worked with the Alexa Smart Home team, designing and developing scalable ETL pipelines using SQL, Spark, Python, EMR, Redshift, and other AWS technologies to support Alexa Plus services. This work enhanced conversational AI capabilities, particularly predictions and routines in Smart Home’s LLM initiative.
Designed scalable microservices and APIs using Python and AWS for Smart Home LLM latency tracking (Time-To-First-Token / Time-To-Full-Output), with real-time monitoring and alerting pipelines that helped reduce latency by 35%.
Led compliance initiatives for all Smart Home data stores to keep AI systems safe, secure, and compliant.
Designed an internal tool to manage a multi-petabyte Redshift data warehouse that accounts for 80% of all Smart Home datasets, leading to a 12% increase in efficiency and an 8% reduction in costs.
Analyzed large-scale data in the data lake, built end-to-end data pipelines, and delivered reporting solutions that gave actionable insights to stakeholders on the Alexa Smart Home, Devices, and Marketing teams.

Apache Spark · Python · SQL · Amazon Web Services (AWS) · Agile Methodologies · Java · +4

Morgan Stanley

Application Developer

Nov 2019 – Mar 2022 · 2 yrs 4 mos · New York City Metropolitan Area · On-site

Worked as an AI/ML Engineer on Optical Character Recognition and automated signature verification for financial forms and travel receipts, reducing manual effort by 40%.
Trained and deployed a Checkbox Detection CNN model; boosted accuracy by 25% by adding layers, batch normalization, and dropout using TensorFlow (GPU with CUDA/cuDNN).
Built and implemented a document classification system using TF-IDF and Support Vector Machines to accurately classify document types.
Designed and integrated real-time hedge allocation models for the Mortgages business, enabling a dynamic response to market changes.

Python (Programming Language) · Java · Machine Learning · Optical Character Recognition (OCR) · Spark · Software Development

NYU Center for Data Science

Adjunct for Big Data

Jan 2019 – May 2019 · 4 mos · New York City Metropolitan Area

- Tutored for Prof. Brian McFee, drawing on knowledge of big data technologies.

Courant Institute of Mathematical Sciences

Adjunct for Intro to Programming Language

Aug 2018 – Dec 2018 · 4 mos · New York City Metropolitan Area

- Tutored Python programming under Professor Adam Meyers, drawing on knowledge of Python and programming languages.

Viome

Scientific Research Intern

Jun 2018 – Aug 2018 · 2 mos · New York City Metropolitan Area

Implemented Action Plans in Java to enhance the recommendation engine that suggests specific diets and supplements to address users’ health conditions.
Optimized the knowledge base and database by writing evolution scripts in SQL.
Automated integration testing with shell scripts and wrote unit tests in Java to cover the large codebase.

NYU IT

Research Technology Specialist

Feb 2018 – May 2018 · 3 mos · New York City Metropolitan Area

Researched Big Data projects and analyzed data with Spark, Hive, and MapReduce on an HPC cluster.
Trained and instructed students on Apache Spark, Apache Hive, and MapReduce.
Worked on instantiating an OpenStack cloud computing environment.

Fractal

Engineer

Jul 2015 – Jun 2017 · 1 yr 11 mos · Mumbai Metropolitan Region

Developed a “Twitter Sentiment Analysis” capability using big data technologies (Apache Spark, Storm, Kafka, and Camel) with Java and a SQL database.
As a proof of concept, independently built machine learning and NLP models in Java to evaluate factors causing prospect/opportunity losses. Retraining the models increased efficiency by 44%.
Independently automated global taxonomy extraction as a proof of concept, then led the team as it became a full-fledged product for the proprietary harmonization platform.
Architected datasets and analyzed them with big data technologies such as Hive and Impala for visualization in Tableau. Automated the process with Oozie to provide recommendations for compensation and rebates.
Designed components for the proprietary product “Concordia” for data cleaning and mining.
Ported the framework from R to Hive, making it 11% faster for a Fortune 500 client.

Juniper Networks

Intern

Jan 2015 – Jun 2015 · 5 mos · Greater Bengaluru Area

Modularized and rewrote the proprietary debugger (Jdebug) used in Juniper OS in Python, improving its readability and efficiency.
Programmed an “include-what-you-use” tool that detected and removed unused libraries, making execution 8% faster.