Joseph M.

Data Engineer

New York, New York, United States
12 yrs 3 mos experience

Key Highlights

  • Built scalable data platforms processing exabytes of data.
  • Reduced data latency from 24 hours to 6 hours.
  • Automated ETL pipelines, saving significant engineering hours.

Skills

Core Skills

Data Engineering · Cloud Computing · Data Quality · Real-time Data Processing · Data Science

Other Skills

Azure Databricks · Data Pipeline · Data Migration · System Design · Fivetran · DBT · Great Expectations · k8s · Apache Storm · AWS DynamoDB · Spark · Data Modeling · API Development · ETL · Automation

About

Over the last decade, I've built highly scalable distributed data platforms and helped companies scale to processing multiple exabytes of data. My mission is to bring the software practices followed by top tech companies to data engineering and help data engineers level up. I help data engineers land high-paying tech jobs and significantly upskill themselves.

Newsletter: https://www.startdataengineering.com/news-letter/
Free Data Engineering 101 Program: https://www.startdataengineering.com/email-course/
Twitter: @startdataeng
YouTube: @startdataengineering

Experience

12 yrs 3 mos
Total Experience
2 yrs
Average Tenure
1 yr 4 mos
Current Experience

Netflix

Senior Data Engineer

Jan 2025 – Present · 1 yr 4 mos · New York City Metropolitan Area · On-site

  • Working on the Data Engineering (Ads) team.

LinkedIn

Senior Data Engineer

May 2022 – Mar 2024 · 1 yr 10 mos

  • Worked on migrating data pipelines from legacy in-house system to Azure Databricks, reducing data latency from 24h to 6h.
  • Designed and built data quality systems to proactively catch issues, reducing user-reported issues by 60%.
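A proactive data-quality gate of this kind can be sketched as a set of named rules applied to every incoming batch before it reaches users. This is a minimal illustration only; the field names, rules, and thresholds here are hypothetical, not the actual system:

```python
# Minimal sketch of a rule-based data-quality gate.
# Field names and rules are illustrative, not the production system.

def check_batch(rows, rules):
    """Apply each named rule to every row; collect (row_index, rule_name) failures."""
    failures = []
    for i, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures.append((i, name))
    return failures

RULES = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
}

batch = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": -5.0},
]

issues = check_batch(batch, RULES)
# Row 1 fails both rules, so the pipeline can alert before users report the problem.
```

Catching the failure inside the pipeline is what turns a user-reported incident into an internal alert.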

Startdataengineering

Data Engineer & Founder

Mar 2020 – Present · 6 yrs 2 mos

Narrativ

Senior Software Engineer, Data

Jul 2019 – May 2022 · 2 yrs 10 mos · New York City Metropolitan Area

  • Designed and built source-of-truth fact and dimension tables and ELT infrastructure with Fivetran and DBT. Established best practices and conventions, enabling other teams to build their own data marts. This enabled data-freshness monitoring, common source-of-truth tables, CI/CD for data pipelines, and better data quality tests.
  • Designed and built a data validation system using Great Expectations and k8s tasks to enable API-driven validation of data, providing results via a UI. This led to a reduction in engineering time from about 3 days per sprint to less than 15 minutes of work for the end user (non-engineer).
  • Updated the product inventory ingestion pipeline to read new and updated product data from Snowflake instead of Postgres. This led to a reduction in data processing time from about 2 hours to less than 10 minutes.
  • Designed and built a data pipeline factory in Airflow, with config metadata that can be modified via REST APIs. This reduced engineering time from 2 days per sprint to less than 10 minutes of work for the end user (non-engineer).
  • Worked on real-time processing and enrichment of clickstream events in Apache Storm with AWS DynamoDB to prevent over- or under-spending of the allocated budget. This gave other systems accurate, up-to-date spend amounts.
  • Designed and built a cache data structure on Redis that enables fast lookup of bid metrics based on attributes of a click, refreshed via Airflow. This feature led to an increase in client spend of about $5M in the first 2 months.
  • Designed and built a data pipeline to consolidate similar products without unique IDs, using Word2Vec, Spark, Snowflake, and Airflow. This enabled bidding on additional products per auction, leading to a 9% increase in spend.
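The "pipeline factory driven by config metadata" pattern mentioned above can be sketched in plain Python: pipeline definitions live in data (in the real system, editable via REST APIs), and a factory turns each entry into runnable tasks. The config schema, step types, and names below are hypothetical, not the actual Airflow implementation:

```python
# Sketch of a config-driven pipeline factory: pipelines are described as
# metadata, and the factory builds executable task lists from that metadata.
# Step types and names are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[], str]

def extract(source):
    return lambda: f"extracted from {source}"

def load(target):
    return lambda: f"loaded into {target}"

STEP_BUILDERS = {"extract": extract, "load": load}

def build_pipeline(config):
    """Turn one config entry into an ordered list of runnable tasks."""
    return [
        Task(name=step["type"], run=STEP_BUILDERS[step["type"]](step["arg"]))
        for step in config["steps"]
    ]

config = {
    "pipeline": "orders_daily",
    "steps": [
        {"type": "extract", "arg": "orders_api"},
        {"type": "load", "arg": "warehouse.orders"},
    ],
}

pipeline = build_pipeline(config)
results = [task.run() for task in pipeline]
```

The payoff of the pattern is that adding or changing a pipeline becomes a metadata edit rather than an engineering task, which is what shrinks days of work per sprint to minutes.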

Annalect

Senior Data Engineer

Feb 2018 – Jun 2019 · 1 yr 4 mos · New York City Metropolitan Area

  • Designed and built data models for different types of TB-scale data, such as geolocation, clickstream, purchase, and viewership data.
  • Designed and built data pipelines using Spark, reducing data processing time from about 7 hours to less than 1 hour.
  • Designed and built APIs to enable application users to send data to multiple partners, giving users one central control platform instead of manually uploading data into each partner's portal.
  • Worked on migrating data from AWS Redshift to a properly partitioned dataset on S3. This enabled the use of AWS Redshift Spectrum, reducing warehouse cost by about 80%.
  • Set up Apache Airflow for data pipeline orchestration and scheduling, reducing data freshness and correctness issues by about 75%.
  • Worked on a DSL that lets application users join datasets visually, enabling end users to perform complex joins and aggregations.
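The cost saving from the S3 migration above comes from partition layout: writing records under date-keyed prefixes lets an engine like Redshift Spectrum prune partitions and scan only the dates a query filters on. A minimal sketch of that layout step, with hypothetical bucket, table, and field names:

```python
# Sketch of date-based partitioning for an S3 data lake. Laying records out
# under dt=YYYY-MM-DD prefixes enables partition pruning in query engines.
# Bucket, table, and field names are illustrative assumptions.
from collections import defaultdict

def partition_paths(records, bucket, table):
    """Group records by event date into S3-style partition prefixes."""
    parts = defaultdict(list)
    for rec in records:
        key = f"s3://{bucket}/{table}/dt={rec['event_date']}/"
        parts[key].append(rec)
    return dict(parts)

records = [
    {"event_date": "2019-01-01", "user": "a"},
    {"event_date": "2019-01-01", "user": "b"},
    {"event_date": "2019-01-02", "user": "c"},
]

layout = partition_paths(records, "analytics-lake", "clickstream")
# A query filtered to dt='2019-01-02' now reads one prefix instead of the whole table.
```

Scanning one day's prefix instead of the full dataset is what drives a per-query cost reduction on pay-per-byte-scanned engines.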

Hudson Data

Data Scientist

Apr 2016 – Feb 2018 · 1 yr 10 mos · New York City Metropolitan Area

  • Led a project to automate ETL pipelines for high availability of data, significantly reducing wait time for analyses.
  • Developed a graph analysis algorithm to capture fraud signals, resulting in savings of $5M.
  • Designed and deployed an ML pipeline in Python to help clients take immediate action on possibly fraudulent policies in the vehicle insurance industry.
  • Reduced the execution time of a query from 45 s to under 1 s by denormalizing the data.
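The denormalization trade behind that last speedup can be sketched simply: precompute the join once at write/ETL time so that query time becomes a single flat-record lookup. The schemas below are hypothetical, for illustration only:

```python
# Sketch of denormalization: merge dimension attributes into each fact record
# once, so queries do a constant-time lookup instead of a runtime join.
# Schemas and field names are illustrative assumptions.

customers = {101: {"name": "Acme", "state": "NY"}}
policies = [
    {"policy_id": "P1", "customer_id": 101, "premium": 500},
    {"policy_id": "P2", "customer_id": 101, "premium": 750},
]

# One-time denormalization pass (done at write/ETL time, not query time).
flat = {
    p["policy_id"]: {**p, **customers[p["customer_id"]]}
    for p in policies
}

# Query time: one dictionary read instead of a join.
record = flat["P2"]
```

The cost is duplicated customer attributes and the need to refresh them on change; the benefit is that read latency no longer depends on join size.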

Indus Valley Partners

Associate Software Developer

Feb 2015 – Mar 2016 · 1 yr 1 mo · New York

  • Developed software for hedge fund clients focusing on the commercial real estate sector, using C#, ASP.NET, TortoiseSVN, and MS SQL.
  • Developed custom applications using AngularJS.

New York City Transit

College Aide

Jun 2014 – Dec 2014 · 6 mos · New York City Metropolitan Area

  • Built a web application for MTA employees using Java, JavaScript, and the Jetty servlet engine.
  • The web application incorporated multiple Python scripts, Oracle stored procedures, and C++ executables.
  • The application was built on the Hibernate framework.

Polaris Financial Technology Limited

Consultant

Oct 2012 – Jun 2013 · 8 mos · Greater Chennai Area

  • Worked on a retail-banking website using technologies such as Java, EJB, JSP, and JavaScript.
  • Developed an automated error-logging system for the entire project using Java and log4j.

Education

Columbia University

Master's degree — Electrical Engineering

Jan 2013 – Jan 2014

Madras Institute of Technology Campus

Bachelor of Engineering (B.E.)

Jan 2008 – Jan 2012

St. Thomas MHSS
