Soham Ghosh

AI Researcher

Hyderabad, Telangana, India

7 yrs 7 mos experience

Most Likely To Switch

Highly Stable

Key Highlights

Expert in Big Data and Data Engineering.
Proven track record in optimizing procurement strategies.
Strong experience in developing data-driven solutions.

Stackforce AI infers this person is a Data Engineering expert in the E-commerce and SaaS industries.

Skills

Core Skills

Data Engineering, Big Data, Business Intelligence (BI)

Other Skills

Algorithms, Amazon Web Services (AWS), Apache Spark, Azure Experimentation Studio, Azure SQL, Big Data Analytics, Cosmos, Data Structures, Hadoop, Hive, MapReduce, Presentation Skills, PySpark, Python, QuickSight

About

Experienced Data Engineer with a demonstrated history of working in the internet industry. Skilled in Big Data Analytics, Apache Spark, Big Data, Amazon Web Services (AWS), and Algorithms. Strong business development professional with a Bachelor’s degree in Computer Science from Thapar Institute of Engineering and Technology.

Experience

7 yrs 7 mos

Total Experience

3 yrs 9 mos

Average Tenure

3 yrs 10 mos

Current Experience

Microsoft

2 roles

Senior Data & Applied Scientist, Windows & Devices (W+D) - Windows, Store & Developers

Promoted

Mar 2024 – Present · 2 yrs 2 mos · Hyderabad, Telangana, India

Working on the WINDOWS DEVELOPER 360 PLATFORM to understand developer pain points and build solutions that drive developer success.

Data & Applied Scientist II, Windows & Devices (W+D) - Microsoft Store

Jun 2022 – Feb 2024 · 1 yr 8 mos · Hyderabad, Telangana, India

• Developed the STORE EXPERIMENTATION PLATFORM for testing features such as search algorithms, feedback-loop personalisations and UI improvements for the Microsoft Store and Xbox Store by analysing clickstream telemetry data using Scope, Cosmos, Azure SQL, Synapse Analytics and Azure Experimentation Studio.

Scope, Cosmos, Azure SQL, Synapse Analytics, Azure Experimentation Studio, Data Engineering

Amazon

3 roles

Data Engineer, Amazon Seller Services - WorldWide Pricing (WWP)

Apr 2021 – May 2022 · 1 yr 1 mo

Developed the UNIFIED PRICE COMPARISON pipeline using SQL, PySpark and Redshift to measure the overall price competitiveness of products sold on Amazon by third-party (3P) merchants and Amazon Retail against competitors and to provide actionable insights. I upgraded the entire pipeline to ingest competitor classification data from a new tech system, resulting in a 5% increase in overall price competitiveness.
Created the S3P-ONCALL-DASHBOARD for real-time monitoring of metrics such as disk utilisation, table statistics, job run status, and Spectrum and Cradle costs for the entire S3P org. By tracking the complete upstream and downstream lineage, it spots failed job executions and raises alerts. The tool saves roughly 1.5 hours each day by eliminating the need to manually track more than 500 production jobs for failures and by highlighting actionable items for on-call.

SQL, PySpark, Redshift, Data Engineering, Big Data

Business Intelligence Engineer, Amazon Business - Financial Planning and Analysis (FP&A)

Jan 2020 – Mar 2021 · 1 yr 2 mos

Built the BEST SOURCING STRATEGY (BESSY) tool using SQL, Redshift and QuickSight, based on a decision tree classification model that optimises the procurement of goods by selecting the best inbound sourcing strategy for vendors and manufacturers across the EU. BESSY lowered procurement costs and increased inventory availability, resulting in faster delivery and lower price points for customers. This project contributed to a 10% increase in overall savings (EUR 2.5 billion) for FY20-21.
Expanded the GLOBAL LISTING service to three new marketplaces: the Netherlands, Poland and Sweden. Global Listing is a vendor self-service solution that allows vendors to expand their current catalogue choices to new marketplaces in the EU region without interacting with a vendor manager or Amazon service. The project assisted in the onboarding of 22 million products from current vendor catalogues into the newly launched marketplaces.
Created the FORTUNE tool for publishing monthly Profit & Loss (PnL) reports for multiple businesses across all product categories. The project aims to automate the entire process of preparing PnL statements by creating end-to-end data pipeline jobs.

SQL, Redshift, QuickSight, Business Intelligence (BI), Data Engineering

Big Data Engineer, Amazon Web Services (AWS) - Elastic MapReduce (EMR)

Jun 2018 – Dec 2019 · 1 yr 6 mos

Implemented the EMR LOG ANALYSIS tool using Python for quickly identifying and visualising issues such as cluster provisioning errors, throttling and system errors, memory/disk/CPU issues and network latency in the Elastic MapReduce (EMR) service by analysing Hadoop system logs, application logs and backend AWS infrastructure logs. The project helped reduce the service team's overall ticket response time from 2.5 days to 1 day.
Optimized job flows (Spark/Hive/MapReduce) by analysing YARN and application logs, focusing on partition analysis, caching techniques, serialization/compression methods and the YARN resource configuration.
Designed cloud architectures using open-source Hadoop platforms such as Hive/Hue, Spark and Presto, and Big Data services offered by AWS such as EMR, Athena, Redshift, S3, DynamoDB and Glue, to meet customer requirements.

Python, Hadoop, Spark, Hive, Big Data, Data Engineering