Souri Nath Datta

Senior Software Engineer

San Francisco, California, United States

17 yrs 9 mos experience

Most Likely To Switch: Highly Stable

Key Highlights

8 years of full lifecycle software development experience.
Expertise in Big Data technologies including Hadoop and Spark.
Active participant in patent filing and coding competitions.

Stackforce AI infers this person is a Big Data Engineer with expertise in distributed systems and cloud computing.

Skills

Core Skills

Apache Spark, Scala, Big Data, Hadoop

Other Skills

Data Pipelines, ETL, Distributed Graph Processing, Pig, MapReduce, Oozie, Scalability, Linux, Algorithms, Search, Python, Unix, Cloud Computing, Scrum, Distributed Systems

About

I have 8 years of full lifecycle software development experience, including 5 years of working with various Big Data technologies, from the Java API of Hadoop MapReduce, HDFS, Pig, Oozie, HBase, and YARN to more recent technologies like Apache Spark and Scala.

Teams I have been part of so far:
1. Yahoo Search Indexer backend
2. Data Acquisition and Extraction platform
3. Knowledge Graph platform

Other than this:
A. Represented Yahoo in multiple university coding competitions around India
B. Actively participate in patent filing and innovation/hacking challenges
C. Have experience with Agile methodologies and with being a scrum master

Experience

17 yrs 9 mos

Total Experience

3 yrs 6 mos

Average Tenure

4 yrs

Current Experience

Apple

Senior Software Engineer

May 2022 – Present · 4 yrs · Cupertino, California, United States

Zillow Group

Principal Software Engineer

Feb 2020 – Jun 2022 · 2 yrs 4 mos · San Francisco Bay Area

Trulia

2 roles

Staff Software Engineer

Feb 2019 – Feb 2020 · 1 yr · San Francisco Bay Area

Senior Big Data Engineer

Jan 2017 – Jan 2020 · 3 yrs · San Francisco Bay Area

Working on the Property Intelligence team in Trulia's Data Engineering group.

Yahoo

2 roles

Tech Yahoo, Senior Software Development Engineer

Mar 2016 – Dec 2016 · 9 mos · San Francisco Bay Area

Using Apache Spark and Scala to build the platform for Yahoo Knowledge Graph.
I have set up data pipelines for Wikipedia (incremental and batch mode) and Freebase, from setting up acquisition to ETL jobs that convert the raw data to a common data model/ontology (based on schema.org).
Currently working on distributed graph processing features of the Knowledge Graph platform, handling huge scale while optimizing performance.
Also managed a standalone Spark cluster of 100+ nodes, from doing weekly releases to maintaining the production environment and handling any issues that arise.

Apache Spark, Scala, Data Pipelines, ETL, Distributed Graph Processing