Omesh Patil

Software Engineer

Bengaluru, Karnataka, India · 18 yrs 6 mos experience
Highly Stable

Key Highlights

  • Over 15 years of experience in data engineering.
  • Expert in building data lakehouse platforms.
  • Proven track record of cost optimization in cloud environments.
Stackforce AI infers this person is a Data Engineering expert in FinTech and AdTech sectors.

Skills

Core Skills

Data Lakehouse, Apache Spark, Apache Airflow, Databricks, Data Warehousing, AWS, SQL

Other Skills

Agile Methodologies, Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Web Services (AWS), Analysis Services, Apache, Apache Kafka, Business Intelligence, Cloud Computing, Data Lakes, Database Design, Databases, Deep Learning, Dimensional Modeling, Docker

About

With over 15 years of experience, I specialize in developing data-intensive applications and addressing complex architectural and scalability challenges for hyper-growth AdTech, FinTech, and e-commerce companies.

Specialties:

  • Programming and Data Engineering: Python, Scala, Apache Airflow, Databricks, Snowflake, Redshift, Apache Spark, Spark Streaming, Kafka, Flink, Presto
  • Cloud Technologies: AWS (S3, EMR, Athena, Glue, Kinesis, RDS, Lambda), Azure
  • Data Lakehouse Platforms: Expertise in building Open Data Lakehouse Platforms using open file formats like Delta and Iceberg
  • ETL and Data Quality Frameworks: Proficient in designing data ingestion, ETL, and data quality frameworks with Python, Scala, Spark, Apache Airflow, NiFi, Talend, Azure Data Factory, and SSIS
  • Cost Optimization: Strategic optimization of cloud and Databricks costs
  • Performance Tuning: Advanced performance tuning of ETL processes, Spark jobs, SQL queries, stored procedures, and database servers
  • Data Modeling and Pipeline Design: Expertise in designing and optimizing data models and ETL pipelines
  • Database Management: Comprehensive experience in database design, development, architecture, data modeling, and performance tuning for Microsoft SQL Server, MySQL, and PostgreSQL in production environments
  • Data Warehousing and BI: Proficient in data warehousing, business intelligence, and cube development with SSAS Multidimensional models
  • BI Reporting Tools: Skilled in using BI reporting tools such as Tableau, Metabase, Redash, and SSRS
  • CI/CD and Orchestration: Experience with CI/CD and orchestration tools including Bamboo, Docker, Kubernetes, and Terraform
  • Data Governance and Security: Enforcing best data and security practices, ensuring compliance with industry regulations
  • Leadership and Mentorship: Providing technical leadership and guidance, mentoring data engineering teams

Experience

Total Experience: 18 yrs 6 mos
Average Tenure: 3 yrs 4 mos
Current Experience: 1 yr 10 mos

Coupang

Staff Data Engineer

Aug 2024 - Present · 1 yr 10 mos · Bengaluru, Karnataka, India

  • Building the next-generation data platform for e-commerce, fulfillment, warehouse management, supply chain management, and transportation systems. Architecting, designing, and developing data ingestion, data lakes, data warehouses, and data marts to drive AI and data-driven decision-making across various teams at Coupang.

Cred

Data Architect

Jan 2021 - Jul 2024 · 3 yrs 6 mos · Bengaluru, Karnataka, India

  • Led the migration of a petabyte-scale data lake from Redshift to an open data lakehouse platform using the Spark engine and the Delta and Iceberg file formats, enhancing availability and significantly reducing operational costs.
  • Achieved annual cost savings of over $1 million through strategic optimization of storage, infrastructure layers, and ETL queries, resulting in reduced AWS cloud and Databricks cluster expenses.
  • Architected and developed self-serve batch and streaming data ingestion and ETL platforms utilizing Airflow, Spark, EMR, Kafka, Flink, and Databricks, achieving streamlined configuration and deployment.
  • Boosted Data Ingestion platform efficiency through Delta and Iceberg file format tuning, leveraging technologies such as Apache Spark, DuckDB, RocksDB, and Scala, and employing partition-to-partition joins.
  • Designed and developed a reconciliation platform featuring robust data models and ETL pipelines to process millions of transactions from diverse sources.
  • Established Data Quality Platform using Great Expectations and Airflow for ensuring data integrity.
  • Enforced best data and security practices, ensuring compliance with industry regulations.
  • Mentored Data Engineering team, offering technical leadership and guidance.
Data Lakehouse, Scala, Python, SQL, Apache Spark, Apache Airflow, +5

Hypersonix Inc.

Principal Data Engineer

May 2020 - Jan 2021 · 8 mos · Bengaluru, Karnataka, India

  • Designed and implemented end-to-end data solutions, including data warehouses, data lakes, batch and streaming ETL processes, and self-service BI tools for the retail and digital media sectors.
  • Developed dimensional models and ETL pipelines for fulfilment, inventory, and user journey analysis for retail and digital media clients.
Scala, SQL, Apache Spark, Snowflake, Apache Airflow, Databricks, +2

Kabbage, Inc.

Sr. Data Engineer

Apr 2017 - Apr 2020 · 3 yrs · Bengaluru, Karnataka, India

  • Established AWS cloud based Data Lake using Apache Spark, Databricks, Delta, and Athena, ensuring robust data storage and processing capabilities.
  • Developed Python-based data pipelines leveraging Apache Airflow and Databricks to extract, clean, process, and ingest data from SQL Server, Postgres, SFTP, and REST APIs into the Data Lake.
  • Collaborated with the Data Science team to enhance feature calculations, suppression, and machine learning model scoring processes, transitioning from Pandas to PySpark on Databricks for improved parallel processing efficiency.
  • Migrated and optimized 5k lines of code for fraud monitoring from SQL Server to Spark SQL, enhancing performance and scalability.
  • Engineered schema finder job using Spark to extract complex nested JSON raw data, transforming and writing it to Parquet format for efficient storage and retrieval.
  • Contributed to the development of a streaming platform for event processing using Kafka and Spark Streaming, ensuring real-time data processing capabilities.
  • Conducted POC and engineered Apache Airflow Infrastructure setup using Bamboo, Docker, DC/OS, and Kubernetes, enhancing workflow automation and management.
  • Actively participated in Agile and Scrum meetings, facilitating Sprint Planning, Retrospective, and Daily Standups for efficient project execution and collaboration.
  • Participated in on-call rotations for production ETL data pipelines (~10% of work).
Scala, SQL, Apache Spark, Apache Airflow, Databricks, Microsoft SQL Server, +3

Media.net

Sr. Database Engineer

Nov 2011 - Mar 2017 · 5 yrs 4 mos · Mumbai, Maharashtra, India

  • Managed a portfolio of 50+ MySQL and SQL Servers housing extensive OLTP and OLAP databases, ensuring optimal performance and reliability.
  • Engineered robust and scalable ETL data pipelines using Python, SSIS, SQL Server, and MySQL to handle billions of monthly page views.
  • Developed and optimized critical processes including spam detection, revenue auditing, and Affiliate referral using MySQL, SQL Server procedures, Apache Spark, Redis, and Kafka.
  • Designed and implemented interactive reports such as Drill Down, Drill through, and Cross-tab reports using SQL and MDX for comprehensive publisher and admin reporting.
  • Conducted performance tuning of databases, queries, and stored procedures, enhancing overall system efficiency.
  • Constructed data warehouse and OLAP cube with SSAS attribute relationships and hierarchies to optimize MDX query performance.
  • Automated routine maintenance and monitoring tasks using Python, Shell Script, and Linux Crontab, improving operational reliability and efficiency.
  • Facilitated setup and migration of database instances from on-premise to AWS Cloud platform, ensuring seamless transition and optimal resource utilization.
SQL, SSAS, Apache Spark, Data Warehousing, SQL Server Integration Services (SSIS), Microsoft SQL Server, +3

Hostway Corporation

Module Lead Database

Aug 2007 - Oct 2011 · 4 yrs 2 mos · Mumbai, Maharashtra, India

  • Engineered scalable distributed data processing framework utilizing replication and Sharding to process 1 TB of data daily across 40+ MySQL database servers, handling 500M queries/day.
  • Developed data models for managing customer payouts, ensuring data integrity and accuracy.
  • Fine-tuned MySQL server performance and optimized ad-serving processes to handle a high volume of concurrent requests.
  • Engineered and optimized data pipelines for efficient data flow, implementing data aggregations for processing raw Ad click and Impression data.
  • Enhanced business-critical Ad Serving and Revenue Auditing processes for high throughput and reliability.
SQL, Shell Scripting, MySQL

Education

Thadomal Shahani Engineering College

BE — Computer Engineering

Jan 2003 - Jan 2007
