Kalyani Deshmukh

DevOps Engineer

San Francisco, California, United States · 9 yrs 9 mos experience

Key Highlights

  • Led a team of 10 data engineers.
  • Improved ETL process runtime by over 80%.
  • Developed AI-based solutions for business intelligence.
Stackforce AI infers this person is a Data Engineering and Data Science professional with expertise in cloud services and ETL processes.

Skills

Core Skills

Data Engineering, Cloud Services, Data Science, Geospatial Analysis, ETL Processes, Business Intelligence

Other Skills

AWS, AWS EC2, Airflow, Anchors, Apache Kafka, Apache NiFi, Apache Spark, Apache Sqoop, ArcGIS Pro, Big Data, Business Intelligence (BI), C, C++, CSS, Core Java

About

Experienced in leading a team of data engineers and in individually developing technical solutions to business problems. Graduate with an MS in Software Engineering with a dual specialization in Enterprise Distributed Systems and Data Science. Nine years of total experience as a software data engineer. Eager to learn and take on greater responsibility.

Experience

Accenture ai

Data Engineering Consultant

Jun 2021 - Present · 4 yrs 9 mos · San Francisco Bay Area

  • Leading a team of 10 data engineers in building and maintaining scalable data pipelines, improving data processing efficiency by 80%.
  • Collaborating with business and cross-functional teams to understand data requirements and assess the impact of changes.
  • Reduced the runtime of a daily ETL process from 18 hours to 3 hours and of a monthly process from 36 hours to 14 hours by improving the design, optimizing SQL, and refactoring legacy code.
  • Rewrote 5 processes from Hive, Shell, and Jenkins to Snowflake, Python, Airflow, and AWS Lambda.
  • Deployed and managed the application on AWS. Troubleshot and debugged production issues and managed business communications.
  • Built an end-to-end workstream for 3 streaming datasets, through to the dimensional data model in Snowflake, using the Snowflake Kafka connector, Spark, Python, and Airflow.
  • Introduced a batch monitoring framework for the ETL processes orchestrated with Airflow DAGs. The framework was implemented in 300+ ETL pipelines and reduced manual monitoring and debugging effort by 60%.
  • Built outbound load processes from Snowflake to AWS S3, Oracle, and DashDb using AWS Lambda.
  • Refactored existing code to improve readability and maintainability, which in some cases also improved application performance.
  • Used Spark SQL and the DataFrame API to load Parquet data, and consumed Azure APIs to ingest on-prem files into Azure Blob containers.
  • Developed ETL/ELT pipelines using Azure Databricks for transforming and manipulating large datasets with Spark's DataFrame APIs.
  • Created an analytical data model and performed data aggregation to reduce the number of records from 5.8B to 3.5B and the data size by 12%, which decreased query time from 35 mins to 22 mins.
  • Supported data scientists in deriving variables and deciles.
  • Developed orchestration, workflow scheduling, and monitoring for ETL data pipeline jobs with tools such as Apache Oozie, Airflow, Azure Data Factory, and Azure Functions.
Distributed Computing, Cloud Services, Apache Spark, SQL, Python, AWS, +4

San José State University

Data Scientist

Sep 2020 - May 2021 · 8 mos · San Francisco Bay Area · On-site

  • Developed and optimized zonal statistics functions in Python that are used by IPUMS-Terra to transform GIS data.
  • Used Jupyter, Python profiling, and the NumPy and Pandas libraries to reduce computing time by 83.9%.
  • Implemented multiprocessing engines in Python to achieve a 13x speedup.
  • DNA Assembly using NLP.
  • Assembled DNA data using different tools (e.g., HPC, Megahit, Barrnap) to compare their performance and identify the best tool for this research. Performed clustering to identify new species.
  • Guides: Dr. Jorjeta Jetcheva, Dr. Carlos Rojas, and Dr. William Andreopoulos.
  • Worked at the Department of Urban and Regional Planning with Geospatial Data Scientist Dr. Ahoura Zandiatashbar on the RIVAS project.
  • Performed data cleaning, attribute selection, and feature transformation (such as Shapefile to GeoJSON) using ArcGIS Pro.
  • Loaded data from a variety of file formats, such as CSV, Shapefile, and GeoJSON, of sizes up to 28 MB into interactive Leaflet maps as separate overlay layers that users can toggle with checkboxes.
  • Added heatmaps, markers, and circles based on attribute values. Provided legends for better user experience and understanding.
  • Deployed the application on SJSU servers.
  • Tools and Technology: ArcGIS Pro, JavaScript, Python, Leaflet, HTML, CSS, jQuery, AWS EC2
Python, Jupyter, Numpy, Pandas, ArcGIS Pro, JavaScript, +5

Accenture

Application Data Engineer

Feb 2018 - Aug 2019 · 1 yr 6 mos · Pune Area, India

  • Played a pivotal role within an agile team at Accenture as a Data Engineer, contributing expertise to enhance Protect myTech, a groundbreaking cybersecurity solution to uphold compliance with 25 security measures across workstations.
  • Programmed Hadoop jobs to analyze compliance data from 492,000 machines using Hive Query Language.
  • Performed indexing, query optimization, and performance tuning to decrease SQL query time by 15%.
  • Orchestrated Talend jobs for streaming batch processing, ensuring seamless data management.
  • Developed a consumer/producer streaming Spark job for data collection and reporting.
  • Cleaned and preprocessed the data to remove noise and inconsistencies using PySpark and SQL.
  • Collaborated with cross-functional teams, including data engineers and operations, to implement ETL processes. Optimized SQL queries for efficient data extraction aligned with analytical requirements.
  • Leveraged supervised learning to identify patterns in the data and predict non-compliant workstations. This helped improve the notification mechanism, reducing the number of non-compliant workstations by 40%.
  • Created dynamic reports utilizing Qlik and Power BI, tailored precisely to business requirements.
Hadoop, Hive, Talend, PySpark, SQL, Qlik, +3

Cognizant

Data Engineer

Aug 2014 - Jan 2018 · 3 yrs 5 mos · Pune Area, India

  • Orchestrated data extraction, transformation, and modeling processes using ETL pipelines. Designed and optimized database schemas with 32 tables for efficient data storage and retrieval.
  • Designed and implemented key performance indicators (KPIs) to track business metrics effectively on Tableau dashboard.
  • Developed SQL queries for comprehensive data validation, unit testing, integration testing, and performance testing. Achieved 100% test coverage to ensure the accuracy and reliability of analytical outputs.
  • Utilized JIRA for efficient project management, enabling seamless collaboration among team members and stakeholders. Prioritized tasks and managed project timelines effectively to ensure timely delivery of solutions.
  • Implemented monitoring and logging mechanisms for proactive issue resolution and performance optimization.
  • Built “Persona Analytics”, an AI-based solution that gave Cognizant leaders and stakeholders a personality analysis of over 100 clients.
  • Conducted thorough data cleaning, outlier detection, and feature scaling to ensure data quality. Employed dimensionality reduction techniques such as Principal Component Analysis (PCA) to handle high-dimensional data efficiently.
  • Utilized Python, Flask, Sklearn, and Pandas for backend development and data manipulation.
  • Integrated streaming APIs and web crawling techniques to gather social media data for analysis.
  • Leveraged historical sales data for model training and validation. Achieved 86% sales prediction accuracy.
  • Deployed analytical solutions on AWS EC2 instances for scalability and reliability.
  • Received a Rising Star award in recognition of consistent performance and accomplishments.
ETL, SQL, Tableau, Python, Flask, Sklearn, +3

Education

San José State University

Master of Science - MS — Computer Software Engineering

Jan 2019 - Jan 2021

INIFD Deccan Pune

Diploma — Interior Design

Jan 2016 - Jan 2018

Government College of Engineering, Amravati

Bachelor’s Degree — Computer Science and Engineering

Jan 2010 - Jan 2014
