Diksha Gherade

Data Engineer

Pune, Maharashtra, India
3 yrs 1 mo experience

Key Highlights

  • Expert in optimizing ETL processes and data workflows.
  • Proven track record in Azure Databricks and data engineering.
  • Holder of multiple Microsoft and Databricks certifications.

Skills

Core Skills

Data Engineering Β· Azure Databricks Β· ETL Β· SQL Β· Data Ingestion Β· Data Integration Β· Web Scraping Β· Data Processing

Other Skills

AWS S3 Β· Amazon Web Services (AWS) Β· Apache Spark Β· Apache Tika Β· Azure Cloud Β· Azure Data Factory Β· Azure SQL Β· Big Data Β· Cascading Style Sheets (CSS) Β· Communication Β· Data Governance Β· Data Modeling Β· Data Workflows Β· Databricks Β· Databricks Secrets Scope

About

"Driven by a passion for turning raw data into impactful insights through cutting-edge cloud technology. πŸš€ With extensive experience in Azure Data Factory, Azure Databricks, and Azure services, I bring solid expertise in SQL, PySpark, and Python. πŸ› οΈ My specialization includes optimizing ETL processes, advancing data warehousing solutions, and fine-tuning data pipelines for maximum performance. πŸ’‘ As a holder of 2 Microsoft certifications and 4 Databricks certifications, I am dedicated to fostering innovation in the data realm. Let's connect and unlock the transformative potential of data together! 🌐 #DataEngineering #Azure #Databricks #ETL #CertifiedProfessional

My professional career began as a Data Engineer at Celebal Technologies, where I gained invaluable experience in the field. I currently work as an Associate Consultant Data Engineer at Digivate Labs, responsible for developing and maintaining data pipelines using Azure Data Factory and Databricks, which allows my organization to harness the power of data-driven decision-making. My hands-on experience extends to an internship as a Cloud and DevOps Engineer at Xenstack, where I focused on cloud infrastructure and deployments, and I have hands-on experience with Microsoft Azure tools and technologies.

Throughout my academic journey, I honed my technical skills in Python, MySQL, and MS-SQL, enabling effective analysis and manipulation of data. I am a strong advocate for continuous learning, committed to staying abreast of the latest industry trends and technologies, and enthusiastic about using my skills to unlock the potential of data for businesses. Let's connect and explore opportunities for collaboration and mutual growth."

Experience

CitiusTech

Data Engineer

Jul 2025 – Present Β· 8 mos Β· Pune, Maharashtra, India Β· On-site

  • Big Data and EDW (enterprise data warehouse) projects.
Big Data Β· EDW Β· Data Engineering

Digivate Labs

Data Engineer

May 2024 – Jun 2025 Β· 1 yr 1 mo Β· Remote

  • Focused on building expertise in Azure Databricks, pursuing training and certification in data engineering and cloud-based data solutions.
  • Delivered an SME-led demo session on a CDC (Change Data Capture) pipeline.
  • Successfully implemented secure key management by leveraging Databricks Secrets Scope, aligning with enterprise security and governance standards within Databricks.
  • Built robust ETL pipelines and optimized data workflows with Databricks, PySpark, and Spark SQL.
  • Migrated and optimized SQL queries from Redshift to Databricks SQL, supporting dynamic parameters and repeated execution at scale (4,000+ variations for a single query).
  • Ingested 2 TB of data from AWS S3 into Databricks using Auto Loader and performed end-to-end validation, including schema, row-count, and null checks.
  • Automated large-scale query execution with Databricks Workflows and REST APIs to simulate real-user workloads and benchmark performance.
  • Integrated real-time Kafka streams from a MySQL–Debezium pipeline directly into the Databricks Bronze layer and implemented deduplication using the latest timestamps.
  • Improved query performance by 2–3x using Delta Lake features such as Liquid Clustering and Spark optimizations.
  • Designed an asynchronous web-scraping workflow using Selenium to extract unstructured text from 200,000+ websites, storing the data in Delta Lake for downstream ML-based article classification using Databricks MLflow.
  • Developed a PDF processing pipeline to ingest 1,000+ downloaded news article PDFs, using Apache Tika to extract text content via OCR, which was then passed to pre-trained LLMs for classification and downstream analytics, with structured outputs stored in Delta Lake.
  • Implemented end-to-end workflows, including data ingestion (web, PDF), unstructured text processing, and Delta Lake storage, reducing manual data processing efforts by 70%.
Azure Databricks Β· PySpark Β· Spark SQL Β· ETL Β· Data Workflows Β· Databricks Secrets Scope
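The Bronze-layer bullet above describes deduplicating CDC records by keeping only the latest timestamp per key. A minimal plain-Python sketch of that logic (the actual pipeline used PySpark on Databricks; the field names `id` and `ts` here are illustrative assumptions, not from the original pipeline):

```python
# Keep only the newest record per primary key, as in CDC deduplication.
# Field names ("id", "ts") are hypothetical, chosen for illustration.

def dedupe_latest(records):
    """Return one record per id, keeping the row with the greatest timestamp."""
    latest = {}
    for rec in records:
        key = rec["id"]
        if key not in latest or rec["ts"] > latest[key]["ts"]:
            latest[key] = rec
    return list(latest.values())

events = [
    {"id": 1, "ts": 100, "name": "alice"},
    {"id": 1, "ts": 250, "name": "alice-updated"},  # later change wins
    {"id": 2, "ts": 120, "name": "bob"},
]
deduped = dedupe_latest(events)
```

In PySpark, the same effect is typically achieved with a window partitioned by the key, ordered by the timestamp descending, and a `row_number() == 1` filter.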

Celebal Technologies

Azure Data Engineer

Aug 2022 – Dec 2023 Β· 1 yr 4 mos Β· Jaipur, Rajasthan, India Β· On-site

  • πŸš€ Data Engineer | Celebal Technologies Pvt Ltd
  • During 8 months of internship plus 9 months of training at Celebal Technologies, I had the privilege of contributing to dynamic projects and honing my skills in data engineering. Throughout my tenure, I focused on developing robust solutions and gaining hands-on experience with cutting-edge technologies.
  • 🌐 Key Contributions:
  • Azure Data Factory (ADF) Pipeline Development: Spearheaded the design, implementation, and optimization of data pipelines using Azure Data Factory, ensuring seamless data integration and processing.
  • SQL Expertise: Demonstrated proficiency in SQL, leveraging it for efficient database management, query optimization, and data retrieval.
  • PySpark Development: Applied PySpark to enhance data processing capabilities, utilizing its powerful features for large-scale data transformations.
  • Databricks Exploration: Worked extensively with Databricks, leveraging its collaborative environment to enhance productivity in big data analytics and processing.
  • πŸ’‘ Proof of Concept (POC): Unity Catalog, SCD Types
  • Unity Catalog POC: Successfully led a Proof of Concept for the implementation of Unity Catalog, showcasing its potential benefits and applications within our data ecosystem.
  • SCD Types Implementation: Played a pivotal role in implementing Slowly Changing Dimension (SCD) types, ensuring accurate historical data representation and integrity in our data warehouse.
  • πŸ“ˆ Learning and Growth:
  • During my 8-month internship and subsequent trainee phase, I actively engaged in continuous learning and skill development. This period allowed me to immerse myself in real-world projects, collaborate with experienced professionals, and deepen my understanding of the data engineering landscape.
Azure Data Factory Β· SQL Β· PySpark Β· Databricks Β· Data Engineering
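The SCD bullet above refers to Slowly Changing Dimension handling. A minimal plain-Python sketch of the SCD Type 2 pattern (expire the current row, append a new current row, preserving history); the field names and dates are hypothetical, for illustration only:

```python
# SCD Type 2: when an attribute changes, mark the current dimension row
# as expired and append a new current row, keeping the full history.
# Field names ("key", "attrs", "valid_from", ...) are illustrative.

def scd2_upsert(dim_rows, key, new_attrs, as_of):
    """Apply an SCD Type 2 update to an in-memory dimension table."""
    for row in dim_rows:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dim_rows  # no change, keep the row as-is
            row["is_current"] = False
            row["valid_to"] = as_of
            break
    dim_rows.append({
        "key": key,
        "attrs": new_attrs,
        "valid_from": as_of,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows

dim = [{"key": "C1", "attrs": {"city": "Pune"},
        "valid_from": "2023-01-01", "valid_to": None, "is_current": True}]
dim = scd2_upsert(dim, "C1", {"city": "Jaipur"}, "2024-05-01")
```

On Databricks, this pattern is commonly expressed as a Delta Lake `MERGE INTO` statement rather than row-by-row Python, but the expire-and-insert logic is the same.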

Education

Sanjivani Group of Institutes

Bachelor of Technology - BTech β€” Computer Engineering

Aug 2019 – Jul 2023
