Chaitanya Reddy

DevOps Engineer

United States · 8 yrs 4 mos experience
AI/ML Practitioner · AI Enabled

Key Highlights

  • Over 14 years of IT experience with a focus on data engineering.
  • Expert in designing scalable data architectures across multiple platforms.
  • Strong background in Cloud Data Engineering and DataOps best practices.

Skills

Core Skills

Data Engineering · Cloud Data Engineering · Cloud Migration · AI/ML Solutions · Full Stack Development · Cloud Solutions

Other Skills

Azure Databricks · Azure Data Factory · Python (Programming Language) · Hadoop · Apache Spark · Scala · Java Development · Data Architects · Data Loading · Cloudera · Data Architecture · PostgreSQL · Medallion Architecture · GxP Data Quality & Governance · Azure Synapse Analytics

About

Highly skilled Data Engineer with over 14 years of IT experience, including 10+ years specializing in data engineering. Proven expertise in designing, building, and optimizing scalable, high-performance data architectures across Big Data, cloud, and real-time analytics.

Proficient in Big Data frameworks such as Apache Hadoop (HDFS, Hive, MapReduce), Apache Spark (PySpark, Spark Streaming), Apache Flink, and Elasticsearch/OpenSearch, with extensive experience in distributed computing platforms like Databricks, Amazon EMR, Google Dataflow, and Azure Synapse. Expert in Python (6+ years), Java (7+ years), and Scala (9+ years), with a strong foundation in ETL/ELT design, data pipeline development, and data modeling (dimensional modeling, OLAP/OLTP, star/snowflake schemas). Extensive hands-on experience in real-time and batch data processing using Apache Kafka, Spark Streaming, Pulsar, and Amazon Kinesis, and in pipeline orchestration with Apache Airflow, Prefect, Dagster, Apache NiFi, Azure Data Factory (ADF), and Apache Oozie.

Proficient in SQL and NoSQL databases, including MySQL, PostgreSQL, Oracle, SQL Server, Apache HBase, MongoDB, and DynamoDB, with expertise in data warehousing technologies such as Snowflake, BigQuery, Amazon Redshift, and Azure Synapse Analytics, and in Lakehouse architectures built on Delta Lake, Apache Iceberg, and Apache Hudi.

Strong background in Cloud Data Engineering across AWS (EMR, Glue, S3, Lambda, Redshift, Kinesis), Azure (Data Factory, Databricks, Synapse, Microsoft Fabric), and GCP (BigQuery, Dataflow, Pub/Sub), with hands-on Infrastructure as Code (IaC) experience using Terraform, AWS CloudFormation, and Azure Bicep. Deep knowledge of DataOps and DevOps best practices, including CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI), data governance (GDPR, CCPA, HIPAA compliance), data security, metadata management, and data quality frameworks (Great Expectations, dbt).

Experienced with containerization and orchestration using Docker and Kubernetes (EKS, AKS, GKE), along with monitoring and logging solutions such as Grafana, Prometheus, Datadog, AWS CloudWatch, the ELK Stack, and Splunk. Familiar with serverless computing and event-driven architectures, leveraging AWS Lambda, Azure Functions, and GCP Cloud Functions, and with message-driven workflows using Kafka, AWS EventBridge, SNS, and SQS. Knowledgeable in MLOps and AI/ML integration, working with ML pipelines (MLflow, TFX, SageMaker) and feature stores (Feast, Databricks Feature Store) for machine learning-powered data solutions.

Experience

Total Experience: 8 yrs 4 mos
Average Tenure: 2 yrs 9 mos
Current Experience: --

JLL Technologies

Lead Data Engineer & AI/ML Solutions Architect – Azure | AWS | ADB | Spark | MS Fabric | Snowflake

Dec 2022 – Present · 3 yrs 5 mos · Remote

  • Designed and implemented Azure Databricks Medallion architecture (Bronze–Silver–Gold layers) supporting enterprise analytics and data quality initiatives.
  • Developed ETL/ELT pipelines using ADF, PySpark, and Delta Lake, integrating structured and unstructured data from SQL Server, APIs, and Lake Gen2.
  • Collaborated with QA and Compliance teams to implement GxP-aligned data validation, lineage tracking, and audit controls.
  • Defined data models, dictionaries, and semantic layers for analytical reporting and visualization in Power BI.
  • Leveraged Purview and Unity Catalog to establish governance, metadata management, and RBAC security.
  • Optimized Databricks clusters for performance and cost efficiency (30% reduction in compute time).
  • Optimized Snowflake workloads by leveraging clustering, partitioning, caching, and materialized views, cutting compute costs by 30% while improving dashboard responsiveness.
  • Enabled observability and monitoring with AWS CloudWatch, Databricks log analytics, and custom Python scripts, reducing incident resolution time by 25%.
  • Mentored and coached a cross-functional team of 5+ engineers on PySpark performance tuning, Snowflake optimization, and DevOps CI/CD pipelines (GitHub Actions, Terraform, Docker, Kubernetes).
  • Partnered with data scientists to operationalize ML/GenAI pipelines using SageMaker, MLflow, and Python APIs, integrating LLMs and RAG workflows via OpenAI API, LangChain, and vector databases (Pinecone, Weaviate).
Azure Databricks · Azure Data Factory · Python (Programming Language) · Hadoop · Apache Spark · Scala (+11 more)
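The Bronze–Silver–Gold flow described in the first bullet can be sketched in plain Python. This is a toy, dictionary-based illustration of the layering idea only — the actual pipeline would use Databricks Delta tables, and all field names here are hypothetical:

```python
from collections import defaultdict

def to_bronze(raw_rows):
    """Bronze: land raw records as-is, tagged with their layer."""
    return [dict(row, _layer="bronze") for row in raw_rows]

def to_silver(bronze_rows):
    """Silver: cleanse and validate -- drop rows missing required fields."""
    required = {"order_id", "amount"}
    return [
        {**row, "_layer": "silver", "amount": float(row["amount"])}
        for row in bronze_rows
        if required <= row.keys() and row["amount"] is not None
    ]

def to_gold(silver_rows):
    """Gold: aggregate into an analytics-ready summary per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row.get("customer", "unknown")] += row["amount"]
    return dict(totals)

raw = [
    {"order_id": 1, "customer": "acme", "amount": "10.5"},
    {"order_id": 2, "customer": "acme", "amount": "4.5"},
    {"customer": "nodata"},  # malformed record: rejected at the silver layer
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'acme': 15.0}
```

The point of the pattern is that each layer only trusts the one below it: raw data is never mutated in place, and quality rules live at the bronze-to-silver boundary.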

Synechron

Senior Data Engineer & Cloud Migration Specialist – GCP | AWS | Azure | Spark | Microservices

Jun 2021 – Nov 2022 · 1 yr 5 mos · Hyderabad, Telangana, India · Remote

  • Finance on Cloud (FOTC) Program Development: Spearheaded the development of core components for HSBC's FOTC program, automating data processing and report generation with Big Data frameworks such as Spark, Scala, and Hadoop ecosystem tools. Improved operational efficiency and enhanced data accessibility for stakeholders through modern ETL and orchestration workflows.
  • Big Data and Spark Optimization: Designed and optimized large-scale data pipelines using Apache Spark with Scala, achieving a 25% increase in data processing efficiency. Utilized HDFS, YARN, and Hive for distributed data storage and query optimization, ensuring faster analytics on structured and semi-structured datasets.
  • ETL Development with AWS Glue: Built and deployed highly efficient ETL pipelines using AWS Glue with Python and Scala. Leveraged Glue's DynamicFrame and DataFrame APIs for seamless integration with services such as S3, Redshift, RDS, and DynamoDB, and designed visual workflows in Glue Studio to streamline data extraction, transformation, and loading.
  • Data Lake Creation and Monitoring: Developed and maintained centralized data lakes on AWS Glue to facilitate scalable data storage and processing. Configured monitoring and logging with AWS CloudWatch and Glue logs, ensuring quick error resolution and proactive job management for enhanced system reliability.
  • Data Workflow Automation with Azure Data Factory: Automated and orchestrated data workflows across cloud platforms using Azure Data Factory (ADF). Designed robust pipelines to extract, transform, and load data into Azure Data Lake Storage, Synapse Analytics, and Azure SQL Database, reducing manual effort and accelerating analytics readiness.
Apache Spark · Big Data · Scala · Google Cloud Platform (GCP) · Amazon Web Services (AWS) · Java Development (+8 more)
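The ADF-style orchestration in the last bullet boils down to running activities in dependency order. A minimal sketch with the standard library's `graphlib` — the activity names are hypothetical stand-ins for real ADF pipeline activities:

```python
from graphlib import TopologicalSorter

# Hypothetical ADF-style pipeline: each activity lists its prerequisites.
dependencies = {
    "extract_sqlserver": set(),
    "extract_api": set(),
    "transform_spark": {"extract_sqlserver", "extract_api"},
    "load_synapse": {"transform_spark"},
    "refresh_powerbi": {"load_synapse"},
}

# A topological order is any execution sequence that never runs an
# activity before everything it depends on has finished.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A real orchestrator (ADF, Oozie, Airflow) adds retries, triggers, and parallelism on top, but the scheduling core is exactly this dependency-ordered traversal.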

Monetary Authority of Singapore (MAS)

Senior Data Engineer & AI/ML Solutions Specialist – Cloudera | AWS | Spark | Java | Microservices

Jan 2020 – May 2021 · 1 yr 4 mos · Singapore · On-site

  • Managed and optimized the MAS Enterprise Data Lake (EDL) on Cloudera Hadoop (HDFS, Hive, HBase), improving system throughput by 35% while achieving 100% compliance with MAS regulatory standards and enforcing RBAC and encryption policies.
  • Designed and implemented distributed data pipelines using Apache Spark (Scala, PySpark) for both batch and real-time processing, resulting in a 40% improvement in structured and semi-structured data transformation efficiency.
  • Built scalable ETL frameworks on HDFS, Hive, and Apache Oozie, enabling parallelized and compressed ingestion pipelines that reduced ingestion time by 50%.
  • Enhanced Hive data modeling using advanced partitioning, indexing, and bucketing techniques to support OLAP workloads, cutting query execution latency by 30%.
  • Automated batch data workflows using Apache Oozie, improving reliability and reducing manual operations by 60%.
  • Collaborated across business, analytics, and compliance teams to deliver trusted, high-availability datasets for BI and regulatory reporting, accelerating insight delivery timelines by 40%.
  • Contributed to data infrastructure readiness for future Lakehouse expansion by aligning with Apache Iceberg and Delta Lake adoption paths and introducing data contracts and schema evolution strategies.
  • Developed STRE (Stress Testing & Risk Engine) applications on Cloudera Hadoop, integrating regulatory models (Basel III/IV, CCR) and reducing end-to-end risk calculation latency by 30% through Spark-based parallel processing.
  • Engineered Spark ETL pipelines (Scala, PySpark) for processing structured and semi-structured datasets from banking and trade systems, achieving 40% processing efficiency gains and enhancing financial model scalability.
  • Designed microservice-based ingestion systems using Spring Boot and REST APIs, improving modularity and reducing dependency bottlenecks across compute layers.
Apache Spark · Big Data · Scala · PostgreSQL · Java Development · Data Architects (+5 more)
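The Hive partitioning and bucketing mentioned above works roughly like this: a row's partition columns determine its directory, and a deterministic hash of the clustering key determines its bucket file within that directory. A toy sketch (column names and bucket count are hypothetical, not taken from the actual MAS schema):

```python
import zlib

NUM_BUCKETS = 8

def hive_location(row):
    """Return (partition_path, bucket_id) for a row, Hive-style.

    Partition column values become directory names, so queries that
    filter on year/month prune whole directories; the bucket id is a
    deterministic hash of the clustering key modulo the bucket count,
    so equal keys always co-locate in the same bucket file.
    """
    partition = f"year={row['year']}/month={row['month']:02d}"
    bucket = zlib.crc32(str(row["account_id"]).encode()) % NUM_BUCKETS
    return partition, bucket

path, bucket = hive_location({"year": 2020, "month": 3, "account_id": "ACC-42"})
print(path, bucket)  # year=2020/month=03 plus a stable bucket in [0, 8)
```

Partition pruning is what cuts OLAP query latency, while stable bucketing enables bucket-wise joins and sampling without a full shuffle.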

IBM

Big Data Engineer & Full Stack Developer – AWS | Spark | Java | Microservices

Apr 2016 – Nov 2019 · 3 yrs 7 mos · Hyderabad Area, India · On-site

  • Enterprise Hadoop Environment Implementation and Management: Designed and supported a robust enterprise Hadoop ecosystem with expertise in Core Java, HDFS, and cluster administration. Led capacity planning, performance tuning, and high-availability configuration while collaborating with cross-functional infrastructure, network, database, and BI teams, and applied microservices architecture principles to enhance integration across data engineering systems.
  • Comprehensive Utilization of Hadoop Ecosystem Components: Worked extensively with Hadoop ecosystem tools, including Apache Spark, PySpark, YARN, MapReduce, Hive, HBase, ZooKeeper, Pig, and HDFS, to enable end-to-end data processing, analytics, and governance. Enhanced processing pipelines by integrating Scala and Python, driving efficiency within big data workflows.
  • AWS Glue Integration for Data Lake Solutions: Designed and implemented serverless ETL pipelines with AWS Glue, using Glue Crawlers for automated schema discovery and updates. Improved pipeline performance by leveraging AWS Lambda for automation, implementing data quality checks, and enabling schema evolution in a scalable, cost-effective manner, and applied microservices principles to keep ingestion pipelines modular and reliable.
  • ETL Pipeline Design and Development with AWS Glue: Built and managed efficient ETL pipelines in AWS Glue, extracting, transforming, and loading data from diverse sources into centralized data lakes on S3. Leveraged Glue features such as DynamicFrames and DataFrames to improve data transformation capabilities and analytics readiness, supporting near-real-time data availability.
Apache Spark · Big Data · Scala · PostgreSQL · Data Loading · Large Scale Systems (+5 more)
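Glue Crawlers, mentioned above, discover a table schema by sampling records. The core idea can be sketched in plain Python — a heavy simplification (the real crawler handles many file formats and richer type promotion rules), with sample data invented for illustration:

```python
def infer_schema(records):
    """Infer a column -> type-name mapping from sample records.

    Columns whose sampled values disagree on type are widened to
    'string', loosely mimicking how a crawler resolves conflicts.
    """
    type_names = {int: "bigint", float: "double", str: "string", bool: "boolean"}
    schema = {}
    for record in records:
        for col, value in record.items():
            t = type_names.get(type(value), "string")
            if col not in schema:
                schema[col] = t
            elif schema[col] != t:
                schema[col] = "string"  # conflicting samples: widen
    return schema

sample = [
    {"id": 1, "price": 9.99, "sku": "A-1"},
    {"id": 2, "price": "n/a", "sku": "A-2"},  # price conflicts -> string
]
print(infer_schema(sample))  # {'id': 'bigint', 'price': 'string', 'sku': 'string'}
```

Automating this inference is what lets a data lake absorb new or drifting source schemas without hand-maintained DDL.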

Autodesk

Data Engineer

Sep 2014 – Nov 2015 · 1 yr 2 mos · Bengaluru Area, India · On-site

  • Administration and Management of Big Data and AWS Environments: Administered projects such as UCP, EWS, and Web Activity with a strong focus on AWS cloud environments. Leveraged advanced knowledge of AWS Redshift, S3, and EC2 to design, manage, and maintain scalable data warehousing solutions, and ensured high performance, data quality, and availability through performance tuning that enabled efficient analytics and reporting.
  • Optimization and Deployment of Hadoop Environments: Proposed and implemented optimized configurations for Hadoop environments, including HDFS, YARN, and MapReduce. Conducted capacity planning and performance optimization for seamless big data processing, and deployed new hardware and software solutions to enhance scalability and system responsiveness as datasets grew.
  • User Management and Cluster Maintenance in Hadoop: Managed Hadoop user accounts and permissions, ensuring secure access and efficient data workflows. Administered clusters by adding or removing nodes and monitoring cluster health with tools such as Ambari and Cloudera Manager, wrote custom management utilities in Core Java, and maintained data integrity while optimizing resource utilization.
  • CI/CD and Build Operations for Data Engineering Projects: Orchestrated CI/CD pipelines using Jenkins, Git, GitHub Actions, and Bamboo to streamline automated builds, testing, and deployments. Monitored and optimized build processes for efficiency and consistency across projects, and enhanced collaboration through version control and automated workflows.
  • Data Quality, Governance, and Availability in HDFS: Ensured the availability and quality of data stored in HDFS by implementing stringent data governance practices and conducting regular validations.
Apache Spark · Big Data · Java Development · Cloudera · Data Loading · Apache Kafka (+3 more)
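Routine batch validations like those in the last bullet can be as simple as asserting row counts and per-field null-rate thresholds. An illustrative check — the threshold and field names here are hypothetical, not drawn from the actual governance rules:

```python
def validate_batch(rows, required_fields, max_null_rate=0.05):
    """Return a list of human-readable violations for one ingested batch."""
    violations = []
    if not rows:
        return ["batch is empty"]
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            violations.append(
                f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )
    return violations

batch = [{"user_id": 1, "event": "click"}, {"user_id": None, "event": "view"}]
print(validate_batch(batch, ["user_id", "event"]))
# ['user_id: null rate 50% exceeds 5%']
```

In production such checks would typically run inside a framework like Great Expectations and gate promotion of the batch rather than just report on it.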

Etisbew Technology Group, Inc. (a CMMI Level 3 company)

Software Engineer

Jan 2010 – May 2013 · 3 yrs 4 mos · Hyderabad, Telangana, India · On-site

  • Designed web templates including logos, banner ads, buttons, Flash interactions, icons, Flash intros, JavaScript and Flash animated menus, and style sheets. Structured and developed full websites in HTML, XHTML, CSS, JavaScript, and jQuery, from hand-coding through Dreamweaver.
  • Handled CMS administration, CRM maintenance, website backend support, and content management.
  • Performed on-page and off-page SEO optimization. Designed and converted basic HTML web pages, coordinating with programmers on further changes and adjustments, and worked with the Business Development team on presentations and visuals for business proposals. Developed and maintained the company website.

Education

University of Madras

Bachelor's degree — Computer Science

Jan 2004 – Jan 2007
