Piyush Raj — Director of Engineering

Passionate AWS Cloud Engineer with a proven track record of designing, implementing, and optimizing cloud-based solutions to drive operational efficiency and scalability. With a solid foundation in AWS services and a deep understanding of DevOps methodologies, I excel in architecting resilient, highly available, and cost-effective infrastructures for diverse business needs. My expertise spans the AWS ecosystem, including EC2, S3, RDS, Lambda, IAM, and CloudFormation, among others. I streamline deployment pipelines and accelerate software delivery while ensuring security and compliance standards are met.

I am also a Data Engineer with expertise in designing scalable databases and data models for efficient data storage and analytics. I have a strong background in developing and enforcing data governance principles, ensuring data quality, security (RBAC, IAM), and compliance (GDPR) across both cloud and on-premises environments. I enjoy optimizing data pipelines, real-time processing, and cloud-based solutions, and building high-performance data architectures that give businesses actionable insights.

I thrive in dynamic environments, collaborating cross-functionally to align technical solutions with business objectives. Whether implementing infrastructure as code practices or orchestrating containerized applications with Kubernetes, I am committed to driving innovation and continuous improvement, and to staying at the forefront of emerging technologies and best practices in cloud computing and automation. Let's connect to discuss how I can contribute to your team's success in navigating the complexities of the cloud landscape and achieving strategic goals.

Stackforce AI infers this person is a Data Engineering and Cloud Infrastructure expert in the SaaS and Fintech industries.

Location: Toronto, Ontario, Canada

Experience: 12 yrs 1 mo

Skills

AWS Cloud Engineering
Data Engineering
Infrastructure As Code

Career Highlights

Expert in AWS Cloud Engineering and Data Engineering.
Proven track record in designing scalable data architectures.
Strong leadership in managing cross-functional engineering teams.

Work Experience

Virtusa

Architect / Engineering Manager (9 mos)

Infosys

Lead AWS Cloud Engineer/Architect (2 yrs 2 mos)

Lead AWS Cloud Infrastructure Engineer (1 yr 1 mo)

Lead AWS Cloud Data Engineer (3 yrs 7 mos)

Tata Consultancy Services

Information Technology Analyst (1 yr 5 mos)

Infosys

Technology Analyst (1 yr 8 mos)

Senior System Engineer (3 yrs 7 mos)

Education

Bachelor of Technology (B.Tech.) at RNS Institute of Technology - India

Computer Science at DAV Public School

AI ML Practitioner · AI Enabled

Other Skills

Hadoop
Hive
Spark
AWS
Airflow
Databricks
PySpark
ETL
Kubernetes
CloudFormation
SQL
Apache Airflow
AWS EKS
GitHub
Apache Spark

Current Experience

Virtusa

Architect / Engineering Manager

Aug 2025 – Present · 9 mos · Toronto, Ontario, Canada

Infosys

3 roles

Lead AWS Cloud Engineer/Architect

Apr 2023 – Jun 2025 · 2 yrs 2 mos

AWS Cloud Practitioner, Cloud Infrastructure and Data Engineering
Led end-to-end data warehousing initiatives using Hadoop, Hive, Spark, and AWS to support enterprise-wide analytics and reporting for Apple's business units.
Designed and deployed scalable, cloud-native data pipelines on AWS using Airflow, Databricks, and PySpark to support batch and streaming workloads.
Developed and orchestrated complex ETL workflows using Apache Airflow, ensuring data integrity, lineage, and timely delivery across multiple downstream systems.
Engineered high-performance Spark-based data processing jobs, optimizing execution plans, resource utilization, and checkpointing strategies to reduce latency and cost.
Fine-tuned SQL and PySpark queries to enhance performance and minimize execution time, achieving up to 40% improvement in job efficiency.
Designed, built and maintained containerized data applications using Kubernetes, supporting seamless deployment, scaling, and monitoring across development and production environments.
Implemented CI/CD pipelines for data workflows using CloudFormation and Git, enabling automated deployments and reducing time-to-market for new features.
Designed and managed large-scale data lake architectures on AWS S3, leveraging Iceberg tables and IRC Catalog for schema evolution and efficient querying.
Collaborated with cross-functional teams to troubleshoot data bottlenecks, monitor pipeline performance, and continuously enhance reliability and scalability.
Enforced data governance, validation, and compliance checks within pipelines, ensuring high-quality data delivery for analytics, ML, and reporting use cases.

Hadoop, Hive, Spark, AWS, Airflow, Databricks

Lead AWS Cloud Infrastructure Engineer

Promoted

Feb 2022 – Mar 2023 · 1 yr 1 mo

Designed and built from scratch a fault-tolerant, scalable Kubernetes cluster on AWS EKS for running streaming and batch jobs and hosting the Databricks, Dremio, and Trino platforms.
Implemented and optimized load balancing solutions to improve application availability, reliability, and uptime across environments.
Developed and maintained Infrastructure as Code (IaC) using tools like CloudFormation and GitHub to ensure consistent, automated, and version-controlled infrastructure deployments.
Strengthened security and high availability by implementing robust monitoring practices and ensuring uniform enforcement of security policies across all environments.
Led a team of 10+ engineers to deliver solutions for enterprise-level clients.
Achieved 20% reduction in data processing time by optimizing Apache Spark clusters.
Boosted system reliability by 50% by integrating Prometheus and Grafana for real-time monitoring.
Streamlined cloud resource usage, resulting in a 25% cost reduction for client projects.

Kubernetes, AWS EKS, CloudFormation, GitHub, Apache Spark, Prometheus

Lead AWS Cloud Data Engineer

Jun 2018 – Jan 2022 · 3 yrs 7 mos

Cloud Infrastructure, Infrastructure as Code, AWS Cloud Engineering

Tata Consultancy Services

Information Technology Analyst

Jan 2017 – Jun 2018 · 1 yr 5 mos · Bhubaneswar, Odisha, India · On-site

Description: The EDH project served the banking domain and covered the overall design of Hive objects. It managed the data warehouse and provided semantic data to end users for reports and dashboards. It also handled fraudulent transactions that needed to be reversed.
Responsibilities:
Designed and implemented Hive objects for banking data warehouse, enabling efficient data management and semantic reporting for end-users.
Improved fraud transaction management, ensuring accurate reversals and data integrity.
Pioneered EDH project architecture, streamlining data flow and improving accessibility for report generation and dashboard creation.
Designed Avro schemas for Hive tables and managed schema evolution to accommodate changes in data structure and format.
Applied best practices for serialized data processing in Hive, such as choosing appropriate serialization formats and codecs, optimizing data compression and encoding, and avoiding serialization overhead in data processing.
Handled schema compatibility issues in Hive, such as adding or removing fields, changing field types or names, and handling default values and nullability.
Coordinated with cross-functional teams to optimize data warehouse performance, resulting in faster query processing and improved user experience.
Applied Hive query optimization techniques, such as subquery unnesting, predicate pushdown, and vectorization, and assessed their impact on query performance and resource utilization.
Meticulously managed data integrity throughout the ETL process, ensuring accuracy and reliability of financial reports and analytics.

Hive, Python, Data Engineering

Infosys

2 roles

Technology Analyst

Apr 2015 – Dec 2016 · 1 yr 8 mos · On-site

CDP was a project driven by personal data requirements under the General Data Protection Regulation (GDPR). It required masking the existing and newly ingested PII data in iTunes. Design and implementation were the major scope of this project.
Apple iRadio/Music Table Induction:
Overall design of the Hive object for each application.
Incorporated user requirements into effective Hive table designs.
Created the metadata process and jobs to generate flat files from the Oracle source.
Loaded flat files and transformed them wherever required.
Expertise in querying Hive tables using SQL-like syntax and performing data analysis using tools like Apache Spark.
Apple Near Real Time, NRT:
Overall design of the Hive object for the NRT application.
Used Storm to join and extract data into a file and provided the file to the downstream applications' reporting team.
Used AutoSys to schedule the job that places the file produced by Storm after joining at the source.
Familiarity with Hive performance tuning tools, such as Hive Query Profiler, Hive Query Plan Visualization, and Hive Load Testing Tools, and their features and limitations.
Merge IO-ORC Conversion For iTunes Core:
Changed the table file format after the release of Hive ORC.
Completely redesigned the table with updated partitioning and bucketing.
Changed the custom business-created code, which extended the ORC class to support updates in Hive at read time; we called it the MergeIO code (the project name).
Reloaded entire tables with massive data volumes (e.g., 100 TB) and replicated them to the disaster recovery cluster using the DistCp tool.
Designed Hive tables in ORC, Parquet, and Avro formats, weighing their advantages and disadvantages for different use cases.