Rajaraman B

Software Engineer

Chennai, Tamil Nadu, India · 6 yrs 9 mos experience

Key Highlights

  • Over 6 years of experience in data engineering.
  • Expert in building scalable ETL/ELT pipelines.
  • Proficient in AWS and Big Data technologies.

Skills

Core Skills

Data Engineering · ETL · Data Architecture · Data Migration

Other Skills

PySpark · AWS Glue · Lambda · SNS · S3 · Apache Kafka · Kinesis · Spark · Lake Formation · Step Functions · API Gateway · GitLab · Apache Hive · HDFS · Apache Sqoop

About

I am a results-oriented Data Engineer with over 6 years of experience in architecting and deploying high-performance data ecosystems. My expertise lies in transforming raw, complex datasets into scalable, actionable assets using the Big Data/Hadoop stack and AWS Cloud services. Throughout my career, I have specialized in building robust ETL/ELT pipelines that handle massive throughput. By leveraging Apache Spark (PySpark), Airflow, and the AWS suite (EMR, Redshift, Glue, Lambda, SQS, Athena), I focus on optimizing data workflows to reduce latency and improve reliability. Whether it’s managing NoSQL environments like HBase or fine-tuning complex SQL and PL/SQL procedures, my goal is always the same: building cost-effective, future-proof data foundations.

Core Technical Stack:

🔹 Cloud: AWS (S3, Lambda, EMR, ECS, Athena, Redshift, Glue, SQS)
🔹 Big Data: Apache Spark, Hadoop, Hive, HBase (NoSQL), Python, PySpark, Databricks
🔹 Orchestration & Tools: Apache Airflow, Unix Shell Scripting
🔹 Databases: SQL, PL/SQL, Data Warehousing, Oracle, Postgres

I thrive on solving intricate data challenges and am passionate about staying at the forefront of cloud-native data engineering. Let’s connect to discuss data architecture, cloud optimization, or the future of ETL.
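
As an illustration of the workflow-optimization style described above, here is a minimal PySpark sketch of the pattern: repartition on a join key, broadcast the small dimension table, and write partitioned Parquet. Every path, table, and column name is invented for the example.

```python
# Minimal sketch of the optimization pattern described in the About section.
# All S3 paths and column names are illustrative, not from any real project.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-optimization-sketch").getOrCreate()

events = spark.read.json("s3://example-bucket/raw/events/")         # hypothetical raw input
accounts = spark.read.parquet("s3://example-bucket/dim/accounts/")  # small dimension table

enriched = (
    events
    .repartition("account_id")                  # co-locate rows for the join
    .join(F.broadcast(accounts), "account_id")  # broadcast join avoids a full shuffle
    .withColumn("event_date", F.to_date("event_ts"))
)

(enriched.write
    .mode("overwrite")
    .partitionBy("event_date")                  # enables partition pruning for readers
    .parquet("s3://example-bucket/curated/events/"))
```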

Experience

UST

Senior Data Engineer

Nov 2024 – Present · 1 yr 4 mos

  • Credit Bureau & Campaign Analytics Platform
  • Role:
  • 1. Built and optimized event-driven, distributed data pipelines using PySpark, AWS Glue, Lambda, SNS, S3, and Apache Kafka/Kinesis, orchestrated via MWAA (Airflow), powering high-volume credit risk and campaign analytics for datasets exceeding 1 TB.
  • 2. Boosted ETL performance by 30% via Spark optimizations (partitioning, memory tuning, efficient joins, Parquet storage).
  • 3. Engineered streaming pipelines with Kafka producers/consumers and Kinesis Data Streams/Firehose, transforming raw microservice events into S3 data lake assets.
  • 4. Implemented schema enforcement, data validation, and quality gates using DynamoDB-driven rules for pipeline integrity (see the sketch below).
  • 5. Deployed automated monitoring with Splunk and SQS, triggering ServiceNow incidents for failures to enhance operational reliability.
  • 6. Secured production pipelines with AWS Secrets Manager, IAM roles, and encrypted S3, adhering to enterprise standards.
  • 7. Collaborated with product/business stakeholders to deliver scalable solutions aligned with KPIs.
PySpark · AWS Glue · Lambda · SNS · S3 · Apache Kafka (+3 more)
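
Item 4 above mentions DynamoDB-driven quality gates; the following is a hedged sketch of one way that pattern can work, where validation rules live as items in a DynamoDB table and are applied as Spark filter predicates. The table name, key layout, and rule format are assumptions for illustration only.

```python
# Sketch of a DynamoDB-driven quality gate: rules are stored as items like
# {"rule_id": "r1", "predicate": "credit_score BETWEEN 300 AND 850"} and
# applied as SQL-style filters. Table and bucket names are hypothetical.
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quality-gate-sketch").getOrCreate()
rules_table = boto3.resource("dynamodb").Table("pipeline_validation_rules")

df = spark.read.parquet("s3://example-bucket/staging/credit_events/")

valid = df
for rule in rules_table.scan()["Items"]:
    valid = valid.filter(rule["predicate"])  # rows failing any predicate are dropped

rejected = df.subtract(valid)                # quarantine failures for inspection
rejected.write.mode("append").parquet("s3://example-bucket/quarantine/credit_events/")
valid.write.mode("overwrite").parquet("s3://example-bucket/validated/credit_events/")
```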

Capgemini Invent

Consultant

Aug 2022 – Oct 2024 · 2 yrs 2 mos

  • Migration of Data Lake Architecture to Data Mesh Architecture
  • Role:
  • 1. Designed distributed data platforms with Spark, AWS Glue, Lake Formation, and Step Functions, enabling Data Mesh architecture across 10+ sources for scalable marketing analytics.
  • 2. Built cross-account Spark pipelines using YAML configs, plus scalable ingestion from SAP OData/REST APIs to S3 (1M+ records daily); see the config-driven sketch below.
  • 3. Applied AWS data engineering services including SNS, IAM, Lambda, Glue, S3, Athena, API Gateway, event triggers, and log management for robust data processing and management.
  • 4. Implemented and optimized CI/CD processes utilizing GitLab, AWS CLI & CDK, and CodePipeline, ensuring efficient automated testing, build, release, and deployment.
  • 5. Designed self-service data management systems, facilitating auto ingestion, data cataloguing, automated access control, and life cycle management, enhancing overall data governance.
  • 6. Optimized Spark jobs via partitioning, caching, and broadcast joins, slashing execution time by 40-50%.
  • 7. Integrated curated datasets with Athena and Redshift to power campaign analytics and reporting.
  • 8. Handled semi-structured (XML, JSON) and unstructured data, performing extraction and ingestion via REST and OData APIs to enrich data ingestion capabilities.
  • 9. Collaborated in Agile environments, delivering solutions while adhering to Agile methodologies and using tools like JIRA and ServiceNow for project management and tracking.
Spark · AWS Glue · Lake Formation · Step Functions · API Gateway · GitLab (+2 more)
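
Item 2 above refers to YAML-configured pipelines; below is a minimal sketch of the config-driven idea, where a generic Spark job reads a declarative YAML file to decide what to ingest and where to land it. The config keys and bucket paths are invented.

```python
# Config-driven ingestion sketch. A pipeline.yaml like the following is assumed:
#   source:
#     format: parquet
#     path: s3://producer-account-bucket/marketing/leads/
#   target:
#     path: s3://consumer-account-bucket/curated/leads/
#     partition_by: [ingest_date]
import yaml
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("yaml-pipeline-sketch").getOrCreate()

with open("pipeline.yaml") as f:
    cfg = yaml.safe_load(f)

df = spark.read.format(cfg["source"]["format"]).load(cfg["source"]["path"])

(df.write
   .mode("overwrite")
   .partitionBy(*cfg["target"]["partition_by"])  # assumes the column exists in df
   .parquet(cfg["target"]["path"]))
```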

Deloitte

Data Engineer

Apr 2021 – Jul 2022 · 1 yr 3 mos

  • Integration of the financial transactional project from Redshift to Oracle ADW for Oracle Cloud Analytics
  • Role:
  • 1. Built scalable big data solutions on the Apache Hadoop ecosystem using Apache Hive and HDFS for storing and querying multi-terabyte structured and semi-structured datasets.
  • 2. Implemented optimized Hive queries with partitioning, bucketing, and Parquet formats, improving query performance for large analytical workloads.
  • 3. Utilized Apache Sqoop to efficiently migrate large datasets between relational databases (Oracle, MySQL) and Hadoop HDFS, enabling seamless integration between traditional and big data platforms.
  • 4. Optimized PySpark jobs by implementing data partitioning, broadcast joins, and caching strategies, reducing job execution time by 30% for datasets over 1 TB.
  • 5. Led a team of 3 for data deliverables.
  • 6. Built a data pipeline to migrate transactional records from Redshift to Oracle ADW, versioned in GitHub and orchestrated with Apache Airflow (see the DAG sketch below).
  • 7. Data modelling and implementation of ETL code using S3 storage, Python, and JSON files.
  • 8. Performance tuning of the DAG and task implementation.
  • 9. Led the migration of an on-premises data system to AWS EMR, using AWS S3 as the data lake and automating job orchestration with AWS Step Functions and Airflow, reducing operational costs by 40%.
Apache Hive · HDFS · Apache Sqoop · PySpark · Data Engineering · ETL
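
Items 6 and 8 above describe an Airflow-orchestrated Redshift-to-Oracle-ADW migration; this is a minimal DAG sketch of that shape. The DAG id, task callables, connections, and schedule are placeholders, not the project's actual code.

```python
# Minimal Airflow 2 DAG sketch: extract from Redshift to S3, then load into
# Oracle ADW. The callables are stubs; real connection handling is omitted.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_from_redshift(**context):
    """Placeholder: UNLOAD transactional records from Redshift to S3."""

def load_into_oracle_adw(**context):
    """Placeholder: load the staged S3 files into Oracle ADW."""

with DAG(
    dag_id="redshift_to_oracle_adw",  # hypothetical DAG id
    start_date=datetime(2021, 4, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_from_redshift)
    load = PythonOperator(task_id="load_to_adw", python_callable=load_into_oracle_adw)
    extract >> load  # run extract first; tune retries and parallelism per task
```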

Amdocs

2 roles

Software Engineer (Data Platform)

Sep 2020 – Mar 2021 · 6 mos

  • A data migration project requiring migration of data from the customer’s legacy systems to the Amdocs system.
  • Role:
  • 1. Supporting the Development Activities for the transformation of Postpaid Customers.
  • 2. Leading the Migration Related Activities for Enterprise Customers.
  • 3. Developed reconciliation report generation after migration completed from the legacy system to the Amdocs system (see the sketch below).
  • 4. Creation of stored procedures, functions and other DB objects in Oracle
  • 5. Worked on enhancements and performance tuning of SQL queries
  • 6. Created Ad-hoc excel reports based on the client's requirement using Python
Oracle · SQL · Data Migration · ETL
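
Item 3 above covers post-migration reconciliation reporting; here is a small pandas sketch of the count-comparison approach, comparing per-entity row counts between the legacy and migrated extracts and writing an Excel report. File and column names are invented.

```python
# Reconciliation sketch: compare per-entity row counts from two extracts and
# flag mismatches in an Excel report. Requires openpyxl (or another engine).
import pandas as pd

legacy = pd.read_csv("legacy_counts.csv")      # columns: entity, row_count
migrated = pd.read_csv("migrated_counts.csv")  # same layout from the target system

report = legacy.merge(migrated, on="entity", suffixes=("_legacy", "_migrated"))
report["delta"] = report["row_count_migrated"] - report["row_count_legacy"]
report["status"] = report["delta"].apply(lambda d: "OK" if d == 0 else "MISMATCH")

report.to_excel("reconciliation_report.xlsx", index=False)
```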

Software Developer

Mar 2019 – Sep 2020 · 1 yr 6 mos

  • Telecom billing transformation of postpaid and enterprise customers: the project involves migrating all Postpaid Customers of Vodafone across 23 circles and Enterprise Customers from the BSCS system to the Amdocs system.
  • Role:
  • 1. Supporting the Development Activities for the transformation of Postpaid Customers.
  • 2. Leading the Migration Related Activities for Enterprise Customers.
  • 3. Developed mappings and workflows in Informatica ETL based on the data mapping
  • 4. Creation of stored procedures, functions and other DB objects in Oracle
  • 5. Worked on enhancements and performance tuning of SQL queries
  • 6. Created Ad-hoc excel reports based on the client's requirement using Python
Informatica · SQL · Data Migration · ETL

Education

Dhirubhai Ambani University

Master's degree — Computer Science in Networking

Jan 2016 – Jan 2018

Birla Institute of Applied Sciences

Bachelor of Technology - BTech — Computer Engineering

Jan 2010 – Jan 2014

CRPF Public School, Rohini

Jan 1997 – Jan 2010
