NANDISH KARKI — AI Researcher

Master's student in Data & Knowledge Engineering at OVGU with 3.3+ years of professional experience at Clarivate, where I built end-to-end ETL pipelines using AWS Glue, Redshift, PySpark, and Airflow, processing terabytes of data across staging and production environments and implementing CI/CD workflows with Jenkins. Currently working as a Research Assistant (AI/ML & Data Engineering) at Otto-von-Guericke University, designing full-stack deep learning systems for real-time speech-to-speech translation. My work bridges data infrastructure and AI systems, from SQL optimization and ETL development to LLM-based applications with RAG architecture.

🎓 Certified Professional Data Scientist (DataCamp) | SQL Advanced (HackerRank)

💼 Technical Expertise:
• Data Engineering: ETL pipelines, AWS (Glue, Redshift, S3), BigQuery, GCP, Apache Airflow, PySpark, SQL query optimization, data quality validation
• AI/ML Development: LLMs, RAG (LangChain, ChromaDB, Ollama), PyTorch, TensorFlow, Hugging Face Transformers, model deployment, prompt engineering
• Data Science: Python (Pandas, NumPy, Scikit-Learn), statistical analysis, data visualization (Matplotlib, Seaborn, Tableau), A/B testing, predictive modeling
• DevOps & Tools: Docker, Jenkins CI/CD, Git, GitHub Actions, Postman, REST APIs, Agile methodologies

🔬 Recent Projects:
→ AI-Powered Learning Assistant: Full-stack RAG application with a Flask backend, ChromaDB vector store, and Gradio UI for PDF/DOCX document processing
→ Audio Steganalysis Pipeline: CNN-based deep learning system for hidden-message detection with Dockerized deployment and real-time inference
→ Production ETL Systems: Built and maintained data pipelines processing 5TB+ monthly across AWS infrastructure

🔍 Seeking Werkstudent (working student) opportunities, up to 20 hrs/week, in:
• Data Engineering (SQL, ETL, cloud data platforms, data pipelines)
• Data Science (analytics, machine learning, statistical modeling)
• AI/ML Engineering (LLMs, deep learning, model deployment, MLOps)

📍 Location: Magdeburg, Germany | Open to on-site, hybrid & remote
📫 Let's connect if you're building data-driven or AI-powered solutions!

Stackforce AI infers this person is a Data Engineering and AI/ML specialist in the SaaS industry.

Location: Magdeburg, Saxony-Anhalt, Germany

Experience: 3 yrs

Skills

AI/ML Development
Data Engineering
DevOps

Career Highlights

Built end-to-end ETL pipelines processing terabytes of data.
Designed full-stack deep learning systems for real-time translation.
Certified Professional Data Scientist with advanced SQL skills.

Work Experience

Otto-von-Guericke University Magdeburg

Research Assistant (AI/ML) (7 mos)

Clarivate

Software Engineer (1 yr 6 mos)

Associate Software Engineer (1 yr 7 mos)

Education

Master's at Otto-von-Guericke University Magdeburg

Bachelor of Engineering (BE) at Ramaiah Institute of Technology


AI/ML Practitioner · AI Enabled



Skills


Other Skills

AI/ML · AWS Glue · AWS Lambda · AWS Polly · Advanced Git · Airflow · Amazon Elastic MapReduce (EMR) · Autoencoders · Automation · Big Data · CI/CD · CNN · ChromaDB · Data Lakes


Experience

Total Experience: 3 yrs
Average Tenure: 3 yrs

Current Experience

Otto-von-Guericke University Magdeburg

Research Assistant (AI/ML)

Oct 2025 – Present · 7 mos · Magdeburg, Saxony-Anhalt, Germany

AI/ML · Data Engineering · Deep Learning · AI/ML Development

Clarivate

2 roles

Software Engineer

Promoted

Apr 2023 – Oct 2024 · 1 yr 6 mos · Bangalore · Hybrid

Built & operated end-to-end ETL pipelines with AWS Glue, Redshift, S3, PySpark, Airflow, spanning staging → stable → production environments.
Automated Glue workflows via Jenkins CI/CD + Bitbucket, eliminating manual triggers and cutting deployment time ~30%.
Migrated PostgreSQL → Redshift, handling schema mismatches & datatype conversions; added SQL/PySpark validation to ensure parity.
Created a reusable data-validation framework (row counts, schema checks, checksums) for UAT vs. Prod, reducing manual QA ~60%.
Optimized cloud spend ~25% by pruning redundant S3 data, tuning Glue job parameters, and refining Redshift query patterns.
Improved ETL runtime ~40% through PySpark tuning (partitioning, caching) and SQL optimization.
Applied Redshift best practices: compression tuning, result-set caching, sort/dist keys, and VACUUM maintenance.
Delivered 15+ production releases using Blue-Green (prod1/prod2) with zero downtime.
Managed 11+ Glue jobs via Jenkins with script reviews, static file checks, and formulary table updates.
Partnered with DevOps & API teams on IAM roles, data-flow alignment, and automated orchestration.
Contributed to Agile rituals (planning, triage, retros) with thorough JIRA documentation; mentored junior engineers.
Processed 5TB+ of structured data monthly across AWS Glue, Redshift, and S3 environments.
Reduced data refresh time by 30% through Airflow DAG optimization and parallel processing.
Improved data quality by implementing Jenkins CI/CD validation checks across 15+ production pipelines.

ETL pipelines · AWS Glue · Redshift · S3 · PySpark · Airflow

Associate Software Engineer

Aug 2021 – Mar 2023 · 1 yr 7 mos · Bangalore · Hybrid

Maintained & enhanced end-to-end ETL pipelines across multiple environments, ensuring reliable, on-time refreshes.
Migrated & validated large Payer/Provider datasets using complex SQL (joins, aggregations) with business-rule validation for accuracy.
Removed 100+ GB of duplicate data, reducing DB size to 5 GB and restoring production stability.
Resolved SQL & ETL defects that caused duplication and failures, improving uptime and data consistency.
Automated monthly & annual reports for Adaptive Legacy systems via optimized SQL, cutting manual reporting effort.
Collaborated cross-functionally to design data-driven solutions; active in Agile ceremonies (standups, reviews, retros).

ETL pipelines · SQL · Data validation · Data Engineering