Omkar Shewale — Machine Learning Engineer

I’m a Machine Learning Engineer with a Master’s in Computer Science from Illinois Tech and 3+ years of experience designing, optimizing, and deploying scalable ML systems. At ServiceNow and Orion Technolab, I delivered production-grade ML pipelines, accelerated inference for large language models (LLMs), and deployed AI solutions that improved performance, reduced latency, and supported business-critical decisions.

Highlights include:
• Achieved 5.7× inference speedups and reduced latency by 40%+ by optimizing LLMs with CUDA, TensorRT, ONNX Runtime, and dynamic batching.
• Built and deployed end-to-end ML pipelines and MLOps frameworks with Kubernetes, MLflow, Docker, and FastAPI on cloud platforms (AWS, GCP).
• Delivered transformer-based NLP systems, generative AI models (GANs, VAEs), and RAG pipelines, driving measurable gains in churn prediction, ticket resolution, and edge-case test coverage.
• Deployed multi-agent systems using LangChain, AutoGen, and Hugging Face Transformers to orchestrate complex IT and analytics workflows.

I’m passionate about building fast, reliable, and scalable ML/AI systems, from optimizing GPU inference to deploying distributed AI pipelines that scale under real-world demands. Currently, I’m seeking opportunities as a Machine Learning Engineer, Applied ML Engineer, AI Engineer, ML/AI Infrastructure Engineer, Inference Engineer, or MLOps Engineer.

Core Skills: Machine Learning | Deep Learning | LLMs | Transformers | Generative AI | Inference Optimization | CUDA | TensorRT | Hugging Face | LangChain | RAG | MLOps | Kubernetes | Docker | FastAPI | PyTorch | TensorFlow | NLP | Predictive Analytics | GPU Profiling | Distributed AI Systems

Stackforce AI infers this person is a Machine Learning Engineer specializing in AI solutions for IT services and cloud computing.

Location: San Francisco, California, United States

Experience: 4 yrs 1 mo

Skills

Machine Learning
MLOps
Inference Optimization
Data Engineering

Career Highlights

Achieved 5.7× inference speedups in ML systems.
Reduced latency by 40%+ for real-time AI applications.
Delivered end-to-end ML pipelines with significant efficiency gains.

Work Experience

ServiceNow

Machine Learning Engineer (1 yr 7 mos)

Orion Technolab

Machine Learning Engineer (2 yrs 6 mos)

Education

Master of Computer Science at Illinois Institute of Technology

Bachelor of Engineering - BE at Savitribai Phule Pune University

AI Enabled | AI ML Practitioner

Contact

omkar202525@outlook.com LinkedIn

Skills

Core Skills

Machine Learning
MLOps
Inference Optimization
Data Engineering

Other Skills

AWS SageMaker
Advanced Database Organization
Airflow
Algorithms
Apache Kafka
Apache Spark
Applied Mathematics
Azure Fundamentals
BigQuery
Blockchain Development
C (Programming Language)
C++
C++ Performance Optimization
CUDA
Caching & NoSQL Storage

Experience

Total Experience: 4 yrs 1 mo
Average Tenure: 2 yrs 6 mos
Current Experience: 1 yr 7 mos

ServiceNow

Machine Learning Engineer

Nov 2024 – Present · 1 yr 7 mos · United States

Accelerated inference for AI agents by optimizing large language models with ONNX Runtime and NVIDIA Triton Inference Server, achieving 40% lower latency in real-time IT service desk responses using GPU-based parallel processing.
Developed and deployed GPU-accelerated MLOps pipelines using CUDA and TensorRT, integrated with LangChain and AWS SageMaker, reducing model inference time by 30% for high-throughput workflows.
Designed intelligent AI agents with LangChain and LangGraph, leveraging Hugging Face Transformers for NLP tasks, resulting in a 25% reduction in ticket resolution time through enhanced intent detection and response generation.
Engineered real-time data processing pipelines with Apache Spark and Pandas to support AI agents built with LangGraph, integrating with Snowflake and PostgreSQL for data access and powering Tableau dashboards for operational insights.
Fine-tuned deep learning models using PyTorch and LangChain for predictive maintenance and workflow automation, deploying via FastAPI and REST APIs to production systems, boosting process efficiency by 20%.
Designed and deployed multi-agent systems with AutoGen to orchestrate complex IT operations, leveraging TensorFlow and Scikit-learn for predictive analytics, improving incident prioritization accuracy by 18% in high-volume environments.

CUDA | TensorRT | ONNX Runtime | LangChain | AWS SageMaker | PyTorch | +8 more

Orion Technolab

Machine Learning Engineer

Jan 2021 – Jul 2023 · 2 yrs 6 mos

Designed and deployed end-to-end ML pipelines with Kubernetes and MLflow, automating retraining and deployment, cutting deployment cycles by 50%.
Built and fine-tuned transformer-based NLP models, improving enterprise text summarization accuracy by 15%.
Delivered predictive analytics solutions using TensorFlow, PyTorch, and Scikit-learn, achieving 20% improvement in IT service management forecasts.
Integrated XGBoost and LightGBM into production systems, increasing customer churn prediction performance by 22%.
Leveraged Google Cloud AI Platform and BigQuery to process 5TB+ datasets, enhancing query efficiency by 35%.
Developed generative AI models (GANs, VAEs) for synthetic data generation, covering 80% of edge cases in test environments.
Implemented MLOps CI/CD pipelines with Docker and GitLab, enabling seamless model updates with zero downtime and full SLA compliance.
Created interactive dashboards (Tableau, Seaborn), accelerating stakeholder decision-making by 30% through actionable insights.
Streamlined real-time data workflows for 1M+ daily transactions using Apache Kafka and Airflow.