Shiny Sagar Saka

Hi, I'm Shiny Sagar Saka

Building scalable, intelligent, and high-performance ML solutions

Data Scientist with 4+ years of experience designing and deploying machine learning and AI solutions across fintech and enterprise. Skilled in real-time fraud detection, NLP, graph-based risk modeling, and cloud-native ML platforms.

4+ Years Experience
30M Daily Transactions
99.9% Pipeline Uptime
14% Fraud Recall Lift

Who I Am

I'm a Data Scientist & AI Engineer with 4+ years of experience designing and deploying machine learning solutions across fintech and enterprise domains.

At PayPal, I engineer real-time fraud detection pipelines that process 20–30M daily transactions using PySpark, Kafka, and Graph Neural Networks, with sub-120 ms inference latency and 99.9% uptime.

I thrive at the intersection of research and production: from training transformer models and building RAG pipelines to designing the data infrastructure that powers real-time decisions at scale.

Jan 2025 – Present · Data Scientist · PayPal · New York, Hybrid
Jan 2020 – Apr 2023 · Data Scientist · Wipro · India
May 2023 – Dec 2024 · M.S. Computer Science · University of Central Missouri

Featured Projects

002
Data Engineering · Cloud
Cloud Lakehouse @ Wipro

Built an enterprise lakehouse on Azure Data Lake Gen2 + Snowflake, consolidating 40+ sources. Cut retrieval time by 35% and improved data accuracy from 89% to 97%.

Azure · Snowflake · dbt · Spark
003
NLP · RAG
Supplier Risk Intelligence System

NLP-driven risk monitoring ingesting news and filings. FAISS + LangChain RAG pipeline for contextual Q&A over supplier documents with structured risk scoring.

Transformers · FAISS · LangChain · NER
004
Forecasting · ML
Real-Time Retail Demand Forecasting

ML pipeline with XGBoost and LSTM models using rolling averages, seasonal trends, and lag features. Improved forecast accuracy by 12% with cross-validation tuning.

XGBoost · LSTM · Python · Pandas
005
MLOps · Streaming
Real-Time Anomaly Detection @ Wipro

Kafka + Spark Structured Streaming pipelines processing 8M daily events. Real-time anomaly detection cut incident response time by 28% via automated monitoring.

Kafka · Spark · MLflow · Docker

Skills & Technologies

// core competencies
Machine Learning · 95%
Deep Learning / NLP · 92%
Data Engineering · 90%
Graph Neural Networks · 85%
MLOps & Deployment · 88%
Cloud (AWS / Azure) · 85%
// tools & technologies
Python
PySpark
PyTorch
TensorFlow
Hugging Face
LangChain
Kafka
Snowflake
dbt
Airflow
MLflow
Docker
Kubernetes
AWS
Azure
FAISS
Redis
SQL

Contact Me

Open to data science, ML engineering, and AI research roles. Based in NJ and open to relocating. Always happy to talk data, fraud systems, or interesting ML problems.