New Jersey · Open to senior AI/ML roles
Shipping ML systems at the edge of healthcare, retrieval, and reliability.
I'm Sai Nikhil — an AI/ML engineer with 5 years building production ML and Generative AI systems. Currently at Molina Health, designing risk-stratification models, RAG pipelines, and decision-support tools for Medicaid and Medicare populations.
Hello — I'm Sai Nikhil.
I build production ML and Generative AI systems for healthcare. My day-to-day at Molina Health is shipping risk stratification models, RAG pipelines, and decision-support tools that move outcomes for Medicaid and Medicare members.
Before Molina I spent two and a half years at Cognizant building backend ML services and analytics infrastructure for enterprise clients — REST APIs on FastAPI/Flask, embedding pipelines, LLM integrations, and the unglamorous data plumbing that makes any of it possible. I hold a Master's in Computer Science from SUNY Albany.
My work sits where modeling meets engineering: latency budgets, evaluation that survives production, and pipelines that don't fall over when the data changes underneath them. Offline accuracy is the start of the job — shipping is the rest.
- Role
- AI/ML Engineer · Molina Health
- Based in
- New Jersey, USA
- Education
- M.S. CS · SUNY Albany
- Specialties
- Healthcare ML · RAG · MLOps · GenAI
- Open to
- Senior AI/ML Engineer roles
- Contact
- ms1104n@gmail.com
Work history
Five years across healthcare ML and enterprise software — from EDA to production GenAI.
Molina Health
AI/ML Engineer
Owning ML and Generative-AI systems that support care management for Medicaid and Medicare populations — risk stratification, RAG-driven summaries, and decision support for care coordinators and case managers.
- Designed end-to-end ML pipelines in Python (Pandas, NumPy, Scikit-learn) for member risk stratification and cost prediction, improving care-management decisions across Medicaid and Medicare.
- Built predictive models with XGBoost, Random Forest, and TensorFlow to identify high-risk members, predict hospital readmissions, and detect care gaps — enabling proactive intervention.
- Tuned and evaluated risk-adjustment and utilization models — 15% accuracy lift, with measurable AUC and F1 gains across production cohorts.
- Processed large-scale claims, eligibility, and provider data with PySpark; designed scalable storage on AWS S3 + Snowflake, cutting data retrieval time by 30%.
- Built RAG pipelines with LangChain that summarize member health risk and claims history into explainable, clinician-readable insights — 22% lower latency, 12% response-efficiency gain.
- Designed AI-powered decision-support systems combining ML predictions with contextual insights for care coordinators, case managers, and provider engagement.
- Shipped Power BI dashboards visualizing risk scores, HEDIS quality metrics, utilization trends, and cost KPIs — adopted across clinical and operational teams.
- Python
- XGBoost
- TensorFlow
- PySpark
- LangChain
- RAG
- AWS S3
- Snowflake
- Power BI
Cognizant
Software Engineer
Built backend ML services, embedding pipelines, and analytics infrastructure for enterprise clients — bridging classical ML, early LLM tooling, and production REST APIs.
- Built scalable REST APIs with FastAPI and Flask for real-time ML and GenAI inference — 45% faster response times, 60% integration efficiency lift across services.
- Generated high-quality embeddings using Hugging Face and OpenAI models, stored in Pinecone, FAISS, and ChromaDB for semantic search and RAG retrieval.
- Integrated LLM APIs (OpenAI, LLaMA-based) with prompt-engineering and LangChain templates to deliver domain-specific, explainable responses inside RAG-based products.
- Developed and deployed classification, regression, and clustering models with LightGBM and PyTorch, supporting data-driven decisions across business use cases.
- Ran EDA and statistical testing (SciPy, Statsmodels) to surface anomalies — 20% better feature engineering effectiveness, 12% model accuracy improvement.
- Automated reporting workflows with Tableau extracts and scheduled refreshes — 30% reduction in manual reporting effort for stakeholders.
- Containerized and deployed services on Docker + AWS for production reliability and clean resource utilization.
- FastAPI
- Flask
- LangChain
- OpenAI
- Hugging Face
- Pinecone
- FAISS
- LightGBM
- PyTorch
- Docker
- AWS
- Tableau
Signature projects
Production ML systems and open-source work where the architecture, the model, and the outcome line up.
Member Risk Stratification Engine
End-to-end ML system for identifying high-risk members across Medicaid and Medicare lines. XGBoost + Random Forest + TensorFlow ensemble for readmission, care-gap, and cost prediction; PySpark processing for claims, eligibility, and provider data; AWS S3 + Snowflake as the analytical store.
- XGBoost
- Random Forest
- TensorFlow
- PySpark
- AWS S3
- Snowflake
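The scoring step of an ensemble like this can be sketched in a few lines. This is an illustrative blend, assuming calibrated per-model probabilities; the model names, weights, and tier cutoffs below are stand-ins, not the production values.

```python
# Hypothetical sketch: blend calibrated probabilities from ensemble members
# (e.g. XGBoost, Random Forest, a TensorFlow model) into one risk tier.
# Weights and cutoffs here are illustrative only.

def blend_risk(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model readmission probabilities."""
    total = sum(weights.values())
    return sum(probs[name] * w for name, w in weights.items()) / total

def risk_tier(p: float) -> str:
    """Map a blended probability to a care-management tier."""
    if p >= 0.7:
        return "high"
    if p >= 0.3:
        return "medium"
    return "low"

member = {"xgboost": 0.82, "random_forest": 0.74, "tensorflow": 0.78}
weights = {"xgboost": 0.5, "random_forest": 0.3, "tensorflow": 0.2}
score = blend_risk(member, weights)  # 0.788 with these toy inputs
```

Keeping the blend and the tier mapping as separate functions means the cutoffs can be re-tuned against utilization outcomes without retraining any model.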
RAG Care Insights Pipeline
LangChain-based RAG system that turns member health risk, claims history, and clinical context into explainable summaries for care coordinators and case managers. Low-latency retrieval over a managed vector store, plus prompt-engineered LLM outputs that hold up under operational review.
- LangChain
- OpenAI
- Anthropic
- Pinecone
- FastAPI
- RAG
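The retrieval step behind a summary like this can be shown framework-free. The real pipeline uses LangChain over a managed vector store; the embeddings, documents, and prompt template below are toy stand-ins for illustration.

```python
import math

# Framework-free sketch of retrieval + prompt assembly for a care summary.
# Vectors, documents, and the template are illustrative placeholders.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

def build_prompt(question, passages):
    """Ground the LLM in retrieved passages so answers stay explainable."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

index = [
    {"text": "Two ER visits in the last 90 days.", "vec": [0.9, 0.1]},
    {"text": "HbA1c trending upward since March.", "vec": [0.2, 0.8]},
    {"text": "No open care gaps on record.", "vec": [0.5, 0.5]},
]
passages = retrieve([1.0, 0.0], index, k=1)
prompt = build_prompt("Is this member high risk?", passages)
```

Constraining the prompt to retrieved passages is what makes the output reviewable: a clinician can check every claim against the cited context.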
Multi-Agentic RAG Architecture
Multi-agent retrieval-augmented system where specialized LLM agents handle retrieval, reasoning, and synthesis. Built around LangChain / CrewAI patterns with a vector backend; designed as a reference for production multi-agent workflows that stay debuggable as they scale.
- Python
- LangChain
- CrewAI
- RAG
- VectorDB
- LLMs
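The retrieve-reason-synthesize hand-off above can be sketched with each "agent" as a plain function, which is what keeps the flow debuggable. The project itself uses LangChain / CrewAI; the agent names and toy logic here are illustrative, not the actual implementation.

```python
# Sketch of a multi-agent hand-off: each stage is a plain function so the
# pipeline can be unit-tested and traced. Agent logic here is a toy stand-in.

def retrieval_agent(query, corpus):
    """Pick documents sharing at least one token with the query (toy retriever)."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def reasoning_agent(docs):
    """Keep only the facts relevant to risk (toy relevance filter)."""
    return [d for d in docs if "risk" in d.lower()]

def synthesis_agent(query, facts):
    """Compose the final grounded answer from the surviving facts."""
    return f"{query} -> " + "; ".join(facts)

def run_pipeline(query, corpus):
    return synthesis_agent(query, reasoning_agent(retrieval_agent(query, corpus)))

corpus = [
    "Readmission risk is elevated for member A.",
    "Claims history shows routine visits only.",
    "Risk factors include two recent ER admissions.",
]
answer = run_pipeline("summarize risk", corpus)
```

Because each agent has one input and one output, a bad answer can be traced to the exact stage that produced it, which is the property that survives scaling.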
- rag-from-scratch · JavaScript
Retrieval-Augmented Generation, implemented from first principles — chunking, embedding, indexing, retrieval, and generation without framework abstractions.
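The chunking stage of a from-scratch pipeline like this is the part most people get wrong. The repo is JavaScript; this Python sketch mirrors the idea, and the window and overlap sizes are illustrative.

```python
# Sketch of fixed-size chunking with overlap, so retrieval does not lose
# context at chunk boundaries. Sizes here are illustrative, not tuned values.

def chunk(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into word windows of `size`, each sharing `overlap` words
    with the previous window."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):
            break
    return chunks
```

The overlap is the point: a fact that straddles a boundary still appears whole in at least one chunk, so the retriever can find it.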
- rag-chatbot · Python
Production-shaped RAG chatbot — vector retrieval, prompt orchestration, and a pluggable LLM layer with the kind of structure you actually deploy behind an API.
- multimodal-live-rag-voice · Python
Multimodal, low-latency RAG with live voice I/O — combining streaming ASR, retrieval, and grounded LLM responses for real-time conversational interfaces.
- langchain-rag-document-understanding · Jupyter
LangChain-based RAG for document understanding — chunking strategies, retriever tuning, and grounded Q&A across heterogeneous corpora, captured in reproducible notebooks.
- mlops-app · HCL
MLOps reference stack — Terraform-managed infrastructure for training, model registry, deployment, and monitoring; the boring infrastructure that makes ML actually shippable.
Technical stack
Tools I've shipped with — grouped by the part of the system they serve.
Languages
- Python
- SQL
- JavaScript
- TypeScript
- R
- C++
- Java
ML & Deep Learning
- Scikit-learn
- XGBoost
- LightGBM
- PyTorch
- TensorFlow / Keras
- Hugging Face
- SHAP
Generative AI
- LangChain
- LlamaIndex
- CrewAI
- OpenAI
- Anthropic
- Azure OpenAI
- LoRA / PEFT
- Prompt Eval
Data & Vector
- Pandas
- NumPy
- PySpark
- Postgres
- MongoDB
- Pinecone
- ChromaDB
- FAISS
Cloud & APIs
- AWS
- Snowflake
- Vercel
- Firebase
- FastAPI
- Flask
- Node.js
- REST
MLOps & Tooling
- Docker
- Kubernetes
- CI/CD
- MLflow
- Weights & Biases
- TensorBoard
- n8n
- Git
Credentials & focus
Formal education and the domains where the work is happening.
Education
M.S., Computer Science
SUNY Albany · Albany, NY
Graduate coursework in machine learning, data systems, and applied AI.
Computer Science
Bharath University · India
Domains
- Healthcare · Risk stratification, readmission, care gaps
- GenAI · RAG, prompt engineering, multi-agent
- MLOps · CI/CD for ML, monitoring, model registry
- Backend · FastAPI, Flask, Node.js, REST
- Data · PySpark, Snowflake, vector stores
- Analytics · Power BI, Tableau, EDA, hypothesis testing
Methodology
Production-first ML
Operating principle
Models are graded on what survives latency budgets, drift, and operational review — not just offline AUC. Evaluation runs alongside the model, not after it.
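One concrete form of "evaluation runs alongside the model" is a drift check on incoming features. A common choice is the Population Stability Index; the bin shares and the alert thresholds below are conventional illustrative values, not production configuration.

```python
import math

# Population Stability Index between a feature's training-time bin shares
# and live-traffic bin shares. Thresholds (0.1 / 0.2) are the conventional
# rule of thumb, used here for illustration.

def psi(expected: list[float], actual: list[float]) -> float:
    """PSI over pre-binned proportions; epsilon guards against empty bins."""
    eps = 1e-6
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time bin shares
stable   = [0.24, 0.26, 0.25, 0.25]   # live traffic, similar shape
shifted  = [0.05, 0.15, 0.30, 0.50]   # live traffic after drift

drift_score = psi(baseline, shifted)  # well above the ~0.2 alert threshold
```

A check like this runs on every scoring batch, so the model is flagged when the population moves underneath it rather than when someone notices the predictions went stale.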
Explainability by design
Operating principle
For healthcare and risk work, every model output ships with the context that justifies it. SHAP for tabular, retrieval-grounded citations for LLM outputs.
Let's build something that matters.
Open to senior AI/ML and Generative-AI roles. Happy to consult on healthcare ML, RAG architecture, or productionizing LLM systems.