New Jersey · Open to senior AI/ML roles

Shipping ML systems at the
edge of healthcare, retrieval,
and reliability.

I'm Sai Nikhil — an AI/ML engineer with 5 years building production ML and Generative AI systems. Currently at Molina Health, designing risk-stratification models, RAG pipelines, and decision-support tools for Medicaid and Medicare populations.

5 yrs
building and shipping ML and Generative AI systems across healthcare and IT
22%
latency reduction in production RAG pipelines for real-time care-management insights
15%
accuracy lift on risk-adjustment and utilization models via tuning + feature work
45%
faster API response times across FastAPI / Flask services for ML inference
Fig. 01 Sai Nikhil Mattapalli · NJ · AI/ML Engineer
— 00

Hello — I'm Sai Nikhil.

I build production ML and Generative AI systems for healthcare. My day-to-day at Molina Health is shipping risk-stratification models, RAG pipelines, and decision-support tools that move outcomes for Medicaid and Medicare members.

Before Molina I spent two and a half years at Cognizant building backend ML services and analytics infrastructure for enterprise clients — REST APIs on FastAPI/Flask, embedding pipelines, LLM integrations, and the unglamorous data plumbing that makes any of it possible. I hold a Master's in Computer Science from SUNY Albany.

My work sits where modeling meets engineering: latency budgets, evaluation that survives production, and pipelines that don't fall over when the data changes underneath them. Offline accuracy is the start of the job — shipping is the rest.

Role
AI/ML Engineer · Molina Health
Based in
New Jersey, USA
Education
M.S. CS · SUNY Albany
Specialties
Healthcare ML · RAG · MLOps · GenAI
Open to
Senior AI/ML Engineer roles
— 01

Work history

Five years across healthcare ML and enterprise software — from EDA to production GenAI.

Molina Health

AI/ML Engineer

Owning ML and Generative-AI systems that support care management for Medicaid and Medicare populations — risk stratification, RAG-driven summaries, and decision support for care coordinators and case managers.

  • Designed end-to-end ML pipelines in Python (Pandas, NumPy, Scikit-learn) for member risk stratification and cost prediction, improving care-management decisions across Medicaid and Medicare.
  • Built predictive models with XGBoost, Random Forest, and TensorFlow to identify high-risk members, predict hospital readmissions, and detect care gaps — enabling proactive intervention.
  • Tuned and evaluated risk-adjustment and utilization models — 15% accuracy lift, with measurable AUC and F1 gains across production cohorts.
  • Processed large-scale claims, eligibility, and provider data with PySpark; designed scalable storage on AWS S3 + Snowflake, cutting data retrieval time by 30%.
  • Built RAG pipelines with LangChain that summarize member health risk and claims history into explainable, clinician-readable insights — 22% lower latency, 12% response-efficiency gain.
  • Designed AI-powered decision-support systems combining ML predictions with contextual insights for care coordinators, case managers, and provider engagement.
  • Shipped Power BI dashboards visualizing risk scores, HEDIS quality metrics, utilization trends, and cost KPIs — adopted across clinical and operational teams.
  • Python
  • XGBoost
  • TensorFlow
  • PySpark
  • LangChain
  • RAG
  • AWS S3
  • Snowflake
  • Power BI

Cognizant

Software Engineer

Built backend ML services, embedding pipelines, and analytics infrastructure for enterprise clients — bridging classical ML, early LLM tooling, and production REST APIs.

  • Built scalable REST APIs with FastAPI and Flask for real-time ML and GenAI inference — 45% faster response times, 60% integration efficiency lift across services.
  • Generated high-quality embeddings using Hugging Face and OpenAI models, stored in Pinecone, FAISS, and ChromaDB for semantic search and RAG retrieval.
  • Integrated LLM APIs (OpenAI, LLaMA-based) with prompt-engineering and LangChain templates to deliver domain-specific, explainable responses inside RAG-based products.
  • Developed and deployed classification, regression, and clustering models with LightGBM and PyTorch, supporting data-driven decisions across business use cases.
  • Ran EDA and statistical testing (SciPy, Statsmodels) to surface anomalies — improving feature-engineering effectiveness by 20% and lifting model accuracy by 12%.
  • Automated reporting workflows with Tableau extracts and scheduled refreshes — 30% reduction in manual reporting effort for stakeholders.
  • Containerized and deployed services on Docker + AWS for production reliability and clean resource utilization.
  • FastAPI
  • Flask
  • LangChain
  • OpenAI
  • HuggingFace
  • Pinecone
  • FAISS
  • LightGBM
  • PyTorch
  • Docker
  • AWS
  • Tableau
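The embedding-and-retrieval flow behind the semantic search work above can be sketched in miniature. This is a toy stand-in, not the production code: the hash-bucket `embed` function replaces the Hugging Face / OpenAI encoders, and the in-memory `VectorIndex` mirrors the inner-product search that FAISS or Pinecone would handle at scale. All names and documents here are illustrative.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Deterministic toy embedding: hash each token into a bucket,
    then L2-normalize. Stand-in for a real sentence encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorIndex:
    """In-memory flat index mirroring the FAISS inner-product flow:
    store normalized vectors, rank by cosine similarity at query time."""
    def __init__(self):
        self.vectors, self.docs = [], []

    def add(self, doc: str) -> None:
        self.vectors.append(embed(doc))
        self.docs.append(doc)

    def search(self, query: str, k: int = 3):
        sims = np.stack(self.vectors) @ embed(query)  # cosine similarity
        top = np.argsort(-sims)[:k]
        return [(self.docs[i], float(sims[i])) for i in top]

index = VectorIndex()
for doc in ["reset a user password", "export billing reports",
            "rotate API keys", "configure SSO login"]:
    index.add(doc)

hits = index.search("how do I reset my password", k=2)
```

Swapping the toy pieces for a real encoder and a persistent vector store changes the two components, not the shape of the flow.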
— 02

Signature projects

Production ML systems and open-source work where the architecture, the model, and the outcome line up.

P/01
Molina Health · 2023–25 · Healthcare ML

Member Risk Stratification Engine

End-to-end ML system for identifying high-risk members across Medicaid and Medicare lines. XGBoost + Random Forest + TensorFlow ensemble for readmission, care-gap, and cost prediction; PySpark processing for claims, eligibility, and provider data; AWS S3 + Snowflake as the analytical store.

15% accuracy lift
30% faster data retrieval
3+ model families ensembled
  • XGBoost
  • Random Forest
  • TensorFlow
  • PySpark
  • AWS S3
  • Snowflake
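A compressed sketch of the ensemble pattern described above — hedged: the data is synthetic, and scikit-learn's `GradientBoostingClassifier` and `LogisticRegression` stand in for the XGBoost and TensorFlow components so the example stays self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for member features
# (claims counts, utilization, demographics) with ~10% high-risk labels.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Soft voting averages predicted probabilities across model families,
# mirroring the XGBoost + Random Forest (+ neural net) ensemble.
ensemble = VotingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier(random_state=0)),  # XGBoost stand-in
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),            # TF-model stand-in
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)

scores = ensemble.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

# Flag the top decile of risk scores for care-management outreach.
high_risk = scores >= np.quantile(scores, 0.9)
```

The top-decile cut is one illustrative policy; in practice the threshold is tuned against outreach capacity and intervention cost.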
P/02
Molina Health · 2024–25 · Generative AI

RAG Care Insights Pipeline

LangChain-based RAG system that turns member health risk, claims history, and clinical context into explainable summaries for care coordinators and case managers. Low-latency retrieval over a managed vector store, plus prompt-engineered LLM outputs that hold up under operational review.

22% latency reduction
12% response efficiency
k=5 retrieval depth
  • LangChain
  • OpenAI
  • Anthropic
  • Pinecone
  • FastAPI
  • RAG
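The grounding step that makes the summaries explainable can be sketched as retrieve-then-prompt. This is a minimal illustration, not the LangChain pipeline itself: a toy token-overlap retriever stands in for the vector-store lookup, the records are invented, and the LLM call is omitted — the point is the numbered citations that let a reviewer trace each claim back to a source record.

```python
# Hypothetical member records; the real pipeline draws these from
# claims history and clinical context via the vector store.
records = [
    "2024-03: inpatient admission, CHF exacerbation",
    "2024-04: missed follow-up cardiology visit",
    "2024-05: pharmacy fill gap for beta blocker",
    "2023-11: annual wellness visit completed",
]

def retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    """Toy lexical retriever (token overlap), standing in for
    embedding search over Pinecone at retrieval depth k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]

def build_prompt(question: str, snippets: list[str]) -> str:
    """Grounded prompt with numbered citations so the model's answer
    can be traced to specific records (the explainability requirement)."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using ONLY the records below and cite them as [n].\n\n"
        f"Records:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Is this member at elevated readmission risk?",
    retrieve("readmission risk cardiology follow-up", records, k=3),
)
```

In the production version the retriever, prompt template, and LLM call are composed through LangChain; the citation convention is what survives operational review.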
P/03
Open Source · 2025 · Multi-Agent AI

Multi-Agentic RAG Architecture

Multi-agent retrieval-augmented system where specialized LLM agents handle retrieval, reasoning, and synthesis. Built around LangChain / CrewAI patterns with a vector backend; designed as a reference for production multi-agent workflows that stay debuggable as they scale.

4+ agent roles
RAG retrieval + reasoning
OSS on GitHub
  • Python
  • LangChain
  • CrewAI
  • RAG
  • VectorDB
  • LLMs
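The "debuggable as it scales" claim rests on one design choice: each agent reads and extends a shared state, and every intermediate output is recorded in a trace. A minimal sketch, assuming stub agents in place of the LLM-backed retrieval, reasoning, and synthesis roles (the real project wires these through LangChain / CrewAI):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Minimal agent: a name plus a step function over shared state.
    The real version wraps an LLM call with a role-specific prompt."""
    name: str
    step: callable

@dataclass
class Pipeline:
    agents: list
    trace: list = field(default_factory=list)

    def run(self, state: dict) -> dict:
        # Each agent extends the shared state; the trace snapshots every
        # intermediate result so failures can be localized to one role.
        for agent in self.agents:
            state = agent.step(state)
            self.trace.append((agent.name, dict(state)))
        return state

# Stub behaviors for the retrieval → reasoning → synthesis roles.
retriever   = Agent("retriever",   lambda s: {**s, "docs": ["doc A", "doc B"]})
reasoner    = Agent("reasoner",    lambda s: {**s, "claims": [f"claim from {d}"
                                                              for d in s["docs"]]})
synthesizer = Agent("synthesizer", lambda s: {**s, "answer": "; ".join(s["claims"])})

pipe = Pipeline([retriever, reasoner, synthesizer])
result = pipe.run({"question": "example question"})
```

Because the trace pairs each agent name with the state it produced, adding a fourth role (e.g. a critic) extends the list without changing the orchestration loop.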
— 03

Technical stack

Tools I've shipped with — grouped by the part of the system they serve.

Languages

  • Python
  • SQL
  • JavaScript
  • TypeScript
  • R
  • C++
  • Java

ML & Deep Learning

  • Scikit-learn
  • XGBoost
  • LightGBM
  • PyTorch
  • TensorFlow / Keras
  • Hugging Face
  • SHAP

Generative AI

  • LangChain
  • LlamaIndex
  • CrewAI
  • OpenAI
  • Anthropic
  • Azure OpenAI
  • LoRA / PEFT
  • Prompt Eval

Data & Vector

  • Pandas
  • NumPy
  • PySpark
  • Postgres
  • MongoDB
  • Pinecone
  • ChromaDB
  • FAISS

Cloud & APIs

  • AWS
  • Snowflake
  • Vercel
  • Firebase
  • FastAPI
  • Flask
  • Node.js
  • REST

MLOps & Tooling

  • Docker
  • Kubernetes
  • CI/CD
  • MLflow
  • Weights & Biases
  • TensorBoard
  • n8n
  • Git
— 04

Credentials & focus

Formal education and the domains where the work is happening.

Education

M.S.
Computer Science

SUNY Albany · Albany, NY

Graduate coursework in machine learning, data systems, and applied AI.

B.Tech.
Computer Science

Bharath University · India

Domains

  • Healthcare · Risk stratification, readmission, care gaps
  • GenAI · RAG, prompt engineering, multi-agent
  • MLOps · CI/CD for ML, monitoring, model registry
  • Backend · FastAPI, Flask, Node.js, REST
  • Data · PySpark, Snowflake, vector stores
  • Analytics · Power BI, Tableau, EDA, hypothesis testing

Methodology

Production-first ML

Operating principle

Models are graded on what survives latency budgets, drift, and operational review — not just offline AUC. Evaluation runs alongside the model, not after it.

Explainability by design

Operating principle

For healthcare and risk work, every model output ships with the context that justifies it. SHAP for tabular, retrieval-grounded citations for LLM outputs.

— 05

Let's build
something that matters.

Open to senior AI/ML and Generative-AI roles. Happy to consult on healthcare ML, RAG architecture, or productionizing LLM systems.