New Jersey · Open to senior AI/ML roles
Shipping ML systems at the edge of healthcare, retrieval, and reliability.
I'm Sai Nikhil — an AI/ML engineer with 5 years building production ML and Generative AI systems. Currently at Molina Health, designing risk-stratification models, RAG pipelines, and decision-support tools for Medicaid and Medicare populations.
Hello — I'm Sai Nikhil.
I build production ML and Generative AI systems for healthcare. My day-to-day at Molina Health is shipping risk stratification models, RAG pipelines, and decision-support tools that move outcomes for Medicaid and Medicare members.
Before Molina I spent two and a half years at Cognizant building backend ML services and analytics infrastructure for enterprise clients — REST APIs on FastAPI/Flask, embedding pipelines, LLM integrations, and the unglamorous data plumbing that makes any of it possible. I hold a Master's in Computer Science from SUNY Albany.
My work sits where modeling meets engineering: latency budgets, evaluation that survives production, and pipelines that don't fall over when the data changes underneath them. Offline accuracy is the start of the job — shipping is the rest.
- Role
- AI/ML Engineer · Molina Health
- Based in
- New Jersey, USA
- Education
- M.S. CS · SUNY Albany
- Specialties
- Healthcare ML · RAG · MLOps · GenAI
- Open to
- Senior AI/ML Engineer roles
- Contact
- ms1104n@gmail.com
Work history
Five years across healthcare ML and enterprise software — from EDA to production GenAI.
Molina Health
AI/ML Engineer
Owning ML and Generative-AI systems that support care management for Medicaid and Medicare populations — risk stratification, RAG-driven summaries, and decision support for care coordinators and case managers.
- Designed end-to-end ML pipelines in Python (Pandas, NumPy, Scikit-learn) for member risk stratification and cost prediction, improving care-management decisions across Medicaid and Medicare.
- Built predictive models with XGBoost, Random Forest, and TensorFlow to identify high-risk members, predict hospital readmissions, and detect care gaps — enabling proactive intervention.
- Tuned and evaluated risk-adjustment and utilization models — 15% accuracy lift, with measurable AUC and F1 gains across production cohorts.
- Processed large-scale claims, eligibility, and provider data with PySpark; designed scalable storage on AWS S3 + Snowflake, cutting data retrieval time by 30%.
- Built RAG pipelines with LangChain that summarize member health risk and claims history into explainable, clinician-readable insights — 22% lower latency, 12% response-efficiency gain.
- Designed AI-powered decision-support systems combining ML predictions with contextual insights for care coordinators, case managers, and provider engagement.
- Shipped Power BI dashboards visualizing risk scores, HEDIS quality metrics, utilization trends, and cost KPIs — adopted across clinical and operational teams.
- Python
- XGBoost
- TensorFlow
- PySpark
- LangChain
- RAG
- AWS S3
- Snowflake
- Power BI
Cognizant
Software Engineer
Built backend ML services, embedding pipelines, and analytics infrastructure for enterprise clients — bridging classical ML, early LLM tooling, and production REST APIs.
- Built scalable REST APIs with FastAPI and Flask for real-time ML and GenAI inference — 45% faster response times, 60% integration efficiency lift across services.
- Generated high-quality embeddings using Hugging Face and OpenAI models, stored in Pinecone, FAISS, and ChromaDB for semantic search and RAG retrieval.
- Integrated LLM APIs (OpenAI, LLaMA-based) with prompt-engineering and LangChain templates to deliver domain-specific, explainable responses inside RAG-based products.
- Developed and deployed classification, regression, and clustering models with LightGBM and PyTorch, supporting data-driven decisions across business use cases.
- Ran EDA and statistical testing (SciPy, Statsmodels) to surface anomalies — 20% better feature engineering effectiveness, 12% model accuracy improvement.
- Automated reporting workflows with Tableau extracts and scheduled refreshes — 30% reduction in manual reporting effort for stakeholders.
- Containerized and deployed services on Docker + AWS for production reliability and clean resource utilization.
- FastAPI
- Flask
- LangChain
- OpenAI
- Hugging Face
- Pinecone
- FAISS
- LightGBM
- PyTorch
- Docker
- AWS
- Tableau
Signature projects
Production ML systems and open-source work where the architecture, the model, and the outcome line up.
Member Risk Stratification Engine
End-to-end ML system for identifying high-risk members across Medicaid and Medicare lines. XGBoost + Random Forest + TensorFlow ensemble for readmission, care-gap, and cost prediction; PySpark processing for claims, eligibility, and provider data; AWS S3 + Snowflake as the analytical store.
- XGBoost
- Random Forest
- TensorFlow
- PySpark
- AWS S3
- Snowflake
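The scoring step of an ensemble like this can be sketched in a few lines. This is an illustrative blend, assuming calibrated per-model probabilities; the model names, weights, and tier cutoffs below are stand-ins, not the production values.

```python
# Hypothetical sketch: blend calibrated probabilities from ensemble members
# (e.g. XGBoost, Random Forest, a TensorFlow model) into one risk tier.
# Weights and cutoffs here are illustrative only.

def blend_risk(probs: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model readmission probabilities."""
    total = sum(weights.values())
    return sum(probs[name] * w for name, w in weights.items()) / total

def risk_tier(p: float) -> str:
    """Map a blended probability to a care-management tier."""
    if p >= 0.7:
        return "high"
    if p >= 0.3:
        return "medium"
    return "low"

member = {"xgboost": 0.82, "random_forest": 0.74, "tensorflow": 0.78}
weights = {"xgboost": 0.5, "random_forest": 0.3, "tensorflow": 0.2}
score = blend_risk(member, weights)  # 0.788 with these toy inputs
```

Keeping the blend and the tier mapping as separate functions means the cutoffs can be re-tuned against utilization outcomes without retraining any model.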
RAG Care Insights Pipeline
LangChain-based RAG system that turns member health risk, claims history, and clinical context into explainable summaries for care coordinators and case managers. Low-latency retrieval over a managed vector store, plus prompt-engineered LLM outputs that hold up under operational review.
- LangChain
- OpenAI
- Anthropic
- Pinecone
- FastAPI
- RAG
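The retrieval step behind a summary like this can be shown framework-free. The real pipeline uses LangChain over a managed vector store; the embeddings, documents, and prompt template below are toy stand-ins for illustration.

```python
import math

# Framework-free sketch of retrieval + prompt assembly for a care summary.
# Vectors, documents, and the template are illustrative placeholders.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

def build_prompt(question, passages):
    """Ground the LLM in retrieved passages so answers stay explainable."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

index = [
    {"text": "Two ER visits in the last 90 days.", "vec": [0.9, 0.1]},
    {"text": "HbA1c trending upward since March.", "vec": [0.2, 0.8]},
    {"text": "No open care gaps on record.", "vec": [0.5, 0.5]},
]
passages = retrieve([1.0, 0.0], index, k=1)
prompt = build_prompt("Is this member high risk?", passages)
```

Constraining the prompt to retrieved passages is what makes the output reviewable: a clinician can check every claim against the cited context.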
Multi-Agentic RAG Architecture
Multi-agent retrieval-augmented system where specialized LLM agents handle retrieval, reasoning, and synthesis. Built around LangChain / CrewAI patterns with a vector backend; designed as a reference for production multi-agent workflows that stay debuggable as they scale.
- Python
- LangChain
- CrewAI
- RAG
- VectorDB
- LLMs
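The retrieve-reason-synthesize hand-off above can be sketched with each "agent" as a plain function, which is what keeps the flow debuggable. The project itself uses LangChain / CrewAI; the agent names and toy logic here are illustrative, not the actual implementation.

```python
# Sketch of a multi-agent hand-off: each stage is a plain function so the
# pipeline can be unit-tested and traced. Agent logic here is a toy stand-in.

def retrieval_agent(query, corpus):
    """Pick documents sharing at least one token with the query (toy retriever)."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def reasoning_agent(docs):
    """Keep only the facts relevant to risk (toy relevance filter)."""
    return [d for d in docs if "risk" in d.lower()]

def synthesis_agent(query, facts):
    """Compose the final grounded answer from the surviving facts."""
    return f"{query} -> " + "; ".join(facts)

def run_pipeline(query, corpus):
    return synthesis_agent(query, reasoning_agent(retrieval_agent(query, corpus)))

corpus = [
    "Readmission risk is elevated for member A.",
    "Claims history shows routine visits only.",
    "Risk factors include two recent ER admissions.",
]
answer = run_pipeline("summarize risk", corpus)
```

Because each agent has one input and one output, a bad answer can be traced to the exact stage that produced it, which is the property that survives scaling.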
- rag-from-scratch · JavaScript
Retrieval-Augmented Generation, implemented from first principles — chunking, embedding, indexing, retrieval, and generation without framework abstractions.
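The chunking stage of a from-scratch pipeline like this is the part most people get wrong. The repo is JavaScript; this Python sketch mirrors the idea, and the window and overlap sizes are illustrative.

```python
# Sketch of fixed-size chunking with overlap, so retrieval does not lose
# context at chunk boundaries. Sizes here are illustrative, not tuned values.

def chunk(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into word windows of `size`, each sharing `overlap` words
    with the previous window."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(words):
            break
    return chunks
```

The overlap is the point: a fact that straddles a boundary still appears whole in at least one chunk, so the retriever can find it.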
- rag-chatbot · Python
Production-shaped RAG chatbot — vector retrieval, prompt orchestration, and a pluggable LLM layer with the kind of structure you actually deploy behind an API.
- multimodal-live-rag-voice · Python
Multimodal, low-latency RAG with live voice I/O — combining streaming ASR, retrieval, and grounded LLM responses for real-time conversational interfaces.
- langchain-rag-document-understanding · Jupyter
LangChain-based RAG for document understanding — chunking strategies, retriever tuning, and grounded Q&A across heterogeneous corpora, captured in reproducible notebooks.
- mlops-app · HCL
MLOps reference stack — Terraform-managed infrastructure for training, model registry, deployment, and monitoring; the boring infrastructure that makes ML actually shippable.
Technical stack
Tools I've shipped with — grouped by the part of the system they serve.
Languages
- Python
- SQL
- JavaScript
- TypeScript
- R
- C++
- Java
ML & Deep Learning
- Scikit-learn
- XGBoost
- LightGBM
- PyTorch
- TensorFlow / Keras
- Hugging Face
- SHAP
Generative AI
- LangChain
- LlamaIndex
- CrewAI
- OpenAI
- Anthropic
- Azure OpenAI
- LoRA / PEFT
- Prompt Eval
Data & Vector
- Pandas
- NumPy
- PySpark
- Postgres
- MongoDB
- Pinecone
- ChromaDB
- FAISS
Cloud & APIs
- AWS
- Snowflake
- Vercel
- Firebase
- FastAPI
- Flask
- Node.js
- REST
MLOps & Tooling
- Docker
- Kubernetes
- CI/CD
- MLflow
- Weights & Biases
- TensorBoard
- n8n
- Git
Credentials & focus
Formal education and the domains where the work is happening.
Education
M.S., Computer Science
SUNY Albany · Albany, NY
Graduate coursework in machine learning, data systems, and applied AI.
Computer Science
Bharath University · India
Domains
- Healthcare · Risk stratification, readmission, care gaps
- GenAI · RAG, prompt engineering, multi-agent
- MLOps · CI/CD for ML, monitoring, model registry
- Backend · FastAPI, Flask, Node.js, REST
- Data · PySpark, Snowflake, vector stores
- Analytics · Power BI, Tableau, EDA, hypothesis testing
Methodology
Production-first ML
Operating principle
Models are graded on what survives latency budgets, drift, and operational review — not just offline AUC. Evaluation runs alongside the model, not after it.
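One concrete form of "evaluation runs alongside the model" is a drift check on incoming features. A common choice is the Population Stability Index; the bin shares and the alert thresholds below are conventional illustrative values, not production configuration.

```python
import math

# Population Stability Index between a feature's training-time bin shares
# and live-traffic bin shares. Thresholds (0.1 / 0.2) are the conventional
# rule of thumb, used here for illustration.

def psi(expected: list[float], actual: list[float]) -> float:
    """PSI over pre-binned proportions; epsilon guards against empty bins."""
    eps = 1e-6
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

baseline = [0.25, 0.25, 0.25, 0.25]   # training-time bin shares
stable   = [0.24, 0.26, 0.25, 0.25]   # live traffic, similar shape
shifted  = [0.05, 0.15, 0.30, 0.50]   # live traffic after drift

drift_score = psi(baseline, shifted)  # well above the ~0.2 alert threshold
```

A check like this runs on every scoring batch, so the model is flagged when the population moves underneath it rather than when someone notices the predictions went stale.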
Explainability by design
Operating principle
For healthcare and risk work, every model output ships with the context that justifies it. SHAP for tabular, retrieval-grounded citations for LLM outputs.
Let's build something that matters.
Open to senior AI/ML and Generative-AI roles. Happy to consult on healthcare ML, RAG architecture, or productionizing LLM systems.