
Machine learning and data engineer with experience building large-scale production systems that process petabytes of data and have delivered tens of millions of dollars in business impact.
Book a 30-minute intro call: Calendly Link
Data pipeline architecture and debugging
Machine learning model prototyping and deployment
Scaling Spark / Ray / distributed data workflows
ML experimentation infrastructure
Performance optimization for large data systems
Who I help
Early-stage startups
Small businesses with messy data
Teams that need a first ML/data system
Founders who need part-time technical help
Selected Experience
Large-scale search ML pipeline (CloseMatch)
Built a petabyte-scale Apache Spark ML pipeline from scratch to reduce undifferentiated search results. The system became the foundation for an entire engineering team’s workflow. Annual profit: ~$15M.
Stack: AWS, EMR, Spark, Random Forest, BERT
Identity Graph System
Built a petabyte-scale graph-processing pipeline for identity resolution. Reduced refresh latency by 50% and cold-start processing time by 80%, saving ~$1.5M per year and enabling faster iteration.
Stack: Google Cloud, Dataproc, Spark
Advertiser Risk ML Model
Built and deployed a supervised ML system that reduced advertiser campaign suspensions, increasing annual profit from $15M to $43M.
Stack: Random Forest, AWS Lambda, Glue, S3
Distributed ML Experimentation Platform
Built a Ray-based experimentation platform on Kubernetes for large-scale reinforcement learning research and model evaluation.
Stack: Ray, EKS
Engagements
Intro calls
Architecture reviews
Project-based consulting
Fractional technical advising
Contact
Best way to reach me for general inquiries: [email protected]
To schedule a free intro call: Calendly Link
To schedule a paid consulting call ($300/hour): Calendly Link