Root-cause analysis for ETL operations
Led an Amazon Bedrock RAG platform for 30K+ developers, automating investigation across 3M+ ETL executions per day and reducing support load by 50+ tickets per week.
I build Big Data Technologies platforms that help Amazon data engineers schedule, orchestrate, monitor, and recover Redshift and EMR workloads that produce business data at scale.
Systems work focused on reliable data production, operational automation, and low-latency dependency decisions.
Redesigned Big Data Technologies' job orchestration platform for scheduling, retries, distributed state, and fault recovery across 5M+ analytics jobs per day.
Built lineage and completeness capabilities aggregating 50M+ events per day for low-latency dependency evaluation, workflow recovery, redrive, and debugging.
10+ years building backend platforms, distributed systems, and developer infrastructure.
Leading platform initiatives across AI-powered operations and workflow orchestration for Amazon's internal analytics and business data production stack.
Built dependency management, data completeness, lineage, and recovery systems for analytics workflows at Amazon scale.
Developed backend services and distributed ingestion workflows for Amazon Appstore application onboarding and publishing.
Languages, platforms, and systems I use to build reliable data infrastructure.