Principal Software Engineer
March 2015 – January 2018
- Contributor to Apache Mesos, Marathon, DC/OS, and other open-source projects
- Designed, developed, and launched a new lightweight, self-healing, auto-scaling private cloud infrastructure platform that efficiently uses 1,000,000+ CPUs, 3+ petabytes of RAM, and 100+ petabytes of storage
- Appointed team lead for application security and continuous integration
- Created an efficient, high performance, highly available Mesos scheduler with support for CNI, distributed storage, and custom executors
- Built a modular SDK to ease and accelerate the internal development of other Mesos schedulers and executors
- Developed a self-healing cluster scaling service that auto-scales applications across the entire cluster or onto specific node types, based on data center configuration and metadata
- Created a distributed, self-healing firewall service to dynamically manage netfilter rules/chains across the cluster
- Developed a load-balanced, distributed, self-healing middleman API service that intercepts calls to the logging/monitoring framework and provides a consistent, high-performance, database-agnostic interface
- Worked on a custom deployment/upgrade framework that manages dependencies, rolls back on failure, supports blue-green deployments, and keeps services available with zero downtime
- Wrote a tool to automate the setup, configuration, and deployment of the cluster in a local, virtualized environment