Makoto YUI is a Senior Principal Engineer at Treasure Data, working on ML-as-a-Service. He is currently the tech lead for productizing Treasure AutoML. His work has also been productized as the heart of CDP predictive scoring and the Content Affinity Engine.
Aside from that, he is leading the development of Apache Hivemall, an open source library for scalable machine learning on Apache Hive and Spark. Hivemall was selected in IDC InfoWorld’s Bossie Awards 2014 as one of the top picks in big data tools.
He received a Ph.D. in computer science from Nara Institute of Science and Technology (NAIST) in 2009, and won the best student award among NAIST's 2009 graduates. He also won the MITOH Youth Super Creator Award in 2003, in the government program for young engineers.
Ph.D. Computer Science, 2009
Nara Institute of Science and Technology Graduate School of Information Science (NAIST), Japan.
M.E. Computer Science, 2006
Nara Institute of Science and Technology Graduate School of Information Science (NAIST), Japan.
B.E. Computer Science, 2003
Shibaura Institute of Technology.
Working with multi-regional teams and leading AutoML productization as the technical lead. Designed, built, and operated an auto-scaling container platform for the AutoML service, led the team behind it, and implemented various machine learning functions running on top of it.
Promoted to Senior Principal Engineer after successfully leading the ML team as Tech Lead for productizing Treasure AutoML.
Leading the development of a large-scale machine learning service (ML-as-a-Service) in the Treasure Data cloud service. My work has been productized as the heart of CDP predictive scoring (predictive models with automated feature engineering) and the Content Affinity Engine (data augmentation of user interests using NLP, ontology, and the Wikipedia corpus).
I also worked on an Airflow-like workflow management system (open-sourced as Digdag) and on the design and implementation of its next-generation Docker runtime using AWS ECS cluster auto scaling. A number of customers are using it for running and managing machine learning workflows. Some example ML workflows can be seen in this repository.
Worked as the company’s 1st machine learning engineer. Grew the machine learning team and led machine learning applications at the company. Productized Apache Hivemall in the Treasure Data cloud service. Aside from development, in the early stage of the company I did anything I could to contribute to its growth as a startup member, including ML consulting for customers, presentations in sales meetings, and talks at external events.
Internship management: I initiated and organized the Summer Internship program from 2015 to 2018 and mentored a number of students. We implemented various ranking measures and anomaly detection algorithms. Another student worked on Field-aware Factorization Machines and online kernel SVM. We successfully completed the internship program, and 5 master's students joined Treasure Data in the past 3 years. These accomplishments are what I’m proud of.
Consulting: Aside from development tasks, I consulted on 20 to 30 customer-facing machine learning projects, for clients ranging from Fortune 500 companies to foreign startups (Indonesia, Taipei, Israel). The industry segments I consulted for include: Telecom, Insurance, Automobile, Ad-tech, EC, Media Agency, Internet-related Service, Real-estate, Gaming, and Online Publishers. My consulting work for Subaru was featured in the CEO’s talk at SoftBank World. This consulting gave me precious and unique experience in applying ML to diverse domain problems, such as dealing with overfitting (data leakage), feedback loops, and pitfalls of optimizing by evaluation measures.
External talks: I presented talks at various conferences (such as ApacheCon, Hadoop Conference, and the annual event of the Japan DataScientist Society) and gave demos at research conferences (RecSys'18).
Implemented various Machine Learning algorithms
Fluent, Main Programming Language. 10+ years experience
Tool for Scripting
Long-term user. Uses Ubuntu/Debian/Redhat/AmazonLinux
Experienced in AWS tech stacks
Used ECS/Fargate in production. Experienced in Container security issues.
Experienced in Postgres/MySQL. Ph.D. in Database Systems :-)
Hadoop/Spark/Hive master
Tool for DevOps
Introduction to Apache Hivemall v0.5.0: Machine Learning on Hive/Spark. Joint talk with Takeshi Yamamuro from NTT.
Recommendation 101 using Hivemall, for Treasure Data customers.