Makoto Yui

Senior Principal Engineer

Arm Treasure Data

Biography

Makoto YUI is a Senior Principal Engineer at Treasure Data, working on ML-as-a-Service. He currently leads the productization of Treasure AutoML as the tech lead. His work has also been productized as the core of CDP predictive scoring and the Content Affinity Engine.

Aside from that, he is leading the development of Apache Hivemall, an open source library for scalable machine learning on Apache Hive and Spark. Hivemall won IDG InfoWorld's Bossie Awards 2014 as one of the top picks in big data tools.

He received a Ph.D. in computer science from Nara Institute of Science and Technology (NAIST) in 2009 and won the best student award among that year's NAIST graduates. He also won the MITOH Youth Super Creator Award in a government program for young engineers in 2003.

Interests

Machine Learning
Recommendation Systems
Data Engineering
Database Systems

Education

Ph.D. Computer Science, 2009

Nara Institute of Science and Technology Graduate School of Information Science (NAIST), Japan.

M.E. Computer Science, 2006

Nara Institute of Science and Technology Graduate School of Information Science (NAIST), Japan.

B.E. Computer Science, 2003

Shibaura Institute of Technology.

Experience

Senior Principal Engineer

Treasure Data

Apr 2023 – Present Tokyo, Japan

Working with multi-regional teams and leading AutoML productization as the technical lead. Designed, built, and operated an auto-scaling container platform for the AutoML service, led the team behind it, and implemented various machine learning functions running on top of it.

Promoted to Senior Principal Engineer after successfully leading the ML team as the tech lead for productizing Treasure AutoML.

Principal Engineer

Arm Treasure Data

Aug 2018 – Mar 2023 Tokyo, Japan

Led the development of a large-scale machine learning service (ML-as-a-Service) in the Treasure Data cloud service. My work has been productized as the core of CDP predictive scoring (predictive models with automated feature engineering) and the Content Affinity Engine (data augmentation of user interests using NLP, ontology, and the Wikipedia corpus).

I also worked on an Airflow-like workflow management system (open-sourced as Digdag) and on the design and implementation of its next-generation Docker runtime using AWS ECS cluster auto scaling. A number of customers use it to run and manage machine learning workflows. Some example ML workflows can be seen in this repository.

Apache Hivemall PPMC member

The Apache Software Foundation

Sep 2016 – Sep 2022

Leading the development of Apache Hivemall at Apache Incubator as the original creator (slide).

Research Engineer

Treasure Data (Acquired by Arm)

Apr 2015 – Aug 2018 Tokyo, Japan

Worked as the company's first machine learning engineer. Grew the machine learning team and led machine learning applications at the company. Productized Apache Hivemall in the Treasure Data cloud service. Aside from development, as an early member of the startup I did whatever I could to contribute to the company's growth, including ML consulting for customers, presentations in sales meetings, and talks at external events.
Internship management: I initiated and organized the Summer Internship program from 2015 to 2018 and mentored a number of students. We implemented various ranking measures and anomaly detection algorithms. Another student worked on Field-aware Factorization Machines and online kernel SVM. We successfully completed the internship program, and 5 master's students joined Treasure Data in the past 3 years. These accomplishments are something I'm proud of.
Consulting: Aside from development tasks, I consulted on 20 to 30 customer-facing machine learning projects, for clients ranging from Fortune 500 companies to foreign startups (Indonesia, Taipei, Israel). The industry segments included Telecom, Insurance, Automobile, Ad-tech, EC, Media Agency, Internet-related Service, Real-estate, Gaming, and Online Publishers. My consulting work for Subaru was featured in the CEO's talk at SoftBank World. This consulting gave me valuable and unique experience applying ML to diverse domain problems, such as dealing with overfitting (data leakage), feedback loops, and the pitfalls of optimizing for evaluation measures.
External talks: I gave talks at various conferences (such as ApacheCon, Hadoop Conference, and the annual event of the Japan DataScientist Society) and gave demos at research conferences (RecSys'18).

Visiting Researcher

The University of Edinburgh

Sep 2011 – Nov 2011 Edinburgh, UK

As a visiting researcher from AIST, I worked with Paolo Basala and Prof. Malcolm Atkinson in the Data Intensive Research (DIR) group. Designed a distributed streaming data processing system for scientific workflows on the EDIM1 data-intensive machine (an energy-efficient PC cluster).

Senior Researcher

National Institute of Advanced Industrial Science and Technology (AIST)

Apr 2010 – Mar 2015 Tsukuba, Japan

Worked on distributed and parallel data processing and large-scale machine learning in the data science research group. Designed and managed a 50-node Hadoop cluster for managing scientific workflows. Promoted to Senior Researcher in my third year at AIST.

Visiting Postdoc

Centrum Wiskunde & Informatica (CWI)

Oct 2009 – Mar 2010 Amsterdam, Netherlands

Worked with Peter Boncz and Prof. Martin Kersten in the INS1 database research group. Designed and implemented a parallel database system on top of shared-nothing MonetDB servers.

Visiting Researcher

Waseda University

Apr 2009 – Mar 2010 Tokyo, Japan

While holding my JSPS fellowship, I worked with Prof. Yamana.

System Engineer

NEC Infomatic Systems, Ltd

Apr 2004 – Mar 2006 Tokyo, Japan

Designed and implemented RM4GS (Reliable Messaging for Grid Services) as a reference implementation of OASIS WSRM which provides reliable messaging facilities for Web Services.

Skills

Machine Learning

Implemented various Machine Learning algorithms

Java

Fluent; my main programming language. 10+ years of experience

Python

Tool for Scripting

Linux

Long-term user. Uses Ubuntu/Debian/Red Hat/Amazon Linux

AWS

Experienced in AWS tech stacks

Used ECS/Fargate in production. Experienced in Container security issues.

Databases

Experienced in Postgres/MySQL. Ph.D in Database Systems :-)

Hadoop

Hadoop/Spark/Hive master

Terraform

Tool for DevOps

Accomplishments

Bossie Awards 2014: The best open source big data tools

IDG InfoWorld Sep 2014

Hivemall was selected as one of InfoWorld's top picks in big data tools.

IPSJ Yamashita SIG Research Award 2009

Information Processing Society of Japan Mar 2010

Awarded for a research paper on non-blocking database buffer management.

JSPS Research Fellow (PD)

Japan Society for the Promotion of Science Apr 2009 – Mar 2010

Received a government-sponsored Postdoctoral Fellowship.

Best Student Award

Nara Institute of Science and Technology, Japan Mar 2009

I was selected as the best student among that year's graduates.

JSPS Research Fellow (DC2)

Japan Society for the Promotion of Science Apr 2008 – Mar 2009

Received a government-sponsored Fellowship for Ph.D. students.

MITOH Youth Super Creator Award

Information-Technology Promotion Agency, Japan Jul 2003

Won the Super Creator Award in the government program for young engineers. The MITOH Program aims to discover and develop outstanding human resources called Super Creators. Specifically, these are persons possessing creative ideas and skills for achieving software innovation and who can put these ideas and skills to use. Super Creators discovered through this program implemented by IPA are expected to play active roles as world-class IT human resources that help support Japan’s IT industry during the next generation.

IPSJ Outstanding Paper Award 2003

Information Processing Society of Japan Mar 2003

Our paper on an XML database system built on top of PostgreSQL was selected as an outstanding paper.

Projects

BTree4j

Btree4j is a from-scratch implementation of a disk-based Prefix B+-tree written in pure Java. It’s pretty fast: 100k ops/sec can be expected even on a laptop.

Github SlideShare

Apache Hivemall (Incubating)

Apache Hivemall is a scalable machine learning library that runs on Apache Hive and Spark. Apache Hivemall offers a variety of functionalities including regression, classification, recommendation, anomaly detection, k-nearest neighbor, and feature engineering. Won IDG InfoWorld’s Bossie Awards 2014 for the best open source big data tools.

Github SlideShare

XBird

XBird is a light-weight XQuery processor and database system written in Java. Here, light-weight means reasonably fast and embeddable. Implemented XQuery, a fully functional programming language for XML, using JavaCC. It passes about 91% of the minimal conformance tests of the XQuery Test Suite.

Github Paper

XpSQL

Developed a multi-functional XML database environment using PostgreSQL as a graduate student project. Extended PostgreSQL functionality using the Server Programming Interface. Received the Super Creator Award from IPA's MITOH Youth program, a government program to educate young engineers.

gborg (attic)

Talks

Fireside chat at Indeed

Gave an invited talk about my OSS experience.

Apr 4, 2019 Indeed Tokyo office

Slides

ApacheCon North America 2018

Gave a talk titled Introduction to Apache Hivemall v0.5.0: Machine Learning on Hive/Spark. Joint talk with Takeshi Yamamuro from NTT.

Sep 27, 2018 Montréal, Canada

Event Slides

Recommendation 101 using Hivemall

Gave a talk titled Recommendation 101 using Hivemall for Treasure Data customers.

Aug 24, 2016

Slides

