Machine learning, the foundation of artificial intelligence, has become a hot topic in the internet world. Alibaba, the leading market player in e-commerce and internet finance, foresaw this and applied machine learning technology long before AlphaGo's demonstration in 2016. At the Alibaba Technology Forum at Polytechnic University on June 17, 2016, Dr. Wei Chu, Director of Engineering at Alibaba, gave a talk on distributed machine learning and its application at Alibaba.
According to Wikipedia, machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In general, machine learning can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning.
According to Chu, there are three pillars of artificial intelligence learning and decision making. The first is big data: data at any scale, from megabytes to petabytes, is the foundation pool for machine learning and pattern recognition. The second is cloud computing, which provides high performance, scalability, accessibility, low cost, and secure computation. The third is research innovation, at both the algorithm level and the CPU chip level.
Chu also compared the features of global machine learning platforms. Amazon, for example, has launched machine learning on its web services platform, but it supports only a limited set of functions such as regression and classification. Google's TensorFlow is quite advanced and open source, and is perhaps the best choice for professional developers, but the technical bar is rather high for people who don't know Python. Microsoft's machine learning platform on Azure has a relatively more user-friendly graphical interface. In this respect, Alibaba's Platform of Artificial Intelligence (PAI) is closer to Microsoft's style.
Dr. Chu also demonstrated the features and functions of Alibaba's cloud-based, graphical platform (PAI): searching experiments, components, and models; managing experiments; exploring data tables; listing algorithms and tools; managing models; building a binary classification model; and viewing its confusion matrix. In its pursuit of data intelligence, PAI is built on big data collection and analytics, comprehensive data mining, distributed computing, and advanced algorithms.
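To make the binary classification step concrete, here is a minimal Python sketch of how a confusion matrix summarizes a classifier's predictions. It is not PAI code: the function names and the sample labels are hypothetical and chosen only for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Count outcomes for binary labels (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1      # correctly flagged positive
        elif truth == 0 and pred == 1:
            counts["fp"] += 1      # false alarm
        elif truth == 1 and pred == 0:
            counts["fn"] += 1      # missed positive
        else:
            counts["tn"] += 1      # correctly rejected negative
    return counts


def precision_recall(counts):
    """Derive precision and recall from the four counts."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm, precision_recall(cm))
```

A graphical platform would typically render the same four counts as a 2x2 table next to the trained model.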
According to Chu, PAI's customers include traditional giants (such as Sinopec and CNPC), startups, government bodies (such as the Zhejiang provincial government's meteorological administration and traffic management), and researchers (such as data scientists, engineers, and developers).
Chu described five conceptual layers within PAI's architecture. The surface layer is applications, such as credit scoring, security, recommender systems, search engines, finance cloud, and public cloud. The next layer is platform products, including product management, module management, analytic visualization, customized PAI, and My-PAI. The third layer is algorithms and tools: data pre-processing, feature engineering, machine learning algorithms, statistics, and deep learning with CNN, DNN, and LSTM RNN. The next layer down is the computing framework: MR (MapReduce), SQL, MPI, PS (parameter server), Graph, and GPU on a single machine or multiple machines. Finally, the deepest layer is infrastructure, namely Aliyun, CPU cloud, and GPU cloud.
After that, Chu introduced Caffe and Pluto, the deep learning system behind Alibaba's PAI. Built on Caffe, an open source deep learning library, Pluto can be regarded as a distributed deep learning system that runs across multiple machines and multiple GPU cards and supports popular models such as convolutional neural networks (CNN) and recurrent neural networks (RNN).
Chu then described three deep learning models in Alibaba's cloud service, particularly in Pluto. The first is the deep neural network (DNN). It could be regarded as a Monte Carlo system with multi-card, multi-level computation, in which each node has distributed inputs and outputs. The second is the convolutional neural network (CNN), in which the input, like a piece of a puzzle, is broken down into smaller units and passed through repeated convolution and pooling steps until it reaches the fully connected layers, yielding more precise, higher-resolution pattern recognition. CNNs are widely used in region detection and item verification. With Pluto, Alibaba Cloud (with only 8 GPU cards) could identify 100 million images within 10 hours and, in the same time, identify over 99.6% of Chinese characters. The third is the recurrent neural network (RNN), which is built around a closed loop: the middle layer automatically feeds back to the preceding input layer when it identifies faults or missing items, and passes results to the output layer only once they are completely debugged.
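The convolution, pooling, and fully connected pipeline described for CNNs can be sketched in a few lines of plain Python. This is a toy illustration, not Pluto or Caffe code: the 5x5 image, the edge-detecting kernel, and the weights are all assumptions made up for the example, and a real network would learn its kernels and stack many such stages.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Zero out negative activations."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; any ragged border is dropped."""
    return [
        [
            max(fmap[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(fmap, weights, bias):
    """Flatten the feature map and take one weighted sum."""
    flat = [v for row in fmap for v in row]
    return sum(x * w for x, w in zip(flat, weights)) + bias


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright step

features = max_pool(relu(conv2d(image, edge_kernel)))
score = fully_connected(features, weights=[0.5, 0.5, 0.5, 0.5], bias=0.0)
print(features, score)
```

Each convolution looks at a small patch, pooling keeps the strongest response in each region, and the fully connected step combines what is left into a single score.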
Within Pluto, there are parameter server nodes (model processors) and worker nodes (shared processors). The machines in the parameter server nodes and worker nodes coordinate and feed updates back to one another collectively. Recently, Caffe has started to support multiple models, including CNN and LSTM.
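The division of labor between parameter server nodes and worker nodes can be illustrated with a single-process Python sketch. It assumes the common pull/push pattern: workers pull the current weights, compute a gradient on their own data shard, and push it back to the server. The class names, the tiny linear model, and the learning rate are hypothetical; Pluto's actual interfaces are not described in the talk, and a real system would run these steps on separate machines.

```python
class ParameterServer:
    """Holds the shared weights and applies gradients pushed by workers."""

    def __init__(self, weights, lr):
        self.weights = list(weights)
        self.lr = lr
        self.updates = 0

    def pull(self):
        return list(self.weights)  # hand out a copy of the current model

    def push(self, grads):
        # One gradient-descent step per pushed gradient.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]
        self.updates += 1


class Worker:
    """Owns one shard of the training data."""

    def __init__(self, shard):
        self.shard = shard  # list of (x, y) pairs

    def gradient(self, weights):
        """Mean squared-error gradient for y ~ w0 + w1 * x on this shard."""
        g0 = g1 = 0.0
        for x, y in self.shard:
            err = weights[0] + weights[1] * x - y
            g0 += 2 * err
            g1 += 2 * err * x
        n = len(self.shard)
        return [g0 / n, g1 / n]


# Hypothetical noise-free data generated from y = 1 + 2x.
data = [(x, 1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
server = ParameterServer([0.0, 0.0], lr=0.05)
workers = [Worker(data[:2]), Worker(data[2:])]

for _ in range(2000):
    for worker in workers:  # simulated here by taking turns
        server.push(worker.gradient(server.pull()))

print(server.weights)
```

Because each worker only ever sees its own shard, the server is the single place where the model is kept consistent.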
Next, Chu described Pluto's features for data computation. With 56 gigabytes in the CPU/GPU cloud, data parallelism with multiple copies, complete failover, and asynchronous updates, developers and operators gain significant advantages in data processing, such as efficient multi-tasking and scheduling, InfiniBand communication, scalability, and higher accuracy along with a higher acceleration ratio.
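Two of these ideas, data parallelism and the acceleration ratio, lend themselves to a small Python sketch. It assumes equal-sized shards, in which case averaging the per-shard gradients reproduces the full-batch gradient, and it takes the acceleration ratio to be single-machine time divided by parallel time. The function names, the data, and the timings are hypothetical and are not measurements from Pluto.

```python
def gradient(weights, batch):
    """Mean squared-error gradient for the linear model y ~ w0 + w1 * x."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = weights[0] + weights[1] * x - y
        g0 += 2 * err
        g1 += 2 * err * x
    return [g0 / len(batch), g1 / len(batch)]


def split(batch, n_workers):
    """Deal samples round-robin into n_workers shards."""
    return [batch[i::n_workers] for i in range(n_workers)]


def data_parallel_gradient(weights, batch, n_workers):
    """Average the gradients each worker computes on its own shard."""
    shard_grads = [gradient(weights, s) for s in split(batch, n_workers)]
    return [
        sum(g[k] for g in shard_grads) / n_workers
        for k in range(len(weights))
    ]


def speedup(single_time, parallel_time):
    """Acceleration ratio: how many times faster the parallel run is."""
    return single_time / parallel_time


# Hypothetical batch of 8 samples drawn from y = 3x - 1.
batch = [(float(x), 3.0 * x - 1.0) for x in range(8)]
weights = [0.5, 0.25]

full = gradient(weights, batch)
parallel = data_parallel_gradient(weights, batch, n_workers=4)
print(full, parallel, speedup(10.0, 2.5))
```

In practice the acceleration ratio falls short of the number of workers because of communication cost, which is why fast interconnects matter.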
Afterwards, Chu showed Pluto's applications in business. The first example is Taobao's recommender systems: given a user, a query, and candidate items, Pluto finds the best item to maximize a business metric such as CTR, RPM, or user satisfaction. The second example is logistic regression. Pluto finished the analysis of 1 billion attributes and 57 billion training samples from Alimama ads within 5 hours, accelerating the logistic regression computation by 40% over MPI. Right now, Pluto can support up to 10 billion attributes and up to 100 billion samples. The third application is the Zhima personal credit score. The score integrates several indicators such as identity, compliance, credit history, social connections, and behavior. Ranging from 350 to 950, it represents personal and company credit status: the higher the score, the lower the default risk. It is computed with a locally connected DNN. The fourth example is image OCR, which can be widely used in identity card verification, name card information input, vocabulary teaching, verification code recognition, and so on.
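As a rough illustration of the first two applications, the Python sketch below trains a logistic regression model with stochastic gradient descent on sparse binary attributes and then ranks candidate items by predicted click probability. It is a single-machine toy, far from the billions of attributes mentioned in the talk: the three attributes, the four training samples, the item names, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Predicted click probability for one sparse example.

    `features` lists the indices of the active (value 1) attributes.
    """
    return sigmoid(sum(weights[i] for i in features))


def train(samples, n_features, lr=0.5, epochs=200):
    """Plain SGD on log loss; samples are (active_indices, label) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in samples:
            err = predict(weights, features) - label  # gradient w.r.t. logit
            for i in features:  # only active attributes are updated
                weights[i] -= lr * err
    return weights


# Hypothetical attributes: 0 = bias, 1 = query matches title, 2 = on sale.
samples = [
    ([0, 1, 2], 1),  # clicked
    ([0, 1], 1),     # clicked
    ([0, 2], 0),     # not clicked
    ([0], 0),        # not clicked
]
weights = train(samples, n_features=3)

# Rank hypothetical candidate items by predicted click-through rate.
candidates = {"item_a": [0, 1, 2], "item_b": [0, 2], "item_c": [0, 1]}
ranked = sorted(
    candidates,
    key=lambda name: predict(weights, candidates[name]),
    reverse=True,
)
print(ranked)
```

Storing each sample as a list of active attribute indices is what keeps very wide models tractable, since every update touches only the few weights that sample uses.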
In summary, Chu painted a picture of Alibaba's artificial intelligence and machine learning in the near future. Built on CPU/GPU cloud computing, big data analytics, and deep learning algorithms, Alibaba's machine learning platform stands for smart computation and data intelligence, which greatly helps not only Alibaba's developers but also the entrepreneurs and customers on Alibaba.
About Alibaba Cloud
Established in September 2009, Alibaba Cloud (intl.aliyun.com), Alibaba Group’s cloud computing arm, develops highly scalable platforms for cloud computing and data management. It provides a comprehensive suite of cloud computing services to support participants of Alibaba Group’s online and mobile commerce ecosystem, including sellers and other third-party customers and businesses. Alibaba Cloud is a business within Alibaba Group.