In Search for Data Intelligence: Big Data Computing Platform at Alibaba Cloud

During the Alibaba Technology Forum: Cloud Computing and Big Data, held on 17 June 2016 in the Chiang Cheng Studio Theatre, Polytechnic University, Zhou Jingren, Vice President of Alibaba Group, spoke on the Big Data Computing Platform at Alibaba Group. Alibaba launched its big data cloud computing platform in 2009. Today, it not only supports the infrastructural services inside Alibaba (all business units of Taobao, Tmall, Alipay and Ali-ads; over 8,000 developers, over 1,500 applications, over 1,000,000 tables, over 40,000 cluster nodes, and over 1 EB of data), but also serves external customers such as Vanke Group, the Zhejiang provincial government, People's Daily, and China Weather.

Zhou highlighted the key advantages of Alibaba's cloud. First, and most obviously, processing speed: the platform can process a 100 TB dataset within 6 hours, 2X faster than Hadoop/Hive, thanks to static and dynamic optimization at various levels. Second, scalability: the platform can support clusters of millions of machines and scale near-linearly for data-intensive computation. Third, high utilization: efficient resource management drives over ten million machines with mixed workloads. Users also enjoy high usability, thanks to the platform's unified programming languages, support for both structured and unstructured datasets, and seamless integration of user code. Finally, users enjoy reliability, with 24-hour continuous monitoring and automatic failure detection and recovery.

In search of all those benefits, Zhou described the layered architecture of Alibaba's big data computing platform, from the application level down to the infrastructure: data application, data services, the big data development kit, and compute engines.


Data Application Layer

Today, Alibaba's cloud computing is widely used in industrial solutions across fields such as media, health, energy, gaming, government, ecommerce, and transportation. Alibaba does not own the data sets; it provides the infrastructure and lets its customers own the data.

The most obvious application is intelligent promotion and recommendation. During the 11.11 Shopping Festival 2015, Alibaba used machine learning for personalized recommendation, stream compute to process millions of events per second, and batch compute to process 700 petabytes of data within 6 hours, ultimately handling 140,000 orders per second and 85,900 payments per second.

Similarly, another application is intelligent customer service. During the 11.11 Shopping Festival 2015, Alibaba received 5 million customer calls and achieved a 94% machine-handling rate. There is also intelligent governance and planning; recent successful applications include provincial and macro analysis, real-time traffic monitoring and prediction, and real-time monitoring of water consumption and conservation.

Big data cloud computing also facilitates intelligent healthcare. In particular, smart devices can scan, integrate, and transfer health data to the cloud in real time, providing real-time alerts and enabling manufacturers, equipment vendors, and medical service providers to share data and develop medical models. At the same time, with wearable devices, smartphones, and portable ECG devices, a local SDK can compute and monitor heart rate, establishing a cloud ECG alliance among family doctors, community ambulances, and hospital A&E departments.

One more application is intelligent scientific research. For example, in cooperation with Genomics, Alibaba has developed pre-processing, variant discovery, and call-set refinement for genes.


Data Services Layer

In the data services layer, data ingestion comes from 4G/5G networks, sensors, devices, and apps, whereas data processing focuses on analyzing audio-visual graphics and language, and on updating the engines for recommendation and subscription.

Natural language processing covers automatic speech recognition, text-to-speech synthesis, speaker verification, and audio analysis of speech rhythm and tone. For example, the cloud accurately predicted the champion of the live performance show "I'm a Singer" based on sentiment, singers' profiles, and song characteristics.
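The talk did not detail the prediction model, but the idea of combining sentiment, profile, and song features into one score can be sketched as follows. The weights, feature names, and numbers here are invented for illustration only.

```python
# Hypothetical weighted-score sketch of a champion prediction; the real
# model behind the "I'm a Singer" prediction is not described in the talk.
WEIGHTS = {"sentiment": 0.5, "profile": 0.2, "song": 0.3}

def predict_champion(singers):
    """Return the singer whose weighted feature score is highest."""
    def score(features):
        return sum(WEIGHTS[k] * v for k, v in features.items())
    return max(singers, key=lambda name: score(singers[name]))

singers = {
    "A": {"sentiment": 0.9, "profile": 0.6, "song": 0.7},
    "B": {"sentiment": 0.6, "profile": 0.8, "song": 0.6},
}
print(predict_champion(singers))  # → A
```

In practice the sentiment feature alone would come from a full NLP pipeline; the point is only that heterogeneous signals are fused into a single ranking score.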

Image analysis, such as OCR, follows a pattern recognition loop (image capture – face detection – feature extraction – figure modelling – model comparison). Video analysis mostly refers to real-time scenario monitoring and analysis. It is applied to detecting traffic congestion by analyzing the speed, number, and types of cars in real time, and then generating real-time transport and driving advice.
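The pattern recognition loop above can be illustrated end to end with toy stand-ins for each stage. Every function here (detection as a centre crop, features as row means, comparison as cosine similarity) is an invented placeholder, not Alibaba's pipeline.

```python
import math

# Toy version of: capture -> detection -> feature extraction -> comparison.
def detect_region(image):
    # Stand-in "face detection": crop the centre half of a 2-D grid.
    h, w = len(image), len(image[0])
    return [row[w // 4: 3 * w // 4] for row in image[h // 4: 3 * h // 4]]

def extract_features(region):
    # Stand-in feature extraction: mean intensity per row.
    return [sum(row) / len(row) for row in region]

def compare(features_a, features_b):
    # Model comparison via cosine similarity between feature vectors.
    dot = sum(a * b for a, b in zip(features_a, features_b))
    na = math.sqrt(sum(a * a for a in features_a))
    nb = math.sqrt(sum(b * b for b in features_b))
    return dot / (na * nb)

image = [[float((i + j) % 7) for j in range(8)] for i in range(8)]
model = extract_features(detect_region(image))  # enrolled "figure model"
probe = extract_features(detect_region(image))  # same image re-captured
print(round(compare(model, probe), 3))  # → 1.0 (identical image matches itself)
```

A real system would use trained detectors and learned embeddings, but the loop structure (capture, detect, extract, model, compare) is the same.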

As for the engines for recommendation and subscription, the SDK breaks behavior data down into item information (keywords, categories) and user information (interests, terminal, location, preferences, behavior), collects all those data for processing, and then generates customized, exclusive recommendations (exclusive offers, lucky money events) via algorithms.
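The item/user decomposition described above can be sketched as a minimal content-based scorer. The scoring rule (keyword overlap plus a category bonus) and all data are invented for illustration; the actual SDK's algorithm is not disclosed in the talk.

```python
# Minimal content-based recommendation sketch: rank items by how well
# their keywords/category match the user's interest profile.
def score(item, user):
    keyword_hits = len(set(item["keywords"]) & set(user["interests"]))
    category_bonus = 1 if item["category"] in user["preferred_categories"] else 0
    return keyword_hits + category_bonus

user = {"interests": {"running", "hiking"}, "preferred_categories": {"sports"}}
items = [
    {"name": "trail shoes", "keywords": ["running", "hiking"], "category": "sports"},
    {"name": "novel", "keywords": ["fiction"], "category": "books"},
]
ranked = sorted(items, key=lambda it: score(it, user), reverse=True)
print(ranked[0]["name"])  # → trail shoes
```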

Machine learning at the data services level is mostly concerned with analyzing and translating text, speech, audio, images, and video into computer-readable form, working towards interactive dialogue.

Data Development Kit Layer

According to Zhou, the big data development kit layer consists of BI reporting, data management, data visualization, a DW suite, a DW IDE, data quality control, a data map, pipelines, multi-source data ingestion, data processing (SQL, MR, Shell, Graph), and data analytics (drill-down, drag and drop).

The searchable metadata service contains many functional elements: access control (authentication, authorization, table-level data access management), catalog management (multi-tenant support, graphical analysis of cross-project or cross-table relationships), table details analysis (usage auditing, usage pattern tracking), a global metadata view (project management tooling, table and file listings, usage and resource summaries), metadata search (category search, full-text search), and data source management (table creation and modification, data source creation).

Data quality monitoring can be regarded as a data filtering loop. When source data enters the data warehouse, it passes through an online closed loop of quality monitoring (data cleaning – data quality checking – online monitoring – warning – problem feedback – data re-cleaning or data source check).
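That closed loop can be sketched as a tiny clean-check-warn cycle. The cleaning rule, the checks, and the sample records below are all invented placeholders, not Alibaba's actual quality rules.

```python
# Toy data-quality loop: clean -> check -> pass to warehouse or feed back.
def clean(record):
    # Stand-in cleaning step: trim whitespace from string fields.
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def check(record):
    # Stand-in quality checks; real rules would be data-source specific.
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    if record.get("amount", 0) < 0:
        problems.append("negative amount")
    return problems

def monitor(records):
    passed, warnings = [], []
    for rec in map(clean, records):
        problems = check(rec)
        if problems:
            warnings.append((rec, problems))  # feedback: re-clean or check source
        else:
            passed.append(rec)               # enters the data warehouse
    return passed, warnings

passed, warnings = monitor([
    {"id": " a1 ", "amount": 10},
    {"id": "", "amount": -5},
])
print(len(passed), len(warnings))  # → 1 1
```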

Workload management (with SLAs) refers to resource allocation and rescheduling. In particular, it estimates the average running time of every task, monitors the starting time of each checkpoint in real time, and then reschedules the following tasks according to the remaining time, so as to guarantee service levels.
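A minimal sketch of that rescheduling idea: order the remaining tasks by their slack against the SLA deadline, so the task most at risk of missing the deadline runs first. Task names, times, and the slack rule are invented for illustration.

```python
# SLA-driven rescheduling sketch: least-slack-first ordering of remaining tasks.
def reschedule(tasks, now, deadline):
    """Sort tasks so the one with the least slack against the SLA runs first."""
    def slack(task):
        # Slack = deadline minus the task's estimated finish time if started now.
        return deadline - (now + task["estimated_minutes"])
    return sorted(tasks, key=slack)

tasks = [
    {"name": "report", "estimated_minutes": 30},
    {"name": "etl", "estimated_minutes": 90},
]
ordered = reschedule(tasks, now=0, deadline=100)
print([t["name"] for t in ordered])  # → ['etl', 'report']
```

Real workload managers also account for task dependencies and observed checkpoint times, but least-slack-first is the core intuition.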

Data visualization and synchronization support various templates and data sources, and provide multi-screen, multi-channel broadcasting. The visualization tools are now used in the real-time monitoring of marketing campaigns, global trade, retail analysis, natural resource analysis, logistics analysis, urban planning, and so on. Machine learning at the data development level refers to algorithm design for neural networks, decision trees, support vector machines, and clustering.


Compute Engines Layer

The most fundamental layer is the compute engine infrastructure, including MaxCompute, StreamCompute, and machine learning for massive data analysis.

In particular, MaxCompute supports batch, interactive, in-memory, and iterative computation, whereas StreamCompute supports real-time computing for millions of smartphones, mobile consoles, sensors, server logs, event stores, online service terminals, and intelligent dashboards.

Alibaba's analytic database can process and synchronize massive data simultaneously, with real-time OLAP, MySQL compatibility, precise computation, high concurrency, high availability, and low latency. Even with over 210 active developers generating over 23 million requests and over 5,000 QPS across over 413 billion records and over 70 TB of storage, at around 60 ms response time, the cloud still achieves 99.99% availability. With data-parallel computing and distributed resource management, the cloud can manage multiple clusters at the same time, each equipped with over 10,000 servers.

Despite its short history, Alibaba's cloud has enjoyed a latecomer advantage and grown technologically faster than many US competitors. In particular, among champions of the global sorting benchmark, Hadoop required 4,328 seconds in 2013 and Apache Spark required 1,406 seconds in 2014, while Aliyun required just 377 seconds in 2015.

Research Challenges

In summary, Zhou noted the future research challenges of cloud computing: cloud-scale computing infrastructure, big data management, and large-scale machine learning. Big data and cloud computing will undoubtedly free developers from routine work and allow them to focus more on creative and strategic work. Today, Alibaba's cloud mainly serves its own platform and ecommerce partners, but it also provides open tools for outside developers. More importantly, all this is more than a computing approach; it is a change of lifestyle and business model.

Alibaba’s Architecture and Application of Artificial Intelligence and Machine Learning


Today, machine learning, the foundation of artificial intelligence, has become a hot topic in the internet world. Alibaba, the leading market player in ecommerce and internet finance, foresaw and applied machine learning technology long before AlphaGo's demonstration in 2016. At the Alibaba Technology Forum at Polytechnic University on 17 June 2016, Dr. Wei Chu, Director of Engineering at Alibaba, gave a speech on Distributed Machine Learning and its Application in Alibaba.

According to Wikipedia, machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In general, machine learning can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning.

According to Chu, there are three pillars of artificial intelligence learning and decision making. The first is big data, as petabyte-scale data pools are the foundation for machine learning and pattern recognition. The second is cloud computing, as the cloud facilitates high performance, scalability, accessibility, low cost, and security of computation. The third is research innovation, at both the algorithm and the CPU chip level.

Chu also pinpointed the features of global machine learning platforms. For example, Amazon has launched machine learning on its web services platform, but it supports only very limited functions such as regression and classification. Google's open-sourced framework, TensorFlow, is quite advanced and perhaps the best for professional developers, though the technical bar is rather high for people who don't know Python. Microsoft's machine learning platform on Azure has a relatively more user-friendly graphical interface. In this respect, Alibaba's Platform of Artificial Intelligence (PAI) is more in Microsoft's style.

Dr. Chu also demonstrated the features and functions of Alibaba's cloud-based graphical platform (PAI): searching experiments, components, and models; managing experiments; exploring data tables; listing algorithms and tools; managing models; generating binary classifications; and inspecting the confusion matrix. In its search for data intelligence, PAI builds on big data collection and analytics, comprehensive data mining, distributed computing, and advanced algorithms.

According to Chu, PAI's customers include traditional giants (such as Sinopec, CNPC, etc.), startups, governments (the Zhejiang provincial government's meteorological administration and traffic management), and researchers (data scientists, engineers, and developers).

Chu described five conceptual layers within PAI's architecture. The surface layer is applications, such as credit scoring, security, recommender systems, search engines, the finance cloud, and the public cloud. The next layer is platform products, including product management, module management, analytic visualization, customized PAI, Ani My-PAI, etc. The third layer is algorithms and tools: data pre-processing, feature engineering, machine learning algorithms, statistics, and deep learning with CNN/DNN/LSTM RNN. The next layer down is the computing framework: MR, SQL, MPI, PS, Graph, and GPU single-machine or multi-machine. Finally, the deepest layer is the infrastructure, namely Aliyun, the CPU cloud, and the GPU cloud.

After that, Chu introduced Caffe and Pluto, the deep learning systems behind Alibaba's PAI. Built on Caffe, the open-source deep learning library, Pluto is Alibaba's distributed deep learning system, running across multiple machines and multiple cards and supporting popular models such as CNNs and RNNs.

Chu then depicted three deep learning models in Alibaba's cloud service, especially in Pluto. The first is the deep neural network (DNN), with multi-card, multi-level computation, in which each node has distributed inputs and outputs. The second is the convolutional neural network (CNN): every piece of the puzzle is broken down into smaller units, in a continuous convolution-pooling process until the layers are fully connected, in search of precise pattern recognition at higher resolution. CNNs are widely used in region detection and item verification. With Pluto, Alibaba's cloud (with only 8 GPU cards) can identify 100 million images within 10 hours, while recognizing Chinese characters with over 99.6% accuracy. The third is the recurrent neural network (RNN), built with a closed-loop cycle that lets the middle layer automatically feed back to the previous input layer when it identifies faults or missing items, passing to the output layer only once fully resolved.
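The convolution-pooling process at the heart of a CNN can be shown concretely in a few lines. This is a generic illustration in pure Python (a single 2x2 kernel and one max-pooling step on a toy 4x4 image), not Pluto's implementation; the kernel and image values are invented.

```python
# One convolution pass followed by max pooling: the basic CNN building block.
def conv2d(image, kernel):
    """Valid (no-padding) 2-D convolution of a square image with a square kernel."""
    n, k = len(image), len(kernel)
    out = []
    for i in range(n - k + 1):
        row = []
        for j in range(n - k + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool(feature_map, size=2):
    """Downsample by taking the max over non-overlapping size x size windows."""
    n = len(feature_map)
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, n - size + 1, size)]
            for i in range(0, n - size + 1, size)]

image = [[1, 0, 2, 1], [0, 1, 3, 0], [2, 1, 0, 1], [1, 0, 1, 2]]
edge = [[1, -1], [-1, 1]]  # toy 2x2 kernel
pooled = max_pool(conv2d(image, edge))
print(pooled)  # → [[2]]
```

Stacking many such convolution-pooling stages, followed by fully connected layers, gives the "smaller units to full connection" structure described above.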

Within Pluto, there are parameter server nodes (model processors) and worker nodes (shared processors). All the workstations among the parameter nodes and worker nodes coordinate and feed back to each other collectively. Recently, Caffe has started to support multiple models, including CNN and LSTM.
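The parameter-server pattern can be sketched as a sequential simulation: worker nodes each compute a gradient on their data shard, and the server node aggregates the gradients into the shared model. The model (one-parameter linear fit), data shards, and learning rate are invented; real parameter servers run workers asynchronously over the network.

```python
# Sequential simulation of the parameter-server training loop.
def worker_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one data shard."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

def parameter_server(shards, steps=100, lr=0.05):
    w = 0.0  # shared model parameter held by the server node
    for _ in range(steps):
        grads = [worker_gradient(w, s) for s in shards]  # workers, conceptually in parallel
        w -= lr * sum(grads) / len(grads)                # server aggregates and updates
    return w

shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]  # y = 3x, split across two workers
print(round(parameter_server(shards), 2))  # → 3.0
```

Asynchronous variants let workers push gradients without waiting for each other, trading staleness for throughput, which matches the asynchronous updates mentioned for Pluto below.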

Next, Chu presented Pluto's features in data computation. With 56 Gb InfiniBand interconnects in the CPU/GPU cloud, data parallelism with multiple model copies, complete failover, and asynchronous updates, developers and operators enjoy Pluto's significant advantages in data processing: efficient multi-tasking and scheduling, fast InfiniBand communication, scalability, and higher accuracy along with a higher acceleration ratio.

Afterwards, Chu showed Pluto's applications in business. The first example is the Taobao recommender system: given a user, a query, and candidate items, Pluto finds the best item to maximize business metrics such as CTR, RPM, or user satisfaction. The second example is logistic regression: Pluto finished the analysis of 1 billion attributes and 57 billion training samples of Alimama ads data within 5 hours, a 40% acceleration over MPI; it can now support up to 10 billion attributes and up to 100 billion samples. The third application is the Zhima personal credit score, the integrated result of several indicators such as identification, compliance, credit history, social connections, and behavior. The score, from 350 to 950, represents personal or company credit status and is computed with a locally connected DNN; the higher the score, the lower the default risk. The fourth example is image OCR, which is widely used in identity card verification, name card information input, vocabulary teaching, verification code recognition, and so on.
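Pluto trains logistic regression at billions-of-attributes scale; the same model at miniature scale can be sketched with plain stochastic gradient descent. The two-attribute toy data, learning rate, and epoch count below are invented for illustration.

```python
import math

# Toy logistic regression trained with SGD: the same model family Pluto
# runs on Alimama ads data, here on four invented two-attribute samples.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.5):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, label in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                w[i] += lr * (label - p) * x[i]  # gradient step on log-loss
    return w

# Separable toy data: label 1 when the first attribute dominates.
samples = [([2.0, 0.1], 1), ([1.5, 0.3], 1), ([0.2, 1.8], 0), ([0.1, 2.0], 0)]
w = train(samples)
p = sigmoid(sum(wi * xi for wi, xi in zip(w, [1.8, 0.2])))
print(p > 0.5)  # → True (classified into the first-attribute class)
```

At Pluto's scale the same update is sharded across workers and a parameter server rather than run in a single loop.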

In summary, Chu painted a picture of Alibaba's artificial intelligence and machine learning in the near future. Based on CPU/GPU cloud computing, big data analytics, and deep learning algorithms, Alibaba's machine learning platform signifies smart computation and data intelligence, which helps not only Alibaba's developers tremendously, but also the entrepreneurs and customers on Alibaba.


About Alibaba Cloud
Established in September 2009, Alibaba Cloud, Alibaba Group's cloud computing arm, develops highly scalable platforms for cloud computing and data management. It provides a comprehensive suite of cloud computing services to support participants of Alibaba Group's online and mobile commerce ecosystem, including sellers and other third-party customers and businesses. Alibaba Cloud is a business within Alibaba Group.


“For Grace”, For Grace in Life?!

This article is especially for Valentine's Day 2016. "For Grace" is a foodie-centric documentary by co-directors Kevin Pang and Mark Helenowski about Chicago master chef Curtis Duffy's quest for a dream restaurant, and the startling story behind his control-freakish pursuit of excellence.

According to Kevin Pang, he worked at the Pike Place Market magic shop during high school; he highly recommends the fried chicken at Chicken Valley and writes about food for the Chicago Tribune. One day, roaming the streets, he saw the Cheeseburger Show and thought of starting a 15-minute web short, but after filming so many scenes the project outgrew that format, and the film ended up shortlisted at a culinary film festival.

The story begins with Duffy's training in Chicago's best restaurants (Charlie Trotter's and Alinea). He then earned stellar reviews and two Michelin stars for his modernist cuisine at Avenues in the luxury Peninsula hotel. Duffy tries to make his burning ambition come true: opening his own high-end fine-dining restaurant, which he names "Grace".

The preparation takes more time and money than he originally expected, but Duffy trudges through the muddled setup process, waking up at 4 a.m. every morning, selecting the perfect chairs, hiring a world-class staff, and so on. The audience can feel the director's sharp eye for the emblematic, elegant details within the restaurant: the kitchen design, the menu, the staff training, the tablecloths, even the table legs...

But at the same time, Duffy totally neglects his relationships. He is barred at the door when revisiting his ex-boss's restaurant over a lawsuit against the ex-boss. He quarrels with a partner, a close friend, and a former mentor, and, predictably, neglects his family, splits with his wife, and misses his young kids. Even so, Duffy adopts a philosophical attitude about his negligence. He says more than once that the restaurant is his dream, seemingly his everything in life, and hence he is willing to sacrifice his relationships for it.

When opening day finally arrives, the restaurant is everything Duffy hoped for. Duffy shares this in a beguiling tone, but the longer he talks, the more he uncovers a potential tragedy that might permanently traumatize him. When Duffy greets his middle-school homeroom teacher, whose profound influence on young Curtis was life-saving, he becomes aware that the triumph of his dream feels far less affecting than the relationships he treasures.

The film delivers a revelation: most of the time, absolute success may be out of one's reach; rather, a state of relative grace is the truth of life. Along the way in the search for professional excellence, one should never neglect the surrounding people whom you love and who love you.

On Valentine's Day, with the message of "For Grace", may all the lovers in the world have a happy ending and find the diamond deep in their hearts.


Wechat: raychowyui