At the Alibaba Technology Forum: Cloud Computing and Big Data, held on 17 June 2016 in Chiang Cheng Studio Theatre, Polytechnic University, Zhou Jingren, Vice President of Alibaba Group, spoke about the Big Data Computing Platform at Alibaba Group. Alibaba has been building and applying its big data cloud computing platform since 2009. Today, it not only supports the infrastructural services inside Alibaba (all business units of Taobao, Tmall, Alipay and Ali-ads, over 8,000 developers, over 1,500 applications, over 1,000,000 tables, over 40,000 cluster nodes, over 1 EB of data), but also serves external customers such as Vanke Group, the Zhejiang provincial government, People’s Daily, and China Weather.
Zhou highlighted the key advantages of Alibaba’s cloud. First, the obvious benefit is processing speed: the platform can process 100 TB datasets within 6 hours, 2X faster than Hadoop/Hive, with static and dynamic optimization at various levels. Second, the platform is scalable: it can support clusters on the scale of a million machines and provide linear availability for data-intensive computation. Third, it achieves high utilization through efficient resource management, driving over ten million machines by mixing workloads. Fourth, users enjoy high usability thanks to the platform’s unified programming languages, its support for structured and unstructured datasets, and its seamless integration of user code. Finally, users also enjoy reliability, with 24-hour continuous monitoring and automatic failure detection and recovery.
To deliver all those benefits, Zhou divided Alibaba’s big data computing platform into five layers, from the application level down to the infrastructure level, including data application, data services, big data development kit, and compute engines.
Data Application Layer
Alibaba’s cloud computing is now widely used in industrial solutions across fields such as media, health, energy, gaming, government, ecommerce, and transportation. Alibaba does not own the data sets; it only provides the infrastructure and lets its customers own the data.
The most obvious application is intelligent promotion and recommendation. During the 11.11 Shopping Festival 2015, Alibaba used machine learning for personalized recommendation, stream compute to process millions of events per second, and batch compute to process 700 Petabytes of data within 6 hours, and it ultimately received 140,000 orders per second and 85,900 payments per second.
Another application is intelligent customer service. During the 11.11 Shopping Festival 2015, Alibaba received 5 million customer calls, and machines handled 94% of them. A further application is intelligent governance and planning, where recent successes include provincial and macro analysis, real-time traffic monitoring and prediction, and real-time monitoring of water consumption and conservation.
Big data cloud computing also facilitates intelligent healthcare. In particular, smart devices can scan, integrate and then transfer health data to the cloud in real time, so as to provide real-time alerts and to enable manufacturers, equipment vendors, and medical service providers to share data and develop medical models. At the same time, with wearable devices, smartphones, and portable ECG devices, a local SDK can provide heart rate computing and monitoring, so as to establish a cloud ECG alliance with family doctors, community ambulances, and hospital A&E departments.
One more application is intelligent scientific research. For example, in cooperation with Genomics, Alibaba has developed pre-processing, variant discovery, and call-set refinement for genes.
Data Services Layer
In the data services layer, data is ingested from 4G/5G sensors, devices, and apps, whereas data processing is mostly about analyzing audio-visual content and language, and about updating the engines for recommendation and subscription.
Natural language processing covers automatic speech recognition, text-to-speech synthesis, speaker verification, and audio analysis of speech rhythm and tone. For example, the cloud accurately predicted the champion of the “I’m a singer” live performance show based on sentiment, singers’ profiles, and song characteristics.
Image analysis, like OCR, refers to a pattern recognition loop (image capture, face detection, feature extraction, figure modelling, model comparison). Video analysis mostly refers to real-time scenario monitoring and analysis. It is applied in detecting traffic congestion by analyzing the speed, number and types of cars in real time, and then generating real-time transport and driving advice.
For the recommendation and subscription engines, the SDK breaks behavior data down into item information (keywords, categories) and user information (interest, terminal, location, preference, behavior), collects all of that data for processing, and then uses an algorithm to generate customized and exclusive recommendations (exclusive offers, lucky money events).
The machine learning techniques at the data services level are mostly concerned with analyzing and translating text, speech, audio, images, and video into computer language, and ultimately with enabling interactive dialogue.
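As a rough illustration of how congestion could be flagged from the vehicle speeds observed in one time window, here is a minimal Python sketch. The function name, the free-flow speed, and the thresholds are all assumptions made for this example, and it uses speed only (not the number or types of cars); it is not Alibaba’s method.

```python
def congestion_level(speeds_kmh, free_flow_kmh=60.0):
    """Classify a road segment from vehicle speeds seen in one time window.

    free_flow_kmh and the 0.4 / 0.75 thresholds are hypothetical values.
    """
    if not speeds_kmh:
        # No vehicles observed: treat the segment as clear.
        return "clear"
    average = sum(speeds_kmh) / len(speeds_kmh)
    ratio = average / free_flow_kmh
    if ratio < 0.4:
        return "congested"
    if ratio < 0.75:
        return "slow"
    return "clear"
```

A real system would combine this with vehicle counts and types extracted from video before issuing driving advice.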
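The split into item information and user information described above can be sketched in Python as follows. This is only an illustrative assumption: the class names, fields, and the scoring rule (keyword overlap plus a category bonus) are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Item information: keywords and category (hypothetical field names).
    name: str
    category: str
    keywords: set = field(default_factory=set)


@dataclass
class User:
    # User information: interests and preferred categories (hypothetical).
    interests: set = field(default_factory=set)
    preferred_categories: set = field(default_factory=set)


def score(user, item):
    """Jaccard overlap of interests and keywords, plus a category bonus."""
    union = user.interests | item.keywords
    overlap = len(user.interests & item.keywords) / len(union) if union else 0.0
    bonus = 0.5 if item.category in user.preferred_categories else 0.0
    return overlap + bonus


def recommend(user, items, top_n=2):
    """Return the names of the top_n highest-scoring items."""
    ranked = sorted(items, key=lambda it: score(user, it), reverse=True)
    return [it.name for it in ranked[:top_n]]


user = User(interests={"running", "shoes"}, preferred_categories={"sports"})
items = [
    Item("novel", "books", {"fiction"}),
    Item("running shoes", "sports", {"running", "shoes"}),
    Item("yoga mat", "sports", {"yoga", "mat"}),
]
top = recommend(user, items)
```

In this toy catalogue the running shoes rank first and the yoga mat second, because both match the preferred category and the shoes also match the user’s interests.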
Data Development Kit Layer
According to Zhou, the big data development kit layer consists of BI reporting, data management, data visualization, a DW suite, a DW IDE, data quality control, a data map, pipelines, multi-source data ingestion, data processing (SQL, MR, Shell, Graph) and data analytics (drill down, drag and drop).
The searchable metadata service has many functional elements: access control (authentication, authorization, table-level data access management), catalog management (multi-tenant support, graphical analysis of cross-project or cross-table relationships), table details analysis (usage auditing, usage pattern tracking), a global metadata view (project management tooling, listing of tables and files, usage and resource summaries), metadata search (category search, full-text search), and data source management (creating and modifying tables, creating data sources).
Data quality monitoring can be regarded as a data filtering loop. When source data enters the data warehouse, it passes through an online closed loop of quality monitoring (data cleaning, data quality checking, online monitoring, warning, problem feedback, and then data re-cleaning or a data source check).
Workload management (with SLA) refers to resource allocation and rescheduling. In particular, it estimates the average running time of every task, monitors the start time of each checkpoint in real time, and then reschedules the following tasks according to the remaining time, so as to guarantee the service level.
Data visualization and synchronization support various templates and data sources, and provide multi-screen and multi-channel broadcasting. The visualization tools are now used for real-time monitoring of marketing campaigns, global trade, retail analysis, natural resource analysis, logistics analysis, urban planning and so on. Machine learning at the data development level refers to the algorithm design of neural networks, decision trees, support vector machines, and clustering.
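The closed loop described above can be sketched in Python as follows. The cleaning rule (trim strings, drop rows without an id) and the quality rule (amount must be a non-negative number) are assumptions chosen purely for illustration; the talk did not specify any rules.

```python
def clean(rows):
    """Data cleaning: trim string values and drop rows without an id."""
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("id") not in (None, ""):
            cleaned.append(row)
    return cleaned


def check_quality(rows):
    """Quality checking: return rows whose amount is missing or negative."""
    return [
        r for r in rows
        if not isinstance(r.get("amount"), (int, float)) or r["amount"] < 0
    ]


def quality_loop(rows, max_rounds=3):
    """Clean, check, warn, feed back, and re-clean until the data passes."""
    warnings = []
    for round_no in range(1, max_rounds + 1):
        rows = clean(rows)
        bad = check_quality(rows)
        if not bad:
            return rows, warnings
        # Warning step: record how many rows failed in this round.
        warnings.append(f"round {round_no}: {len(bad)} bad row(s)")
        # Problem feedback (simplified): drop offending rows, then re-clean.
        bad_ids = {r["id"] for r in bad}
        rows = [r for r in rows if r["id"] not in bad_ids]
    return rows, warnings


source = [
    {"id": " a ", "amount": 10},
    {"id": "b", "amount": -5},
    {"id": "", "amount": 3},
]
good_rows, raised = quality_loop(source)
```

Here the feedback step simply discards failing rows; in practice it would more likely trigger a check of the upstream data source.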
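A minimal Python sketch of this idea, assuming each task’s running time is estimated as the mean of its past runs and that the only decision is whether the remaining tasks still fit before the deadline. The function names and the "keep" / "escalate" actions are hypothetical.

```python
from statistics import mean


def estimate_runtime(history):
    """Estimate a task's running time (seconds) as the mean of past runs."""
    return mean(history)


def plan_remaining(pending, history, now, deadline):
    """Check at a checkpoint whether the pending tasks fit before the deadline.

    Returns (slack_seconds, action): "keep" if the estimated work fits in the
    remaining time, "escalate" (e.g. request more resources) if it does not.
    """
    estimated = sum(estimate_runtime(history[task]) for task in pending)
    slack = (deadline - now) - estimated
    return slack, ("keep" if slack >= 0 else "escalate")


history = {"load": [100, 120], "join": [300, 300], "report": [50, 70]}
on_time = plan_remaining(["join", "report"], history, now=1000, deadline=1400)
late = plan_remaining(["join", "report"], history, now=1100, deadline=1400)
```

When the checkpoint is reached on time, the estimated 360 seconds of remaining work fits in the 400 seconds left; when it is reached 100 seconds late, the slack turns negative and the plan is escalated.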
Compute Engines Layer
The most fundamental layer is the compute engine infrastructure, which includes Max-Compute, Stream-Compute, and machine learning for massive data analysis.
In particular, Max-Compute supports batch, interactive, in-memory and iterative computation, whereas Stream-Compute supports real-time computing for millions of smartphones, mobile consoles, sensors, server logs, event stores, online service terminals, and intelligent dashboards.
Alibaba’s analytic database can process and synchronize massive data simultaneously, offers real-time OLAP, is compatible with MySQL, and at the same time achieves precise computation, high concurrency, high availability, and low latency. In particular, even with over 210 active developers, over 23 M requests, over 60 ms RT, over 5,000 QPS, over 413 B records, and over 70 TB of storage, the cloud still achieves 99.99 % availability. With data-parallel computing and distributed resource management, the cloud can manage multiple clusters at the same time, each equipped with over 10,000 servers.
Despite its short history, Alibaba’s cloud has enjoyed a latecomer advantage and achieved faster technological growth than many US competitors. In particular, among the champions of the global sorting benchmark race, Hadoop required 4,328 seconds in 2013 and Apache Spark required 1,406 seconds in 2014, while Aliyun required just 377 seconds in 2015.
In closing, Zhou mentioned the future research challenges of cloud computing, such as cloud-scale computing infrastructure, big data management, and large-scale machine learning. Big data and cloud computing will undoubtedly free developers from routine work and allow them to focus more on creative and strategic work. Although Alibaba’s cloud today mainly serves its own platform and ecommerce partners, it still provides open tools for outside developers. More importantly, all of this is more than a computing approach; it is a change of lifestyle and business model.
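To put the three benchmark times quoted above in proportion, the following short Python sketch computes how many times longer each run took than the 2015 Aliyun run. It uses only the figures from the text; the dictionary keys are labels chosen for this example.

```python
# Sorting benchmark times in seconds, as quoted in the text.
results = {
    "Hadoop (2013)": 4328,
    "Apache Spark (2014)": 1406,
    "Aliyun (2015)": 377,
}

baseline = results["Aliyun (2015)"]

# Ratio of each time to the Aliyun time, rounded to two decimals.
speedups = {name: round(seconds / baseline, 2) for name, seconds in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x the Aliyun time")
```

On these figures, the 2013 Hadoop time is roughly 11.5 times the Aliyun time and the 2014 Spark time roughly 3.7 times.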