Alibaba

Alibaba Cloud Appoints Dr Zhou Jingren as Chief Scientist Leading Big Data and Artificial Intelligence Research at Alibaba iDST

Alibaba Cloud
Hangzhou, July 7, 2016
-Alibaba Cloud, the cloud computing arm of Alibaba Group, has appointed Dr ZHOU Jingren as Chief Scientist to lead big data and artificial intelligence research at Alibaba Cloud iDST (Institute of Data Science Technology), developing cloud-scale distributed computation platforms, data analytic products and various business solutions.

At Alibaba Cloud iDST, Dr Zhou manages a team of top-tier engineers based in China and the USA that develops advanced technologies in speech, natural language, image and video processing, and large-scale machine learning.

Prior to joining Alibaba, Dr Zhou was a Partner Engineering Manager at Microsoft, where he managed a team developing the big data computation platform supporting Microsoft’s Windows and Office applications and the Bing web search engine. This platform laid the foundation for Microsoft’s entire back-end data services, using more than 10,000 computing clusters to provide diverse computing capabilities and real-time streaming and processing of massive data.

Dr Zhou is a well-known researcher in the fields of cloud computing, databases and distributed systems. He has published dozens of papers at top-tier database and systems venues (VLDB, SIGMOD, OSDI), and has served as a chair and/or panelist at numerous academic conferences.

Dr Zhou received his PhD in Computer Science from Columbia University, having previously obtained his bachelor’s degree at the University of Science and Technology of China.

 

About Alibaba Cloud
Established in September 2009, Alibaba Cloud (intl.aliyun.com), Alibaba Group’s cloud computing arm, develops highly scalable platforms for cloud computing and data management. It provides a comprehensive suite of cloud computing services to support participants of Alibaba Group’s online and mobile commerce ecosystem, including sellers and other third-party customers and businesses. Alibaba Cloud is a business within Alibaba Group.

Alibaba Cloud Kicks off Create@Alibaba Cloud Startup Contest in Europe

Alibaba Cloud


Hangzhou, June 27, 2016
Alibaba Cloud, the cloud computing arm of Alibaba Group, announced the kick-off of its Create@Alibaba Cloud Startup Contest (CACSC) regional events in Europe. The CACSC is part of Alibaba Cloud’s global startup program, Create@Alibaba Cloud, which was first launched in Singapore in April this year.

The European contests are set to begin in London and Paris on 23 June and 5 July respectively, while other regional contests will be held in South East Asia, Korea, Dubai, Hong Kong and major provinces throughout China later this year. The most promising contestants from each regional event will be invited to attend the world final at the Alibaba Cloud Computing Conference, which will be held in Hangzhou in October 2016.

CACSC is a global entrepreneur contest aimed at championing startups and maximizing their potential through Alibaba’s comprehensive support network and innovative suite of cloud infrastructure services. Businesses are judged on their innovativeness, market potential, profitability and teams.

CACSC provides participants with professional technology training and one-on-one post-sales support to help startups get their businesses up to speed and running smoothly. It also empowers startups with better access to overseas markets and potential connections, which is crucial for those startups looking to expand internationally.

“Alibaba Cloud’s mission is to empower businesses of all sizes and all industries to expand and grow. CACSC is one of the key elements of our Create@Alibaba Cloud program and we are devoted to bringing our expertise to global start-ups through providing access to cloud capabilities from Alibaba Cloud,” said Mr. Sicheng YU, Vice President of Alibaba Group and General Manager of Alibaba Cloud International. “Startups and small enterprises are essential to global innovation and we are excited to see what types of ideas entrepreneurs in Europe put forth during the London and Paris contests, and especially how Alibaba Cloud’s big data and cloud services can help them to achieve their goals.”


A European Hotbed for Technology

Alibaba Cloud chose London as a host city for the CACSC given the capital’s reputation for being a hotbed for technology startups and ideas. As an exciting convergence of high-quality talent, capital, infrastructure and ideas, London enjoys an envied ‘world city’ address providing access to businesses with international reach. Taking place during London Technology Week, CACSC London will be held on 23 June at Huckletree, a London-based co-working space designed for inspiring small businesses and startups, in partnership with London & Partners and Cocoon Networks.

“As the home of more than 40,000 tech businesses, London is now one of the largest tech capitals in Europe with more software developers than Stockholm, Berlin and Dublin combined. The scene for tech startups is thriving and London is one of the best cities to scale and globalize a business. We’re proud to be part of CACSC London and look forward to seeing a local startup take the stage in China in October,” said Gordon Innes, CEO of London & Partners.

Startups from across the UK and the Netherlands have applied to join the CACSC London and compete. Their business ideas come from a range of different industries including e-commerce, Internet of Things, social media and finance.


Opportunity for Global Innovation

Paris has been selected to host the second European regional contest, partnering with Paris & Co, Cheung Kong Graduate School of Business (CKGSB) and Association des Chinois à l’Etranger pour la Création d’Entreprise (ACECE).

“As the economic development and innovation agency in Paris, our mission is to promote the international attractiveness of Paris. We look forward to seeing more brilliant French startups participate in the Create@Alibaba Cloud Startup Contest Paris and explore the opportunities of the digital economy in China,” said Karine Bidart, Managing Director of Paris & Co.

The French capital is well placed to take advantage of the country’s famous scientific tradition, strong engineering workforce and rigorous education system, all of which are encouraging a new generation of startups and an evolving entrepreneurial culture. More than 500 startups will be participating in CACSC Paris, from a wide variety of industries including multimedia, social media, e-education, tourism tech and robotics.

“China has become a new blue ocean for European startups to expand their businesses, and it represents a huge opportunity with an enormous and unsatisfied consumer market, thirst for technology and a pool of risk taking investors,” said Bo JI, Chief Representative for CKGSB Europe.

More information about the Create@Alibaba Cloud Startup Contest can be found at https://intl.aliyun.com/startup


In Search for Data Intelligence: Big Data Computing Platform at Alibaba Cloud

During the Alibaba Technology Forum: Cloud Computing and Big Data, held on 17 June 2016 at the Chiang Cheng Studio Theatre, Hong Kong Polytechnic University, Zhou Jingren, Vice President of Alibaba Group, spoke about the Big Data Computing Platform at Alibaba Group. Alibaba launched its big data cloud computing platform in 2009. Today, the platform not only supports Alibaba’s internal infrastructure services (all business units of Taobao, Tmall, Alipay and Ali-ads, with over 8,000 developers, over 1,500 applications, over 1,000,000 tables, over 40,000 cluster nodes and over 1 EB of data), but also serves external customers such as Vanke Group, the Zhejiang provincial government, People’s Daily and China Weather.

Zhou highlighted the key advantages of Alibaba’s cloud. The first is processing speed: the platform can process 100 TB datasets within 6 hours, 2X faster than Hadoop/Hive, thanks to static and dynamic optimization at various levels. The second is scalability: the platform can support clusters of a million machines and provide linear scalability for data-intensive computation. The third is high utilization, achieved through efficient resource management and by mixing workloads across over ten million machines. Users also benefit from high usability, thanks to the platform’s unified programming language, support for both structured and unstructured datasets, and seamless integration of user code. Finally, the platform offers reliability, with 24-hour continuous monitoring and automatic failure detection and recovery.
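As a rough sanity check on the speed claim, the arithmetic below converts “100 TB within 6 hours” into sustained throughput and shows what “2X faster than Hadoop/Hive” would imply. It is a back-of-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and reads “2X faster” as half the wall-clock time for the same job; neither assumption is stated in the talk.

```python
# Back-of-envelope check of "100 TB within 6 hours, 2X faster than Hadoop/Hive".
# Assumptions (not stated in the talk): decimal units (1 TB = 10**12 bytes) and
# "2X faster" read as half the wall-clock time for the same job.
TB = 10**12  # bytes
GB = 10**9   # bytes

def throughput_gb_per_s(data_tb: float, hours: float) -> float:
    """Average sustained throughput (GB/s) needed to process data_tb terabytes in `hours`."""
    return data_tb * TB / (hours * 3600) / GB

platform_hours = 6.0
hadoop_hours = 2 * platform_hours  # implied by "2X faster"

print(f"platform:              {throughput_gb_per_s(100, platform_hours):.2f} GB/s")
print(f"Hadoop/Hive (implied): {throughput_gb_per_s(100, hadoop_hours):.2f} GB/s "
      f"over {hadoop_hours:.0f} h")
```

Under these assumptions the platform sustains roughly 4.6 GB/s, and the same job on Hadoop/Hive would take about 12 hours.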

To deliver these benefits, Zhou described the layers of Alibaba’s big data computing platform, from the application level down to the infrastructure: data applications, data services, the big data development kit, and compute engines.

 

Data Application Layer

Alibaba’s cloud computing is now widely used to build industry solutions in fields such as media, healthcare, energy, gaming, government, e-commerce and transportation. Alibaba does not own the datasets; it provides the infrastructure, and its customers retain ownership of their data.

The most obvious application is intelligent promotion and recommendation. During the 2015 11.11 Shopping Festival, Alibaba used machine learning for personalized recommendations, stream computing to process millions of events per second, and batch computing to process 700 petabytes of data within 6 hours, handling peaks of 140,000 orders per second and 85,900 payments per second.

Another application is intelligent customer service. During the 2015 11.11 Shopping Festival, Alibaba received 5 million customer calls, 94% of which were handled by machines. Intelligent governance and planning is a further area, with recent successful applications in provincial and macro-level analysis, real-time traffic monitoring and prediction, and real-time monitoring of water consumption and conservation.

Big data cloud computing also facilitates intelligent healthcare. Smart devices can scan, integrate and transfer health data to the cloud in real time, providing real-time alerts and enabling manufacturers, equipment vendors and medical service providers to share data and develop medical models. With wearable devices, smartphones and portable ECG devices, a local SDK can compute and monitor heart rate, supporting a cloud ECG alliance with family doctors, community ambulance services and hospital A&E departments.
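The on-device heart rate computation can be pictured with a minimal sketch. It assumes the device has already detected R peaks (the timestamps of individual heartbeats in the ECG) and derives beats per minute from the mean R-R interval. The function names and the alert thresholds are hypothetical illustrations, not part of Alibaba’s SDK.

```python
from statistics import mean

# Hypothetical alert thresholds for illustration; real limits are set clinically per patient.
LOW_BPM, HIGH_BPM = 50.0, 120.0

def heart_rate_bpm(r_peaks_s: list[float]) -> float:
    """Average heart rate from R-peak timestamps (seconds) via the mean R-R interval."""
    if len(r_peaks_s) < 2:
        raise ValueError("need at least two R peaks")
    rr_intervals = [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]
    return 60.0 / mean(rr_intervals)

def check_heart_rate(r_peaks_s: list[float]) -> tuple[float, str]:
    """Return (bpm, status); an 'ALERT' status is what a device would push to the cloud."""
    bpm = heart_rate_bpm(r_peaks_s)
    status = "ALERT" if bpm < LOW_BPM or bpm > HIGH_BPM else "ok"
    return bpm, status

print(check_heart_rate([0.0, 0.8, 1.6, 2.4]))  # beats 0.8 s apart, about 75 bpm
print(check_heart_rate([0.0, 0.4, 0.8, 1.2]))  # beats 0.4 s apart, about 150 bpm
```

Computing the rate locally and sending only alerts (or summaries) keeps the data stream to the cloud small, which is one plausible reason for splitting the work between a local SDK and the cloud.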

One more application is intelligent scientific research. For example, in cooperation with Genomics, Alibaba has developed gene data processing covering pre-processing, variant discovery and call-set refinement.

 

Data Services Layer

In the data services layer, data is ingested from 4G/5G sensors, devices and apps, while data processing focuses on analyzing audio, images, video and language, and on updating recommendation and subscription engines.

Speech and natural language processing covers automatic speech recognition, text-to-speech synthesis, speaker verification, and audio analysis of speech rhythm and tone. For example, the cloud correctly predicted the winner of the live singing show “I’m a Singer” based on sentiment, the singers’ profiles and the characteristics of their songs.

Image analysis, such as OCR, follows a pattern recognition loop: image capture → face detection → feature extraction → figure modelling → model comparison. Video analysis mostly refers to real-time monitoring and analysis of scenes. It is applied, for example, to detect traffic congestion by analyzing the speed, number and types of cars in real time and then generating real-time transport and driving advice.
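The last step of the traffic use case, turning per-interval measurements into a congestion level and advice, can be sketched as below. It assumes an upstream vision model has already produced a vehicle count and an average speed for each interval; the `Observation` schema, the free-flow speed and the thresholds are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One monitoring interval from a roadside camera (hypothetical schema)."""
    vehicles_per_min: float
    mean_speed_kmh: float

def congestion_level(obs: Observation, free_flow_kmh: float = 60.0) -> str:
    """Classify congestion from speed relative to free-flow speed plus traffic volume.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    speed_ratio = obs.mean_speed_kmh / free_flow_kmh
    if speed_ratio < 0.3 and obs.vehicles_per_min > 20:
        return "heavy"
    if speed_ratio < 0.6:
        return "moderate"
    return "free-flowing"

ADVICE = {
    "heavy": "suggest an alternative route",
    "moderate": "expect delays",
    "free-flowing": "no action needed",
}

for obs in [Observation(30, 10), Observation(10, 30), Observation(5, 55)]:
    level = congestion_level(obs)
    print(f"{obs}: {level} -> {ADVICE[level]}")
```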

As recommendation and subscription engines, the SDK breaks behavior data down into item information (keywords, categories) and user information (interests, terminal, location, preferences, behavior), collects all of this data for processing, and then uses algorithms to generate customized, exclusive recommendations (exclusive offers, lucky money events).
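The matching step can be illustrated with a simple content-based scorer: user interests and item attributes are both reduced to tags, and items are ranked by how strongly their tags overlap with the user’s weighted interests. This is a generic technique chosen for illustration; the data layout and weights are hypothetical, and the talk does not describe Alibaba’s production algorithm.

```python
from collections import Counter

def score(user_interests: Counter, item: dict) -> float:
    """Score an item by summing the user's interest weights over the item's tags."""
    tags = set(item["keywords"]) | {item["category"]}
    return sum(user_interests[tag] for tag in tags)  # Counter returns 0 for unseen tags

def recommend(user_interests: Counter, items: list, k: int = 2) -> list:
    """Return the ids of the top-k items, best first."""
    ranked = sorted(items, key=lambda item: score(user_interests, item), reverse=True)
    return [item["id"] for item in ranked[:k]]

# Interest weights, e.g. counts of past clicks per tag (hypothetical behavior data).
user = Counter({"running": 3, "shoes": 2, "outdoor": 1})
items = [
    {"id": "A", "keywords": ["running", "shoes"], "category": "sports"},
    {"id": "B", "keywords": ["tent"], "category": "outdoor"},
    {"id": "C", "keywords": ["lipstick"], "category": "beauty"},
]
print(recommend(user, items))  # ['A', 'B']
```

Item A scores 5 (running + shoes), B scores 1 (outdoor) and C scores 0, so the top two are A and B.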

At the data services level, machine learning is mainly concerned with analyzing text, speech, audio, images and video, translating them into machine-readable form, and ultimately enabling interactive dialogue.


Data Development Kit Layer

According to Zhou, the big data development kit layer consists of BI reporting, data management, data visualization, a data warehouse (DW) suite, a DW IDE, data quality control, a data map, pipelines, multi-source data ingestion, data processing (SQL, MR, Shell, Graph) and data analytics (drill down, drag and drop).

The searchable metadata service has many functional elements, including access control (authentication, authorization, table-level data access management), catalog management (multi-tenant support, graphical analysis of cross-project and cross-table relationships), table detail analysis (usage auditing, usage pattern tracking), a global metadata view (project management tooling, table and file listings, usage and resource summaries), metadata search (category search, full-text search), and data source management (table creation and modification, data source creation).
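To make the access-control element concrete, here is a toy catalog that checks authentication first and then table-level privileges. The class, method names and privilege strings are hypothetical and do not correspond to the actual service’s API.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataCatalog:
    """Toy catalog with authentication and table-level grants (hypothetical model)."""
    authenticated: set = field(default_factory=set)  # users with a valid session
    grants: dict = field(default_factory=dict)       # (project, table) -> {user: {privileges}}

    def grant(self, user: str, project: str, table: str, privilege: str) -> None:
        self.grants.setdefault((project, table), {}).setdefault(user, set()).add(privilege)

    def is_allowed(self, user: str, project: str, table: str, privilege: str) -> bool:
        # Authentication: users without a session are rejected outright.
        if user not in self.authenticated:
            return False
        # Authorization: the privilege must be granted on this specific table.
        return privilege in self.grants.get((project, table), {}).get(user, set())

catalog = MetadataCatalog(authenticated={"alice", "bob"})
catalog.grant("alice", "sales", "orders", "SELECT")
print(catalog.is_allowed("alice", "sales", "orders", "SELECT"))   # True
print(catalog.is_allowed("alice", "sales", "refunds", "SELECT"))  # False: no grant on this table
print(catalog.is_allowed("bob", "sales", "orders", "SELECT"))     # False: authenticated, not granted
```

Keying grants by (project, table) is what makes the check table-level: a grant on one table says nothing about its neighbors in the same project.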

Data quality monitoring can be regarded as a data filtering loop. When source data enters the data warehouse, it passes through an online, closed quality-monitoring loop: data cleaning → data quality checking → online monitoring → warning → problem feedback → data re-cleaning or data source check.
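One pass of the closed loop can be sketched as a rule-based check: each incoming row is cleaned, checked against quality rules, and a warning is raised (sending the batch back for re-cleaning or a source check) when the failure ratio crosses a threshold. The rules, the threshold and the function names are assumptions for illustration only.

```python
# Each rule returns True when a row passes. These rules are hypothetical examples.
RULES = {
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
    "has_user_id": lambda r: bool(r.get("user_id")),
}

def clean(row: dict) -> dict:
    """Data cleaning: strip surrounding whitespace from string fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

def quality_check(rows: list) -> tuple:
    """Return (passed_rows, failures) where failures maps rule name -> failing row count."""
    passed, failures = [], {}
    for row in map(clean, rows):
        failed = [name for name, rule in RULES.items() if not rule(row)]
        for name in failed:
            failures[name] = failures.get(name, 0) + 1
        if not failed:
            passed.append(row)
    return passed, failures

def monitor(rows: list, max_fail_ratio: float = 0.1) -> tuple:
    """Online monitoring: warn and request re-cleaning when too many rows fail."""
    passed, failures = quality_check(rows)
    fail_ratio = 1 - len(passed) / len(rows) if rows else 0.0
    status = "WARN: re-clean or check data source" if fail_ratio > max_fail_ratio else "ok"
    return status, failures

rows = [
    {"user_id": " u1 ", "amount": 10},
    {"user_id": "", "amount": -5},
    {"user_id": "u3", "amount": 3},
]
print(monitor(rows))
```

Here one of three rows fails both rules, so the failure ratio (one third) exceeds the 10% threshold and the batch is flagged.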

Workload management (with SLAs) covers resource allocation and rescheduling. The system estimates the average running time of every task, monitors in real time when each checkpoint starts, and then reschedules the following tasks according to the remaining time, so as to guarantee the service level.
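A minimal sketch of this idea, assuming each task’s estimate is the average of its past running times and that rescheduling uses earliest-deadline-first (EDF) ordering. EDF is a stand-in chosen for illustration; the talk does not name the actual policy, and all names here are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Task:
    name: str
    history_min: list    # past running times, minutes
    deadline_min: float  # SLA deadline, minutes after the run started

    @property
    def estimate_min(self) -> float:
        """Estimated running time: the average of past runs."""
        return mean(self.history_min)

def reschedule(now_min: float, pending: list) -> list:
    """At a checkpoint, order pending tasks earliest-deadline-first and report
    (name, projected finish, slack); negative slack means the SLA is at risk."""
    plan, t = [], now_min
    for task in sorted(pending, key=lambda x: x.deadline_min):
        t += task.estimate_min
        plan.append((task.name, t, task.deadline_min - t))
    return plan

pending = [Task("daily_report", [30, 34], 120), Task("etl_orders", [20, 20], 60)]
for name, finish, slack in reschedule(now_min=30, pending=pending):
    print(f"{name}: projected finish ~{finish} min, slack {slack} min")
```

At the 30-minute checkpoint, `etl_orders` (deadline 60) runs first and finishes around minute 50 with 10 minutes of slack, followed by `daily_report` around minute 82. If a checkpoint started late, the projected finishes shift and any negative slack identifies the tasks that need more resources.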

Data visualization and synchronization support various templates and data sources, and provide multi-screen, multi-channel broadcasting. The visualization tools are now used for real-time monitoring of marketing campaigns, global trade, retail analysis, natural resource analysis, logistics analysis, urban planning and so on. At the development kit level, machine learning refers to the design of algorithms such as neural networks, decision trees, support vector machines and clustering.
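As one concrete instance of the clustering algorithms listed above, here is a minimal k-means (Lloyd’s algorithm) in plain Python. It illustrates the technique only and is not the platform’s implementation; the sample points are made up.

```python
import random

def assign(points, centroids):
    """Assignment step: index of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i])) for p in points]

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)  # start from k distinct input points
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Update step: move the centroid to the mean of its members (keep it if empty).
            new.append((sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members)) if members else centroids[c])
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, assign(points, centroids)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The two groups of points near the origin and near (10, 10) end up with separate labels; which group gets label 0 depends on the random initialization.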

 

Compute Engines Layer

The most fundamental layer is the compute engine infrastructure, including Max-Compute, Stream-Compute and machine learning for massive data analysis.

In particular, Max-Compute supports batch, interactive, in-memory and iterative computation, whereas Stream-Compute supports real-time computing for millions of smartphones, mobile consoles, sensors, server logs, event stores, online service terminals and intelligent dashboards.
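The difference between batch and stream computation can be illustrated with a toy example: a batch job sees the whole bounded dataset at once, while a streaming job emits results window by window as events arrive. This is a generic sketch of tumbling-window aggregation, not Max-Compute or Stream-Compute code; the `(timestamp, key)` event format is an assumption.

```python
from collections import Counter

def batch_count(events):
    """Batch mode: count events per key over the whole (bounded) dataset at once."""
    return Counter(key for _, key in events)

def stream_count(events, window_s=1.0):
    """Stream mode: yield (window index, per-key counts) for each tumbling window.

    Assumes events arrive in timestamp order; late or out-of-order data is not handled.
    """
    current, counts = None, Counter()
    for ts, key in events:
        window = int(ts // window_s)
        if current is not None and window != current:
            yield current, dict(counts)  # window closed: emit its result
            counts = Counter()
        current = window
        counts[key] += 1
    if current is not None:
        yield current, dict(counts)

events = [(0.1, "order"), (0.5, "pay"), (0.9, "order"), (1.2, "order"), (2.7, "pay")]
print(list(stream_count(events)))  # [(0, {'order': 2, 'pay': 1}), (1, {'order': 1}), (2, {'pay': 1})]
print(batch_count(events))         # Counter({'order': 3, 'pay': 2})
```

Summing the streamed windows reproduces the batch totals; the streaming version simply makes each partial result available as soon as its window closes.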

Alibaba’s analytic database can process and synchronize massive data simultaneously, offering real-time OLAP and MySQL compatibility while achieving precise computation, high concurrency, high availability and low latency. Even with over 210 active developers, over 23 million requests, over 60 ms response time (RT), over 5,000 queries per second (QPS), over 413 billion records and over 70 TB of storage, the service still achieves 99.99% availability. With data-parallel computing and distributed resource management, the cloud can manage multiple clusters simultaneously, each equipped with over 10,000 servers.
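To put the 99.99% figure in concrete terms, the arithmetic below converts an availability target into a downtime budget. It assumes a 365-day year and a 30-day month; the function name is illustrative.

```python
def downtime_budget_minutes(availability: float, period_hours: float = 365 * 24) -> float:
    """Maximum downtime (minutes) allowed over a period at a given availability."""
    return (1 - availability) * period_hours * 60

print(f"per year:  {downtime_budget_minutes(0.9999):.1f} min")           # 52.6 min
print(f"per month: {downtime_budget_minutes(0.9999, 30 * 24):.1f} min")  # 4.3 min
```

In other words, 99.99% availability allows less than an hour of downtime per year.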

Despite its short history, Alibaba’s cloud has enjoyed a latecomer advantage and achieved faster technological growth than many US competitors. In the global Sort Benchmark competition (100 TB GraySort category), the winning entry took 4,328 seconds using Hadoop in 2013 and 1,406 seconds using Apache Spark in 2014, while Aliyun took just 377 seconds in 2015.
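Dividing the reported winning times gives the relative improvement. The sketch compares elapsed times only and assumes the entries sorted comparable data volumes, which the paragraph does not state.

```python
# Reported winning times (seconds) from the paragraph above.
times = {"Hadoop (2013)": 4328, "Apache Spark (2014)": 1406, "Aliyun (2015)": 377}

baseline = times["Aliyun (2015)"]
for name, secs in times.items():
    # Ratio > 1 means that entry took that many times longer than the 2015 result.
    print(f"{name}: {secs} s, {secs / baseline:.1f}x the 2015 time")
```

By this measure the 2015 result is about 11.5 times faster than the 2013 Hadoop entry and about 3.7 times faster than the 2014 Spark entry.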


Research Challenges

In summary, Zhou outlined future research challenges in cloud computing, such as cloud-scale computing infrastructure, big data management and large-scale machine learning. Big data and cloud computing will undoubtedly free developers from routine work and let them focus on creative and strategic work. Although Alibaba Cloud today mainly serves Alibaba’s own platforms and e-commerce partners, it also provides open tools for external developers. More importantly, all of this is more than a computing approach; it is a change in lifestyle and business models.

Alibaba’s Architecture and Application of Artificial Intelligence and Machine Learning

Machine learning, the foundation of artificial intelligence, has become a hot topic in the internet world. Alibaba, the leading player in e-commerce and internet finance, foresaw its importance and applied machine learning technology long before AlphaGo’s demonstration in 2016. At the Alibaba Technology Forum at Hong Kong Polytechnic University on June 17, 2016, Dr. Wei Chu, Director of Engineering at Alibaba, gave a talk on Distributed Machine Learning and its Application in Alibaba.

According to Wikipedia, machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In general, machine learning can be divided into three categories: supervised learning, unsupervised learning and reinforcement learning.

According to Chu, there are three pillars of artificial intelligence learning and decision making. The first is big data, as petabytes of data form the foundation pool for machine learning and pattern recognition. The second is cloud computing, which offers high performance, scalability, accessibility, low cost and security for computation. The third is research innovation, at both the algorithm and the CPU chip level.

Chu also outlined the features of major machine learning platforms worldwide. Amazon has launched machine learning on its web services platform, but it supports only limited functions such as regression and classification. Google’s TensorFlow is quite advanced and open source, and perhaps the best choice for professional developers, although the technical bar is rather high for people who don’t know Python. Microsoft’s machine learning platform on Azure has a relatively user-friendly graphical interface. In this respect, Alibaba’s Platform of Artificial Intelligence (PAI) is closer to Microsoft’s style.

Dr. Chu also demonstrated the features and functions of PAI, Alibaba’s cloud-based graphical platform: searching experiments, components and models; managing experiments; exploring data tables; listing algorithms and tools; managing models; and evaluating binary classification with a confusion matrix. In pursuit of data intelligence, PAI builds on big data collection and analytics, comprehensive data mining, distributed computing and advanced algorithms.
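Binary classification results of the kind evaluated in PAI are commonly summarized with a confusion matrix and the metrics derived from it. The sketch below is the standard textbook computation, not PAI’s component; the function names and sample labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix as a dict of TP, FP, FN, TN (positive class = 1)."""
    cm = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            cm["TP" if t == 1 else "FP"] += 1
        else:
            cm["FN" if t == 1 else "TN"] += 1
    return cm

def metrics(cm):
    """Precision, recall and accuracy derived from the confusion matrix."""
    predicted_pos = cm["TP"] + cm["FP"]
    actual_pos = cm["TP"] + cm["FN"]
    precision = cm["TP"] / predicted_pos if predicted_pos else 0.0
    recall = cm["TP"] / actual_pos if actual_pos else 0.0
    accuracy = (cm["TP"] + cm["TN"]) / sum(cm.values())
    return precision, recall, accuracy

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)           # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print(metrics(cm))  # precision, recall and accuracy are each 2/3 here
```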

According to Chu, PAI’s customers include traditional giants (such as Sinopac and CNPC), startups, governments (such as the Zhejiang provincial government’s meteorological administration and traffic management) and researchers (such as data scientists, engineers and developers).

Chu described five conceptual layers in PAI’s architecture. The top layer is applications, such as credit scoring, security, recommender systems, search engines, finance cloud and public cloud. The next layer is platform products, including product management, module management, analytic visualization, customized PAI and My-PAI. The third layer is algorithms and tools: data pre-processing, feature engineering, machine learning algorithms, statistics, and deep learning with CNN/DNN/LSTM RNN. The fourth layer is computing frameworks: MR, SQL, MPI, PS, Graph, and GPU on a single machine or multiple machines. Finally, the deepest layer is infrastructure, namely Aliyun, the CPU cloud and the GPU cloud.

Chu then introduced Caffe and Pluto, the deep learning system behind Alibaba’s PAI. Built on Caffe, the open-source deep learning library, Pluto can be regarded as a distributed deep learning system that runs on multiple machines and multiple GPU cards, supporting popular models such as convolutional neural networks (CNNs) and RNNs.

Chu then described three deep learning models in Alibaba’s cloud service, and in Pluto in particular. The first is the deep neural network (DNN), which he described as a Monte-Carlo system with multi-card, multi-level computation in which each node has distributed inputs and outputs. The second is the convolutional neural network (CNN), in which the input is broken down into smaller local regions and passed through repeated convolution and pooling stages before reaching fully connected layers, enabling precise, high-resolution pattern recognition. CNNs are widely used in region detection and item verification. With Pluto, Alibaba Cloud (with only 8 GPU cards) could recognize 100 million images within 10 hours, and could recognize Chinese characters with over 99.6% accuracy. The third is the recurrent neural network (RNN), which contains cycles: the output of the hidden (middle) layer is fed back into the network at the next step, so earlier inputs influence how later inputs are processed.
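The convolution-pooling stage of a CNN can be shown on a tiny grayscale image in plain Python: a small kernel slides over the image to produce a feature map, ReLU keeps the positive responses, and max pooling shrinks the map while keeping the strongest response in each region. This is a didactic sketch of the generic operations, not Pluto code; the image and kernel are made up.

```python
def conv2d(img, kernel):
    """'Valid' 2-D convolution with stride 1 (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; edge rows/columns that don't fill a window are dropped."""
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1],
               [-1, 1]]  # responds where brightness increases from left to right

features = relu(conv2d(image, edge_kernel))  # 4x4 map, each row [0, 2, 0, 0]
print(max_pool(features))                    # [[2, 0], [2, 0]]
```

The edge shows up as the value 2 in the left column of the pooled map; stacking many such kernels and stages, followed by fully connected layers, is what lets a CNN recognize larger patterns.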

Pluto consists of parameter server nodes, which hold the model, and worker nodes, which process the data. The parameter server and worker nodes coordinate with each other and continuously exchange updates. Recently, Caffe has started to support multiple models, including CNN and LSTM.
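The parameter-server pattern can be simulated in a single process: the server holds the model, each worker computes a gradient on its own data shard from its local copy of the parameters and pushes it back, and the server applies every push as soon as it arrives, so some gradients come from slightly stale parameters (asynchronous updates). This toy uses linear regression instead of a neural network; the class and function names are hypothetical and it is not Pluto’s implementation.

```python
import random

class ParameterServer:
    """Holds the global model; here a 1-D linear model y = w*x + b."""
    def __init__(self, lr: float = 0.02):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def pull(self):
        return self.w, self.b

    def push(self, grad_w: float, grad_b: float) -> None:
        # Asynchronous update: apply each worker's gradient as soon as it arrives.
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b

def shard_gradient(shard, w, b):
    """Mean-squared-error gradient on one worker's data shard."""
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b

def train(shards, steps=2000, seed=0):
    rng = random.Random(seed)
    server = ParameterServer()
    local = [server.pull() for _ in shards]  # each worker's possibly stale model copy
    for _ in range(steps):
        i = rng.randrange(len(shards))         # whichever worker finishes next
        server.push(*shard_gradient(shards[i], *local[i]))
        local[i] = server.pull()               # only this worker refreshes its copy
    return server.pull()

data = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]  # points on y = 2x + 1
shards = [data[j::4] for j in range(4)]                 # 4 workers, one shard each
w, b = train(shards)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Despite the stale gradients, the small learning rate lets the shared model converge close to w = 2, b = 1.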

Next, Chu presented Pluto’s data computation features. With 56 Gigabytes in the CPU/GPU cloud, data parallelism with multiple replicas, complete failover and asynchronous updates, Pluto gives developers and operators significant advantages in data processing, such as efficient multi-tasking and scheduling, InfiniBand communication, scalability, and higher accuracy along with a higher speedup ratio.

Chu also showed Pluto’s business applications. The first example is the Taobao recommender system: given a user, a query and candidate items, Pluto finds the best item to maximize business metrics such as CTR, RPM or user satisfaction. The second example is logistic regression: Pluto processed 1 billion attributes and 57 billion training samples from Alimama ads within 5 hours, performing the logistic regression computation 40% faster than with MPI. Pluto can now support up to 10 billion attributes and up to 100 billion samples. The third application is the Zhima personal credit score, which integrates indicators such as identification, compliance, credit history, social connections and behavior. The score ranges from 350 to 950 and represents personal and company credit status; the higher the score, the lower the default risk. It is computed with a locally connected DNN. The fourth example is image OCR, which is widely used in identity card verification, business card information input, vocabulary teaching, verification code recognition and so on.
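Models with billions of attributes are typically sparse: each ad impression activates only a handful of features. The sketch below shows logistic regression trained by stochastic gradient descent on such sparse examples, storing weights in a dict so that only the active features are touched. It is a single-machine illustration of the technique with made-up feature names and data, not Pluto’s distributed implementation.

```python
import math

def predict(weights: dict, features: list) -> float:
    """Click probability for a sparse example given as a list of active feature names."""
    z = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

def train_sgd(examples, epochs=50, lr=0.5) -> dict:
    """Logistic regression by SGD on log loss; only active features' weights are updated."""
    weights = {}
    for _ in range(epochs):
        for features, clicked in examples:
            error = predict(weights, features) - clicked  # gradient of log loss w.r.t. z
            for f in features:
                weights[f] = weights.get(f, 0.0) - lr * error
    return weights

# Made-up impressions: (active features, clicked?)
examples = [
    (["bias", "ad:shoes", "user:u1"], 1),
    (["bias", "ad:shoes", "user:u2"], 1),
    (["bias", "ad:books", "user:u1"], 0),
    (["bias", "ad:books", "user:u2"], 0),
]
weights = train_sgd(examples)
print(f"P(click | shoes ad) = {predict(weights, ['bias', 'ad:shoes', 'user:u1']):.2f}")
print(f"P(click | books ad) = {predict(weights, ['bias', 'ad:books', 'user:u1']):.2f}")
```

Because each update touches only the features present in an example, the cost per example depends on the number of active features rather than the total number of attributes, which is what makes very wide models practical.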

In summary, Chu outlined his vision for Alibaba’s artificial intelligence and machine learning in the near future. Built on CPU/GPU cloud computing, big data analytics and deep learning algorithms, Alibaba’s machine learning platform represents smart computation and data intelligence, helping not only Alibaba’s developers but also the entrepreneurs and customers on Alibaba’s platforms.

 


Alibaba Group Co-hosts Technology Forum with Hong Kong PolyU

Above Photo: Dr. Jingren ZHOU, Vice President of Alibaba Group, discussed key trends in big data and cloud computing at the Alibaba Technology Forum in Hong Kong.

Hong Kong, June 17, 2016
-Alibaba Group and Hong Kong Polytechnic University hosted a technology forum for nearly 300 students today to discuss key opportunities, trends and challenges in big data and cloud computing. The Technology Forum covered topics including innovations in elastic computing, machine learning and big data development, and how these innovations can address a range of different business needs and challenges.

(From left to right) Dr. Jingren ZHOU, Vice President of Alibaba Group, presents a gift to Professor Alex Wai, Vice President (Research Development) of Hong Kong Polytechnic University.

Dr. Jingren ZHOU, Vice President of Alibaba Group, presented an overview of Alibaba’s innovative Big Data Computing Platform (BDCP), which consists of a wide range of products and services that enable fast and efficient big data development.

Since its establishment in 2009, Alibaba’s BDCP has expanded its coverage from China to the US, Europe, the Middle East and South East Asia, with more than 8,000 developers and over 1,500 applications. BDCP’s technology not only supports Alibaba’s internal businesses but also provides services to enterprise customers worldwide.

Dr. Zhou also demonstrated how BDCP utilizes data visualization, making it useful for a wide variety of industries, including transportation, healthcare, water conservancy and human gene analysis.

“Alibaba Cloud’s big data and cloud computing technologies are not only contributing an enormous amount to the ongoing development of the wider technology industry, but also empowering enterprises in various sectors such as finance, energy, gaming, entertainment, healthcare and education. Today’s forum allows us to showcase these world-class tools to help inspire students to become the technology leaders of tomorrow,” said Dr. Zhou. “To us, supporting future talent is as important as growing the cloud computing ecosystem, and we’re delighted to be doing both.”

At the same event, Dr. Wei CHU, Director of Engineering, Alibaba Cloud, introduced the Distributed Machine Learning Platform, also known as the Platform of Artificial Intelligence (PAI), and its applications. PAI supports customers by providing complete solutions for various business scenarios including advertising, search and finance. Dr. Xianglong HUANG, Director of Engineering, Alibaba Cloud, shared his insights on evolving virtualization techniques in elastic computing and the future roadmap for serving more enterprise users. In addition, Dr. Qian Zhengping, Staff Engineer of Alibaba Cloud, shared the latest developments in real-time computing and how it meets the increasing demand from applications such as online payment and traffic monitoring.

This event is part of the Alibaba Technology Forum (ATF), an annual event organized by Alibaba Group. Previous editions of the ATF have been held at Stanford University, Hong Kong University of Science and Technology, Peking University and Beijing University of Posts and Telecommunications.

