MPN Competency: Big Data for Data Analytics_zh-CN MPN14481 (April 2017)
This module discusses Azure Data Lake, Microsoft’s hyperscale repository for big data analytic workloads in the cloud. The offering is built for the cloud, is compatible with HDFS, and provides unbounded scale with massive throughput and enterprise-grade capabilities.
This module covers Apache Spark, an open source processing framework that runs large-scale data analytics applications. Spark unifies batch processing, real-time processing, stream analytics, machine learning, and interactive SQL. Built on an in-memory compute engine, Spark is known for high-performance querying on big data. It uses a parallel data processing framework that persists data in memory and on disk if needed. This allows Spark to deliver both speed (the Spark project cites up to 100x faster than disk-based Hadoop MapReduce for in-memory workloads) and a common execution model for tasks such as extract, transform, load (ETL), batch, and interactive queries on data in the Hadoop Distributed File System (HDFS).
This module discusses Hadoop on HDInsight. Hadoop is a Java-based, open source Apache project delivering a highly reliable, distributed, and parallel programming framework for analyzing big data. HDInsight is a standard Apache Hadoop distribution offered as a managed service on Microsoft Azure.
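The parallel programming framework that Hadoop provides is based on the MapReduce model. As a rough illustration only, the sketch below runs the three stages of that model (map, shuffle, reduce) for a word count in a single Python process. It does not use Hadoop or HDInsight APIs, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from HDFS.
    sample = ["big data on Hadoop", "big data on Spark"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data across the network; this sketch only shows the shape of the computation.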