Microsoft Azure Databricks for Data Engineering
Learn how to harness the power of Apache Spark and Azure Databricks to run large data engineering workloads in the cloud. Prepare for Exam DP-203: Data Engineering on Microsoft Azure and gain expertise in designing and implementing data solutions using Azure data services.
In this course, you will learn how to use Apache Spark and clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.
You will discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. You will come to understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. You will also be introduced to the architecture of an Azure Databricks Spark cluster and Spark jobs. You will work with large amounts of data from multiple sources in different raw formats. You will learn how Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries.
This course is part of a Specialization intended for data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services, and for anyone interested in preparing for Exam DP-203: Data Engineering on Microsoft Azure (beta). You will take a practice exam that covers key skills measured by the certification exam.
This is the eighth course in a program of 10 courses that help you prepare for the exam and build expertise in designing and implementing data solutions that use Microsoft Azure data services. The Data Engineering on Microsoft Azure exam is an opportunity to prove your expertise in integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions that use Microsoft Azure data services. Each course teaches you the concepts and skills that are measured by the exam.
By the end of this Specialization, you will be ready to sign up for and take Exam DP-203: Data Engineering on Microsoft Azure (beta).
What you will learn
Introduction to Azure Databricks
Describe the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. Describe the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. Describe the architecture of an Azure Databricks Spark Cluster and Spark Jobs.
Read and write data in Azure Databricks
Describe how Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries.
Data processing in Azure Databricks
Process data in Azure Databricks by defining DataFrames to read and process the data. Perform data transformations in DataFrames and execute actions to display the transformed data. Explain the difference between a transformation and an action, lazy and eager evaluation, wide and narrow transformations, and other optimizations in Azure Databricks.
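To make the transformation-versus-action distinction concrete, here is a minimal sketch in plain Python. It is not the Spark API: `LazyFrame`, `double`, and `calls` are hypothetical names invented for illustration. The sketch assumes only that transformations record a step without touching the data, and that an action runs the recorded plan. Both transformations shown work row by row, which loosely corresponds to narrow transformations; wide transformations, which move data between partitions, are not modeled here.

```python
class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not the Spark API)."""

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # pending (kind, function) steps, not yet executed

    # Transformations: return a new LazyFrame with one more recorded step.
    def filter(self, predicate):
        return LazyFrame(self._rows, self._plan + (("filter", predicate),))

    def map(self, fn):
        return LazyFrame(self._rows, self._plan + (("map", fn),))

    # Action: evaluates the whole recorded plan and returns the results.
    def collect(self):
        out = []
        for row in self._rows:
            keep = True
            for kind, fn in self._plan:
                if kind == "filter":
                    if not fn(row):
                        keep = False
                        break
                else:
                    row = fn(row)
            if keep:
                out.append(row)
        return out


calls = []  # records every row that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


# Two transformations are chained; nothing is computed yet (lazy evaluation).
frame = LazyFrame([1, 2, 3, 4]).filter(lambda x: x % 2 == 0).map(double)
calls_before_action = len(calls)  # 0: double() has not run

# The action triggers execution of the recorded plan.
result = frame.collect()  # [4, 8]
```

In this toy model, `double` runs only for the rows that survive the filter, and only once `collect()` is called. Deferring work until an action is requested is what gives a lazy engine the chance to optimize the whole plan before executing it.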
Work with DataFrames in Azure Databricks
Use the DataFrame Column class in Azure Databricks to apply column-level transformations, such as sorts, filters, and aggregations. Use advanced DataFrame functions to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks.