Cloudera Data Engineering

Cloudera Data Engineering: Empower Your Skills in Apache Spark and High-Performance Data Applications on the Cloudera Data Platform

$1,500.00

Overview

This four-day, hands-on Cloudera Data Engineering course covers the key concepts developers need to build high-performance, parallel applications with Apache Spark on the Cloudera Data Platform (CDP). Participants also gain practical skills in managing data and building robust data pipelines.

Through practical exercises, students will practice writing Spark applications that integrate with CDP’s core components. Participants will learn how to use Spark SQL to query structured data, utilize Hive features for data ingestion and denormalization, and work with large datasets stored in a distributed file system.

After completing this course, participants will be equipped to build applications that support faster decision-making and stronger analytics across a wide range of use cases and industries.

What you’ll learn

During this course, you will learn how to:

Distribute, store, and process data in a CDP cluster.
Write, configure, and deploy Apache Spark applications.
Use the Spark interpreters and applications to explore, process, and analyze distributed data.
Query data using Spark SQL, DataFrames, and Hive tables.
Deploy a Spark application on the Data Engineering Service.

What to Expect

This course is designed for developers and data engineers. Students are expected to have basic Linux experience and basic proficiency in either Python or Scala. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

Course details

HDFS Introduction

HDFS Overview
HDFS Components and Interactions
Additional HDFS Interactions
Ozone Overview
Exercise: Working with HDFS

YARN Introduction

YARN Overview
YARN Components and Interactions
Working with YARN
Exercise: Working with YARN

Working with RDDs

Resilient Distributed Datasets (RDDs)
Exercise: Working with RDDs

Working with DataFrames

Introduction to DataFrames
Exercise: Introducing DataFrames
Exercise: Reading and Writing DataFrames
Exercise: Working with Columns
Exercise: Working with Complex Types
Exercise: Combining and Splitting DataFrames
Exercise: Summarizing and Grouping DataFrames
Exercise: Working with UDFs
Exercise: Working with Windows

Introduction to Apache Hive

About Hive
Transforming Data with HiveQL

Working with Apache Hive

Exercise: Working with Partitions
Exercise: Working with Buckets
Exercise: Working with Skew
Exercise: Using SerDes to Ingest Text Data
Exercise: Using Complex Types to Denormalize Data

Cloudera Data Engineering chapters

CDE_Intro

HDFS_YARN

Spark_Basic_DF

Spark_Analysis_DF

RDD_Overview

RDD_Trans_Aggre

Spark_SQL-App

Spark_DP_DDP

Hive_Intro

Hive_ACID_MV

Hive_SerDe_Complax

Hive_Joins_Partition
