Spark ETL | Big Data In Real World

Spark Developer In Real World

Let's Get Started

Thank you and Welcome (11:35)

Introduction To Spark

Hadoop vs. Spark - Who Wins (15:30)
Challenges Spark Tries To Address (12:24)
How Spark Is Faster Than Hadoop (8:39)

RDD - Core Of Spark

The Need For RDD (11:29)
What Is RDD (12:30)
What An RDD Is Not (7:31)

Execution In Spark (Behind the scenes)

First Program In Spark (16:04)
What are Dependencies and Why They are Important (11:11)
Program to Execution (Part 1) (13:01)
Program to Execution (Part 2) (19:10)
Caching Data In Spark (15:04)
Fault Tolerance (7:34)

Shuffle in Spark

Need for Shuffle (10:45)
Hash Shuffle Manager - Part 1 (11:44)
Hash Shuffle Manager - Part 2 (14:07)
Sort Shuffle Manager (8:15)

Spark Transformations

reduceByKey vs groupByKey (9:34)
Cogroup, Join and Avoiding Shuffle - Part 1 (14:19)
Cogroup, Join and Avoiding Shuffle - Part 2 (8:23)
Resizing Partitions (7:46)

PageRanking with RDDs

PageRanking Algorithm (7:33)
PageRank Walk-through (6:15)
Implementing PageRank with RDDs (6:31)

Beyond RDDs

What's the Problem with RDDs (11:53)
DataFrame vs DataSet vs SQL (12:25)
Simple Selects (8:26)
Filtering DataFrames (2:24)
Aggregating DataFrames (5:19)
Joining DataFrames (8:20)
PageRanking with DataFrames (16:39)

Spark with Other Datasources & File Formats

Spark & Hive (8:26)
Spark & Hive with XML, Parquet & ORC (14:23)
Spark & RDBMS (8:49)
Spark & HBase (Part - 1) (18:47)
Spark & HBase (Part - 2) (9:03)

Spark Optimizations

Number of Tasks (14:33)
Join Algorithms (16:57)
Picking a Join Algorithm (9:09)
Join Hints (4:13)

Spark - Under the Hood

Inside the Catalyst Optimizer (12:05)
Catalyst Optimizer - Plan Walkthrough (6:27)
Project Tungsten - Better Memory Management (13:09)
Project Tungsten - CPU Cache Aware Optimizations (11:05)

Resource Management

Spark Architecture (7:59)
Memory Layout In Executor (8:12)
Resource Management - Standalone (12:09)
Resource Management - YARN (14:07)
Dynamic Resource Allocation (7:47)

Cluster Installation

Spark Installation (5:28)
Hadoop Cluster Setup (Part 1) (23:43)
Hadoop Cluster Setup (Part 2) (25:35)
Hadoop Cluster Setup (Part 3) (18:01)

An end to end project (Spark, Elasticsearch, Kibana, REST and Angular)

End to End Project Introduction (8:09)
Elasticsearch (A quick introduction) (8:18)
Hands-on with Elasticsearch (10:45)
Stackoverflow Dataset (8:58)
Spark ETL (12:53)
Visualizations with Kibana (8:44)
REST Service with Spring framework (19:29)
Building an Angular application (12:28)

Introduction to Kafka

Kafka - The Why and the What (8:43)
Key Concepts (12:32)
Experiments with Kafka (19:18)

Machine Learning

Introduction to Machine Learning (11:38)
Machine Learning Blueprint (5:49)
Feature Engineering (10:39)
Linear Regression (8:17)
World Happiness Project (13:58)
Decision Trees (9:55)
Random Forest (3:14)
Predicting 2016 US Elections (11:46)
Predicting Yelp Ratings (+ve or -ve) (15:55)

Streaming with Spark

Why Streaming and How Spark Does Streaming (11:51)
Core Concepts in Streaming (8:36)
Output Modes With Non Aggregate Queries (13:40)
Output Modes With Aggregate Queries (8:50)
Event Time, Window and Late Events (10:39)
Handling Late Events In Streaming (10:47)
Late Events and Append Mode (8:05)
Streaming Meetup with Spark (Part 1) (5:31)
Streaming Meetup with Spark (Part 2) (8:53)

A Short Chapter On Scala

Introduction to Scala (12:05)
First Program in Scala (not HelloWorld) (11:45)
Scala Functions (11:43)

Spark ETL
