Analyzing Large Data Sets with Apache Spark

Get ready for your exam by enrolling in our comprehensive training course. This course includes a full set of instructional videos designed to equip you with in-depth knowledge essential for passing the certification exam with flying colors.

Getting Started with Spark

  • 1. Introduction
    2m 16s
  • 2. How to Use This Course
    1m 41s
  • 3. [Activity] Getting Set Up: Installing Python, a JDK, Spark, and Its Dependencies
    14m 50s
  • 4. [Activity] Installing the MovieLens Movie Rating Dataset
    3m 35s
  • 5. [Activity] Run your first Spark program! Ratings histogram example.
    4m 52s

Spark Basics and Simple Examples

  • 1. Introduction to Spark
    10m 11s
  • 2. The Resilient Distributed Dataset (RDD)
    12m 17s
  • 3. Ratings Histogram Walkthrough
    13m 33s
  • 4. Key/Value RDDs, and the Average Friends by Age Example
    16m 13s
  • 5. [Activity] Running the Average Friends by Age Example
    5m 39s
  • 6. Filtering RDDs, and the Minimum Temperature by Location Example
    8m 10s
  • 7. [Activity] Running the Minimum Temperature Example, and Modifying It for Maximums
    5m 8s
  • 8. [Activity] Running the Maximum Temperature by Location Example
    3m 21s
  • 9. [Activity] Counting Word Occurrences using flatMap()
    7m 28s
  • 10. [Activity] Improving the Word Count Script with Regular Expressions
    4m 44s
  • 11. [Activity] Sorting the Word Count Results
    7m 44s

Advanced Examples of Spark Programs

  • 1. [Activity] Find the Most Popular Movie
    5m 52s
  • 2. [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
    8m 23s
  • 3. Find the Most Popular Superhero in a Social Graph
    4m 29s
  • 4. [Activity] Run the Script - Discover Who the Most Popular Superhero is!
    6m
  • 5. Superhero Degrees of Separation: Introducing Breadth-First Search
    7m 54s
  • 6. Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
    6m 44s
  • 7. [Activity] Superhero Degrees of Separation: Review the Code and Run it
    9m 14s
  • 8. Item-Based Collaborative Filtering in Spark, cache(), and persist()
    10m 12s
  • 9. [Activity] Running the Similar Movies Script using Spark's Cluster Manager
    10m 54s
  • 10. [Exercise] Improve the Quality of Similar Movies
    2m 58s

Running Spark on a Cluster

  • 1. Introducing Elastic MapReduce
    5m 8s
  • 2. [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
    9m 55s
  • 3. Partitioning
    4m 21s
  • 4. Create Similar Movies from One Million Ratings - Part 1
    5m 12s
  • 5. [Activity] Create Similar Movies from One Million Ratings - Part 2
    11m 27s
  • 6. Create Similar Movies from One Million Ratings - Part 3
    3m 28s
  • 7. Troubleshooting Spark on a Cluster
    3m 43s
  • 8. More Troubleshooting, and Managing Dependencies
    5m 47s

SparkSQL, DataFrames, and Datasets

  • 1. Introducing SparkSQL
    6m 8s
  • 2. Executing SQL commands and SQL-style functions on a DataFrame
    8m 16s
  • 3. Using DataFrames instead of RDDs
    5m 52s

Other Spark Technologies and Libraries

  • 1. Introducing MLlib
    8m 10s
  • 2. [Activity] Using MLlib to Produce Movie Recommendations
    2m 56s
  • 3. Analyzing the ALS Recommendations Results
    4m 53s
  • 4. Using DataFrames with MLlib
    7m 31s
  • 5. Spark Streaming and GraphX
    7m 36s

© study4pass.com 2025. All rights reserved.