O'Reilly Book: Data Algorithms with Spark by Mahmoud Parsian

"... This book will be a great resource for
both readers looking to implement existing
algorithms in a scalable fashion and readers
who are developing new, custom algorithms
using Spark. ..."

Dr. Matei Zaharia
Original Creator of Apache Spark

FOREWORD by Dr. Matei Zaharia

Author: Mahmoud Parsian

Goal of this book: Data Algorithms with Spark

Story of this book: Data Algorithms with Spark



GitHub Chapter Solutions


Software:

All programs are tested with the following software:

Software       Version
Apache Spark   3.4.0
Python         3.10.5
Scala          2.13
Java           11

Table of Contents

Chapter      Title
Glossary     Glossary of Big Data, MapReduce, Spark
Chapter 1    Introduction to Data Algorithms
Chapter 2    Transformations in Action
Chapter 3    Mapper Transformations
Chapter 4    Reductions in Spark
Chapter 5    Partitioning Data
Chapter 6    Graph Algorithms
Chapter 7    Interacting with External Data Sources
Chapter 8    Ranking Algorithms
Chapter 9    Fundamental Data Design Patterns
Chapter 10   Common Data Design Patterns
Chapter 11   Join Design Patterns
Chapter 12   Feature Engineering in PySpark
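The chapter list names reductions (Chapter 4) as a core topic. As a hedged, single-machine illustration of the reduce-by-key idea (plain Python, not PySpark; the helper names `reduce_by_key` and `mean_by_key` are hypothetical and not taken from the book), the sketch below computes a per-key mean by reducing `(sum, count)` pairs. This mirrors how Spark's `reduceByKey()` merges values per key with an associative, commutative function.

```python
from typing import Any, Callable, Hashable, Iterable


def reduce_by_key(
    pairs: Iterable[tuple[Hashable, Any]],
    func: Callable[[Any, Any], Any],
) -> dict:
    """Merge all values that share a key using a binary function.

    Hypothetical single-machine stand-in for Spark's reduceByKey():
    `func` should be associative and commutative so that partial
    results could be merged in any order, as Spark does across partitions.
    """
    result: dict = {}
    for key, value in pairs:
        # The first value for a key is stored as-is; later values are folded in.
        result[key] = func(result[key], value) if key in result else value
    return result


def mean_by_key(pairs: Iterable[tuple[Hashable, float]]) -> dict:
    """Per-key mean computed by reducing (sum, count) pairs."""
    # Map step: turn each value v into (v, 1) so sums and counts travel together.
    mapped = ((k, (v, 1)) for k, v in pairs)
    # Reduce step: add sums and counts component-wise.
    totals = reduce_by_key(mapped, lambda a, b: (a[0] + b[0], a[1] + b[1]))
    # Finish: divide once per key, only after all partial results are merged.
    return {k: s / c for k, (s, c) in totals.items()}


data = [("a", 2), ("b", 4), ("a", 6), ("b", 6), ("c", 5)]
print(mean_by_key(data))  # {'a': 4.0, 'b': 5.0, 'c': 5.0}
```

The `(sum, count)` pair is used because averaging directly is not associative: mean(mean(1, 2), 3) is 2.25, while mean(1, 2, 3) is 2.0. Sums and counts, by contrast, can be combined in any grouping.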

Bonus Chapters

Bonus Chapter                Title / Description
Glossary                     Glossary of Big Data, MapReduce, Spark
Word Count                   Solutions for Word Count using RDDs and DataFrames
Anagrams                     Find words that are anagrams of each other
Lambda Expressions           Using Lambda Expressions in PySpark programs
TF-IDF                       Term Frequency - Inverse Document Frequency
K-mers                       K-mers for DNA Sequences
Correlation                  All vs. All Correlation
Mapping Partitions           mapPartitions() Complete Example
UDF                          User-Defined Function Examples
DataFrames Transformations   Examples of creating and transforming DataFrames
DataFrames Tutorials         DataFrames Tutorials: from collections and CSV text files
Join Operations              Examples of joining RDDs and DataFrames
PySpark Tutorial 101         Examples of using PySpark RDDs and DataFrames
Physical Data Partitioning   Tutorial on Physical Data Partitioning
Monoids and Combiners        Monoid as a Design Principle
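The Anagrams bonus chapter is described only as finding words that are anagrams. One common approach, sketched here in plain Python rather than PySpark and offered as an assumption about the technique rather than the book's solution (the function name `find_anagrams` is hypothetical), is to key each word by its letters in sorted order and group words that share a key. In Spark the same idea would map each word to a `(key, word)` pair with `map()` and then group with `groupByKey()` or merge with `reduceByKey()`.

```python
from collections import defaultdict
from typing import Iterable


def find_anagrams(words: Iterable[str]) -> dict[str, list[str]]:
    """Group words that are anagrams of each other.

    Each word is keyed by its letters in sorted order, so all anagrams
    share one key. Lowercasing is an assumption made for this sketch.
    """
    groups: defaultdict[str, set[str]] = defaultdict(set)
    for word in words:
        normalized = word.lower()
        # "listen", "silent" and "enlist" all map to the key "eilnst".
        key = "".join(sorted(normalized))
        groups[key].add(normalized)
    # Keep only keys shared by at least two distinct words.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


text = "Listen silent enlist evil vile live cat act dog"
print(find_anagrams(text.split()))
# {'eilnst': ['enlist', 'listen', 'silent'], 'eilv': ['evil', 'live', 'vile'], 'act': ['act', 'cat']}
```

Sorting the letters costs O(L log L) per word of length L; a letter-count key is an alternative when words are long, but the sorted-string key is simpler and works as a dictionary or RDD key as-is.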
