  • Stars: 1,167
  • Rank: 40,028 (Top 0.8%)
  • Language: Jupyter Notebook
  • License: Other
  • Created over 9 years ago
  • Updated almost 2 years ago

Repository Details

PySpark-Tutorial provides basic algorithms using PySpark

PySpark Tutorial

  • PySpark is the Python API for Spark.

  • The purpose of this PySpark tutorial is to provide basic distributed algorithms using PySpark.

  • PySpark supports two types of data abstractions:

    • RDDs
    • DataFrames
  • PySpark Interactive Mode: PySpark provides an interactive shell ($SPARK_HOME/bin/pyspark) for basic testing and debugging; it is not intended for production use.

  • PySpark Batch Mode: use the $SPARK_HOME/bin/spark-submit command to run PySpark programs (suitable for both testing and production environments).
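
As a rough illustration of the two data abstractions and of batch mode, here is a minimal word-count sketch that computes the same result once with an RDD and once with a DataFrame. It is not taken from the tutorial itself: the file name word_count.py, the helper tokenize(), and the application name are hypothetical and chosen only for this example. It assumes a working Spark installation and a plain-text input file.

```python
"""word_count.py: hypothetical file name, used only for illustration."""
import sys


def tokenize(line):
    """Split a line on whitespace and lowercase each word."""
    return [word.lower() for word in line.split()]


def main(input_path):
    # pyspark is imported here (an illustrative choice) so that tokenize()
    # can be imported and unit-tested on a machine without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()

    # RDD abstraction: explicit map and reduce steps over (word, 1) pairs.
    lines = spark.sparkContext.textFile(input_path)
    rdd_counts = (
        lines.flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    print(rdd_counts.collect())

    # DataFrame abstraction: the same count expressed as column operations.
    # spark.read.text() yields a single string column named "value".
    df = spark.read.text(input_path)
    words = F.explode(F.split(F.lower(F.col("value")), r"\s+")).alias("word")
    df_counts = (
        df.select(words)
        .where(F.col("word") != "")  # drop empty tokens from leading spaces
        .groupBy("word")
        .count()
    )
    df_counts.show()

    spark.stop()


# Run the job only when an input path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

In batch mode, a script like this could be run as $SPARK_HOME/bin/spark-submit word_count.py followed by the path to an input file. In interactive mode, the pyspark shell already defines a SparkSession named spark and a SparkContext named sc, so the RDD and DataFrame steps can be typed in directly without building a session.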


Glossary: big data, MapReduce, Spark


Basics of PySpark with Examples


PySpark Examples and Tutorials


Books

Data Algorithms with Spark

Data Algorithms

PySpark Algorithms


Miscellaneous

Download, Install Spark and Run PySpark

How to Minimize the Verbosity of Spark


PySpark Tutorial and References...


Questions/Comments

Thank you!

best regards,
Mahmoud Parsian
