Distributed Statistical Computing (Big Data Distributed Computing: Teaching Notes and Case Studies)
developed by
Feng Li
School of Statistics and Mathematics
Central University of Finance and Economics
[email protected]
Course Homepage
Books (companion textbook in Chinese)
- Distributed Statistical Computing for Big Data and Case Studies (大数据分布式计算与案例), ISBN: 9787300230276
  - Available at JD.COM
- New version (in preparation)
Contents
Teaching slides and demo code (with Jupyter Notebook format)
Quick View
You can view all the notebooks in this repository with the Jupyter Notebook Viewer.
Run the demo locally
Requirements to run the notebook interactively
- Python (>= 3.6.0) with findspark (invoke Spark from Python Session), numpy, scipy, pandas
- Hadoop (>= 2.7.0)
- Hive (>= 2.3.3)
- Spark (>= 2.3.1)
- Jupyter Notebook (>= 5.0)
- RISE (for Jupyter slides). Use Alt+R to enter slideshow mode
- Bash Kernel (for Linux and Hadoop, Hive, Spark batch mode)
- IPython kernel for Python 3 (for Interactive PySpark Sessions)
- HiveQL Kernel (for Interactive Hive Sessions)
- Spark Toree (for Interactive Spark Scala Sessions)
-