• This repository has been archived on 02/Feb/2024
  • Language
    Python
  • License
    BSD 2-Clause "Sim...
  • Created over 7 years ago
  • Updated about 1 year ago

Repository Details

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler

Sdc

Intel® Scalable Dataframe Compiler

Numba* Extension For Pandas* Operations Compilation

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables compilation of Pandas* operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.
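As a rough illustration of the paragraph above, the sketch below puts a Pandas* operation inside a Numba* jitted function. It assumes that importing `sdc` is what enables Pandas* support in Numba*; the function name `column_mean` is hypothetical. When Numba* or Intel® SDC is not installed, the sketch falls back to running the same code as ordinary Python.

```python
# Sketch only: falls back to plain Python/Pandas when Numba or Intel SDC is missing.
try:
    import pandas as pd
except ImportError:  # Pandas is optional for this sketch
    pd = None

try:
    from numba import njit
    import sdc  # noqa: F401  (assumption: importing sdc enables Pandas support in Numba)
except ImportError:
    def njit(func):
        """Identity decorator used when Numba or Intel SDC is not installed."""
        return func


@njit
def column_mean():
    # A Pandas operation inside a jitted function: the kind of code SDC targets.
    series = pd.Series([1.0, 2.0, 3.0, 4.0])
    return series.mean()


if pd is not None:
    print(column_mean())
```

With Intel® SDC installed, the decorated function would be compiled on first call; without it, the decorator does nothing and Pandas* runs the code as usual.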

Intel® SDC documentation can be found here.

Note

For maximum performance and stability, please use numba from the intel/label/beta channel.

Installing Binary Packages (conda and wheel)

Intel® SDC is available on the Anaconda Cloud intel/label/beta channel. The distribution includes Intel® SDC for Python 3.6 and Python 3.7 on Windows and Linux platforms.

The Intel® SDC conda package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> -c anaconda -c conda-forge
> conda activate sdc-env
> conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels

The Intel® SDC wheel package can be installed using the steps below:

> conda create -n sdc-env python=<3.7 or 3.6> pip -c anaconda -c conda-forge
> conda activate sdc-env
> pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc
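Since the binary packages are stated to cover only Python 3.6 and 3.7, it can help to check the interpreter version before installing. The following is a minimal sketch; the helper name `is_supported` is hypothetical and not part of Intel® SDC.

```python
import sys

# Versions taken from the text above: the binary packages target Python 3.6 and 3.7.
SUPPORTED = {(3, 6), (3, 7)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is one the packages target."""
    major, minor = tuple(version_info or sys.version_info)[:2]
    return (major, minor) in SUPPORTED


if not is_supported():
    print("This interpreter is Python %d.%d; the packages target 3.6 or 3.7."
          % tuple(sys.version_info[:2]))
```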

Building Intel® SDC from Source on Linux

We use the Anaconda distribution of Python to set up the Intel® SDC build environment.

If you do not have conda, we recommend using Miniconda3:

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH

Intel® SDC can be built with either conda-build or setuptools. Follow one of the procedures below to install Intel® SDC and its dependencies on Linux.

Building on Linux with conda-build

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env python=$PYVER conda-build
source activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe

Building on Linux with setuptools

export PYVER=<3.6 or 3.7>
export NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER tbb-devel tbb4py numba=0.54.1 pandas=1.3.4 pyarrow=4.0.1 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

In case of issues, reinstalling in a new conda environment is recommended.
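Before recreating the environment, it may be worth checking whether the installed packages match the versions pinned in the setuptools command above. The sketch below assumes those pins (numba 0.54.1, pandas 1.3.4, pyarrow 4.0.1) are the relevant ones. The helper names are hypothetical, and `importlib.metadata` is only available on Python 3.8 or later, so on older interpreters the lookup returns nothing.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    metadata = None

# Pins copied from the conda create command above.
PINS = {"numba": "0.54.1", "pandas": "1.3.4", "pyarrow": "4.0.1"}


def installed_versions(names=PINS):
    """Look up installed versions; None marks a package that is not installed."""
    found = {}
    if metadata is None:
        return found
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatched_pins(installed):
    """Return {name: (wanted, found)} for every package that differs from its pin."""
    return {name: (want, installed.get(name))
            for name, want in PINS.items()
            if installed.get(name) != want}


print(mismatched_pins(installed_versions()))
```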

Building Intel® SDC from Source on Windows

Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 with the component MSVC v140 - VS 2015 C++ build tools (v14.00).

Intel® SDC can be built with either conda-build or setuptools. Follow one of the procedures below to install Intel® SDC and its dependencies on Windows.

Building on Windows with conda-build

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
conda activate conda-build-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe

Building on Windows with setuptools

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% tbb-devel tbb4py numba=0.54.1 pandas=1.3.4 pyarrow=4.0.1
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
git clone https://github.com/IntelPython/sdc.git
cd sdc
python setup.py install

Troubleshooting Windows Build

  • If the build fails with fatal error LNK1158: cannot run 'rc.exe', add Windows Kits to your PATH (e.g. C:\Program Files (x86)\Windows Kits\8.0\bin\x86).
  • Some errors can be mitigated by running set DISTUTILS_USE_SDK=1.
  • To set up Visual Studio, you may need to open the registry key HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7 and add a string value named 14.0 whose data is C:\Program Files (x86)\Microsoft Visual Studio 14.0\.
  • If the conda or Visual Studio version in use is not the latest, building Intel® SDC can fail with a vague error about a keyword used in a file. Make sure you are using the latest versions.
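The first two items above can be checked from Python before starting a build. This is a diagnostic sketch only, with a hypothetical helper name; it looks for rc.exe on PATH and for the DISTUTILS_USE_SDK variable, and does not inspect the registry.

```python
import os
import shutil


def check_windows_build_env(environ=None):
    """Return a list of human-readable problems found in the environment."""
    environ = os.environ if environ is None else environ
    problems = []
    # rc.exe ships with Windows Kits; LNK1158 appears when it is not on PATH.
    if shutil.which("rc.exe", path=environ.get("PATH", "")) is None:
        problems.append("rc.exe not found on PATH; add the Windows Kits bin directory")
    # Mirrors the `set DISTUTILS_USE_SDK=1` suggestion above.
    if environ.get("DISTUTILS_USE_SDK") != "1":
        problems.append("DISTUTILS_USE_SDK is not set to 1")
    return problems


for problem in check_windows_build_env():
    print(problem)
```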

Building documentation

Building the Intel® SDC User's Guide requires a pre-installed Intel® SDC package with a compatible Pandas* version, as well as Sphinx* 2.2.1 or later.

The documentation includes the output of Intel® SDC examples, which is pasted into the function descriptions in the API Reference.

Use pip to install Sphinx* and extensions:

pip install sphinx sphinxcontrib-programoutput

Currently the build procedure is based on make and is run from the ./sdc/docs/ folder. Although it is not generally required, we recommend that you clean up the results of any previous documentation build by running:

make clean

To build the HTML documentation, run:

make html

The built documentation will be located in the ./sdc/docs/build/html directory. To preview the documentation, open the index.html file.
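One way to open the built index.html is from Python's standard library. The sketch below assumes the output path given above, relative to a base directory you pass in; the helper names `docs_index` and `preview_docs` are hypothetical.

```python
import webbrowser
from pathlib import Path


def docs_index(base="."):
    """Path of the built index.html, relative to the given base directory."""
    return Path(base) / "sdc" / "docs" / "build" / "html" / "index.html"


def preview_docs(base="."):
    """Open the built documentation in the default browser, if it exists."""
    index = docs_index(base).resolve()
    if not index.is_file():
        return False  # nothing built yet; run `make html` first
    return webbrowser.open(index.as_uri())
```

Calling `preview_docs()` from the directory that contains the sdc checkout opens the page, and returns False when the HTML has not been built yet.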

More information about building and adding documentation can be found here.

Running unit tests

python sdc/tests/gen_test_data.py
python -m unittest
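The two commands above can also be chained from a small script that stops if the test data generation fails. This is only a sketch of that idea: the function name `run_test_steps` is hypothetical, and the `run` parameter exists so the sequence can be exercised without actually launching the commands.

```python
import subprocess
import sys

# The two steps from the text above, run with the current interpreter.
STEPS = [
    [sys.executable, "sdc/tests/gen_test_data.py"],
    [sys.executable, "-m", "unittest"],
]


def run_test_steps(run=subprocess.call):
    """Run each step in order; return the first non-zero exit code, else 0."""
    for command in STEPS:
        code = run(command)
        if code != 0:
            return code
    return 0
```

Calling `run_test_steps()` from the repository root would run both steps and report the exit code of the first one that fails.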

References

Intel® SDC follows the ideas and initial code base of the High-Performance Analytics Toolkit (HPAT). The ideas and methods behind HPAT are described in academic papers.
