Repository Details

💥 📈 A curated list of data science, analysis and visualization tools

Data Science & Visualization Awesome

A curated list of data science, machine learning and visualization tools with emphasis on python, d3 and web applications.

CONTRIBUTING

Contents

Machine Learning

Resources

Frameworks

  • Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
  • TensorFlow library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
  • Keras Deep Learning library for Theano, TensorFlow and CNTK.
  • Caffe deep learning framework made with expression, speed, and modularity in mind. Written in C++ and has python bindings.
  • Torch provides several tools for fast tensor mathematics, storage interfaces and machine learning models. Written in C with Lua interface.
  • PyTorch tensors and dynamic neural networks in Python with strong GPU acceleration
  • Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. Written in C++ with bindings for python and other languages.
  • Scikit Learn is a Python module for machine learning built on top of SciPy
  • CNTK computational network toolkit. A C++ library by Microsoft Research.
  • XGBoost an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. Written in C++ with python integration.
  • Tpot is a python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
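
The data flow graph idea mentioned in the TensorFlow entry above can be illustrated with a few lines of plain Python. This is only a minimal sketch and not TensorFlow's API: the `Node` class and `evaluate` function are hypothetical names introduced here, and scalars stand in for tensors to keep the example short. Nodes hold operations, and the `inputs` tuple plays the role of the edges that carry values between them.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    """One node of the graph (hypothetical helper, not a TensorFlow class)."""
    name: str
    op: Optional[Callable[..., float]] = None  # None marks an input node
    inputs: Tuple["Node", ...] = ()            # incoming edges


def evaluate(node: Node, feed: Dict[str, float]) -> float:
    """Evaluate a node by first evaluating the nodes its edges come from."""
    if node.op is None:
        # Input node: its value is supplied by the caller.
        return feed[node.name]
    args = [evaluate(parent, feed) for parent in node.inputs]
    return node.op(*args)


# Build the graph for total = (x * y) + x without computing anything yet.
x = Node("x")
y = Node("y")
product = Node("product", operator.mul, (x, y))
total = Node("total", operator.add, (product, x))

# Values only flow along the edges when the graph is evaluated.
print(evaluate(total, {"x": 3.0, "y": 4.0}))  # prints 15.0
```

Separating graph construction from evaluation is what lets a framework inspect and optimize the whole computation before running it. Real libraries add features such as caching of shared nodes and automatic differentiation, which this sketch leaves out.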

Neural networks

  • Brainforge A Neural Networking library based on NumPy only
  • deeplearn.js a neural network library for the web
  • OpenNN a neural network C++ library

Reinforcement Learning

  • Keras-rl Deep Reinforcement Learning for Keras.
  • Gym A toolkit for developing and comparing reinforcement learning algorithms. Written in Python.
  • TFLearn is a deep learning library featuring a higher-level API for TensorFlow.
  • Tensorforce a TensorFlow library for applied reinforcement learning

Examples

NLP

Natural language processing benefits from recurrent neural network algorithms, which model text as a sequence.

Analysis

  • huggingface/transformers State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0
  • Natural Language Toolkit (NLTK) is a suite of python modules, data sets and tutorials supporting research and development in NLP. Some of its modules are out of date, but it is still a useful resource.
  • SpaCy is a powerful, production ready, NLP library for python
  • fastText a C++ library for sentence classification
  • TextBlob is a python library for processing textual data. It provides a simple API for diving into common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
  • simhash a python implementation of Simhash Algorithm for detecting near-duplicate web documents
  • langdetect is a port of Google's language-detection library to Python.
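
To make the near-duplicate detection mentioned in the simhash entry above more concrete, here is a minimal sketch of the SimHash technique using only the standard library. It is not the API of the `simhash` package listed above: the function names, the word-level tokenization, and the choice of SHA-256 as the per-token hash are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width used in this sketch


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint of the words in `text`."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: BITS // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "A curated list of data science and visualization tools"
doc_b = "A curated list of data science and visualisation tools"
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Documents that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a small Hamming distance suggests a near-duplicate. The distance threshold to use is application specific and is not defined here.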

Tools

  • inflect.py Correctly generate plurals, ordinals, indefinite articles; convert numbers to words
  • dataprofiler The DataProfiler is a Python library designed to make data analysis, monitoring and sensitive data detection easy. NLP processing is accomplished using a character-level CNN.

Resources

Images

Resources

  • Convolutional neural network In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that has been successfully applied to analyzing visual imagery.

Frameworks

  • tesseract-ocr well tested OCR engine written in C++
  • OpenCV computer vision and machine learning software library. The library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. Written in C++ with bindings for most languages including python.
  • SimpleCV is a framework for machine vision, using OpenCV and Python. It provides a concise, readable interface for cameras, image manipulation, feature extraction, and format conversion.
  • match makes it easy to search for images that look similar to each other
  • Noteshrink Convert scans of handwritten notes to beautiful, compact PDFs
  • srez Image super-resolution through deep learning
  • ConvNetJS train Convolutional Neural Networks (or ordinary ones) in the browser

Data

Sources

Aggregators

  • pyspider a web crawler system in python.
  • Newspaper News, full-text, and article metadata extraction in Python 3.

Explore

  • Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.

Storage

  • pytables a package for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data. It is built on top of the HDF5 library and the NumPy package.

Visualization

Resources

JavaScript Libraries

  • Chart.js HTML5 Charts using the canvas tag
  • G2 is a visualization grammar, a data-driven visual language with a high level of usability and scalability
  • plotly.js charting library built on top of d3 and stack.gl
  • frappe/charts Simple, responsive, modern SVG Charts with zero dependencies
  • GraphicsJS A lightweight JavaScript graphics library with an intuitive API, based on SVG/VML technology.

Python Libraries

  • bokeh an interactive visualization library that targets modern web browsers for presentation
  • bqplot plotting library for IPython/Jupyter notebooks - front-end in d3
  • dash Dash is a Python framework for building analytical web applications
  • Altair declarative statistical visualization library for Python, based on Vega and Vega-Lite

D3 based libraries

Digital Art

Languages

Python

  • Awesome Python A curated list of awesome Python frameworks, libraries, software and resources.
  • Interactive coding challenges which focus on algorithms and data structures that are typically found in coding interviews

JavaScript

License

CC0

To the extent possible under law, Quantmind has waived all copyright and related or neighboring rights to this work.
