Data Science & Visualization
A curated list of data science, machine learning and visualization tools with an emphasis on Python, D3 and web applications.
Contents
Machine Learning
Resources
- Awesome Machine Learning comprehensive list of machine learning resources
- Dive into machine learning collections of links and notebooks for a gentle introduction to machine learning
- TopDeepLearning is a list of popular github projects related to deep learning (ranked by stars)
- Probabilistic Programming and Bayesian Methods for Hackers An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python
- Data science ipython notebooks
- Python data-science handbook
- Deep Learning Papers Reading Roadmap
Frameworks
- Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
- TensorFlow library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.
- Keras Deep Learning library for Theano, TensorFlow and CNTK.
- Caffe deep learning framework made with expression, speed, and modularity in mind. Written in C++ and has python bindings.
- Torch provides several tools for fast tensor mathematics, storage interfaces and machine learning models. Written in C with Lua interface.
- PyTorch tensors and dynamic neural networks in Python with strong GPU acceleration
- Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. Written in C++ with bindings for Python and other languages.
- Scikit Learn is a Python module for machine learning built on top of SciPy
- CNTK computational network toolkit. A C++ library by Microsoft Research.
- XGBoost an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. Written in C++ with Python integration.
- Tpot is a python tool that automatically creates and optimizes machine learning pipelines using genetic programming.
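The data flow graph idea mentioned in the TensorFlow entry can be illustrated with a minimal sketch. This is not TensorFlow's API: the `Node` class and `constant` helper are hypothetical names invented for this example, and plain floats stand in for tensors. Each node holds an operation, and its inputs are the edges that carry values from upstream nodes.

```python
import operator


class Node:
    """A graph node: either a constant value or an operation over input nodes.

    Hypothetical illustration only; not part of any listed framework.
    """

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op                  # callable applied to the input values
        self.inputs = tuple(inputs)   # upstream nodes (the graph edges)
        self.value = value            # used only when op is None

    def evaluate(self):
        # Constants return their stored value directly.
        if self.op is None:
            return self.value
        # Otherwise evaluate every input first, then apply the operation.
        args = [node.evaluate() for node in self.inputs]
        return self.op(*args)


def constant(value):
    """Create a leaf node that carries a fixed value."""
    return Node(value=value)


# Build the graph for (a + b) * c, then evaluate it.
a = constant(2.0)
b = constant(3.0)
c = constant(4.0)
total = Node(operator.add, (a, b))
result = Node(operator.mul, (total, c))
print(result.evaluate())
```

Building the graph and evaluating it are separate steps here, which is the property that lets real frameworks optimize or distribute a computation before running it.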
Neural networks
- Brainforge A Neural Networking library based on NumPy only
- deeplearn.js a neural network library for the web
- OpenNN a neural network C++ library
Reinforcement Learning
- Keras-rl Deep Reinforcement Learning for Keras.
- Gym A toolkit for developing and comparing reinforcement learning algorithms. Written in Python.
- TFLearn is a deep learning library featuring a higher-level API for TensorFlow.
- Tensorforce a TensorFlow library for applied reinforcement learning
Examples
- AIMA python Python code for the book Artificial Intelligence: A Modern Approach
- TensorFlow Examples a TensorFlow tutorial with popular machine learning algorithms implementation
NLP
Natural language processing benefits from recurrent neural network algorithms.
Analysis
- huggingface/transformers State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0
- Natural Language Toolkit (NLTK) is a suite of Python modules, data sets and tutorials supporting research and development in NLP. Some of its modules are out of date, but it is still a useful resource.
- SpaCy is a powerful, production-ready NLP library for Python
- fastText a C++ library for sentence classification
- TextBlob is a python library for processing textual data. It provides a simple API for diving into common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
- simhash a python implementation of Simhash Algorithm for detecting near-duplicate web documents
- langdetect is a port of Google's language-detection library to Python.
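As a rough illustration of the Simhash technique named in the simhash entry, here is a minimal pure-Python sketch. It is not the API of the listed `simhash` package: the function names, whitespace tokenization, equal token weights, and the use of MD5 as the token hash are all assumptions made for this example.

```python
import hashlib


def simhash(text, bits=64):
    """Compute a Simhash fingerprint from whitespace-separated tokens.

    Illustrative sketch: every token has weight 1 and MD5 is the token hash.
    """
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each set bit votes +1 for its position, each cleared bit votes -1.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # The fingerprint has a 1 wherever the positive votes won.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so near-duplicates can be flagged by comparing the distance against a threshold chosen for the data at hand.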
Tools
- inflect.py Correctly generate plurals, ordinals, indefinite articles; convert numbers to words
- dataprofiler The DataProfiler is a Python library designed to make data analysis, monitoring and sensitive data detection easy. NLP processing is accomplished using a character-level CNN.
Resources
- Oxford Deep NLP 2017 course lecture slides and course description for the Deep Natural Language Processing course
Images
Resources
- Convolutional neural network In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks that have been successfully applied to analyzing visual imagery.
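The core operation of a convolutional layer can be sketched in a few lines of plain Python. This is a simplified illustration, not code from any listed library: the `conv2d` function is a hypothetical name, it works on nested lists, and it assumes no padding and a stride of 1. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1).

    Both arguments are rectangular lists of lists of numbers.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            acc = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, 1],
]
print(conv2d(image, kernel))  # [[6, 8], [12, 14]]
```

A real CNN learns the kernel values during training and stacks many such filters with nonlinearities between them; the frameworks listed in this document provide optimized implementations of this operation.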
Frameworks
- tesseract-ocr well tested OCR engine written in C++
- OpenCV computer vision and machine learning software library. The library has more than 2500 optimized algorithms, including a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. Written in C++ with bindings for most languages including Python.
- SimpleCV is a framework for machine vision, using OpenCV and Python. It provides a concise, readable interface for cameras, image manipulation, feature extraction, and format conversion.
- match makes it easy to search for images that look similar to each other
- Noteshrink Convert scans of handwritten notes to beautiful, compact PDFs
- srez Image super-resolution through deep learning
- ConvNetJS train Convolutional Neural Networks (or ordinary ones) in the browser
Data
Sources
- Quandl delivers free and premium financial, economic, and alternative data from hundreds of sources via their website, API, or directly into dozens of tools
- Public APIs a collective list of public JSON APIs for use in web development
- 7 and a quarter hours of largely highway driving from comma.ai research
Aggregators
- pyspider a web crawler system in python.
- Newspaper News, full-text, and article metadata extraction in Python 3.
Explore
- Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.
Storage
- pytables a package for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data. It is built on top of the HDF5 library and the NumPy package.
Visualization
Resources
JavaScript Libraries
- Chart.js HTML5 Charts using the canvas tag
- G2 is a visualization grammar, a data-driven visual language with a high level of usability and scalability
- plotly.js charting library built on top of d3 and stack.gl
- frappe/charts Simple, responsive, modern SVG Charts with zero dependencies
- GraphicsJS A lightweight JavaScript graphics library with an intuitive API, based on SVG/VML technology.
Python Libraries
- bokeh an interactive visualization library that targets modern web browsers for presentation
- bqplot plotting library for IPython/Jupyter notebooks - front-end in d3
- dash Dash is a Python framework for building analytical web applications
- Altair declarative statistical visualization library for Python, based on Vega and Vega-Lite
D3 based libraries
- brite Charts reusable Charting Library based on D3.js v4 by https://www.eventbrite.co.uk/
- C3.js D3-based reusable chart library
- dc.js Multi-Dimensional charting built to work natively with crossfilter rendered with d3.js
- d3-visualize is a d3-view based reactive data-visualization library - alpha
- d3-waffle waffle plots with d3
- semiotic a data visualization framework combining React & D3
- tau Charts
- Vega visualization grammar
- Vega-lite high-level grammar of interactive graphics
Digital Art
- Generating Abstract Patterns with TensorFlow Compositional Pattern Producing Network (CPPN)
Languages
Python
- Awesome Python A curated list of awesome Python frameworks, libraries, software and resources.
- Interactive coding challenges which focus on algorithms and data structures that are typically found in coding interviews
JavaScript
- Simple Statistics statistical methods in readable JavaScript for browsers, servers.
- Computer science in JavaScript Collection of classic computer science paradigms, algorithms, and approaches written in JavaScript
License
To the extent possible under law, Quantmind has waived all copyright and related or neighboring rights to this work.