iamaziz/PyDataset

Stars: 934
Rank: 48,927 (Top 1.0 %)
Language: Python
License: MIT License
Created almost 9 years ago
Updated over 2 years ago

Instant access to many datasets in Python.

PyDataset

Provides instant access to many datasets right from Python (in pandas DataFrame structure).

What?

The idea is simple. Many datasets are available, but they are scattered across different places on the web. Is there a quick way (in Python) to access them instantly, without going through the hassle of searching, downloading, and reading them in? PyDataset tries to address that question :)

Usage:

Start by importing data():

from pydataset import data

To load a dataset:

titanic = data('titanic')

To display the documentation of a dataset:

data('titanic', show_doc=True)

To see the available datasets:

data()

That's it. See more examples.

Why?

In R, there is a very easy and immediate way to access many statistical datasets with almost no effort. All it takes is one line: > data(dataset_name). This makes life easier for quick prototyping and testing. Well, I am jealous that Python does not have similar functionality. Thus, the aim of pydataset is to fill that gap.

Currently, pydataset has about 757 (mostly numerical) datasets, all based on RDatasets. In the future, I plan to scale it to include a larger set of datasets. For example:

include textual data for NLP-related tasks, and
allow adding a new dataset to the in-module repository.

Installation:

$ pip install pydataset

Uninstall:

$ pip uninstall pydataset
$ rm -rf $HOME/.pydataset

Changelog

0.2.0

Add dataset search by name similarity.
Example:

>>> data('heat')
Did you mean:
Wheat, heart, Heating, Yeast, eidat, badhealth, deaths, agefat, hla, heptathlon, azt
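
For illustration, name-similarity suggestions like the ones above could be produced with the standard library's difflib. This is only a sketch under stated assumptions, not necessarily how pydataset implements its search: the suggest() helper, its limit and cutoff parameters, and the short DATASET_NAMES list (a few names borrowed from the output above, plus 'titanic') are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical sample of dataset names; the real collection is much larger.
DATASET_NAMES = ["Wheat", "heart", "Heating", "Yeast", "titanic", "deaths", "agefat"]


def suggest(query, names=DATASET_NAMES, limit=5, cutoff=0.5):
    """Return up to `limit` names that resemble `query`, ignoring case."""
    # Map lowercase names back to their original spelling.
    lowered = {}
    for name in names:
        lowered.setdefault(name.lower(), name)
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[m] for m in matches]


if __name__ == "__main__":
    print("Did you mean:")
    print(", ".join(suggest("heat")))
```

Lowering the cutoff returns looser matches, while raising it keeps only near-identical names.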

0.1.1

Fix: add Windows support and fix file paths (issue #1).

Dependency:

pandas

Miscellaneous:

Tested on OS X and Linux (Debian).
Supports both Python 2 (2.7.11) and Python 3 (3.5.1).

TODO:

add textual datasets (e.g. NLTK stuff).
add sample generators.

Thanks to:

RDatasets: R's datasets collection.
