  • Stars: 935
  • Rank: 48,537 (Top 1.0 %)
  • Language: Python
  • License: MIT License
  • Created over 8 years ago
  • Updated over 2 years ago

Repository Details

Instant access to many datasets in Python.

PyDataset

Provides instant access to many datasets right from Python (in pandas DataFrame structure).

What?

The idea is simple. Many datasets are available, but they are scattered across different places on the web. Is there a quick way (in Python) to access them instantly, without the hassle of searching for, downloading, and reading each one? PyDataset tries to address that question :)

Usage:

Start by importing data():

from pydataset import data
  • To load a dataset:
titanic = data('titanic')
  • To display the documentation of a dataset:
data('titanic', show_doc=True)
  • To see the available datasets:
data()
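
Since data() returns a pandas DataFrame, the usual DataFrame tools apply to the result. The helper below is a minimal sketch of that: the name preview and its loader parameter are hypothetical and exist only for illustration, and the default path assumes pydataset is installed.

```python
def preview(name, rows=5, loader=None):
    """Load a dataset by name and return its shape and first rows.

    `loader` is a hypothetical hook: any callable that takes a dataset
    name and returns a DataFrame-like object. It defaults to
    pydataset's data().
    """
    if loader is None:
        # Imported lazily so the helper can be defined without pydataset.
        from pydataset import data as loader
    df = loader(name)
    # shape is (n_rows, n_columns); head(rows) keeps the first `rows` rows.
    return df.shape, df.head(rows)
```

With pydataset installed, preview('titanic') returns the dimensions of the titanic dataset together with its first five rows.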

That's it. See more examples.

Why?

In R, there is a very easy and immediate way to access many statistical datasets with almost no effort. All it takes is one line: > data(dataset_name). This makes life easier for quick prototyping and testing. Well, I am jealous that Python does not have similar functionality. Thus, the aim of pydataset is to fill that gap.

Currently, pydataset has about 757 (mostly numerical) datasets, all based on RDatasets. In the future, I plan to scale it to include a larger set of datasets. For example,

  1. include textual data for NLP-related tasks, and
  2. allow adding a new dataset to the in-module repository.

Installation:

$ pip install pydataset

Uninstall:

  • $ pip uninstall pydataset
  • $ rm -rf $HOME/.pydataset

Changelog

0.2.0

  • Add dataset search by name similarity.
  • Example:
>>> data('heat')
Did you mean:
Wheat, heart, Heating, Yeast, eidat, badhealth, deaths, agefat, hla, heptathlon, azt
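
A suggestion feature like this can be approximated with Python's standard library. The following is a minimal sketch, not pydataset's actual implementation: the function name suggest_datasets, the cutoff value, and the short list of names are assumptions made for illustration.

```python
from difflib import get_close_matches

def suggest_datasets(query, names, limit=5, cutoff=0.6):
    """Return dataset names that resemble `query`, ignoring case."""
    # Map lowercased names back to their original spelling.
    by_lower = {name.lower(): name for name in names}
    # get_close_matches ranks candidates by similarity ratio and drops
    # anything scoring below `cutoff`.
    matches = get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[m] for m in matches]

# Hypothetical subset of dataset names, borrowed from the example above.
names = ["Wheat", "heart", "Heating", "Yeast", "titanic"]
print(suggest_datasets("heat", names))
```

Lowering the cutoff widens the list of suggestions, and raising it keeps only near-exact matches.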

0.1.1

  • Fix: add support for Windows and fix file paths (issue #1).

Dependency:

  • pandas

Miscellaneous:

  • Tested on OS X and Linux (Debian).
  • Supports both Python 2 (2.7.11) and Python 3 (3.5.1).

TODO:

  • Add textual datasets (e.g. NLTK corpora).
  • Add sample generators.

Thanks to:
