Using pandas for Better (and Worse) Data Science

This tutorial was presented by Kevin Markham at PyCon on May 10, 2018.

Jupyter notebook

The tutorial code is available as a Jupyter notebook. The notebook includes 4 additional exercises that were not covered during the tutorial.

Videos (playlist)

  1. Introducing the dataset (19:40)
  2. Removing columns (6:27)
  3. Comparing groups (8:42)
  4. Examining relationships (8:44)
  5. Handling missing values (5:02)
  6. Using string methods (5:55)
  7. Combining dates and times (9:11)
  8. Plotting a time series (8:48)
  9. Creating useful plots (8:47)
  10. Fixing bad data (16:31)

What is the tutorial about?

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.
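As one small illustration of "ignoring relevant data", the sketch below shows how a grouped mean can silently rest on very different numbers of rows. The DataFrame, its column names, and its values are invented for this example and are not taken from the tutorial datasets.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration; it is not one of the tutorial datasets.
stops = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "age": [30.0, 40.0, np.nan, 20.0, np.nan, np.nan],
})

# mean() skips NaN by default, so each average may rest on a different number of rows.
mean_age = stops.groupby("group")["age"].mean()

# count() reports how many non-missing values each mean is based on.
n_used = stops.groupby("group")["age"].count()

# size() reports the total number of rows per group, missing or not.
n_rows = stops.groupby("group").size()

summary = pd.DataFrame({"mean_age": mean_age, "n_used": n_used, "n_rows": n_rows})
print(summary)
```

Reporting the count next to the mean makes it visible that group B's average comes from a single row, which pandas would not flag on its own.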

In this tutorial, you'll perform a variety of data science tasks on a handful of real-world datasets using pandas. With each task, you'll learn how to avoid either a pandas pitfall or a data science pitfall. By the end of the tutorial, you'll be more confident that you're using pandas for good rather than evil!

How well do I need to know pandas to participate?

You will get the most out of this tutorial if you are an intermediate pandas user, since the tutorial will not cover pandas basics. If you are new to pandas or just need a refresher, I recommend watching some videos from my free pandas course. Alternatively, you can review all of the code from my pandas course in this Jupyter notebook.

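What datasets are we using?

The setup check below loads two CSV files, police.csv and ted.csv, both of which can be downloaded from this repository.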

How do I download the files from GitHub?

Here are three options that will work equally well:

  • If you know how to use git, you can click the green button above and clone the repository.
  • If you know how to open a ZIP file, you can click the green button above and download the repository.
  • If you want to download the files individually, right click on these links and select "Save As": police.csv, ted.csv, tutorial.ipynb.

How can I check that pandas and matplotlib are properly installed?

  1. Move the CSV files into your working directory. (This is usually the directory where you create Python scripts or notebooks.)

  2. Open the Python environment of your choice.

  3. If you're using Jupyter notebook, run the following code:

    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    ri = pd.read_csv('police.csv')
    ted = pd.read_csv('ted.csv')
    ri.driver_age.plot()

  4. If you're using any other Python environment, run the following code:

    import pandas as pd
    import matplotlib.pyplot as plt
    ri = pd.read_csv('police.csv')
    ted = pd.read_csv('ted.csv')
    ri.driver_age.plot()
    plt.show()

If you don't get any error messages, and a plot appears on your screen, then it's very likely that pandas and matplotlib are installed correctly.
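If the check above fails, it can help to know whether the problem is a missing package or a missing file. The following is a minimal sketch of such a diagnostic, using only the standard library; the helper name `check_setup` is hypothetical and is not part of the tutorial materials.

```python
import importlib.util
from pathlib import Path


def check_setup(packages, files, directory="."):
    """Return the packages that cannot be found and the files that are missing."""
    # find_spec returns None when a top-level package is not installed.
    missing_packages = [
        name for name in packages if importlib.util.find_spec(name) is None
    ]
    # Look for each file relative to the given directory (the working directory by default).
    missing_files = [
        name for name in files if not (Path(directory) / name).is_file()
    ]
    return missing_packages, missing_files


missing_packages, missing_files = check_setup(
    packages=["pandas", "matplotlib"],
    files=["police.csv", "ted.csv"],
)
print("Missing packages:", missing_packages or "none")
print("Missing files:", missing_files or "none")
```

An empty result for both lists only shows that the packages can be located and the files exist; running the plotting code above remains the real test.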

Who is the instructor?

Kevin Markham is the founder of Data School, an online school for learning data science with Python. He is passionate about teaching data science to people who are new to the field, regardless of their educational and professional backgrounds. Previously, Kevin was the lead data science instructor for General Assembly in Washington, DC. Currently, he teaches machine learning and data analysis to over 10,000 students each month through the Data School YouTube channel. He has a degree in Computer Engineering from Vanderbilt University and lives in Asheville, North Carolina with his wife and son.

Can I contact the instructor with questions?

Sure! You can email [email protected].
