  • Stars: 240
  • Rank: 168,229 (Top 4%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created almost 5 years ago
  • Updated 7 months ago

Repository Details

Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data.

Mark Bauer

The recording for this presentation can be viewed on YouTube.

1. Introduction

NYC Open Data provides a treasure trove of information, all publicly available at the click of a button. While having access to data is great, analyzing it is often difficult for beginners, which can create barriers in one's open data journey. In addition, reproducibility in data analysis is often limited or neglected altogether.

Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data is a four-part series, with each part listed below. This collection of notebooks serves as a reference and user guide for applying Python to real-world data analysis projects. The notebooks use the Python programming language and datasets from NYC Open Data. The series illustrates how data analytics can be used to discover useful information and support decision-making.

Sections include:

Part 1: Reading and Writing Files in Python
Demonstrates various ways to read (load) and write (save) data using the Python programming language. The datasets come in common file formats such as comma-separated values (CSV), JavaScript Object Notation (JSON), shapefiles (a format for geometric location and attribute information), and zip files.

Part 2: Data Inspection, Cleaning, and Wrangling in Python
Demonstrates various ways to inspect, clean, and wrangle your data, and to detect outliers in it.

Part 3: Plotting and Data Visualization in Python
Demonstrates various examples of plots and data visualizations.

Part 4: Geospatial Data and Mapping
Demonstrates various workflows for working with geospatial data and mapping.
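
As a rough illustration of the kind of workflow Parts 1 and 2 describe, the sketch below writes a small CSV file, reads it back, drops missing and duplicate records, saves the result as JSON, and flags outliers. It is not taken from the notebooks: the column names (bin, height_ft) and the sample values are hypothetical, the 1.5 x IQR rule is just one common way to flag outliers, and only the Python standard library is used so the example runs without extra installs (the notebooks themselves list pandas among their keywords).

```python
import csv
import json
import statistics
import tempfile
from pathlib import Path

# Hypothetical rows standing in for a small NYC Open Data extract.
RAW_ROWS = [
    {"bin": "1001", "height_ft": "10"},
    {"bin": "1002", "height_ft": "12"},
    {"bin": "1003", "height_ft": "11"},
    {"bin": "1004", "height_ft": "13"},
    {"bin": "1005", "height_ft": "12"},
    {"bin": "1006", "height_ft": "11"},
    {"bin": "1007", "height_ft": "10"},
    {"bin": "1008", "height_ft": "300"},  # suspiciously tall
    {"bin": "1009", "height_ft": ""},     # missing value
    {"bin": "1001", "height_ft": "10"},   # duplicate record
]


def write_csv(path, rows):
    """Save a list of dicts as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["bin", "height_ft"])
        writer.writeheader()
        writer.writerows(rows)


def read_csv(path):
    """Load a CSV file into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Drop rows with a missing height or a repeated bin; cast height to float."""
    seen, cleaned = set(), []
    for row in rows:
        if not row["height_ft"].strip() or row["bin"] in seen:
            continue
        seen.add(row["bin"])
        cleaned.append({"bin": row["bin"], "height_ft": float(row["height_ft"])})
    return cleaned


def iqr_outliers(rows, key="height_ft"):
    """Return rows whose value lies outside the 1.5 x IQR fences."""
    q1, _, q3 = statistics.quantiles([r[key] for r in rows], n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [r for r in rows if not low <= r[key] <= high]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "buildings.csv"
    json_path = Path(tmp) / "buildings-clean.json"
    write_csv(csv_path, RAW_ROWS)                      # write (save) CSV
    cleaned = clean(read_csv(csv_path))                # read (load) and clean
    json_path.write_text(json.dumps(cleaned, indent=2), encoding="utf-8")
    reloaded = json.loads(json_path.read_text(encoding="utf-8"))

outliers = iqr_outliers(reloaded)
print(len(cleaned), "clean rows;", "outliers:", [r["bin"] for r in outliers])
```

A flagged row is only a candidate for review; whether it is a data-entry error or a genuinely unusual record depends on the dataset.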

You can run an interactive example on MyBinder in your browser, with no installation required.

2. Notebooks

You can view these notebooks through your browser by clicking View under the Static Webpage column.

File Name                                    Description                                 Static Webpage
1-reading-writing-files.ipynb                Reading and Writing Files.                  View
2-data-inspection-cleaning-wrangling.ipynb   Data Inspection, Cleaning, and Wrangling.   View
3-plotting-visualizations.ipynb              Plotting and Data Visualization.            View
4-geospatial-data-mapping.ipynb              Geospatial Data and Mapping.                View

3. Data

  • Building Footprints: Shapefile of footprint outlines of buildings in New York City.
  • MapPLUTO: MapPLUTO merges PLUTO tax lot data with tax lot features from the Department of Finance’s Digital Tax Map (DTM) and is available as shoreline clipped and water included. It contains extensive land use and geographic data at the tax lot level in ESRI shapefile and File Geodatabase formats.
  • Schools: An ESRI shapefile of school point locations based on the official address. It includes basic school information such as name, address, principal, and the principal’s contact information, along with the fields needed to link to other data sources.
  • Streets: The NYC Street Centerline (CSCL) is a road-bed representation of New York City streets containing address ranges and other information such as traffic directions, road types, and segment types.
  • Neighborhood Tabulation Areas (NTA): Boundaries of Neighborhood Tabulation Areas as created by the NYC Department of City Planning using whole census tracts from the 2010 Census as building blocks. These aggregations of census tracts are subsets of New York City's 55 Public Use Microdata Areas (PUMAs).
  • NYC Boroughs: GIS data with the boundaries of the boroughs (water areas excluded).

4. Open Source Applications Used in Project

  • Anaconda: A distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.
  • Project Jupyter: Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.
    • Jupyter Notebook: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
    • nbviewer: A web application that lets you enter the URL of a Jupyter Notebook file, renders that notebook as a static HTML web page, and gives you a stable link to that page which you can share with others.
    • Binder: The Binder Project is an open community that makes it possible to create sharable, interactive, reproducible environments.

5. Additional Resources

  • NYC Open Data: Open Data is free public data published by New York City agencies and other partners.
  • Sodapy Tutorial Using NYC Open Data: This tutorial demonstrates how to use sodapy and provides examples of querying data using Socrata Query Language or SoQL.
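
To give a feel for what a SoQL query looks like, here is a minimal sketch that builds a query URL for a Socrata endpoint without sending any request. It uses only the standard library rather than sodapy, and several names are assumptions for illustration: the domain is assumed to be NYC Open Data's Socrata host, the dataset ID is a placeholder, and the column names (unique_key, created_date) are hypothetical. The $select, $where, and $limit parameters are standard SoQL clauses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed Socrata domain for NYC Open Data.
DOMAIN = "data.cityofnewyork.us"
# Placeholder only; substitute the identifier of a real dataset.
DATASET_ID = "xxxx-xxxx"


def build_soql_url(dataset_id, select, where=None, limit=1000, domain=DOMAIN):
    """Build a Socrata resource URL carrying SoQL clauses as query parameters."""
    params = {"$select": select, "$limit": limit}
    if where:
        params["$where"] = where
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Hypothetical column names, used only to show the shape of a query.
url = build_soql_url(
    DATASET_ID,
    select="unique_key, created_date",
    where="created_date >= '2020-01-01'",
    limit=5,
)
print(url)

# Decode the query string again to confirm the clauses survived encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["$where"][0])
```

Building the URL separately from sending it makes a query easy to inspect or paste into a browser before any data is downloaded.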

Say Hello!

I can be reached at:

Twitter: markbauerwater
LinkedIn: markebauer
GitHub: mebauer

Keywords: Data Analysis, Data Science, Python, pandas, numpy, matplotlib, seaborn, GeoPandas, Jupyter Notebook, Anaconda, NYC Open Data, Building Footprints, PLUTO, Open Data, Open Source, Open Science, Exploratory Data Analysis, EDA

More Repositories

  • nyc-311-street-flooding: Analyzing NYC's 311 Street Flooding Complaints from 2010 to 2020. (Jupyter Notebook, 20 stars)
  • stormwater-map-analysis-nyc: Analyzing NYC's Stormwater Flood Map - Extreme Flood Scenario. (Jupyter Notebook, 15 stars)
  • sodapy-tutorial-nyc-opendata: Socrata Open Data API (SODA) Tutorial Using NYC Open Data. Sample analysis can be found here: https://github.com/mebauer/sodapy-tutorial-nyc-opendata/blob/main/sample-analysis.ipynb (Jupyter Notebook, 8 stars)
  • nyc-floodzone-analysis: New York City's Preliminary Flood Insurance Rate Map (PFIRM) Data Analysis. (Jupyter Notebook, 7 stars)
  • fema-nfip-nyc: Exploring National Flood Insurance Program (NFIP) Data for New York City. (Jupyter Notebook, 7 stars)
  • nyc-flood-layers: A Collection of Flood Hazard Layers for New York City. (Jupyter Notebook, 7 stars)
  • boba-nyc: Obsessed with Boba? Analyzing Bubble Tea Shops in NYC Using the Yelp Fusion API. (Jupyter Notebook, 6 stars)
  • floodmapping-sar: A Collection of NASA ARSET Courses for Flood Mapping and Synthetic Aperture Radar (SAR). (5 stars)
  • nyc-flood-reports: New York City Flooding Reports and Publications. (3 stars)
  • fema-nfip-claims: Analyzing FEMA's National Flood Insurance Program (NFIP) Claims Data Using Python. (Jupyter Notebook, 2 stars)
  • nyc-flood-data: Flood Data Catalog for NYC: Comprehensive Inventory on NYC Open Data. (Jupyter Notebook, 2 stars)
  • parcel-impervious-area-nyc: Exploring NYC's DEP Citywide Parcel-Based Impervious Area GIS Study. (Jupyter Notebook, 2 stars)
  • duckdb-fema-nfip: Analyzing FEMA's National Flood Insurance Program (NFIP) Data With DuckDB. (Jupyter Notebook, 1 star)
  • nyc-art-galleries: Exploratory Data Analysis of Art Galleries in Manhattan, New York City. (Jupyter Notebook, 1 star)
  • covid-19-states: Analyzing COVID-19 state data from The New York Times. (Jupyter Notebook, 1 star)
  • covid-19-counties: Analyzing COVID-19 county data from The New York Times. (Jupyter Notebook, 1 star)
  • fema-disaster-information: FEMA Disaster Declarations and Public Assistance Data Analysis. (Jupyter Notebook, 1 star)
  • mebauer: My personal repository. (1 star)
  • building-elevation-subgrade-nyc: New York City Flood Risk: Exploring the Building Elevation and Subgrade (BES) Dataset in Python. (Jupyter Notebook, 1 star)
  • projected-sea-level-rise-nyc: New York City Flood Risk: Exploring the projected sea level rise data by New York City Panel on Climate Change (NPCC) in Python. (1 star)
  • mycoast-ny-data (Jupyter Notebook, 1 star)
  • statistics-using-python (Jupyter Notebook, 1 star)