Django for Data Analysis

Who are we, and what are we doing here?

So you've gone through a Django tutorial or two, maybe even built an app, and want to know how to use it for your reporting. Or maybe you know some SQL, or have done data analysis with R, but would prefer a more integrated way to explore and publish data.

In addition to making front-facing web apps, Django can also be used internally as a reporting and research tool to explore a dataset.

To do this, we'll use a publicly available dataset from the City of L.A.'s data portal on response times to complaints filed to the Department of Building and Safety. We used this data to publish a story on the varying response times to DBS complaints throughout the city, including especially slow response times on the East side of L.A.

We'll learn how to:

  • Load data from a CSV into Django models.
  • Create queries and views to help us analyze the data.
  • Create tables and visualizations that we can use for publication.

That's a lot of ground to cover in 60 minutes, so this will be like a cooking show. We'll walk through the steps, but the final product is mostly already baked, just waiting in the oven.

To demonstrate, here's Julia Child dancing with a turkey:

Julia Child dancing with a turkey

Setup

Installation

If you're on a classroom-provided computer, or have already installed the requirements, skip down to Database setup.

Otherwise, start by creating a virtual environment and cloning the repo.

$ virtualenv django-for-data-analysis-nicar-2016
$ cd django-for-data-analysis-nicar-2016
$ . bin/activate
$ git clone git@github.com:datadesk/django-for-data-analysis-nicar-2016.git repo
$ cd repo

Then, install the requirements by typing or pasting the commands below. This uses a Python data stack of NumPy, SciPy and Pandas, so it takes a loooong time to install. I highly recommend you do this before the class, if possible.

# If you're using Ubuntu, and have not installed SciPy before
# you may need to install these system libraries
$ sudo apt-get install gfortran libopenblas-dev liblapack-dev
# we have to install numpy first due to some bugs with the numpy install
$ pip install numpy==1.8.1
$ pip install -r requirements.txt

Database setup

If you're on a classroom-provided computer, run a couple of commands to start. Open your terminal and type or paste:

$ workon adv_django
$ cd /data/advanced_django

Everybody now:

$ createdb django_data_analysis
$ python manage.py syncdb

This will give you a couple of prompts:

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): 

Enter "yes", and pick a username and password you will remember. You can skip the email field.

Next, more setup to migrate the database.

$ python manage.py schemamigration building_and_safety --init
$ python manage.py migrate 

Getting started: The Models

First, let's look at our models, and how they correspond to our source data CSVs. There are a couple of fields we're concerned with, or that need more explanation. In your text editor or browser, open up models.py and building_and_safety/data/Building_and_Safety_Customer_Service_Request__out.csv.

|CSR Number|LADBS Inspection District|Address House Number|Address House Fraction Number|Address Street Direction|Address Street Name|Address Street Suffix|Address Street Suffix Direction|Address Street Zip|Date Received|Date Closed|Due Date|Case Flag|CSR Priority|GIS Parcel Identification Number (PIN)|CSR Problem Type|Area Planning Commission (APC)|Case Number Related to CSR|Response Days|Latitude/Longitude|
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
|263018|4500|19841| |W|VENTURA|BLVD| |91367|3/1/2011||3/21/2011|N|3|174B113 358|"ISSUES REGARDING ADULT ENTERTAINMENT LOCATIONS (CLUBS, CABARETS, BOOK AND VIDEO STORES)"|South Valley|||(34.1725, -118.5653)|
|265325|2142|972| |N|NORMANDIE|AVE| |90029|4/7/2011||5/5/2011|N|3|144B197 1166|FENCES WALLS AND HEDGES THAT ARE TOO HIGH|Central|||(34.08833, -118.30032)|
|265779|2142|1402| |N|ALEXANDRIA|AVE| |90027|4/14/2011||5/12/2011|N|3|147B197 909|FENCES WALLS AND HEDGES THAT ARE TOO HIGH|Central|||(34.0967, -118.29815)|

  • csr - This stands for customer service request ID, and can serve as our primary key.
  • ladbs_inspection_district - How LADBS divides their inspectors, one per district. After reporting this out further we found that these actually change quite a bit from year-to-year.
  • date_received - Date the complaint was received.
  • date_closed - "Closed" is a funny term here. This is actually the date that an inspector visited the location of a complaint. In many cases this is when an inspector determines whether a complaint is valid or not. So really "closed" just means an inspector has visited a complaint.
  • date_due - Arbitrary measure of time that the LADBS would like to address a complaint by. They apparently don't place a whole lot of meaning on this, and not all complaints have due dates.
  • csr_priority - This is the priority level that LADBS assigns to a complaint. Level 1 is often life-threatening, level 2 is quality-of-life issues, and level 3 is more like a nuisance.
  • area_planning_commission - One of the seven different planning department areas the complaint is located in. We used this and the neighborhood level for most of our analysis.
  • response_days - This should reflect the number of days it takes to respond to a complaint, however it is calculated differently for open and closed cases. So, this is a good example of why you need to understand your data. We calculated this field manually.

There are also a few fields that we're calculating at the time we load the data:

  • full_address - Caching the address by concatenating the seven different address fields.
  • is_closed - An easier way to find whether a case is "closed" or not.
  • days_since_complaint - Our calculated "response_days" field from above. Finds the difference between the date received and the date closed.
  • gt_30_days, gt_90_days, gt_180_days, more_than_one_year - Different booleans to make it easy to access cases of a certain age.
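
To make the CSV-to-model mapping concrete, here's a rough sketch of what a model like this could look like. The real thing lives in building_and_safety/models.py; the field names below are assumed from the lists above, so treat this as an illustration rather than the repo's actual code.

# A rough, hypothetical sketch of the Complaint model -- see
# building_and_safety/models.py for the real version.
from django.db import models


class Complaint(models.Model):
    # Fields that come straight from the CSV
    csr = models.IntegerField(primary_key=True)  # customer service request ID
    ladbs_inspection_district = models.CharField(max_length=10, blank=True, null=True)
    date_received = models.DateField(blank=True, null=True)
    date_closed = models.DateField(blank=True, null=True)
    date_due = models.DateField(blank=True, null=True)
    csr_priority = models.CharField(max_length=1, blank=True, null=True)
    csr_problem_type = models.CharField(max_length=255, blank=True, null=True)
    area_planning_commission = models.CharField(max_length=100, blank=True, null=True)
    lat = models.FloatField(blank=True, null=True)
    lon = models.FloatField(blank=True, null=True)

    # Fields we calculate when the data is loaded
    full_address = models.CharField(max_length=255, blank=True, null=True)
    is_closed = models.BooleanField(default=False)
    days_since_complaint = models.IntegerField(blank=True, null=True)
    gt_30_days = models.BooleanField(default=False)
    gt_90_days = models.BooleanField(default=False)
    gt_180_days = models.BooleanField(default=False)
    more_than_one_year = models.BooleanField(default=False)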

Then, let's load the complaints data using a custom management command you can see here: building_and_safety/management/commands/load_complaints.py

$ python manage.py load_complaints

The city provides two CSVs of data -- one for open cases and another for closed. This command creates a Complaint record for every row in our two CSVs. Note that instead of saving every individual record as it's loaded, we use Django's bulk_create method to create them in batches of 500. This saves time because we're not hitting the database for every row in the CSV.
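
As a sketch of that batching pattern (the real command handles many more columns and both CSVs; the file path and column name here are just for illustration):

# Simplified, hypothetical sketch of the bulk-loading pattern used by
# building_and_safety/management/commands/load_complaints.py
import csv
from django.core.management.base import BaseCommand
from building_and_safety.models import Complaint


class Command(BaseCommand):
    help = "Load the open and closed complaint CSVs into the Complaint model"

    def handle(self, *args, **options):
        batch = []
        path = "building_and_safety/data/Building_and_Safety_Customer_Service_Request__out.csv"
        for row in csv.DictReader(open(path, "rb")):
            # ... build the Complaint from the row, calculating full_address,
            # is_closed, days_since_complaint and the gt_* booleans ...
            batch.append(Complaint(csr=int(row["CSR Number"])))
            if len(batch) == 500:
                # one INSERT for 500 records instead of 500 separate INSERTs
                Complaint.objects.bulk_create(batch)
                batch = []
        if batch:
            Complaint.objects.bulk_create(batch)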

What are we looking at here? Exploring the data

We can use our shell to explore the data with Django's query sets.

$ python manage.py shell

Let's use this to answer a couple of basic questions about the data:

  • How many complaints do we have total?
  • How many are still unaddressed?
  • How many complaints went more than a year before being addressed?
>>> from building_and_safety.models import Complaint
>>> complaints = Complaint.objects.all()
>>> complaints.count()
71120
>>> complaints.filter(is_closed=False).count()
4060
>>> complaints.filter(more_than_one_year=True).count()
1231

What's happening behind the scenes?

Django's converting all of our queries to SQL, and executing them on the database. When we write code like complaints = Complaint.objects.all() Django is creating a queryset, an easy way to filter and slice our data without hitting the database. The database is only queried once we call something that evaluates the queryset, like .count().
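
For example, chaining filters just builds up a more specific queryset; nothing touches the database until we ask for a result:

>>> qs = Complaint.objects.filter(area_planning_commission="East Los Angeles")  # no query yet
>>> qs = qs.filter(csr_priority="3")  # still no query
>>> qs.count()  # this line finally hits the database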

We can see these queries, and it's helpful to check exactly what Django's trying to do when you're having trouble with a query.

>>> from django.db import connection
>>> print connection.queries[-1]
{u'time': u'0.028', u'sql': u'SELECT COUNT(*) FROM "building_and_safety_complaint" WHERE "building_and_safety_complaint"."more_than_one_year" = true '}

We can use this for more complicated questions too. How are the complaints older than one year distributed throughout the planning commissions? Are some worse than others?

>>> from django.db.models import Count
>>> complaints.filter(more_than_one_year=True).values('area_planning_commission').annotate(count=Count("csr")).order_by('-count')
[{
    'count': 618,
    'area_planning_commission': u'East Los Angeles'
}, {
    'count': 394,
    'area_planning_commission': u'Central'
}, {
    'count': 86,
    'area_planning_commission': u'West Los Angeles'
}, {
    'count': 60,
    'area_planning_commission': u'South Valley'
}, {
    'count': 56,
    'area_planning_commission': u'South Los Angeles'
}, {
    'count': 8,
    'area_planning_commission': u'North Valley'
}, {
    'count': 7,
    'area_planning_commission': u''
}, {
    'count': 2,
    'area_planning_commission': u'Harbor'
}]

Now we're on to something.

The views: A first look

One advantage of using Django is that all of our data manipulation can be stored in views, and the output displayed in HTML templates.

We've already created a few views that summarize the data, so let's take a look.

Exit the Python interpreter, if you still have it running, by pressing CTRL-D, and start the Django server.

$ python manage.py runserver
Validating models...

0 errors found
February 20, 2016 - 16:40:36
Django version 1.6.5, using settings 'project.settings'
Starting development server at http://0.0.0.0:8000/
Quit the server with CONTROL-C.

In the spirit of a cooking show, let's cheat. Let's see what we're going to get first, and then look at how we get it. In your web browser, go to http://localhost:8000/complaint_analysis.

Analysis page

Here is a table with some basic information about our complaints. How many have been addressed, how many are still open, and how many have hung around for a really long time.

In the second table, we also break this down by area planning commissions - areas used by the L.A. Department of City Planning to divide up the city.

A primer on survival analysis

For our analysis, we want to get the median time it takes to close a complaint. Simply calculating the median based on complaints that are already closed leaves out all of the complaints that are still open -- including those that have lingered in the system unaddressed for more than a year.

Using a statistical method called survival analysis can help us account for complaints still open, and give us a more accurate closure time.

Survival analysis was created for use by actuaries and medical professionals to measure lifespans of individuals. It can be used to analyze not only literal births and deaths, but also any instance where a "birth" or "death" can be observed. In this case, a "birth" is the opening of a complaint, and "death" is that complaint being visited and marked as "closed."

The excellent Lifelines documentation has another example of survival analysis looking at the tenure of world leaders, where the "birth" is the start of their time in office and the "death" is their departure.

Another example comes from Times database editor Doug Smith and data analyst Brian Kim, who found that youths in California's privatized foster care remained in the foster system 11% longer than those in other types of homes -- 378 days compared with 341 days.

There are libraries in R that can do this as well (see survival and KMsurv), but using a Python library keeps this all in the same development stack.
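
To get a feel for the library before we dig into the views, here's a toy lifelines example with made-up numbers (not our complaint data):

# Toy example, just to show the lifelines API we'll use below.
from lifelines import KaplanMeierFitter

durations = [5, 12, 30, 45, 200, 400]  # days each complaint has been open
observed = [True, True, True, False, True, False]  # False = still open ("censored")

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)
print kmf.median_  # estimated median days to closure, accounting for the open cases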

Diving into the views

Open up the views.py file in your text editor, and look at the ComplaintAnalysis view.

There's a lot going on here. We know there were more complaints open after a year in East L.A. than other parts of the city. But is this because they receive more complaints in general? Are some types of complaints addressed more quickly than others?

To find the median time to address a complaint, we use survival analysis. Let's take apart this view:

class ComplaintAnalysis(TemplateView):
    # The HTML template we're going to use, found in the /templates directory
    template_name = "complaint_analysis.html"

This defines the ComplaintAnalysis as a template view, and sets the HTML template that we're going to build with the data generated from the view. You can either open complaint_analysis.html in your text editor, or follow the link here.
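
How do the values below actually reach that template? In a TemplateView, whatever get_context_data() returns becomes the template context. The real ComplaintAnalysis view packs in all of the queries described next; here's a stripped-down, hypothetical version showing just the plumbing:

# Hypothetical, stripped-down view showing how values become template variables.
from django.views.generic import TemplateView
from building_and_safety.models import Complaint


class ComplaintCounts(TemplateView):
    template_name = "complaint_analysis.html"

    def get_context_data(self, **kwargs):
        context = super(ComplaintCounts, self).get_context_data(**kwargs)
        complaints = Complaint.objects.all()
        # anything added to this dict is available in the template, e.g. {{ total_count }}
        context["total_count"] = complaints.count()
        context["open_cases_count"] = complaints.filter(is_closed=False).count()
        return context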

# Quick notation to access all complaints
complaints = Complaint.objects.all()

# Quick means of accessing both open and closed cases
open_cases = complaints.filter(is_closed=False)
closed_cases = complaints.filter(is_closed=True)

# Overall complaints not addressed within a year
over_one_year = complaints.filter(more_than_one_year=True)
open_over_one_year = over_one_year.filter(is_closed=False)
closed_over_one_year = over_one_year.filter(is_closed=True)

We then split our complaints into four groups: open, closed, open over a year, and closed over a year. We'll use these throughout the rest of the view.

# total counts of cases, all priority levels
total_count = complaints.all().count()
total_by_csr = get_counts_by_csr(complaints)

# Counts of open cases, all priority levels
open_cases_count = open_cases.count()
open_by_csr = get_counts_by_csr(open_cases)

# Counts of cases that have been open for more than a year, all priority levels
open_over_one_year_count = open_over_one_year.count()
open_over_one_year_by_csr = get_counts_by_csr(open_over_one_year)

# Counts of cases that were closed, but have been open for more than a year, all priority levels.
closed_over_one_year_count = closed_over_one_year.count()
closed_over_one_year_by_csr = get_counts_by_csr(closed_over_one_year)

We use a helper function here to get counts of complaints for each priority level:

def get_counts_by_csr(qs):
    counts = {}
    counts["csr1"] = qs.filter(csr_priority="1").count()
    counts["csr2"] = qs.filter(csr_priority="2").count()
    counts["csr3"] = qs.filter(csr_priority="3").count()
    return counts

This function takes a queryset, and returns a dictionary of counts for each CSR priority level. This follows the programming adage of DRY (don't repeat yourself): we're not repeating the same code many different times in our view.

To calculate the response times for different priority levels of complaints, we use the Kaplan-Meier fit from our survival analysis library. It will return a median wait time, and establish a closure rate that accounts for complaints that are still open.

all_complaints = Complaint.objects.exclude(days_since_complaint__lt=0)
kmf_fit = get_kmf_fit(all_complaints)
median_wait_time_kmf = get_kmf_median(kmf_fit)

Let's take a quick look at the functions get_kmf_fit() and get_kmf_median().

# Get the average wait time using a Kaplan-Meier Survival analysis estimate
# Make arrays of the days since complaint, and whether a case is 'closed'
# this creates the observations, and whether a "death" has been observed
def get_kmf_fit(qs):
    t = qs.values_list('days_since_complaint', flat=True)
    c = qs.values_list('is_closed', flat=True)
    kmf = KaplanMeierFitter()
    kmf.fit(t, event_observed=c)
    return kmf


# Return the median of our KMF curve
def get_kmf_median(kmf):
    return kmf.median_

This function makes two lists: one of the count of days since a complaint was created and another of whether the complaint has been closed. Those lists are then fitted to a Kaplan-Meier curve, and the function returns the kmf object. get_kmf_median simply returns the median_ value of that object.

We use this method to find the median response time to all complaints, as well as the response time for priority level 1, 2 and 3 complaints.

Since it seems like response times are slower in the eastern parts of the city, let's break it down further. We make a list of the seven different planning commission areas, and then store the results in a dictionary.

region_names = ['Central','East Los Angeles','Harbor','North Valley','South Los Angeles','South Valley','West Los Angeles']
regions = {}

Then, we iterate over those region names, creating a queryset and aggregate counts for each area.

# Iterate over each name in our region_names list
for region in region_names:
    # Filter for complaints in each region
    qs = complaints.filter(area_planning_commission=region, days_since_complaint__gte=0)
    # create a data dictionary for the region
    regions[region] = {}
    # get a count of how many complaints total are in the queryset
    regions[region]['total'] = qs.count()

Let's also find the volume of complaints each area receives in a year.

regions[region]['avg_complaints_per_year'] = get_avg_complaints_filed_per_year(region)
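
That helper isn't shown here, but a hypothetical version might count the region's complaints and divide by the number of years in the data, along these lines:

# Hypothetical stand-in for get_avg_complaints_filed_per_year() in views.py.
def avg_complaints_per_year(region):
    qs = Complaint.objects.filter(area_planning_commission=region)
    years = qs.dates("date_received", "year")  # one entry per calendar year in the data
    if not years:
        return 0
    return float(qs.count()) / len(years)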

Let's use survival analysis to find the median response time for each priority level of complaint. This is basically the same thing we did above, but for a smaller queryset of complaints confined to each planning commission area.

# Separate the complaints into querysets of their respective priority levels 
region_csr1 = qs.filter(csr_priority="1")
region_csr2 = qs.filter(csr_priority="2")
region_csr3 = qs.filter(csr_priority="3")

# Find the KMF fit for all complaints in the area and by each priority level
regional_kmf_fit = get_kmf_fit(qs)
regional_kmf_fit_csr1 = get_kmf_fit(region_csr1)
regional_kmf_fit_csr2 = get_kmf_fit(region_csr2)
regional_kmf_fit_csr3 = get_kmf_fit(region_csr3)

# Get the median value from the KMF fit. 
regions[region]['median_wait_kmf'] = get_kmf_median(regional_kmf_fit)
regions[region]['median_wait_kmf_csr1'] = get_kmf_median(regional_kmf_fit_csr1)
regions[region]['median_wait_kmf_csr2'] = get_kmf_median(regional_kmf_fit_csr2)
regions[region]['median_wait_kmf_csr3'] = get_kmf_median(regional_kmf_fit_csr3)

Last, we also find the number of complaints older than a year in each area.

regions[region]['gt_year'] = qs.filter(more_than_one_year=True).count()

We have complaints over a year, but where are the response times breaking down? Let's also find the number of complaints older than 30, 90, 180 days and one year, and use latimes-calculate to find what percentage of the total complaints fall in these categories.
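
Inside the loop, that works out to something like this sketch (assuming latimes-calculate is imported as calculate and its percentage() helper is used):

# Hypothetical sketch of the percentage fields the template below expects.
regions[region]['gt_30_days'] = qs.filter(gt_30_days=True).count()
regions[region]['per_gt_30_days'] = calculate.percentage(
    regions[region]['gt_30_days'], regions[region]['total'])
# ... per_gt_90_days, per_gt_180_days and per_gt_year follow the same pattern ...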

Now let's take a look at this in the template. Open up complaint_analysis.html in your text editor and in your browser at http://localhost:8000/complaint_analysis/.

This page is two HTML tables. Notice the values wrapped in {{foo}} curly brackets. These are the variables from our view that get fed into the template. We can use them in a number of ways; here, we simply feed them in as table values.

You'll also see values like |intcomma or |floatformat:1. These come from the humanize set of template filters, which make numbers easier to read and understand. See the load statement on line 2, {% load humanize toolbox_tags %}, which brings in this set of filters.

Go ahead and remove one of the floatformat filters and reload the page. Rounding is good!
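
For example (variable names here follow the view code above):

<td class="data">{{ total_count|intcomma }}</td>  <!-- 71120 renders as 71,120 -->
<td class="data">{{ median_wait_time_kmf|floatformat:1 }}</td>  <!-- a long float rounds to one decimal place -->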

We have summary data for all complaints, and for complaints broken down by region. Let's use the new variables we created in the view (the percentage of complaints that took longer than 30, 90 and 180 days) to build a new table.

Below the regional breakdown table, but before the {% endblock %} tag (around line 95), type or paste in this HTML, and reload the page.

<h3>Wait times</h3>
<table class="table table-striped table-bordered table-condensed">
    <thead>
        <tr>
            <th>Region</th>
            <th class="data">% > 30 days</th>
            <th class="data">% > 90 days</th>
            <th class="data">% > 180 days</th>
            <th class="data">% > 1 yr</th>
        </tr>                
    </thead>
    <tbody>
        {% for region, value in regions.items %}
        <tr>
            <td>{{ region }}</td>
            <td class="data">{{ value.per_gt_30_days|floatformat:2 }}</td>
            <td class="data">{{ value.per_gt_90_days|floatformat:2 }}</td>
            <td class="data">{{ value.per_gt_180_days|floatformat:2 }}</td>
            <td class="data">{{ value.per_gt_year|floatformat:2 }}</td>
        </tr>
        {% endfor %}
    </tbody>            
</table>    

Pasting the html


Using this method you can analyze and present your data in a number of ways that can help you find and understand the story.

Visualizing the data: Let's make a map

Complaints map

Take a look at the view ComplaintsMap.

What's happening here? It doesn't look like anything's in the view, right? Let's take a look at the template. Open up templates/complaints_map.html

This template calls two other views through a $.getJSON() call - open_complaints_json and closed_complaints_json.

These two views are essentially the same, pulling in all geocoded complaints that waited more than a year for a response, except one pulls in closed and one pulls in open complaints.

Both use a method on the Complaint model to export relevant fields of the model to a GeoJSON object.

features = [complaint.as_geojson_dict() for complaint in complaints]

This refers back to the as_geojson_dict() method on the Complaint model. This method takes properties from the model and returns them in a GeoJSON dict format:

def as_geojson_dict(self):
  """
  Method to return each feature in the DB as a geojson object.
  """
  as_dict = {
      "type": "Feature",
      "geometry": {
          "type": "Point",
          "coordinates": [
              float(self.lon),
              float(self.lat)
          ]
      },
      "properties": {
          "address": self.full_address,
          "csr": self.csr,
          "date": dateformat.format(self.date_received, 'F j, Y'),
          "closed": self.get_closed_date(),
          "type": self.csr_problem_type,
          "priority": self.csr_priority
      }
  }
  return as_dict

We get this dict for each item in the queryset, and then form them into a geojson object that is returned when we request the view.

objects = {
    'type': "FeatureCollection",
    'features': features
}
response = json.dumps(objects)
return HttpResponse(response, content_type='text/json')

Try this out - go to http://localhost:8000/api/complaints.json. It loads a GeoJSON object! All those rows in the database are automatically generated in GeoJSON format.

Load the URL http://localhost:8000/complaints-map/. It should load a map in your browser.

What is this map loading in? What's it showing us? How does this demonstrate the flexibility of using a Django backend instead of exporting our results from SQL or another database format?

Take a look at the views for open_complaints_json and closed_complaints_json - we're filtering for all complaints older than a year, and excluding any without geocoded lat/lon points.
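
Stitched together from the snippets above, a simplified, hypothetical version of open_complaints_json looks something like this (check views.py for the real query):

import json
from django.http import HttpResponse
from building_and_safety.models import Complaint


def open_complaints_json(request):
    # open complaints that waited more than a year, geocoded points only
    complaints = Complaint.objects\
        .filter(is_closed=False, more_than_one_year=True).exclude(lat=None, lon=None)
    features = [complaint.as_geojson_dict() for complaint in complaints]
    objects = {
        'type': "FeatureCollection",
        'features': features
    }
    return HttpResponse(json.dumps(objects), content_type='text/json')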

Say, later in the editing process, we want to export only complaints older than 180 days, instead of a year. In Django, this is easy:

complaints = Complaint.objects\
        .filter(is_closed=True, gt_180_days=True).exclude(lat=None, lon=None)

Go ahead and change that in the open_complaints_json and closed_complaints_json views now, and reload the map. You'll see the changes instantly. Instead of having to go back to the database, re-enter our SQL, and re-export our files, Django takes care of all of that for you.

Later, if we decide not to include the closed complaints, for example, we can simply remove that call in the template.

You can also change the information that's fed into the GeoJSON object. Maybe we want to add the area planning commission, and the number of days since the complaint was filed?

Let's add this in line 148 of models.py

"properties": {
    "address": self.full_address,
    "csr": self.csr,
    "date": dateformat.format(self.date_received, 'F j, Y'),
    "closed": self.get_closed_date(),
    "type": self.csr_problem_type,
    "priority": self.csr_priority,
    "apc": self.area_planning_commission, 
    "days_since_complaint": self.days_since_complaint
}

And we'll modify our Leaflet tooltip to pull in this information in complaints_map.html. On line 166, we feed those properties to our underscore template:

var context = {
    address: props["address"],
    csr: props["csr"],
    date: props["date"],
    closed_date: props["closed"],
    priority: priority,
    problem: props["type"],
    apc: props["apc"],
    days_since_complaint: props["days_since_complaint"]
};

And update the underscore template on line 74.

<script type="text/template" id="tooltip-template">
    <h4><a href="/complaint/<%= csr %>/"><%= address %></a></h4>
    <p>CSR number <a href="/complaint/<%= csr %>/"><%= csr %></a></p>
    <p>Received <%= date %></p>
    <% if (closed_date !== null) { %><p>Closed <%= closed_date %></p><% } %>
    <p>Priority <%= priority %> complaint</p>
    <p><%= problem %></p>
    <p><%= days_since_complaint %> days since complaint was filed.</p>
    <p><%= apc %> planning commission</p>
</script>

So you can pretty quickly add or remove properties from your data and your templates.

Pitfalls of Django

Like anything, there are a few disadvantages to using Django as a reporting framework.

It's a lot of up-front work.

  • There's no doubt that setting up a Django project takes a lot of time to get up and running. Models and loader scripts take planning, and we're all on a constant time crunch. And not every project merits this; sometimes you can get the results you need with a few quick pivot tables or R scripts. But the time saved in the analysis and publication phases makes it well worth the investment for me.

With huge databases (and a slow computer) the queries can take forever

  • This can be a problem with large datasets, but there are a few solutions. Caching is one. Django has a number of built-in caching options to help out with this. Another possible option is to bake out any visualizations that will be used for publication. Django-bakery can help with this.
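
As an example of the first option, Django's per-view cache can memoize an expensive page with one line in your URLconf. A sketch, assuming a Django 1.6-style urls.py and that ComplaintAnalysis lives in building_and_safety/views.py:

# Cache the analysis page for 15 minutes with Django's per-view cache.
from django.conf.urls import patterns, url
from django.views.decorators.cache import cache_page
from building_and_safety.views import ComplaintAnalysis

urlpatterns = patterns('',
    url(r'^complaint_analysis/$', cache_page(60 * 15)(ComplaintAnalysis.as_view())),
)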

Check, check, check your data

  • Because it's so easy to swap things around or filter a queryset, it's easy to think you're looking at one thing when you're actually looking at another. What if we had filtered for complaints over 90 days, when we thought we were looking at 180?
  • Fortunately, since everything is written down and traceable in the views, it's fairly easy to spot and correct possible errors before publication.

What can we do from here?

We skipped it for this lesson, but in the original story, we used GeoDjango and a GIS-enabled database to allow for geospatial queries on the data. This allowed us to relate complaints to L.A. Times Neighborhoods shapes, and to census tracts and LADBS inspection districts.

Using GeoDjango, you can also set a buffer around a point, which is what LAT Alum Ken Schwencke did for this piece analyzing homicides in the unincorporated neighborhood of Westmont and along a two-mile stretch of Vermont Ave. called "Death Alley."
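
As a taste of what those queries look like (hypothetical here, since this workshop's Complaint model doesn't have a geometry field), a GeoDjango distance filter reads like any other queryset filter:

# Hypothetical GeoDjango query -- assumes a model with a PointField named "point".
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

somewhere = Point(-118.30032, 34.08833)  # lon, lat
nearby = Complaint.objects.filter(point__distance_lte=(somewhere, D(mi=2)))  # within two miles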

Julia Child brandishing cutlery over a fish
