Repository Details

Methodology and code supporting the BuzzFeed News/BBC article, "The Tennis Racket," published Jan. 17, 2016.

Methodology and Code: Detecting Match-Fixing Patterns In Tennis

A closer look at the data analysis behind BuzzFeed News’ investigation into corruption in tennis.

General Notes

In “The Tennis Racket,” a yearlong investigation into match-fixing in professional tennis, BuzzFeed News published findings from an original data analysis we performed. That analysis revealed many examples of one particularly suspicious pattern: heavy betting against a player, followed by that player’s loss.

Betting patterns alone aren’t proof of fixing. Players can underperform for all sorts of reasons — injury, fatigue, bad luck — and sometimes that underperformance will just happen to coincide with heavy betting against them. But it's extremely unlikely for a player to underperform repeatedly in matches on which people just happen to be betting massive sums against him.

In developing this analysis, BuzzFeed News consulted with Abraham Wyner, a professor of statistics at the University of Pennsylvania, and Thomas Severini, a professor of statistics at Northwestern University.

The code that we used for the analysis is available in the analysis notebook in this repository.

An important note: The analysis was undertaken with only the betting information that is publicly available. Tennis authorities and betting houses have access to much finer-grained data, such as the accounts placing bets, as well as forensic evidence such as phone data and bank records. Without access to such information, it is impossible to know with a sufficient degree of certainty whether these suspicious patterns are indeed the result of match fixing. For this reason, BuzzFeed News has decided not to name the players.

Methodology

  1. Data Acquisition. The analysis began by collecting the opening and closing odds of more than 26,000 tennis matches that occurred between 2009 and mid-September 2015. We downloaded the odds for Association of Tennis Professionals (ATP) and Grand Slam matches from seven large, independent bookmakers whose odds are available on OddsPortal.com.

  2. Data Preparation. BuzzFeed News prepared a dataset that contained one row for each bookmaker for each match. We then used the odds to calculate the implied chance that each player would win. The calculation is straightforward: opponent odds / (opponent odds + player odds). Because the two players' implied chances computed this way always sum to 100%, the calculation accounts for the house's cut.

  3. Match Selection. We excluded opening odds that implied probabilities more than 10 percentage points higher or lower than the median of all bookmakers’ opening odds for the match. (Otherwise the return of these odds toward the consensus could be mistaken for a sign of suspicious betting.) BuzzFeed News also excluded matches that were noted as “canceled” — typically a result of pre-match withdrawals — or “walkover” on OddsPortal. After removing around 500 matches based on the criteria above, 25,993 matches remained.

  4. Odds-Movement Calculation. To calculate the “odds movement” for a bookmaker in a given match, BuzzFeed News took the difference between each player’s implied chance of winning (see step 2) under the opening odds and under the final odds. For example, if the opening odds suggested Player A had a 65% chance of winning, but the final odds suggested a 50% chance of winning, the “odds movement” is 15 percentage points.

  5. Player Selection. BuzzFeed News then selected only matches where, in at least one book, the odds moved more than 10 percentage points. (This phenomenon occurred in about 11% of all matches.) We selected the 10-percentage-point cutoff based on discussions with sports-betting investigators, who said that movement above this threshold was what prompted them to give greater scrutiny to a match. We then selected players who had lost more than 10 such “high-movement” matches. Thirty-nine players met this criterion.

  6. Simulation. To estimate the unlikelihood of each player’s outcomes, BuzzFeed News ran a series of simulations. Each simulation used the player’s implied chance of winning — based on each match’s opening odds — to generate a set of outcomes for each string of matches. BuzzFeed News ran the simulation 1 million times per player. The result: The estimated chance that the player would have lost as many (or more) high-movement matches as the player did, if the chances implied by the opening odds were correct.

  7. Significance Check. BuzzFeed News then tested each player’s results for statistical significance. Because 39 players were tested — and the more players you test, the more likely you are to encounter false positives — BuzzFeed News applied a Bonferroni correction to the results. Four players’ simulation results achieved Bonferroni significance at the 95% confidence level. For another 11 players, the results were not significant at the Bonferroni level, but would still have been expected to occur less than 5% of the time. For the full results, please see the table in the analysis notebook.
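
The arithmetic in steps 2, 4, 6, and 7 can be sketched in a few lines of Python. This is a minimal illustration, not the code from the analysis notebook: the function names are hypothetical, the odds are assumed to be decimal odds, the sign convention for odds movement (positive when the market moved against the player) is an assumption, and the default number of simulations is lower than the 1 million runs described above to keep memory use small.

```python
import numpy as np


def implied_probability(player_odds, opponent_odds):
    """Step 2: implied chance that the player wins (assumes decimal odds)."""
    return opponent_odds / (opponent_odds + player_odds)


def odds_movement(open_player, open_opponent, close_player, close_opponent):
    """Step 4: change in the player's implied win chance, in percentage points.

    Assumed sign convention: positive means the odds moved against the player.
    """
    opening = implied_probability(open_player, open_opponent)
    closing = implied_probability(close_player, close_opponent)
    return 100 * (opening - closing)


def simulate_loss_probability(opening_win_probs, observed_losses,
                              n_sims=100_000, seed=0):
    """Step 6: estimated chance of losing at least `observed_losses` matches
    if the win chances implied by the opening odds were correct."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(opening_win_probs, dtype=float)
    # Each row is one simulated run through the player's high-movement matches;
    # a match counts as a win when the random draw falls below its win chance.
    wins = rng.random((n_sims, probs.size)) < probs
    simulated_losses = probs.size - wins.sum(axis=1)
    return float(np.mean(simulated_losses >= observed_losses))


def bonferroni_threshold(alpha, n_tests):
    """Step 7: per-player significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Made-up opening win chances for one hypothetical player's matches.
    example_probs = [0.65, 0.70, 0.55, 0.60, 0.72, 0.58]
    estimate = simulate_loss_probability(example_probs, observed_losses=5)
    threshold = bonferroni_threshold(0.05, 39)
    print(f"estimated chance: {estimate:.4f}, threshold: {threshold:.5f}")
    print("Bonferroni-significant:", estimate < threshold)
```

With 39 players tested at the 5% level, the Bonferroni threshold is 0.05 / 39, so a player's simulated probability has to fall below roughly 0.13% to count as significant after correction.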

More Repositories

1

everything

An index of all our open-source data, analysis, libraries, tools, and guides.
1,302
star
2

nics-firearm-background-checks

Monthly data from the FBI's National Instant Criminal Background Check System, converted from PDF to CSV.
Python
177
star
3

zika-data

Data — and pointers to data — related to the 2015–16 Zika virus outbreak.
Python
111
star
4

2016-10-facebook-fact-check

Data and analysis for the BuzzFeed News article, "Hyperpartisan Facebook Pages Are Publishing False And Misleading Information At An Alarming Rate."
Jupyter Notebook
109
star
5

2017-08-spy-plane-finder

The data and analysis referenced in the Aug. 7, 2017 BuzzFeed News article, "BuzzFeed News Trained A Computer To Search For Hidden Spy Planes. This Is What We Found." https://www.buzzfeed.com/peteraldhous/hidden-spy-planes
HTML
107
star
6

2016-04-federal-surveillance-planes

The data and analysis referenced in the Apr. 6, 2016 BuzzFeed News article, "Spies in the Skies." https://www.buzzfeed.com/peteraldhous/spies-in-the-skies
HTML
82
star
7

trumpworld

TrumpWorld data as CSV and GraphML files
82
star
8

2014-06-bikeshare-gender-maps

Data and code for BuzzFeed's bikeshare gender maps.
Jupyter Notebook
74
star
9

2017-08-partisan-sites-and-facebook-pages

Data, analytic code, and findings related to the BuzzFeed News article, "Inside The Partisan Fight For Your News Feed," published August 8, 2017.
Jupyter Notebook
46
star
10

2018-06-nyc-311-complaints-and-gentrification

NYC 311 complaints and demographic analysis
Jupyter Notebook
42
star
11

2018-12-fake-news-top-50

Data and analysis supporting the BuzzFeed News article, "In Spite Of Its Efforts, Facebook Is Still The Home Of Hugely Viral Fake News" published on Dec. 28, 2018
Jupyter Notebook
33
star
12

2015-07-h2-visas-and-enforcement

Data and analysis supporting several passages in the BuzzFeed News article, "The New American Slavery: Invited To The U.S., Foreign Workers Find A Nightmare," published July 24, 2015.
Jupyter Notebook
28
star
13

2016-12-fake-news-survey

Data, analytic code, and findings based on a large-scale survey conducted by Ipsos Public Affairs for BuzzFeed News.
Jupyter Notebook
27
star
14

2017-12-eeoc-harassment-charges

Data and analysis for the BuzzFeed News article, "We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down," published Dec. 5, 2017.
Jupyter Notebook
27
star
15

2014-08-st-louis-county-segregation

Analysis and data notes for the August 20, 2014 BuzzFeed News article, "The Ferguson Area Is Even More Segregated Than You Probably Guessed"
25
star
16

namestand

A Python library for standardizing lists of names, especially database/CSV column–names.
Python
23
star
17

figure-skating-scores

ISU Figure Skating Score Sheets as Structured Data
Python
23
star
18

2016-11-grading-the-election-forecasts

Data and code supporting BuzzFeed News' evaluation of forecasters' predictions for the November 2016 U.S. presidential and Senate elections.
Jupyter Notebook
22
star
19

2017-12-fake-news-top-50

Data and analysis supporting the BuzzFeed News article, "These Are 50 Of The Biggest Fake News Hits On Facebook In 2017," published on Dec. 28, 2017
Jupyter Notebook
21
star
20

2020-02-gentrification

Data, analytic code, and findings that support portions of the BuzzFeed News article, “These 11 Maps Show How Black People Have Been Driven Out Of Neighborhoods In Five Of The Most Gentrified US Cities,” published February 27, 2020.
Jupyter Notebook
15
star
21

H-2-certification-data

Raw and standardized data tracking the certification decisions for the United States' H-2 visa program.
Python
14
star
22

2018-07-wildfire-trends

Data and R code to reproduce graphics in the Jul. 28, 2018 BuzzFeed News post "How A Booming Population And Climate Change Made California’s Wildfires Worse Than Ever" https://www.buzzfeednews.com/article/peteraldhous/california-wildfires-people-climate
HTML
14
star
23

2019-04-climate-change

Data and R code underlying the maps and chart in the Apr. 22, 2019 BuzzFeed News post "Here And Now: These Maps Show How Climate Change Has Already Transformed The Earth" https://www.buzzfeednews.com/article/peteraldhous/climate-change-maps-ice-sea-level-rise
HTML
14
star
24

2015-12-fatal-police-shootings

The data and analysis referenced in the Dec. 7, 2015 BuzzFeed News article, "Here's What We Know About Race And Killings By Police." http://www.buzzfeed.com/peteraldhous/race-and-police-shootings
R
14
star
25

2018-02-figure-skating-analysis

The data, code, and methodologies supporting the February 8, 2018 BuzzFeed News article, "The Edge."
Jupyter Notebook
13
star
26

2019-08-actblue-donations

Analysis of ActBlue's 2019 mid-year FEC report
Jupyter Notebook
13
star
27

2018-05-fentanyl-and-cocaine-overdose-deaths

Data, analytic code, and findings supporting BuzzFeed News's analysis of fentanyl and cocaine overdose deaths.
Jupyter Notebook
13
star
28

bikeshares

Standardized parsers for data published by bicycle-sharing programs. Currently supporting: NYC's Citi Bike, Chicago's Divvy, and Boston's Hubway.
Python
12
star
29

2015-11-refugees-in-the-united-states

Data and analysis supporting the BuzzFeed News article, "Where U.S. Refugees Come From — And Go — In Charts," published on November 19, 2015.
Python
12
star
30

whtranscripts

Fetch and parse the American Presidency Project's press-briefing and presidential-news-conference transcripts.
HTML
11
star
31

2018-01-twitter-withheld-accounts

Data and analysis supporting the BuzzFeed News article, "An Inside Look At The Accounts Twitter Has Censored In Countries Around The World," published January 24, 2018.
Jupyter Notebook
10
star
32

2018-03-oscars-script-diversity-analysis

Data, analytic code, and findings supporting BuzzFeed News's analysis of diversity in the dialogue of Best Picture–nominated films
Jupyter Notebook
10
star
33

2018-01-trump-twitter-wars

R code to reproduce this Jan. 23, 2018 BuzzFeed News analysis of a year of tweets from President Donald Trump and all members of Congress: https://www.buzzfeed.com/peteraldhous/trump-twitter-wars
HTML
10
star
34

2019-11-sipp

Data, code, and methodology supporting BuzzFeed News' analysis of the 2016 U.S. Census Survey of Income and Program Participation
Jupyter Notebook
8
star
35

2018-02-olympic-figure-skating-analysis

Data, code, and methodology supporting BuzzFeed News' analysis of figure skating scores at the 2018 Olympic Winter Games, published on February 23, 2018.
Jupyter Notebook
8
star
36

bikeshare-data-sources

How to get trip history and station data from various bicycle-sharing programs.
Shell
8
star
37

2020-05-covid-city-zip-codes

Data, code, and methodology supporting BuzzFeed News' analysis of COVID-19 ZIP codes and demographic trends.
Jupyter Notebook
8
star
38

2015-08-immigrant-detention

Data and code supporting several passages in the BuzzFeed News article, "Vast Disparities By Nationality In Immigration Jailings," published August 25, 2015.
Python
8
star
39

presidential-campaign-contributions

Contributions, transfers, and refunds from recent U.S. presidential candidates' principal campaign committees.
Python
7
star
40

2019-10-fcc-comments

Data, code, and methodology supporting BuzzFeed News' analysis of comments submitted to three Federal Communications Commission (FCC) dockets.
Jupyter Notebook
7
star
41

2015-03-earthquake-maps

The analysis and maps are referenced in the March 6, 2015 BuzzFeed News article, "Midwestern States Are Having Big Earthquakes Like Never Before."
6
star
42

2014-06-firework-injuries

Code and data to generate the BuzzFeed list, "275 Ways Americans Hurt Themselves — Badly — Playing With Fireworks."
Python
6
star
43

2019-10-fec-top-10-donors

Data and code supporting a BuzzFeed News article examining the number of donors per day among presidential candidates in the third quarter of the 2020 election cycle
Jupyter Notebook
6
star
44

2014-08-irs-scams

Analysis of IRS scam-complaint data received from the FTC
Python
6
star
45

2018-10-russian-troll-tweets

Data and R code underlying the original analysis reported in the Oct. 25, 2018 BuzzFeed News post "How Russia’s Online Trolls Engaged Unsuspecting American Voters — And Sometimes Duped The Media" https://www.buzzfeednews.com/article/peteraldhous/russia-online-trolls-viral-strategy
HTML
6
star
46

2017-05-us-health-care

Data and R code to reproduce the graphics in the May 24, 2017 BuzzFeed News article, "Why Americans Are So Damn Unhealthy, In 4 Shocking Charts." https://www.buzzfeed.com/peteraldhous/american-health-care
HTML
6
star
47

2016-02-republican-donor-movements

Data and code to estimate the number of donors who previously gave to one candidate, but then switched to another.
Jupyter Notebook
6
star
48

2022-04-icf-analysis

Data and analysis of intermediate care facilities, supporting a BuzzFeed News investigation.
Jupyter Notebook
6
star
49

2016-12-medicare-claims-analysis

Analysis of Medicare claims for the BuzzFeed News article, “Intake,” published December 7, 2016.
Jupyter Notebook
6
star
50

2018-01-trump-state-of-the-union

R code to reproduce this Jan. 31, 2018 BuzzFeed News analysis of the text of every State of the Union address: https://www.buzzfeed.com/peteraldhous/trump-state-of-the-union-words
HTML
5
star
51

presidential-language-notebooks

A collection of notebooks doing data analysis for BuzzFeed stories about presidential speech.
5
star
52

2017-04-fake-news-ad-trackers

Data and analysis supporting portions of the BuzzFeed News article, "Fake News, Real Ads," published April 4, 2017.
Jupyter Notebook
5
star
53

2017-09-science-foia

The FOIA logs referenced in the Sep. 2, 2017 BuzzFeed News article, "These Scientists Got To See Their Competitors’ Research Through Public Records Requests." https://www.buzzfeed.com/teresalcarey/when-scientists-foia
5
star
54

2016-01-texas-municipal-courts

Data and analysis regarding criminal cases in Texas municipal courts. The findings are referenced in the BuzzFeed News article, "The Ticket Machine", published January 26, 2016.
Jupyter Notebook
4
star
55

2016-10-white-ancestry-and-trump-support

Data and analytic code supporting the October 9, 2016 BuzzFeed News article, "Inside The White Vote: Ethnic Germans, Italians Favor Trump And The GOP, Poll Finds."
Jupyter Notebook
4
star
56

2015-05-talkpay-tweets

A *Very Rough* Analysis of #talkpay Tweets
4
star
57

2016-04-bernie-sanders-donors

Data and code supporting the BuzzFeed News article, "How Bernie Sanders Raises All That Money," published on April 19, 2016.
Jupyter Notebook
4
star
58

2015-07-primates

The data and analysis referenced in the July 7, 2015 BuzzFeed News article, "The Silent Monkey Victims Of The War On Terror." http://www.buzzfeed.com/peteraldhous/the-monkey-victims-of-the-war-on-terror
R
4
star
59

2014-09-tuition-and-minimum-wage

Data, sourcing notes, and analysis for the September 4, 2014 BuzzFeed article, "These Charts Show How Much College A Minimum Wage Job Paid For, Then And Now," by Greg Schoofs.
4
star
60

2020-04-cms-nursing-homes

Analysis of CMS Nursing Home Compare data
Jupyter Notebook
4
star
61

2015-11-lottery-simulations

Code to simulate the long-term net profit/loss of people who buy New York state lottery tickets on a regular basis.
4
star
62

2015-02-texas-cpa-deficiencies

Data and code used to analyze deficiency rates among Texas foster care child placing agencies. This analysis is referenced in the Feb. 20, 2015 BuzzFeed News article, "Fostering Profits."
Python
3
star
63

us-land-cover

Interactive display of 2016 US National Land Cover database from https://www.mrlc.gov/
HTML
3
star
64

2015-06-ssm-and-abortion-poll

BuzzFeed/Ipsos same-sex marriage and abortion poll data and analysis.
3
star
65

2016-08-sports-gender-gaps

Data and R code to reproduce the analysis and graphics in the August 22, 2016 article "How Katie Ledecky Stacks Up Against Male Swimmers."
HTML
3
star
66

2016-12-transgender-rights-survey

Data and analysis supporting the BuzzFeed News article, "This Is How 23 Countries Feel About Transgender Rights," published December 29, 2016.
Jupyter Notebook
3
star
67

2019-06-democratic-candidates-twitter

Data and R code underlying the Jun. 6, 2019 BuzzFeed News quiz "Can You Guess These Presidential Candidates Based On What They Tweet About?" https://www.buzzfeednews.com/article/peteraldhous/2020-election-democratic-primary-tweets
HTML
3
star
68

2015-12-H-2-visas-and-experience-requirements

Data and analysis supporting several passages in the BuzzFeed News article, "All You Americans Are Fired," published December 1, 2015.
Python
3
star
69

2017-12-sexual-misconduct-cable-news-coverage

Code, data, analysis, and graphs supporting the BuzzFeed News article, "What Sexual Misconduct Allegations Are Getting The Most Attention On Cable News?" published on Dec. 10, 2017
Jupyter Notebook
3
star
70

2017-01-causes-of-warming

The data and R code to reproduce the graphics in the Jan. 18, 2017 BuzzFeed News article, "2016 Was The Hottest Year. Yes, Greenhouse Gases Are To Blame." https://www.buzzfeed.com/peteraldhous/blame-co2-for-record-heat
HTML
3
star
71

2016-07-athletic-performances

Data and R code to reproduce the graphics in the July 30, 2016 BuzzFeed News article "Why Track-And-Field Stars Don’t Set World Records Like They Used To (But Swimmers Do)"
HTML
2
star
72

2020-10-electoral-college-effect-by-demographic

Jupyter Notebook
2
star
73

2018-12-wechat-pence

Data, methodologies, and graphics associated with the BuzzFeed News article, "China’s Censors Give Anti-Trump And Anti-US Rhetoric A Pass On WeChat", published December 19, 2018
Jupyter Notebook
2
star
74

2022-04-registries

Analysis that supports the investigation The Blacklist published on April 27, 2022 on BuzzFeed News
Jupyter Notebook
2
star
75

2018-10-midterm-demographics

The data and code for the BuzzFeed News article "Black Voters Are Underrepresented In This Year’s Biggest House Races"
Jupyter Notebook
2
star
76

2017-01-immigration-and-science

R code to recreate the graphics in the January 31, 2017 BuzzFeed News article "These Nobel Prizewinners Show Why Immigration Is So Important For American Science." https://www.buzzfeed.com/peteraldhous/immigration-and-science
HTML
2
star
77

2016-01-port-arthur-arrests

Data, methodologies, and analysis associated with the BuzzFeed News article, "The Ticket Machine", published January 26, 2016.
Jupyter Notebook
2
star
78

2021-05-tx-winter-storm-deaths

Data and R code to reproduce the analysis underlying "The Graveyard Doesn't Lie," a May 26, 2021 BuzzFeed News article on the excess deaths caused by the February 2021 winter storm and power outages in Texas.
HTML
2
star
79

2018-09-ftc-analysis

Data, analytic code, and findings that support portions of the BuzzFeed News article, “She Paid A Lawyer Thousands Of Dollars To Apply For A Green Card. Then She Got A Deportation Order Instead.,” published September 29, 2018.
Jupyter Notebook
2
star
80

2015-12-the-coyote

Analysis supporting several passages in the BuzzFeed News article, "The Coyote," published December 29, 2015.
Jupyter Notebook
2
star
81

2016-11-voter-power-by-demographic

Data and code supporting the BuzzFeed News article, "How The Electoral College Screws Hispanic And Asian Voters" published on November 7, 2016.
Jupyter Notebook
1
star
82

2015-12-mass-shooting-intervals

Data and code behind BuzzFeed News' Dec. 2, 2015 post regarding the number of days elapsed between mass shootings.
1
star
83

2020-08-nypd-march

Jupyter Notebook
1
star
84

2019-10-social-sentinel

Data and R code underlying the Oct. 31, 2019 BuzzFeed News article "Your Dumb Tweets Are Getting Flagged To People Trying To Stop School Shootings" https://www.buzzfeednews.com/article/lamvo/social-sentinel-school-officials-shootings-flag-social-media
HTML
1
star
85

2020-11-covid-election

HTML
1
star
86

2020-09-patchwork-pandemic

HTML
1
star
87

2016-05-H-2-debarments-and-violations

Data, code, and methodologies supporting the May 12, 2016 BuzzFeed News article, "The Pushovers."
Jupyter Notebook
1
star
88

2016-11-bellwether-counties

Analysis supporting the BuzzFeed News article, "How Well Does Your County Predict The Next President?" published on November 2, 2016
Jupyter Notebook
1
star
89

2016-12-warmest-year

The data and R code to reproduce the graphics in the Dec. 20, 2016 BuzzFeed News article, "2016 Will Be The Warmest Year, But This Is How Deniers Will Spin It." https://www.buzzfeed.com/peteraldhous/another-hottest-year
HTML
1
star
90

2021-09-guardianship-filings

Data and analysis re. US adult guardianship filing counts, supporting a BuzzFeed News investigation.
Jupyter Notebook
1
star
91

2016-07-rnc-dnc-surveillance-planes

Data to support the findings reported in the July 24, 2016 BuzzFeed News article "The Republican Convention Was Secretly Watched From Above" and the August 1, 2016 article "Government Spy Planes Circled Over The Democratic Convention More Intensely Than GOP Event."
1
star
92

2016-09-shy-trumpers

Data and R code to reproduce the analysis and graphics in the September 16, 2016 article "Why 'Shy Trumpers' Probably Won't Decide The Election."
HTML
1
star
93

2019-07-fec-daily-donors

Data and code supporting a BuzzFeed News article examining the number of donors per day among Democratic presidential candidates in the second quarter of the 2020 election cycle
Jupyter Notebook
1
star
94

2018-08-charlottesville-twitter-trolls

Data and R code to reproduce the analysis and graphics in the Aug. 10, 2018 BuzzFeed News post "Russian Trolls Swarmed The Charlottesville March — Then Twitter Cracked Down" https://www.buzzfeednews.com/article/peteraldhous/russia-twitter-trolls-charlottesville
HTML
1
star
95

2020-06-leso-1033-transfers-since-ferguson

Data and code supporting BuzzFeed News' analysis of "1033" program transfers to local law enforcement agencies since Ferguson.
Jupyter Notebook
1
star
96

2017-11-gun-sales-estimates

The analysis referenced in the Nov. 3, 2017 BuzzFeed News article "Under Trump, Gun Sales Did Not Spike After The Las Vegas Shooting." https://www.buzzfeed.com/peteraldhous/gun-sales-after-vegas-shooting
HTML
1
star
97

2016-04-republican-donor-movements

Data and code supporting the BuzzFeed News article, "Bush, Rubio Donors Didn’t Rush To Support Ted Cruz In March," published on April 26, 2016.
Jupyter Notebook
1
star
98

2017-09-harvey-emissions-update

Data and analysis supporting BuzzFeed News' reporting on Harvey-related industrial emissions in Texas.
HTML
1
star
99

2017-11-federal-employee-diversity

Data and analysis supporting the BuzzFeed News article, "The Agriculture Department Hired More Than 50 Political Appointees. They All Say They're White.," published November 15, 2017
Jupyter Notebook
1
star
100

2016-11-philadelphia-turnout

Data and code supporting the BuzzFeed News article, "Data Shows Sharp Drop Among Black Voters In Philadelphia" published on November 19, 2016
Jupyter Notebook
1
star