Policy Change Index for China (PCI-China)

Website: policychangeindex.org

Authors: Julian TszKin Chan and Weifeng Zhong

Please email all comments/questions to julian.chan [AT] policychangeindex.org or weifeng.zhong [AT] policychangeindex.org

What is the Policy Change Index for China (PCI-China)?

China's industrialization process has long been a product of government direction, be it coercive central planning or ambitious industrial policy. For the first time in the literature, we develop a quantitative indicator of China's policy priorities over a long period of time, which we call the Policy Change Index for China (PCI-China). The PCI-China is a leading indicator that runs from 1951 to the most recent quarter and can be updated in the future. In other words, the PCI-China not only helps us understand the past of China's industrialization but also allows us to make short-term predictions about its future directions.

The design of the PCI-China has two building blocks: (1) it takes as input data the full text of the People's Daily --- the official newspaper of the Communist Party of China --- since it was founded in 1946; (2) it employs a set of machine learning techniques to "read" the articles and detect changes in the way the newspaper prioritizes policy issues.

The source of the PCI-China's predictive power rests on the fact that the People's Daily is at the nerve center of China's propaganda system and that propaganda changes often precede policy changes. Before the great transformation from the central planning under Mao to the economic reform program after Mao, for example, considerable efforts were made by the Chinese government to promote the idea of reform, move public opinion, and mobilize resources toward the new agenda. Therefore, by detecting (real-time) changes in propaganda, the PCI-China is, effectively, predicting (future) changes in policy.

For details about the methodology and findings of this project, please see the following research paper:

Disclaimer

Results will change as the underlying models improve. A fundamental reason for adopting open source methods in this project is so that people from all backgrounds can contribute to the models that our society uses to assess and predict changes in public policy; when community-contributed improvements are incorporated, the model will produce better results.

Getting Started

The first step for everyone (users and developers) is to open a free GitHub account. You can then choose how you want to "watch" the PCI-China repository by clicking the Watch button in the upper-right corner of the repository's main page.

The second step is to get familiar with the PCI-China repository by reading the documentation.

If you want to ask a question or report a bug, create a new issue here and post your question or tell us what you think is wrong with the repository.

If you want to request an enhancement, create a new issue here and provide details on what you think should be added to the repository.

Installation Guide

First, install the dependencies and set up the proper environment by running the following command in the shell:

./PCI-China>conda env create -f environment.yml

Second, activate the new environment pci_env:

./PCI-China>conda activate pci_env

Third, run the following in the pci_env environment:

./PCI-China>sh run_all.sh

The above command will perform the following tasks: (1) processing data, (2) training models for two-, five-, and ten-year rolling windows, (3) compiling results, (4) creating text output, and (5) visualizing results.

If you do not have the People's Daily data, you can run our tests, which estimate a PCI using a simulated data set:

./PCI-China>pytest 

Notes

  • The default setting uses the first GPU to run the code. If you don't have a GPU, the code can be run on the CPU by changing the GPU setting to -1 (see details below).
  • One of the packages imported by PCI (jieba-fast) requires the Visual Studio C++ Build Tools. Please check out jieba-fast's website for details.
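
The notes above do not say how the GPU setting is applied internally. One common convention with CUDA-based frameworks such as TensorFlow is to control device visibility through the CUDA_VISIBLE_DEVICES environment variable, and the sketch below illustrates that convention only. The helper name select_device is hypothetical and is not part of this repository; whether pci.py works this way is an assumption.

```python
import os


def select_device(gpu: int) -> str:
    """Expose a single GPU by index, or hide all GPUs when gpu is negative.

    Assumption: device visibility is controlled through CUDA_VISIBLE_DEVICES.
    This helper is hypothetical and is not taken from pci.py.
    """
    # An empty value hides every CUDA device, so the framework falls back to CPU.
    value = "" if gpu < 0 else str(gpu)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# -1 mirrors the "no GPU" setting described in the notes above.
print(repr(select_device(-1)))
```

For the variable to have any effect, it has to be set before the deep learning framework initializes CUDA, which in practice means before the framework is imported.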

Function Usage

The Python scripts and the R script listed below are called in the run_all.sh file. They are also available for users to perform the following tasks individually:

  • proc_pd.py: Process and prepare the raw data from the People's Daily for building the neural network models.
  • pci.py: Train a neural network model to construct the PCI-China for a specified year-quarter, using a specified rolling window length.
  • compile_tuning.py: Compile the results from all models and export them to a .csv file.
  • create_text_output.py: Generate the raw data together with the model's classification result for each article in a specified year-quarter.
  • gen_figures.R: Generate figures.
  • create_plotly.py: Create an interactive Plotly figure.

For the pci.py file, users can also check out the descriptions of the arguments for the function using the --help option:

./PCI-China>python pci.py --help
Using TensorFlow backend.
usage: pci.py [-h] [--model MODEL] [--year YEAR] [--month MONTH] [--gpu GPU]
              [--iterator ITERATOR] [--root ROOT] [--temperature TEMPERATURE]
              [--discount DISCOUNT] [--bandwidth BANDWIDTH]

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         Model name: window_5_years_quarterly,
                        window_10_years_quarterly, window_2_years_quarterly
  --year YEAR           Target year
  --month MONTH         Target month
  --gpu GPU             Which gpu to use
  --iterator ITERATOR   Iterator in simulated annealing
  --root ROOT           Root directory
  --temperature TEMPERATURE
                        Temperature in simulated annealing
  --discount DISCOUNT   Discount factor in simulated annealing
  --bandwidth BANDWIDTH
                        Bandwidth in simulated annealing
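
Based on the help text above, the training step for the three rolling windows could be scripted roughly as follows. This is a minimal sketch: it only builds and prints the command lines, the helper build_commands is hypothetical, and the year, month, and GPU values are illustrative. Which month values are valid for a quarterly model is not stated in the help text, so check the repository before running anything.

```python
import shlex

# Model names as listed in the `pci.py --help` output.
MODELS = [
    "window_2_years_quarterly",
    "window_5_years_quarterly",
    "window_10_years_quarterly",
]


def build_commands(year, month, gpu=0):
    """Return one `python pci.py ...` argument list per model (nothing is executed)."""
    return [
        ["python", "pci.py", "--model", model, "--year", str(year),
         "--month", str(month), "--gpu", str(gpu)]
        for model in MODELS
    ]


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_commands(2018, 1):
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run from the PCI-China folder once the environment and data are in place.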

Data

The raw data of the People's Daily, which are not provided in this repository, should be placed in the sub-folder PCI-China/Input/pd/. Each file in this sub-folder should contain one year-quarter of data, be named by the respective year-quarter, and be in the .pkl format. For example, the raw data for the first quarter of 2018 should be in the file 2018_Q1.pkl. Below is the list of column names and types of each raw data file:

>>> import pandas as pd
>>> df1 = pd.read_pickle("./PCI-China/Input/pd/pd_1946_1975.pkl")
>>> df1.dtypes
date     datetime64[ns]
year              int64
month             int64
day               int64
page              int64
title            object
body             object
id                int64
dtype: object

where title and body are the Chinese texts of the title and body of each article.
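
For users preparing their own data, the naming rule and the column list above can be checked with a few lines of Python. This is a sketch under stated assumptions: the helpers quarter_filename and schema_problems are hypothetical and not part of the repository, and the check works on a plain mapping of column names to dtype strings so that it runs without pandas.

```python
# Column names and dtypes as listed in the dtypes output above.
EXPECTED_DTYPES = {
    "date": "datetime64[ns]",
    "year": "int64",
    "month": "int64",
    "day": "int64",
    "page": "int64",
    "title": "object",
    "body": "object",
    "id": "int64",
}


def quarter_filename(year: int, quarter: int) -> str:
    """Return the raw-data file name for a year-quarter, e.g. 2018_Q1.pkl."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return f"{year}_Q{quarter}.pkl"


def schema_problems(dtypes: dict) -> list:
    """List missing columns and dtype mismatches; an empty list means no problems."""
    problems = []
    for col, expected in EXPECTED_DTYPES.items():
        if col not in dtypes:
            problems.append(f"missing column: {col}")
        elif dtypes[col] != expected:
            problems.append(f"{col}: expected {expected}, got {dtypes[col]}")
    return problems


print(quarter_filename(2018, 1))
print(schema_problems(dict(EXPECTED_DTYPES, page="float64")))
```

With pandas installed, the mapping for a loaded file can be obtained with df1.dtypes.astype(str).to_dict().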

The processed data of the People's Daily, which are not provided in this repository, should be stored in the SQLite file PCI-China/data/Output/database.db. The schema of the main table in the database is shown below:

import sqlite3
import pandas as pd 

conn = sqlite3.connect("data/output/database.db")
pd.read_sql_query("PRAGMA TABLE_INFO(main)", conn)
    cid  name                           type       notnull  dflt_value  pk
 0    0  date                           TIMESTAMP        0  None         0
 1    1  id                             INTEGER          0  None         0
 2    2  page                           REAL             0  None         0
 3    3  title                          TEXT             0  None         0
 4    4  body                           TEXT             0  None         0
 5    5  strata                         INTEGER          0  None         0
 6    6  title_seg                      TEXT             0  None         0
 7    7  body_seg                       TEXT             0  None         0
 8    8  year                           INTEGER          0  None         0
 9    9  quarter                        INTEGER          0  None         0
10   10  month                          INTEGER          0  None         0
11   11  day                            INTEGER          0  None         0
12   12  weekday                        INTEGER          0  None         0
13   13  frontpage                      INTEGER          0  None         0
14   14  page1to3                       INTEGER          0  None         0
15   15  title_len                      INTEGER          0  None         0
16   16  body_len                       INTEGER          0  None         0
17   17  n_articles_that_day            INTEGER          0  None         0
18   18  n_pages_that_day               REAL             0  None         0
19   19  n_frontpage_articles_that_day  INTEGER          0  None         0

where title_int and body_int are the word embeddings (numeric vectors) of the title and body of each article.
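
As an illustration of querying a table with this schema, the sketch below builds an in-memory stand-in for database.db with a subset of the columns of main and counts the articles in one year-quarter. The rows are made up, the helper count_articles is hypothetical, and treating frontpage as a 0/1 indicator is an assumption based on the column name.

```python
import sqlite3

# In-memory stand-in for database.db with a subset of the columns of `main`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main (id INTEGER, title TEXT, year INTEGER, "
    "quarter INTEGER, frontpage INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO main VALUES (?, ?, ?, ?, ?)",
    [(1, "a", 2018, 1, 1), (2, "b", 2018, 1, 0), (3, "c", 2018, 2, 1)],
)


def count_articles(conn, year, quarter):
    """Return (total articles, front-page articles) for one year-quarter."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(frontpage), 0) FROM main "
        "WHERE year = ? AND quarter = ?",
        (year, quarter),
    ).fetchone()
    return tuple(row)


print(count_articles(conn, 2018, 1))  # → (2, 1)
```

Against the real file, the same function can be used after replacing ":memory:" with the path to database.db and skipping the CREATE and INSERT statements.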

The summary statistics for the processed data can be found in the following .csv file:

https://github.com/PSLmodels/PCI-China/blob/master/PCI-China/figures/Summary%20statistics.csv

Neither the raw data nor the processed data of the People's Daily can be released by the authors. Users who have questions about applying the repository to their own data are welcome to contact the authors:

Citing the PCI-China

Please cite the website as the source of the latest PCI-China: https://policychangeindex.org.

For academic work, please cite the following research paper:
