• Stars
    star
    3
  • Rank 3,944,563 (Top 79 %)
  • Language
    Python
  • License
    MIT License
  • Created over 2 years ago
  • Updated over 1 year ago

Repository Details

A web application that makes privacy policies more transparent and approachable. Features include privacy policy summarization, similarity analysis, readability scoring, and topic modeling.

More Repositories

1

Contemporary-US-Economic-Mobility-Analysis-in-R

Principles of Data Science

Part I. FiveThirtyEight data graphics

An R package that provides access to the code and data sets published by FiveThirtyEight (https://github.com/fivethirtyeight/data) was just made available to the public. The developers, Albert Kim and his colleagues, maintain a webpage for the package fivethirtyeight: https://rudeboybert.github.io/fivethirtyeight/ The data sets included are massive. You can find a list of these, including the URLs to the original fivethirtyeight.com articles, at https://rudeboybert.github.io/fivethirtyeight/articles/fivethirtyeight.html.

The task (Part I) is to choose one of the articles with data graphics and recreate one or more of the data graphics found in the article. Examples of such reports can be found at https://rudeboybert.github.io/fivethirtyeight/articles/

The report should be prepared in R Markdown and will consist of:
1. A technical discussion of your data wrangling and visualization statements;
2. A brief paragraph explaining the context of the data graphic you created.

Part II. Retrieve, explore, and analyze

This part of the task is to retrieve, explore, and analyze data in one of the topic areas. You will need to choose one from the American Time Use Survey data and the Economic Mobility data (see below).

Scope of the work

The final product will consist of:
1. Visualization or tabulation of the data (from either exploring or modeling),
2. Results of statistical tests for your hypothesis,
3. Modeling and predictions from statistical learning methods.

Report

The report consists of:
1. Proposed goals in your progress report,
2. Analysis (both code chunks and results),
3. Interpretation.

1. Economic Mobility data

We will look at economic mobility across generations in the contemporary USA. The data come from a large study [1], based on tax records, which allowed researchers to link the income of adults to the income of their parents several decades previously. For privacy reasons, we don't have that individual-level data, but we do have aggregate statistics about economic mobility for several hundred communities, containing most of the American population, and covariate information about those communities. We are interested in predicting economic mobility from the characteristics of communities. The data can be read using the following R code. There are 741 communities (observations) and 43 variables.

dat <- read.csv("mobility.csv")

The variable we want to predict is economic mobility; the rest are predictor variables or covariates.

1. Mobility: The probability that a child born in 1980–1982 into the lowest quintile (20%) of household income will be in the top quintile at age 30. Individuals are assigned to the community they grew up in, not the one they were in as adults.
2. Population in 2000.
3. Is the community primarily urban or rural?
4. Black: Percentage of individuals who marked black (and nothing else) on census forms.
5. Racial segregation: A measure of residential segregation by race.
6. Income segregation: Similarly, but for income.
7. Segregation of poverty: Specifically, a measure of residential segregation for those in the bottom quarter of the national income distribution.
8. Segregation of affluence: Residential segregation for those in the top quarter.
9. Commute: Fraction of workers with a commute of less than 15 minutes.
10. Mean income: Average income per capita in 2000.
11. Gini: A measure of income inequality, which would be 0 if all incomes were perfectly equal, and tends towards 100 as all the income is concentrated among the richest individuals.
12. Share 1%: Share of the total income of a community going to its richest 1%.
13. Gini bottom 99%: Gini coefficient among the lower 99% of that community.
14. Fraction middle class: Fraction of parents whose income is between the national 25th and 75th percentiles.
15. Local tax rate: Fraction of all income going to local taxes.
16. Local government spending: Per capita.
17. Progressivity: Measure of how much state income tax rates increase with income.
18. EITC: Measure of how much the state contributed to the Earned Income Tax Credit (a sort of negative income tax for very low-paid wage earners).
19. School expenditures: Average spending per pupil in public schools.
20. Student/teacher ratio: Number of students in public schools divided by number of teachers.
21. Test scores: Residuals from a linear regression of mean math and English test scores on household income per capita.
22. High school dropout rate: Also residuals from a linear regression of the dropout rate on per-capita income.
23. Colleges per capita.
24. College tuition: In-state, for full-time students.
25. College graduation rate: Again, residuals from a linear regression of the actual graduation rate on household income per capita.
26. Labor force participation: Fraction of adults in the workforce.
27. Manufacturing: Fraction of workers in manufacturing.
28. Chinese imports: Growth rate in imports from China per worker between 1990 and 2000.
29. Teenage labor: Fraction of those aged 14–16 who were in the labor force.
30. Migration in: Migration into the community from elsewhere, as a fraction of the 2000 population.
31. Migration out: Migration out of the community into other communities, as a fraction of the 2000 population.
32. Foreign: Fraction of residents born outside the US.
33. Social capital: Index combining voter turnout, participation in the census, and participation in community organizations.
34. Religious: Share of the population claiming to belong to an organized religious body.
35. Violent crime: Arrests per person per year for violent crimes.
36. Single motherhood: Number of single female households with children divided by the total number of households with children.
37. Divorced: Fraction of adults who are divorced.
38. Married: Fraction of adults who are married.
39. Longitude: Geographic coordinate for the center of the community.
40. Latitude: Geographic coordinate for the center of the community.
41. ID: A numerical code identifying the community.
42. Name: The name of the principal city or town.
43. State: The state of the principal city or town of the community.

[1] Chetty, Raj, Nathaniel Hendren, Patrick Kline and Emmanuel Saez (2014). "Where is the Land of Opportunity? The Geography of Intergenerational Mobility in the United States." Quarterly Journal of Economics, 129: 1553–1623. Finding and reading this paper does not actually help you.
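The Gini variable in the list above is described only by its endpoints (0 for perfectly equal incomes, approaching 100 as income concentrates at the top). As an illustration, here is a minimal Python sketch of one common way to compute such an index from individual incomes. The function name `gini_index` and the sample incomes are hypothetical, and the mobility data set already ships the index as a precomputed column, so this is not how the study derived it.

```python
def gini_index(incomes):
    """Return a Gini index on a 0-100 scale for a list of non-negative incomes.

    Uses the rank-weighted formula on sorted incomes:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with ranks i = 1..n. This is one standard estimator, assumed here
    for illustration only.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Weight each income by its rank in the sorted order.
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    gini = 2.0 * weighted / (n * total) - (n + 1) / n
    return 100.0 * gini


if __name__ == "__main__":
    print(gini_index([10, 10, 10, 10]))  # perfectly equal incomes
    print(gini_index([0, 0, 0, 100]))    # all income held by one person
```

With a finite sample of n people, the maximum of this estimator is 100 * (1 - 1/n), which is why the description says the index "tends towards" 100 rather than reaching it.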
3
star
2

PandemicTracker

HTML
2
star
3

YouTube_User_Radicalization_Research

A repository for the ongoing capstone project
Jupyter Notebook
1
star
4

Underpass-Water-Level-Prediction-LSTM

The Woojangchun underpass is located in Busan, South Korea. Flooding of the underpass has been a chronic problem during the summer rainy season. This project aims to accurately predict the tunnel water level from June to August to help better prepare for
Jupyter Notebook
1
star
5

YouTubeCrawler

To construct a network of children's videos, we crawl data starting from a single video for an input query. Data is acquired from a parent video and its recommended videos, which usually number 100 to 120. For instance, when crawling starts from one video, all of its data is scraped first, and the video IDs of its 100 to 120 recommended videos are then put into a queue so that the crawler moves on to each of them in order. By repeating this process for every video, we record the parent-child relationships between videos. Every time a video is crawled, its counter is updated so that the video is crawled no more than 10 times. The reason is that a parent video and a child video could keep recommending each other back and forth in a loop. Although this would indicate that the pair of videos is dominant in the network, we cap the loop to avoid unnecessary computation. The following data is acquired for each video: video ID, title, description, tags, channel title, published time, duration, comment count, dislike count, like count, view count, for-Kids Boolean, recommended videos, parent video, counter.
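The queue-and-counter procedure described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `crawl`, `fetch_video`, the `"recommended"` key, and the `max_steps` safety guard are all hypothetical names and assumptions, and the toy in-memory graph stands in for real YouTube requests.

```python
from collections import deque

MAX_VISITS = 10  # cap from the description: crawl a video at most 10 times


def crawl(seed_id, fetch_video, max_visits=MAX_VISITS, max_steps=10_000):
    """Breadth-first crawl starting from one seed video.

    fetch_video(video_id) is a hypothetical callable returning a dict with
    at least a "recommended" list of video IDs. max_steps is an assumed
    safety guard that is not part of the original description.
    """
    queue = deque([(seed_id, None)])  # (video ID, parent video ID)
    counters = {}   # video ID -> number of times it has been crawled
    edges = set()   # (parent ID, child ID) relationships
    records = {}    # video ID -> most recently fetched data
    steps = 0
    while queue and steps < max_steps:
        video_id, parent_id = queue.popleft()
        steps += 1
        if parent_id is not None:
            edges.add((parent_id, video_id))
        # Skip videos that already reached the cap, which breaks
        # parent-child recommendation loops.
        if counters.get(video_id, 0) >= max_visits:
            continue
        counters[video_id] = counters.get(video_id, 0) + 1
        data = fetch_video(video_id)
        records[video_id] = data
        for child_id in data["recommended"]:
            queue.append((child_id, video_id))
    return records, edges, counters


if __name__ == "__main__":
    # Toy graph: two videos that keep recommending each other.
    toy = {"a": {"recommended": ["b"]}, "b": {"recommended": ["a"]}}
    records, edges, counters = crawl("a", toy.__getitem__)
    print(counters)
    print(sorted(edges))
```

Storing edges in a set keeps each parent-child pair once even when the same recommendation is seen on repeated visits; a list or a weighted count would be the alternative if repeat frequency matters for the network analysis.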
Jupyter Notebook
1
star