Example code and data for "Practical Data Science with R" 2nd Edition by Nina Zumel and John Mount.
We are very proud to present early access to our book Practical Data Science with R 2nd Edition.
This is the book for you if you are a data scientist, want to be a data scientist, or want to work with data scientists. This is a good "what next" book for analysts and programmers wanting to know more about machine learning and data wrangling.
Our goal is to present data science from a pragmatic, practice-oriented viewpoint. The book will complement other analytics, statistics, machine learning, data science and R books with the following features:
- This book teaches you how to work as a data scientist. Learn how important listening, collaboration, honest presentation and iteration are to what we do.
- The key emphasis of the book is process: collecting requirements, loading data, examining data, building models, validating models, documenting and deploying models to production.
- We provide over 10 significant example datasets, and demonstrate the concepts that we discuss with fully worked exercises using standard R methods. We feel that this approach allows us to illustrate what we really want to teach and to demonstrate all the preparatory steps necessary to any real-world project. Every result and almost every graph in the book is given as a fully worked example.
- This book is scrupulously correct on statistics, but presents topics in the context and order a practitioner worries about them. For example, we emphasize construction of predictive models, model evaluation, and prediction over the more standard topics of summary statistics and packaged procedures.
In support of Practical Data Science with R 2nd Edition we are providing:
- Table of contents, and a free example chapter available from the Manning book page.
- A public repository of data sets (under a Creative Commons Attribution-NonCommercial 3.0 Unported License where possible).
- Downloadable example code.
The first edition is available in print as 416 pages softbound black and white or as a color eBook. The print version comes with a complimentary eBook version (an insert when the book is purchased new), in all three formats: PDF, ePub, and Kindle. The eBook can be purchased separately from Manning Publications. The second edition is under preview subscription (or MEAP, Manning Early Access Program) and includes an eBook copy of the previous edition (Practical Data Science with R First Edition) at no additional cost!
Available for order now on the Manning book page.
For more about the book please check out:
Additional materials
- An excerpt showing how to install the required software and packages: Starting_with_R_and_Other_Tools.pdf.
- The support site (code and data): GitHub WinVector/PDSwR2.
- Manning's book page.
- A discount code!
All code excerpts from the book:
More from the authors:
- Win-Vector blog
- WinVectorLLC on Twitter
- Nina Zumel homepage
- John Mount homepage
- Win-Vector data science consulting services
Example data sets:
This repository includes works derived from others (data sets), which remain controlled by them. We are distributing these as the parties have allowed, and are not making any claim or grant of additional rights or license.
- bioavailability Synthetic simple ADME data (source).
- Bookdata Book ratings (source).
- Buzz Discussion forum popularity (source).
- CDC US CDC birth statistics (source).
- Custdata Synthetic example data derived from Census PUMS data to demonstrate retail related plots.
- KDD2009 Credit account prediction (source).
- PUMS US Census PUMS (source).
- Protein Dietary protein sources across multiple countries (source).
- SQLExample Synthetic example data relating price to hotel reservation pickup.
- Spambase Email spam classification (source).
- Statlog German loan defaults (source).
- UCICar Synthetic car ratings (source).
Download
You can download all of the examples and code by following the "git clone" or "download zip" instructions at our master repository: github.com/WinVector/PDSwR2.
We share some installation instructions here.
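The "git clone" route mentioned above can be sketched as follows. This is a minimal example, assuming `git` is installed and on your path; the repository address comes from the text above, while the local directory name and the `fetch_examples` helper are illustrative names of our own choosing, not part of the book's materials.

```shell
# Repository address as given in the download instructions above.
REPO_URL="https://github.com/WinVector/PDSwR2.git"

# Hypothetical local directory name; change it to suit your setup.
TARGET_DIR="PDSwR2"

# Illustrative helper: clone the repository unless the target
# directory already exists, so re-running it is harmless.
fetch_examples() {
  if [ -d "$TARGET_DIR" ]; then
    echo "Directory $TARGET_DIR already exists; skipping clone."
  else
    git clone "$REPO_URL" "$TARGET_DIR"
  fi
}

# Uncomment the next line to actually download the examples.
# fetch_examples
```

After running `fetch_examples`, the example code and data sets should be under the `PDSwR2` directory. If you prefer not to use git, the "download zip" option on the repository page gives the same files.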
License for additional documentation, notes, code, and example data:
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
No guarantee, indemnification or claim of fitness is made regarding any of these items.
No claim of license on works of others or derived data.
Errata