dlab-berkeley/Machine-Learning-in-R

This repository has been archived on 06/May/2022
Stars: 187
Language: CSS
License: Other
Created: almost 8 years ago
Updated: over 3 years ago


Workshop (6 hours): preprocessing, cross-validation, lasso, decision trees, random forests, xgboost, and SuperLearner ensembles

See the Fall 2020 tidymodels update!

https://github.com/dlab-berkeley/Machine-Learning-with-tidymodels

Machine Learning in R

This is the repository for D-Lab’s Introduction to Machine Learning in R workshop. View the associated slides here.


Content outline

- Background on machine learning
  - Classification vs regression
  - Performance metrics
- Data preprocessing
  - Missing data
  - Train/test splits
- Algorithm walkthroughs
  - Lasso
  - Decision trees
  - Random forests
  - Gradient boosted machines
  - SuperLearner ensembling
  - Principal component analysis
  - Hierarchical agglomerative clustering
- Challenge questions

Getting started

Please follow the notes in participant-instructions.md.

HAVE FUN! :^)

The seven algorithm R Markdown files (lasso, decision tree, random forest, xgboost, SuperLearner, PCA, and clustering) are designed to work independently of one another.

After installing and loading the packages listed in 01-overview.Rmd, run all of the code in 02-preprocessing.Rmd to preprocess the data. Then open any of the seven algorithm R Markdown files and use "Run All" to see the results and visualizations!

Assumed participant background

We assume that participants are familiar with:

- Basic R syntax
- Statistical concepts such as mean and standard deviation

Technology requirements

Please bring a laptop with the following:

- R version 3.5 or greater
- The RStudio integrated development environment (IDE), which is highly recommended but not required

Resources

Browse resources listed on the D-Lab Machine Learning Working Group repository. Scroll down to see code examples in R and Python, books, courses at UC Berkeley, online classes, and other resources and groups to help you along your machine learning journey!

Slideshow

The slides were made using xaringan, which is a wrapper for remark.js. Check out Chapter 7 if you are interested in making your own! The theme borrows from Brad Boehmke's presentation on Decision Trees, Bagging, and Random Forests - with an example implementation in R.

Other repositories from dlab-berkeley

Computational-Social-Science-Training-Program

This course is a rigorous, year-long introduction to computational social science. We cover topics spanning reproducibility and collaboration, machine learning, natural language processing, and causal inference. This course has a strong applied focus with emphasis placed on doing computational social science.

Jupyter Notebook

Python-Fundamentals-Legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

Jupyter Notebook

R-Fundamentals-Legacy

D-Lab's 12-hour introduction to R fundamentals. Learn how to create variables and functions, manipulate data frames, make visualizations, use control flow structures, and more, using R in RStudio.

Bash-Git

D-Lab's 3-hour introduction to basic Bash commands and to version control with Git and GitHub.

R-Deep-Learning

Workshop (6 hours): Deep learning in R using Keras. Building & training deep nets, image classification, transfer learning, text analysis, visualization

git-fundamentals

A starting point for discovering the wonderful world of Git, GitHub, and Git Annex (Assistant)

Stata-Fundamentals

D-Lab's 9-hour introduction to performing data analysis with Stata. Learn how to program, conduct data analysis, create visualizations, and conduct statistical analyses in Stata.

python-for-everything

Materials for teaching the Python for Everything workshop at UC Berkeley's D-Lab

Jupyter Notebook

Python-Machine-Learning

D-Lab's 6-hour introduction to machine learning in Python. Learn how to perform classification, regression, clustering, and model selection using scikit-learn in Python.

Jupyter Notebook

MachineLearningWG

D-Lab's Machine Learning Working Group at UC Berkeley, with supervised & unsupervised learning tutorials in R and Python

Python-Geospatial-Fundamentals-Legacy

D-Lab's 6-hour introduction to working with geospatial data in Python. Learn how to import, visualize, and analyze geospatial data using GeoPandas in Python.

Jupyter Notebook

Python-Data-Visualization-Legacy

D-Lab's 3-hour introduction to data visualization with Python. Learn how to create histograms, bar plots, box plots, scatter plots, compound figures, and more, using matplotlib and seaborn.

Jupyter Notebook

R-Geospatial-Fundamentals-Legacy

This is the repository for D-Lab's Geospatial Fundamentals in R with sf workshop.

Jupyter Notebook

Python-Data-Wrangling-Legacy

D-Lab's 3-hour introduction to data wrangling in Python. Learn how to import and manipulate dataframes using pandas in Python.

Jupyter Notebook

R-Machine-Learning-Legacy

D-Lab's 6-hour introduction to machine learning in R. Learn the fundamentals of machine learning, regression, and classification, using tidymodels in R.

Unsupervised-Learning-in-R

Workshop (6 hours): Clustering (HDBSCAN, LCA, HOPACH), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).

python-berkeley

Python resources from Berkeley, curated in one place

Jupyter Notebook

Python-Text-Analysis-Fundamentals

D-Lab's 9-hour introduction to text analysis with Python. Learn how to perform bag-of-words, sentiment analysis, topic modeling, word embeddings, and more, using scikit-learn, NLTK, gensim, and spaCy in Python.

Jupyter Notebook

R-Data-Wrangling-Legacy

D-Lab's 6-hour introduction to data wrangling with R. Learn how to manipulate dataframes using the tidyverse in R.

python-data-from-web

API and web scraping workshops

Jupyter Notebook

R-Data-Visualization-Legacy

D-Lab's 3-hour introduction to data visualization with R. Learn how to create histograms, bar plots, box plots, scatter plots, compound figures, and more using ggplot2 and cowplot.

R-Functional-Programming

The joy and power of functional programming in R

python-text-analysis-legacy

Text Analysis Workshops for UC Berkeley's D-Lab

Jupyter Notebook

programming-fundamentals

Introduction to Programming for UC Berkeley's D-Lab

ANN-Fundamentals

Jupyter Notebook

DIGHUM101-2020

Jupyter Notebook

Python-Text-Analysis

D-Lab's 12-hour introduction to text analysis with Python. Learn how to perform bag-of-words, sentiment analysis, topic modeling, word embeddings, and more, using scikit-learn, NLTK, Gensim, and spaCy in Python.

Jupyter Notebook

sql-for-r-users

SQL for R Users workshop

Python-Deep-Learning-Legacy

D-Lab's 6-hour introduction to deep learning in Python. Learn how to create and train neural networks using TensorFlow and Keras.

Jupyter Notebook

awesome-dlab

😎 Awesome lists about all kinds of topics and tools interesting to D-Labbers

advanced-data-wrangling-in-R-legacy

Advanced Data Wrangling in R workshop

R-Census-Data-Legacy

Workshop on fetching and mapping census data with tidycensus

Geospatial-Fundamentals-in-QGIS

regular-expressions-in-python

Jupyter Notebook

Qualtrics-Fundamentals

D-Lab's 3-hour introduction to Qualtrics fundamentals. Learn how to design and manage your own surveys in Qualtrics.

Data-Science-Social-Justice-2022

Materials for D-Lab / UC Berkeley Graduate Division's Data Science + Social Justice summer workshop. These materials provide an introduction to Python, natural language processing, text analysis, word embeddings, and network analysis. They also include discussions on critical approaches to data science to promote social justice.

Jupyter Notebook

Geocoding-in-R

Python-Data-Wrangling

D-Lab's 3-hour workshop diving deep into Pandas. Learn how to manipulate, index, merge, group, and plot data frames using Pandas functions.

Jupyter Notebook

efficient-reproducible-project-management-in-R

Efficient and Reproducible Project Management in R

Excel-Fundamentals

D-Lab's six-hour introduction to the basics of Microsoft Excel (with support materials for Google Sheets). Learn Excel functions for handling text, math, dates, logic, and calculations; learn to create charts and pivot tables.

fairML

Bias and Fairness in ML workshop

Jupyter Notebook

Python-Web-Scraping-Legacy

D-Lab's 3-hour introduction to web scraping in Python. Learn how to use APIs and scrape data from websites using the New York Times API and BeautifulSoup in Python.

Jupyter Notebook

regex-intro

Geospatial-Fundamentals-in-R-sp

Leaflet-Maps-in-R

A 3-hour intensive workshop to introduce the R Leaflet package

javascript-viz

A D-Lab intro to JavaScript visualization using the IPython notebook.

DIGHUM101-2023

Practicing the Digital Humanities, UC Berkeley Summer Session 2023

Jupyter Notebook

LaTeX-Fundamentals

DIGHUM101-2021

Jupyter Notebook

cloud-computing-working-group

data-security-fundamentals

Data Security Fundamentals

Python-Fundamentals

D-Lab's 3-part, 6-hour introduction to Python. Learn how to create variables, distinguish data types, use methods, and work with Pandas, using Python and Jupyter.

Jupyter Notebook

Python-Web-APIs

D-Lab's 2-hour introduction to using web APIs in Python. Learn how to obtain data from web platforms using the New York Times API as a case study.

Jupyter Notebook

quick-consulting-examples

Collection of quick pandas, python, and other coding examples based on real consulting requests.

Jupyter Notebook

dlab-berkeley.github.io

Tech overview site showcasing D-Lab's online offerings

visualization-in-Excel

Python-Web-Scraping

D-Lab's 2-hour introduction to web scraping in Python. Learn how to scrape HTML/CSS data from websites using Requests and Beautiful Soup.

Jupyter Notebook

Data-Science-Social-Justice

Materials for D-Lab / UC Berkeley Graduate Division's Data Science for Social Justice summer workshop. These materials provide an introduction to Python, natural language processing, text analysis, word embeddings, and network analysis. They also include discussions on critical approaches to data science to promote social justice.

Jupyter Notebook

DIGHUM101-2022

Practicing the Digital Humanities, UC Berkeley Summer Session 2022

Jupyter Notebook

Python-Geospatial-Fundamentals

D-Lab's 4-hour introduction to working with geospatial data in Python. Learn how to import, visualize, and analyze geospatial data in Python.

Jupyter Notebook

Basics-of-Excel

intro-maxqda

Python-Intermediate

D-Lab's 3-part, 6-hour workshop diving deeper into Python. Learn how to create functions, use if-statements and for-loops, and work with Pandas, using Python and Jupyter.

Jupyter Notebook

R-Data-Visualization

D-Lab's 2-hour introduction to data visualization with R. Learn how to create histograms, bar charts, box plots, scatter plots, and more using ggplot2.

IRB-Fundamentals

D-Lab's 3-hour introduction to the fundamentals of navigating Institutional Review Boards (IRBs).

RStudio-Project-Management

Resources to help you start managing data science projects.

git-for-project-management

Using Git and GitHub for Project Management

R-package-development

R package development workshop

Git-Playground

This repository is for D-Lab workshops that require practicing with Git.

sas-intro

Introduction to SAS

R-Push-Ins

D-Lab's 4.5-hour "push-in" introduction to R, providing a brief survey of foundational R concepts and operations.

DEVP229-Spring2021

MAXQDA-Fundamentals

D-Lab's 2-hour introduction to MAXQDA. Learn how to conduct qualitative data analysis using MAXQDA.

sas-analysis

Data Analysis with SAS

R-Research-Design

ArcGIS-Online-Fundamentals

dlab-methods

Computational-Text-Analysis-2017

An introduction to computational text analysis in four 2-hour sessions, designed to help beginners build intuition and work with workflows for natural language processing and for supervised and unsupervised approaches. Created for CTAWG in 2017 by Ben Gebre-Medhin.

Python-Data-Visualization-Pilot

D-Lab's 4-hour introduction to data visualization with Python. Learn how to create histograms, bar plots, box plots, scatter plots, compound figures, and more, using matplotlib and seaborn.

Jupyter Notebook

HAAS-Python-Workshop

Jupyter Notebook