Boston Machine Learning
Table of Contents
- Intro to Data Science
- Web Scraping
- Theano
- Data Visualization
- Semi-supervised Learning
- Dealing with Temporal Clinical Data
- RNNs and Hyperparameters
- Bayesian Methods
- Distributed Learning
- Techniques for Dimensionality Reduction
- Modeling Sensor Data
- Introduction to Markov Decision Processes
- Perception as Analysis by Synthesis
- Operationalizing Data Science Output
- GPU Accelerated Learning
- High Dimensional Function Learning
- Basketball Analytics Using Player Tracking Data
- TensorFlow in Practice
- Virtual Currency Trading
- NSFW Modeling with ConvNets
- Structured Attention Networks
- Automated Machine Learning
- Grounding Natural Language with Autonomous Interaction
- Neural Network Design Using RL
- AI for Enterprise
Intro to Data Science
- Imran Malek is a Solutions Architect at DataXu. His workshop introduced pandas and matplotlib. Slides | Notebook
Web Scraping
- Marcus Way is an SDE at Amazon and was previously a Software Engineer at Wanderu, a company that helps people find the lowest bus fares. This workshop took us through the process of acquiring data from the web before building a model to predict whether an article's title originated from Gawker or the Wall Street Journal. Notebook
Theano
- Alec Radford is the Head of Research at indico. His talk introduced Theano and convolutional networks. Video | Code
Data Visualization
- Lane Harrison is an Assistant Professor of Computer Science at WPI and was previously a Postdoc in the Visual Analytics Lab at Tufts. His workshop introduced data visualization with d3.js. Slides | Code
Semi-supervised Learning
- Eli Brown is an Assistant Professor of Computer Science at DePaul. His talk focused on using interactive visualizations to help users leverage learning algorithms. Slides | Paper
Dealing with Temporal Clinical Data
- Marzyeh Ghassemi is a PhD student at MIT CSAIL in the Clinical Decision Making Group. Her session introduced both Latent Dirichlet Allocation and Gaussian Processes before walking us through her recent paper entitled "A Multivariate Timeseries Modeling Approach to Severity of Illness Assessment and Forecasting in ICU with Sparse, Heterogeneous Clinical Data." Paper | Slides
RNNs and Hyperparameters
- Alec Radford: Using Passage to Train RNNs. Slides | Code | Video
- David Duvenaud: Gradient-Based Learning of Hyperparameters. Paper | Slides | Code
Bayesian Methods
- Allen Downey is a Professor of Computer Science at Olin College. His talk focused on an application of Bayesian statistics from World War II. Slides | Video
- José Miguel Hernández Lobato is a Postdoc at Harvard's Intelligent Probabilistic Systems Lab and presented on Bayesian optimization and information-based approaches. Slides
Distributed Learning
- Arno Candel is the Chief Architect at H2O. His talk focused on the implementation and application of distributed machine learning algorithms such as Elastic Net, Random Forest, Gradient Boosting, and Deep Neural Networks. Slides
Techniques for Dimensionality Reduction
- Dan Steinburg is a PhD student in intelligent systems at the University of Pittsburgh. His talk introduced various techniques for dimensionality reduction, including PCA, multidimensional scaling, Isomap, locally linear embedding, and Laplacian eigenmaps. Slides
Modeling Sensor Data
- Hank Roark is a Data Scientist at H2O, where he builds data products in the domains of machine prognostics, health management, and agriculture. His workshop focused on the challenges of modeling streaming sensor data. Slides | Notebook
Introduction to Markov Decision Processes
- Alborz Geramifard is a Research Scientist at Amazon and led an introductory workshop on MDPs with RLPy. Paper | Code | Slides
Perception as Analysis by Synthesis
- Tejas Kulkarni is a PhD student at MIT in Josh Tenenbaum's lab and spent last summer working at Google DeepMind in London. His talk focused on his recent paper entitled "Picture: A Probabilistic Programming Language for Scene Perception." Paper | Slides
Operationalizing Data Science Output
- Tom LaGatta is a Senior Data Scientist & Analytics Architect at Splunk. His session focused on aligning data science output with operational workflows. Slides
GPU Accelerated Learning
- Bob Crovella joined NVIDIA in 1998 and leads a technical team responsible for supporting GPU computing products. His talk began by explaining why GPUs help when training deep neural networks. He then walked through demos of cuDNN and DIGITS, showing how they fit together with frameworks such as Caffe, Torch, and Theano. Slides | Video
High Dimensional Function Learning
- Jason Klusowski is a PhD student at Yale and presented on the computational and theoretical aspects of approximating d-dimensional functions. Slides | Video
Basketball Analytics Using Player Tracking Data
- Alexander D'Amour is an Assistant Professor in Statistics at UC Berkeley and recently completed his PhD at Harvard. His talk showed how 24-FPS spatial tracking data can be used to answer fundamental questions about the game of basketball. Video
TensorFlow in Practice
- Nathan Lintz is a research scientist at indico Data Solutions where he is responsible for developing machine learning systems in the domains of language detection, text summarization, and emotion recognition. His session focused on the first principles of TensorFlow, building all the way up to generative modeling with recurrent networks. Slides | Code | Video
Virtual Currency Trading
- Anders Brownworth is a principal engineer at Circle and was previously an instructor at the MIT Media Lab. His talk focused on building the intuition about the blockchain and Bitcoin needed to develop successful trading strategies. Slides | Video | Hacker News
NSFW Modeling with ConvNets
- Ryan Compton is a data scientist at Clarifai. His talk used the problem of nudity detection to illustrate the workflow involved in training and evaluating convolutional neural networks. He also discussed deconvolution and demonstrated how it can be used to visualize intermediate feature layers. Slides | Video
Structured Attention Networks
- Yoon Kim is a PhD student in computer science at Harvard. This session gave an overview of attention mechanisms and structured prediction before introducing a method for combining the two ideas by way of graphical models. Slides | Code
Automated Machine Learning
- Nicolo Fusi is a research scientist at Microsoft Research, working at the intersection of machine learning, computational biology and medicine. He received his PhD in Computer Science from the University of Sheffield under Neil Lawrence. His talk focused on the process of selecting and tuning pipelines consisting of data preprocessing methods and machine learning models. Slides | Paper | Video
Grounding Natural Language with Autonomous Interaction
- Karthik Narasimhan is a PhD candidate at CSAIL working on natural language understanding and deep reinforcement learning. His talk focused on learning task-optimized representations that reduce the dependence on annotated data. The session built up to a demonstration of how reinforcement learning can enhance traditional NLP systems in low-resource scenarios. In particular, he described an autonomous agent that learns to acquire and integrate external information to improve information extraction. Slides
Neural Network Design Using RL
- Bowen Baker recently completed his graduate work at the MIT Media Lab and is now continuing his work as a member of the research team at OpenAI. His presentation touched on practical CNN meta-modeling. Slides | Video
AI for Enterprise
- Sophie Vandebroek is the COO at IBM Research and discussed applications of her team's work. Ruchir Puri is the Chief Architect of Watson and presented on challenges related to deploying machine learning systems. Video