# Machine Learning Paper Club with nPlan
A repository of papers discussed at nPlan's Machine Learning Paper Club.
## Joining Instructions
Paper Club is now remote, with an in-person session approximately every four weeks. Sessions are held on Thursdays at 12:30 London time, via webinar or in our office in Whitechapel. During the session, feel free to ask and answer questions or comment on the paper: this is a discussion rather than a presentation. Bear in mind that these meetings may be recorded for dissemination purposes.
## Next meetup's paper
- [05.10.2023] Gerard presents: ImageBind: One Embedding Space To Bind Them All by Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra
FOR IN-PERSON SESSIONS: Please go to the reception at our office building (133 Whitechapel High St, London E1 7PT) and say you are here for a meeting with nPlan IN MEETING ROOM 4 IN THE BASEMENT. IF YOU ARE ATTENDING IN PERSON, PLEASE RSVP ON OUR MEETUP PAGE SO WE CAN GET A HEADCOUNT FOR FOOD AND DRINKS.
## Supplementary material
For those new to machine learning, here is some recommended reading material:

- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Hamilton, W. (2020). Graph Representation Learning.
- Wu, L., Cui, P., Pei, J., Zhao, L., & Song, L. (2022). Graph Neural Networks.
- Provost, F., & Fawcett, T. (2013). Data Science for Business. Sebastopol: O'Reilly.
- Goldberg, Y. (2016). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.

Transformer-related resources:

- Examples of using BERT: sentiment analysis and feature extraction from BERT
The wide and deep model implementation that Carlos presented can be found at https://github.com/caledezma/wide_deep_model. Why not download it, play with it, and let us know your findings at Paper Club?
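The wide & deep idea combines a linear "wide" path over raw (or crossed) features with a small MLP "deep" path, summing the two logits before a sigmoid. A minimal NumPy sketch of the forward pass, assuming made-up shapes and random weights (this is an illustration, not the linked repo's code):

```python
# Hypothetical wide & deep forward pass: linear wide path + MLP deep path,
# logits summed and squashed to a probability. Not the repo's actual code.
import numpy as np

rng = np.random.default_rng(0)

def wide_deep_forward(x_wide, x_deep, params):
    w_wide, (w1, b1, w2, b2) = params
    wide_logit = x_wide @ w_wide                # linear "wide" path
    hidden = np.maximum(0.0, x_deep @ w1 + b1)  # ReLU hidden layer ("deep" path)
    deep_logit = hidden @ w2 + b2
    logit = wide_logit + deep_logit             # joint logit
    return 1.0 / (1.0 + np.exp(-logit))         # sigmoid probability

# Random init for a 4-dim wide input and a 3-dim deep input (8 hidden units).
params = (rng.normal(size=4),
          (rng.normal(size=(3, 8)), np.zeros(8), rng.normal(size=8), 0.0))
probs = wide_deep_forward(rng.normal(size=(5, 4)), rng.normal(size=(5, 3)), params)
```

In the real model the wide weights and MLP parameters are of course trained jointly; the sketch only shows how the two paths are combined.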
The demo for Platt scaling in calibration can be found at https://github.com/caledezma/calibration_scaling_demo. Feel free to contribute to it; we might submit a Platt scaling layer to TensorFlow!
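Platt scaling itself is small enough to sketch from scratch: fit p(y=1|s) = sigmoid(a·s + b) to a classifier's raw scores by minimising log loss over the two scalars a and b. A minimal illustration (not the demo repo's code), using plain gradient descent:

```python
# Minimal Platt scaling sketch: calibrate raw scores s into probabilities
# via sigmoid(a*s + b), fitting a and b by gradient descent on log loss.
import math

def platt_fit(scores, labels, lr=0.01, steps=5000):
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s / n  # d(log loss)/da
            grad_b += (p - y) / n      # d(log loss)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def platt_predict(scores, a, b):
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]
```

Usage: fit on held-out scores and labels, e.g. `a, b = platt_fit([-2, -1, 0, 1, 2], [0, 0, 0, 1, 1])`, then map new scores through `platt_predict`. (The original method also smooths the targets; that refinement is omitted here for brevity.)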
## YouTube channel
We regularly record the presentations made during the Meetup (subject to the presenter's approval). These videos are then uploaded to our YouTube channel so that those who can't attend can still benefit from the presentations. If you'd like to stay up to date with the presentations, just hit the subscribe button!
## Paper history
Past papers discussed in Paper Club meetings:
- [28/09/2023] Peter presents: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control by Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich
- [14/09/2023] Inneke presents: Graph of Thoughts: Solving Elaborate Problems with Large Language Models by Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., Lehmann, T., Podstawski, M., Niewiadomski, H., Nyczyk, P. and Hoefler, T. Recording
- [07/09/2023] Vahan presents: Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning by Zeyuan Allen-Zhu, Yuanzhi Li. Recording
- [31/08/2023] Peter presents: Bayesian Design Principles for Frequentist Sequential Learning by Yunbei Xu, Assaf Zeevi. Recording
- [24/08/2023] Inneke presents: Tree of Thoughts: Deliberate Problem Solving with Large Language Models by S Yao, D Yu, J Zhao, I Shafran, T Griffiths, Y Cao, K Narasimhan. Recording
- [17/08/2023] Gerard presents: QLoRA: Efficient Finetuning of Quantized LLMs by Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. Recording
- [10/08/2023] Ben Steer and Naomi Arnold from Pometry present how they use Temporal Graph Motifs to study Bitcoin darkweb marketplaces and NFT wash trading.
- [03/08/2023] Vahan presents: Continual Pre-training of Language Models by Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, Bing Liu
- [27/07/2023] Peter presents: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness by Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
- [20/07/2023] Arvid presents: TrueSkill: A Bayesian skill rating system by Ralf Herbrich, Tom Minka, Thore Graepel
- [13/07/2023] Arvid presents: Expectation Propagation for Approximate Bayesian Inference by Thomas P Minka
- [06/07/2023] Arvid presents: Factor Graphs and the Sum-Product Algorithm by Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger
- [29/06/2023] Peter presents: Faster sorting algorithms discovered using deep reinforcement learning by Daniel J. Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, Thomas Köppe, Kevin Millikin, Stephen Gaffney, Sophie Elster, Jackson Broshear, Chris Gamble, Kieran Milan, Robert Tung, Minjae Hwang, Taylan Cemgil, Mohammadamin Barekatain, Yujia Li, Amol Mandhane, Thomas Hubert, David Silver
- [23/06/2023] Vahan presents: DensePose From WiFi
- [15/06/2023] Gerard presents: Bytes Are All You Need: Transformers Operating Directly On File Bytes
- [01/06/2023] Peter presents: Improving language models by retrieving from trillions of tokens
- [25/05/2023] Ben presents: Generative Diffusion Models on Graphs: Methods and Applications
- [11/05/2023] Vahan presents an outstanding paper award winner from ICLR 2023: Rethinking the Expressive Power of GNNs via Graph Biconnectivity
- [04/05/2023] Gerard presents an outstanding paper award winner from ICLR 2023: Emergence of Maps in the Memories of Blind Navigation Agents
- [20/04/2023] Peter presents: Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick
- [13/04/2023] Vahan presents: Knowledge and topology: A two layer spatially dependent graph neural networks to identify urban functions with time-series street view image by Yan Zhang, Pengyuan Liu, Filip Biljecki
- [06/04/2023] Gerard presents: Anomaly Detection in Multiplex Dynamic Networks: from Blockchain Security to Brain Disease Prediction by Ali Behrouz, Margo Seltzer
- [23/03/2023] Peter presents: Graph Neural Networks for Link Prediction with Subgraph Sketching by Benjamin Paul Chamberlain, Sergey Shirobokov, Emanuele Rossi, Fabrizio Frasca, Thomas Markovich, Nils Hammerla, Michael M. Bronstein, Max Hansmire
- [16/03/2023] Vahan presents: Hierarchical Text-Conditional Image Generation with CLIP Latents by Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
- [09/03/2023] Peter presents: ZeRO: Memory Optimizations Toward Training Trillion Parameter Models by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
- [23/02/2023] Inneke presents: Temporal Cycle-Consistency Learning by Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
- [09/02/2023] Peter presents: Zero-shot Causal Learning by Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec
- [26/01/2023] Dan presents: Mad Max: Affine Spline Insights into Deep Learning
- [12/01/2023] Peter presents: Gradient Descent: The Ultimate Optimizer by Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer
- [05/01/2023] Peter presents: Flamingo: a Visual Language Model for Few-Shot Learning by Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan. Recording
- [17/12/2022] Vahan presents: Expander Graph Propagation by Andreea Deac, Marc Lackenby and Petar Veličković.
- [08/12/2022] Hosted by Data Spartan. James presents: SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models by Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han
- [01/12/2022] Peter presents: Temporally-Consistent Survival Analysis by Lucas Maystre, Daniel Russo
- [24/11/2022] Arvid presents: SoccerMap: A Deep Learning Architecture for Visually-Interpretable Analysis in Soccer by Javier Fernández, Luke Bornn
- [17/11/2022] Dirk presents: TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second by Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter
- [10/11/2022] Peter presents: Discovering faster matrix multiplication algorithms with reinforcement learning by Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis & Pushmeet Kohli
- [03/11/2022] Nayef presents: FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators by Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar
- [27/10/2022] Peter presents: Spectral Normalization for Generative Adversarial Networks by Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida
- [20/10/2022] Vahan presents: Graph Neural Networks as Gradient Flows: understanding graph convolutions via energy by Francesco Di Giovanni, James Rowbottom, Benjamin P. Chamberlain, Thomas Markovich, Michael M. Bronstein
- [13/10/2022] Vahan presents: On Representing Linear Programs by Graph Neural Networks by Ziang Chen, Jialin Liu, Xinshang Wang, Jianfeng Lu, Wotao Yin
- [06/10/2022] Ben presents: Score-Based Generative Modeling through Stochastic Differential Equations by Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
- [29/09/2022] Gerard presents: Analyzing Transformers in Embedding Space by Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
- [15/09/2022] Nayef presents: Deep Unsupervised Learning using Nonequilibrium Thermodynamics by Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
- [08/09/2022] Vahan presents: Understanding Dimensional Collapse in Contrastive Self-Supervised Learning by Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian
- [01/09/2022] Tara presents: Deep learning-based pixel-level rock fragment recognition during tunnel excavation using instance segmentation model by Qiao, W., Zhao, Y., Xu, Y., Lei, Y., Wang, Y., Yu, S. and Li, H. If you have any requests about the paper, please email vahanATnplan.io
- [25/08/2022] Peter presents: Instant Neural Graphics Primitives with a Multiresolution Hash Encoding by Thomas Muller, Alex Evans, Christoph Schied, Alexander Keller
- [18/08/2022] Vahan presents: A deep graph neural network architecture for modelling spatio-temporal dynamics in resting-state functional MRI data
- [11/08/2022] Dirk presents: Bayes and Big Data: The Consensus Monte Carlo Algorithm by S. Scott, A. Blocker, F. Bonassi, H. Chipman, E. George, and R. McCulloch
- [04/08/2022] Peter presents: Understanding Dataset Difficulty with V-Usable Information by Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta (one of ICML's 2022 outstanding papers)
- [28/07/2022] Vahan presents: Deep symbolic regression for recurrence prediction by Stéphane D'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, Francois Charton
- [21/07/2022] Nayef presents: PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations by Moshe Eliasof, Eldad Haber, Eran Treister
- [14/07/2022] Peter presents: DyRep: Learning Representations over Dynamic Graphs by Rakshit Trivedi, Mehrdad Farajtabar, Prasenjeet Biswal, Hongyuan Zha
- [07/07/2022] Vahan presents: Deep Graph Infomax by Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, R Devon Hjelm
- [29/06/2022] Peter presents: Evidential Turing Processes by Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal
- [22/06/2022] Vahan presents: Causal Decision Making and Causal Effect Estimation Are Not the Same... and Why It Matters by Carlos Fernandez-Loria, Foster Provost
- [16/06/2022] Peter presents: Efficiently Modeling Long Sequences with Structured State Spaces by Albert Gu, Karan Goel, and Christopher Ré
- [09/06/2022] Vahan presents: Newton vs the machine: solving the chaotic three-body problem using deep neural networks
- [26/05/2022] Arvid presents: DAGs with NO TEARS: Continuous Optimization for Structure Learning by Xun Zheng, Bryon Aragam, Pradeep Ravikumar, Eric P. Xing
- [19/05/2022] Peter presents: On the Existence of Universal Lottery Tickets by Rebekka Burkholz, Nilanjana Laha, Rajarshi Mukherjee, Alkis Gotovos
- [12/05/2022] Vahan presents: Differentiable DAG Sampling by Bertrand Charpentier, Simon Kibler, Stephan Günnemann
- [05/05/2022] Peter presents: Comparing Distributions by Measuring Differences that Affect Decision Making by Shengjia Zhao, Abhishek Sinha, Yutong He, Aidan Perreault, Jiaming Song, Stefano Ermon
- [28/04/2022] Ben presents: Matern Gaussian Processes on Graphs by Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Deisenroth, Nicolas Durrande
- [07/04/2022] Peter leads a discussion on what AI is and what its social and economic impacts are. Please watch the video Humans need not apply by Jerry Kaplan and read the paper The Chinese Room by John Searle
- [07/04/2022] Vahan presents: Impact of Pretraining Term Frequencies on Few-Shot Reasoning by Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh (2022)
- [07/04/2022] Vahan presents: GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation by Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang (2021)
- [31/03/2022] Peter presents: Identifiable Generative Models for Missing Not at Random Data Imputation by Chao Ma, Cheng Zhang (2021)
- [24/03/2022] Arvid presents: On Measurement Bias in Causal Inference by Judea Pearl
- [17/03/2022] Peter presents: Maximum Entropy Inverse Reinforcement Learning by Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey
- [10/03/2022] Inneke presents: Improving Generalization with Active Learning by David Cohn, Les Atlas, Richard Ladner
- [03/03/2022] Vahan presents: Equivariant Subgraph Aggregation Networks by Beatrice Bevilacqua, Fabrizio Frasca, Derek Lim, Balasubramaniam Srinivasan, Chen Cai, Gopinath Balamurugan, Michael M. Bronstein, Haggai Maron
- [24/02/2022] Inneke presents: Deep Deterministic Uncertainty: A Simple Baseline by Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H.S. Torr, Yarin Gal
- [17/02/2022] Peter presents: Chelsea Finn, Pieter Abbeel, Sergey Levine (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks.
- [10/02/2022] Peter presents: Sebastian Thrun (1995). Is Learning The n-th Thing Any Easier Than Learning The First?
- [03/02/2022] Peter presents: Emmanuel Bengio, Moksh Jain, Maksym Korablyov, Doina Precup, Yoshua Bengio (2021). Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation.
- [27/01/2022] Arvid presents: Marius Muja, David G. Lowe (2014). Scalable Nearest Neighbor Algorithms for High Dimensional Data.
- [20/01/2022] Dwane presents: Paul J. Blazek & Milo M. Lin (2021). Explainable neural networks that simulate reasoning.
- [13/01/2022] Sagar presents: Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman (2021). Open-Set Recognition: A Good Closed-Set Classifier is All You Need.
- [06/01/2022] Peter presents: Deng-Bao Wang, Lei Feng, Min-Ling Zhang (2021). Rethinking Calibration of Deep Neural Networks: Do Not Be Afraid of Overconfidence.
- [09/12/2021] Peter presents: Gregory Clark (2021). Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess.
- [02/12/2021] Vahan presents: Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver (2020). Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model.
- [25/11/2021] Joao presents: Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka (2020). How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.
- [18/11/2021] Peter presents: Rico Jonschkowski, Divyam Rastogi, Oliver Brock (2018). Differentiable Particle Filters: End-to-End Learning with Algorithmic Priors.
- [11/11/2021] Vahan presents: Rex Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, Jure Leskovec (2019). GNNExplainer: Generating Explanations for Graph Neural Networks.
- [04/11/2021] Inneke presents: Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal (2021). Prioritized training on points that are learnable, worth learning, and not yet learned.
- [28/10/2021] Joao presents: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira (2021). Perceiver IO: A General Architecture for Structured Inputs & Outputs.
- [21/10/2021] Peter presents: L Liu, M Hughes, S Hassoun, L Liu (2021). Stochastic Iterative Graph Matching.
- [14/10/2021] Arvid presents: J Ma, B Chang, X Zhang, Q Mei (2021). CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks.
- [07/10/2021] Vahan presents: X Chen, X Han, J Hu, F Ruiz, L Liu (2021). Order Matters: Probabilistic Modeling of Node Sequence for Graph Generation.
- [02/09/2021] Joao presents: James Thorne et al. (2021). Database Reasoning Over Text. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 3091–3104, August 1–6, 2021.
- [26/08/2021] Ben presents: Emilien Dupont, Arnaud Doucet, Yee Whye Teh (2019). Augmented Neural ODEs. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
- [19/08/2021] Arvid presents: Tan, M. & Le, Q. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6105-6114.
- [12/08/2021] Jiameng presents: Nanxin Chen et al. (2020). WaveGrad: Estimating Gradients for Waveform Generation. arXiv preprint arXiv:2009.00713. AND Yang Song, Stefano Ermon (2019). Generative Modeling by Estimating Gradients of the Data Distribution. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019).
- [05/08/2021] Inneke presents: Wanyu Lin, Hao Lan, Baochun Li (2021). Generative Causal Explanations for Graph Neural Networks. arXiv preprint arXiv:2104.06643.
- [29/07/2021] Vahan presents: Li, G., Müller, M., Ghanem, B., & Koltun, V. (2021). Training Graph Neural Networks with 1000 Layers. arXiv preprint arXiv:2106.07476.
- [22/07/2021] Peter presents: Jesson, A., Mindermann, S., Gal, Y., & Shalit, U. (2021). Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding. arXiv preprint arXiv:2103.04850.
- [15/07/2021] Joao presents: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal (2021). Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning. arXiv preprint arXiv:2106.02584.
- [08/07/2021] Peter presents: Ghifary, M., Kleijn, W. B., Zhang, M., Balduzzi, D., & Li, W. (2016, October). Deep reconstruction-classification networks for unsupervised domain adaptation. In European conference on computer vision (pp. 597-613). Springer, Cham.
- [01/07/2021] Peter presents: Louizos, C., Shalit, U., Mooij, J., Sontag, D., Zemel, R., & Welling, M. (2017). Causal effect inference with deep latent-variable models. arXiv preprint arXiv:1705.08821.
- [24/06/2021] Inneke presents: Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021). Zero-shot text-to-image generation. arXiv preprint arXiv:2102.12092.
- [17/06/2021] Arvid presents: Arik, S. O., & Pfister, T. (2019). TabNet: Attentive interpretable tabular learning. arXiv preprint arXiv:1908.07442.
- [10/06/2021] Carlos presents: Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision Transformer: Reinforcement Learning via Sequence Modeling. arXiv preprint arXiv:2106.01345.
- [03/06/2021] Vahan presents: Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, Edouard Grave, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou (2021). ResMLP: Feedforward networks for image classification with data-efficient training. arXiv preprint arXiv:2105.03404.
- [27/05/2021] Joao presents: Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang (2019). Confident Learning: Estimating Uncertainty in Dataset Labels. arXiv preprint arXiv:1911.00068.
- [20/05/2021] Peter presents: Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). MLP-Mixer: An all-MLP Architecture for Vision. arXiv preprint arXiv:2105.01601.
- [13/05/2021] Inneke presents: Oord, A. V. D., Vinyals, O., & Kavukcuoglu, K. (2017). Neural discrete representation learning. arXiv preprint arXiv:1711.00937.
- [06/05/2021] Peter presents: Sahoo, S., Lampert, C., & Martius, G. (2018, July). Learning equations for extrapolation and control. In International Conference on Machine Learning (pp. 4442-4450). PMLR.
- [29/04/2021] Peter presents: Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., ... & Zaremba, W. (2017). Hindsight experience replay. arXiv preprint arXiv:1707.01495.
- [22/04/2021] Jiameng presents: Bai, S., Kolter, J. Z., & Koltun, V. (2019). Deep equilibrium models. arXiv preprint arXiv:1909.01377.
- [15/04/2021] Peter presents: Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Toward Causal Representation Learning. Proceedings of the IEEE.
- [08/04/2021] Carlos presents: Jiang, Y., Chang, S., & Wang, Z. (2021). TransGAN: Two transformers can make one strong GAN. arXiv preprint arXiv:2102.07074.
- [01/04/2021] Inneke presents: Jaegle, A., Gimeno, F., Brock, A., Zisserman, A., Vinyals, O., & Carreira, J. (2021). Perceiver: General Perception with Iterative Attention. arXiv preprint arXiv:2103.03206.
- [25/03/2021] Vahan presents: He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654.
- [18/03/2021] Alexandre presents: Khrulkov, V., Mirvakhabova, L., Ustinova, E., Oseledets, I., & Lempitsky, V. (2020). Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6418-6428).
- [11/03/2021] Arvid presents: Yeh, C. C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., ... & Keogh, E. (2016, December). Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM) (pp. 1317-1322). IEEE.
- [04/03/2021] Dwane presents: Chen, Z., Bei, Y., & Rudin, C. (2020). Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2(12), 772-782.
- [25/02/2021] João presents: Pruthi, G., Liu, F., Sundararajan, M., & Kale, S. (2020). Estimating Training Data Influence by Tracing Gradient Descent. arXiv preprint arXiv:2002.08484.
- [18/02/2021] Amin presents: Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., & Singh, V. (2021). Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention. arXiv preprint arXiv:2102.03902.
- [11/02/2021] Inneke presents: Zhang, M., & He, Y. (2020). Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. arXiv preprint arXiv:2010.13369.
- [04/02/2021] Carlos presents: Brown, N., Bakhtin, A., Lerer, A., & Gong, Q. (2020). Combining deep reinforcement learning and search for imperfect-information games. arXiv preprint arXiv:2007.13544.
- [28/01/2021] Amin presents: Kong, L., d'Autume, C. D. M., Ling, W., Yu, L., Dai, Z., & Yogatama, D. (2019). A mutual information maximization perspective of language representation learning. arXiv preprint arXiv:1910.08350.
- [21/01/2021] João presents: Haidar, M. A., & Rezagholizadeh, M. (2019, May). TextKD-GAN: Text generation using knowledge distillation and generative adversarial networks. In Canadian Conference on Artificial Intelligence (pp. 107-118). Springer, Cham.
- [14/01/2021] Dwane presents: Bartolo, M., Roberts, A., Welbl, J., Riedel, S., & Stenetorp, P. (2020). Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension. arXiv preprint arXiv:2002.00293.
- [10/12/2020] Carlos presents: Huang, Q., He, H., Singh, A., Lim, S. N., & Benson, A. R. (2020). Combining Label Propagation and Simple Models Out-performs Graph Neural Networks. arXiv preprint arXiv:2010.13993.
- [03/12/2020] Arvid presents: Lim, B., Arik, S. O., Loeff, N., & Pfister, T. (2019). Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv preprint arXiv:1912.09363.
- [19/11/2020] Dan presents: Zhang, J., Shi, X., Xie, J., Ma, H., King, I., & Yeung, D. Y. (2018). GaAN: Gated attention networks for learning on large and spatiotemporal graphs. arXiv preprint arXiv:1803.07294.
- [12/11/2020] Amin presents: Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
- [05/11/2020] Vahan presents: Rong, Y., Bian, Y., Xu, T., Xie, W., Wei, Y., Huang, W., & Huang, J. (2020). GROVER: Self-supervised Message Passing Transformer on Large-scale Molecular Data. arXiv preprint arXiv:2007.02835.
- [29/10/2020] Carlos presents: Paper under double-blind review. LambdaNetworks: modeling long-range interactions without attention. ICLR 2021.
- [22/10/2020] Dan presents: Doersch, C., Gupta, A., & Zisserman, A. (2020). CrossTransformers: spatially-aware few-shot transfer. arXiv preprint arXiv:2007.11498.
- [15/10/2020] Vahan presents: Vyas, A., Katharopoulos, A., & Fleuret, F. (2020). Fast Transformers with Clustered Attention. arXiv preprint arXiv:2007.04825.
  - Blog post and code for the paper
  - Further reading on efficient transformers: The Reformer
- [09/10/2020] Joao presents: Under double-blind review. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- [01/10/2020] João presents: Swayamdipta, S. et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics. arXiv preprint arXiv:2009.10795.
- [25/09/2020] Amin presents: Cordonnier, J. B., Loukas, A., & Jaggi, M. (2019). On the relationship between self-attention and convolutional layers. arXiv preprint arXiv:1911.03584.
- [17/07/2020] Arvid presents: Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., & Ré, C. (2017, November). Snorkel: Rapid training data creation with weak supervision. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases (Vol. 11, No. 3, p. 269). NIH Public Access.
- [10/09/2020] Carlos presents: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- [03/09/2020] Vahan presents: Lee, H., Hwang, S. J., & Shin, J. Self-supervised Label Augmentation via Input Transformations. Supporting material: Yann LeCun speaks about self-supervised learning
- [27/08/2020] Dan presents: Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le (2019). Unsupervised Data Augmentation for Consistency Training. arXiv preprint arXiv:1904.12848.
- [20/08/2020] Inneke presents: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton (2020). Big Self-Supervised Models are Strong Semi-Supervised Learners. arXiv preprint arXiv:2006.10029.
- [13/08/2020] Joao presents: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In International Conference on Learning Representations 2020.
- [06/08/2020] Amin presents: Yun, S., Jeong, M., Kim, R., Kang, J., & Kim, H. J. (2019). Graph transformer networks. In Advances in Neural Information Processing Systems (pp. 11983-11993).
- [30/07/2020] Krisztina presents (Slides): Kohl, S., Romera-Paredes, B., Meyer, C., De Fauw, J., Ledsam, J. R., Maier-Hein, K., ... & Ronneberger, O. (2018). A probabilistic U-Net for segmentation of ambiguous images. In Advances in Neural Information Processing Systems (pp. 6965-6975).
- [23/07/2020] Vahan presents: Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. arXiv preprint arXiv:2006.16236.
- [16/07/2020] Dan presents: Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
- [09/07/2020] Amy presents: Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention (pp. 234-241). Springer, Cham.
- [02/07/2020] Joao presents: Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).
- [25/06/2020] Carlos presents: Wang, X., Huang, T. E., Darrell, T., Gonzalez, J. E., & Yu, F. (2020). Frustratingly Simple Few-Shot Object Detection. arXiv preprint arXiv:2003.06957.
- [18/06/2020] Vahan presents: Zhang, J., Kailkhura, B., & Han, T. (2020). Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning. arXiv preprint arXiv:2003.07329.
- [11/06/2020] Arvid presents (Slides): Schoenholz, S. S., Gilmer, J., Ganguli, S., & Sohl-Dickstein, J. (2016). Deep information propagation. arXiv preprint arXiv:1611.01232.
- [05/06/2020] Dan presents: Snoek, J., Ovadia, Y., Fertig, E., Lakshminarayanan, B., Nowozin, S., Sculley, D., ... & Nado, Z. (2019). Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems (pp. 13969-13980).
- [28/05/2020] Amy presents: Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. In Advances in neural information processing systems (pp. 4967-4976).
- [21/05/2020] Joao presents: Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky (2020). Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One. In International Conference on Learning Representations 2020.
- [14/05/2020] Carlos presents: Malinin, A., & Gales, M. (2018). Predictive uncertainty estimation via prior networks. In Advances in Neural Information Processing Systems (pp. 7047-7058).
-
- [30/04/2020] Dan presents: Garnelo, M., Schwarz, J., Rosenbaum, D., Viola, F., Rezende, D. J., Eslami, S. M., & Teh, Y. W. (2018). Neural processes. arXiv preprint arXiv:1807.01622.
- [23/04/2020] Amy presents: Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., ... & Amodei, D. (2020). Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
- [15/04/2020] Joao presents: Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J. E., & Weinberger, K. Q. (2017). Snapshot ensembles: Train 1, get M for free. arXiv preprint arXiv:1704.00109.
- [09/04/2020] Carlos presents: Ashukha, A., Lyzhov, A., Molchanov, D., & Vetrov, D. (2020). Pitfalls of in-domain uncertainty estimation and ensembling in deep learning. arXiv preprint arXiv:2002.06470.
- [05/03/2020] Vahan presents: Haber, E., Ruthotto, L., Holtham, E., & Jun, S. H. (2018, April). Learning across scales: Multiscale methods for convolution neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
- [27/02/2020] Arvid presents: Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016, May). Deep kernel learning. In Artificial Intelligence and Statistics (pp. 370-378).
- [20/02/2020] Arvid presents: Wilson, A., & Nickisch, H. (2015, June). Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In International Conference on Machine Learning (pp. 1775-1784).
- [13/02/2020] Arvid presents: Wilson, A. G., Knowles, D. A., & Ghahramani, Z. (2011). Gaussian process regression networks. arXiv preprint arXiv:1110.4411.
- [06/02/2020] Joao presents: Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Advances in Neural Information Processing Systems (pp. 3856-3866).
- [30/01/2020] Carlos presents: Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (pp. 4765-4774).
-
- [23/01/2020] Carlos presents: Ribeiro, M. T., Singh, S., & Guestrin, C. (2018, April). Anchors: High-precision model-agnostic explanations. In Thirty-Second AAAI Conference on Artificial Intelligence.
- [16/01/2020] Joao presents: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
- [12/12/2019] Vahan presents: Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. (2019). MixMatch: A holistic approach to semi-supervised learning. arXiv preprint arXiv:1905.02249.
- [05/12/2019] Gary presents: Dozat, T. (2016). Incorporating Nesterov momentum into Adam.
- [28/11/2019] Joao presents: Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017, August). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (pp. 1321-1330). JMLR.org.
- [21/11/2019] Carlos presents: Platt, J. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, 10(3), 61-74.
- [14/11/2019] Vahan presents: Kendall, A., & Cipolla, R. (2016, May). Modelling uncertainty in deep learning for camera relocalization. In 2016 IEEE International Conference on Robotics and Automation (ICRA) (pp. 4762-4769). IEEE.
- [07/11/2019] Gary presents: Cobb, A. D., Roberts, S. J., & Gal, Y. (2018). Loss-calibrated approximate inference in Bayesian neural networks. arXiv preprint arXiv:1805.03901.
- [31/10/2019] Arvid presents (slides): Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems.
-
- [24/10/2019] Ivan presents: Chelombiev, I., Houghton, C., & O'Donnell, C. (2019). Adaptive estimators show information compression in deep neural networks. arXiv preprint arXiv:1902.09037.
- [10/10/2019] Ivan presents: Saxe, A. M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B. D., & Cox, D. D. (2018). On the information bottleneck theory of deep learning.
- [03/10/2019] Ivan presents: Shwartz-Ziv, R., & Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.
- [26/09/2019] Carlos presents (with demo): Cheng, H. T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., ... & Anil, R. (2016, September). Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (pp. 7-10). ACM.
- [19/09/2019] Alan presents: Mosca, A., & Magoulas, G. D. (2018). Distillation of deep learning ensembles as a regularisation method. In Advances in Hybridization of Intelligent Methods (pp. 97-118). Springer, Cham.
- [12/09/2019] Carlos presents: Papernot, N., McDaniel, P., Wu, X., Jha, S., & Swami, A. (2016, May). Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP) (pp. 582-597). IEEE.
- [05/09/2019] Carlos presents: Frosst, N., & Hinton, G. (2017). Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784.
- [29/08/2019] Alan presents: Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- [22/08/2019] Gary presents: Lee, J., Lee, I., & Kang, J. (2019). Self-attention graph pooling. arXiv preprint arXiv:1904.08082.
- [15/08/2019] Vahan presents: Yao, L., Mao, C., & Luo, Y. (2019, July). Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 7370-7377).
- [08/08/2019] Carlos presents: Wu, F., Zhang, T., Souza Jr, A. H. D., Fifty, C., Yu, T., & Weinberger, K. Q. (2019). Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153.
- [25/07/2019] Arvid presents: Enßlin, T. A., Frommert, M., & Kitaura, F. S. (2009). Information field theory for cosmological perturbation reconstruction and nonlinear signal analysis. Physical Review D, 80(10), 105005.
-
- [18/07/2019] Gary presents: Zhang, G., Wang, C., Xu, B., & Grosse, R. (2018). Three mechanisms of weight decay regularization. arXiv preprint arXiv:1810.12281.
- [11/07/2019] Auke presents: Oord, A. V. D., Li, Y., & Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
- [04/07/2019] François presents: Kool, W., van Hoof, H., & Welling, M. (2019). Stochastic beams and where to find them: The Gumbel-top-k trick for sampling sequences without replacement. arXiv preprint arXiv:1903.06059.
- [20/06/2019] Vahan presents: Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- [13/06/2019] Alessio presents: Dobriban, E., & Liu, S. (2018). A new theory for sketching in linear regression. arXiv preprint arXiv:1810.06089.
- [06/06/2019] François presents: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
- [30/05/2019] Arvid presents: Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319.
- [23/05/2019] Auke presents: Alaa, A. M., & van der Schaar, M. (2018). AutoPrognosis: Automated clinical prognostic modeling via Bayesian optimization with structured kernel learning. arXiv preprint arXiv:1802.07207.
- [16/05/2019] Carlos presents: Dhamija, A. R., Günther, M., & Boult, T. (2018). Reducing network agnostophobia. In Advances in Neural Information Processing Systems (pp. 9175-9186).
- [09/05/2019] Naman presents: Geifman, Y., & El-Yaniv, R. (2017). Selective classification for deep neural networks. In Advances in Neural Information Processing Systems (pp. 4878-4887).
-
- [02/05/2019] Gary presents: Gal, Y., & Ghahramani, Z. (2016, June). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050-1059).
- [25/04/2019] Vahan presents: Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (pp. 6402-6413).
- [18/04/2019] Vahan presents: Vyas, A., Jammalamadaka, N., Zhu, X., Das, D., Kaul, B., & Willke, T. L. (2018). Out-of-distribution detection using an ensemble of self-supervised leave-out classifiers. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 550-564).
- [11/04/2019] Carlos presents: Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1563-1572).
- [04/04/2019] Arvid presents: Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., ... & Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science, 334(6062), 1518-1524.
- [28/03/2019] Joao presents: Chen, B., Medini, T., & Shrivastava, A. (2019). SLIDE: In defense of smart algorithms over hardware acceleration for large-scale deep learning systems. arXiv preprint arXiv:1903.03129.
- [21/03/2019] Joao presents: Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
- [14/03/2019] Vahan presents: Wright, J., Ganesh, A., Rao, S., Peng, Y., & Ma, Y. (2009). Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Advances in Neural Information Processing Systems (pp. 2080-2088).
- [07/03/2019] Vahan presents: Candes, E. J., Romberg, J. K., & Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8), 1207-1223.
- [28/02/2019] Arvid presents: Dietterich, T. G., & Bakiri, G. (1994). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263-286.
-
- [21/02/2019] Gary presents: Mnih, A., & Kavukcuoglu, K. (2013). Learning word embeddings efficiently with noise-contrastive estimation. In Advances in Neural Information Processing Systems (pp. 2265-2273).
- [14/02/2019] Carlos presents: Ziko, I., Granger, E., & Ayed, I. B. (2018). Scalable Laplacian K-modes. In Advances in Neural Information Processing Systems (pp. 10062-10072).
- [07/02/2019] Carlos presents: Wang, W., & Carreira-Perpinán, M. A. (2014). The Laplacian K-modes algorithm for clustering. arXiv preprint.
- [31/01/2019] Gary presents: Hoffer, E., Hubara, I., & Soudry, D. (2017). Train longer, generalize better: Closing the generalization gap in large batch training of neural networks. In Advances in Neural Information Processing Systems (pp. 1731-1741).
- [24/01/2019] Alessio presents: McInnes, L., & Healy, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
- [17/01/2019] Chris presents: Maaten, L. V. D., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research.
- [10/01/2019] Carlos presents: Chen, T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. arXiv preprint arXiv:1806.07366.
- [20/12/2018] Gary presents: Wilson, A. C., Roelofs, R., Stern, M., Srebro, N., & Recht, B. (2017). The marginal value of adaptive gradient methods in machine learning. In Advances in Neural Information Processing Systems.
- [13/12/2018] Carlos presents: Lin, H., & Jegelka, S. (2018). ResNet with one-neuron hidden layers is a universal approximator. In Advances in Neural Information Processing Systems.
-
- [06/12/2018] Auke presents: Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2018). Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9446-9454).
- [29/11/2018] Vahan presents: Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2016). Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530.
- [22/11/2018] Gary presents: Smith, S. L., Kindermans, P. J., Ying, C., & Le, Q. V. (2017). Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489.
- [15/11/2018] Joao presents: Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
- [01/11/2018] Vahan presents: Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences.
- [18/10/2018] Carlos presents: Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.
- [11/10/2018] dos Santos, C., & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers.