papers-I-read
I am trying a new initiative: a paper a week. This repository holds those papers along with related summaries and notes.
List of papers
- Toolformer - Language Models Can Teach Themselves to Use Tools
- Hints for Computer System Design
- Synthesized Policies for Transfer and Adaptation across Tasks and Environments
- Deep Neural Networks for YouTube Recommendations
- The Tail at Scale
- Practical Lessons from Predicting Clicks on Ads at Facebook
- Ad Click Prediction - a View from the Trenches
- Anatomy of Catastrophic Forgetting - Hidden Representations and Task Semantics
- When Do Curricula Work?
- Continual learning with hypernetworks
- Zero-shot Learning by Generating Task-specific Adapters
- HyperNetworks
- Energy-based Models for Continual Learning
- GPipe - Easy Scaling with Micro-Batch Pipeline Parallelism
- Compositional Explanations of Neurons
- Design patterns for container-based distributed systems
- Cassandra - a decentralized structured storage system
- CAP twelve years later - How the rules have changed
- Consistency Tradeoffs in Modern Distributed Database System Design
- Exploring Simple Siamese Representation Learning
- Data Management for Internet-Scale Single-Sign-On
- Searching for Build Debt - Experiences Managing Technical Debt at Google
- One Solution is Not All You Need - Few-Shot Extrapolation via Structured MaxEnt RL
- Learning Explanations That Are Hard To Vary
- Remembering for the Right Reasons - Explanations Reduce Catastrophic Forgetting
- A Foliated View of Transfer Learning
- Harvest, Yield, and Scalable Tolerant Systems
- MONet - Unsupervised Scene Decomposition and Representation
- Revisiting Fundamentals of Experience Replay
- Deep Reinforcement Learning and the Deadly Triad
- Alpha Net: Adaptation with Composition in Classifier Space
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- Gradient Surgery for Multi-Task Learning
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks
- TaskNorm: Rethinking Batch Normalization for Meta-Learning
- Averaging Weights leads to Wider Optima and Better Generalization
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions
- When to use parametric models in reinforcement learning?
- Network Randomization - A Simple Technique for Generalization in Deep Reinforcement Learning
- On the Difficulty of Warm-Starting Neural Network Training
- Supervised Contrastive Learning
- CURL - Contrastive Unsupervised Representations for Reinforcement Learning
- Competitive Training of Mixtures of Independent Deep Generative Models
- What Does Classifying More Than 10,000 Image Categories Tell Us?
- mixup - Beyond Empirical Risk Minimization
- ELECTRA - Pre-training Text Encoders as Discriminators Rather Than Generators
- Gradient based sample selection for online continual learning
- Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
- Massively Multilingual Neural Machine Translation in the Wild - Findings and Challenges
- Observational Overfitting in Reinforcement Learning
- Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- Accurate, Large Minibatch SGD - Training ImageNet in 1 Hour
- Superposition of many models into one
- Towards a Unified Theory of State Abstraction for MDPs
- ALBERT - A Lite BERT for Self-supervised Learning of Language Representations
- Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Contrastive Learning of Structured World Models
- Gossip based Actor-Learner Architectures for Deep RL
- How to train your MAML
- PHYRE - A New Benchmark for Physical Reasoning
- Large Memory Layers with Product Keys
- Abductive Commonsense Reasoning
- Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
- Assessing Generalization in Deep Reinforcement Learning
- Quantifying Generalization in Reinforcement Learning
- Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
- Measuring abstract reasoning in neural networks
- Hamiltonian Neural Networks
- Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
- Meta-Reinforcement Learning of Structured Exploration Strategies
- Relational Reinforcement Learning
- Good-Enough Compositional Data Augmentation
- Multiple Model-Based Reinforcement Learning
- Towards a natural benchmark for continual learning
- Meta-Learning Update Rules for Unsupervised Representation Learning
- GNN Explainer - A Tool for Post-hoc Explanation of Graph Neural Networks
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- Model Primitive Hierarchical Lifelong Reinforcement Learning
- TuckER - Tensor Factorization for Knowledge Graph Completion
- Linguistic Knowledge as Memory for Recurrent Neural Networks
- Diversity is All You Need - Learning Skills without a Reward Function
- Modular meta-learning
- Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies
- Efficient Lifelong Learning with A-GEM
- Pre-training Graph Neural Networks with Kernels
- Smooth Loss Functions for Deep Top-k Classification
- Hindsight Experience Replay
- Representation Tradeoffs for Hyperbolic Embeddings
- Learned Optimizers that Scale and Generalize
- One-shot Learning with Memory-Augmented Neural Networks
- BabyAI - First Steps Towards Grounded Language Learning With a Human In the Loop
- Poincaré Embeddings for Learning Hierarchical Representations
- When Recurrent Models Don’t Need To Be Recurrent
- HoME - a Household Multimodal Environment
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- A Semantic Loss Function for Deep Learning with Symbolic Knowledge
- Hierarchical Graph Representation Learning with Differentiable Pooling
- Imagination-Augmented Agents for Deep Reinforcement Learning
- Kronecker Recurrent Units
- Learning Independent Causal Mechanisms
- Memory-based Parameter Adaptation
- Born Again Neural Networks
- Net2Net - Accelerating Learning via Knowledge Transfer
- Learning to Count Objects in Natural Images for Visual Question Answering
- Neural Message Passing for Quantum Chemistry
- Unsupervised Learning by Predicting Noise
- The Lottery Ticket Hypothesis - Training Pruned Neural Networks
- Cyclical Learning Rates for Training Neural Networks
- Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
- An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
- Learning a SAT Solver from Single-Bit Supervision
- Neural Relational Inference for Interacting Systems
- Stylistic Transfer in Natural Language Generation Systems Using Recurrent Neural Networks
- Get To The Point: Summarization with Pointer-Generator Networks
- StarSpace - Embed All The Things!
- Emotional Chatting Machine - Emotional Conversation Generation with Internal and External Memory
- Exploring Models and Data for Image Question Answering
- How transferable are features in deep neural networks
- Distilling the Knowledge in a Neural Network
- Revisiting Semi-Supervised Learning with Graph Embeddings
- Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- Higher-order organization of complex networks
- Network Motifs - Simple Building Blocks of Complex Networks
- Word Representations via Gaussian Embedding
- HARP - Hierarchical Representation Learning for Networks
- Swish - a Self-Gated Activation Function
- Reading Wikipedia to Answer Open-Domain Questions
- Task-Oriented Query Reformulation with Reinforcement Learning
- Refining Source Representations with Relation Networks for Neural Machine Translation
- Pointer Networks
- Learning to Compute Word Embeddings On the Fly
- R-NET - Machine Reading Comprehension with Self-matching Networks
- ReasoNet - Learning to Stop Reading in Machine Comprehension
- Principled Detection of Out-of-Distribution Examples in Neural Networks
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
- One Model To Learn Them All
- Two/Too Simple Adaptations of Word2Vec for Syntax Problems
- A Decomposable Attention Model for Natural Language Inference
- A Fast and Accurate Dependency Parser using Neural Networks
- Neural Module Networks
- Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
- Conditional Similarity Networks
- Simple Baseline for Visual Question Answering
- VQA: Visual Question Answering
- Learning to Generate Reviews and Discovering Sentiment
- Seeing the Arrow of Time
- End-to-end optimization of goal-driven and visually grounded dialogue systems
- GuessWhat?! Visual object discovery through multi-modal dialogue
- Semantic Parsing via Paraphrasing
- Traversing Knowledge Graphs in Vector Space
- PPDB: The Paraphrase Database
- NewsQA: A Machine Comprehension Dataset
- A Persona-Based Neural Conversation Model
- “Why Should I Trust You?” Explaining the Predictions of Any Classifier
- Conditional Generative Adversarial Nets
- Addressing the Rare Word Problem in Neural Machine Translation
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
- Improving Word Representations via Global Context and Multiple Word Prototypes
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Skip-Thought Vectors
- Deep Convolutional Generative Adversarial Nets
- Generative Adversarial Nets
- A Roadmap towards Machine Intelligence
- Smart Reply: Automated Response Suggestion for Email
- Convolutional Neural Networks for Sentence Classification
- Conditional Image Generation with PixelCNN Decoders
- Pixel Recurrent Neural Networks
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
- Bag of Tricks for Efficient Text Classification
- GloVe: Global Vectors for Word Representation
- SimRank: A Measure of Structural-Context Similarity
- How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
- Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
- WikiQA: A challenge dataset for open-domain question answering
- Teaching Machines to Read and Comprehend
- Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems
- Recurrent Neural Network Regularization
- Deep Math: Deep Sequence Models for Premise Selection
- A Neural Conversational Model
- Key-Value Memory Networks for Directly Reading Documents
- Advances In Optimizing Recurrent Networks
- Query Regression Networks for Machine Comprehension
- Sequence to Sequence Learning with Neural Networks
- The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training
- Question Answering with Subgraph Embeddings
- Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
- Visualizing Large-scale and High-dimensional Data
- Visualizing Data using t-SNE
- Curriculum Learning
- End-To-End Memory Networks
- Memory Networks
- Learning To Execute
- Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
- Large Scale Distributed Deep Networks
- Efficient Estimation of Word Representations in Vector Space
- Regularization and variable selection via the elastic net
- Fractional Max-Pooling
- TAO: Facebook’s Distributed Data Store for the Social Graph
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- The Unified Logging Infrastructure for Data Analytics at Twitter
- A Few Useful Things to Know about Machine Learning
- Hive – A Petabyte Scale Data Warehouse Using Hadoop
- Kafka: a Distributed Messaging System for Log Processing
- Power-law distributions in Empirical data
- Pregel: A System for Large-Scale Graph Processing
- GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
- Pig Latin: A Not-So-Foreign Language for Data Processing
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- MapReduce: Simplified Data Processing on Large Clusters
- BigTable: A Distributed Storage System for Structured Data
- Spark SQL: Relational Data Processing in Spark
- Spark: Cluster Computing with Working Sets
- Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture
- Scaling Memcache at Facebook
- Dynamo: Amazon’s Highly Available Key-value Store
- f4: Facebook's Warm BLOB Storage System
- A Theoretician’s Guide to the Experimental Analysis of Algorithms
- Cuckoo Hashing
- Never-Ending Learning