  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated almost 3 years ago

Repository Details

Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)

TracIn

Implementation of Estimating Training Data Influence by Tracing Gradient Descent

Goal: Identify the influence of individual training data points on F(data point at inference time), where F is a differentiable function such as a loss, a prediction, or a metric.

Idea: Trace Stochastic Gradient Descent (Using the loss function as F)

Equation
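
As a hedged reconstruction in the notation of the paper (the symbols below are assumptions of this sketch, not taken from this page): w_t are the parameters at step t, \eta_t is the step size, \ell is the loss, z is a training example, z' is a test example, and w_{t_1}, ..., w_{t_k} are k saved checkpoints. Tracing a single SGD step to first order gives the per-step loss change, and summing over checkpoints gives the practical checkpoint-based score (TracInCP):

```latex
% One SGD step on training example z_t with step size \eta_t
w_{t+1} = w_t - \eta_t \, \nabla \ell(w_t, z_t)

% First-order change in the loss on a test example z' caused by that step
\ell(w_{t+1}, z') - \ell(w_t, z') \approx -\eta_t \, \nabla \ell(w_t, z') \cdot \nabla \ell(w_t, z_t)

% Checkpoint approximation: sum the gradient dot products over k saved checkpoints
\mathrm{TracInCP}(z, z') = \sum_{i=1}^{k} \eta_i \, \nabla \ell(w_{t_i}, z) \cdot \nabla \ell(w_{t_i}, z')
```

A positive dot product means a step on z reduces the loss on z', which is why positive scores correspond to loss reduction.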

Broader Impact

This work proposes a practical technique to understand the influence of training data points on loss functions, predictions, and other differentiable metrics. The technique is easier to apply than previously proposed techniques, and we hope it is widely used to understand the quality and influence of training data. For most real-world applications, the impact of improving the quality of training data is simply to improve the quality of the model. In this sense, we expect the broader impact to be positive.

Most of the implementation in this repo will be in the form of colabs. Consider reading the FAQ before adapting to your own data.

Terminology

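  • Proponents are training points with positive scores; the score is proportional to the loss reduction they cause on the test point.
  • Opponents are training points with negative scores; the score is proportional to the loss increase they cause on the test point.
  • Self-influence is the influence of a training point on its own loss.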
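
To make these terms concrete, here is a minimal pure-Python sketch, not the code from this repo's colabs. It assumes a toy linear model with squared loss, two hand-written "checkpoints", and a checkpoint-based score computed as the sum over checkpoints of learning rate times the dot product of the training and test loss gradients. All names (`loss_grad`, `tracin_cp`, `self_influence`, `rank_influence`) and the data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, label)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def loss_grad(w: Sequence[float], example: Example) -> List[float]:
    """Gradient of the toy loss 0.5 * (w.x - y)^2 with respect to w."""
    x, y = example
    residual = dot(w, x) - y
    return [residual * xi for xi in x]


def tracin_cp(checkpoints, learning_rates, train_example, test_example) -> float:
    """Sum over checkpoints of lr * <grad loss(train), grad loss(test)>."""
    score = 0.0
    for w, lr in zip(checkpoints, learning_rates):
        score += lr * dot(loss_grad(w, train_example), loss_grad(w, test_example))
    return score


def self_influence(checkpoints, learning_rates, train_example) -> float:
    """Influence of a training point on its own loss (always >= 0 here)."""
    return tracin_cp(checkpoints, learning_rates, train_example, train_example)


def rank_influence(checkpoints, learning_rates, train_set, test_example):
    """Split training indices into proponents (score > 0) and opponents (score < 0)."""
    scores = [
        (i, tracin_cp(checkpoints, learning_rates, z, test_example))
        for i, z in enumerate(train_set)
    ]
    proponents = sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])
    opponents = sorted((s for s in scores if s[1] < 0), key=lambda s: s[1])
    return proponents, opponents


# Hypothetical toy data: two saved weight vectors and their learning rates.
checkpoints = [[0.0, 0.0], [0.5, 0.5]]
learning_rates = [0.1, 0.1]
train_set = [
    ([1.0, 0.0], 1.0),   # same direction and label as the test point
    ([-1.0, 0.0], 1.0),  # pulls the first weight the opposite way
    ([0.0, 1.0], 1.0),   # orthogonal features, no influence in this model
]
test_example = ([1.0, 0.0], 1.0)

proponents, opponents = rank_influence(
    checkpoints, learning_rates, train_set, test_example
)
print("proponents:", proponents)
print("opponents:", opponents)
print("self-influence of point 1:", self_influence(checkpoints, learning_rates, train_set[1]))
```

In this toy setup, training point 0 comes out as a proponent of the test point, point 1 as an opponent, and point 2 has a score of zero because its gradient is orthogonal to the test gradient. With a real network, the gradients would come from an autodiff framework evaluated at saved checkpoints; how many checkpoints to use and which layers to differentiate are design choices outside this sketch.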