Autodiff Puzzles
- by Sasha Rush - srush_nlp
This notebook contains a series of self-contained puzzles for learning about derivatives in tensor libraries. It is the third puzzle set in a series about programming for deep learning, following Tensor Puzzles and GPU Puzzles.
Introduction
Deep learning libraries rely on the autodifferentiation of tensors to compute the parameter updates necessary to learn complex models from data. This technique is central to understanding why deep learning has become so widely used and effective. Autodifferentiation is a neat trick that builds up a computational graph and then uses that graph to provide derivatives for user-constructed mathematical functions. At heart, the process is just an instantiation of the chain rule, built on implementations of every function and its derivative.
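To make the chain-rule view concrete, here is a minimal sketch in plain Python. The names (`square`, `double`, and so on) are illustrative, not part of the puzzles: each primitive is implemented together with its derivative, and composition follows the chain rule.

```python
# A minimal sketch, not a real autodiff engine: every primitive comes
# paired with its derivative, and the chain rule glues them together.

def square(x):
    return x * x

def d_square(x):
    return 2 * x  # d/dx x^2 = 2x

def double(x):
    return 2 * x

def d_double(x):
    return 2      # d/dx 2x = 2

# Chain rule: d/dx double(square(x)) = d_double(square(x)) * d_square(x)
def d_composed(x):
    return d_double(square(x)) * d_square(x)

assert d_composed(3) == 12  # 2 * (2 * 3)
```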
However, a library needs efficient implementations of derivatives for its key building blocks. This sounds trivial: just implement high-school calculus. In practice it is trickier than it sounds, because tensor-to-tensor functions are complex and require keeping careful track of indexing on both the input and the output side.
Your goal in these puzzles is to implement the derivatives for each function below. In each case the function takes in a tensor x and returns a tensor f(x), so your job is to compute $\frac{d f(x)_i}{dx_j}$ for every output position $i$ and input position $j$.
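As a hypothetical illustration of this format (the exact harness in the notebook may differ), a derivative can be written as a one-line function of an output index $i$ and an input index $j$:

```python
# f(x)_i = x_i (the identity function), so df(x)_i / dx_j is 1 when
# i == j and 0 otherwise. Written as one line over index pairs:
dx_id = lambda i, j: 1 * (i == j)

assert dx_id(2, 2) == 1
assert dx_id(2, 3) == 0
```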
Rules and Tips
- Every answer is 1 line of 80-column code.
- Everything in these puzzles should be done with standard Python numbers. (No need for torch or anything else.)
- Recall the basic multivariate calculus identities, most importantly:
  $$f(x_1, x_2, x_3) = x_1 + x_2 + x_3 \Rightarrow \frac{d f(x)_1}{dx_1} = 1, \quad \frac{d f(x)_1}{dx_2} = 0$$
- Python booleans auto-cast to Python numbers, so you can use them as indicator functions, i.e. $$\mathbf{1}(3=3) \cdot (25-3)$$ (see the sanity check after this list).
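A comparison multiplied by a number behaves exactly like the indicator above; a quick sanity check in plain Python:

```python
# Booleans act as 0/1 when mixed with numbers, so a comparison can
# serve as an indicator function in one line of code.
assert (3 == 3) * (25 - 3) == 22  # 1(3 = 3) * (25 - 3)
assert (3 == 4) * (25 - 3) == 0   # 1(3 = 4) * (25 - 3)
```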
For each of the problems, your job is to fill in the function dx and assign it to the given variable. If you get any of the derivatives wrong, the checker will print out the values at the indices where your derivative disagrees; a rough sketch of such a check is given below.
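This sketch assumes `f` maps a list of numbers to a list of numbers and `dx(i, j)` is your one-line answer; the notebook's actual checker may differ. It compares `dx` against a finite-difference estimate and prints any index pair that disagrees.

```python
# Hypothetical checker: compare dx(i, j) to a finite-difference
# estimate of df(x)_i / dx_j and report disagreements.
def check(f, dx, x, eps=1e-5, tol=1e-3):
    y = f(x)
    for j in range(len(x)):
        bumped = list(x)
        bumped[j] += eps
        y_bumped = f(bumped)
        for i in range(len(y)):
            approx = (y_bumped[i] - y[i]) / eps
            if abs(dx(i, j) - approx) > tol:
                print("wrong at i =", i, "j =", j, ":", dx(i, j), "vs", approx)

# Example: f(x)_1 = x_1 + x_2, whose derivative is 1 everywhere.
f = lambda x: [x[0] + x[1]]
dx = lambda i, j: 1
check(f, dx, [1.0, 2.0])  # prints nothing when the answer is correct
```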