code + contents of my website, and programming life
<h2> A Universe of Sorts
<img style="float:left;display:inline-block;padding-right: 16px; width: 48px" src="/static/banner.png">
</h2>
<h3> Siddharth Bhat </h3>

- [Leave me your thoughts](https://www.admonymous.co/bollu) / [Chat with me](https://calendly.com/bollu/) / Email me:  <a href='mailto:[email protected]'> `[email protected]` </a>
- [Github](http://github.com/bollu) / [Math.se](https://math.stackexchange.com/users/261373/siddharth-bhat) /  [Resume](resume/main.pdf) / [Link hoard](todo.html)
- <a type="application/rss+xml" href="feed.rss"> RSS feed </a>
- **It's useful to finish things.**


# Uniform Boundedness Principle / Banach-Steinhaus

- Consider a family of bounded linear operators $\mathcal F$ from a Banach space $X$ to a normed space $Y$. If $\mathcal F$ is pointwise bounded,
  that is, $\sup_{T \in \mathcal F}\{ ||T(p)|| \}$ is finite for all $p \in X$, then 
  the family is norm-bounded: $\sup_{T \in \mathcal F} \{ ||T|| \}$ is finite.

## Proof 1: Based on an ingenious inequality

- Reference: "A really simple elementary proof of the uniform boundedness theorem" by Alan D. Sokal

#### Ingenious Inequality, Version 1
- Let $T: X \to Y$ be a bounded linear operator. Then for any $r \geq 0$ we have 
  $\sup_{ ||x|| \leq r } ||Tx|| \geq ||T||r$.
- Proof: recall that $||T|| \equiv \sup_{||x|| = 1} ||Tx||$.
- Now see that $\sup_{ ||x|| \leq r } ||Tx|| \geq \sup_{ ||x|| = r } ||Tx||$.
- The right hand side can be rewritten as $\sup_{ ||x|| = r } r || T(x/r) ||$, and substituting $\hat x = x/r$ this equals $r \sup_{ ||\hat x|| = 1 } ||T(\hat x)|| = r ||T||$.
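
In one chain, writing $\hat x \equiv x/r$ (for $r > 0$; the case $r = 0$ is trivial):

$$
\sup_{||x|| \leq r} ||Tx|| \geq \sup_{||x|| = r} ||Tx|| = \sup_{||x|| = r} r \, ||T(x/r)|| = r \sup_{||\hat x|| = 1} ||T \hat x|| = r \, ||T||.
$$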

#### Ingenious Inequality, Version 2
- Let $T: X \to Y$ be a bounded linear operator, let $p \in X$ be any basepoint. 
  Then for any $r \geq 0$ we have $\sup_{ y' \in B(p, r) } ||Ty'|| \geq ||T||r$.
- We rewrite the optimization problem as $\sup_{ ||x|| \leq r } ||T(p + x)||$.
- First, consider: $\max(||T(p + x)||, ||T(p - x)||) \geq \frac{1}{2} [||T(p + x)|| + ||T(p - x)||] \geq ||T(x)||$.
- The last inequality follows from the triangle inequality $||\alpha|| + ||\beta|| = ||\alpha|| + ||-\beta|| \geq ||\alpha + (-\beta)||$,
  applied with $\alpha = T(p + x)$, $\beta = T(p - x)$, so that $\alpha - \beta = 2 T(x)$.
- Now we see that:

$$
\begin{aligned}
\sup_{||x|| \leq r} ||T(p + x)|| &= \sup_{||x|| \leq r} \max(||T(p + x)||, ||T(p - x)||) \\
&\geq \sup_{||x|| \leq r} ||T(x)|| \\
&\geq ||T|| r
\end{aligned}
$$

- and thus we get the bound $\sup_{||x|| \leq r} ||T(p + x)|| \geq ||T||r$.

#### Proof of theorem

- Suppose for contradiction that $\sup_{T \in \mathcal F} ||T|| = \infty$, while $\mathcal F$ is indeed
  pointwise bounded (for all $p$, $\sup_{T \in \mathcal F} ||Tp||$ is finite).
- Then choose a sequence $T[n]$ such that $||T[n]|| \geq 4^n$. This is possible since the family is norm-unbounded.
- Next, create a sequence of points with $x[0] = 0$ and $||x[n] - x[n - 1]|| \leq 3^{-n}$ (that is,
  $x[n]$ lies in the $3^{-n}$ radius ball around $x[n-1]$), chosen via the ingenious inequality so that
  $||T[n](x[n])|| \geq \frac{2}{3} 3^{-n} ||T[n]||$.
- See that this sequence is Cauchy (the steps shrink geometrically), and thus converges (here completeness of $X$ is used). Let the limit be $L$.
  Then $||L - x[n]|| \leq \sum_{k > n} 3^{-k} = \frac{1}{2} 3^{-n}$.
- Also see that we have the bound $||T[n] L|| \geq ||T[n] x[n]|| - ||T[n]|| \cdot ||L - x[n]|| \geq (\frac{2}{3} - \frac{1}{2}) 3^{-n} ||T[n]|| \geq \frac{1}{6} (4/3)^n$.
- Thus, $\lim_{n \to \infty} ||T[n] L|| = \infty$.
- But this contradicts the pointwise boundedness of $\mathcal F$ at the point $L$.  Hence proved.



## Proof 2 using Baire category
- Suppose that for every $x \in X$, $\sup_{T \in \mathcal F} ||T(x)|| < \infty$.
- We want to show that $\sup_{T \in \mathcal F} ||T|| < \infty$.
- For every integer $n \in \mathbb N$, we build the subset
  $X_n \equiv \{ x \in X : \sup_{T \in \mathcal F} ||T(x)|| \leq n \}$.
- Since for every $l \in X$, there is *some* $n_l$ such that $\sup_{T \in \mathcal F} ||T(l)|| \leq n_l$ (by assumption, $\mathcal F$ is pointwise
  bounded), we know that the sets $X_n$ cover $X$.
- Furthermore, each $X_n$ is closed: if a sequence of points $x[i] \in X_n$ (so $||T x[i]|| \leq n$ for every $T \in \mathcal F$)
  converges to a limit $L$, then $||T L|| \leq n$ as well, by continuity of $T$ and of the norm.
- Thus, by the Baire category theorem (using completeness of $X$), some $X_m$ has nonempty interior: there is a ball $B(p, r) \subseteq X_m$ for some $m \in \mathbb N$, $r > 0$.
- This means that for every $x \in B(p, r)$ and every $T \in \mathcal F$, we have $||T(x)|| \leq m$.
- But this is a linear space, once we trap one ball we trap them all. By rescaling and translation, we can move the 
  norm boundedness on $B(p, r)$ into norm boundedness on the unit ball $B(0, 1)$, at which point we have proven that $\sup_{T \in \mathcal F} ||T|| < \infty$.
- Now let $||u|| \leq 1$ and $T \in \mathcal F$. Calculate:

$$
\begin{aligned}
||Tu|| & = \frac{1}{r} ||T (p + r u) - T(p)|| \\
& \leq \frac{1}{r} ( ||T(p + ru)|| + || T(p)|| ) \qquad \text{(triangle inequality)} \\
& \leq \frac{1}{r} (m + m) \qquad \text{(since $p + ru, p \in B(p, r)$)}
\end{aligned}
$$

- This bound of $2m/r$ does not in any way depend on $T$ or $u$; hence $\sup_{T \in \mathcal F} ||T|| \leq 2m/r < \infty$,
  which establishes the bound.


# Coercive operator

- This is called the Lax-Milgram theorem, but in Lawrence and Narici, it's a fucking lemma (lol).
- Suppose there is an operator $A : X \to Y$ whose norm is bounded *below*: That is, there exists a $k$ 
  such that for all $x$, $k||x|| \leq ||Ax||$.
- Intuitively, this forces $A$ to "spread vectors" out, making it one-one.
- Intuitively, this makes the inverse bounded, because the inequality "flips direction" when we consider the inverse operator.
- See that we do not require $A$ to be bounded! We will still get a bounded inverse operator.

#### Step 1: $A$ is one to one

- Suppose $At = 0$. We will show that this implies $t = 0$.
- $k ||t || \leq ||At||$. That is, $k ||t|| \leq 0$. Since $k > 0$, this implies $||t|| = 0$ or $t = 0$.

#### Step 2: $A^{-1}$ is bounded

- Define $A^{-1}(y) \equiv x$ when $Ax = y$ (well defined on $Range(A)$ since $A$ is one to one).
- Since $k ||x|| \leq ||Ax||$, we write $Ax = y$, and thus $x = A^{-1} y$.
- This gives us $k || A^{-1} y || \leq ||y||$.
- This means that $||A^{-1} y|| \leq (1/k) ||y||$, thereby establishing the boundedness of $A^{-1}$.
- Thus, $A^{-1}$ is a bounded linear operator.


#### Claim: This is also necessary: Every invertible operator $A$ with bounded inverse has such a lower bound $k$.

- Reverse the proof: if $||A^{-1} y|| \leq (1/k) ||y||$ for all $y \in Range(A)$, then substituting $y = Ax$ gives $||x|| \leq (1/k) ||Ax||$, that is, $k ||x|| \leq ||Ax||$.

- We can thus define $A^{-1} : Range(A) \to X$

# It suffices to check for weak convergence on a spanning set.

- Theorem: suppose $x[i]$ is a bounded sequence in $X$. Then, to check that $x[i] \to_w L$,
  it suffices to check on a spanning set $A \subseteq X^\star$ such that $closure(span(A)) = X^\star$.
- Proof: first, it easily suffices to check on finite linear combinations of elements of $A$, by the triangle inequality.
- Next, to show it suffices for the closure, we wish to show that $h(x[n]) \to h(L)$ given that $g(x[n]) \to g(L)$
  for all $g \in span(A)$.
- Let $h = \lim_j g[j]$ for some $g[j] \in span(A)$ (possible since $h$ lies in the closure of $span(A)$).
- Let us bound $|h(x[n]) - h(L)|$. 
- This is equal to $|h(x[n]) - g[j](x[n]) + g[j](x[n]) - g[j](L) + g[j](L) - h(L)|$.
- Rearranging: $|(h(x[n]) - g[j](x[n])) + (g[j](x[n])  - g[j](L)) + (g[j](L) - h(L))|$.
- We bound each pair: $|h(x[n]) - g[j](x[n])| \leq ||h - g[j]|| \cdot ||x[n]||$ can be made arbitrarily small because $g[j] \to h$ in norm
  and the sequence $x[n]$ is bounded (this is where boundedness is used).
- $|g[j](x[n])  - g[j](L)|$ can be made arbitrarily small because we know that $x[n] \to_w L$ tested against $span(A)$, which contains $g[j]$.
- The third term $|g[j](L) - h(L)| \leq ||g[j] - h|| \cdot ||L||$ can be made arbitrarily small because $g[j] \to h$ in norm
  and $L$ is a fixed point.
- Thus we have shown that we can make stuff arbitrarily small, and we are done!


# Sequence that converges weakly but not strongly in $l^p$.

- Consider the sequence $e_1 = (1, 0, \dots)$, $e_2 \equiv (0, 1, \dots)$, and
  in general $e_i[j] = \delta_i^j$.
- Recall that to check weak convergence, it suffices to check on a set whose span is dense in the dual space.
- We check on the coordinate functionals $\pi_j : x \mapsto x[j]$. 
- Clearly, on such functionals, we see that $\lim_{n \to \infty} \pi_j(e_n) = 0$, because for $n > j$,
  the value is forever zero.
- However, see that this sequence does not strongly converge, since the vectors $e_i$ are not Cauchy:
  $||e_i - e_j|| = 2^{1/p}$ when $i \neq j$ (which is $\sqrt{2}$ in $l^2$).
- The intuition is that weak convergence can only see convergence "in a finite subspace", since we are
  considering what happens with bounded linear functionals.
- Thus, a sequence can appear to converge when restricting attention to any finite region of space, but cannot
  strongly converge.
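
A tiny numerical illustration of the two statements above, using truncated coordinate vectors as stand-ins for $l^2$ elements (the truncation length and index $j$ are arbitrary choices for the demo):

```py
import numpy as np

dim = 50                # truncate l^2 to 50 coordinates for the demo
e = np.eye(dim)         # e[n] is the n-th standard basis vector

# "weak" behaviour: the coordinate functional pi_j(e_n) = e_n[j] dies out in n.
j = 3
print([e[n][j] for n in range(8)])    # 0, 0, 0, 1, 0, 0, 0, 0 -> eventually 0

# no strong convergence: distinct basis vectors stay sqrt(2) apart.
print(np.linalg.norm(e[10] - e[20]))  # 1.4142...
```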




# Axioms for definite integration

- [Pete Clark](https://math.stackexchange.com/a/56522/261373)'s notes on honors calculus
  provides a handy axiomatization of what properties the definite integral ought to satisfy.
- 1. If $f = C$ is a constant function, then $\int_a^b C = C (b - a)$.
- 2. If $f_1(x) \leq f_2(x)$ for all $x \in [a, b]$, then $\int_a^b f_1(x) \leq \int_a^b f_2(x)$.
- 3. If $a \leq c \leq b$, then $\int_a^b f = \int_a^c f + \int_c^b f$.

#### Proof of fundamental theorem of calculus from the above axiomatization

- Let $f$ be any integrable function over $[a, b]$. For $x \in [a, b]$, we define $F(x) \equiv \int_a^x f$. Then:
- (a) The function $F : [a, b] \to \mathbb R$ is continuous at every $c \in [a, b]$.
- (b) if $f$ is continuous at $c$, then $F$ is differentiable at $c$ and $F'(c) = f(c)$.
- (c) if $f$ is continuous and $F$ is any antiderivative of $f$, that is, $F'(x) = f(x)$, then 
  $\int_a^b f = F(b) - F(a)$.
- Proof:
- First, by continuity of $f$ and compactness of $[a, b]$, there exists an $M \in \mathbb R$ such that $|f(x)| \leq M$
  for all $x \in [a, b]$. If $M = 0$, then $f(x) = 0$ and thus from axiom 2 $F = 0$ and everything holds.
- Thus we assume that $M > 0$. For all $\epsilon > 0$, we take $\delta = \epsilon / M$.
- By the third axiom, we see that $F(x) - F(c) = \int_a^x f - \int_a^c f = \int_c^x f$.
- TODO.

# Quotient spaces of Banach space

- We will see why it is important for a subspace $M$ of a Banach space $X$ 
  to be closed for $X/M$ to be Banach.
- The algebraic properties of $+$ and $\cdot$ will go through for any subspace $M$ since they
  in no way depend on norm.
- The norm on $X/M$ will correctly interact with rescaling and triangle inequality also 
  for any subspace $M$.
- However, showing that the norm is non-degenerate ($||\overline{x}|| = 0$ iff $\overline{x} = 0$) needs $M$ to be closed.


#### Norm on $X/M$

- We define the norm on $X/M$ as $||\overline{x}|| \equiv \inf_{m \in M} ||x + m||$.
  This is abbreviated to $||x + M||$.

#### Lemma: Norm on $X/M$ interacts correctly with rescaling
- $||\alpha \overline{x}|| = \inf_{m \in M} ||\alpha x + m||$.
- For $\alpha \neq 0$, we can replace $m \mapsto \alpha m$, giving $\inf_{m \in M} || \alpha x + \alpha m||$, 
  which equals $\inf_{m \in M} |\alpha| \, ||x + m|| = |\alpha| \, || \overline{x}||$ (the case $\alpha = 0$ is immediate).
- Thus, scalar product correctly rescales with norm.

#### Lemma: Norm on $X/M$ obeys triangle ineq

- The LHS is $||\overline{x} + \overline{y}|| = \inf_{m \in M} ||x + y + m||$.
- The RHS is $||\overline{x}|| + ||\overline{y}|| = \inf_{k \in M} || x + k|| + \inf_{l \in M} ||y + l||$.
- We need to somehow "split" the $m$ in the LHS into $k$ and $l$.
- We do this sequentially. There must be a sequence of elements $k[i] \in M$ such that 
  $||x + k[i]|| \to ||\overline{x}||$ (with $||x + k[i]|| \geq ||\overline{x}||$ for each $i$).
- Similarly, there must be a sequence of elements $l[i] \in M$ such that 
  $||y + l[i]|| \to ||\overline{y}||$.
- Now, we see that $||\overline{x} + \overline y|| \leq ||x + y + k[i] + l[i]||$, since $k[i] + l[i] \in M$.
- By triangle inequality, the right hand side is bounded by $||x + k[i]|| + ||y + l[i]||$.
- Since this holds for every $i$, it also holds in the limit, proving the triangle inequality.

#### Theorem: proving that norm of zero is zero

- It is clear that $|| \overline 0|| = \inf_{m \in M} || 0 + m|| = || 0 + 0 || = 0$.

#### Theorem: proving that norm is nondegenerate.

- Suppose $||\overline{x}|| = 0$. We want to show that $\overline{x} = 0$, or $x \in M$.
- This means that $\inf_{m \in M} ||x + m|| = 0$.
- Thus there is a sequence of elements $m[i] \in M$ such that $||x + m[i]|| \to 0$.
- This implies that $x + m[i] \to 0$, since this is convergence in the norm of the underlying space.
- This means that $m[i] \to -x$. 
- Now, we need to use the fact that $M$ is closed, to say that $-x \in M$, to get that $x \in M$.
- This gives us that $\overline{x} = 0$.


# Riesz Lemma

- Let $M$ be a closed proper subspace of a normed linear space $X$. Then for all $0 < \alpha < 1$,
  there exists a $p \in X$ with $||p|| = 1$ (dependent on $\alpha$), such that $d(M, p) \geq \alpha$. 
  That is, $\forall m \in M, d(m, p) \geq \alpha$.
- This is easy to establish for, say, $\mathbb R^2$: pick a unit vector orthogonal to $M$; it will be at least 
  $1$ unit away (or more) by Pythagoras.
- This lemma provides a convenient substitute for orthogonality.

#### Proof via Hahn Banach
- Hahn banach is also a substitute for orthogonality.
- Pick a point $z \notin M$. Thus, $d(z, M) > 0$ (here we use that $M$ is closed). Note that $d(-, M)$ is:
- (a) a sublinear function on $X$.
- (b) vanishes on $M$.
- (c) on the subspace $M + \mathbb R z$, a multiple of the projection onto the $\mathbb R z$ component: $d(m + tz, M) = |t| \, d(z, M)$.
- Define the linear functional $f_0(m + tz) \equiv t \, d(z, M)$ on $M + \mathbb Rz$; it is dominated above by $d(-, M)$. By Hahn Banach, it
  extends to a linear functional on all of $X$, still dominated above by $d(-, M)$, and hence bounded (since $d(x, M) \leq ||x||$).
- Now, normalize the bounded linear functional so obtained to get a functional $f$ such that $||f|| = 1$.
  Note that normalization does not change the fact that $f(M) = 0$.
- Next, we build an "approximate normer" $z'$. This is an element $z'$ of unit norm such that 
  $|f(z')| \geq 1 - \epsilon$. Such an element exists by definition of the norm: $||f|| = \sup_{||x|| = 1} |f(x)| = 1$,
  so there are unit vectors whose value under $f$ comes as close to $1$ as we like. (We cannot in general attain
  $|f(z')| = 1$, so we settle for $1 - \epsilon$; replacing $z'$ by $-z'$ if needed, assume $f(z') \geq 1 - \epsilon$.)
- Now, consider $f(z' - m) = f(z') - f(m) = f(z') - 0 \geq 1 - \epsilon$ for any $m \in M$.
- Next, estimate $1 - \epsilon \leq |f(z' - m)| \leq ||f|| \cdot ||z' - m|| = ||z' - m||$.
- The second inequality is the definition of the operator norm ($|f(k)| \leq ||f|| \, ||k||$ for all $k$), and the last equality uses $||f|| = 1$.
- Thus $||z' - m|| \geq 1 - \epsilon$ for every $m \in M$, that is, $d(z', M) \geq 1 - \epsilon$: the Riesz lemma with $\alpha = 1 - \epsilon$.
- [Reference](https://www.math.ucla.edu/~jmanaker/Expository/RieszLemma.html)

#### What about $\alpha > 1$?

- $\alpha > 1$ does not even hold in $\mathbb R^2$. If I pick $\alpha = 5$, there is no unit vector
  that is $5$ units away from the $x$-axis. A vector is at most $1$ unit away from the $x$ axis.

#### What about $\alpha = 1$?

- Apparently, this case holds for any reflexive space (double dual equals original space).
- To find a counterexample, we need a non-reflexive space. Consider $l_\infty$, the space of bounded sequences
  under the sup norm.
- Alternatively, pick $C[0, 1]$ under max norm.
- We begin by picking a subspace $X \equiv \{ f \in C[0, 1] : f(0) = 0 \}$. So $f$ is continuous and $f(0) = 0$.
- Let $M$ be the subspace of $X$ such that $\int_0^1 f(x) dx = 0$.
- We want to show that there exists **no function** $p \in X$ such that (a) $||p|| = 1$
  (that is, $\sup_{x \in [0, 1]} |p(x)| = 1$) and (b) $d(p, m) \geq 1$ for all $m \in M$.

#### Pedestrian proof when $\alpha = 1$.
- Suppose such a $p$ exists. The first claim is that this forces $|\int_0^1 p(x) dx| \geq 1$: for any $g \in X$ with $\int_0^1 g \neq 0$,
  the function $m \equiv p - (\int_0^1 p / \int_0^1 g) \, g$ lies in $M$, and $d(p, m) = |\int_0^1 p| \cdot ||g|| / |\int_0^1 g|$.
  Choosing $g \in X$ with $||g|| = 1$ and $\int_0^1 g$ as close to $1$ as we like, we get $d(p, M) \leq |\int_0^1 p|$, so we need $|\int_0^1 p(x) dx| \geq 1$.
- Intuitively, since $p \in X$, we know that $p(0) = 0$, and since $p$ is continuous, it must "spend some time"
  around $0$, thereby losing some of the integral. Furthermore, since we know that $||p|| = 1$, the maximum
  integral any such function can attain is $1$.
- Since $p$ is continuous and $p(0) = 0$ (as $p \in X$), pick $\epsilon = 0.5$. Then there exists a $\delta > 0$
  such that for all $0 \leq x < \delta$, we have $|p(x)| < \epsilon = 0.5$. Thus, we can upper bound
  $|\int_0^1 p(x) dx|$ by $\delta \times 0.5 + (1 - \delta) \times 1$. That is, we cover
  $p$ by two rectangles, one of height $0.5$ and one of height $1$, since we have these bounds.
  Since $\delta > 0$, we see that $|\int_0^1 p(x) dx| < 1$ from the above estimate.
- But this contradicts the estimate $|\int_0^1 p(x) dx| \geq 1$ from the first bullet. So no such $p$ exists. Hence proved!

#### Slightly more sophisticated proof when $\alpha = 1$.
- The same setup. We consider the integral operator $F: X \to \mathbb R$, defined as $F(f) \equiv \int_0^1 f(x) dx$.
- We note that $M = \ker(F)$ (restricted to $X$).
- We note that $||F|| = 1$ on $X$: functions in $X$ of norm $1$ that equal $1$ outside a small neighbourhood of $0$
  have integral as close to $1$ as we like (the constant function $1$ itself is not in $X$, so the supremum is not attained).
- We note that $d(f, M) = |F(f)|/||F||$. That is, the distance of a point to the kernel of an operator $F$
  is the norm of $F(x)$ rescaled by the norm of $F$.
- We need an estimate on $|F(f)|$. By the above argument, we know that $|F(f)| < 1$ by continuity of $f$,
  $f(0) = 0$, and $|f(x)| \leq 1$ for all $x$ (as $||f|| = 1$).
- Combine the two estimates to see that $d(f, M) = |F(f)|/||F|| < 1$. Done.


# Using LLL to discover minimal polynomial for floating point number

- Explain by example. Let floating point number $f = 1.4142$.
- Clearly, this is a root of the polynomial $10000x - 14142 \in \mathbb Z[x]$.
- However, we know that the correct minpoly is $x^2 - 2 = 0$.
- What's the difference in these two cases? Well, the correct minpoly has "small" coefficients.
- How do we find such a minpoly using LLL?
- Key idea: Build vectors (in our case) of the form $b_0 \equiv (1, 0, 0, 1000)$,
  $b_1 \equiv (0, 1, 0, 1000f)$, and $b_2 \equiv (0, 0, 1, 1000f^2)$.
  Then LLL, on trying to find a shorter basis for these, will try very hard to make the last
  component smaller. It will do this by changing the basis vectors to $b'_0 \equiv b_0, b'_1 \equiv b_1$,
  and $b'_2 \equiv b_2 - 2b_0 = (-2, 0, 1, 1000(f^2 - 2))$. Since $f^2 - 2 \sim 0$, the length of $b'_2$
  is much smaller than $b_2$, granting us a shorter basis! The first components of $b'_2$, namely $(-2, 0, 1)$, read off the minpoly $x^2 - 2$.
- In general, given some real number $\theta$ (known to us in floating point), we build `n` basis vectors `v_i` of dimension `n+1`
  of the form `v_i[i] = 1`, `v_i[n] = 1000(θ^i)`, and `v_i[-] = 0` otherwise.
- Running LLL on these vectors `v_i` will attempt to find an integer relation on the last
  coordinate, which reads off the minpoly.
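
A minimal numerical sketch of the same idea, using `mpmath`'s PSLQ integer-relation routine in place of LLL (a different algorithm attacking the same "small integer coefficients" problem), so treat it as an illustration rather than the lattice construction above:

```py
# find small integers (c0, c1, c2) with c0 + c1*f + c2*f^2 ~ 0,
# i.e. the minimal polynomial hiding inside the float f.
from mpmath import mp, mpf, pslq

mp.dps = 30                  # working precision in decimal digits
f = mpf(2) ** mpf("0.5")     # pretend we only know sqrt(2) numerically

rel = pslq([1, f, f ** 2])
print(rel)                   # expect [-2, 0, 1] (up to sign): the minpoly x^2 - 2
```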

# Total Boundedness in a metric space

- A set $A$ is totally bounded iff for any $\epsilon$, there exists a
  **finite** $\epsilon$ net $N_\epsilon$.

#### Totally bounded implies bounded

- Let $S$ be a totally bounded set. We want to establish a uniform bound on its diameter.
- Pick some $\epsilon$. We then get a finite set of points $N$ such that any point of $S$ is within $\epsilon$ of some point of $N$.
- Any two points of $N$ are at most $D \equiv \max_{n, n' \in N} d(n, n')$ apart, which is finite since $N$ is finite.
- Now bound the distance between any two points $s, t \in S$.
- Let $s, t$ be within $\epsilon$ of points $n, n' \in N$ respectively.
- Thus $d(s, t) \leq d(s, n) + d(n, n') + d(n', t)$, which is bounded by
  $\epsilon + D + \epsilon$.
- Thus we have established the bound $D + 2\epsilon$ on the diameter of $S$.

#### For $\mathbb R$, bounded implies totally bounded

- Say the set $S$ is bounded by distance $M$. 
- Trap $S$ inside an interval of length $M$; WLOG suppose the interval is $[0, M]$.
- For any adversarial $\epsilon$, pick points at $\{0, \epsilon, 2\epsilon, \dots, M\}$.
  These points form the epsilon net: $S \subseteq [0, M]$, and every point of $[0, M]$ is within $\epsilon$ of some net point.
- The net only needs about $M/\epsilon$ points, which is finite. 
- See that this holds more generally for any $\mathbb R^n$ where we trap in a
  hypercube and sprinkle points.

#### In infinite dimensions, bounded need not be totally bounded.

- Classic example: the unit sphere in $l^2$. It's clearly bounded (any two points are at most $2$ apart).
- We claim this is not totally bounded.
- Note that any two distinct basis vectors $e_i, e_j$ have distance $\sqrt{2}$.
- Note that we have an infinite number of basis vectors $e_1, e_2, \dots$.
- Suppose it is totally bounded. Pick $\epsilon = \sqrt{2}/9999$ as an adversary. We get a finite
  net $N_1$. 
- By pigeonhole, some point $n \in N_1$ must have an infinite number of basis vectors trapped in its $\epsilon$ ball.
- So there must be two distinct basis vectors $e_i \neq e_j$ in that ball.
- We must have that $d(e_i, e_j) \leq d(e_i,  n) + d(n, e_j)$ by triangle inequality.
- This gives us $\sqrt{2} \leq \sqrt{2}/9999 + \sqrt{2}/9999$, which is a contradiction.
- Thus, the sphere is not totally bounded.

#### compact => closed + totally bounded in infinite dim also

- Let $S$ be a compact set.
- it is closed by the usual argument.
- We claim $S$ is totally bounded.
- Let adversary pick $\epsilon$.
- We must establish a finite number of points $N_\epsilon$ such that any point in $S$
  is in an $\epsilon$ neighbourhood of some $n \in N_\epsilon$.
- Reread that. that's literally compactness.
- Pick an open cover $O$ consisting of an $\epsilon$ ball for each point $s \in S$.
- Extract a finite subcover from this.
- This finite subcover is the finite $\epsilon$ net.
- Thus done. Compact is totally bounded.

#### closed + totally bounded => sequentially compact in infinite dim also

- Let $S$ be a closed and totally bounded set.
- We wish to show that it is sequentially compact.
- Let $x_i$ be a sequence in $S$.
- Perform the classical Bolzano-Weierstrass bisection construction (sketched in the pseudocode below).
- Since $S$ is totally bounded, we can pick any $\epsilon$ and get a finite epsilon net.
- We claim that a closed subset of a totally bounded set is also totally bounded.

```py
def mk_cauchy_sequence(S):
  # Nonconstructive pseudocode: extract a Cauchy subsequence from an
  # infinite subset S of a totally bounded set, by repeatedly restricting
  # to an epsilon ball that traps infinitely many points of S.
  k = 1
  while True:
    s = choose(S)
    yield s
    eps = 1 / 2**k
    Nk = finite_epsilon_net(S=S, epsilon=eps)
    # some n in Nk must trap infinitely many points of S in its epsilon
    # ball, since Nk is finite and S is infinite (pigeonhole).
    n = choose(n for n in Nk if is_infinite(S & Ball(n, eps)))
    S = S & Ball(n, eps)  # restrict to the ball with infinitely many points
    k += 1
```



# Holonomic v/s non holonomic constraints

- A set of constraints such that the system under consideration becomes $TM$, where $M$ is the position space
  and $T_p M$ is the allowed velocities at position $p$, is a holonomic system.
- A set of constraints such that the system under consideration *cannot* be thought of as $TM$ where $M$
  is the allowed positions is a non-holonomic system: here we impose some artificial restrictions on the velocity of the system.
- Another restriction one often imposes is that constraint forces do no work.
- Under these assumptions, D'Alembert's principle holds: the physical
  trajectory of the system is a constrained optimization problem: optimize the
  action functional of the free system restricted to paths lying on the
  constraint submanifold.
- [Reference: SYMPLECTIC GEOMETRY AND HAMILTONIAN SYSTEMS by E Lerman](https://faculty.math.illinois.edu/~lerman/467/v3.pdf)

# The Plenoptic Function

- What can we see because of light?
- Key idea: at each point $(x, y, z)$, we should be able to know, for all wavelengths $\lambda$, the intensity
  of the wavelength in all directions $(\theta, \phi)$. Even more generally, this can vary with time $t$.
- Intuition: we should be able to reproduce at all points in spacetime, what happens if one builds a camera!
- This function $P(\theta, \phi, \lambda, t, x, y, z)$ is called the *plenoptic function*.
- Notice that when one builds a pinhole camera, what one is doing is to, in fact, use the pencil
  of rays at that point to capture an image! Thus, the plenoptic function contains *all possible*
  pinhole images at all positions.
- The key conjecture of the paper "The plenoptic function and the elements of early vision" is that the
  visual cortex is extracting local changes / derivatives of the plenoptic function.

## Crash Course Radiometry
- Irradiance at a point: density of radiant flux (power) per unit surface area.
- Radiance at a point in a direction: density of radiant flux (power) per unit surface area per unit solid angle.

## Light field rendering
- See that if we restrict to only radiance of light at a fixed time $t_0$, then we have $(x, y, z, \theta, \phi)$,
  a 5 dimensional function.
- Also note that if there is no obstruction, then the radiance does not change along lines. So we can quotient
  $(x, y, z)$ to get a lower dimensional 4D field, given by
  $(\texttt{pos}_\theta, \texttt{pos}_\phi, \texttt{look}_\theta, \texttt{look}_\phi)$.
- This 4D field is called a light field.
- Alternatively, we can parametrize these by $(x_1, y_1)$ and $(x_2, y_2)$, and the paper canonically
  calls these $(u, v, s, t)$. This coordinate system they call a _light slab_, and it represents light starting
  from the point $(u, v)$ on the first plane and ending at $(s, t)$ on the second plane.

# Precision, Recall, and all that.

- Setting: we have some theorem goal $g$, a dataset of mathematical lemmas $D$, a set of actually useful
  lemmas $A$, and a set of predicted lemmas $P$.
- We want to provide a good measure of how "good" $P$
  is with respect to the ground truth $A$.
- We begin by defining *precision*, which is the fraction of $P$ that was correct: $|P \cap A|/|P|$. 
  Probabilistically, this can be seen as `P(actual|predicted)`.
- Similarly, we define *recall* as the fraction of `A` that was correctly predicted: $|P \cap A|/|A|$.
  Probabilistically, this is `P(predicted|actual)`.
- Let us now change the setting a little bit, where we swap the set $P$ for a \emph{sequence} (a ranking) over the
  full universe of lemmas $D$.
- We can get a set by truncating $P$ at some threshold.
  So we will define `precision@k` to be the precision of the set $P[:k]$.
- We note that recall as a function of k, `recall@k` is non-decreasing. As we see more predictions, we can
  only get more actually useful things. See that recall has a fixed denominator (the size
  of the number of actually useful things).
- Since `recall@k` is non-decreasing, we can build an inverse, `k@recall(r)`. For a given level of recall `r`,
  we map to the smallest $k$ (smallest number of items we need to take from the ranking) to get that level of recall.
- This lets us define a precision-recall function, where `precision@recall(r) = precision@k(k@recall(r))`.

#### Precision, Recall formulae

- Suppose for a given goal $g$, we have 3 correct premises `a, b, c`. The universe has premises `a, b, c, x, y, z`.
  Our model predicts premises in the ranking `x, a, y, b, c, z`. Then we summarize this as follows:

```
Rank | Val | Prec | Rec
1    |  x  | 0/1  | 0/3
2    |  a  | 1/2  | 1/3
3    |  y  | 1/3  | 1/3
4    |  b  | 2/4  | 2/3
5    |  c  | 3/5  | 3/3
6    |  z  | 3/6  | 3/3
```

- Note that recall is monotonically non-decreasing, while precision both increases (`0 -> 1/2`) and
  decreases (`3/5 -> 3/6`).
- We introduce an auxiliary function delta, $\delta(i) \equiv \text{lemma at i is correct}$.
  This lets us write the above quantities as follows:
- Let $s(n) \equiv \sum_{i=0}^n \delta(i)$. 
- The total number of correct elements retrieved is $s(N)$, where $N$ is the total number of ranked predictions.
- The precision at $k$ is given by $p(k) \equiv s(k)/k$. The recall at $k$ is given by $r(k) \equiv s(k)/s(N)$.
- Now note that the discrete difference $dr(k) = r(k) - r(k-1)$ equals $(s(k)-s(k-1))/s(N)$, which is $\delta(k)/s(N)$.
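
A small script recomputing the table above from these definitions; the ranking and the set of correct premises are the made-up example from the previous subsection:

```py
correct = {"a", "b", "c"}                 # actually useful premises
ranking = ["x", "a", "y", "b", "c", "z"]  # model's ranked predictions

delta = [1 if lemma in correct else 0 for lemma in ranking]   # delta(i)
for k in range(1, len(ranking) + 1):
    s_k = sum(delta[:k])                  # s(k): number of hits in the top k
    print(f"k={k} val={ranking[k-1]} prec={s_k}/{k} rec={s_k}/{len(correct)}")
```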

#### Mean Average Precision

- The best the `precision@recall` function can be is a flat line with `precision=1` for all levels of recall.
- Deviation from this tells us how much worse our model is from the best model.
- So, let's compute the area under the `precision@recall` curve. This is going to be the average precision,
  $ap \equiv \int_{r=0}^1 p(r) dr$.
- Recall that the "best" precision will always have `p(r) = 1`. Thus the theoretical maximum of this value
  we can have is $ap = \int_{r=0}^1 1 dr = 1$. This gives us a good scale, where $0 \leq ap \leq 1$.
- We use recall as a way to "standardize" across the size of the dataset.
- Let's change variables to $k$. So we want to change $r$ into $r(k)$.
- This will change the bounds of integration.
- The lower limit is given by $0 = r(l)$ which solves for $l = 0$.
- The upper limit is $1 = r(u)$, which solves for $u = N$ (the size of the dataset).
- This also changes $dr$ to $dr(k)dk$.
- In the discrete case, we set $dk = 1$, and $dr(k)$ becomes $r(k) - r(k-1)$.
  This is $\Sigma_{i=0}^k \delta(i)/s(N) - \Sigma_{i=0}^{k-1} \delta(i)/s(N)$, which evaluates to $\delta(k)/s(N)$.
- This gives us the discrete calculation of $ap$ to be $ap \equiv \Sigma_{k=1}^N p(k) \delta(k)/s(N)$.
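
Continuing the same made-up example, a self-contained computation of the discrete $ap$:

```py
correct = {"a", "b", "c"}
ranking = ["x", "a", "y", "b", "c", "z"]
delta = [1 if lemma in correct else 0 for lemma in ranking]

# ap = sum_k p(k) * delta(k) / s(N), with p(k) = s(k)/k and s(N) = |correct|
ap = sum((sum(delta[:k]) / k) * delta[k - 1]
         for k in range(1, len(ranking) + 1)) / len(correct)
print(ap)   # (1/2 + 2/4 + 3/5) / 3 = 0.5333...
```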


##### Mean Average precision at K.

- I believe this to be an incoherent concept; Recall that we chose to define average precision
  as the _area under the precision recall curve_.  This is a sensible quantity, because it's a normalized
  value (recall is between $0$ and $1$). We got the expression in terms of $k$ _via a change of variables_.
  We **did not** start at $k$! We started from $r$ and got to $k$.
- We can try to hack the expression for $ap$ to artificially create $ap@K$. Let's try.
- First, to go from $k$ to $r$, we find the number $r_K$ such that the recall at $K$ is $r(K) = r_K$.
- Next, we calculate $ap@K \equiv \int_0^{r_K} p(r) dr$.
- We must now find new lower and upper bounds in terms of $k$. 
- The lower bound needs us to find  `0  = r(l)`, or `l = 0`.
- The upper bound needs us to find `r_K = r(u)`, or `u = K`.
- We will next have to calculate $dr(k) dk$. Previously, we set $dk = 1$, and we calculated $dr(k) \equiv r(k) - r(k-1)$.
  This will give us $dr(k) \equiv \delta(k)/s(N)$. But note that $s(N)$ will count _all_ documents, not just limited to the top-K.
  So let's use $s(K)$ instead --- a hack!
- Combining these, we get the formula to be $ap@K \equiv \int_{0}^K p(r(k)) dr(k) = \Sigma_{k=0}^K p(k) \delta(k) / s(K)$.
- `ap@K` feels very unprincipled, and I don't feel that this carries mathematical weight.
- Is there an alternative derivation that sheds light on why this formula makes sense?

#### R-precision

- Recall that the recall is a nondecreasing function of $k$. The precision can vary any way it wants with respect to $k$.
- We will try to find a point where precision equals recall. 
- Consider the equation $p(K) = r(K)$. Using our previous formulation, this reduces to
  $s(K)/K = s(K)/s(N)$. This of course gives us $K = s(N)$.
- So, at the index $K$ which is equal to the total number of correct lemmas, we will have the precision equal the recall.
- This value is called the *R*-precision: the precision $p(K)$ at the index $K = s(N)$, the total number of correct lemmas.
- Empirically, this value $R$ correlates well with mean average precision.


#### $F_1$ Score

- A sensible derivation is offered by Van Rijsbergen in his PhD thesis.
- First, a quick and dirty definition: $F_1$ is the harmonic mean of precision and recall.
- This gives us `2/(1/p + 1/r)`, which works out to `2/[(tp + fp)/tp + (tp + fn)/tp]`.
- This simplifies to `2tp/(2tp + fp + fn)`.
- Now consider the quantity `E := 1 - F`. Think of `E` as `error`. This works out to `(fp + fn)/(2tp + fp + fn)`.
- See that this quantity works out to the symmetric difference of the Actual and Predicted sets divided by the 
  sum of the sizes of the Actual and Predicted sets! $|A \Delta P| = fp + fn$, and $|A| = tp + fn$, and
  $|P| = tp + fp$. 
- Thus, the full expression for $E$ becomes $E \equiv |A \Delta P| / (|A| + |P|)$, which is a genuinely sensible quantity!
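
A quick numeric check of the equivalent forms, on a hypothetical pair of sets:

```py
A = {"a", "b", "c", "d"}   # actual
P = {"b", "c", "x"}        # predicted

tp, fp, fn = len(A & P), len(P - A), len(A - P)
p, r = tp / (tp + fp), tp / (tp + fn)

f1_harmonic = 2 / (1 / p + 1 / r)               # harmonic mean of p and r
f1_counts = 2 * tp / (2 * tp + fp + fn)         # count-based form
e_symmetric = len(A ^ P) / (len(A) + len(P))    # E = |A Δ P| / (|A| + |P|)

print(f1_harmonic, f1_counts, 1 - e_symmetric)  # all three agree: 0.5714...
```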

# Heine Borel

- Theorem: a closed bounded subset of $\mathbb R^n$ is compact.
- We will prove it for $\mathbb R$ and leave the obvious generalization to the reader.
- Key idea: recall that for metric spaces, compactness and sequential compactness are equivalent,
  so the proof must follow some ideas from Bolzano-Weierstrass (a sequence in a closed bounded set has a
  convergent subsequence).
- Recall that *that* proof goes by bisection, so let's try to bisect some stuff!
- Also recall why this fails in infinite dimensions: you can bisect repeatedly in "all directions" and
  get volume (measure) to zero, without actually controlling the cardinality. There is no theorem that says
  "measure 0 = single point". So, the proof must rely on finite dimension and "trapping" a point.
- Take an interval, say $[0, 1]$ and take a cover $\mathcal C$. We want to extract a finite subcover.
- For now, suppose that the cover is made up only of open balls $B(x, \epsilon)$. We can always reduce
  a cover to a cover of open balls --- For each point $p \in X$ which is covered by $U_p$,
  take an open ball $B_p \equiv B(p, \epsilon_p) \subseteq U_p$. A finite subcover of the open balls $\{ B_p \}$ tells us which $U_p$ to pick from the original cover.
- Thus, we shall now assume that $C$ is only made up of epsilon balls of the form $C \equiv \{ B(p, \epsilon_p) \}$.
- If $C$ has a finite subcover, we are done.
- Suppose $C$ has no finite subcover. We will show that this leads to a contradiction.
- Since we have no finite subcover, write $I_0 \equiv [0, 1]$ and $C_0 \equiv C$: the interval $I_0$ has no finite subcover from $C_0$.
- Now, let the interval $I_1$ be whichever of $[0, 1/2]$ or $[1/2, 1]$ has no finite subcover from $C_0$.
  At least one of the two halves must have this property, for otherwise we could combine the two finite subcovers into a finite subcover of $I_0$, a contradiction.
  Let $C_1$ be the cover of $I_1$ by taking the balls from $C_0$ that meet $I_1$.
- Repeat the above for $I_1$. This gives us a sequence of nested intervals $\dots \subset I_2 \subset I_1 \subset I_0$,
  as well as nested covers $\dots \subset C_2 \subset C_1 \subset C_0$, where $I_n$ has length $2^{-n}$ and no finite subcover from $C_n$.
- The left endpoints of the intervals $I_n$ form a Cauchy sequence (consecutive ones differ by at most $2^{-n}$),
  so they converge to a limit point $L \in \cap_n I_n$.
- Take the ball $B_L \equiv B(p_L, \epsilon_L) \in C$ which covers the limit point $L$.
- For $n$ large enough that $2^{-n} < \epsilon_L - d(L, p_L)$, the whole interval $I_n$ lies inside $B_L$
  (every point of $I_n$ is within $2^{-n}$ of $L$).
- But then the single ball $\{ B_L \}$ is a finite subcover of $I_n$, contradicting the construction. Hence proved.

# The conceit of self loathing

>  Self-contempt is defined as the conceit of thinking "I am inferior" and
>  involves a sharp sense of one's baseness and inadequacy vis-a-vis others.
>  There is even an excessive variety of it called "self-abasement",
>  which is a conceit wherein one asserts that one is inferior even to inferior
>  persons.

> Conceit, as it is generally understood, is waving one's flag or banner highest
> over others and drawing attention to oneself. Even if one
> is asserting one's inferiority, one is still engaged in a display of
> self-advertisement.

> "I am," "I shall be," "I might be," and "would that I
> might be." These four possibilities describe how I conceive of myself at present,
> how I might be in the future, how I might imagine myself either in doubt or
> speculation, and how I might plan to be.


# Inverse scattering transform

- Useful to solve nonlinear PDE
- The classical example is the shock wave / Burgers' equation: $u_t + u u_x = 0$
- [References](https://en.wikipedia.org/wiki/Inverse_scattering_transform)

### KdV equation

- Equation for shallow waves.
- Example of nonlinear PDE that can be solved exactly.
- The geometry of the KdV equation describes this on a circle.


# Differentiating through sampling from a random normal distribution

- Credits to Edward Eriksson for teaching me this. 
- The key idea is that we can write a normal distribution with
  mean $\mu$ and standard deviation $\sigma$ as a function of the standard normal distribution:
  $X = \mu + \sigma Z$ with $Z \sim N(0, 1)$. The randomness now lives in $Z$, independent of the parameters,
  so we can differentiate through the sampling step.

- $y = f(\sigma z)$ where $z \sim N(0, 1)$. 
- Then, by treating $z$ as a constant, we see that $dy/d\sigma = f'(\sigma z) \cdot z$ by chain rule.
- That is, we treat $z$ as "constant", and differentiate with respect to $\sigma$. 
- My belief in this remains open until I can read a textbook,
  but I have it on good authority that this is correct.
- How does this relate to the VAE optimisation? It's the same trick, where we claim that
  $sample(N(0, 1))$ can be held constant during backprop, as if the internal structure of the $sample$
  function did not matter. Amazing.


```py
#!/usr/bin/env python3
import numpy as np

sigma = 1.0

# # function we are minimising over
# def f (x): return - x*x
# # derivative of function we are minimising over
# def fprime(x): return -2*x

# function we are minimising over
def f (x): return np.sin(x + 0.1)

# derivative of function we are minimising over
def fprime(x): return np.cos(x + 0.1)

# d/dsigma f(sigma * z) = f'(sigma * z) * z   (chain rule, with z held fixed)
# \partial_\sigma E[f(X_\sigma)] = E[\partial_\sigma f(X_\sigma)]
for i in range(1000):
    z = np.random.normal(0, 1)
    # sample from normal distribution with mean 0 and standard deviation sigma
    sz = sigma * z
    # evaluate function at x
    fx = f(sz)
    gradfx = fprime(sz)

    # update sigma
    # z2 = np.random.normal(0, 1)
    dsigma = gradfx * z

    print("z = %5.2f | f = %6.2f | df = %6.2f | sigma = %6.2f | dsigma = %6.2f" %
        (z, fx, gradfx, sigma, dsigma))
    sigma = sigma - 0.01 * dsigma
```


# BOSCC Vectorization

- Branch on superword conditional codes.

# Autodiff
- Activity analysis

# Vector Bundles and K theory, 1.1

- We define a sphere (in 3D) by all points with distance $1$ from the origin. Call this $M$.
- The tangent plane $T_p M \equiv \{ v | v \perp p \}$.
- Define $TM \equiv \cup_{x \in M} T_x M$
- We have a projection map $p$ from $TM \to M$ which sends the point $(x, v)$ to $x$.
- For a point $x \in M$, we define $U(x)$ to be the hemisphere with apex $x$. This is the portion of the sphere
  on one side of the hyperplane that is perpendicular to $x$.
- We want a map $p^{-1}(U(x)) \to U(x) \times p^{-1}(x)$. The right hand side is the same as $U(x) \times T_x M \times \{ x \}$,
  which is the same as $U(x) \times T_x M$.
- for a given $(y, v) \in TM$, that is, for a given $v \in T_y M$, we map it to $U(x) \times T_x M$ by sending $y \mapsto y \in U(x)$,
  and by orthogonally projecting the vector $v \in T_y M$ onto the tangent plane $T_x M$.
- Intuitively, we keep the point $y$ the same, and map the tangent plane $T_y$ to its orthogonal projection onto $T_x$.
- Since we know the basepoint $y$, it is clear that we can reconstruct the projection operator from $T_y$ to $T_x$, and that this
  operator is linear and surjective, and thus invertible.
- This shows us that what we have is really a fiber bundle, since we can locally straighten  the $p^{-1}(U(x))$ into the trivial bundle.
- Proof?

# Equicontinuity, Arzela Ascoli
- A sequence/family of functions are said to be equicontinuous if they vary equally in a given nbhd
- Necessary for Arzela Ascoli
- A subset of $C(X)$, the space of continuous functions on a compact Hausdorff
  space $X$, is compact iff it is closed, bounded, and equicontinuous.
- Corollary: a sequence in $C(X)$ has a uniformly convergent subsequence iff it is bounded and equicontinuous.
- Thus, a bounded equicontinuous family has a uniformly convergent subsequence, and since the convergence is uniform, the limit will also be
  continuous.

#### Uniform boundedness Principle

> The uniform boundedness principle states that a pointwise bounded family of
> continuous linear operators between Banach spaces is equicontinuous.

#### Equicontinuity of metric spaces

- Let $X, Y$ be metric spaces.
- The family $F$ is equicontinuous at a point $p \in X$ if for all $\epsilon > 0$,
  there is a $\delta > 0$ such that $d(f(p), f(x)) < \epsilon$ for all $f \in F$
  and all $d(p, x) < \delta$.
- Thus, for a given $\epsilon$, there is a uniform choice of $\delta$ that works for all *functions*.
- It's like uniform continuity, except the uniformity is enforced across the _functions_ $f_i$, not on the
  points of the domain.
- The family $F$ is pointwise equicontinuous iff it is equicontinuous at each point $p$.



# Sobolev Embedding Theorem [WIP]

- Intuitive Statement: Bound on norm of derivatives gives bound on function norm
- Intuition: On a closed compact set, function can only grow as much as the derivative lets it grow.


# Hahn Banach Theorem [WIP]


# Method of Characteristics [WIP]


# Eikonal Equation [WIP]

#### 1D
#### nD

- Since everything is determined by the 1D parametrization (curves):
- $|\nabla \phi(r)| = 1/v(r)$
- $r$ is position in space.
- $v(r)$ is the speed of light at position $r$.
- $\phi(r)$ represents the travel time of a wavefront to reach $r$.
- So, the magnitude of the gradient of the travel time is the reciprocal of the local speed of light.

#### References
- [Video on eikonal equation](https://www.youtube.com/watch?v=G1LOsvGGQos)



# Practical example of semidirect product

```
-- | represents a string with indexes into it.
data IndexedString = IndexedString String [Int]
```

- If we combine the strings together as `t = s1 + s2`, how do the indexes of `s1, s2` change to become an index of `t`?

```
instance Semigroup IndexedString where
  (IndexedString s1 ixs1) <> (IndexedString s2 ixs2) =
    IndexedString (s1 <> s2) (ixs1 <> (((+) (length s1)) <$> ixs2))
```
- See that we "twist" the indexes of the second by the length of the string.
  Because we are sticking the "origin" of the second to the "tip" of the first,
  we need to change the indexes to point to the "new" origin.
- One can check that this is going to be a semidirect product.
- In general, if we have data, pointers into the data, and ways to combine the data,
  then the bundled up abstraction of (data+pointers into data) should have a semidirect
  structure.
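
The same construction in Python (a sketch with made-up sample data), together with a quick associativity check, which is the content of the semidirect "twist" composing correctly:

```py
def combine(a, b):
    # (string, indexes) pairs; the indexes of the second operand are
    # shifted ("twisted") by the length of the first string.
    s1, ixs1 = a
    s2, ixs2 = b
    return (s1 + s2, ixs1 + [i + len(s1) for i in ixs2])

x, y, z = ("foo", [0, 2]), ("ba", [1]), ("quux", [0, 3])
print(combine(x, y))                                            # ('fooba', [0, 2, 4])
print(combine(combine(x, y), z) == combine(x, combine(y, z)))   # True: associative
```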

# Algebraic graph calculus
- http://gabarro.org/ccn/algebraic_graph_calculus.html
- The gradient corresponds to the incidence matrix, which takes values on
  vertices and spits out values on edges.

# Change of basis from triangle x y to barycentric

- If we have $\int_T f(x, y)\,dx\,dy$ for a triangle $T$, we would often like to change to
  barycentric coordinates to compute $\int_{p=0}^1 \int_{q=0}^p f(p, q)\, dq\, dp$. But what is the relationship
  between these two integrals?
- Note that when we parametrize by $\{ (p, q) : p \in [0, 1], q \in [0, p] \}$, we are
  drawing a right triangle whose base is on the $x$ axis.

# Lean4: access MetaM and so forth

```
#eval show Lean.MetaM _ from do
  return 0
```

# Harmonic function

- Solve Laplace's equation (Poisson's equation with zero right hand side): `Δ f = 0`.
- If a vector field $V = ∇ f$ arose from a potential, then it
  satisfies $∇ \times (∇ f) = 0$ (by the exact sequence).

# Lax Milgram theorem

- The theorem states that given a system $B(u, -) = f(-)$, where $B$
  is a bilinear, bounded, coercive form and $f$ is a bounded linear functional, a unique solution $u$ exists
  and this solution depends continuously on $f$.
- This is useful to solve elliptic equation problems, for which one can find
  some kind of inner product $B(., .)$ that represents "energy".

# Why L2 needs a quotient upto almost everywhere

- We want a norm to have the property that $|x| = 0$ if and only if $x = 0$.
- But in a function space, we can have nonzero functions that are nonzero only on a set of measure zero, e.g. the function
  that is $1$ on $\mathbb Q$ and zero everywhere else.
- Thus, such functions are $f \neq 0$ such that $|f| = 0$.
- To prevent this and to allow the L2 norm to really be a norm, we quotient by
  the closed subspace of functions such that $|f| = 0$.
- This has the side effect that $f = g$ iff $||f - g|| = 0$, i.e. the functions agree
  almost everywhere.

# Repulsive curves

#### Gradient depends on norm

- Consider a function $f: \mathbb R^n \to \mathbb R$ and an energy $e: (\mathbb R^n \to \mathbb R) \to \mathbb R$.
- We want to optimize $df/dt = - grad e(f)$.
- however, what even is $grad$?
- Recall that $de$ is the differential which at a point $f$ on the space $\mathbb R^n \to \mathbb R$, in a tangent direction $u \in \mathbb R^n \to \mathbb R$,
  computes  $de|_f(u) \equiv \lim_{\epsilon \to 0} (e(f + \epsilon u) - e(f))/\epsilon$.
- Now the gradient is given by $\langle grad(e), u \rangle_X \equiv de(u)$. So the gradient is the unique vector such that the inner product with the
  gradient produces the value of the differential evaluated in that direction.
- Said differently, $\langle grad(e), -\rangle = de(-)$. This is a Riesz-like representation theorem.
- Note that asking for an inner product means we need a hilbert space.
- One choice of inner product is given by $L^2$, where $\langle u, v \rangle_{L^2} \equiv \int \langle u(x), v(x) \rangle dx$.
- More generally, we can use a Sobolev space, where we define the inner product
  $\langle u, v\rangle_{H^1} \equiv \langle \nabla u, \nabla v\rangle_{L^2}$,
  which can also be written as $\langle -\Delta u, v\rangle_{L^2}$ (integrating by parts, ignoring boundary terms).
- Similarly, for the Sobolev space $H^2$, we would use $\langle u, v\rangle_{H^2} \equiv \langle \Delta u, \Delta v\rangle_{L^2}$, which is equal
  to $\langle \Delta^2 u, v \rangle_{L^2}$.
- In general, we can write our inner product as something like $\langle Au, v\rangle_{L^2}$.

#### Solving heat equation with finite differences

- Solving $df/dt = \Delta f = d^2 f / dx^2$.

> If we try to solve this equation using, say, explicit
> finite differences with grid spacing h, we will need a time step of size O(h^2)
> to remain stable --- significantly slowing down computation as the grid is refined


#### Different norm is good for different situations

- TODO

#### Tangent point energy

- Key intuition: we want an energy that is small for points that are "close by" in terms of $t$ on the knot,
  and that repels points on the knot that are far away in terms of $t$ but close by in terms of $f(t)$.
- TODO



# Why NuPRL and Realisability make it hard to communicate math

- [Superb answer by jon sterling](https://proofassistants.stackexchange.com/questions/1012/can-mathematical-formalizations-in-nuprl-be-trusted-as-correct-in-the-greater-ma/1046#1046)


> To me the difficulty with relating Nuprl to mathematics is basically one of
> methodology. As Andrej says, Nuprl's Computational Type Theory is based on
> "truth in one model"; as a result, there are many things that are true in
> this specific model that are false in the category of sets, false in many
> categories of presheaves, and false in many categories of sheaves. This is
> not the fault of (e.g.) realizability semantics, but rather the fault of
> confounding syntax and semantics. Both are important, but semantics benefits
> from multiplicity --- and the multiplicity of semantics is embodied in
> syntax. We can therefore expect strange results if we say that syntax is just
> a way to speak about one single example of semantics.


> So my aim is not to say "realizability is bad" --- realizability is clearly
> very good. But I think it is bad on balance to base a proof assistant on one
> single model (bad in ways that COULD NOT have been anticipated [clarification:
> by that community] in the early 1980s when this was going on!) because it
> limits the applicability of your results.

> Because Nuprl incorporates axioms that are not true in ordinary math, nor in
> the relative ordinary math of topoi, we cannot take a Nuprl proof about groups
> and use it as evidence to a "proper mathematician" for the truth of that
> statement about groups in a way that applies to that mathematician's work. This
> limits the ability to communicate and re-use results, but that is to me the
> entire point of mathematics.

> I want to end by saying that my perspective on mathematics is not the only one.
> Nuprl is much inspired by the ideas of L.E.J. Brouwer who took a very different
> viewpoint --- a proof in Brouwer's style about groups also does not necessarily
> lead to evidence that a mathematician would accept for the truth of that
> statement about groups. But Brouwer's perspective was that all the
> mathematicians were wrong, and that only he was right. If that was actually so,
> then one could not blame him for doing his proofs in a way that was not
> backward compatible.

> Therefore, the question that Nuprl raises is nothing less than: is mainstream
> mathematics wrong? Back when I was building tools based on Nuprl, I believed
> that normal mathematics was wrong. I no longer believe that though.

# Lean does not allow nested inductive families


- The checker is defined in terms of reduction to plain inductives, although
  the reduction itself is not performed before going to the kernel (it was in
  lean 3 but this lead to performance issues).
- The recursor for the type is basically "whatever the analogous mutual
  inductive would have".

```
inductive Const : Type _ | mk
inductive Const1 (t: Type _) : Type _ | mk : Const1 t
inductive E : Const → Type
| mk : {c : Const} → (args : Const1 (E c)) → E Const.mk
-- (kernel) invalid nested inductive datatype 'Const1',
-- nested inductive datatypes parameters cannot contain local variables.
```

# Lean `isDefEq` is undecidable

- Written down in Mario's thesis.

# Lean subject reduction is broken

- It only happens if one quotients a proposition
- It probably also happens because the implementation of `isDefEq`
  is a heuristic, because `isDefEq` is undecidable.


# Weakly implicit arguments in Lean

```
variables {α : Type} (f : α → α)
def injective {α β : Type} (f: α → β) : Prop := 
  ∀ {{x y}}, f x = f y → x = y -- NOTE: weakly implicit
def injective2 {α β : Type} (f : α → β) : Prop :=
    ∀ {x y}, f x = f y → x = y -- NOTE: implicit

def foo (h: injective f) : false := sorry
example (h: injective f) : false := 
begin
  have := @foo,
  unfold injective2 at *,
  exact this f h
end


def bar (h : injective2 f) : false := sorry
example (h : injective2 f) : false :=
begin
  have := @bar,
  unfold injective2 at *,
  exact this f h
end
```

The error becomes:

```
type mismatch at application
  this f h
term
  h
has type
  f ?m_1 = f ?m_2 → ?m_1 = ?m_2
but is expected to have type
  ∀ {x y : α}, f x = f y → x = y
```

# Big list of elf file munging / linker / ABI

- `nm`: list symbols in file.
- Useful tools are available at [binutils](https://www.gnu.org/software/binutils/)
- `readelf -a <file>`: see everything in an ELF file.
- `ldd <file>`: see shared libraries used by an ELF file.
- `file <file>`: shows filetype info of a given file.
- `objdump <file>`

#### `objdump` versus `readelf`


- Both programs are capable of displaying the contents of ELF format files,
  so why does the `binutils` project have two file dumpers?
- The reason is that objdump sees an ELF file through a BFD filter of the
  world; if BFD has a bug where, say, it disagrees about a machine constant
  in `e_flags`, then the odds are good that it will remain internally
  consistent.  The linker sees it the BFD way, objdump sees it the BFD way,
  GAS sees it the BFD way.  There was need for a tool to go find out what
  the file actually says.
- This is why the readelf program does not link against the BFD library - it
  exists as an independent program to help verify the correct working of BFD.
- `readelf` is arch. independent, `objdump` needs the appropriate toolchain.
- [Stack overflow reference for difference between objdump and readelf](https://stackoverflow.com/a/8979687/5305365)


# Regular epi and regular category

- A regular epi `c -> d` means that there is a kind of relation on `c` (concretely,
  an object `R` and two morphisms `f: R -> c` and `g: R -> c`) such that `d` is `c` modulo `R`, i.e. the quotient of `c` by `R`.
- A regular category is one where every arrow has a (regular epi-mono) factorization.

# Focal point

- The focal point of a space is a point whose only open nbhd is the whole space.
- In the Sierpinski space `{(), bottom}`, the point `bottom` is the focal point.
- In a local ring, the focal point is given by the maximal ideal (in the prime spectrum, ofc).
- Given any topological space $T$, consider the cone: (ie, take product with
  $[0, 1]$ and smash all the $\{0\} \times *$ together).
- Given any topological space $T$, now build the scone: take the product with
  the sierpinski space, and smash everything with the closed point. Then, the
  apex of the cone / the closed point becomes a focal point for the topological
  space. This can be seen as a
  "one point focalization".


# Operational versus Denotational semantics

> I think if you tell people that denotational semantics is just model theory for
> programming languages you've got most of the way there.

> Another consequence of this perspective is that you *must* care about
> nonstandard models, even if you think you don't! When you prove something by
> natural number induction, you are precisely constructing a non-standard model
> of Nat in Prop.


# Minimising L2 norm with total constraint

- Suppose we are trying to minimize $x^2 + y^2$ subject to $x + y = 10$.
- We can think of $(x, y)$ as two points located symmetrically about $5$, suppose
  it is $x = (5 + \epsilon)$ and $y = (5 - \epsilon)$.
- See that the function $f(k) = k^2$ is such that the output becomes larger as we go to the
  right / increase the argument than the rate at which the output becomes smaller
  as we go to the left / decrease the argument.
- This is clear by computing $\partial_k f = 2k$, which means that if $k_r > k_l$ (right/left), then
  $\partial_{k_r} f = 2 k_r$, while $\partial_{k_l} f = 2 k_l$, so if we step to the
  left and the right by $\epsilon$, keeping the total the same, the sum will change by
  $(2 k_r - 2 k_l) \epsilon > 0$.
- Said differently, because the function is convex / $f''(x) > 0$, this means that
  $\partial_k|_r f > \partial_k|_l f$, and thus we can trade the loss of the total
  from moving to the left (a $- \partial_k|_l f \cdot \epsilon$) for the gain of the total
  from moving to the right (a $+ \partial_k|_r f \cdot \epsilon$).

- Picture:

```
          * dx=1.2
         /|---->
        - |
       /  |
     --   |
*---/     |
-dx=0.8   |
  <-|     |
    |    x=0.6
   x=0.4
```

- We gain more by moving rightwards (in terms of $f(r+dx) \simeq f(r) + f'(r) dx = f(r) + 2r\,dx$) than we lose by
  moving leftward (in terms of $f(l-dx) \simeq f(l) - f'(l) dx = f(l) - 2l\,dx$). Since $r > l$, the change in the total
  is still net positive.
- Said differently again, we gain faster by moving from a point that is rightwards, than the rate at which
  we lose  from a point that is leftwards.
- Said differently again, the elevation gain is larger towards the right, so a small motion rightwards gains
  us more elevation than a small motion leftwards loses elevation.
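
A tiny numeric check of the trade argument, with the split point and step size chosen arbitrarily:

```py
def f(k):
    return k * k

x, y, eps = 6.0, 4.0, 0.1            # x + y = 10, split unevenly
before = f(x) + f(y)                 # 52.0
after = f(x - eps) + f(y + eps)      # move mass from the larger to the smaller
print(before, after)                 # 52.0 51.62: evening out lowers the sum
```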

#### How does this relate to convexity?

- What is the geometric intution for this being related to "below a line"?

# Bounding L2 norm by L1 norm and vice versa

- We can bound a function along the x-axis (in its domain) or along the
  y axis (in its range).

#### Bounded domain
- If a function's norm is well defined in a bounded domain, then it has not increased
  too rapidly.
- Intuitively, with L2 norm, large numbers become larger, thus it is "harder" for a function
  to stay finite.
- Thus, in bounded domain, L2 is a subset of L1.

#### Bounded range

- If a bounded function's norm is well defined on an unbounded domain, then it has
  vanished to zero sufficiently quickly.
- Intuitively, with the L2 norm, smaller numbers become smaller, thus L2 tolerates slower decay
  (eg. $||1/n||_1 = \sum_k 1/k = \infty$ versus $||1/n||_2 = \sqrt{\sum_k 1/k^2} < \infty$).
- Thus, for bounded range, L1 is a subset of L2, because L2 accepts more slowly decaying functions.

- intuitively, l2 error will fail to reduce small values.
- but l1 error will reduce all values equally.
- thus, l1 norm is larger than l2 norm.
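
A quick numerical check of the $1/n$ example in parentheses above:

```py
import numpy as np

# partial sums for the sequence x_k = 1/k: the l1 norm diverges (grows like log n),
# while the l2 norm converges (to pi/sqrt(6) ~ 1.28).
n = np.arange(1, 1_000_001)
x = 1.0 / n
print(np.sum(x))                 # ~14.39 and still growing
print(np.sqrt(np.sum(x ** 2)))   # ~1.2825
```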

# Example of unbounded linear operator

#### Differentiation

- Simplest example is differentiation.
- Let $C^0[0, 1]$ be continuous functions on interval, and $C^1[0, 1]$ be differentiable functions on the interval.
- We equip both spaces with sup norm / infty norm.
- Consider the differentiation operator $\partial : C^1[0, 1] \to C[0, 1]$.
- Since every differentiable function is continuous, we have that $C^1[0, 1] \subseteq C^0[0, 1]$.
- Clearly differentiation is linear (well known).
- To see that the operator is not bounded, consider the sequence of functions $f_n(x) \equiv \sin(2\pi nx)$.
- We have that $||f_n||_\infty = 1$ for all $n$, while $||\partial_x f_n||_\infty = 2\pi n \to \infty$, so clearly there
  is no constant $M$ such that $||\partial f_n|| \leq M ||f_n||$. Thus, the operator is unbounded (see the numerical sketch at the end of this subsection).
- Note that in this definition, the space $C^1[0, 1]$ is *not* closed in $C^0[0, 1]$, as there are sequences of differentiable
  functions that converge (uniformly) to non-differentiable functions. Indeed, polynomials, which are differentiable,
  are dense in the full space of continuous functions.
- Thus, in the case of an unbounded operator, we consider $L : U \to X$ where
  $U$ is some subspace of $X$, not necessarily closed!
- If we ask for an everywhere defined operator, then constructing such
  operators $L : X \to X$ needs choice.
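
A small numeric sketch of the $f_n(x) = \sin(2\pi n x)$ example (sampling on a grid, so the sup norms are only approximate; `sup_norms` is a helper invented for this note):

```py
import math

# Approximate the sup norms of f_n(x) = sin(2*pi*n*x) and of its derivative
# 2*pi*n*cos(2*pi*n*x) on [0, 1] by sampling; the ratio grows like 2*pi*n.
def sup_norms(n: int, samples: int = 100_000):
    xs = [i / samples for i in range(samples + 1)]
    f = max(abs(math.sin(2 * math.pi * n * x)) for x in xs)
    df = max(abs(2 * math.pi * n * math.cos(2 * math.pi * n * x)) for x in xs)
    return f, df

for n in [1, 10, 100]:
    f, df = sup_norms(n)
    print(f"n={n:>3}  ||f_n|| ~ {f:.3f}  ||d f_n|| ~ {df:.3f}")
```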

#### Nonconstructive example

- Regard $\mathbb R$ as a normed vector space over $\mathbb Q$. [Cannot call this a banach space, since a banach
  space needs base field $\mathbb R$]
- Find an algebraic basis $B$ containing the numbers $1$ and $\pi$ and whatever else we need.
- Define a function $f: \mathbb R \to \mathbb R$ such that $f(\pi) = 0$ and $f(1) = 1$, and extend everywhere else
  by $\mathbb Q$-linearity over the basis.
- Now let $p_i \in \mathbb Q$ be a sequence of rationals that converges to $\pi$. Each $p_i$ is a rational multiple of the
  basis vector $1$, so $f(p_i) = p_i$, and thus $\lim_i f(p_i) = \pi$,
  while $f(\lim_i p_i) = f(\pi) = 0$. This shows that $f$ is linear but not continuous.


# Direct sum of topological vector spaces

- In vector spaces, a direct sum decomposition (also direct product) needs projection maps $\pi_X : V \to X$, $\pi_Y : V \to Y$
  such that $V \simeq X \times Y$.
- In topological vector spaces, these projections also need to be *continuous*, which is a massive
  thing to ask for.

#### Direct sum need not be closed.

- Let $X$ be a Hilbert space with orthonormal (Schauder) basis $e[i]$.
- Consider the vectors $A[k] \equiv e[2k]$ and $B[k] \equiv e[2k] + e[2k+1]/(k+1)$, and
  the closed subspaces $A \equiv \overline{\mathrm{span}}(A[k])$, $B \equiv \overline{\mathrm{span}}(B[k])$.
- Clearly, $A, B$ are closed subspaces.
- See that the closure of $A + B$ is the full space, since $A + B$ contains every basis vector:
  $e[2k] = A[k]$ and $e[2k+1] = (k+1)(B[k] - A[k])$.
- However, also see that the vector $z \equiv \sum_k e[2k+1]/(k+1)$ is not in $A + B$:
  if we tried writing $z = a + b$, matching the odd coordinates would force $b = \sum_k B[k]$, and this sum does not
  converge (its even part is $\sum_k e[2k]$).
- This means that $(A + B)$ is not closed: if it were closed, it would be the full space (because it is dense), and in particular would contain $z$.
- Thus, we have an example of the direct sum of two closed subspaces which is not closed, because it is dense but proper.

# Subspaces need not have complement

- Clearly, one can have (non-closed) subspaces that cannot be complemented. For example,
  the subspace of polynomials in $C[0, 1]$ is dense and proper, and thus has no complement, as a complemented
  subspace must be closed.

#### Closed subspace need not have complement

- Apparently, in $l^\infty$, the subspace $c_0$ of sequences that converge to
  zero does not have a complement.
- Proof is given in a paper "projecting $m$ onto $c_0$"

#### Lemma: countable set $I$ has uncountable family of countable subsets $S$ which are almost disjoint

- Let $I$ be countable.
- We must prove that (1) there exists a $S \subset 2^I$ that is uncountable, such that (2) every set $K \in S$ is countable, and
  (3) every two sets $K, L \in S$ have finite intersection $K \cap L$.
- Proof: let $I$ be the rationals in $(0, 1)$. For each irrational $r \in (0, 1)$, pick a sequence of rationals
  converging to $r$, and let $S_r$ be the set of its terms.
- Each $S_r$ is countable, since it is the set of terms of a single sequence; and for $r \neq r'$, the intersection
  $S_r \cap S_{r'}$ is finite, since an infinite common subset would have to accumulate at both $r$ and $r'$.

#### Proof of theorem

- Suppose that there is a continuous projection of $l^\infty$ onto $c_0$.
- Then we must have $l^\infty = c_0 \oplus R$ for some closed subspace $R$.
- Then $l^\infty / c_0$ is isomorphic to $R$.
- TODO

# $L^\infty$ is HUGE

- Key insight: if we take a sequence space like $l^1$ or $l^2$, the terms of its elements need to vanish at infinity.
- These spaces all live inside the small subspace $c_0$ of $l^\infty$: the subspace of sequences that converge to zero.

#### Continuous functions are dense in $L^1$
#### Continuous functions are dense in $L^2$
#### Continuous functions are NOT dense in $L^\infty$

# Banach space that does not admit a Schauder basis

- A Schauder basis is a basis where every element of the space is obtained uniquely as a (countably infinite, convergent)
  linear combination of basis elements.
- Every Banach space with a Schauder basis has the approximation property.
- An example of a Banach space *without* the approximation property (and hence without a Schauder basis) was given in
  "A counterexample to the approximation problem in Banach spaces" by Per Enflo.
- Thus, we cannot start an argument about a general Banach space with "assume we have a (Schauder) basis...".



# Open mapping theorem

- Given a surjective continuous linear map $f: X \to Y$ between Banach spaces, the image of the open unit ball is open.
- Immediate corollary: the image of any open set is open (translate/scale the open unit
  ball around by linearity to cover any open set with neighbourhoods).

#### Quick intuition
- Intuition 1: If the map $f$ were bijective, then the theorem is reasonably believable given the
  bounded/continuous inverse theorem, since $f^{-1}$ would be continuous, and thus would
  pull back open sets to open sets, which would mean that $f$ pushes open sets to open sets.
- In more detail: suppose $f^{-1}$ exists and is continuous. Then $f(U) = V$
  means $(f^{-1})^{-1}(U) = V$. Since $f^{-1}$ is continuous, the inverse image of an
  open set ($U$) is open, and thus $V$ is open.

#### Why surjective
- Consider the embedding $f : x \mapsto (x, x)$ from $\mathbb R$ to $\mathbb R^2$.
- The full space $\mathbb R$ is open in the domain of $f$, but its image (the diagonal) is not open in $\mathbb R^2$,
  since any epsilon ball around a point $(x, x)$ of the diagonal leaks out of the diagonal.
- Thus, not every continuous linear map maps open sets to open sets.

#### Proof

- TODO

# Closed graph theorem

- the graph of a function from a banach space to another banach space is a
  closed subset iff the function is continuous.
- Formally, given $f:X \to Y$, the set $G \equiv \{ (x, f(x)) : x \in X \}$ is closed
  in $X \times Y$ iff $f$ is continuous.

#### Proof: Continuous implies closed

- We must show that every limit point of the graph G is in G.

- Let $(p, q)$ be a limit point. Since everything in metric spaces is equivalent
  to the sequential definition, this means that $(p, q) = \lim (x_i, f(x_i))$.

- Limits in product spaces are computed pointwise, so $(p, q) = (\lim x_i, \lim f(x_i))$

- Thus, $p = \lim x_i$ from above. Now we calculate:

- $q = \lim f(x_i) = f(\lim x_i) = f(p)$ where we use the continuity of $f$ to
  push the limit inside.
- Thus, $(p, q) = (p, f(p))$ which is an element of $G$.
- So an arbitrary limit point $(p, q) \in G$ is an element of $G$, and thus G
  is closed. Qed.


#### Proof: closed implies continuous


- Suppose G as defined above is a closed set. We must show that f is
  continuous, ie, $f$ preserves limits.

- Let $x_i$ be a convergent sequence. We must show that $f(\lim x_i) = \lim f(x_i)$.

- Consider $(x_i, f(x_i))$ as a sequence in $G$, and suppose it converges, say to $(p, q)$.
  (Strictly, one must also know that $f(x_i)$ converges; this is the gap that the full closed
  graph theorem closes using completeness of the spaces.) Since $G$ is closed, $(p, q) \in G$. By defn of $G$, $q = f(p)$.

- Now we compute that

$$
\begin{aligned}
\lim (x_i, f(x_i)) &= (p, q) \\
(\lim x_i, \lim f(x_i)) &= (p, q) \\
\lim x_i = p&, \quad \lim f(x_i) = q
\end{aligned}
$$

- But since $q = f(p)$ (by defn of $G$), we have that
  $\lim f(x_i) = q = f(p) = f(\lim x_i)$, which proves continuity.

# Bounded inverse theorem

- Theorem: Every bijective bounded linear operator has bounded inverse.
- Equivalently: Every bijective continuous linear operator has continuous inverse.
- Proof: quick corollary of the open mapping theorem. Let $L: X \to Y$ be a
  bijective bounded linear operator.
- By the open mapping theorem, $L$ maps open sets to open sets. Recall that bounded iff continuous.
  Thus, it suffices to show that $T \equiv L^{-1} : Y \to X$ is continuous.
- We need to show that inverse images of open sets under $T$ are open:
  specifically, that $T^{-1}(U)$ is open for every open $U \subseteq X$.
- Since $T^{-1}(U) = L(U)$, and $L(U)$ is open because $U$ is open and $L$ is an open map, we are done.


# Nonexistence of solutions for ODE and PDE

- ODE system, no boundary conditions: always has a (local) solution, by Picard–Lindelöf.
- ODE system, with boundary conditions: can have no solution. Eg. $f'(x) = 0$, with
  boundary conditions $f(a) = 0, f(b) = 1$.
- PDE system, no boundary conditions: can still have no solutions!
- PDE system, with boundary conditions: can have no solution, because an ODE is a PDE.

#### Example 1 of PDE with no solutions

- Take the vector field on $\mathbb R^2$ given by $V(x, y) = (-y, x)$. Its flow lines
  are concentric circles.
- Consider this vector field as a PDE: we are looking for a function $f$ such that $\nabla f = V$.
- No such potential function can exist, because this vector field allows us to extract work.
- Suppose such a potential exists. Then if I travel in a circle, according to the potential, the net work
  is zero. But if I evaluate the line integral, I get nonzero work (see the computation below). Thus, no solution exists!
- In general, asking **if a differential form is exact** is literally asking for a PDE to be solved!
- Note that the associated 1-form $\omega = -y\,dx + x\,dy$ is *not* closed ($d\omega = 2\,dx \wedge dy$), which is exactly
  why the work integral can be nonzero; the classic example of a **closed** form that is not exact is
  $(-y\,dx + x\,dy)/(x^2+y^2)$ on the punctured plane.
- It's nice to see PDE theory and diffgeo connect.
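
For concreteness, the work extracted around the circle of radius $r$, parametrised as $(r\cos t, r\sin t)$ (a standard computation):

$$
\oint V \cdot dl
  = \int_0^{2\pi} (-r\sin t, \, r\cos t) \cdot (-r\sin t, \, r\cos t) \, dt
  = \int_0^{2\pi} r^2 \, dt
  = 2\pi r^2 \neq 0.
$$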

#### Example 2: use second axis as time

- Consider a PDE on the square $[0, 1]\times [0, 1]$. We will think of the first axis as space,
  where the function is defined, and the second axis as time, along which the function evolves.
- We start by imposing $\partial f / \partial x = t$. So the function at $t=0$ is constant in $x$, and at $t=1$ is linear in $x$.
- Next, we impose $\partial f / \partial t = 0$. This means that the function is not allowed to evolve through time.
- This is nonsensical, because at $t=1$ we expect the constant function to have become a linear function, but along the time axis
  we say that no point in space can change.
- Thus, this DE has no solutions! (See the mixed-partials computation below.)
- We can use the extra dimensions available in a PDE to create "conflicting" data along different axes.
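
The conflict can also be seen with mixed partial derivatives: any $C^2$ solution $f$ would have to satisfy

$$
\frac{\partial}{\partial t}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial t}(t) = 1,
\qquad
\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial t}\right) = \frac{\partial}{\partial x}(0) = 0,
$$

but mixed partials of a $C^2$ function must agree, so no such solution exists.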

# Baire Category Theorem

- Dense set: set whose closure is the full space.
- Baire Category theorem: In a complete metric space, the intersection of countably many
  dense *open* sets is dense (countable intersection of chonk is chonk).

#### Proof
- Let $D[i]$ be a family of dense open sets in the complete metric space $X$.
- Denote $C(\cdot)$ for the closure of a set.
- Let $p \in X$; we need to show that $p \in C(\cap_i D[i])$.
- So for any $\epsilon$, we need to show that there is a point $w$ (for witness)
  in $\cap_i D[i]$ that is $\epsilon$-close to $p$.
- So we need to show that $w$ is in each of the $D[i]$.
- We will create a sequence of points that converges to such a $w$:

```py
def find_witness(Ds, p, eps):
    """Yields a Cauchy sequence of points converging to a witness 'w'
    such that `w ∈ Ds[i] ∀ i` and `|p - w| < eps`."""
    d = 1                                  # current distance scale is eps / 3**d
    cur = Ds[0].get_close_pt(p, eps / 3)   # cur is 'eps/3' close to p; exists since Ds[0] is dense.
    yield cur
    # loop invariant: 'cur' is eps/3**d close to p
    k = 1
    while True:                            # k = 1, 2, 3, ...
        for i in range(0, k):
            d += 1
            # 1. 'nxt' is closer to the point than 'cur' is, by a factor of (1/3).
            nxt = Ds[i].get_close_pt(cur, eps / 3 ** d)
            # 2. The (cur, nxt) distance is a monotone decreasing function of 'd':
            #      dist(cur, nxt) <= dist(cur, p) + dist(p, nxt)
            #                     <= eps/3**d + eps/3**(d+1) <= 4*eps/3**d,
            #    which is monotone decreasing in d, and thus the yielded
            #    sequence is Cauchy.
            yield cur
            cur = nxt
        k += 1
```


#### Corollary 1: Space cannot be union of non chonk

- A nowhere dense set is a set whose closure has empty interior.
- A meagre set/ a set of first category
  is a set which can be written as a countable union of
  nowhere dense sets.
- A non meagre set/a set of second category is
  a set which cannot be written in this way
- Baire Category: A complete metric space is non-meagre / second
  category in itself.


##### Proof of Corollary 1

- By contradiction, assume that $X$ is the union of nowhere dense sets $N_i$.
- Complement the closures: let $D_i \equiv X - C(N_i)$; each $D_i$ is open, and dense (since $C(N_i)$ has empty interior).
- By the Baire category theorem, $\cap_i D_i$ is dense, and in particular nonempty.
- But a point of $\cap_i D_i$ lies outside every $C(N_i) \supseteq N_i$, and hence outside $\cup_i N_i$.
- So $\cup_i N_i \neq X$, a contradiction.



#### Corollary 2: One of union is chonk
- Baire Category, stmt 2: If $X$ is the union of a countable family
    of closed subsets, then at least one of the closed subsets
    contains a nonempty open set.


##### Proof of Corollary 2

- If no $F_i$ contained a nonempty open set, then each $F_i$ would be closed with empty interior,
  i.e. nowhere dense, and $X = \cup_i F_i$ would contradict Corollary 1.

#### Abstract Use: Swap Quantifiers

- Let $D$ be an enumerable set and $X$ a complete metric space.
- We have some statement $\forall x \in X, \exists  d \in D, P_d(x)$.
- We get a uniform statement: $\exists d \in D, \forall x \in X, P_d(x)$
- To do this, we first define $F_d \equiv \{ x \in X : P_d(x) \}$ (filter $x$ by $P_d$).
- If we are lucky, then each of the $F_d$ are closed. Since $d \in D$ is enumerable,
  and we have that $X = \cup_d F_d$, we apply baire category.
- Baire category tells us that there is a $d_0 \in D$ such that $F_{d_0}$ has
  non-empty interior.
- We then set up the situation such that if
  $int(F_{d_0}) \neq \emptyset$, then $F_{d_0} = X$, which gives us the uniform statement.

#### Application : Vanishing derivative pointwise implies fn is polynomial

- Let $P$ be infinitely differentiable, such that for each $x \in \mathbb R$,
  there is a number $n(x) \in \mathbb N$ such that
  $(\partial^{n(x)} P / \partial x^{n(x)})(x) = 0$. Then $P$ is a polynomial.
- This is again a case where we switch a pointwise fact --- is locally like a poly as nth order
  derivative vanishes, into a global fact --- is actually a polynomial.
- Let us try the above proof sketch.
- Proof by contradiction, assuming $P$ is not a polynomial.
- Define $X \equiv \{ x : P \text{ is not a polynomial on any open interval } (a, b) \ni x \}$.
- Define $F_d \equiv \{ v \in [0, 1] : (\partial^d P / \partial x^d)(v) = 0 \}$.
- Clearly, $\cup_d F_d = [0, 1]$, and each of the $F_d$ is closed (the zero set of a continuous function).
- By baire category, one of the $F_d$ has an open set inside it.
- This means that for some open set, there is some natural $D$ such that the $D$th derivative vanishes.
- From this, it is "clear" that $f$ is a polynomial?

#### Application : the reals are uncountable.

- Assume $[0, 1]$ is countable.
- So $[0, 1] = \cup_i  \{x[i]\}$ for a countable number of points $x[i]$.
- See that each singleton $\{x[i]\}$ is closed and has empty interior, i.e. is nowhere dense.
- But we know (Corollary 1) that a nonempty complete metric space is not a countable union of nowhere dense sets. Contradiction.

#### Application: Uniform boundedness

- Let $X$ be a Banach space, $Y$ a normed vector space, $B(X, Y)$ the space of all bounded linear operators
  from $X$ to $Y$. Let $F \subseteq B(X, Y)$ be a collection of bounded linear operators.
  If $\forall x_0 \in X, \sup_{T \in F} ||T(x_0)||_Y < \infty$ (the set of operators is pointwise bounded), then
  $\sup_{T \in F} ||T|| < \infty$ (the set of operators is uniformly bounded). A sketch of how the Baire recipe applies is below.
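
This is the standard Baire-category instantiation of the swap-quantifiers recipe, sketched here so the application is concrete. Define

$$
F_n \equiv \{ x \in X : \sup_{T \in F} ||T x||_Y \leq n \}.
$$

Each $F_n$ is closed (an intersection of preimages of closed sets under the continuous maps $x \mapsto ||Tx||_Y$), and pointwise boundedness says $X = \cup_n F_n$. By Baire category, some $F_N$ contains a ball $B(x_0, r)$. Then for any $T \in F$ and any $||u|| \leq r$, we have $||Tu|| \leq ||T(x_0 + u)|| + ||Tx_0|| \leq 2N$, so $||T|| \leq 2N/r$ uniformly over $F$.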


# libOpenGL, libVDSO and Nix


- OpenGL is both userspace / user facing (provides APIs) and drivers
  (talks to GPU hardware).
- Nix thinks OpenGL is impure because OpenGL needs to be loaded based
  on the *target hardware*, which is fundamentally unknown on the *host build machine*.
- Contrast this situation with the VDSO, which is a library the kernel secretly
  links/loads into any executable to provide access to syscalls.
- This is *also* fundamentally target-side infrastructure, which the host
  build cannot know about, but this impurity is "purified" for Nix
  because userspace doesn't need to worry about loading the VDSO!
- If it were the case that the kernel also links in


# Stuff I learnt in 2022

2022 was a weird year for me. I moved from India to Edinburgh to pursue
my PhD, and a lot of the year was (and still is) getting used to
what it even means to be a PhD student. Here's a run down of the
things I learnt this year, and what the experience of doing this was.

#### Semantics of MLIR in Lean

The first project I took up was to define the semantics of [MLIR](https://mlir.llvm.org/),
a new compiler infrastructure in the [Lean4](https://leanprover-community.github.io/) proof
assistant, titled boringly as [`opencompl/lean-mlir`](https://github.com/opencompl/lean-mlir).

While working on the project, I did a bunch of useful things:

- I helped write the [Lean metaprogramming book](https://github.com/arthurpaulino/lean4-metaprogramming-book),
  an open-source book on using Lean's powerful metaprogramming facilities.
- I read through [Mario Carneiro's thesis, models of dependently typed programming languages](https://github.com/digama0/lean-type-theory)
  which explains Lean's metatheory and a proof of consistency of Lean. I quite liked this thesis, since it provides
  a very readable set-theoretic semantics for Lean!

#### Lean OSS work

I also began contributing to the proof assistant itself.
In particular, I've been slowly chipping away at adding [LLVM support](leanprover/lean4#1837),
which got merged on the 31st, right before new years! The hardest part was learning a huge amount
about low-level linkers, loaders, and other cross platform shenanigans. The sources I leaned against
most were:

- The [CMake manual](https://cmake.org/documentation/), which actually makes CMake sensible. I quite like CMake now,
  since I actually understand its semantics.
- [Linkers and Loaders](http://14.99.188.242:8080/jspui/bitstream/123456789/12311/1/LinkerLoader.mca.pdf), to learn
  a ton of the arcane details of how precisely loading works.
- The [GNU binutils](https://www.gnu.org/software/binutils/), which were a lifesaver in debugging weird linker visibility issues.


#### Partial Evaluation

I was also interested in improving the performance of the code generator, and was frustrated that we in compilers
kept rediscovering basic optimisation techniques over and over again. Partial Evaluation seemed like a powerful
technique to prevent this waste, and I thus began reading the literature on partial evaluation. The best
book I found was called [Partial evaluation and automatic program generation](https://www.itu.dk/people/sestoft/pebook/jonesgomardsestoft-a4.pdf),
and I [implemented the algorithms from the book](https://github.com/bollu/halfred). However, I haven't had the time
to integrate these back into lean proper. Plans for next year!


#### Adjoint School And Category Theory

I wanted to learn more category theory, since I felt it was important as a type theorist in training
to be fluent with category theory.

- I was reading [Categorical logic](https://link.liverpool.ac.uk/portal/Categorical-logic-and-type-theory-Bart/zctzUzjlvJk/)
  by Bart Jacobs, which describes how to reason about type theory using the
  machinery of [fibered categories](https://en.wikipedia.org/wiki/Fibred_category).
- I attended [Adjoint School](https://adjointschool.com/), a summer school for applied category theorists, which was
  a blast. I learnt a lot from it, and read a bunch of papers from it!
- I loved the paper [Opinion Dynamics on Discourse Sheaves](https://arxiv.org/abs/2005.12798), which describes how to setup
  a 'discourse sheaf', a sheaf structure on a graph which models private versus public opinions. The punchline is that harmonic
  functions lead to 'harmonious' discourse amongst participants in the model!
- Ohad pointed me to [Probabilistic models via quasi borel spaces](https://arxiv.org/abs/1701.02547), which builds
  quasi-borel spaces, which is a cartesian closed category where one can interpret simply typed lambda calculus.
  This gives a nice denotational footing to probabilistic programming.

But to be honest, what I really took away from this was that I *don't enjoy*
category theory as much as I enjoy geometry. Thus, I'm going to try to align
next year such that I get to read more geometric concepts!

#### Logic

I wanted to learn logic and model theory, so I read George Boolos'
book ["Computability and Logic"](https://www.cambridge.org/core/books/computability-and-logic/440B4178B7CBF1C241694233716AB271).
My two favourite theorems were:

- The [Compactness theorem](https://en.wikipedia.org/wiki/Compactness_theorem), which points to the finiteness of
  the proof system we use, and how this impacts the logic itself.
- The [Lowenheim Skolem](https://en.wikipedia.org/wiki/L%C3%B6wenheim%E2%80%93Skolem_theorem) theorem, which shows that
   first order logic cannot control the size of its models.

I also wanted to learn what forcing was about, so I tried
to read through the literature:

- I began by reading [Jech: Axiom of Choice](https://link.springer.com/chapter/10.1007/978-3-642-41422-0_37#Abs1), which
  was far too terse to grok, so I switched to reading the next lecture notes.
- [Independence of CH: an intuitive explanation](https://arxiv.org/pdf/2208.13731.pdf) was a readable account of the
   machinery of forcing! I wanted to get the account of forcing from the point of view of topoi, for which I started reading
   the next book.
- [Sheaves in geometry and logic](https://link.springer.com/book/10.1007/978-1-4612-0927-0) is a textbook on topos theory, which
  provide an account of forcing by building an object called a [cohen topos](https://toddtoddtodd.net/T%20Schmid%20-%20Toposes,%20Sets,%20and%20Cohen%20Forcing,%20an%20Overview.pdf). I didn't manage to get through enough of the book to really understand what the
  chapter on the cohen topos was doing, but I did get the vague idea. We seem
  to build a topos, and then use the internal logic of the topos to mimic the
  model we are building. The machinery of topos theory allows us to easily
  control the internal logic, thereby adding the axioms to ZF.

#### Frex

[Ohad Kammar](https://twitter.com/aleph_kappa?lang=en) is a senior research fellow here at Edinburgh who I really enjoy
talking to. He told me about a project he works on, `frex`, which stands for
"free extensions". The TL;DR is that they wish to study how to write down simplifiers / computer algebra systems
in a principled fashion. Their paper [Partially static data as free extensions of algebras](https://www.cl.cam.ac.uk/~jdy22/papers/partially-static-data-as-free-extension-of-algebras.pdf) is a super readable account of their ideas. I quite enjoyed re-implementing
the basic version in Haskell. I wish to implement their more recent, more complex dependently-typed version of the
theory in Lean.

#### Ideas in type theory and proof assistants

Since I'm here at Edinburgh, I keep getting stray recommendations on things to read.
A big shout-out to
[Andres Goens](https://github.com/goens),
[Chris Hughes](https://github.com/ChrisHughes24),
[Jesse Sigal](https://github.com/jasigal),
[Justus Mathiessen](https://www.inf.ed.ac.uk/people/staff/Justus_Matthiesen.html),
[Leonardo De Moura](https://leodemoura.github.io/about.html),
[Li-yao Xia](https://poisson.chat/),
[Mario Carneiro](https://github.com/digama0),
[Ohad Kammar](https://twitter.com/aleph_kappa?lang=en), and
[Sebastien Michelland](https://github.com/lephe/) for many of these pointers.

- [On Universes in Type Theory](http://www2.math.uu.se/~palmgren/universe.pdf) describes the difference between russel and tarski
  style universes.
- [Case trees](https://hackage.haskell.org/package/idris-1.1.0/docs/Idris-Core-CaseTree.html) are a data structure which are used
   in Coq and Idris to manage dependent pattern matching.
- [primitive recursors](https://leanprover.github.io/theorem_proving_in_lean/inductive_types.html?highlight=recursor#defining-the-natural-numbers) in Lean
- [congruence closure in intensional type theories](https://arxiv.org/abs/1701.04391) describes how to extend the naive
  congruence closure algorithm in the presence of definitional equalities between types.
- Difference between [match+fix, as introduced by Theirrey Coquand in 'Pattern Matching with Dependent Types'](https://wonks.github.io/type-theory-reading-group/papers/proc92-coquand.pdf) ,  `termination_by`, and primitive recursion.
- The idea of full abstraction, which asks the question of when operational and denotational semantics agree,
  first studied by [Gordon Plotkin for PCF](https://pdf.sciencedirectassets.com/271538/1-s2.0-S0304397500X0240X/1-s2.0-0304397577900445/main.pdf)
- [Andrej Bauer's notes on realizability](https://github.com/andrejbauer/notes-on-realizability), which very cleanly describes
  the theory of realisability models, where one studies mathematical objects equipped with computational structure. this naturally
  segues into discussions of models of computation and so forth.
- [I am not a number, I am a free variable](https://www.semanticscholar.org/paper/Functional-pearl%3A-i-am-not-a-number--i-am-a-free-McBride-McKinna/833cf29aa614fa26348a505f3b9a3832e1d47dd4) describes "locally nameless", a technique to manage names when implementing proof assistants.
- Higher order unification is necessary when implementing the elaborator for a proof assistant. Unfortunately,
  [the full problem is also undecidable](https://www.ps.uni-saarland.de/Publications/documents/SpiesForster_2019_UndecidabilityHOU.pdf)
- Luckily for us, [Miller found a fragment called 'pattern unification'](https://github.com/Saizan/miller), where unification is indeed
  decidable. The key idea is to add a 'linearity' constraint which ensures that variables are not repeated into the pattern match, which
  makes even higher-order patterns decidable.
- The [Beluga Proof Assistant](https://beluga-lang.readthedocs.io/en/latest/) is a proof assistant whose logic allows one to reify
  contexts. This means that one can write shallow embeddings of programming languages, have all the nice power of the proof assistant,
  while still reasoning about binders! I found the way in which Beluga makes
  such invasive changes to the metatheory in order to allow reasoning about
  binders to be very enlightening.
- The [HOL light](https://www.cl.cam.ac.uk/~jrh13/hol-light/tutorial.pdf) proof assistant and [Isabelle/HOL](https://isabelle.in.tum.de/)
  are both based on higher order logic, and alternate, untyped foundations for
  proof assistants. I feel it's important for me to know the various ideas
  folks have tried in building proof assistants, and I was glad to have been
  pointed to Isabelle and HOL. I want to spend some time this year (2023) to
  learn Isabelle and HOL well enough that I can prove something like strong
  normalization of STLC in them.
- [Fitch style modal logics](https://arxiv.org/pdf/1710.08326.pdf) are a systematic way to build type theories with
  modalities in them. These are typically used to create type theories that can reason about resources, such as
  concurrent access or security properties. The paper provides a unified account of how to build such type theories, and
  how to prove the usual slew of results about them.
- [Minimal implementation of separation logic: Separation Logic for Sequential Programs](http://www.chargueraud.org/research/2020/seq_seplogic/seq_seplogic.pdf) explains how to write a small separation logic framework embedded in a dependently typed programming language.
  I [translated the original from Coq to Lean](https://github.com/bollu/slf/blob/main/Separation.lean#L1934), and the whole thing clocks in at
  around 2000 LoC, which is not too shabby to bootstrap a full 21st century
  theory of reasoning about parallelism!
- [Telescopic Mappings in Typed Lambda Calculus](https://pdf.sciencedirectassets.com/272575/1-s2.0-S0890540100X02428/1-s2.0-089054019190066B/main.pdf) builds the theory of telescopes, which is the basic notation that's used when describing binders in dependent type theory.
  I had no idea that this had to be developed; I shudder to think how ugly notation was before this paper! I can't help but
  feel that this paper did for dependent type theory what einstein summation convention did for tensor calculus: provide compact
  notation for the uninteresting bits to allow us to to talk about the interesting bits well.
- When talking to Leonardo, I learnt that the hardest part of implementing a homotopical theorem prover
  was the elaboration of pattern matching. [Pattern matching without K](https://jesper.sikanda.be/files/pattern-matching-without-K.pdf)
  explains how to do this, and also made clear for me at what step
  [UIP](https://ncatlab.org/nlab/show/uniqueness+of+identity+proofs) is used
  during pattern matching --- when refining on indexes of a type.
- [The garden of forking paths to reason about lazy programs](https://arxiv.org/abs/2103.07543) describes how to use
  [Clairvoyant call by value](https://www.cs.nott.ac.uk/~pszgmh/clairvoyant.pdf) to reason about laziness in a convenient fashion.


#### Computational group theory

I was confused about what I should pick for my PhD topic, and I briefly flirted with the idea
of working on computational group theory. I genuinely loved a bunch of the papers in this space,
but alas, I couldn't see myself seriously working on these, due to the lack of a clear and focused
problem that I could work on. I did read some cool papers regardless:

- [A practical model for computing with matrix groups](https://www.sciencedirect.com/science/article/pii/S074771711400056X)
  describes algorithms that form the crux of computational matrix group theory.
- [A data structure for a uniform approach to computations with finite groups](https://dl.acm.org/doi/abs/10.1145/1145768.1145811)
  provides a data structure that unifies algorithms for the two ways of representing groups computationally: (1) as subgroups
  of the symmetric group, which is given as a [strong generating set](https://en.wikipedia.org/wiki/Strong_generating_set),
  and (2) as matrices. These two approaches are radically different under the hood, but the paper provides a unified API to
  deal with both.
- [Computing in the monster](https://webspace.maths.qmul.ac.uk/r.a.wilson/pubs_files/MDurham.pdf) describes
  how to perform computations in the monster group, a group that's so large that naively trying to write down elements would
  take two gigabytes of memory.

#### Automated theorem proving

I wound up reading a little on how to implement automated theorem provers (SAT/SMT solvers).
This space is huge, and I only got a cursory glance at it from the [Decision procedures book](https://www.decision-procedures.org/).
Even so, it was neat to learn the core ideas:

- [The DPLL algorithm for solving SAT](https://en.wikipedia.org/wiki/DPLL_algorithm)
- [The CDCL strategy for refining SMT queries](https://en.wikipedia.org/wiki/Conflict-driven_clause_learning)
- [The Nelson Oppen algorithm for mixing convex theories](https://web.stanford.edu/class/cs357/lecture11.pdf)
- The [First Order Resolution](https://logic4free.informatik.uni-kiel.de/llocs/Resolution_(first-order_logic))
  rule, which exploits
  [refutation-completeness](https://cs.stackexchange.com/a/9096/122524) to
  build a SAT solver,
- The [Superposition Calculus](https://en.wikipedia.org/wiki/Superposition_calculus) for fast SAT solving, based on an extension
  of resolution.

#### Common Lisp

I got turned off of writing scripts in Python because writing parallel
processing is a pain, so I did the obvious thing: picked up common lisp!
- I mostly learnt lisp by reading [practical common lisp](https://gigamonkeys.com/book/) and hanging out on `##lisp` on libera IRC.
- I absolutely *loved* the REPL driven development enabled by
  [`emacs`+`slime`](https://slime.common-lisp.dev/), and this has definitely
  set a gold standard for how programming language interaction ought to feel like.
- I was floored by some of the common lisp projects I saw, such as [cepl](https://github.com/cbaggers/cepl), the code-eval-play loop
  for rapid shader development! His [videos are super cool](https://www.youtube.com/playlist?list=PL2VAYZE_4wRKKr5pJzfYD1w4tKCXARs5y),
  and I highly recommend them to anyone who wants to get a flavour of LISP.
- I'd like to work through [Let over Lambda](https://letoverlambda.com/), a book that explains all sorts of macro shenanigans!


#### Non fiction and fiction

I wound up reading a bunch of sci-fi this year, since I wanted to work my way through
the Nebula award winners. My favourites were:

- All Clear by Connie Willis paints a great picture of the Blitz during WW2. It feels surreal to have
  visited London and Edinburgh and Glasgow and all the other places that are name-dropped in the book;
  it felt visceral.
- The [Annihilation series](https://en.wikipedia.org/wiki/Annihilation_(VanderMeer_novel)), which has
  the creepiest vibes in a book I've ever read.
- [Accelerando](https://en.wikipedia.org/wiki/Accelerando), which had a really neat take on a resolution of the Fermi Paradox,
  and just an overall fun tone.
- [The book of all skies](https://www.gregegan.net/ALLSKIES/AllSkies.html) by
  Greg Egan, which as usual explores a neat universe, this time with some kind
  of monodromy.
- Honorary mention to [Perfect State by Brandon Sanderson](https://www.goodreads.com/book/show/25188109-perfect-state), a cute novella
  with a really neat twist at the end I didn't see coming.

As usual, I was reading some more sciencey things, this time on chemistry and nanotechnology,
which were the books
[Ignition! An informal history of rocket liquid propellants](https://www.amazon.co.uk/Ignition-Informal-Propellants-University-Classics/dp/0813595835),
[Inventing Temperature](https://global.oup.com/academic/product/inventing-temperature-9780195337389?cc=gb&lang=en&), and
[Engines of creation](https://en.wikipedia.org/wiki/Engines_of_Creation).


#### Looking Back

When I started writing this blog post, I felt that I hadn't learnt as much as I
did in [2019](https://bollu.github.io/stuff-i-learnt-in-2019.html). However, I
now see that I've learnt a bunch of things, just in a small domain (proof
assistants / type theory / logic). To be honest, this makes me kind of sad; I
miss learning different things, and I feel like I haven't gotten closer towards
some of my life goals of things I want to learn --- the standard model of
particle physics, the proof of resolution of singularities, and other such
goals. I'm going to try to make sure 2023 is more diverse in what I read,
to make sure I'm happy and sane, while continuing to become an expert in proof assistants `:)`. With that said,
I'd like to set concrete goals for 2023:

- Learn enough QFT to know what the hell renormalization is.
- Learn enough QED to be able to explain what a Feynman path integral is.
- Get good enough at juggling to be able to juggle three balls consistently.
- Write [my own proof kernel](https://github.com/bollu/qoc) for Lean.
- Implement and write the paper about elaboration of mutual inductives for
  Lean, and start thinking about coinductives.
- Continue working on Lean's LLVM backend, and make Lean the fastest functional
  programming language on the block.
- Learn to cook a proper three course meal consistently.
- Get good enough at [djembe](https://en.wikipedia.org/wiki/Djembe) to play
  [kuku](https://afrodrumming.com/djembe-rhythm-kuku/) consistently.
- Get good enough at the guitar to strum Snow and Can't Stop well enough.
- Build a routine for shuffle dancing, so that I can dance consistently to a song.
- Learn to rock climb well enough that I can do V4's consistently.

That looks like an ambitious list, and I'm glad it is. I'd like my years to be full of
interesting challenges and neat things I can point to at the end of year! With that said, happy new year!


# You don't know jack about data races

#### Toy example

- Consider a function `void incr() { global += 1; }`.
- On running this on multiple threads, the final value of `global` can be too low.
- It can even be as low as a small constant, independent of the number of increments! Thread 1 reads the value of `global` (say 0),
  gets suspended, then writes `global = 0 + 1` at the very end of the execution, clobbering every other thread's increments.
  (A small Python sketch follows after this list.)
- It can even be *larger* than correct. If a 128 bit number is stored as two
  64 bit registers, data races could cause the high and low parts to desync,
  causing the final count to be off. Assume we have two decimal digits `lo, hi`.
  If we have `08` (`hi=0, lo=8`) and both threads try to update, one thread
  wants to write `09` and the other thread which writes after this wants to
  write `10`. An interleaving can cause `19`, which is way larger than `10`.
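
A small Python sketch of the lost-update failure mode (whether you actually observe lost updates depends on the interpreter and version, since the `+=` below may or may not be interrupted between its read and its write):

```py
import threading

global_counter = 0

def incr(n_iters: int) -> None:
    global global_counter
    for _ in range(n_iters):
        # Non-atomic read-modify-write: a thread switch between the read and
        # the write can lose increments.
        global_counter += 1

threads = [threading.Thread(target=incr, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(global_counter)  # may print less than 400000 if updates were lost
```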


#### Rules of Racy Programs

- Sequential consistency: A program is executed by interleaving steps from each thread.
  Logically the computer executes a step from one thread, then picks another
  thread, or possibly the same one, executes its next step, and so on.
- Real machines sometimes have non-sequentially consistent semantics, due to assignments
  being visible to threads out of order. (QUESTION: This is at the C level. But if we view it at
  the hardware level, it's still sequentially consistent?)
- All modern languages promise sequential consistency for programs **without data races**.
- What is a data race?
- Two memory operations **conflict** if they access the same location and at least one of them is a write.
- Two **conflicting data operations** form a **data race** if they are from different threads and
  can be executed "at the same time". But when is this possible? Clearly, this depends on the semantics
  of parallel computation, which we are trying to define in the first place!
- We break this circularity by considering only **sequentially consistent** executions.
- Two **conflicting data operations** form a **data race** iff (definition)
  one executes immediately after the other in that execution's interleaving.
- Now we can say that a program is data-race-free if none of its sequentially
  consistent executions has a data race.
- We define this in terms of **conflicting data operations** to exclude synchronization
  operations on mutexes. Synchronization operations do not constitute a data race even if they appear
  next to each other in the interleaving.
- Thus, the programming model is:

1. Write code such that data races are impossible, assuming that the
   implementation follows sequential consistency rules.
2. The implementation then guarantees sequential consistency for such code.

#### Counter-intuitive implications

- Consider an example where the initial state is that `x,y` are false.

```
P1: if(x) { y = true; }
P2: if(y) { x = true; }
```

- There is no sequentially consistent execution in which either assignment executes, so neither variable can ever become true.


#### Higher Level

- [Article](https://www.cs.helsinki.fi/group/nodes/kurssit/rio/papers/adve_boehm_2012.pdf)



# Training a custom model for Lean4

- Bert: 111m (0.1BN)
- Gato : 1.2 BN
- GPT 2: 1.5 billion
- DALL-e: 12 billion
- Codex: 12 billion
- GPT 3: 172 BN
- Leela zero: 2 GB --- 0.5 million
- Jax
- [GPT 3 on CS-2](https://www.youtube.com/watch?v=paAF8eaEqsM)
- [Zero algorithm](https://arxiv.org/pdf/1910.02054.pdf)
- composer versus deepspeed versus fairscale
- linformer for lean4? can attend to the entire file? 10^6 attention from 10^3, a
  million. That's number of lines in mathlib. But how do we generate correct proofs of weird looking
  compilers statements? what do we start with?

# Stratified synthesis

> The key to our results is stratified
> synthesis, where we use a set of instructions whose semantics
> are known to synthesize the semantics of additional instruc-
> tions whose semantics are unknown. As the set of formally
> described instructions increases, the synthesis vocabulary
> expands, making it possible to synthesize the semantics of
> increasingly complex instructions

- [Paper reference](https://dl.acm.org/doi/pdf/10.1145/2908080.2908121)



# Mutual recursion elaboration in Lean

Lean has four backends for elaborating mutual definitions.

-   Lean, given a mutual def block, can compile to (1) partial, which is
    just an opaque blob in the kernel, (2) primitive recursion on an
    inductive type via `recOn`, (3) well founded induction via `WF`,
    and (4) `brecOn` + `casesOn`, which allows us to split the recursion
    into the pattern matching part (casesOn) and the recursion part
    (brecOn).

-   (2) `recOn` is primitive recursion, and is synthesized by the
    kernel for every inductive declaration. It is often complicated to
    elaborate pattern matching syntax into a primitive recursor, and is
    a research question for the mathematically correct, complete
    solution which handles all cases. This is the lowest level of
    recursion in Lean, and supports good definitional equality. However,
    the code generator does not currently generate code for `recOn` of
    mutual inductives, and thus cannot be executed. When working with
    objects that live in `Type`, it is a good idea to use `recOn` right
    now, since (a) it reduces correctly, and (b) has no computational
    content.

-   (3) `WF` is well founded recursion on a terminating metric, which
    allows one to express functions more easily than primitive
    recursion. Currently, mutual recursion elaborates into `WF`. The
    drawback is that it has poor definitional equality, and thus breaks
    a lot of convenient reasoning. It has support in the code generator,
    since the code generator supports evaluating `recOn` of non-mutual
    definitions (which `WF` is).

-   As an example of where `WF` is more convenient than `recOn`, think
    of the Ackermann function. It's not primitive recursive, but
    does terminate by well founded induction on the lexicographic order
    on pairs of naturals. Another example is the hydra game, a crazy
    game which is known to terminate, but any proof system that can
    prove the game terminates has at least as much proof strength as PA
    (Kirby and Paris).

-   (4) `brec` + `casesOn` which is used to elaborate inductive
    predicates. `brec` is bounded recursion, which allows using
    $k$-induction: using the inductive hypothesis for up to $k$ children behind
    you. Useful for encoding things like fibonacci, where the value at
    $S(S(n))$ depends on $S(n)$ and $n$ (2-induction). This way of
    elaborating mutual inductives splits the matching part (`casesOn`)
    from the induction part (`brecOn`), and is thus more convenient to
    elaborate into than the lower level `recOn`. There are some bugs
    lurking in the lean elaborator for inductive predicates, so this is
    not fully figured out.

-   Coq gets away with this stuff, because coq has `fix` + `match` in
    the kernel, and they have guardedness checks *in the kernel* which
    checks that the fix is structurally decreasing or whatever. This is
    complicated, and has led to many soundness bugs in the kernel. Thus,
    Lean wishes to avoid this.

-   Thus, in an ideal world, we would improve the elaborator to
    elaborate everything into `rec`, and would teach the code generator
    how to code generate mutual `rec`.

#### Simp Bottlenecks

Benchmarking simp with perf showed us that the bottleneck in one example
was in `congr`, which recursively calls `simp` and `dsimp` (dsimp is a
variant of simp which preserves definitional equality). This needs to be
investigated further.

Another bottleneck could be that simp processes bottom-up. This can lead
to quadratic behaviour on certain tests. For example, consider:

```
(not (and A B)) = (or (not A) (not B))
```

We denote the currently processed node with square brackets `[.]`. Proceeding
bottom-up, we need a linear number of steps just to walk from the leaves to the
top, where the `not` finally gets pushed down one level; we must then repeat
this till fixpoint, giving quadratically many steps overall.

```
(not (and (and  a   b )  c ))
(not (and (and  a   b ) [c]))
(not (and (and  a  [b])  c))
(not (and (and [a]  b)   c))
(not (and [and  a   b]   c))
(not [and (and  a   b)   c])
[not (and (and  a   b)   c]
;; TRANSFORM=>
(or (not (and a b)) (not c))
;; ...
```

#### Simp lemma generation

If we define functions in a mutual def block, and we tag these functions
as `simp`, then simp must generate simp lemmas. If we have a definition
of the form:

```
inductive X where
| X1 | X2 .. | Xn

def foo: X -> X -> Bool
| X1, _ => True
| _, X2 => False
```

the theorems will be:

```
theorem foo.simp1 (x x': X) (h: x = X1): foo x x' = True.
theorem foo.simp2 (x x': X) (h: x /= X1) (h': x' = X2): foo x x' = False.
```

This could be very expensive in case we have complicated mutual
definitions, since Lean can blow up if we have many inductives.


# Subject reduction in Lean

Not exactly. Subject reduction is the property that if you replace a subterm of
a term with a defeq one (especially if the subterm is the result of reduction),
the resulting big term remains typecheckable. This fails in Lean: typechecking
`@id A (@id B (@id C t))` only needs the pairwise defeq checks `C ≡ B` and `B ≡ A`,
while typechecking the term obtained by reducing the inner identities needs `C ≡ A`
directly, i.e. transitivity of (algorithmic) defeq. Applying one of the known
counterexamples to transitivity therefore gives a term such that reducing the
internal identity functions results in another term that doesn't typecheck.


```
variables {A : Type} {R : A → A → Prop} (x : A) (h : acc R x)

def my_rec : ∀ x : A, acc R x → ℕ := @acc.rec A R (λ _, ℕ) (λ _ _ _, 1)
def inv {x : A} (h : acc R x) : acc R x := acc.intro x (λ y h', acc.inv h h')
example : inv h = h := rfl -- ok
#reduce my_rec x (inv h) -- 1
#reduce my_rec x h -- acc.rec _ h

-- failure of transitivity
#check (rfl : my_rec x (inv h) = 1) -- ok
#check (rfl : inv h = h) -- ok
#check (rfl : my_rec x (inv h) = my_rec x h) -- ok
#check (rfl : my_rec x h = 1) -- fail

-- failure of SR:
#check @id (my_rec x h = 1) (@id (my_rec x (inv h) = 1) rfl) -- ok
#check @id (my_rec x h = 1) (@id (1 = 1) rfl) -- fail

-- fooling tactics into producing type incorrect terms:
def T (X : 1 = my_rec x h → Type) :
  X (@id (1 = my_rec x (inv h)) rfl) = X (@id (1 = my_rec x (inv h)) rfl) :=
by { dsimp, refl }
-- kernel failed to type check declaration 'T' this is usually due to a buggy tactic or a bug in the builtin elaborator
-- elaborated type:
--   ∀ {A : Type} {R : A → A → Prop} (x : A) (h : acc R x) (X : 1 = my_rec x h → Type), X _ = X _
-- elaborated value:
--   λ {A : Type} {R : A → A → Prop} (x : A) (h : acc R x) (X : 1 = my_rec x h → Type), id (eq.refl (X rfl))
-- nested exception message:
-- type mismatch at application
--   X rfl
-- term
--   rfl
-- has type
--   1 = 1
-- but is expected to have type
--   1 = my_rec x h
```


# Big list of GNU Binutils

- `nm` to list all symbols in an object file.

#### ld

- [trace-symbol](https://sourceware.org/binutils/docs/ld/Options.html#index-symbol-tracing) to trace symbol information.

#### List symbols in a file

Use `nm` to list all symbols in a file.

# Axiom K versus UIP
- UIP: all proofs of equality are equal: `(p q: Eq A a a'): Eq (Eq A a a') p q`
- Axiom K: all proofs of equality are equal to refl: `(p: Eq A a a): Eq (Eq A a a) p (refl A a)`
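
A minimal Lean 4 sketch showing that K, taken as a hypothesis, implies UIP (the name `K_implies_UIP` is made up for this note; for Lean's `Eq`, which lives in `Prop`, both principles already hold):

```lean
theorem K_implies_UIP
    (K : ∀ {A : Type} {a : A} (p : a = a), p = rfl) :
    ∀ {A : Type} {a b : A} (p q : a = b), p = q := by
  intro A a b p q
  cases q        -- collapse the free endpoint: now `b` is `a` and `q` is `rfl`
  exact K p      -- K finishes the loop case `p : a = a`
```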

#### Where is K used in pattern matching

- `K` can be proven by dependent pattern matching on the identity type!

```
K : (p : x = x) -> p = refl
K refl = refl
```

> In fact, Conor McBride showed in his thesis ("Dependently typed functional
> programs and their proofs (2000)") that K is the only thing that dependent
> pattern matching really adds to dependent type theory.

> Indexed type definitions could be interpreted as non-indexed definitions with
> extra equality proofs in constructors that set the indices. In Agda, what
> ultimately matters is the method for unifying indices in dependent pattern
> matching, so _≡_ can be seen as a wrapper for whatever notion of equality
> stems from pattern matching. But pattern matching is ultimately reducible to
> applications of either Axiom K or Axiom J. So, even in the context of Agda,
> you should just look at the bare-bones refl/Axiom J definition of equality to
> see where the extra equalities come from.

- [What is axiom K](https://stackoverflow.com/questions/39239363/what-is-axiom-k)
- [Pattern matching without K](https://stackoverflow.com/questions/39264130/is-agda-without-k-less-powerful?noredirect=1&lq=1)


# Linear vs uniqueness types
- A function `A -o B` which is linear in `A` guarantees that the function *consumes* A
- A function `Unique<A> -> B` guarantees that the function holds the *only reference* to `A`.

# Any model of lean must have all inductives

- Or: Lean knows about the sizes of types.
- The proof script below shows that `one ≠ two`: the property "any two elements are equal" holds of `one`,
  transports along any equality of types, and fails for `two`.

```
inductive one: Type
| o1

inductive two: Type
| t1 | t2

theorem one_neq_two: one ≠ two :=
have h1 : ∀ x y : one, x = y := by
  intros x y; cases x; cases y; rfl
have h2 : two.t1 ≠ two.t2 :=
  by intro h; cases h
λ h => by
rw [h] at h1
exact h2 (h1 two.t1 two.t2)
```


# Index over the past, fiber over the future

- indexed view corresponds to `check`
- fibered corresponds to `infer`: given a term, tell me the type of the term?
- Some talk by conor at topos.




# Type formers need not be injective

```
abbrev Powerset (X: Type) := X -> Prop -- the powerset of a type is the collection of all subsets.
```

- This shows that we can create type formers which are not injective.
- This means that inductives are indeed special, in that, e.g., `cons a as = cons b bs` implies
  that `a = b ∧ as = bs`.

# There cannot be a type of size the universe

```
axiom CODE : Type -- assume we have CODEs for types...
axiom decode : CODE -> Type -- and a decoding...
axiom decode_surjective: ∀ (t: Type), { code: CODE // decode code = t } -- which is surjective on types.
abbrev Powerset (X: Type) := X -> Prop -- the powerset of a type is the collection of all subsets.

abbrev codedU := Σ (code: CODE), decode code -- create the set of all values that are reachable by decoding the codes...
abbrev UcodedU := Powerset codedU -- build its powerset...
noncomputable def codedUcodedU: { code_UcodedU : CODE //  decode code_UcodedU = UcodedU } := by { -- encode this...
  apply decode_surjective;
}
noncomputable def cantor (UcodedU: Powerset codedU): codedU := -- use the fact that the UcodedU has a code....
    ⟨ codedUcodedU.val, by { have H := codedUcodedU.property; simp[H]; exact UcodedU } ⟩
-- Now run cantor diagonalization.
```

# The dependently typed expression problem

Dependently typed programming is like the expression problem.
We can either write Ohad/OOP, where we have data and proofs (behaviour)
next to each other. Or we can write in Xavier/functional style, where
the data is separate from the proofs (behaviour).

# Motivation for modal logic

- `possibly A -> necessarily (possibly A -> B) -> necessarily B`
- this weakens the precondition `A -> (A -> B) -> B` by needing only `possibly A`
  and strengthens the postcondition by spitting out `necessarily B`.
- Key idea 1: if A is true in no world `w`, then our hypothesis `possibly A` is false, and from this we derive the conclusion by explosion.
- Key idea 2: if A is true in some world `wa`, then suppose we are in some arbitrary world `wr`.
- Since `A` is true in `wa`, we have `possibly A`.
- Since `necessarily (possibly A -> B)` is true in all worlds, we have `(possibly A -> B)`.
- Since we have both `possibly A`, and `possibly A -> B`, we derive `B` in `wr`.
- Since `wr` was arbitrary, we then have `necessarily B`, since `B` holds in every world.
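
A minimal Lean 4 sketch of the argument, assuming an S5-like semantics where every world is accessible from every other (so `necessarily` is just a `∀` over worlds and `possibly` a `∃`); `World`, `possibly`, and `necessarily` are names made up for the sketch:

```lean
variable {World : Type}

-- Interpret the modalities over a type of worlds (total accessibility).
def possibly    (P : World → Prop) : Prop := ∃ w, P w
def necessarily (P : World → Prop) : Prop := ∀ w, P w

-- possibly A → necessarily (possibly A → B) → necessarily B
theorem weakened_modus_ponens (A B : World → Prop)
    (hA : possibly A) (hAB : necessarily (fun w => possibly A → B w)) :
    necessarily B :=
  fun wr => hAB wr hA   -- in an arbitrary world wr, apply the necessary implication to `possibly A`
```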

#### Use of this for Kant
- experience of objects is possible.
- it is necessarily the case that if experience is possible, then I must have some way to unite experience.
- thus, necessarily we have unity of experience.

#### Use of this for descartes
- it is possible for me to be certain of something (ie, I think therefore I am)
- it is necessarily the case that if I can be certain of something, I have clear and distinct perception.
- Therefore, it is necessary that I have clear and distinct perception.

# Scones

- take $C$ a category. There is a global sections functor $\Gamma: C -> Set$ given by $Hom(1, -)$.
- take the pullback $C \xrightarrow{\Gamma} Set \xleftarrow{cod} Set^{\to}$.

- From any type theory $T$, we build $syn(T)$, where objects are the types, and morphisms are terms with
  free variables. (ie, $A \to B$ is a term of type $B$ involving a free variable of type $A$)
- whatever structure $T$ had will be visible in $syn(T)$. eg: if $T$ has products, then $syn(T)$ will have products.
  moreover, $syn(T)$ will be the initial such category. For any other $C$ with the appropriate structure, there will be a functor $syn(T) \to C$.
- To use this to prove properties of $T$, we'll need to cook up a special $C$, so that $syn(T) \to C$ can tell us something.
  Further, this $C$ must somehow depend on $T$ to explore properties of $T$, so let's call it $C(T)$.
- We must use the uniqueness of the morphism $syn(T)$ to $C(T)$ (ie, the initiality of $syn(T)$), because that's what makes
  this thing universal.

- [An introduction to fibrations, topos theory, the effective topos and modest sets](http://www.lfcs.inf.ed.ac.uk/reports/92/ECS-LFCS-92-208/)
- [Scones, logical relations, parametricity](https://golem.ph.utexas.edu/category/2013/04/scones_logical_relations_and_p.html)


# Presheaf models of type theory
- Let $C$ be any category.
- Contexts are presheaves $\Gamma: C^{op} \to Set$. Morphisms are natural transformations of presheaves.
- an element of a context, $Elem(\Gamma)$, is a global element / an object of the category of elements (Grothendieck construction) of the context:
  $\Sigma_{I:Ob(C)} \Gamma(I)$
- A type in the context, $\Gamma \vdash T$ is a presheaf over the category of elements $\alpha \in T(I, \rho)$.
- A term $\Gamma \vdash t: T$ is $t: (I: Ob(C)) -> (\rho: \Gamma(I)) -> T(I, \rho)$.
- substitution is a natural transformation $\sigma: \Gamma \to \Delta$.


- [A presheaf model of dependent type theory by Alexis Laouar](https://perso.crans.org/alaouar/rapportm1.pdf)
- [Ref: Cubical type theory with several universes in nuprl](https://www.youtube.com/watch?v=ioa-f_nCNuE)

# Weighted limits via collages

#### Collage of a profunctor.
- more explicitly, for `P : C -|-> D`, define `Collage(P)` as the category where `Obj(Collage(P)) = Obj(D) + Obj(C)`, `Collage(P)(inl x, inl y) = D(x,y)`, `Collage(P)(inr x, inr y) = C(x,y)`, `Collage(P)(inl x, inr y) = P(x,y)`, `Collage(P)(inr x, inl y) = 0`
- It is the categorification of a cograph. A graph is where we take the product `A \times B` and then take a subset of it where `f(x) = y` (equalizer).
- A cograph is where we take the union `A \cup B` and then impose a quotient `f(x) ~ y` (coequalizer).
- When we categorify this, we don't coequalize, but we setup arrows that capture the morphisms.

#### Quick intro to enriched (pro)functors.

- In an enriched category, we replace hom sets by hom objects which live in some suitable category $V$.
- The category must be monoidal, so we can define composition as $\circ: hom(y, z) \otimes hom(x, y) \to hom(x, z)$.

#### Weighted Limits via collages

- Let `1` be the terminal enriched category, having 1 object `*` and `Hom(*,*) = I`, and `I` is the unit of the monoidal structure `(V, (x), I)` of the enrichment.
- A weighted cone over `D : J -> C` with weight `W : J -|-> 1` (where `1` is the terminal enriched category over `V`),
  is a functor `G` from the collage of `W: J -|-> 1` to `C` that agrees with `F` on the copy of `J` in the collage.
  So, `G: Col(W) -> C`, or `G: J+* -> C` where `G(J) = D`.
- Unravelling this, construct the category `Col(W) = J+*` with the morphisms in `J`, morphism `I: * -> *`, and a bunch of arrow `J -> *`. So we are adding an "enriched point",
  with an arrow `I: * -> *`.
- What does a weighted cone `G: Col(W) -> C` have that doesn't just come from `F: J -> C`?  Well, it has an object `X` (for apeX) to be the image of `(inr *): J+*`,
  and it has the collage maps `W(inl j -> inr *) -> C(j -> X)` for all `j` in `Obj(J)`, and these maps commute with the base maps of `F`.
  So far, this looks like a cone. However, note that the collage maps are enriched maps!
-  The natural transformations can only choose to move where `*` goes, since that's the only freedom two functors `G, G':J+* -> C` have,
   since they must agree with `F` on `J`: `G(J) = G'(J) = F(J)`.
   This is akin to moving the nadir, plus commutation conditions to ensure that this is indeed a cone morphism.

- Maps of these weighted cones are natural transformations that are identity on the copy of `J`
- Terminal means what it usually does. A terminal weighted cone is a weighted limit.


```
15:51 <xplat> *C(X,F(j))
How does this look in our ordinary Set-enriched world?  a `W`-weighted cone has its ape`X` and for each `j` in `J`
  it has a `W(*,j)`-tuple of arrows `x_j,k : X -> F(j)` in `C` and for each `g : j -> j'` we have equations `x_j,k . F(g) = x_j',W(*,g)(k)`
15:57 <xplat> both correct
15:57 <xplat> wait, no
15:58 <xplat> first correct
15:58 <xplat> maps of weighted cones are natural transformations `eta : F => F' : Collage(W) -> C` that are identity on the copy of J in Collage(W)
16:04 <xplat> in the `Set`-enriched world, a map of `W`-weighted cones is a map `f : X -> X'` in `C` and for each `j` in `Obj(J)` and `k` in `W(*,j)` we have equations `x_j,k = x'_j,k . f`
16:08 <xplat> so you can take a simple example, the second power.  For this example, `J = 1`, `W(*,*) = 2`, `F` picks out some object `c`, so each weighted cone consists of `X` and `x_*,0 : X -> c` and `x_*,1 : X -> c` and no equations
16:09 <xplat> what does the terminal weighted cone look like in this example?
```

#### Weighted limit via `nlab`

- Let $K$ be a small category, which is the diagram.
- Suppose $F: K \to \mathsf{Set}$.
- See that cones of $F$ correspond to natural transformations $[K, \mathsf{Set}](\Delta(p), F)$ for $p \in \mathsf{Set}$.
- See that the limit represents cones: $\mathsf{Set}(p, \texttt{Lim} F) \simeq [K, \mathsf{Set}](\Delta(p), F)$, natural in $p$
- Generalizing this to an arbitrary ($V$-enriched) category $C$ and a weight $W: K \to V$, we can write the defining property of the weighted limit as $C(p, \texttt{Lim}^W F) \simeq [K, V](W, C(p, F(-)))$, natural in $p$.

- [Ref: nlab](https://ncatlab.org/nlab/show/weighted+limit)

# Disjoint Coproduct

- One says that a coproduct $X+Y$ is disjoint iff the intersection of $X$ with $Y$ in $X+Y$ is empty.
- The intersection of $A, B$ over $X$ is defined as the pullback of the diagram (in fact, cospan) $A \rightarrow X \leftarrow B$.
- Thus, in this case,  we say that $X, Y$ are disjoint iff the pullback of $X \rightarrow X+Y \leftarrow Y$ is the initial object.

- [Disjoint coproduct](https://ncatlab.org/nlab/show/disjoint+coproduct)



# Leibniz Equality in Lean4

```
@[simp, reducible]
abbrev Leibniz {A: Type} (x y: A) := ∀ (P: A -> Prop), P x -> P y

theorem Leibniz_refl {A: Type}: ∀ (x: A), Leibniz x x := fun _A _P Px => Px
theorem Leibniz_trans {A: Type}: ∀ (x y z: A), Leibniz x y -> Leibniz y z  -> Leibniz x z :=
        fun _x _y _z Lxy Lyz P Px => Lyz P (Lxy P Px)

theorem Leibniz_sym {A: Type}: ∀ (x y: A), Leibniz x y -> Leibniz y x :=
  fun x y  Lxy P Py =>
      let prop (a: A) := P a -> P x
      let proofPropX : prop x := id
      let proofPropY: prop y := Lxy prop proofPropX
      proofPropY Py

theorem defeq_implies_Leibniz (x y: A) (EQ: x = y):
  Leibniz x y := fun P Px => EQ ▸ Px

theorem Leibniz_implies_defeq (x y: A) (LEQ: Leibniz x y):
  x = y := LEQ (fun a => x = a) rfl
```

# Strong normalization of STLC

- Recall that in the category Hask, objects are types, morphisms are functions/expressions.
- Recall that in the category of contexts, objects are contexts, morphisms are substitutions.
- A local predicate $L$ will relate to an object (type/context) a collection of morphisms
    (type → expressions of that type, typing context → substitutions of the variables of the typing context where the
    expressions have the type as per the context).
- Consider STLC with the typing rules:

```
------
Γ⊢():Unit
```

```
Γ ⊢ (f:A→B); Γ ⊢ (x:A)
----------------
Γ⊢ f@x:B
```

```
Γ,(a:A) ⊢ (body:B)
----------------
Γ ⊢  (λa.body:A→B)
```
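
As a concrete reading of these rules, here is a minimal Haskell sketch of a type checker (the encoding is ours: a variable rule is added, and the `λ` carries a type annotation so that checking is syntax-directed):

```hs
data Ty   = TUnit | Arr Ty Ty deriving (Eq, Show)
data Expr = TT | Var String | App Expr Expr | Lam String Ty Expr

type TyCtx = [(String, Ty)]

infer :: TyCtx -> Expr -> Maybe Ty
infer _   TT      = Just TUnit                   -- Γ ⊢ () : Unit
infer ctx (Var x) = lookup x ctx                 -- look up x : T in Γ
infer ctx (App f x) = do                         -- Γ ⊢ f : A→B and Γ ⊢ x : A  give  Γ ⊢ f@x : B
  Arr a b <- infer ctx f
  a'      <- infer ctx x
  if a == a' then Just b else Nothing
infer ctx (Lam a ta body) =                      -- Γ,(a:A) ⊢ body : B  gives  Γ ⊢ λa.body : A→B
  Arr ta <$> infer ((a, ta) : ctx) body
```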

- Define the logical predicate `LType: Type → Set(Expression)` by induction on the rules:

```
--------------------------
  () ∈ LType(Unit)
```


```
f ∈ LType(A→B); x ∈ LType(A)
--------------------------
  f @ x ∈ LType(B)
```

```
body ∈ LType(B); (∀ aval ∈ LType(A), body[a/aval] ∈ LType(B))
--------------------------
  λa.body ∈ LType(A→B)
```

- It is clear that `x ∈ LType(T)` implies that `x` is strongly normalizing.
- When we try to prove that `Γ ⊢ x : T` implies `x ∈ LType(T)`, we get stuck on the
  case of the lambda, because it's impossible to prove that a well typed term
  `(λa.body):A→B` is such that upon substitution, `body[a/aval]` will be strongly normalizing.

- So we control the context as well, and create another relation `LCtx(Γ)`. For a given context
  `Γ: Var → Type`, we say that a substitution `γ: Var → Expr` is in `LCtx(Γ)` iff `dom(γ) = dom(Γ)`,
  and that for all `x ∈ dom(Γ)`, `γ(x) : Γ(x)` and `γ(x) ∈ LType(Γ(x))`. Written with a little abuse of notation,
  this means that if `(x:T) ∈ Γ`, then `γ(x):T` and `γ(x) ∈ LType(T)`. That is, `γ` is a set of assignments
  for the typing context `Γ` where each assignment is strongly normalizing.

# Subobject classifiers of $N \to FinSet$, or precosheaf of $FinSet$

#### Subobject classifier in $Set^2$

- Start with $Set^2$. This has as objects $X_0 \to X_1$. The subobjects are of the form:

```
   f
S0 -> S1
v     v
|i    |i'
v     v
X0 -> X1
   g
```

- we can identify $i(S_0)$ with a subset $T_0$ of $X_0$, and $i'(S_1)$ with a subset $T_1$ of $X_1$.
- The diagram commuting implies that $g(i(S_0)) = i'(f(S_0))$. This means that $g(T_0) = i'(f(S_0))$, or that $g(T_0) \subseteq im(i') = T_1$.
- Thus, we have that $g(T_0) \subseteq T_1$.
- We define the subobject classifier as having values $T, \triangleright T, \triangleright^\infty T$, where $T$ is interpreted as "is a subobject" (is true),
  and $\triangleright$ is interpreted as "delay" (ie, will be a subobject in the next timestep).
- An element $s \in T_0 \subseteq X_0$ will be classified as $T$.
- An element $s \notin T_0$ with $g(s) \in T_1$ will be classified as $\triangleright T$, since it lands in the subobject in one timestep.
- An element $s \notin T_0$ with $g(s) \notin T_1$ will be classified as $\triangleright^\infty T$, since it lands in the subobject after
  infinitely many timesteps (ie, never).
- We can alternatively think of $\triangleright^\infty \sim \triangleright^2$, since it takes "two timesteps", but the second
  timestep is never materialized.

#### Proof that this is the subobject classifier

- We formally define the subobject classifier as $\Omega_0 \xrightarrow{\omega_0} \Omega_1$, where
  $\Omega_0 \equiv \{ T, \triangleright T, \triangleright^\infty T \}$, $\Omega_1 \equiv \{T, \triangleright^\infty T \}$.
- The map is $\texttt{force}_0$: $T \mapsto T$, $\triangleright T \mapsto T$,
  $\triangleright^\infty T \mapsto \triangleright^\infty T$.
- Informally, the map can be said to be given by $\texttt{force} \equiv (T \mapsto T, \triangleright^{n+1} T \mapsto \triangleright^n T)$.
- We call it "force" since it forces a layer of delay.
- We define the bijection between subobjects $(S \xhookrightarrow{f} X)$ and classification maps $(X \xrightarrow{\xi[f]} \Omega)$
  as follows: for each $x \in X_0$, let $i$ be the least index such that the image of $x$ lies in the subobject at stage $i$ (with $i = \infty$ if it never does). Then set $\xi[f]_0(x) \equiv \triangleright^i T$. See that
  by the square, this determines $\xi[f]_{i}$ for all larger $i$:

```
X0 ---ξ[f]0--> Ω0
|               |
f0            force0
v               v
X1 - ξ[f]1- -> Ω1
   [to be determined]
```

- We have the obvious


#### Why $N \to FinSet$ does not have subobject classifier

- The objects in this category are sequences of sets $(X_0 \to X_1 \to X_2 \to \dots)$.
- We claim this category


# Dimensions versus units
- `gram/kg` is dimensionless because it's mass/mass, but it indeed has units `g/kg`, since it's
  the conversion ratio between grams and kilograms.
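
A tiny Haskell sketch of the distinction (all type and function names here are ours, purely for illustration):

```hs
newtype Grams     = Grams Double     deriving Show
newtype Kilograms = Kilograms Double deriving Show

-- dimensionless (mass/mass), but its *units* are g/kg: that is what licenses the conversion below
gramsPerKilogram :: Double
gramsPerKilogram = 1000

toGrams :: Kilograms -> Grams
toGrams (Kilograms kg) = Grams (kg * gramsPerKilogram)
```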


# HoTTesT: Identity types

- [lecture](https://www.youtube.com/watch?v=oMKl7pBRg1E&list=PLtIZ5qxwSNnzpNqfXzJjlHI9yCAzRzKtx&index=8).
- We already have judgemental equality. (computational equality: beta eta alpha equality).
- We will next introduce propositional equality.
- Proving an equality => constructing a term of the type of equalities.
- We can prove many judgemental equalities (eg. `add x 0 =judgement= x`), but not all the ones we want
  (eg. `add 0 x =judgement= x`).
- We can't because we need to do induction on `x`.
- When we use natural number elimination / induction, we must produce a term of a type.
- What type?!
- Type constructors internalize structure. Eg. at a meta level, we can talk about contexts `x:A, y:B(x), z:C(x,y)`.
- But internally, we will need to be able to talk about contexts; we can use sigma types! `Σ(x:A) Σ(y: Bx) Σ(z: C(x, y))`
  lets us internalize contexts!
- Similarly, we can think about dependent terms  as `meta` functions. Example, `x:A,y:B(x) |-  c(x,y): C(x, y)`.
  We can think of this as a function that takes an `x` and a `y` and produces a `c(x,y)`. See that pi types
  internalize this notion! `c: (x:A) -> (y: B(x)) -> C(x,y)`.
- Bool, Nat, etc. are internalizing the usual booleans, naturals, etc.
- The universe type internalizes the judgement `A is a type` via `A: Type`.
- The identity type internalizes the meta notion of judgemental equality (how? Doesn't it prove strictly more?)



#### Identity Type

- $=$-formation: given a type `A` and terms `a: A`, `b: A`, we have a type `a =A b type`.
- `=`-intro: `a:A |- r_a: a =A a`. (`r` = reflexivity).
- `=`-elim: `x: A, y: A, z: x =A y |- D(x, y, z) type`, and given `x:A |- d: D(x, x, r_x)`, then we have  `ind=(d, x, y, z): D(x, y, z)`
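
A minimal Lean 4 sketch of these three rules (the names `Ident` and `Ident.elim` are ours; this is just the usual inductive identity type and its eliminator, not a full development):

```
-- formation + intro: for a : A we get the canonical term refl a : Ident a a
inductive Ident {A : Type} : A → A → Type where
  | refl (a : A) : Ident a a

-- elim (`ind=`): from d : D x x (refl x), conclude D x y p for every p : Ident x y
def Ident.elim {A : Type} {D : (x y : A) → Ident x y → Type}
    (d : (x : A) → D x x (.refl x)) :
    (x y : A) → (p : Ident x y) → D x y p
  | _, _, .refl a => d a
```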

# Left and right adjoints to inverse image

#### The story in set

- Suppose $f: A \to B$ is a morphism of sets.
- Then there is an associated morphism $f^*: 2^B \to 2^A$, the inverse image.
- We get two associated morphisms, $\forall_f, \exists_f: 2^A \to 2^B$, which perform universal and existential
  quantification "relative to $f$".
- The idea is this: think of $A$ as being fibered over $B$ by $f$. Then $\forall_f(S \subseteq A)$
  gives the set of $b \in B$ such that the fiber of $b$ lies entirely in $S$. That is, $f^*(b) = f^{-1}(b) \subseteq S$.
- In pictures, suppose the `@`s mark the subset $S$ of $A$, while the `-`s lie outside the subset. We draw $A$ as being
   fibered over $B \equiv \{b_1, b_2, b_3\}$.

```
-   @  @
-   -  @
-   -  @
|   |  |
v   v  v
b1 b2 b3
```

- Then, $\forall_f(S)$ will give us $\{ b_3 \}$, because it's only $b_3$ whose entire fiber lies in $S$.
- Dually, $\exists_f(S)$ will give us $\{ b_2, b_3 \}$, because _some portion_ of the fiber lies in $S$.
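
A small Haskell sketch of both quantifiers on finite sets (names ours), matching the picture: with `univ` the nine points and `s` the `@`-marked ones, `forallF` returns `[b3]` while `existsF` returns `[b2, b3]` (up to ordering).

```hs
import Data.List (nub)

existsF, forallF :: (Eq a, Eq b) => (a -> b) -> [a] -> [a] -> [b]
-- existsF f univ s: the b whose fiber meets s
existsF f _univ s = nub (map f s)
-- forallF f univ s: the b whose entire fiber f⁻¹(b) lies inside s
forallF f univ s =
  nub [ b | b <- map f univ
          , all (`elem` s) [ x | x <- univ, f x == b ] ]
```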

#### The story in general

- Suppose we have a presheaf category $\hat C$. Take a morphism $(c \xrightarrow{f} c')$

#### The story in slice categories

- If we have $f: A \to B$ in `Set`, then we have $f^*: Set/B \to Set/A$, which
  sends an object $(K \xrightarrow{g} B)$ to its pullback along $f$, $(A \times_B K \to A)$ (fiberwise, it takes inverse images along $f$).
- This also motivates the presheaves story, as $Set/B \simeq Set^B$.
- Recall that any morphism $K \xrightarrow{h} B \in Set/B$ can be equally seen as a morphism $b \mapsto h^{-1}(b) \in Set^B$.
  This is the mapping between slice and exponential.
- We can think of $(K \xrightarrow{h} B) \in Set/B$ as a collection $\{ h_b \equiv h^{-1}(b) \subseteq K \}$.
  This is the fibrational viewpoint.
- Then the functor $f^*(\{ h_b : b \in B\}) \equiv \{ h_{f(a)} : a \in A\}$.
- TODO


# Paredit via adjoints

- We posit that text editor movements ought to be endofunctors, and complementary keybinds
  ought to be adjoint to each other.
- With this in mind, what is the correct category for `paredit`, and what are the adjunctions?
- Suppose we wish to build a theory of `Sexp`s. Then let's consider the category of rooted trees,
  where the root is the currently selected sexp, where the morphisms are inclusion maps of trees.
- What are the operations? They are going to be endofunctors on this category: for example, moving up to the
  parent, moving to the left and right sibling, etc. (see the zipper sketch below for the raw operations).
- [Hopf algebras and rooted trees](https://personal.math.ubc.ca/~thomas/TeXthings/HopfAlgebras-1.1.pdf)
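
A minimal Haskell sketch of the raw operations (all names ours; this does not establish any adjunction, it just exhibits the endo-operations on focused sexps as a rose-tree zipper):

```hs
data Sexp = Node String [Sexp] deriving Show

-- a context records the left siblings (reversed), the parent's label and context, and the right siblings
data Ctx    = Top | Inside String [Sexp] Ctx [Sexp] deriving Show
type Zipper = (Sexp, Ctx)

up :: Zipper -> Maybe Zipper
up (_, Top)                = Nothing
up (t, Inside lbl ls c rs) = Just (Node lbl (reverse ls ++ t : rs), c)

left, right :: Zipper -> Maybe Zipper
left  (t, Inside lbl (l:ls) c rs) = Just (l, Inside lbl ls c (t:rs))
left  _                           = Nothing
right (t, Inside lbl ls c (r:rs)) = Just (r, Inside lbl (t:ls) c rs)
right _                           = Nothing
```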


# Less than versus Less than or equals over Z

-  If we have a theorem whose hypotheses and goal are of the form `a <= b - 1` for `a, b` integers,
   is it always safe to replace these with `a < b`? Shockingly, no!
- Consider the theorem: `(a <= n - 1) => (2a <= 2n - 2)`.
- When we rewrite these in terms of `<`, it becomes `(a < n) => (2a < 2n - 1)`.
- But this can't be proved, because the best we can do is `(a < n) => 2a < 2n`!
- The key reason is that even though `(a < n) <-> (a <= n - 1)` is an equivalence, we can't
  always rewrite with an equivalence under an inequality!
- Said differently, `a <= n - 1` is equivalent to `a < n`, but once we start performing *algebra*
  on these, we start seeing the difference.
- This came up in the context of delinearization. I was trying to prove that if `i < N` and `j < M`, then `iM + j < NM`.
- This proof, while stated in terms of `<`, actually needs us to go through `<=`:
- `i <= (N - 1)`, so `iM <= NM - M`, so `iM + j <= (NM - M) + (M - 1)`, which means `iM + j <= NM - 1`, or `iM + j < NM`.

# Allegories and Categories

- An allegory is a category enriched over posets, where each morphism $r: A \to B$
  has a converse $r': B \to A$.

# Partial function as span

- A partial function $f: D \subseteq X \to Y$ is a span of $Y \leftarrow D \hookrightarrow X$.
  What a slick definition!
- See that if $Y = 1$, then a partial function $X \to 1$ carries only the data of $D \hookrightarrow X$, giving us
  subobjects.


# Turing degree

- [Lectures on turing degree](https://pi.math.cornell.edu/~shore/papers/pdf/SingLect2NS.pdf)
- A set $X$ is turing reducible to $Y$ iff oracle access to membership in $Y$ provides
  decidable membership for $X$. (imagine $Y$ as hovering above $X$, as we are given oracle access to $Y$). This is written as $X \leq_T Y$.
- Two sets are turing equivalent iff $X \leq_T Y$ and $Y \leq_T X$, also written as $X \equiv_T Y$
- Clearly, $\equiv_T$ is an equivalence relation.
- A **turing degree** is an equivalence class of $\equiv_T$.
- Said differently, it is a maximal strongly connected component of the $\leq_T$ graph.
- Turing degrees have a partial order, where $[X] \leq [Y]$ iff $X \leq_T Y$ (note that the precise representatives of each class do not matter).
- A set $S$ is **recursively enumerable in $A$** if it is the domain of some partial function recursive in $A$ (ie, we can write a partial function that semidecides membership
  in $S$ given oracle access to $A$).
- The jump of a set $A$, written $A'$, is the set of programs $p$ (treated as natural numbers): $A' \equiv \{ p : eval^A(p)(p) \downarrow \}$, where $\downarrow$ means converges.
  That is, it's the set of natural numbers $p$ such that the $p$th program in the enumeration of programs with oracle access to $A$, when evaluated on $p$, converges.
- There is a unique turing degree containing all the computable sets [what does this mean? how is this (computably) a subset of the naturals?],
  called $0$ since $0 \leq_T Y$ for all $Y$. That is, oracle access to decision procedure for $0$ gives a decision procedure for $Y$
- $0'$ is the degree of the halting problem.
- The first jump is taken relative to $A \equiv \phi$.
- The join of two sets is given by $A \oplus B \equiv \{ 2n : n \in A \} \cup \{ 2m + 1 : m \in B \}$. Claim: the turing degree of $A \oplus B$ is a least upper bound of the turing
  degrees of $A, B$ (see the sketch after this list).
- Cutland, N. Computability. Cambridge University Press.
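
A quick Haskell sketch of the join on characteristic functions (names ours): `A ⊕ B` interleaves `A` on the evens and `B` on the odds, so each of `A`, `B` is decidable from an oracle for `A ⊕ B`, and anything that decides both `A` and `B` decides `A ⊕ B`.

```hs
joinSets :: (Integer -> Bool) -> (Integer -> Bool) -> (Integer -> Bool)
joinSets a b n
  | even n    = a (n `div` 2)    -- 2k   ∈ A ⊕ B  iff  k ∈ A
  | otherwise = b (n `div` 2)    -- 2k+1 ∈ A ⊕ B  iff  k ∈ B
```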


# Proof that there is a TM whose halting is independent of ZFC

- Start by assuming that ZFC is consistent.
- Consider a TM which enumerates proofs in ZFC (ie, sequents if we want to use sequent calculus),
  looking for a sequent that proves the inconsistency of ZFC.
- If this TM halts, then it has found a proof of the inconsistency of ZFC, which contradicts our hypothesis; so it does not halt.
- But ZFC cannot prove that it does not halt: such a proof would establish the consistency of
  ZFC within ZFC, which contradicts Gödel's second incompleteness theorem.
- Thus, it is independent of ZFC whether this TM halts or not.

# Contradiction from non-positive occurence

We wish to show that allowing non-positive occurrences of an inductive type
in its constructors can lead to a contradiction. Proof as a Haskell file below:

```hs
{-# LANGUAGE GADTs #-}

data Void where

data F where
  FnSpace :: (F -> Void) -> F

contra :: F -> Void
contra f@(FnSpace fn) = fn f


inhab :: F
inhab = FnSpace contra
```


# The constructible universe L

- When building the **von Neumann universe**, we take *all* subsets from the previous stage: $V(0) = \emptyset$, $V(n + 1) = 2^{V(n)}$,
  $V(\lim \alpha) = \cup_{\beta < \alpha} V(\beta)$.
- To build $L$ (the definable universe), first we need the notion of definability.
- For a set $X$, the set $Def(X)$ is the set of all $Y \subseteq X$ such that $Y$ is logically definable in the structure $(X, \in)$ (That is, we are given access to FOL and $\in$)
  from parameters in $X$ (that is, we can have free variables of elements of $X$).
- We can now build the constructible universe by iteratively constructing definable sets of the previous level.
- Can talk about definability in terms of [godel operations](https://en.wikipedia.org/wiki/G%C3%B6del_operation), which has
  ordered & unordered pairing, cartesian product, set difference, taking the domain of a binary relation, automorphisms of an ordered triple.
  These give us a "constructive" description of what we can do using
  definability. [See also: constructible universe at nLab](https://ncatlab.org/nlab/show/constructible+universe)
- [Computable universe](https://en.wikipedia.org/wiki/Constructible_universe)

#### Gödel Normal Form theorem

- Theorem which says that the constructible sets are exactly those that can be built from Gödel operations.


# Gödel completeness theorem

- If a formula is true (holds in every model), then it is derivable from the
  logic.
- A theory is syntactically consistent if one cannot derive both $s$ and $\lnot s$ from the deduction rules.
- Henkin's model existence theorem says that if a theory is syntactically consistent, then it has a model, for a 1st order theory
  with well orderable language.

#### Relationship to compactness

- Compactness and completeness are closely related.
- Compactness: If $\phi$ is a logical consequence of an at most countably infinite $\Gamma$, then $\phi$ is a logical consequence of some
  finite subset of $\Gamma$.
- Completeness => compactness, since a derivation tree is a finite object, and must thus only use a finite number of rules.
- For compactness => completeness, suppose that `Γ |= φ`. We wish to show `Γ |- φ`.
- Compactness implies that `γ1, γ2, ... γn |= φ` where `{ γ1, ..., γn } ⊂ Γ`.
- That is the same as proving that `|= γ1 -> (γ2 -> (... (γn -> φ)))`

#### Henkin model (term model)
- [References](https://en.wikipedia.org/wiki/G%C3%B6del%27s_completeness_theorem)


# Uniform proofs, focused proofs, polarization, logic programming

- Focusing and synthetic rules: http://requestforlogic.blogspot.com/2010/09/focusing-and-synthetic-rules.html
- girard statement about proofs as time; https://mathoverflow.net/a/179258/123769
- [Focused proof](https://en.wikipedia.org/wiki/Focused_proof)
- Polarity in type theory: https://existentialtype.wordpress.com/?s=polarity
- PhD thesis of Noam Zeilberger, polarity: http://www.cs.cmu.edu/~noam/thesis.pdf



# Why cut elimination?

- Morally speaking, cut elimination gives control over the formulae that occur in a proof.
- If we conclude `X -> Z` by cutting through `X -> Y` and `Y -> Z`, then the proof of `X -> Z`
  could be arbitrarily complex, since `Y` might be something crazy.
- If we have cut elimination, we know that such arbitrarily crazy things cannot happen, as the `cut` rule is the
  only rule where we are forced to synthesize something "out of thin air".
- [Example of use of cut](https://mathoverflow.net/questions/8632/cut-elimination/64769#64769)

#### Cut implies consistency of first order logic (FOL)
- Suppose we have cut elimination for FOL.
- If FOL is inconsistent, then there would be a proof of `False` starting from no premises.
- To be more formal, one could write `|- False` or `True |- False`.
- Written in terms of sequent calculus, this would be `[] |- []` (recall that the LHS is interpreted with `AND`, the RHS with `OR`).
- But by cut elimination, the proof of `[] |- []` would involve only the symbols in `Sym([]) U Sym([])`, which is the empty set!
- Since there is no trivial proof of `False` with zero symbols, and all other derivation rules need symbols, there cannot be a proof of `False`!
- To repeat: a proof of `True |- False` or `[] |- []` could be `cut`-eliminated, so it is *forced* to contain only the symbols in `Sym([]) U Sym([]) = EMPTYSET`.
  This is absurd, and thus there is no proof of `True |- False`, which implies that the theory is consistent (assuming soundness).

#### References

- [An introduction to the complexity and combinatorics of cut elimination](https://www.ams.org/journals/bull/1997-34-02/S0273-0979-97-00715-5/S0273-0979-97-00715-5.pdf)
- [Reference](https://mathoverflow.net/questions/8632/cut-elimination)

# Forcing to add a function

- Let $M$ be a countable transitive model of ZFC.
- We will add a new function $c: \aleph_0^M \to \{0, 1\}^M$ into $M$ by creating $M[G]$.
- Let $P$ be the set of all finite partial functions from $\aleph_0$ to $\{0, 1\}$ in $M$.
- Let $G$ be a generic maximal ideal of $P$. That is, $G$ intersects every dense set of $M$.
- Also, since it is a maximal ideal, taking the full union $\cup G \equiv c$ will give us a well defined total function.
- It will be well defined since no two elements of $G$ disagree, and it will be total because if it were not, we could extend $G$,
  contradicting the maximality of $G$.
- Great, so if we can construct $M[G]$, we will also have $c = \cup G \in M[G]$.
- But how do we know that $c$ is new? Ie, how do we know that $c \notin M$?
- Well, consider for any function $h \in M$, the subset of $P$ that disagrees with $h$. That is, the subset
  $D_h \equiv \{ p \in P : \exists i, p(i) \neq h(i) \}$.
- See that $D_h$ is dense in $M$: Suppose $p \in P$, and $p$ is well-defined on some subset $S$. Either $p$ disagrees with $h$ on $S$,
  that is, there is some $s \in S$ such that $p(s) \neq h(s)$, in which case $p \in D_h$ and we are done.
- On the other hand, maybe $h|S = p$ (that is, $h$ restricted to $S$ fully agrees with $p$). Then we pick some point $s' \notin S$
  and extend $p$ into $p'$ to disagree with $h$ at $s'$: define $p'(s') \equiv 1 - h(s')$. Now we have $p \leq p'$ and $p' \in D_h$.
- Since $D_h$ is dense, we have that $G \cap D_h \neq \emptyset$, thus $c$ disagrees with $h$ at some point!
- Thinking intuitively, it would be a CRAZY coincidence for $c$ to fully agree with some function $h$ in $M$. If we build it "randomly",
  or "generically", one _would_ expect it to disagree with everything in $M$ at some point in the construction!
- Cool, we've now seen how to enlarge the universe to add a _single_ function of interest.
- [Reference](https://math.stackexchange.com/questions/1311667/what-are-some-simple-example-of-forcing-in-set-theory)

# Diaconescu's theorem

- Choice implies LEM
- Let $P$ be a proposition. Build the sets $T, F$ as:
- $T \equiv \{x \in \{0, 1\} : (x = 1) \lor P\}$, and $F \equiv \{x \in \{ 0, 1 \} : (x = 0) \lor P \}$.
- Note that if we had LEM, we could case split on $P$ via LEM and show that $T = F = \{0, 1\}$ if $P$ holds, while $T = \{1\}$ and $F = \{0\}$ if
  $\lnot P$ holds.
- However, we don't have LEM. So let's invoke Choice on the set $B \equiv \{T, F \}$. This means we get a choice function
  $c: B \to \cup B$ such that $c(T) \in T$ and $c(F) \in F$.
- By the definition of the two sets, this means that $(c(T) = 1 \lor P)$, and $(c(F) = 0 \lor P)$.
- This can be written as the logical formula $(c(T) = 1 \lor P) \land (c(F) = 0 \lor P)$.
- This is the same as $(c(T) \neq c(F)) \lor P$.
- Now see that since $P \implies (T = F)$ (by extensionality), we have that $P \implies (c(T) = c(F))$.
- See that contraposition is available purely intuitionistically: (`(p -> q) -> (q -> false) -> p -> false`).
- Therefore, by contraposition, $(c(T) \neq c(F)) \implies \lnot P$.
- This means we have $P \lor \lnot P$!

# Forcing machinery

- Let $M$ be a countable model of ZFC (exists by Löwenheim-Skolem).
- Let $\Omega \equiv \{0, 1\}$ ($\Omega$ for subobject classifier).
- Take $P$ to be the set of partial functions from $\aleph_2 \times \aleph_0 \to \Omega$ with finite support
- Note that elements of $P$ can be thought of as finite lists, where we know the values where they are 0, where they are 1.
- Also note that elements of $P$ can be arranged in an obvious partial order.

#### Ideal of a poset
- We define an ideal $I \subseteq P$ to be a set of elements which are pairwise compatible (all pairs of elements have a union),
  and is downward closed (all elements with less information is present in the ideal).
- More formally, for any $i \in I$ and $p \in P$, if $p \leq i$, then $p \in I$. So $p \leq i \implies p \in I$.
- For every $i, i' \in I$, there is some $j \in I$ such that $i, i' \leq j$ ($I$ is a directed set).

#### Maximal ideal

- A maximal ideal $I_\star \subseteq P$ is an ideal such that for any $p \in P$, either $p$ is incompatible with $I_\star$,
  or $p$ is in $I_\star$.

#### Density in a poset

- A subset $D$ of a poset $P$ is dense iff for any $p \in P$, there is some $d \in D$ such that $d \geq p$.
  Intuitively, at any point in the poset, it is possible to "add more information" to reach $D$.

#### Generic Ideals
- We say that an ideal $G$ is generic iff $G \cap D \neq \emptyset$ for all dense $D \subseteq P$.
- For any countable model $M$, and a poset $P$ over it,
  We claim that for any $p \in P$, a generic ideal $G_p$ which contains $p$ ($p \in G$) exists.

#### Proof: Generic ideal always exists
- We wish to find a generic ideal that contains a special $p_\star \in P$.
- Let $D_1, D_2, \dots$ be an enumeration of the dense subsets of $P$ that are members of the countable model $M$.
- We can perform such an enumeration because $M$ is countable, and thus only has countable many sets.
- We will create a new sequence $\{q_i\}$ which hits each $\{D_i\}$.
- Start with $q_0 \equiv p_\star$.
- Since $D_1$ is dense, there is some $d_1 \in D_1$ such that $d_1 \geq q_0$. Set $q_1 \equiv d_1$.
- Continuing this for each dense set (pick $d_{i+1} \in D_{i+1}$ with $d_{i+1} \geq q_i$, and set $q_{i+1} \equiv d_{i+1}$) gives us a sequence $\{q_i\}$.
- Build an ideal $I^\star_p \equiv \{ p \in P : \exists i, p \leq q_i \}$. That is, we build the union of all the lower
  sets of $q_i$. So this can also be written as $I^\star_p \equiv \cup_i \downarrow q_i$, where $\downarrow(p) \equiv \{ p' : p' \leq p \}$, the down set of $p$.
- $I^\star_p$ is downward closed by construction, and is directed because for any two elements $a, b \in I$, there is some $q_i, q_j$
  such that $a \in \downarrow q_i$, $b \in \downarrow q_j$, and WLOG, if $q_i \leq q_j$, then $a, b \leq q_j$, thereby making
  the set directed.


#### Separative poset
- $P$ is separative iff $p \leq q \leq p$ implies $p = q$.

#### Generic ideal of separative poset is not in the model

- Claim: if $G$ is a generic ideal of $P \subseteq M$, then $G$ is not in $M$.
- Let $H \subseteq P$, $H \in M$. Consider the set $D_H \equiv \{ p \in P : \exists h \in H, \texttt{incompatible}(p, h) \}$.
- Intuitively, $D_H$ is the set of all elements of $P$ which are incompatible with some element of $H$.
- We must have $D_H \in M$ by `comprehension(M)`, since $M$ is a model of $ZFC$ and $D_H$ is a subset of $P$.
- To see that $D_H$ is dense, for any element $p \in P$, we need to find an element $d \in D$ such that $p \leq d$.
  See that $d \in D$ iff there exists some $h \in H$, such that `incompatible(d, h)`.
- Since $D_H$ is dense, we have that $G \cap D_H \neq \emptyset$, This gives us some element $g \in G$ such that `incompatible(g, p)`
  for some $p \in H \subseteq P$.
- TODO: this makes no sense!

#### Definition of forcing

- An element $p \in P$ forces the sentence $\phi(\vec \tau)$ iff $\phi^{M[G]}(\vec \tau^G)$ is true for **every generic ideal**
  $G$ such that $p \in G$. For every formula $\phi$, forcing tells us for which pairs of $p, \vec \tau$ it is the case that
  $\phi^{M[G]}(\vec \tau^G)$ is true. It is written as $p \Vdash \phi(\vec \tau)$.
- Written differently, we say that $p \in P$ forces $\phi(\vec \tau)$, iff for any $G \subseteq P$, $p \in G \implies \phi^G(\vec \tau^G)$ is true.
- That is to say, we can decide the truth of $\phi^G(\vec \tau^G)$ by looking at the presence/absence of $p$ in $G$.
- See that for a fixed $\phi$, forcing gives us a relation $\subseteq P \times M^k$.
- What we want to show is this that this forcing relation, for each $\phi$, is definable in $M$.
- This will show that the collection of $p$ that force a $\phi$ is in $M$ (project out the first components of $P \times M^k$).

#### Fundamental theorem of forcing

- For every formula $\phi$, for every generic ideal $G$ over $P$:
- 1. Definability: there is a set $F(\alpha, \phi) \in M$ such that $p \Vdash \phi(\vec \tau)$ ($p$ forces $\tau$) if and only if
  $(p, \vec \tau) \in F(\alpha, \phi)$. That is, the forcing relation is definable in $M$
- 2. Completeness: $\phi^{M[G]}(\vec \tau^G)$ is true iff there is a $g \in G$ such that $g \Vdash \phi(\vec \tau)$.
  That is, any true sentence in $M[G]$ must have a witnessing $p \in G$ which forces it, for any generic ideal $G$.
- 3. Coherence/Stability: If $p \Vdash \phi$, for all $q \geq p$, we have $q \Vdash \phi$. Truth once forced cannot be unforced,
  truth is inflationary, truth is stable, etc.
- The FTF (fundamental theorem of forcing) is an algorithm on the ZFC syntax. It takes a formula $\phi$, and produces a ZFC
  proof of (1), (2), (3).

#### Architecture of FTF

- TODO, here I Am!

#### Net to capture generic ideal
- If $G$ is a generic ideal of $P$, and $G \subseteq (Z \in M)$, then there is a $p \in G$, such that all $q$ such that $p \leq q$ are in $Z$.
  That is, $\forall G, \exists p \in G, \forall q \in P, p \leq q \implies q \in Z$.
- QUESTION: How can $Z \in M$ if $G$ is a proper class relative to $M$, and $G$ is a subset of $M$? Isn't a superset of a proper
  class a proper class?
- Recalling that `(p, q) ∈ P` are compatible iff `∃r ∈ P, p ≤ r ∧ q ≤ r`. If no such `r` exists, then `(p, q)` are incompatible.
- Suppose we take some `(p, q) ∈ P`. We can have `(1) p ≤ q`, `(2) q ≤ p`, `(3) (p, q) compatible`, `(4) (p, q) incompatible`.
  Consider:

```


a   r
 \ / \
  p   d  e
   \ /   |
    c----*
```
- If `q=a` then `p <= q`
- If `q=c` then `q <= p`.
- If `q=d` then `∃r, (p <= r, q <= r)` compatible.
- If `q=e`, then `(p, e)` incompatible.
- We wish to show that there is a $p \in G$ such that all its extensions lie in $Z$.
- That is to say, all of the extensions of $p \in G$ do not lie in $Z^c$.


#### Proof of net lemma

- To prove: If $G$ is a generic ideal of $P$, and $G \subseteq (Z \in M)$, then
  there is a $p \in G$, such that all $q$ such that $p \leq q$ are in $Z$. That
  is, $\forall G, \exists p \in G, \forall q \in P, p \leq q \implies q \in Z$.

- Let $D$ be the set of elements of $P$ that are incompatible with every element in $Z^c$:
  $D \equiv \{ p \in P: \forall q \in Z^c, p \perp q \}$
- If $D$ were dense in $P$, then an element $r \in G \cap D$ would be the
  element we are looking for, where all the extensions of $r$ lie in $Z$.
- Let's try to show that $D$ is dense. Let $p \in P$ be arbitrary. We need to find a
  $d \in D$ such that $p \leq d$.
- If $p \perp q$ for every $q \in Z^c$, then we are done, since $p \in D$, and thus $p \leq p \in D$.
- On the other hand, suppose there is a $q \in Z^c$ such that $p \not \perp q$. That is, there is an $r$
  such that $p \leq r, q \leq r$.
- Now what? Now we make an observation: See that we can freely add $\uparrow Z^c = \{ r : \exists q \in Z^c, q \leq r \}$
  into $D$, because adding it does not change the result of intersecting with $G$: (1) $G \cap Z^c = \emptyset$ already.
  (2) $G \cap \uparrow Z^c$ could have an element $r \in \uparrow Z^c$ with $r \in G$. But this cannot happen,
  because this would mean that $\exists q \in Z^c, q \leq r$. But since $G$ is downward closed and $r \in G$, this means that $q \in G$,
  which is a contradiction, as $q \in Z^c$ which has empty intersection with $G$.
- TLDR: We can fatten up any set with $Z^c$, while not changing the result of $G \cap - $!
- So we build $D' \equiv D \cup \uparrow Z^c$, which is to say, $D' \equiv \{ (p \in P: \forall q \in Z^c, p \perp q) \} \cup \{ r \in P : (\exists q \in Z^c, q \leq r)\}$.
- We claim that $D'$ is dense. Suppose we have some $p \in P$. (1) If $p \perp q$ for all $q \in Z^c$, then $p \leq p \in D'$ and we are done.
  (2) Otherwise, assume that there is some $q \in Z^c$ such that $p \not \perp q$: then there is an $r \in P$ such that $p \leq r, q \leq r$.
  This gives us an $r \in D'$ (as $r \in \uparrow Z^c$) such that $p \leq r$, and we are done.


#### Simpler proof of net lemma (Unverified)
- Let $D \equiv \{ p \in P : \forall q \in Z^c, p \not \leq q \}$.
- Let's now pick a concrete $p \in P$, and try to show that $D$ is dense. so we need to find a $d \in D$ such that $p \leq d$.
- *Easy case:* If $p$ has no extensions in $Z^c$, then $p \in D$ by defn of $D$;
   we are done since $p \leq (p \in D)$, and thus density is fulfilled.
- *Hard case:* Suppose $p$ does have an extension $q \in Z^c$; what then? How do I find an element $d \in D$
  such that $p \leq d$?
- *Hard case:* See that we will be using $D$ to consider $r \in (G \cap D)$ to find an element $r$ whose every extension
  lies in $Z$. So suppose we add $q \in Z^c, p \leq q$ into $D$ (ie, $D' \equiv D \cup \{q\}$).
- While $q \in D'$, we will still have that $q \not \in G \cap D'$,
  because $q$ lies in $Z^c$, which has zero intersection with $G \subseteq Z$!
- Thus, we can throw $Z^c$ in $D$ "for free" to fatten $D$ up to make it
  more dense, while knowing that $G$ will cull this $Z^c$ portion.
- So define $D' \equiv \{ p \in P : \forall q \in Z^c, p \not \leq q\} \cup Z^c$
- We claim that $D'$ is dense.
- Suppose $p \in P$. If for all $q \in Z^c$, $p \not \leq q$, then $p \in D'$.
  Otherwise, suppose $p \leq q$ for some $q \in Z^c$. Then we have $p \leq q \in Z^c \subseteq D'$. Thus $D'$ is dense.
- Let $r \in G \cap D'$. Then we cannot have $r$ come from the $Z^c$ portion, since $G \cap Z^c = \emptyset$.
  This means that $r$ came from the first part of the set $D'$,
  where no extension of $r$ lies in $Z^c$. Thus we are done.

#### Intuition for Net definition
- A net $Z \subseteq P$ could be defined in two ways: (A) $\forall p \in P, \exists z \in Z, p \leq z$,
  or (B) $\forall z \in Z, \exists p \in P, z \leq p$.
- It can't be (B), because (B) has a trivial solution $Z = \emptyset$!
- It should be (A), because (A) forces $Z$ to be "non-trivial", since I can test it against all $p \in P$.

#### Names and name creation
- Let $N$ (for names) be defined transfinitely, where $N_0 \equiv \emptyset$, $N_{i+1} \equiv \mathcal{P}(P \times N_i) \cap M$,
  and take the union in the usual way at the limit ordinal.
- Intuitively, names let us create "hypothetical sets", which are realised into real sets for each subset $S \subseteq P$.
  We keep those elements which are tagged by $s \in S$, and we remove those sets which are not.

#### Forcing equality

##### Step 1: Defining the forcing tuple set $F^{x=y}$.

- To decide the equality of $\tau, \tau'$: it is very sensitive to $G$, because elements can appear/disappear based on $G$.
- We want all triplets $(p, \tau, \tau')$, where $\tau, \tau' \in \texttt{NamedSet}(M)$, such that
  $p$ forces $\tau^G = \tau'^G$.
- Recall that $p$ forces $\tau^G = \tau'^G$ means: $\tau^G = \tau'^G$ whenever $p \in G$.
- Thus, $p$ must be such that it is **NECESSARILY POSSIBLY TRUE** that every
  element $\sigma^G \in \tau^G$ must also be such that $\sigma^G \in \tau'^G$,
  and also vice versa: every $\sigma'^G \in \tau'^G$ must be such that
  $\sigma'^G \in \tau^G$.
- Let us prove the forward direction, where we want to force: $\sigma \in \tau$ implies $\sigma \in \tau'$.
- Whenever $q \geq p$ and $(\sigma, q) \in \tau$, there must be an $r \geq q$ such that $r$ forces $(\sigma^G \in \tau'^G)$.
- We might be tempted to say that $r$ forces $(\sigma^G \in \tau'^G)$ iff $(\sigma, r) \in \tau'$, but this is too strong.
  There could be many different collapses that allowed for $\sigma^G \in \tau'^G$. That is, we could have some $\xi^G \in \tau'^G$,
  and $r$ forces $\xi^G = \sigma^G$.
- Now it looks like we need to define equality in terms of equality. We just perform induction on name rank,
  because to have $\sigma \in \tau$, the name rank of $\sigma$ must be lower than $\tau$ because we built
  the name rank universe by induction.
- So we define the condition on triplets $(p, \tau, \tau')$ of name rank less than $\alpha$ to be that
  for ALL $(\sigma, q) \in \tau$ where $q \geq p$, there is
  $(\xi, r) \in \tau'$ such that $r \geq q$ and
  $(r, \sigma, \xi) \in F^{x=y}_{max(nr(\sigma), nr(\xi))}$,
- So we define the relation $F^{x=y}$ by name rank induction.

##### Step 2: defining the net
- Next, we need to define the net $Z$.
- Let $Z^{=} \equiv \{ q \in P: \forall (\sigma, q) \in \tau, \exists (\xi, r) \in \tau', r \geq q \land (r \models \sigma = \xi) \}$.
- Question: What is the meaning of the $\models$ symbol in this context?
- SID: I guess $r \models \sigma = \xi$ is syntactic sugar for $(r, \sigma, \xi) \in F^{x=y}$.
- See that $Z^{=}$ is the set of all $q$ for which $\tau$ is possibly a subset or equal to $\tau'$.
- By the inductive hypothesis of name rank, FTF holds for $\sigma, \xi$ and it follows that $Z^{=} \in M$
  [I have no fucking idea what this means].

##### Step 4: The equivalence of net, modality, relativized inclusion:

- $\tau^G \subseteq \tau'^G$ implies
- $G \subseteq Z^{=}$ implies
- $\exists p \in G, \forall q \geq p, q \in Z^{=}$ implies
- $\tau^G \subseteq \tau'^G$

Therefore, all these conditions are equivalent.

- We will show that $\tau^G \subseteq \tau'^G$ implies that $G \subseteq Z^{=}$. This, by the net lemma, will imply that
  there is a $p \in G$ such that all larger elements will be trapped in the net $Z^{=}$.
- Then we will prove that if there is such a $p \in G$ which traps elements in the net, then we have $\tau^G = \tau'^G$.


# Partial Evaluation, Chapter 3


#### Bootstrapping and self-application

- Suppose we have a high-level compiler in the language `S`, from `S` to `T`. I will denote that as `h : S(S → T)`,
  where the compiler is `h` (for high), written in language `S`, from `S` to `T`.
- We also have a low-level compiler written in `T` from `S` to `T`, denoted by
  `l : T(S → T)`, where the compiler is `l` for low.
- Suppose the two versions agree, so `[h]_S = [l]_T`.
- Suppose we extend `h` to `h'`, to compile the language `S'` to `T`. `h'` is also written in `S`, so we have `h': S(S' → T)`.
- Now we can use `l` on `h'` to receive an `S'` compiler `l' : T(S' → T)`.
- TODO

# Partial Evaluation, Chapter 1

- `out = [[p]](i, i')`, then `p1 = [[mix]](p, i); out = [[p1]](i')`.
- Alternatively, can write as `[[p]](i, i') = [[ [[mix]](p, i) ]](i')`
- Let `S` be the source language (Early Rust). Let `T` be the target language (assembly). Let `L`
  be the language that the interpreter for `S` is implemented in (OCaml).
- See that we have the equation `out = [source]_S (in)`, or `out = [interp]_L (source, in).`
- That's the equation for the interpreter.
- a compiler produces a target program, so we have `target = [compiler]_L(source)`. Running
  the target and source programs should have the same effect, so `[target]_T(in) = [source]_S(in)`.
- Written differently, this is `[ [compiler]_L(source) ]_T(in) = [source]_S(in)`.

##### First futamura projection

- `out = [source]_S (in)`
- `out = [int](source, in)`
- `out = [[mix](int, source)](in)`
- But by definition of `target`, we have `out = target(in)`
- Thus, we see that `target = [mix](int, source)`. We get the *compiled output program/target program* by partially
  applying the interpreter to the source program.

##### Second futamura projection

- Start with `target = [mix](int, source)`.
- Now partially apply, `target = [mix(mix, int)](source)`.
- But we know that `target = compiler(source)`. So we must have `[mix(mix, int)] = compiler`.
- A compiler is obtained by partially applying the partial applier against the interpreter. Thus, when
  fed an input, it partially evaluates the interpreter against any input, giving us a compiler.

#### Third futamura projection

- consider `cogen = [mix(mix, mix)]`, applied to an interpreter.
- Let `comp = cogen(interp) = ([mix](mix, mix))(interp) = mix(mix, interp)`
- Apply `comp(source)`. This gives us `comp(source) = mix(mix, interp)(source) = mix(interp, source) = target`.
- Thus, we have created a compiler generator, which takes an interpreter and produces a compiler.
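
A sketch of just the types involved (all names here are ours, and `Program a b` is an unspecified representation of a program from `a` to `b`; the bodies are stubs):

```hs
data Program a b   -- abstract program text
data Input
data Output
type Source = String

run :: Program a b -> (a -> b)                -- the evaluator [[.]]
run = undefined

mix :: Program (s, d) r -> s -> Program d r   -- partial evaluator: specialize on the static input s
mix = undefined

interp :: Program (Source, Input) Output      -- an interpreter, as a program
interp = undefined

-- 1st projection: specializing the interpreter to one source program yields a target program.
target :: Source -> Program Input Output
target source = mix interp source

-- The 2nd and 3rd projections additionally need `mix` itself to be available *as a Program*,
-- so that it can be specialized to `interp` (yielding a compiler) and then to itself (yielding cogen).
```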

# Diagonal lemma for monotone functions

- Statement: For a monotone function $f: P \times P \to Q$, with $s, t, x$ ranging over a common directed set, we have the equality
  $f(\sqcup_s s, \sqcup_t t) = f(\sqcup_x (x, x))$.
- Since $\sqcup_x (x, x) \sqsubseteq \sqcup_{s, t} (s, t)$, by monotonicity of $f$, we have that
  $f(\sqcup_x (x, x)) \sqsubseteq f(\sqcup_{s, t} (s, t))$.
- On the other hand, note that for each $(s_\star, t_\star)$, we have that
  $(s_\star, t_\star) \sqsubseteq (s_\star \sqcup t_\star, s_\star \sqcup t_\star) = (s_\star, s_\star) \sqcup (t_\star, t_\star)$.
  Thus each pair indexing the left hand side is dominated by (a join of) elements of the diagonal.
- So we must have equality of LHS and RHS.

#### Proving that powering is continuous

- We wish to prove that $f \mapsto f^n$ is continuous, given that composition $(\circ)$ is continuous.
- Proof by induction. $n = 0$ is immediate. For case $n+1$:

$$
\begin{aligned}
&(\sqcup_f f)^{n+1} \\
&= (\sqcup_f f) \circ (\sqcup_g g)^n \\
&= \sqcup_g ((\sqcup_f f) \circ g^n) \\
&= \sqcup_g \sqcup_f (f \circ g^n) \\
&= \sqcup_f (f \circ f^n) \\
&= \sqcup_f (f^{n+1}) \\
\end{aligned}
$$

- See that we used the diagonal lemma to convert the union over $f, g$ into a union over $f$.

# Cantor Schroder Bernstein via fixpoint

- Given two injections $f: S \to T$, $g: T \to S$, we want to create a bijection.
- Suppose we have $S = T = N$, and $f(n) = g(n) = n + 1$.
- If $f$ were surjective, we are done, for then $f$ is the bijection.
- In this case, $f$ is not surjective, because $T-f(S) = \{0\}$. So $0$ has no preimage under $f$.
- We will create a new function $f'$ by perturbing $f$, such that it does map some element of $S$ to $0$ [which is currently missed].
- Start with $f' \equiv f$. This means that $f'$ misses $0$.
- We can "force" a pre-image for $0$. How? Consider $g(0) = 1$, and set $f'(g(0)) \equiv 0$, or $f'(1) \equiv 0$.
- Whoops, but we have now "lost" a preimage for $f(1) = 2$, as now $2$ is not in the image of $f'$.
- Let's repeat the same process and fix it the same way. $f'(g(2)) \equiv 2$, or $f'(3) \equiv 2$.
- Now we have lost a pre-image for $f(3)$. Well, we just repeat the construction. For how long?
- Well, this is where we invoke the glory of a fixpoint theorem!
- See that we definitely need to reverse the arrows for $(T-f(S))$. If we start with a set $Y \subseteq T$ that
  we will reverse the arrows to, we will then need to reverse the arrows for $Y \cup f(g(Y))$.
- Thus, the set that we need to fiddle in $f'$ is the least fixed point of $Y \mapsto (T-f(S))\cup f(g(Y))$.
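
A tiny Haskell sketch of this construction for the running example $S = T = \mathbb N$, $f = g = (+1)$ (names ours): iterate $Y \mapsto (T - f(S)) \cup f(g(Y))$ and watch the set of reversed targets grow towards its least fixed point, the even numbers.

```hs
-- one step of the operator Y ↦ (T − f(S)) ∪ f(g(Y)); here T − f(S) = {0} and f(g(y)) = y + 2
step :: [Integer] -> [Integer]
step ys = 0 : [ y + 2 | y <- ys ]

-- take 4 (iterate step [])  ==  [[], [0], [0,2], [0,2,4]]
-- On the fixed point {0,2,4,...} the bijection sends g(t) = t+1 back to t (uses g⁻¹); elsewhere it uses f.
```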


# Maximal Ideals of Boolean Algebras are Truth Values

##### Boolean algebras
- Has meet, join, complement, 1, 0 with usual laws


##### Atomic boolean algebras
- Consider $2^S$ where $S$ is finite. Then the elements of the form `{s} ∈ 2^S` are said
  to be atoms because if `x ⊂ {s}` then `x = 0` or `x = {s}`.


##### Atomless boolean algebras
- Let $S$ be an infinite set, and let $I$ be the collection of its finite subsets. Then $I$ is an ideal
  (a downward closed subset which is closed under finite joins), because the union of two finite sets is finite, and any
  subset of a finite set is finite.
- The quotient $T = 2^S/I$ will be an *atomless* boolean algebra.
- Note that the quotient kills all finite subsets.
- So for any non-zero $x \in T$, then it must be an equivalence class with some infinite subset.
  If we take $k, k'$ to be non-empty disjoint subsets of $x$, then neither is equivalent to $x$ or to $\emptyset$,
  because they differ at infinitely many locations from each. Thus, $x$ is not an atom.
- Furthermore, the boolean algebra is not complete, because, if we have $k_1, k_2, \dots$ be a countable collection
  of countably infinite subsets of $S$ (for example, if $S \equiv \mathbb N$, then we could take $k_i$ to be the
  set of numbers with $i$ bits as 1 in their binary representation), then this collection has no least upper bound.
- Suppose $u$ is an upper bound. Then $u$ differs from each $k_i$ in only finitely many locations.
- Now pick $e_i \in u \cap k_i$, and consider the set $c \equiv u \setminus \{ e_i \}$. That is, we remove from $u$ one element
  of its intersection with each $k_i$. This new $c \subseteq u$, and $c$ is still an upper bound, since it differs
  from each of the $k_i$ at finitely many locations. Thus, this algebra is not complete.

##### Or, how to embed a poset into a boolean algebra.

- Every poset $P$ can be embedded into a complete atomic boolean algebra $2^P$
  by sending $p \mapsto \{ x : x \leq p \}$ (the ideal of $p$).
- Alternatively, that's just $Hom(-, p)$. God bless yoneda embedding.
- We can thus consider a ring map from $2^P \to 2$, which gives us a maximal ideal of $2^P$ (the ideal is maximal because the
  quotient is a field).
- This assigns to us consistent truth values of $p$.
- In this way, maximal ideals of posets completed to rings correspond to truth values.
- Dualize the story via Grothendieck/Geometry to talk about filters :)

# Crash course on DCPO: formalizing lambda calculus

In lambda calculus, we often see functions of the form $\lambda x \rightarrow x(x)$. We would
like a way to associate a "natural" mathematical object to such a function. The
most obvious choice for lambda calculus is to try to create a set $V$ of values
which contains its own function space: $(V  \rightarrow V) \subseteq V$. This
seems to ask for a set whose cardinality is such that $|V|^{|V|} = |V|$, which is
only possible if $|V| = 1$, ie $V$ is the trivial set $\{ * \}$.
However, we know that lambda calculus has at least two types of functions:
functions that terminate and those that don't terminate. Hence the trivial set
is *not* a valid solution.

However, there is a way out. The crucial insight is one that I shall explain by
analogy:

- We can see that the cardinality of $\mathbb R$ is different from the cardinality
   of the space of functions over it, $\mathbb R \rightarrow \mathbb R$.
- However, "the set of all functions" isn't really something mathematicians consider.
   One would most likely consider "the set of all _continuous_ functions" $\mathbb R \rightarrow \mathbb R$.
-  Now note that a function that is continuous over the reals is [determined by its values at the rationals](https://math.stackexchange.com/questions/379899/why-is-every-continuous-function-on-the-reals-determined-by-its-value-on-rationa).
   So, rather than giving me a continuous function $f: \mathbb R \rightarrow \mathbb R$, you can
   give me a continuous function $f': \mathbb Q \rightarrow \mathbb R$ which I can Cauchy-complete,
   to get a function $\texttt{completion}(f') : \mathbb R \rightarrow \mathbb R = f$.
-  Now, [cardinality considerations](https://math.stackexchange.com/a/271641/261373)
   tell us that:

$$|\mathbb R^{\mathbb Q}| = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \cdot \aleph_0} = 2^{\aleph_0} = |\mathbb R|$$

- We've won! We have a space $\mathbb R$ whose space of _continuous_
   functions $\mathbb R \rightarrow \mathbb R$ is isomorphic to $\mathbb R$.
- We bravely posit: all functions computed by lambda-calculus are continuous!
   Very well. This leaves us two questions to answer: (1) over what space?
   (2) with what topology? The answers are (1) a space of partial orders
   (2) with the [Scott topology](https://en.wikipedia.org/wiki/Scott_continuity)



### Difference between DCPO theory and Domain theory

- A DCPO (directed-complete partial order) is an algebraic structure that can
  be satisfied by some partial orders. This definition ports 'continuity'
  to partial orders.

- A domain is an algebraic structure of even greater generality than a DCPO.
  This attempts to capture the fundamental notion of 'finitely approximable'.

- The presentation of a domain is quite messy. The nicest axiomatization of
  domains that I know of is in terms of [information systems](https://en.wikipedia.org/wiki/Scott_information_system).
  One can find an introduction to these in the excellent book
  ['Introduction to Order Theory' by Davey and Priestly](https://www.cambridge.org/core/books/introduction-to-lattices-and-order/946458CB6638AF86D85BA00F5787F4F4)


### Computation as fixpoints of continuous functions

### Posets, (least) upper bounds

- A partial order $(P, \leq)$ is a set equipped with a reflexive, transitive relation $\leq$ such that
  $x \leq y$ and $y \leq x$ implies $x = y$.
- A subset $D \subseteq P$ is said to have an *upper bound* $u_D \in P$ iff for all $d \in D$, it is true that $d \leq u_D$.
- An upper bound is essentially a cone over the subset $D$ in the category $P$.
- A subset $D \subseteq P$ has a *least upper bound* $l_D$ iff (1) $l_D$ is an upper bound of $D$, and (2)
  for every upper bound $u_D$, it is true that $l_D \leq u_D$
- Least upper bounds are unique, since they are essentially limits of the set $D$ when $D$ is taken as a thin category

### Directed Sets

- A subset $D \subseteq P$ is said to be *directed* iff every *finite* subset $S \subseteq D$
  has an upper bound $d_S$ in $D$. That is, the set $D$ contains an upper bound for each of its finite
  subsets.
- Topologically, we can think of it as being *closure*, since the upper bound is sort of a "limit point" of the subset,
  and we are saying that $D$ has all of its limit points.
- Categorically, this means that $D \subseteq P$ taken as a subcategory of $P$ has cones over all finite
  subsets. (See that we *do not* ask for limits/least upper bounds over all finite subsets, only cones/upper bounds).
- We can think of the condition as saying that the information in $D$ is internally consistent: for any two facts,
  there is a third fact that "covers" them both.
- slogan: internal consistency begets an external infinite approximation.
- QUESTION: why not ask for countable limits as well?
- QUESTION: why upper bounds in $D$? why not external upper bounds in $P$?
- QUESTION: why only upper bounds in $D$? why not least upper bounds in $D$?
- ANSWER: I think the point is that as long as we ask for upper bounds, we can pushforward via a function $f$,
  since the image of a directed set will be directed.

### Directed Complete Partial Order (`DCPO`)

- A poset is said to be directed complete iff every directed set $D \subseteq P$ has a least upper bound in $P$.
- Compare to chain complete, `CCPO`, where every chain has a LUB.
- QUESTION: why not postulate that the least upper bound must be in $D$?
- In a DCPO, for a directed set $D$, denote the upper bound by $\cup D$.

### Monotonicity and Continuity

- A function $f: P \to Q$ between posets is said to be monotone iff $p \leq p'$ implies that $f(p) \leq f(p')$.
- A function $f: P \to Q$ is continuous, iff for every directed set $D \subseteq P$, it is true that $f(\cup D) = \cup f(D)$.
- This has the subtle claim that the image of a directed set is directed.

### Monotone map

- A function from $P$ to $Q$ is said to be monotone if $p \leq p' \implies f(p) \leq f(p')$.
- Composition of monotone functions is monotone.
- The image of a chain wrt a monotone function is a chain.
- A monotone function **need not preserve least upper bounds**. Consider:

$$
f: 2^{\mathbb N} \rightarrow 2^{\mathbb N} \\
f(S) \equiv
\begin{cases}
S & \text{$S$ is finite} \\
S \cup \{ 0 \} & \text{$S$ is infinite}
\end{cases}
$$

This does not preserve least-upper-bounds. Consider the sequence of elements:

$$
A_1 = \{ 1\}, A_2 = \{1, 2\}, A_3 = \{1, 2, 3\}, \dots, A_n = \{1, 2, 3, \dots, n \}
$$

The union of all $A_i$ is $\mathbb N$.
Each of these sets is finite.
Hence $f(\{1 \}) = \{1 \}$, $f(\{1, 2 \}) = \{1, 2\}$ and so on. Therefore:

$$
f(\sqcup A_i) = f(\mathbb  N) = \mathbb N \cup \{ 0 \}\\
\sqcup f(A_i) = \sqcup A_i = \mathbb N
$$

### Continuous function

- A function is continuous if it is monotone and preserves all LUBs. This is
  only sensible as a definition on CCPOs, because the equation defining it is:
  `lub . f  = f . lub`, where `lub: chain(P) \rightarrow P`. However, for `lub`
  to always exist, we need `P` to be a CCPO. So, the definition of continuous
  only works for CCPOs.
- The composition of continuous functions of chain-complete partially
  ordered sets is continuous.

### Fixpoints of continuous functions

The least fixed point of a continuous function $f: D \rightarrow D$ is:

$$\texttt{FIX}(f) \equiv \texttt{lub}(\{ f^n(\bot) : n \geq 0 \})$$
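
In Haskell this is (lazily) what `fix` computes; a small sketch:

```hs
-- the least fixed point as the limit of the chain ⊥ ⊑ f ⊥ ⊑ f (f ⊥) ⊑ ...
fix :: (a -> a) -> a
fix f = let x = f x in x

-- e.g. factorial as the least fixed point of its defining functional
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))
```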


### $\leq$ as implication

We can think of $b \leq a$ as $b \implies a$. That is, $b$ has more information
than $a$, and hence implies $a$.

### References

- Semantics with Applications: Hanne Riis Nielson, Flemming Nielson.
- [Lecture notes on denotational semantics: Part 2 of the computer science Tripos](https://www.cl.cam.ac.uk/~gw104/dens.pdf)
- [Outline of a mathematical theory of computation](https://ropas.snu.ac.kr/~kwang/520/readings/sco70.pdf)
- [Domain theory and measure theory: Video](https://www.youtube.com/watch?v=UJrnhhRi2IE)




# Resolution algorithm for propositional logic
- Resolution is refutation complete: it will find a refutation (a derivation of the empty clause) if one exists, for propositional logic.
-  Key idea is the resolution rule:

```
F \/ l; G \/ not(l)
-------------------
  F \/ G
```

- See that this allows us to reduce the number of occurrences of `l`. If we keep doing this, we can end up with the empty clause
  (the empty disjunction, with `\/` as the monoidal operation), forcing us to conclude `False`.
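
A minimal Haskell sketch of the rule (names and clause representation are ours): clauses are lists of signed literals, and `resolveOn` implements `F \/ l ; G \/ not(l)  =>  F \/ G`.

```hs
import Data.List (nub)

type Lit    = (String, Bool)   -- a propositional variable with a sign
type Clause = [Lit]            -- a disjunction of literals

negLit :: Lit -> Lit
negLit (v, b) = (v, not b)

-- from (F \/ l) and (G \/ not l), conclude (F \/ G)
resolveOn :: Lit -> Clause -> Clause -> Clause
resolveOn l c1 c2 = nub (filter (/= l) c1 ++ filter (/= negLit l) c2)

-- Deriving the empty clause [] refutes the clause set, e.g.
-- resolveOn ("p", True) [("p", True)] [("p", False)]  ==  []
```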

# Completeness for first order logic
- This requires soundness to have been established before.
- We work with sequent calculus, where `Γ => Δ` means that `g1 /\ g2 /\ ... /\ gn => d1 \/ d2 \/ .. \/ dn`.
- First prove that `Γ => Δ` is derivable iff `Γ U ~Δ => 0` is derivable.
- By soundness, this means that `Γ U ~Δ` is inconsistent.
- Thus, see that `Γ => Δ` is derivable iff `Γ U ~Δ` is inconsistent.
- Contraposing, `Γ => Δ` is NOT derivable iff `Γ U ~Δ` is consistent.
- Thus, the set `CONSISTENT := { Γ=>Δ |  Γ=>Δ has a model}` is equal
  to the set `{ Γ=>Δ | Γ U ~Δ is inconsistent}`, which (by soundness)
  is the same as `{ Γ=>Δ | Γ U ~Δ is not derivable}`.
- We want to show that `CONSISTENT` is a satisfiable set (obeys conditions `(S0)`...`(S8)`),
  which will allow us to produce models for all `Φ ∈ CONSISTENT` (by taking the closure `Φ#` and building the term model,
  where taking the closure needs the ambient `CONSISTENT` set to obey satisfiability).
- Thus, this shows that every element of `CONSISTENT` (proofs of sequent calculus) in fact has a model, and thus we are complete.


# Compactness theorem of first order logic

- Define a theory to be a set of sentences.
- Compactness states that if a theory `T` is such that every finite subset `Tfin ⊂ T` of the theory
  has a model, then `T` itself has a model.

#### Proof Sketch
- Let `L` be a language.
- We study `CONSISTENT := { T ∈ 2^L : T has a model }`, the set of
  all theories of `L` which are consistent.
- (1.a) We analyze `CONSISTENT` and see that it has the properties of
  satisfaction (`(S0)` ... `(S8)`).
- (1.b) We show that if `K` is a set of theories which has satisfaction, then
  so does `PROFINITE(K) := { T ∈ K : ∀ Tfin ⊂ T, Tfin has a model }`.
- (2.a) We analyze, for a model `M`, the set `TRUTHS := { T : T is true for M }`.
  We see that it has the properties called closure (`(C0)`, ..., `(C8)`).
- (3) We show that if `Δ` has `(C0)`, ... `(C8)`, then `Δ` has a model (the *term* model).
- (4) Show that if `Γ` is a theory, then `Γ#` is the *closure* of the theory,
  such that `Γ#` obeys `(C0)`...`(C8)` and `(Γ ⊂ Γ#)`.
- (5) Show that if `Γ ∈ S` where `S` has satisfaction, then one can build a `Γ ⊂ Γ# ∈ S` where `Γ#` is closed.
- (6) To prove compactness, take a theory `Δ ∈ PROFINITE(CONSISTENT)`.
      Since `CONSISTENT` has satisfaction, and `PROFINITE` preserves satisfaction,
      `PROFINITE(CONSISTENT)` has satisfaction. Now apply (5) to build the closure `Δ#`.
      Use (3) to build the term model `M(Δ#)`, a model for `Δ#`, which is also a model for `Δ`.

#### Proof sketch sketch

- 0. Define a property called "satisfaction" which is possessed by the set of consistent theories.
- 1. See that the profinite completion of a satisfaction set also obeys satisfaction.
- 2. Define a property called closure on a theory, where a closed theory possesses a term model.
- 3. Show that every theory in a satisfaction set also has a closure in the satisfaction set.
- 4. Take `Γ ∈ PROF(CONSISTENT)`, a theory `Γ` which is profinite,
     which we wish to build a model for. Create `Γ#`, the closure, such that `Γ ⊂ Γ#`.
     See that `Γ#` has a model (the term model `MΓ`), and that this is also a model for `Γ`, and thus `Γ` is consistent.

#### Non algorithmic proof sketch
- See that given an `S` which obeys `(S1)`...`(S8)`, `PROFINITE(S)` has **finite character**.
- A family `F` has finite character iff: `A ∈ F` iff all finite subsets of `A` belong to `F`.
- Show that for any `Γ ∈ S*`, there is a maximal `Γ# ∈ S*` which contains `Γ`.
  This follows by Zorn on `S*`. Let the partial order be the subset ordering on `S*(Γ) := { Δ ∈ S* | Γ ⊂ Δ }`.
  See that every chain has an upper bound (its union, by the finite character property). Thus, `S*(Γ)` has a maximal element, call it `Γ#`.
- Show that this `Γ#` obeys `(C0)`...`(C8)` [closure properties]. This will rely on `S*` having `(S1)`..`(S8)`.
  Thus, `Γ#` possesses a model (the term model).
- This means that `Γ` also possesses a model (the term model for `Γ#`).


#### Algorithmic proof: details

- TODO



# First order logic: Semantics
- $M \models F$ can be read as "$M$ models $F$", or "$M$ makes $F$ true" ($M$ for model, $F$ for formula).

#### Defining models for quantification

- We wish to define $M \models \exists x, F(x)$ (and similarly for $\forall$).
- A first shot might be: $M \models \exists x, F(x)$ iff for some closed term $t$, $M \models F(t)$.
- However, see that intuitively, $\exists x$ ranges over the _denotational space_, while closed terms range over
  the _image of syntax in the denotation_.
- For example, consider the language of naturals, which we can interpret over the naturals, the nonnegative rationals, or the reals.
  So let us think of the formula $(F \equiv \exists t, t + t = 1)$. If we only allow $t$ to take on closed terms, then
  see that since the closed terms are the numerals, this will be false! But really, when interpreted
  over the reals, we want the formula to be true, since there is the real number $1/2$ which witnesses
  the truth of $(\exists t, t + t = 1)$. Thus, it is insufficient to range over closed terms, since the "image"
  of the closed terms in $\mathbb R$ is going to be $\mathbb N$, but in fact, we have "more in $\mathbb R$"
  than just what the closed terms can reach.
- So the correct notion of $M \models \exists x, F(x)$ is to take $M$, extend with a constant symbol $c$, evaluate it to some $m \in M$,
  and call this $M^c_m$. Then, we say that $M \models \exists x, F(x)$ iff there exists an $m \in M$ such that $M^c_m \models F(c)$.
- See that this allows access to denotations, without needing a syntactic closed term.
- This feels close to the notion of **adequacy**, when the operational and denotational semantics
  agree. [HackMD notes by Alexander Kurz](https://hackmd.io/@alexhkurz/Hkf6BTL6P#Adequacy)

# full abstraction in semantics

- Observational equivalence: same results when run, $\sim_O$
- Denotational equivalence: same denotation.
- Full abstraction: the two equivalences coincide: observationally equivalent iff denotationally equivalent.
- I thought full abstraction meant that everything in the denotational side has a program that realises it!

#### Parallel `or` and PCF

- For example, I thought that the problem with `por` in PCF was that it wasn't possible to implement in the language.
- However, this is totally wrong.
- The reason `por` is a problem is that one can write _real programs_ in PCF of type `(bool -> bool -> bool) -> num`,
  call these `f, g`, such that `[[f]](por) != [[g]](por)`, even though both of these have the same operational semantics!
  (both `f`, `g` diverge on all inputs).
- [SEP reference](https://plato.stanford.edu/entries/games-abstraction/#ProgEquiFullAbst)

#### Relationship between full abstraction and adequacy
- Adequacy: $O(e) = v$ iff $[[e]] = [[v]]$. This says that the denotation agrees on observations.
- See that this is silent on _divergence_.

##### Theorem relating full abstraction and adequacy
- Suppose that $O(e) = v \implies [[e]] = [[v]]$.  When is the converse true? ($[[e]] = [[v]] \implies O(e) = v$?)
- It is true iff we have that $e =_M e'$ iff $e =_O e'$.


#### Full abstraction between languages
- Say two languages $L, M$ have a translation $f: L \to M$ and have the same observables $O$.
- Then the translation $f$ is fully abstract iff $l \sim_O l' \iff f(l) \sim_O f(l')$.
- See that $L$ cannot be more expressive than $M$ if there is a fully abstract translation from $L$ into $M$.

- [HackMD notes by Alexander Kurz](https://hackmd.io/@alexhkurz/Hkf6BTL6P#Adequacy)


# You could have invented Sequents

- Key idea: define a notation `Γ => Δ`, which holds iff the conjunction of the sentences
  in gamma implies the disjunction of the sentences in delta.
- Why would anybody do this? Isn't this weird?
- It's because we first note what we need in order to talk about consequence, validity, and unsatisfiability.
- `d1` is a consequence of `Γ` iff `g1 /\ g2 .. /\ gn => d1`.
- `d1` is valid iff `empty => d1`, or written differently, `0 => {d1}`.
- `Γ` is unsatisfiable iff `g1 /\ ... /\ gn => False`, or written differently, `Γ => 0`.
- Thus, see that on the RHS, we need a set with 0 or 1 inhabitants. We can think of this as `Maybe`, smooshed
  together with `\/`, since we want the empty set to represent `False`.
- Recall that Haskell teaches us to replace failure with a list of successes!
- Thus we should use `Γ => Δ` where on the RHS, we have a list that is smooshed together by or (`\/`)!
- Great, we have successfully invented sequents.
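
A minimal Haskell sketch of this reading (the `Prop`, `Valuation`, and `holds` names are mine, over a toy propositional syntax):

```hs
data Prop = Var String | Not Prop | And Prop Prop | Or Prop Prop

type Valuation = String -> Bool

eval :: Valuation -> Prop -> Bool
eval v (Var s)   = v s
eval v (Not p)   = not (eval v p)
eval v (And p q) = eval v p && eval v q
eval v (Or p q)  = eval v p || eval v q

-- Γ => Δ holds under v iff the conjunction of Γ implies the disjunction of Δ.
-- Note `and [] = True` and `or [] = False`: an empty LHS asserts validity of the RHS,
-- and an empty RHS asserts unsatisfiability of the LHS, matching the bullets above.
holds :: Valuation -> [Prop] -> [Prop] -> Bool
holds v gamma delta = not (and (map (eval v) gamma)) || or (map (eval v) delta)
```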


# Fibrational category theory, sec 1.1, sec 1.2

- Key idea: we can define a notion of a bundle `p: E → B`.
- The idea is that we want to generalize pullbacks into fibres.
- A functor `p: E → B` is called a fibration if for each morphism `f: b → b'`
  downstairs, and an element `e' ∈ E` such that `p(e') = b'`, we have a lift
  of the morphism `f` to `f♮`, such that this morphism has a property called
  *cartesianity*.
- Given:

```
        e'
        |

b ----> b'
```


- We want:

```

e===f♮=>e'
|       |
|p      |p
v       v
b --f-->b'
```

- Furthermore, to ensure that this is really a pullback, we ask for the condition that
  TODO

#### Omega sets

- A set with some sort of "denotation by natural numbers" for each element of the set.
- More formally, an omega set is a tuple `(S, E: S → 2^N)` such that `E(s) ≠ ∅`.
  The numbers `E(s)` are to be thought of as the denotations for the element `s`.
- A morphism of omega sets `(S, E) → (S', E')` is a function `f: S → S'`, together with
  a partial function `realiser(f): N → N` such that for all `s ∈ S`, `realiser(f)(E(s)) ⊂ E'(f(s))`.
  That is, for every denotation `d ∈ E(s)`, the realiser `realiser(f)` maps `d` into
  the denotation of `f(s)`, ie, `d' = realiser(f)(d)` lives in `E'(f(s))`.

#### PERs
- This is a partial equivalence relation, so we only need symmetry and transitivity.
- We consider partial equivalence relations (PERs) over `N`.
- Let `R` be a PER. We think of those elements that are reflexively related (ie, `xRx`) as
  "in the domain" of the `PER`.
- Thus we define `domain(R) = { x | xRx }`.
- In this way, `R` is a real equivalence relation on `domain(R)`.
- We write `N/R` or `domain(R)/R` for the equivalence classes induced by `R` on `N`.
- The category of PERs has as objects these PERs.
- Intuitively, these give us subsets of the naturals ...
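
A small Haskell sketch of the definitions above (names are mine; relations over `Integer` rather than `N` for convenience):

```hs
-- A PER as a symmetric, transitive (but not necessarily reflexive) relation.
type PER = Integer -> Integer -> Bool

-- domain(R) = { x | x R x }
inDomain :: PER -> Integer -> Bool
inDomain r x = r x x

-- Example: relate two numbers iff both are even. Symmetric and transitive,
-- but not reflexive on the odds, so its domain is exactly the even numbers.
evensPER :: PER
evensPER x y = even x && even y
```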


#### Cloven Fibrations

- A fibration is cloven if for every arrow downstairs, there is a chosen cartesian
  arrow upstairs. So we have a *function* that computes the cartesian arrow upstairs
  for an arrow downstairs. This is different from the regular definition where
  we just know that there *exists* something upstairs.
- Note that given a fibration, we can always cleave the fibration using choice.
- Recall the definition of cartesian. For a functor `p: E → B`, for every arrow
  downstairs `u: I → J ∈ B` and every object `Y ∈ E` lying above `J` (ie, `p(Y) = J`),
  there is a cartesian lift of `u` given by `X → Y` for some `X` lying above `I` (`p(X) = I`).
- Having made such a choice, every map `u: I → J` in `B` gives a functor `u*: E_J → E_I` from the fiber `E_J` over `J`
  to the fiber `E_I` over `I`. (The direction changes: it's a pullback.)
- Recall that `E_J` is the subcategory of `E` whose objects are `p^{-1}(J)`, and whose morphisms
  are `p^{-1}(id_J)`.
- Implement the map `u*` on objects by taking `u*(Y)` to be the object given by lifting the map `u: I → J` along `Y`.
  This is well-defined since we have a cleavage to pick a unique `u*(Y)`!

```
defn
u*(Y)-->Y
        |
        v
I -u--->J
```

- For morphisms, suppose we are given an arrow `f: Y → Y'` in `E_J`. Then we use the cartesianity of the
  lifted morphism `u*(Y') → Y'` to factor the composite `u*(Y) → Y → Y'` through it, giving `u*(f): u*(Y) → u*(Y')`. Meditate on the below diagram:
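
(A sketch of that square; the horizontal arrows are the chosen cartesian lifts of `u`, and `u*(f)` is the unique factorization supplied by cartesianity of `u*(Y') → Y'`.)

```
u*(Y)--------->Y
  |            |
u*(f)          f
  |            |
  v            v
u*(Y')-------->Y'
```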

####  Split Fibrations

- A fibration is split if the cartesian lift of the identity is the identity, and the cartesian lift
  of a composition is the composition of the lifts (ie, the lifting is functorial).
- In a general cloven fibration, these only hold up to iso.
- Example of a non-split fibration: the codomain fibration on Set (the arrow category), because pullbacks in Set are not
  associative on the nose: `(A x B) x C != A x (B x C)` as sets, only up to iso.
- Being split has mathematical content, because it is not always possible to choose a cleavage
  that is globally split.

##### Pseudofunctors
- A functor where all the equalities are isos. `f(a . b) ~= f a . f b`. `f(id) ~= id`.

##### Split Indexed Category
-

##### Lemma about pulling stuff back into the fiber

- `E(X, Y) ~= disjoint union (u: πX -> πY) E_{πX} (X, u*(Y))`


# Simple Type Theory via Fibrations

- Objects are contexts, so sequence of `(term:type)`
- Morphisms between contexts `Γ = (v1:s1, v2:s2)` and `Δ = (w1:t1, w2:t2)`
  are terms `M1, M2` such that we have `Γ |- M1 : t1` and `Γ |- M2 : t2`.
- More cleaned up: for a context `Γ`, and a context `Δ` with sequence of types `(_:t1, _:t2, ..., _:tn)`,
  a morphism is a sequence of terms `Γ|- M1: t1`, `Γ|- M2:t2`, ..., `Γ|-Mn:tn`.
- For concreteness, let us suppose `Δ = (w1:t1, w2: t2)`.
- The identity morphism is `Δ -(d1, d2)-> Δ`, since we have `d1 := w1:t1, w2:t2|-w1:t1` and `d2 := w1:t1, w2:t2|-w2:t2`.
  Thus, starting from `Δ` as the context, we can derive terms of types `t1, t2`, which are given by the derivations `d1, d2`.
- Let us meditate on the composition `Γ -(d1, d2)-> Δ -(d'1, d'2)-> Θ`. First off, let us write this more explicitly as:

```
Γ

(d1 := Γ|-M1:s1, d2 := Γ|-M2:s2)

Δ := (x1:s1, x2:s2)


(d'1 := Δ|-N1:t1, d'2 := Δ|-N2:t2)

Θ := (_:t1, _:t2)
```

- See that we have `(x1:s1, x2:s2)|- N1 : t1`.
- If we substitute `N1[x1 := M1, x2 := M2]`,
  then under context `Γ`, we know that `M1:s1`, and `M2:s2`, so they have the
  correct types to be substituted for `x1` and `x2`. Thus, in context `Γ`, `N1[x1 := M1, x2 := M2]`
  has the same type it used to have (`t1`).
- Thus we have that `Γ |- N1[x1 := M1, x2 := M2] : t1`.
- This gives us the first component of the composite morphism, by telling us how to compose `d'1` with `d1`.
- Do the same for `d2`.
- What the hell is going on anyway?
- Given any well typed term in a context, `Γ|-M:t`, we can think of this as a morphism `Γ --M--> (Δ := _:t)`.
- This relative point of view (à la Grothendieck) lets us extend to larger contexts.
- The empty context is the terminal object, since there is precisely one morphism into it, consisting of the
  empty sequence `()`. Can be written as `Γ-()-> 0`.
- The categorical product of contexts is given by sequencing (see that this needs exchange),
  and the projections are the "obvious" rules: `Γ <--Γ-- (Γ;Δ)---Δ--> Δ`.



# Realisability models


For closed terms `u`, and type `τ` we are going to define `u ⊩ τ` (read "u is a
realiser of τ"). Let's say that we have PCF's types: `ℕ` and `σ → τ`.

1. `u ⊩ ℕ` if `u` reduces to a natural number literal.
2. `f ⊩ σ → τ` if for all `s ⊩ σ`, `f s ⊩ τ`.
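
A rough Haskell sketch of the relation (the names `Ty`, `Val`, `realisesOn` are mine; the genuine `∀ s ⊩ σ` in the arrow case quantifies over all realisers and is undecidable, so here it is only approximated on a finite list of test values):

```hs
data Ty = TNat | TArr Ty Ty

-- Closed terms, already reduced to values for simplicity.
data Val = VNat Int | VFun (Val -> Val)

-- Approximate u ⊩ τ, testing the arrow case only against the supplied values.
realisesOn :: [Val] -> Val -> Ty -> Bool
realisesOn _     (VNat _) TNat       = True
realisesOn tests (VFun f) (TArr a b) =
  and [ realisesOn tests (f s) b | s <- tests, realisesOn tests s a ]
realisesOn _     _        _          = False
```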


Some immediate observations:
- This definition is by induction on the types, and assumes little about the terms (it's a logical relation).
- This is "really what we mean" by `x : τ` (except non-termination, not modelled here): realisability has explanatory power.
- This relation is usually undecidable.
- It is strongly related to parametricity (in parametricity we associate a binary relation to types; in realisability we associate a (unary) predicate).
- From this point of view, typing is a (usually decidable) modular approximation of realisability.
- For instance, consider `if x then 1 else (\y -> y)`. It isn't well-typed.
- However, if we wrap it as `let x = True in if x then 1 else (λy. y)`, it becomes a realiser of `ℕ`, because upon
  reduction it produces a natural number literal, even though it is not "well-typed".

- Typing and realisability are related by a lemma sometimes referred as adequacy.

Take the rule for λ (simplified):

```
x:X, y:Y ⊢ z : Z
-----------------------
x:X ⊢ λ(y: Y). z : Y → Z
```

You interpret it as a statement

```
∀ v ⊩ X, (∀ w ⊩ Y, z[x\v, y\w] ⊩ Z) ⟹  (λy. z[x\v]) ⊩ Y → Z
```

- Then, you prove this statement.

Once you've done so for each rule, you can conclude (very easy induction) that
if `⊢ u : A`, then `u ⊩ A`. This gives type safety, since realisers are practically
defined as verifying type safety.

What you've gained along the way is that this proof is an open induction.

In standard combinatorial proofs of type safety, the induction hypothesis may
depend on every case. Adding a form to your language may require redoing the
entire proof. Here the various cases in the adequacy lemma remain true, so
you can just prove the new cases.


There are many variants of realisability: tuned for logic, with models instead
of terms, … My favourite is Krivine's realisability with both terms and stacks,
and magic happening when they interact. But this is another story and shall be
told another time.

# Naming left closed, right open with start/stop

Call the variables `startPos` and `stopPos`. Since it's called stop,
it's a little more intuitive that it's exclusive!
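
For instance (a tiny Haskell sketch, names mine):

```hs
-- [startPos, stopPos): left-closed, right-open. `stopPos` is exclusive,
-- so the length is simply stopPos - startPos.
slice :: Int -> Int -> [a] -> [a]
slice startPos stopPos = take (stopPos - startPos) . drop startPos
```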

# Nested vs mutual inductive types:

```
-- mutual inductives: the two types refer to each other directly
mutual
inductive m1 where
| mk: m2 -> m1

inductive m2 where
| mk: m1 -> m2
end
```

```
-- nested inductive: n1 nests itself under the parameter of n2 (a list-like container)
inductive n2 (a: Type): Type where
| nil: n2 a
| cons: a -> n2 a -> n2 a

inductive n1: Type where
| mk: n2 n1 -> n1
```

# Embedding HOL in Lean

```
inductive Sets where
| bool: Sets
| ind: Nat -> Sets
| fn: Sets -> Sets -> Sets

def Sets.denote: Sets -> Type
| bool => Prop
| ind _ => Nat
| fn i o => i.denote -> o.denote

-- if-then-else on an arbitrary Prop, using classical choice for decidability
-- (we cannot pattern match on a proof of `p ∨ ¬p` to produce data)
noncomputable def ifProp {a : Type} (p: Prop) (t: a) (e: a) : a :=
  @ite a p (Classical.propDecidable p) t e

def Model := Σ (s: Sets), s.denote
```

# Module system for separate compilation

- leanprover/lean4#416
- https://www.cs.utah.edu/plt/publications/macromod.pdf
- https://raw.githubusercontent.com/alhassy/next-700-module-systems/master/phd-defence.pdf
- https://raw.githubusercontent.com/alhassy/next-700-module-systems/master/thesis.pdf

# Second order arithmetic

- First order arithmetic has variables that range over numbers
- Second order arithmetic has variables that range over sets of numbers
- [Ref: Jeremy Avigad on forcing](https://www.andrew.cmu.edu/user/avigad/Papers/forcing.pdf)
- Axiomatic second-order arithmetic is often termed "analysis" because, by
  coding real numbers and continuous functions as sets of natural numbers, one
  can develop a workable theory of real analysis in this axiomatic framework.

# Lean4 Dev Meeting

- Mathport uses mathlib4 to move tactics.
- Mathlib4 has syntax definitions for every single tactic that exists in `mathlib`.
- These only exist as syntax so far. We need to port this.
- The goal for this week is to have as many as possible knocked off.


#### Macro
- A tactic that expands to another tactic.
- Example: `_` tactic. This expands into `({})`, which shows you the current state.
- `macro_rules` need a piece of `Syntax`, and it expands into another tactic.

```
-- | do the iff.rfl as well.
macro_rules | `(tactic| rfl) => `(tactic| exact Iff.rfl)
```

- Closed syntax categories: `syntax rcasesPatLo := ...`.
- Open syntax categories: `syntax X`.

#### How to collate info?
- Use `macro` to define syntax + macro
- Use `elab` to define syntax + elaborator together.
- Add command to list all places where something was extended.
- Add information into docstrings.
- `match_target`.

#### `Mapsto` arrow

- `(x: α) \mapsto e`: maybe annoying to parse.
- `\lambda (x: \alpha) \mapsto e`: easy to parse, but mathematicians don't know what lambda is.

#### `ext` tactic

- implemented as a macro tactic, which uses an `elab` tactic.

#### `colGt`

- `syntax "ext"`
- Lean4 is whitespace sensitive, like python. `colGt` says that we can have the following
    syntax on a column that is greater than the current line.

```
ext
  x -- parsed as an argument of `ext`.
  y -- parsed as an argument of `ext`.
z -- not parsed as part of `ext x y`.
```
- If in parens, we don't need colGt, because we want to allow something like:

```
ext (x
 y) -- should parse.
```
#### `ppSpace`

- Used when pretty printing a tactic.

#### Scoped syntax
- `scoped syntax "ext_or_skip" ...`.
- This declares a syntax that is valid only for the current section/namespace.
- Trailing percent `ext_proof%` is an indicator that it is a term macro / term elaboration.
- Protected: identifier cannot appear without prefixing namespaces.

#### Trivia
- `{..}` pattern matches on a struct.


#### Tactic development: `Trace`
- Create a new file `Mathlib/Tactic/Trace.lean`
- Move the syntax line from `./Mathlib/Mathport/Syntax.lean` into `Mathlib/Tactic/Trace.lean`.
- On adding a file, add it to `Mathlib.lean`. So we add `import Mathlib.Tactic.Trace`
- We also want to re-import the syntax into `Mathlib/Mathport/Syntax.lean`.
- We have now moved `trace` into its own file and hooked into the build system and extension.
- The first thing to do is to find out what the tactic even does.
- Go to [`trace`](https://leanprover-community.github.io/mathlib_docs/tactics.html#trace) at the mathlib docs.

```
-- Tactic.lean
import Lean
open Lean Meta Elab

syntax (name := trace) "trace " term : tactic

elab "foo" : tactic => do
  -- `TacticM Unit` expected
  logInfo "hi"

```

```
open Lean Meta Elab Tactic ~=
open Lean
open Lean.Meta
open Lean.Elab
open Lean.Elab.Tactic
```
- TacticM is a `MonadRef`, which is aware of source spans to report the errors.
  so we can write:

```
elab "foo" : tactic => do
  logInfo "hi"
```

- We can use `withRef` to control the current source span where errors
  are reported.

```
elab tk:"foo" val:term : tactic => do
  withRef tk (logInfo val)
```

- We want to evaluate the `val:term`, because otherwise, it literally prints the syntax
  tree for things like `(2 + (by trivial))`.
- Use `elabTerm` to elaborate the syntax into a term.
- `set_option trace.Elab.definition true in ...` prints out the declarations that are being
  sent to the kernel.
- `elab` is `syntax` + `elab_rules` together, just like `macro` is `syntax` + `macro_rules` together.
- Create a test file `test/trace.lean`. Import the tactic, and write some examples.
- Recompile, and check that the test works.
- How do we check that our port works?


#### Reducible

> To clarify, @[reducible] marks the definition as reducible for typeclass
> inference specifically. By default typeclass inference avoids reducing because
> it would make the search very expensive.


# Categorical model of dependent types

- [Motivation for variants of categorical models of dependent types](https://proofassistants.stackexchange.com/questions/1086/what-are-the-motivations-for-different-variants-of-categorical-models-of-depende)
- [Seminal paper: Locally cartesian closed categories and type theory](https://www.math.mcgill.ca/~rags/LCCC/LCCC.pdf)
- A closed type is interpreted as an object.
- A term is interpreted as a morphism.
- A dependent type upon $X$ is interpreted as an object of the slice category $C/X$.
-  A dependent type of the form `x: A |- B(x) is a type` corresponds to morphisms `f: B -> A`,
    whose fiber over  `x: A` is the type `f^{-1}(x) = B(x)`.
- The dependent sum $\Sigma_{x : A} B(x)$ is given by an object in $Set/A$: the disjoint union $\sqcup_{a \in A} B_a$.
  The morphism is the map $\sqcup_{a \in A} B_a \to A$ which sends an element of $B_{a_1}$ to $a_1$, an element of $B_{a_2}$ to $a_2$, and so forth.
  The fibers of this map recover the disjoint union decomposition.
- The dependent product $\Pi_{x: A} B(x)$ is likewise given by an object in $Set/A$.
- We can describe both dependent sum and product as arising as adjoints to the functor $Set \to Set/A$ given
  by $X \mapsto (X \times A \to A)$.
- Recalling that dependent types are interpreted by display maps, substitution
  of a term $t$ into a dependent type $B$ is interpreted by pullback of the
  display map interpreting $B$ along the morphism interpreting $t$.
- [Reference](https://ncatlab.org/nlab/show/categorical+model+of+dependent+types)

#### Key ideas

- [Intro to categorical logic](https://www.andrew.cmu.edu/user/jonasf/80-514-814/notes.pdf)
- Contexts are objects of the category `C`
- Context morphisms are morphisms `f: Γ → Δ`
- Types are morphisms `σ: X → Γ` for arbitrary `X`
- Terms are sections of `σ: X → Γ`, so they are functions `s: Γ → X` such that `σ . s = id(Γ)`
- Substitution is pullback

#### Why is substitution pullback?

- Suppose we have a function $f: X \to Y$, and we have a predicate $P \subseteq Y$.
- The predicate can be seen as a mono $P_Y \xrightarrow{py} Y$, which maps the subset where $P$ is true into $Y$.
- now, the subset $P_X \equiv P_Y(f(x))$, ie, the subset $P_X \equiv \{ x : f(x) \in P_Y \}$ is another subset $P_X \subseteq X$.
- See that $P_X$ is a pullback of $P$ along $f$:

```
P_X -univ-> P_Y
|            |
px            py
|            |
v            v
X -----f---> Y
```

- This is true because we can think of the pullback as $Q_X \equiv \{ (x, y) \in X \times P_Y : f(x) = py(y) \}$.
- If we imagine a bundle, at each point $y \in Y$, there is the presence/absence of a fiber $py^{-1}(y)$
  since $py$ is monic.
- When pulling back the bundle, each point $x \in X$ either inherits this fiber or not depending
  on whether $f(x)$ has a fiber above it.
- Thus, the pullback is also monic, as each fiber of $px$ either has a strand or it does not, depending
  on whether $py$ has a strand or not.
- This means that the fiber $px^{-1}(x)$ has a unique element precisely when the fiber over $f(x)$ does.
- This means that $px$ is monic, and represents the subset that is given by $P_Y(f(x))$.

#### Isn't substitution composition?

- If instead we think of a subset as a function $P_Y: Y \to \Omega$ where $\Omega$ is the subobject classifier,
  we then get that $P_X$ is the composite $P_X \equiv P_Y \circ f$.
- Similarly, if we have a "regular function" $f: X \to Y$, and we want to substitute $s(a)$ ($s: A \to X$ for
  substitution) into $f(x)$ to get $f(s(a))$, then this is just computing $f \circ s$.
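
In `Set`-like terms, with `Bool` standing in for the subobject classifier, this is just precomposition (a minimal Haskell sketch, names mine):

```hs
-- A "subset" of y as its characteristic map; substituting along f : x -> y
-- turns a predicate on y into a predicate on x by precomposition.
substAlong :: (x -> y) -> (y -> Bool) -> (x -> Bool)
substAlong f pY = pY . f
```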

#### Using this to do simply typed lambda calculus

- [Introduction to categories and categorical logic](http://www.cs.ox.ac.uk/people/bob.coecke/AbrNikos.pdf)
- Judgement of the form `A1, A2, A3 |- A` becomes a morphism `A1xA2xA3 → A`.
- Stuff above the inference line will be arguments, stuff below the line will be the return value.
- Eg, the identity judgement:

```
Γ,A |- A
```

becomes the function `snd: ΓxA → A`.
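
In Haskell, with contexts as (nested) products, the same judgements read as follows (a sketch, names mine):

```hs
-- the axiom  Γ, A |- A  is the second projection out of the product
axiom :: (g, a) -> a
axiom = snd

-- weakening: from  Γ |- B  we get  Γ, A |- B  by precomposing with fst
weaken :: (g -> b) -> ((g, a) -> b)
weaken m = m . fst
```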

#### Display maps

- [Reference: Substitution on nlab](https://ncatlab.org/nlab/show/substitution)
- To do dependent types in a category, we can use [display maps](https://ncatlab.org/nlab/show/display+map).
- The display map of a morphism $p: B \to A$ represents $x:A |- B(x): Type$. The intuition is that $B(x)$
  is the fiber of the map $p$ over $x:A$.
- For any category $C$, a class of morphisms $D$ are called display maps iff all pullbacks of $D$ exist and
  belong to $D$. Often, $D$ is also closed under composition.
- Said differently, $D$ is closed under all pullbacks, as well as composition.
- A category with displays is _well rooted_ if the category has a terminal object $1$, and all maps into $1$
  are display maps (ie, they can always be pulled back along any morphism).
- This then implies that binary products exist: `A × B` arises as the pullback of the display map `B → 1` along `A → 1`, and pullbacks of display maps always exist.


#### Categories with families

- [Lectures notes on categorical logic](https://staff.math.su.se/palmgren/lecturenotesTT.pdf)

# Coends

- Dual of an end
- A cowedge is defined by injections into the co-end of all diagonal elements.

```
p(a, a)   p(b, b)
  \          /
  π[a]      π[b]
   \        /
    v      v
     \int^x p(x, x)
```

- It's a universal cowedge, so every other cowedge `c` must factor through it.

```
p(a, a)   p(b, b)
  \   \  /    /
  π[a]  c   π[b]
  \     |    /
   \    ∃!  /
    \   |  /
    v   v  v
     \int^x p(x, x)
```

- Now we have the cowedge condition. For every morphism `h: b -> a`, and for every cowedge `c`, the following
  must hold:

```
   [p b a]
  /      \
[p a a]   [p b b]
  \      /
      c
```

- By curry howard, `type coend p = exists a. p a a`


```
data Coend p where
  MkCoend :: p a a -> Coend p
```

```
type End (p :: * -> * -> *) = forall x. p x x
```

- A functor is continuous if it preserves limits.
- Recall that `Hom` functor preserves limits.

```
-- Hom(\int^x p(x, x), r) ~= \int_x (Hom(p x x, r))
type Hom a b = a -> b
```

- `Set(\int^x p(x, x), r)` is asking for a function `Coend p -> r`.
- But we claim this is the same as having `forall a. (p a a -> r)`.
- So we can write `Set(\int^x p(x, x), r) ~= \int_x Set(p(x, x), r)`.

```
-- | \int_x Set(p(x, x), r)
-- | \int_x Hom(p(x, x), r)
-- | \int_x RHS(x,x)
       where RHS a b = Hom(p(a, b), r)
type RHS p r a b = Hom (p a b) r -- rhs of the above expression
```


The isomorphisms are witnessed below, reminiscent of building a continuation

```
-- fwd :: (Coend p -> r) -> End (RHS p r)
-- fwd :: (Coend p -> r) -> (forall x. (RHS r) x x)
-- fwd :: (Coend p -> r) -> (forall x. Hom (p x x) r)
-- fwd :: (Coend p -> r) -> (forall x. (p x x) -> r)
fwd :: Profunctor p => (Coend p -> r) -> (forall x. (p x x) -> r)
fwd coendp2r  pxx = coendp2r (MkCoend pxx)
```

- The backward iso, reminiscent of just applying a continuation.

```
-- bwd :: End (RHS p r)             -> (Coend p -> r)
-- bwd :: (forall x. (RHS r) x x)   -> (Coend p -> r)
-- bwd :: (forall x. Hom (p x x) r) -> (Coend p -> r)
-- bwd :: (forall x. (p x x) -> r)  -> (Coend p -> r)
bwd :: Profunctor p => (forall x. (p x x) -> r) -> Coend p -> r
bwd pxx2r (MkCoend paa) = pxx2r paa
```

- Ninja coyoneda lemma: `\int^x C(x, a) * f(x) ~= f(a)`
- Witnessed by the following:

```hs
-- ninja coyoneda lemma:
-- \int^x C(x, a) * f(x) ~= f(a)
-- the profunctor is \int^x NinjaLHS[f, a](x, y)
--   where
newtype NinjaLHS g b y z = MkNinjaLHS (y -> b, g z)
```



- Forward iso:

```hs
-- ninjaFwd :: Functor f => Coend (NinjaLHS f a) -> f a
ninjaFwd :: Functor g => Coend (NinjaLHS g r) -> g r
ninjaFwd (MkCoend (MkNinjaLHS (x2r, gx))) = fmap x2r gx
```

- Backward iso:

```hs
-- ninjaBwd :: Functor f => g r -> (Coend (NinjaLHS g r))
-- ninjaBwd :: Functor f => g r -> (โˆƒ x. (NinjaLHS g r x x))
-- ninjaBwd :: Functor f => g r -> (โˆƒ x. (NinjaLHS (x -> r, g x))
ninjaBwd :: Functor g => g r -> Coend (NinjaLHS g r)
ninjaBwd gr = MkCoend (MkNinjaLHS (x2r, gx)) where
   x2r = id -- choose x = r, then x2r = r2r
   gx = gr -- choose x = r
```

- We can prove the ninja coyoneda via coend calculus plus yoneda embedding,
  by using the fact that yoneda is full and faithful.
- So instead of showing LHS ~= RHS in the ninja coyoneda, we will show that `Hom(LHS, -) ~= Hom(RHS, -)`.

- We compute as:

```
Set(\int^x C(x, a) * f(x), s) ~? Set(f(a), s)
[continuity:]
\int_x Set(C(x, a) * f(x), s) ~? Set(f(a), s)
[currying:]
\int_x Set(C(x, a), Set(f(x), s)) ~? Set(f(a), s)
[ninja yoneda on Set(f(-), s):]
Set(f(a), s) ~? Set(f(a), s)
Set(f(a), s) ~= Set(f(a), s)
```

#### Ninja Coyoneda for containers


- The type of `NinjaLHS`, when specialized to `NinjaLHS g b r r` becomes `(r -> b, g r)`.
- This is sorta the way you can get a `Functor` instance on any `g`, by essentially accumulating
  the changes into the `(r -> b)`. I learnt this trick from some kmett library, but I'm not
  sure what the original reference is.

- Start with `NinjaLHS`:

```hs
-- ninja coyoneda lemma:
-- \int^x C(x, a) * f(x) ~= f(a)
-- the profunctor is \int^x NinjaLHS[f, a](x, y)
--   where
newtype NinjaLHS g b y z = MkNinjaLHS (y -> b, g z)
```

- Specialize by taking the diagonal:

```hs
-- newtype NinjaLHS' g i o = MkNinjaLHS' (i -> o, g i)
newtype NinjaLHS' g i o = MkNinjaLHS' (NinjaLHS g o i i)
```

- Write a smart constructor to lift values into `NinjaLHS'`:

```
mkNinjaLHS' :: g i -> NinjaLHS' g i i
mkNinjaLHS' gi = MkNinjaLHS' (MkNinjaLHS (id, gi))
```

- Implement functor instance for `NinjaLHS' g i`:

```
-- convert any storage of shape `g`, input type `i` into a functor
instance Functor (NinjaLHS' g i) where
  -- f:: (o -> o') -> NinjaLHS' g i o -> NinjaLHS' g i o'
  fmap o2o' (MkNinjaLHS' (MkNinjaLHS (i2o, gi))) =
    MkNinjaLHS' $ MkNinjaLHS (\i -> o2o' (i2o i), gi)
```

See that to be able to extract out values, we need `g` to be a functor:

```
extract :: Functor g => NinjaLHS' g i o -> g o
extract (MkNinjaLHS' (MkNinjaLHS (i2o, gi))) = fmap i2o gi
```


# Natural Transformations as ends
- [Bartosz: Natural transformations as ends](https://www.youtube.com/watch?v=DseY4qIGZV4&list=PLbgaMIhjbmEn64WVX4B08B4h2rOtueWIL&index=13)
- Ends generalize the notion of product/limit. It's sort of like an infinite product plus the wedge condition.
- $\int_X p x x$ is the notation for ends, where $p$ is a profunctor.
- Remember `dimap :: (a' -> a) -> (b -> b') -> p a b -> p a' b'`. Think of this as:

```
-----p a' b'-------
a' -> [a -> b] -> b'
       --pab---
```

- The set of natural transformations is an end.
- Haskell: `type Nat f g = forall x. f x -> g x`.
- We can think of this as the "diagonal" of some end `p x x` for some profunctor `p` we need to cook up.
- `type p f g a b = f a -> g b`. Is `p f g` a profunctor?

```
dimap :: (Functor f, Functor g)
      => (a' -> a) -> (b -> b') -> (f a -> g b) -> (f a' -> g b')
dimap a'2a b2b' fa2gb = \fa' ->
  let fa  = fmap a'2a fa'
      gb  = fa2gb fa
      gb' = fmap b2b' gb
  in gb'
-- point-free: dimap a'2a b2b' fa2gb = fmap b2b' . fa2gb . fmap a'2a
```

- Clearly, from the above implementation, we have a profunctor.
- So we have a profunctor `P(a, b) = C(Fa, Gb)`.
- In haskell, the end is `End p = forall a. p a a`.
- In our notation, it's `\int_x C(Fx, Gx)`.
- Recall the wedge condition. For a profunctor `p: Cop x C -> C`, and any morphism `k: a -> b` for `a, b โˆˆ C`,
  the following diagram commutes for the end `\int_X p(X,X)`:

```
 (p x x, p y y, p z z, ... infinite product)
\int_x p(x,x)
 /[πa]  [πb]\
v            v
p(a,a)       p(b,b)
 \            /
[p(id, k)]  [p(k,id)]
   \        /
    v      v
     p a b
```

- If we replace `p x x` with with our concrete `p a b = C(fa, gb)`, we get:

```
    (forall x. f x -> g x)
    /               \
  [@ a]            [@ b]
   v                v
 τa:(f a -> g a)     τb:(f b -> g b)
   \                  /
dimap id(a) k τa    dimap k id(b) τb
   \                 /
    \               τb.(@fmap f k): (f a-> g b)
     \              /
     \           COMMUTES?
     \            /
    (@fmap g k).τa: (f a -> g b)

```

- This says that `gk . τa = τb . fk`
- But this is a naturality condition for `τ`!
- So every end corresponds to a natural transformation, and `τ` lives in `[C, D](f, g)`.
- This shows us that the set of natural transformations can be seen as an end (?)
- I can write `\int_a D(fa, ga) ~= [C, D](f, g)`

#### Invoking Yoneda

- Now, yoneda tells us that `[C, Set](C(a, -), f(-)) ~= f(a)`.
- Now I write the above in terms of ends as `\int_x Set(C(a, x), f(x)) ~= f(a)`.
- So we can write this as a "point-full" notation!
- In haskell, this would be `forall x. (a -> x) -> f x ~= f a`.

# Ends and diagonals

- [Bartosz: Wedges](https://www.youtube.com/watch?v=TAPxt26YyEI)
- Let's think of `Cop x C`, and an element on the diagonal `(a, a)`, and a function `f: a -> b`.
- Using the morphism `(id, f)`, I can go from `(a, a)` to `(a, b)`.
- If we have `(b, b)`, I can once again use `f` to go to `(a, b)`.
- So we have maps:

```
     b,b
    / |
   /  |
  /   |
 /    v
a,a-->a,b
```

- This tells us that if we have something defined on the diagonal for a profunctor `p a a`, we can "extrapolate"
  to get data everywhere!
- How do we get the information about the diagonal? Well, I'm going to create a product of all the diagonal elements of the profunctor.
- so we need a limit `L`, along with maps `L -> p c c` for each `c`. This kind of infinite product is called a wedge (not yet,  but soon).
- The terminal object in the category of wedges is the end.
- But our cone is "under-determined". We need more data at the bottom of the cone for things to cohere.
- suppose at the bottom of the cone, we want to go from `p a a` to `p b b`. for this, I need morphisms `(f: b -> a, g: a -> b)` to lift
  into the profunctor with `dimap`.
- We might want to impose this as coherence condition. But the problem is that
  there are categories where we don't have arrows going both ways (eg. partial orders).
- So instead, we need a different coherence condition. If we had a morphism from `a -> b`, then we can get from `p a a --(id, f)-->p a b`.
  Or, I can go from `p b b --(f, id)-->p a b`. The wedge condition says that these commute. So we need `p id f . pi_1 = p f id . pi_2`

#### Relationship to haskell

- How would we define this wedge condition in haskell?
- Because of parametricity, haskell gives us naturality for free.
- How do we define an infinite product? By propositions as types, this is the same as providing `โˆ€x.`.
- `End p = forall a. p a a`
- We can define a cone with apex `A` of a diagram `D: J -> C` as a natural transformation `cone(A): Const(A) => D`. What's the version for a profunctor?
- Suppose we have a profunctor diagram `P: J^op x J -> C`. Then we have a constant profunctor `Const(c) = \j j' -> c`.
  Then the wedge condition  (analogue of the cone condition) is to ask that we need a **dinatural transformation** `cone': Const(A) => P`.
- NOTE: a dinatural transformation is STRICTLY WEAKER than a natural transformation from `J^opxJ -> C`.
- Suppose we have a transformation that is natural in both components. That is to say, it is a natural transformation
  of functors of the type `[J^op x J -> C]`. This means that we have naturality arrows `α(a,b): p(a,b) -> q(a,b)`. Then the following must commute, for any `f: a -> b`, by
  naturality of `α`:

```
      p(b,a)
    /    |
[p(f,id)]|
  /      |
p(a,a)  [α(b,a)]
 |       |
[α(a,a)] |
 |       |
 |    q(b,a)
 |     /
 |  [q(f,id)]
 |   /
q(a,a)
```

- Similarly, other side must commute:


```
      p b a
    /   |  \
[p f id]|   [p id f]
  /     |     \
p a a  [α b a] p b b
 |      |        |
[α a a] |      [α b b]
 |      |        |
 |    q b a      |
 |     /   \     |
 |  [q f id]\    |
 |   /  [q id f] |
 |  /          \ |
q a a         q b b
```

- I can join the two sides back together into a `q a b` by using `[q id f]` and `[q f id]`. The bottom square
  commutes because we are applying `[q f id]` and `[q id f]` in two different orders. By functoriality,
  this is true because to `q(f.id, id.f) = q(f,f) = q(id.f, f.id)`.


```
      p b a
    /   |  \
[p f id]|   [p id f]
  /     |     \
p a a  [α b a] p b b
 |      |        |
[α a a] |      [α b b]
 |      |        |
 |    q b a      |
 |     /   \     |
 |  [q f id]\    |
 |   /  [q id f] |
 |  /          \ |
q a a         q b b
  \             /
 [q id f]      /
    \        [q f id]
     \      /
     q a b
```

- If we erase the central node `[q b a]` and keep the boundary conditions, we arrive at a diagram:

```
      p b a
    /      \
[p f id]    [p id f]
  /           \
p a a          p b b
 |               |
[α a a]        [α b b]
 |               |
 |               |
 |               |
 |               |
 |               |
 |               |
q a a         q b b
  \             /
 [q id f]      /
    \        [q f id]
     \      /
     q a b
```

- Any transformation `α` that obeys the above diagram is called a *dinatural transformation*.
- From the above, we have proven that any honest natural transformation is a dinatural transformation, since the
  natural transformation obeys the diagram with the middle node.
- In this diagram, see that we only ever use `α a a` and `α b b`.
- So for well-behavedness, we only need to check a dinatural transformation at the diagonal. (diagonal natural transformation?)
- So really, all I need are the diagonal maps, which I will call `α' a := α a a`.
- Now, a wedge is a dinatural transformation from the constant functor to this new thingie.


# Parabolic dynamics and renormalization

- [Video](https://www.youtube.com/watch?v=Z77mTqj_Wnk)

# Quantifiers as adjoints

- Consider `S(x, y) ⊂ X × Y`, as a relation that tells us when `(x, y)` is true.
- We can then interpret `∀x, S(x, y)` to be a subset of `Y`, that has all the elements
  such that this predicate holds. ie, the set `{ y : Y | ∀ x, S(x, y) }`.
- Similarly, we can interpret `∃x, S(x, y)` to be a subset of `Y` given by
   `{ y : Y | ∃ x, S(x, y) }`. (A Haskell sketch of these two operations follows this list.)
- We will show that these are adjoints to the projection `π: X × Y → Y`.
- Treat `P(S)` to be the boolean algebra of all subsets of `S`, and similarly `P(Y)`.
- Then we can view `P(S)` and `P(Y)` to be categories, and we have the functor `π: P(S) → P(Y)`.
- Recall that in this boolean algebra an arrow `a → b` denotes a subset relation `a ⊆ b`.
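
The sketch promised above, over finite carriers (Haskell; `Subset`, `existsX`, `forallX` are my names):

```hs
type Subset a = a -> Bool

-- ∃x. S(x, y) and ∀x. S(x, y) as subsets of Y, given a finite carrier for X.
existsX :: [x] -> Subset (x, y) -> Subset y
existsX xs s y = any (\x -> s (x, y)) xs

forallX :: [x] -> Subset (x, y) -> Subset y
forallX xs s y = all (\x -> s (x, y)) xs
```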

#### A first try: direct image, find right adjoint

- Suppose we want to analyze when `π T ⊆ Z`, with the hopes of getting some condition when `T ⊆ ? Z` where `?`
  is some to-be-defined adjoint to `π`.
- See that `π T ⊆ Z` then means `∀ (x, y) ∈ T, y ∈ Z`.


```
     T
   t t t
   t t t
    |
    v
---tttt---- π(T)
-zzzzzzzzz--Z
```

- Suppose we build the set `Q(Z) ≡ { (x, y) ∈ S : y ∈ Z }`. That is to say, `Q ≡ π⁻¹(Z)`. (`Q` for inverse of `P`).
- Then, it's clear that we have `π T ⊂ Z` implies that `T ⊆ Q(Z)` [almost by definition].
- However, see that this `Q(Z)` construction goes in the wrong direction; we want a functor
  from `P(S)` to `P(Y)`, which projects out a variable via `∃ / ∀`. We seem to have built
  a functor in the other direction, from `P(Y)` to `P(S)`.
- Thus, what we must actually do is to reverse the arrow `π: S ⊆ X × Y → Y`, and rather we
  must analyze `π⁻¹` itself, because its adjoints will have the right type.
- However, now that we've gotten this far, let's also analyze left adjoints to `π`.

#### Direct image, left adjoint
- Suppose that `Z ⊆ π T`. This means that for every `y ∈ Z`, there is some `x_y` such that  `(x_y, y) ∈ T`.

```
     T
   t t t
   t t t
    |
    v
---tttt---- π(T)
----zz--------Z
```

- I want to find an operation `?` such that `? Z ⊆ T`.
- One intuitive operation that comes to mind to unproject, while still remaining a subset of `T`,
  is to use `π⁻¹(Z) ∩ T`. This would by construction satisfy `π⁻¹(Z) ∩ T ⊆ T`.
- Is this an adjoint? We'll need to check the equation `:)`.

#### Inverse image, left adjoint.

- Suppose we consider `π⁻¹ = π* : P(Y) → P(S)`.
- Now, imagine we have `π*(Z) ⊆ T`.

```
    S
    -
    -
   tttt
   tztt
   tztt T
   tztt
    ^^
    || π*(Z)
----zz-------Z
```

- In this case, we can say that for each `z ∈ Z`, for all `x ∈ X` such that `(x, z) ∈ S`, we had `(x, z) ∈ T`.
- Consider the set `∀ T ≡ { y ∈ Y : ∀ x, (x, y) ∈ S => (x, y) ∈ T }`.
- Thus, we can say that `π*(Z) ⊂ T` iff `Z ⊂ ∀ T`.
- Intuitively, `T ⊂ π*(π(T))`, so it must be "hard" for the inverse image of a set `Z` (ie, `π*(Z)`) to
  be contained in the set `T`, because inverse images cannot shrink the size.
- Furthermore, it is the right adjoint to `π*` because the ???


# TLDP pages for bash conditionals
- [The TLDP pages](https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_07_01.html) have a large list of all
  possible bash conditionals

# Remainder, Modulo

- remainder takes the sign of the first operand.
- modulo takes the sign of the second operand.
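
In Haskell terms, `rem` (paired with `quot`, truncation toward zero) and `mod` (paired with `div`, flooring) behave exactly this way:

```hs
examples :: [(Int, Int)]
examples =
  [ ((-7) `rem` 3, (-7) `mod` 3)   -- (-1, 2): rem follows the dividend, mod follows the divisor
  , (7 `rem` (-3), 7 `mod` (-3))   -- (1, -2)
  ]
```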

# Parameters cannot be changed *anywhere*, not just in return location

```
inductive List (a: Type): Type where
| Good: List a
| Good2: List a -> List a
| Bad: List b -> List a
        ^^^^^^^ -- not allowed!
```

- In hindsight, this makes sense, as the parameter really is a trick to
  represent, well, *parametric* polymorphism.


# LCNF

- `let x := v in e ~= (\x -> e) v`

```
let x : Nat := 2
let xarr : Vec Int x := Nil
let 2arr : Vec Int 2 := Nil
in rfl : xarr = 2arr
```

```
(\x ->
   let xarr : Vec Int x := Nil
   let 2arr : Vec Int 2 := Nil
   in rfl : xarr = 2arr) 2
      ^^^^^^^^^^^^^^^^^ <---- (ill-typed under CIC)
```


```
Erased a ~ Erased b
(\x ->
   let xarr : Vec Int (Erased x) := Nil
   let 2arr : Vec Int (Erased 2) := Nil
   in rfl : xarr = 2arr) 2
      ^^^^^^^^^^^^^^^^^ <---- (ill-typed under CIC)
```

#### Erased


```
match x with
  | F1 f1 => h f1
  | F2 f2 => h f2
```



- Design an ill-typed type system to allow *only* the types
  of errors that occur when floating let/join points?


```
def tupleN (n: Nat) (t: Type) :=
  match n with
  | 0 => Unit
  | n+1 => t * (tupleN n t)

def foo n := Vec Int n

Any ~ t
def f (n: Nat): (tupleN n t) := ...
def f (n: Nat): Any := ...
```

# Predicative v/s Impredicative: On Universes in Type Theory


#### Tarski formulation of universes

- `U` is a code for types.
- We have a decoding function `T: U โ†’ Type`.

```
U type

a ∈ U
-----
T(a) type
```

- Universes bump up levels when we quantify over them?
- Impredicative universes are closed under quantification over types of that same universe.


> Q. Given a definition, give me an algorithm that says when the definition needs impredicativity.
> A. Perform type + universe inference on the definition. If we get a constraint
>    of the form `Type lower = Type higher` where lower < higher, then the definition
>    needs impredicativity.



# Testing infra in Lean4

- to run tests in parallel, `cd build/stage1/ && CTEST_PARALLEL_LEVEL=20 ctest`.


# Autocompletion in Lean4

- for C++, use `compiledb` to generate a `compile_commands.json`
- for lean, setup a lean toolchain override for the correct `stage0`:

```
$ elan toolchain add my-lean-copy /path/to/my-lean-copy/build/stage0
$ elan override my-lean-copy
```

# Inductive types

#### Coq

##### Expressive Power
- Bare Inductives
- Nested inductives: the type constructor being nested under cannot itself be part of a mutual group
   [so we can have `T := ... List T`, but not `T := ... Mutual1 T` where `Mutual1 T := Mutual2 T` and `Mutual2 T := Mutual1 T`]
- Mutual inductives: All parameters of inductives in the mutual group must be the same.
- Nested Mutual inductives: Not supported, something like:

```
T
| .. U (T)

U
| .. T
```

does not work due to the presence of `U(T)`.

##### Computational content
- Bare inductives: primitive recursion principle `fix`, kernel has support for `fix` reduction (iota)
- Nested inductives: primitive recursion principle using `fix` and `match`.
  Kernel has support for `fix` reduction, and `match` is also known by the kernel.
- Mutual inductives: generates 'simple' recursion principles for each element of the mutual.
- Need to use the `scheme` command to get the full recursion principle.
- Primitive recursion principle using `fix` and `match`.

#### Lean

##### Expressive Power
- Bare inductives
- Nested inductives: works perfectly!

```
mutual
inductive U1 (S: Type): Type where
| mk: U2 S → U1 S
| inhabitant: U1 S

inductive U2 (S: Type): Type where
| mk: U1 S → U2 S
| inhabitant: U2 S
end

inductive T where
| mk: U1 T → T
```
- Mutual inductives: all parameters of the inductive types in the mutual group must be the same.

##### Computational content


# Parameter versus Index

- Parameters are fixed for all constructors; indices can vary amongst constructors.
- Parameters represent parametric polymorphism, and one does not gain information on them during pattern matching.
- Indices represent information that one does learn (refine) during pattern matching.
- Coq calls indices "non-uniform".
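
The same distinction shows up in Haskell GADTs (a sketch; `Vec` here is my own example, not tied to any particular library):

```hs
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
import Data.Kind (Type)

data Nat = Z | S Nat

-- `a` is a parameter: identical in every constructor, matching tells you nothing about it.
-- `n` is an index: each constructor pins it down, so matching on a Vec refines `n`.
data Vec (a :: Type) (n :: Nat) where
  VNil  :: Vec a 'Z
  VCons :: a -> Vec a n -> Vec a ('S n)
```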

# HNF versus WHNF

#### Head normal form
- a data constructor applied to arguments which are in normal form
- a lambda abstraction whose body is in normal form

# Different types of arguments in Lean4:
- `(x: T)` regular argument
- `[S: Functor f]` typeclass argument / argument resolved by typeclass resolution
- `{x: T}`: Maximally implicit argument, to be inferred.
- `⦃x: T⦄`: Non-maximally-inserted implicit argument. It is instantiated if it can be deduced from context,
    and remains uninstantiated (ie, no metavariable is introduced) otherwise.

In Coq people shun away from this binder. I'm not sure why, I guess there are issues with it at a larger scale. We could get rid of it. For the paper it's utterly irrelevant in my opinion

# Big list of lean tactics
- `conv <pattern> => ...`: rewrite in pattern. example: `conv in x + 1 => rw ...`
- `split` allows one to deal with the cases of a match pattern. This also allows one to case on an `if` condition.
- `cases H: inductive with | cons1 => sorry | cons2 => sorry` is used to perform case analysis on an inductive type.
- `cases H; case cons1 => { ... }; case cons2 => { ... }` is the same , but with slightly different syntax.
- `rewrite [rule] (at H)?` performs the rewrite with `rule`. But generally, prefer `simp [rule] (at H)?`, because `simp`
   first runs the rewrite, and then performs reduction. But if `simp` does not manage to perform a rewrite, it
   does not perform reduction, which can lead to weird cases. Say we have `H : x = true` and the goal `match x with | true => 1 | false => 2`.
   On running `rewrite [H]`, we get `match true with | true => 1 | false => 2`, and if we now run plain `simp`, it performs no reduction.
   On the other hand, if we had run `simp [H]` directly, it would rewrite to `match true with | true => 1 | false => 2` and then also
   perform reduction to give `1`.

# Hyperdoctrine
- A hyperdoctrine equips a category with some kind of logic `L`.
- It's a functor `P: T^op -> C` for some higher category `C`, whose objects are categories
  whose internal logic corresponds to `L`.
- In the classical case, `L` is propositional logic, and `C` is the 2-category of posets.
  We send `A โˆˆ T` to the poset of subobjects `Sub_T(A)`.
- We ask that for every morphism `f: A -> B`, the morphism `P(f)` has left and right adjoints.
- These left and right adjoints mimic existential / universal quantifiers.
- If we have maps between cartesian closed categories, then the functor `f*` obeys frobenius
  reciprocity if it preserves exponentials: `f*(a^b) ~iso~ f*(a)^f*(b)`.


#### Algebra of logic

- Lindenbaum algebras : Propositional logic :: Hyperdoctrines : Predicate logic
- Work in a first order language, with a mild type system.
- Types and terms form a category $B$ (for "base").
- Interpretations are functors which map $B$ to algebras.

#### Syntax
- `e | X`. `e` is untyped. `X` is typed.
- `e` is a sequence of symbols. `X` is a set of variables, which intuitively are the free variables.
- Every variable that has a free occurrence in the untyped part should also occur in the typed part.
- eg. `R x[1] x[2] | { x[1], x[2] }`
- Not every variable in the typing part needs to occur in the untyped part.
- eg. `R x[1] x[2] | { x[1], x[2], x[3] }`. (dummy variable `x[3]`).
- Variables: `x[1], ...`
- constants: `c[1], ...`
- Separators: `|, {, }, ()`.
- `c[i] | {}` is a unary term.
- `x[i] | {x[i]}` is a unary term.
- If `t |X` is `n`ary and `s | Y` is m-ary, then `ts | X U Y` is a `n+m`ary term.
- if `t|X` is n-ary and `y` is a variable, then `t | X U {y}` is a `n-ary` term.

#### Formulas.

- if `R` is a n-ary predicate and `t|X` is a n-ary term, then `Rt|X` is a formula.
- if `phi|X` is a formula and `x โˆˆ X`, then `โˆ€x, phi | X - {x}` is a formula.

#### A category of types and terms.
- Find a natural way to view terms in our language as arrows in our category.
- `s | Y`. `Y` tells us there are `card(Y)` name-shaped gaps; `s` is a sequence of `len(s)` name-shaped things.
- `s:codomain`, `Y: domain`.
- Question: when should composition be defined? the types have to match for `t|X . s|Y`.
- So we should have `card(X) = len(s)`.
- We want the composition to be substitution. `t|X . s|Y = t[X/s]|Y`.
- eg. `x3|{x3, x4}  . x1 a3 | {x1, x2} = x1|{x1, x2}`. (substitute `x3` by `x1`, and `x4` by `a3`.)
- eg. `x3 x4|{x3, x4}  . x1 a3 | {x1, x2} = x1 a3|{x1, x2}`. (substitute `x3` by `x1`, and `x4` by `a3`.)

#### Problem: we don't have identity arrows!
- The left identity `x1 x2 | {x1, x2}` is not a right identity!
- But in a category, we want two sided identity.
- The workaround is to work with an equivalence class of terms.
- Define equivalence as `t|X ~= t(X/Y)|Y`.
- Arrows are equivalence classes of terms.

#### Reference
- [Hyperdoctrines and why you should care about them](https://www.youtube.com/watch?v=VvSTE9oqRag)

# Fungrim

- https://fredrikj.net/math/fungrim2022.pdf
- They want to integrate with mathlib to have formal definitions.

# Category where coproducts of computable things is not computable

- Modular lattices are an algebraic variety.
- Consider the category of modular lattices.
- The free modular lattices on 2 and on 3 generators have decidable equality, by virtue of being finite.
- The free modular lattice on 5 generators does not have decidable equality.
- The coproduct of free modular lattice on 2 and 3 generators is the free modular
  lattice on 5 generators, because $F(2 \cup 3) = F(2) \sqcup F(3)$  (where $2, 3$ are two and three element sets),
  because free is left adjoint to forgetful, and the left adjoint $F$ preserve colimits!

# Homotopy continuation


- [Rigorous arithmetic with approximate roots of polynomials --- CAG L16](https://www.youtube.com/watch?v=XC_tfjjBPLc&list=PL5ErEZ81Tyqc1RixHj65XA32ejrS2eEFK&index=38)

# Relationship between linearity and contradiction

- https://xorshammer.com/2021/04/08/but-why-is-proof-by-contradiction-non-constructive/

# Monads from Riehl

- I'm having some trouble enmeshing my haskell intuition for monads with the rigor, so this
  is an expository note to bridge the gap.

#### What is a monad
- A monad is an endofunctor `T: C -> C` equipped with two natural transformations:
- (1) `return/eta: idC => T` [yeeta, since we are yeeting into the monad.]
- (2) `join/mu: T^2 => T`, such that two laws are obeyed:

- First law: `mu, T` commutation:

```
T^3(x) --T(@mu@x)--> T^2@x
|                   |
mu@(T@x)          mu@x
|                   |
v                   v
T^2(x)---mu@x-----> T(x)
```

- Second law: `mu, eta` cancellation:


```
(Tx) --eta@(T@x)--> T^2(x)
|EQ                 |
|                   |
T@(eta@x)         mu@x
|                   |
v                 EQv
T^2(x)---mu@x---> T(x)
```

- `mu .  eta T = mu . T eta = 1`
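
Concretely, for the list monad (`T = []`, `eta = pure`, `mu = join = concat`), the two laws can be spelled out as Haskell properties (a sketch, names mine):

```hs
import Control.Monad (join)

-- first law: mu is associative,  join . fmap join = join . join  on T^3 x
lawAssoc :: [[[Int]]] -> Bool
lawAssoc xsss = (join . fmap join) xsss == (join . join) xsss

-- second law: eta is a two-sided unit,  join . pure = join . fmap pure = id  on T x
lawUnit :: [Int] -> Bool
lawUnit xs = (join . pure) xs == xs && (join . fmap pure) xs == xs
```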



#### Monad from adjunction

- Any adjunction between a Free functor `F: L -> H` and a forgetfUl/Underlying functor `U: H -> L`
  `F |- U` gives a monad. The categories are named `L, H` for `lo, high` in terms of the amount of
  structure they have. We go from low structure to high structure by the free functor.
- The monad on `L` is given by `T := UF`.
- Recall that an adjunction gives us `pullback: (F l -> h) -> (l -> U h)` and `pushfwd: (l -> U h) -> (F l -> h)`.
  The first is termed pullback since it takes a function living in the high space and pulls it back to the low space.
- This lets us start with `(F l -> F l)`, peel a `F` from the left via `pullback`
  to create  `(l -> U (F l))`. That is we have `return: l -> T l`.
- In the other direction, we are able to start with `(U h -> U h)`, peel a `U` from the right via `pushforward` to
  create the counit `F U h -> h`. Whiskering the counit by `U` and `F` then creates the join: `T^2 l = U F U F l = U (F U) F l -> U F l = T l`.
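
For the free-monoid / forgetful adjunction on `Set`, this recipe produces the list monad; a minimal Haskell sketch (names mine):

```hs
-- unit  return : l -> U (F l), embedding a generator as a one-letter word
etaList :: a -> [a]
etaList x = [x]

-- the counit on the algebra side multiplies a word of monoid elements out;
-- whiskered with U and F it becomes join : T^2 l -> T l, i.e. flattening
muList :: [[a]] -> [a]
muList = concat
```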

#### Algebra for a monad $C^T$.

- Any monad, given by `(T: C -> C, return: 1C => T, join: T^2  => T)` has a category of `T`-algebras associated to it.
- The objects of `T-alg` are morphisms `f: Tc -> c`.
- The morphisms of `T-alg` between `f: Tc -> c` and `g: Td -> d` are commuting squares, determined by an `arr: c -> d`

```
Tc -T arr-> Td
|           |
f           g
|           |
v           v
c   -arr->  d
```

- The notation for the category as $C^T$ makes some sense, since it consists of objects of the form `Tc -> c` which matches
  somewhat with the function notation. We should have written $C^{TC}$ but maybe that's too unwieldy.
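
A concrete Haskell example of a `T`-algebra and an algebra morphism, for `T` the list monad (a sketch, names mine):

```hs
-- An algebra for the list monad, T c -> c: a way of multiplying a list out.
-- (Compatibility with return/join makes this exactly a monoid structure on c.)
sumAlg :: [Int] -> Int
sumAlg = sum

sumAlg' :: [Integer] -> Integer
sumAlg' = sum

-- A morphism of algebras is an `arr : c -> d` making the square commute:
--   arr . sumAlg  ==  sumAlg' . map arr
arr :: Int -> Integer
arr = fromIntegral
```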


#### Factoring of forgetful functor of adjunction

- Any adjunction `(F: L -> H, U:  H -> L)` with associated monad `T` allows us to factor `U: H -> L` as:

```
H -Stx-> L^T -forget-> L
```

- So we write elements of `H` in terms of syntax/"algebra over `L`". We then forget the algebra structure to keep only the low form.
- The way to think about this is that any object in the image of `U` in fact has a (forgotten) algebra structure, which is why
  we can first go to `L^T` and then forget the algebraic structure to go back to `L`. It might be that this transition from `H` to `L^T`
  is very lossy. This means that the algebra is unable to encode what is happening in `H` very well.

#### Monadic adjunction

- Let us consider an adjunction `(F: L -> H, U: H -> L)` with monad `T`. Factor `U` via `L^T` as:

```
H -Stx-> L^T -forget-> L
```

- The adjunction is said to be monadic if in the factoring of `U` via `L^T`, it happens
   that `H ~= L^T`. That is, `Stx` is an equivalence between `H` and `L^T`.
- The way to think about this is that any object in the image of `U` in fact has a (forgotten) algebra structure, and
  this algebra structure actually correctly represents everything that was happening in `H`.
- Another way to say **the adjunction `F: L -> H: U` is monadic** is to say
   **`F` is monadic over `U`**. We imagine the higher category `H` and the free functor `F` lying over `L` and `U`.
- **Warning**: This is a STRONGER condition than saying that `UF` is a monad. `UF` is ALWAYS a monad for ANY adjunction.
  This says that `H ~= L^T`, via the factoring `H -Stx-> L^T -forget-> L`.
- We sometimes simply say that **`U` is monadic**, to imply that there exists an `F` such that `F -| U` is an adjunction
  and that `H ~= L^T`.

#### Category of models for an algebraic theory

- A functor is finitary if it preserves filtered colimits.
- In particular, a monad `T : L -> L` is finitary if it preserves filtered colimits in C.
- If a right adjoint is finitary, then so is its monad because its left adjoint preserves all colimits.
  Thus, their composite preserves filtered colimits.
- A category `H` is a **category of models for an algebraic theory** if there
  is a finitary monadic functor `U : H -> Set`.

#### Limits and colimits in categories of algebras


- We say that `H` is monadic over `L` iff the adjunction `F: L -> H: U` such that the monad `T: L -> L := UF`
  gives rise to an equivalence of categories `H ~= L^T`.

## Riehl: Limits and colimits in categories of algebras

Here, we learn theorems about limits and colimits in `L^T`.

#### Lemma 5.6.1: If `U` is monadic over `F`, then `U` reflects isos

- That is, if for some `f: h -> h'`, if `Uf: Uh -> Uh'` is an iso, then so is `f`.
- Since the adjunction `F |- U` is a monadic adjunction (`U` is monadic over `F`), we know that `H ~= L^T`, and `U` equals
  the forgetful functor `(H = L^T) -> L`.
- Write the arrow `f: h -> h'` as an arrow in `L^T`, ie, as the commuting square datum
  determined by `g: l -> l'` (where `l = Uh` and `l' = Uh'`):

```
Tl-Tg->Tl'
|      |
a      a'
|      |
v      v
l--g-->l'
```

- Since we assume that `Uf = g` is an iso, there exists a `g'`
  which is the inverse of `g`. But this means that the diagram below commutes:


```
Tl<-Tg'-Tl'
|      |
a      a'
|      |
v      v
l<-g'--l'
```

- For a proof: the first square says `a' . Tg = g . a`. Composing with `g'` on the left gives `g' . a' . Tg = a`.
  Composing with `Tg'` on the right gives `g' . a' = a . Tg'` (using `Tg . Tg' = T(g . g') = id`). That's exactly the statement of the above square.
- This means the pair `(g', Tg')` is an algebra morphism inverse to `f`, so `f` is an iso in `L^T`.


#### Corollary 5.6.2: Bijective continuous functions in CHaus are isos

- When we forget to `Set`, we see that bijections are the isos. Thus, in `CHaus` (compact Hausdorff spaces),
  which is monadic over `Set`, the arrows that forget to isos in `Set`, ie, continuous
  bijections, are also isos.

#### Corollary 5.6.4: Any bijective homomorphism arising from a monadic adjunction which forgets to `Set` will be iso

- Follow the exact same proof.

#### Thm 5.6.5.i A monadic functor `U: H -> L` creates any limits that `L` has.

- Since the equivalence `H ~= L^T` creates all limits/colimits, it suffices to show the result for `U^T: L^T -> L`.
- Consider a diagram `D: J -> C^T` with image spanned by `(T(D[j]) -f[j]-> D[j])`.
- Consider the forgotten diagram `U^TD: J -> C` with image spanned by `D[j]`.  Create the limit cone
  `P` (for product, since product is limit) with morphisms `pi[j]: P -> D[j]`. We know this limit exists since we assume that `L`
  has this limit that `U^T` needs to create.
- We can reinterpret the diagram `D: J -> C^T` as a natural transformation between two functors `Top, Bot: J -> C`.
  These functors are `Top(j) := T(D[j])`,
  `Bot(j) := D[j]`.
- The natural transformation is given by `eta: Top => Bot`,
  with defn `eta(j) := T(D[j]) -f[j]-> D[j]`, where `f[j]` is given by the image of `(T(D[j]) -f[j]-> D[j])`.
- So we see that `eta: Top => Bot` can also be written as `eta: TD => D` since `Top ~= TD` and `Bot ~= D`.
- Now consider the composition of natural transformations `Const(TP) =T pi=> TD =eta=> D`, all in `J -> C`.
  This gives us a cone with summit `TP`.
- This cone with summit `TP` factors through `P` via the unique morphism `lambda: TP -> P`.
  We wish to show that `(TP -lambda-> P)` is a `T`-algebra, and is the limit of `D`.
- Diagram chase. Ugh.

#### Corollary 5.6.6: The inclusion of a reflective subcategory creates all limits
- The inclusion of a reflective subcategory is monadic.
- This lets us create all limits by the above proposition.

#### Corollary 5.6.7: Any category monadic over `Set` is complete
- `Set` has all limits.
- The monadic functor `U: H -> Set` creates all limits that `Set` has.
- Thus `H` has all limits, ie. is complete.

#### Corollary 5.6.9: `Set` is cocomplete

- The contravariant power set functor `P: Set^op -> Set` is monadic.
- `Set` has all limits, and `P` creates all limits.
- Thus all limits of `Set^op` exist, ie, all colimits of `Set` exist.

#### Category of models for alg. theory is complete

TODO

#### Category of algebras has coproducts

- We show how to construct the free product of monoids via Haskell. The same principle
  generalizes for any algebraic theory:

```hs
import Control.Monad(join)

-- |(a*|b*)* ~~simplify~~> (a|b)*
eval :: (Monoid a, Monoid b) => [Either [a] [b]] -> [Either a b]
eval = map (either (Left . mconcat) (Right . mconcat))

-- | a*|b* -> (a|b)* with no simplification
transpose :: Either [a] [b] -> [Either a b]
transpose = either (map Left) (map Right)

-- | (a*|b*)* -> (a|b)* with no simplification
flatten :: [Either [a] [b]] -> [Either a b]
flatten = join . map transpose


-- force: eval = flatten | via coequalizer
```

#### If $T: C \to C$ is finitary and $C$ is complete and cocomplete, then so is $C^T$

- We have already seen that if $C$ is complete then so is $C^T$.
- We have also seen that $C^T$ has coproducts.
- So if we show that $C^T$ has coequalizers, then we get cocompleteness, since any colimit can be
  expressed as coproduct-coequalizer.
- To show that all coequalizers exist is to show that there is a left adjoint to
  the functor `const: C^T -> [J -> C^T]` where `J := [a -f,g-> b]`
  is the diagram category for coequalizers.
- Recall that this left adjoint sends a diagram in `[J -> C^T]` to its colimit, the coequalizer in `C^T`.
- See that the constant functor trivially  preserves limits.
- To show that it possesses a left adjoint, we apply an **adjoint functor theorem** (fuck me).
  In particular, we apply the general adjoint functor theorem, so we must show that the solution
  set condition is satisfied.
- Recall that the solution set condition for $F: C \to D$ requires that for each $d \in D$,
  the comma category $d \downarrow F$ admits a weakly initial set of objects.
- Unwrapping that definition: for each $d \in D$, there is a solution set.
  That is, there exists a small set $I$,
  a family of objects $c_i$ and a family of morphisms $f_i: d \to F(c_i)$ such that
  any morphism $d \to F(c)$ in $D$ factors through some $f_i$ as $d \xrightarrow{f_i} F(c_i) \xrightarrow{F(g)} F(c)$.
- To apply the theorem, we must produce a solution set for every object in `[J -> C^T]`, that is,
  for each parallel pair of morphisms $f, g: (A, \alpha) \to (B, \beta)$.
- We will produce a solution set with a single element by creating a fork $(Q, u)$ such that any other fork
  factors through this fork (perhaps non uniquely!) So we create:

$$
(A, \alpha) \xrightarrow{f, g} (B, \beta) \xrightarrow{q} (Q, u)
$$

- If we know how to create coequalizers in $C^T$, then this would be easy: we literally just create a coequalizer.
- Instead, we create some "approximation" of the coequalizer with $(Q, u)$.
- To start with, we define $q_0: B \to Q_0$ to be the coequalizer in $C$ of the pair $A \xrightarrow{f, g} B$.
- If $Tq_0$ would be the coequalizer of $Tf, Tg$ then we are done.
  But this is unlikely, since a monad need not preserve coequalizers.
- Instead, we simply calculate the coequalizer of $Tf, Tg$ and call this $q_1: B \to Q_1=TQ_0$.
- Repeat inductively to form a directed limit (colimit).
- The monad preserves filtered colimits: in $T = UF$, the left adjoint $F$ preserves all colimits, and the right
  adjoint $U$ preserves the filtered colimits in question (this is where the finitariness hypothesis is used).

# Combinatorial Cauchy Schwarz

#### Version 1

- Suppose you have $r$ pigeons and $n$ holes, and want to minimize the number of pairs of pigeons in the same hole.
- Writing $h[i]$ for the hole of the $i$th pigeon, we are minimizing $\sum_{i > j} [h[i] = h[j]]$; if $n_k$ is the number
  of pigeons in hole $k$, this equals $\frac{1}{2}\left(\sum_k n_k^2 - r\right)$, so it is equivalent to minimizing the sum
  of the squares of the number of pigeons in each hole.
- Classical Cauchy-Schwarz (for three holes): $x_1^2 + x_2^2 + x_3^2 \geq \frac{1}{3}(x_1 + x_2 + x_3)^2$.
- Discrete Cauchy-Schwarz: on placing a natural number of pigeons in each hole, the number of pairs of pigeons in the
  same hole is minimized iff the pigeons are distributed as evenly as possible.
- Pigeonhole principle: when $r = n + 1$, the best split possible is $(2, 1, 1, \dots)$.

#### Version 2

- I recently learned about a nice formulation of this connection from a version of the Cauchy-Schwarz
  inequality stated in Bateman's and Katz's article.
- Proposition: Let $X$ and $Y$ both be finite sets and let $f: X \to Y$ be a function.
- $|\ker f| \cdot |Y| \geq |X|^2$. (Here `ker f` is the kernel pair of `f`, i.e., the pullback of `f` along itself.
    More explicitly, it is the subset of `X*X` given by `ker(f) := { (x, x') : f(x) = f(x') }`.)
- Equality holds if and only if every fiber has the same number of elements.
- This is the same as the version 1, when we consider $f$ to be the function $h$ which assigns pigeons to holes.
  Every fiber having the same number of elements is the same as asking for the pigeons to be evenly distributed.
- Compare: $|ker(f)| \cdot |Y| \geq |X|^2$ with $(x_1^2 + x_2^2 + x_3^2) \cdot n \geq (x_1 + x_2 + x_3)^2$. Cardinality replaces
  the action of adding things up, and $|X|^2$ is the right hand side, $|ker(f)|$ is the left hand side, which is the sum of squares.
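
Below is a small Python sanity check of the kernel formulation: it brute-forces all maps from a 4-element set to a 2-element set and checks $|\ker f| \cdot |Y| \geq |X|^2$, with equality exactly when all fibers have the same size. (The brute force is my own sketch, not from the Bateman-Katz article.)

```python
from itertools import product

def ker_size(f, X):
    # |ker f| = number of pairs (x, x') with f(x) = f(x')
    return sum(1 for x in X for x2 in X if f[x] == f[x2])

X = range(4)
Y = range(2)
for f in product(Y, repeat=len(X)):              # f as a tuple: f[x] is the image of x
    k = ker_size(f, X)
    assert k * len(Y) >= len(X) ** 2             # |ker f| * |Y| >= |X|^2
    fibers = [sum(1 for x in X if f[x] == y) for y in Y]
    evenly_distributed = len(set(fibers)) == 1   # all fibers have the same size
    assert (k * len(Y) == len(X) ** 2) == evenly_distributed
print("checked all maps from a 4-element set to a 2-element set")
```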

# Bezout's theorem

- [On Bezout's theorem Mc coy](https://sites.math.washington.edu/~morrow/336_19/papers19/Dathan.pdf)
- Let $k$ be algebraically closed.
- Let $R \equiv k[x, y, z]$ be the polynomial ring.
- We wish to count the number of intersections between $f, g \in k[x, y, z]$, counted with multiplicity.
- For any point $a$, denote $R_a$ to be the localization of $R$ at the multiplicative subset $D_a \equiv \{ f \in R: f(a) \neq 0 \}$
  ($D$ for does not vanish).
- So $R_a \equiv D_a^{-1}(R)$, which concentrates attention around point $a$.

#### Intersection multiplicity $i[f \cap g](a)$
- Define the intersection multiplicity of $f, g$ at $a$, with notation $i[f \cap g](a)$.
- It is defined as $i[f \cap g](a) \equiv \dim_k(R_a/(f, g)_a)$.
- That is, we localize the ring at $a$ and quotient by the ideal generated by $f, g$,
  and then count the dimension of this space as a $k$ vector space.

####  $f(a) \neq 0$ or $g(a) \neq 0$ implies $i[f \cap g](a) \equiv 0$
- WLOG, suppose $f(a) \neq 0$. Then localization at $a$ makes $f$ into a unit. The ideal $(f, g)_a \equiv R_a$ since the ideal
  explodes due to the presence of the local unit $f_a$. Thus, $R_a/(f, g)_a \equiv 0$.

#### $f(a) = 0$ and $g(a) = 0$ implies $i[f \cap g](a) \neq 0$.
- If both vanish, then $(f, g)_a$ is a proper ideal of $R_a$ (it sits inside the maximal ideal of the local ring), so the quotient is nonzero.


#### Examples
- $x-0$ and $y-0$ at $(0, 0)$ have multiplicity $D_{(0, 0)}^{-1}(k[x, y]/(x, y))$ which is just $k$, which has dimension $1$.
  So they intersect with dimension $1$.
- $x-1$ and $y-1$ at $(0, 0)$ have multiplicity $D_{(0, 0)}^{-1}(k[x, y]/(x - 1, y - 1))$. The ideal $(x - 1, y - 1)$ blows up because $x - 1 \in D_{(0, 0)}$,
  and thus the quotient is $0$, making the dimension $0$.
- $x^2-y$ and $x^3-y$ at $(0, 0)$: in the localization at $(0,0)$, the ideal contains $x^3 - x^2 = x^2(x-1)$, and $x - 1$
  is a unit there, so the localized ideal equals $(x^2, y)_{(0,0)}$ and the quotient is $k[x]/(x^2)$. This is the ring of
  elements $\{ a + bx : a,b \in k \}$, which has dimension $2$ as a $k$
  vector space. So this machinery actually manages to capture the degree 2 intersection between $y=x^2$ and $y=x^3$ at $(0, 0)$
  (the localization computation is spelled out below).

##### Intersection cycle ($f \cap g$)
- Define $f \cap g \equiv \sum_{a \in \texttt{space}} i[f \cap g](a) \cdot a.$
- It's a generating function with intersection multiplicity as coefficients hanging on the clothesline of points.

#### Intersection number $\#(f \cap g)$
- Given by $\#(f \cap g) \equiv \sum_{a \in \texttt{space}} i[f \cap g](a)$. This is the count of number of intersections.

#### Lemma: $f \cap g = g \cap f$
- Mental note: replace $f \cap g$ with "ideal $(f, g)$" and stuff makes sense.
- Follows immediately since $(f, g) = (g, f)$, and so $i[f \cap g](a) = \dim_k R_a/(f, g)_a = \dim_k R_a/(g, f)_a = i[g \cap f](a)$.

#### Lemma: $f \cap (g + fh) = f \cap g$
- $(f, g + fh) \equiv (f, g)$.

#### $f \cap gh \equiv f \cap g + f \cap h$
- Heuristic: if $f(a)$ and $gh(a)$ vanish, then either $f(a), g(a)$ vanish or $f(a), h(a)$ vanish, which can be counted by
  $f \cap g + f \cap h$

#### Lemma: if $f, g$ are nonconstant and linear then $\#(f \cap g) = 1$.
- Recall that we are stating this within the context of $k[x, y, z]$.
- So $f, g$ are homogeneous linear polynomials $f(x, y, z) = ax + by + cz$, $g(x, y, z) = a'x + b'y + c'z$.
- Sketch: if the two lines are not proportional, they meet at a unique point of the projective plane by linear algebra
  (the kernel of the $2 \times 3$ coefficient matrix is one dimensional).
- If they are parallel as affine lines, they still meet at a point at infinity, which exists
  because we have access to projective solutions.

##### Lemma: homogeneous polynomial $g \in k[p, q]$ with $q \nmid g$ factorizes as $\alpha_0 p^t \prod_{i=1}^{n-t}(p - \alpha_i q)$ with $\alpha_0 \neq 0$, all $\alpha_i \neq 0$, and $t \geq 0$
- Key idea: see that if it were $g \in k[p]$, then it would factorize as $p^t \prod_i (p - \alpha_i)$
- To live in $k[p, q]$, convert from $g(p, q) \in k[p, q]$ to $g(p/q, q/q) \in k[(p/q)]$, which is the same
  as $g(t, 1) \in k[t]$.
- Since we are homogeneous, we know that $g(\lambda p, \lambda q) = \lambda^{deg(g)} g(p, q)$. This lets us
  make the above transform:

- $g(p/q, 1) = \alpha_0 (p/q)^k \prod_{i=1}^{n-k} (p/q - \alpha_i)$, where $k$ is the multiplicity of the root $0$ and the $\alpha_i$ are the nonzero roots.
- $g(p/q, 1) = \alpha_0 (p/q)^k \prod_{i=1}^{n-k} (p - \alpha_i q)/q$.
- $g(p/q, 1) = \alpha_0 \cdot \frac{p^k}{q^k} \cdot \frac{1}{q^{n-k}} \cdot \prod_{i=1}^{n-k} (p - \alpha_i q)$.
- $g(p/q, 1) = \alpha_0 \cdot \frac{p^k}{q^n} \prod_{i=1}^{n-k} (p - \alpha_i q)$.
- $g(p, q) = q^n \cdot g(p/q, 1) = \alpha_0 \, p^k \prod_{i=1}^{n-k} (p - \alpha_i q)$.
- This proves the decomposition $g(p, q) = \alpha_0 p^t \prod_i (p - \alpha_i q)$ with $t = k$.


##### Lemma: homogeneous polynomial $g \in k[p, q]$ factorizes as $\alpha_0 q^t \prod_{i=1}^{n-t}(p - \alpha_i q)$ with $t \geq 0$
- This is different from the previous step, since we are pulling out a factor of $q^t$ this time!
- We cannot argue "by symmetry" since the other terms are $(p - \alpha_i q)$. If it really were symmetry, then we should
  have $(q - \alpha_i p)$ which we don't.
- So this new lemma is in fact DIFFERENT from the old lemma!

- Key idea: pull out the largest power of $q$ first. Let $t$ be maximal with $q^t \mid g$, and write $g = q^t h$
  with $h$ homogeneous of degree $m = n - t$ and $q \nmid h$.
- Since $q \nmid h$, the monomial $p^m$ appears in $h$ with a nonzero coefficient $\alpha_0$, so the dehomogenization
  $h(p, 1)$ has degree exactly $m$.
- Since $k$ is algebraically closed, factor $h(p, 1) = \alpha_0 \prod_{i=1}^{m} (p - \alpha_i)$.
- Homogenize back using homogeneity of degree $m$:
  $h(p, q) = q^m h(p/q, 1) = q^m \alpha_0 \prod_{i=1}^{m} (p/q - \alpha_i) = \alpha_0 \prod_{i=1}^{m} (p - \alpha_i q)$.
- This proves the decomposition $g(p, q) = \alpha_0 q^t \prod_{i=1}^{n-t} (p - \alpha_i q)$.

#### Lemma: $f \in k[x, y, z]$ and $g \in k[y, z]$ homogeneous have $\deg(f)\deg(g)$ many intersections (counted with multiplicity)
- This is the base case for an induction on the degree of $x$ in $g$. Here, the degree of $x$ in $g$ is zero.
- To compute the cycle $f(x, y, z) \cap g(y, z)$, we factorize $g(y, z) = \alpha_0 z^k \prod_{i=1}^{n-k} (y - \alpha_i z)$ using the lemma above.
- This becomes $f(x, y, z) \cap z^k + \sum_i f(x, y, z) \cap (y - \alpha_i z)$.
- Intersecting with $z^k$ gives us $k$ times the intersection of $z$ with $f(x, y, z)$, so we have
  the eqn $f(x, y, z) \cap z^k = k \, (f(x, y, z) \cap z)$.
- The full eqn becomes $k \, (f(x, y, z) \cap z) + \sum_i f(x, y, z) \cap (y - \alpha_i z)$.



##### Solving for $i[f(x, y, z) \cap z]$

- Let's deal with the first part.
- See that $i[f(x, y, z) \cap z]$ equals $i[f(x, y, 0) \cap z]$, because we want a common intersection, thus can impose
  $z = 0$ on $f(x, y, z)$.
- We now write $f(x, y, 0) = \mu y^t \prod_j (x - \beta_j y)$.

##### Solving for $i[f(x, y, z) \cap (y - \alpha_i z)]$
- Here, we must impose the equation $y = \alpha_i z$.
- Thus we are solving for $f(x, \alpha_i z, z)$. Once again, we have an equation in two variables, $x$ and $z$.
- Expand $f(x, \alpha_i z, z) = \eta_i z^{l_i} \prod_{j=1}^{m - l_i}(x - \gamma_{ij} z)$
- This makes the cycles to be $l_i (z \cap (y - \alpha_i z)) + \sum_j (x - \gamma_{ij} z) \cap (y - \alpha_i z)$.
- The cycle $(z \cap (y - \alpha_i z))$ corresponds to setting $z = 0, y - \alpha_i z = 0$, which sets $y=z=0$.
  So this is the point $[x:0:0]$.
- The other cycle is $(x - \gamma_{ij} z) \cap (y - \alpha_i z)$, which is solved by $(\gamma_{ij} z : \alpha_i z : z)$.
- In total, we see that we have a solution for every cycle.

#### Inductive step

- Let $\deg(f)$ denote the total degree of $f$, and $\deg_x(f)$ the degree in $x$.
- Assume WLOG that $\deg_x(f) \geq \deg_x(g)$.
- We treat $f, g$ as polynomials in a single variable $x$, ie, elements $(k[y, z])[x]$.
- We want to factorize $f$ as $f = Qg + R$. But to do this, we need to enlarge the coefficient ring $k[y, z]$
  into the coefficient *field* $k(y, z)$ so the euclidean algorithm can work.
- So we perform long division to get polynomials $Q, R \in k(y, z)[x]$ such that $f = Qg + R$.
- Since $f, g$ are coprime, we must have $R$ nonzero. Now these $Q, R$ have rational *function* coefficients, since they live in $k(y, z)[x]$.
- Take common denominator of $Q, R$ and call this $h \in k[y, z]$ (ie, it is the polynomial denominator).
- Then $hf = (hQ)g + (hR)$ which is $hf = qg + r$ where $q \equiv hQ \in k[y, z]$ and $r \equiv hR \in k[y, z]$.
  So we have managed to create polynomials $q, r$ such that $hf = qg + r$.
- Let $c = gcd(g, r)$. $c$ divides $g$ and $r$, thus it divides $qg + r$, ie, it divides $hf$.
- Dividing through by $c$, we get $h'f = qg' + r'$, where $h = h'c$, $g = g'c$, $r = r'c$.
- We assume (can be shown) that these are all homogeneous.
- Furthermore, we started with $gcd(g, f) = 1$. Since $g'$ divides $g$, we have $gcd(g', f) = 1$.
- $c$ cannot divide $f$, since $c = gcd(g, r)$, and $g, f$ cannot share nontrivial common divisors. Thus, $gcd(c, f) = 1$.
- We have some more GCDs to check, at the end of which we write the intersection equation:

$$
f \cap g = ()
$$

- [Borcherds Video](https://www.youtube.com/watch?v=UJssbO-e2yw)
- [On Bezout's Theorem: Dathan Ault-McCoy](https://sites.math.washington.edu/~morrow/336_19/papers19/Dathan.pdf)

# Example for invariant theory

- Consider $p(z, w) = p_1 z^2 + p_2 zw + p_3 w^2$ --- binary forms of degree two.
- The group $SL(2, \mathbb Z)$ acts on these by substituting $(z, w) \mapsto A (z, w)$ for $A \in SL(2, \mathbb Z)$.
- We can write the effect on the coefficients explicitly: $(p_1', p_2', p_3') = M (p_1, p_2, p_3)$ for a matrix $M$ depending on $A$.
- So we have a representation of $SL(2, \mathbb Z)$ on the space of coefficients.
- An example of an invariant is the discriminant $p_2^2 - 4 p_1 p_3$, which is unchanged by the action (sanity-checked below).

- [IAS lecture](https://www.youtube.com/watch?v=3jksqrYuvuk)
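
Here is a small sympy sanity check (my own setup, not from the lecture) that the discriminant $p_2^2 - 4 p_1 p_3$ is unchanged by a determinant-one substitution; the symbols `a, b, c, d` are the entries of the matrix, and we eliminate `d` via $ad - bc = 1$.

```python
import sympy as sp

z, w, a, b, c, d, p1, p2, p3 = sp.symbols('z w a b c d p1 p2 p3')

# the binary quadratic form
p = p1*z**2 + p2*z*w + p3*w**2

# substitute (z, w) |-> (a*z + b*w, c*z + d*w)
q = sp.expand(p.subs({z: a*z + b*w, w: c*z + d*w}, simultaneous=True))

# read off the transformed coefficients (p1', p2', p3')
qp = sp.Poly(q, z, w)
q1, q2, q3 = qp.coeff_monomial(z**2), qp.coeff_monomial(z*w), qp.coeff_monomial(w**2)

disc_old = p2**2 - 4*p1*p3
disc_new = q2**2 - 4*q1*q3

# impose det = a*d - b*c = 1 by eliminating d, and compare discriminants
print(sp.simplify((disc_new - disc_old).subs(d, (1 + b*c)/a)))   # expect 0
```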

# Counterexample to fundamental theorem of calculus?

- Integral of `1/x^2` over `[-1, 1]` should equal `-1/x` evaluated between `-1` and `1`, which gives `-1/1 - (-(-1)/1)`, that is, `-1 - 1 = -2`.
- But this is absurd since $1/x^2$ is always positive in $[-1, 1]$.
- What's going wrong?
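
A minimal numeric sketch of the answer I'd give: the integrand is not even defined (let alone continuous) at $0$, and the improper integral near $0$ diverges, so the hypotheses of the fundamental theorem fail on $[-1, 1]$.

```python
# The antiderivative -1/x is only valid away from 0.
# On [eps, 1] the integral of 1/x^2 is 1/eps - 1, which blows up as eps -> 0,
# so the improper integral over [-1, 1] diverges instead of being -2.
for eps in [1e-1, 1e-3, 1e-6]:
    print(eps, 1/eps - 1)
```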

# Why a sentinel of `-1` is sensible

- See that when we have an array, we usually index it with an array index of `0 <= i < len`.
- If `len = 0`, then the only "acceptable" `i` is `-1`, since it's the greatest integer that is less than `len=0`.

# Data structure to maintain mex

#### offline

- Key idea: maintain a set of numbers that we have not seen, and maintain
  set of numbers we have seen. Update the set of unseen numbers on queries.
  The mex is the smallest number of this set.

#### online

- Key idea: exploit cofinality. Decompose set of numbers we have not seen into two parts:
 a finitary part that we maintain, and the rest of the infinite part marked by an `int` that tells
 us where the infinite part begins.

```cpp
#include <set>
#include <map>
#include <cassert>
#include <algorithm>
using namespace std;

set<int> unseen;
map<int, int> freq;
// `unseen` maintains the subset of [0..largest_ever_seen+1] whose frequency
// is currently zero. Since largest_ever_seen+1 was never inserted, it always
// stays in `unseen`, so the mex (smallest element of `unseen`) is always defined.
int largest_ever_seen = -1;

void init() {
    unseen.insert(0); // window is [0..0], nothing seen yet
}

void mex_insert(int k) {
    freq[k]++;
    // grow the window from [0..largest_ever_seen+1] to [0..k+1]
    for(int i = largest_ever_seen+2; i <= k+1; ++i) {
        unseen.insert(i);
    }
    largest_ever_seen = max(largest_ever_seen, k);
    unseen.erase(k);
}

void mex_delete(int k) {
    assert(freq[k] >= 1);
    freq[k]--;
    if (freq[k] == 0) {
        unseen.insert(k);
    }
}

int mex_mex() {
    assert(!unseen.empty());
    return *unseen.begin();
}
```

# Scattered algebraic number theory ideas: Ramification

- I had Pollion on math IRC explain ramification to me.

```
15:17 <Pollion> Take your favorite dedekind domain.
15:17 <bollu> mmhm
15:17 <Pollion> For instance, consider K a number field
15:17 <Pollion> and O_K the ring of integers.
15:17 <Pollion> Then take a prime p in Z.
15:18 <Pollion> Since Z \subset O_K, p can be considered as an element of O_K, right ?
15:18 <bollu> yes
15:18 <Pollion> Ok. p is prime in Z, meaning that the ideal (p) = pZ is a prime ideal of Z.
15:18 <bollu> yep
15:18 <Pollion> Consider now this ideal, but in O_K
15:18 <bollu> right
15:19 <Pollion> ie the ideal pO_K
15:19 <bollu> yes
15:19 <Pollion> It may not be prime anymore
15:19 <bollu> mmhm
15:19 <Pollion> So it factors as a product of prime ideals *of O_K*
15:20 <Pollion> pO_K = P_1^e_1....P_r^e_r
15:20 <Pollion> where P_i are distinct prime ideals of O_K.
15:20 <bollu> yes
15:20 <Pollion> You say that p ramifies in O_K (or in K) when there is some e_i which is > 1
15:21 <Pollion> Example
15:21 <Pollion> Take Z[i], the ring of Gauss integers.
15:22 <Pollion> It is the ring of integers of the field Q(i).
15:22 <Pollion> Take the prime 2 in Z.
15:23 <bollu> (2) = (1 + i) (1 - i) in Z[i] ?
15:23 <Pollion> Yes.
15:23 <Pollion> But in fact
15:23 <Pollion> The ideal (1-i) = (1+i) (as ideals)
15:23 <Pollion> So (2) = (1+i)^2
15:23 <Pollion> And you can prove that (1+i) is a prime ideal in Z[i]
15:23 <bollu> is it because (1 - i)i = i + 1 = 1 + i?
15:24 <Pollion> Yes
15:24 <bollu> very cool
15:24 <Pollion> Therefore, (2) ramifies in Z[i].
15:24 <bollu> is it prime because the quotient Z[i]/(1 - i) ~= Z is an integral domain? [the quotient tells us to make 1 - i = 0, or to set i = ]
15:24 <Pollion> But you can also prove that primes that ramify are not really common
15:24 <bollu> it = (1 - i)
15:25 <Pollion> In fact, 2 is the *only* prime that ramifies in Z[i]
15:25 <Pollion> More generally, you only have a finite number of primes that ramify
15:25 <bollu> in any O_K?
```
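
To spell out the arithmetic from the log (standard facts about $\mathbb Z[i]$, written out by me):

$$
\begin{aligned}
(1+i)^2 &= 2i, \qquad \text{so } (2) = (2i) = (1+i)^2 \text{ since } i \text{ is a unit}, \\
(1-i) &= (i(1-i)) = (1+i), \\
\mathbb Z[i]/(1+i) &\cong \mathbb F_2 \quad \text{(the quotient forces } i \equiv -1 \text{ and } 2 \equiv 0\text{), a field, so } (1+i) \text{ is prime.}
\end{aligned}
$$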


# Coreflection

- A right adjoint to an inclusion functor is a coreflector.

#### Torsion Abelian Groups -> Abelian Groups

- Consider the inclusion of torsion abelian groups (groups all of whose elements have finite order) into the
  category of abelian groups; this is an inclusion functor.
- This has as right adjoint the functor that sends every abelian group to
  its torsion subgroup.
- See that this coreflector somehow extracts a subobject out of the larger object.

#### Group -> Monoid
- inclusion: send groups to monoids.
- coreflection: send monoid to its group of units. (extract subobject).

#### Contrast: Reflective subcategory

- To contrast, we say a subcategory is reflective if the inclusion $i$ has a _left_ adjoint
  $T$.
- In this case the subcategory usually has more _structure_, and the reflector
  $T$ manages to _complete_ objects of the larger category to shove them into the subcategory.
- Eg 1: The subcategory of complete metric spaces embeds into the category of
  metric spaces. The reflector $T$ builds the completion.
- Eg 2: The subcategory of sheaves embeds into the category of presheaves. The
  reflector is sheafification.

#### General Contrast

- $T$ (the left adjoint to $i$) adds more structure. Eg: completion, sheafification.
- This is sensible because it's the left adjoint, so is kind of "free".
- $R$ (the right adjoint to $i$) deletes structure / pulls out substructure.
  Eg: pulling out torsion subgroup, pulling out group of units.
- This is sensible because it's the right adjoint, and so is kind of "forgetful",
  in that it is choosing to forget some global data.

#### Example from Sheaves
- This came up in the context of group actions in Sheaves in geometry and logic.
- Suppose $G$ is a topological group. Consider the category of $G$ sets,
  call it $BG$.
- If we replace the topology on $G$ with the discrete topology, we get
  a group called $G^\delta$. This has a category of $G^\delta$ sets,
  called $BG^\delta$.
- $BG$ includes into $BG^\delta$ (every continuous $G$-set is in particular a $G^\delta$-set), and the coreflector sends a
  $G^\delta$-set to its subset of points with open stabilizer, i.e., it pulls out the largest continuous part.


# Better `man` Pages via `info`

- I recently learnt about `info`, and its pages are so much higher quality than `man` pages!
- `info` pages about things like `sed` and `awk` are actually useful.


# The Zen of juggling three balls

- Hold one ball in the left hand `A`, two in the right hand `B, C`.
  This initial configuration is denoted `[A;;B,C]`.
- Throw `B` from the right hand to the left hand. This configuration is denoted
  by `[A;B←;C]` where the `B←` is in the middle since it is in-flight, and has `←`
  since that's the direction it's travelling.
- When the ball `B` is close enough to the left hand that it can be caught, *throw*
  ball `A`. Thus the configuration is now `[;(A→)(B←);C]`.
- Now catch ball `B`, which makes the configuration `[B;A→;C]`.
- With the right hand, throw `C` (to anticipate catching `A`). This makes the
  configuration `[B;(A→)(C←);]`.
- Now catch the ball `A`, which makes the configuration `[B;C←;A]`.
- See that this is a relabelling of the state right after the initial state. Loop back!

### The Zen
- The key idea is to think of it as (1) "throw (B)" (2) "throw (A), catch (B)", (3) "throw (C), catch (A)", and so on.
- The cadence starts with a "throw", and then settles into "throw, catch", "throw, catch", "throw, catch", ...
- This cadence allows us to actually succeed in the act of juggling. It fuses the hard parts
  of actually freeing a hand and accurately catching the ball. One can then focus attention
  on the other side and solve the same problem again.

# Example of lattice that is not distributive

- Take a 2D vector space, and take the lattice of subspaces of the vector space.
- Take three subspaces: `a = span(x)`, `b = span(y)`, `c = span(x + y)`.
- Then see that `c /\ (a \/ b) = c` (since `a \/ b` is the whole space), while `c /\ a = c /\ b = 0`,
  so `(c /\ a) \/ (c /\ b) = 0`.
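
A tiny brute-force check over the vector space $\mathbb F_2^2$ (my own encoding: a subspace is the set of its vectors, join is the span of the union, meet is the intersection):

```python
from itertools import product

def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def span(vectors):
    # all F_2-linear combinations of the given vectors in F_2^2
    out = set()
    for coeffs in product([0, 1], repeat=len(vectors)):
        s = (0, 0)
        for c, v in zip(coeffs, vectors):
            if c:
                s = add(s, v)
        out.add(s)
    return frozenset(out)

a = span([(1, 0)])            # the x-axis
b = span([(0, 1)])            # the y-axis
c = span([(1, 1)])            # the line y = x

join = lambda u, v: span(list(u | v))   # smallest subspace containing both
meet = lambda u, v: u & v                # intersection of subspaces

lhs = meet(c, join(a, b))                # c /\ (a \/ b) = c
rhs = join(meet(c, a), meet(c, b))       # (c /\ a) \/ (c /\ b) = {0}
print(lhs, rhs, lhs == rhs)              # not equal: the lattice is not distributive
```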

# Patat

- Make slides that render in the terminal!
- https://github.com/bollu/patat

# Common Lisp LOOP Macro

#### Loop with index:

```
(loop for x in xs for i from 0 do ...)
```

#### Nested loop appending

```
(loop for x in `(1 2 3 4) append
      (loop for y in `(,x ,x) collect (* y y)))
```

# Mitchell-Bénabou language

- [Link](https://ncatlab.org/nlab/show/Mitchell-B%C3%A9nabou+language)

# Hyperdoctrine

- A hyperdoctrine equips a category with some kind of logic `L`.
- It's a functor `P: T^op -> C` for some higher category `C`, whose objects are categories
  whose internal logic corresponds to `L`.
- In the classical case, `L` is propositional logic, and `C` is the 2-category of posets.
  We send `A ∈ T` to the poset of subobjects `Sub_T(A)`.
- We ask that for every morphism `f: A -> B`, the morphism `P(f)` has left and right adjoints.
- These left and right adjoints mimic existential / universal quantifiers.
- If we have maps between cartesian closed categories, then the functor `f*` obeys Frobenius
  reciprocity if it preserves exponentials: `f*(a^b) ~iso~ f*(a)^f*(b)`.
- https://ncatlab.org/nlab/show/Mitchell-B%C3%A9nabou+language

# Why is product in Rel not cartesian product?

#### Monoidal category
- Intuitively, a category that can be equipped with $\otimes, I$ making it a monoid.

#### Cartesian Monoidal category
- A category where the monoidal structure is given by the categorical product (universal property...).


#### Fox's theorem: Any Symmetric Monoidal Category with Comonoid is Cartesian.

- Let `C` be symmetric monoidal under $(I, \otimes)$.

- A monoid has signature `e: () -> C` and `.: C x C -> C`.
- A comonoidal structure flips this, and gives us `copy: C -> C x C`, and `delete: C -> ()`.
- Fox's theorem tells us that if the category is symmetric monoidal, and has morphisms $copy: C \to C \otimes C$,
  and $delete: C \to I$ which obey some obvious conditions, then the monoidal product is the categorical product.

#### Rel doesn't have the correct cartesian product
- This is because the naive product on `Rel` produces only a monoidal structure on `Rel`.
- However, this does not validate the `delete` rule, because a relation can fail to relate some element to _anything_
  in the image. Thus, `A -R-> B -!-> 1` need not be the same as `A -!-> 1` if `R` relates some element of `A` to NOTHING (sketched below).
- Similarly, it does not validate the `copy` rule, because first relating and then copying is not the same
  as relating to two different copies, because `Rel` represents nondeterminism.
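
A small Python sketch (my own encoding of relations as sets of pairs) of the failure of the `delete` law: composing a relation with the unique relation to the one-point set is not the same as the unique relation, when the relation relates some element to nothing.

```python
def compose(R, S):
    # relational composition: x (R;S) z iff there exists y with x R y and y S z
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

A = {1, 2}
B = {'a'}

bang_A = {(x, '*') for x in A}   # the would-be "delete" map A -> 1
bang_B = {(y, '*') for y in B}   # the would-be "delete" map B -> 1

R = {(1, 'a')}                   # relates 1 to 'a', relates 2 to nothing

print(compose(R, bang_B))        # {(1, '*')}
print(bang_A)                    # {(1, '*'), (2, '*')}  -- not equal: the delete law fails
```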

#### Locally Cartesian Categories

- A category is locally cartesian if each of its slice categories is cartesian.
- That is, all $n$-ary categorical products (including $0$-ary) exist in the slice category of each object.
- MLTT corresponds to [locally cartesian categories](https://www.math.mcgill.ca/rags/LCCC/LCCC.pdf)

# `simp` in Lean4

- `Lean/Elab/Tactic/Simp.lean`:

```
"simp " (config)? (discharger)? ("only ")? ("[" simpLemma,* "]")? (location)?
@[builtinTactic Lean.Parser.Tactic.simp] def evalSimp : Tactic := fun stx => do
  let { ctx, fvarIdToLemmaId, dischargeWrapper } ← withMainContext <| mkSimpContext stx (eraseLocal := false)
  -- trace[Meta.debug] "Lemmas {← toMessageData ctx.simpLemmas.post}"
  let loc := expandOptLocation stx[5]
  match loc with
  | Location.targets hUserNames simplifyTarget =>
    withMainContext do
      let fvarIds ← hUserNames.mapM fun hUserName => return (← getLocalDeclFromUserName hUserName).fvarId
      go ctx dischargeWrapper fvarIds simplifyTarget fvarIdToLemmaId
  | Location.wildcard =>
    withMainContext do
      go ctx dischargeWrapper (← getNondepPropHyps (← getMainGoal)) (simplifyTarget := true) fvarIdToLemmaId
where
  go (ctx : Simp.Context) (dischargeWrapper : Simp.DischargeWrapper) (fvarIdsToSimp : Array FVarId) (simplifyTarget : Bool) (fvarIdToLemmaId : FVarIdToLemmaId) : TacticM Unit := do
    let mvarId ← getMainGoal
    let result? ← dischargeWrapper.with fun discharge? => return (← simpGoal mvarId ctx (simplifyTarget := simplifyTarget) (discharge? := discharge?) (fvarIdsToSimp := fvarIdsToSimp) (fvarIdToLemmaId := fvarIdToLemmaId)).map (·.2)
    match result? with
    | none => replaceMainGoal []
    | some mvarId => replaceMainGoal [mvarId]
```


# Big list of Lean4 TODOS

- Hoogle for Lean4.
- show source in `doc-gen4`.
- mutual `structure` definitions.
- Make Lean4 goals go to line number when pressing `<Enter>`
- Convert lean book into `Jupyter` notebook?

# `unsafePerformIO` in Lean4:


- First do the obvious thing, actually do the IO:

```
unsafe def unsafePerformIO [Inhabited a] (io: IO a): a :=
  match unsafeIO io with
  | Except.ok a    =>  a
  | Except.error e => panic! "expected io computation to never fail"
```

- Then wrap a "safe" operation by the unsafe call.

```
@[implementedBy unsafePerformIO]
def performIO [Inhabited a] (io: IO a): a := Inhabited.default
```

# Big List of Lean4 FAQ

- `FVar`: free variables
- `BVar`: bound variables
- `MVar`: metavariables [variables for unification].
- `Lean.Elab.Tactic.*`: tactic front-end code that glues to `Lean.Meta.Tactic.*`.



# Sheaves in geometry and logic 1.2: Pullbacks
- Pullbacks are fiber bundles.
- Pullbacks for presheaves are constructed pointwise.
- The pullback of $f$ along itself in `Set` is going to be the set of pairs $(x, y)$ such that $f(x) = f(y)$.
- The pullback of $f: X \to Y$ along itself in an arbitrary category is an object $P$ together with a parallel pair of arrows `P -k,k'-> X` called the kernel pair.
- $f$ is monic iff both arrows in the kernel pair are identity `X -> X`.
- Thus, any functor preserving pullbacks preserves monics, (because it preserves pullback squares, it sends the kernel pair with
  both arrows identity to another kernel pair with both arrows identity. This means that the image of the arrow is
  again a monic).
- The pullback of a monic along any arrow is monic.
- The pullback of an epi along any arrow is epi in `Set`, but not in every category!

# Sheaves in geometry and logic 1.3: Characteristic functions of subobjects


```
    !
 S --> 1
 v     |
 |     | true
 v     v
 X---->2
  phi(S)
```

- `true: 1 -> 2` is the monic that picks out `1` (where `2 = {0, 1}`), i.e., `true(*) = 1`.
- For all monics `m: S -> X`, there must be a unique `phi(S): X -> 2` such that the diagram is a pullback.
- Then `1 -true-> 2` is called the subobject classifier. See that `1` is also determined (it is the terminal object).
  So the only "choice" is in what `2` is and what the morphism `1 -true-> 2` is.
- The definition says that every monic is the pullback of some universal monic `true`.

### Subobject category

- Define an equivalence relation between two monics `m, m': S, S' -> X` where `m ~ m'`
  iff there is an iso `i: S -> S'` such that the triangle commutes:

```
  S --i--> S'
   \      /
   m\    /m'
     v  v
      X
```

- $Sub_C(X)$ is the set of all subobjects of $X$.
- to make the idea more concrete, let `C = Set` and let `X = {1, 2}`. This has subobjects
  `[{}], [{1}], [{2}], [{1, 2}]`.
- To be clear, these are given by the map `m0: {} -> {1, 2}` (trivial), `m1: {*} -> {1, 2}` where `m1(*) = 1`,
  `m2: {*} -> {1, 2}` where `m2(*) = 2`, and finally `m3: {1, 2} -> {1, 2}` given by `id`.
- The category $C$ is well powered when $Sub_C(X)$ is a small set for all $X$. That is, the class of
  subobjects for all $X$ is set-sized.
- Now given any arrow $f: Y \to X$, the pullback of a monic $m: S \to X$ along $f$ is another monic $m': S' \to Y$
  (recall that the pullback of a monic along any arrow is monic).
- This means that we can contemplate a functor $Sub_C: C^{op} \to \texttt{Set}$ which sends an object $C$
  to its set of subobjects, and a morphism $f: Y \to X$ to the pullback of the subobjects of $X$ along $f$.
- If this functor is representable, that is, if $Sub_C(X) \cong Hom_C(X, \Omega)$ naturally in $X$ for some object $\Omega$, then that representing object $\Omega$ (together with its universal monic) is the subobject classifier.




#### $G$ bundles

- If $E \to X$ is a bundle, it is a $G$-bundle if $E$ has a $G$ action such that $\pi(e) = \pi(e')$ iff there
  is a unique $g$ such that $ge = e'$. That is, the base space is the quotient of $E$ under the group,
  and the group is "just enough" to quotient --- we don't have redundancy, so we get a unique $g$.
- Now define the space $GBund(X)$ to be the set of all $G$ bundles over $X$.
- See that if we have a morphism $f: Y \to X$, we can pull back a $G$ bundle $E \to X$ to get a new bundle $E' \to Y$.
- Thus we can contemplate the functor `GBund: Space^op -> Set` which sends a space to the set of bundles over it.
- A bundle `V -> B` is said to be a classifying bundle if any bundle `E -> X` can be obtained as a pullback of the
  universal bundle `V -> B` along a unique morphism `X -> B`.
- In the case of the orthogonal group `Ok`, let `V` be the Stiefel manifold. Consider the quotient `V/Ok`, which
  is the Grassmannian. So the bundle `V -> Gr` is a `G` bundle. Now, some alg. topology tells us that it is in fact the universal
  bundle for `Ok`.
- The key idea is that this universal bundle `V -> B` represents the functor that sends a space to its set of bundles,
  This is because any bundle `E -> X` is uniquely determined by a pullback `X -> B`! So the base space `B` determines
  every bundle. We can recover the bundle `V -> B` by seeing what we get along the identity `B -> B`.


#### Sieves / Subobject classifiers of presheaves

- Let `P: C^op -> Set` be a functor.
- `Q: C^op -> Set` is a subfunctor of `P` iff `Q(c) ⊂ P(c)` for all `c ∈ C`
  and `Qf` is a restriction of `Pf`.
- The inclusion `Q -> P` is a monic arrow in `[C^op, Set]`. So each subfunctor is a subobject.
- Conversely, all subobjects are given by subfunctors. If `θ: R -> P` is a monic natural transformation
  (ie, monic arrow) in the functor category `[C^op, Set]`, then each `θC: RC -> PC` is an injection (remember that `RC, PC`
  live in `Set`, so it's alright to call it an injection).
- For each `C`, let `QC` be the image of `θC`. So `(QC = θC(RC)) ⊂ PC`.
- This `Q` is manifestly a subfunctor.
- For an arbitrary presheaf category `C^ = [C^op, Set]`, suppose there is a subobject classifier `O`.
- Then this `O` must at the very least classify yonedas (ie, must classify `yC = Hom(-, C) ∈ [C^op, Set]`).
- Recall that `Sub_C(X)` was the functor that sent `X ∈ C` to the set of subobjects of `X`, and that
  the category `C` had a subobject classifier `O` iff `Sub_C(X)` is represented by the subobject classifier `O`.
  Thus we must have that `Sub_C(X) ~= Hom(X, O)`.
- Let `y(C) = Hom(-, C)`. Thus we have the isos `Sub_C^(yC) = Hom_C^(yC, O) =[yoneda] O(C)`.
- This means that the subobject classifier `O: C^op -> Set`, if it exists, must be defined
  on objects as `O(C) = Sub_C^(yC)`. This means we need to build the set of all subfunctors of `Hom(-, C)`.

###### Sieves

- For an object `c`, a sieve on `c` is a set `S` of arrows with codomain `c` such that if `f ∈ S`, then
  for all composites `fh` which can be defined, we have `fh ∈ S`.
- If we think of arrows into `c` as things allowed to get through to `c`, this means that a path to some
  other `b` (via `h`) followed by an allowed path to `c` (via `f`) is allowed.
  So if `b -f-> c` is allowed, so is `a -h-> b -f-> c`.
- If `C` is a monoid, then a sieve is just a right ideal
- For a partial order, a sieve on `c` is a set of elements that is downward closed/smaller closed.
  If `b <f= c` is in the sieve, then so too is any element `a` such that `a <h= b <f= c`.
- So a sieve is a smaller closed subset: if a small object passes the sieve, then so does anything smaller!
- Let `Q ⊂ Hom(-, c) = yc` be a subfunctor. Then define the set `S_Q = { f | f: a -> c and f ∈ Q(a) }`.
- Another way of writing it may be to say that we take `S_Q = { f ∈ Hom(a, c) | f ∈ Q(a) }`.
- This is a sieve because precomposing `f: a -> c` with `h: z -> a` is the action of the hom functor pulling back
  `Hom(a, c)` to `Hom(z, c)`, and `Q`, being a subfunctor, is stable under this action: if `f ∈ Q(a)` then `fh ∈ Q(z)`.
- This means that a sieve on `c` is the same as a subfunctor of `yc = Hom(-, c)`.
- This makes us propose a subobject classifier on `[C^op, Set]` defined as `O(c) = set of sieves on c`.


# Common Lisp Debugging: Clouseau

- Install the `clouseau` package to get GUI visualizations of common lisp code.
- Use `(ql:quickload 'clouseau)` to use the package, and then use
  `(clouseau:inspect (make-condition 'uiop:subprocess-error :code 42))` to inspect a variable.

# Drawabox: Lines

#### Superimposed lines

- Step 1: Draw a line with a ruler
- Step 2: keep the pen at the beginning of the line
- Step 3: Follow the line confidently, try to end at the endpoint of the line.

#### Ghosting lines

- Step 1: Draw two endpoints
- Step 2: Mimic drawing a line [ghosting].
- Step 3: confidently draw a line. LIFT THE PEN UP to stop the pen, don't slow down!

# Common Lisp Beauty: paths

```
CL-USER> (pathname-type "/home/siddu_druid/**/*.mlir")
"mlir"
CL-USER> (pathname-type "/home/siddu_druid/**/foo")
NIL
CL-USER> (pathname "/home/siddu_druid/**/foo")
#P"/home/siddu_druid/**/foo"
CL-USER> (pathname-directory "/home/siddu_druid/**/foo")
(:ABSOLUTE "home" "siddu_druid" :WILD-INFERIORS)
CL-USER> (pathname-directory "/home/siddu_druid/**/foo/**/bar")
(:ABSOLUTE "home" "siddu_druid" :WILD-INFERIORS "foo" :WILD-INFERIORS)
CL-USER> (pathname-type "/home/siddu_druid/**/foo/**/bar")
NIL
CL-USER> (pathname-type "/home/siddu_druid/**/foo/**/bar.ty")
"ty"
CL-USER> (pathname-name "/home/siddu_druid/**/foo/**/bar.ty")
"bar"
CL-USER> (pathname-name "/home/siddu_druid/**/foo/**/*.ty")
:WILD
```

# Logical Predicates (OPLSS '12)

- $R_\tau(e)$ has three conditions:
- (1) $e$ has type $\tau$
- (2) $e$ has the property of interest ($e$ strongly normalizes / has normal form)
- (3) The set $R_\tau$ is closed under eliminators!
- My intuition for (3) is that expressions are "freely built" under constructors. On the other hand, it is eliminators
  that perform computation, so we need $R_\tau$ to be closed under "computation" or "elimination"
- [Video](https://www.youtube.com/watch?v=h5kDxde6PTc)



# Logical Relations (Sterling)

- Key idea is to consider relations $R_\tau$ between closed terms of types $\tau_l$ and $\tau_r$. That is, we
  have a relation $R_\tau \subseteq \{ (t_l, t_r): (\cdot \vdash t_l : \tau_l), (\cdot \vdash t_r : \tau_r) \}$.
- We write a relation between closed terms of the two types $\tau_l$ and $\tau_r$ as: $R_{\tau} \subseteq (\cdot \vdash \tau_l) \times (\cdot \vdash \tau_r)$.
- A morphism of relations $f: R_\sigma \to R_\tau$ is given by two functions $f_l: \sigma_l \to \tau_l$ and $f_r: \sigma_r \to \tau_r$
  such that $aR_\sigma b \implies f_l(a) R_\tau f_r(b)$.

#### Logical relations for function spaces

- Given this, we can build up logical relations for more complex cases like function types and quantified types.
  For example, given logical relations $R_\sigma$ and $R_\tau$, we build $R_{\sigma \to \tau}$ to be the relation between
  types $(\cdot \vdash \sigma_l \to \tau_l) \times (\cdot \vdash \sigma_r \to \tau_r)$, and given by the formula:

$$
(f_l: \sigma_l \to \tau_l,\ f_r: \sigma_r \to \tau_r) \in R_{\sigma \to \tau} \iff \forall (x_l, x_r) \in R_\sigma,\ (f_l(x_l), f_r(x_r)) \in R_\tau
$$

- This satisfies the universal property of functions in the category of logical relations, ie, there is a bijection
  between morphisms $R_{\rho \times \sigma} \to R_{\tau}$ and morphisms $R_{\rho} \to R_{\sigma \to \tau}$.
- Next, we can interpret a base type like `bool` by the logical relation that encodes equality on that type.
  so $R_{\texttt{bool}} \subseteq (\cdot \vdash \texttt{bool}) \times (\cdot \vdash \texttt{bool})$ and is given by:

#### Logical relations for data types

$$
R_{\texttt{bool}} \equiv \{ (\texttt{true, true}), (\texttt{false, false}) \}
$$

#### Logical relations for parametric types

- for a type of the form $\tau(\alpha)$ that is parametric in $\alpha$, suppose we have a family
  of relations $R_{\tau \alpha} \subseteq \{ (\cdot \vdash \tau_l(\alpha_l)) \times (\cdot \vdash \tau_r(\alpha_r)) \}_{R_\alpha}$
  which vary in $R_\alpha$.
- Then we define the logical relation for the type
  $R_{\forall \alpha, \tau(\alpha)} \subseteq (\cdot \vdash \forall \alpha \tau_l(\alpha)) \times (\cdot \vdash \forall \alpha \tau_r(\alpha))$ as:

$$
R_{\forall \alpha, \tau (\alpha)} \equiv
\{ (f_l : \forall \alpha, \tau_l(\alpha), f_r: \forall \alpha, \tau_r(\alpha))
\mid
\forall R_\alpha, (f_l(\alpha_l), f_r(\alpha_r)) \in R_{\tau(\alpha)}
\}
$$

#### Proving things using logical relations

- For $f: \forall \alpha, \alpha \to \texttt{bool}$,  we have that $f @\texttt{unit} (()) = f @ \texttt{bool}(\texttt{true})$
  That is, the function value at `() : unit` determines the value of the function also at `true: bool` (and more generally, everywhere).

- To prove this, we first invoke that **by soundness**, we have that $(f, f) \in R_{\forall \alpha. \alpha \to \texttt{bool}}$. On
  unwrapping this, this means that:

$$
\forall R_\alpha, \forall (x_l, x_r) \in R_\alpha, (f(x_l), f(x_r)) \in R_{\texttt{bool}}
$$

- Plugging in $R_{\texttt{bool}}$, this gives us an equality:


$$
\forall R_\alpha, \forall (x_l, x_r) \in R_\alpha, (f(x_l) =  f(x_r))
$$

- We now choose $R_\alpha \subseteq (\cdot \vdash \texttt{unit}) \times (\cdot \vdash \texttt{bool})$ to be the singleton $\{ ((), \texttt{true}) \}$. Instantiating the displayed statement at this $R_\alpha$ and at the pair $((), \texttt{true})$ gives $f @ \texttt{unit}(()) = f @ \texttt{bool}(\texttt{true})$, as claimed.


- [Jon Talk](https://www.youtube.com/watch?v=AEthjg2k718)

##### $(x/p)$ is $x^{(p-1)/2}$

- Since $x$ is coprime to $p$, we have that  $1 \equiv x^{p-1}$
- This can be written as $1^2 - (x^{(p-1)/2})^2 \equiv 0$. [$(p-1)$ is even when $p>2$.]
- That is, $(1 - x^{(p-1)/2})(1 + x^{(p-1)/2}) = 0$.
- Since we are in an integral domain (really a field), this means that $x^{(p-1)/2} \equiv \pm 1 \pmod p$.
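
A quick brute-force check of Euler's criterion in Python (my own sketch): for each nonzero residue, `x^((p-1)/2)` mod `p` agrees with the Legendre symbol computed by listing the squares.

```python
p = 13
squares = {x * x % p for x in range(1, p)}        # the nonzero quadratic residues mod p
for x in range(1, p):
    legendre = 1 if x in squares else -1          # (x/p)
    euler = pow(x, (p - 1) // 2, p)               # x^((p-1)/2) mod p
    assert legendre % p == euler                  # -1 is represented as p-1
print("Euler's criterion holds for p =", p)
```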


# Pointless topology: Frames

- A frame is a lattice with arbitrary joins, finite meets, and the distributive law $A \cap (\bigcup_i B_i) = \bigcup_i (A \cap B_i)$.
- A map of frames is a map preserving finite meets and arbitrary joins.
- The category of locales is the opposite of the category of frames.


### Thm: Any locale has a smallest dense sublocale
- For example, $\mathbb R$ has $\mathbb Q$.

### Sober spaces
- A space is sober iff every irreducible closed subset is the closure of a single point.
- A sober space is one whose lattice of open subsets determines the topology of the space.
- A space $X$ is sober iff for every topological embedding $f: X \to X'$ that adds more points to $X$,
  if the inverse image map $f: O(X') \to O(X)$ is an isomorphism, then $f$ is a homeomorphism.
  [Source: martin escardo twitter](https://twitter.com/EscardoMartin/status/1573417458178093056?s=20&t=9dbWTcOVpbLhOB1LG8BSDQ)
  This means we can't add more points to $X$ without changing its topology. it has as many points as it could.
- Equivalently: every completely prime filter of open sets is the open nbhd filter of a unique point.
- $F \subseteq O(X)$ is a completely prime filter iff (1) $F$ is closed under all finite intersections (including the empty one),
  (2) if the union of some family $O_i$ is in $F$, then some $O_i$ is already in $F$ (prime).
- This tries to specify a point by open sets.
- Joke: A sober space is one where what you see is there, and you don't see double.
  What you see is there: every completely prime filter is the nbhd of some point. You don't see double: the pt is unique.

# Introduction to substructural logics: Ch1

#### Terminology


#### Logic as talking about strings

- The book gives a new (to me) interpretation of rules like $X \vdash A$.
  It says that this can be read as "the string $X$ is of type $A$", where type is some
  chomskian/grammarian sense of the word "type".
- This means that we think of $X ; Y$ as "concatenate $X$ and $Y$".
- This allows one to think of $X \vdash A \to B$ as the statement "$X$ when concatenated with a string of type $A$
  produces a string of type $B$".
- This is interesting, because we can have judgements like $X \vdash A$ and $X \vdash B$ with no problem, we're asserting
  that the string $X$ is of type $A$, $B$. Which, sure, I guess we can have words that are both nouns and verbs, for example.
- Under this guise, the statement $X \vdash A \land B$ just says that "$X$ is both a noun and a verb".
- Further, if I say $X \vdash A$ and $Y \vdash B$, then one wants to ask "what is type of $X; Y$ ? we want to say
  "it is the type $A$ next to $B$", which is given by $A \circ B$ (or, $A \otimes B$ in modern notation).
- This is cool, since it gives a nice way to conceptualize the difference between conjunction and tensoring.


#### Tensor versus conjunction as vector spaces

- What I got most out of this was the difference between what they call fusion
  (what we now call tensoring in linear logic) and conjunction.
- Key idea: Let's suppose we're living in some huge vector space, and the statement
  $X \vdash A$ should be read as "the vector $X$ lives in the subspace $A$ of the large vector space.
- Then, the rule "$X \vdash A$, $X \vdash B$ entails $X \vdash A \land B$" means:
  if $X$ lives in subspace $A$ and $X$ lives in subspace $B$, then $X$ lives in the intersection $A \cap B$.
- On the other hand, the rule $X \vdash A$, $Y \vdash B$ entails $X ; Y \vdash A \circ B$ means:
  if $X$ lives in subspace $A$, $Y$ lives in subspace $B$, then the vector $X \otimes Y$ lives in subspace $A \otimes B$.
- See that in the case of the conjunction, we are talking about **the same** $X$, just choosing to restrict where it lives ($A \cap B$)
- See that in the case of tensor product, we have **two** elements $X$ and $Y$,
  which live in two different subspaces $A$ and $B$.


#### Cut and admissibility

- Cut is the theorem that lets you have lemmas.
- It says that if $X \vdash A$, and $Y(A) \vdash B$ then $Y(X) \vdash B$.
- I don't understand what this means in terms of the interpretation of "left hand side as values, right hand side as types",
  or under "left side is strings, right side is types".
  The rule $Y(A) \vdash B$ is, at best, some kind of unholy dependently typed nonsense under this interpretation.
- A theory is **cut-admissible** if the axioms let you prove cut.
- In general, a theory is admissible to some axiom $A$ if the axioms of the theory allows one to prove $A$.


# Integrating against ultrafilters

- Let $X$ be a set.
- Recall that a filter on $X$ is a collection $\Omega$ of subsets of $X$ that is
  closed under supersets and finite intersections (closure under unions comes for free from closure under supersets).
- Recall that an ultrafilter $\Omega$ on $X$ is a maximal filter. That is, we cannot add any more elements into the filter.
- Equivalently $\Omega$ is an ultrafilter if, for any $A \subseteq X$, either $A \in \Omega$ or $(X - A) \in \Omega$.
- Intuitively, the motivating example is the principal ultrafilter: the set of all subsets of $X$ that contain a fixed $x \in X$.
- We can also say that ultrafilters correspond to lattice homomorphisms $2^X \to 2$.
- A lemma will show that this is equivalent to the following: whenever $X$ is
  expressed as the disjoint union of three subsets $S_1, S_2, S_3 \subseteq X$, then one of
  them will be in $\Omega$ (there exists some $i$ such that $S_i \in \Omega$).

#### Lemma: Three picking equivalent to ultrafilter

#### Integration by ultrafilter

- Let $B$ a finite set, $X$ a set, $\Omega$ an ultrafilter on $X$.
- Given $f: X \to B$, we wish to define $\int_X f d\Omega$.
- See that the fibers of $f$ partition $X$ into disjoint subsets $f^{-1}(b_1), f^{-1}(b_2), \dots, f^{-1}(b_N)$.
- The ultrafilter $\Omega$ picks out exactly one of these subsets, say $f^{-1}(b_i)$ ($i$ for "integration").
- Then we define the integral to be $b_i$.
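
A minimal Python sketch of the definition (my own encoding), using a principal ultrafilter on a finite set; every ultrafilter on a finite set is principal, so integrating is just evaluation at the chosen point.

```python
from itertools import chain, combinations

def principal_ultrafilter(X, x0):
    # all subsets of X containing x0, encoded as frozensets
    subsets = chain.from_iterable(combinations(sorted(X), r) for r in range(len(X) + 1))
    return {frozenset(s) for s in subsets if x0 in s}

def integrate(f, X, omega):
    # the fibers of f partition X; exactly one fiber lies in omega, return its value
    for b in set(f.values()):
        fiber = frozenset(x for x in X if f[x] == b)
        if fiber in omega:
            return b
    raise ValueError("not an ultrafilter on X")

X = {1, 2, 3, 4}
omega = principal_ultrafilter(X, 3)
f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}
print(integrate(f, X, omega))    # 'b', i.e. f(3)
```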

#### What does this integral mean?
- We think of $\Omega$ as a probability measure. Subsets in $\Omega$ have measure 1, subsets outside have measure 0.
- Since we want to think of $\Omega$ as some kind of probability measure, we
  want that $\int_X 1 d \Omega = 1$, as would happen when we integrate a probability measure $\int d \mu = 1$.
- Next, if two functions $f, g$ are equal almost everywhere (ie, the set of points where they agree is in $\Omega$),
  then their integral should be the same.



# weggli: Neat tool for semantically grepping C++

- https://github.com/googleprojectzero/weggli


# Mostowski Collapse

- Let $V$ be a set, let $U$ be a universe and let $R$ be a well founded relation on $V$.
- Recall that a relation is well-founded iff every non-empty subset contains a minimal element.
  Thus, we can perform transfinite induction on $V$.
- A function $\pi_R: V \to U$ defined via well founded induction as $\pi_R(x) \equiv \{ \pi(y): y \in V \land yRx \}$
  is called the Mostowski function on $R$. (We suppress $\pi_R$ to $\pi$ henceforth).
- The image $\pi''V \equiv \{ \pi(x) : x \in V \}$ is called the Mostowski collapse of $R$.
- Example: consider the well founded relation $R \subseteq \mathbb N \times \mathbb N$ such that $xRy$ iff $y = x + 1$.
  Then $\pi(0) = \emptyset$ and $\pi(n+1) = \{ \pi(n) \}$, so the collapse consists of the Zermelo-style numerals
  $\emptyset, \{\emptyset\}, \{\{\emptyset\}\}, \dots$


#### Image of collapse is transitive
- Let $U$ be a universe, let $(V, <)$ be a well founded relation on $V$.
- Let $\pi: V \to U$ be the mostowski function on $V$.
- Suppose $a \in b \in \pi[V]$. We must show that $a \in \pi[V]$.
- Since $b \in \pi[V]$, there is a $v_b \in V$ such that $\pi(v_b) = b$.
- By the definition of the Mostowski function, $b = \pi(v_b) = \{ \pi(v) : v \in V \land (v < v_b) \}$
- Since $a \in b$, this implies that there exists a $v_a < v_b$ such that $\pi(v_a) = a$.
- This implies that $a$ is in the image of $\pi[V]$: $a \in \pi[V]$.
- Thus, the set $\pi[V]$ is transitive: for any $b \in \pi[V]$ and $a \in b$, we have shown that $a \in \pi[V]$.

#### Image of collapse is order embedding if $R$ is extensional
- We already know that $\pi[V]$ is transitive from above.
- We assume that $R$ is extensional. That is: $(\forall a, aRx \iff aRy) \implies x = y$ [ie, distinct elements have distinct sets of $R$-predecessors].
- We want to show that $v_1 < v_2 \iff \pi(v_1) \in \pi(v_2)$.

##### Forward: $v_1 < v_2 \implies \pi(v_1) \in \pi(v_2)$:
- If $v_1 < v_2$, then $\pi(v_2) = \{ \pi(x): x < v_2 \}$, which implies that $\pi(v_1) \in \pi(v_2)$.

##### Backward: $\pi(v_1) \in \pi(v_2) \implies v_1 < v_2$:
- Let $\pi(v_1) \in \pi(v_2)$.
- By the definition of the Mostowski function, we have that $\pi(v_2) = \{ \pi(v'): v' < v_2 \}$
- Thus, there is some $v'$ such that $\pi(v') = \pi(v_1)$.
- We wish to show that $v' = v_1$, or that the collapse function is injective.

##### Collapse is injective:
- We will suppose that the collapse is not injective and derive a contradiction.
- Suppose there are two elements $v_1, v_2$ such that $v_1 \neq v_2$ but $\pi(v_1) = \pi(v_2)$.
- WLOG, suppose $v_1 < v_2$: the relation is well-founded, and thus the set $\{v_1, v_2\}$ ought to have a minimal element, and $v_1 \neq v_2$.
- We must have $\pi(v_1) \subsetneq \pi(v_2)$,





- [Reference: book of proofs](https://www.bookofproofs.org/branches/mostowski-function-and-collapse/)

# Spaces that have same homotopy groups but not the same homotopy type

- Two spaces have the same homotopy type iff there are functions $f: X \to Y$ and $g: Y \to X$
  such that $f \circ g$ and $g \circ f$ are homotopic to the identity.
- Now consider two spaces: (1) the point, (2) the topologist's sine curve with its two ends attached (the Warsaw circle).
- See that the second space can have no non-trivial fundamental group, as it's impossible to loop around the sine curve.
- So the Warsaw circle has all trivial $\pi_j$, just like the point.
- See that the map $W \to \{ \star \}$ must send every point in the warsaw circle to the point $\star$.
- See that the map backward can send $\star$ somewhere, so we are picking a point on $W$.
- The composite smooshes all of $W$ to a single point. For this to be homotopic to the identity is to say that the space is contractible, and the Warsaw circle is not contractible.

# Fundamental group functor does not preserve epis

- Epis of topological spaces include all surjective continuous maps.
- Take a circle $S^1$ and pinch it in the middle to get the wedge $S^1 \vee S^1$. This map $f: S^1 \to S^1 \vee S^1$ is surjective, hence an epi.
- See that this does not induce an epi $\pi_1(S^1) \to \pi_1(S^1 \vee S^1) \cong \mathbb Z \star \mathbb Z$: the image is only the cyclic subgroup generated by the loop that traverses both circles once.
- Maybe even more simply, the map $[0, 1] \to S^1$ wrapping the interval around the circle is an epi, but $\pi_1([0,1]) = 0$ cannot surject onto $\pi_1(S^1) = \mathbb Z$.
- Thus, the fundamental group functor does not preserve epis.

# Epi in topological spaces


- Epis in the category of Hausdorff topological spaces are the continuous functions with dense image; in `Top` itself, the epis are exactly the surjective continuous maps.
- Proof: TODO

# Permutation models

- These are used to create models of `ZF + not(Choice)`.
- Key idea: if we just have ZF without atoms, then a set has no non-trivial `โˆˆ` preserving permutations.
- Key idea: if we have atoms, then we can permute the atoms to find non-trivial automorphisms of our model.
- Key idea: in ZF + atoms, the `ordinal`s come from the ZF fragment, where they live in the kernel [ie the universe formed by repeated application
  of powerset to the empty set]. Thus, the "order theory" of ZF + atoms is controlled by the ZF fragment.
- Crucially, this means that the notion of "well ordered" [ie, in bijection with ordinal] is determined by the ZF fragment.
- Now suppose (for CONTRADICTION) that `A`, our set of atoms, is well ordered.
  This means that we have a bijection `f: ordinal -> A` for some ordinal.
- Since `A` possesses non-trivial structure preserving
  automorphisms, transporting them along `f` would give the `ordinal` one too. But this violates the fact that an `ordinal` cannot possess
  a non-trivial automorphism.
- Thus, we have a contradiction. This means that `A` cannot be well-ordered, ie, there cannot be a bijection `f: ordinal -> A`.

# Almost universal class

- A universal class is one that contains all subsets as elements.
- A class is almost universal if every subset of $L$ is a subset of some element of $L$. But note that $L$ does not need to have all subsets as elements.
- $L$ is almost universal if for any subset $A \subset L$ (where $A$ is a set), there is some $B \in L$ such that $A \subseteq B$,
  but $A$ in itself need not be in $L$.

# Godel operations

- A finite collection of operations that is used to create all constructible sets from ordinals.
- Recall $V$, the von Neumann universe, which we build by iterating powersets starting from $\emptyset$:
  $V_{\alpha+1} = \mathcal P(V_\alpha)$, taking unions at limit stages.
- We construct $L$ sort of like $V$, but at each stage, instead of taking the full powerset, we only take the subsets
  that are definable from the previous stage by first order formulas (with parameters).
- This makes sure that the resulting sets are independent of the peculiarities of the surrounding model, by
  sticking to FOL filtered formulas.




# Orthogonal Factorization Systems

- For a category $C$, a factorization system consists of sets of morphisms $(E, M)$ such that:
- $E, M$ contain all isos.
- $E, M$ are closed under composition.
- Every morphism in $C$ can be factored as $m \circ e$ with $m \in M$ and $e \in E$.
- The factorization is _functorial_.
- [Reference: Riehl on factorization systems](https://math.jhu.edu/~eriehl/factorization.pdf)

# Orthogonal morphisms

Two morphisms `e: a -> b` and `m: x -> y` are orthogonal iff for any `(f, g)` such
that the square commutes:

```
a --e--> b
|        |
f        g
|        |
v        v
x --m--> y
```

then there exists a UNIQUE diagonal `d: b -> x` such that the triangles
commute: (`f = d . e`) and (`m . d = g`):

```
a --e--> b
|       / |
f      / g
|   /!d  |
v /      v
x --m--> y
```


# Locally Presentable Category

- A category is locally presentable iff it has a set $S$ of objects such that
  every object is a colimit of objects from $S$. This definition is correct up to
  size issues.
- A locally presentable category is a reflective localization $C \to Psh(S)$ of a category
  of presheaves over $S$. Since $Psh(S)$ is the free cocompletion, and localization imposes
  relations, this lets us write a category in terms of generators and relations.

- Formally, $C$ :
- 1. is locally small
- 2. has all small colimits
- 3. `<TECHNICAL SIZE CONDITIONS; TALK TO OHAD>`

#### Localization

- Let $W$


#### Reflective localization

#### Accessible Reflective localization




# Remez Algorithm

- [link](https://en.wikipedia.org/wiki/Remez_algorithm)

# Permission bits reference

- I always forget the precise encoding of permissions, so I made a cheat sheet to
  remember what's what. It's `read,write,execute`, which have values `2^2, 2^1, 2^0` (ie, `4, 2, 1`).

```
+-----+---+--------------------------+
| rwx | 7 | Read, write and execute  |
| rw- | 6 | Read, write              |
| r-x | 5 | Read, and execute        |
| r-- | 4 | Read                     |
| -wx | 3 | Write and execute        |
| -w- | 2 | Write                    |
| --x | 1 | Execute                  |
| --- | 0 | no permissions           |
+-----+---+--------------------------+
```

```
+------------+------+-------+
| Permission | Octal| Field |
+------------+------+-------+
| rwx------  | 0700 | User  |
| ---rwx---  | 0070 | Group |
| ------rwx  | 0007 | Other |
+------------+------+-------+
```
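
As a sanity check, here is a small sketch in python using the standard `stat` module (nothing here is specific to this note; it just re-derives the same encoding):

```py
import stat

# rwx for user, r-x for group, nothing for others -> 0o750
mode = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
assert mode == 0o750
# filemode wants a full mode, so tag it as a regular file
print(oct(mode), stat.filemode(mode | stat.S_IFREG))  # 0o750 -rwxr-x---
```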

# Papers on Computational Group Theory

- A practical model for computation with matrix groups.
- A data structure for a uniform approach to computations with finite groups.
- A fast implementation of the monster group.

# Kan Extensions: Key idea

- The key insight is to notice that when we map $x$ from $C \to E$ via $K$, the comma category $K \downarrow Kx$ that we form
  always contains the identity arrow $Kx \to Kx$.
  Thus we can think of $K \downarrow Kx$ as looking like `(<stuff> -> Kx) -> Kx`. So it's really the `Kx`
  in the `<stuff> -> Kx` that controls the situation.



# Interleaved dataflow analysis and rewriting


```
fact: {} -> PROPAGATE
x = 1
fact: {x: 1}
y = 2
fact: {x: 1, y: 2}
~~z = x + y~~
{x: 1, y : 2, z: 3} -> REWRITE + PROPAGATE
z = 3

-- :( rewrite, propagate. --

fact: {} -> PROPAGATE
x = 1
fact: {x: 1}
y = 2
fact: {x: 1, y: 2}
~~z = x + y~~
{x: 1, y : 2} -> REWRITE
z = 3 <- NEW statement from the REWRITE;
fact: {x: 1, y: 2, z: 3}

x = 2 * 10 ; (x = 20; x is EVEN)
y = 2 * z; (y = UNK; y is EVEN)

-> if (y %2 == 0) { T } else { E }
T -> analysis
```
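
Here is a minimal python sketch of the second (nicer) ordering above, on a toy three-address IR of my own invention; the point is just that a rewrite immediately feeds the propagated facts:

```py
def propagate_and_rewrite(stmts):
    facts = {}  # variable -> known constant
    out = []
    for stmt in stmts:
        if stmt[1] == "const":            # ("x", "const", 1)
            var, _, k = stmt
            facts[var] = k                # PROPAGATE
            out.append(stmt)
        else:                             # ("z", "add", "x", "y")
            var, _, a, b = stmt
            if a in facts and b in facts:
                k = facts[a] + facts[b]
                facts[var] = k            # fact for the NEW statement
                out.append((var, "const", k))  # REWRITE
            else:
                out.append(stmt)
    return out, facts

prog = [("x", "const", 1), ("y", "const", 2), ("z", "add", "x", "y")]
print(propagate_and_rewrite(prog))
# ([('x', 'const', 1), ('y', 'const', 2), ('z', 'const', 3)], {'x': 1, 'y': 2, 'z': 3})
```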



# Central variable as `focal`

- The NLTK code which [breaks down a word into syllables](https://www.nltk.org/_modules/nltk/tokenize/sonority_sequencing.html)
  inspects trigrams.
- It names the variables of the trigrams `prev`, `focal`, and `next`.
- I find the name `focal` very evocative for what we are currently focused on! It is free of the implications
  of a word like `current`.

# Wilson's theorem

- We get: $p \equiv 1 \pmod 4$ implies $((p-1)/2)!$ is a square root of $-1$ mod $p$.
- It turns out that this is because, from Wilson's theorem, $(p-1)! \equiv -1 \pmod p$.
- Pick $p = 13$. Then in the calculation of $(p-1)!$, we can pair off $6$ with $-6=7$, $5$ with $-5=8$ and so on.
- Each pair contributes $k \cdot (-k) = -k^2$, so we get $(p-1)! \equiv (-1)^{(p-1)/2} (((p-1)/2)!)^2 \pmod p$.
- When $(p-1)/2$ is even, this means that $(((p-1)/2)!)^2 \equiv (p-1)! \equiv -1$, ie, $((p-1)/2)!$ is a square root of $-1$.
- The condition $(p-1)/2$ is even is the same as saying that $p-1$ is congruent to $0$ mod $4$,
  or that $p$ is congruent to $1$ mod $4$.
- It's really nice to be able to see where this condition comes from!
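
A quick python check of the claim on a few primes $p \equiv 1 \pmod 4$ (the list of primes is just an arbitrary sample):

```py
from math import factorial

for p in [5, 13, 17, 29, 37, 41]:       # primes with p % 4 == 1
    r = factorial((p - 1) // 2) % p
    assert (r * r) % p == p - 1         # r is a square root of -1 mod p
print("ok")
```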

# General enough special cases

- Also, I feel that, thanks to thinking about combinatorial objects for a while,
  I've gained some kind of "confidence", where I check a special
  case which I am confident generalizes well.

```
void editor_state_backspace_char(EditorState& s) {
    assert(s.loc.line <= s.contents.size());
    if (s.loc.line == s.contents.size()) { return; }
    std::string& curline = s.contents[s.loc.line];
    assert(s.loc.col <= curline.size());
    if (s.loc.col == 0) { return; }
    // think about what happens with [s.loc.col=1]. Rest will work.
    std::string tafter(curline.begin() + s.loc.col, curline.end());
    curline.resize(s.loc.col - 1); // need to remove col[0], so resize to length 0.
    curline += tafter;
    s.loc.col--;
}
```
# XOR and AND relationship

-  `a xor b = a + b - 2 (a & b)`
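
A quick brute-force check of the identity in python:

```py
# a + b = (a xor b) + 2*(a & b): the AND is the carry, counted twice.
for a in range(256):
    for b in range(256):
        assert a ^ b == a + b - 2 * (a & b)
print("ok")
```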

# Geometry of complex integrals

- $\int f(z)\, dz$: work in the real part, flux in the imaginary part.
- [Video reference](https://www.youtube.com/watch?v=EyBDtUtyshk)

# Green's functions
- Can solve $L y(x) = f(x)$.
- $f(x)$ is called the forcing function.
- $L$ is a linear differential operator. That is, it's a differential operator like $\partial_x$ or $\partial_t \partial_t$.
  See that $\partial_t \partial_t$ is linear, because
  $\partial_t \partial_t (\alpha f + \beta g) = \alpha (\partial_t \partial_t f) + \beta (\partial_t \partial_t g)$
- [Video reference](https://www.youtube.com/watch?v=ism2SfZgFJg)

# CP trick: writing exact counting as counting less than

- If we can solve for number of elements `<= k`, say given by `leq(k)` where `k` is an integer,
  then we can also solve for number of elements `= k`, given by `eq(k) := leq(k) - leq(k - 1)`.
- While simple, this is hugely beneficial in many situations because `<=k` can be implemented as some kind of
  prefix sum data structure plus binary search, which is much less error prone to hack up than exact equality.
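
A minimal sketch in python, assuming integer keys and using a sorted array plus binary search as the "prefix structure":

```py
from bisect import bisect_right

xs = sorted([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])

def leq(k):          # number of elements <= k
    return bisect_right(xs, k)

def eq(k):           # number of elements == k
    return leq(k) - leq(k - 1)

assert eq(5) == xs.count(5) == 3
assert eq(7) == 0
```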

# CP trick: Heavy Light Decomposition euler tour tree

- To implement HLD, first define the heavy edge of a node to be the edge to the child with the largest subtree.
- To use a segment tree over HLD paths, create a "skewed" DFS where each node visits its
  heavy child first, and writes vertices into an array by order of discovery time (left paren time).
- When implementing HLD, we can use this segment tree array of the HLD tree as an euler tour of the tree.
- We maintain intervals so it'll be `[left paren time, right paren time]`. We find
  right paren time based on when we exit the DFS. The time we exit the DFS is the rightmost time
  that lives within this subtree.


# Counting with repetitions via pure binomial coefficients

- If we want to place $n$ things where $a$ of them are of kind `a`, $b$ are of kind `b`, and $c$
  of them are of kind `c` (with $a + b + c = n$),
  the usual formula is $n!/(a!b!c!)$.
- An alternative way to count this is to think of it as first picking $a$ slots from $n$, and then
  picking $b$ slots from the leftover $(n - a)$ elements, and finally picking $c$ slots from $(n - a - b)$.
  This becomes $\binom{n}{a}\binom{n-a}{b}\binom{n - a - b}{c}$.
- This is equal to $\frac{n!}{a!(n-a)!} \cdot \frac{(n-a)!}{b!(n-a-b)!} \cdot \frac{(n-a-b)!}{c!\,0!}$,
  which is equal to the usual $n!/(a!b!c!)$ by cancelling and using $c = n - a - b$.
- Generalization is immediate.
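
A quick numeric check of the two formulas in python (the numbers `9, 2, 3, 4` are just an arbitrary example):

```py
from math import comb, factorial

n, a, b, c = 9, 2, 3, 4        # a + b + c == n
slots = comb(n, a) * comb(n - a, b) * comb(n - a - b, c)
multinomial = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
assert slots == multinomial == 1260
```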

# Fundamental theorem of homological algebra [TODO]

- Let $M$ be an $R$ module.
- A resolution of $M$ is an exact chain complex `... -> M2 -> M1 -> M0 -> M -> 0`
- A projective resolution `P*` of `M` is a resolution such that all the `P*` are projective.

#### Fundamental theorem
- 1. Every `R` module has projective resolution.
- 2. Let `P*` be a chain complex of proj. R modules. Let `Q*` be a chain complex with
     vanishing homology in degree greater than zero. Let `[P*, Q*]` be the group of chain homotopy classes
     of chain maps from `P*` to `Q*`.  We are told that this set is in bijection with maps
    `[H0(P*), H0(Q*)]`. That is, the map taking `f*` to `H0[f*]` is a bijection.

#### Corollary: two projective resolutions are chain homotopy equivalent
- Let `... -> P1 -> P0 -> M` and `... -> Q1 -> Q0 -> M` be two projective resolutions.
- `H0(P*)` has an epi mono factorization `P0 ->> H0(P*)` and `H0(P*) ~= M`.



#### Proof of existence of projective resolution
- Starting with `M` there always exists a free module `P0` that is epi onto `M`, given by taking the free
  module of all elements of `M`. So we get `P0 -> M -> 0`.
- Next, we take the kernel, which gives us:

```
     ker e
        |
        |
        v   e
       P0 -> M -> 0
```

- The next `P1` must be projective, and it must project onto `ker e` for homology to vanish. So we
  choose the free module generated by elements of `ker e` to be `P1`!


```
    ker e
    ^   |
    |   v  e
P1---   P0 -> M -> 0
```


- Composing these two maps gives us `P1 -> P0 -> M`. Iterate until your heart desires.


## Chain homotopy classes of chain maps

# Projective modules in terms of universal property

## (1): Universal property / Defn

- $P$ is projective iff for every epimorphism $e: E \to B$ and every morphism $f: P \to B$,
  there exists a lift $\tilde{f}: P \to E$ such that $e \circ \tilde f = f$.


```
     e
   E ->> B
   ^   ^
  f~\  | f
     \ |
       P
```


## Thm: every free module is projective
- Let $P$ be a free module. Suppose we have an epimorphism $e: M \to N$ and a morphism $f: P \to N$.
  We must create a lift $\tilde f: P \to M$ with $e \circ \tilde f = f$.
- Let $P$ have basis $\{ p_i \}$. A morphism from a free module is determined by the action on the basis.
  Thus, we simply need to define $\tilde f(p_i)$.
- For each $f(p_i) \in N$, there is a pre-image $m_i \in M$ such that $e(m_i) = f(p_i)$.
- Thus, define $\tilde{f}(p_i) = m_i$. This choice is **not canonical** since there could be **many such $m_i$**.
- Regardless, we have succeeded in showing that every free module is projective by lifting $f: P \to N$ to a map
  $\tilde f: P \to M$.


## (1 => 2): Projective as splitting of exact sequences
- $P$ is projective iff every exact sequence $0 \to N \to M \xrightarrow{\pi} P \to 0$ splits.
- That is, we have a section $s: P \to M$ such that $\pi \circ s = id_P$.

- **PROOF (1 => 2):** Suppose $P$ solves the lifting problem. We wish to show that this implies that exact sequence splits.
- Take the exact sequence:


```
            pi
0 -> N -> M -> P -> 0
               ^
               | idP
               P
```
- This lifts into a map $P \to M$ such that the composition is the identity:

```
            pi
0 -> N -> M -> P -> 0
          ^   ^
       idP~\  | idP
            \ |
             P
```

- This gives us the section `s = idP~` such that `pi . s = idP` from the commutativity of the above diagram.



## (2 => 3): Projective as direct summand of free module
- $P$ is projective iff it is the direct summand of a free module. So there is another module $N$ such that $P \oplus N \equiv R^n$.
- We can always pick an epi $\pi: F \to P$, where $F$ is the free module over all elements of $P$.
- We get our ses $0 \to ker(\pi) \to F \to P \to 0$. We know this splits because as shown above, projective splits
  exact sequences where $P$ is the surjective image.
- Since the sequence splits, the middle term $F$ is a direct sum of the other two terms. Thus $F \simeq \ker \pi \oplus P$.

#### Splitting lemma

- If an exact sequence splits, then middle term is direct sum of outer terms.

## (3 => 1): Direct summand of free module implies lifting

- Let's start with the diagram:

```
  e
E ->>B
     ^
    f|
     P
```

- We know that $P$ is the direct summand of a free module, so there is a `Q` such that `P(+)Q` is free:

```
  e
E ->>B
     ^
    f|
     P <<-- P(+)Q
         pi
```

- We create a new arrow `f~ = f . pi` which has type `f~: P(+)Q -> B`. Since this is a map from a free module into `B`,
  it can be lifted to `E`. The diagram with `f~` looks as follows:

```
  e
E ->>B <--
     ^    \f~
    f|     \
     P <<-- P(+)Q
         pi
```

- After lifting `f~` to `E` as `g~`, we have a map `g~: P(+)Q -> E`.

```
--------g~--------
|                |
v e              |
E ->>B <--       g~
     ^    \f~    |
    f|     \     |
     P <<-- P(+)Q
         pi
```


- From this, I create the map `g: P -> E` given by `g(p) = g~((p, 0))`. Thus, we win!

## Non example of projective module

- `Z/pZ` is not projective.
- We have the exact sequence `0 -> Z -(xp)-> Z -> Z/pZ -> 0` given by multiplication by `p`.
- This sequence does not split: if it did, then `Z/pZ` (right) would be a direct summand of `Z` (middle),
  and direct summands are submodules of the larger module. But `Z/pZ` cannot be a submodule of `Z` because `Z/pZ`
  is torsion while `Z` is torsion free.

## Example of module that is projective but not free

- Let $R \equiv F_2 \times F_2$ be a ring.
- The module $P \equiv F_2 \times \{0\}$ is projective but not free.
- It's projective because it along with the other module $Q \equiv \{0\} \times F_2$ is isomorphic to $R$.
  ($P \oplus Q = R$).
- It's not free because any $R^n$ will have $4^n$ elements, while $P$ has only two elements.
- Geometrically, we have two points, one for each $F_2$.
  The module $P$ is a vector bundle that only takes values over one of the points.
  Since the bundle has different dimensions over the two points (1 versus 0), it is projective but not free.
- It is projective since it's like a vector bundle. It's not free because it doesn't have constant dimension.

#### References
- [video](https://www.youtube.com/watch?v=odva24Ro-44&list=PL2Rb_pWJf9JqgIR6RR3VFF2FwKCyaUUZn&index=37)


# How ideals recover factorization [TODO]

- Consider $\mathbb Z[\sqrt{-5}]$. Here, we have the equation $2 \times 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.
- Why are $2, 3, (1 + \sqrt{-5}), (1 - \sqrt{-5})$ prime?
- We can enumerate numbers up to a given absolute value.
  Since the absolute value is a norm and is multiplicative, we only need to check for prime factorization
  of a given number $n$ in terms
  of factors $p$ with smaller absolute value (ie, $|p| < |n|$).
- If we list numbers in $\mathbb Z[\sqrt{-5}]$ up to norm square $6$ (because $6$ is the norm square of $1 - \sqrt{-5}$), we can check the divisor candidates of each of these numbers directly.


The divisor candidates can be generated with the python code:

```py
class algnum:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __add__(self, other):
        return algnum(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b \sqrt(-5)) (a' + b' \sqrt(-5))
        # = aa' + ab' sqrt(-5) + ba' sqrt(-5) + bb' (-5)
        # = (aa' - 5 bb') + sqrt(-5)(ab' + ba')
        return algnum(self.a * other.a - 5 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __str__(self):
        if self.b == 0:
            return str(self.a)
        if self.a == 0:
            return f"{self.b}sqrt(-5)"
        return f"[{self.a}, {self.b} sqrt(-5)]"

    def normsq(self):
        # (a + b \sqrt(-5))(a - b \sqrt(-5))
        # = a^2 - (-5) b^2
        # = a^2 + 5 b^2
        return self.a * self.a + 5 * self.b * self.b
    def is_zero(self):
        return self.a == 0 and self.b == 0
    def is_one(self):
        return self.a == 1 and self.b == 0

    def is_minus_one(self):
        return self.a == -1 and self.b == 0



    __repr__ = __str__

nums = [algnum(a, b) for a in range(-10, 10) for b in range(-10, 10)]

def divisor_candidates(p):
    return [n for n in nums if n.normsq() < p.normsq() \
                  and not n.is_zero() \
                  and not n.is_one() \
                  and not n.is_minus_one()]

# recursive.
print("normsq of 2: ", algnum(2, 0).normsq());
print("normsq of 3: ", algnum(3, 0).normsq());
print("normsq of 1 + sqrt(-5):" , algnum(1, 1).normsq());
print("potential divisors of 2: ", divisor_candidates(algnum(2, 0)))
# candidates must be real. Only real candidate is 2.
print("potential divisors of 3: ", divisor_candidates(algnum(3, 0)))
# Candidate must be mixed.
print("potential divisors of (1 + sqrt(-5)): ", divisor_candidates(algnum(1, 1)))
print("potential divisors of (1 - sqrt(-5)): ", divisor_candidates(algnum(1, -1)))
```

#### Recovering unique factorization of ideals
- In the above ring, define $p_1 \equiv (2, 1 + \sqrt{-5})$.
- Define $p_2 \equiv (2, 1 - \sqrt{-5})$.
- Define $p_3 \equiv (3, 1 + \sqrt{-5})$.
- Define $p_4 \equiv (3, 1 - \sqrt{-5})$.
- We claim that $p_1 p_2 = (2)$, $p_3 p_4 = (3)$, $p_1 p_3 = (1 + \sqrt{-5})$, $p_2 p_4 = (1 - \sqrt{-5})$.
- This shows that the ideals that we had above are the products of "prime ideals".
- We recover prime factorization at the _ideal level_, which we had lost at the _number level_.

- [Video lectures: Intro to algebraic number thory via fermat's last theorem](https://www.youtube.com/watch?v=1f0-pc9zYPQ&list=PLSibAQEfLnTwq2-zCB-t9v2WvnnVKd0wn)

# Centroid of a tree

- Do not confuse with the **center of a tree**, which is a node $v$ that minimizes the maximum distance to the other nodes:
  it minimizes $\max_{w \in V} d(v, w)$. This can be found by taking the node that is the middle of a diameter.
- The centroid of a tree is a node such that, when the tree is rooted at it, no child subtree has over `floor(n/2)` of the vertices
  in the tree.

## Algorithm to find centroid of a tree

- Root tree arbitrarily at $r$
- Compute subtree sizes with respect to this root $r$.
- Start from root. If all children of root $r$ have size **less than or equal to** `floor(n/2)`, we are done. Root is centroid.
- If not, some child $c$ [for child, contradiction] has size **strictly greater than** `floor(n/2)`.
- The total tree has $n$ vertices. $c$ as a subtree has **greater than**
  `floor(n/2)`
  vertices. Thus the rest of the tree
  (ie, the part under $r$ that excludes $c$) has **strictly less than** `floor(n/2)` vertices.
- Let us in our imagination reroot the tree at this child $c$. The children of $c$ continue to have the
  same subtree size. The old node $r$, as a subtree of the new root $c$, has size strictly less than `floor(n/2)` vertices.
- Now we recurse, and proceed to analyze `c`.
- This analysis shows us that once we descend from `r -> c`, we **do not** need to analyze the edge `c -> r` if we make `c` the
  new candidate centroid.

```cpp
int n; // number of vertices
int sz[N]; // subtree sizes
vector<int> es[N]; // adjacency list
void go_sizes(int v, int p) {
  sz[v] = 1;
  for (int w : es[v]) {
    if (w == p) { continue; }
    go_sizes(w, v);
    sz[v] += sz[w];
  }
}

int centroid(int v, int p) {
  for (int w : es[v]) {
    if (w != p && sz[w] > n/2)
      return centroid(w, v);
  }
  return v;
}

int main() {
  ...
  go_sizes(1, 1);
  centroid(1, 1);
};
```

- Note that one **does not need** to write the code as follows:

```cpp
int centroid(int v, int p) {
  for (int w : es[v]) {
    int wsz = 0;
    if (w == p) {
      // size of parent = total - our size
      wsz = n - sz[v];
    } else {
      wsz = sz[w];
    }
    assert(wsz);
    if (wsz > n/2) {
      return centroid(w, v);
    }
  }
  return v;
}
```

- This is because we have already established that if `p` descends into `v`, then the subtree `p` [rooted at `v`] must have less than `n/2`
  elements, since the subtree `v` [rooted at `p`] has more than `n/2` elements.

## Alternate definition of centroid
- Let the centroid of a tree $T$ be a vertex $v$, such that when $v$ is removed and the graph splits into components
  $T_v[1], T_v[2], \dots, T_v[n]$, then the value $\tau(v) = \max(|T_v[1]|, |T_v[2]|, \dots, |T_v[n]|)$ is minimized.
- That is, it is the vertex that on removal induces subtrees, such that the size of the largest component is smallest
  amongst all nodes.


#### Existence of centroid

#### Equivalence to size definition


## Centroid decomposition

- If we find the centroids of the subtrees that hang from the centroid, then we decompose the graph
  into a **centroid decomposition**.

# Path query to subtree query

- Model question: [CSES counting paths](https://cses.fi/problemset/task/1136)
- We have a static tree, and we wish to perform updates on paths, and a final query.
- We can uniquely represent a path in a tree with an initial and final node. There are $O(n^2)$ paths
  in a tree, so we need to be "smart" when we try to perform path updates.




# Pavel: bridges, articulation points for UNDIRECTED graphs

- Two vertices are 2-edge connected if there are 2 paths between them. The two paths cannot share ANY edges.
- Every bridge must occur as a DFS tree edge, because DFS connects all components together.
- More generally, every spanning tree contains all bridge edges.
- Now we check if each edge is a bridge or not.
- To check, we see what happens when we remove the edge $(u, v)$. If the edge is not a bridge, then the subtree
   of $v$ must connect to the rest of the graph.
- Because we run DFS, the subtree rooted at $v$ **must go upwards**, it cannot go cross. On an undirected graph, DFS
  only gives us tree edges and back edges.
- This means that if the subtree rooted at $v$ is connected to the rest of the graph, it must have a backedge that is "above" $u$,
  and points to an ancestor of $u$.
- Instead of creating a set of back edges for each vertex $v$, we take the *highest*/*topmost* back edge, since it's a safe
  approximation to throw away the other back-edges if all we care about is whether there is a backedge that goes higher than $u$.
- To find the components, push every node into a list. When we find an edge that is a bridge, take the sublist from the vertex $v$ to the end of the list.
  This is going to be one connected component. We discover islands in "reverse order", where we find the farthest island from the root first and so on.
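
A minimal python sketch of this back-edge idea (my own toy implementation, assuming a simple graph; `low[v]` tracks the topmost discovery time reachable from the subtree of `v` using at most one back edge):

```py
import sys
sys.setrecursionlimit(10**6)

def bridges(n, adj):
    """Return the bridge edges of an undirected simple graph with vertices 0..n-1."""
    tin = [-1] * n   # discovery (left paren) time
    low = [0] * n
    timer = 0
    out = []

    def dfs(v, parent):
        nonlocal timer
        tin[v] = low[v] = timer
        timer += 1
        for w in adj[v]:
            if w == parent:              # skip the edge we came down on
                continue
            if tin[w] != -1:             # back edge: can climb up to tin[w]
                low[v] = min(low[v], tin[w])
            else:                        # tree edge
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] > tin[v]:      # subtree of w cannot reach above v
                    out.append((v, w))

    for v in range(n):
        if tin[v] == -1:
            dfs(v, -1)
    return out

# triangle 0-1-2 with a pendant vertex 3: only (2, 3) is a bridge
print(bridges(4, [[1, 2], [0, 2], [0, 1, 3], [2]]))  # [(2, 3)]
```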

#### Vertex connectivity

- The problem is that vertex connectivity is not an equivalence relation on vertices!
- So we define it as an equivalence relation on *edges*.
- More subtly, we cannot "directly" condense. We need to build a bipartite graph, with components on one side
  and articulation points on the other side.



# Monadic functor

- A functor $U: D \to C$ is monadic iff it has a left adjoint $F: C \to D$ and
  the adjunction is monadic.
- An adjunction $C : F \vdash U: D$ is monadic if the induced "comparison functor" from $D$ to the
  category of algebras (eilenberg-moore category) $C^T$ is an **equivalence of categories**.
- That is, the functor $\phi: D \to C^T$ is an equivalence of categories.
- Some notes: We have $D \to C^T$ and not the other way around since the full order is
  $C_T \to D \to C^T$: Kleisli, to $D$, to Eilenberg moore. We go from "more semantics" to
  "less semantics" --- such induced functors cannot "add structure" (by increasing the amount of semantics),
  but they can "embed" more semantics into less semantics. Thus, there is a comparison functor from $D$
  to $C^T$.
- Eilenberg-moore is written $C^T$ since the category consists of $T$-algebras, where $T$ is the induced
  monad $T: C \to D \to C$. It's $C^T$ because a $T$ algebra consists of arrows $\{ Tc \to c : c \in C \}$
  with some laws. If one wished to be cute, they could think of this as "$T \to C$".
- The monad $T$ is $C \to C$ and not $D \to D$ because, well, let's pick a concrete example: `Mon`.
  The monad on the set side takes a set $S$ to the set of words on $S$, written $S^\star$. The
  other alleged "monad" takes a monoid $M$ to the free monoid on the elements of $M$. We've lost structure.


# Injective module

- An injective module is a generalization of the properties of $\mathbb Q$ as an abelian group ($\mathbb Z$ module.)
- In particular, given any injective group homomorphism $f: X \to Y$ and a morphism $q_X: X \to \mathbb Q$,
  we can extend $q_X$ to a group homomorphism $q_Y: Y \to \mathbb Q$ (with $q_Y \circ f = q_X$), where $X, Y$ are abelian groups.
- We can think of this injection $f: X \to Y$ as identifying a _submodule_ (subgroup) $X$ of $Y$.
- Suppose we wish to define the value of $q_Y$ at some $y \in Y$. If $y$ is in the subgroup $X$,
  then define $q_Y(y) \equiv q_X(y)$.
- If $y$ is outside $X$ but some multiple $ny$ (with $n > 0$ minimal) lies in $X$, use the divisibility of $\mathbb Q$ and define $q_Y(y) \equiv q_X(ny)/n$;
  if no nonzero multiple of $y$ lies in $X$, we may send $y$ to $0$. (Zorn's lemma organizes these choices into an extension to all of $Y$.)
- **Non-example of injective module:** See that this does not work if we replace $\mathbb Q$ with $\mathbb Z$.
- Consider the injective map $i: Z \to Z$ given by $i(x) \equiv 3x$.
  Consider the quotient map $f: Z \to Z/3Z$. We cannot factor the map $f$ through $i$ as $f = ci$ [$c$ for contradiction],
  since any map $c: Z \to Z/3Z$ is determined by where $c$ sends $1$. But in this case,
  $c(i(x)) = c(3x) = 3x \cdot c(1) = 0$, while $f$ is not the zero map. Thus, $\mathbb Z$ is not an injective abelian group,
  since we were unable to factor the homomorphism $Z \to Z/3Z$ along the injection $3 \times: Z \to Z$.
- **Where does non-example break on Q?** Let's have the same situation, where we have an injection $i: Z \to Q$
  given by $i(z) = 3z$. We also have the quotient map $f: Z \to Z/3Z$. We want to factor $f = qi$ where
  $q: Q \to Z/3Z$. This is given by $q(x) = $

# Proof that $Spec(R)$ is a sheaf [TODO]

- Give topology for $Spec(R)$ by defining the base as $D(f)$ --- sets where $f \in R$ does not vanish.
- Note that the base is closed under intersection: $D(f) \cap D(g) = D(fg)$.
- To check sheaf conditions, suffices to check on the base.
- To the set $D(f)$, we associate the ring $R[f^{-1}]$. That is, we localize $R$
  at the multiplicative monoid $S \equiv \{ f^k \}$.
- We need to show that if $D(f) = \cup D(f_i)$, and given solutions within each $D(f_i)$, we need to create
  a unique solution in $D(f)$.

#### Reduction 1: Replace $R$ by $R[f^{-1}]$
- We localize at $f$. This allows us to assume that $D(f) = Spec(R)$ [ideal blows up as it contains unit],
  and that $f = 1$ [localization makes $f$ into a unit, can rescale?]
- So we now have that $\{ D(f_i) \}$ cover the spectrum $Spec(R)$. This means that for each point $\mathfrak p$,
  there is some $f_i$ such that $f_i \not \equiv_\mathfrak p 0$. This means that $f_i \not \in \mathfrak p$.
- Look at the ideal $I \equiv (f_1, f_2, \dots, f_n)$. For every prime (maximal) ideal $\mathfrak p$, there is some $f_i$
  such that $f_i \notin \mathfrak p$. This means that the ideal $I$ is not contained in any maximal ideal, or that $I = R$.
- This immediately means that $1 \in I$, ie, $1 = \sum_i a_i f_i$ for some $a_i \in R$.
- Recall that in a ring, sums are finite, so we can write $1$ as a sum of a FINITE number of the $f_i$, since only a finite
  number of terms in the above expression will be nonzero. [$Spec(R)$ is quasi-compact!]
- This is a partition of unity of $Spec(R)$.

#### Separability
- Given $r \in R = O(Spec(R))$, if $r$ is zero in all $D(f_i)$, then $r = 0$ in $R$.
- $r$ being zero in each $D(f_i)$ means that $r = 0$ in $R[f_i^{-1}]$. This means that $f_i^{n_i} r = 0$ for some $n_i$, because
  something is zero on localization iff it is killed by the multiplicative set that we are localizing at.
- On the other hand, we also know that $a_1 f_1 + \dots + a_n f_n  = 1$ since the $D(f_i)$ cover $Spec(R)$.
- We can replace $f_i$ by $f_i^{n_i}$, since $D(f_i) = D(f_i^{n_i})$. So if the $D(f_i)$ cover $Spec(R)$, then so too do the $D(f_i^{n_i})$,
  and we may write $1 = \sum_i b_i f_i^{n_i}$. Then $r = \sum_i b_i f_i^{n_i} r = 0$, as needed.


#### Check sheaf conditions
- Suppose $r_i/f_i^{n_i} \in R[f_i^{-1}]$ is equal to $r_j/f_j^{n_j}$


#### References
- [Borcherds](https://www.youtube.com/watch?v=AYDq0qY34HU&list=PL8yHsr3EFj50Un2NpfPySgXctRQK7CLG-&index=9)


# Projections onto convex sets

- [Link](https://en.wikipedia.org/wiki/Projections_onto_convex_sets)


# BFGS algorithm for unconstrained nonlinear optimization

- [Link](https://en.wikipedia.org/wiki/Broyden%E2%80%93Fletcher%E2%80%93Goldfarb%E2%80%93Shanno_algorithm)

# LM algorithm for nonlinear least squares

- [Link](https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm)


# Backward dataflow and continuations
- Forward dataflow deals with facts _thus far_.
- Backward dataflow deals with facts about _the future_, or the _rest of the program_.
  Thus, in a real sense, backward dataflow concerns itself with _continuations_!


# Coordinate compression with `set` and `vector`

If we have a `std::set<T>` that represents our set of uncompressed values, we can
quickly compress it with a `std::vector<T>` and `lower_bound` without having to
create an `std::map<T, int>` that holds the index!

```cpp
set<int> ss; // input set to compress
vector<int> index(ss.begin(), ss.end()); // sorted, deduplicated values
int uncompressed = ...; // a value present in ss
int compressed = int(lower_bound(index.begin(), index.end(), uncompressed) - index.begin());
assert(index[compressed] == uncompressed);
```

# Hilbert polynomial and dimension

- Think of non Cohen Macaulay ring (plane with line perpendicular to it). Here the dimension varies per point.
- Let $R$ be a graded ring. Let $R^0$ be noetherian, and let $R$ be finitely generated as an algebra over $R^0$.
  This implies by the Hilbert basis theorem that $R$ is noetherian.
- Suppose $M$ is a graded module over $R$, and $M$ is finitely generated as a module over $R$.
- How fast does $M_n$ grow? We need some notion of size.
- Define the size of $M_n$ as $\lambda(M_n)$. Suppose $R$ is a field. Then $M_n$ is a vector space. We define
  $\lambda(M_n)$ to be the dimension of $M_n$ as a vector space over $R$.
- What about taking dimension of tangent space? Doesn't work for cusps! (singular points). Can be used to define
  singular points.
- TODO: show that at $y^2 = x^3$, we have dimension two (we expect dimension one)

# Cost of looping over all multiples of $i$ for $i$ in $1$ to $N$

- Intuitively, when I think of "looping over $i$ and all its multiples", I seem to have a gut
  feeling that its cost is $N$. Of course, it is not. It is $N/i$.
- Thus, the correct total cost becomes $\sum_{i=1}^N N/i$ (versus the false cost of $\sum_{i=1}^N N = N^2$).
- The correct total cost is a harmonic series $N\cdot \sum_{i=1}^N1/i \simeq N \log N$.
- This is useful for number theory problems like [1627D](https://codeforces.com/contest/1627/problem/D)
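
A quick empirical check in python ($N = 10^6$ is an arbitrary choice):

```py
from math import log

N = 10**6
iters = sum(N // i for i in range(1, N + 1))  # total work of looping over each i and its multiples
print(iters, N * log(N))  # both are about 1.4e7: N log N, nowhere near N^2 = 1e12
```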




# Stuff I learnt in 2021

I spent this year focusing on fundamentals, and attempting to prepare
myself for topics I'll need during my PhD. This involved learning things
about dependent typing, compiling functional programs, working with the
[MLIR compiler toolchain](https://mlir.llvm.org/), and reading about
the [GAP system for computational discrete algebra](https://www.gap-system.org/).


## Guitar

I've been meaning to pick up an instrument. I'd learnt the piano as a kid,
but I'd soured on the experience as it felt like I was learning a lot of music
theory and practising to clear the [Royal school of music exams](https://in.abrsm.org/en/).
I'd learnt a little bit of playing the guitar while I was an intern at [Tweag.io](https://tweag.io/); my
AirBnB host had a guitar which he let me borrow to play with. I was eager to pick it back up.

Unfortunately, I wasn't able to play as consistently as I had hoped I would. I can now play more chords,
but actually switching between them continues to be a challenge. I also find pressing down on barre chords
surprisingly hard. I've been told something about getting lighter strings, but I'm unsure about that.


I was also excited about learning the guitar well enough to play it while a friend sings along.
This seems to require a **lot** more practice than I currently have, as the bottleneck is whether
one can smoothly switch between chords.


## Starting my PhD: Research on Lean4

I'm excited by proof assistants, and I'd like to work on them for my PhD. So the first order
of business was to get an idea of the internals of Lean4, and to decide what exactly I would
be working on. This made me read the papers written by the Lean team over the last couple years
about their runtime, as well as made me learn how to implement dependently typed languages.

During this process, I also had calls with some of the faculty at the University of Edinburgh
to pick a co-advisor. I enjoyed reading
[Liam O'Connor's thesis: Type Systems for Systems Types](http://unsworks.unsw.edu.au/fapi/datastream/unsworks:61747/SOURCE02?view=true).
The thesis had a very raw, heartfelt epilogue:

> If you will permit some vain pontification on the last page of my thesis,
> I would like to reflect on this undertaking, and on the dramatic effect it
> has had on my thinking. My once-co-supervisor Toby Murray said that
> all graduate students enter into a valley of despair where they no longer believe in
> the value of their work. Certainly I am no counter-example. I do not even know if I
> successfully found my way out of it

This, honestly, made me feel a lot better, since I'd begun to feel this way even *before* launching
into a PhD!

#### Lean implementation details

I read the papers by the Lean researchers on the special features of the language.

- [Counting immutable beans](https://arxiv.org/pdf/1908.05647.pdf) describes an optimization
  that they perform in their IR (lambdarc) that optimizes memory usage by exploiting linear types.
- [Sealing pointer equality](https://arxiv.org/pdf/2003.01685.pdf) describes how to use dependent types
  to hide pointer manipulation in a referentially transparent fashion.

#### Writing a dependently typed language

I felt I had to know how to write a dependently typed language if I wanted to be successful
at working on the Lean theorem prover. So I wrote one; it's at [`bollu/minitt`](https://github.com/bollu/minitt).
The tutorials that helped me the most were:

- [David Christiansen's tutorial on normalization by evaluation](https://davidchristiansen.dk/tutorials/nbe/), where
  he builds a full blown, small dependently typed language type checker.
- [Normalization by evaluation by F. Faviona](https://www.youtube.com/watch?v=atKqqiXslyo) which explains
  why we need this algorithm to implement dependently typed languages, and other cute examples
  of normal forms in mathematics. For example, to check if two lists are equivalent upto permutation,
  we can first sort the two lists, and then check for real equality. So we are reducing a problem
  of "equivalence" to a problem of "reduction to sorted order" followed by "equality". We do something
  similar to type check a dependently typed language.
- [`cubicaltt` lectures by faviona](https://www.youtube.com/watch?v=6cLUwAiQU6Q), which get the point of
  cubical type theory across very well.
- [Bidirectional type checking by Pfenning](https://www.cs.cmu.edu/~fp/courses/15312-f04/handouts/15-bidirectional.pdf)
  These lecture notes explain bidirectional typing
  well, and provide an intuition for which types should be checked and which should be inferred when performing
  bidirectional typing.

#### Paper acceptance

Our research about writing a custom backend for Lean4 was accepted at CGO'22. I was very touched at how
nice the programming languages community is. For example, Leonardo de Moura and Sebastian Ullrich, the maintainers
of Lean4 provided a lot of constructive feedback. I definitely did not expect this to happen. I feel like
I don't understand academia as a community, to be honest, and I'd like to understand how it's organized.




## Statistics

As I was working on the paper, I realised that I didn't truly understand why we were taking the median of the runtimes
to report performance numbers, or why averaging over ten runs was "sufficient" (sufficient for what?).

This led me on a quest to learn statistics correctly. My big takeaways were:

- Frequentist type statistics via null hypotheses are hard to interpret and may not be well suited for performance benchmarking.
- The High Performance Computing community does not use bayesian statistics, so using it would flag one's paper as "weird".
- The best solution is to probably report all raw data, and summarize it via reasonable summary statistics like median, which
  is robust to outliers.

I must admit, I find the entire situation very unsatisfactory. I would love it if researchers in High performance
computing wrote good reference material on how to benchmark well. Regardless, here are some of the neat
things I wound up reading in this quest:

##### Learning statistics with R

[Learning statistics with R](https://learningstatisticswithr.com/book/). This is a neat book which explains
statistics and the `R` programming language. I knew basically nothing of statistics and had never used `R`, so working
through the book was a blast. I was able to blaze through the first half of the book, since it's a lot of introductory
programming and introductory math. I had to take a while to digest the ideas of p-values and hypothesis testing. I'm still not
100% confident I really understand what the hell a p value is doing. Regardless, the book was a really nice read, and
it made me realize just how well the R language is designed.


##### Jackknife

The [Jackknife paper "Bootstrap methods: another look at the jackknife"](http://jeti.uni-freiburg.de/studenten_seminar/stud_sem_SS_09/EfronBootstrap.pdf)
which introduces the technique of bootstrapping: drawing many samples from a small dataset to eventually infer summary statistics. I was impressed
by the paper for three reasons. For one, it was quite easy to read as a non-statistician, and I could follow the gist of what was going on
in the proofs. Secondly, I enjoyed how amenable it is to implementation, which makes it widely used in software. Finally, I think it's
a great piece of marketing: labelling it a "Jackknife", and describing the bootstrap as a rough-and-ready method that will save you
in the jungles of statistical wilderness makes for a great title.



## R language and tidy data

Due to the R language quest, I was exposed to the idea of a data frame in a *coherent* way.
The data frames in R feel *designed* to me, unlike their python counterpart in [`pandas`](https://pandas.pydata.org/).

I realised that I should probably learn languages that are used by domain experts, and not poor approximations
of domain expertise in Python.

##### Tidyverse

This also got me interested to learn about the [tidyverse](https://www.tidyverse.org/), a collection of packages
which define a notion of "tidy data", which is a precise philosophy of how data should be formatted when
working on data science (roughly speaking, it's a dataset analogy of [3rd normal form from database theory](https://en.wikipedia.org/wiki/Third_normal_form)).

In particular, I really enjoyed the [tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) paper which
defines tidy data, explains how to tidy untidy data, and advocates for using tidy data as an intermediate
representation for data analysis.



## Starting a reading group: Fuzzing

I felt like I was missing out on hanging with folks from my research lab, so I decided
to start a reading group. We picked [the fuzzing book](https://www.fuzzingbook.org/)
as the book to read, since it seemed an easy and interesting read.

I broadly enjoyed the book. Since it was written in a [literate programming style](https://en.wikipedia.org/wiki/Literate_programming),
this meant that we could read the sources of each chapter and get a clear idea of how the associated topic was to be implemented.
I enjoy reading code, but I felt that the other lab members thought this was too verbose. It did make judging the length of
a particular section hard, since it was unclear how much of the section was pure implementation detail, and how much was conceptual.


##### Ideas learnt

Overall, I learnt some interesting ideas like [delta debugging](https://en.wikipedia.org/wiki/Delta_debugging),
[concolic fuzzing](https://www.fuzzingbook.org/html/ConcolicFuzzer.html), and overall, how to *design*
a fuzzing library (for example, this section on [grammar fuzzing](https://www.fuzzingbook.org/html/GeneratorGrammarFuzzer.html#Synopsis)
provides a convenient class hierarchy one could choose to follow).

I also really enjoyed the book's many (ab)uses of python's
runtime monkey-patching capabilities for fuzzing. This meant that the book could easily explain concepts
that would have been much harder in some other setting, but this also meant that some of the techniques
showcased (eg. [tracking information flow](https://www.fuzzingbook.org/html/InformationFlow.html)
by using the fact that python is dynamically typed) would be much harder to put into practice in a less flexible language.


##### Software bugs are real bugs?

The coolest thing I learnt from the book was [STADS: software testing as species discovery](https://arxiv.org/pdf/1803.02130.pdf),
which models the problem of "how many bugs exist in the program?" as "how many bugs exist in this forest?". It turns
out that ecologists have good models for approximating the **total number of species in a habitat** from the
**number of known species in a habitat**. The paper then proceeds to argue that this analogy is sensible,
and then implements this within [AFL: american fuzzy lop](https://lcamtuf.coredump.cx/afl/). Definitely the
most fun idea in the book by far.


## Persistent data structures for compilers

My friend and fellow PhD student [Mathieu Fehr](https://github.com/math-fehr) is developing
a new compiler framework based on MLIR called [XDSL](https://github.com/xdslproject/xdsl).
This is being developed in Python, as it's meant to be a way to expose the guts of the compilation
pipeline to domain experts who need not be too familiar with how compilers work.


##### Python and immutable data structures

I wished to convince Mathieu to make the data structures immutable by default. Unfortunately, python's
support for immutable style programming is pretty poor, and I never could get libraries like
[pyrsistent](https://github.com/tobgu/pyrsistent) to work well.

##### Immer

On a happier note, this made me search for what the cutting edge was in embedding immutable
data structures in a mutable language, which led me to [Immer: Persistence for the masses](https://public.sinusoid.es/misc/immer/immer-icfp17.pdf).
It advocates to use [RRB trees](https://dl.acm.org/doi/abs/10.1145/2858949.2784739) and describes how to design an API
that makes it convenient to use within a language like C++. I haven't read the RRB trees paper, but I have been using Immer
and I'm liking it so far.

## `WARD` for quick blackboarding

I hang out with my friends to discuss math, and the one thing I was sorely missing was the lack of a shared
blackboard. I wanted a tool that would let me quickly sketch pictures, with some undo/redo, but most importantly,
be **fast**. I found no such tool on Linux, so I wrote my own: [bollu/ward](https://github.com/bollu/ward). It was great
fun to write a tool to scratch a personal itch. I should do this more often.

## Becoming a Demoscener

I've always wanted to become a part of the [demoscene](), but I felt that I didn't understand the
graphics pipeline or the audio synthesis pipeline well enough. I decided to fix these glaring
gaps in my knowledge.

##### Rasterization

I've been implementing [`bollu/rasterizer`](https://github.com/bollu/rasterizer), which
follows the [`tinyrenderer`](https://github.com/ssloy/tinyrenderer/wiki/Lesson-0:-getting-started) series
of tutorials to implement a from-scratch, by-hand software rasterizer. I already knew
all the math involved, so it was quite rewarding to quickly put together code that applied math I already knew
to make pretty pictures.

##### Audio synthesis


Similarly, on the audio synthesis side, I wrote
[`bollu/soundsynth`](https://github.com/bollu/soundsynth) to learn fundamental synthesis algorithms.
I followed [demofox's series of audio synthesis tutorials](https://blog.demofox.org/) as well as
a very pleasant and gently paced textbook (TODO: add link). I particularly enjoyed the ideas
in [Karplus-Strong string synthesis](https://en.wikipedia.org/wiki/Karplus%E2%80%93Strong_string_synthesis).
I find [FM synthesis](https://github.com/bollu/soundsynth/blob/master/fm-paper.pdf) very counter-intuitive to reason about.
I've been told that audio engineers can perform FM sound synthesis "by ear", and I'd love to have an intuition for
frequency space that's so strong that I can intuit how to FM synthesize a sound. Regardless, the idea is very neat for sure.

##### Plucker coordinates

I also have long wanted to understand
[Plucker coordinates](https://en.wikipedia.org/wiki/Pl%C3%BCcker_coordinates), since I'd read that they are useful
for graphics programming. I eventually plonked down, studied them, and
[wrote down an expository note](https://github.com/bollu/notes/blob/master/diffgeo/grassmanian-plucker.ipynb)
about them in a way that makes sense to me. I now feel I have a better handle on projective space, Grassmannians, and schemes!


## Category theory

A friend started a category theory reading group, so we've spent the year working
through
[Emily Riehl's "Category theory in Context"](https://math.jhu.edu/~eriehl/context.pdf).
I'd seen categorical ideas before, like colimits to define a germ, "right adjoints preserve limits",
showing that the sheafification functor exists by invoking an adjoint functor theorem, and so on.
But I'd never systematically studied any of this, and if I'm being honest, I hadn't even understood
the statement of the Yoneda lemma properly.

##### Thoughts on the textbook

Working through the book from the ground-up was super useful, since I was forced to solve
exercises and think about limits, adjoints, and so forth. I've
[uploaded my solutions upto Chapter 4](https://github.com/bollu/notes/blob/master/category-theory-in-context/main.pdf).

I felt the textbook gets a little rough around the edges at the chapter on adjunctions. The section
on the 'Calculus of Adjunctions' made so little sense to me that I
[rewrote it](https://github.com/bollu/notes/blob/master/category-theory-in-context/calculus-of-adjunctions.pdf)
with proofs that I could actually grok/believe.

##### Curios


Regardless, it's been a fun read so far. I was also pointed to some other interesting content along
the way, like [Lawvere theories](https://bartoszmilewski.com/2017/08/26/lawvere-theories/)
and the [cohomology associated to a monad](http://www.tac.mta.ca/tac/reprints/articles/2/tr2abs.html).

## Computational mathematics


A Postdoc at our lab, [Andres Goens](https://scholar.google.de/citations?user=vjVhbJoAAAAJ&hl=en)
comes from a pure math background. While we were discussing potential research ideas (since I'm still
trying to formulate my plan for PhD), he
mentioned that we could provide a formal semantics for the
[GAP programming language](https://www.gap-system.org/) in Lean.
This project is definitely up my alley, since it involves computational math (yay), Lean (yay),
and formal verification (yay).

##### Learning GAP

I decided I needed to know some fundamental algorithms of computational group theory, so I skimmed
the book
[Permutation group algorithms by Seress](https://doc.lagout.org/science/0_Computer%20Science/2_Algorithms/Permutation%20Group%20Algorithms%20%5BSeress%202003-03-17%5D.pdf)
which explains the fundamental algorithms behind manipulating finite groups computationally, such
as the [Todd Coxeter coset enumeration algorithm](https://math.berkeley.edu/~kmill/notes/todd_coxeter.html)
and the [Schreier-Sims group decomposition algorithm](https://en.wikipedia.org/wiki/Schreier%E2%80%93Sims_algorithm).
I loved the ideas involved, and implemented these at [`bollu/CASette`](https://github.com/bollu/CASette).


I'd also half-read the textbook 'Cox, Little, O'Shea: Computational Algebraic Geometry' which I picked
up again since I felt like I ought to revisit it after I had seen more algebraic geometry, and also
because I wanted to be better informed about computational mathematics. This time around,
I felt many of the theorems (such as the [hilbert basis theorem](https://en.wikipedia.org/wiki/Hilbert%27s_basis_theorem))
'in my bones'. Alas, I couldn't proceed past the second chapter since other life things took priority.
Perhaps I'll actually finish this book next year `:)`.

##### Cardistry


For something completely different, I got interested in Cardistry and shuffling thanks to Youtube.
I started learning interesting shuffles like the [riffle shuffle](https://mathworld.wolfram.com/RiffleShuffle.html),
and soon got interested in the mathematics involved. I wound up reading some of
the book [Group representations for probability and statistics](https://jdc.math.uwo.ca/M9140a-2012-summer/Diaconis.pdf)
by Persi Diaconis, a magician turned mathematician who publishes quite a bit on permutation groups, shuffling, and the like.


###### Symmetric group

I really enjoyed learning the detailed theory of the representation theory of the symmetric group, which I
had read patchily before while studying
[Fourier analysis on the symmetric group](http://people.cs.uchicago.edu/~risi/research/symmetric.html).
A lot of the theory
still feels like magic to me; in particular, [Specht modules](https://en.wikipedia.org/wiki/Specht_module) are so
'magic' that I would find it hard to reconstruct them from memory.

## Competitive programming

I need more practice at competitive programming. In fact, I'm [downright atrocious](https://codeforces.com/profile/bollu),
as I'm rated "pupil" on codeforces. If I had to debug why, it's a combination of several factors:


- I get discouraged if I can't solve a problem I think I "ought to be able to solve".
- I consider myself good at math and programming, and thus being bad at problem solving makes me feel
  bad about myself.
- I tend to overthink problems, and I enjoy using heavy algorithmic machinery, when in reality, all that's called for
  is a sequence of several observations.
- Codeforces' scoring system needs one to be *fast* at solving problems and implementing them precisely. I don't enjoy
  the time pressure. I'd like a scoring system based on harder problems, but less emphasis on time-to-solve.

To get better, I've been studying more algorithms (because it's fun). I took the
[coursera course on string algorithms](https://www.coursera.org/learn/algorithms-on-strings) and
read the textbook [algorithms on strings](https://www.cambridge.org/core/books/algorithms-on-strings/19049704C876795D95D8882C73257C70).
I loved the ideas of [building a prefix automata in linear time](https://codeforces.com/blog/entry/20861). The algorithm
is very elegant, and involves a fundamental decomposition of regular languages
 via the [Myhill Nerode theorem](https://en.wikipedia.org/wiki/Myhill%E2%80%93Nerode_theorem).
You can find [my string algorithm implementations here](https://github.com/bollu/notes/tree/master/strings/impl).


##### Hardness of codeforces problems

Another thing I kept getting tripped up by was the fact that problems that were rated "easy" on codeforces
tended to have _intuitive_ solutions, but with _non-trivial_ watertight proofs. An example of this
was the question [545C](https://codeforces.com/contest/545/problem/C) on codeforces, where the
tutorial gives a [sketch of a exchange argument](https://codeforces.com/blog/entry/17982). Unfortunately,
filling in all the gaps in the exchange argument is [quite complicated](https://gist.github.com/bollu/6b9d7d4b23cc4dd74d6a0fc2b66f452c).
I finally did arrive at a much longer proof. This made me realize that competitive programming sometimes calls for
"leaps" that are in fact quite hard to justify. This kept happening as I solved problems. To recitfy the state of affairs,
I began documenting formal proofs to these problems. Here's a link to [my competitive programming notes](https://github.com/bollu/notes/blob/master/competitive-programming/main.pdf),
which attempts to formally state and prove the correctness of these questions.


## Discrete differential geometry

I love the research of [Keenan Crane](https://www.cs.cmu.edu/~kmcrane/), who works on bridging old school
differential geometry with computational techniques. All of his papers are lucid, full of beautiful figures
and crazy ideas.

##### Repulsive curves

Chris Yu, Henrik Schumacher, and Keenan have a new paper, [Repulsive Curves](http://www.cs.cmu.edu/~kmcrane/Projects/RepulsiveCurves/RepulsiveCurves.pdf),
which is really neat. It allows one to create curves that minimize a repulsive force, and can be subject to other arbitrary
constraints. The actual algorithm design leads one to think about all sorts of things like fractional calculus. To be honest,
I find it insane that fractional calculus finds a practical use. Definitely a cool read.

##### SAGE implementation

- I have a [work in progress PR](https://github.com/bollu/SAGE/tree/u/gh-bollu%2Fjan-17-2021-discrete-diffgeo-homology) that
  implements Keenan Crane's [Geodesics in Heat](https://arxiv.org/abs/1204.6216) algorithms
  within SAGE. Unfortunately, the problem was that implementing this requires heavy sparse numerical linear algebra,
  something that sage did not have at the time I attempted this.
- This led to me [opening an issue about sparse Cholesky decomposition](https://trac.sagemath.org/ticket/13674)
  on the SAGE issue tracker.
- Happily, the issue was fixed late this year by SAGE pulling in `cvxopt` as a dependency!
- I can get back to this now in 2022, since there's enough support within SAGE now to actually succeed!


## Writing a text editor (dropped)

I started writing a text editor, because earlier tools that I'd written for myself
such as `ward` for blackboarding, and my custom blog generator all worked really well for me,
as they fit my idiosyncrasies. I tried writing a terminal based editor
at [`bollu/edtr`](https://github.com/bollu/edtr) following the [`kilo` tutorial](https://viewsourcecode.org/snaptoken/kilo/).
Unfortunately, building a text editor is hard work, especially if one wants modern conveniences like
auto-complete.

I've postponed this project as one I shall undertake during the dark night of the soul every PhD student
encounters when writing their thesis. I plan to write a minimal lazily evaluated language, and great
tooling around that language as a means to while away time. But this is for future me!

## DnD

My partner got me into playing dungeons and dragons this year. I had a lot of fun
role-playing, and I plan to keep it up.

#### Nomic

[Nomic](https://en.wikipedia.org/wiki/Nomic)
is a neat game about changing the rules of the game. It takes a particular
type of person to enjoy it, I find, but if you have the type of people who
enjoy C++ template language lawyering, you'll definitely have a blast!

#### Continuum

I found [the continuum RPG](https://en.wikipedia.org/wiki/Continuum_(role-playing_game)),
a game about time travel, very unique due to the massive amount of lore that surrounds it,
and game mechanics which revolve around creating time paradoxes to deal damage
and game mechanics which revolve around creating time paradoxes to deal damage
to those stuck in it. It appears to have a reputation of being a game
that everybody loves but nobody plays.

#### Microscope

Microscope is a [game about storytelling](https://www.lamemage.com/microscope/). I unfortunately
was never able to host it properly because I was busy, and when I wasn't busy, I was unsure
of my abilities as dungeon master `:)` But it definitely is a game I'd be stoked to play.
I'm thinking of running it early 2022 with my group of friends.

## Odds and ends

##### The portal group

I joined the [portal group](https://theportal.group/) on discord, which consist of folks
who follow Eric Weinstein's philosophy, broadly speaking. The discord is a strange milieu. I hung
around because there were folks who knew a *lot* of math and physics. I wound up
watching the [geometric anatomy of theoretical physics](https://www.youtube.com/playlist?list=PLPH7f_7ZlzxTi6kS4vCmv4ZKm9u8g5yic)
lectures on YouTube by Frederic Schuller. The lectures are great expository material, though the hardness
ramps up like a cliff towards the end, because it feels like he stops proving things and begins to simply
state results. Regardless, I learnt a lot from it. I think my favourite takeaway was the
[Serre Swann theorem](https://en.wikipedia.org/wiki/Serre%E2%80%93Swan_theorem) which makes
very precise the idea that "projective modules are like vector bundles".

##### Differential geometry, again

Similarly, I wound up realizing that my differential geometry was in fact quite weak, in terms
of computing things in coordinates. So I wound up re-reading
[Do carmo: differential geometry of curves and surfaces](http://www2.ing.unipi.it/griff/files/dC.pdf), and I implemented
the coordinate based computations in Jupyter notebooks. For example,
here is a [Jupyter notebook that calculates covariant derivatives explicitly](https://github.com/bollu/notes/blob/master/diffgeo/geodesics.ipynb).
I found that this forced me to understand what was "really going on". I now know slogans like:

> The Covariant Derivative is the projection of the global derivative onto the tangent space.
> The Christoffel Symbols measure the second derivative (acceleration) along the tangent space.


I got interested in the work of [Elizabeth Polgreen](https://polgreen.github.io/). In particular,
I found the idea of being able to extend an SMT solver with arbitrary black-box functions pretty great.
I read their [technical report on SMT modulo oracles](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-10.pdf)
and [implemented the algorithm](https://github.com/bollu/notes/blob/master/smt-modulo-oracles.ipynb).


## What I want for next year

I wish to learn how to focus on one thing. I'm told that the point of a PhD is to become a world
expert on one topic. I don't have a good answer of what I wish to become a world expert on. I like
the varied interests I have, so it'll be interesting as to how this pans out. However, I have
decided to place all my bets on the Lean ecosystem, and I plan on spending most of 2022 writing
Lean code almost always (or perhaps even always). I wish to understand all parts of the Lean compiler,
from the frontend with its advanced macro system, to the middle end with its dependent typing,
to the back-end. In short, I want to become an expert on the Lean4 compiler `:)`. Let's see how
far along I get!


# Cayley hamilton for 2x2 matrices in sage via AG

- I want to 'implement' the zariski based proof for cayley hamilton in SAGE and show
  that it works by checking the computations scheme-theoretically.
- Let's work through the proof by hand. Take a 2x2 matrix `[a, b; c, d]`.
- The charpoly is `|[a-l, b; c, d-l]| = 0`, which is `p(l) = (a-l)(d-l) - bc = 0`
- Simplified, this is `p(l) = l^2 - (a + d) l + ad - bc = 0`.
- Now, let's plug in `l = [a, b; c, d]` to get the matrix eqn
- `[a, b; c, d]^2 - (a + d)[a, b; c, d] + [ad - bc, 0; 0, ad - bc] = 0`.
- The square is going to be `[a^2 + bc, ab + bd; ca + dc, cb + d^2]`, and a direct computation shows the three terms cancel to the zero matrix.
- Let `X` be the set of `(a, b, c, d)` such that the matrices `[a;b;c;d]` satisfy their own charpoly.
- Consider the subset `U` of the set `(a, b, c, d)` such that the matrix `[a;b;c;d]` has distinct eigenvalues.
- For any matrix with distinct eigenvalues, it is easy to show that they satisfy their charpoly.
- First see that diagonal matrices satisfy their charpoly by direct computation: `[a;0;0;b]` has eigenvalues `(a, b)`.
  Charpoly is `l^2 - l(a + b) + ab`. Plugging in the matrix, we get `[a^2;0;0;b^2] - [a(a+b);0;0;b(a+b)] + [ab;0;0;ab]` which cancels out to `0`.
- Then note that similar matrices have equal charpolys: `|lI - VAV'| = |V(lI - A)V'| = |V| |lI - A| |V'| = |lI - A|`
  (where `V'` denotes the inverse of `V`).
- Thus, this means that a matrix with distinct eigenvalues, which is similar to a diagonal matrix (by change of basis), has a charpoly that satisfies cayley hamilton.
- Thus, the set of matrices with distinct eigenvalues, `U` is a subset of `X`.

- However, it is not sufficient to show that the system of equations has an infinite set of solutions.
- For example, `xy = 0` has infinite solutions `(x=0, y=k)` and `(x=l, y=0)`, but that does not mean that it is identically zero.
- This is in stark contrast to the 1D case, where a polynomial `p(x) = 0` having infinite zeroes means that it must be the zero polynomial.
- Thus, we are forced to look deeper into the structure of solution sets of polynomials, and we need to come up with the notion of  irreducibility.
- See that the space `k^4` is irreducible, where `k` is the field from which we draw coefficients for our matrix.

- Next, we note that `X` is a closed subset of `k^4` since it's defined by the zero set of the polynomial equations.
- We note that `U` is an open subset of `k^4` since it's defined as the **non-zero set** of the discriminant of the charpoly! (ie, we want non-repeated roots)
- Also note that `U` is trivially non-empty, since it has eg. all the diagonal matrices with distinct eigenvalues.
- So we have a closed subset `X` of `k^4`, with a non-empty open subset `U` inside it.
- But now, note that the closure of `U` must lie in `X`, since `X` is a closed set, and the closure of a subset of a closed set lies inside that closed set.
- Then see that since the space is irreducible, the closure of `U` (an open) must be the whole space.
- This means that all matrices satisfy cayley hamilton!
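A quick way to sanity-check the computation (a sketch of mine, using sympy as a stand-in for SAGE; the symbols and the check are illustrative, not the scheme-theoretic version described above): plug the generic 2x2 matrix into its own charpoly and watch it cancel.

```python
import sympy as sp

# Hedged sketch: verify symbolically that a generic 2x2 matrix satisfies
# its own characteristic polynomial (the computation walked through above).
a, b, c, d, l = sp.symbols("a b c d l")
M = sp.Matrix([[a, b], [c, d]])

p = sp.expand((M - l * sp.eye(2)).det())
print(p)  # l**2 - (a + d)*l + a*d - b*c  (up to term order)

# plug the matrix into its own charpoly:
CH = M**2 - (a + d) * M + (a * d - b * c) * sp.eye(2)
print(CH.expand())  # the zero matrix
```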

# LispWorks config

- Looks like all emacs keybindings just work
- https://www.nicklevine.org/declarative/lectures/additional/key-binds.html


# Birkhoff Von Neumann theorem

- Lemma (used below): an $n \times n$ doubly stochastic matrix $A$ has a positive diagonal, i.e. some permutation $\sigma$ with $A[i][\sigma(i)] > 0$ for all $i$. Equivalently, its permanent is positive.
- Suppose not. Then by the Frobenius-König theorem, after permuting rows and columns, $A$ must have block structure with an all-zero $s \times r$ block:

```
      r     n-r
n-s [  B  |  C  ]
    [-----+-----]
s   [  0  |  D  ]
```

- where $r + s = n + 1$.
- Each of the first $r$ columns sums to $1$, and all of that mass lies in $B$, since the block below it is zero. So $\sum_{i, j} B[i][j] = r$.
- Each of the last $s$ rows sums to $1$, and all of that mass lies in $D$. So $\sum_{i, j} D[i][j] = s$.
- $B$ and $D$ occupy disjoint entries of $A$, so the total sum satisfies $n = \sum_{i,j} A[i][j] \geq r + s = n + 1$, which is a contradiction.

#### Proof 1 of BVN (Constructive)

- Let's take a `3x3` doubly stochastic matrix:

```
[#0.4  0.3  0.3]
[0.5   #0.2 0.3]
[0.1   0.5  #0.4]
```

- By the lemma above, since the permanent is greater than zero, the bipartite graph of positive entries has a perfect matching.
- Suppose we know how to find a perfect matching, which we know exists (use flows, or the Hungarian algorithm).
- Take the identity matching as the perfect matching (`1-1`, `2-2`, `3-3`), marked with `#` above.
- Take the minimum of the matched entries, `min(0.4, 0.2, 0.4) = 0.2`. So we write the original matrix as:

```
0.2 [1 0 0]    [0.2 0.3 0.3]
    [0 1 0] +  [0.5 0   0.3]
    [0 0 1]    [0.1 0.5 0.2]
```

- Second matrix has row/col sums of `0.8`. Rescale by dividing by `0.8` to get another doubly stochastic matrix.
- Then done by induction on the number of zeroes amongst the matrix entries.

```
[0.2 0.3 0.3]
[0.5 0   0.3]
[0.1 0.5 0.2]
```

- (2) Take the matching given by:

```
[#0.2  0.3   0.3]
[0.5   0    #0.3]
[0.1  #0.5   0.2]
```

- (2) This can be written as:

```
   [1 0 0]   [0    0.3   0.3]
0.2[0 0 1] + [0.5  0     0.1]
   [0 1 0]   [0.1  0.3   0.2]
```

- And so on.
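Here is a minimal sketch of this peeling procedure in Python (my own illustration; it uses scipy's `linear_sum_assignment` to find a perfect matching supported on the positive entries, one of the "flows or Hungarian" options mentioned above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff_decompose(A, tol=1e-9):
    """Peel a doubly stochastic matrix into a convex combination of
    permutation matrices, following the constructive proof sketched above."""
    A = np.array(A, dtype=float)
    pieces = []
    while A.max() > tol:
        # perfect matching supported on positive entries: maximize the
        # number of strictly positive entries chosen by the assignment.
        rows, cols = linear_sum_assignment((A > tol).astype(float), maximize=True)
        assert all(A[r, c] > tol for r, c in zip(rows, cols))
        theta = min(A[r, c] for r, c in zip(rows, cols))
        P = np.zeros_like(A)
        P[rows, cols] = 1.0
        pieces.append((theta, P))
        A = A - theta * P  # remainder still has equal row/col sums; recurse
    return pieces

A = np.array([[0.4, 0.3, 0.3],
              [0.5, 0.2, 0.3],
              [0.1, 0.5, 0.4]])
pieces = birkhoff_decompose(A)
print(sum(t for t, _ in pieces))      # ~1.0
print(sum(t * P for t, P in pieces))  # reconstructs A (up to tol)
```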

##### Nice method to find permutation that makes progress
- `NxN` doubly stochastic. We must have a permutation that forms a perfect matching. How to find it?
- If all elements are `0/1`, then it's already a permutation.
- Otherwise, find a row which has an element `a` strictly between `0` and `1`. Then the same row must have ANOTHER element `b` strictly between `0` and `1` (since the row sums to `1`).
- Then the column of this element `b` has another element `c` strictly between `0` and `1`. Keep doing this until you find a loop (cycle).
- Then find the minimum of these elements, call it $\epsilon$.
- Subtract $\epsilon$ at the element that attained the minimum. Then add $\epsilon$ to the element that was in the same row (column). Then
  continue, alternating subtracting and adding $\epsilon$ around the loop.



# Latin Square

- A latin square of order $N$ is an $N \times N$ array in which each row and column is
  a permutation of $\{ a_1, a_2, \dots, a_n \}$.
- Example latin square (to show that these exist):

```
[1 2 3 4]
[2 3 4 1]
[3 4 1 2]
[4 1 2 3]
```

- A $k \times n$ ($k < n$) latin rectangle is a $k \times n$ matrix
  with elements $\{ a_1, a_2, \dots, a_n \}$ such that
  in each row and column, no element is repeated.
- Can we always complete a Latin rectangle into a Latin square? (YES!)

#### Lemma

- Let $A$ be a $k \times n$ latin rectangle with $k \leq n - 1$.
- We can always augment $A$ into a $(k + 1) \times n$ latin rectangle.
- If we think of it as a set system, then we can think of each column as telling us the
  missing sets. Example:

```
[1   2   3   4]
[4   1   2   3]
{2} {3} {1} {1}
{3} {4} {4} {2}
```

- Let's think of the subsets as a 0/1 matrix, encoded as:

```
[0 1 1 0] {2, 3}
[0 0 1 1] {3, 4}
[1 0 0 1] {1, 4}
[1 1 0 0] {1, 2}
```

- It's clear that each row will have sum $n - k = 2$, since each set of missing elements has $n - k$ elements.
- We claim that each column also has sum $2$.
- For example, the first column has column sum $2$. This is because in the original
  rectangle, the element $1$ appears in $k$ columns, and is therefore missing from exactly $n - k = 2$ columns.
- We can compute a perfect matching on this 0/1 matrix, and the matching tells us how to extend
  the latin rectangle by one row with no repeats.
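A small sketch of this augmentation step (my own code, using a textbook augmenting-path matching rather than anything fancy): match each column to a symbol missing from it, and append the matched symbols as a new row.

```python
def extend_latin_rectangle(rect, n):
    """Add one row to a k x n latin rectangle with entries 1..n, by finding a
    perfect matching between columns and the symbols missing from them
    (it exists by the counting argument above / Hall's theorem)."""
    missing = [set(range(1, n + 1)) - {row[j] for row in rect} for j in range(n)]
    col_of = {}  # symbol -> column it is currently matched to

    def augment(j, seen):
        # Kuhn's augmenting-path step: try to match column j to some symbol.
        for sym in missing[j]:
            if sym in seen:
                continue
            seen.add(sym)
            if sym not in col_of or augment(col_of[sym], seen):
                col_of[sym] = j
                return True
        return False

    for j in range(n):
        assert augment(j, set())
    new_row = [None] * n
    for sym, j in col_of.items():
        new_row[j] = sym
    return rect + [new_row]

rect = [[1, 2, 3, 4],
        [4, 1, 2, 3]]
while len(rect) < 4:
    rect = extend_latin_rectangle(rect, 4)
print(rect)  # a completed 4x4 latin square
```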

# Assignment Problem

- Let $A$ be an $n \times n$ non-negative matrix.
- A permutation $\sigma$ of $[1, \dots, n]$ is called a **simple assignment** if $A[i][\sigma(i)]$ is positive
  for all $i$.
- A permutation $\sigma$ is called as an **optimal assignment** if $\sum_i A[i][\sigma(i)]$ is
  **minimized** over all permutations in $S_n$. (weird? Don't we usually take max?)
- Matrix $A[p][j]$ is the cost of assigning person $p$ the job $j$. We want to minimize cost.

#### 4x4 example

- Let the cost be:

```
[10 12 19 11]
[5  10 07 08]
[12 14 13 11]
[8  15 11  9]
```

- First find some numbers $u[i]$ and $v[j]$ (these correspond to dual variables in the LP) such that $a[i][j] \geq u[i] + v[j]$ for all $i, j$.

```
     v[1] v[2] v[3] v[4]
u[1] [10  12  19   11]
u[2] [5   10  07   08]
u[3] [12  14  13   11]
u[4] [8   15  11    9]
```

- We can start by setting $u[r] = 0$, $v[c] = \min_r a[r][c]$.
  (Can also take $v[c] = 0$ but this is inefficient)
- Circle those positions where equality holds. This becomes:

```
     v[1] v[2] v[3] v[4]
u[1] [10  12   19    11]
u[2] [5#  10#  07#   08#]
u[3] [12  14   13    11]
u[4] [8   15   11     9]
```

- Since $a[i][j] \geq u[i] + v[j]$, we have $a[i][\sigma(i)] \geq u[i] + v[\sigma(i)]$ for any permutation $\sigma$.
- This means $\sum_i a[i][\sigma(i)] \geq \sum_i (u[i] + v[\sigma(i)]) = \sum_i u[i] + \sum_i v[i]$ (the summation can be rearranged).
- Now think of the bipartite graph where the circled positions correspond to $1$, and the rest correspond to $0$s.
  If we have a perfect matching amongst the circled positions, then that permutation attains the lower bound $\sum_i u[i] + \sum_i v[i]$, so it is an optimal assignment.
- If the circled positions DO NOT have a perfect matching, then by Frobenius-König, we can write the matrix as:

```
    s  n-s
n-r[B | C]
r  [X | D]

r + s = n + 1
```

- where in $X$, no entry is circled; circled entries are exactly those with zero "reduced cost" $a[i][j] - u[i] - v[j]$.
- We add $1$ to $u[i]$ for the last $r$ rows (the rows of $X$ and $D$), and subtract $1$ from $v[j]$ for the last $n - s$ columns (the columns of $C$ and $D$). That is:

```
    -1
   B C
+1 X D
```

- Nothing happens to $B$.
- In $C$, $v$ goes down, so the inequality $a[i][j] \geq u[i] + v[j]$ continues to hold.
- In $X$, there are no circles, which means every inequality was strict, so we can afford to add 1s.
- In $D$, $u$ goes up by $1$ and $v$ goes down by $1$, so $u[i] + v[j]$ is unchanged and the inequality is untouched.
- The net change to the dual objective $\sum_i u[i] + \sum_j v[j]$ is $+1 \cdot r - 1 \cdot (n - s) = r + s - n = (n+1) - n = 1$, so the lower bound strictly improves.
- The nonconstructive part is decomposing the matrix into $[B; C, X, D]$.

#### Hungarian algorithm

- Take the minimum in each row, subtract it from that row.
- Take the minimum in each column, subtract it from that column.
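For reference, scipy ships a Hungarian-style solver; here is a small check of mine on the 4x4 cost matrix above (the value 38 is just what this particular instance works out to):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# The 4x4 cost matrix from the example above. linear_sum_assignment minimizes
# the total cost over all permutations (a Hungarian-style algorithm).
cost = np.array([[10, 12, 19, 11],
                 [ 5, 10,  7,  8],
                 [12, 14, 13, 11],
                 [ 8, 15, 11,  9]])
rows, cols = linear_sum_assignment(cost)
print(list(zip(rows.tolist(), cols.tolist())))  # the optimal assignment
print(cost[rows, cols].sum())                   # 38 for this instance
```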


# Interpolating homotopies

- Suppose we have a point $kp + (1-k)q$ on the segment $p\text{--}q$, and a contractible space $X$ which contracts to a point
  $c$ via a homotopy $\theta$, where the image of $p$ is $x$ and the image of $q$ is $y$. Then send the above point to
  $\theta(x, 2k)$ for $k \leq 1/2$, and to $\theta(y, 1 - 2(k - 1/2)) = \theta(y, 2 - 2k)$ for $k \geq 1/2$.
- This interpolates $p\text{--}q$ to $x\text{--}c\text{--}y$ by using barycentric coordinates to interpolate along the homotopy.

# Example where MIP shows extra power over IP

- God tells us the chess board is a draw. We can't verify this.
- If there are two Gods, we can make one God play against the other. So if one says draw and the other says win, we can have them play and find out who is lying!
- Hence, MIP has more power than IP? (Intuitively, at least.)

# Lazy reversible computation?
- Lazy programs are hard to analyze because we need to reason about them backwards.
- Suppose we limit ourselves to reversible programs. Does it then become easy?

# Theorem coverage as an analogue to code coverage
- Theorem coverage: how many lines of code are covered by correctness theorems?

# Lazy GPU programming

- All laziness is a program analysis problem, where we need to strictify.
- Lazy vectorization is a program analysis problem where we need to find
  "strict blocks/ strict chains". Something like "largest block of values that
  can be forced at once". Seems coinductive?
- Efficiency of lazy program is that of clairvoyant call by value, so we need to know how to force.
- In the PRAM model, efficiency of a parallel lazy program is that of clairvoyant
  call by parallel or something. We need to know how to run in parallel, such
  that if one diverges, then all diverge. This means it's safe to run them
  together!
- What is parallel STG?
- PRAM: try to parallelize writes as much as possible
- PSTG: try to parallelize forcing as much as possible
- `Reads (free) ~ forces (conflicts in ||)`
- `Write (conflict in ||) ~ create new data (free)`
- What are equivalents of common pram models?
- Force is Boolean: either returns or does not return
- We need a finer version. Something like returns in k cycles or does not return?
- Old forced values: forced values with Twait = Infinity; old unobserved values = forced values with Twait = -1
- Think of call by push value, but can allocate forces and call "tick" which ticks the clock. The Twait is clocked wrt this tick.
- Tick controls live ranges. Maybe obviates GC.
- Tick 1 is expected to be known/forced.
- Optimize in space-time? Looking up a recurrence versus computing a
  recurrence. One is zero space infinite time, other is zero time infinite
  space.
- Another way to think about it: application says how many people need to ask for thunk to get value. Unused values say infinity, used values say zero
- Maybe think of these as deadlines for the compiler to meet? So it's telling
  the compiler to guarantee access in a certain number of ticks. This gives
  control over (abstract) time, like imperative gives control over abstract
  space?
- TARDIS autodiff is key example. As is fib list. Maybe frac.
- Thinking about design in the data constructor side:
- Twait = 0 in data structure means is present at compile time. Twait = 1 is strict. Twait = infty is function pointer. What is Twait = 2? Mu
- Can fuse kernels for all computations in the same parallel force. If one of
  them gets stuck, all of them get stuck. So parallel force is a syntactic way
  to ask for kernel fusion.
- Can we use UB to express things like "this list will be finite, thus map can be safely parallelised" or something?
- Have quantitative: `0,1,fin,inf`?


# Backward dataflow and continuations

- Forward dataflow deals with facts _thus far_.
- Backward dataflow deals with facts about _the future_, or the _rest of the program_.
  Thus, in a real sense, backward dataflow concerns itself with _continuations_!

# The tyranny of structurelessness

- [THE TYRANNY of STRUCTURELESSNESS by Jo Freeman aka Joreen](https://www.jofreeman.com/joreen/tyranny.htm)

> "Elitist" is probably the most abused word in the women's liberation movement.
> It is used as frequently, and for the same reasons, as "pinko" was used in the
> fifties. It is rarely used correctly. Within the movement it commonly refers to
> individuals, though the personal characteristics and activities of those to
> whom it is directed may differ widely: An individual, as an individual can
> never be an elitist, because the only proper application of the term "elite" is
> to groups. Any individual, regardless of how well-known that person may be, can
> never be an elite.

> The inevitably elitist and exclusive nature of informal communication
> networks of friends is neither a new phenomenon characteristic of the women's
> movement nor a phenomenon new to women. Such informal relationships have
> excluded women for centuries from participating in integrated groups of which
> they were a part. In any profession or organization these networks have
> created the "locker room" mentality and the "old school" ties which have
> effectively prevented women as a group (as well as some men individually)
> from having equal access to the sources of power or social reward.

> Although this dissection of the process of elite formation within small groups
> has been critical in perspective, it is not made in the belief that these
> informal structures are inevitably bad -- merely inevitable. All groups create
> informal structures as a result of interaction patterns among the members of
> the group. Such informal structures can do very useful things But only
> Unstructured groups are totally governed by them. When informal elites are
> combined with a myth of "structurelessness," there can be no attempt to put
> limits on the use of power. It becomes capricious.


# Simple Sabotage Field Manual

- (1) Insist on doing everything through "channels." Never permit short-cuts to
  be taken in order to expedite decisions.
- (2) Make "speeches." Talk as frequently as possible and at great length.
  Illustrate your "points" by long anecdotes and accounts of personal
  experiences. Never hesitate to make a few appropriate "patriotic" comments.
- (3) When possible, refer all matters to committees, for "further study and
  consideration." Attempt to make the committees as large as possible - never
  less than five.
- (4) Bring up irrelevant issues as frequently as possible.
- (5) Haggle over precise wordings of communications, minutes, resolutions.
- (6) Refer back to matters decided upon at the last meeting and attempt to
  re-open the question of the advisability of that decision.
- (7) Advocate "caution". Be "reasonable" and urge your fellow-conferees to be
  "reasonable" and avoid haste which might result in embarrassments or
  difficulties later on.
- (8) Be worried about the propriety of any decision; raise the question of
  whether such action as is contemplated lies within the jurisdiction of the
  group or whether it might conflict with the policy of some higher echelon.



# Counting permutations with #MAXSAT

Using #MAXSAT, you can count permutations, weird. Build a complete bipartite
graph K(n,n), and then connect left to source, right to sink with unit
capacity. Each solution to the flow problem is an assignment / permutation.



# Coloring `cat` output with `supercat`
- use `spc -e 'error, red' ` to color all occurrences of string `error` with `red`.
- I use this in [lean-mlir]() to get colored output.

# Reader monoid needs a hopf algebra?!
- 5.1, eg (iii)
- We actually get a free comonoid in a CCC.
- Having a splittable random supply is like having a markov category with a comonoid in it.

# Monads mnemonic

- multiplication is $\mu$ because Mu.
- return is $\eta$ because return is unit is Yeta.

# Card stacking

> It's not about the idea, it's about the execution
- The idea is indeed pedestrian: let's stack cards!
- The execution is awesome.
- [Link to homepage of insane card stacker](https://www.cardstacker.com/)



# SSH into google cloud
- Setup firewall rules that enable all SSH
- Add SSH key into `metadata` of project.
- ssh `<ssh-key-username>@<external-ip>` ought to just work.



# Comma & Semicolon in index notation

> A comma before an index indicates partial differentiation with respect to that index.
> A semicolon indicates covariant differentiation.

- Thus, the divergence may be written as `v_i,i`

# Spin groups

- Spin group is a 2 to 1 cover of $SO(n)$.
- We claim that for 3 dimensions, $Spin(3) \simeq SU(2)$. So we should have a 2 to 1 homomorphism $\rho: SU(2) \to SO(3)$.
- We want to write the group in some computational way. Let's use the adjoint action (how the lie group acts on its own lie algebra).
- What is the lie algebra $su(2)$? It's trace-free hermitian.
- Why? Physicist: write $U = I + i \epsilon H$; then $UU^\dagger = I$ gives $(I + i \epsilon H)(I - i \epsilon H^\dagger) = I$, which to first order gives $H = H^\dagger$.
- Also the determinant condition gives us $det(1 + i \epsilon H) = 1$ which means $1 + tr(i \epsilon H) = 1$, or $tr(H) = 0$.
- The adjoint action is $SU(2) \to Aut(H)$ given by $U \mapsto \lambda X. ad_U X$, which is $\lambda X. U X U^{-1}$.
  By unitarity, this is $U \mapsto \lambda X. U X U^{\dagger}$.
- $SO(3)$ acts on $\mathbb R^3$. The trick is to take $\mathbb R^3$ and compare it to the lie algebra $su(2)$
  which has 3 dimensions, spanned by pauli matrices.
- **Conjecture:** There is an isomorphism $\mathbb R^3 \simeq H$ as an inner product space for a custom inner product
  $\langle, \rangle$ on $H$.
- [Reference](https://www.youtube.com/watch?v=Way8FfcMpf0&list=PLPH7f_7ZlzxTi6kS4vCmv4ZKm9u8g5yic&index=27)
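A numerical illustration of the 2-to-1 map (a sketch of mine; the formula $\rho(U)_{ij} = \frac{1}{2} tr(\sigma_i U \sigma_j U^\dagger)$ is one way to write the adjoint action in the Pauli basis):

```python
import numpy as np

# The Pauli matrices span the trace-free Hermitian 2x2 matrices.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

def rho(U):
    """Adjoint action of U in SU(2), written as a 3x3 matrix in the Pauli basis."""
    return np.array([[0.5 * np.trace(si @ U @ sj @ U.conj().T).real
                      for sj in paulis] for si in paulis])

theta = 0.7
U = np.diag([np.exp(1j * theta / 2), np.exp(-1j * theta / 2)])  # an element of SU(2)
R = rho(U)
print(np.round(R, 3))                                                       # a rotation about the z-axis
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))   # R lands in SO(3)
print(np.allclose(rho(U), rho(-U)))                                         # 2-to-1: U and -U give the same rotation
```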

# How to write in biblical style?

- I'd like to write in the style of the bible!

# Undefined behaviour is like compactification [TODO]

- We compactify something like $\mathbb N$ into $\mathbb N^\infty$.
- What does Stone Cech give us?
- Read abstract stone duality!

# God of Arepo


> One day, a farmer named Arepo built a temple at the edge of his field. It was a humble thing, with stone walls and a thatch roof. At the center of the room Arepo stacked some stones to make a cairn. Two days later, a god moved into Arepo's temple.
> "I hope you are a harvest god," Arepo said, as he set upon the altar two stalks of wheat which he burned. "It would be nice."
>
> He looked down upon the ash that now colored the stone. "I know this isn't much of a sacrifice, but I hope this pleases you. It'd be nice to think there is a god looking after me."
>
> The next day, he left a pair of figs. The day after that, he spent ten minutes in silent prayer. On the third day, the god spoke up.
>
> "You should go to a temple in the city," said a hollow voice. Arepo cocked his head at the otherworldly sound, because it was strangely familiar. The god's voice was not unlike the rustling of wheat, or the little squeaks of fieldmice that run through the grass. "Go to a real temple. Find a real god to bless you, for I am not much myself, but perhaps I may put in a good word?"
>
> The god plucked a stone from the floor and sighed, "Forgive me, I meant not to be rude. I appreciate your temple, and find it cozy and warm. I appreciate your worship, and your offerings, but alas it shall come to naught."
>
> "Already I have received more than I had expected," Arepo said, "Tell me, with whom do I treat? What are you the patron god of?"
>
> The god let the stone he held fall to the floor, "I am of the fallen leaves, and the worms that churn beneath the ground. I am the boundary of the forest and the field, and the first hint of frost before the snow falls," the god paused to touch Arepo's altar, "And the skin of an apple as it yields beneath your teeth. I am the god of a dozen different nothings, scraps that lead to rot, and momentary glimpses." He turned his gaze to Arepo, "I am a change in the air, before the winds blow."
>
> The god shook his head, "I should not have come, for you cannot worship me. Save your prayers for the things beyond your control, good farmer," the god turned away, "You should pray to a greater thing than I,"
>
> Arepo reached out to stay the entity, and laid his hand upon the god's willowy shoulder. "Please, stay."
>
> The god turned his black eyes upon Arepo, but found only stedfast devotion. "This is your temple, I would be honored if you would stay." The god lowered himself to the floor. Arepo joined him. The two said nothing more for a great long while, until Arepo's fellow came calling.
>
> The god watched his worshiper depart, as the man's warmth radiated across the entity's skin.
>
> Next morning, Arepo said a prayer before his morning work. Later, he and the god contemplated the trees. Days passed, and then weeks. In this time the god had come to enjoy the familiarity of Arepo's presence. And then, there came a menacing presence. A terrible compulsion came upon the god, and he bid the air change, for a storm was coming. Terrified, the little god went to meet the god of storms to plead for gentleness, but it was no use.
>
> Arepo's fields became flooded, as the winds tore the tiles from his roof and set his olive tree to cinder. Next day, Arepo and his fellows walked among the wheat, salvaging what they could. At the field's edge, the little temple was ruined. After his work was done for the day, Arepo gathered up the stones and pieced them back together. "Please do not labor," said the god, "I could not protect you from the god of storms, and so I am unworthy of your temple."
>
> "I'm afraid I don't have an offering today," Arepo said, "But I think I can rebuild your temple tomorrow, how about that?"
>
> The god watched Arepo retire, and then sat miserably amongst the ruined stones of his little temple.
>
> Arepo made good on his promise, and did indeed rebuild the god's temple. But now it bore layered walls of stone, and a sturdy roof of woven twigs. Watching the man work, Arepo's neighbors chuckled as they passed by, but their children were kinder, for they left gifts of fruit and flowers.
>
> The following year was not so kind, as the goddess of harvest withdrew her bounty. The little god went to her and passionately pleaded for mercy, but she dismissed him. Arepo's fields sprouted thin and brittle, and everywhere there were hungry people with haunted eyes that searched in vain for the kindness of the gods.
>
> Arepo entered the temple and looked upon the wilted flowers and the shriveled fruit. He murmured a prayer.
>
> "I could not help you," said the god. "I am only a burden to you,"
>
> "You are my friend," said Arepo.
>
> "You cannot eat friendship!" The god retorted.
>
> "No, but I can give it." Arepo replied.
>
> And so the man set his hand upon the altar and spent the evening lost in contemplation with his god.
>
> But the god knew there was another god who would soon visit, and later that year came the god of war. Arepo's god did what he could. He went out to meet the hateful visage of the armored god, but like the others, war ignored the little god's pleas. And so Arepo's god returned to his temple to wait for his friend. After a worrying amount of time, Arepo came stumbling back, his hand pressed to his gut, anointing the holy site with his blood.
>
> Behind him, his fields burned.
>
> "I am so sorry, Arepo," said the god, "My friend. My only friend."
>
> "Shush," said Arepo, tasting his own blood. He propped himself up against the temple that he made, "Tell me, my friend, what sort of god are you?"
>
> The god reached out to his friend and lowered him to the cool soil, "I'm of the falling leaves," the god said, as he conjured an image of them. "And the worms that churn beneath the earth. The boundary of the forest and the field. The first hint of frost before the first snow. The skin of an apple as it yields beneath your teeth."
>
> Arepo smiled as the god spoke. "I am the god of a dozen different nothings, the god of the petals in bloom that lead to rot, and of momentary glimpses, and a change in the air-" the god looked down upon his friend, "Before the winds blow everything away."
>
> "Beautiful," Arepo said, his blood now staining the stones; seeping into the very foundations of his temple. "All of them, beautiful,"
>
> "When the storm came, I could not save your wheat."
>
> "Yes," Arepo said.
>
> "When the harvest failed, I could not feed you."
>
> "Yes,"
>
> Tears blurred the god's eyes, "When war came, I could not protect you."
>
> "My friend, think not yourself useless, for you are the god of something very useful,"
>
> "What?"
>
> "You are my god. The god of Arepo."
>
> And with that, Arepo the sower lay his head down upon the stone and returned home to his god. At the archway, the god of war appeared. The entity looked less imposing now, for his armor had fallen onto the blackened fields, revealing a gaunt and scarred form.
>
> Dark eyes flashed out from within the temple, 'Are you happy with your work?' They seemed to say. The god of war bowed his head, as the god of Arepo felt the presence of the greater pantheon appear upon the blackened fields.
>
> "They come to pay homage to the farmer," war said, and as the many gods assembled near the archway the god of war took up his sword to dig into the earth beneath Arepo's altar. The goddess of the harvest took Arepo's body and blessed it, before the god of storms lay the farmer in his grave.
>
> "Who are these beings, these men," said war, "Who would pray to a god that cannot grant wishes nor bless upon them good fortune? Who would maintain a temple and bring offerings for nothing in return? Who would share their company and meditate with such a fruitless deity?"
>
> The god rose, went to the archway; "What wonderful, foolish, virtuous, hopeless creatures, humans are."
>
> The god of Arepo watched the gods file out, only to be replaced by others who came to pay their respects to the humble farmer. At length only the god of storms lingered. The god of Arepo looked to him, asked; "Why do you linger? What was this man to you?"
>
> "He asked not, but gave." And with that, the grey entity departed.
>
> The god of Arepo then sat alone. Oft did he remain isolated; huddled in his home as the world around him healed from the trauma of war. Years passed, he had no idea how many, but one day the god was stirred from his recollections by a group of children as they came to lay fresh flowers at the temple door.
>
> And so the god painted the sunset with yellow leaves, and enticed the worms to dance in their soil. He flourished the boundary between the forest and the field with blossoms and berries, and christened the air with a crisp chill before the winter came. And come the spring, he ripened the apples with crisp red freckles that break beneath sinking teeth, and a dozen other nothings, in memory of a man who once praised his work with his dying breath.
>
> "Hello," said a voice.
>
> The god turned to find a young man at the archway, "Forgive me, I hope I am not intruding."
>
> "Hello, please come in."
>
> The man smiled as he entered, enchanted by the god's melodic voice. "I heard tell of your temple, and so I have come from many miles away. Might I ask, what are you the god of?"
>
> The god of Arepo smiled warmly as he set his hand upon his altar, "I am the god of every humble beauty in the world."
> -by Chris Sawyer


# Classification of lie algebras, dynkin diagrams

#### Classification of complex lie algebras
- $L$ is a complex vector space with a lie bracket $[., .]$.
- For example, the Lie algebra of a complex Lie group $G$ is such an $L$. (For a complex manifold, the transition functions are holomorphic.)

#### Theorem (Levi)

- Every finite dimensional complex Lie algebra $(L, [.,.])$ can be decomposed as $L = R \oplus_s (L_1 \oplus \dots \oplus L_n)$, where $\oplus$
  is the direct sum and $\oplus_s$ is the semidirect sum.
-  $R$ is a solvable lie algebra.
- To define solvable, define $R_0 = R$, $R_1 = [R_0, R_0]$, $R_2 = [R_1, R_1]$, that is, $R_2 = [[R, R], [R, R]]$.
- We have that $R_{i+1} \subseteq R_i$.
- If this sequence eventually terminates, ie, there is an $n$ such that $R_n = \{ 0 \}$, then $R$ is solvable.
- In the decomposition of $L$, the $R$ is the solvable part.
- We have $L_1, \dots, L_n$ which are simple. This means that $L_i$ is non-abelian, and $L_i$ contains no non-trivial
  ideals. An ideal of a lie algebra is a vector subspace $I \subseteq L$ such that $[I, L] \subseteq I$. (It's like a ring ideal, except with the lie bracket).
- The direct sum $L_1 \oplus L_2$ of lie algebras is the direct sum of vector spaces with lie bracket in the bigger space given by
  $[L_1, L_2] = 0$.
- The semidirect sum $R \oplus_s L_2$ as a vector space is $R \oplus L_2$. The lie bracket is given by
  $[R, L_2] \subseteq R$, so $R$ is an ideal. (This looks like internal semidirect product).

#### Remarks
- It is very hard to classify solvable Lie algebras.
- A lie algebra that has no solvable part, ie can be written as $L = L_1 \oplus \dots \oplus L_n$, is called **semi-simple**.
- It is possible to classify the simple Lie algebras.
- We focus on the simple/semi-simple Lie algebras. Simple Lie algebras are the independent building blocks we classify.

#### Adjoint Map
- Let $(L, [., .])$ be a complex lie algebra. Let $h \in L$ be an element of the lie algebra.
- Define $ad(h): L \to L$ as $ad(h)(l) \equiv [h, l]$. Can be written as $ad(h) \equiv [h, -]$. This is the adjoint map wrt $h \in L$.

#### Killing form
- $K: L \times L \to \mathbb C$ is a bilinear map, defined as $K(a, b) \equiv tr(ad(a) \circ ad(b))$.
- See that $ad(a) \circ ad(b): L \to L$. the trace will be complex because $L$ is complex.
- Since $L$ is finite dimensional vector space, $tr$ is cyclic. So $tr(ad(a) \circ ad(b)) = tr(ad(b) \circ ad(a))$. This means
  that $K(a, b) = K(b, a)$, or that the killing form is symmetric!
- **Cartan criterion:** $L$ is semi-simple iff the killing form $K$ is non-degenerate. That is, if $K(a, b) = 0$ for all $b \in L$, then $a = 0$.

#### Calculation wrt basis: $ad$ map.
- Consider for actual calculation the components of $ad(h)$ and $K$ with respect to a basis $E_1, \dots, E_{dim L}$.
- Write down a dual basis $\epsilon^1, \dots, \epsilon^{\dim L}$.
- $ad(E_i)^j_k \equiv \epsilon^j (ad(E_i)(E_k))$.
- We know that $ad(E_i)(E_k) = [E_i, E_k]$ by definition.
- We write $[E_i, E_k] = C^m_{ik} E_m$ where the $C^m_{ik}$ are the structure constants.
- This gives us $ad(E_i)^j_k = \epsilon^j (C^m_{ik} E_m)$
- Pull out structure coefficient to get $ad(E_i)^j_k = C^m_{ik} \epsilon^j (E_m)$
- Use the fact that $E_m$ and $\epsilon^j$ are dual to get $ad(E_i)^j_k = C^m_{ik} \delta^j_m$
- Contract over repeated index $m$ to get $m=j$: $ad(E_i)^j_k = C^j_{ik}$
- This makes sense, since the $ad$ map is just a fancy way to write the bracket in coordinate free fashion.

#### Calculation wrt basis: Killing form.
- $K(E_i, E_j) = tr(ad(E_i) \circ ad(E_j))$
- Plug in $ad$ to become $K(E_i, E_j) = tr(C^l_{im} C^m_{jk})$ [see that the thing inside the trace is a matrix]
- Execute trace by setting $l = k = o$. This gives us: $K(E_i, E_j) = C^o_{im} C^m_{jo}$. This is also easy to calculate from
  structure coefficients.
- Iff this matrix is non-degenerate, then the lie-algebra is semi-simple.
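A tiny numerical version of this recipe (my own sketch), for $\mathfrak{su}(2)$ presented with structure constants $C^k_{ij} = \epsilon_{ijk}$ (so $[E_i, E_j] = \epsilon_{ijk} E_k$): build the $ad$ matrices, take traces, and check non-degeneracy.

```python
import numpy as np

def levi_civita(i, j, k):
    return int((i - j) * (j - k) * (k - i) / 2)  # valid for indices 0, 1, 2

n = 3
# structure constants C[i][j][k] = C^k_{ij} for [E_i, E_j] = eps_{ijk} E_k
C = np.array([[[levi_civita(i, j, k) for k in range(n)]
               for j in range(n)] for i in range(n)])

# ad(E_i)^j_k = C^j_{ik}: rows indexed by j, columns by k
ad = [C[i].T for i in range(n)]

# Killing form K_ij = tr(ad(E_i) ad(E_j))
K = np.array([[np.trace(ad[i] @ ad[j]) for j in range(n)] for i in range(n)])
print(K)                       # -2 * identity
print(np.linalg.det(K) != 0)   # non-degenerate, so semi-simple by the Cartan criterion
```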

#### $ad$ is anti-symmetric with respect to the killing form.
- Recall that $\phi$ is called as an anti-symmetric map wrt a non-degenerate bilinear form $B$ iff
  $B(\phi(v), w) = - B(v, \phi(w))$.
- Fact: $ad(h)$ is anti-symmetric wrt killing form. For killing form to be non-degenerate we need $L$ to be semisimple.

#### Key Definition for classification: Cartan subalgebra
- If $(L, [.,.])$ is a lie algebra, then the cartan subalgebra denoted by $H$ ($C$ is already taken for structure coeff.)
  is a vector space, and is a maximal subalgebra of $L$ such that there exists a basis $h_1, \dots, h_m$ of $H$
  that can be extended to a basis of $L$: $h_1, \dots, h_m, e_1, \dots, e_{dim(L)-m}$ such that the extension vectors
  are eigenvectors for any $ad(h)$ for $h \in H$.
- This means that $ad(h)(e_\alpha) = \lambda_\alpha(h) e_\alpha$.
- This can be written as $[h, e_\alpha] = \lambda_\alpha(h) e_\alpha$.
- Does this exist?

#### Existence of cartan subalgebra
- **Thm** Any finite dimensional lie algebra possesses a cartan subalgebra.
- If $L$ is simple, then $H$ is _abelian_. That is, $[H, H] = 0$.
- Thus, the $ad(h)$ are simultaneously diagonalized by the $e_\alpha$ since they all commute.

#### Analysis of Cartan subalgebra.
- $ad(h)(e_\alpha) = \lambda_\alpha(h) e_\alpha$.
- $[h, e_\alpha] = \lambda_\alpha(h) e_\alpha$.
- Since the LHS is linear in $h$, the RHS must also be linear in $h$. But in the RHS, it is only $\lambda_\alpha(h)$ that depends
  on $h$.
- This means that $\lambda_\alpha: H \to \mathbb C$ is a _linear map_!
- This is to say that $\lambda_\alpha \in H^*$ is an element of the dual space!
- The elements $\lambda_1, \lambda_2, \dots, \lambda_{\dim L - m}$ are called the _roots_ of the Lie algebra.
- This set is called $\Phi \equiv \{ \lambda_1, \dots, \lambda_{\dim L - m} \}$, the _root set_ of the Lie algebra.

#### Root set is closed under negation

- We found that $ad(h)$ is antisymmetric with respect to killing form.
- Thus, if $\lambda \in \phi$ is a root, $-\lambda$ is also a root (somehow).

#### Root set is not linearly independent
- We can show that $\Phi$ is not linearly independent.

#### Fundamental roots
- Subset of roots $\Pi \subseteq \Phi$ such that $\Pi$ is linearly independent.
- Let the elements of $\Pi$ be called $\pi_1, \dots, \pi_r$.
- We are saying that $\forall \lambda \in \Phi, \exists n_1, \dots, n_r \in \mathbb N, \exists \epsilon \in \{ -1, +1 \}$
  such that $\lambda = \epsilon \sum_{i=1}^r n_i \pi_i$.
- That is, we can generate the $\lambda$ as natural number combinations of $\pi_i$, upto an overall global sign factor.
- Fact: such a set of fundamental roots can always be found.


#### complex span of fundamental roots is the dual of the cartan subalgebra
- In symbols, this is $span_{\mathbb C}(\Pi) = H^*$.
- The full root set $\Phi$ is not a basis of $H^*$, because it is not linearly independent; the fundamental roots $\Pi$ are.
- $\Pi$ is not unique (just as a basis of a vector space is not unique).

#### Defn: $H_{\mathbb R}^*$

- Real span of fundamental roots: $span_{\mathbb R}(\Pi)$.
- We have that $\Phi = span_{\pm \mathbb N}(\Pi)$.
- Thus $\Phi$ is contained in $span_{\mathbb R}(\Pi)$, which is contained in $span_{\mathbb C}(\Pi)$.

#### Defn: Killing form on $H^*$
- We restrict $K: L \times L \to \mathbb C$ to $K_H: H \times H \to \mathbb C$.
- What we want is $K^*: H^* \times H^* \to \mathbb C$.
- Define $i: H \to H^*$ given by $i(h) = K(h, \cdot)$.
- $i$ is invertible if $K$ is non-degenerate.
- $K^*(\mu, \nu) \equiv K(i^{-1}(\mu), i^{-1}(\nu))$.

#### $K^*$ on $H^*_{\mathbb R}$
- The restricted action of $K^*$ on $H^*_{\mathbb R}$ will always spit out real numbers.
- Also, $K^*(\alpha, \alpha) \geq 0$ and equal to zero iff $\alpha = 0$.
- See that $K$ was non-degenerate, but $K^*_{\mathbb R}$ is a real, bona fide inner product!
- This means we can calculate length and angles of fundamental roots.


#### Recovering $\Phi$ from $\Pi$
- How to recover all roots from fundamental roots?
- For any $\lambda \in \Phi$, define the Weyl transformation $s_\lambda: H^*_{\mathbb R} \to H^*_{\mathbb R}$.
- The map is given by $s_\lambda(\mu) = \mu - 2 \frac{K^*(\lambda, \mu)}{K^*(\lambda, \lambda)} \lambda$.
- This is linear in $\mu$, but not in $\lambda$.
- Such $s_\lambda$ are called Weyl transformations.
- Define the group $W$ generated by the $s_\lambda$. This is called the Weyl group.

#### Theorem: Weyl group is generated by fundamental roots
- It's enough to take the reflections $s_\pi$ for $\pi \in \Pi$ to generate $W$.

#### Theorem: Roots are produced by action of Weyl group on fundamental roots
- Any $\lambda \in \Phi$ can be produced by the action of some $w \in W$ on some $\pi \in \Pi$.
- So $\forall \lambda \in \Phi, \exists \pi \in \Pi, \exists w \in W$ such that $\lambda = w(\pi)$.
- This means we can create all roots from fundamental roots: first produce the weyl group, then find the action
  of the weyl group on the fundamental roots to find all roots.
- The Weyl group is closed on the set of roots, so $W(\Phi) \subseteq \Phi$.
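A small sketch of this "generate all roots from the fundamental roots" step (my own illustration, for the root system $A_2$, whose two fundamental roots meet at 120 degrees; the specific vectors are just a convenient realization):

```python
import numpy as np

def reflect(lam, mu):
    # Weyl reflection s_lambda(mu) = mu - 2 <lambda, mu> / <lambda, lambda> * lambda
    return mu - 2 * np.dot(lam, mu) / np.dot(lam, lam) * lam

pi = [np.array([1.0, 0.0]),
      np.array([-0.5, np.sqrt(3) / 2])]   # fundamental roots of A2, at 120 degrees

roots = {tuple(np.round(p, 6)): p for p in pi}  # rounded key -> exact vector
changed = True
while changed:
    changed = False
    for lam in list(roots.values()):
        for mu in list(roots.values()):
            new = reflect(lam, mu)
            key = tuple(np.round(new, 6))
            if key not in roots:
                roots[key] = new
                changed = True

print(len(roots))                                       # 6 roots
print(sorted((float(a), float(b)) for a, b in roots))   # +-pi1, +-pi2, +-(pi1 + pi2)
```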

#### Showdown

- Consider $S_{\pi_i}(\pi_j)$ for $\pi_i, \pi_j \in \Pi$.




# Weird free group construction from adjoint functor theorem


- We wish to construct the free group on a set $S$. Call the free group $\Gamma S$.
- Call the forgetful functor from groups to sets as $U$.
- The defining property of the free group is that if we are given a mapping $\phi: S \to UG$, a map which
  tells us where the generators go, there is a unique map $\Gamma \phi: \Gamma S \to G$ which maps the generators of the free
  group via a group homomorphism into $G$. Further, there is a bijection between $\phi$ and $\Gamma \phi$.
- Written differently, there is a bijection $\hom_\texttt{Set}(S, UG) \simeq \hom_\texttt{Group}(\Gamma S, G)$.
  This is the condition for an adjunction.
- The idea to construct $\Gamma S$ is, roughly, to take all possible maps $f_i: S \to UG$ for all groups $G$,
  take the product of all such maps,
  and define $\Gamma S \equiv im(\prod_i f_i)$ (more precisely, the subgroup generated by the image). The details follow.
- First off, we can't take all groups, that's too large. So we need to cut down the size somehow. We do this by considering groups
  with at most $|S|$ generators, since that's all the image of the maps $f_i$ can be anyway. We're only interested in the image
  at the end, so we can cut down the groups we consider to be set-sized.
- Next, we need to somehow control for isomorphisms. So we first take _isomorphism classes_ of groups with at most $|S|$ generators.
  Call this set of groups $\mathcal G$.
  We then construct all possible maps $f_i: S \to UG$, for all possible $G \in \mathcal G$.
- This lets us construct the product map $f : S \to \prod_i UG_i$ (one factor for each pair of a group $G_i \in \mathcal G$ and a map $f_i$), given by $f(s) \equiv \prod_i f_i(s)$.
- Now we define the free group $\Gamma S \equiv im(f)$ (the subgroup generated by the image). Why does this work?
- Well, we check the universal property. Suppose we have some map $h: S \to UH$. This must induce a map $\Gamma h: \Gamma S \to H$.
- We can cut down the map, by writing the map as $h_{im}: S \to im(h)$. This maps into some subset of $UH$, from which we can generate
  a group $H_{im} \subseteq H$.
- First off, there must be some index $k$ such that $f_k = h_{im}$, since the set of maps $\{ f_i \}$ covers all possible maps from $S$
  into groups with those many generators.
- This implies we can project the group $\Gamma S$ at the $k$th index to get a map from $\Gamma S$ into $H_{im}$.
- We can then inject $H_{im}$ into $H$, giving us the desired map!

# bashupload

```
curl bashupload.com -T your_file.txt
```

- Super useful if one wants to quickly send a file from/to a server.

# When are the catalan numbers odd

- The catalan numbers $C_n$ count the number of binary trees on $n$ nodes.
- For every binary tree, label the nodes in some standard ordering (eg. BFS).
- Pick the lex smallest _unbalanced_ node (node with different left and right subtree sizes).
- The operation that swaps the left and right subtrees of the lex smallest unbalanced node is an involution.
- This operation only fails when we have a complete binary tree, so the number of nodes is $n = 2^r - 1$, so we pair such a complete binary tree to itself.
- This breaks the set $C_n$ into an even number of trees (pairs of unbalanced trees) and a potential "loner tree" (paired with itself) which is the
  complete binary tree.
- Thus $C_n$ is odd iff $n = 2^r - 1$, which allows for us to have a complete binary tree, which is not paired by the involution.
- [Reference](https://mathoverflow.net/a/409029)
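A quick numerical check of the parity claim (my own snippet):

```python
from math import comb

def catalan(n):
    return comb(2 * n, n) // (n + 1)

# C_n should be odd exactly when n is one less than a power of two.
print([n for n in range(1, 70) if catalan(n) % 2 == 1])  # [1, 3, 7, 15, 31, 63]
```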


# Geodesic equation, Extrinsic

- A geodesic on a sphere must be a great circle. If it's not, say we pick a circle at some fixed azimuth,
  then all the accelerations point towards the center of that small circle, not towards the center of the sphere! But
  the direction towards the center of the sphere is the true normal direction. So we get an acceleration component that deviates from the normal.

#### How do we know if a path is straight?
- Velocity remains constant on a straight line.
- So it has zero acceleration.
- If we think of a curved spiral climbing a hill (or a spiral staircase), the acceleration vector will point upward (to allow us to climb the hill)
  and will be curved inward into the spiral (to allow us to turn as we spiral).
- On the other hand, if we think of walking straight along an undulating plane, the acceleration will be positive/negative depending
  on whether the terrain goes upward or downward, but we won't have any left/right motion _in the plane_.
- If the acceleration is always along the normal vectors, then we have a geodesic.

#### Geodesic curve
- Curve with zero tangential acceleration when we walk along the curve with constant speed.
- Start with the $(u, v)$ plane, and map it to $R(u, v) \equiv (R_x, R_y, R_z)$.  Denote the curve as $c: I \to \mathbb R^3$  such that $c$ always
  lies on $R$. Said differently, we have $c: I \to UV$, which we then map to $\mathbb R^3$ via $R$.
- So for example, $R(u, v) = (\cos(u), \sin(u)\cos(v), \sin(u)\sin(v))$ and $c(\lambda) = (\lambda, \lambda)$. Which is to say,
  $c(\lambda) = (\cos(\lambda), \sin(\lambda)\cos(\lambda), \sin(\lambda)\sin(\lambda))$.
- Recall that $e_u \equiv \partial_u R, e_v \equiv \partial_v R \in \mathbb R^3$ are the basis of the tangent plane at $R_{u, v}$.
- Similarly, $\partial_\lambda c$ gives us the tangent vector along $c$ on the surface.
- Write out:

$$
\begin{aligned}
\frac{dc}{d \lambda} &= \frac{du}{d\lambda}\frac{dR}{du} + \frac{dv}{d\lambda}\frac{dR}{dv}\\
\frac{d}{d\lambda}\left(\frac{dc}{d \lambda}\right)
&=\frac{d}{d\lambda}\left(\frac{du}{d\lambda}\frac{dR}{du} + \frac{dv}{d\lambda}\frac{dR}{dv}\right) \\
&=\frac{d}{d\lambda}\left(\frac{du}{d\lambda}\frac{dR}{du}\right) + \frac{d}{d\lambda}\left(\frac{dv}{d\lambda}\frac{dR}{dv}\right) \\
&= \frac{d^2 u}{d\lambda^2}\frac{dR}{du} + \frac{du}{d\lambda} \frac{d}{d\lambda} \frac{dR}{du}
  + \frac{d^2 v}{d\lambda^2}\frac{dR}{dv} + \frac{dv}{d\lambda} \frac{d}{d\lambda} \frac{dR}{dv}
\end{aligned}
$$

- How to calculate $\frac{d}{d\lambda} \frac{dR}{du}$? Use chain rule, again!
- $\frac{d}{d\lambda} = \frac{du}{d \lambda}\frac{\partial}{\partial u} + \frac{dv}{d \lambda}\frac{\partial}{\partial v}$

#### Geodesic curve with notational abuse
- Denote by $R(u, v)$ the surface, and by $R(\lambda)$ the equation of the curve. So for example, $R(u, v) = (\cos(u), \sin(u)\cos(v), \sin(u)\sin(v))$
  while $R(\lambda) = R(\lambda, \lambda) = (\cos(\lambda), \sin(\lambda)\cos(\lambda), \sin(\lambda)\sin(\lambda))$.

- [EigenChris videos](https://www.youtube.com/watch?v=1CuTNveXJRc)
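In the same spirit as the coordinate computations above, here is a minimal sympy sketch of mine (not from the EigenChris videos) that computes the induced metric and a couple of Christoffel symbols for the sphere patch $R(u, v) = (\cos u, \sin u \cos v, \sin u \sin v)$:

```python
import sympy as sp

u, v = sp.symbols("u v", real=True)
coords = (u, v)
R = sp.Matrix([sp.cos(u), sp.sin(u) * sp.cos(v), sp.sin(u) * sp.sin(v)])

# first fundamental form g_ij = dR/dx_i . dR/dx_j
J = [R.diff(x) for x in coords]
g = sp.Matrix(2, 2, lambda i, j: sp.simplify(J[i].dot(J[j])))
ginv = g.inv()

# Christoffel symbols Gamma^k_{ij} = (1/2) g^{km} (d_i g_{mj} + d_j g_{mi} - d_m g_{ij})
def christoffel(k, i, j):
    return sp.simplify(sum(sp.Rational(1, 2) * ginv[k, m] *
                           (g[m, j].diff(coords[i]) + g[m, i].diff(coords[j])
                            - g[i, j].diff(coords[m]))
                           for m in range(2)))

print(g)                     # Matrix([[1, 0], [0, sin(u)**2]])
print(christoffel(0, 1, 1))  # Gamma^u_{vv} = -sin(u)*cos(u)  (sympy may print -sin(2*u)/2)
print(christoffel(1, 0, 1))  # Gamma^v_{uv} = cot(u)  (printed as cos(u)/sin(u) or 1/tan(u))
```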

# Connections, take 2

- I asked a [math.se question](https://math.stackexchange.com/questions/4309198/on-which-tangent-bundles-of-mathbb-r2-does-position-velocity-acceleration)
  about position, velocity, acceleration that received a great answer by `peek-a-boo`. Let me try and provide an exposition of his answer.
- Imagine a base manifold $M$, say a circle.
- Now imagine a vector bundle over this, say 2D spaces lying above each point on the circle. Call this $(E, \pi, M)$
- What is a connection? Roughly speaking, it seems to be a device to convert elements of $TM$ into elements of $TE$.
- We imagine the base manifold (circle) as horizontal, and the bundle $E$ as vertical. We imagine $TM$ as vectors lying horizontal
  on the circle, and we imagine $TE$ as vectors lying horizontal above the bundle. So something like:

<img src="./static/connection-vector-bundle-geometry.png"/>


- So the connection has type $C: E \times TM \to TE$. Consider a point $m \in M$ in the base manifold.
- Now think of the fiber $E_m \subseteq E$ over $m$.
- Now think of any point $e \in E_m$ in the fiber of $m$.
- This gives us a map $C_e: T_e M \to T_e E$, which tells us to imagine a particle $e \in E$ following its brother in $m \in M$.
  If we know the velocity $\dot m \in T_m M$, we can find the velocity of the sibling upstairs with $C_e(\dot m)$.
- In some sense, this is really like path lifting, except we're performing "velocity lifting". Given a point in the base manifold and
  a point somewhere upstairs in the cover (fiber), we are told how to "develop" the path upstairs given information about how to "develop"
  the path downstairs.
- I use "develop" to mean "knowing derivatives".


#### Differentiating vector fields along a curve

- Given all of this, suppose we have a curve $c: I \to M$ and a vector field over the curve $v: I \to E$ such that
  the vector field lies correctly over the curve; $\pi \circ v = c$. We want to differentiate $v$, _such that we get another $v': I \to E$_.
- That's the crucial bit, $v$ and $v'$ have the same type, and this is achieved through the connection. So a vector field and its derivative are _both_
  vector fields over the curve.
- How do we do this? We have the tangent mapping $Tv: TI \mapsto TE$.
- We kill off the component given by pushing forward the tangent vector of the curve, $Tc(\xi) \in TM$ for $\xi \in T_i I$, at the bundle location $v(i)$ via the connection.
  This kills off the effect of the curving of the curve when measuring the change in the vector field $v$.
- We build $z: TI \to TE$, defined by $z(\xi) \equiv Tv(\xi) - C_{v(i)}(Tc(\xi))$ for $\xi \in T_i I$.
- We now have a map from $I$ to $TE$, but we want a map to $E$. What do?
- Well, we can check that the vector field we have created is a vertical vector field, which means that it lies entirely within the fiber.
  Said differently, we check that it pushes forward to the zero vector under projection, so $T\pi: TE \to TM$ will be zero on the image of $z$.
- This means that $z$ lies entirely "inside" each fiber, or it lies entirely in the tangent to the vector space $\pi^{-1}(m)$ (ie, it lives
  in $T\pi^{-1}(m)$), instead of living in the full tangent space $TE$ where it has access to the horizontal components.
- But for a vector space, the tangent space is canonically isomorphic to the vector space itself! (parallelogram law/can move vectors around/...).
  Thus, we can bring the image of $z$ from $TE$ down to $E$!
- This means we now have a map $z: TI \to E$.
- But we want a $w: I \to E$. See that the only place where we needed a $TI$ was to supply the direction of differentiation; feeding in the canonical unit tangent $\partial_t \in T_t I$ at each $t \in I$ gives the desired $w: I \to E$.


# Dropping into tty on manjaro/GRUB

- Access GRUB by holding down `<ESC>`.
- Add a suffix `rw 3` to the GRUB config line that loads `linux ...`.


# Why the zero set of a continuous function must be a closed set

- Consider the set of points $Z = f^{-1}(0)$ for some function $f: X \to \mathbb R$.
- Suppose we can talk about sequences or limits in $X$.
- Thus, if $f$ is continuous, then we must have $f(\lim x_i) = \lim f(x_i)$.
- Now consider a limit point $l$ of the set $Z$ with sequence $l_i$ (that is, $\lim l_i = l$). Then we have
  $f(l) = f(\lim l_i) = \lim f(l_i) = \lim 0 = 0$. Thus, $f(l) = 0$.
- This means that the set $Z$ contains $l$, since $Z$ contains all pre-images of zero. Thus, the set $Z$ is closed.
- This implies that the zero set of a continuous function must be a closed set.
- This also motivates zariski; we want a topology that captures polynomial behaviour. Well, then the closed sets _must_ be the zero
  sets of polynomials!

# Derivatives in diffgeo

- A function of the form $f: \mathbb R^i \to \mathbb R^o$ has derivative specified by an $(o \times i)$ matrix, one which says
  how each output varies with each input.
- Now consider a vector field $V$ on the surface of the sphere, and another vector field $D$. Why is $W \equiv \nabla_D V$
  another vector field? Aren't we differentiating a thing with 3 coordinates with another thing with 3 coordinates?
- Well, suppose we consider the previous function $f: \mathbb R^i \to \mathbb R^o$, and we then consider a curve $c: (-1, 1) \to \mathbb R^i$.
  Then the combined function $(f \circ c): (-1, 1) \to \mathbb R^o$ needs only $o$ numbers to specify the derivative, since there's only one
  parameter to the curve (time).
- So what's going on in the above example? Well, though the full function we're defining is from $\mathbb R^i$ to $\mathbb R^o$, composing
  with $c$ "limits our attention" to a 1D input slice. In this 1D input slice, the output is also a vector.
- This should be intuitive, since for example, we draw a circle parameterized by arc length, and then draw its tangents as vectors, and then
  _we draw the normal as vectors_ to the tangents! Why does _that_ work? In both cases (position -> vel, vel -> accel) we have a single parameter,
  time. So in both cases, we get vector fields!
- That's somehow magical, that the derivative of a thing needs the same "degrees of freedom" as the thing in itself. Or is it magical? Well, we're
  used to it working for functions from $\mathbb R$ to $\mathbb R$. It's a little disconcerting to see it work for functions from $\mathbb R$
  to $\mathbb R^n$.
- But how does this make sense in the case of diffgeo? We start with a manifold $M$. We take some curve $c: (-1, 1) \to M$. Its derivative
  must live as $c': (-1, 1) \to TM$. Now what about $c''$? According to our earlier explanation, this too should be a vector! Well... it is and it isn't,
  right? But how? I don't understand this well.
- Looping back to the original question, $W \equiv \nabla_D V$ is a vector field because the value of $W(p)$ is defined by taking $D(p) \in T_p M$,
  treating it as a curve $d_p: [-1, 1] \to M$ such that $d_p(0) = p$ and $d_p'(0) = D(p)$, and then finally differentiating $V$ along $d_p$ at $0$, which yields a single tangent vector at $p$.


# Building stuff with Docker

- create `Dockerfile`, write `docker build .`.
- File contains shell stuff to run in `RUN <cmd>` lines. `<cmd>` can have newlines with backslash ala shell script.
- `docker run <image/layer sha> <command>` to run something at an image SHA (ie, not in a running container). Useful to debug.
  protip: `docker run <sha-of-layer-before-error> /bin/bash` to get a shell.
- `docker exec <container-sha> <command>` to run something in a container.
- to delete an image: `docker image ls`, `docker rmi -f <image-sha>`
- docker prune all unused stuff: `docker system prune -a`
- `docker login` to login
- `docker build -t siddudruid/coolname .` to name a docker image.
- `docker push siddudruid/coolname` to push to docker hub.
- `docker pull siddudruid/coolname` to pull from docker hub.




# Lie derivative versus covariant derivative

<img src="./static/lie-bracket-versus-covariant-derivative.png"/>

- Lie derivative cares about all flow lines, covariant derivative cares about a single flow line.
- The black vector field is $X$.
- The red vector field is $Y$, chosen such that $L_X Y = 0$. See that the lengths of the red vectors are compressed as we go towards the right,
  since the lie derivative measures how our "rectangles fail to commute". Thus, for the rectangle to commute, we first (a)
  need a rectangle, meaning we need to care about at least two flows in $X$, and (b) the *flows* (plural) of $X$ force the vector field $Y$
  to shrink.
- The blue vector field $Z$ is such that $\nabla_X Z = 0$. See that this only cares about a single line. Thus to conserve the vectors,
  it needs the support of a metric (ie, to keep perpendiculars perpendicular).


- [Reference question](https://math.stackexchange.com/questions/2145617/lie-vs-covariant-derivative-visual-motivation)



# The Tor functor

Let $A$ be a commutative ring, $P$ an $A$-module. The functors $Tor_i^A(-, P)$ are defined in such a way that

- $Tor_0^A(-,P) = - \otimes_A P$
- For any short exact sequence of $A$-modules $0 \to L \to M \to N \to 0$, you get a long exact sequence.

$$
\dots \to Tor_{n+1}^A(L,P) \to Tor_{n+1}^A(M,P) \to Tor_{n+1}^A(N,P)
\to Tor_n^A(L,P) \to Tor_n^A(M,P) \to Tor_n^A(N,P)
\to \dots
$$

which, on the right side, stops at

$$
\dots \to Tor_1^A(L,P) \to Tor_1^A(M,P) \to Tor_1^A(N,P)
\to L \otimes_A P \to M \otimes_A P \to N \otimes_A P \to 0
$$


```
23:44 <bollu> isekaijin can you describe the existence proof of Tor? :)
23:45 <isekaijin> A projective resolution is a chain complex of projective A-modules "... -> P_{n+1} -> P_n -> ... -> P_1 -> P_0 -> 0" that is chain-homotopic to "0 -> P -> 0".
23:45 <isekaijin> And you need the axiom of choice to show that it exists in general.
23:45 <isekaijin> Now, projective A-modules behave much more nicely w.r.t. the tensor product than arbitrary A-modules.
23:46 <isekaijin> In particular, projective modules are flat, so tensoring with a projective module *is* exact.
23:47 <isekaijin> So to compute Tor_i(M,P), you tensor M with the projective resolution, and then take its homology.
23:47 <isekaijin> To show that this is well-defined, you need to show that Tor_i(M,P) does not depend on the chosen projective resolution of P.
23:48 <Plazma> bollu: just use the axiom of choice like everyone else
23:48 <bollu> why do you need to take homology?
23:48 <isekaijin> That's just the definition of Tor.
23:49 <isekaijin> Okay, to show that Tor does not depend on the chosen projective resolution, you use the fact that any two chain-homotopic chains have the same homology.
23:49 <bollu> right
23:49 <isekaijin> Which is a nice cute exercise in homological algebra that I am too busy to do right now.
23:49 <bollu> whose proof I have seen in hatcher
23:49 <bollu> :)
23:49 <isekaijin> Oh, great.
23:49 <bollu> thanks, the big picture is really useful
```

# Sum of quadratic errors

- Consider the function $(x - a)^2 + (x - b)^2$
- The minimum is where the derivative vanishes: $2(x - a) + 2(x - b) = 0$, i.e. at $x = (a + b)/2$.
- As we move away towards either end-point, the _error always increases_!
- So the "reduction in error" by moving towards `b` from `(a + b)/2` is ALWAYS DOMINATED by the "increase in error"
  by moving towards `a` from `(a + b)/2`.
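A one-liner check of this (my own snippet):

```python
import sympy as sp

x, a, b = sp.symbols("x a b", real=True)
err = (x - a)**2 + (x - b)**2
print(sp.solve(sp.diff(err, x), x))  # [a/2 + b/2]: the minimum is at the midpoint
```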

# Hip-Hop and Shakespeare

- For whatever reason, it appears that iambic pentameter allows one to rap Shakespeare's sonnets at 80 bpm / 150 bpm.
- [TedX talk by  Akala](https://www.youtube.com/watch?v=DSbtkLA3GrY)


# Write thin to write well

- Set column width to be absurdly low which forces your writing to get better (?!)
- That is, when you write, say in vim or emacs, you put one clause per line.
  Then when you are done, you can use pandoc or something similar to convert
  what you wrote into standard prose. But the artificial line breaks, which
  results in thin lines, make it easier to edit, and also easier to comprehend
  diffs if you use git to track changes.


- The vast majority of work on book typography agrees on 66 characters per line in
  one-column layouts and 45 characters per line in multi-column layouts as
  being the optimal numbers for reading. The text-block should also be placed
  asymmetrically on the page, with the margins in size order being
  `inner<top<outer<bottom`. The line height should be set at 120% of the highest
  character height for normal book typefaces, but should be increased for
  typewriter typefaces and can be decreased slightly with shorter lines. A
  small set of typefaces are economic without losing readability, and if you
  use them you can increase these numbers slightly. But any more than 80
  characters and anything less than 40 characters is suboptimal for texts that
  are longer than a paragraph or so.
- If you adhere to these very simple principles, you will have avoided like 95% of the typographic choices that can make texts hard or slow to read.

- Try 36 letters per column.

```
Also see VimPencil
set wrap linebreak nolist
call plug#begin('~/.vim/plugged')
Plug 'junegunn/goyo.vim'
call plug#end()

"Goyo settings
let g:goyo_width = 60
let g:goyo_height = 999
let g:goyo_margin_top = 0
let g:goyo_margin_bottom = 0
```

- [Write thin to write fast](https://breckyunits.com/write-thin-to-write-fast.html)

# Hidden symmetries of alg varieties

- Given equations in $A$, can find solutions in any $B$ such that we have $\phi: A \to B$
- Can translate topological ideas to geometry.
- Fundamental theorem of riemann: fundamental group with finitely many covering becomes algebraic (?!)
- So we can look at finite quotients of the fundamental group.
- As variety, we take line minus one point. This can be made by considering $xy - 1 = 0$ in $R[x, y]$ and then projecting solutions to $R[x]$.
- If we look at complex solutions, then we get $\mathbb C - \{0 \} = C^\times$.
- The largest covering space is $\mathbb C \xrightarrow{\exp} \mathbb C^\times$. The fiber above $1 \in C^\times$ (which is the basepoint) is $2 \pi i \mathbb Z$.
- Finite coverings are $C^\times \xrightarrow{z \mapsto z^n} C^\times$. The substitute for the fundamental group is the projective (inverse) limit
  of these groups.
- The symmetry of $Gal(\overline{\mathbb Q} / \mathbb Q)$ acts on this fundamental group.
- One can get not just fundamental group, but any finite coefficients!
- Category of coverings is equivalent to category of sets with action of fundamental group.
- [Abel Prize: Pierre Deligne](https://www.youtube.com/watch?v=9WavaUED5i8)

# `fd` for `find`

- `fd` seems to be much, much faster at `find` than, well, `find`.

# Thue-Morse sequence for sharing

- Suppose A goes first at picking object from a collection of objects, then B.
- B has an inherent disadvantage, since they went second.
- So rather than repeating and allowing A to go third and B to go fourth (ie, we run `ABAB`), we should instead run `AB BA`,
  since giving `B` the third turn "evens out the disadvantage".
- Now once we're done with 4 elements, what do we do? Do we re-run `A B B A` again? No, this would be argued as unfair by `B`. So we flip this
  to get the full sequence as `ABBA BAAB`.
- What next? you guessed it... flip: `ABBA BAAB|BAAB ABBA`
- And so on. Write the recurrence down `:)` (see the sketch below)
- [Reference](https://www.youtube.com/watch?v=prh72BLNjIk)
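- A small sketch of the flip recurrence in Python (the function name is mine, not from the reference):

```
def fair_turns(rounds: int) -> str:
    """Generate the Thue-Morse turn order by repeatedly appending the 'flip'."""
    flip = lambda s: s.translate(str.maketrans("AB", "BA"))
    seq = "AB"
    for _ in range(rounds):
        seq = seq + flip(seq)   # AB -> ABBA -> ABBABAAB -> ...
    return seq

print(fair_turns(1))  # ABBA
print(fair_turns(2))  # ABBABAAB
print(fair_turns(3))  # ABBABAABBAABABBA
```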

# Elementary and power sum symmetric polynomials

- [Borcherds video on newton identites](https://www.youtube.com/watch?v=JG1F1G0S_bo)
- [Terry tao calls the power sum symmetric polynomials as 'moments'](https://mathoverflow.net/questions/402051/distribution-of-some-sums-modulo-p/402109#402109)

- Let us have $n$ variables $x[1], x[2], \dots, x[n]$.
- Let $e_k$ be the elementary symmetric polynomial that is all products of all $k$ subsets of $x[:]$.
- Let $p_k$ be the power sum symmetric polynomial that is of the form $p_k = \sum_i x[i]^k$.

#### Speedy proof when $k = n$ / no. of vars equals largest $k$ (of $e[k]$) we are expanding:

- Let $P(x) = e[n] x^0 + e[n-1]x^1 + \dots + e[1]x^{n-1} + e[0]x^n$. That is, $P(x) = \sum_i e[n-i] x^{i}$
- Let $r[1], r[2], \dots, r[n]$ be the roots. Then we have $P(r[j]) = \sum_i e[n-i] r[j]^{i} = 0$.
- Adding over all  $r[j]$, we find that:

$$
\begin{aligned}
&\sum_{j=1}^n P(r[j]) = \sum_j 0 = 0\\
&\sum_j \sum_i e[n-i] r[j]^{i}  = 0 \\
&\sum_j e[n] \cdot 1 + \sum_j \sum_{i > 0} e[n-i] r[j]^i  = 0 \\
&n \cdot e[n] + \sum_{i=0}^{n-1} e[i] P[n-i] = 0
\end{aligned}
$$

#### Concretely worked out in the case where $n = k = 4$:

$$
\begin{aligned}
&P(x) = 1 \cdot x^4 + e_1 x^3 + e_2 x^2 + e_3 x + e_4 \\
&\texttt{roots: } r_1, r_2, r_3, r_4\\
&P(x) = (x - r_1)(x - r_2)(x - r_3)(x - r_4)\\
&e_0 = 1 \\
&e_1 = r_1 + r_2 + r_3 + r_4 \\
&e_2 = r_1r_2 + r_1r_3 + r_1r_4 + r_2r_3 + r_2r_4 + r_3r_4 \\
&e_3 = r_1r_2r_3 + r_1r_2r_4 + r_1r_3r_4 + r_2r_3r_4 \\
&e_4 = r_1r_2r_3r_4\\
\end{aligned}
$$

- Expanding $P(r_j)$:

$$
\begin{aligned}
P(r_1) &= r_1^4 + e_1r_1^3 + e_2r_1^2 + e_3 r_1 + e_4 = 0 \\
P(r_2) &= r_2^4 + e_1r_2^3 + e_2r_2^2 + e_3 r_2 + e_4 = 0 \\
P(r_3) &= r_3^4 + e_1r_3^3 + e_2r_3^2 + e_3 r_3 + e_4 = 0 \\
P(r_4) &= r_4^4 + e_1r_4^3 + e_2r_4^2 + e_3 r_4 + e_4 = 0 \\
\end{aligned}
$$

- Adding all of these up:

$$
\begin{aligned}
&P(r_1) + P(r_2) + P(r_3) + P(r_4) \\
&=(r_1^4 + r_2^4 + r_3^4 + r_4^4) \\
&+ e_1(r_1^3 + r_2^3 + r_3^3 + r_4^3) \\
&+ e_2(r_1^2 + r_2^2 + r_3^2 + r_4^2) \\
&+ e_3(r_1 + r_2 + r_3 + r_4) \\
&+ 4 e_4 \\
&= 1 \cdot P_4 + e_1 P_3 + e_2 P_2 + e_3 P_1 + 4 e_4 \\
&= e_0 P_4 + e_1 P_3 + e_2 P_2 + e_3 P_1 + 4 e_4 \\
&= 0 \\
\end{aligned}
$$


#### When $k > n$ (where $n$ is number of variables):

- We have the identity $k e_k + \sum_{i=0}^{k-1} e_i p_{k-i} = 0$
  (morally, the $k e_k$ term is the $i = k$ term of the sum, with $p_0$ read as $k$).
- When $k > n$, this means that $e_k = 0$.
- Further, when $k > n$, every $e_i$ with $i > n$ is zero.
- This collapses the identity to $\sum_{i=0}^{k-1} e_i p_{k-i} = 0$ (we lose $e_k$),
  which further collapses to $\sum_{i=0}^n e_i p_{k-i} = 0$ (we lose the terms where $i > n$).
- Proof idea: We add $(k-n)$ extra variables to bring us to the case where $k = n$. Then we set these new variables to $0$ to get the identity
  $\sum_{i=0}^n e_i p_{k-i} = 0$.

#### When $k < n$ (where $n$ is number of variables):


#### Proof by cute notation

- Denote by the tuple $(a[1], a[2], \dots, a[n])$ with $a[i] \geq a[i+1]$ the sum $\sum x[i]^a[i]$.
- For example, with three variables $x, y, z$, we have:
- $(1) = x + y + z$
- $(1, 1) = xy + yz + xz$
- $(2) = x^2 + y^2 + z^2$
- $(2, 1) = x^2y + x^2z + y^2x + y^2z + z^2x + z^2y$
- $(1, 1, 1) = xyz$.
- $(1, 1, 1, 1) = 0$, because we don't have four variables!
  We would need to write something like $xyzw$, but we don't have a $w$, so this is zero.
- In this notation, the elementary symmetric functions are $(1)$, $(1, 1)$, $(1, 1, 1)$ and so on.
- The power sums are $(1)$, $(2)$, $(3)$, and so on.
- See that $(2)(1) = (x^2 + y^2 + z^2)(x + y + z) = x^3 + y^3 + z^3 + x^2y + x^2z + y^2x + y^2z + z^2x + z^2y = (3) + (2, 1)$.
- That is, the product of powers gives us a larger power, plus some change (in elementary symmetric).
- How do we simplify $(2, 1)$? We want terms of the form only of $(k)$ [power sum] or $(1, 1, \dots, 1)$ [elementary].
- We need to simplify $(2, 1)$.
- Let's consider $(1)(1, 1)$. This is $(x + y + z)(xy + yz + xz)$. This will have terms of the form $xyz$ (ie, $(1, 1, 1)$). These occur with multiplicity $3$,
  since $xyz$ can occur as $(x)(yz)$, $(y)(xz)$, and $(z)(xy)$. This will also have terms of the form $x^2y$ (ie, $(2, 1)$).
- Put together, we get that $(1)(1, 1) = (2, 1) + 3 (1, 1, 1)$.
- This tells us that $(2, 1) = (1)(1, 1) - 3(1, 1, 1)$.
- Plugging back in, we find that $(2)(1) = (3) + (1)(1, 1) - 3 (1, 1, 1)$. That is, $p[3] - p[2]s[1] + p[1]s[2] - 3s[3] = 0$.

In general, we will find:

$$
\begin{aligned}
&(k-1)(1) = (k) + (k-1, 1) \\
&(k-2)(1, 1) = (k-1, 1) + (k-2, 1, 1) \\
&(k-3)(1, 1, 1) = (k-2, 1, 1) + (k-3, 1, 1, 1) \\
&(k-4)(1, 1, 1, 1) = (k-3, 1, 1, 1) + (k-4, 1, 1, 1, 1) \\
\end{aligned}
$$

- In general, we have:

```
(k-i)(replicate i 1) = (k-i+1, replicate (i-1) 1) + (k-i, replicate i 1)
```
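
- A quick symbolic check, using sympy, of the three-variable identity derived above, $p_3 - p_2 e_1 + p_1 e_2 - 3 e_3 = 0$ (a sketch; the variable names are mine):

```
import sympy

x, y, z = sympy.symbols("x y z")
e1 = x + y + z
e2 = x*y + y*z + x*z
e3 = x*y*z
p1, p2, p3 = (x**k + y**k + z**k for k in (1, 2, 3))

# Newton's identity for three variables: p3 - p2*e1 + p1*e2 - 3*e3 == 0
assert sympy.expand(p3 - p2*e1 + p1*e2 - 3*e3) == 0
```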


# Projective spaces and grassmanians in AG

#### Projective space
- Projective space is the space of all lines through the origin of $\mathbb R^n$.
- Algebraically constructed as $(V - \{ 0 \})/ \mathbb R^\times$.
- We exclude the origin to remove "degenerate lines", since the subspace spanned by $\{0\}$ when acted on with $\mathbb R^\times$
  is just $\{ 0 \}$, which is zero dimensional.

#### Grassmanian
- $G(m, V)$: $m$ dimensional subspaces of $V$.
- $G(m, n)$: $m$ dimensional subspaces of $V = k^n$.
- $G(m+1, n+1)$ is the space of $m$-planes $\mathbb P^m$ in $\mathbb P^n$: projectivize by sending $(x_0, x_1, \dots, x_n) \in k^{n+1}$ to
  $[x_0 : x_1 : \dots : x_n] \in \mathbb P^n$, so an $(m+1)$-dimensional subspace of $k^{n+1}$ becomes an $m$-plane in $\mathbb P^n$.
- Duality: $G(m, V) \simeq G(dim(V)-m, V^\star)$. We map the subspace $W \subseteq V$ to the annihilator of $W$ in $V^\star$: That is, we map $W$ to the set of
  all linear functionals that vanish on $W$ [ie, whose kernel contains $W$].
- The above implies $G(1, V) \simeq G(n-1, V)$ where $n = dim(V)$. $G(1, V)$ is just projective space $\mathbb P^{n-1}$.
  $(n-1)$-dimensional subspaces are cut out by a single homogeneous linear equation $c_0 x_0 + \dots + c_{n-1} x_{n-1} = 0$.

#### G(2, 4)

- These are lines in $\mathbb P^3$. This will give us a pair of points of the form $(x_0, y_0, z_0, w_0)$ and $(x_1, y_1, z_1, w_1)$.
  That is, we're considering "lines" between "points" (or "vectors") in $\mathbb R^3$. Exactly what we need to solve stabbing line problems
  for computer graphics :)
- Start by taking a 2D plane. The line will pass through a point in the 2D plane. This gives us two degrees of freedom.
- Then take a direction in ordinary Euclidean $\mathbb R^3$ (or $S^2$ to be precise). This gives us two degrees of freedom.
- Can also be said to be a 2-dim. subspace of a 4-dim. vector space.
- In total, $G(2, 4)$ should therefore have four degrees of freedom.
- Take $W \subseteq V$  where $V \simeq k^4$, and $W$ is 2-dimensional subspace.
- $W$ is spanned by two vectors $v_1, v_2$. So I can record it as a $2 \times 4$ matrix:
   $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix}$. Vector $v_i$ has coordinates $a_i$.
- If I had taken another basis $(v_1', v_2')$, there would be an invertible matrix $B \in K^{2 \times 2}$ ($det(B) \neq 0$)
  that sends $(v_1, v_2)$ to $(v_1', v_2')$. Vice Versa, any invertible matrix $B$ gives us a new basis.
- So the redundancy in our choice of parametrization of subspaces (via basis vectors) is captured entirely by the space of $B$s.
- Key idea: compute $2 \times 2$ minors of the  $2 \times 4$ matrix $(v_1, v_2)$.
- This is going to be $(a_{11} a_{22} - a_{12} a_{21}, \dots, a_{13} a_{24} - a_{14} a_{23}) \in K^6$.
- Note here that we are computing $2 \times 2$ minors of a rectangular matrix, where we take all possible $2 \times 2$ submatrices and calculate their
  determinant.
- In this case, we must pick both rows, and we have $\binom{4}{2} = 6$ choices of columns, thus we live in $K^6$.
- We represent this map as $m: K^{2 \times 4} \to K^6$ which sends $m((a_{ij})) \equiv (a_{11} a_{22} - a_{12} a_{21}, \dots, a_{13} a_{24} - a_{14} a_{23})$
  which maps a matrix to its vector of minors.
- The great advantage of this is that we have $m(B \cdot (a_{ij})) = det(B) \cdot m((a_{ij}))$, since the minor by definition takes a determinant of submatrices,
  and determinant is multiplicative.
- Thus, we have converted a _matrix_ redundancy of $B$ in $a_{ij}$ into a  **scalar** redundancy (of $det(B)$) in $m(a_{ij})$ .
- We know how to handle scalar redundancies: Live in projective space!
- Therefore, we have a well defined map $G(2, 4) \to \mathbb P^5$. Given a subspace $W \in G(2, 4)$, compute a basis $v_1, v_2 \in K^4$ for $W$,
  then compute the minor of the matrix $m((v_1, v_2)) \in K^6$, and send this to $P^5$.
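
- A numeric sketch (numpy; my own setup, not from the reference) of the key fact above: replacing the basis $(v_1, v_2)$ by $B \cdot (v_1, v_2)$ scales every $2 \times 2$ minor by $det(B)$:

```
import itertools
import numpy as np

def minors(A):
    """All 2x2 minors of a 2x4 matrix, taken over column pairs (0,1), (0,2), ..., (2,3)."""
    return np.array([np.linalg.det(A[:, [i, j]])
                     for i, j in itertools.combinations(range(4), 2)])

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # a basis of a 2-dim subspace of R^4, as rows
B = rng.standard_normal((2, 2))   # a change of basis

# m(B A) = det(B) * m(A): the matrix redundancy becomes a scalar redundancy.
assert np.allclose(minors(B @ A), np.linalg.det(B) * minors(A))
```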

#### $G(2, 4)$, projectively
- These are lines in $\mathbb P^3$.
- So take two points in $P^3$, call these $[a_0 : a_1 : a_2 : a_3]$ and $[b_0 : b_1 : b_2 : b_3]$. Again, this gives us a matrix:

$$
\begin{bmatrix}
a_0 &: a_1 &: a_2 &: a_3 \\
b_0 &: b_1 &: b_2 &: b_3 \\
\end{bmatrix}
$$

- We define $S_{ij} \equiv a_i b_j - a_j b_i$ which is the minor with columns $(i, j)$.
- Then we compress the above matrix as $(S_{01} : S_{02} : S_{03} : S_{12} : S_{13} : S_{23}) \in \mathbb P^5$.  See that $S_{ii} = 0$ and $S_{ji} = - S_{ij}$.
  So we choose as many $S$s as "useful".
- See that if we scale $a$ or $b$ by a constant, then all the $S_{ij}$ scale by that same constant, and thus the point itself in $\mathbb P^5$ does not change.
- We can also change $b$ by adding some scaled version of $a$. This is like adding a multiple of one row to the other row when taking determinants.
  But this does not change determinants!
- Thus, the actual plucker coordinates are invariant under which two points $a, b$ we choose to parametrize the line in $\mathbb P^3$.
-  This gives us a well defined map from lines in $\mathbb P^3$ to points in $\mathbb P^5$.
- This is not an onto map; lines in $\mathbb P^3$ have dimension 4 (2 degrees of freedom for a basepoint plus 2 for a direction, as counted above),
  while $\mathbb P^5$ has dimension $5$.
- So heuristically, we are missing "one equation" to cut $\mathbb P^5$ with to get the image of lines in $\mathbb P^3$ in $\mathbb P^5$.
- This is the famous Plucker relation:

$$
S_{02} S_{13} = S_{01} S_{23}  + S_{03} S_{12}
$$

- It suffices to prove the relationship for the "standard matrix":

$$
\begin{bmatrix}
1 &: 0 &: a &: b \\
0 &: 1 &: c &: d \\
\end{bmatrix}
$$

- In this case, we get $c \cdot (-b) = 1 \cdot (ad - bc) + d \cdot (-a)$, which indeed holds.

- In general, we get _plucker relations_:

$$
S_{i_1 \dots i_k}S_{j_1 \dots j_k} = \sum S_{i_1' \dots i_k'} S_{j_1' \dots j_k'}.
$$
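
- A numeric sketch (again numpy, my own setup) checking that the six minors of a random $2 \times 4$ matrix satisfy the Plucker relation $S_{01} S_{23} - S_{02} S_{13} + S_{03} S_{12} = 0$:

```
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(4), rng.standard_normal(4)   # two points spanning a line

# S[i, j] is the 2x2 minor on columns i and j of the matrix with rows a, b.
S = {(i, j): a[i]*b[j] - a[j]*b[i] for i in range(4) for j in range(4)}

# The Plucker relation cutting G(2,4) out of P^5.
lhs = S[0, 1]*S[2, 3] - S[0, 2]*S[1, 3] + S[0, 3]*S[1, 2]
assert abs(lhs) < 1e-12
```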


#### Observations of $G(2, 4)$

- Suppose the matrix $(a_{ij})$ of $(v_1, v_2)$ has non-vanishing first minor. Let $B$ be the inverse of the first $2 \times 2$ block,
  so $B \equiv \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1}$. Set $(v_1', v_2') \equiv B (v_1, v_2)$.
- Then the matrix of $(v_1', v_2')$ is $\begin{bmatrix} 1 & 0 & y_{11} & y_{12} \\ 0 & 1 & y_{21} & y_{22} \end{bmatrix}$.
- So the first $2 \times 2$ block is the identity. Further, the $y_{ij}$ are unique.
-  As we vary $y_{ij}$, we get different 2 dimensional subspaces in $V$. Thus, locally, the grassmanian
   looks like $A^4$. This gives us an affine chart!
- We can recover grassmanian from the $\mathbb P^5$ embedding. Let $p_0, \dots, p_5$ be the coordinate functions on $\mathbb P^5$ ($p$ for plucker).
- The equation $p_0 p_5 - p_1 p_4 + p_2 p_3 = 0$ holds on the grassmanian. We can show that the zero set of this equation is *exactly* the grassmanian.

- [Computation AG: Grassmanians](https://www.youtube.com/watch?v=EPUl-J4_4sk&list=PL5ErEZ81Tyqc1RixHj65XA32ejrS2eEFK&index=14)


#### Computing cohomology of $G(2, 4)$

- Take all points of the following form:

$$
\begin{bmatrix}
&1 &:0 &:* &:* \\
&0 &:1 &:* &:*
\end{bmatrix}
$$

- Let's look at the first column: it is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Why not $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$? Well, I can always
  cancel the second row by subtracting a scaled version of the first row! (this doesn't change the determinants). Thus, if we have a $1$ somewhere,
  the "complement" must be a $0$.
- Next, we can have something like:

$$
\begin{bmatrix}
&1 &:* &:0 &:* \\
&0 &:0 &:1 &:*
\end{bmatrix}
$$

- Here, at the second column $\begin{bmatrix} * \\ 0 \end{bmatrix}$, if we didn't have a $0$, then we could have standardized it and put it into
  the form of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ which makes it like the first case! Thus, we _must_ have a $0$ to get a case different from the previous.

- Continuing, we get:

$$
\begin{bmatrix}
&1 &:* &:* &:0 \\
&0 &:0 &:0 &:1
\end{bmatrix}
$$

$$
\begin{bmatrix}
&0 &:1 &:0 &:* \\
&0 &:0 &:1 &:*
\end{bmatrix}
$$

$$
\begin{bmatrix}
&0 &:1 &:* &:0 \\
&0 &:0 &:0 &:1
\end{bmatrix}
$$

$$
\begin{bmatrix}
&0 &:0 &:1 &:0 \\
&0 &:0 &:0 &:1
\end{bmatrix}
$$

- If we count the number of $\star$s, which is the number of degrees of freedom, we see that $1$ of them (the last one) has zero stars ($A^0$),
  $1$ of them has 1 star ($A^1$), two of them have 2 stars ($A^2$), one of them has 3 stars, and one of them has 4 stars.
- This lets us read off the cohomology of the grassmanian: we know the cellular decomposition. Ie, we know the number of $n$ cells for different dimensions.
- Alternatively, we can see that over a finite field $k$, we have $k^0 + k^1 + 2k^2 + k^3 + k^4$ points. On the other hand, $\mathbb P^4$ has
  $k^0 + k^1 + k^2 + k^3 + k^4$ points. Thus the grassmanian is different from projective space! (See the brute-force check below.)
- [Borcherds](https://www.youtube.com/watch?v=bKB4Qu8ETNE&list=PL8yHsr3EFj53j51FG6wCbQKjBgpjKa5PX&index=19)
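- The brute-force check promised above (a sketch; the counting helper is mine): the number of 2-dimensional subspaces of $\mathbb F_2^4$ should equal $1 + q + 2q^2 + q^3 + q^4 = 35$ at $q = 2$.

```
import itertools

q = 2
vectors = list(itertools.product(range(q), repeat=4))   # all of F_2^4

def span(v, w):
    """The F_2-span of v and w, as a frozenset of vectors."""
    return frozenset(tuple((a*x + b*y) % q for x, y in zip(v, w))
                     for a in range(q) for b in range(q))

spans = (span(v, w) for v in vectors for w in vectors)
subspaces = {s for s in spans if len(s) == q**2}        # keep only the 2-dim spans

assert len(subspaces) == 1 + q + 2*q**2 + q**3 + q**4   # 35
```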


# Mnemonic for why `eta` is unit:

- Remember that given an adjunction $F \vdash G$, the unit of the adjunction is $\eta: 1 \to GF$.
- We use the symbol `eta` because it's `yunit`, and `eta` is `y` in `greek` (which is why the vim digraph for `eta` is `C-k y*`)
- $\eta$ is `unit`, since when you flip it, you get $\mu$, which is $\mu$-ltiplication (multiplication). Hence $\eta$ is the unit for the multiplication
  to form a monoidal structure for the monad.

# Fundamental theorem of galois theory

- Let $K \subseteq M$ be a finite Galois extension (normal + separable). Then there is a 1:1 correspondence between intermediate fields $L$
  and subgroups of the Galois group $G = Gal(M/K)$.
- Recall that a finite extension has finitely many subfields iff it can be written as an extension $K(\theta)/K$. This is the primitive element theorem.
- We send $L \mapsto Gal(M/L)$, the subgroup of $Gal(M/K)$ that fixes $L$ pointwise.
- We send $H$ to $Fix(H)$, the subfield of $M$ that is fixed pointwise by $H$.

#### $H =  Gal(M/Fix(H))$
- It is clear that $H \subseteq Gal(M/Fix(H))$, by definition, since every element of $H$ fixes $Fix(H)$ pointwise.
- To show equality, we simply need to show that they are the same size, in terms of cardinality.
- So we will show that $|H| = |Gal(M/Fix(H))|$.

#### $L = Fix(Gal(M/L))$
- It is clear that $L \subseteq Fix(Gal(M/L)))$, by definition, since every element of $Gal(M/L)$ fixes $L$ pointwise.
- To show equality, we simply need to show that they are the same size.
- Here, we measure size using $[M:L]$. This means that as $L$ becomes larger, the "size" actually becomes smaller!
- However, this is the "correct" notion of size, since the size of $L$ (measured as $[M:L]$) will equal $|Gal(M/L)|$.
- As $L$ grows larger, it has fewer automorphisms.
- So, we shall show that $[M:L] = [M:Fix(Gal(M/L))]$.

#### Proof Strategy

- Rather than show the "round trip" equalities directly, we will show that the intermediate objects match in terms of size.
- We will show that the map $H \mapsto Fix(H)$ is such that $|H| = [M:Fix(H)]$.
- Similarly, we will show that the map $L \mapsto Gal(M/L)$ is such that $[M:L] = |Gal(M/L)|$.
- Composing $Gal$ and $Fix$ and comparing sizes on both sides then shows the equalities.

#### Part 1: $H \to Fix(H)$ preserves size

- Consider the map which sends $H \mapsto Fix(H)$. We need to show that $|H| = [M:Fix(H)]$.
- Consider the extension $M/Fix(H)$. Since $M/K$ is separable, so is $M/Fix(H)$ [polynomials separable over $K$ remain separable over the super-field $Fix(H)$]
- Since the extension is separable, we have a $\theta \in M$ such that $M = Fix(H)(\theta)$ by the primitive element theorem.
- The galois group of $M/Fix(H) = Fix(H)(\theta)/Fix(H)$ must fix $Fix(H)$ entirely.
  Thus we are trying to extend the function $id: Fix(H) \to Fix(H)$
  to field automorphisms $\sigma: M \to M$.
- Since $M/K$ is normal, so is $M/Fix(H)$, since $M/K$ asserts that automorphisms $\sigma: M \to \overline K$ that fix $K$ stay within $M$.
  This implies that automorphisms $\tau: M \to \overline K$ that fix $Fix(H)$ stay within $M$.
- Thus, the number of field automorphisms $\sigma: M \to \overline M$ that fix $Fix(H)$ is equal to the number of field automorphisms $M \to M$
  that fix $Fix(H)$.
- The latter is equal to the degree of the separable extension, $[M:Fix(H)]$, since the only choice we have is where we choose to send $\theta$,
  and there are $[M:Fix(H)]$ choices.
- The latter is also equal to the size of the Galois group $Gal(M/Fix(H))$.

#### Part 2: $L$ to $Gal(M/L)$ preserves size

- We wish to show that $[M:L] = |Gal(M/L)|$
- Key idea: Start by writing $M = L(\alpha)$, which is possible since $M/L$ is separable, by the primitive element theorem.
  Let $\alpha$ have minimal polynomial $p(x)$. Then $deg(p(x))$ equals $[M:L]$ equals number of roots of $p(x)$ since the field is separable.
- Next, any automorphism
  $\sigma: M \to M$ which fixes $L$  is uniquely determined by where it sends $\alpha$. Further, such an automorphism $\sigma$
  must send $\alpha$ to some other root of $p(x)$ [by virtue of being a field map that fixes $L$,  $0 = \sigma(0) = \sigma(p(\alpha)) = p(\sigma(\alpha))$].
- There are exactly number of roots of $p$ (= $[M:L]$) many choices. Each gives us one automorphism. Thus $|Gal(M/L)| = [M:L]$.

# Counter-intuitive linearity of expectation [TODO]

- I like the example of "10 diners check 10 hats. After dinner they are given the hats back at random."
  Each diner has a 1/10 chance of getting their own hat back, so by linearity of expectation, the expected number of diners who get the correct hat is 1.

- Finding the expected value is super easy. But calculating any of the individual probabilities (other than the 8, 9 or 10 correct hats cases) is really annoying and difficult!
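
- A Monte Carlo sketch of the hat-check example (the trial count is arbitrary):

```
import random

def correct_hats(n=10):
    """Hand n hats back in a random order; count diners who get their own hat."""
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for i, p in enumerate(perm) if i == p)

trials = 100_000
avg = sum(correct_hats() for _ in range(trials)) / trials
print(avg)   # close to 1, by linearity of expectation
```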


- Imagine you have 10 dots scattered on a plane. Prove it's always possible to cover all dots with disks
  of unit radius, without overlap between the disks.
   (This isn't as trivial as it sounds, in fact there are configurations of 45 points that cannot be covered by disjoint unit disks.)

- Proof: Consider a repeating honeycomb pattern of infinitely many disks. Such a pattern covers $\pi / (2 \sqrt 3) \approx 90.69\%$ of the plane, and the disks are clearly disjoint. If we throw such a pattern randomly on the plane, any dot has a $0.9069$ chance of being covered, so the expected number of dots covered is $9.069$. This is larger than $9$, so there must be a placement of the pattern which covers all 10 dots.


# Metis

> So insofar as Athena is a goddess of war, what really do we mean by that? Note that her most famous weapon is not her sword but her shield Aegis, and Aegis has a gorgon's head on it, so that anyone who attacks her is in serious danger of being turned to stone. She's always described as being calm and majestic, neither of which adjectives anyone ever applied to Ares....

> Let's face it, Randy, we've all known guys like Ares. The pattern of human behavior that caused the internal mental representation known as Ares to appear in the minds of the ancient Greeks is very much with us today, in the form of terrorists, serial killers, riots, pogroms, and aggressive tinhorn dictators who turn out to be military incompetents. And yet for all their stupidity and incompetence, people like that can conquer and control large chunks of the world if they are not resisted....

> Who is going to fight them off, Randy?

> Sometimes it might be other Ares-worshippers, as when Iran and Iraq went to war and no one cared who won. But if Ares-worshippers aren't going to end up running the whole world, someone needs to do violence to them. This isn't very nice, but it's a fact: civilization requires an Aegis. And the only way to fight the bastards off in the end is through intelligence. Cunning. Metis.

# Tooling for performance benchmarking

- Optick and Tracy and flame graphs
- https://github.com/wolfpld/tracy
- https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
- Hotspot:  https://www.kdab.com/hotspot-video/amp/
- `perf stat -x` apparently gives CSV?


# Normal field extensions

## Normal extension

- (1) For an extension $L/K$: if a polynomial $p(x) \in K[x]$ has a root $\alpha \in L$, then it has _all_ its roots in $L$. So $p(x)$
  splits into linear factors $p(x) = (x - l_1)(x - l_2) \cdots (x - l_n)$ for $l_i \in L$.
- (2) [equivalent] $L$ is the splitting field over $K$ of some _set_ of polynomials.
- (3) [equivalent] Consider $K \subseteq L \subseteq \overline K$. Then any automorphism of $\overline K/K$ (ie, aut that fixes $K$ pointwise)
  maps $L$ to $L$ [fixes $L$ as a set, NOT pointwise].
- E.g.: $Q(2^{1/3})$ is not normal: it contains one root of $x^3 - 2$ but not the two complex roots.
- E.g.: $Q(2^{1/3}, \omega_3)$ is a normal extension because it's the splitting field of $x^3 - 2$.

#### (1) implies (2)

- (1) We know that $p$ has a root in $L$ implies $p$ has all roots in $L$.
- For each $\alpha \in L$, take its minimal polynomial $p \in K[x]$. Then $p$ splits over $L$, because $L$ contains a single root of $p$ (namely $\alpha$), and hence by (1) all of them.
- Thus, $L$ is the splitting field for the set of polynomials $\{ minpoly(\alpha) \in K[x] : \alpha \in L \}$.

#### (2) implies (3)

- (2) says that $L$ is the splitting field for some set of polynomials.
- An aut $\sigma: \overline K \to \overline K$ that fixes $K$ acts trivially on polynomials in $K[x]$.
- $L$ is the set of all roots of polynomials $\{ minpoly(\alpha) \in K[x] : \alpha \in L \}$.
- Since $\sigma$ fixes $K[x]$, it also cannot change the set of roots of the polynomials. Thus the set of roots of $\{ minpoly(\alpha) \in K[x] : \alpha \in L \}$, which is $L$ itself,
  remains invariant under $\sigma$ ($\sigma$ cannot add elements into $L$; it can at most permute the roots, ie permute the elements of $L$).

#### (3) implies (1)

- (3) says that any automorphism $\sigma$ of $\overline K/K$ fixes $L$ as a set.
- We wish to show that if $p$ has a root $\alpha \in L$, $L$ has all roots of $p$.
- We claim that for any root $\beta \in \overline K$ of $p$, there is an automorphism $\tau$ of $\overline K/K$ such that $\tau(\alpha) = \beta$.
- Consider the tower of extensions $K \subseteq K(\alpha) \subseteq \overline K$ and $K \subseteq K(\beta) \subseteq \overline K$.
  Both $K(\alpha)$ and $K(\beta)$ look like $K[x] / p$ because $p$ is the minimal polynomial for _both_ $\alpha$ and $\beta$.
- Thus, we can write an isomorphism $\tau: K(\alpha) \to K(\beta)$ which sends $\alpha \mapsto \beta$.
- Now, by the isomorphism extension theorem, this map $\tau$ extends (not necessarily uniquely) to an automorphism $\overline K \to \overline K$ which sends $\alpha \mapsto \beta$.
- But notice that $\tau$ must fix $L$ as a set (by (3)) and $\alpha \in L$. Thus, $\tau(\alpha) \in \tau(L)$, or $\beta = \tau(\alpha) \in \tau(L) = L$.
- Thus, for a polynomial $p$ with root $\alpha \in L$, and for any other root $\beta$ of $p$, we have that $\beta \in L$.

#### Alternative argument: Splitting field of a polynomial is normal

- Let $L/K$ be the splitting field of $f \in K[x]$. Let $g \in K[x]$ have a root $\alpha \in L$.
- Let $\beta \in \overline K$ be another root of $g$. We wish to show that $\beta \in L$ to show that $L$ is normal.
- There is an embedding $i: K(\alpha) \hookrightarrow \overline K$ which fixes $K$ and sends $\alpha$ to $\beta$.
- See that $i(L)$ is also a splitting field for $f$ over $K$ inside $\overline K$.
- But splitting fields are unique, so $i(L) = L$.
- Since $i(\alpha) = \beta$, this means $\beta \in L$ as desired.

#### Degree 2 elements are normal

- Let us have a  degree 2 extension $K \subseteq L$
- So we have some $p(x) = x^2 + bx + c \in K[x]$, $L = K(\alpha)$ for $\alpha$ a root of $p$.
- We know that $\alpha + \beta = -b$ where $\beta$ is the other root, $\alpha \in L$, and $b \in K$. Thus $\beta = -b - \alpha \in L$.
- Thus, the extension is normal since $L$ contains all the roots ($\alpha, \beta$) of $p$ as soon as it contained one of them.


#### Is normality of extensions transitive?

- Consider $K \subseteq L \subseteq M$. If $K \subseteq L$ is normal, $L \subseteq M$ is normal, then is $K \subseteq M$ normal?
- Answer: NO!
- Counter-example: $Q \subseteq Q(2^{1/2}) \subseteq Q(2^{1/4})$.
- Each of the two pieces is normal since it is of degree two. But the full tower is not normal, because $2^{1/4}$ has minimal polynomial $x^4 - 2$ over $Q$, whose complex roots $\pm i \cdot 2^{1/4}$ do not lie in $Q(2^{1/4}) \subseteq \mathbb R$.
- On the other hand, $Q(2^{1/4})/Q(2^{1/2})$ has a minimal polynomial $x^2 - \sqrt{2} \in Q[2^{1/2}]$.
- So, normality is not transitive!
- Another way of looking at it: We want to show that $\sigma(M) \subseteq M$ for $\sigma \in Aut(\overline K/K)$. Since $L/K$ is normal,
  we have $\sigma(L) \subseteq L$ [by normality]. Since $M/L$ is normal, we must have $\sigma(M) \subseteq M$. Therefore, we are done?
- NO! The problem is that $\sigma$ is not a legal automorphism of $M/L$, since $\sigma$ only fixes $L$ as a *set* ($\sigma(L) \subseteq L$),
  and not *pointwise* ($\sigma(l) = l$ for all $l \in L$).

# Eisenstein Theorem for checking irreducibility

- Let $f(x) = a_0 + a_1 x + \dots + a_n x^n$ have integer coefficients, and let $p$ be a prime.
- If $p$ divides all coefficients except for the highest one ($a_n$), and $a_0$ is $p$-squarefree ($p^2$ does not divide $a_0$), then $f(x)$ is irreducible.
- That is, $p \mid a_0, p \mid a_1$, up to $p \mid a_{n-1}$, $p \nmid a_n$, and finally $p^2 \nmid a_0$.
- Then we must show that $f(x)$ is irreducible.
- Suppose for contradiction that $f(x) = q(x)r(x)$ where $q(x) = (b_0 + b_1 x+ \dots + b_k x^k)$ and $r(x) = (c_0 + c_1 x + \dots + c_l x^l)$ (such that $k + l = n$,
  and $k > 0, l > 0$).
- See that $a_0 = b_0 c_0$. Since $p | a_0$, $p$ must divide one of $b_0, c_0$. Since $p^2$ **does not divide** $a_0$, $p$ cannot divide **both** $b_0, c_0$.
  WLOG, suppose $p$ divides $b_0$, and $p$ **does not divide** $c_0$.
- Also see that since $a_n = (\sum_{i + j = n} b_i c_j)$, $p$ does not divide this coefficient $\sum_{i + j = n} b_i c_j$. Thus, at least one term
   in $\sum_{i + j = n} b_i c_j$ is not divisible by $p$.
- Now, we know that $p$ divides $b_0$, $p$ does not divide $c_0$. We will use this as a "domino" to show that $p$ divides $b_1$, $b_2$, and so on, all the way upto $b_k$.
  But this will imply that the final term $a_n$ will also be divisible by $p$, leading to contradiction.
- To show the domino effect, start with the coefficient of $x$, which is $a_1 = b_0 c_1 + b_1 c_0$. Since $a_1$ is divisible by $p$, $b_0$ is divisible by $p$, and $c_0$ is
  **not** divisible by $p$, the whole equation reduces to $b_1 c_0 \equiv_p 0$, or $b_1 \equiv_p 0$ [since $c_0$ is a unit modulo $p$].
- Thus, we have now "domino"'d to show that $p$ divides **both** $b_0, b_1$.
- For induction, suppose $p$ divides every one of $b_0, b_1, \dots, b_{r-1}$. We must show that $p$ divides $b_r$.
- Consider the coefficient of the term $x^r$, ie $a_r$. This is divisible by $p$, and we have that $a_r = b_0 c_r + b_1 c_{r-1} + \dots + b_r c_0$. Modulo $p$, the left
  hand side vanishes (as $a_r$ is divisible by $p$), and every term $b_0, b_1, \dots, b_{r-1}$ vanishes, leaving behind $0 \equiv_p b_r c_0$. Since $c_0$ is a unit, we get
  $b_r \equiv_p 0$.
- Thus, every term $\{ b_i \}$ is divisible by $p$, implying $a_n$ is divisible by $p$, leading to contradiction.
- Again, the key idea: (1) $b_0$ is divisible by $p$ while $c_0$ is not. (This uses $p | a_0$ and $p^2 \not | a_0$).
  (2) This allows us to "domino" and show that all $b_i$ are divisible by $p$ (This uses $p | a_i$). (3) This
  show that $a_n$ is divisible by $p$, a contradiction. (This uses $p \not | a_n$).
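
- A small sketch of the criterion as a Python check (the helper name is mine); for example, $x^4 + 10x^3 + 5$ is Eisenstein at $p = 5$ and hence irreducible over $\mathbb Q$:

```
def is_eisenstein(coeffs, p):
    """coeffs = [a_0, a_1, ..., a_n]; check Eisenstein's criterion at the prime p."""
    a0, an, lower = coeffs[0], coeffs[-1], coeffs[:-1]
    return (all(a % p == 0 for a in lower)   # p | a_0, a_1, ..., a_{n-1}
            and an % p != 0                  # p does not divide a_n
            and a0 % (p * p) != 0)           # p^2 does not divide a_0

# x^4 + 10x^3 + 5 is irreducible over Q: Eisenstein at p = 5.
assert is_eisenstein([5, 0, 0, 10, 1], 5)
# x^2 - 1 = (x - 1)(x + 1): the criterion (rightly) does not apply at p = 2.
assert not is_eisenstein([-1, 0, 1], 2)
```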

# Gauss Lemma for polynomials

- Let $z(x) \in Z[X]$ such that $z(x) = p(x) q(x)$ where $p(x), q(x) \in Q[X]$. Then we claim that there exists
  $p'(x), q'(x) \in Z[x]$ such that $z(x) = p'(x) q'(x)$.
- For example, suppose $p(x) = a_0 / b_0 + (a_1 / b_1) x$ and $q(x) = c_0 / d_0 + (c_1 / d_1) x$, such that $p(x)q(x) \in \mathbb Z[x]$ and these
  fractions are in lowest form, so $\gcd(a_i, b_i) = 1$ and $\gcd(c_i, d_i) = 1$.
- Take a common denominator, so we can then find things the denominator divides, to write the product in $\mathbb Z$. For example, we know that
  $9/10 \cdot 20 / 3 = 6$. This can be obtained by rearranging the product as $(9/3) \cdot (20/10) = 3 \cdot 2 = 6$. We wish to perform a similar rearrangement,
  by first writing $9/10 \cdot 20 / 3$ as $(9 \cdot 20)/(10 \cdot 3)$, and then pairing up $10 \leftrightarrow 20$ and $3 \leftrightarrow 9$ to get the final
  integer $(9/3) (20/10) = 6$. After pairing up, each of the pairs $(9/3)$ and $(20/10)$ are clearly integers.
- Take a common denominator in $p(x)$ and write it as a fraction: $p(x) = (a_0 b_1 + (a_1 b_0)x) / b_0 b_1$, and similarly $q(x) = (c_0 d_1 + (c_1 d_0)x)/d_0 d_1$.
- We claim that the denominator of $p(x)$, $b_0 b_1$ **does not divide** the numerator of $p(x)$, $(a_0 b_1 + (a_1 b_0)x)$. This can be seen term-by-term.
  $b_0 b_1$ does not divide $a_0 b_1$ since $a_0 b_1 / b_0 b_1 = a_0 / b_0$ which was assumed to be in lowest form, and a real fraction. Similarly for all terms
  in the numerator.
- Since the product $p(x)q(x)$ which we write as fractions as  $(a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x) / (b_0 b_1)(d_0 d_1)$ is integral, we must have that
  $b_0 b_1$ divides the numerator. Since $b_0 b_1$ **does not divide** the first factor $(a_0 b_1 + (a_1 b_0)x)$,
  it **must divide** the second factor $(c_0 d_1 + (c_1 d_0)x)$. Thus, the polynomial
  $q'(x) \equiv (c_0 d_1 + (c_1 d_0)x)/b_0 b_1$ is therefore integral [ie, $q'(x) \in Z[x]$].
- By the exact same reasoning, we must have $d_0 d_1$ divides the numerator of the product $p(x)q(x)$.
  Since $d_0 d_1$ does not divide $(c_0 d_1 + (c_1 d_0)x)$, it must divide $(a_0 b_1 + (a_1 b_0)x)$, and therefore $p'(x) \equiv (a_0 b_1 + (a_1 b_0)x)/(d_0 d_1)$
  is integral.
- Thus, we can write $z(x) = p'(x) q'(x)$ where $p'(x), q'(x) \in \mathbb Z[x]$.
- This generalizes, since we never used anything about being linear, we simply reasoned term by term.


#### Alternate way to show that the factorization is correct.

- Start at $p(x)q(x) = (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x) / (b_0 b_1)(d_0 d_1)$.
- Rewrite as  $ p(x)q(x) \cdot (b_0 b_1)(d_0 d_1) = (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x)$
- Suppose $\alpha$ is a prime factor of $b_0$. Then reduce the above equation mod $\alpha$. We get $0 \equiv_\alpha (a_0 b_1 + (a_1 b_0)x) (c_0 d_1 + (c_1 d_0)x)$.
  Since $\mathbb Z/\alpha \mathbb Z[x]$ is an integral domain, we have that one of $(a_0 b_1 + (a_1 b_0)x)$ or $(c_0 d_1 + (c_1 d_0)x)$ vanishes mod $\alpha$, and thus $\alpha$
  divides one of the two.
- This works for all prime divisors of the denominators, thus we can "distribute" the prime divisors of the denominators across the two polynomials.
- Proof that $Z/\alpha Z[x]$ is an integral domain: note that $Z/\alpha Z$ is a field, thus $Z/ \alpha Z[x]$ is a Euclidean domain (run Euclid algorithm).
  This implies it is integral.

# How GHC does typeclass resolution

- As told to me by davean:

- It's like 5 steps:
- Find all instances I that match the target constraint; that is, the target constraint is a substitution instance of I.
  These instance declarations are the candidates.
- If no candidates remain, the search fails.
- Eliminate any candidate IX for which there is another candidate IY such that both of the following hold:
  IY is strictly more specific than IX (that is, IY is a substitution instance
  of IX but not vice versa), and either IX is overlappable, or IY is overlapping.
  (This "either/or" design, rather than a "both/and" design, allows a client to
  deliberately override an instance from a library, without requiring a change
  to the library.)
- If all the remaining candidates are incoherent, the search succeeds,
  returning an arbitrary surviving candidate.
- If more than one non-incoherent candidate remains, the search fails.
- Otherwise there is exactly one non-incoherent candidate; call it the "prime
  candidate".
- Now find all instances, or in-scope given constraints, that unify with the
  target constraint, but do not match it. Such non-candidate instances might
  match when the target constraint is further instantiated. If all of them are
  incoherent top-level instances, the search succeeds, returning the prime
  candidate. Otherwise the search fails.
- [GHC manual](https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/instances.html#overlapping-instances)

# Defining continuity covariantly

- Real analysis: covariant definition: $f(\lim x) = \lim (f x)$. Contravariant definition in analysis/topology: $f^{-1}(open)$ is open.
- Contravariant in topology via sierpinski: $U \subseteq X$ is open iff characteristic function
  $f(x) = \begin{cases} T & x \in U \\ \bot & \text{otherwise} \end{cases}$
  is continuous.
- A function $f: X \to Y$ is continuous iff the composite $s \circ f$ is continuous for every continuous $s: Y \to S$. That is, a function is continuous iff
  the pullback of every indicator is an indicator.
- A topological space is said to be *sequential* iff every *sequentially open set* is open.
- A set $K \subseteq X$ is sequentially open iff whenever a sequence $x_n$ converges to a point of $K$, then there is some $M$ such that $x_{\geq M}$ lies in $K$. [TODO: check]
- Now consider $\mathbb N_\infty$, the one point compactification of the naturals. Here, we add a point called $\infty$ to $\mathbb N$, and declare
  a set containing $\infty$ to be open iff its complement in $\mathbb N$ is finite (all subsets of $\mathbb N$ remain open, as before).
- More abstractly, we declare all sets that are complements of closed and bounded (ie, finite) subsets of $\mathbb N$, with
  infinity added in, as open. So a set $U \subseteq \mathbb N_{\infty}$ containing $\infty$ is open iff there exists a finite
  $C \subseteq \mathbb N$ such that $U = (\mathbb N \setminus C) \cup \{ \infty \}$.
- A function $x: \mathbb N_\infty \to X$ is continuous [wrt the above topology] iff the sequence $x_n$ converges to the limit $x_\infty$.
- See that we use functions out of $\mathbb N_\infty$ [covariant] instead of functions into $S$ [contravariant].
- Now say a function $f: X \to Y$ is sequentially continuous iff for every continuous $x: \mathbb N_\infty \to X$, the composition $f \circ x: \mathbb N_\infty \to Y$
  is continuous. Informally, the pushforward of every convergent sequence is a convergent sequence.
- Can show that the category of sequential spaces is **cartesian closed**.
- Now generalize $\mathbb N_\infty$
- https://twitter.com/EscardoMartin/status/1444791065735729155

# Why commutator is important for QM

- Suppose we have an operator $L$ with eigenvector $x$, eigenvalue $\lambda$. So $Lx = \lambda x$.
- Now suppose we have another operator $N$ such that $[L, N] = \kappa N$ for some constant $\kappa$.
- Compute $[L, N]x = \kappa Nx$, which implies:

$$
\begin{aligned}
&[L, N]x = \kappa Nx \\
&(LN - NL)x = \kappa Nx \\
&L(Nx) - N(Lx) = \kappa Nx \\
&L(Nx) - N(\lambda x) = \kappa Nx \\
&L(Nx) - \lambda N(x) = \kappa Nx \\
&L(Nx) = \kappa Nx + \lambda Nx  \\
&L(Nx) = (\kappa + \lambda)Nx  \\
\end{aligned}
$$

- So $Nx$ is an eigenvector of $L$ with eigenvalue $\kappa + \lambda$.
- This is how we get "ladder operators" which raise and lower the state. If we have a state $x$ with some eigenvalue $\lambda$, the operator like $N$
  gives us an "excited state" from $x$ with eigenvalue $\kappa + \lambda$.
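
- A numpy sketch with the standard spin-1 angular momentum matrices ($\hbar = 1$; the check itself is mine): $[L_z, L_+] = L_+$, so $L_+$ raises $L_z$-eigenvalues by $1$:

```
import numpy as np

# Spin-1 matrices (hbar = 1): Lz has eigenvalues 1, 0, -1; Lplus is the raising operator.
Lz = np.diag([1.0, 0.0, -1.0])
Lplus = np.sqrt(2) * np.array([[0, 1, 0],
                               [0, 0, 1],
                               [0, 0, 0]], dtype=float)

# The commutation relation [Lz, L+] = 1 * L+, so kappa = 1 in the note's notation.
assert np.allclose(Lz @ Lplus - Lplus @ Lz, Lplus)

x = np.array([0.0, 1.0, 0.0])            # eigenvector of Lz with eigenvalue lambda = 0
y = Lplus @ x                            # the "excited state"
assert np.allclose(Lz @ y, (0 + 1) * y)  # eigenvalue lambda + kappa = 1
```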

# Deriving pratt parsing by analyzing recursive descent [TODO]


# Level set of a continuous function must be closed

- Let $f$ be continuous, let $L \equiv f^{-1}(y)$ be a level set. We claim $L$ is closed.
- Consider any convergent sequence of points $s: \mathbb N \to L$. We must have $f(s_i) = y$
  since $s(i) \in L$. Thus, $f(s_i) = y$ for all $i$.
- By continuity, we therefore have $f(\lim s_i) = \lim f(s_i) = y$.
- Hence, $\lim s_i \in L$.
- This explains why we build Zariski the way we do: the level sets of functions
  must be closed. Since we wish to study polynomials, we build our topology out of the
  level sets of polynomials.


# HPNDUF - Hard problems need design up front!
- [Norvig v/s some TDD dude trying to solve sudoku](http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-solvers.html)


# Separable Extension is contained in Galois extension

- Recall that an extension is galois if it is separable and normal.
- Consider some separable extension $L/K$.
- By primitive element, can be written as $L = K(\alpha)$
- Since $L$ is separable, the minimal polynomial of $\alpha$, $p(x) \in K[x]$, is separable, and so splits into distinct linear factors over $\overline K$.
- Build the splitting field $M$ of $p(x)$. This will contain $L$, as $L = K(\alpha) \subseteq K(\alpha, \beta, \gamma, \dots)$
  where $\alpha, \beta, \gamma, \dots$ are the roots of $p(x)$.
- This is normal (since it is the splitting field of a polynomial).
- This is separable, since it is generated by separable elements $\alpha$, $\beta$, $\gamma$, and so on.


# Primitive element theorem

- Let $E/k$ be a finite extension. We will characterize when a primitive element exists, and show that
  this will always happen for separable extensions.

#### Try 0: Naive attempt

- We try to find some element $\theta \in E$ such that the multiplicative subgroup generated by $\theta$,
  $\{1, \theta, \theta^2, \dots\}$ generates $E$.
- However, to find such an element is our mandate!
- What we can do, however, is to take arbitrary elements of the form $\alpha, \beta, \gamma, \delta$ and so on,
  and try to write one of them in terms of the other. If we can show that $\gamma = f(\beta, \alpha)$ and $\beta = g(\alpha)$
  where $f, g$ are polynomials, then we have reduced everything to powers of $\alpha$.
- Now, clearly, if we can do this for two elements, ie, given $E = k(\alpha, \beta)$ , find a polynomial $f$ such that
  $\beta = f(\alpha)$, we have won. So the question is in finding this $f$.

#### Part 1: Primitive element iff number of intermediate subfields is finite

##### Forward: Finitely many intermediate subfields implies primitive
- If $k$ is a finite field, then $E$ is a finite extension and $E^\times$ is a cyclic group. The generator of $E^\times$ is the primitive element.
- So suppose $k$ is an infinite field. Let $E/k$ have finitely many intermediate fields.
- Pick non-zero $\alpha, \beta \in E$ such that $E = k(\alpha, \beta)$. We will generalize to arbitrarily many generators via recursion.
- As $c$ varies in $k$, the extension $k(\alpha + c\beta)$ varies amongst the intermediate fields of $E/k$.
- Since $E/k$ only has finitely many intermediate fields while $k$ is infinite, pigeonhole tells us that there are $c_1 \neq c_2$ in $k$ such that
  $k(\alpha + c_1 \beta) = k(\alpha + c_2 \beta)$.
- Define $L \equiv k(\alpha + c_1 \beta)$. We claim that $L = E$, which shows that $\alpha + c_1 \beta$ is a primitive element for $E$.
- Since $k(\alpha + c_2 \beta) = k(\alpha + c_1 \beta) = L$, this implies that $\alpha + c_2 \beta \in L$.
- Thus, we find that $\alpha + c_1 \beta \in L$ and $\alpha + c_2 \beta \in L$. Thus, $(c_1 - c_2) \beta \in L$. Since $c_1, c_2 \in k$, we have
  $(c_1 - c_2)^{-1} \in k$, and thus $\beta \in L$, which implies $\alpha \in L$.
- Thus $L = k(\alpha, \beta) = k(\alpha + c_1 \beta)$.
- Done. Prove for more generators by recursion.

##### Backward: primitive implies finitely many intermediate subfields

- Let $E = k(\alpha)$ be a simple extension (generated by a primitive element). We need to show that $E/k$ only has finitely many subfields.
- Let $a_k(x) \in k[x]$ be the minimal polynomial for $\alpha$ in $k$. By definition, $a$ is irreducible.
- For any intermediate field $k \subseteq F \subseteq E$, define $a_F(x) \in F[x]$ to be the minimal polynomial of $\alpha$ in $F$.
- Since $a_k$ is also a member of $F[x]$ and $a_k, a_F$ share a common root $\alpha$ and $a_F$ is irreducible in $F$, this means that $a_F$ divides $a_k$.
- Proof sketch that irreducible polynomial divides any polynomial it shares a root with (Also written in another blog post):
  The GCD $gcd(a_F, a_k) \in F[x]$ must be non constant since $a_F, a_k$ share a root). But the irreducible polynomial $a_F$
  cannot have a smaller polynomial ($gcd(a_F, a_k)$) as divisor. Thus the GCD itself is the irreducible polynomial $a_F$. This implies that $a_F$ divides $a_k$
  since GCD must divide $a_k$.
- Since $a_k$ is a polynomial, it only has finitely many monic divisors.
- Moreover, $F$ is determined by $a_F$: letting $F' \equiv k(\text{coefficients of } a_F) \subseteq F$, the polynomial $a_F$ is still irreducible over the subfield $F'$,
  so $[E:F'] = deg(a_F) = [E:F]$, and hence $F' = F$.
- Thus, there are only finitely many intermediate fields if the extension is simple.



##### Interlude: finite extension with infinitely many subfields

- Let $K = F_p(t, u)$ where $t, u$ are independent variables. This is an infinite field extension of $F_p$, since each of the 
  $t^i$ are independent. 
- Recall that Fermat's little theorem ($a^p \equiv a \pmod p$) does not help here: it only tells us that, upon evaluation,
  the polynomials $t^p$ and $t$ agree at all points of $F_p$. However, they are still two different polynomials / rational functions.
- Consider the subfield $k \equiv F_p(t^p, u^p)$. This too is an infinite field extension of $F_p$.
- Now consider the tower $K/k$, ie, $F_p(t, u)/F_p(t^p, u^p)$. This is of finite degree,
  as we only need powers $t^i$ with $i < p$: $t^p$ already lies in the base field $k$.
  Exactly the same with $u$.
- So the extension $K/k$ has basis $\{ t^i u^j : 0 \leq i, j < p \}$, and so has degree $p^2$.
- We will first show that $K$ cannot be generated by a single element $\theta \in K$.
- Suppose we have $\alpha \in K = F_p(t, u)$. Then we must have that $\alpha^p \in F_p(t^p, u^p)$,
  since raising to the $p$th power (Frobenius) sends $t \mapsto t^p$, $u \mapsto u^p$, and fixes $F_p$. Thus, Frobenius sends $\alpha \in K$ to $\alpha^p \in k$.
- Thus, if we now had that $K = k(\alpha)$, this could not be: since $\alpha^p \in k$, the element $\alpha$ satisfies $x^p - \alpha^p \in k[x]$,
  so the extension $k(\alpha)/k$ has degree at most $p$. But we previously saw that
  $K$ has degree $p^2$ over $k$. Thus, this field extension $K/k$ **does not have a primitive element**.
- We will now show that $K/k$ has infinitely many intermediate subfields.
- For $\beta \in k$, define $K_\beta \equiv k(t + \beta u)$. We claim that the $K_\beta$ are all different fields for different $\beta \in k$.
- Suppose for contradiction that $C = K_\beta = K_\gamma$ for $\beta \neq \gamma$ ($\beta, \gamma \in k$).
  [$C$ for contradictory extension]
- This means that $(t + \beta u) - (t + \gamma u) \in C$, or that $(\beta - \gamma) u \in C$,
  which implies that $u \in C$, since $\beta, \gamma \in k \subseteq C$ and so $(\beta - \gamma)^{-1} \in C$.
- Since $u \in C$, $\beta \in C$, and $t + \beta u \in C$, we must have $t \in C$.
- This is a contradiction, since this now means that $C = K$, where $C = k(t + \beta u)$,
  which makes $t + \beta u$ a primitive element, something that we saw is impossible.
- Since $k$ is infinite, this gives infinitely many distinct intermediate fields $K_\beta$.
- The key idea is that the minimal polynomial of $t$ over $k$ is $X^p - t^p$, which *factorizes* as $(X - t)^p$.
  We lose the connection between irreducibility and having distinct roots, which makes the extension inseparable, since the
  minimal polynomial now has repeated roots.
- [Reference](https://www.mathcounterexamples.net/a-finite-extension-that-contains-infinitely-many-subfields/)

#### Part 2: If $E/k$ is finite and separable then it has a primitive element
- Let $K = F(\alpha, \beta)$ be separable for $\alpha, \beta \in K$. Then we will show that there exists a primitive element $\theta \in K$ such that $K = F(\theta)$.
- By repeated application, this shows that for any number of generators $K = F(\alpha_1, \dots, \alpha_n)$, we can find a primitive element.
- If $K$ is a finite field, then the generator of the cyclic group $K^\times$ is a primitive element.
- So from now on, suppose $K$ is infinite, and $K = F(\alpha, \beta)$ for $\alpha, \beta \in K$.
- Let $g$ be the minimal polynomial for $\alpha$, and $h$ the minimal polynomial for $\beta$. Since the field is separable, $g, h$ have unique roots.
- Let the distinct roots of $g$ be $\alpha_i$ such that $\alpha = \alpha_1$, and similarly let the distinct roots of $h$ be $\beta_i$ such that $\beta = \beta_1$.
- Now consider the equations $\alpha_1 + f_{i, j} \beta_1 = \alpha_i + f_{i, j} \beta_j$ for $i \in [1, deg(g)]$ and $j \in [2, deg(h)]$.
- Rearranging, we get $(\alpha_1 - \alpha_i) = f_{i, j} (\beta_j - \beta_1)$. Since $\beta_j \neq \beta_1$, each such equation has at most one solution,
  namely $f_{i, j} \equiv (\alpha_1 - \alpha_i)/(\beta_j - \beta_1)$.
- Since the field $F$ is infinite, we can pick an $f_*$ which avoids the finite set of $f_{i, j}$.
- Thus, once we choose such an $f_*$, let $\theta \equiv \alpha_1 + f_* \beta_1$. Such a $\theta$ can never be equal to $\alpha_i + f_* \beta_j$ for any $j \neq 1$, since the only choices of $f$
  that make $\alpha_1 + f \beta_1 = \alpha_i + f \beta_j$ true are the $f_{i, j}$, and $f_*$ was chosen to be different from these!
- Now let $F_\theta \equiv F(\theta)$. Since $\theta \in K$, $F_\theta$ is a subfield of $K$.
- See that $K = F(\alpha, \beta) = F(\alpha, \beta, \alpha + f \beta) = F(\beta, \alpha + f \beta) = F(\theta, \beta) = F_\theta(\beta)$.
- We will prove that $K = F_\theta$.
- Let $p(x)$ denote the minimal polynomial for $\beta$ over $F_\theta$. Since $K = F_\theta(\beta)$, if $p(x)$ is linear, then $K = F_\theta$.
- By definition, $\beta$ is a root of $h(x)$. Since $p(x)$ is an irreducible over $F_\theta$, we have that $p(x)$ divides $h(x)$
  [proof sketch: irreducible polynomial $p(x)$ shares a root with $h(x)$. Thus, $gcd(p(x), h(x))$ must be linear or higher. Since $gcd$ divides $p(x)$, we must have
   $gcd = p(x)$ as $p(x)$ is irreducible and cannot have divisors. Thus, $p(x)$, being the GCD, also divides $h(x)$].
- Thus, the roots of $p(x)$ must be a subset of the roots $\{ \beta_j \}$ of $h(x)$.
- Consider the polynomial $k(x) = g(\theta - f_* \cdot x)$. $\beta$ is also a root of the polynomial $k(x)$, since $k(\beta) = g(\theta - f_* \beta)$,
  which is equal to $g((\alpha + f_* \beta) - f_* \beta) = g(\alpha) = 0$. [since $\alpha$ is a root of $g$].
- Thus, we must have $p(x)$ divides $k(x)$.
- We will show that $\beta_j$ is not a root of $k(x)$ for $j \neq 1$. $k(\beta_j) = 0$ implies $g(\theta - f_* \beta_j) = 0$, which implies $\theta - f_* \beta_j = \alpha_i$
  since the roots of $g$ are the $\alpha_i$. But then we would have $\theta = \alpha_i + f_* \beta_j$, a contradiction as $\theta$ was chosen precisely to _avoid_ this case!
- Thus, every root of $p(x)$ must come from $\{ \beta_j \}$. Also, the roots of $p(x)$ must come from the roots of $k(x)$. But among the $\beta_j$, the only root of $k(x)$ is $\beta_1$.
  Also, $p(x)$ does not have multiple roots since it is separable. Thus, $p(x)$ is linear, and the degree of the field extension
  is 1. Therefore, $K = F_\theta = F(\theta)$.

#### References
- [Reference 1: Primitive Element theorem at Planet Math](https://planetmath.org/proofofprimitiveelementtheorem)
- [Reference 2: NPTEL which has proof based on embeddings into alg. closure](https://nptel.ac.in/content/storage2/courses/111101001/downloads/Lecture12.pdf)
- [Reference 3: ](https://sites.math.washington.edu/~greenber/MATH404-PrimElem.pdf)


# Separable extension via embeddings into alg. closure

#### Defn by embeddings
- Let $L/K$ be a finite extension.
- It is separable iff a given embedding $\sigma: K \to \overline K$ can be extended in $[L:K]$ ways (This number can be at most $[L:K]$.)
- We call the number of ways to embed $L$ in $\overline K$ via extending $\sigma$ the _separability degree_ of $L/K$.


##### At most $[L:K]$ embeddings exist

- We will show for simple extensions $K(\alpha)/K$ that there are at most $[K(\alpha): K]$ ways to extend $\sigma: K \to \overline K$ into $\sigma': K(\alpha) \to \overline K$.
- We use two facts: first, $\sigma'$ is entirely determined by where it sends $\alpha$. Second, $\alpha$ can only go to another root of its minimal polynomial $p \in K[x]$.
  Thus, there are only finitely many choices, and the minimal polynomial has at most $degree(p)$ unique roots, and $[K(\alpha):K] = degree(p)$.
  Thus, there are at most $degree(p) = [K(\alpha):K]$ choices of where $\alpha$ can go to, which entirely determines $\sigma'$; hence there are at most $degree(p) = [K(\alpha):K]$
  choices for $\sigma'$.
- Given a larger extension, write a sequence of extensions $L = K(\alpha_1)(\alpha_2)\dots(\alpha_n)$. Then, since $[L:K] = [K(\alpha_1):K][K(\alpha_1, \alpha_2):K(\alpha_1)] \cdots$,
  we can repeatedly apply the same argument to bound the number of choices of $\sigma'$.
- In detail, for the case $K(\alpha)/K$, consider the minimal polynomial of $\alpha$, $p(x) \in K[x]$. Then $p(\alpha) = 0$.
- Since $\sigma$ fixes $K$, and $p$ has coefficients from $K$, we have that $\sigma(p(x)) = p(\sigma(x))$.
- Thus, in particular, $\sigma(0) = \sigma(p(\alpha)) = p(\sigma(\alpha))$.
- This implies that $p(\sigma(\alpha)) = 0$, or $\sigma(\alpha)$ is a root of $p$.
- Since $\sigma': L \to \overline K$, $\sigma'$ can only map $\alpha$ to one of the other roots of $p$.
- $p$ has at most $deg(p)$ unique roots [it can have repeated roots, so it could have fewer than that].
- Further, $\sigma'$ is entirely determined by where it maps $\alpha$. Thus, there are at most $[K(\alpha):K]$ ways to extend $\sigma$ to $\sigma'$.

##### Separability is transitive
- Given a tower $K \subseteq L \subseteq M \subseteq \overline K$, we fix an embedding $\kappa: K \to \overline K$. If both $L/K$ and $M/L$ are
  finite and separable, then $\kappa$ extends into $\lambda: L \to \overline K$ through $L/K$ in $[L:K]$ ways, and each such $\lambda$ extends again
  into $\mu: M \to \overline K$ in $[M:L]$ ways.
- This together means that we have $[L:K] \cdot [M:L] = [M:K]$ ways to extend $\kappa$ into $\mu$, which is the maximum possible.
- Thus, $M/K$ is separable.

##### Separable by polynomial implies separable by embeddings
- Let every $\alpha \in L$ have minimal polynomial that is separable (ie, has distinct roots).
- Then we must show that $L/K$ allows us to extend any embedding $\sigma: K \to \overline K$ in $[L:K]$ ways into $\sigma': L \to \overline K$
- Write $L$ as a tower of extensions. Let $K_0 \equiv K$, and $K_{i+1} \equiv K_i(\alpha_i)$ with $K_n = L$.
- At each step, since the minimal polynomial is separable, we have the maximal number of choices of where we send the new generator. Since degree
  is multiplicative, we have that $[L:K] = [K_1:K_0][K_2:K_1]\cdots[K_n:K_{n-1}]$.
- We build $\sigma'$ inductively as $\sigma'_i: K_i \to \overline K$ with $\sigma'_0 \equiv \sigma$.
- Then at step $i$, $\sigma'_{i+1}: K_{i+1} \to \overline K$, ie $\sigma'_{i+1}: K_i(\alpha_{i+1}) \to \overline K$, has $[K_{i+1}:K_i]$ choices, since $\alpha_{i+1}$ is separable over
  $K_i$, as its minimal polynomial is separable.
- This means that in toto, we have the correct $[L:K]$ number of choices for $\sigma'_n: K_n = L \to \overline K$, which is what it means to be separable by embeddings.

##### Separable by embeddings implies separable by polynomial
- Let $L/K$ be separable in terms of embeddings. Consider some element $\alpha \in L$, let its minimal polynomial be $p(x)$.
- Write $L = K(\alpha)(\beta_1, \dots, \beta_n)$. Since degree is multiplicative, we have $[L:K] = [K(\alpha):K][K(\alpha, \beta_i):K(\alpha)]$.
- So given an embedding $\sigma: K \to \overline K$,we must be able to extend it in $[L:K]$ ways.
- Since any extension $\sigma'$ must send $\alpha$ to a root of $p(x)$, and we need the total to be $[L:K]$, we must have that $p(x)$ has no repeated roots.
- If $p(x)$ had repeated roots, then we would have fewer choices of $\sigma'(\alpha)$ than $[K(\alpha):K]$, which means the total count of choices for $\sigma'$ will be
  less than $[L:K]$, thereby contradicting separability.


#### Finite extensions generated by separable elements are separable

- Let $L = K(\alpha_1, \dots, \alpha_n)$ with each $\alpha_i$ separable over $K$; we want to show there are $[L: K]$ ways to extend a map $\kappa: K \to \overline K$ into $\lambda: L \to \overline K$.
- Since we have shown that separable by polynomial implies separable by embedding, we write $L = K(\alpha_1)(\alpha_2)\dots(\alpha_n)$. Each step is separable
  by the arguments given above in terms of counting embeddings by where they send $\alpha_i$. Thus, the full $L$ is separable.

##### References
- https://math.stackexchange.com/questions/2227777/compositum-of-separable-extension
- https://math.stackexchange.com/questions/1248781/primitive-element-theorem-without-galois-group


# Separable extensions via derivation
- Let $R$ be a commutative ring, $M$ an $R$-module. A derivation is a map such that $D(a + b) = D(a) + D(b)$ and $D(ab) = aD(b) + D(a)b$ [ie, the Leibniz product rule from calculus is obeyed].
- Note that the map does not need to be an $R$-homomorphism (?!)
- The elements of $R$ such that $D(R) = 0$ are said to be the _constants_ of $R$.
- The set of constants under $X$-differentiation for $K[X]$ in char. 0 is $K$, and $K[X^p]$ in char. p
- Let $R$ be an integral domain with field of fractions $K$. Any derivation $D: R \to K$ uniquely extends to $D': K \to K$ given by the
  quotient rule: $D'(a/b) = (bD(a) - aD(b))/b^2$.
- Any derivation $D: R \to R$ extends to a derivation $(.)^D: R[x] \to R[x]$. For a $f = \sum_i a_i x^i \in R[x]$, the derivation
  is given by $f^D(x) \equiv \sum_i D(a_i) X^i$. This applies $D$ to $f(x)$ coefficientwise.
- For a derivation $D: R \to R$ with ring of constants $C$, the associated derivation $(.)^D: R[x] \to R[x]$ has ring of constants $C[x]$.
- **Key thm:** Let $L/K$ be a field extension and let $D: K \to K$ be a derivation. $D$ extends uniquely to $D_L$ iff $L$ is separable over $K$.

#### If $\alpha$ separable, then derivation over $K$ lifts uniquely to $K(\alpha)$

- Let $D: K \to K$ be a derivation.
- Let $\alpha \in L$ be separable over $K$ with minimal polynomial $\pi(X) \in K[X]$.
- So, $\pi(X)$ is irreducible in $K[X]$, $\pi(\alpha) = 0$, and $\pi'(\alpha) \neq 0$.
- Then $D$ has a unique extension $D': K(\alpha) \to K(\alpha)$ given by:

$$
\begin{aligned}
D'(f(\alpha)) \equiv f^D(\alpha) - f'(\alpha) \frac{\pi^D(\alpha)}{\pi'(\alpha)}
\end{aligned}
$$

- To prove this, we start by assuming $D$ has an extension, and then showing that it must agree with $D'$. This tells us why it __must__ look this way.
- Then, after doing this, we start with $D'$ and show that it is well defined and obeys the derivation conditions. This tells us why it's __well-defined__.

#### Non example: derivation that does not extend in inseparable case

- Consider $K = F_p(u)$ as the base field, and let $L = K(\alpha)$ where $\alpha$ is a root of $X^p - u \in K[x]$. This is inseparable over $K$.
- The $u$ derivative on $F_p(u)$ [which treats $u$ as a polynomial and differentiates it] cannot be extended to $L$.
- Consider the equation $\alpha^p = u$, which holds in $L$, since $\alpha$ was explicitly a root of $X^p - u$.
- Applying the $u$ derivative gives us $p \alpha^{p-1} D(\alpha) = D(u)$. The LHS is zero since we are in characteristic $p$.
  The RHS is 1 since $D$ is the $u$ derivative, and so $D(u) = 1$. This is a contradiction, and so $D$ does not exist [any mathematical operation must respect equalities].


#### Part 2.a: Extension by inseparable element $\alpha$ does not have unique lift of derivation for $K(\alpha)/K$
- Let $\alpha \in L$ be inseparable over $K$. Then $\pi'(X) = 0$ where $\pi(X)$ is the minimal polynomial for $\alpha \in L$.
- In particular, $\pi'(\alpha) = 0$. We will use the vanishing of $\pi'(\alpha)$ to build a nonzero derivation on $K(\alpha)$ which extends the zero
  derivation on $K$.
- Thus, the zero derivation on $K$ has two lifts to $K(\alpha)$: one as the zero derivation on $K(\alpha)$, and one as our non-vanishing lift.
- Define $Z: K(\alpha) \to K(\alpha)$ given by $Z(f(\alpha)) = f'(\alpha)$ where $f(x) \in K[x]$. By doing this, we are conflating elements $l \in K(\alpha)$
  with elements of the form $\sum_i k_i \alpha^i = f(\alpha)$. We need to check that this is well defined, that if $f(\alpha) = g(\alpha)$, then $Z(f(\alpha)) = Z(g(\alpha))$.
- So start with $f(\alpha) = g(\alpha)$. This implies that $f(x) \equiv g(x)$ modulo $\pi(x)$.
- So we write $f(x) = g(x) + k(x)\pi(x)$.
- Differentiating both sides wrt $x$, we get $f'(x) = g'(x) + k'(x) \pi(x) + k(x) \pi'(x)$.
- Since $\pi(\alpha) = \pi'(\alpha) = 0$, we get that $f'(\alpha) = g'(\alpha) + 0$ by evaluating previous equation at $\alpha$.
- This shows that $Z: K(\alpha) \to K(\alpha)$ is well defined.
- See that the derivation $Z$ kills $K$ since $K = K \alpha^0$. But we see that $Z(\alpha) = 1$, so $Z$ extends the zero derivation on $K$ while not being zero itself.
- Note that we needed inseparability (ie, $\pi'(\alpha) = 0$) for $Z$ to be well-defined; for a separable $\alpha$, this construction breaks down.


##### Part 2.b: Inseparable extension can be written as extension by inseparable element

- Above, we showed that if we have $K(\alpha)/K$ where $\alpha$ inseparable, then derivations cannot be uniquely lifted.
- We want to show that if we have $L/K$ inseparable, then derivations cannot be uniquely lifted. But this is not the same!
- $L/K$ inseparable implies that there is some $\alpha \in L$ which is inseparable, NOT that $L = K(\alpha)/K$ is inseparable!
- So we either need to find some element $\alpha$ such that $L = K(\alpha)$ [not always possible], or find some field $F$ such that $L = F(\alpha)$ and
  $\alpha$ is inseparable over $F$.
- Reiterating: Given $L/K$ is inseparable, we want to find some $F/K$ such that $L = F(\alpha)$ where $\alpha$ is inseparable over $F$.
- TODO!


#### Part 1 + Part 2: Separable iff unique lift

- Let $L/K$ be separable. By primitive element theorem, $L = K(\alpha)$ for some $\alpha \in L$, $\alpha$ separable over $K$.
- Any derivation of $K$ can be extended to a derivation of $L$ from results above. Thus, separable implies unique lift.
- Suppose $L/K$ is inseparable. Then we can write $L = F(\alpha)/K$ where $\alpha$ is inseparable over $F$, and $K \subseteq F \subseteq L$.
- Then by Part 2.a, we use the $Z$ derivation to build a non-zero derivation on $L$ that is zero on $F$. Since it is zero on $F$ and $K \subseteq F$, it is zero on $K$.
- This shows that if $L/K$ is inseparable, then there are two ways to lift the zero derivation, violating uniqueness.


#### Lemma: Derivations at intermediate separable extensions
- Let $L/K$ be a finite extension, and let $F/K$ be an intermediate separable extension. So $K \subseteq F \subseteq L$ and $F/K$ is separable.
- Then we claim that every derivation $D: F \to L$ that sends $K$ to $K$ has values in $F$. (ie, its range is only $F$, not all of $L$).
- Pick $\alpha \in F$, so $\alpha$ is separable over $K$. We know what the unique derivation looks like, and it has range only $F$.


#### Payoff: An extension $L = K(\alpha_1, \dots, \alpha_n)$ is separable over $K$ iff $\alpha_i$ are separable

- Recursively lift the derivations up from $K_0 \equiv K$ to $K_{i+1} \equiv K_i(\alpha_i)$. If the lifts all succeed,
  then we have a separable extension. If the unique lifts fail, then the extension is not separable.
- The lifts succeed uniquely iff the final extension $L$ is separable.

# Irreducible polynomial over a field divides any polynomial with common root

- Let $p(x) \in K[x]$ be an irreducible polynomial over a field $K$. Let it share a common root $\alpha$ with another polynomial $q(x) \in K[x]$. Then we claim
  that $p(x)$ divides $q(x)$.
- Consider the GCD $g \equiv \gcd(p, q)$. Since $p, q$ share the root $\alpha$, the factor $(x - \alpha)$ divides $g$ (in $\overline K[x]$). Thus $g$ is a non-constant polynomial.
- Further, we have $g | p$ since $g$ is the GCD. But $p$ is irreducible, so it cannot be written as a product of smaller polynomials, and thus $g = p$ (up to a unit).
- Now, we have $g | q$, and since $g = p$, this gives $p | q$ for any $q$ that shares a root with $p$.

# Galois extension

- Let $M$ be a finite extension of $K$. Let $G = Gal(M/K)$. Then $M$ is said to be Galois iff:

1. $M$ is normal and separable (over $K$).
2. $deg(M/K) = |G|$. We will show that $|G| \leq deg(M/K)$ in general, so a Galois $M$ is "as symmetric as possible" --- it has the largest possible Galois group.
3. $K = M^G$ [the fixed points of $M$ under $G$]. This is useful for examples.
4. $M$ is the splitting field of a separable polynomial over $K$. Recall that a polynomial is separable over $K$ if it has distinct roots in
   the algebraic closure of $K$. Thus, the number of roots is equal to the degree.
5. For $K \subseteq L \subseteq M$ and $1 \subseteq H \subseteq G$: there is a 1-1 correspondence $L \mapsto Gal(M/L)$ [NOT $Gal(L/K)$!],
   and the other way round, $H \mapsto M^H$. $L$ is in the "denominator" because we want the subgroup that fixes $L$ when we go back.

- We'll show (1) implies (2) implies (3) implies (4) implies (1)

#### (4) implies (1)

- We've shown that splitting fields of _sets_ of polynomials are normal, so this case is trivial.
- Just to recall the argument, let $M$ be the splitting field of some separable polynomial $p \in K[x]$ over $K$. We need to show that $M$ is normal and separable.
- It's separable because it only adds new elements to $K$ which are generated by the roots of $p$, a separable polynomial. Thus, the minimal polynomials of the new elements
  will also be separable, and the base field is trivially separable.
- We must now show that $M$ is normal. We proceed by induction on degree.
  Normality is trivial for linear polynomials: if $M$ contains one root, it
  contains all of the roots (the only one).
- Let $q \in K[x]$ have a root $\alpha \in M$. If $\alpha \in K$, then divide by $(x - \alpha)$ and use induction. So suppose $\alpha \not \in K$.
- Then $\alpha$ is some element that is generated by the roots

- [Borcherds lecture](https://www.youtube.com/watch?v=g87CBjYqHWk&list=PL8yHsr3EFj53Zxu3iRGMYL_89GDMvdkgt&index=8)


# Separability of field extension as diagonalizability

- Take $Q(\sqrt 2)$ over $Q$. Multiplication by $\sqrt 2$ corresponds to the linear transform $\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$ over the basis $(1, \sqrt 2)$, writing elements as $a + b \sqrt 2$. (A small sympy check follows below.)
- The characteristic polynomial of the linear transform is $x^2 - 2$, which is indeed the minimal polynomial for $\sqrt 2$.
- Asking for every element of $Q(\sqrt 2)$ to be separable is the same as
  asking for every element of $Q(\sqrt 2)$, interpreted as a linear operator, to have a separable minimal polynomial.
- Recall that the minimal polynomial is the lowest degree polynomial that annihilates the linear operator.
  So $minpoly(I) = x - 1$, $charpoly(I) = (x - 1)^n$.
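
To sanity check the matrix claim, here is a small sketch using `sympy` (my own addition; the library choice is an assumption, not part of the original note):

```py
from sympy import Matrix, symbols

x = symbols('x')

# Matrix of "multiply by sqrt(2)" on Q(sqrt 2) with basis (1, sqrt 2):
#   sqrt2 * 1     = 0*1 + 1*sqrt2   -> column (0, 1)
#   sqrt2 * sqrt2 = 2*1 + 0*sqrt2   -> column (2, 0)
M = Matrix([[0, 2],
            [1, 0]])

print(M.charpoly(x).as_expr())  # expected: x**2 - 2
```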

# Motivation for the compact-open topology

- If $X$ is a compact space and $Y$ is a metric space, consider two functions $f, g: X \to Y$.
- We can define a distance $d(f, g) \equiv \max_{x \in X} d(f(x), g(x))$.
- The $\max_{x \in X}$ exists (the sup is attained) because $X$ is compact.
- Thus this is a real metric on the function space $Map(X, Y)$.
- Now suppose $Y$ is no longer a metric space, but is Haussdorf. Can we still define a topology on $Map(X, Y)$?
- Let $K \subseteq X$ be compact, and let $U \subseteq Y$ be open such that $f(K) \subseteq U$.
- Since $Y$ is Hausdorff, $K \subseteq X$


# Example of covariance zero, and yet "correlated"

- $x$ and $y$ coordinates of points on a disk.
- $E[X], E[Y]$ is zero because symmetric about origin.
- $E[XY] = 0$ because of symmetry along quadrants.
- Thus, $E[XY] - E[X] E[Y]$, the covariance, is zero.
- However, they are clearly dependent (though uncorrelated). Eg. if $x = 1$, then $y$ must be zero. (A small monte-carlo sketch follows.)
- If $Y = aX+b$ then $corr(X, Y) = sgn(a)$.
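
A minimal numpy monte-carlo sketch (my own addition, not in the original note) that samples points uniformly from the unit disk and checks that the empirical covariance is near zero even though $X$ and $Y$ are dependent:

```py
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Sample uniformly from the unit disk: radius ~ sqrt(U) makes the area uniform.
r = np.sqrt(rng.uniform(0, 1, n))
theta = rng.uniform(0, 2 * np.pi, n)
x, y = r * np.cos(theta), r * np.sin(theta)

print(np.cov(x, y)[0, 1])              # ~ 0: covariance vanishes by symmetry
print(np.corrcoef(x**2, y**2)[0, 1])   # clearly negative: X and Y are dependent
```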

# Hypothesis Testing

#### Mnemonic for type I versus type II errors

- Once something becomes "truth", challenging the status quo and making it
  "false" is very hard. (see: disinformation).
- Thus, Science must have high barriers for accepting a hypothesis as true.
- That is, we must have high barriers for incorrectly rejecting the null (that
  nothing happened).
- This error is called a type I error, and is denoted by $\alpha$ (the more
  important error).
- The other type of error, where something is true, but we conclude it is false
  is less important. Some grad student can run the experiment again with better
  experimental design and prove it's true later if need be.
- Our goal is to protect science from entrenching/enshrining "wrong" facts as
  true. Thus, we control type I errors.
- Our goal is to "reject" current theories (the null) and create "new theories" (the alternative).
  Thus, in statistics, we setup our tests with the goal of enabling us to "reject the null".

#### Mnemonic for remembering the procedure
- $H_0$ is the null hypothesis (null for zero). They are presumed innocent until proven guilty.
- If $H_0$ is judged guilty, we reject them (from society) and send them to the gulag.
- If $H_0$ is judged not guilty, we retain them (in society).
- We are the prosecution, who are trying to reject $H_0$ (from society) to send them to the gulag.
- The scientific /statistical process is the Judiciary which is attempting to keep the structure of "innocent until proven guilty" for $H_0$.
- We run experiments, and we find out how likely it is that $H_0$ is guilty based on our experiments.
- We calculate an error $\alpha$, which is the probability that we screw up the fundamental truth of the court: we must not send an innocent man to the gulag.
  Thus, $\alpha$ is the probability that $H_0$ is innocent (ie, true) but we reject it (to the gulag).

#### P value, Neyman interpretation
- Now, suppose we wish to send $H_0$ to the gulag, because we're soviet union
  like that. What's the probability we're wrong in doing so? (That is, what is the probability that $H_0$ is innocent and we are
  condemning them incorrectly to a life in the gulag?) That's the $p$ value. We estimate this based on our experiment, of course.
- Remember, we **can never speak** of the "probability of $H_0$ being true/false", because $H_0$ _is true_ or _is false_ [frequentist]. There is no
  probability.

#### P value, Fisher interpretation

- The critical region of the test corresponds to those values of the test statistic
  that would lead us to reject null hypothesis (and send it to the gulag).
- Thus, the critical region is also sometimes called the "rejection region",
  since we reject $H_0$ from society if the test statistic lies in this region.
- The rejection region usually corresponds to the tails of the sampling distribution.
- The reason for that is that a good critical region almost always corresponds
  to those values of the test statistic that are least likely to be observed if
  the null hypothesis is true. This will be the "tails" / "non central tendency" if a test is good.
- In this situation, we define the $p$ value to be the probability we would have observed a test statistic that is
  at least as extreme as the one we did get. `P(new test stat >= cur test stat)`.
- ??? I don't get it.


#### P value, completely wrong edition

- "Probability that the null hypothesis is true" --- WRONG
- compare to "probability _us_ rejecting the null hypothesis is wrong" -- CORRECT. The probability is in US being wrong, and has NOTHING to do with the
  truth or falsity of the null hypothesis _itself_.

#### Power of the test

- The value $\beta$ is the probability that $H_0$ was guilty, but we chose to retain them in society instead.
- The less we do this (ie, the larger is $1 - \beta$), the more "power" our test has.



# Dumb mnemonic for remembering adjunction turnstile

- The left side of the adjunction `F` wants to "push the piston" on the right
  side, so it must be `F -| G` where `-|` allows `F` to "crush" `G` with the
  flat surface `|`.

# Delta debugging

- [Delta debugging from the fuzzing book](https://www.fuzzingbook.org/)
- Start with a program that crashes.
- Run `reduce` on it:

```py
from typing import Callable

def reduce(inp: str, test: Callable[[str], bool]) -> str:
  # convention: test(s) returns False when `s` still triggers the failure.
  assert test(inp) == False
  n = 2 # initial granularity: try removing 1/2 of the input.
  while len(inp) >= 2:
    ix = 0
    found_failure = False
    skiplen = len(inp) // n

    while ix < len(inp):
      inp_noix = inp[:ix] + inp[ix+skiplen:] # delete the chunk [ix, ix+skiplen)
      if not test(inp_noix):
          inp = inp_noix # smaller input still fails: keep it
          n = max(n - 1, 2) # coarsen granularity
          found_failure = True; break
      else:
        ix += skiplen # this chunk was necessary; try deleting the next one

    if not found_failure:
      if n == len(inp): break # already at single-character granularity
      n = min(n * 2, len(inp)) # refine granularity: double

  return inp
```
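
As a usage sketch (the failing predicate here is made up by me for illustration): reduce an input that "crashes" whenever it contains the substring `"bug"`:

```py
def still_fails(s: str) -> bool:
    # hypothetical predicate: False = input still triggers the failure,
    # matching the convention of reduce() above.
    return "bug" not in s

crashing_input = "aaaa bug bbbb"
print(reduce(crashing_input, still_fails))  # a small substring still containing "bug"
```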

# Tidy Data

- [The paper](http://vita.had.co.nz/papers/tidy-data.pdf)

- Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is
  messy or tidy depending on how rows, columns and tables are matched up with observations,
  variables and types. In tidy data:

> 1. Each variable forms a column.
> 2. Each observation forms a row.
> 3. Each type of observational unit forms a table.

> While the order of variables and observations does not affect analysis, a good ordering makes
> it easier to scan the raw values. One way of organising variables is by their role in the analysis:
> are values fixed by the design of the data collection, or are they measured during the course of
> the experiment? Fixed variables describe the experimental design and are known in advance.
> Computer scientists often call fixed variables dimensions, and statisticians usually denote them
> with subscripts on random variables. Measured variables are what we actually measure in the
> study. Fixed variables should come first, followed by measured variables, each ordered so that
> related variables are contiguous. Rows can then be ordered by the first variable, breaking
> ties with the second and subsequent (fixed) variables. This is the convention adopted by all
> tabular displays in this paper.


#### Messy 1: Column headers are values, not variable names

- eg. columns are `religion |<$10k |$10-20k |$20-30k |$30-40k |$40-50k |$50-75k`.
- Melt the dataset to get `molten` stacked data (a small pandas sketch follows).
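
A minimal pandas sketch of the melt step (the tiny dataframe here is made up for illustration):

```py
import pandas as pd

df = pd.DataFrame({
    "religion": ["Agnostic", "Atheist"],
    "<$10k":    [27, 12],
    "$10-20k":  [34, 27],
})

# Melt the income columns into (variable, value) pairs: one observation per row.
molten = df.melt(id_vars=["religion"], var_name="income", value_name="freq")
print(molten)
```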

#### Messy 2: Multiple variables stored in one column

- This often manifests _after_ melting.
- eg. columns are `country | year | m014 | m1524 | .. | f014 | f1524...`
- columns represent _both_ sex _and_ age ranges. After melting, we get a single column `sexage` with entries like `m014` or `f1524`
- The data is still _molten_, so we should reshape it before it sets into tidy columnar data. We do this by splitting the column into two,
  one for `age` and one for `sex` (a small pandas sketch follows).
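
A small pandas sketch of the split (again with illustrative column names of my own):

```py
import pandas as pd

molten = pd.DataFrame({
    "country": ["AD", "AE"],
    "year":    [2000, 2000],
    "sexage":  ["m014", "f1524"],
    "cases":   [0, 3],
})

# Split `sexage` into `sex` (first character) and `age` (the rest).
molten["sex"] = molten["sexage"].str[0]
molten["age"] = molten["sexage"].str[1:]
tidy = molten.drop(columns=["sexage"])
print(tidy)
```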


#### Messy 3: Variables are stored in both rows and columns

- Original data:

```
id      year month element d1 d2 d3 d4 d5 ...
MX17004 2010 1     tmax    — — — — — — — —
MX17004 2010 1     tmin    — — — — — — — —
MX17004 2010 2     tmax    — 27.3 24.1 — — — — —
MX17004 2010 2     tmin    — 14.4 14.4 — — — — —
MX17004 2010 3     tmax    — — — — 32.1 — — —
MX17004 2010 3     tmin    — — — — 14.2 — — —
MX17004 2010 4     tmax    — — — — — — — —
MX17004 2010 4     tmin    — — — — — — — —
MX17004 2010 5     tmax    — — — — — — — —
MX17004 2010 5     tmin    — — — — — — — —
```

- Some variables are in individual columns (id, year, month)
- Some variables are spread across columns (day is spread as d1-d31)
- Some variables are smeared across rows (eg. `tmax/tmin`). TODO: what does this mean, really?
- First, tidy by collating into `date`:

```
id      date       element value
MX17004 2010-01-30 tmax 27.8
MX17004 2010-01-30 tmin 14.5
MX17004 2010-02-02 tmax 27.3
MX17004 2010-02-02 tmin 14.4
MX17004 2010-02-03 tmax 24.1
MX17004 2010-02-03 tmin 14.4
MX17004 2010-02-11 tmax 29.7
MX17004 2010-02-11 tmin 13.4
MX17004 2010-02-23 tmax 29.9
MX17004 2010-02-23 tmin 10.7
```
- Dataset above is still molten. Must reshape along `element` to get two columns for `max` and `min`. This gives:

```
id      date       tmax tmin
MX17004 2010-01-30 27.8 14.5
MX17004 2010-02-02 27.3 14.4
MX17004 2010-02-03 24.1 14.4
MX17004 2010-02-11 29.7 13.4
MX17004 2010-02-23 29.9 10.7
MX17004 2010-03-05 32.1 14.2
MX17004 2010-03-10 34.5 16.8
MX17004 2010-03-16 31.1 17.6
MX17004 2010-04-27 36.3 16.7
MX17004 2010-05-27 33.2 18.2
```

- Months with less than 31 days have structural missing values for the last day(s) of the month.
- The element column is not a variable; it stores the names of variables.


#### Multiple types in one table:


#### data manipulation, relationship to `dplyr`:

- [Data transformation in R for data science](https://r4ds.had.co.nz/transform.html)
- `mutate()` adds new variables that are functions of existing variables
- `select()` picks variables based on their names.
- `filter()` picks cases based on their values.
- `summarise()` reduces multiple values down to a single summary.
- `arrange()` changes the ordering of the rows.


#### Visualization

- Most of R's visualization ecosystem is tidy by default.
- base `plot`, `lattice`, `ggplot` are all tidy.


#### Modelling

- Most modelling tools work best with tidy datasets.


#### Questions about performance benching in terms of tidy
- Are runs of a program at different optimization levels like `O1`, `O2`, `O3` to be stored as
  separate columns? Or as a categorical column called "optimization level" with
  entries stored in separate rows of `O1`, `O2`, `O3`?
- If we go by the tidy rule "Each variable forms a column", then this suggests that `optimization level` is a variable.
- Then the tidy rule `Each observation forms a row.` makes us use rows like `[foo.test | opt-level=O1 | <runtime>]` and `[foo.test | opt-level=O2 | <runtime>]`.
- Broader question: what is the tidy rule for categorical column?
- However, in the tidy data paper, Table 12, it is advocated to have two columns for `tmin` and `tmax` instead of having a column called `element` with
  choices `tmin`, `tmax`. So it seems to be preferred that if one has a categorical variable, we make its observations into columns.
- This suggests that I order my bench data as `[foo.test | O1-runtime=_ | O2-runtime=_ | O3-runtime=_ ]`.

# Normal subgroups through the lens of actions

- a finite group is a subgroup of a permutation group
- $ghg^{-1}$ is relabelling by $g$
- if $gHg^{-1} = H$, then $H$ does not care about the labelling
- thus $H$ treats everyone uniformly
- prove that if $H$ is normal, then if $s \in fix(H)$ then $orb(s) \subseteq fix(H)$
- when is $Stab(x)$ normal? when $Stab(gx)$ equals $g Stab(x) g^{-1}$?

- topology on $S$: closed sets are the common fixpoints of a set of group elements.

# Writing rebuttals, Tobias style

- Writing rebuttals, key take-aways:
- Make your headings for reviewers who are seeing your rebuttal projected on a screen to defend your paper.
- Don't write in Q&A style.
- Write as a paragraph, where we write the strong answer first, and then point back to the question.
- Use a subclause to indicate that a sentence is unfinished. Eg: "the bug in our compiler has been fixed" (bad!).
  The reader may see "the bug in our compiler..." and conclude something crazy.
  Rather, we should write "While there was a bug in our compiler, we fixed it ...". The `While` makes it
  clear that the sentence is not yet complete, so the reader does not stop halfway with the wrong impression.

# LCS DP: The speedup is from filtration

- I feel like I finally see where the power of dynamic programming lies.
- Consider the longest common subsequence problem over arrays $A$, $B$ of size $n$, $m$.
- Naively, we have $2^n \times 2^m$ pairs of subsequences and we need to process each of them.
- How does the LCS DP solution manage to solve this in $O(nm)$?
- Key idea 1: create a "filtration" of the problem $F_{i, j} \subseteq 2^n\times2^m$. At step $(i, j)$, consider the "filter" $F_{i, j}$
  containing all pairs of subsequences $(s \in 2^n, t \in 2^m)$ where $maxix(s) \leq i$ and $maxix(t) \leq j$.
- These filters of the filtration nest into one another, so $F_{i, j} \subseteq F_{i', j'}$ iff $i \leq i'$ and $j \leq j'$.
- Key idea 2: The value of `max LCS(filter)` is (a) monotonic, and (b) can be computed efficiently from the values of lower filtration.
  So we have a monotone map from the space of filters to the solution space, and this monotone map is efficiently computable, given the
  values of filters below this in the filtration.
- This gives us a recurrence, where we start from the bottom filter and proceed to build upward.
- See that this really has _nothing_ to do with recursion. It has to do with _problem decomposition_.
  We decompose the space $2^n \times 2^m$
  cleverly via filtration $F_{i, j}$ such that `max LCS(F[i, j])` was efficiently computable.
- To find a DP, think of the entire state space, then think of filtrations, such that the solution function becomes a monotone map, and the solution
  function is efficiently computable given the values of filters below it. (A small code sketch of the LCS recurrence follows.)
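
A minimal Python sketch of the resulting $O(nm)$ recurrence, where `L[i][j]` is the value of `max LCS` over the filter $F_{i, j}$:

```py
def lcs_length(A: str, B: str) -> int:
    # L[i][j] = length of the longest common subsequence over the filter F[i, j]:
    # pairs of subsequences drawn from A[:i] and B[:j].
    n, m = len(A), len(B)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1            # extend a common subsequence
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # drop a character of A or of B
    return L[n][m]

assert lcs_length("ABCBDAB", "BDCABA") == 4  # eg. "BCAB"
```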

# Poisson distribution

- Think about flipping a biased coin with some bias $p$ to associate a coin flip to each real number. Call this $b: \mathbb R \to \{0, 1\}$.
- Define the count of an interval $I$ as $\#I \equiv |\{ r \in I : b(r) = 1 \}|$, the number of heads landing in $I$.
- Suppose that this value $\#I$ is finite for any bounded interval.
- Then the process we have is a poisson process.
- Since the coin flips are independent, all 'hits' of the event must be independent.
- Since there is either a coin flip or there is not, at most one 'hit' of the event can happen at any moment in time.
- Since the bias of the coin is fixed, the rate at which we see $1$s is overall constant.



# F1 or Fun : The field with one element

- Many combinatorial phenomena can be recovered as the "limit" of geometric phenomena over the "field with one element",
  a mathematical mirage.

#### Cardinality ~ Lines

- Consider the vector space $F_p^n$. How many lines through the origin (ie, points of the projective space over $F_p$) are there?
- Note that for each non-zero vector, we get a 'direction'. So there are $p^n - 1$ potential directions.
- See that for any choice of direction $d \in F_p^n \setminus \{\vec 0\}$, there are $(p - 1)$ "linearly equivalent" directions, given by $1 \cdot d$, $2 \cdot d$,
  $\dots$, $(p - 1) \cdot d$, which are all distinct since the nonzero field elements form a group under multiplication.
- Thus, we have $(p^n - 1)/(p - 1)$ lines. This is equal to $1 + p + p^2 + \dots + p^{n-1}$, which is $p^0 + p^1 + \dots + p^{n-1}$.
- If we plug in $p = 1$ (study the "field with one element"), we recover $\sum_{i=0}^{n-1} p^i = n$.
- Thus, the "cardinality of a set of size $n$" is the "number of lines in $n$-dimensional space over $F_1$"!
- Since $[n] \equiv \{1, 2, \dots, n\}$ is the set of size $n$, it is only natural that $[n]_p$ is defined to be the lines in $F_p^n$.
  We will abuse notation and conflate $[n]_p$ with the cardinality, $[n]_p \equiv (p^n - 1)/(p - 1)$.


#### Permutation ~ Maximal flags

- Recall that a maximal flag is a sequence of subspaces $V_1 \subseteq V_2 \subseteq \dots \subseteq V$. At each step, the dimension increases by $1$,
  and we start with dimension $1$. So we pick a line $l_1$ through the origin for $V_1$. Then we pick a plane through the origin that contains the line $l_1$
  through the origin. Said differently, we pick a plane $p_2$ spanned by $l_1, l_2$. And so on.
- How many ways can we pick a line? That's $[n]_p$. Now we need to pick another line independent of the first line. So we build the quotient space $F_p^n/L$,
  which is $F_p^{n-1}$. Thus picking another line here is $[n-1]_p$.  On multiplying all of these, we get $[n]_p [n-1]_p \dots [1]_p$.
- In the case of finite sets (at $p = 1$), this gives us $1 \cdot 2 \cdots n = n!$. (A small numeric check follows.)
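
A tiny Python check of this bookkeeping (my own addition): $[n]_p = \sum_{i=0}^{n-1} p^i$ counts lines, the product $[n]_p [n-1]_p \cdots [1]_p$ counts maximal flags, and at $p = 1$ these collapse to $n$ and $n!$:

```py
from math import factorial

def qint(n: int, p: int) -> int:
    # [n]_p = 1 + p + ... + p^(n-1): number of lines in F_p^n (for prime p).
    return sum(p**i for i in range(n))

def qfactorial(n: int, p: int) -> int:
    # [n]_p [n-1]_p ... [1]_p: number of maximal flags in F_p^n.
    out = 1
    for k in range(1, n + 1):
        out *= qint(k, p)
    return out

assert qint(5, 1) == 5                    # lines over "F_1" = cardinality
assert qfactorial(5, 1) == factorial(5)   # flags over "F_1" = permutations
print(qint(3, 2), qfactorial(3, 2))       # 7 lines, 7*3*1 = 21 flags in F_2^3
```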

#### Combinations ~ Grassmannian

- Recall that a Grassmannian consists of $k$-dimensional subspaces of an $n$-dimensional space.


- [Reference: This week's finds 184 by baez](https://math.ucr.edu/home/baez/week184.html)

# McKay's proof of Cauchy's theorem for groups [TODO]

- In a group, if $gh = 1$ then $hg = 1$. Prove this by writing $hg = hg (h h^{-1}) = h(gh)h^{-1} = h \cdot 1 \cdot h^{-1} = 1$.
- We can interpret this as follows: in the multiplication table of a group, firstly, each row contains exactly one $1$.
- Also, when $g \neq h$ (ie, we are off the main diagonal of the multiplication table), each $gh = 1$ has a "cyclic permutation solution" $hg = 1$.
- If the group has even order, then there is an even number of $1$s on the main diagonal: the table has $|G|$ many $1$s in total (one per row), and the off-diagonal ones come in pairs.
- Thus, the number of solutions to $x^2 = 1$ for $x \in G$ is even.
- Let's generalize from pairs to
- [Reference](http://www.cs.toronto.edu/~yuvalf/McKay%20Another%20Proof%20of%20Cauchy's%20Group%20Theorem.pdf)


# ncdu for disk space measurement

- I've started to use `ncdu` to get a quick look at disk space instead of `baobab`. It's quite handy
  since it's an ncurses based TUI.


# nmon versus htop

- I've switched to using `nmon` instead of `htop` for viewing system load. Its TUI looks much nicer than `htop`,
  and I find its process list much easier to parse.

# Schreier-Sims --- why purify generators times coset

- Let `p = (0 3 4)(1 2)`. Let `G = <p>`. What is the stabilizer of `k=0`?
- `purify(p) = e` so we would imagine we would have `H = e`.
- But actually, consider orbit(k). We have `0 <-> id`, `3 <-> p`, `4 <-> p^2`.
- If I now consider `p * orbit(k)` then I get `p, p^2, p^3`, where `purify(p) = id`, `purify(p^2) = id`, `purify(p^3) = p^3`.
- Thus we find the nontrivial generator `p^3`. (A small sympy check follows.)
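
A small check of this example using sympy's permutation groups (my own addition; sympy is an assumption here). In sympy's zero-indexed array notation, $p = (0\ 3\ 4)(1\ 2)$, and the stabilizer of $0$ in $\langle p \rangle$ is indeed generated by $p^3 = (1\ 2)$:

```py
from sympy.combinatorics import Permutation, PermutationGroup

p = Permutation([3, 2, 1, 4, 0])   # p = (0 3 4)(1 2): 0->3, 3->4, 4->0, 1->2, 2->1
G = PermutationGroup([p])

stab = G.stabilizer(0)             # stabilizer of the point 0 in <p>
print(stab.order())                # 2
print(p**3 in stab)                # True: p^3 = (1 2) is the nontrivial stabilizer element
```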

# Vyn's feeling about symmetry

- They are of the opinion that the correct definition of a symmetry of an object $S$ in space is that
  a transformation $T$ is a symmetry of $S$ iff $T(S) = S$ (as a set).
- The above rules out things like translations of a cube.
- Indeed, one can only recover translations by considering a line on the space and then considering the orbit of the line under
  a specific translation $T$.

# Convergence in distribution is very weak

- consider $X \sim N(0, 1)$. Also consider $-X$ which will be identically distributed (by symmetry of $-$ and $N$).
- So we have that $-X \sim N(0, 1)$.
- But this tells us nothing about the relationship between $X$ and $-X$! So this type of "convergence in distribution" is very weak.
- Strongest notion of convergence (#1): Almost surely. $T_n \xrightarrow{a.s} T$ iff $P(\{ \omega : T_n(\omega) \to T(\omega) \}) = 1$.
  Consider a snowball left out in the sun. In a couple hours, it'll have a random shape, random volume, and so on. But the ball itself
  is a definite thing --- the $\omega$. Almost sure says that for almost all of the balls, $T_n$ converges to $T$.
- #2 notion of convergence: Convergence in probability.
  $T_n \xrightarrow{P} T$ iff $P(|T_n - T| \geq \epsilon) \xrightarrow{n \to \infty} 0$ for all
  $\epsilon > 0$. This allows us to squeeze $\epsilon$ probability under the rug.
- Convergence in $L^p$: $T_n \xrightarrow{L^p} T$ iff $E[|T_n - T|^p] \xrightarrow{n \to \infty} 0$. Eg. think of convergence in variance of a gaussian.
- Convergence in distribution (weakest): $T_n \xrightarrow{d} T$ iff $P[T_n \leq x] \xrightarrow{n \to \infty} P[T \leq x]$ for all $x$ at which the CDF of $T$ is continuous.

#### Characterization of convergence in distribution

- (1) $T_n \xrightarrow{d} T$
- (2) For all $f$ continuous and bounded, we have $E[f(T_n)] \xrightarrow{n \to \infty} E[f(T)]$.
- (3) we have $E[e^{ixT_n}] \xrightarrow{n \to \infty} E[e^{ixT}]$ for all $x$. [the characteristic function converges pointwise].


#### Strength of different types of convergence

- Almost surely convergence implies convergence in probability. Also, the two limits (which are RVs) are almost surely equal.
- Convergence in $L^p$ implies convergence in probability and convergence in $L^q$ for all $q \leq p$. Also, the limits (which are RVs) are almost
  surely equal.
- If $T$ converges in probability, it also converges in distribution (meaning the two sequences will have the same DISTRIBUTION, not same RV).
- All of almost surely, probabilistic convergence, convergence in distribution (not $L^p$)
  map properly by continuous fns. $T_n \to T$ implies $f(T_n) \to f(T)$.
- almost surely implies P implies distribution convergence.


#### Slutsky's Theorem

- If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{P} c$ (that is, the sequence $Y_n$ converges to a deterministic constant), then we have that
  $(X_n, Y_n) \xrightarrow{d} (X, c)$. In particular, we get that $X_n + Y_n \xrightarrow{d} X + c$ and $X_n Y_n \xrightarrow{d} X c$.
- This is important, because in general, convergence in distribution says nothing about the RV! but in this special case, it's possible.

#### References

-   [MIT OCW stats](https://www.youtube.com/watch?v=C_W1adH-NVE&list=PLUl4u3cNGP60uVBMaoNERc6knT_MgPKS0&index=2)




# Class equation, P-group structure

#### Centralizer

- The centralizer of a subset $S$ of a group $G$ is the subgroup of elements of $G$ that commute with every element of $S$.
  It's defined as $C_G(S) \equiv \{ g \in G : \forall s \in S, gs = sg \}$. This can be
  written as $C_G(S) \equiv \{ g \in G : \forall s \in S, gsg^{-1} = s \}$.

#### Conjugacy classes and the class equation

- Define $g \sim g'$ if there exists a $k$ such that $g' = kgk^{-1}$. This is an equivalence
  relation on the group, and it partitions the group into _conjugacy classes_.
- Suppose an element $z \in G$ is in the center (Zentrum). Now, the product $kzk^{-1} = z$ for all $k \in G$.
  Thus, elements in the center all sit in conjugacy classes of size $1$.
- Let $Z$ be the center of the group, and let $\{ J_i \subset G \} $ (J for conJugacy) be conjugacy classes of elements other than the center.
  Let $j_i \in J_i$ be representatives of the conjugacy classes, which also generate the conjugacy class as orbits under the action of
  conjugation.
- By orbit stabilizer, we have that $|J_i| = |Orb(j_i)| = |G|/|Stab(j_i)|$.
- The stabilizer under the action of conjugation is the centralizer! So we have $|Orb(j_i)| = |G|/|C(j_i)|$.
- Thus, we get the class equation: $|G| = |Z| + \sum_{j_i} |G|/|C(j_i)|$. (A small sympy sanity check is given below.)
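
A small sympy sanity check of the class equation (my addition; sympy and the choice of the dihedral group are assumptions for illustration):

```py
from sympy.combinatorics.named_groups import DihedralGroup

G = DihedralGroup(4)                 # symmetries of the square, |G| = 8
center = G.center()
classes = G.conjugacy_classes()      # list of conjugacy classes

noncentral = [len(c) for c in classes if len(c) > 1]
# class equation: |G| = |Z(G)| + sum of sizes of the non-singleton classes
print(G.order(), center.order(), noncentral)
assert G.order() == center.order() + sum(noncentral)
```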

#### $p$-group

- A $p$-group is a group where every element has order a power of $p$.
- Claim: a finite group is a $p$-group iff it has cardinality $p^N$ for some $N$.
- Forward - $|G| = p^N$ implies $G$ is a $p$-group: Let $g \in G$. Then $|\langle g \rangle|$ divides $|G| = p^N$ by Lagrange, so it is a power of $p$. Hence proved.
- Backward - every $g \in G$ has order a power of $p$ implies $|G| = p^N$ for some $N$: Suppose some prime $q \neq p$ divides $|G|$.
  By Cauchy's theorem (proved below), $G$ would have an element of order $q$, contradicting that every order is a power of $p$.
  Hence $p$ is the only prime dividing $|G|$, and so $|G| = p^N$.

#### Center of $p$ group

- Let $G$ be a $p$-group. We know that $|G| = |Z(G)| + \sum_{g_i} |Orb(g_i)|$, where we are considering orbits under
  group conjugation.
- See that $|Orb(g_i)| = |G|/|Stab(g_i)|$. The quantity on the right must be a power of $p$ (since the numerator is $p^N$). The quantity
  must be more than $1$, since the element $g_i$ is not in the center (and thus is conjugated non-trivially by _some_ element of the group).
- Thus, $|Orb(g_i)|$ is divisible by $p$.
- Take the equation $|G| = |Z(G)| + \sum_{g_i} |Orb(g_i)|$ modulo $p$. This gives $0 =_p |Z(G)|$. Hence, $Z(G) \neq \{ e \}$
  (since that would give $|Z(G)| =_p 1 \neq 0$). So, the center is non-trivial.

#### Cauchy's theorem: order of group is divisible by $p$ implies group has element of order $p$.

- **Abelian case, order $p$**: immediate, must be the group $Z/pZ$ which has generator of order $p$. Now induction on group cardinality.
- **Abelian case, order divisible by $p$**: Pick an element $g \in G$, let the cyclic subgroup generated by it be $C_g$, and
  let the order of $g$ be $o$ (thus, $|C_g| = o$).
- *Case 1:* If $p$ divides $o$, then there is a power of $g$ with order $p$ (Let $o' \equiv o/p$. Consider $g^{o'}$; this has order $p$).
- *Case 2:* If $p$ does not divide $o$, then $p$ divides the order of the
  quotient $G' \equiv G / C_g$. Thus by induction, we have an element $h C_g \in G / C_g$ of order $p$.
- Let $o$ be the order of $h$ in $G$. Then we have that $(h C_g)^o =  h^o C_g = e C_g$, where the last equality follows from the assumption that
  $o$ is the order of $h$. Thus, raising $h C_g$ to the power $o$ gives the identity in $G/C_g$. This implies that $p$ (the order of $h C_g$ in $G/C_g$)
  must divide $o$ (the order of $h$).
- Thus, by an argument similar to the previous, there is some power of $h$ with order $p$. (Let $o' \equiv o/p$. Consider $h^{o'}$; this has order $p$.)

- **General case:** consider the center $Z$. If $p$ divides $|Z|$, then use the abelian case to find an element of order $p$ and we are done.
- Otherwise, use the class equation: $|G| = |Z| + \sum_{j_i} |Orb(j_i)|$.
- The LHS vanishes modulo $p$, while the term $|Z|$ on the RHS does not. Thus there is some term $j_i$ whose orbit size is not divisible by $p$.
- We know that $|Orb(j_i)| = |G|/|Stab(j_i)|$ where the action is conjugation. Since the LHS is not divisible by $p$, while $|G|$ is divisible by $p$,
  this means that $Stab(j_i)$ has order divisible by $p$ and is a subgroup of $G$.
- Further, $Stab(j_i)$ is a proper subgroup, as $Orb(j_i)$ is a nontrivial orbit, so $j_i$ is not stabilized by every element of the group.
- Use induction on $Stab(j_i)$ to find element of order $p$.

#### Subgroups of p-group

- Let $G$ be a finite $p$ group. So $|G| = p^N$. Then  $G$ has a normal subgroup of size $p^l$ for all $l \leq N$.
- Proof by induction on $l$.
- For $l = 0$, we have the normal subgroup $\{ e \}$.
- Assume this holds for $k$. We need to show it's true for $l \equiv k + 1$.
- So we have a normal subgroup $N_k$ of size $p^k$. We need to establish a subgroup $N_l$ of size $p^{k+1}$.
- Consider $G/N_k$. This is a $p$-group and has cardinality $p^{N-k}$. As it is a $p$-group, it has non-trivial center.
  So, $Z(G/N_k)$ is non-trivial and has cardinality at least $p$.
- Recall that every subgroup of the center is normal. This is because the center is fixed under conjugation, thus
  subgroups of the center are fixed under conjugation and are therefore normal.
- Next, by Cauchy's theorem, there exists an element $z$ of order $p$ in $Z(G/N_k)$. Thus, there is a normal subgroup $\langle z \rangle \subset G/N_k$
- We want to pull this back to a normal subgroup of $G$ of order $|\langle z \rangle \cdot N_k| = p^{k+1}$.
- By the correspondence theorem, the preimage of $\langle z \rangle$ under the quotient map $G \to G/N_k$ is a normal subgroup of $G$ of order $|\langle z \rangle| \cdot |N_k| = p^{k+1}$. Thus we are done.


# Sylow Theorem 1

I've always wanted a proof I can remember, and I think I've found one.

- Let $G$ be a group such that $|G| = p^n m $ where $p$ does not divide $m$.
- We start by considering the set of all subsets of $G$ of size $p^n$. Call this set $\Omega$.
- We will prove the existence of a special subset $S \subseteq G$ such that
  $S \in \Omega$, and $|Stab(S)| = p^n$. That is, $|S| = p^n$ and $|Stab(S)| = p^n$.
  This is somewhat natural, since the only way to get subgroups out of actions is to
  consider stabilizers.
- We need to show the existence of an $S \in \Omega$ such that $Stab(S)$ has maximal cardinality.

#### Lemma: $\binom{pa}{pb} \equiv_p \binom{a}{b}$:

- $\binom{pa}{pb}$ is the coefficient of $x^{pb}$ in $(x + 1)^{pa}$.
  But modulo $p$, this is the same as the coefficient of $x^{pb}$ in $(x^p + 1^p)^a$ [freshman's dream]. The latter is $\binom{a}{b}$.
  Thus, $\binom{ap}{bp} \equiv_p \binom{a}{b}$ (modulo $p$). (A small numeric check follows.)
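
A quick numeric check of the lemma with Python's `math.comb` (my addition):

```py
from math import comb

p = 5
for a, b in [(3, 1), (4, 2), (7, 3)]:
    # binom(pa, pb) and binom(a, b) agree modulo p
    assert comb(p * a, p * b) % p == comb(a, b) % p
print("lemma holds on these samples")
```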

#### Continuing: Size of $\Omega$ modulo $p$:
- Let us begin by considering $|\Omega|$. This is $\binom{p^n m}{p^n}$ since we pick all subsets of size $p^n$ from the $p^n m$ elements of $G$.
  Iterating the lemma above shows us that $\binom{p^n m}{p^n} \equiv_p m$. Thus, $p$ does not divide $|\Omega|$, since $m$ was the $p$-free part of $|G|$.
- This implies that there is some orbit $O \subseteq \Omega$ (under the left-translation action of $G$ on subsets) whose size is not divisible by $p$
  --- break $\Omega$ into orbits. Since the left hand side $|\Omega|$ is not divisible by $p$,
  there is some term among the orbit sizes that is not divisible by $p$.
- Let the orbit $O$ be generated by a set $S \in \Omega$. So $O = Orb(S)$. Now orbit-stabilizer
  tells us that $|Orb(S)| \cdot |Stab(S)| = |G|$. Since $|O| = |Orb(S)|$ is not divisible by $p$,
  this means that $Stab(S)$ must be of size at least $p^n$. It could also have some divisors of $m$ inside it.
- Next, we will show that $|Stab(S)|$ can be at most $p^n$.

#### Lemma: size of stabilizer of subset when action is free:

- Let a group $G$ act freely on a set $S$. This means that for all group elements $g$, if for any $s$
  we have $g(s) = s$, then we must have $g = id$. In logic, this is: $\forall g, (\exists s, g(s) = s) \implies g = id$.
- See that an implication of this is that for any two elements $s, t \in S$, we can have at most one $g$ such that $g(s) = t$.
  Suppose that we have two elements $g, h$ such that $g(s) = t$ and $h(s) = t$. This means that $(g^{-1}h)(s) = s$. But we know that
  in such a case, $g^{-1}h = id$, ie, $g = h$.
- What does this mean? it means that $Stab(s) = \{ e \}$ for all $s$.
- Now let's upgrade this to subsets of $S$. Let $P$ (for part) be a subset of $S$. What is $|Stab(P)|$? We want to show that it
  is at most $|P|$. Let's pick a basepoint $p_0 \in P$ [thus $p_0 \in S$ since $P \subseteq S$].
- Let's suppose that $g \in Stab(P)$. This means that $g(p_0) \in P$. Say it sends $p_0$ to $p_g \in P$.
  Now no other element of $Stab(P)$ can send $p_0$ to $p_g$ since the action is free!
- Each element of $Stab(P)$ sends $p_0$ to a distinct element of $P$ (by freeness), so there are at most $|P|$ elements in $Stab(P)$.
- Thus, $|Stab(P)| \leq |P|$.

#### Continuing: Showing that $|Stab(S)| = p^n$.

- Since the action of $G$ on $G$ (by left translation) is free, and since we are considering the stabilizer of some subset $S \subseteq G$,
  we must have that $|Stab(S)| \leq |S| = p^n$. Thus, since $|Stab(S)| \geq p^n$ (from the orbit argument above) and $|Stab(S)| \leq p^n$ (from the stabilizer
  argument), we have $|Stab(S)| = p^n$. Thus we are done.

- More explicitly perhaps, let us analyze $|Stab(S)|$. We know that $Stab(S) \cdot S = S$.
  Thus, for any $t \in S$, we know that $Stab(S) \cdot t \subseteq S$.
  Thus, $|Stab(S) \cdot t| \leq |S|$.

- Also notice that $Stab(S) \cdot t$ is a coset of $Stab(S)$. Thus, $|Stab(S) \cdot t| = |Stab(S)|$.


Combining the above, we find that $|Stab(S)| \leq |S|$. So the stabilizer of
size $|S| = p^n$ is in some sense "maximal": it has the largest size a
stabilizer could have!



# Fuzzing book

- Statement coverage is different from branch coverage, since an `if (cond) { s1; } s2` will say that `s1` and `s2` were executed when
  `cond=True`, so we have full statement coverage. On the other hand, this does not guarantee full branch coverage, since we have not
  executed the branch where `cond=False`. We can't tell that we haven't covered this branch since *there is no statement* to record that
  we have taken the `else` branch!

- Branch distances: for conditions `a == b`, `a != b`, `a < b`, `a <= b`, define the "distance true/distance false" to be the amount that must be
  added/subtracted to `a` to make the condition true/false (for a fixed `b`). So, for example, the "distance true" for `a == b` is `abs(b - a)`,
  while its "distance false" is `int(a == b)`: you must move `a` by one exactly when `a` currently equals `b`. (A small sketch follows.)

- **What are we missing in coverage?** The problem here is that coverage is unable
  to evaluate the quality of our assertions. Indeed, coverage does not care
  about assertions at all. However, as we saw above, assertions are an
  extremely important part of test suite effectiveness. Hence, what we need is
  a way to evaluate the quality of assertions.

- **Competent Programmer Hypothesis / Finite Nbhd Hypothesis**: Mutation
  Analysis provides an alternative to a curated set of faults. The key insight
  is that, if one assumes that the programmer understands the program in
  question, the majority of errors made are very likely small transcription
  errors (a small number of tokens). A compiler will likely catch most of these
  errors. Hence, the majority of residual faults in a program is likely to be
  due to small (single token) variations at certain points in the structure of
  the program from the correct program (This particular assumption is called
  the Competent Programmer Hypothesis or the Finite Neighborhood Hypothesis).


- **Equivalent mutants**: However, if the number of mutants are sufficiently
  large (say > 1000), one may choose a smaller number of mutants from the alive
  mutants randomly and manually evaluate them to see whether they represent
  faults. We choose the sample size by
  [sampling theory of binomial distributions](https://www.itl.nist.gov/div898/handbook/prc/section2/prc242.htm).
- **Chao's estimator**: way to estimate the number of true mutants (and hence
  the number of equivalent mutants) is by means of Chao's estimator:

$$
\hat M \equiv
\begin{cases}
M(n) + k_1^2 / (2 k_2) & \text{if } k_2 > 0 \\
M(n) + k_1(k_1 - 1)/2 & \text{otherwise} \\
\end{cases}
$$

- $k_1$ is the number of mutants that were killed exactly once, $k_2$ is the number of mutants that were
  killed exactly twice. $\hat M$ estimates the true number of mutants.
- If $T$ is the total number of mutants generated, then $T - M(n)$ represents **immortal** mutants.
- $\hat M$ is the number of mutants that the test set can detect, given an infinite amount of time. (A small sketch follows.)
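
A tiny Python transcription of the estimator above (the function and argument names are mine):

```py
def chao_estimator(M_n: int, k1: int, k2: int) -> float:
    # M_n: number of distinct mutants killed so far;
    # k1 / k2: mutants killed exactly once / exactly twice.
    if k2 > 0:
        return M_n + k1**2 / (2 * k2)
    return M_n + k1 * (k1 - 1) / 2

print(chao_estimator(M_n=100, k1=10, k2=5))  # 110.0
```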


# Fisher Yates

- We wish to generate a random permutation.
- Assume we can generate a random permutation of $[a, b, c]$.
- How do we extend this to a random permutation of $[w, x, y, z]$?
- Idea: (0) Consider $\{w, x, y, z\}$ in a line. Our random permutation is $[\_, \_, \_, \_]$.
- (1) Decide who goes at the rightmost blank. Suppose $y$. Then our random permutation state is $[\_, \_, \_, y]$.
- (2) What do we have left? We have $\{w, x, z \}$. Use recursion to produce a random permutation of length 3 with $[w, x, z]$.
- (3) Stick the two together to get a full random permutation.
- To save space, we can write this on a "single line", keeping a stick to tell us which part is the "set", and which part is the "array":

```
0. {w,  x,  y, z}[]
1. {w, x, y, [z}] (growing array)
1. {w,  x,  z, [y}] (swapping z<->y)
1. {w,  x,  z}, [y] (shrinking set)
```


Similarly, for the next round, we choose to swap `w` with `z` as follows:

```
1. {w,  x,  z}, [y]
2. {w, x,  [z}, y] (grow array)
2. {x,  z, [w}, y]  (swap w <-> x)
2. {x, z}, [w, y] (shrinking set)
```

For the next round, we swap `z` with `z` (ie, no change!)

```
2. {x,  z}, [w, y]
3. {x, [z}, w, y] (grow array)
3. {x, [z}, w, y] (swap z<->z)
3. {x},[z, w, y] (shrink set)
```

Finally, we swap `x` with `x`:


```
3. {x},  [z, w, y]
4. {[x}, z, w, y] (grow array)
4. {[x}, z, w, y] (swap x<->x)
4. {}[x, z, w, y] (shrink set)
```

- This way, we generate a random permutation _in place_, by treating the left portion of the sequence as a set, and the right portion of the
  sequence as a sorted permutation. At each stage, we grow the array, and choose a random element from the set to "enter" into the array
  at the location of intersection between set and array.

- In code, the index `i` tracks the location of the array border, where we must fix the value of the permutation (ie, the ordering of elements)
  at the `i`th location.
  The index `r` is a random index chosen in `[0, i]`, which selects the element that becomes the value at the `i`th location.

```py
from hypothesis.strategies import composite, integers

@composite
def permutation(draw, n):
    # Fisher-Yates: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
    xs = list(range(n))  # start from the identity permutation

    i = n-1 # from (n-1), down to zero.
    while i >= 0:
        r = draw(integers(0, i)) # r in [0, i]
        xs[i], xs[r] = xs[r], xs[i] # swap: fix the value at index i
        i -= 1
    return xs
```
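
A usage sketch with hypothesis (my addition): the strategy should always produce a rearrangement of `0..n-1`.

```py
from hypothesis import given

@given(permutation(5))
def test_is_permutation(xs):
    # every draw is some ordering of the numbers 0..4
    assert sorted(xs) == list(range(5))

test_is_permutation()
```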


# Buchberger algorithm

- multidegree: term of maximum degree, where maximum is defined via lex ordering.
- Alternatively, multidegree is the degree of the leading term.
- If `multideg(f) = a` and `multideg(g) = b`, define `c[i] = max(a[i], b[i])`. Then $\vec x^c$ is the LCM of the leading monomial
  of $f$ and the leading monomial of $g$.
-  The S-polynomial of $f$ and $g$ is the combination $(\vec x^c/LT(f)) f - (\vec x^c/LT(g)) g$.
- The S-polynomial is designed to create cancellations of leading terms.


#### Buchberger's criterion

- Let $I$ be an ideal. Then a basis $\langle g_1, \dots, g_N \rangle$ is a Groebner basis iff for all pairs $i \neq j$, the remainder of $S(g_i, g_j)$ on division by $(g_1, \dots, g_N)$ is $0$.
- Recall that a basis is a Groebner basis iff $LT(I) = \langle LT(g_1), \dots, LT(g_N) \rangle$. That is, the ideal of leading terms of $I$
  is generated by the leading terms of the generators.

- For a basis $F$, we should consider $r(i, j) \equiv rem_F(S(f_i, f_j))$. If $r(i, j) \neq 0$, then make $F' \equiv F \cup \{ r(i, j) \}$.
- Repeat until all the remainders $r(i, j)$ are zero; the resulting basis is a Groebner basis. (A small sympy sketch follows.)
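
A minimal sympy sketch (my addition; sympy's built-in `groebner` runs a Buchberger-style algorithm, so this only shows the interface rather than the hand-rolled loop above):

```py
from sympy import groebner, symbols

x, y = symbols('x y')

# Ideal generated by the circle and the line y = x, in lex order with x > y.
G = groebner([x**2 + y**2 - 1, x - y], x, y, order='lex')
print(G)  # a Groebner basis: one generator eliminates x, the other is univariate in y
```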




# GAP permutation syntax
- The action of permutation on an element is given by $i^p$. This is the "exponential notation" for group actions.
- See that we only ever write permutations multiplicatively, eg `(1) (23)` is
  the composition of permutations [written multiplicatively].
- Thus the identity permutation must be `1`, and it's true that any number `n^1 = n`,
  so the identity permutation `1` fixes everything.

# Why division algorithm with multiple variables go bad

- In `C[x, y]`, defining division is complicated, and needs Groebner bases to work.
- It's because they don't obey the Bezout property: just because `gcd(a, b) = g` does not mean that there exist `k, l` such that `ak + bl = g`.
- For example, in `C[x, y]`, we have `gcd(x, y) = 1` but we don't have polynomials `k, l` such that `kx + ly = 1`.
- Proof: suppose for contradiction that there do exist `k, l` such that `kx + ly = 1`. Modulo `x`, this means that `ly = 1` which is absurd,
  and similarly modulo `y` it means `kx = 1` which is also absurd. (A small sympy check is given below.)
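
One way to see this computationally (my addition, using sympy): the reduced Groebner basis of the ideal $\langle x, y \rangle$ is $\{x, y\}$, not $\{1\}$, so no combination $kx + ly$ can equal $1$:

```py
from sympy import groebner, symbols

x, y = symbols('x y')

# If 1 were in the ideal <x, y> (ie, if kx + ly = 1 were solvable),
# the reduced Groebner basis would be [1]. It is [x, y] instead.
print(groebner([x, y], x, y, order='lex'))
```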

# Integral elements of a ring form a ring [TODO]

- An integral element of a field $L$ (imagine $\mathbb C$)
  relative to an integral domain $A$ (imagine $\mathbb Z$) is a root of a monic polynomial in $A[x]$.

- So for example, in the case of $\mathbb C$ over $\mathbb Z$, the element $i$ is integral as it is a root of $p(x) = x^2 + 1$.
- On the other hand, the element $1/2$ is not integral. Intuitively, if we had a polynomial of which it is a root,
  such a polynomial would be divisible by $2x - 1$ (which is the minimal polynomial for $1/2$). But $2x - 1$ is not monic.
- Key idea: take two elements $a, b$ which are roots of monic polynomials $p(x), q(x) \in A[x]$.
- Create the polynomial $c(x)$ (for construction) given by $c(x) \equiv p(x)q(x) \in A[x]$. See that $c(x)$ is monic, has both $a$ and $b$
  as roots, and lies in $A[x]$.

# "Cheap" proof of euler characteristic

- If we punch a hole in a sphere, we create an edge with no vertex or face. This causes $V - E + F$ to go down by 1.
- If we punch two holes, that causes $V - E + F$ to go down by two. But we can glue the two edges together.
  This gluing gives us a handle, so each handle/genus reduces the euler characteristic by two!

# Seifert Algorithm [TODO]

- Algorithm to find surface that a knot bounds.
- If we find a surface, then the genus of the boundary is one minus the genus of the surface.
- Compute genus via classification of surfaces.

# Cap product [TODO]
- https://www.youtube.com/watch?v=oxthuLI8PQk

- We need an ordered simplex, so there is a total ordering on the vertices. This is to split a chain apart at number $k$.
- Takes an $i$-cochain $\xi$ and a $k$-chain $\gamma$ and spits out a $(k - i)$-chain given by $\xi \frown \gamma \equiv \sum_a \gamma_a \xi (a_{\leq i}) a_{\geq i}$.
- The action of the boundary on a cap product is $\partial (\xi \frown \gamma) \equiv (-1)^i [(\xi \frown \partial \gamma) - (\partial \xi \frown \gamma)]$.
- Consequence: cocycle cap cycle is a cycle.
- coboundary cap cycle is a boundary.
- cocycle cap boundary is a boundary.
- Cap product will be zero if the chain misses the cochain.
- Cap product will be nonzero if the chain *must* always intersect the cochain.
- This is why it's also called the intersection product, since it somehow counts intersections.

# Cup product [TODO]

- We need an ordered simplex, so there is a total ordering on the vertices. This is to split a chain apart at number $k$.
- Can always multiply functions together. This takes a $k$-cochain $\xi$ and an $l$-cochain $\eta$ and produces $\xi \cup \eta$, which is a $(k + l)$-
  cochain. The action on a $(k+l)$-chain $\gamma$ is $(\xi \cup \eta)(\gamma) \equiv \xi (\gamma_{\leq k}) \cdot \eta (\gamma_{> k})$.
- No way this can work for chains, can only ever work for cochains.
- This cup product "works well" with coboundary. We have $\partial (\xi \cup \eta) \equiv (\partial \xi \cup \eta) + (-1)^k (\xi \cup \partial \eta)$.
- We get cocycle cup cocyle is cocycle.
- Similarly, coboundary cup cocycle is a coboundary.
- Similarly, cocycle cup coboundary is a coboundary.
- The three above propositions imply that the cup product descends to cohomology groups.
- The _algebra_ of cohomology (cohomology plus the cup product) sees the difference between spaces of identical homology!
- The space $S^1 \times S^1$ (the torus) has the same homology as $S^2 \vee S^1 \vee S^1$ (the wedge). Both have equal homology/cohomology.
- However, the cup product of the two degree-one generators is non-zero on the torus and zero on the wedge.
- The cup product measures how the two generators are locally product like. So if we pick two generators on the torus, we can find a triangle
  which gives non-zero


# Colimits examples with small diagram categories

- Given a colimit, compute the value as taking the union of all objects, and imposing the relation $x \sim f(x)$
  for all arrows $f \in Hom(X, Y)$ and all $x \in X$.

- A colimit of the form $A \xrightarrow{f} B$ is computed by taking $A \sqcup B$ and then imposing the relation $a \sim f(a)$; the result is just $B$. This is entirely useless.
- A colimit of the form $A \xrightarrow{f, g} B$ is computed by taking $A \sqcup B$ and then imposing the relation $a \sim f(a)$ as well as $a \sim g(a)$.
  Thus, this effectively imposes $f(a) \sim g(a)$. If we choose $f = id$, then we get $a \sim g(a)$. So we can create quotients by taking the colimit
  of an arrow with the identity.
- A colimit of the form $A \xleftarrow{f} B \xrightarrow{g} C$ will construct $A \sqcup B \sqcup C$ and impose the relations $b \sim f(b) \in A$ and $b \sim g(b) \in C$.
  Thus, we take $A, B, C$ and we glue $A$ and $C$ along $B$ via $f, g$. Imagine gluing the upper and lower hemispheres of a sphere by a great circle.

# Limits examples with small diagram categories

- Given a limit, compute the value as taking the product of all objects, and keeping only those tuples $(x_A, x_B, \dots)$ which obey
  the relation $f(x_X) = x_Y$ for all arrows $f \in Hom(X, Y)$.

# Classification of compact 2-manifolds [TODO]

- Oriented compact 2-surfaces: sphere, torus, 2-holed torus, etc.
- They have euler characteristic $V - E + F$ equal to $2 - 2g$.
- Strategy: cut surface into polygonal pieces. Use oriented edges to know cutting. Lay them down on the surface such that the "top part" or
  "painted surface" will be up [so we retain orientation].
- Attach all the polygons into one big polygon on the plane.
- For each edge on the boundary of the big polygon, it must attach to some other boundary edge of the big polygon [since the manifold is compact].
  Furthermore, this edge must occur in the *opposite direction* to make the surface orientable. Otherwise we could pass through the side
  and flip orientation. Consider:

```
>>>>
|  |
>>>>
```

- When I appear from the "other side", my direction will have flipped. [TODO]

- So far, we know the edges. What about identifying vertices?

- Next, we need to group vertices together on the big polygon. We can find this by going *around the edges incident at the vertex*
  on the *manifold surface*.


- The next step is to reduce the number of vertices to exactly one. We can cut the current polygon and re-paste it as long as we preserve
  all cutting/pasting relations.

- Suppose I glue all the B vertices to a single vertex. Then, the edges emanating from this B vertex _must necessarily be the same_.
  If not, then the edge emanating would need a complementary edge somewhere else, which would give me another "copy" of the B vertex.

- I can imagine such a B vertex as being "pushed inside the polygon" and then "glued over itself", thereby making it part of the *interior*
  of the polygon.

- We can repeat this till there is only one type of vertex (possibly multiple copies).
- If we only had two adjacent edges [edges incident against the same vertices], then we are done, since we get a sphere.
- We can always remove adjacent pairs of edges. What about non-adjacent pairs?
- Take a non adjacent pair. Think of these as "left" and "right". We claim that for each edge at the "top", there is a corresponding
  edge at the "bottom". So we have left and right identified, and top identified with a contiguous segment in the bottom. If there wasn't,
  then we would need another vertex!
- This lets me create a commutator on the boundary, of the form
  $cdc^{-1}d^{-1}x$. Topologically, this is a handle, since if it were "full"
  [without the extra $x$], then we would have a torus. Since we do have the
  $x$, we have a "hole on the torus" which is a handle.
- We keep removing handles till we are done.

#### Why does euler characteristic become $2-2g$?
- If we add a vertex on an edge, we add a vertex and subtract the (new) edge we have created. Thus $\xi$ is unchanged on adding a vertex on an edge.
- Joining two vertices on a face also does not change $\xi$, since we add an edge and a face.
- Given any two subdivisions, we find a common finer subdivision by these steps. Since the steps we use retain the euler characteristic,
  finally my original subdiv = common subdiv = friend's subdiv.
- Key idea: at each crossing between our subdivision and the other subdivision, make a new vertex. Then "trace over" the
  other subdivision to make our subdivision agree with the other subdivision on the inside.

https://www.youtube.com/watch?v=dUOmU-0t2Nc&list=PLIljB45xT85DWUiFYYGqJVtfnkUFWkKtP&index=27

# Gauss, normals, fundamental forms [TODO]

- Consider a parametrization $r: (u, v) \to \mathbb R^3$.
- At a point $p = r(u, v)$ on the surface, the tangent vectors are $r_u \equiv \partial_u r$ and similarly $r_v \equiv \partial_v r$.
- Let $k = xr_u + y r_v$. Then $k \cdot k$ is the **first fundamental form**. Computed as
  $k \cdot k = (xr_u + y r_v) \cdot (x r_u + y r_v)$. Write this as $E x^2 + 2F x y + G y^2$.  These numbers depend on the point $(u, v)$,
  or equally, depend on the point $p = r(u, v)$.
- Further, we also have a normal vector to the tangent plane. $N(p)$ is the unit normal pointing outwards. We can describe it in terms
  of a parametrization as $n \equiv r_u \times r_v / ||r_u \times r_v||$.
- Gauss map / Gauss Rodrigues map ($N$): map from the surface to $S^2$. $N$ sends a point $p$ to the unit normal at $p$.
- The tangent plane to $N(p)$ on the sphere is parallel to the tangent plane on the surface at $p$, since the normals are the same,
  as that is the action of $N$ which sends the normal at the surface $p \in S$ to a point of the sphere / normal to the sphere.
- Thus, the derivative intuitively "preserves" tangent planes! [as normal directions are determined].
- If we now think of $dN$, it's a map from $T_p S$ to $T_{N(p)} S^2 = T_p S$. Thus it is a map from the tangent space to _itself_.
- In terms of this, Gauss realized that the gaussian curvature $K = k_1 k_2$ is the determinant of the map $dN_p$ [ie, the jacobian].
  Curvature is the distortion of areas by the normal. So we can think of it as the ratio of areas `area of image/area of preimage`.

https://www.youtube.com/watch?v=drOldszOT7I&list=PLIljB45xT85DWUiFYYGqJVtfnkUFW

Rust
4
star
38

lean4-entemology

Where we collect lean4 bugs
Lean
4
star
39

minos

There are many OSes, this one is mine
Makefile
4
star
40

mlir-hoopl-rete

rewrites for MLIR with hoopl / rete
MLIR
4
star
41

smol

smol IDE for a smol language that permits insane static analysis because smol
C
4
star
42

shakuni

An exploration of minimality and parallelism in probabilstic programming languages.
Jupyter Notebook
4
star
43

pico-mlir

A mini language written using MLIR + MAKEFILES! so you get to see all the commands, no CMake magic.
C++
4
star
44

polybench-c

PolyBench/C from http://web.cse.ohio-state.edu/~pouchet/software/polybench/
C
3
star
45

pisigma

A reference copy of PiSigma: dependent types with without the sugar
Haskell
3
star
46

lean.egraphs

Egraphs & ematching in Lean
C++
3
star
47

biter

library / CLI as a swiss-army knife for low level bit fiddling debugging.
Haskell
3
star
48

fbip-demos

Demos to test out Lean's functional but in place.
Lean
3
star
49

hugs

A copy of the hugs haskell98 implementation; hoping to eliminate bitrot
Haskell
3
star
50

sdl2.lean

bindings to SDL2 (Simple DirectMedia library) in Lean
C
3
star
51

dotfiles

my dotfiles for easy access
Vim Script
3
star
52

master-thesis

My master's thesis on NLP and representation learning
TeX
3
star
53

SCEV-coq

LLVM's loop analysis theory (Scalar Evolution) formalized in Coq
Makefile
3
star
54

warren

The warren abstract machine for Pascal, in Hakell
Haskell
3
star
55

functionalconf-2019-slides-probabilistic-programming

Slides for my talk at functional conf 2019 on probabilistic programming
Haskell
3
star
56

polly

A personal fork of the Polly-LLVM project
C
2
star
57

ppcg

A fork of the original PPCG with debug code: http://repo.or.cz/w/ppcg.git
C
2
star
58

mips-bsv

an implementation of a MIPS processor in BlueSpec System Verilog
Bluespec
2
star
59

haskell-tutorial

Files for a haskell tutorial I'm teaching at IIIIT-Hyderabad
Haskell
2
star
60

freejit

Try to JIT Free monads in Haskell.
Haskell
2
star
61

slides-haskell-exchange-2020-smallpt

Slides for haskell exchange 2020 talk on smallpt
TeX
2
star
62

llvm

A fork of the LLVM project for personal use
LLVM
2
star
63

dataflow

A view of dataflow architectures, with a modern haskell perspective
Haskell
2
star
64

paper-deltas

Deltas: An algebraic theory of diffs in haskell
TeX
2
star
65

hask-lisp-interp

Lisp interpreter in Haskell
Haskell
2
star
66

alok-bollu

A repo for work between Alok Debnath and Siddharth Bhat
C
2
star
67

sicm

structure and interpretation of classical mechanics
Scheme
2
star
68

CASette

Mixtape of computer algebra system (CAS) algorithms
Lean
2
star
69

captainslog

Documenting the PhD slog, one day at a time
2
star
70

amalgam

amalgam ~ composite | A small library for interactive symbolic number theory explorations in haskell
Haskell
2
star
71

gde-game

Game on using text generation to trigger empathy
Python
2
star
72

unification

polymorphic type inference with unification from the Dragon book
C++
2
star
73

warren-cpp

An implementation of warren, the abstract machine for Prolog. Is a transcription of the lecture notes "warren's abstract machine a tutorial reconstruction"
C++
2
star
74

proGenY

procedurally generated 2d shooter
C++
2
star
75

lent-2024-logic-and-proof

Lean notes for "Logic & Proof" : Cambridge Tripos Part 1B, Lent 2024
Lean
1
star
76

clisparkline

Tiny haskell library to prettyprint sparklines onto the CLI!
Haskell
1
star
77

polybench-hs

Polybench HS
C
1
star
78

tabledtypeclass

tabled typeclass resolution implementation
C++
1
star
79

functional-fluids-on-surfaces

implementation of the paper "functional fluids on surfaces"
Python
1
star
80

sunnyside

Equality saturation for fun and profit
Rust
1
star
81

optics

optics and refraction simulation in C++
C
1
star
82

gutenberger

fast vectorized presburger automata
Haskell
1
star
83

hs-stockfighter

Haskell bindings to Stockfighter using Servant
Haskell
1
star
84

decompile-transformer

The one where bollu decompiles attention models
Jupyter Notebook
1
star
85

musquared

Demand-agnostic managed language compiler using MLIR
Haskell
1
star
86

smallpths

Smallpt rewrite that's fast!
Haskell
1
star
87

tinyfort

Minimal fortran-ish language with LLVM backend, written for a compilers course
C++
1
star
88

languagemodels

Me messing around with language models, trying to make NLP run on commodity hardware with weird ideas.
C++
1
star
89

haikus

detect haikus
Python
1
star
90

lean-koans

A dumping ground for short Lean programs that demonstrate a point: a kลan
Lean
1
star
91

geometric-algebra

Implementation of geometric algebra primitives
Haskell
1
star
92

propogators-coq

A formalisation of propogators as ekmett speaks about them on the livestream: https://www.twitch.tv/ekmett
Coq
1
star
93

pegasos-svm

An implmentation of the pegasos SVM learning algorithm
Python
1
star
94

ppsspp-help

Help for ppsspp
CSS
1
star
95

competitive

Competitive coding solutions
C++
1
star
96

prettyprinter-core

quchen's prettyprinter library, stolen and stripped of other code for GHC.
Haskell
1
star
97

ghc-asterius

For of terrorjack/GHC to hack on austerius
Haskell
1
star
98

lispInterpreter

A lisp interpreter in C++ for fun :)
C++
1
star
99

FPGA-playground

Code written using BlueSpec Verilog, general FPGA messing around for my course
Bluespec
1
star
100

absint

abstract interpreters for a tiny SSA language in haskell
Haskell
1
star