A fact that was initially surprising to me is that stochastic differential equations can actually be useful in solving discrete problems. The Lovett–Meka proof of Joel Spencer’s classic discrepancy theorem is an elegant showcase of this idea. Today we will discuss Spencer’s theorem and a proof based on Brownian motion!
Spencer's theorem.
Let \(A\) be an \(n \times n\) matrix with entries in \([-1,1]\).
Then there is a vector \(x \in \{\pm 1\}^n\) such that
\[ \|Ax\|_\infty \leq O(\sqrt n). \]
Remark. If one chooses \(x\) uniformly at random from \(\{\pm 1\}^n\), then one may see via concentration with union bound that \(\|Ax\|_{\infty} \le O(\sqrt{n\log n})\) with high probability.
Spencer proved the above statement with a constant of $6$, but had a nonconstructive proof. Nikhil Bansal later gave the first polynomial-time algorithm matching Spencer’s bound up to constants. There have been many proofs and algorithms since then.
The algorithm
Consider the polytope $\mathcal{P}$ described by $\{x:x\in[-1,1]^n,\|Ax\|_{\infty}\le C\sqrt{n}\}$ where $C$ is some constant we choose later. We may also describe $\mathcal{P}$ via the constraints $-1\le x_i\le 1$ and $|\langle a_i, x\rangle| \le C\sqrt{n}$ where $a_i$ refers to the $i$-th row of $A$. We will refer to the form type of constraints as coordinate constraints and the latter as discrepancy constraints. We will also refer to the bound enforced on $|\langle a_i, x\rangle|$ as the tolerance. Start at $ X_0 = 0 \in \mathbb R^n, $ and evolve $X_t$ according to a Brownian motion until it hits a face of the polytope $\mathcal{P}$. Once it hits a face, continue running Brownian motion restricted to the face until it hits a lower dimensional face. Then restrict to the lower dimensional face, and so on.
Concretely, the local rule to evolve $X_t$ is:
$$ \mathrm{d} X_t = \Pi(X_t)\mathrm{d}B_t, $$where $\Pi(X_t)$ refers to the projection onto the space orthogonal to all the coordinate and discrepancy constraints that are tight.
If all coordinates of this walk hit $\pm 1$, then we are done. However, in the presence of discrepancy constraints, it is a bit unclear how to argue that one actually reaches a hypercube vector upon running the process for sufficiently long. (I am actually not even sure that is true.)
That makes the algorithm a tad bit more complicated. Instead, we run the above algorithm until $n/2$ constraints are saturated, then enlarge the polytope by relaxing the discrepancy constraints a bit, resume the random walk, and repeat.
Algorithm (Phased Discrepancy Walk).
- Start with $X_0^{(0)} = 0 \in \mathbb{R}^n$, a phase counter $r$ initialized at $0$, and let $C > 1$ and $\lambda < 1$ be "hyperparameters" we choose later.
- Let $n_r$ be the number of unsaturated coordinate constraints of $X_t$. Run sticky Brownian motion initialized at $X_r^{(0)}$ where phase $r-1$ ended until an additional $n_r/2$ constraints of $\mathcal{P}$ are saturated. If $n_r < \sqrt{n}$, saturate the remaining coordinate constraints arbitrarily.
- If number of new saturated coordinate constraints is less than $0.1n_r$, reset to $X_{r}^{(0)}$ and go back to previous step. Else, increment the tolerance by $\lambda^r C\sqrt{n}$, increment $r$ by $1$, and perform previous step.
We will show that in each phase, most of the saturated constraints are indeed coordinate constraints with large constant probability.
Concretely, we want to prove:
Phase lemma.
There is a universal constant $\alpha>0$ such that in each phase, with constant probability, at least an $\alpha$ fraction of the active coordinate constraints become saturated.
A consequence of the above is that after $O(\log n)$ phases, all coordinates are frozen. At the end, $X_t \in \{\pm 1\}^n$, and the total discrepancy is $\sum_{r\ge 0}\lambda^r C\sqrt{n} = O(\sqrt{n})$.
Proof of Phase lemma
Fix some phase $r$, and let $k\coloneqq n_r$. At the start of a phase, there are no discrepancy constraints saturated as they were just relaxed. Our proof proceeds by comparing to another process $(Y_t)_{t\ge 0}$ that is easier to analyze, that agrees with $X_t$ up to the stopping time. $Y_0$ is initialized at $X_0^{(r)}$. The evolution of $Y_t$ is described by:
$$ \mathrm{d}Y_t = \widetilde{\Pi}_t\mathrm{d}B_t, $$where $\widetilde{\Pi}_t = \Pi(X_t)$ until the time $t_*$ when $n_r/2$ constraints are satisfied, remains equal to $\widetilde{\Pi}_{t_*}$ for the rest of time (and additionally is $0$ once $\|Y_t\| = 2\sqrt{n}$: this last part is really necessarily only to maintain a bound on the fluctuation of $\|Y_t\|_2^2$, but whatever). In particular, after $X_t$ has saturated $n_r/2$ constraints, the process $Y_t$ keeps going, but it only respects those $n_r/2$ constraints that were already saturated, and possibly violates other constraints.
So we prove two things.
- The process hits $n_r/2$ constraints before time $10$ with high probability.
- The comparison process $Y_{10}$ violates only a tiny fraction of the discrepancy constraints in expectation.
Then by Markov’s inequality, with reasonably large constant probability, $Y_{10}$ violates only a small number of discrepancy constraints. On that event, $X_t$ also saturated only a small number of discrepancy constraints before stopping. Since $X_t$ saturated $n_r/2$ constraints total, most of them must have been coordinate constraints.
Concretely, let’s first prove that by time $10$, the phase should have ended. We have: $ \mathrm{d}\|Y_t\|_2^2 = 2\langle Y_t,\Pi(Y_t)\mathrm{d}B_t\rangle + \operatorname{tr}(\widetilde{\Pi}_t)\,\mathrm{d}t. $ By design, we have $ \operatorname{tr}(\widetilde{\Pi}_t) \geq \frac{n_r}{2}. $ By martingale concentration, with high probability $\|Y_t\|_2^2 > \|Y_0\|_2^2 + n_r = n$, in which case $Y_t$ is outside $[-1,1]^n$, which in particular implies that it should have decoupled from $X_t$ earlier, and the stopping condition for the phase should have been reached.
To prove that the fraction of violated discrepancy constraints is small, consider the comparison process $Y_t$ up to time $10$. The number of saturated/violated constraints for $Y_10$ is at least the number of saturated constraints for $X_{t_*} = Y_{t_*}$.
Fix a row $a_i$ of the matrix $A$. By martingale concentration the random variable $ \langle a_i,Y_{10}-Y_0\rangle $ is subgaussian with parameter $\sqrt{n_r}$. Indeed, we have $\langle a_i, \mathrm{d}Y_t\rangle = \langle \widetilde{\Pi}_t a_i, \mathrm{d}B_t\rangle$, and $\widetilde{\Pi}_t$ resides only on the unfrozen coordinates, and so $\| \widetilde{\Pi}_t a_i\| \le \sqrt{n_r}$.
Therefore, the probability that $Y_10$ saturates/violates the constraint given by $a_i$ is: $ \Pr\left[|\langle a_i,Y_{10} - Y_0 \rangle| \geq C \lambda^r \sqrt n\right] \leq \exp(-\Omega(C^2 \lambda^{2r} n/n_r)) \le \exp(-\Omega(C^2 \lambda^{2r}/0.9^r )). $ Thus, by choosing $\lambda < \sqrt{0.9}$, this probability is at most $\exp(-\Omega(C^2))\cdot O(0.9^r)$. Summing over all $n$ discrepancy constraints, the expected number of violated discrepancy constraints at time $10$ is at most $\exp(-\Omega(C^2))n_r$. By Markov’s inequality, with constant probability, the number of discrepancy constraints violated by $Y_{10}$ is at most, say, $n_r/100$.
But every discrepancy constraint that $X_t$ saturated before the stopping time is also violated or saturated by $Y_{10}$. Therefore, on the event $t_*\le 10$, the true process $X_t$ saturated at most $n_r/100$ discrepancy constraints before stopping, which in particular implies the statement we want to prove.