Bite-Sized Mathematics for Computer Science

Notes on MIT's 6.042J

Eda Eren
Table of Contents

Introduction

Prerequisites & Resources

Unit 1: Proofs

00. Introduction to Proofs

01. (Two) Proof Methods

02. The Well Ordering Principle

03. Logic & Propositions

04. Quantifiers and Predicate Logic

05. Sets

06. Binary Relations

07. Induction

08. State Machines — Invariants

09. Recursive Definitions

10. Infinite Sets

Unit 2: Structures

11. GCDs

12. Congruences

13. Euler's Theorem

14. RSA Encryption

15. Digraphs: Walks and Paths

16. Directed Acyclic Graphs

17. Partial Orders and Equivalence

18. Degrees & Isomorphism

19. Coloring & Connectivity

20. Trees

21. Stable Matching

Unit 3: Counting

22. Sums & Products

23. Asymptotics

24. Counting with Bijections

25. Repetitions & Binomial Theorem

26. Pigeonhole Principle, Inclusion-Exclusion

Unit 4: Probability

27. Intro to Discrete Probability

28. Conditional Probability

29. Independence & Causality

30. Random Variables, Density Functions

31. Expectation

32. Deviation: Markov & Chebyshev Bounds

33. Sampling & Confidence

34. Random Walks & PageRank

Deviation: Markov & Chebyshev Bounds

The main focus of this section is bounding the probability that a random variable deviates from its expected value by a given amount.

The mean, or expectation, is not always enough on its own to predict a random variable's behavior.

The probability that a nonnegative random variable takes a value much larger than its mean can be bounded with Markov's theorem.

If $R$ is nonnegative, then for all $x > 0$:

$$\text{Pr}[R \geq x] \leq \frac{E[R]}{x}$$
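
As a quick sanity check (not from the lectures), here is a minimal Python sketch comparing Markov's bound with the empirical tail probability of a fair six-sided die; `markov_bound` is just a helper name used for this example.

```python
import random

def markov_bound(mean, x):
    """Markov's theorem: for nonnegative R and x > 0, Pr[R >= x] <= E[R] / x."""
    return mean / x

# Empirical check on a simple nonnegative random variable:
# R = roll of a fair six-sided die, so E[R] = 3.5.
random.seed(0)
trials = 100_000
x = 5
hits = sum(1 for _ in range(trials) if random.randint(1, 6) >= x)

print("empirical Pr[R >= 5]:", hits / trials)         # about 2/6, around 0.33
print("Markov bound E[R]/5: ", markov_bound(3.5, x))  # 0.7
```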

The example given in the lectures is the disputable IQ score. The average IQ is said to be 100, which means that at most $\frac{1}{3}$ of the population can have an IQ of 300 or more, since $\frac{1}{3} \cdot 300 = 100$. If more than $\frac{1}{3}$ had an IQ of at least 300, the average would have to be more than 100.

So, the probability of having an IQ of at least 300 is less than or equal to the expectation divided by 300:
$\text{Pr}[\text{IQ} \geq 300] \leq \frac{E[\text{IQ}]}{300}$.

The average (the expectation) is 100, so:

$$\text{Pr}[\text{IQ} \geq 300] \leq \frac{100}{300} = \frac{1}{3}$$

In general, the probability that $R$ is at least $c$ times its expectation is at most $\frac{1}{c}$; as the factor grows, the bound shrinks proportionally.

If we are given that an IQ score is always at least 50, we can get a better bound by adjusting the formula a little, namely, shifting $R$ so that its minimum is $0$ (that is, applying Markov's theorem to $R - 50$).

So, in that case, it would look like:

$$\text{Pr}[\text{IQ} - 50 \geq 300 - 50] \leq \frac{100 - 50}{300 - 50} = \frac{1}{5}$$
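
Here is a small sketch of that shifted bound (the helper `shifted_markov_bound` is a name made up for this note); it assumes only that $R$ never falls below the given lower limit.

```python
def shifted_markov_bound(mean, x, lower):
    """Assuming R >= lower always, apply Markov's theorem to R - lower:
    Pr[R >= x] = Pr[R - lower >= x - lower] <= (E[R] - lower) / (x - lower)."""
    return (mean - lower) / (x - lower)

# IQ example: E[IQ] = 100, threshold 300, and IQ is assumed to never drop below 50.
print(shifted_markov_bound(100, 300, 0))    # 1/3, about 0.333 (plain Markov, shift of 0)
print(shifted_markov_bound(100, 300, 50))   # 1/5 = 0.2 (the improved bound)
```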


Another example is from Rice University's lecture notes:

Assume the expected time it takes Algorithm A to traverse a graph with $n$ nodes is $2n$. What is the probability that the algorithm takes at least 10 times that long?

Since we're looking for the outcome that is at least 10 times the expectation:

$$\text{Pr}[R \geq 10 \cdot E[R]] \leq \frac{E[R]}{10 \cdot E[R]} = \frac{1}{10}$$

So, the probability is not more than $0.10$.
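
A quick check of that arithmetic in Python; the value of `n` is an arbitrary choice, and the bound comes out to $\frac{1}{10}$ for any $n$.

```python
def markov_bound(mean, x):
    # Pr[R >= x] <= E[R] / x for nonnegative R and x > 0
    return mean / x

n = 1000                 # arbitrary number of nodes, just for illustration
expected_time = 2 * n    # E[R] = 2n
print(markov_bound(expected_time, 10 * expected_time))  # 0.1, independent of n
```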


Another example from the practice exercises:

Given a random variable $R$ with $E[R] = 50$, give the smallest value $x$ such that the Markov bound guarantees $\text{Pr}[R \geq x] \leq 0.5$ if $R$ is nonnegative.

Well, in this case, we have $\text{Pr}[R \geq x] \leq \frac{50}{x} = 0.5$. Then, $x$ is $100$ ($\frac{50}{0.5} = 100$).
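
The same rearrangement, as a trivial check in Python (variable names are made up):

```python
# Rearranging E[R] / x = 0.5 for x gives the smallest x Markov can guarantee.
E_R = 50
p = 0.5
x = E_R / p
print(x)  # 100.0
```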


The Chebyshev bound says that the probability that the distance between $R$ and its mean is greater than or equal to $x$ is at most the variance of $R$ divided by $x^2$.

The term variance stands out; it is the expectation of the square of the deviation of $R$ from its mean, that is, of $R - E[R]$:

$$E\Big[\big(R - E[R]\big)^2\Big]$$

Let's call it $\text{Var}[R]$.
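
As a concrete illustration (not from the lectures), here is the variance of a fair six-sided die computed straight from this definition:

```python
from fractions import Fraction

# Var[R] = E[(R - E[R])^2] for R = roll of a fair six-sided die.
outcomes = [Fraction(k) for k in range(1, 7)]
mean = sum(outcomes) / len(outcomes)                              # E[R] = 7/2
variance = sum((r - mean) ** 2 for r in outcomes) / len(outcomes)
print(mean, variance)  # 7/2 35/12
```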

So, the Chebyshev bound is, for all $x > 0$:

$$\text{Pr}\big[|R - E[R]| \geq x\big] \leq \frac{\text{Var}[R]}{x^2}$$

The square root of the variance is the standard deviation, denoted by $\sigma$:

$$\sigma_R = \sqrt{\text{Var}[R]}$$

Now, let $\mu$ denote the expectation of $R$, $E[R]$. The Chebyshev bound can then be restated as:

$$\text{Pr}\big[|R - \mu| \geq x\big] \leq \frac{\sigma_R^2}{x^2}$$
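
To put the pieces together, here is a minimal sketch (my own, not from the course) comparing the Chebyshev bound with the empirical deviation probability of a fair six-sided die; `chebyshev_bound` is just a helper name used here.

```python
import random

def chebyshev_bound(variance, x):
    """Chebyshev: Pr[|R - E[R]| >= x] <= Var[R] / x^2 for x > 0."""
    return variance / x ** 2

# R = roll of a fair six-sided die: E[R] = 3.5, Var[R] = 35/12, sigma ~ 1.71.
mu, var = 3.5, 35 / 12

random.seed(0)
trials = 100_000
x = 2.5
hits = sum(1 for _ in range(trials) if abs(random.randint(1, 6) - mu) >= x)

print("empirical Pr[|R - mu| >= 2.5]:", hits / trials)            # 2/6, about 0.33
print("Chebyshev bound Var[R]/x^2:   ", chebyshev_bound(var, x))  # about 0.467
```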