Expectation

The expected value of a random variable $R$ is the average value of $R$, where the values are weighted by their probabilities.

So, the expectation of $R$ is the sum over all possible values $v$ of $v$ times the probability that $R = v$:

$$E[R] = \displaystyle\sum_v v \cdot \text{Pr}[R = v]$$

It sounds a bit complicated, but let's look at an example that's given in the lecture.

Let's say that we pick a number beforehand, and we roll a die three times. If our number doesn't come up at all, we lose 1 dollar. If it comes up exactly once, we win 1 dollar. If it comes up twice, we win 2 dollars, and if all three rolls end up with our number, we win 3 dollars.

Looking at the probability of each of these four cases, the probability of not getting the number we chose is $\Big(\frac{5}{6}\Big)^3$, which is $\frac{125}{216}$.

The probability of getting our number exactly once is $\binom{3}{1} \big(\frac{5}{6}\big)^2 \big(\frac{1}{6}\big)$.
(Note that the order doesn't matter; we're counting all possible sequences. That's why it is 3 choose 1: one of the three rolls shows our number, and the other two show something else.)

The probability of getting our number twice is similar: $\binom{3}{2} \big(\frac{5}{6}\big) \big(\frac{1}{6}\big)^2$.

Finally, the probability of getting our number all three times is $\big(\frac{1}{6}\big)^3$.

Let's see what we have so far:

| number of matches | probability | dollars won |
| --- | --- | --- |
| 0 | $\frac{125}{216}$ | $-1$ |
| 1 | $\frac{75}{216}$ | $1$ |
| 2 | $\frac{15}{216}$ | $2$ |
| 3 | $\frac{1}{216}$ | $3$ |

So, if we play 216 games, we expect our number not to come up at all in 125 of them, to come up once in 75 of them, twice in 15 of them, and all three times in only one of them.

So, on average, what we gain is:

$$\frac{125(-1) + 75(1) + 15(2) + 1(3)}{216} = -\frac{17}{216} \text{ dollars} \approx -8 \text{ cents}$$

That's negative! This is not a fair game: on average, we expect to lose about 8 cents each time we play.
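
To double-check that arithmetic, here is a minimal Python sketch (not from the lecture; the `pdf` dictionary just encodes the table above):

```python
from fractions import Fraction

# Dollars won and their probabilities, straight from the table above.
pdf = {
    -1: Fraction(125, 216),
     1: Fraction(75, 216),
     2: Fraction(15, 216),
     3: Fraction(1, 216),
}

# E[R] = sum over v of v * Pr[R = v]
expectation = sum(v * p for v, p in pdf.items())
print(expectation)         # -17/216
print(float(expectation))  # about -0.079 dollars, i.e., roughly -8 cents per game
```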


Expectation can also be defined in an equivalent way:

$$E[R] = \displaystyle\sum_{\omega \in S} R(\omega) \cdot \text{Pr}[\omega]$$

So, it is the sum over all the possible outcomes in the sample space of the value of the random variable at that outcome times the probability of that outcome.

In the example, we can take the possible outcomes in our sample space to be the dollar values: -1, 1, 2, or 3. For each of them, we already calculated the probabilities, so all that's left is to multiply each by its probability and add them together, which we did, finding that it is $-\frac{17}{216}$.
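
If we instead take the outcomes to be the $6^3 = 216$ equally likely triples of rolls, this outcome-by-outcome definition gives the same answer. A minimal Python sketch (the `chosen` value and the `winnings` helper are my own illustrative names, not from the lecture):

```python
from fractions import Fraction
from itertools import product

chosen = 6  # the number we picked beforehand; any face from 1 to 6 works the same way

def winnings(rolls):
    """Dollars won for one outcome (a triple of rolls): -1, 1, 2, or 3."""
    matches = sum(1 for r in rolls if r == chosen)
    return -1 if matches == 0 else matches

# E[R] = sum over all outcomes of R(outcome) * Pr[outcome]; each triple has probability 1/216.
expectation = sum(Fraction(winnings(rolls), 216)
                  for rolls in product(range(1, 7), repeat=3))
print(expectation)  # -17/216, matching the value-weighted sum above
```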

This expected value (which was about $-8$ cents in our example) can also be called the mean value, or just the mean, or the expectation.

Let's look at another example from the practice exercises.

We have a die that lands on an even number with probability $\frac{1}{4}$; within the even values (and within the odd values), it lands on each with equal probability. Let's find its expectation.

We have 3 even numbers (2, 4, 6), and 3 odd numbers (1, 3, 5) it can land on.

The probability of getting any particular even number is $\frac{1}{4} \cdot \frac{1}{3} = \frac{1}{12}$.
The probability of getting any particular odd number is $\frac{3}{4} \cdot \frac{1}{3} = \frac{3}{12} = \frac{1}{4}$.

So, what we'll do is sum up each value times its probability:
$1\big(\frac{1}{4}\big) + 2\big(\frac{1}{12}\big) + 3\big(\frac{1}{4}\big) + 4\big(\frac{1}{12}\big) + 5\big(\frac{1}{4}\big) + 6\big(\frac{1}{12}\big)$, which is equal to $\frac{13}{4} = 3.25$.
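
The same kind of quick check works here; a small sketch assuming the face probabilities we just computed:

```python
from fractions import Fraction

# Each even face (2, 4, 6) has probability 1/12; each odd face (1, 3, 5) has probability 1/4.
pdf = {face: Fraction(1, 12) if face % 2 == 0 else Fraction(1, 4) for face in range(1, 7)}

assert sum(pdf.values()) == 1  # the probabilities of all faces add up to 1
expectation = sum(face * p for face, p in pdf.items())
print(expectation)  # 13/4, that is, 3.25
```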


The expectation of a binomial distribution ($E[B_{(n, \ p)}]$) is $np$.
Let's look at an example.
All of our coins are fair, and we flip 200 of them. (Or, we flip a fair coin 200 times).
We want to know the expected number of heads.
Well, $n$ is 200, and $p$ is $\frac{1}{2}$. Multiplying them, we have 100.
So, the expected number of heads is 100.
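
We can also confirm the $np$ shortcut against the definition itself; a throwaway sketch (the `binomial_expectation` helper is just my own name for it):

```python
from math import comb

def binomial_expectation(n, p):
    """E[B(n, p)] computed directly from the definition: sum of k * Pr[B = k]."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(binomial_expectation(200, 0.5))  # approximately 100.0, matching n * p
```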


Conditional expectation is defined as:

$$E[R \ | \ A] = \displaystyle\sum_v v \cdot \text{Pr}[R = v \ | \ A]$$

It is the sum, over all possible values $v$ that $R$ might take, of $v$ times the probability that $R$ takes that value, given $A$.

An example: let's say we roll a die. Let $A$ be the event of getting an even number (one of 2, 4, or 6); let $B$ be the event of getting a prime number (one of 2, 3, or 5).

The expectation of (the indicator of) $A$ given $B$, that is, the probability of getting an even number given that the roll is prime, is $\frac{1}{3}$.
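
Here is a minimal sketch of that example, treating $A$ as an indicator random variable; the `is_even` helper and the hard-coded set of primes are my own choices for illustration:

```python
from fractions import Fraction

primes = {2, 3, 5}  # the condition B: the roll is prime

def is_even(face):
    """Indicator for A: 1 if the face is even, 0 otherwise."""
    return 1 if face % 2 == 0 else 0

# Conditioning on B restricts us to the prime faces, which remain equally likely,
# so E[1_A | B] = (number of even primes) / (number of primes).
conditional_expectation = Fraction(sum(is_even(face) for face in primes), len(primes))
print(conditional_expectation)  # 1/3
```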


The law of total expectation is similar to the law of total probability, in the sense that it is helpful for reasoning by cases.
It is defined as:

$$E[R] = \big(E[R \ | \ A_1] \cdot \text{Pr}[A_1]\big) + \big(E[R \ | \ A_2] \cdot \text{Pr}[A_2]\big) + \ ... \ + \big(E[R \ | \ A_n] \cdot \text{Pr}[A_n]\big)$$

Let's see how we can use it to calculate the expected number of heads in $n$ flips of a coin.

Let $e(n)$ be the expected number of heads in $n$ flips.
Then, if the first flip is a head, the expected number of heads is $1 + e(n - 1)$: we have 1 head, plus the expected number of heads in the remaining $n - 1$ flips.

On the other hand, if the first flip is a tail, the expected number of heads is just $e(n - 1)$: since the first flip wasn't a head, what we count is the expected number of heads in the remaining flips.

These are two cases.

Now, let $p$ be the probability of a head, and $q$ be the probability of a tail.

By total expectation:
$$e(n) = \Big([1 + e(n - 1)] \cdot p\Big) + \Big(e(n - 1) \cdot q\Big)$$

$q$ is $1 - p$, so replacing it, we get
$$e(n) = \Big([1 + e(n - 1)] \cdot p\Big) + \Big(e(n - 1) \cdot (1 - p)\Big)$$

Simplifying it further:

$$e(n) = p + (p \cdot e(n - 1)) + e(n - 1) - (p \cdot e(n - 1))$$

It all simplifies to $e(n) = e(n - 1) + p$.

Look what a beautiful thing we have: $e(n)$ is defined in terms of itself; it is a recursive formula!

So, each time $n$ goes down by one, we pick up another $p$.

Going further, $e(n) = e(n - 1) + p = e(n - 2) + 2p = \ ... \ = e(0) + np$.
Since $e(0) = 0$ (no flips means no heads), we get $e(n) = np$.
Therefore, the expected number of heads in $n$ flips is just $np$; for a single flip it is just $p$, the probability of getting a head, which is $\frac{1}{2}$ for a fair coin.
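
The recurrence translates directly into code; a small sketch (my own helper, not from the lecture):

```python
def expected_heads(n, p=0.5):
    """Expected number of heads in n flips, using the recurrence e(n) = e(n - 1) + p."""
    if n == 0:
        return 0.0  # base case: no flips, no heads
    return expected_heads(n - 1, p) + p

print(expected_heads(200))       # 100.0
print(expected_heads(10, 0.25))  # 2.5 -- the closed form n * p
```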


The expected time to failure answers the question of how long we have to wait for something to happen.

Let's say we want to get a tail this time, so a head indicates failure.
Then, the expected time to failure in this case is how long it takes until a head comes up.

Let $p$ be the probability of getting a head ($\text{Pr}[\text{head}]$), let $q$ be the probability of getting a tail, and let $F$ be the number of flips until we get the first head.

The probability of getting a head on the first flip ($\text{Pr}[F = 1]$) is just $p$.

The probability of getting the first head on the second flip ($\text{Pr}[F = 2]$) is going to be $q \cdot p$, because it means that our first flip resulted in a tail.

Note: $\text{PDF}_F(n) = q^{n - 1} \cdot p$.

The expected number of flips until we get a head is $\frac{1}{p}$:
$$E[F] = \frac{1}{p}$$

Why?

Let's look at this tree, which we'll call $B$, to help us see why:

Mean time to failure tree example 1

Each branch corresponds to a coin flip, so the number of flips until we get a head can be counted by following the branches.

This tree is recursive, which means that we can replace a part of it by itself:

Mean time to failure tree example 2

As the slide says, we can use Total Expectation to find $E[F]$, which is just the expected number of branches we have to follow until we reach an $H$.
It will be (the expectation of $F$ given that the first flip is a head) times (the probability of getting a head), plus (the expectation of $F$ given that the first flip is a tail) times (the probability of getting a tail).

English can be confusing sometimes, so:
$$E[F] = E[F \ | \ \text{first = H}] \cdot p + E[F \ | \ \text{first = T}] \cdot q$$

Well, the first conditional expectation (the expected number of flips given that we get a head on the very first flip) is just 1:
$$E[F] = 1 \cdot p + E[F \ | \ \text{first = T}] \cdot q$$
$$E[F] = p + E[F \ | \ \text{first = T}] \cdot q$$

The expected number of flips until we get a head, given that the first flip resulted in a tail, is $1 + E[F]$. Notice that after we get a tail, following the branch through $q$, we still have the tree $B$ itself. So, we have already used one flip (that's the $1$), and we are still expecting to wait for a head (that's the $E[F]$), therefore $1 + E[F]$.

Putting it all together, we have $E[F] = p + (1 + E[F]) \cdot q$.

Simplifying it:
$$E[F] = p + q + q \cdot E[F]$$
$$E[F] - q \cdot E[F] = p + q$$

Factoring out $E[F]$:
$$E[F](1 - q) = p + q$$

Then:
$$E[F] = \frac{p + q}{1 - q}$$

$q$ can be written as $1 - p$:
$$E[F] = \frac{p + (1 - p)}{1 - (1 - p)}$$
$$E[F] = \frac{1}{p}$$

So, the expected number of flips until we get a head is indeed $\frac{1}{p}$.
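
A quick simulation makes this concrete; a sketch with arbitrary parameter choices (not from the lecture):

```python
import random

def flips_until_head(p):
    """Simulate flips of a coin with head probability p; count flips until the first head."""
    flips = 1
    while random.random() >= p:  # anything >= p counts as a tail (probability q = 1 - p)
        flips += 1
    return flips

p = 1 / 3
trials = 100_000
average = sum(flips_until_head(p) for _ in range(trials)) / trials
print(average)  # close to 1 / p = 3 (only approximately, since it's a simulation)
```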


Two things to note:

- Expectation is linear. Let $R$ and $S$ be random variables, and $a$ and $b$ constants. Then,
  $E[aR + bS] = aE[R] + bE[S]$
- For independent $X$ and $Y$,
  $E[XY] = E[X] \cdot E[Y]$
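
Both facts are easy to check numerically for a small example, say two independent fair dice; a toy sketch (a check, not a proof):

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: all 36 ordered pairs of faces are equally likely.
pairs = list(product(range(1, 7), repeat=2))

def E(f):
    """Expectation of f(x, y) over the uniform distribution on the 36 pairs."""
    return Fraction(sum(f(x, y) for x, y in pairs), len(pairs))

a, b = 2, 3
# Linearity: E[aX + bY] = a E[X] + b E[Y]
assert E(lambda x, y: a * x + b * y) == a * E(lambda x, y: x) + b * E(lambda x, y: y)
# Independence: E[XY] = E[X] * E[Y]
assert E(lambda x, y: x * y) == E(lambda x, y: x) * E(lambda x, y: y)
print("both identities hold for two independent fair dice")
```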