Bite-Sized Mathematics for Computer Science

Notes on MIT's 6.042J

Eda Eren
Table of Contents

Introduction

Prerequisites & Resources

Unit 1: Proofs

00. Introduction to Proofs

01. (Two) Proof Methods

02. The Well Ordering Principle

03. Logic & Propositions

04. Quantifiers and Predicate Logic

05. Sets

06. Binary Relations

07. Induction

08. State Machines — Invariants

09. Recursive Definitions

10. Infinite Sets

Unit 2: Structures

11. GCDs

12. Congruences

13. Euler's Theorem

14. RSA Encryption

15. Digraphs: Walks and Paths

16. Directed Acyclic Graphs

17. Partial Orders and Equivalence

18. Degrees & Isomorphism

19. Coloring & Connectivity

20. Trees

21. Stable Matching

Unit 3: Counting

22. Sums & Products

23. Asymptotics

24. Counting with Bijections

25. Repetitions & Binomial Theorem

26. Pigeonhole Principle, Inclusion-Exclusion

Unit 4: Probability

27. Intro to Discrete Probability

28. Conditional Probability

29. Independence & Causality

30. Random Variables, Density Functions

31. Expectation

32. Deviation: Markov & Chebyshev Bounds

33. Sampling & Confidence

34. Random Walks & PageRank

Conditional Probability

What is the probability that you own a cat, given that you have just bought cat food?

That's conditional probability: the probability that some event occurs, given that some other event has occurred.


Let's suppose that a friend rolls a fair die, and we want to know the probability of rolling 1. The answer is obvious: there are 6 equally likely possibilities, and the probability of rolling 1 is $\frac{1}{6}$.

But now let's say that our friend tells us that she has rolled an odd number. Now, what's the probability of rolling 1?
Now that the number of possibilities is down to three (it can only be 1, 3, or 5), we know that the probability of rolling 1 is $\frac{1}{3}$.

What it shows is that "knowledge" changes probabilities.
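As a quick sanity check, here is a small Python sketch (my own, not from the course) that simulates die rolls and estimates both probabilities; the conditional one is estimated by simply throwing away the rolls that are not odd:

```python
import random

def estimate_die_probabilities(trials=100_000):
    """Estimate Pr[roll = 1] and Pr[roll = 1 | roll is odd] by simulation."""
    rolls = [random.randint(1, 6) for _ in range(trials)]

    # Unconditional probability of rolling a 1.
    p_one = sum(1 for r in rolls if r == 1) / trials

    # "Knowledge" that the roll is odd: keep only the odd rolls.
    odd_rolls = [r for r in rolls if r % 2 == 1]
    p_one_given_odd = sum(1 for r in odd_rolls if r == 1) / len(odd_rolls)

    return p_one, p_one_given_odd

print(estimate_die_probabilities())  # roughly (0.167, 0.333)
```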

If you remember the tree model example from the previous section, you might recognize that what we were doing at the last step was using the product rule:

$$\text{Pr}[A \cap B] = \text{Pr}[A] \cdot \text{Pr}[B \ | \ A]$$

$\text{Pr}[A]$ is the first branch, and the probability of $B$ given $A$ ($\text{Pr}[B \ | \ A]$) is the second branch. We multiply them to find the probability of their intersection.

So, $\text{Pr}[B \ | \ A]$ is the probability of event $B$, given that event $A$ has occurred:

$$\text{Pr}[B \ | \ A] = \frac{\text{Pr}[A \cap B]}{\text{Pr}[A]}$$

Since we can't divide by $0$, the probability of $A$ can't be $0$.

Conditioning on $A$ defines a new probability function $\text{Pr}_A$ where $\text{Pr}_A[\omega] = 0$ if $\omega \notin A$ (the probability of an outcome is $0$ if that outcome is not in $A$), and $\text{Pr}_A[\omega] = \frac{\text{Pr}[\omega]}{\text{Pr}[A]}$ if $\omega \in A$.
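Here is a minimal sketch of that definition in Python, using the die example from above; the event sets and helper names are just illustrative:

```python
from fractions import Fraction

# Sample space of one roll of a fair die: each outcome has probability 1/6.
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}  # the roll is odd
B = {1}        # the roll is 1

def pr(event):
    """Probability of an event under the uniform distribution on the die."""
    return Fraction(len(event), len(sample_space))

def pr_given(b, a):
    """Pr[B | A] = Pr[A ∩ B] / Pr[A]; requires Pr[A] != 0."""
    return pr(a & b) / pr(a)

print(pr(B))           # 1/6
print(pr_given(B, A))  # 1/3
```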


Let's take a look at what is called the law of total probability.
It is a law for reasoning about probability by cases. If you remember from the (Two) Proof Methods, one of them was Proof by Cases. Breaking up a complicated problem into cases is indeed useful; now let's see how it can be helpful when it comes to probability.

So, let's say that in a sample space $S$, we have a set $A$ which is an event. Let's say that we have three more sets $B_1$, $B_2$, and $B_3$ partitioning the sample space, so none of them intersects the others. In this picture, each of them also intersects $A$, so it looks like this:

Law of total probability

Since $B_1$, $B_2$, and $B_3$ together cover everything, the union of their intersections with $A$ is $A$ itself:

$$A = (B_1 \cap A) \cup (B_2 \cap A) \cup (B_3 \cap A)$$

We can add the probabilities of those intersections to find the probability of $A$:

$$\text{Pr}[A] = \text{Pr}[B_1 \cap A] + \text{Pr}[B_2 \cap A] + \text{Pr}[B_3 \cap A]$$

Remember that the product rule gives us the probability of an intersection, so we can do a little substitution:

$$\text{Pr}[A] = \big(\text{Pr}[A \ | \ B_1] \cdot \text{Pr}[B_1]\big) + \big(\text{Pr}[A \ | \ B_2] \cdot \text{Pr}[B_2]\big) + \big(\text{Pr}[A \ | \ B_3] \cdot \text{Pr}[B_3]\big)$$

Okay, now it's time to state it more generally:

If $S$ is a disjoint union of $B_0, \ B_1, \ ...$, then:

$$\text{Pr}[A] = \displaystyle\sum_i \text{Pr}[A \cap B_i]$$

$$= \displaystyle\sum_i \text{Pr}[A \ | \ B_i] \cdot \text{Pr}[B_i]$$
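As a small illustrative sketch (the numbers below are made up, not from the lecture), the law of total probability is just a sum over the cases of a partition:

```python
from fractions import Fraction

def total_probability(cases):
    """Law of total probability: sum of Pr[A | B_i] * Pr[B_i] over a partition."""
    return sum(pr_a_given_b * pr_b for pr_b, pr_a_given_b in cases)

# A made-up partition into three cases; the Pr[B_i] values sum to 1.
cases = [
    (Fraction(1, 2), Fraction(1, 3)),  # (Pr[B_1], Pr[A | B_1])
    (Fraction(1, 4), Fraction(1, 2)),  # (Pr[B_2], Pr[A | B_2])
    (Fraction(1, 4), Fraction(1, 4)),  # (Pr[B_3], Pr[A | B_3])
]

print(total_probability(cases))  # 17/48
```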

Now, let's look at the famous Bayes' Law.

$$\text{Pr}[B \ | \ A] = \frac{\text{Pr}[A \ | \ B] \cdot \text{Pr}[B]}{\text{Pr}[A]}$$
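As a minimal helper sketch (my own, not from the lecture), Bayes' law is one line of code once the three quantities on the right-hand side are known:

```python
def bayes(pr_a_given_b, pr_b, pr_a):
    """Bayes' law: Pr[B | A] = Pr[A | B] * Pr[B] / Pr[A] (Pr[A] must be nonzero)."""
    return pr_a_given_b * pr_b / pr_a

# With the numbers from the café example below: Pr[A | B] = 1, Pr[B] = 1/4, Pr[A] = 3/4.
print(bayes(1, 1 / 4, 3 / 4))  # 0.333...
```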

For example, let's say that two friends are at a café. There are only two options, coffee or tea (that's a lousy café), and each friend independently orders one of them with equal probability. We're told that at least one of them is drinking coffee. What's the probability that both are drinking coffee?

Before using Bayes' theorem, let's first enumerate all the possibilities.
There are three equally likely possibilities, given that at least one of them is drinking coffee: CC (both are drinking coffee), CT (the first one is drinking coffee and the other is drinking tea), and TC (the first one is drinking tea and the other is drinking coffee).

It's then clear that the probability of both of them drinking coffee is $\frac{1}{3}$. We're done.

But, let's now use Bayes' rule.

Let's say that $B$ denotes the event that both of them are drinking coffee, and $A$ the event that at least one of them is drinking coffee.
We want $\text{Pr}[B \ | \ A]$, the probability of both drinking coffee, given that at least one of them does.

The formula is $\text{Pr}[B \ | \ A] = \frac{\text{Pr}[A \ | \ B] \cdot \text{Pr}[B]}{\text{Pr}[A]}$.

$\text{Pr}[A \ | \ B]$ is $1$ (if both have coffee, then at least one of them obviously does).

$\text{Pr}[B]$ is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$ (each of them is independently drinking coffee with a $\frac{1}{2}$ chance).

$\text{Pr}[A]$ is $1 - \frac{1}{4}$, which is $\frac{3}{4}$ (the complement of $A$ has probability $\frac{1}{4}$).*

The only thing left is to apply the formula:

$$\text{Pr}[B \ | \ A] = \frac{1 \cdot \frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}$$

So, the probability of both of them drinking coffee given that at least one of them does is $\frac{1}{3}$, or about $0.33$.

*The complement of $A$ is that neither of them is drinking coffee, that is, both are drinking tea, each with a $\frac{1}{2}$ chance. So, $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$.
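To double-check that the enumeration and Bayes' law agree, here is a hypothetical sketch that lists the four equally likely drink combinations and also redoes the Bayes computation:

```python
from fractions import Fraction
from itertools import product

# Each friend independently drinks coffee ('C') or tea ('T') with probability 1/2,
# so the four combinations below are equally likely.
outcomes = list(product("CT", repeat=2))

A = [o for o in outcomes if "C" in o]         # at least one is drinking coffee
B = [o for o in outcomes if o == ("C", "C")]  # both are drinking coffee

# Direct enumeration: Pr[B | A] = |A ∩ B| / |A|, since outcomes are equally likely.
print(Fraction(len(set(A) & set(B)), len(A)))  # 1/3

# Bayes' law with Pr[A | B] = 1, Pr[B] = 1/4, Pr[A] = 3/4.
print(Fraction(1) * Fraction(1, 4) / Fraction(3, 4))  # 1/3
```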


One thing to note is that Bayes' theorem allows us to calculate the probability of an earlier event given the result of a later event.

Now, let's look at another example from the practice questions of the lecture:

Let $A$ be the event that Albert is giving the lecture.
Let $L$ be the event that Louis goes to the lecture.

$\text{Pr}[A] = 0.8$ and $\text{Pr}[L] = 0.4$.

The probability that Louis goes to lecture given that Albert is giving the lecture ($\text{Pr}[L \ | \ A]$) is $0.3$.

We want to know the probability of Albert giving the lecture given that Louis goes to lecture, that is, $\text{Pr}[A \ | \ L]$.

We have all that we need, so let's plug them into the formula:

$$\text{Pr}[A \ | \ L] = \frac{0.3 \ \cdot \ 0.8}{0.4}$$

What we have as a result is $0.6$.
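The same arithmetic as a tiny sketch (the variable names are mine):

```python
pr_a = 0.8          # Pr[A]: Albert gives the lecture
pr_l = 0.4          # Pr[L]: Louis goes to the lecture
pr_l_given_a = 0.3  # Pr[L | A]

# Bayes' law: Pr[A | L] = Pr[L | A] * Pr[A] / Pr[L]
print(pr_l_given_a * pr_a / pr_l)  # ~0.6 (up to floating-point rounding)
```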


One more entertaining example from the practice questions.

This weekend it will rain with probability $\frac{1}{4}$, be sunny with probability $\frac{1}{4}$, and hail baseballs otherwise. If it rains, then there is a 50% chance that I will see a crocodile. If it is sunny, then crocodiles will hide themselves from me with probability $\frac{3}{8}$. If it hails, then I will see a crocodile with complete certainty.

What is the probability I see a crocodile this weekend?

Now, let's look at all the probabilities of seeing a crocodile first.
We have one given that it rains, one given that it is sunny, and one given that it hails.

If it rains (which has a $\frac{1}{4}$ chance), we'll see crocodiles with a $\frac{1}{2}$ chance, so multiplying those together we get $\frac{1}{8}$.

If it is sunny (which has a $\frac{1}{4}$ chance), we'll see crocodiles with a $\frac{5}{8}$ chance ($1 - \frac{3}{8} = \frac{5}{8}$), so multiplying those together we get $\frac{5}{32}$.

If it hails (which has a $\frac{1}{2}$ chance), we'll see crocodiles with complete certainty ($1$), so multiplying those together we get $\frac{1}{2}$.

All we have to do is add them up: $\frac{1}{8} + \frac{5}{32} + \frac{1}{2}$.

We have the result $\frac{25}{32}$, so we have a $0.78125$ chance of seeing crocodiles this weekend.
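Here is the same case analysis as a short sketch of the law of total probability, using exact fractions (the dictionary layout is just my own way of organizing the given numbers):

```python
from fractions import Fraction

# (Pr[weather], Pr[see a crocodile | weather]) for each case of the partition.
weather = {
    "rain":  (Fraction(1, 4), Fraction(1, 2)),
    "sunny": (Fraction(1, 4), Fraction(5, 8)),  # crocodiles hide with probability 3/8
    "hail":  (Fraction(1, 2), Fraction(1)),
}

pr_crocodile = sum(p_w * p_c for p_w, p_c in weather.values())
print(pr_crocodile, float(pr_crocodile))  # 25/32 0.78125
```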