Bite-Sized Mathematics for Computer Science

Notes on MIT's 6.042J

Eda Eren
Table of Contents

Introduction

Prerequisites & Resources

Unit 1: Proofs

00. Introduction to Proofs

01. (Two) Proof Methods

02. The Well Ordering Principle

03. Logic & Propositions

04. Quantifiers and Predicate Logic

05. Sets

06. Binary Relations

07. Induction

08. State Machines — Invariants

09. Recursive Definitions

10. Infinite Sets

Unit 2: Structures

11. GCDs

12. Congruences

13. Euler's Theorem

14. RSA Encryption

15. Digraphs: Walks and Paths

16. Directed Acyclic Graphs

17. Partial Orders and Equivalence

18. Degrees & Isomorphism

19. Coloring & Connectivity

20. Trees

21. Stable Matching

Unit 3: Counting

22. Sums & Products

23. Asymptotics

24. Counting with Bijections

25. Repetitions & Binomial Theorem

26. Pigeonhole Principle, Inclusion-Exclusion

Unit 4: Probability

27. Intro to Discrete Probability

28. Conditional Probability

29. Independence & Causality

30. Random Variables, Density Functions

31. Expectation

32. Deviation: Markov & Chebyshev Bounds

33. Sampling & Confidence

34. Random Walks & PageRank

Random Variables, Density Functions

A random variable $R$ is formally defined as:

$$R: S \rightarrow \mathbb{R}$$

So, it is not a variable, but a function that maps outcomes in the sample space ($S$) to real numbers.

One simple example is whether we get a tail when flipping a fair coin: it has a 50% chance, and it is random.

$R$ is a package of events ($R = a$, for $a \in \mathbb{R}$). Then the event properties that we have seen until now apply to random variables as well.
For example, random variables $R_1, \ R_2, \ \ldots, \ R_n$ are mutually independent iff the events they define, $[R_1 = a_1], \ [R_2 = a_2], \ \ldots, \ [R_n = a_n]$, are mutually independent for all $a_1, \ a_2, \ \ldots, \ a_n$.
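To make the "a random variable is a function" idea concrete, here is a minimal Python sketch (not from the course; the names `sample_space`, `num_heads`, and `pr_equals` are just illustrative). It enumerates the eight outcomes of three fair coin flips and computes $\text{Pr}[R = a]$ by summing the probabilities of the outcomes that $R$ maps to $a$:

```python
from itertools import product
from fractions import Fraction

# Sample space: all 8 outcomes of three fair coin flips, each with probability 1/8.
sample_space = {outcome: Fraction(1, 8) for outcome in product("HT", repeat=3)}

# A random variable is a function from outcomes to real numbers;
# here it counts the number of heads in an outcome.
def num_heads(outcome):
    return outcome.count("H")

# Pr[R = a]: total probability of the outcomes that R maps to the value a.
def pr_equals(rv, a):
    return sum(p for outcome, p in sample_space.items() if rv(outcome) == a)

for a in range(4):
    print(f"Pr[num_heads = {a}] = {pr_equals(num_heads, a)}")
# Pr[num_heads = 0] = 1/8, then 3/8, 3/8, 1/8
```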

Say we flip a coin three times, and we define getting all tails as a match, which we can denote as $M$. So, if we get tails three times, it is a match, so $M = 1$. If we don't get all tails, it is not a match, so $M = 0$.
This is a simple example of a random variable being a package of events; here, $M$ is our random variable.
It is also what is called an indicator variable.

An indicator variable $I_A$ for event $A$ is:

$$I_A = \begin{cases} 1 & \text{if } A \text{ occurs,}\\ 0 & \text{if } A \text{ does not occur.} \end{cases}$$
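Sticking with the three-flips example, a rough sketch of the indicator variable $M$ (again illustrative Python; `M` returns 1 only when the all-tails event occurs):

```python
from itertools import product
from fractions import Fraction

# Same sample space: three fair coin flips.
sample_space = {outcome: Fraction(1, 8) for outcome in product("HT", repeat=3)}

# Indicator variable M: 1 if the event "all three flips are tails" occurs, 0 otherwise.
def M(outcome):
    return 1 if all(flip == "T" for flip in outcome) else 0

pr_match = sum(p for outcome, p in sample_space.items() if M(outcome) == 1)
print(pr_match)  # 1/8 -- only TTT is a match
```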

Let's look at some types of random variables.

A uniform random variable is one that takes each of its possible values with equal probability.
An example is the outcome of rolling a fair die: the probability of each number is $\frac{1}{6}$.

Another is the binomial random variable. An example is the number of heads we get when we flip $n$ mutually independent coins (or flip one coin $n$ times). Let's assume that our coin is not fair this time, so, for example, the probability of getting a head is $\frac{2}{3}$.

Let's say that we flip the coin 5 times, and want to know the probability of getting HHTTH.
Since the probability of getting a head is $\frac{2}{3}$, the probability of getting a tail is going to be $\frac{1}{3}$.
So, the probability of getting HHTTH will be $\frac{2}{3} \cdot \frac{2}{3} \cdot \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{2}{3}$.
Put another way, $\text{Pr}[\text{HHTTH}] = \big(\frac{2}{3}\big)^{3} \cdot \big(\frac{1}{3}\big)^{2}$.
This is true not just for this one sequence, but for each sequence with $i$ heads and $n - i$ tails ($n$ being the total number of flips).
Let $p$ denote $\text{Pr}[\text{head}]$; then the formula is:

$$p^i \cdot (1 - p)^{n - i}$$

It is the probability of a sequence of $n$ flips in which there are $i$ heads and $n - i$ tails.
(Any particular sequence of H's and T's of length $n$.)
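As a quick sanity check, a small sketch (assuming the same biased coin with $p = \frac{2}{3}$) that multiplies the per-flip probabilities for HHTTH and compares the result with $p^i (1 - p)^{n - i}$:

```python
from fractions import Fraction

p = Fraction(2, 3)      # Pr[head] for the biased coin
sequence = "HHTTH"

# Probability of this exact sequence: multiply the per-flip probabilities.
prob = Fraction(1)
for flip in sequence:
    prob *= p if flip == "H" else (1 - p)

i = sequence.count("H")
n = len(sequence)
print(prob)                      # 8/243
print(p**i * (1 - p)**(n - i))   # same value: p^i * (1 - p)^(n - i)
```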

So, all sequences with the same number of H's will have the same probability.

Then, what is the probability of actually getting $i$ heads and $n - i$ tails in the $n$ flips?
It is simply a matter of choosing which $i$ of the $n$ flips are heads, which sounds like $n$ choose $i$.

So, the probability of getting $i$ heads and $n - i$ tails is:

$$\binom{n}{i} \ p^i \ (1 - p)^{n - i}$$
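Here is a short illustrative sketch of that formula, assuming the same $n = 5$ flips and $p = \frac{2}{3}$: it computes $\binom{n}{i} \ p^i \ (1 - p)^{n - i}$ for every $i$ and checks that the probabilities sum to $1$.

```python
from fractions import Fraction
from math import comb

n = 5                   # number of flips
p = Fraction(2, 3)      # Pr[head]

# Pr[exactly i heads in n flips] = C(n, i) * p^i * (1 - p)^(n - i)
def binomial_prob(i):
    return comb(n, i) * p**i * (1 - p)**(n - i)

for i in range(n + 1):
    print(f"Pr[{i} heads] = {binomial_prob(i)}")

print(sum(binomial_prob(i) for i in range(n + 1)))  # 1
```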

The probability density function tells, for every possible value, the probability that a random variable takes that value.

For each $a$, it is the probability that $R$ equals $a$:

$$\text{PDF}_R(a) = \text{Pr}[R = a]$$

For example, let's think about the binomial random variable example we've just used. $i$ ($i$ = the number of heads) is an integer from $0$ to $n$ ($n$ = the number of flips). For each possible $i$, the probability that the random variable equals $i$ (getting $i$ heads) will be equal to that formula:

$$\text{PDF}_{B_{n, p}}(i) = \binom{n}{i} \ p^i \ (1 - p)^{n - i}$$

($B_{n, p}$ is the binomial random variable that has the parameters $n$ and $p$: $n$ as the number of flips, $p$ as the probability of getting a head.)

The probability density function of a uniform variable will be a constant. The probability of it taking any possible value $v$ is the same: $\text{PDF}_U(v) = \text{constant}$.

Of course, this holds for $v$ in the range of $U$ (the set of values the uniform variable takes on):

$$\text{PDF}_U(v) = \frac{1}{|\text{range}(U)|}$$
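For the fair-die example, a minimal sketch (the helper `uniform_pdf` is just for illustration):

```python
from fractions import Fraction

die_values = range(1, 7)   # the values a fair die can take

# PDF of a uniform random variable: the same constant for every value in its range.
def uniform_pdf(v, values):
    return Fraction(1, len(values)) if v in values else Fraction(0)

print(uniform_pdf(3, die_values))   # 1/6
print(uniform_pdf(9, die_values))   # 0 (outside the range)
```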

There is also the cumulative distribution function, defining the probability of $R$ being less than or equal to $a$:

$$\text{CDF}_R(a) = \text{Pr}[R \leq a]$$

An example is from the practice exercises:

Let $X$ be a uniformly distributed random variable on the interval $[1, 12]$. What is the value of the Cumulative Distribution Function (CDF) at $8$?

Since $X$ is a uniformly distributed random variable, $\text{PDF}_X(x) = \frac{1}{12}$ for $x \in [1, \ 12]$.
We are looking for the probability of $X$ being less than or equal to $8$, so of the values from $1$ to $12$, the first $8$ satisfy it: $\text{CDF}_X(8) = \frac{8}{12} = \frac{2}{3}$.
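A quick check of that answer, treating $X$ as uniform over the integers $1$ through $12$ as the exercise does (illustrative Python, not part of the exercises):

```python
from fractions import Fraction

values = range(1, 13)            # X is uniform over 1..12
pdf = Fraction(1, len(values))   # PDF_X(x) = 1/12 for each value

# CDF_X(a) = Pr[X <= a] = sum of PDF_X(x) over x <= a
def cdf(a):
    return sum(pdf for x in values if x <= a)

print(cdf(8))   # 2/3
```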