Duplication-Free Means That Duplicated Elements Are Ignored

Set Theory

Definitions

A set is an unordered, duplication-free collection of values (values = strings, numbers, functions, objects, sets, whatever).

Examples:

{0, 1, 2, 3, 5, 8, 13}
{}
{a, e, i, o, u}
{{}, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}

Unordered means order doesn't matter:

{1, 2, 3} = {1, 3, 2} = {2, 1, 3} = etc.

Duplication-free means that duplicated elements are ignored:

{1, 1, 2, 3} = {1, 2, 3}

Suppose S = {1, 2, 3}, then 1, 2, and 3 are the members of S. Equivalently, S contains 1, 2, and 3.

Notation:

1 S & 2  S & 3 S & 4  S

We use a special symbol to denote the empty set:

{} = 

The cardinality or size of a set is the number of members:

|S| = 3
|| = 0

A multiset (or bag) is an unordered collection of values.

Notation:

S = [1, 2, 3] = multiset containing 1, 2, & 3 (Just made up [].)

Duplicates are not ignored in multisets:

[1, 1, 2, 3] != [1, 2, 3]

A list (or sequence) is an ordered collection of values. Duplicates are allowed and order is important.

For example:

(1, 2, 3) != (3, 2, 1)

(1, 1, 2, 3) != (1, 2, 3)

Each element in a list has a unique position. The first element has position 0.

For example, suppose L = (a, e, i, o, u), then L[0] = a, L[1] = e, etc.

More notation:

In general, there are two ways to describe a set.

Set lister notation:

A = {1, 2, 3}

Set builder notation:

A = {x | x == 1 || x == 2 || x == 3}

In English:

A = the set of all x such that x equals 1 or x equals 2 or x equals 3.

Set lister notation is convenient for small sets. We sometimes stretch it to describe infinite sets by using ellipses:

EVEN = {0, 2, 4, 6, ... }

Set builder notation is more formal since any property can come after the "such that" pipe:

PRIME = {x | 1 < x & x = a * b implies a = x || b = x}

Operations

Assume A and B are sets, then so are:

A  B = {x | x  A || x  B}

A X B = {(x, y) | x  A & y  B}

Ak = A X A X A ... X A (k-times)

A – B = {x | x  A & x  B}

A  B = {x | x  A & x  B}

P(A) = {x | x  A}

A -> B = {f | f is a function & domain(f)  A & range(f)  B}

A is a subset of B if every member of A is also a member of B. Notation: A  B

A equals B (A = B) if A  B & B  A

Note: For each set A,  A

It is conventional to write f  A->B as f: A->B.

Famous Sets

NAT = {0, 1, 2, ... }

INT = {0, 1, -1, 2, -2, ... }

RAT = {a/b | a, b  INT & 0 < b}

REAL = all possible magnitudes and their negatives

STRING = all strings of char

BOOLEAN = {true, false}

Infinities

Yes, there is more than one infinity. In fact, there are infinity infinities, somemore infiniter than others!

Theorem: For every set A, |P(A)| = 2|A|

Proof: By induction on |A|.

Induction is a cool way to prove things. Here's the idea. Start by defining:

W = { n NAT | if |A| = n, then |P(A)| = 2n}

In other words:

W = { n  NAT | the theorem is true if |A| = n}

Our goal will be to prove our theorem by proving W = NAT. We do this by showing:

(i) 0  W

(ii) if n - 1  W, then n  W

In other words, W contains the increment of any member by (ii), and since 0  W by (i), then 1  W. Applying (ii) again we conclude that 2  W, and so on.

The cool feature of induction is that (ii) allows us to assume n – 1  W. This is a powerful assumption. In essence a proof by induction is analogous to a recursive algorithm.

proof of (i): If |A| = 0, then A = and P(A) = {} and |P(A)| = 1 = 20.

proof of (ii): Assume |A| = n > 0. Pick a random element aA. Let B = A – {a}. Then |P(B)| = 2n-1 by induction. Of course P(B)  P(A).

P(A) – P(B) = {x  {a} | x  B}, so |P(A – B)| = |P(B)| and therefore |P(A)| = 2 * 2n-1 = 2n.

QED.

Assume A and B are infinite sets. What does it mean to say that |A| = |B|?

Definition |A| = |B| if there is an f: A -> B such that:

x != y implies f(x) != f(y) (f is one-to-one)
for every b: B there is an a: A such that f(a) = b (f is onto)

A one-to-one onto function is called a bijection. Note that if f is a bijection, then so is f-1: B -> A, where f(f-1(a)) = a.

Note, using this definition, it's possible that a proper subset can have the same cardinality as the superset.

Theorem: |NAT| = |EVEN|

Proof: This is true because the function f(n) = 2 * n is a bijection. QED.

Theorem: If A = {n  NAT | n = 2k for some k}, then |A| = |NAT|

Proof: Left as an exercise.

Theorem (Euclid): If A = {n  NAT | n is prime}, then |A| = |NAT|

Proof: Clearly |A| <= |NAT|. If |A| < |NAT|, then A = {p0, p1, p2, ...,pn}, where 2 = p0 < p1 < p2 < ... < pn.

Let k = (p0*p1*p2*...*pn), and let p = k + 1. Assume p = r * pi for some i and r, then p = pi*s + 1 = pi * r, so pi = 1/(r – s). This is impossible if pi is a NAT larger than 1, hence p is a prime larger than all of the primes assumed to exist.

QED.

Theorem: |A x B| <= |A| * |B|

Proof: Left as an exercise.

Theorem: |NAT| = |INT| = |RAT| = |STRING|

Proof: Left as an exercise.

Theorem: |NAT x NAT| = |NAT|

Proof: Left as an exercise.

Theorem: |A  B| <= |A| + |B|

Proof: Left as an exercise.

Theorem: A B implies |A| <= |B|

Proof: Left as an exercise.

Theorem: If A  NAT and A is finite (i.e., |A| < |NAT|), then |NAT – A| = |NAT|

Proof: Left as an exercise.

Theorem: If A = all binary strings, then |A| = |NAT|

Proof: Left as an exercise.

Theorem: If A = all valid C programs, then |A| = |NAT|

Proof: Left as an exercise.

Theorem: If A = all protons in the universe, then |A| <= |NAT|

Proof: Left as an exercise.

Theorem (Cantor): |NAT| < |P(NAT)|

Proof: Clearly |Nat| <= |P(NAT)|.

Now assume there is a bijection f: NAT -> P(NAT). Let D = {n: NAT | n f(n)}. Since f is onto, f(m) = D for some m. Is m a member of D? If yes, then mf(m), but this implies m D. On the other hand, if m f(m), then by definition m D, i.e., m  f(m). A contradiction either way!

QED.

Theorem: |A| < |P(A)| for any set A.

Proof: Left as an exercise.

Notation: Even though |A| may be infinite, we still use exponentiation notation to denote the cardinality of its power set: |P(A)| = 2|A|.

We also write |P(A)| = |A|+. So for all A, |A| < 2|A| = |A|+.

Theorem: if 2 <= |B|, then |A| < |A->B|

Proof: Note that |P(A)| = |A->BOOLEAN| and D = {n: NAT | f(n) = false}.

More generally, assume f: A -> (A->B) is a bijection. Let g:A->B be defined by g(a) != f(a)(a). This should be possible since B contains more than one element. Now assume g = f(k) for some k  A. then g(k) != f(k)(k) = g(k), a contradiction!

QED.

Theorem: |NAT| < |REAL|

Proof: The proof depends on a famous bijection from NAT->NAT to REAL.

So, we have an infinity of infinities: |NAT| < |P(NAT)| < |P(P(NAT))| < ...

Notation: |NAT| = 0

0 is called countable infinity since any set with this cardinality can be put in a one-to-one correspondence with NAT, i.e., it can be counted.

The next infinity after 0 is 1, and so on:

012 < ... < 0 < ...

Generalized Continuum Hypothesis: i+1 = i+.

This is a hypothesis because it is known that in some models of set theory it is true, while it is false in others. For example, in some weird models of set theory 0+ = 2.