Lecture 2: Complexity

如何比较算法

最优性-基于问题

Sorting（排序） vs Packing（装箱/打包）

• 优雅性/可解释性-重要但通常不是必须的
• 在需要向用户/非计算机专家提供解释时有用
• 更简洁/更优雅使得以后需要的更改更容易实现
• 通常不是优先事项*

复杂度-用来比较算法的标准,尤其是用大O

• 时间复杂度
• 空间复杂度

Efficiency

比较复杂度时使用O, Θ, Ω (Big O, Big Theta, Big Omega)来比较

复杂度从大x大到小排序

渐近复杂度(Asymptotic notations)

Best case（最好情况，渐近下界 Ω）

表示：算法在“最好”的情况下，至少需要多少时间/操作数

Worst case（最坏情况，渐近上界 O）

表示：算法在“最坏”的情况下，最多需要多少时间/操作数

Exact order of growth（精确增长阶，紧确界 Θ）

表示：算法的增长速度被上界和下界同时夹住，也就是“总体上就是这个量级”。

Θ 需要对“所有足够大的 n”都同时成立（不能只在偶数或奇数成立）。

定理

对任意两个函数 f(n) 和 g(n)：

f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n))

不存在一个 g(n) 让 f(n) 同时有匹配的上界和下界，因此 f(n) 的 Θ（紧确增长阶）不存在。

特性

1) Transitivity 传递性

如果“f 和 g 同阶”，且“g 和 h 同阶”，那“f 和 h 也同阶”。

Θ 的传递性

$f(n)=\Theta(g(n)),\ g(n)=\Theta(h(n)) \Rightarrow f(n)=\Theta(h(n))$
O 的传递性（上界可以串起来）

$f(n)=O(g(n)),\ g(n)=O(h(n)) \Rightarrow f(n)=O(h(n))$
Ω 的传递性（下界可以串起来）

$f(n)=\Omega(g(n)),\ g(n)=\Omega(h(n)) \Rightarrow f(n)=\Omega(h(n))$

2) Reflexivity 自反性

任何函数和自己当然是同阶，也当然是自己的上界和下界：

$f(n)=\Theta(f(n)),\quad f(n)=O(f(n)),\quad f(n)=\Omega(f(n))$

3)Symmetry 对称性（以及 O 与 Ω 的互换）

Θ 是真正的对称（互相同阶）

$f(n)=\Theta(g(n)) \iff g(n)=\Theta(f(n))$

O 和 Ω 是“互为对偶”（交换方向）

$f(n)=O(g(n)) \iff g(n)=\Omega(f(n))$

Lecture 3: Recursive Complexity

创建递归函数需要什么

Base Case（基本情况 / 终止条件）:

一个不再继续递归、可以直接返回答案的情况。
作用：保证递归会停，否则会无限调用导致栈溢出

Recursive Step（递归步骤 / 递归规则）
一套规则，把当前问题转化成规模更小、更加接近 base case 的同类问题，然后用递归调用去解决。
作用：让问题一步步“变小”直到触发 base case。

为什么我们要使用递归？

• 许多计算问题有成熟的递归解法
• 递归解法看起来易于理解和编码
• 这些递归解法通常可以转换为迭代解法（使用循环）
• 但迭代解法更难理解和编码
• 对于某些问题，递归可以成为一个强有力的工具

什么时候不应该用递归

Efficiency

有时递归函数的时间复杂度更差

递归函数的复杂度计算(代入展开法expansion/substitution)

What if the base case is not O(1)

展开思路

把 T(n) 展开 n 次：

$T(n)=\underbrace{O(1)+O(1)+\cdots+O(1)}_{n\ \text{层}} + T(0)$

前面 n 个常数加起来是 O(n)，再加上 base case 的成本：

$T(n)=O(n)+T(0)$

而 T(0)=O(f(n))，所以：

$T(n)=O(f(n)) + O(n)$

What if the recursive call is not O(1)

常用模板

1.递归一定要有

Base case（终止条件）
Recursive step（递归步骤）

2.找 T(n)=O(f(n)) 用代入展开法(expansion/substitution)

如果

T(0) = O(f(n)) and T(n) = O(g(n)) + **T(n-c)**则 T(n) = O(f(n)) + O(g(n)) * O(n)

如果T(0) = O(f(n)) and T(n) = O(g(n)) + T(n/c) 则 T(n) = O(f(n)) + O(g(n)) * O(log(n))

Lecture 4: P and NP

什么时候一个算法的复杂度算 good（好)

取决于问题本身有多难

有些问题天生就能做到很快（比如排序 $O(n\log n)$ ），有些问题目前已知就很难，可能只能用指数/阶乘级算法，或者只能做近似。

When is a problem easy or hard?

****

Cook-Levin Theorem

Let X be any problem in NP. Then X ∝ SAT

What is NP?

定义:

一个问题 X 是判定问题，如果：

对任意输入实例，答案只能是 Yes 或 No
答案完全由输入实例的细节决定（也就是它是确定的、可由输入唯一决定）

第 2 点是在排除那种“带随机性/不可确定”的提问：同一个输入不应该出现时而 Yes 时而 No 的情况。

Will a this coin land Heads up the next time it is tossed?

看起来是 Yes/No，但不是由输入完全决定（现实中有随机性/不确定性），同样的“硬币+环境描述不充分”不能唯一确定结果，所以不算

P（Polynomial，多项式时间）

P 的定义

所有“判定问题（Yes/No）”中，能够被 确定性图灵机（deterministic Turing machine） 在 多项式时间内解决的问题

NP（Nondeterministic Polynomial，非确定性多项式）****

定义 1：非确定性图灵机多项式时间可解

NP = 所有判定问题中，存在一种“非确定性图灵机”能在多项式时间内解决的。

“非确定性图灵机”可以想成：它能在分叉里“瞬间猜到正确的那条路”（课件右边那句话：能立刻正确猜出解）。
这是一种理论模型，不是现实电脑。

定义 2：多项式时间可验证（polynomial-time verifiable）

NP 也等价于：所有判定问题中，只要答案是 Yes，我们就能给出一个“证据/证书（certificate）”，并且可以在多项式时间内验证这个证据是真的。

课件右下角的意思：

当给你一个“Yes”的细节（证据）时，你能在多项式时间内确认它正确。

P and NP

P ⊆ NP

如果 **P=NP$*：很多现在“只会验证、不会求解”的问题，其实都存在多项式时间算法。

如果 **P != NP$*：就说明有些问题确实只能“给答案你能验”，但你自己找答案可能要花指数级时间。

Reduction（归约）

设 X, Y 是判定问题。如果每个 $X$ 的实例都能变成一个 $Y$ 的实例，并且：

转换过程是多项式时间完成的
答案保持一致（answer preserving）
Y 的答案是 “yes” 当且仅当 X 的答案是 “yes”

那么说 X 归约到 Y，记作：X∝Y

SAT

给定一个包含若干布尔变量（只能取 True/False）的逻辑表达式 E，问：

是否存在一种对这些变量的 True/False 赋值，使得整个表达式 E 的结果为 True？

如果存在，就回答 Yes（可满足）；不存在就是 No（不可满足）。

$E=(x1∧¬x2) ∨ ¬x1$

$E=(x1∧¬x2) ∧ (x2∨¬x1)$

Lecture 5: NP-Complete and Reduction

证明NP-Complete

步骤:

1.证明ETS is in NP (polynomial-time verification)

2.Reduction from k-Coloring to ETS

(1) ETS is in NP (polynomial-time verification).

A solution for ETS is a proposed sched-ule/assignment To verify the solution, we can check every student and ensure that no two exams in in a single students list share the same timeslot. The growth of this increases with respect to the number of students multiplied by the number of exams which is polynomial time:
(2) Reduction from k-Coloring to ETS.

We reduce from the NP-complete problem k-Coloring: Input to k-Coloring: an undirected graph G = (V, EG) and an integer k ≥ 3. Construct an ETS instance (E, S, R, k) as follows:

Create one exam for each vertex:

$E := V$

Create one student for each edge. For every edge {u, v} ∈ EG, create a student suv:

$S := \{ s_{uv} : \{u, v\} ∈ E_G \}.$

Define the relation R (exams a student will take) so that each student corresponding to an edge must take exactly the two endpoint exams:

$R(s_{uv}) := \{ u, v \}\$
Keep the same integer k which now represents the number of timeslots instead of colors.

(3) The reduction runs in polynomial time. The construction creates:
• |V | exams,
• |EG| students,
• and each student is mapped to a set of size exactly 2.
So the output instance size is O(|V | + |EG|), and constructing it clearly takes polynomial time.
(4) The reduction is answer-preserving.
(4.1) If k-Coloring is YES, then ETS is YES.

Assume G is k-colorable. So there exists a proper coloring such that for every edge the two vertices it is attached to are different colors. We build an ETS schedule by assigning each exam (vertex) to the timeslot given by its color: Now consider any student corresponding to an edge. They take the two exams (vertices) their edge is connected to, since those vertices are different colors, their exams must be in different timeslots.
Therefore, no student has two exams in the same timeslot, and ETS is a YES instance.

(4.2) If ETS is YES, then k-Coloring is YES.

Assume the constructed ETS instance has a valid schedule. We define a vertex coloring for G by coloring each vertex with its exam timeslot. Every edge in the graph corresponds to a student in the ETS. Since the ETS schedule is valid, that student cannot have both exams in the same timeslot, so the coloring of the two vertices must be different. Therefore we get a proper k-coloring of the graph. So k-Coloring is a YES instance.