# Chapter 2 Linear Algebra

Broadcasting: We allow the addition of a matrix and a vector, yielding another matrix: $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{b}$ where $C_{i,j} = A_{i,j} + b_j$. In other words, the vector $\boldsymbol{b}$ is added to each row of the matrix. This shorthand eliminates the need to define a matrix with $\boldsymbol{b}$ copied into each row before doing the addition.
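A minimal sketch of this shorthand in NumPy (the small matrix and vector are hypothetical examples), showing that broadcasting matches the explicit row-copy construction:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([10.0, 20.0, 30.0])

# Broadcasting computes C[i, j] = A[i, j] + b[j] directly,
# without materializing a matrix of copied rows.
C = A + b

# Equivalent explicit construction: copy b into each row first.
C_explicit = A + np.tile(b, (A.shape[0], 1))
assert np.array_equal(C, C_explicit)
```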

Multiplying Matrices: We can think of the matrix product $\boldsymbol{C} = \boldsymbol{AB}$ as computing $C_{i,j}$ as the dot product between row $i$ of $\boldsymbol{A}$ and column $j$ of $\boldsymbol{B}$.
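The dot-product view can be verified entry by entry; a small sketch with randomly generated matrices (shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

C = A @ B  # matrix product

# Each entry C[i, j] is the dot product of row i of A and column j of B.
for i in range(3):
    for j in range(2):
        assert np.isclose(C[i, j], np.dot(A[i, :], B[:, j]))
```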

Range of A: Determining whether Ax=b has a solution thus amounts to testing whether b is in the span of the columns of A.

$$(\boldsymbol{Ax})_i = \sum_j A_{i,j}x_j, \qquad \boldsymbol{Ax} = \sum_j x_j\,\boldsymbol{A}_{:,j}$$
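Both readings of the matrix-vector product can be checked numerically: the row view (each output entry is a dot product) and the column view ($\boldsymbol{Ax}$ is a linear combination of the columns of $\boldsymbol{A}$, which is why solvability of $\boldsymbol{Ax}=\boldsymbol{b}$ is a question about the column span). A small sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

# Row view: (Ax)_i = sum_j A[i, j] * x[j]
row_view = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])

# Column view: Ax = sum_j x[j] * A[:, j], a combination of A's columns
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(A @ x, row_view)
assert np.allclose(A @ x, col_view)
```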

# Chapter 3 Probability and Information Theory

Multinoulli Distribution: a distribution over a single trial with K possible outcomes. If $X_i$ denotes a random variable that takes value 1 when the i-th outcome occurs and 0 otherwise, then exactly one of $X_1, \dots, X_K$ equals 1 in each trial.
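A single multinoulli trial can be sampled as a multinomial draw with one trial, which yields a one-hot vector; the probability vector below is a hypothetical example:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])  # hypothetical probabilities for K = 3 outcomes

rng = np.random.default_rng(2)
# One multinoulli trial: multinomial sampling with n=1 returns a
# one-hot vector with a single 1 at the sampled outcome.
x = rng.multinomial(1, p)

assert x.sum() == 1          # exactly one X_i equals 1
assert set(x.tolist()) <= {0, 1}
```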

$$I(x) = -\log P(x)$$

$$H(x) = \mathbb{E}_{x\sim P}[I(x)] = -\mathbb{E}_{x\sim P}[\log P(x)]$$
KL divergence: given two separate probability distributions P(x) and Q(x) over the same random variable x, the KL divergence measures how different the two distributions are:
$$D_{KL}(P\|Q) = \mathbb{E}_{x\sim P} \Big[\log \frac{P(x)}{Q(x)} \Big] = \mathbb{E}_{x\sim P}[\log P(x) - \log Q(x)]$$

$$H(P,Q)= H(P) + D_{KL}(P||Q)$$
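The identity above, relating cross-entropy to entropy plus KL divergence, can be checked directly for discrete distributions; the two distributions below are hypothetical:

```python
import numpy as np

# Two hypothetical discrete distributions over the same 3 outcomes.
P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.4, 0.4, 0.2])

H_P  = -np.sum(P * np.log(P))                 # entropy H(P)
D_KL =  np.sum(P * (np.log(P) - np.log(Q)))   # KL divergence D_KL(P||Q)
H_PQ = -np.sum(P * np.log(Q))                 # cross-entropy H(P, Q)

# The identity H(P, Q) = H(P) + D_KL(P||Q)
assert np.isclose(H_PQ, H_P + D_KL)
assert D_KL >= 0  # KL divergence is nonnegative
```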

# Chapter 4 Numerical Computation

Optimization: refers to the task of either maximizing or minimizing some function $f(\boldsymbol{x})$ by altering $\boldsymbol x$.

Partial derivative: for functions with multiple inputs, the partial derivative measures how $f$ changes as only the variable $x_i$ increases at the point $\boldsymbol x$, with all other inputs held fixed.

Gradient: the gradient of $f$ is the vector containing all the partial derivatives, denoted $\nabla _\boldsymbol x f(\boldsymbol x)$.
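A quick way to sanity-check a gradient is to compare the analytic partial derivatives against central finite differences; a sketch using a hypothetical test function:

```python
import numpy as np

def f(x):
    # Hypothetical test function: f(x) = x0^2 + 3*x0*x1
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # Analytic gradient: the vector of all partial derivatives
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def numerical_grad(f, x, eps=1e-6):
    # Approximate each partial derivative with a central difference
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.0, 2.0])
assert np.allclose(grad_f(x), numerical_grad(f, x), atol=1e-4)
```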
