Backpropagation is a classic algorithm in neural networks, so I am starting this post to record my own understanding of it; it will be refined and extended as my study goes deeper.
During the forward pass, the hidden layers use the sigmoid function as the activation function, and the output layer uses softmax as the output function.
The cost function is the mean squared error (MSE), $J = \displaystyle \frac{1}{2}\sum_{i = 1}^{M}(t_i - y_i)^2$, and optimization is done with stochastic gradient descent (SGD).
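For concreteness, here is a minimal NumPy sketch of these three building blocks; the function names are illustrative and not from the original code:

```python
import numpy as np

def sigmoid(z):
    # Hidden-layer activation: 1 / (1 + e^{-z}), applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Output function: shift by max(z) for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def mse_cost(y, t):
    # Cost J = 1/2 * sum_i (t_i - y_i)^2 for a single training example.
    return 0.5 * np.sum((t - y) ** 2)
```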
Pseudocode for the main framework is given below.
```python
def backPropNeuralNetwork(training_data, epochs):
    ...

def update_mini_batch(mini_batch, learning_rate):
    ...
```
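A runnable sketch of how these two routines might be fleshed out, assuming a Nielsen-style network whose weights and biases are stored as lists of NumPy arrays and whose `training_data` is a list of `(x, t)` column-vector pairs. The `Network` class, the `backprop` helper, and the extra `mini_batch_size` / `learning_rate` parameters are my own assumptions, not part of the original pseudocode:

```python
import random
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

class Network:
    def __init__(self, sizes):
        # sizes, e.g. [784, 30, 10]; weights[l] has shape (sizes[l+1], sizes[l]).
        self.sizes = sizes
        self.biases = [np.random.randn(n, 1) for n in sizes[1:]]
        self.weights = [np.random.randn(n, m) for m, n in zip(sizes[:-1], sizes[1:])]

    def backPropNeuralNetwork(self, training_data, epochs, mini_batch_size=10, learning_rate=0.5):
        # SGD outer loop: shuffle the data, split into mini-batches, update per batch.
        for epoch in range(epochs):
            random.shuffle(training_data)
            mini_batches = [training_data[k:k + mini_batch_size]
                            for k in range(0, len(training_data), mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, learning_rate)

    def update_mini_batch(self, mini_batch, learning_rate):
        # Accumulate per-example gradients, then take one gradient-descent step.
        nabla_b = [np.zeros_like(b) for b in self.biases]
        nabla_w = [np.zeros_like(w) for w in self.weights]
        for x, t in mini_batch:
            db, dw = self.backprop(x, t)
            nabla_b = [nb + d for nb, d in zip(nabla_b, db)]
            nabla_w = [nw + d for nw, d in zip(nabla_w, dw)]
        self.biases = [b - learning_rate / len(mini_batch) * nb
                       for b, nb in zip(self.biases, nabla_b)]
        self.weights = [w - learning_rate / len(mini_batch) * nw
                        for w, nw in zip(self.weights, nabla_w)]

    def backprop(self, x, t):
        # Forward pass: sigmoid on hidden layers, softmax on the output layer.
        a, activations, zs = x, [x], []
        for i, (b, w) in enumerate(zip(self.biases, self.weights)):
            z = w @ a + b
            zs.append(z)
            a = softmax(z) if i == len(self.weights) - 1 else sigmoid(z)
            activations.append(a)
        # Output-layer error for MSE + softmax (formula 1 below).
        y = activations[-1]
        delta = y * ((y - t) + np.sum((t - y) * y))
        nabla_b = [np.zeros_like(b) for b in self.biases]
        nabla_w = [np.zeros_like(w) for w in self.weights]
        nabla_b[-1] = delta
        nabla_w[-1] = delta @ activations[-2].T
        # Hidden layers, back to front (formula 2).
        for l in range(2, len(self.sizes)):
            z = zs[-l]
            sp = sigmoid(z) * (1 - sigmoid(z))
            delta = (self.weights[-l + 1].T @ delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = delta @ activations[-l - 1].T
        return nabla_b, nabla_w
```

With this layout, training would be started as `Network([784, 30, 10]).backPropNeuralNetwork(training_data, epochs=30)`.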
The derivation of the partial derivatives computed by the backpropagation algorithm is as follows.
Cost function: $J = \displaystyle \frac{1}{2}\sum_{i = 1}^{M}(t_i - y_i)^2$
Define the error $\delta_j^l = \displaystyle \frac{\partial J}{\partial z_j^l}$, the partial derivative of the cost with respect to the weighted input of neuron $j$ in layer $l$.
Formula 1: for the output layer. Because softmax couples every output to every $z_j^L$, the sum over $k$ in the chain rule does not collapse to a single term; below, $\mathbb{1}_{k = j}$ denotes 1 when $k = j$ and 0 otherwise.
$\begin{cases} \displaystyle \delta_j^L = \frac{\partial J}{\partial z_j^L}=\sum_k\frac{\partial J}{\partial a _k^L}\frac{\partial a_k^L}{\partial z_j^L} \\ \displaystyle \frac{\partial J}{\partial a_k^L} = \frac{\partial}{\partial y_k}\Big(\frac{1}{2}\sum_{i = 1}^{M}(t_i - y_i)^2\Big) = y_k - t_k \\ \displaystyle \frac{\partial a_k^L }{\partial z_j^L} = \text{softmax}'(z)_{kj} = y_k(\mathbb{1}_{k = j} - y_j) \\ \displaystyle \delta^L_j = \sum_k (y_k - t_k)\,y_k(\mathbb{1}_{k = j} - y_j) = y_j\Big[(y_j - t_j) + \sum_{i = 1}^{M}(t_i - y_i)\,y_i\Big] \end{cases}$
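A quick numerical check of formula 1, comparing the simplified expression against the chain rule applied through the full softmax Jacobian (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=5)                     # pre-activations of the output layer
t = np.eye(5)[2]                           # one-hot target
y = np.exp(z - z.max()); y /= y.sum()      # softmax output

# Chain rule through the full softmax Jacobian: dJ/dz_j = sum_k (y_k - t_k) * dy_k/dz_j
jacobian = np.diag(y) - np.outer(y, y)     # dy_k/dz_j = y_k([k=j] - y_j)
delta_chain = jacobian.T @ (y - t)

# Closed form from formula 1: delta_j = y_j[(y_j - t_j) + sum_i (t_i - y_i) y_i]
delta_formula = y * ((y - t) + np.sum((t - y) * y))

assert np.allclose(delta_chain, delta_formula)
```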
Formula 2: for the hidden layers, using $z_k^{l + 1} = \sum_j w_{kj}^{l + 1}\sigma(z_j^l) + b_k^{l + 1}$, where $\sigma$ denotes the sigmoid function.
$\begin{cases} \displaystyle \delta_j^l = \frac{\partial J}{ \partial z_j^l} = \sum_k \frac{\partial J}{\partial z_k^{l + 1}} \frac{\partial z_k^{l + 1}}{\partial z_j ^l} = \sum_k \frac{\partial z_k^{l + 1}}{\partial z_j ^l}\delta_k^{l + 1} \\ \displaystyle \frac{\partial z_k^{l + 1}}{\partial z_j^l} = w_{kj}^{l + 1}\,\sigma'(z_j^l) \\ \displaystyle \delta _j^l = \sum_k w_{kj}^{l+1}\delta_k^{l + 1}\,\sigma'(z_j^l) \end{cases}$
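In vectorized form this reads $\delta^l = \big((W^{l+1})^{\mathsf T}\delta^{l+1}\big)\odot\sigma'(z^l)$; a small illustrative sketch (shapes and names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_delta(W_next, delta_next, z_l):
    # delta^l = (W^{l+1})^T delta^{l+1}, multiplied element-wise by sigmoid'(z^l)
    sigmoid_prime = sigmoid(z_l) * (1.0 - sigmoid(z_l))
    return (W_next.T @ delta_next) * sigmoid_prime

# Example: layer l has 4 units, layer l+1 has 3 units.
rng = np.random.default_rng(1)
W_next = rng.normal(size=(3, 4))       # w^{l+1}_{kj}, shape (units_{l+1}, units_l)
delta_next = rng.normal(size=(3, 1))   # delta^{l+1}, column vector
z_l = rng.normal(size=(4, 1))          # pre-activations of layer l
print(hidden_delta(W_next, delta_next, z_l).shape)   # (4, 1)
```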
Formula 3: for the biases, $\Delta b_j^l = \displaystyle \frac{\partial J}{\partial b_j^l}= \frac{\partial J}{\partial z_j^l}\frac{\partial z_j^l}{\partial b_j^l} = \delta_j^l$, since $\partial z_j^l / \partial b_j^l = 1$.
Formula 4: for the weights, $\Delta w_{ji}^l = \displaystyle \frac{\partial J}{\partial w_{ji}^l} = \frac{\partial J}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_{ji}^l} = \delta_j^l\, a_i^{l - 1}$.
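Taken together, formulas 3 and 4 say that once $\delta^l$ is known, a layer's per-example gradients are just the error itself and an outer product with the previous layer's activations; a hedged sketch with assumed variable names:

```python
import numpy as np

def layer_gradients(delta_l, a_prev):
    # Formula 3: dJ/db^l_j = delta^l_j                (same shape as the bias vector)
    # Formula 4: dJ/dw^l_{ji} = delta^l_j * a^{l-1}_i (outer product with previous activations)
    nabla_b = delta_l
    nabla_w = delta_l @ a_prev.T
    return nabla_b, nabla_w

# Example: layer l has 3 units, layer l-1 has 4 units.
delta_l = np.array([[0.1], [-0.2], [0.05]])
a_prev = np.array([[1.0], [0.5], [0.0], [2.0]])
nabla_b, nabla_w = layer_gradients(delta_l, a_prev)
print(nabla_b.shape, nabla_w.shape)   # (3, 1) (3, 4)
```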