Back Propagation (BP)
The code is from here. Here is his blog.
1. The four fundamental equations behind backpropagation
Definition of the error of neuron j in layer l (note: the derivative is taken with respect to the weighted input z, not the activation a):
$$δ^l_j≡\frac{∂C}{∂z^l_j}$$
==>
An equation for the error in the output layer:
$$δ^L_j=\frac{∂C}{∂a^L_j}σ′(z^L_j)$$
(BP1)
- L denotes the last layer
- this follows directly from the chain rule, spelled out below
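Spelling out that chain-rule step, using $a^L_j=σ(z^L_j)$:
$$δ^L_j=\frac{∂C}{∂z^L_j}=\frac{∂C}{∂a^L_j}\frac{∂a^L_j}{∂z^L_j}=\frac{∂C}{∂a^L_j}σ′(z^L_j)$$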
==>
$$δ^L=∇_aC⊙σ′(z^L)$$
(BP1a)
- the matrix-based form of BP1, where ⊙ is the elementwise (Hadamard) product
An equation for the error $δ^l$ in terms of the error in the next layer:
$$δ^l=((w^{l+1})^Tδ^{l+1})⊙σ′(z^l)$$
(BP2)
- this is the backpropagation step: the error is propagated backward through the weights!
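In component form, BP2 is again just the chain rule, applied across layer $l+1$ (using $z^{l+1}_k=\sum_j w^{l+1}_{kj}σ(z^l_j)+b^{l+1}_k$):
$$δ^l_j=\sum_k\frac{∂C}{∂z^{l+1}_k}\frac{∂z^{l+1}_k}{∂z^l_j}=\sum_k w^{l+1}_{kj}δ^{l+1}_kσ′(z^l_j)$$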
An equation for the rate of change of the cost with respect to any bias:
$$\frac{∂C}{∂b^l_j}=δ^l_j$$
(BP3)
An equation for the rate of change of the cost with respect to any weight:
$$\frac{∂C}{∂w^l_{jk}}=a^{l-1}_kδ^l_j$$
(BP4)
BP3 and BP4 are what we actually want (the gradients); BP1 gives the starting point, and BP2 carries the error backward from there, as in the sketch below.
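A minimal sketch putting BP1–BP4 together for a single training example. It assumes column-vector activations and the quadratic cost (so $∇_aC=a^L−y$); `backprop`, `sigmoid`, and `sigmoid_prime` follow the naming conventions of the accompanying code, but this is an illustrative reimplementation, not the original:
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(weights, biases, x, y):
    """Return (nabla_b, nabla_w): per-layer gradients of the quadratic
    cost for one training example (x, y)."""
    # forward pass: store every weighted input z and activation a
    activation, activations, zs = x, [x], []
    for b, w in zip(biases, weights):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # BP1: error in the output layer (grad of the quadratic cost is a - y)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta                          # BP3
    nabla_w[-1] = delta @ activations[-2].T      # BP4
    # BP2: propagate the error backward, layer by layer
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta                          # BP3
        nabla_w[-l] = delta @ activations[-l - 1].T  # BP4
    return nabla_b, nabla_w
```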
That’s all for BP.
2. Basic Implementation
Cost function:
$$C(w,b)≡\frac{1}{2n}\sum_x \|y(x)−a\|^2$$
- w: the collection of all weights in the network
- b: all the biases
- n: the number of training inputs; a: the network's output for input x
- this is the quadratic cost function (a sketch in code follows this list)
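A quick sketch of this formula in code (`quadratic_cost` and `feedforward` are hypothetical names, not from the original code):
```python
import numpy as np

def quadratic_cost(data, feedforward):
    # data: list of (x, y) training pairs
    # feedforward(x): returns the network output a for input x
    n = len(data)
    return sum(np.linalg.norm(y - feedforward(x)) ** 2
               for x, y in data) / (2.0 * n)
```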
$$w_k→w′_k=w_k−η\frac{∂C}{∂w_k}$$
$$b_k→b′_k=b_k−η\frac{∂C}{∂b_k}$$
This is the gradient descent update rule (η is the learning rate).
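One update step as a sketch in code, assuming per-layer weight/bias lists and gradient lists such as those returned by the `backprop` sketch above (`gradient_descent_step` is a hypothetical helper name):
```python
def gradient_descent_step(weights, biases, nabla_w, nabla_b, eta):
    # apply w -> w - eta * dC/dw and b -> b - eta * dC/db per layer;
    # nabla_w / nabla_b hold the gradients, e.g. averaged over a mini-batch
    weights = [w - eta * nw for w, nw in zip(weights, nabla_w)]
    biases = [b - eta * nb for b, nb in zip(biases, nabla_b)]
    return weights, biases
```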
The sigmoid is used as the activation function:
```python
import numpy as np

def sigmoid(z):
    # the sigmoid 1 / (1 + e^(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))
```
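BP1 and BP2 also need the derivative σ′. Since σ′(z) = σ(z)(1 − σ(z)), it is a one-liner (a sketch; `sigmoid_prime` mirrors the naming above):
```python
def sigmoid_prime(z):
    # derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    return sigmoid(z) * (1.0 - sigmoid(z))
```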
1 | """ |
Test with the MNIST dataset:
```python
import mnist_loader
from network import Network  # the Network class from the accompanying code

# load (training_data, validation_data, test_data)
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input neurons (28x28 pixel images), 30 hidden, 10 output
net = Network([784, 30, 10])

# train for 30 epochs, mini-batch size 10, learning rate eta = 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)
```
Sample output:
```
Epoch 1: 9171 / 10000
```
3. Other details I want to mention …
Matrix element W[j,k] is the weight of the connection from the kth neuron in one layer to the jth neuron in the next layer, so the forward pass is simply a′ = σ(Wa + b), as the quick check below shows.
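A quick numpy check of this convention (hypothetical sizes: a layer of 3 neurons feeding a layer of 2):
```python
import numpy as np

w = np.zeros((2, 3))   # w[j, k]: weight from neuron k to neuron j
a = np.ones((3, 1))    # activations of the previous layer (column vector)
b = np.zeros((2, 1))   # biases of the next layer
z = w @ a + b          # shape (2, 1): this ordering makes w @ a work directly
a_next = 1.0 / (1.0 + np.exp(-z))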