You can find this article and its source code in my GitHub repo.
A Deadly Simple Neural Network
If you have ever heard of neural networks or looked at some material about them, a model like the one in the figure below should look familiar.
Well, that's a little bit complicated, so what about this one?
Let's briefly explain each component in the figure. Each circle represents a unit (or a neuron), and each square represents a calculation. The three leftmost units form the input layer. The neuron with an h inside is the only neuron in this network's output layer.
Recall that a biological neuron has a threshold that must be reached before it activates. In our neural network, a neuron applies an activation function to its input and sends the result on as its output. A major advantage of this design is its flexibility: the activation function can be almost any function you choose, such as a step function, a polynomial, or the sigmoid. The output unit returns f(h), where h is the input to the output unit, and y is the output of the neural network.
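To make the idea of a swappable activation function concrete, here is a minimal sketch of one output unit that computes h from its inputs and then applies whichever f(h) it is given. The helper names (`step`, `neuron_output`) and the step threshold at 0 are assumptions for illustration; the input, weight, and bias values match those used in the code later in this section.

```python
import numpy as np

def step(h):
    # Step activation (threshold assumed at 0): outputs 1.0 if h >= 0, else 0.0
    return np.where(h >= 0, 1.0, 0.0)

def sigmoid(h):
    # Sigmoid activation: a smooth version of the step, with outputs in (0, 1)
    return 1 / (1 + np.exp(-h))

def neuron_output(inputs, weights, bias, activation):
    # h: the input to the output unit (weighted sum of the inputs plus the bias)
    h = np.dot(weights, inputs) + bias
    # y = f(h): the output of the network
    return activation(h)

inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1

for f in (step, sigmoid):
    print(f.__name__, neuron_output(inputs, weights, bias, f))
```

With these values h is about -0.27, so the step unit outputs 0 while the sigmoid unit outputs a value a little below 0.5.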
If you let f(h) = h be your activation function, then the output of the network is $y = \sum_i w_i x_i + b$; note that here y = f(h).
You are correct if you think this is just the linear regression model. Once you use activation functions that are continuous and differentiable, it becomes possible to train the network with gradient descent, which requires the first derivative of the activation function. Before we dive into the training process, let's code the dead simple neural network in Python, using the sigmoid function as the activation function. Don't worry that, for now, this network can only make predictions with a feedforward pass; learning (training) through backpropagation is a separate step.
import numpy as np
def sigmoid(x):
    # sigmoid function
    return 1/(1 + np.exp(-x))
inputs = np.array([0.7, -0.3])
weights = np.array([0.1, 0.8])
bias = -0.1
# calculate the output
output = sigmoid(np.dot(weights, inputs) + bias)
print('Output:')
print(output)
You can find the code here.
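Training with gradient descent needs the first derivative of the activation function. For the sigmoid there is a convenient closed form, σ'(x) = σ(x)(1 − σ(x)). The sketch below implements it and checks it against a finite-difference approximation; the name `sigmoid_prime` and the numerical check are illustrative additions, not part of the original code.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid activation: 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    # Derivative of the sigmoid, expressed through the sigmoid itself
    s = sigmoid(x)
    return s * (1 - s)

# Compare the closed form with a central finite-difference approximation
x = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.max(np.abs(numeric - sigmoid_prime(x))))
```

The derivative peaks at 0.25 when x = 0 and shrinks toward zero for large |x|, which is worth keeping in mind once training begins.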
Your First 2-Layer NN
By now, you should have a basic idea of how a neural network makes predictions. For real-world problems, however, such a simple network is rarely enough. This is where a new concept comes in: the hidden layer.
In the previous simple network, our weights formed a vector, but in the more common case the weights form a matrix like the one below (this is the weight matrix represented in the figure above).
From the structure of the 2-layer neural network, you may already see how to calculate h1. Let's write it as a mathematical formula.
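In general notation (our own, assumed here since the original formula appears only as an image), let $w_{ij}$ be the entry in row $i$ and column $j$ of the weight matrix, i.e. the weight from input $i$ to hidden unit $j$. The input to hidden unit $j$ is then a weighted sum over the $n$ inputs, and stacking all hidden units gives a vector-matrix product, which is what `np.dot(X, weights_in_hidden)` computes in the code later on (no bias term, matching that code):

$$
h_j = \sum_{i=1}^{n} x_i \, w_{ij}, \qquad \mathbf{h} = \mathbf{x} W
$$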
And for our case,
Note: The weight indices have changed in the above image and no longer match up with the labels used in the earlier diagrams. That's because, in matrix notation, the row index always precedes the column index, so it would be misleading to label them the way we did in the neural net diagram.
Remember, the above is not a correct view of the indices, but it uses the labels from the earlier neural net diagrams to show you where each weight ends up in the matrix.
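To check the row/column convention described in the note, here is a sketch that computes each hidden-layer input with explicit loops, treating `W[i, j]` as the weight from input i to hidden unit j, and compares the result with `np.dot`. Unlike the single weight vector of the first network, the (3, 2) matrix yields one input value per hidden unit. The random data and the helper name `hidden_inputs_loop` are hypothetical.

```python
import numpy as np

def hidden_inputs_loop(x, W):
    # W[i, j]: row i is the input unit, column j is the hidden unit
    n_input, n_hidden = W.shape
    h = np.zeros(n_hidden)
    for j in range(n_hidden):
        for i in range(n_input):
            h[j] += x[i] * W[i, j]   # h_j = sum over i of x_i * w_ij
    return h

rng = np.random.default_rng(42)
x = rng.normal(size=3)                   # three input values
W = rng.normal(scale=0.1, size=(3, 2))   # rows: inputs, columns: hidden units

print(hidden_inputs_loop(x, W))
print(np.dot(x, W))   # same values, computed as a vector-matrix product
```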
Combining this with the formula we learned in the first section, we can implement the 2-layer neural network! The activation function used here is the sigmoid function.
Things to do:
- Calculate the input to the hidden layer.
- Calculate the hidden layer output.
- Calculate the input to the output layer.
- Calculate the output of the network.
import numpy as np
def sigmoid(x):
    # sigmoid function
    return 1/(1+np.exp(-x))
# Network size
N_input = 3
N_hidden = 2
N_output = 1
np.random.seed(42)
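# Make some fake data: one example with N_input values
# (its length must match the number of rows in weights_in_hidden)
X = np.random.randn(N_input)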
weights_in_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_out = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))
hidden_layer_in = np.dot(X, weights_in_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)
print('Hidden-layer Output:')
print(hidden_layer_out)
output_layer_in = np.dot(hidden_layer_out, weights_hidden_out)
output_layer_out = sigmoid(output_layer_in)
print('Output-layer Output:')
print(output_layer_out)
You can find the code here.
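The same forward pass extends to several examples at once. In this sketch (an extension beyond the original code), each row of the hypothetical `X_batch` is one example, so the two matrix products compute the hidden and output layers for every row in a single call.

```python
import numpy as np

def sigmoid(x):
    # sigmoid function
    return 1 / (1 + np.exp(-x))

# Network size, as in the code above
N_input, N_hidden, N_output = 3, 2, 1
np.random.seed(42)

X_batch = np.random.randn(5, N_input)   # 5 hypothetical examples, one per row
weights_in_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_out = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# (5, 3) x (3, 2) -> (5, 2): hidden-layer output for every example
hidden_layer_out = sigmoid(np.dot(X_batch, weights_in_hidden))
# (5, 2) x (2, 1) -> (5, 1): network output for every example
output_layer_out = sigmoid(np.dot(hidden_layer_out, weights_hidden_out))

print(output_layer_out.shape)
print(output_layer_out)
```

Each row of the result equals what the single-example forward pass produces for that row's inputs.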
Reference
- All formulas were generated with HostMath
- Some figures are taken from the Udacity deep learning course
- [Pattern Recognition] Multilayer Perceptron (MLP), originally titled 【模式識別】多層感知器 MLP
- CS224d: TensorFlow Tutorial
- CS231n Winter 2016, Lecture 4: Backpropagation, Neural Networks
Thanks for reading. If you find any mistake or typo in this blog, please don't hesitate to let me know. You can reach me by email at jyang7[at]ualberta.ca