4) Back-propagation (20 points)

Suppose we train a 2-layer fully connected neural network with the following structure for regression:

- Input layer: 2-dimensional input vector.
- Hidden layer: 3 hidden units with ReLU non-linear activation.
- Output layer: 1 scalar output without any non-linear activation.

Note that the ReLU activation is defined as ReLU(x) = max(0, x). We also use the following notation:

- x is the training input vector, y denotes the true target, and ŷ is the output of the neural network. All vectors are column vectors. Note that x has two elements, while y and ŷ each have a single element (i.e., they are scalars).
- We use the L2 squared loss for regression, i.e., L = (y − ŷ)².
- g denotes the vector of hidden-unit values before the non-linear activation is applied, and h denotes the vector of hidden-unit values after it is applied.
- V is the weight matrix that maps the input layer to the hidden layer (assume NO bias).
- W is the weight matrix that maps the hidden layer to the output layer (assume NO bias).

(a) (3 pts) Write down the corresponding math expression for each layer using the notation defined above. (Hint: you may have 3 equations.)

(b) (6 pts) Calculate ∂L/∂W.

(c) (6 pts) Calculate ∂L/∂V.

(d) (5 pts) Suppose we have n training samples, i.e., X = {x₁, x₂, ..., xₙ}. Briefly describe how you would use stochastic gradient descent to update the model parameters V and W. Write down the update rule for each iteration. Assume the learning rate is η.
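For concreteness, here is a minimal NumPy sketch of the setup the question describes. It is an illustration, not the official solution key: the shapes V ∈ ℝ^(3×2) and W ∈ ℝ^(1×3) are assumptions consistent with the stated architecture (no biases), and the gradient expressions are one standard application of the chain rule to L = (y − ŷ)² with g = Vx, h = ReLU(g), ŷ = Wh.

```python
import numpy as np

# Illustrative sketch of the network in the question (assumed shapes:
# V is 3x2, W is 1x3; no bias terms, matching the problem statement).
rng = np.random.default_rng(0)
V = rng.standard_normal((3, 2))   # input -> hidden weights
W = rng.standard_normal((1, 3))   # hidden -> output weights
eta = 0.01                        # learning rate (η in the question)

def forward(x):
    """Forward pass: g = Vx, h = ReLU(g), y_hat = Wh."""
    g = V @ x                     # pre-activations, shape (3, 1)
    h = np.maximum(0.0, g)        # ReLU activations, shape (3, 1)
    y_hat = (W @ h).item()        # scalar network output
    return g, h, y_hat

def gradients(x, y):
    """Chain-rule gradients of L = (y - y_hat)^2 w.r.t. W and V."""
    g, h, y_hat = forward(x)
    dL_dyhat = -2.0 * (y - y_hat)   # scalar
    dL_dW = dL_dyhat * h.T          # shape (1, 3)
    dL_dh = dL_dyhat * W.T          # shape (3, 1)
    dL_dg = dL_dh * (g > 0)         # ReLU gate: derivative is 1 where g > 0
    dL_dV = dL_dg @ x.T             # shape (3, 2)
    return dL_dW, dL_dV

# One stochastic-gradient step on a single sample (x_i, y_i):
x_i = rng.standard_normal((2, 1))
y_i = 1.0
dW, dV = gradients(x_i, y_i)
W -= eta * dW
V -= eta * dV
```

In each SGD iteration one sample is drawn from X, the gradients above are computed for that sample alone, and the parameters move against the gradient: W ← W − η ∂L/∂W and V ← V − η ∂L/∂V.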