Derivative of Loss Function w.r.t. Softmax Function
The softmax function is given by: \[S(x_{i}) = S_{i} = \frac{e^{x_i}}{\sum_{k=1}^K e^{x_k}} \;\;\;\text{ for } i = 1, \dots, K\] Softmax is fundamentally a vector function: it takes a vector as...
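As a quick illustration of the formula above, here is a minimal NumPy sketch of the softmax (the shift by the maximum is a standard numerical-stability trick, and the function name softmax is ours, not from the post):

import numpy as np

def softmax(x):
    # Shifting by max(x) leaves the result unchanged, since the constant
    # cancels in the ratio e^{x_i - c} / sum_k e^{x_k - c}.
    z = np.exp(x - np.max(x))
    return z / z.sum()

s = softmax(np.array([1.0, 2.0, 3.0]))
print(s, s.sum())  # components in (0, 1), summing to 1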
Backpropagation Through Time for Recurrent Neural Network
The dynamical system is defined by: \[\begin{split} h_{t} & = f_{h} (X_{t}, h_{t-1})\\ \hat{y}_{t} &= f_{o}(h_{t}) \end{split}\] A conventional RNN is constructed by defining the transition function and the output...
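To make the two maps concrete, here is a minimal sketch of one step of a vanilla RNN in NumPy, assuming the common choices of $\tanh$ of an affine map for $f_{h}$ and a linear readout for $f_{o}$ (the weight names Wxh, Whh, Why are ours):

import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why, bh, by):
    # h_t = f_h(x_t, h_{t-1}): tanh of an affine map of input and previous state.
    h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)
    # y_hat_t = f_o(h_t): linear readout of the hidden state.
    return h_t, Why @ h_t + by

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 2  # toy dimensions
Wxh = rng.normal(size=(d_h, d_in))
Whh = rng.normal(size=(d_h, d_h))
Why = rng.normal(size=(d_out, d_h))
h_t, y_hat_t = rnn_step(rng.normal(size=d_in), np.zeros(d_h), Wxh, Whh, Why, np.zeros(d_h), np.zeros(d_out))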
Expected value and variance of sum of a random number of i.i.d. random variables
Let $N$ be a random variable taking positive integer values $1, 2, 3, \dots$, and let $X_{1}, X_{2}, \dots$ be a sequence of independent random variables that are also independent of $N$ and...
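For reference, writing $S = \sum_{i=1}^{N} X_{i}$ and assuming the $X_{i}$ have common mean $\mu$ and variance $\sigma^{2}$ (symbols we introduce here, presumably matching the post), conditioning on $N$ gives the two standard identities: \[\mathbb{E}[S] = \mathbb{E}\left[\mathbb{E}[S \mid N]\right] = \mathbb{E}[N\mu] = \mu\,\mathbb{E}[N]\] \[\operatorname{Var}(S) = \mathbb{E}\left[\operatorname{Var}(S \mid N)\right] + \operatorname{Var}\left(\mathbb{E}[S \mid N]\right) = \sigma^{2}\,\mathbb{E}[N] + \mu^{2}\,\operatorname{Var}(N)\]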
Derivation of Softmax Function
In this post, we talked a little about the softmax function and how to implement it easily in Python. Now we will go into a bit more detail and learn how...
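A sketch of the kind of detail this post works toward, assuming it derives the softmax Jacobian $\partial S_{i} / \partial x_{j} = S_{i}(\delta_{ij} - S_{j})$ (a standard result; the helper names below are ours):

import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x):
    # dS_i/dx_j = S_i * (delta_ij - S_j): a diagonal term minus an outer product.
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

J = softmax_jacobian(np.array([1.0, 2.0, 3.0]))
print(J.sum(axis=0))  # each column sums to ~0 because the outputs sum to 1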
Dimensions of matrices in an LSTM Cell
A general LSTM unit (not a cell: an LSTM cell consists of multiple units, and several LSTM cells form one LSTM layer) is shown below (source). The equations below...
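As a shape check on those equations, here is a minimal sketch of one LSTM step in NumPy, assuming the standard gate formulation (input, forget, and output gates plus a candidate state); the parameter names are ours. For input size d and hidden size h, each gate uses a W of shape (h, d), a U of shape (h, h), and a bias of shape (h,):

import numpy as np

def lstm_step(x_t, h_prev, c_prev, p):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    i = sigmoid(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])  # input gate
    f = sigmoid(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])  # forget gate
    o = sigmoid(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])  # output gate
    g = np.tanh(p['Wc'] @ x_t + p['Uc'] @ h_prev + p['bc'])  # candidate state
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

d, h = 3, 5  # toy input and hidden sizes
rng = np.random.default_rng(1)
p = {}
for g in 'ifoc':
    p['W' + g] = rng.normal(size=(h, d))
    p['U' + g] = rng.normal(size=(h, h))
    p['b' + g] = np.zeros(h)
h_t, c_t = lstm_step(rng.normal(size=d), np.zeros(h), np.zeros(h), p)
print(h_t.shape, c_t.shape)  # (5,) (5,)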