Derivative of Loss Function w.r.t. Softmax Function

The softmax function is given by: \[S(x_{i}) = S_{i} = \frac{e^{x_i}}{\sum_{k=1}^K e^{x_k}} \;\;\;\text{ for } i = 1, \dots, K\] Softmax is fundamentally a vector function. It takes a vector as...
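
For reference, here is a minimal NumPy sketch of this formula together with the gradient of a loss through it; cross-entropy is an assumed choice of loss (the excerpt does not name one), and the shift by `max(x)` is only an overflow guard, not part of the definition:

```python
import numpy as np

def softmax(x):
    """S_i = exp(x_i) / sum_k exp(x_k), as defined above.

    Subtracting max(x) is a standard overflow guard; softmax is
    shift-invariant, so the result is unchanged.
    """
    z = np.exp(x - np.max(x))
    return z / z.sum()

def cross_entropy_grad(x, y):
    """Gradient of L = -log S_y with respect to the logits x.

    Cross-entropy is an assumed loss here; composing it with softmax
    gives the classic result dL/dx_i = S_i - 1{i = y}.
    """
    grad = softmax(x)
    grad[y] -= 1.0
    return grad

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))                # a probability vector summing to 1
print(cross_entropy_grad(x, 2))  # S - one_hot(2)
```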

Backpropagation Through Time for Recurrent Neural Networks

The dynamical system is defined by: \[\begin{split} h_{t} & = f_{h} (X_{t}, h_{t-1})\\ \hat{y}_{t} &= f_{o}(h_{t}) \end{split}\] A conventional RNN is constructed by defining the transition function and the output...
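
As a concrete instance of this abstract system, the sketch below unrolls a vanilla RNN forward pass, assuming tanh for the transition $f_{h}$ and an affine readout for $f_{o}$ (the excerpt leaves both functions generic):

```python
import numpy as np

def rnn_forward(X, h0, Wxh, Whh, Why, bh, by):
    """Unroll h_t = tanh(Wxh x_t + Whh h_{t-1} + bh), y_hat_t = Why h_t + by."""
    h, hs, ys = h0, [], []
    for x_t in X:
        h = np.tanh(Wxh @ x_t + Whh @ h + bh)  # h_t = f_h(x_t, h_{t-1})
        hs.append(h)
        ys.append(Why @ h + by)                # y_hat_t = f_o(h_t)
    return hs, ys

# Smoke test with illustrative sizes: input 3, hidden 4, output 2, T = 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
hs, ys = rnn_forward(X, np.zeros(4),
                     rng.normal(size=(4, 3)), rng.normal(size=(4, 4)),
                     rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2))
print(len(hs), ys[0].shape)  # 5 (2,)
```

BPTT backpropagates through exactly this unrolled loop, which is why the hidden states `hs` are stored.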

Expected value and variance of a sum of a random number of i.i.d. random variables

Let $N$ be a random variable assuming positive integer values $1, 2, 3, \dots$. Let $X_{i}$ be a sequence of independent random variables which are also independent of $N$ and...
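
The headline result for this setup is Wald's identity, $\mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[X]$ for the random sum $S = X_{1} + \dots + X_{N}$. A Monte Carlo sketch, using a geometric $N$ and exponential $X_{i}$ purely as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sum():
    """Draw S = X_1 + ... + X_N with N ~ Geometric(0.3), X_i ~ Exp(mean 2).

    These distributions are illustrative; the setup only requires N to
    take positive integer values and the X_i to be i.i.d. and
    independent of N.
    """
    n = rng.geometric(0.3)               # N >= 1 by construction
    return rng.exponential(2.0, n).sum()

samples = np.array([random_sum() for _ in range(200_000)])
print(samples.mean())   # ~ 6.67
print((1 / 0.3) * 2.0)  # E[N] * E[X] = 6.67, matching Wald's identity
```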

Derivation of Softmax Function

In this post, we talked a little about the softmax function and how to easily implement it in Python. Now, we will go into a bit more detail and learn how...
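
One detail such a derivation typically works out is the softmax Jacobian, $\partial S_{i} / \partial x_{j} = S_{i}(\delta_{ij} - S_{j})$ (assuming that is where the truncated excerpt is heading). A small NumPy sketch with a finite-difference check:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax_jacobian(x):
    """J[i, j] = dS_i/dx_j = S_i * (delta_ij - S_j)."""
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

x = np.array([0.5, -1.0, 2.0])
J = softmax_jacobian(x)
eps = 1e-6
# Column j of the numerical Jacobian: central difference along e_j.
J_num = np.stack([(softmax(x + eps * e) - softmax(x - eps * e)) / (2 * eps)
                  for e in np.eye(3)], axis=1)
print(np.allclose(J, J_num, atol=1e-6))  # True
```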

Dimensions of matrices in an LSTM Cell

A general LSTM unit (not a cell: an LSTM cell consists of multiple units, and several LSTM cells form one LSTM layer) is shown below (source). The equations below...
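
A shape-only sketch of those dimensions, assuming the standard four-gate formulation $\text{gate} = \text{act}(W x_{t} + U h_{t-1} + b)$ (the excerpt's own equations are cut off):

```python
import numpy as np

n_x, n_h = 10, 20   # input size and hidden size (illustrative values)

# One (W, U, b) triple per gate: input (i), forget (f), output (o) and
# the cell candidate (g). These are the usual LSTM conventions, assumed
# here because the post's equations are truncated in this excerpt.
W = {g: np.zeros((n_h, n_x)) for g in "ifog"}  # multiplies the input x_t
U = {g: np.zeros((n_h, n_h)) for g in "ifog"}  # multiplies the state h_{t-1}
b = {g: np.zeros(n_h) for g in "ifog"}

x_t, h_prev = np.zeros(n_x), np.zeros(n_h)
for g in "ifog":
    assert (W[g] @ x_t + U[g] @ h_prev + b[g]).shape == (n_h,)

# Parameter count: 4 gates x (n_h*n_x + n_h*n_h + n_h).
print(4 * (n_h * n_x + n_h * n_h + n_h))  # 2480
```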