Implementing the Softmax Function in Python
The softmax function is used in various multiclass classification methods. It takes an unnormalized vector and normalizes it into a probability distribution. It is often used in neural networks to map unnormalized outputs to a probability distribution over predicted output classes. It is applied to a vector $x \in \mathbb{R}^{K}$ and returns a vector in $[0, 1]^{K}$ whose elements sum to 1; in other words, the softmax function converts an arbitrary vector of real numbers into a discrete probability distribution:
\[S(x)_j = \frac{e^{x_j}}{\sum_{k=1}^K e^{x_k}} \;\;\;\text{ for } j=1, \dots, K\]

Python implementation
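A straightforward NumPy sketch is shown below; the function name `softmax` and the max-subtraction step are my own choices rather than anything fixed by the formula. Subtracting the maximum before exponentiating is a common numerical-stability trick: the shared factor $e^{-\max(x)}$ cancels in the ratio, so the result is unchanged, but overflow for large inputs is avoided.

```python
import numpy as np

def softmax(x):
    """Map a vector of real numbers to a discrete probability distribution."""
    # Shift by the max for numerical stability; the constant cancels
    # in the ratio, so the output is mathematically identical.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)
```

For example:

```python
>>> softmax(np.array([1.0, 2.0, 3.0]))
array([0.09003057, 0.24472847, 0.66524096])
```

Each element lies in $[0, 1]$ and the entries sum to 1, as required.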
NOTE: The sigmoid function is a special case of the softmax function, and this is easy to prove. Whereas the softmax outputs a valid probability distribution over $K > 2$ distinct outputs, the sigmoid does the same for $K = 2$.
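To see this, take $K = 2$ and look at the first component; dividing numerator and denominator by $e^{x_1}$ recovers the sigmoid $\sigma$ applied to the difference of the two inputs:

\[S(x)_1 = \frac{e^{x_1}}{e^{x_1} + e^{x_2}} = \frac{1}{1 + e^{-(x_1 - x_2)}} = \sigma(x_1 - x_2)\]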