Bag-of-Words and TF-IDF Tutorial
In information retrieval and text mining, TF-IDF, short for term frequency-inverse document frequency, is a numerical statistic (a weight) that is intended to reflect how important a word is to a...
Bash Shell Tutorial
What is a shell? Traditionally, when you log into a Unix system, the system would start one program for you. That program is a shell, i.e., a program designed to...
GloVe
The core concept of word embeddings is that every word used in a language can be represented by a set of real numbers (a vector). Word embeddings are N-dimensional vectors...
Simple Running Average using Window Functions of PostgreSQL
Recently, a friend of mine discovered a weird but interesting property of ORDER BY while using a window function, and it intrigued me. I did not know about it and I wanted to...
Principal Component Analysis
Curse of dimensionality: As the number of features (dimensionality) increases, the data becomes relatively more sparse, and often exponentially more samples are needed to make statistically significant predictions. Curse of...