Bag-of-Words and TF-IDF Tutorial

In information retrieval and text mining, TF-IDF, short for term frequency-inverse document frequency, is a numerical statistic (a weight) intended to reflect how important a word is to a...

Bash Shell Tutorial

What is a shell? Traditionally, when you log into a Unix system, the system starts one program for you. That program is a shell, i.e., a program designed to...

GloVe

The core concept of word embeddings is that every word used in a language can be represented by a set of real numbers (a vector). Word embeddings are N-dimensional vectors...

Simple Running Average using Window Functions of PostgreSQL

Recently, a friend of mine discovered a weird but interesting property of ORDER BY while using a window function, and it intrigued me. I did not know about it and I wanted to...

Principal Component Analysis

CURSE OF DIMENSIONALITY: As the number of features (dimensionality) increases, the data becomes relatively more sparse, and often exponentially more samples are needed to make statistically significant predictions. Curse of...