Columnar Storage

An important part of the Hadoop ecosystem is HDFS, Hadoop’s distributed file system. Like other file systems the format of the files you can store on HDFS is entirely up...

Using PySpark to connect to PostgreSQL locally

Apache Spark is a unified open-source analytics engine for large-scale data processing a distributed environment, which supports a wide array of programming languages, such as Java, Python, and R, eventhough...

Hierarchical clustering

Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is...

RGB to Grayscale Conversion

Consider a color image, given by its red, green, blue components R, G, B. The range of pixel values is often 0 to 255. Color images are represented as multi-dimensional...

How to create virtual environments for different applications?

Just imagine that you have an application which is fully developed and you do not want to make any changes to the libraries it is using but at the same...