A Book Review of "Architecting Modern Data Platforms"

1.1 Billion Taxi Rides: Spark 2.4.0 versus Presto 0.214

1.1 Billion Taxi Rides: 108-core ClickHouse Cluster

Convert CSVs to ORC Faster

Working with the Hadoop Distributed File System (HDFS)

Systems Monitoring: top vs Htop vs Glances

Working with Data Feeds

A Minimalist Guide to Microsoft SQL Server 2017 on Ubuntu Linux

1.1 Billion Taxi Rides with SQLite, Parquet & HDFS

Customising Airflow: Beyond Boilerplate Settings

Using SQL to query Kafka, MongoDB, MySQL, PostgreSQL and Redis with Presto

Python & Big Data: Airflow & Jupyter Notebook with Hadoop 3, Spark & Presto

1.1 Billion Taxi Rides Benchmark: EC2 versus EMR

Hadoop 3 Single-Node Install Guide

1.1B Taxi Rides with 20 Nvidia Tesla P100s and BrytlytDB

1.1 Billion Taxi Rides with BrytlytDB 2.0 & 2x p2.16xls

A Minimalist Guide to SQLite (with Python 3)

1.1 Billion Taxi Trips on 3 Raspberry Pis Running Spark 2.2

1.1B taxi rides benchmark on the GPU- and PostgreSQL-powered BrytlytDB

Compiling MapD's Source Code

1.1B taxi rides benchmarked on distributed GPU-powered MapD

Detecting Bots in Apache and Nginx Logs Using Python

Doom Bots in TensorFlow

Analysing Petabytes of Websites with PySpark

Summary of the 1.1 Billion Taxi Rides Benchmarks

1.1B Taxi Rides on Kdb+/q and 4 Xeon Phi CPUs

1.1B Taxi Rides with MapD and 8 Nvidia Pascal Titan Xs

Using Python, Postgres & Redis to create a Forex Data Pipeline

Database with 4 Nvidia Gaming GPUs 43X Faster Than CPU Clusters

A Billion NYC Taxi Rides in PostgreSQL
