The no bulls*t guide to joining data in SQL! [With Infographic]

0
The no bulls*t guide to joining data in SQL! [With Infographic]

In the previous guide to SQL we helped you understand how you can extract data from a single table using SQL but what if we have to extract data from these two tables simultaneously and use them together? Well, this is a situation encountered by data scientists very often. This guide explores how we can do just this with a neat infographic! SQL offers us ways in which we can do this using a few simple lines of code or queries involving ‘joins’.

A JOIN clause is used to combine rows from two or more tables based on a related column between them.

In a mutating join, two or more tables are merged based on a matching values in all those tables.

In a filtering join, it only filters data from one table based on a common value between tables and does not merge both the data tables.

Inner join is very useful when you want to mutate rows of your choice from both the tables that have matching values in both the tables. In our case:

The above query will give you a list of matches played by players who played football as well as baseball. Here, the selection of records is based on the players who played both the sports.

Left join is very useful when you want to mutate rows of your choice from both the tables that may or may not have matching values in both the tables. For the rows on the left table that do not have a matching value on the right table, the corresponding value will be a NULL. For our example, I would like to add a row ‘PlayerID’ to both Football and Baseball tables.

The above query will give you a list of all the football players and the IDs of those players who played both the sports will be displayed next to it. If the football player didn’t play baseball, the PlayerID would be NULL.

In a Right Join, for the rows on the right table that do not have a matching value on the left table, the corresponding value will be a NULL.

The above query will give you a list of all the baseball players and the IDs of those players who played both the sports will be displayed next to it. If the baseball player didn’t play football, the PlayerID would be NULL.

The above query will give you a list of all the football players and the baseball player IDs. For the players who played both the sports, the ID and player name would be displayed on the same row.

The above mentioned joins mutate tables to form a new temporary table and are hence called MUTATING JOINS.

There are another set of joins known as FILTERING JOINS. Filtering joins are of two types: Semi Join and Anti Join.  Unlike a mutating join, a filtering join only filters data from one table based on a common value between tables and does not merge both the data tables.

A Semi join checks and filters data from the first table for which there is a match in the second table.

An Anti join checks and filters data from the first table for which there isn’t a match in the second table.

This guide will prove useful to you as a budding data scientist who is new to Big Data and would like to play around with different data tables to find some sort of correlation or connection between these tables to enable you to understand the data better and thus draw much better insights from the data available.

LEAVE A REPLY