Joins – Part 1

What is Join?
An instruction to a Database or Hadoop Framework to combine data from more than one table or dataset.

 

Table or DataSet
If you have ever used excel or any other spreadsheet program, you have already worked with tables. A table/dataset is a collection of columns and rows where you store data. Typically, each row represents data combined with column representing an attribute that the row can have.

 

Types Of Joins
JoinInner11. Inner Join OR Equi Join
An inner join produces a result set that is limited to the rows where there is a match in both tables/datasets. If you don’t know which kind of join you need, this will usually be your best bet.

 
 

JoinLeftOuter12. Left Outer Join
Records from the first table/dataset are included whether they have a match or not in second table/dataset. Fields from the unmatched second table/dataset are set to null.

 
 

JoinRightOuter13. Right Outer Join
The opposite of Left Outer Join. Records from the second table/dataset are included whether they have a match or not in first table/dataset. Fields from the unmatched first table/dataset are set to null.

 
 

JoinFullOuter14. Full Outer Join
Records from both tables/datasets are included. For unmatched records the fields from the ‘other’ table/dataset are set to null.

 
 
 

5. Cross Join
A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. The following example shows a Transact-SQL cross join.

 
Stay tuned we will have more updates on joins using Pig Latin and Hive.

Leave a Reply

Your email address will not be published. Required fields are marked *