Today we will see how to work with joins in Apache Hive. In my previous post “Joins – Part 1” we have seen What are joins?
1. Inner Join
-- inner join SELECT person.empid, person.name, person.birthdate, person.birthcountry, person.phone, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate FROM person person JOIN employees employees ON (person.empid = employees.empid);
Inner join output resultset —
KC0001 Martina Griffiths 1956-03-31 CAN 9256947901 HCMGENUser1@ap6023fems.us.oracle.com F M Secretary-Senior 31500.0 1990-01-01 NULL KC0003 Cynthia Rogers 1950-07-11 9256947903 HCMGENUser1@ap6023fems.us.oracle.com F S Clerk-Accounting 23010.0 1988-03-12 NULL KC0015 Barry Campbell 1950-12-31 CAN 9256947915 HCMGENUser1@ap6023fems.us.oracle.com M S Specialist-Training 43200.0 1984-11-02 2000-01-28 KC0016 Paulette Vaillancourt 1929-12-10 9256947916 HCMGENUser1@ap6023fems.us.oracle.com F M Analyst-Human Resources 52320.0 1981-03-20 1999-12-31 KU0055 Netty Owyang 1958-08-02 9255556547 netty.owyang@xyzcompany.com F S Manager-Finance 60008.0 1988-03-21 NULL KU0059 Vicki Zinn 1958-04-16 9256947959 vicki.zinn@co.com F M Analyst-Financial 48006.4 1988-07-16 NULL KU0067 Alex Passantino 1967-10-01 USA 9256947967 HCMGENUser1@ap6023fems.us.oracle.com M M Vice President-Intl Ops 182200.0 1991-02-10 NULL KWG126 Johann Moser 1950-04-15 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com M M Cleaner 20000.0 1965-03-01 2005-12-31 KWG127 Anita Zahnd 1966-05-23 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com F S Researcher 110000.0 1985-02-01 2005-12-31 KWG128 Susette Racine 1946-08-26 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com F M Other Job2 20000.0 1960-04-01 2005-12-31
2. Full Outer Join
-- full outer join SELECT IF(employees.empid is NULL, person.empid, employees.empid), person.name, person.birthdate, person.birthcountry, person.phone, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate FROM person person FULL JOIN employees employees ON (person.empid = employees.empid);
Full outer join output resultset —
KC0001 Martina Griffiths 1956-03-31 CAN 9256947901 HCMGENUser1@ap6023fems.us.oracle.com F M Secretary-Senior 31500.0 1990-01-01 NULL KC0003 Cynthia Rogers 1950-07-11 9256947903 HCMGENUser1@ap6023fems.us.oracle.com F S Clerk-Accounting 23010.0 1988-03-12 NULL KC0015 Barry Campbell 1950-12-31 CAN 9256947915 HCMGENUser1@ap6023fems.us.oracle.com M S Specialist-Training 43200.0 1984-11-02 2000-01-28 KU0056 NULL NULL NULL NULL NULL M S Consultant-Junior 48006.4 1988-04-03 NULL KU0057 NULL NULL NULL NULL NULL F M Manager-Project 60008.0 1988-05-17 2005-12-31 KU0058 NULL NULL NULL NULL NULL F M Specialist-Pension 26000.0 1988-06-11 NULL KWG110 Heinz Ganz 1986-02-28 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL KWG111 Monica Herz 1966-06-30 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL KWG112 Rosa Inglese 1979-10-15 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL
3. Left Outer Join
-- left outer join SELECT IF(employees.empid is NULL, person.empid, employees.empid), person.name, person.birthdate, person.birthcountry, person.phone, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate FROM person person LEFT JOIN employees employees ON (person.empid = employees.empid);
Left outer join output resultset —
KC0001 Martina Griffiths 1956-03-31 CAN 9256947901 HCMGENUser1@ap6023fems.us.oracle.com F M Secretary-Senior 31500.0 1990-01-01 NULL KC0003 Cynthia Rogers 1950-07-11 9256947903 HCMGENUser1@ap6023fems.us.oracle.com F S Clerk-Accounting 23010.0 1988-03-12 NULL KC0015 Barry Campbell 1950-12-31 CAN 9256947915 HCMGENUser1@ap6023fems.us.oracle.com M S Specialist-Training 43200.0 1984-11-02 2000-01-28 KWG110 Heinz Ganz 1986-02-28 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL KWG111 Monica Herz 1966-06-30 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL KWG112 Rosa Inglese 1979-10-15 CHE 49891430 HCMGENUser1@ap6023fems.us.oracle.com NULL NULL NULL NULL NULL NULL
4. Right Outer Join
-- left outer join SELECT IF(employees.empid is NULL, person.empid, employees.empid), person.name, person.birthdate, person.birthcountry, person.phone, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate FROM person person RIGHT JOIN employees employees ON (person.empid = employees.empid);
Right outer join output resultset —
KC0001 Martina Griffiths 1956-03-31 CAN 9256947901 HCMGENUser1@ap6023fems.us.oracle.com F M Secretary-Senior 31500.0 1990-01-01 NULL KC0003 Cynthia Rogers 1950-07-11 9256947903 HCMGENUser1@ap6023fems.us.oracle.com F S Clerk-Accounting 23010.0 1988-03-12 NULL KC0015 Barry Campbell 1950-12-31 CAN 9256947915 HCMGENUser1@ap6023fems.us.oracle.com M S Specialist-Training 43200.0 1984-11-02 2000-01-28 KU0056 NULL NULL NULL NULL NULL M S Consultant-Junior 48006.4 1988-04-03 NULL KU0057 NULL NULL NULL NULL NULL F M Manager-Project 60008.0 1988-05-17 2005-12-31 KU0058 NULL NULL NULL NULL NULL F M Specialist-Pension 26000.0 1988-06-11 NULL
For more on Hive commands, functions, and UDFs follow Qubole Hive Cheatsheet.