ACID transactions in Hive
Nowadays there is a growing need to update and delete data in Hive. Let's take a quick look at how to update and delete data in ACID tables in Hive. The minimum prerequisites for performing Hive CRUD with ACID operations are:
- Hive version 0.14 or later
- Table created with file format must ...
Read More
Split file into multiple files using Pig Script
Sometimes you need to split a file into separate files based on a key value. You can do this in Java, C, C++, or any other programming language by writing a few dozen lines of code, which is fine if the file size ...
Read More
XML parsing in Hive
This post works with complex XML data that contains multiple collections. The sample file lists book authors with multiple book titles, genres, and other details. We will try to find a solution using Hive. Sample XML file (sample.xml): Gambardella, Matthew | XML Developer's Guide | Computer | 44.95 | 2000-10-01 | An in-depth look at creating applications ...
Read More
Dynamic partitioning in Hive
This post shows how to work with dynamically partitioned tables along with bucketing. We will also see how to store data in ORC format.
As we all know, a huge amount of data (terabytes/petabytes/exabytes/zettabytes) is stored in Hadoop HDFS, so it becomes very difficult for Hadoop users ...
Read More
Sqoop Export
Today we will see how to load data from Hive into an RDBMS (MySQL) using the Sqoop export command. Below are some key observations to keep in mind before proceeding with the Sqoop export process:
☛ The schema of the table being exported must exist in the target RDBMS.
☛ The data which is exported from HDFS to ...
Read More
Joins – Part 3
Today we will see how to work with joins in Apache Hive. In my previous post, "Joins - Part 1", we saw what joins are.
1. Inner Join
-- inner join
SELECT person.empid, person.name, person.birthdate, person.birthcountry, person.phone, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate
FROM person person JOIN employees ...
Read More