Blog List

ACID transactions in Hive

Nowadays there is a growing need to update and delete data in Hive. Let's take a quick look at how to update and delete data in ACID tables in Hive. The minimum prerequisites for performing Hive CRUD with ACID operations are:
  1. Hive version 0.14 or later
  2. Table created with file format must ...

Split file into multiple files using Pig Script

Sometimes there is a requirement to split a file into separate files based on some key value. You can do it in Java, C, C++, or any other programming language by writing a few dozen lines of code, which is fine if the file size ...
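For comparison, the general-purpose-language approach mentioned in the teaser above might look like the following minimal Python sketch. This is not the Pig script from the post; the function name `split_by_key`, the comma delimiter, the key column position, and the `<key>.txt` output naming are all assumptions made for illustration. It also holds the whole input in memory, so it is only reasonable for small files.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_key(src, out_dir, key_index=0, delimiter=","):
    """Group the rows of src by one column and write each group to its own file.

    Hypothetical helper: output files are named <key>.txt inside out_dir,
    which assumes the key values are safe to use as file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect rows per key value (entire file is kept in memory).
    groups = defaultdict(list)
    with open(src, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if row:  # skip blank lines
                groups[row[key_index]].append(row)

    # Write one output file per distinct key.
    for key, rows in groups.items():
        with open(out_dir / f"{key}.txt", "w", newline="") as out:
            csv.writer(out, delimiter=delimiter).writerows(rows)

    return sorted(groups)
```

Calling `split_by_key("input.csv", "out")` would produce one file per distinct value in the first column and return the sorted list of keys. For large inputs on a cluster, this in-memory approach does not scale.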

XML parsing in Hive

This post works with complex XML data that contains multiple collections. The sample file lists book authors, each with multiple book titles, genres, and other details. We will try to find a solution using Hive. Sample XML file (sample.xml): Gambardella, Matthew; XML Developer's Guide; Computer; 44.95; 2000-10-01; An in-depth look at creating applications ...

Dynamic partitioning in Hive

This post shows how to work with dynamically partitioned tables along with bucketing. We will also see how to store data in ORC format.

As we all know, a huge amount of data (terabytes/petabytes/exabytes/zettabytes) is stored in Hadoop HDFS, so it becomes very difficult for Hadoop users ...

Sqoop Export

Today we will see how to load data from Hive into an RDBMS (MySQL) using the Sqoop Export command. Below are some key observations to keep in mind before proceeding with the Sqoop Export process: ☛ The schema of the table being exported must exist in the target RDBMS. ☛ The data which is exported from HDFS to ...

Joins – Part 3

Today we will see how to work with joins in Apache Hive. In my previous post, "Joins - Part 1", we saw what joins are. 1. Inner Join -- inner join SELECT person.empid, person.birthdate, person.birthcountry, person.emailaddress, employees.gender, employees.maritalstatus, employees.jobtitle, employees.annualrate, employees.startdate, employees.terminationdate FROM person person JOIN employees ...
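To illustrate what an inner join returns, here is a minimal sketch that runs a similar query against an in-memory SQLite database from Python, rather than against Hive. The table names `person` and `employees` and the column names come from the teaser above; the sample rows, the reduced column set, and the join condition on `empid` are assumptions, since the original query is cut off before its `ON` clause.

```python
import sqlite3


def inner_join_demo():
    """Return the rows produced by an inner join of person and employees."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: empid 3 exists only in person,
    # empid 4 exists only in employees.
    conn.executescript("""
        CREATE TABLE person (empid INTEGER, birthcountry TEXT);
        CREATE TABLE employees (empid INTEGER, jobtitle TEXT);
        INSERT INTO person VALUES (1, 'IN'), (2, 'US'), (3, 'UK');
        INSERT INTO employees VALUES (1, 'Engineer'), (2, 'Analyst'), (4, 'Manager');
    """)
    # Assumed join key: empid. Only rows with a match in both tables survive.
    rows = conn.execute("""
        SELECT person.empid, person.birthcountry, employees.jobtitle
        FROM person
        JOIN employees ON person.empid = employees.empid
        ORDER BY person.empid
    """).fetchall()
    conn.close()
    return rows
```

With this sample data, `inner_join_demo()` returns rows only for `empid` 1 and 2; the unmatched rows (3 and 4) are dropped, which is the defining behavior of an inner join.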

Hadoopers! Welcome.