Reading JSON file in Pig

Today we will see how to read schema-less JSON files in Pig. To read JSON files we will work with the following JAR files: json-simple-1.1.1.jar; elephant-bird-hadoop-compat-4.3.jar; elephant-bird-pig-4.3.jar; findString.jar. The best sample JSON file for testing this is a download of tweets. findString.jar is the custom UDF written in ...
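As a rough sketch of the idea (not the post's actual script), the elephant-bird JsonLoader can read each JSON record into a Pig map without a declared schema. The input path and the `text` key are assumptions for illustration, and findString.jar is left out because its UDF is not described here.

```pig
-- Register the JARs named in the post (paths assumed to be the working directory).
REGISTER json-simple-1.1.1.jar;
REGISTER elephant-bird-hadoop-compat-4.3.jar;
REGISTER elephant-bird-pig-4.3.jar;

-- Hypothetical input path: one JSON object (for example, a tweet) per line.
tweets = LOAD '/tmp/input/tweets.json'
         USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
         AS (json:map[]);

-- Pull a single key out of the map; 'text' is an assumed field name.
texts = FOREACH tweets GENERATE json#'text' AS text;

DUMP texts;
```

Because every record arrives as a map, keys that are missing in some records simply come back as null instead of breaking the load.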
Joins – Part 2

Today we will see how to work with joins in a Pig Latin script. In my previous post "Joins – Part 1" we saw what joins are. First we load the datasets "person" and "employees" into the relations person and employees: -- load person person = load '/tmp/input/person.csv' using PigStorage(',') as ...
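A minimal sketch of what such a join script might look like. The column names and types and the employees path are hypothetical, since the excerpt cuts off before the real schema.

```pig
-- Schemas below are assumptions for illustration only.
person = LOAD '/tmp/input/person.csv' USING PigStorage(',')
         AS (id:int, name:chararray);

-- Hypothetical path, chosen to mirror the person file.
employees = LOAD '/tmp/input/employees.csv' USING PigStorage(',')
            AS (id:int, dept:chararray);

-- Inner join: keeps only ids present in both relations.
inner_joined = JOIN person BY id, employees BY id;

-- Left outer join: keeps every person, with nulls where no employee matches.
left_joined = JOIN person BY id LEFT OUTER, employees BY id;

DUMP inner_joined;
DUMP left_joined;
```

After a join, fields that share a name are disambiguated with the relation prefix, for example `person::id` and `employees::id`.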
Fixed width files in Hive

Today we will see how to load fixed width files into a Hive database. We use Hive's SerDe properties to load fixed width files. First we create a staging table, before loading the fixed width file data into the final table. We create the table using SerDe properties by specifying the lengths of each ...
Fixed width files in Pig – Part 2

In my previous post "Fixed width files in Pig – Part 1" we saw how to read fixed width files and load them into HDFS as a tab-separated dataset using a static Pig script. Today we will discuss how to make the Pig script dynamic so that we DO NOT ...
Handling database fetch size in Sqoop

When importing data from various RDBMSs you might not have run into any issues so far. This does not mean your Sqoop import command will work perfectly all the time. When the fetched data is small enough to fit into the allocated memory, you will not face any issues ...
Fixed width files in Pig – Part 1

Today we will see how to load fixed length file data into HDFS using Apache Pig. Sample fixed length file (sample_file):
EID NAME AGEGSALARY DEPT
1001 Subbayya Sivasankaranarayana Pillai 25M 425000.00HR
1002 Raj Chandra Bose 27M 310000.00FIN
1003 Tirukkannapuram Vijayaraghavan 30M 544000.00MKT
1004 Dattaraya Ramchandra Kaprekar 21M 682345.00EDU
1005 Samarendra ...
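One common way to do this, sketched here under assumptions rather than taken from the post, is to load each record as a single string and slice it by character position. The offsets, the input and output paths, and the header filter are all hypothetical, because the real column widths are not recoverable from the excerpt.

```pig
-- Load every line as one chararray; the path is an assumption.
raw = LOAD '/tmp/input/sample_file' USING TextLoader() AS (line:chararray);

-- Drop the header row (assumes it starts with 'EID').
data = FILTER raw BY NOT (line MATCHES 'EID.*');

-- Slice by position; the start/stop offsets below are placeholders only.
parsed = FOREACH data GENERATE
    TRIM(SUBSTRING(line, 0, 4))   AS eid,
    TRIM(SUBSTRING(line, 4, 45))  AS name,
    TRIM(SUBSTRING(line, 45, 47)) AS age,
    TRIM(SUBSTRING(line, 47, 48)) AS gender,
    TRIM(SUBSTRING(line, 48, 58)) AS salary,
    TRIM(SUBSTRING(line, 58, 62)) AS dept;

-- Write the result to HDFS as a tab-separated dataset (hypothetical path).
STORE parsed INTO '/tmp/output/sample_file_tsv' USING PigStorage('\t');
```

SUBSTRING takes a start index and an exclusive stop index, so each field's width is the difference between the two numbers.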
