Create Table in Hive

What Is Hive

Hive is a data warehousing infrastructure based on Hadoop. Hadoop provides massive scale out and fault tolerance capabilities for data storage and processing (using the map-reduce programming paradigm) on commodity hardware.

Hive is designed to enable easy data summarization, ad-hoc querying and analysis of large volumes of data. It provides a simple query language called Hive QL, which is based on SQL and which enables users familiar with SQL to do ad-hoc querying, summarization and data analysis easily. At the same time, Hive QL also allows traditional map/reduce programmers to be able to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

Syntax for Create Table

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
 (col_name data_type [COMMENT col_comment], ...)
 [COMMENT table_comment]
 [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
 [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
 [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
 [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
 ]
 [LOCATION hdfs_path]
 [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
 [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
 LIKE existing_table_or_view_name
 [LOCATION hdfs_path];

Note:
   ☛ Follow the sequence of commands as given else will throw error during creation of tables.
   ☛ All bold words are KeyWords.
   ☛ All red colored italic text are comments to help you understand with command usability.

data_type
 : primitive_type
   | array_type
   | map_type
   | struct_type
   | union_type                 -- (Note: Available in Hive 0.7.0 and later)

primitive_type
 : TINYINT
   | SMALLINT
   | INT
   | BIGINT
   | BOOLEAN
   | FLOAT
   | DOUBLE
   | STRING
   | BINARY                     -- (Note: Available in Hive 0.8.0 and later)
   | TIMESTAMP                  -- (Note: Available in Hive 0.8.0 and later)
   | DECIMAL                    -- (Note: Available in Hive 0.11.0 and later)
   | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
   | DATE                       -- (Note: Available in Hive 0.12.0 and later)
   | VARCHAR                    -- (Note: Available in Hive 0.12.0 and later)
   | CHAR                       -- (Note: Available in Hive 0.13.0 and later)

array_type
 : ARRAY < data_type >

map_type
 : MAP < primitive_type, data_type >

struct_type
 : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
 : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
 : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
     [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
     [NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
   | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
 : SEQUENCEFILE
   | TEXTFILE  -- (Default, depending on hive.default.fileformat configuration)
   | RCFILE    -- (Note: Available in Hive 0.6.0 and later)
   | ORC       -- (Note: Available in Hive 0.11.0 and later)
   | PARQUET   -- (Note: Available in Hive 0.13.0 and later)
   | AVRO      -- (Note: Available in Hive 0.14.0 and later)
   | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname