Hive is a data warehousing framework built on Apache Hadoop that enables easy data summarization, ad-hoc querying, and analysis of large volumes of Hadoop data.
Hive provides a SQL-like language (HiveQL) to query data from Hadoop-based databases and file systems.
Hive is suitable for batch processing and data warehousing tasks. It is not suitable for online transaction processing (OLTP).
HiveQL queries are implicitly converted into MapReduce, Apache Tez, or Apache Spark jobs.
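As a sketch, the execution engine can be selected per session with the hive.execution.engine property (supported values include mr, tez, and spark, depending on the installation):
//Run subsequent queries on Tez instead of classic MapReduce
hive> SET hive.execution.engine=tez;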
Following are the key features of Hive.
Hive provides a SQL-like interface to query data from Hadoop-based databases and file systems. Hence Hive enables SQL-based portability on Hadoop.
Hive supports different storage types, such as plain text files, SequenceFiles, ORC files, and HBase.
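As an illustration (the table names here are hypothetical), the storage format is chosen at table creation time with the STORED AS clause:
//Plain-text storage (the default)
hive> CREATE TABLE logs_text (line STRING) STORED AS TEXTFILE;
//Columnar ORC storage, generally more efficient for analytic queries
hive> CREATE TABLE logs_orc (line STRING) STORED AS ORC;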
Following are the major components of a Hive architecture.
Metastore - Stores the metadata for each of the Hive tables.
Driver - Controller that receives the HiveQL statements.
Compiler - Compiles the HiveQL query to an execution plan.
Optimizer - Performs transformations on the execution plan.
Executor - Executes the tasks produced by the compiler, in their dependency order.
UI - User interface, such as the command-line interface (CLI), for submitting queries to Hive.
You can create a Hive table using the DDL 'CREATE TABLE' statement.
hive> CREATE TABLE employees (fname STRING, age INT);
You can create a partitioned Hive table using the DDL 'CREATE TABLE... PARTITIONED BY...' statement.
hive> CREATE TABLE employees (fname STRING, lname STRING, age INT) PARTITIONED BY (ds STRING);
You can list Hive tables using the DDL statement 'SHOW TABLES'.
hive> SHOW TABLES;
You can list the columns of a table using the DDL statement 'DESCRIBE'.
hive> DESCRIBE employees;
You can add new columns to a table in Hive using the DDL statement 'ALTER TABLE... ADD COLUMNS...'.
hive> ALTER TABLE employees ADD COLUMNS (lname STRING);
By default, Hive stores table metadata in an embedded Apache Derby database, which supports only one active user at a time. Production deployments typically configure an external RDBMS such as MySQL or PostgreSQL as the metastore backend.
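As a sketch, the metastore can be pointed at an external MySQL database in hive-site.xml; the host, database name, and credentials below are placeholders:
<!-- hive-site.xml: metastore backed by MySQL (values are illustrative) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>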
You can load data from flat files into Hive using the 'LOAD DATA... INTO TABLE' statement.
//Load from local files
hive> LOAD DATA LOCAL INPATH './files/employees.txt' OVERWRITE INTO TABLE employees;
//Load from Hadoop files
hive> LOAD DATA INPATH './files/employees.txt' OVERWRITE INTO TABLE employees;
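After loading, the data can be verified with a simple query; as a sketch:
//Verify the loaded rows
hive> SELECT fname, age FROM employees LIMIT 10;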
You can load data from flat files into specific partitions of a Hive table using the 'LOAD DATA... INTO TABLE... PARTITION...' statement.
//Load from local files
hive> LOAD DATA LOCAL INPATH './files/employees.txt' OVERWRITE INTO TABLE employees PARTITION (ds='2008-08-15');
//Load from Hadoop files
hive> LOAD DATA INPATH './examples/files/employees.txt' OVERWRITE INTO TABLE employees PARTITION (ds='2008-08-15');
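Partitions can then be listed, and a query that filters on the partition column reads only the matching partition's files (partition pruning); as a sketch:
//List the partitions of the table
hive> SHOW PARTITIONS employees;
//Only the ds='2008-08-15' partition is scanned
hive> SELECT * FROM employees WHERE ds='2008-08-15';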