Data Manipulation

Hive supports a rich set of options to retrieve, store, modify, delete, insert and update data

Loading Data

Load command can be used to bulk load data. Hive does not do any transformation while loading data into tables. Load operations are pure copy/move operations that can move data in bulk into locations corresponding to Hive tables.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]

LOCAL is identifier to specify the local path. It is optional
OVERWRITE is optional to overwrite the data in the table
PARTITION is optional

Inserting data into Hive Tables from queries

Query Results can be inserted into tables by using the insert clause. INSERT OVERWRITE will overwrite any existing data in the table or partition. INSERT INTO will append to the table or partition, keeping the existing data intact.

Querying Data

SELECT query can be used to retrieve rows from tables. If the WHERE clause is specified, it filters the results and all rows that do not satisfy the condition are eliminated from the output. Use the GROUP BY clause or aggregate function calls to combine output into groups of rows that match on one or more values. Use the ORDER BY clause to return rows in sorted order.

In general, a SELECT query scans the entire table (other than for sampling). If a table is created using the PARTITIONED BY clause, a query can do partition pruning and scan only a fraction of the table relevant to the partitions specified by the query. Hive currently does partition pruning if the partition predicates are specified in the WHERE clause or the ON clause in a JOIN.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

Joins

JOINS are used for retrieving results from multiple tables. Hive supports inner joins and left, right and full outer joins.

An inner join is the most common join operation used in applications and can be regarded as the default join-type. An inner join query compares each row of A with each row of B to find all those pairs of rows that satisfy the join predicate.

PreviousData Definition NextCRUD Statement

Last updated 6 years ago