SerDe
Serializer/Deserializer
SerDe is short for Serializer/Deserializer. Hive uses a SerDe to read data from a table and to write it back out to HDFS in any custom format. The interface handles both serialization and deserialization, and it also interprets the results of serialization as individual fields for processing.
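To make the two directions concrete, here is a minimal sketch in Python of what a SerDe does conceptually. It is not Hive's actual interface (Hive SerDes are written in Java); the class name DelimitedSerDe, the pipe delimiter, and the column names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DelimitedSerDe:
    """Toy SerDe for delimiter-separated records (hypothetical, not Hive's API)."""

    columns: List[str]
    delimiter: str = "|"  # assumed storage format for this sketch

    def deserialize(self, raw: str) -> dict:
        # Split the stored record and expose each piece as a named field.
        values = raw.rstrip("\n").split(self.delimiter)
        return dict(zip(self.columns, values))

    def serialize(self, row: dict) -> str:
        # Write the fields back out in column order, in the same format.
        return self.delimiter.join(str(row[name]) for name in self.columns)


serde = DelimitedSerDe(columns=["id", "name", "city"])
row = serde.deserialize("42|Ada|London\n")
print(row["name"])
print(serde.serialize(row))
```

Deserialization turns one stored record into fields that a query can reference by name, and serialization reverses the step when rows are written back out.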
Addon SerDes
XML Processing
Copybook
Fixed Layout Files
XML SerDe
The XML SerDe lets you query XML data stored in Hive tables by using XPath definitions. Tables can be defined with a combination of regular columns and values parsed from an XML string. The SerDe can also query repeating groups within the XML document and return their values as arrays. These arrays can be flattened further with functions provided by other Invariant UDFs, which support features such as nullable values and positional values within the array.
The SerDe builds a single XML document from a column in the source table. The table definition is expected to specify the root elements, and any repeating groups within the document are handled by the node and node leaf definitions.
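The mapping from one XML value to scalar and array columns can be sketched as follows, using Python's standard xml.etree.ElementTree module and its limited XPath support. This only illustrates the idea and is not the XML SerDe's implementation or configuration syntax. The sample document, the column names, and the SCALAR_PATHS and REPEATING_PATHS mappings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML value, as it might be stored in one string column.
XML_VALUE = """
<order id="1001">
  <customer>Ada</customer>
  <items>
    <item><sku>A-1</sku><qty>2</qty></item>
    <item><sku>B-7</sku><qty>1</qty></item>
  </items>
</order>
"""

# Hypothetical column-to-XPath mappings; the real SerDe's properties may differ.
SCALAR_PATHS = {"customer": "./customer"}
REPEATING_PATHS = {"skus": "./items/item/sku", "qtys": "./items/item/qty"}


def parse_row(xml_text: str) -> dict:
    """Turn one XML document into a row of scalar and array columns."""
    root = ET.fromstring(xml_text.strip())
    row = {"order_id": root.get("id")}  # attribute on the root element
    for column, path in SCALAR_PATHS.items():
        # A single-valued path yields one string (or None if absent).
        row[column] = root.findtext(path)
    for column, path in REPEATING_PATHS.items():
        # A repeating group yields one array element per matching node.
        row[column] = [node.text for node in root.findall(path)]
    return row


row = parse_row(XML_VALUE)
print(row)

# Flattening with positions, similar in spirit to a positional explode.
for position, sku in enumerate(row["skus"]):
    print(position, sku, row["qtys"][position])
```

Each repeating group becomes one array column, so the arrays for sku and qty stay aligned by position and can be flattened together afterwards.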