Schema Design
Hadoop supports schema-less storage, but we still need to decide on a directory structure for the data as it flows through the system. To access and manage data through Hive, schemas must be defined first. Metadata for the stored data plays an important role in the analysis process, and a shared catalog can serve load scripts, query tools, and BI applications alike.
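As a rough illustration of defining a schema before querying through Hive, the sketch below renders a HiveQL CREATE EXTERNAL TABLE statement as a string. The helper name, the database, table, and column names, the Parquet storage format, and the /data/sales/orders location are all assumptions made for the example, not part of the original material.

```python
def hive_external_table_ddl(database, table, columns, location, partitions=None):
    """Render a HiveQL CREATE EXTERNAL TABLE statement as a string.

    columns and partitions are lists of (name, hive_type) pairs.
    Hypothetical helper: it only builds text and never contacts Hive.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)"
    if partitions:
        parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    # Parquet is an assumed storage format; any Hive-supported format would do.
    ddl += f"\nSTORED AS PARQUET\nLOCATION '{location}'"
    return ddl


# Example with made-up names: an orders table partitioned by load date.
ddl = hive_external_table_ddl(
    database="sales",
    table="orders",
    columns=[("order_id", "BIGINT"), ("amount", "DOUBLE")],
    location="/data/sales/orders",
    partitions=[("order_date", "STRING")],
)
print(ddl)
```

An external table leaves the files where they already sit in HDFS, so the directory layout and the Hive schema can be managed separately.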
HDFS Schema Design
It is important to create a structured and organized repository of data
Standard directory structure
Stage data in separate location
Enforce access control
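One way to keep a standard directory structure with a separate staging area is to build every path from a single helper. This is a minimal sketch: the /staging and /data roots, the business-unit/dataset/load_date layout, and the function name are hypothetical choices, not a convention stated in the source.

```python
from posixpath import join  # HDFS paths use forward slashes on every platform

# Hypothetical zone roots: staged data lives apart from published data.
ZONE_ROOTS = {
    "staging": "/staging",
    "data": "/data",
}


def dataset_path(zone, business_unit, dataset, load_date):
    """Build a standardized HDFS directory path for one load of a dataset."""
    if zone not in ZONE_ROOTS:
        raise ValueError(f"unknown zone: {zone!r}")
    return join(ZONE_ROOTS[zone], business_unit, dataset, f"load_date={load_date}")


# Same dataset and load date, first in staging and then in its final location.
staged = dataset_path("staging", "sales", "orders", "2024-01-15")
published = dataset_path("data", "sales", "orders", "2024-01-15")
print(staged)
print(published)
```

Routing all path construction through one function means load scripts and query tools agree on where a dataset lives, and the layout can change in one place.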
File Location
Files are stored in standard locations. User files go under /user/<name>
Data is stored as files in the Hadoop filesystem (HDFS). It can be separated by functional use, with access control enforced on each area.
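The separation by functional use with enforced access control can be sketched by generating the hdfs dfs commands that create a directory and set its owner and permissions. The user alice, the etl and analysts names, the /data/sales path, and the permission modes are assumptions for illustration; the code only builds command strings and does not run them.

```python
def provisioning_commands(path, owner, group, mode):
    """Return the hdfs dfs commands that create a directory and lock it down.

    Hypothetical helper: the commands are returned as strings for review,
    not executed against a cluster.
    """
    return [
        f"hdfs dfs -mkdir -p {path}",
        f"hdfs dfs -chown {owner}:{group} {path}",
        f"hdfs dfs -chmod {mode} {path}",
    ]


# A private home directory for an example user named alice.
user_cmds = provisioning_commands("/user/alice", "alice", "alice", "700")

# A shared area that an assumed analysts group may read but not modify.
shared_cmds = provisioning_commands("/data/sales", "etl", "analysts", "750")

for cmd in user_cmds + shared_cmds:
    print(cmd)
```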