HDFS

Hadoop Distributed File System (HDFS)

The Hadoop distributed filesystem (HDFS ) allows users to store large amount of data across multiple server instances. A HDFS cluster consists of NameNode to manage the file system metadata and DataNodes to store the actual data. The client applications contact NameNode for file metadata or file modifications, while the actual file I/O is performed directly with the DataNodes.

Key HDFS features -

Fault tolerant and horizontally scalable
Provides distributed storage and distributed processing on commodity hardware.
Can scale linearly as data footprint grows
Provides users UNIX shell-like commands to interact with HDFS

For additional details, visit the HDFS page on Apache Hadoop website

PreviousCompatible Hadoop Versions NextHDFS Architecture

Last updated 6 years ago