
Accessing YARN Logs

YARN applications and logs

YARN supports multiple programming models by decoupling resource management from application scheduling and monitoring. YARN uses a global ResourceManager (RM), per-worker-node NodeManagers (NMs), and per-application ApplicationMasters (AMs). The per-application AM negotiates resources (CPU, memory, disk, network) for running your application with the RM. The RM works with the NMs to grant these resources, which are allocated as containers. The AM is responsible for tracking the progress of the containers assigned to it by the RM; an application may require many containers, depending on its nature. Each application may also consist of multiple application attempts: if an application fails, it may be retried as a new attempt, and each attempt runs in a container. In a sense, a container provides the context for the basic unit of work performed by a YARN application, and all work done within the context of a container is performed on the single worker node on which that container was allocated.
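
You can inspect this hierarchy directly with the YARN CLI. For example, the commands below list the attempts for an application and then the containers within one attempt; the application and attempt IDs shown are placeholders, not values from this cluster:

$ yarn applicationattempt -list application_1234567890123_0001
$ yarn container -list appattempt_1234567890123_0001_000001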

Application logs (and the associated container logs) are critical in debugging problematic Hadoop applications. YARN provides a convenient framework for collecting, aggregating, and storing application logs: after an application finishes, the logs of all containers that ran on a given worker node are aggregated into one file and stored on the cluster's default file system. Even if your application used hundreds or thousands of containers, there is only one aggregated log file per worker node it ran on. The aggregated logs are located in the cluster's default storage under:

/app-logs/<user>/logs/<applicationId>

In the path above, <user> is the name of the user who started the application, and <applicationId> is the unique identifier assigned to the application by the YARN RM. The aggregated logs are not directly readable, as they are written in a binary format and indexed by container. Use the YARN ResourceManager UI or the YARN CLI tools to view these logs as plain text for applications or containers of interest.
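
For example, to confirm that aggregation has completed for a finished application, you can list the aggregated per-node files directly; the user name and applicationId below are placeholders:

$ hdfs dfs -ls /app-logs/hadoopuser/logs/application_1234567890123_0001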

YARN CLI tools

To use the YARN CLI tools, you must first connect to the Hadoop CLI. Then run one of the following commands to view the aggregated logs for an entire application, or for a single container:

$ yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application>
$ yarn logs -applicationId <applicationId> -appOwner <user-who-started-the-application> \
    -containerId <containerId> -nodeAddress <worker-node-address>

Specify the <applicationId>, <user-who-started-the-application>, <containerId>, and <worker-node-address> values when running these commands. The first form prints the logs for every container of the application; the second restricts the output to one container on one worker node.
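
As an illustration, a concrete invocation might look like the following; every value shown is a placeholder, and the containerId/node-address pair for a given attempt can be found with the yarn container -list command shown earlier:

$ yarn logs -applicationId application_1234567890123_0001 -appOwner hadoopuser
$ yarn logs -applicationId application_1234567890123_0001 -appOwner hadoopuser \
    -containerId container_1234567890123_0001_01_000002 -nodeAddress workernode0:45454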
