
Data Inventory Reports

Record counts are tracked nightly across the source and target views. Each night, the counts of all records created and updated that day, based on the audit columns configured in dataSourceTables, are recorded. The numbers of inserts, updates, and deletes in the incremental table are also recorded. The reconciliation record count is recorded after the data has been imported with Sqoop, and the current-view record counts are recorded once the day's merge activities are complete.
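The sketch below shows one way to compute the audit-column count for one table and one day. It assumes hypothetical audit columns named created_ts and updated_ts; the real names come from the dataSourceTables configuration. An in-memory SQLite table stands in for the source system, and the pipeline's own queries may differ.

```python
import sqlite3


def daily_audit_counts(conn, table, day, created_col="created_ts", updated_col="updated_ts"):
    """Count rows created and rows updated on `day` (YYYY-MM-DD) via audit columns.

    created_ts / updated_ts are hypothetical default column names.
    """
    # Identifiers cannot be bound as SQL parameters, so they are formatted in;
    # they must come from trusted configuration, never from user input.
    sql = (
        f"SELECT SUM(CASE WHEN date({created_col}) = :day THEN 1 ELSE 0 END), "
        f"SUM(CASE WHEN date({updated_col}) = :day THEN 1 ELSE 0 END) "
        f"FROM {table}"
    )
    created, updated = conn.execute(sql, {"day": day}).fetchone()
    # SUM() returns NULL (None) on an empty table, so fall back to 0
    return {"created": created or 0, "updated": updated or 0}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, created_ts TEXT, updated_ts TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "2021-03-01 08:00:00", "2021-03-02 09:30:00"),  # created earlier, updated on the 2nd
            (2, "2021-03-02 10:15:00", "2021-03-02 10:15:00"),  # created on the 2nd
            (3, "2021-02-27 12:00:00", "2021-02-28 12:00:00"),  # untouched on the 2nd
        ],
    )
    print(daily_audit_counts(conn, "orders", "2021-03-02"))  # → {'created': 1, 'updated': 2}
```

In this sketch, a row inserted on a given day also counts as updated that day if its updated timestamp falls on the same date. The document does not say how the pipeline treats that case.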

The counts across data sources and targets are used to validate the data moved by the pipeline against the changes in the source systems and against what is merged into the current views. A set of command-line utilities is provided to check the status of tables, individually or as a set, for the previous day or for a specific time period.
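The validation idea can be illustrated under stated assumptions; these are not the pipeline's actual rules. The sketch assumes each source-side creation or update produces exactly one insert or update record in the incremental table. It also assumes the current view should grow by the day's inserts minus its deletes. The DailyCounts class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyCounts:
    """One table's counts for one processing date (hypothetical field names)."""
    table: str
    src_created: int   # rows created that day, per the source audit columns
    src_updated: int   # rows updated that day, per the source audit columns
    inc_inserts: int   # insert records in the incremental table
    inc_updates: int   # update records in the incremental table
    inc_deletes: int   # delete records in the incremental table
    prev_current: int  # current-view row count after the previous day's merge
    current: int       # current-view row count after today's merge


def validate(c: DailyCounts) -> list[str]:
    """Return a description of every count that does not line up (empty if all match)."""
    problems = []
    if c.src_created != c.inc_inserts:
        problems.append(f"{c.table}: {c.src_created} rows created at source, {c.inc_inserts} inserts captured")
    if c.src_updated != c.inc_updates:
        problems.append(f"{c.table}: {c.src_updated} rows updated at source, {c.inc_updates} updates captured")
    # Hard deletes leave no trace in audit columns, so they are only checked here
    expected = c.prev_current + c.inc_inserts - c.inc_deletes
    if c.current != expected:
        problems.append(f"{c.table}: current view has {c.current} rows, expected {expected}")
    return problems


if __name__ == "__main__":
    day = DailyCounts("orders", src_created=5, src_updated=3, inc_inserts=4,
                      inc_updates=3, inc_deletes=0, prev_current=100, current=104)
    for problem in validate(day):
        print(problem)  # → orders: 5 rows created at source, 4 inserts captured
```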

./run-invmgr-report.sh
    [--csvReport | --consoleReport | --mailReport | --dailyCount]   <- Action parameter
    [-d YYYY-mm-DD]                                                 <- Optional processing date
    [-s iinv|isrc]                                                  <- Source (--dailyCount only)
    [-t csv|html]                                                   <- Format (--mailReport only)

The report can be displayed on screen with --consoleReport or written to a file with --csvReport. To mail the report with --mailReport, define the list of recipient email addresses in the "recon.properties" configuration file.

The processing date (-d) is optional; if it is omitted, the report defaults to the previous day.
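A small wrapper can assemble valid invocations from the usage synopsis, for example when scheduling reports from another tool. build_report_command is a hypothetical helper, not part of the product. It enforces the option constraints shown in the synopsis: -s takes iinv or isrc with --dailyCount only, and -t takes csv or html with --mailReport only. When no date is given, it passes the previous day explicitly so the processed date shows up in the command line and logs. It assumes the script accepts ISO dates such as 2021-03-01 for -d.

```python
import datetime
import shlex

ACTIONS = {"--csvReport", "--consoleReport", "--mailReport", "--dailyCount"}
SOURCES = {"iinv", "isrc"}  # values shown for -s in the synopsis
FORMATS = {"csv", "html"}   # values shown for -t in the synopsis


def build_report_command(action, date=None, source=None, fmt=None, today=None,
                         script="./run-invmgr-report.sh"):
    """Return an argv list for run-invmgr-report.sh (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if source is not None and (action != "--dailyCount" or source not in SOURCES):
        raise ValueError("-s takes iinv or isrc and applies only to --dailyCount")
    if fmt is not None and (action != "--mailReport" or fmt not in FORMATS):
        raise ValueError("-t takes csv or html and applies only to --mailReport")
    if date is None:
        # Mirror the documented default: the day before the run date
        today = today or datetime.date.today()
        date = (today - datetime.timedelta(days=1)).isoformat()
    cmd = [script, action, "-d", date]
    if source is not None:
        cmd += ["-s", source]
    if fmt is not None:
        cmd += ["-t", fmt]
    return cmd


if __name__ == "__main__":
    cmd = build_report_command("--dailyCount", source="isrc", today=datetime.date(2021, 3, 2))
    print(shlex.join(cmd))  # → ./run-invmgr-report.sh --dailyCount -d 2021-03-01 -s isrc
    # To actually run it: subprocess.run(cmd, check=True)
```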

In addition to the command-line query capability, administrative settings are available to email an operations group about any errors in the data inventory, so that the operations and environment teams can troubleshoot further.
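The configuration keys behind the mail and error notifications are not documented here, so the following sketch is hypothetical throughout. It reads a comma-separated recipient list from a recon.properties-style file under a placeholder key (mail.recipients). It then builds, but does not send, an alert message listing data inventory errors using Python's standard email package. The parser handles only simple key=value lines, a subset of the full Java .properties syntax.

```python
from email.message import EmailMessage


def read_recipients(properties_text, key="mail.recipients"):
    """Return the comma-separated addresses stored under `key` (placeholder key name)."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return [addr.strip() for addr in value.split(",") if addr.strip()]
    return []


def build_error_alert(errors, recipients, processing_date, sender="invmgr@example.com"):
    """Build (but do not send) an alert email listing data inventory errors."""
    msg = EmailMessage()
    msg["Subject"] = f"Data inventory errors for {processing_date}: {len(errors)} found"
    msg["From"] = sender  # placeholder sender address
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(errors) if errors else "No errors.")
    return msg


if __name__ == "__main__":
    props = "# reconciliation mail settings\nmail.recipients = ops@example.com, env@example.com\n"
    alert = build_error_alert(["orders: current view has 90 rows, expected 105"],
                              read_recipients(props), "2021-03-01")
    print(alert["To"])  # → ops@example.com, env@example.com
```

Sending would take one more standard-library call, such as smtplib.SMTP(host).send_message(alert), with the SMTP host taken from your environment. Sending only when the error list is non-empty keeps the operations group from receiving empty alerts.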


Last updated 4 years ago