Generating Artifacts

Inventory Manager generates various run-time artifacts that are used by the data pipeline services and the merge/reconciliation processes.

 ./generateArtifacts.sh [ generatePPLXML         | 
                          generateDDL            | 
                          generateSqoopImport    |
                          generateAutoReconHQL   | 
                          generateStreamMergeHQL | 
                          generateAll 
                        ]

The options are:

  • generatePPLXML - Streaming Pipeline mapping XML

  • generateDDL - DDL for stream, recon and current view tables

  • generateSqoopImport - Column list and predicate for the Sqoop import command

  • generateAutoReconHQL - HQL to reconcile data in stream using recon tables

  • generateStreamMergeHQL - HQL to merge stream data into current view

  • generateAll - Generate all the above artifacts

The user can also pass the flag "-p" to indicate whether the table is streamed using event publication (EP) or change data capture (CDC). The default is CDC.

Another optional flag, "-f", indicates whether the cached schema version should be updated. The valid values are Y and N; the default is Y. Note: it is more efficient to use the cached values when there are no structural changes in the source tables.
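For example, an option keyword and the flags can be combined in a single call. This is a sketch only: the placement of the flags after the option keyword is an assumption, and the stub function below stands in for the real generateArtifacts.sh so the example is self-contained.

```shell
#!/bin/sh
# Stub that echoes the command that would run; in practice, invoke the real
# generateArtifacts.sh from the Inventory Manager installation directory.
generateArtifacts() { echo "./generateArtifacts.sh $*"; }

# Regenerate only the DDL, reusing the cached schema version (-f N):
generateArtifacts generateDDL -f N

# Regenerate all artifacts for a table streamed via event publication (-p):
generateArtifacts generateAll -p
```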

Please note that from version 1.0.9 onwards, the generated DDL for CDC tables places the audit timestamp column (inv_updtd_dtm) as the first field. The generated merge HQL accounts for this and orders the columns accordingly. EP tables may have this audit timestamp column at the end, and the HQL is generated accordingly. In case of a mismatch, developers can adjust the generated HQL and check it into source control.


Last updated 4 years ago