Running daily reconciliation and merge

Inventory Manager can use daily Sqoop extracts to reconcile data against the stream tables. The reconciled data can then be merged into the final current view tables.

Use the run-invmgr-recon.sh script to extract, reconcile, and merge the records into the target schema:

./run-invmgr-recon.sh [--all | --onlyMerge | --onlyRecon | --onlySqoop]   <- Action param (required)
                      [-e dev|test|prod]                                  <- Environment param
                      [-s bpm|ods]                                        <- Source DB param
                      [-d yyyy-mm-dd]                                     <- Optional processing date
                      [-n num]                                            <- Optional number of days
                      [-t "table1 table2"]                                <- Optional table list

The action parameter is required and specifies which steps will be executed:

  • --onlyMerge - merge stream tables into the current view tables

  • --onlyRecon - reconcile stream and recon records

  • --onlySqoop - sqoop data into the recon schema

  • --all - run all steps in sequence (sqoop, recon, and merge)

The script uses the configuration in the "recon.properties" file in the config folder. By default, the run is performed for the previous day. This can be overridden by specifying a specific processing date, or a number of days to go back, when running the action.
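
For example, the full sequence can be run for a specific date, or only the merge can be repeated for the last few days, with invocations along these lines (the environment, source, and date shown are illustrative):

# Run sqoop, recon and merge for the ods source for a specific date in test
./run-invmgr-recon.sh --all -e test -s ods -d 2021-06-30

# Merge only, going back 3 days from today
./run-invmgr-recon.sh --onlyMerge -e test -s ods -n 3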

The list of tables is defined in recon.properties; however, this can be overridden on the command line by passing a specific table or list of tables. The list should be enclosed in double quotes and separated by spaces. Alternatively, you can specify the list in the configuration using the source DB name with a table_list suffix, for example bpm_table_list=work history.
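
For example, to reconcile only the work and history tables from the bpm source (table names are taken from the sample configuration below; the environment is illustrative):

# Override the configured table list on the command line
./run-invmgr-recon.sh --onlyRecon -e dev -s bpm -t "work history"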

# Recon script config

# queue name
queue_name=pipeline
tez_java_opts=-Xmx1640m
container_size=2048

# ods source db url
ods_source_url=jdbc:db2://localhost:51101/DBODSX11
ods_source_user_name=invapp
ods_source_user_alias=db2.invapp.password.alias
ods_source_user_jceks=//hdfs/user/invapp/db2.jceks

# bpm source db url
bpm_source_url=jdbc:db2://localhost:51102/DBBPMX11
bpm_source_user_name=invapp
bpm_source_user_alias=db2.invapp.password.alias
bpm_source_user_jceks=//hdfs/user/invapp/db2.jceks

# target store URL
target_url="jdbc:hive2://local_1:2181,local_2:2181/scratch;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
target_sqoop_schema=recon
beeline_password_file=<location>

# schema
recon_schema=recon
stream_schema=stream
ods_schema=ods
bpm_schema=bpm

# list of tables to be processed
bpm_table_list=work history
ods_table_list=activity document

# target schema is table specific - since 1.0.9
default_target_schema=ods
activity_target_schema=dev
history_target_schema=dev
document_target_schema=dev
work_target_schema=dev

# user to be notified
mail_to=test@inv.com
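
The source database passwords are not kept in the properties file; they are expected to be resolved from the JCEKS keystore referenced by the *_user_alias and *_user_jceks properties. If the alias does not already exist, it is typically created with the Hadoop credential CLI. The provider path below follows the standard jceks://hdfs form and is an assumption based on the sample configuration:

# Create the DB2 password alias in the keystore referenced above (path shown is illustrative)
hadoop credential create db2.invapp.password.alias -provider jceks://hdfs/user/invapp/db2.jceks

# Verify the alias is present
hadoop credential list -provider jceks://hdfs/user/invapp/db2.jceks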

The target schema setting is new in 1.0.9 and is required for the merge step to work.

The target schema for each table should be defined in recon.properties. The syntax is:

{table_name}_target_schema={schema name}

For example, if the activity table is to be merged into the activity table defined in the dev schema, use:

activity_target_schema=dev
