Generating Artifacts

Inventory manager generates various run-time artifacts that are used by data pipeline services and merge/recon processes.

 ./generateArtifacts.sh [ generatePPLXML         | 
                          generateDDL            | 
                          generateSqoopImport    |
                          generateAutoReconHQL   | 
                          generateStreamMergeHQL | 
                          generateAll 
                        ]

The options are:

  • generatePPLXML - Streaming Pipeline mapping XML

  • generateDDL - DDL for stream, recon and current view tables

  • generateSqoopImport - Columns and predicate for sqoop command

  • generateAutoReconHQL - HQL to reconcile data in stream using recon tables

  • generateStreamMergeHQL - HQL to merge stream data into current view

  • generateAll - Generate all the above artifacts

The user can also pass an optional flag "-p" to indicate whether the table is streamed using event publication or change data capture (CDC). The default is CDC.

Another optional flag "-f" can be passed to indicate whether the cached schema version should be updated. The valid values are Y and N, and the default is Y. Note: it is more efficient to use the cached values if there are no structural changes in the source tables.
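As a rough illustration of how the option and the "-f" flag fit together, the sketch below validates both and prints the command it would run, without running it. The function name build_generate_cmd is hypothetical, and placing "-f" and its value after the option is an assumption, since the argument order is not stated here; check the script's own usage output before relying on it. The "-p" flag is left out because its accepted values are not listed above.

```shell
#!/bin/sh
# Hypothetical helper: validates the arguments, then prints (does not run)
# the generateArtifacts.sh command line.
build_generate_cmd() {
    option="$1"
    update_cache="${2:-Y}"   # -f defaults to Y, as described above

    case "$option" in
        generatePPLXML|generateDDL|generateSqoopImport|generateAutoReconHQL|generateStreamMergeHQL|generateAll) ;;
        *) echo "unknown option: $option" >&2; return 1 ;;
    esac

    case "$update_cache" in
        Y|N) ;;
        *) echo "-f must be Y or N, got: $update_cache" >&2; return 1 ;;
    esac

    # Assumed argument order: option first, then the flag and its value.
    echo "./generateArtifacts.sh $option -f $update_cache"
}

build_generate_cmd generateDDL N
```

Printing the command first makes it easy to review before it is pasted into a job or a scheduler entry.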

Please note that from version 1.0.9 onwards, the generated DDL for CDC tables contains the audit timestamp column (inv_updtd_dtm) as the first field. The generated merge HQL accounts for this, and its columns are ordered accordingly. Event publication (EP) tables may have this audit timestamp column at the end, and the HQL is generated accordingly. If the generated column order does not match the actual table, developers can adjust the generated HQL and check it into source control.
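The two column layouts described above can be pictured with a small sketch. This is not the generator's code: the function name column_order and the sample columns id and name are made up for illustration, and the EP layout shown is only the "audit column at the end" case mentioned above.

```shell
#!/bin/sh
# Illustrative only: shows where the audit column sits for each table type.
# Usage: column_order CDC|EP col1 col2 ...
column_order() {
    mode="$1"; shift
    audit_col="inv_updtd_dtm"
    cols=""
    for c in "$@"; do
        cols="${cols:+$cols,}$c"   # join the remaining columns with commas
    done
    case "$mode" in
        CDC) echo "${audit_col}${cols:+,$cols}" ;;   # audit column first
        EP)  echo "${cols:+$cols,}${audit_col}" ;;   # audit column last
        *)   echo "mode must be CDC or EP" >&2; return 1 ;;
    esac
}

column_order CDC id name   # prints inv_updtd_dtm,id,name
column_order EP id name    # prints id,name,inv_updtd_dtm
```

Comparing such a list against the actual table definition is one way to spot the mismatch case before adjusting the generated HQL by hand.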
