Metadata Configuration

The inventory manager configuration is controlled by a set of YAML files in the config sub-folder of the inventory manager installation. The following sections explain the key files used to configure the data sources, the tables of interest, and the target data store. Other files, such as the data type mapping, should not require changes and are maintained as part of the product.

The following key files are managed by administrators.

datasources.yml

This configuration file describes the data sources that feed source data into the discovery pipeline. Multiple data sources and databases are supported. Refer to the features section for the data sources supported by the current version.

An example datasources.yml configuration file.

datasources:
 - name: datasource1
   driverClass: com.ibm.db2.jcc.DB2Driver
   user: db2inst1
   password: aaaaaaaaaa==:bbbbbbbb/cccccc==
   schema: schema1
   url: jdbc:db2://invdb.localhost:50000/sample
 - name: datasource2
   driverClass: com.ibm.db2.jcc.DB2Driver
   user: db2inst1
   password: aaaaaaaaaa==:bbbbbbbb/cccccc==
   schema: schema2
   url: jdbc:db2://invdb.localhost:50000/sample2

The datasources YAML file supports multiple data sources, each configured to supply source data to the discovery pipeline. Each data source is defined in its own section with the following properties:

- name: <name of the data source. This value is used to match sections across all of the configuration files, so use exactly the same name, including case, in every file>
  driverClass: <the JDBC driver class used to connect to the database; refer to the utilities section>
  user: <the database user used for the connection>
  password: <the encrypted password used to connect; generate it with the encryption tool, refer to the utilities section>
  schema: <the schema where the data resides>
  url: <the JDBC URL used to make the connection>
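Because every property is needed to open a connection and the name is used to match sections across files, it can help to check datasources.yml before starting the pipeline. The sketch below is a minimal, hypothetical check (the function name check_datasources is illustrative, not part of the product). It assumes the file has already been parsed into Python objects, for example with PyYAML's yaml.safe_load, and it assumes names must be unique within the file.

```python
# Hypothetical pre-flight check for a parsed datasources.yml structure.
# Assumes the YAML was already loaded into dicts/lists (e.g. yaml.safe_load);
# requiring unique names is an assumption, not documented product behavior.

REQUIRED_KEYS = ("name", "driverClass", "user", "password", "schema", "url")


def check_datasources(config: dict) -> list[str]:
    """Return a list of problems found in a datasources.yml structure."""
    problems: list[str] = []
    seen: set[str] = set()
    for index, source in enumerate(config.get("datasources") or []):
        # Every property listed in the documentation must have a value.
        missing = [key for key in REQUIRED_KEYS if not source.get(key)]
        if missing:
            problems.append(f"entry {index}: missing {', '.join(missing)}")
        name = source.get("name")
        # Names are compared case-sensitively, matching the cross-file rule.
        if name in seen:
            problems.append(f"entry {index}: duplicate name {name!r}")
        elif name:
            seen.add(name)
    return problems


config = {
    "datasources": [
        {"name": "datasource1", "driverClass": "com.ibm.db2.jcc.DB2Driver",
         "user": "db2inst1", "password": "aaaaaaaaaa==:bbbbbbbb/cccccc==",
         "schema": "schema1", "url": "jdbc:db2://invdb.localhost:50000/sample"},
        # Deliberately broken entry: no password and a reused name.
        {"name": "datasource1", "driverClass": "com.ibm.db2.jcc.DB2Driver",
         "user": "db2inst1", "schema": "schema2",
         "url": "jdbc:db2://invdb.localhost:50000/sample2"},
    ]
}

for problem in check_datasources(config):
    print(problem)
```

With the deliberately broken second entry, the check reports both the missing password and the duplicate name.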

Sample for a DB2-based data source:

- name: datasource2
  driverClass: com.ibm.db2.jcc.DB2Driver
  user: db2inst1
  password: aaaaaaaaaa==:bbbbbbbb/cccccc==
  schema: schema2
  url: jdbc:db2://invdb.localhost:50000/sample
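The DB2 url value follows the form jdbc:db2://host:port/database. As a sketch only, the hypothetical helper below splits such a value into its parts so a typo in the host, port, or database name is easier to spot. It uses only the Python standard library and, as an assumption matching the samples above, requires an explicit port.

```python
from urllib.parse import urlsplit

JDBC_PREFIX = "jdbc:"


def parse_db2_url(url: str) -> tuple[str, int, str]:
    """Split a DB2 JDBC url of the form jdbc:db2://host:port/database."""
    if not url.startswith(JDBC_PREFIX + "db2://"):
        raise ValueError(f"not a DB2 JDBC url: {url!r}")
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary db2:// URL.
    parts = urlsplit(url[len(JDBC_PREFIX):])
    # This sketch insists on an explicit host and port, as in the samples.
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"host and port are required: {url!r}")
    # The path is "/database", optionally followed by ":property=value;" pairs.
    database = parts.path.lstrip("/").split(":", 1)[0]
    if not database:
        raise ValueError(f"database name is required: {url!r}")
    return parts.hostname, parts.port, database


print(parse_db2_url("jdbc:db2://invdb.localhost:50000/sample"))
# prints ('invdb.localhost', 50000, 'sample')
```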

datasource_tables.yml

The datasource_tables YAML file configures the tables from which data is sourced for each data source. It also marks the audit columns used to support inventory tracking.

An example datasource_tables.yml configuration file.

# Database settings.
---
datasources:
- name:  datasource1
  auditCols:
    created: created
    updated: updated
  tables:
  - tableName: tabltwo
  - tableName: tablthree
  - tableName: tablfour
- name: datasource2
  auditCols:
    created: created_dt
    updated: updated_dt
  tables:
  - tableName: "tabla"
  - tableName: "tablb"
  - tableName: "tablc"
    overrideAuditCols:
      created: crtd_dt
      updated: updt_dt

The datasource_tables YAML file supports multiple data sources, each configured to supply source data to the discovery pipeline. Each data source is configured in its own section with the following properties:

- name: <name of the data source; must match the name used in datasources.yml, including case>
  auditCols:
    created: <audit column that identifies the created date/time across all tables for this data source>
    updated: <audit column that identifies the updated date/time across all tables for this data source>
  tables:
  - tableName: <name of table>
  - tableName: <name of table>
  - tableName: <name of table>
    overrideAuditCols: <optional; overrides the data source audit columns for a table that uses a different set of columns to track audit information>
      created: <audit column that identifies the created date/time for this table>
      updated: <audit column that identifies the updated date/time for this table>

Sample of a data source with tables, where a set of audit columns is defined at the data source level and overridden for “tablc”:

- name: datasource2
  auditCols:
    created: created_dt
    updated: updated_dt
  tables:
  - tableName: "tabla"
  - tableName: "tablb"
  - tableName: "tablc"
    overrideAuditCols:
      created: crtd_dt
      updated: updt_dt
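The effect of overrideAuditCols can be illustrated by resolving, for each table, which audit columns apply. The sketch below is hypothetical (effective_audit_columns is not a product function) and assumes the YAML was parsed into dicts. It merges a table's override over the data source defaults, so an override naming only one column would keep the other default; whether the product merges or replaces in that case is an assumption.

```python
# Hypothetical resolution of audit columns per table for one data source
# entry of datasource_tables.yml, already parsed into Python dicts.


def effective_audit_columns(source: dict) -> dict[str, dict[str, str]]:
    """Map each table name to the created/updated audit columns it uses."""
    defaults = source.get("auditCols") or {}
    result: dict[str, dict[str, str]] = {}
    for table in source.get("tables") or []:
        # Table-level overrides win; unspecified keys fall back to defaults.
        columns = {**defaults, **(table.get("overrideAuditCols") or {})}
        result[table["tableName"]] = {
            "created": columns["created"],
            "updated": columns["updated"],
        }
    return result


datasource2 = {
    "name": "datasource2",
    "auditCols": {"created": "created_dt", "updated": "updated_dt"},
    "tables": [
        {"tableName": "tabla"},
        {"tableName": "tablb"},
        {"tableName": "tablc",
         "overrideAuditCols": {"created": "crtd_dt", "updated": "updt_dt"}},
    ],
}

for name, cols in effective_audit_columns(datasource2).items():
    print(name, cols["created"], cols["updated"])
```

For the sample above, tabla and tablb use created_dt and updated_dt, while tablc uses crtd_dt and updt_dt.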

target_datastore.yml

The target_datastore YAML file configures the discovery HDFS data stores. It includes the Hive configuration and the HDFS locations used for the data inventory and to support the data merge processes.

An example target_datastore.yml configuration file.

# Target DataStore settings.
hadoopHome: /opt/inv/current/hadoop-client/
hiveHome: /opt/inv/current/hive-server2/
connectHdfsPrincipal: admin
hadoopConfDir: /opt/inv/current/hadoop-client/conf/
hdfsUrl: hdfs://invnn.invariant.localhost:8020/
hiveMetastoreUris: thrift://invnmeta.localhost:9083
hiveWarehouseBaseDir: /apps/hive/warehouse/
hdfsAuthKerberosConfig: false
hdfsNamenodePrincipal:
connectHdfsKeytab:
targetStores:
 - name: datasource1
   incrementalStore : stream
   reconStore : recon.ods
   currStore : curr.ods
   incrTableExtn : incr
 - name: datasource2
   incrementalStore : stream
   reconStore : recon.mdm
   currStore : curr.mdm
   incrTableExtn : incr

The target_datastore configuration consists of two sections. The first section contains the HDFS connectivity details, such as the home directories, URLs, and user authentication details. This information should be provided by the Invariant platform administrator; see also the installation instructions.

hadoopHome: <Hadoop directory where the client JARs are present>
hiveHome: <Hive directory where the client JARs are present>
connectHdfsPrincipal: <admin account used to connect>
hadoopConfDir: <Hadoop directory where the configuration files are present>
hdfsUrl: <NameNode URL>
hiveMetastoreUris: <Hive metastore URI>
hiveWarehouseBaseDir: <base directory where the incremental tables are written>
hdfsAuthKerberosConfig: <Kerberos login flag>
hdfsNamenodePrincipal: <Kerberos login principal>
connectHdfsKeytab: <Kerberos keytab>
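In the example, hdfsAuthKerberosConfig is false and the principal and keytab are left empty. As an assumption inferred from that sample, the hypothetical check below treats the principal and keytab as required only when the Kerberos flag is true. It operates on the first section after parsing with a YAML loader such as PyYAML's yaml.safe_load, where false becomes the Python value False and an empty value becomes None.

```python
# Hypothetical consistency check for the Kerberos fields of
# target_datastore.yml; the "required only when true" rule is an assumption.


def check_kerberos(settings: dict) -> list[str]:
    """Report Kerberos settings that look incomplete or malformed."""
    problems: list[str] = []
    flag = settings.get("hdfsAuthKerberosConfig")
    if flag is True:
        for key in ("hdfsNamenodePrincipal", "connectHdfsKeytab"):
            if not settings.get(key):
                problems.append(f"{key} must be set when hdfsAuthKerberosConfig is true")
    elif flag is not False:
        problems.append("hdfsAuthKerberosConfig should be true or false")
    return problems


# Values as they appear in the example file (empty YAML values load as None).
sample = {
    "hdfsAuthKerberosConfig": False,
    "hdfsNamenodePrincipal": None,
    "connectHdfsKeytab": None,
}
# Hypothetical Kerberos-enabled variant with a made-up principal and no keytab.
kerberos = dict(sample, hdfsAuthKerberosConfig=True,
                hdfsNamenodePrincipal="nn/host@EXAMPLE.COM")

print(check_kerberos(sample))
print(check_kerberos(kerberos))
```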

The second section specifies where the data from each data source is persisted in the HDFS store. Configure one entry for each data source, using the same name as in the other configuration files.

targetStores:
 - name: <datasource name>
   incrementalStore : <the schema where incremental/stream data is written>
   reconStore : <the schema where recon data is written daily>
   currStore : <the schema where curr view data is written>
   incrTableExtn : <the incr table extension>
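Since the name field is what links a data source across datasources.yml, datasource_tables.yml, and target_datastore.yml, a name that is missing or spelled with different case in one file breaks the match. The hypothetical sketch below compares the names from the three parsed files and reports, per file, the names it lacks. The comparison is exact, so names that differ only in case are reported.

```python
# Hypothetical cross-file check: every data source name should appear in
# all three configuration files, with identical spelling and case.


def unmatched_names(datasources: dict, tables: dict, targets: dict) -> dict[str, set[str]]:
    """Return, per file, the data source names missing from that file."""
    names = {
        "datasources.yml": {s["name"] for s in datasources.get("datasources") or []},
        "datasource_tables.yml": {s["name"] for s in tables.get("datasources") or []},
        "target_datastore.yml": {s["name"] for s in targets.get("targetStores") or []},
    }
    all_names = set().union(*names.values())
    return {
        file: all_names - present
        for file, present in names.items()
        if all_names - present
    }


ds = {"datasources": [{"name": "datasource1"}, {"name": "datasource2"}]}
tbl = {"datasources": [{"name": "datasource1"}, {"name": "datasource2"}]}
# "Datasource2" differs from "datasource2" only in case, so it does not match.
tgt_bad = {"targetStores": [{"name": "datasource1"}, {"name": "Datasource2"}]}
tgt_ok = {"targetStores": [{"name": "datasource1"}, {"name": "datasource2"}]}

print(unmatched_names(ds, tbl, tgt_bad))
print(unmatched_names(ds, tbl, tgt_ok))
```

An empty result means every data source name appears in all three files.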
