Configuration
Pipeline System Configurations
The data pipeline also includes a set of definition files that provide additional tuning options and hold environment-specific details:
avrotypemappingdef.yml
orctypemappingdef.yml
brkadapter.properties
inbound.topics.brkadapter.properties
invariant-hdfs-adapter.yml
Mapping configuration
The data pipeline translates data types between source and target systems according to the HDFS file format in use. The supported sources are DB2 and Oracle, and the supported target HDFS formats are Avro and ORC.
The avrotypemappingdef.yml file maps source data types to Avro data types.
The orctypemappingdef.yml file maps source data types to ORC data types.
In this definition, the CHAR, VARCHAR, TIMESTAMP, DATE, and XML data types from a DB2 source are mapped to Avro STRING.
avrotypemappingdef.yml
dataserdetype: AVRO
mapping:
  - STRING:                                        # Target (Avro) data type
      - !dbtypeMapping
        dbtype: DB2                                # Source DBMS
        type: [CHAR,VARCHAR,TIMESTAMP,DATE,XML]    # Source DBMS data types
  - DECIMAL:
      - !dbtypeMapping
        dbtype: DB2
        type: [DECIMAL]
  - INT:
      - !dbtypeMapping
        dbtype: DB2
        type: [SMALLINT,INTEGER]
  - BIGINT:
      - !dbtypeMapping
        dbtype: DB2
        type: [BIGINT,LONG]
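To make the direction of the mapping concrete, the following Python sketch mirrors the Avro definition above as a plain dictionary and inverts it into a lookup from source type to target type. The dictionary layout and the function names `build_lookup` and `resolve_target_type` are assumptions made for illustration; they are not part of the pipeline and say nothing about how the pipeline loads the YAML file internally.

```python
# Hand-written mirror of the Avro mapping shown above:
# target Avro type -> source DBMS -> list of source data types.
# This layout is an assumption for the sketch, not the pipeline's own model.
AVRO_MAPPING = {
    "STRING": {"DB2": ["CHAR", "VARCHAR", "TIMESTAMP", "DATE", "XML"]},
    "DECIMAL": {"DB2": ["DECIMAL"]},
    "INT": {"DB2": ["SMALLINT", "INTEGER"]},
    "BIGINT": {"DB2": ["BIGINT", "LONG"]},
}


def build_lookup(mapping):
    """Invert the mapping into {(dbtype, source_type): target_type}."""
    lookup = {}
    for target_type, sources in mapping.items():
        for dbtype, source_types in sources.items():
            for source_type in source_types:
                lookup[(dbtype, source_type)] = target_type
    return lookup


def resolve_target_type(lookup, dbtype, source_type):
    """Return the target type for a source column type, or raise if unmapped."""
    key = (dbtype.upper(), source_type.upper())
    if key not in lookup:
        raise KeyError(f"no mapping for {dbtype} type {source_type}")
    return lookup[key]


LOOKUP = build_lookup(AVRO_MAPPING)
print(resolve_target_type(LOOKUP, "DB2", "TIMESTAMP"))  # STRING
print(resolve_target_type(LOOKUP, "DB2", "SMALLINT"))   # INT
```

Raising on an unmapped type is a choice made for this sketch; how the pipeline itself treats a source type that has no entry is not stated here.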
In this definition, the CHAR, VARCHAR, and XML data types from a DB2 source are mapped to ORC STRING.
orctypemappingdef.yml
Broker Configuration
brkadapter.properties defines the broker properties used to read the database events streamed from the DBMS. These include the list of brokers and the topic and group information used to read from them.
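As a rough illustration of how such a file could be checked before the adapter starts, the Python sketch below parses simple key=value lines and reports which expected entries are missing. The key names `brokers`, `topic`, and `group` and the sample values are hypothetical placeholders; the actual property names used in brkadapter.properties are not given in this section.

```python
# Hypothetical key names; the real property names are not documented here.
REQUIRED_KEYS = ("brokers", "topic", "group")


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and '#' or '!' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that contain no '=' separator
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not props.get(key)]


# Hypothetical file content, for illustration only.
SAMPLE = """
# broker settings
brokers=broker1:9092,broker2:9092
topic=db.events
group=hdfs-adapter
"""

props = parse_properties(SAMPLE)
print(missing_keys(props))  # []
```

The parser handles only the plain key=value form; line continuations and escape sequences allowed by the full properties format are left out to keep the sketch short.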
Broker Topics
The inbound.topics.brkadapter.properties file lists the topics from which data is consumed. The topics must be comma-separated.
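The comma-separated format can be read as in the following minimal Python sketch. The function name `parse_topic_list` and the topic names are made up for the example, and trimming whitespace and dropping empty entries are assumptions about lenient parsing rather than documented behavior of the pipeline.

```python
def parse_topic_list(value):
    """Split a comma-separated topic list, trimming whitespace and dropping empty entries."""
    return [topic.strip() for topic in value.split(",") if topic.strip()]


# Hypothetical topic names, for illustration only.
topics = parse_topic_list("orders.events, customers.events,inventory.events")
print(topics)
```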
HDFS Adapter Configuration
invariant-hdfs-adapter.yml configures the Hadoop environment variables, the target schema, and the credentials used to interact with HDFS.