Schema Registry

In a data pipeline, a schema defines the message metadata: the structure and data types of the messages exchanged between systems. A schema registry acts as a central repository for this metadata, allowing applications to discover schemas and decode the messages that use them. The registry can also provide interfaces to serialize and deserialize messages.

Each schema includes metadata such as:

  • name - Unique name of the schema.

  • description - Description of the schema.

  • type - The schema format, e.g., Avro or JSON.

  • compatibility - The compatibility requirements between different versions of the schema.
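To make the fields above concrete, here is a minimal Java sketch that models schema metadata as an immutable value type. The names SchemaMetadataExample, SchemaType and Compatibility, and the compatibility levels (modelled on those common in Kafka schema registries), are illustrative assumptions rather than the registry's own API; check your registry's documentation for the compatibility settings it actually supports.

```java
import java.util.Objects;

// Hypothetical model of the schema metadata described above; not the registry's actual API.
final class SchemaMetadataExample {

    // Schema formats mentioned in the text (Avro, JSON); extend as needed.
    enum SchemaType { AVRO, JSON }

    // Assumed compatibility levels, modelled on those common in Kafka schema registries.
    enum Compatibility { NONE, BACKWARD, FORWARD, FULL }

    // Immutable value holding the four metadata fields.
    record SchemaMetadata(String name, String description, SchemaType type, Compatibility compatibility) {
        SchemaMetadata {
            // The name is required because it uniquely identifies the schema.
            Objects.requireNonNull(name, "name");
            if (name.isBlank()) {
                throw new IllegalArgumentException("name must not be blank");
            }
            Objects.requireNonNull(type, "type");
            Objects.requireNonNull(compatibility, "compatibility");
            // The description is optional; normalise null to an empty string.
            description = description == null ? "" : description;
        }
    }

    public static void main(String[] args) {
        SchemaMetadata activity = new SchemaMetadata(
                "test_activity", "User activity events", SchemaType.AVRO, Compatibility.BACKWARD);
        System.out.println(activity);
    }
}
```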

Run Schema Registry

To launch the schema registry in the background, use the command below. Set the <env type> parameter to match your environment; if you want to run multiple instances on the same server, give each instance its own env type.

./run_schema_registry.sh -e <env type> --start

To start the schema registry in the foreground (interactive mode), use:

./run_schema_registry.sh -e <env type> -m interactive --start 

To stop the schema registry, use:

./run_schema_registry.sh -e <env type> --stop
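If you manage the registry from a Java tool, such as an integration-test harness, the commands above can be assembled programmatically. This sketch only builds the argument lists from the flags documented above; RegistryCommands and the env types dev and qa are hypothetical, and nothing is launched unless you pass a list to ProcessBuilder yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the run_schema_registry.sh command lines shown above.
final class RegistryCommands {

    private static final String SCRIPT = "./run_schema_registry.sh";

    // Background start by default; foreground when interactive is true (-m interactive).
    static List<String> start(String envType, boolean interactive) {
        List<String> cmd = new ArrayList<>(List.of(SCRIPT, "-e", envType));
        if (interactive) {
            cmd.add("-m");
            cmd.add("interactive");
        }
        cmd.add("--start");
        return cmd;
    }

    // Stop command for the given environment type.
    static List<String> stop(String envType) {
        return List.of(SCRIPT, "-e", envType, "--stop");
    }

    public static void main(String[] args) {
        // Two instances on the same server, each with its own (hypothetical) env type.
        System.out.println(String.join(" ", start("dev", false)));
        System.out.println(String.join(" ", start("qa", false)));
        // To actually launch one: new ProcessBuilder(start("dev", false)).inheritIO().start();
    }
}
```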

Kafka Consumer Integration

To use the registry with a Kafka consumer, set the consumer configuration as follows:

bootstrap.servers=localhost:9092
topic.name=test_activity
schemaregistry.url=http://localhost:8081/
key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
value.deserializer=io.invariant.kafka.deserializer.KafkaAvroDeserializer
group.id=InvKafkaAvroConsumer
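Four of the settings above (bootstrap.servers, group.id and the two deserializers) are standard Kafka consumer properties, while topic.name and schemaregistry.url appear to be specific to the bundled consumer and deserializer. The sketch below loads the configuration with java.util.Properties and reports any missing keys before a consumer is created. The class name and the REQUIRED_KEYS list are assumptions based on the example above, and creating the actual consumer is left out so the example needs only the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch: load the consumer configuration shown above and check that required keys are set.
final class ConsumerConfigCheck {

    // Keys taken from the example configuration; adjust to your deployment.
    static final List<String> REQUIRED_KEYS = List.of(
            "bootstrap.servers", "topic.name", "schemaregistry.url",
            "key.deserializer", "value.deserializer", "group.id");

    // Parses properties-format text into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // StringReader performs no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the required keys that are missing or blank, in REQUIRED_KEYS order.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "bootstrap.servers=localhost:9092",
                "topic.name=test_activity",
                "schemaregistry.url=http://localhost:8081/",
                "key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer",
                "value.deserializer=io.invariant.kafka.deserializer.KafkaAvroDeserializer",
                "group.id=InvKafkaAvroConsumer");
        System.out.println("Missing keys: " + missingKeys(load(config)));
    }
}
```

With kafka-clients on the classpath, a Properties object that passes this check can then be passed to new KafkaConsumer<>(props).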

A console consumer is provided for reading the Avro messages in a topic. To start the consumer with the deserializer, use:

./bin/kafka-avro-consumer.sh $*

You can use the bundled schema registry or a registry from Confluent or Hortonworks. In the latter case, set schemaregistry.url to point to that registry.
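Switching to an external registry only changes schemaregistry.url; the other consumer settings stay the same. Here is a minimal sketch of deriving such a configuration, assuming the hypothetical helper name RegistryUrlOverride and the placeholder host registry.example.com. It copies the base settings instead of mutating them and rejects values that are not absolute http(s) URLs.

```java
import java.net.URI;
import java.util.Properties;

// Hypothetical helper: derive a consumer config that points at a different schema registry.
final class RegistryUrlOverride {

    static Properties withRegistryUrl(Properties base, String registryUrl) {
        URI uri = URI.create(registryUrl);
        String scheme = uri.getScheme();
        // Accept only absolute http(s) URLs with a host, e.g. http://host:8081/
        if (scheme == null || !(scheme.equals("http") || scheme.equals("https")) || uri.getHost() == null) {
            throw new IllegalArgumentException("not an http(s) URL: " + registryUrl);
        }
        Properties copy = new Properties();
        copy.putAll(base); // keep every other consumer setting unchanged
        copy.setProperty("schemaregistry.url", registryUrl);
        return copy;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("bootstrap.servers", "localhost:9092");
        base.setProperty("schemaregistry.url", "http://localhost:8081/");
        Properties external = withRegistryUrl(base, "http://registry.example.com:8081/");
        System.out.println(external.getProperty("schemaregistry.url"));
    }
}
```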

You can also plug in custom serializers and deserializers, for example if you want to use the SerDes provided by Confluent. In that case, copy the appropriate client JARs onto the classpath.
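When you switch to Confluent's SerDes, both the value deserializer class and the registry property name change: Confluent's Avro deserializer is io.confluent.kafka.serializers.KafkaAvroDeserializer, and it reads the registry address from schema.registry.url. The sketch below builds such a configuration with plain java.util.Properties. The helper name is hypothetical, and you should confirm the property names against the Confluent documentation for your client version.

```java
import java.util.Properties;

// Sketch: consumer settings for Confluent's Avro deserializer instead of the bundled one.
// At runtime, Confluent's kafka-avro-serializer JAR (and its dependencies) must be on the classpath.
final class ConfluentSerdeConfig {

    static Properties confluentAvroConsumer(String bootstrapServers, String registryUrl, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Confluent's deserializer class and its registry property
        // (schema.registry.url, not the schemaregistry.url key used by the bundled deserializer).
        props.setProperty("value.deserializer",
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.setProperty("schema.registry.url", registryUrl);
        return props;
    }

    public static void main(String[] args) {
        Properties props = confluentAvroConsumer(
                "localhost:9092", "http://localhost:8081", "InvKafkaAvroConsumer");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```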
