
Schema Registry

In the data pipeline, a schema defines the message metadata: the structure and types of the messages exchanged between systems. A schema registry acts as a central repository of this metadata, allowing applications to discover and decipher messages. The registry can also provide interfaces to serialize and deserialize messages.

A schema includes metadata such as:

  • name - Unique name of the schema.

  • description - Description of the schema.

  • type - The schema type, e.g. Avro or JSON.

  • compatibility - The compatibility policy between versions of the schema.
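
For illustration, this is how the name and type metadata come together when registering an Avro schema over the registry's REST interface. This is a sketch assuming a Confluent-compatible REST API on the default URL used later in this page; the subject name and fields are made up for the example.

# assumes a Confluent-compatible registry REST API; subject and fields are illustrative
curl -X POST http://localhost:8081/subjects/test_activity-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\": \"record\", \"name\": \"TestActivity\", \"fields\": [{\"name\": \"id\", \"type\": \"string\"}]}"}'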

Run Schema Registry

To launch the schema registry in the background, use the command below. Change the <env type> parameter to match your environment if you want to run multiple instances on the same server.

./run_schema_registry.sh -e <env type> --start

To start the schema registry in the foreground, use the command

./run_schema_registry.sh -e <env type> -m interactive --start 

To stop the schema registry, use the command

./run_schema_registry.sh -e <env type> --stop
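
After starting the registry, you can confirm that it is up by querying its REST endpoint. The check below assumes a Confluent-compatible REST API listening on the default URL used later in this page; it returns the list of registered subjects.

# assumes a Confluent-compatible registry REST API
curl http://localhost:8081/subjects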

Kafka Consumer Integration

To use the registry with a Kafka consumer, set the consumer configuration as follows:

bootstrap.servers=localhost:9092
topic.name=test_activity
schemaregistry.url=http://localhost:8081/
key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
value.deserializer=io.invariant.kafka.deserializer.KafkaAvroDeserializer
group.id=InvKafkaAvroConsumer

A console consumer is provided to extract the Avro messages in a topic. To start the consumer with the deserializer, use:

./bin/kafka-avro-consumer.sh $*

Users can use the bundled registry or the registry from Confluent or Hortonworks; in that case, update schemaregistry.url to point to the appropriate registry.

You can also plug in custom serializers and deserializers, for example to use the SerDes provided by Confluent; in that case, copy the appropriate client JARs into the classpath.
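
If you do switch to Confluent's registry and SerDes, Confluent Platform's own console consumer can read the same topic. A sketch, assuming Confluent Platform is installed and its bin directory is on the path:

# assumes Confluent Platform's kafka-avro-console-consumer is available
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 \
  --topic test_activity \
  --property schema.registry.url=http://localhost:8081 \
  --from-beginning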
