Management API

The Content Insight Management API is used to add, update, delete, and search documents in the Index Store.

REST Services for managing document indexes

Insert Document

PUT https://api.invariant.io/v1/document/index

This endpoint inserts a document and returns the document ID on a successful insert.

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique ID of the document to be uploaded |
| file | object | type = file; /path/to/file/to/be/uploaded |
| raw | string | JSON metadata as name-value pairs, e.g. {"id": "RAW-C00003", "documenttitle": "This is first title C", "dln": "DD1", "tax_year": "2019", "cust_type": "TP_TYPE_1", "classified_cd": "CLASS_CD_1", "conf_code": "CONF_CD_1", "mimetype": "application/json", "form_type_cd": "FORM_TYPE_CD_1", "ica_id": "CAPTURE_1", "notice_id": "NOTICE_1", "scan_id": "SCAN_PAGE_1", "scan_date": "2017-02-19", "effective_date": "2017-02-20", "expiration_date": "2021-02-19", "last_updated_date": "2018-12-10", "candelete": "Y", "document_class": "DOC_CLASS_1"} |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| mediaType | string | Format of the returned result: XML or JSON |

Headers

| Name | Type | Description |
| --- | --- | --- |
| authentication | string | Authentication token identifying the caller that inserts the document |

{
    "result": "success",
    "message": "index complete",
    "filename": "mqdecode.pdf",
    "_id": "6790"
}

{
    "_id": null,
    "filename": "somefile.pdf",
    "result": "fail",
    "message": "Connection refused: no further information"
}
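
As a rough illustration of the insert call, the sketch below builds (but does not send) a PUT request using only the Python standard library. It assumes that id, file, and raw are sent as multipart/form-data fields and that mediaType accepts the value "json"; neither detail is confirmed by this page. The function name and BASE_URL constant are hypothetical.

```python
import json
import uuid
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.invariant.io/v1"


def build_insert_request(doc_id, filename, file_bytes, metadata, token, media_type="json"):
    """Build (but do not send) a PUT request for /document/index.

    Assumption: id, file and raw travel as multipart/form-data fields.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields: the document ID and the metadata serialized as a JSON string.
    for name, value in (("id", doc_id), ("raw", json.dumps(metadata))):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # File field carrying the document content.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    query = urlencode({"mediaType": media_type})  # "json" is an assumed value
    return Request(
        f"{BASE_URL}/document/index?{query}",
        data=b"".join(parts),
        method="PUT",
        headers={
            "authentication": token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call; the response body would then be one of the JSON objects shown above.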

Update Document

POST https://api.invariant.io/v1/document/update

This endpoint updates an existing document.

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| id | string | ID of the document to be updated |
| file | object | type = file; /path/to/file/that/is/updated/ |
| raw | string | JSON metadata as name-value pairs, e.g. {"id": "RAW-C00004", "doctitle": "This is first title C Updated", "dln": "DD3", "tax_year": "2033", "tp_type": "TP_TYPE_3", "conf_code": "CONF_CD_3", "mimetype": "application/json", "form_type": "FORM_3", "ica_id": "CAPTURE_3", "scan_id": "SCAN_PAGE_3", "scan_date": "2017-02-13", "effective_date": "2017-02-23", "expiration_date": "2021-02-13", "last_updated_date": "2018-12-13", "candelete": "N", "document_class": "DOC_CLASS_3"} |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| mediaType | string | Format of the returned result: XML or JSON |

Headers

| Name | Type | Description |
| --- | --- | --- |
| authentication | string | Authentication token to update a document |

{
    "result": "success",
    "message": "updated",
    "filename": "UpdatedFile.pdf",
    "_id": "C-1000020"
}
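
The insert and update responses on this page share the fields result, message, filename, and _id. A minimal sketch of how a client might interpret them follows; the function and exception names are hypothetical, and treating any result other than "success" as a failure is an assumption based only on the two sample values shown.

```python
import json


class DocumentOperationError(Exception):
    """Raised when the response does not report "result": "success"."""


def document_id_from_response(body):
    """Return the "_id" of an insert/update response, or raise on failure."""
    payload = json.loads(body)
    if payload.get("result") != "success":
        # Failure responses carry the reason in "message" and a null "_id".
        raise DocumentOperationError(payload.get("message", "unknown error"))
    return payload["_id"]
```

For the update response above this returns "C-1000020"; for the failed insert sample it raises an error carrying the "Connection refused" message.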

Delete Document by ID

DELETE https://api.invariant.io/v1/document/:id

This endpoint deletes a document by ID.

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| id | string | ID of the document to delete |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| mediaType | string | Format of the returned result: XML or JSON |

Headers

| Name | Type | Description |
| --- | --- | --- |
| authentication | string | Authentication token to delete the document |

{
    "found": true,
    "result": "deleted",
    "_id": "C-1000037"
}
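
The delete call places the document ID in the URL path, so the ID should be percent-encoded before it is substituted for :id. The sketch below builds such a request without sending it. The helper names and the "json" value for mediaType are assumptions, not part of the documented API.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.invariant.io/v1"


def build_delete_request(doc_id, token, media_type="json"):
    """Build (but do not send) a DELETE request for /document/:id."""
    # Encode every reserved character so the ID stays within one path segment.
    path_id = quote(doc_id, safe="")
    query = urlencode({"mediaType": media_type})  # "json" is an assumed value
    return Request(
        f"{BASE_URL}/document/{path_id}?{query}",
        method="DELETE",
        headers={"authentication": token},
    )


def was_deleted(payload):
    """Check a parsed delete response against the sample shown above."""
    return payload.get("found") is True and payload.get("result") == "deleted"
```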

Search By ID

GET https://api.invariant.io/v1/search/:id

This endpoint returns a document by ID.

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| id | string | ID of the document to locate |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| mediaType | string | Format of the returned result: XML or JSON |

Headers

| Name | Type | Description |
| --- | --- | --- |
| authentication | string | Authentication token identifying who is requesting the document |

{
    "_id": "6790",
    "_version": "1",
    "found": "true",
    "_source": {
        "author": "Administrator",
        "title": "Microsoft PowerPoint - WSTE-WMQDataConversion1",
        "date": "1189704111000",
        "format": "application/pdf; version=1.3",
        "raw": {
            "docno": "dg78878",
            "filename": "mqdecode.pdf",
            "filesize": "413256",
            "indexing_date": "1518715553333",
            "checksum": "2d95a41b72440d86d89f4c18e5eff2c0"
        }
    }
}

{
    "message": "Document not found."
}
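
In the sample search response every value, including the version, file size, and dates, arrives as a string, and a missing document is reported with only a message field. The sketch below shows one way a client might normalize a hit. It assumes that indexing_date is a Unix timestamp in milliseconds, which the sample value suggests but this page does not state; the function name is hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_search_hit(body):
    """Return a normalized dict for a found document, or None if not found."""
    payload = json.loads(body)
    if "_source" not in payload:
        # Not-found responses contain only {"message": "Document not found."}.
        return None
    source = payload["_source"]
    raw = source.get("raw", {})
    hit = {
        "id": payload["_id"],
        "version": int(payload["_version"]),
        "title": source.get("title"),
        "filename": raw.get("filename"),
        "filesize": None,
        "indexed_at": None,
    }
    if "filesize" in raw:
        hit["filesize"] = int(raw["filesize"])
    if "indexing_date" in raw:
        # Assumption: the value is epoch time in milliseconds.
        seconds = int(raw["indexing_date"]) / 1000
        hit["indexed_at"] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return hit
```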

Full text document search with metadata filters

GET https://api.invariant.io/v1/search/content

Path Parameters

| Name | Type | Description |
| --- | --- | --- |
| dln | string | e.g. z89897878. The following string fields are supported as filter parameters: id, documenttitle, dln, tax_year, tp_type, classified_cd, conf_code, form_type_cd, document_class |
| effective_date | string | e.g. 2010-06-30,2012-01-30. The date format is yyyy-MM-dd. The first date is the start date and the second date, after the comma, is the end date. Both dates are inclusive, so the search matches effective_date >= 2010-06-30 and effective_date <= 2012-01-30. The following fields can be used in a date range query: effective_date, last_update_date, expiration_date, scan_date |
| search_text | string | Text to be searched in the content database |

Query Parameters

| Name | Type | Description |
| --- | --- | --- |
| mediaType | string | Format of the returned result: XML or JSON |

Headers

| Name | Type | Description |
| --- | --- | --- |
| authentication | string | Authentication token identifying who is requesting the documents |

{
    "_id": "6790",
    "_version": "1",
    "found": "true",
    "_source": {
        "author": "Administrator",
        "title": "Microsoft PowerPoint - WSTE-WMQDataConversion1",
        "date": "1189704111000",
        "format": "application/pdf; version=1.3",
        "raw": {
            "docno": "dg78878",
            "filename": "mqdecode.pdf",
            "filesize": "413256",
            "indexing_date": "1518715553333",
            "checksum": "2d95a41b72440d86d89f4c18e5eff2c0"
        }
    }
}

{
    "message": "Document not found."
}
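
To make the filter and date-range conventions concrete, the sketch below assembles a full-text search request with one metadata filter and one inclusive date range in the start,end form of the 2010-06-30,2012-01-30 example. Because the endpoint URL has no placeholders, the sketch assumes that the filters are passed as query-string parameters; this, the helper names, and the "json" value for mediaType are assumptions rather than documented behavior.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.invariant.io/v1"

# Fields listed on this page as usable in a date range query.
DATE_FIELDS = {"effective_date", "last_update_date", "expiration_date", "scan_date"}


def date_range(start, end):
    """Format an inclusive range as "start,end" with ISO dates."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return f"{start.isoformat()},{end.isoformat()}"


def build_content_search_request(search_text, token, filters=None,
                                 date_ranges=None, media_type="json"):
    """Build (but do not send) a GET request for /search/content."""
    params = {"search_text": search_text, "mediaType": media_type}
    # String filters such as dln or tax_year are passed through unchanged.
    params.update(filters or {})
    for field, (start, end) in (date_ranges or {}).items():
        if field not in DATE_FIELDS:
            raise ValueError(f"{field} is not a supported date range field")
        params[field] = date_range(start, end)
    # urlencode percent-encodes the comma in each range as %2C.
    return Request(
        f"{BASE_URL}/search/content?{urlencode(params)}",
        headers={"authentication": token},
    )
```

For example, passing filters={"dln": "z89897878"} and date_ranges={"effective_date": (date(2010, 6, 30), date(2012, 1, 30))} yields a GET request whose query string carries both conditions alongside search_text.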

Last updated 6 years ago