Feed/Category Indexing on Metadata (Properties)

Description

As a user, I would like to be able to:

  • add properties in the form of key:value to a Feed and/or a Feed Category

  • have these properties indexed as feed metadata, so that a feed can be found by a given property (as it can by feed description, title, or tags)

  • have these properties indexed with feed data, so that data can be searched by one or more specific properties

  • search them from Kylo's Global Search (or from externally connected tools such as Kibana)

We can think of two options for implementing data search based on category and feed tags, each with different pros and cons. Both are reasonably doable. The main difference is whether we want to permanently associate category and feed tags with indexed data (i.e. keep historically accurate tags on indexed data), or whether we do not care about historical tags and want the tags to live only on categories and feeds, so that when a tag is updated, a search for it also returns data that has already been indexed.

  • Option 1 – Add category and feed tags to data being indexed

    • Description

      • This is the simplest and quickest option to implement: it requires minimal changes to Kylo and to Nifi flows, is supported by the Kylo UI out of the box, and works on both ES and Solr. It injects category and feed tags directly into every indexed data record.

    • Pros

      • No change required to Kylo UI to support this option because each data record will have category and feed tags embedded in it

      • Indexed data will keep a historical record of the category and feed tags that applied to it at ingest time

      • Minor change to Nifi flow to send category and feed tags with every data record to ES/Solr

      • Minor change to Kylo to make category and feed tags available to Nifi flows

      • Works the same way for both ES and Solr

    • Cons

      • Changes to category and feed tags will not be reflected in already indexed data unless that data is re-indexed

      • Increased disk space requirement because indexed data will have category and feed tags indexed with each data record
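A minimal sketch of what Option 1 amounts to, assuming each record is a flat dictionary before it is sent to ES/Solr. The function name `embed_tags` and the field names `category_tags` and `feed_tags` are hypothetical and only illustrate the idea; they are not Kylo or Nifi identifiers.

```python
from typing import Any, Dict


def embed_tags(record: Dict[str, Any],
               category_tags: Dict[str, str],
               feed_tags: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the record with the current tags embedded in it."""
    doc = dict(record)
    # Copy the tags so later edits to the category/feed do not leak into this document.
    doc["category_tags"] = dict(category_tags)  # hypothetical field name
    doc["feed_tags"] = dict(feed_tags)          # hypothetical field name
    return doc


category_tags = {"department": "finance"}
feed_tags = {"sensitivity": "internal"}

indexed = embed_tags({"id": 1, "amount": 42}, category_tags, feed_tags)

# A later tag change is not reflected in the already indexed document,
# which is both the "historical record" pro and the "needs re-indexing" con.
feed_tags["sensitivity"] = "restricted"
print(indexed["feed_tags"]["sensitivity"])
```

The copy made at ingest time is also why this option costs extra disk space: every record carries its own set of tags.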

  • Option 2 – Parent–child relationship as suggested by PR #53 https://github.com/Teradata/kylo/pull/53

    • Description

      • Establish a parent-child relationship between the data index and the category/feed indexes. This requires more effort to implement, and it is currently not clear how applicable it is to Solr. The Nifi feed will need changes because the current ES processor does not support parent-child relationships. The Kylo UI does not support parent-child queries and will require some kind of custom search syntax. If the primary analysis tool is going to be Kibana, we can either add support for parent-child queries to Global Search over time or drop it altogether. The main benefit of this option is that there is no need to re-index data when category and feed tags are updated.

    • Pros

      • No need to re-index the data when category and feed tags are updated

      • No need for extra disk space for indexed data

    • Cons

      • There will be no historical record of what category and feed tags were set when the data was ingested

      • Kylo UI does not support search queries based on category and feed tags out of the box; extra work will be required to come up with a special search syntax to refer to category and feed tags

      • The ElasticSearch Nifi processor does not support parent-child relationships; it can potentially be replaced with the InvokeHTTP processor

      • Not currently clear if Solr supports the same parent-child model
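For Option 2, the kind of query the custom search syntax would have to produce can be sketched with Elasticsearch's `has_parent` query, which returns child documents whose parent matches an inner query. This is only an illustration under assumptions: the parent type name `feed` and the field path `properties.<key>` are hypothetical, and the exact mapping needed for parent-child relationships depends on the Elasticsearch version in use.

```python
import json
from typing import Any, Dict


def data_by_feed_property(key: str, value: str,
                          parent_type: str = "feed") -> Dict[str, Any]:
    """Build a search body that selects data records by a property of their parent."""
    return {
        "query": {
            "has_parent": {
                # Hypothetical parent type holding the feed (or category) metadata.
                "parent_type": parent_type,
                # Hypothetical field path for a key:value property on the parent.
                "query": {"term": {"properties." + key: value}},
            }
        }
    }


body = data_by_feed_property("sensitivity", "internal")
print(json.dumps(body, indent=2))
```

Because the tags are stored only on the parent document, updating a tag changes which data records such a query matches without re-indexing them. A body like this could be posted to the index's `_search` endpoint, for example from an InvokeHTTP processor.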

Status

Assignee

RuslansU

Reporter

Paolo Freuli

Reviewer

None

Sprint

None

Fix versions

Priority

Highest