Problem Statement
The current catalog only lets users browse data. It doesn't provide much value beyond a simple data preview. It can be updated, reusing much of our existing code, to make it more useful and feature-rich.


  • Make the Catalog more than a data browser (TBK-162)

  • Dataset-centric UI: full schema view with metadata descriptions, samples, lineage, column profiling, and links to feeds and Projects (new feature)

  • Allow users to annotate datasets (add descriptions to datasets, tag data)

  • Allow users to update column metadata descriptions

  • View column profiles of the data (leverage existing code that does this in the data wrangler)

  • View column summary analysis (leverage existing code that does this in the data wrangler)

  • Jump to the data wrangler from a catalog dataset

  • Ability to add items to the catalog.

  • When browsing, show which datasets have already been curated by Kylo

  • Better searching for data

    • Index names into Elasticsearch for catalog browsing/searching.

Framework model changes

  • Add a URN to each dataset: a unique name for the data source, regardless of the connection

  • For example, if you and I have different data sources representing the same physical database, they should tie to the same dataset and metadata

  • The unique name can be set (and changed) for curating / de-duping data sources.
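
To illustrate the URN idea above, here is a minimal sketch of how a default URN might be derived from the physical location of the data rather than from the connection. Everything in it is an assumption made for illustration: the `Connection` fields, the `dataset_urn` helper, the `urn:dataset:` prefix, and the choice to lowercase every part (which would be wrong for case-sensitive databases). None of it comes from the existing Kylo code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical connection settings; field names are illustrative only."""
    name: str      # user-chosen label, deliberately NOT part of the URN
    kind: str      # type of store, e.g. "mysql"
    host: str
    port: int
    database: str


def dataset_urn(conn: Connection, table: str) -> str:
    """Build a default URN from the physical location only.

    The connection's name is ignored, so two connections that point at the
    same physical table produce the same URN.
    """
    parts = [conn.kind, conn.host, str(conn.port), conn.database, table]
    # Assumption: identifiers are treated as case-insensitive.
    normalized = [p.strip().lower() for p in parts]
    return "urn:dataset:" + ":".join(normalized)


# Two users define different connections to the same physical database.
mine = Connection("my-warehouse", "mysql", "DB.example.com", 3306, "sales")
yours = Connection("team-replica-link", "mysql", "db.example.com", 3306, "Sales")

# Both resolve to the same dataset, so annotations and metadata can be shared.
same = dataset_urn(mine, "orders") == dataset_urn(yours, "Orders")
```

Because the name must remain settable and changeable, a value like this would only be a suggested default that a curator could override when de-duping.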


  • We may need a simplified "browse" view vs the detailed view above.

  • The dataset picker used in the data wrangler should be simple (as it is today), whereas the catalog should show the details




Epic Name

Enhanced Catalog