Kylo Metastore Periodic Cleanup

Description

Our feed specifics:
Streaming incoming data, batched in 5-minute intervals
The data has many columns
We have about 9 feeds

Problem:
On day 1, a backup and restore takes less than a minute.
After about 3 weeks, a restore takes 1.5 hours; the rate at which data builds up seems high.
We need a mechanism to purge job stats and other operational data before this reaches a critical level.

Suggestion:
Like the index data flow, could we have housekeeping feeds that carefully consider all relevant tables and either purge old data or roll it up? A rough sketch of the kind of purge we have in mind follows below.
In the short term, could we get the set of related SQL commands to pass on to the client team?
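For example, something along these lines against the stats table (a sketch only; the table and the MIN_EVENT_TIME column are our guesses and would need to be confirmed against the actual schema, and the 3-week window is arbitrary):

-- Hypothetical purge of stats older than 3 weeks.
-- Table and column names are assumptions; verify before running.
DELETE FROM kylo.NIFI_FEED_PROCESSOR_STATS
WHERE MIN_EVENT_TIME < DATE_SUB(NOW(), INTERVAL 3 WEEK);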

Activity

Scott Reisdorf
October 27, 2017, 10:29 PM

The biggest thing taking up space is probably the NIFI_FEED_PROCESSOR_STATS table.
You can verify table sizes using this query:

SELECT
    table_name AS 'Table',
    ROUND(((data_length + index_length) / 1024 / 1024), 2) AS size_megabytes
FROM
    information_schema.TABLES
WHERE
    table_schema = 'kylo'
    AND TABLE_TYPE != 'VIEW'
ORDER BY size_megabytes;

If you don't need data older than xx weeks, you could write a MySQL procedure to move that data off to a separate schema; a minimal sketch follows below.
You could then create a template/feed in Kylo and have it scheduled to run periodically (e.g. every 2 weeks) to clean up and archive operational data.
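A minimal sketch of such a procedure, assuming a kylo_archive schema and a MIN_EVENT_TIME column on the stats table (both assumptions; check your schema first):

-- Sketch only: the archive schema, column name, and retention window
-- are assumptions, not part of a shipped Kylo feature.
CREATE DATABASE IF NOT EXISTS kylo_archive;
CREATE TABLE IF NOT EXISTS kylo_archive.NIFI_FEED_PROCESSOR_STATS
    LIKE kylo.NIFI_FEED_PROCESSOR_STATS;

DELIMITER //
CREATE PROCEDURE kylo.archive_feed_processor_stats(IN weeks_to_keep INT)
BEGIN
    -- Copy rows older than the retention window into the archive schema...
    INSERT INTO kylo_archive.NIFI_FEED_PROCESSOR_STATS
    SELECT * FROM kylo.NIFI_FEED_PROCESSOR_STATS
    WHERE MIN_EVENT_TIME < DATE_SUB(NOW(), INTERVAL weeks_to_keep WEEK);
    -- ...then remove them from the live table.
    DELETE FROM kylo.NIFI_FEED_PROCESSOR_STATS
    WHERE MIN_EVENT_TIME < DATE_SUB(NOW(), INTERVAL weeks_to_keep WEEK);
END //
DELIMITER ;

The scheduled Kylo feed would then just issue, e.g., CALL kylo.archive_feed_processor_stats(2); on its trigger.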

Scott Reisdorf
November 28, 2017, 1:44 PM

This is taken care of by the compact_nifi_stats procedure that ships in 0.8.4.
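For anyone verifying on 0.8.4, the procedure should also be callable by hand; the zero-argument call and the kylo schema below are assumptions, so confirm the actual signature first:

-- Check that the procedure exists and how it is declared:
SHOW PROCEDURE STATUS WHERE Db = 'kylo';
-- Then invoke it manually (assumed signature):
CALL kylo.compact_nifi_stats();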

Done

Assignee

Scott Reisdorf

Reporter

Anindita Mahapatra

Labels

None

Reviewer

None

Components

Sprint

None

Fix versions

Priority

Medium