Our feed specifics:
Streaming incoming data, batched in 5-minute intervals
Data has a lot of columns
We have about 9 feeds
On day 1, backup and restore takes less than a minute
After about 3 weeks, a restore takes 1.5 hours; the rate at which data builds up seems high
We need a mechanism to purge job-stats info and other operational data before it reaches a critical level
Like the index data flow, could we have housekeeping feeds that carefully consider all relevant tables and purge old data or roll it up?
In the short term, could we get the set of related SQL commands to pass on to the client team?
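For illustration, a short-term purge could be a single cutoff-based delete along these lines; note that the COLLECTION_TIME column name and the 4-week retention window are assumptions here, so verify them against the actual schema before handing this to the client team:

```sql
-- Purge processor stats older than the retention window.
-- Assumes the stats table keys its rows on a COLLECTION_TIME timestamp;
-- check the real column name in your schema before running.
DELETE FROM kylo.NIFI_FEED_PROCESSOR_STATS
WHERE COLLECTION_TIME < DATE_SUB(NOW(), INTERVAL 4 WEEK);
```

Running this on a schedule would cap growth of the stats table until a proper housekeeping feed is in place.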
The biggest thing taking up space is probably the NIFI_FEED_PROCESSOR_STATS table.
You can verify table sizes using this query:
SELECT
    table_name AS 'Table',
    ROUND(((data_length + index_length) / 1024 / 1024), 2) AS size_megabytes
FROM information_schema.TABLES
WHERE table_schema = 'kylo'
  AND TABLE_TYPE != 'VIEW'
ORDER BY size_megabytes;
If you don't need data older than xx weeks, you could write a MySQL procedure to move that data off to a separate schema.
You could then create a template/feed in Kylo and schedule it to run periodically (e.g. every 2 weeks) to clean up and archive operational data.
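As a sketch of that procedure-based approach (the kylo_archive schema, the COLLECTION_TIME column, and the procedure name are all assumptions for illustration, not the shipped implementation):

```sql
-- One-time setup: a separate schema plus an archive table with the same shape.
CREATE SCHEMA IF NOT EXISTS kylo_archive;
CREATE TABLE IF NOT EXISTS kylo_archive.NIFI_FEED_PROCESSOR_STATS
    LIKE kylo.NIFI_FEED_PROCESSOR_STATS;

DELIMITER $$
CREATE PROCEDURE kylo.archive_feed_processor_stats(IN retention_weeks INT)
BEGIN
    -- Copy rows older than the retention window into the archive schema...
    INSERT INTO kylo_archive.NIFI_FEED_PROCESSOR_STATS
    SELECT *
    FROM kylo.NIFI_FEED_PROCESSOR_STATS
    WHERE COLLECTION_TIME < DATE_SUB(NOW(), INTERVAL retention_weeks WEEK);

    -- ...then remove them from the live table.
    DELETE FROM kylo.NIFI_FEED_PROCESSOR_STATS
    WHERE COLLECTION_TIME < DATE_SUB(NOW(), INTERVAL retention_weeks WEEK);
END$$
DELIMITER ;

-- A scheduled Kylo feed (or a MySQL event) could then invoke it, e.g.:
-- CALL kylo.archive_feed_processor_stats(2);
```

The same pattern could be repeated for the other operational tables once they are identified.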
This is taken care of by the compact_nifi_stats procedure that runs in 0.8.4.