Job statistics are not correct for ingestion from subdirectories

Description

When ingesting data from multiple subdirectories with the Data Ingestion template, the job statistics are not populated correctly:

  • there is only 1 absolute path per job

  • there is only 1 filename per job

I tested with 3 files under 3 directories, and it created 2 job statistics with the exact same timestamp.

Expected: 1 job with a list of the files processed and a list of the subdirectories processed.

Attached are the stats for jobs 1 and 2, and the ingested data.

Feed settings:
Source = Filesystem
Input Directory = /var/dropzone/
File Filter = .*
Recurse Subdirectories = true

My hierarchy:
/var/dropzone/folder1/folder11/weather11.csv
/var/dropzone/folder1/folder12/weather12.csv
/var/dropzone/folder2/folder21/weather21.csv

The last 2 digits of each filename match the first 2 digits of the station_id, so each ingested record can be traced back to the file it came from.
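
To make the expected result concrete, here is a minimal Python sketch of the aggregation I would expect: one job record holding every processed file and every processed subdirectory. This is not Kylo or NiFi code. The names `expected_job_stats` and `demo` are hypothetical, and treating the File Filter as a regex that must match the whole filename is my assumption. The demo rebuilds the hierarchy above under a temporary directory that stands in for /var/dropzone/.

```python
import os
import re
import tempfile


def expected_job_stats(input_dir, file_filter=".*", recurse=True):
    """Hypothetical helper: collect all matching files into ONE job record."""
    pattern = re.compile(file_filter)
    files, subdirs = [], set()
    for root, dirs, names in os.walk(input_dir):
        if not recurse:
            dirs[:] = []  # stop os.walk from descending below input_dir
        for name in names:
            # Assumption: the filter must match the whole filename.
            if pattern.fullmatch(name):
                files.append(os.path.join(root, name))
                subdirs.add(root)
    return {"files": sorted(files), "subdirectories": sorted(subdirs)}


def demo():
    """Recreate the reported hierarchy in a temp dir and aggregate it."""
    layout = [
        "folder1/folder11/weather11.csv",
        "folder1/folder12/weather12.csv",
        "folder2/folder21/weather21.csv",
    ]
    with tempfile.TemporaryDirectory() as dropzone:
        for rel in layout:
            path = os.path.join(dropzone, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()  # empty placeholder file
        stats = expected_job_stats(dropzone)
        # Report paths relative to the stand-in dropzone for readability.
        return {
            key: [os.path.relpath(p, dropzone).replace(os.sep, "/") for p in value]
            for key, value in stats.items()
        }


if __name__ == "__main__":
    print(demo())
```

With the three files above, the single record lists all 3 filenames and all 3 subdirectories, instead of 1 path and 1 filename per job.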

Environment

NiFi 1.3
CDH 5.10

Assignee

Unassigned

Reporter

Claudiu Stanciu

Labels

None

Reviewer

None

Story point estimate

None

Affects versions

Priority

Low