Data integrity control
Splunk introduced the data integrity control feature in version 6.3. It provides a way to verify the integrity of data indexed in Splunk. When this feature is enabled, Splunk computes hashes on every slice of uploaded data and stores them so that they can later be used to verify that the data has not been altered. This is especially useful when the logs come from sources such as bank transactions and other critical data for which an integrity check is necessary.
Specifically, Splunk computes hashes on every slice of newly indexed raw data and writes them to an l1Hashes file. When the bucket rolls from hot to warm, Splunk computes the hash of the contents of the l1Hashes file and stores it in an l2Hash file.
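Both hash files are kept on disk with the bucket's raw data. As a quick sanity check that hashing is active, you can list a bucket's rawdata directory; the listing below is only a sketch, assuming the default path of the main index and a hypothetical bucket name (exact file layout may vary by version):
ls $SPLUNK_DB/defaultdb/db/db_1420070400_1420070300_0/rawdata/
journal.gz  l1Hashes  l2Hash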
Hash validation can be performed on indexed data by running either of the following CLI commands:
./splunk check-integrity -bucketPath [ bucket path ] [ verbose ]
./splunk check-integrity -index [ index name ] [ verbose ]
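For example, to verify all the buckets of an index named main (substitute your own index name), you would run:
./splunk check-integrity -index main
Splunk walks through each bucket of the index and reports whether the stored hashes still match the data.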
In case hashes are lost, they can be regenerated using the following commands:
./splunk generate-hash-files -bucketPath [ bucket path ] [ verbose ]
./splunk generate-hash-files -index [ index name ] [ verbose ]
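As with the integrity check, hashes can be regenerated either for a whole index or for a single bucket. A sketch using a hypothetical bucket path under the default main index location:
./splunk generate-hash-files -bucketPath $SPLUNK_DB/defaultdb/db/db_1420070400_1420070300_0
Note that regenerated hashes are computed from the data as it exists now, so they can only protect against future tampering; they cannot attest to what happened before they were rebuilt.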
Let's now configure data integrity control. To do so, modify the indexes.conf file located at $SPLUNK_HOME/etc/system/local and add the following setting under the stanza of the index you want to protect:
enableDataIntegrityControl=true
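The setting applies per index, so in practice it goes inside an index stanza (or under [default] to apply to all indexes). A minimal sketch, assuming a hypothetical index named transactions:
[transactions]
homePath = $SPLUNK_DB/transactions/db
coldPath = $SPLUNK_DB/transactions/colddb
thawedPath = $SPLUNK_DB/transactions/thaweddb
enableDataIntegrityControl = true
Restart Splunk after editing indexes.conf. Hashes are computed only for data indexed after the setting is enabled; existing buckets are not hashed retroactively.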
Note
In a clustered environment, the cluster master and all peer nodes must run Splunk 6.3 or later for data integrity control to work correctly.