Exploring our competition data
The LANL Earthquake Prediction dataset consists of the following data:
- A train.csv file, with two columns only:
  - acoustic_data: This is the amplitude of the acoustic signal.
  - time_to_failure: This is the time to failure corresponding to the current data segment.
- A test folder containing 2,624 files, each holding a small segment of acoustic data.
- A sample_submission.csv file; for each test file, competitors need to provide an estimate of the time to failure.
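To make the submission format concrete, here is a minimal Python sketch of a constant baseline: it writes one row per test segment, always predicting the mean time_to_failure of the training data (5.68, as reported in the statistics below). The function name `write_constant_submission` is hypothetical, and two details are assumptions not stated in the text: that each test file is named `<seg_id>.csv`, and that the submission header is `seg_id,time_to_failure`. Check both against the actual sample_submission.csv before relying on it.

```python
import csv
from pathlib import Path

# Mean time_to_failure of the training data, as quoted in the text.
TRAIN_MEAN_TTF = 5.68


def write_constant_submission(test_dir, out_path, value=TRAIN_MEAN_TTF):
    """Hypothetical helper: write one constant prediction per test segment.

    Assumptions: each test file is named <seg_id>.csv, and the submission
    uses the header seg_id,time_to_failure (verify against sample_submission.csv).
    Returns the number of segments written.
    """
    # Segment IDs are taken from the file names, sorted for a stable order.
    seg_ids = sorted(p.stem for p in Path(test_dir).glob("*.csv"))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["seg_id", "time_to_failure"])
        for seg_id in seg_ids:
            writer.writerow([seg_id, value])
    return len(seg_ids)
```

Called as `write_constant_submission("test", "submission.csv")`, it should report 2,624 segments if the test folder is laid out as assumed. A constant prediction ignores the acoustic signal entirely, so it is only useful as a sanity check on the file format and as a floor for real models.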
The training data (9.56 GB) contains 692 million rows. The actual time interval between samples in the training data follows from the continuous variation of the time_to_failure values. The acoustic data consists of integer values ranging from -5,515 to 5,444, with a mean of 4.52 and a standard deviation of 10.7 (the values oscillate around 0). The time_to_failure values are real numbers ranging from 0 to 16, with a mean of 5.68 and a standard deviation of 3.67...
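Because the training file is far too large to load comfortably in one go, statistics like the ones above can be computed in a single streaming pass. The Python sketch below is one way to do that, not necessarily how the figures above were produced: it reads train.csv row by row with the standard `csv` module and uses Welford's algorithm for a numerically stable running mean and standard deviation. It also tracks the smallest positive drop in time_to_failure between consecutive rows, as one rough, assumed way of looking at the time interval between samples; upward jumps (where time_to_failure resets after an earthquake) are ignored. The names `RunningStats` and `describe_train` are hypothetical.

```python
import csv
import math


class RunningStats:
    """Single-pass min/max/mean/standard deviation (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def std(self):
        # Population standard deviation; divide by (n - 1) for the sample version.
        return math.sqrt(self.m2 / self.n) if self.n else float("nan")


def describe_train(path):
    """Hypothetical helper: stream train.csv once and summarize both columns.

    Assumes a header row followed by acoustic_data,time_to_failure rows.
    Returns (acoustic stats, time_to_failure stats, smallest positive drop
    in time_to_failure between consecutive rows).
    """
    acoustic, ttf = RunningStats(), RunningStats()
    min_step = math.inf
    prev_ttf = None
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for a, t in reader:
            a, t = int(a), float(t)
            acoustic.update(a)
            ttf.update(t)
            # Only count decreases; increases mark a reset after a failure.
            if prev_ttf is not None and 0 < prev_ttf - t < min_step:
                min_step = prev_ttf - t
            prev_ttf = t
    return acoustic, ttf, min_step
```

If the file is parsed correctly, the acoustic summary should come out comparable to the range, mean, and standard deviation quoted above. A pure-Python loop over 692 million rows is slow, though; reading the file in chunks with pandas (`read_csv` with `chunksize` and compact dtypes) is a common, faster alternative that can feed the same running statistics.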