Seismic-Acoustic Dataset of Coastal Bryde’s Whales in the Beibu Gulf

Bibliographic Details
Main Author: Yue Wang (22435645) (author)
Other Authors: Zhuo Xiao (22067186) (author)
Published: 2025
Description
Summary:
<h2>1. Overview</h2>
<p dir="ltr">This repository contains both the <b>dataset and deep learning code</b> used in the study:<br><b>“Listening to Whales with Island Seismometers: Year-Round Presence and Diel Rhythms of Bryde’s Whales Unveiled by Deep Learning.”</b></p>
<p dir="ltr">It provides a complete package for detecting <b>coastal Bryde’s whale vocalizations</b> using three-component seismic data collected at the <b>Xieyang Island (XYD)</b> station in the Beibu Gulf, northwestern South China Sea, during <b>January–December 2021</b>.<br>It includes:</p>
<ul>
<li>A preprocessed dataset of labeled spectrograms;</li>
<li>CNN-ECA model source code and trained weights;</li>
<li>Configuration and environment files for reproducible research.</li>
</ul>
<h2>2. File Structure</h2>
<table>
<tr><th><p dir="ltr">Path</p></th><th><p dir="ltr">Description</p></th></tr>
<tr><td><p dir="ltr"><code>dataset/</code></p></td><td><p dir="ltr">Folder containing preprocessed spectrogram data and labels.</p></td></tr>
<tr><td><p dir="ltr">├── <code>all_labels.xls</code></p></td><td><p dir="ltr">Metadata for all samples (timestamps, labels, data source).</p></td></tr>
<tr><td><p dir="ltr">├── <code>split_info.xls</code></p></td><td><p dir="ltr">Summary of data split ratios.</p></td></tr>
<tr><td><p dir="ltr">├── <code>*_indices.npy</code></p></td><td><p dir="ltr">Index files for train/val/test subsets.</p></td></tr>
<tr><td><p dir="ltr">├── <code>*_spectrograms.npy</code></p></td><td><p dir="ltr">3-channel normalized spectrogram arrays for each subset.</p></td></tr>
<tr><td><p dir="ltr">├── <code>y_*.xls</code></p></td><td><p dir="ltr">Label files for train/val/test sets.</p></td></tr>
<tr><td><p dir="ltr"><code>config.json</code></p></td><td><p dir="ltr">Configuration file for data paths and hyperparameters.</p></td></tr>
<tr><td><p dir="ltr"><code>train.py</code></p></td><td><p dir="ltr">CNN-ECA model training script.</p></td></tr>
<tr><td><p dir="ltr"><code>test.py</code></p></td><td><p dir="ltr">Model evaluation script.</p></td></tr>
<tr><td><p dir="ltr"><code>optimized_whale_detector_best.pth</code></p></td><td><p dir="ltr">Trained model weights (best validation F1).</p></td></tr>
<tr><td><p dir="ltr"><code>requirements.txt</code></p></td><td><p dir="ltr">Python dependencies for environment setup.</p></td></tr>
</table>
<h2>3. Dataset Description</h2>
<ul>
<li><b>Sampling rate:</b> 100 Hz</li>
<li><b>Frequency band:</b> 3–20 Hz (Butterworth band-pass filtered)</li>
<li><b>Channels:</b> North, East, Vertical (three-component seismic data)</li>
<li><b>Sample length:</b> 10 seconds</li>
<li><b>Format:</b> Log-scaled, z-score normalized spectrograms (<code>[N, 3, F, T]</code>)</li>
<li><b>Labels:</b>
<ul>
<li><code>1</code> = Bryde’s whale vocalization</li>
<li><code>0</code> = Background / non-vocalization</li>
</ul>
</li>
<li><b>Split ratio:</b> 70% train / 15% validation / 15% test</li>
</ul>
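<p dir="ltr">The preprocessing summarized in the dataset description (100 Hz sampling, 3–20 Hz Butterworth band-pass, 10-second windows, log-scaled and z-score normalized spectrograms) can be illustrated with a minimal sketch. This is not the authors' pipeline: the filter order, the zero-phase filtering, the STFT window and overlap, the per-channel normalization, and the function name <code>waveform_to_spectrogram</code> are all assumptions chosen for illustration, and the input is synthetic noise rather than XYD station data.</p>

```python
import numpy as np
from scipy import signal

FS = 100.0          # sampling rate in Hz (from the dataset description)
BAND = (3.0, 20.0)  # band-pass corners in Hz (from the dataset description)
WINDOW_S = 10.0     # sample length in seconds (from the dataset description)


def waveform_to_spectrogram(trace, order=4, nperseg=128, noverlap=96):
    """Convert a [3, n_samples] waveform into a [3, F, T] spectrogram.

    order, nperseg and noverlap are hypothetical values, not taken
    from the repository.
    """
    # Butterworth band-pass, applied forward and backward (zero phase).
    # Zero-phase filtering is an assumption of this sketch.
    sos = signal.butter(order, BAND, btype="bandpass", fs=FS, output="sos")
    filtered = signal.sosfiltfilt(sos, trace, axis=-1)

    # Power spectrogram per channel; output shape is [3, F, T].
    _, _, power = signal.spectrogram(
        filtered, fs=FS, nperseg=nperseg, noverlap=noverlap, axis=-1
    )

    # Log scaling; the small constant avoids log10(0).
    log_power = np.log10(power + 1e-12)

    # Z-score normalization, computed per channel (an assumption).
    mean = log_power.mean(axis=(1, 2), keepdims=True)
    std = log_power.std(axis=(1, 2), keepdims=True)
    return (log_power - mean) / (std + 1e-8)


# Synthetic stand-in for one 10-second North/East/Vertical window.
rng = np.random.default_rng(0)
trace = rng.standard_normal((3, int(FS * WINDOW_S)))

spec = waveform_to_spectrogram(trace)
batch = spec[np.newaxis]  # add the sample axis: [N, 3, F, T] with N = 1
print(batch.shape)
```

<p dir="ltr">Arrays loaded from the <code>*_spectrograms.npy</code> files with <code>numpy.load</code> are described as having the same <code>[N, 3, F, T]</code> layout; the actual <code>F</code> and <code>T</code> depend on the STFT settings the authors used, which are not stated here.</p>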