Sound detection and feature extraction

class soundscape_IR.soundscape_viewer.utility.spectrogram_detection(input, f, threshold, smooth=0, minimum_interval=0, minimum_duration=None, maximum_duration=None, pad_size=0, filename='Detection.txt', folder_id=[], path='./', status_print=True, show_result=True, run_detection=True)

This class applies an energy thresholding method to find regions of interest displayed on a spectrogram.

It uses a known estimate of the minimum signal interval to separate regions of interest, and subsequently removes false alarms according to the minimum and maximum signal durations.

The output is a table containing Begin Time (s), End Time (s), Low Frequency (Hz), High Frequency (Hz), and Maximum SNR (dB). The table is saved in a text file, which can be imported into the Raven software.
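The procedure described above (binarize, merge detections separated by short gaps, discard regions outside the duration limits, then pad) can be sketched in plain NumPy. This is not the soundscape_IR implementation: `detect_regions` is a hypothetical helper, time bins are assumed to be evenly spaced, gaps are measured in whole time bins, and the last value of each row is simply the maximum intensity inside the region (it equals the Maximum SNR only when the input holds SNR values).

```python
import numpy as np


def detect_regions(spec, f, threshold, minimum_interval=0.0,
                   minimum_duration=None, maximum_duration=None, pad_size=0.0):
    """Hypothetical energy-thresholding detector (not the soundscape_IR code).

    spec: array of shape (time, frequency+1); column 0 is time in seconds and
    the remaining columns are intensities (e.g. SNR in dB) associated with f.
    Returns a list of (begin_s, end_s, low_hz, high_hz, max_value) tuples.
    """
    t, values = spec[:, 0], spec[:, 1:]
    dt = float(np.median(np.diff(t))) if len(t) > 1 else 0.0  # assumes even spacing
    mask = values > threshold                     # binarize time-frequency bins
    active = np.flatnonzero(mask.any(axis=1))     # time bins with any hit
    if active.size == 0:
        return []

    # Group active time bins. Adjacent bins are always merged; a gap starts a
    # new region only if it is at least minimum_interval seconds long.
    segments, start = [], active[0]
    for prev, cur in zip(active[:-1], active[1:]):
        gap = (cur - prev - 1) * dt
        if cur - prev > 1 and gap >= minimum_interval:
            segments.append((start, prev))
            start = cur
    segments.append((start, active[-1]))

    rows = []
    for s, e in segments:
        begin, end = t[s], t[e] + dt              # end of the last active bin
        duration = end - begin
        if minimum_duration is not None and duration < minimum_duration:
            continue                              # too short: false alarm
        if maximum_duration is not None and duration > maximum_duration:
            continue                              # too long: false alarm
        hit_freqs = f[mask[s:e + 1].any(axis=0)]  # frequencies above threshold
        rows.append((max(begin - pad_size, t[0]), end + pad_size,
                     hit_freqs.min(), hit_freqs.max(), values[s:e + 1].max()))
    return rows


# Synthetic example: 1 s of 10-ms bins and four frequency bins.
t = np.arange(100) * 0.01
f = np.array([100.0, 200.0, 300.0, 400.0])
values = np.zeros((100, 4))
values[10:20, 1] = 10     # first call
values[22:30, 1:3] = 8    # second call, 0.02 s after the first
values[60:62, 3] = 9      # 0.02-s click
spec = np.column_stack([t, values])

rows = detect_regions(spec, f, threshold=6, minimum_interval=0.05,
                      minimum_duration=0.1)
```

With minimum_interval=0.05, the two calls merge into one region from 0.10 to 0.30 s spanning 200 to 300 Hz, and the 0.02-s click is discarded by minimum_duration. With minimum_interval=0 and no duration limits, the same input yields three separate regions.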

Parameters:

input : ndarray of shape (time, frequency+1)

Spectrogram data for analysis.

The first column is time, and the subsequent columns are power spectral densities (or signal-to-noise ratios) associated with f. Use the spectrogram format generated by audio_visualization.

f : ndarray of shape (frequency,)

Frequency of the input spectrogram data.

threshold : float

Energy threshold for binarizing the spectrogram data.

Only time and frequency bins with intensities higher than threshold are considered detections.

smooth : float ≥ 0, default = 0

Standard deviation of the Gaussian kernel used to smooth the spectrogram data.

See sigma in scipy.ndimage.gaussian_filter for details.

minimum_interval : float ≥ 0, default = 0

Minimum time interval (in seconds) for the algorithm to separate two regions of interest.

If the interval between two signals is shorter than minimum_interval, the two signals are considered to be the same detection.

minimum_duration, maximum_duration : None or float > 0, default = None

Minimum and maximum signal durations of each detection (in seconds).

pad_size : float ≥ 0, default = 0

Duration (in seconds) of padding added before and after each detection.

filename : str, default = 'Detection.txt'

Name of the text file that contains the detections.

path : str, default = './'

Path for saving the detection results.

folder_id : [] or str, default = []

ID of the Google Drive folder for saving the detection results.

See … for details on the folder ID.

status_print : boolean, default = True

Print the file-saving progress if set to True.

show_result : boolean, default = True

Plot detection results on the spectrogram if set to True.

run_detection : boolean, default = True

Run the detection procedure if set to True.

Setting it to False generates one detection covering the entire duration of the spectrogram. Only set it to False when the goal is to extract acoustic features.
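To illustrate the expected input layout and what smoothing does, the sketch below builds a synthetic array of shape (time, frequency+1) and applies `scipy.ndimage.gaussian_filter` with `sigma=1`, which is assumed here to correspond to `smooth=1`. The data are random placeholders, and smoothing only the intensity columns while leaving the time column untouched is an assumption about how the class treats the time column.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Placeholder data: 200 time bins (20 ms apart) x 64 frequency bins.
rng = np.random.default_rng(0)
times = np.arange(200) * 0.02                  # seconds
f = np.linspace(0.0, 8000.0, 64)               # Hz
psd = rng.normal(0.0, 1.0, size=(200, 64))     # e.g. SNR values in dB

# Layout described above: time in the first column, then one column per f.
spec = np.column_stack([times, psd])

# Smooth the intensity columns only; the time column is kept as-is.
smoothed = spec.copy()
smoothed[:, 1:] = gaussian_filter(spec[:, 1:], sigma=1)
```

Larger sigma values suppress isolated noisy bins more strongly, at the cost of blurring short or narrowband signals.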


Examples:

Generate a prewhitened spectrogram and detect high-intensity signals with signal-to-noise ratios higher than 6 dB. Combine consecutive signals separated by intervals shorter than 0.5 s into one detection. Only signals with durations between 0.1 and 1 s are saved.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', prewhiten_percent=50, plot_type=None)
>>> from soundscape_IR.soundscape_viewer import spectrogram_detection
>>> sp=spectrogram_detection(, sound.f, threshold=6, smooth=1, minimum_interval=0.5, minimum_duration=0.1, maximum_duration=1, filename='Detection.txt', path='./save/')

Detect regions of interest from a spectrogram filtered by a source separation model, and add 0.1 s of padding before and after each detection.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation()
>>> model.load_model(filename='./model/model.mat')
>>> model.prediction(, f=sound.f)
>>> from soundscape_IR.soundscape_viewer import spectrogram_detection
>>> source_num = 1 # Choose the source for signal detection
>>> sp=spectrogram_detection(model.separation[source_num-1], model.f, threshold=3, smooth=1, minimum_interval=0.5, pad_size=0.1, filename='Detection.txt', path='./save/')

Attributes:

detection : ndarray of shape (detection, 2)

Begin time (the first column) and end time (the second column) of each detection (row).

result : pandas DataFrame

A table containing the time and frequency boundaries of the regions of interest.
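Because detection stores one row per detection with begin and end times in seconds, simple summaries follow from array slicing. The array below is a hypothetical stand-in for the detection attribute, and the 0.5-s cut-off is arbitrary.

```python
import numpy as np

# Hypothetical stand-in for the detection attribute:
# column 0 = begin time (s), column 1 = end time (s), one row per detection.
detection = np.array([[0.10, 0.30],
                      [0.85, 1.40],
                      [2.00, 2.15]])

durations = detection[:, 1] - detection[:, 0]    # length of each detection (s)
gaps = detection[1:, 0] - detection[:-1, 1]      # time between consecutive detections (s)
long_detections = detection[durations > 0.5]     # arbitrary 0.5-s cut-off
```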


Methods:

feature_extraction([interval_range, ...])

This method extracts spectral and temporal features from regions of interest.

feature_extraction(interval_range=[1, 500], energy_percentile=None, filename='Features.mat', folder_id=[], path='./')

This method extracts spectral and temporal features from regions of interest.

Spectral features are extracted by averaging the power spectral densities across time bins. Temporal features are extracted by performing autocorrelation on the time-domain energy envelope of the input spectrogram data. For sounds with repetitive pulse structure, this method generates a time-lagged autocorrelation function that represents the variation of inter-pulse intervals.

According to the table of detection results, this method will extract spectral and temporal features for each region of interest.
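The following is a minimal sketch of both feature types for a single region of interest, assuming the region is given as a (time, frequency) block without the time column. The helper name `extract_features`, the removal of the envelope mean before autocorrelation, the normalization to 1 at zero lag, and the lag grid derived from the spectrogram time step are all assumptions for illustration; soundscape_IR may compute the autocorrelation differently.

```python
import numpy as np


def extract_features(region, time_step, interval_range=(1, 500),
                     energy_percentile=None):
    """Hypothetical spectral/temporal feature extraction for one region.

    region: array of shape (time, frequency) of PSD values (no time column).
    time_step: spectrogram time resolution in seconds.
    Returns (lags_ms, acf, mean_spectrum).
    """
    # Spectral feature: average the PSD values across time bins.
    mean_spectrum = region.mean(axis=0)

    # Energy envelope: mean across frequencies, or a chosen percentile.
    if energy_percentile is None:
        envelope = region.mean(axis=1)
    else:
        envelope = np.percentile(region, energy_percentile, axis=1)

    # Autocorrelation of the zero-mean envelope for non-negative lags.
    x = envelope - envelope.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] > 0:
        acf = acf / acf[0]                      # normalize to 1 at lag 0

    # Keep only lags inside interval_range (milliseconds).
    lags_ms = np.arange(len(acf)) * time_step * 1000
    keep = (lags_ms >= interval_range[0]) & (lags_ms <= interval_range[1])
    return lags_ms[keep], acf[keep], mean_spectrum


# Synthetic pulse train: one pulse every 10 bins (50 ms at a 5-ms time step).
region = np.zeros((400, 8))
region[::10, :] = 1.0
lags_ms, acf, mean_spectrum = extract_features(region, time_step=0.005)
```

Because the pulses repeat every 10 time bins, the strongest autocorrelation between 1 and 500 ms falls at the 50-ms lag, which is the inter-pulse interval of the synthetic signal.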

Parameters:

interval_range : list of 2 scalars [min, max], default = [1, 500]

Minimum and maximum time intervals (in milliseconds) for measuring autocorrelation function.

The maximum time interval should not be greater than the minimum_duration used in spectrogram_detection. Note that interval_range is expressed in milliseconds, whereas minimum_duration is expressed in seconds.

energy_percentile : None or float > 0, default = None

Percentile used to represent the energy envelope of the spectrogram data.

Use this parameter when a spectrogram contains high-intensity noise. If energy_percentile is not set, the energy envelope is extracted by averaging power spectral densities across frequencies.

filename : str, default = 'Features.mat'

Name of the mat file that contains the feature extraction results.

path : str, default = './'

Path for saving the feature extraction results.

folder_id : [] or str, default = []

ID of the Google Drive folder for saving the feature extraction results.

See … for details on the folder ID.
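Because interval_range is given in milliseconds while minimum_duration is given in seconds, the constraint on the maximum interval is easy to violate by a factor of 1000. A hypothetical pre-flight check (`interval_range_ok` is not part of soundscape_IR) could look like this:

```python
def interval_range_ok(interval_range, minimum_duration=None):
    """Hypothetical check, not part of soundscape_IR.

    interval_range: [min, max] in milliseconds.
    minimum_duration: seconds, as passed to spectrogram_detection, or None.
    """
    low_ms, high_ms = interval_range
    if not 0 <= low_ms < high_ms:
        return False                      # malformed range
    if minimum_duration is None:
        return True                       # no duration limit to compare against
    # Convert the duration limit to milliseconds before comparing.
    return high_ms <= minimum_duration * 1000


print(interval_range_ok([1, 200], minimum_duration=0.2))  # True
print(interval_range_ok([1, 500], minimum_duration=0.2))  # False
```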


Examples:

Detect a list of signals and extract their spectral and temporal features. Restrict the autocorrelation function to time intervals between 1 and 200 ms. Features are saved in a mat file.

>>> from soundscape_IR.soundscape_viewer import spectrogram_detection
>>> sp=spectrogram_detection(, sound.f, threshold=6, smooth=1, minimum_interval=0.5, minimum_duration=0.2, maximum_duration=1)
>>> sp.feature_extraction(interval_range=[1, 200], filename='Features.mat')

Attributes:

PI : ndarray of shape (autocorrelation function,)

Array of time intervals (in milliseconds) corresponding to the values of the autocorrelation function.

PI_result : ndarray of shape (detection, autocorrelation function)

Autocorrelation function of each region of interest.

f : ndarray of shape (frequency,)

Array of sample frequencies (in Hertz).

spectral_result : ndarray of shape (detection, frequency)

Mean spectrum of each region of interest (in dB).
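One way to use these outputs is to read off, for each detection, the interval with the strongest autocorrelation and the frequency of the spectral peak. The arrays below are synthetic stand-ins for PI, PI_result, f, and spectral_result. Treating the autocorrelation peak as the dominant inter-pulse interval is a heuristic, not a value the library is documented to report.

```python
import numpy as np

# Synthetic stand-ins for the arrays described above (two detections).
PI = np.arange(1, 201, dtype=float)                    # time intervals (ms)
PI_result = np.vstack([
    np.exp(-((PI - 40.0) / 5.0) ** 2),                 # peak at 40 ms
    np.exp(-((PI - 120.0) / 10.0) ** 2),               # peak at 120 ms
])
f = np.array([500.0, 1000.0, 2000.0, 4000.0])          # Hz
spectral_result = np.array([[-60.0, -45.0, -50.0, -70.0],
                            [-80.0, -75.0, -55.0, -52.0]])  # dB

# Interval with the strongest autocorrelation, per detection.
dominant_interval_ms = PI[np.argmax(PI_result, axis=1)]
# Frequency of the highest mean-spectrum value, per detection.
peak_frequency_hz = f[np.argmax(spectral_result, axis=1)]
```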