Audio visualization

class soundscape_IR.soundscape_viewer.utility.audio_visualization(filename, path=None, channel=1, offset_read=0, duration_read=None, FFT_size=512, time_resolution=None, window_overlap=0.5, f_range=None, sensitivity=0, environment='wat', plot_type='Spectrogram', vmin=None, vmax=None, prewhiten_percent=None, mel_comp=None, annotation=None, padding=0)[source]

This class loads the waveform of an audio recording (only WAVE files) and applies discrete Fourier transform to generate a spectrogram on the Hertz or Mel scale.

Two noise reduction methods are provided. Welch’s method reduces random noise by measuring the average power spectrum over a short period of time (Welch 1967).

The spectrogram prewhitening method estimates the spectral pattern of background noise by calculating a specific percentile of power spectral densities at each frequency bin and subsequently subtracting the background noise from the entire spectrogram (Lin et al. 2021). After the prewhitening procedure, sound intensities are converted into signal-to-noise ratios.
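As a rough sketch of this percentile-based prewhitening, the following minimal NumPy illustration uses synthetic data and illustrative variable names; it is not the library's internal implementation:

```python
import numpy as np

# Synthetic log-scaled spectrogram: (frequency, time) in dB.
rng = np.random.default_rng(0)
spec = rng.normal(loc=-60.0, scale=3.0, size=(128, 200))

prewhiten_percent = 10
# Estimate background noise as a percentile of PSDs in each frequency bin.
ambient = np.percentile(spec, prewhiten_percent, axis=1, keepdims=True)
# Subtract the noise estimate; negative signal-to-noise ratios are set to 0.
snr = np.maximum(spec - ambient, 0.0)
print(snr.min())  # 0.0
```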

This class can also generate a concatenated spectrogram of annotated fragments by importing a text file containing annotations. The text file can be prepared using the Raven software (https://ravensoundsoftware.com).

Parameters
filename : str

Name of the audio file.

path : None or str, default = None

Path of the input audio file.

If path is not set, current folder is used.

channel : int ≥ 1, default = 1

Recording channel for analysis.

In stereo recordings, set to 1 for the left channel and set to 2 for the right channel.

offset_read : float ≥ 0, default = 0

Start reading time of the input audio file (in seconds).

duration_read : None or float > 0, default = None

Duration to load after offset_read (in seconds).

If duration_read is not set, the entire audio file after offset_read is processed.

FFT_size : int > 0, default = 512

Window size to perform discrete Fourier transform (in samples).

window_overlap : float [0, 1), default = 0.5

Ratio of overlap between consecutive windows.

time_resolution : None or float > 0, default = None

Apply Welch’s method to calculate averaged power spectra.

After generating a regular spectrogram, a mean power spectrum is calculated within each interval of time_resolution (in seconds).

time_resolution should not be smaller than (1-window_overlap)*FFT_size/sf, which is the original time resolution of a regular spectrogram; sf represents the sampling frequency of the audio file.
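The constraint above can be checked numerically. The sketch below (variable names mirror the parameters, but the code is illustrative, not the library's implementation) computes the native frame resolution and applies Welch-style averaging within time_resolution bins:

```python
import numpy as np

# Native time resolution of a regular spectrogram.
sf = 44100            # sampling frequency of the audio file (Hz)
FFT_size = 512
window_overlap = 0.5
min_resolution = (1 - window_overlap) * FFT_size / sf
print(round(min_resolution, 5))  # 0.0058 s per frame

# Welch-style averaging: mean power spectrum over the frames
# that fall within each time_resolution interval.
time_resolution = 0.1
frames_per_bin = int(time_resolution // min_resolution)
spec = np.random.default_rng(1).random((257, 1000))  # (frequency, frames)
n_bins = spec.shape[1] // frames_per_bin
averaged = (spec[:, :n_bins * frames_per_bin]
            .reshape(257, n_bins, frames_per_bin)
            .mean(axis=2))
```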

f_range : None or a list of 2 scalars [min, max], default = None

Minimum and maximum frequencies of the spectrogram.

prewhiten_percent : None or float [0, 100), default = None

Apply the prewhitening method to suppress background noise and convert power spectral densities into signal-to-noise ratios.

After generating a regular spectrogram (and applying Welch’s averaging method, if enabled), the spectral pattern of background noise is estimated by calculating the given percentile of power spectral densities in each frequency bin. The background noise is then subtracted from the whole spectrogram, and signal-to-noise ratios below 0 are set to 0.

mel_comp : None or int ≥ 0, default = None

Number of Mel bands to generate.

If mel_comp is not set, a Hertz scaled spectrogram is generated.

sensitivity : float, default = 0

Recording sensitivity of the input audio file (in dB re 1 V/μPa).

Set to 0 when sensitivity information is not available.

environment : {‘wat’, ‘air’}, default = ‘wat’

Recording environment (underwater or in air) of the input audio file.

plot_type : None or {‘Spectrogram’, ‘Waveform’, ‘Both’}, default = ‘Spectrogram’

Choose to plot only the spectrogram, only the waveform, both, or neither (None).

vmin, vmax : None or float, default = None

The data range that the colormap covers.

By default (None), the colormap covers the complete value range of the spectrogram.

annotation : None or str, default = None

Path and name of the text file containing annotations.

The text file should be saved using the format supported by the Raven software (https://ravensoundsoftware.com).

padding : float ≥ 0, default = 0

Duration added before and after each annotation (in seconds).
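To illustrate how annotations and padding interact, the sketch below parses a small Raven-style selection table (tab-delimited, with ‘Begin Time (s)’ and ‘End Time (s)’ columns) and pads each annotation. The table contents are invented for the example; this is not the library's parsing code:

```python
import csv
import io

# A minimal Raven-style selection table (normally read from a .txt file).
raven_txt = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLow Freq (Hz)\tHigh Freq (Hz)\n"
    "1\t2.50\t3.10\t200\t1500\n"
    "2\t7.00\t7.80\t300\t1800\n"
)
annotations = [
    (float(row["Begin Time (s)"]), float(row["End Time (s)"]))
    for row in csv.DictReader(io.StringIO(raven_txt), delimiter="\t")
]

# Extend each annotation by `padding` seconds on both sides,
# clipping the start time at 0.
padding = 0.5
padded = [(max(0.0, t0 - padding), t1 + padding) for t0, t1 in annotations]
```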

References

1. Welch, P. D. (1967). The use of Fast Fourier Transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2): 70–73. https://doi.org/10.1109/TAU.1967.1161901

2. Lin, T.-H., Akamatsu, T., & Tsao, Y. (2021). Sensing ecosystem dynamics via audio source separation: A case study of marine soundscapes off northeastern Taiwan. PLoS Computational Biology, 17(2), e1008698. https://doi.org/10.1371/journal.pcbi.1008698

Examples

Load an audio recording and generate the associated waveform and spectrogram.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', f_range=[0, 8000], plot_type='Both')

Use Welch’s method to suppress random noise and reduce time resolution.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', time_resolution=0.1, f_range=[0, 8000], plot_type='Spectrogram')

Generate a prewhitened spectrogram in Mel scale.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', FFT_size=2048, prewhiten_percent=10, mel_comp=128, plot_type='Spectrogram')

Generate a concatenated spectrogram by importing annotations, with 0.5 s padding before and after each annotation.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', annotation='./txt/annotations.txt', padding=0.5)

Attributes
sf : int

Sampling frequency of the input audio file.

x : ndarray of shape (time,)

Waveform data, with subtraction of the DC value.

f : ndarray of shape (frequency,)

Frequency of spectrogram data (in Hertz).

data : ndarray of shape (time, frequency+1)

Log-scaled power spectral densities (in dB).

The first column is time, and the subsequent columns are power spectral densities associated with f.
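Assuming this layout, the time axis and the spectrogram matrix can be separated by slicing; the data below are synthetic, and `sound.data` would take the place of `data` in practice:

```python
import numpy as np

# Synthetic stand-in for the `data` attribute: first column is time (s),
# remaining columns are power spectral densities (dB) matching `f`.
n_frames, n_freqs = 100, 257
data = np.hstack([
    np.linspace(0, 1, n_frames)[:, None],                  # time stamps
    np.random.default_rng(2).random((n_frames, n_freqs)),  # PSDs
])

time_axis = data[:, 0]      # shape (time,)
spectrogram = data[:, 1:]   # shape (time, frequency)
```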

phase : ndarray of shape (frequency, time)

Phase of the spectrogram data.

Not available when time_resolution is set.

ambient : ndarray of shape (frequency,)

Background noise estimated using the spectrogram prewhitening method.

Methods

convert_audio(magnitude_spec[, snr_factor])

This method recovers a time-domain waveform from a magnitude spectrogram by using inverse discrete Fourier transform.

convert_audio(magnitude_spec, snr_factor=1)[source]

This method recovers a time-domain waveform from a magnitude spectrogram by using inverse discrete Fourier transform.

The input spectrogram should be prepared in Hertz scale. Phase data is necessary during the inverse procedure, so time_resolution, f_range, and mel_comp should not be used when loading an audio file.
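Conceptually, the inversion recombines the magnitude spectrogram with the stored phase before applying the inverse transform. A simplified NumPy sketch of this idea (rectangular frames, no overlap-add; not the library's exact implementation):

```python
import numpy as np

FFT_size, hop = 512, 256
rng = np.random.default_rng(3)
x = rng.standard_normal(4 * FFT_size)

# Forward STFT on overlapping rectangular frames.
frames = np.lib.stride_tricks.sliding_window_view(x, FFT_size)[::hop]
spec = np.fft.rfft(frames, axis=1)
magnitude, phase = np.abs(spec), np.angle(spec)

# Inverse: recombine magnitude with the stored phase, then invert per frame.
complex_spec = magnitude * np.exp(1j * phase)
recovered = np.fft.irfft(complex_spec, n=FFT_size, axis=1)

# Each frame is recovered because the original phase was kept.
assert np.allclose(recovered[0], x[:FFT_size])
```

When the magnitude has been modified (e.g. noise-filtered), the stored phase still allows a usable waveform to be reconstructed, which is why phase must be retained when loading the audio.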

Parameters
magnitude_spec : ndarray of shape (time, frequency+1)

Log-scaled power spectral densities, typically after noise filtering.

The first column is time, and the subsequent columns are power spectral densities associated with f.

snr_factor : float > 0, default = 1

A ratio for amplifying the input signal.

Examples

Load an audio recording and apply spectrogram prewhitening to suppress background noise. Then, use the prewhitened spectrogram to generate a waveform.

>>> from soundscape_IR.soundscape_viewer import audio_visualization
>>> sound = audio_visualization(filename='audio.wav', path='./wav/', FFT_size=512, window_overlap=0.5, prewhiten_percent=10, plot_type=None)
>>> sound.convert_audio(sound.data, snr_factor=1.5)
>>>
>>> from IPython.display import Audio
>>> Audio(sound.xrec, rate=sound.sf)

Use a source separation model to separate non-target signals and reconstruct the waveform of target source.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation(filename='model.mat')
>>> model.prediction(input_data=sound.data, f=sound.f)
>>> sound.convert_audio(model.separation[0], snr_factor=1.5)

Attributes
xrec : ndarray of shape (time,)

Reconstructed waveform data.