Audio source separation

class soundscape_IR.soundscape_viewer.source_separation.source_separation(feature_length=1, basis_num=60, filename=None)[source]

This class provides a set of source separation methods based on non-negative matrix factorization (NMF).

NMF is a machine learning algorithm that iteratively learns to reconstruct a non-negative input matrix V by finding a set of basis functions W and encoding vectors H. The NMF algorithm is based on sklearn.decomposition.NMF.

NMF-based source separation consists of a model training phase and a prediction phase.

In the training phase, a source separation model can be trained using supervised NMF or unsupervised PC-NMF. If training data is clean, we suggest using supervised NMF for learning source-specific features (Lin & Tsao 2020). If training data is noisy, PC-NMF can learn two sets of basis functions by assuming the target source and noise possess different periodicities (Lin et al. 2017).

In the prediction phase, adaptive source separation is applied when target sources alter their acoustic characteristics (Kwon et al. 2015), and semi-supervised source separation is used when unseen sources are encountered (Smaragdis et al. 2007).
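
For illustration, the underlying factorization V ≈ W × H can be sketched directly with sklearn.decomposition.NMF. The matrix V below is randomly generated and purely hypothetical; soundscape_IR users do not need to call scikit-learn themselves.

>>> import numpy as np
>>> from sklearn.decomposition import NMF
>>> V = np.random.rand(512, 1000)  # hypothetical non-negative spectrogram (frequency x time)
>>> nmf = NMF(n_components=60, max_iter=200, init='random')
>>> W = nmf.fit_transform(V)  # basis functions, shape (512, 60)
>>> H = nmf.components_  # encoding vectors, shape (60, 1000)
>>> V_hat = W @ H  # non-negative reconstruction of V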

Parameters
feature_length : int ≥ 1, default = 1

Number of time bins used in the learning procedure of basis functions.

The duration of each basis function is determined by multiplying feature_length by the time resolution of the input spectrogram (see the short sketch after this parameter list). We suggest choosing the minimum length that can cover the basic unit of animal vocalizations (such as a note or syllable of bird songs). Choosing a shorter duration may result in learning fragmented signals, whereas choosing a longer duration will slow down the computation.

basis_num : int ≥ 1, default = 60

Number of basis functions used in the training phase of source separation.

Using a larger number of basis functions is expected to learn more diverse features, but it may also generate a set of time-shifted functions sharing the same spectral structure and thus reduce the abstraction of invariant features.

filename : str

Path and name of the mat file containing a trained source separation model.
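
As a quick illustration of how feature_length relates to the duration of basis functions, the sketch below assumes sound is a spectrogram object generated by audio_visualization, with time stored in the first column of sound.data:

>>> import numpy as np
>>> # Time resolution of the input spectrogram
>>> time_resolution = np.mean(np.diff(sound.data[:, 0]))
>>> # Duration covered by each basis function when feature_length=5
>>> basis_duration = 5 * time_resolution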

References

1. Kwon, K., Shin, J. W., & Kim, N. S. (2015). NMF-Based Speech Enhancement Using Bases Update. IEEE Signal Processing Letters, 22(4), 450–454. https://doi.org/10.1109/LSP.2014.2362556

2. Lin, T.-H., Fang, S.-H., & Tsao, Y. (2017). Improving biodiversity assessment via unsupervised separation of biological sounds from long-duration recordings. Scientific Reports, 7(1), 4547. https://doi.org/10.1038/s41598-017-04790-7

3. Lin, T.-H., & Tsao, Y. (2020). Source separation in ecoacoustics: A roadmap towards versatile soundscape information retrieval. Remote Sensing in Ecology and Conservation, 6(3), 236–247. https://doi.org/10.1002/rse2.141

4. Smaragdis, P., Raj, B., & Shashanka, M. (2007). Supervised and semi-supervised separation of sounds from single-channel mixtures. Independent Component Analysis and Signal Separation, 414–421. https://doi.org/10.1007/978-3-540-74494-8_52

Examples

Learn two sets of basis functions and combine them into one model.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> # Train 1st model
>>> model=source_separation(feature_length=5, basis_num=10)
>>> model.learn_feature(sound1.data, sound1.f, method='NMF')
>>>
>>> # Train 2nd model
>>> model2=source_separation(feature_length=5, basis_num=10)
>>> model2.learn_feature(sound2.data, sound2.f, method='NMF')
>>>
>>> # Merge the two models
>>> model.merge([model2])

Train a source separation model using PC-NMF and save the model as a mat file.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation(feature_length=5, basis_num=20)
>>> model.learn_feature(input_data=sound_train.data, f=sound_train.f, method='PCNMF')
>>> model.save_model(filename='model.mat')

Use a trained source separation model for prediction and plot the separation results.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> # Load a saved model and perform source separation
>>> model=source_separation(filename='model.mat')
>>> model.prediction(input_data=sound_predict.data, f=sound_predict.f)
>>>
>>> # View individual reconstructed spectrogram
>>> model.plot_nmf(plot_type = 'separation', source = 1)
>>> model.plot_nmf(plot_type = 'separation', source = 2)

Apply adaptive and semi-supervised source separation in prediction.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> # Enable adaptive SS by using adaptive_alpha
>>> # Enable semi-supervised SS by using additional_basis
>>> model=source_separation(filename='model.mat')
>>> model.prediction(input_data=sound_predict.data, f=sound_predict.f, adaptive_alpha=0.05, additional_basis=2)

Apply adaptive source separation for the target source, but not for the noise source.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> # Enable adaptive SS for 1st source, not for 2nd source
>>> model=source_separation(filename='model.mat')
>>> model.prediction(input_data=sound_predict.data, f=sound_predict.f, adaptive_alpha=[0.25, 0], additional_basis=0)

Methods

learn_feature(input_data, f[, alpha, ...])

This method supports the use of NMF or PC-NMF in the feature learning procedure.

load_model(filename[, model_check])

Load a source separation model

merge(model)

Merge multiple source separation models.

plot_nmf([plot_type, source, time_range, ...])

Generate a figure to show the content of basis functions or encoding vectors learned in a source separation model.

prediction(input_data, f[, iter, ...])

Perform prediction in source separation procedures.

save_model([filename, folder_id])

Save basis functions and model parameters

specify_target(index)

This method specifies the target source from the two sound sources learned by using PC-NMF.

learn_feature(input_data, f, alpha=0, method='NMF', iter=200, show_result=False)[source]

This method supports the use of NMF or PC-NMF in the feature learning procedure.

Use the NMF method when a training spectrogram is clean.

Use the PC-NMF method when a training spectrogram contains significant noise. Note that PC-NMF assumes that the target source and noise display different periodicities on the input spectrogram.

Parameters
input_data : ndarray of shape (time, frequency+1)

Spectrogram data for source separation.

The first column is time, and the subsequent columns are power spectral densities associated with f. Use the same spectrogram format as generated by audio_visualization.

f : ndarray of shape (frequency,)

Frequency of spectrogram data.

alpha : float, default = 0

Constant that multiplies the regularization terms of W.

See the description of alpha_W in sklearn.decomposition.NMF.

method : {‘NMF’, ‘PCNMF’}, default = ‘NMF’

Type of NMF method for model training.

Use the NMF method when a training spectrogram is clean.

Use the PC-NMF method when a training spectrogram contains significant noise.

iter : int ≥ 1, default = 200

Number of iterations for learning spectral features.

show_result : boolean, default = False

Plot learned basis functions and reconstructed spectrogram if set to True.

Attributes
f : ndarray of shape (frequency,)

Frequency of spectrogram data.

time_vec : ndarray of shape (time,)

Array of segment times of the spectrogram data.

W : ndarray of shape (frequency*feature_length, basis_num)

Basis functions (spectral features) essential for reconstructing the input spectrogram.

H : ndarray of shape (basis_num, time)

Encoding vectors describing the temporal activations of each basis function in the input spectrogram.

source_num : int ≥ 1

Number of sources learned in a source separation model.

W_cluster : ndarray of shape (basis_num,)

Array of source indicators for the basis functions.
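
For example, the learned attributes can be inspected after training (a sketch assuming sound_train is a spectrogram object generated by audio_visualization):

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation(feature_length=5, basis_num=20)
>>> model.learn_feature(input_data=sound_train.data, f=sound_train.f, method='PCNMF', show_result=True)
>>> print(model.W.shape)  # (frequency*feature_length, basis_num)
>>> print(model.W_cluster)  # source indicator of each basis function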

load_model(filename, model_check=True)[source]

Load a source separation model

Parameters
filename : str

Name of the mat file.

model_check : boolean, default = True

Print model parameters if set to True.
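
A minimal usage sketch (assuming model.mat was previously saved with save_model):

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation()
>>> model.load_model(filename='model.mat', model_check=True)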

merge(model)[source]

Merge multiple source separation models.

The principle is to use one model to merge the other models trained using NMF or PC-NMF. For models trained using PC-NMF, please specify their target sources before the merge procedure. This method gives each target source a unique source indicator but combines all noise sources under the same source indicator.

Parameters
model : list of models

Source separation models trained using NMF or PC-NMF.

Examples

Train three models and combine them into one model.

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> # 1st model
>>> model_1=source_separation(feature_length=5, basis_num=10)
>>> model_1.learn_feature(input_data=sound_1.data, f=sound_1.f, method='NMF')
>>>
>>> # 2nd model
>>> model_2=source_separation(feature_length=5, basis_num=15)
>>> model_2.learn_feature(input_data=sound_2.data, f=sound_2.f, method='PCNMF')
>>> model_2.specify_target(index=2) # Assuming the 2nd source is the target source
>>>
>>> # 3rd model
>>> model_3=source_separation(feature_length=5, basis_num=20)
>>> model_3.learn_feature(input_data=sound_3.data, f=sound_3.f, method='PCNMF')
>>> model_3.specify_target(index=1) # Assuming the 1st source is the target source
>>>
>>> # Merge the three models
>>> model_1.merge([model_2, model_3])

plot_nmf(plot_type='W', source=None, time_range=None, fig_width=14, fig_height=6)[source]

Generate a figure to show the content of basis functions or encoding vectors learned in a source separation model.

Alternatively, plot the reconstructed spectrogram of each sound source.

Parameters
plot_type : {‘W’, ‘H’, ‘separation’}, default = ‘W’

Type of content for plotting.

Set to ‘W’ for plotting basis functions, set to ‘H’ for plotting encoding vectors, and set to ‘separation’ for plotting a reconstructed spectrogram.

source : None or int ≥ 1

Source number (starting from 1), up to the number of sources learned.

For plot_type = {'W', 'H'}, this method will plot all basis functions if source is not set. For plot_type='separation', source must be specified.

time_range : None or list of 2 scalars [min, max]

Time range to plot ‘H’ and ‘separation’.

fig_width, fig_height : float > 0

Figure width and height.
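
For example (a sketch assuming a trained model; time_range follows the time axis of the input spectrogram, and plot_type='separation' additionally requires a completed prediction):

>>> # Plot the basis functions of the 1st source
>>> model.plot_nmf(plot_type='W', source=1)
>>> # Plot the encoding vectors of all sources within a selected time range
>>> model.plot_nmf(plot_type='H', time_range=[0, 60])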

prediction(input_data, f, iter=50, adaptive_alpha=0, additional_basis=0)[source]

Perform prediction in source separation procedures. This method supports conventional NMF, adaptive NMF, and semi-supervised NMF.

Set adaptive_alpha and additional_basis to 0 for using conventional NMF, which assumes that the testing spectrogram contains the same target sources and noise sources as the training spectrograms. Apply adaptive source separation if target sources alter their acoustic characteristics. This can be done by setting adaptive_alpha. Apply semi-supervised source separation when unseen sources are encountered. This can be done by setting additional_basis.

adaptive_alpha can be a single value (applied to all sources) or a list of scalars (applied to individual sources according to the source indicator information in W_cluster).

Parameters
input_data : ndarray of shape (time, frequency+1)

Spectrogram data for source separation.

The first column is time, and the subsequent columns are power spectral densities associated with f. Use the same spectrogram format as generated by audio_visualization.

f : ndarray of shape (frequency,)

Frequency of spectrogram data.

iter : int ≥ 1, default = 50

Number of iterations for predicting source behaviors.

adaptive_alpha : float [0, 1) or list of scalars, default = 0

Ratio to update basis functions in each iteration of adaptive source separation.

The choice of adaptive_alpha depends on the prior knowledge regarding whether the trained basis functions are representative of the target sources. If adaptive_alpha equals 0, we assume that the spectral features of target sources are invariant. If adaptive_alpha equals 1, the basis functions are set to be freely updated.

Provide a list of scalars to set adaptive_alpha for different sound sources.

additional_basis : int ≥ 0, default = 0

Number of basis functions, initialized with random values, added to a source separation model to enable semi-supervised source separation.

During the iterative updating procedure, the trained basis functions remain fixed (when adaptive source separation is not activated), while the newly added basis functions are updated through the standard NMF update rule. For mixtures containing many new sources, a larger additional_basis provides more building blocks for spectrogram reconstruction.

Attributes
separation : ndarray of shape (source_num,)

Reconstructed spectrograms of sources separated by using a source separation model.

relative_level : ndarray of shape (source_num,)

Intensities of sources separated by using a source separation model.

For each source, the intensity at each time bin is obtained by integrating the signal-to-noise ratio along the frequency axis.

original_level : ndarray of shape (time,)

Time-series intensities of the input spectrogram.
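
After prediction, the separated spectrograms and source intensities can be accessed as follows (a sketch assuming a model with at least one learned source):

>>> model.prediction(input_data=sound_predict.data, f=sound_predict.f)
>>> target_spectrogram=model.separation[0]  # reconstructed spectrogram of the 1st source
>>> target_level=model.relative_level[0]  # time-series intensity of the 1st source
>>> input_level=model.original_level  # intensity of the input spectrogram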

save_model(filename='NMF_model.mat', folder_id=[])[source]

Save basis functions and model parameters

Parameters
filename : str, default = ‘NMF_model.mat’

Name of the mat file.

folder_id : [] or str, default = []

The ID of the Google Drive folder for saving the model.

See https://ploi.io/documentation/database/where-do-i-get-google-drive-folder-id for details on how to find a folder ID.
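
For example (the Google Drive folder ID below is a placeholder, not a real ID):

>>> # Save locally
>>> model.save_model(filename='model.mat')
>>> # Save to a Google Drive folder
>>> model.save_model(filename='model.mat', folder_id='YourGoogleDriveFolderID')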

specify_target(index)[source]

This method specifies the target source from the two sound sources learned by using PC-NMF.

Parameters
index : int ≥ 1

Source number (starting from 1) associated with the target source.

In learn_feature, PC-NMF learns only two sound sources, so set index to 1 or 2.
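
For example (assuming the 2nd source learned by PC-NMF represents the target source):

>>> from soundscape_IR.soundscape_viewer import source_separation
>>> model=source_separation(feature_length=5, basis_num=20)
>>> model.learn_feature(input_data=sound_train.data, f=sound_train.f, method='PCNMF')
>>> model.specify_target(index=2)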