Onsets
Onset Detection
ID: onset
Boolean value (0 or 1) indicating whether a sound event onset has been detected. It is evaluated using the onsetsds library. We highly recommend reading the paper Adaptive whitening for improved real-time audio onset detection to better understand how onset detection works and how to choose the best method for the music you will use (see Table 1 and Table 2 of the paper).
Equation
Default method uses:
\(MKL_n = \sum_{k=0}^{K} \log \left ( 1 + \frac{|S_{n,k}|}{|S_{n-1,k}|} \right )\)
Here \(S_{n,k}\) is FFT bin \(k\) of frame \(n\)[^1]. All the methods described in the article can also be selected with the `ONSETFUNCTION` option in the config.
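The MKL formula above can be tried by hand on two consecutive magnitude frames. The following is a minimal sketch of that formula only, not the onsetsds implementation: the function name `mkl_value` and the small `eps` guard against division by zero are assumptions added for illustration.

```python
import math


def mkl_value(prev_mags, curr_mags, eps=1e-10):
    """Evaluate sum_k log(1 + |S_{n,k}| / |S_{n-1,k}|) for one pair of frames.

    prev_mags: magnitudes |S_{n-1,k}| of the previous frame
    curr_mags: magnitudes |S_{n,k}| of the current frame
    eps: hypothetical guard so that silent bins do not divide by zero
    """
    if len(prev_mags) != len(curr_mags):
        raise ValueError("both frames must have the same number of bins")
    return sum(
        math.log(1.0 + curr / (prev + eps))
        for prev, curr in zip(prev_mags, curr_mags)
    )


# A steady spectrum gives a small value; a sudden rise gives a larger one.
steady = mkl_value([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
attack = mkl_value([1.0, 1.0, 1.0], [9.0, 9.0, 9.0])
```

In this toy case every bin grows ninefold during the "attack", so its value is larger than the steady one. Turning such values into the final 0/1 output is handled by the library and is not shown here.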
Notes
Different methods
| abbreviation | name | description |
|---|---|---|
| pow | Power | Measures frame-to-frame changes in signal energy. Works well for signals with clear amplitude attacks (e.g., percussion) because onsets often correspond to sudden increases in energy. |
| pd | Phase Deviation | Detects onsets by measuring irregularities in the phase progression of spectral bins. Effective when phase continuity breaks at note attacks. |
| wpd | Weighted Phase Deviation | Similar to phase deviation but weights bins by magnitude, emphasizing stronger spectral components. Improves robustness when relevant partials dominate the spectrum. |
| sf | Spectral Flux | Measures the positive change in magnitude spectrum between consecutive frames. Commonly used for onset detection since note attacks typically introduce new spectral energy. |
| cd | Complex Domain | Uses both magnitude and phase to estimate expected spectral evolution and measures the deviation from this prediction. Captures subtle onset cues in complex signals. |
| rcd | Rectified Complex Domain | A rectified version of the complex-domain method that counts only increases in deviation. Helps suppress false detections from decreases or cancellations. |
| hfc | High Frequency Content | Emphasizes changes in high-frequency bins. Effective for percussive sounds where attacks introduce strong high-frequency components. |
| mkl | Modified Kullback–Leibler | Measures divergence between spectral distributions of consecutive frames. Sensitive to structural changes in the spectrum, which often correspond to note onsets. |
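
To make the magnitude-based rows of the table more concrete, here is a rough sketch of three of them in their common textbook forms. These are assumptions for illustration: the exact definitions used by onsetsds may differ (for example in how HFC weights each bin), and the function names are hypothetical.

```python
def power(mags):
    """Frame energy: sum of squared bin magnitudes."""
    return sum(m * m for m in mags)


def high_frequency_content(mags):
    """One common HFC variant: magnitudes weighted by bin index,
    so energy in higher bins counts more."""
    return sum(k * m for k, m in enumerate(mags))


def spectral_flux(prev_mags, curr_mags):
    """Sum of the positive magnitude changes between two frames.
    Decreases are ignored (half-wave rectification)."""
    return sum(max(curr - prev, 0.0) for prev, curr in zip(prev_mags, curr_mags))


prev_frame = [1.0, 2.0, 3.0]
curr_frame = [2.0, 1.0, 5.0]

energy = power(prev_frame)                      # 1 + 4 + 9
hfc = high_frequency_content(prev_frame)        # 0*1 + 1*2 + 2*3
flux = spectral_flux(prev_frame, curr_frame)    # 1 + 0 + 2
```

The phase-based and complex-domain methods additionally need the phase of each bin over several frames, so they are left out of this sketch.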
Considerations of use
Tables 1 and 2 of the article Adaptive Whitening for Improved Real-Time Audio Onset Detection show that no single onset detection function (ODF) consistently dominates across all types of audio material. Performance varies with signal characteristics such as percussiveness, harmonic content, and polyphonic complexity. Energy-based and high-frequency methods tend to perform well for percussive signals, while approaches that incorporate phase or spectral distribution information, particularly the complex-domain method, show stronger and more stable performance across a wider range of datasets. The results also indicate that adaptive whitening generally improves detection accuracy for complex mixtures and pitched material, which suggests that preprocessing the spectrum can make onset detection functions more robust.
| situation | best ODF | reason |
|---|---|---|
| percussive / drums | cd, pow, hfc | Strong transient attacks produce large energy increases and high-frequency bursts, which these methods capture effectively. |
| polyphonic pitched music | cd | Combines magnitude and phase prediction, allowing it to detect subtle spectral changes in harmonic textures. |
| monophonic pitched signals | mkl, cd | Sensitive to distribution changes in the spectrum, which helps when attacks are softer and energy change is smaller. |
| complex mixtures | cd, pow | More robust to heterogeneous signals where both energy changes and spectral deviations occur. |
| general-purpose | cd | Typically the most robust overall because it models expected spectral evolution using both magnitude and phase information. |
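
The recommendations above can be kept next to a project's configuration as a small lookup. This is only a sketch: the situation keys and the `recommended_odfs` helper are hypothetical names, and how the chosen abbreviation is passed to `ONSETFUNCTION` depends on your setup and is not shown.

```python
# Recommendations copied from the table above, in the same order.
RECOMMENDED_ODFS = {
    "percussive": ("cd", "pow", "hfc"),
    "polyphonic_pitched": ("cd",),
    "monophonic_pitched": ("mkl", "cd"),
    "complex_mixture": ("cd", "pow"),
    "general": ("cd",),
}


def recommended_odfs(situation):
    """Return the ODF abbreviations suggested for a situation.

    Unknown situations fall back to the general-purpose suggestion.
    """
    return RECOMMENDED_ODFS.get(situation, RECOMMENDED_ODFS["general"])


drums_choice = recommended_odfs("percussive")
fallback_choice = recommended_odfs("field_recording")
```

Treat the result as a starting point and compare candidates on your own material, since the article reports no method that wins everywhere.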