Audio: MFCC: Use the MFCC module as compress PCM encoder with discontinuous stream #10814
Open
singalsu wants to merge 4 commits into
Collaborator
Author
Note: To run the MFCC compress topologies, the kernel patches thesofproject/linux#5647 and thesofproject/linux#5789 are needed.
singalsu
commented
May 26, 2026
d5267b3 to
969d644
Compare
Contributor
Pull request overview
This PR extends the SOF MFCC component and related tooling/topology to support VAD + DTX behavior and to use MFCC as a compress PCM “encoder” that can emit discontinuous (DTX-suppressed) feature frames, including optional IPC4 control notifications for VAD state.
Changes:
- Add MFCC VAD/DTX support in firmware (new VAD implementation, frame header with VAD/energy fields, optional IPC4 notifications, and compress-output mode).
- Add/adjust topology2 definitions to expose MFCC feature capture for both normal PCM and compress PCM on SDW jack/DMIC, including new build targets.
- Update MFCC tuning/export and host-side decode/visualization/transcription tools (Matlab/Octave + Python scripts), plus new documentation.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for jack feature capture. |
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for jack (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for DMIC feature capture. |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for DMIC (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/dmic1-mfcc.conf | Renames MFCC bytes control and adds VAD mixer control naming. |
| tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf | Adds MFCC_FRAME_BYTES-driven ibs/obs to support variable-sized (compress) MFCC frames. |
| tools/topology/topology2/include/components/mfcc/mel80.conf | Updates exported MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/mel80_compress.conf | New exported MFCC configuration blob for compress output. |
| tools/topology/topology2/include/components/mfcc/mel80_compress_dtx.conf | New exported MFCC configuration blob for compress output + DTX. |
| tools/topology/topology2/include/components/mfcc/default.conf | Updates exported default MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/ceps13_compress_dtx.conf | New exported MFCC configuration blob for cepstral output + compress + DTX. |
| tools/topology/topology2/include/components/mfcc.conf | Adds mixer control template to MFCC widget and allows type override (e.g., encoder). |
| tools/topology/topology2/include/common/common_definitions.conf | Adds default feature flags for SDW jack/DMIC compress MFCC capture. |
| tools/topology/topology2/include/bench/mfcc_controls_playback.conf | Enables an MFCC mixer switch control in bench playback controls. |
| tools/topology/topology2/include/bench/mfcc_controls_capture.conf | Enables an MFCC mixer switch control in bench capture controls. |
| tools/topology/topology2/development/tplg-targets.cmake | Renames MFCC topology targets and adds compress MFCC mel/ceps variants with frame sizing + blob selection. |
| tools/topology/topology2/cavs-sdw.conf | Adds feature-gated includes for new compress MFCC capture topologies. |
| src/include/user/mfcc.h | Extends MFCC config ABI with VAD/DTX/compress flags and timing parameters. |
| src/include/sof/audio/mfcc/mfcc_vad.h | New VAD API/state definitions for MFCC. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Refactors MFCC component interfaces (source/sink API, frame header, VAD/DTX state, IPC4 helpers). |
| src/audio/mfcc/tune/sof_mel_to_text_live_dsp_vad.py | New live Whisper transcription script using DSP VAD embedded in PCM stream. |
| src/audio/mfcc/tune/sof_mel_to_text_live_compress.py | New live Whisper transcription script for compress PCM + DTX/discontinuous frames. |
| src/audio/mfcc/tune/sof_mel_spectrogram_compress.py | New live mel spectrogram viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/sof_ceps_spectrogram_compress.py | New live cepstral viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/setup_mfcc.m | Updates blob export for new config layout; adds compress + DTX blob exports. |
| src/audio/mfcc/tune/README.txt | Removed in favor of README.md. |
| src/audio/mfcc/tune/README.md | New markdown documentation for tuning, decoding, and live scripts. |
| src/audio/mfcc/tune/decode_mel.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_ceps.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_all.m | Updates batch decode to new decoder signatures and int32 outputs. |
| src/audio/mfcc/mfcc.c | Moves MFCC to source/sink API processing, hooks VAD notifications and compress/DTX behavior. |
| src/audio/mfcc/mfcc_vad.c | New VAD implementation (noise floor tracking + weighted energy + hangover). |
| src/audio/mfcc/mfcc_setup.c | Adds VAD init, DTX/compress state init, buffer free fixes, sample-rate limit check. |
| src/audio/mfcc/mfcc_ipc4.c | New IPC4 control notification plumbing for VAD state reporting. |
| src/audio/mfcc/mfcc_hifi4.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_hifi3.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_generic.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_common.c | Adds source/sink copy funcs, header/VAD handling, legacy vs compress output paths, and DTX suppression logic. |
| src/audio/mfcc/CMakeLists.txt | Registers new mfcc_vad.c and conditionally mfcc_ipc4.c in build. |
| src/audio/base_fw.c | Advertises BESPOKE codec capability for MFCC compress capture. |
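
The table describes the new VAD in mfcc_vad.c as noise floor tracking plus weighted energy plus hangover. The Python sketch below only illustrates that general idea and is not the firmware algorithm. The class name `SimpleVad`, its parameters, and the dB thresholds are all hypothetical, and the energy weighting is left out because the review summary does not describe it.

```python
class SimpleVad:
    """Toy energy VAD with a tracked noise floor and a hangover counter."""

    def __init__(self, threshold_db=6.0, hangover_frames=5, floor_rise_db=0.1):
        self.noise_floor_db = None          # learned from the first frame
        self.threshold_db = threshold_db    # margin above the floor for speech
        self.hangover_frames = hangover_frames
        self.floor_rise_db = floor_rise_db  # slow upward drift per frame
        self.hangover_left = 0

    def update(self, energy_db):
        """Feed one frame energy in dB, return True while speech is flagged."""
        if self.noise_floor_db is None or energy_db < self.noise_floor_db:
            # Follow drops in energy immediately.
            self.noise_floor_db = energy_db
        else:
            # Creep upward slowly, never above the current frame energy.
            self.noise_floor_db = min(self.noise_floor_db + self.floor_rise_db,
                                      energy_db)
        if energy_db > self.noise_floor_db + self.threshold_db:
            self.hangover_left = self.hangover_frames
            return True
        if self.hangover_left > 0:
            # Keep the flag raised for a few frames after the energy drops.
            self.hangover_left -= 1
            return True
        return False
```

The hangover keeps the flag active for a few frames after the energy falls, which avoids clipping word endings. The real implementation may track the floor and apply the hangover differently.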
35e56d5 to
71404ce
Compare
71404ce to
97a3c57
Compare
lyakh
reviewed
May 29, 2026
b491cf1 to
65792ff
Compare
65792ff to
4ff0a7e
Compare
lyakh
reviewed
Jun 1, 2026
4ff0a7e to
66743ea
Compare
kv2019i
requested changes
Jun 3, 2026
kv2019i
left a comment
Collaborator
The code changes look good. I have some notes on the newly added Python apps.
66743ea to
687e01b
Compare
lyakh
reviewed
Jun 8, 2026
lyakh
left a comment
Collaborator
Nothing critical; these can be addressed later at the next convenient opportunity.
687e01b to
e1fa45b
Compare
e1fa45b to
9f14d81
Compare
Switch from process_audio_stream to source/sink API. Add compress PCM output mode (variable-size frames, no zero padding) alongside legacy mode (full period with zero-fill). Unify all output to int32 Q9.23 regardless of source format. Remove out_data_ptr_32, mel_spectra int16 copy, mfcc_func typedef, and per-format output functions from mfcc_common/hifi3/hifi4.

Add DTX for compress mode: suppress silence frames after configurable trailing count, with optional periodic keepalive.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
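
The DTX rule in this commit message (emit a trailing count of silence frames, then suppress, with an optional periodic keepalive) can be sketched roughly as below. This is a Python illustration rather than the firmware code. The names `dtx_filter`, `trailing_frames`, and `keepalive_interval` are hypothetical, and the exact counting rules in mfcc_common.c may differ.

```python
def dtx_filter(vad_flags, trailing_frames=3, keepalive_interval=0):
    """Return the indices of frames that would be written to the stream.

    vad_flags: iterable of booleans, True when speech is detected.
    trailing_frames: silence frames still emitted after speech (assumed).
    keepalive_interval: if > 0, emit one frame per N suppressed (assumed).
    """
    emitted = []
    silence_run = 0      # consecutive silence frames seen so far
    suppressed_run = 0   # frames dropped since the last emitted frame
    for index, speech in enumerate(vad_flags):
        if speech:
            silence_run = 0
            suppressed_run = 0
            emitted.append(index)
            continue
        silence_run += 1
        if silence_run <= trailing_frames:
            # Still inside the trailing window after speech.
            suppressed_run = 0
            emitted.append(index)
            continue
        suppressed_run += 1
        if keepalive_interval and suppressed_run >= keepalive_interval:
            # Periodic keepalive frame during long silence.
            suppressed_run = 0
            emitted.append(index)
    return emitted
```

In this sketch, silence at the very start of the stream is also given a trailing window. That is a simplification chosen here and is not taken from the PR.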
Register SND_AUDIOCODEC_BESPOKE capture in codec info TLV when CONFIG_COMP_MFCC is enabled so the kernel detects compress capture support via IPC4_SOF_CODEC_INFO.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Update Octave decode scripts for int32 Q9.23 output and DTX gap filling. Add DTX blob generation to setup_mfcc.m. Add Python compress capture tools: sof_mel_spectrogram_compress.py, sof_ceps_spectrogram_compress.py, sof_mel_to_text_live_compress.py. Refactor sof_mel_to_text_live_dsp_vad.py to use shared compress capture code. Add README with usage examples.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
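
For readers unfamiliar with the int32 Q9.23 output mentioned above, the sketch below shows one way a host script might turn such values into floats. It is not code from the PR tools. The function names are hypothetical, little-endian byte order is an assumption, and the frame header is assumed to have been stripped before the payload is passed in.

```python
import struct

Q23_FRACTION_BITS = 23  # Q9.23 keeps 23 fractional bits in a signed int32


def q9_23_to_float(raw):
    """Convert one signed Q9.23 integer to a float."""
    return raw / float(1 << Q23_FRACTION_BITS)


def unpack_q9_23(payload):
    """Decode a bytes payload of int32 Q9.23 values (little-endian assumed).

    Any trailing bytes that do not form a full 32-bit word are ignored.
    """
    count = len(payload) // 4
    ints = struct.unpack("<%di" % count, payload[:count * 4])
    return [q9_23_to_float(value) for value in ints]
```

A quick check: `unpack_q9_23(struct.pack("<3i", 1 << 23, -(1 << 22), 0))` gives `[1.0, -0.5, 0.0]`.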
Add sdw-jack-audio-feature-compress.conf (PCM 53, pipeline 132) and sdw-dmic-audio-feature-compress.conf (PCM 54, pipeline 133) for compress MFCC capture with DTX blobs. Fix buffer sizes: set MFCC obs and host-copier ibs/obs to 344 bytes (24-byte header + 80 x int32). Add mel and ceps compress topology targets for MTL and ARL. Rename normal MFCC topologies to *-mfcc-mel-normal for clarity.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
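
The 344-byte figure follows directly from the numbers given in the commit message. The small Python helper below reproduces the arithmetic. The function name `mfcc_frame_bytes` is hypothetical and is not related to how the MFCC_FRAME_BYTES topology define is computed in the build.

```python
HEADER_BYTES = 24     # frame header size stated in the commit message
BYTES_PER_INT32 = 4   # each coefficient is one int32


def mfcc_frame_bytes(num_coeffs, header_bytes=HEADER_BYTES):
    """Size in bytes of one frame: header plus int32 coefficients."""
    return header_bytes + num_coeffs * BYTES_PER_INT32


print(mfcc_frame_bytes(80))  # 80 mel bins: 24 + 80 * 4 = 344
```

The same arithmetic would give 76 bytes for a 13-coefficient frame, though the commit message does not state the size used by the ceps topologies.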
9f14d81 to
4c40079
Compare
This PR adds commits on top of the previous VAD PR #10782.
A kernel PR with the fix for encoder-type ALSA controls is needed to run this.