Speaker Audio Media Player
The speaker media player platform allows you to play on-device and online audio media via speaker components.
This platform greatly benefits from having external PSRAM. See the performance section for details.
It natively supports decoding FLAC, MP3, and WAV audio files. Home Assistant (since version 2024.10) can proxy any media it sends and transcode it to a specified format and sample rate to minimize the device’s computational load.
It supports two different audio pipelines: announcement and media. Each audio pipeline must output to a unique speaker. Use a mixer speaker component to create two different speakers that output to a single audio speaker.
On-device files built directly into the firmware are played without a network connection. Encode on-device files with the configured sample rate, 1 or 2 channels, and 16 bits per sample.
This platform only works on ESP32 based chips using the ESP-IDF framework.
Warning
Audio and voice components consume a significant amount of resources (RAM, CPU) on the device.
Crashes are likely to occur if you include too many additional components in your device’s configuration. In particular, Bluetooth/BLE components are known to cause issues when used in combination with Voice Assistant and/or other audio components.
# Example minimal configuration entry
media_player:
  - platform: speaker
    announcement_pipeline:
      speaker: announcement_spk_id
Configuration variables:
announcement_pipeline (Required, Pipeline Schema): Configuration settings for the announcement pipeline.
  format (Optional, enum): The audio format Home Assistant will transcode audio to before sending it to the device. One of FLAC, MP3, WAV, or NONE. NONE disables transcoding in Home Assistant. Defaults to FLAC.
  sample_rate (Optional, positive integer): Sample rate for the transcoded audio. Should be supported by the configured speaker component. Defaults to the speaker’s sample rate.
  num_channels (Optional, positive integer): Number of channels for the transcoded audio. Must be either 1 or 2. Defaults to the speaker’s number of channels.
media_pipeline (Optional, Pipeline Schema): Configuration settings for the media pipeline. Same options as the announcement_pipeline.
buffer_size (Optional, positive integer): The buffer size in bytes for each pipeline. Must be between 4000 and 4000000. Defaults to 1000000.
codec_support_enabled (Optional, boolean): Enables the MP3 and FLAC decoders and optimizes the WiFi configuration for streaming high quality audio. Defaults to true.
task_stack_in_psram (Optional, boolean): Run the audio tasks in external memory. Defaults to false.
volume_increment (Optional, percentage): The amount by which the media_player.volume_up and media_player.volume_down actions increase or decrease the volume. Defaults to 5%.
volume_min (Optional, percentage): The minimum volume allowed. Defaults to 0%.
volume_max (Optional, percentage): The maximum volume allowed. Defaults to 100%.
files (Optional, list): A list of media files to build into the firmware for on-device playback.
  id (Required, ID): Unique ID for the file.
  file (Required, string): Path to audio file. Can be a local file path or a URL.
on_mute (Optional, Automation): An automation to perform when muted.
on_unmute (Optional, Automation): An automation to perform when unmuted.
on_volume (Optional, Automation): An automation to perform when the volume is changed.
All other options from Media Player
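As a rough illustration of how these options fit together, the sketch below sets the transcoding options on both pipelines, lowers the buffer size, and narrows the volume range. The speaker IDs (announcement_spk_id and media_spk_id) are hypothetical and assume two speakers, such as mixer source speakers, are defined elsewhere. The values are examples rather than recommendations, and the logger.log actions assume the logger component is enabled.

```yaml
# Sketch only: the speaker IDs are hypothetical and must match speakers
# defined elsewhere in your configuration.
media_player:
  - platform: speaker
    name: "Speaker Media Player"
    buffer_size: 500000          # bytes, allocated once per pipeline
    volume_increment: 10%
    volume_min: 10%
    volume_max: 90%
    announcement_pipeline:
      speaker: announcement_spk_id
      format: FLAC               # Home Assistant transcodes to FLAC
      sample_rate: 48000
      num_channels: 1
    media_pipeline:
      speaker: media_spk_id
      format: MP3                # less WiFi traffic, more CPU to decode
      sample_rate: 48000
      num_channels: 2
    on_mute:
      - logger.log: "Media player muted"
    on_unmute:
      - logger.log: "Media player unmuted"
    on_volume:
      - logger.log: "Media player volume changed"
```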
Example Configuration
This example outputs audio to an I²S Audio Speaker configured with a 48000 Hz sample rate. It uses a mixer speaker component to combine the two pipelines, and it uses resampler speaker components to ensure the source speakers use the same sample rate.
It adds a switch for playing an on-device file for an alarm notification. Any playing media is ducked while the alarm is activated. After the alarm is turned off, the media ducking will gradually stop.
i2s_audio:
  i2s_lrclk_pin: GPIOXX
  i2s_bclk_pin: GPIOXX
  sample_rate: 48000

speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIOXX
    sample_rate: 48000
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: speaker_id
    source_speakers:
      - id: announcement_spk_mixer_input
      - id: media_spk_mixer_input
  - platform: resampler
    id: media_spk_resampling_input
    output_speaker: media_spk_mixer_input
  - platform: resampler
    id: announcement_spk_resampling_input
    output_speaker: announcement_spk_mixer_input

media_player:
  - platform: speaker
    name: "Speaker Media Player"
    id: speaker_media_player_id
    media_pipeline:
      speaker: media_spk_resampling_input
      num_channels: 2
    announcement_pipeline:
      speaker: announcement_spk_resampling_input
      num_channels: 1
    files:
      - id: alarm_sound
        file: alarm.flac  # Placed in the yaml directory. Should be encoded with a 48000 Hz sample rate, mono or stereo audio, and 16 bits per sample.

switch:
  - platform: template
    name: "Ring Timer"
    id: timer_ringing
    optimistic: true
    restore_mode: ALWAYS_OFF
    on_turn_off:
      # Stop playing the alarm
      - media_player.stop:
          announcement: true
      - mixer_speaker.apply_ducking:  # Stop ducking the media stream over 2 seconds
          id: media_spk_mixer_input
          decibel_reduction: 0
          duration: 2.0s
    on_turn_on:
      # Duck media audio by 20 decibels instantly
      - mixer_speaker.apply_ducking:
          id: media_spk_mixer_input
          decibel_reduction: 20
          duration: 0.0s
      - while:
          condition:
            switch.is_on: timer_ringing
          then:
            # Play the alarm sound as an announcement
            - media_player.speaker.play_on_device_media_file:
                media_file: alarm_sound
                announcement: true
            # Wait until the alarm sound starts playing
            - wait_until:
                media_player.is_announcing:
            # Wait until the alarm sound stops playing
            - wait_until:
                not:
                  media_player.is_announcing:
Automations
media_player.speaker.play_on_device_media_file Action
This action plays an on-device media file.
on_...:
  # Simple
  - media_player.speaker.play_on_device_media_file: file_id
  # Full
  - media_player.speaker.play_on_device_media_file:
      media_file: wake_word_trigger_sound
      announcement: true
Configuration variables:
media_file (Required, ID): The ID of the media file.
announcement (Optional, boolean): Whether to play back the file as an announcement or media stream. Defaults to false.
enqueue (Optional, boolean): Whether to add the media file to the end of the pipeline’s internal playlist. Defaults to false.
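To illustrate the enqueue option, the sketch below starts one file immediately and then appends a second file to the same pipeline's playlist. Both file IDs (chime_sound and alarm_sound) are hypothetical and would need to be declared under the media player's files list.

```yaml
# Sketch only: chime_sound and alarm_sound are hypothetical file IDs.
on_...:
  # Start the first file right away on the announcement pipeline
  - media_player.speaker.play_on_device_media_file:
      media_file: chime_sound
      announcement: true
  # Add the second file to the end of the announcement pipeline's playlist
  - media_player.speaker.play_on_device_media_file:
      media_file: alarm_sound
      announcement: true
      enqueue: true
```

Both actions set announcement to true so that the second file is queued on the same pipeline as the first.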
Performance
Decoding audio files is CPU and memory intensive. PSRAM external memory is strongly recommended. To use the component on a memory constrained device, define only the announcement pipeline, decrease the buffer size, set codec_support_enabled to false, and set the pipeline transcode setting format to WAV with a low sample rate and only 1 channel.
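A memory constrained setup along these lines might look like the following sketch. The speaker ID is hypothetical, and the buffer size and sample rate are illustrative values that should be tuned for the actual device.

```yaml
# Sketch only: the speaker ID and the exact values are illustrative.
media_player:
  - platform: speaker
    name: "Low Memory Media Player"
    buffer_size: 50000             # well below the 1000000 byte default
    codec_support_enabled: false   # skip the MP3 and FLAC decoders
    announcement_pipeline:         # no media_pipeline is defined
      speaker: announcement_spk_id
      format: WAV                  # Home Assistant transcodes to WAV
      sample_rate: 16000
      num_channels: 1
```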
In general, decoding FLAC has the lowest CPU usage, but requires a strong WiFi connection. Decoding MP3 requires less data to be sent over WiFi but is more CPU intensive to decode. Decoding WAV is only recommended at low sample rates if streamed over a network connection.
Increasing the buffer size may reduce stuttering, but do not set it to the entire size of the external memory. Each pipeline allocates the configured amount, and this setting also does not take into account other smaller buffers allocated throughout the audio stack.
Only set task_stack_in_psram to true if you have many components configured and your logs show that memory allocation failed. It is slower, especially if your PSRAM doesn’t support octal mode.
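As a hedged sketch, enabling this option alongside a PSRAM configuration could look like the following. The psram mode and speed shown are assumptions that depend on the board; octal mode is not available on every ESP32 variant. The speaker ID is hypothetical.

```yaml
# Sketch only: PSRAM mode and speed depend on your hardware.
psram:
  mode: octal      # assumption: use quad if your PSRAM lacks octal support
  speed: 80MHz

media_player:
  - platform: speaker
    task_stack_in_psram: true   # run the audio tasks in external memory
    announcement_pipeline:
      speaker: announcement_spk_id
```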
Troubleshooting
While you are troubleshooting, simplify your setup as much as possible. Only configure the announcement_pipeline and do not use resampler or mixer speakers.
If you can’t hear anything, check whether your hardware requires a GPIO pin to be high or low to enable the speaker. Verify you have the correct speaker channel configured: try setting your speaker configuration to stereo if you are unsure which channels are available.
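One possible way to check both points is sketched below, assuming an amplifier with an active-high enable pin wired to a GPIO and an I²S speaker. The GPIOXX placeholders, the switch name, and the enable-pin polarity are assumptions; consult your hardware's datasheet.

```yaml
# Sketch only: pins and enable polarity are assumptions for illustration.
switch:
  - platform: gpio
    name: "Speaker Enable"
    pin: GPIOXX
    restore_mode: ALWAYS_ON   # drive the enable pin high at boot

speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIOXX
    channel: stereo           # output both channels while testing
```

If the hardware expects the pin to be low instead, the pin can be declared with inverted: true so that the switch's on state drives it low.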
If the audio quality is poor, check your output speaker configuration. Experiment with the bits per sample, channels, and sample rate settings. In general, higher sample rates improve audio quality: try using 44100 Hz or 48000 Hz instead of 16000 Hz.
If there is a noticeable delay before a pause command takes effect, reduce the buffer duration in the output speaker. Be sure to adjust both the hardware speaker component settings and the mixer speaker component settings, if used.
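A sketch of shorter buffers is shown below. It assumes the buffer_duration option is available on both the I²S speaker and the mixer's source speakers; verify the option name and defaults against each speaker component's documentation. The durations are illustrative only.

```yaml
# Sketch only: buffer_duration values are illustrative. Shorter buffers
# react faster to pause commands but may increase the risk of stuttering.
speaker:
  - platform: i2s_audio
    id: speaker_id
    dac_type: external
    i2s_dout_pin: GPIOXX
    buffer_duration: 100ms
  - platform: mixer
    id: mixer_speaker_id
    output_speaker: speaker_id
    source_speakers:
      - id: announcement_spk_mixer_input
        buffer_duration: 50ms
      - id: media_spk_mixer_input
        buffer_duration: 50ms
```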