Building a Recording Studio App with Backing Tracks
Musicians practicing or recording want to:
- Play along with a backing track (drums, bass, full mix)
- Record their performance with studio-quality audio
- Control the blend between backing track and wet effects signal
- Do all of this with zero latency - everything happens on-device
flowchart LR
subgraph Device["Hoopi Pedal"]
subgraph ESP["ESP32"]
SD_READ[SD Card<br/>Read WAV]
SD_WRITE[SD Card<br/>Write WAV]
I2S_TX[I2S TX]
I2S_RX[I2S RX]
end
subgraph DSP["Daisy Seed"]
BT_IN[Backing Track In]
FX[Effects Engine]
MIX_OUT[Output Mixer]
MIX_REC[Recording Mixer]
WET_OUT[Wet Audio Out]
end
HP[Headphones]
end
SD_READ -->|Read WAV| I2S_TX
I2S_TX -->|Backing Track| BT_IN
BT_IN --> MIX_OUT
FX -->|Wet Signal| MIX_OUT
FX -->|Wet Signal| MIX_REC
BT_IN -.->|If flag set| MIX_REC
MIX_OUT --> HP
MIX_REC --> WET_OUT
WET_OUT -->|Recording Audio| I2S_RX
I2S_RX --> SD_WRITE
Key insight: the backing track never leaves the device. Recording happens on the ESP32, while the Daisy DSP handles mixing and effects.
Architecture Overview
flowchart TB
subgraph App["Flutter App"]
LIB[Track Library]
UPLOAD[Upload Service]
API[API Service]
CTRL[Blend Controls]
end
subgraph ESP["ESP32"]
REST[REST API]
FS[SD Card FS]
STREAM[WAV Streamer]
RECORDER[WAV Recorder]
I2S[I2S Driver<br/>Bidirectional]
end
subgraph Daisy["Daisy Seed DSP"]
RX[Backing Track RX]
TX[Recording TX]
EFFECTS[Effects Chain]
BLEND_OUT[Output Blend]
BLEND_REC[Recording Blend]
end
LIB --> UPLOAD
UPLOAD -->|HTTP POST| REST
REST --> FS
CTRL --> API
API -->|HTTP| REST
REST -->|UART| Daisy
FS --> STREAM
STREAM -->|I2S TX| RX
RX --> BLEND_OUT
EFFECTS --> BLEND_OUT
EFFECTS --> BLEND_REC
RX -.->|Optional| BLEND_REC
BLEND_REC -->|I2S RX| RECORDER
RECORDER --> FS
Bidirectional I2S Flow
The ESP32 and Daisy communicate over a bidirectional I2S bus:
sequenceDiagram
participant SD as SD Card
participant ESP as ESP32
participant DSP as Daisy Seed
participant HP as Headphones
Note over ESP,DSP: I2S Bus (Full Duplex)
loop Every audio frame (48kHz)
SD->>ESP: Read backing track samples
ESP->>DSP: I2S TX: Backing track L/R
DSP->>DSP: Blend backing + wet for output
DSP->>HP: Output blended audio
DSP->>DSP: Mix wet (+ backing if flag)
DSP->>ESP: I2S RX: Recording audio L/R
ESP->>SD: Write recording samples
end
Audio Signal Routing
flowchart TB
subgraph Inputs["Inputs"]
GUITAR[Guitar/Instrument]
BT[Backing Track<br/>from ESP32 I2S]
end
subgraph DSP["Daisy Seed Processing"]
FX[Effects Chain<br/>Reverb, Delay, etc.]
subgraph OutputPath["Output Path (Headphones)"]
MIX_HP[Blend Mixer]
end
subgraph RecordPath["Recording Path (to ESP32)"]
MIX_REC[Recording Mixer]
end
end
subgraph Outputs["Outputs"]
HP[Headphones<br/>Backing + Wet]
I2S_OUT[I2S to ESP32<br/>Wet + optional Backing]
end
GUITAR --> FX
FX -->|Wet signal| MIX_HP
BT -->|Backing audio| MIX_HP
MIX_HP --> HP
FX -->|Wet signal<br/>Always| MIX_REC
BT -.->|Backing audio<br/>If blend_to_recording flag| MIX_REC
MIX_REC --> I2S_OUT
Key points:
- Output (headphones): Always backing track + wet effects blended
- Recording (to ESP32): Always wet effects, optionally includes backing track if flag is set
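To make the routing concrete, here is a rough sketch (portable C, hypothetical function names) of the per-sample math the two Daisy-side mixers perform, using the same 0-255 gain convention as the blend controls described later:

```c
#include <stdint.h>

// Saturating add so a hot backing track plus wet signal cannot wrap around.
static int16_t sat16(int32_t x) {
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return (int16_t)x;
}

// Apply a 0-255 gain to one sample (255 is approximately unity).
static int32_t apply_gain(int16_t sample, uint8_t level) {
    return ((int32_t)sample * level) >> 8;
}

// Headphone output: backing and wet are always blended.
int16_t mix_output(int16_t backing, int16_t wet,
                   uint8_t backing_level, uint8_t wet_level) {
    return sat16(apply_gain(backing, backing_level) +
                 apply_gain(wet, wet_level));
}

// Recording path: wet is always present; backing only if the flag is set.
int16_t mix_recording(int16_t backing, int16_t wet, int blend_to_recording) {
    int32_t sum = wet;
    if (blend_to_recording) sum += backing;
    return sat16(sum);
}
```

The firmware's actual gain curve may differ; the point is that the two mixers are independent, so the recording stays clean regardless of how loud the backing track is in the headphones.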
Backing Track Sources
Tracks on the device SD card can come from:
- Previous recordings - Practice along with your own takes
- Uploaded tracks - Push WAV files from the app
- Downloaded content - Backing tracks from the cloud
class BackingTrack {
final String filename; // Filename on device SD card
final String displayName; // User-friendly name
final Duration duration;
final int sampleRate; // Always 48000
final int bitDepth; // Always 16
final BackingTrackSource source;
const BackingTrack({
required this.filename,
required this.displayName,
required this.duration,
this.sampleRate = 48000,
this.bitDepth = 16,
required this.source,
});
}
enum BackingTrackSource {
recording, // Previous session recording
uploaded, // Pushed from app
downloaded, // From cloud library
}
Uploading Tracks to Device
Tracks must be in 16-bit/48kHz WAV format for I2S compatibility:
class BackingTrackUploadService {
final ApiService _apiService;
Future<bool> uploadTrack(String localPath, String deviceFilename) async {
final file = File(localPath);
// Verify format
final header = await _readWavHeader(file);
if (header.sampleRate != 48000 || header.bitsPerSample != 16) {
throw FormatException(
'Backing tracks must be 16-bit/48kHz WAV. '
'Got ${header.bitsPerSample}-bit/${header.sampleRate}Hz'
);
}
// Upload to device
final bytes = await file.readAsBytes();
final response = await http.post(
Uri.parse('${_apiService.baseUrl}/upload/backingtrack'),
headers: {
'Content-Type': 'application/octet-stream',
'X-Filename': deviceFilename,
'X-Duration': header.duration.inSeconds.toString(),
},
body: bytes,
);
return response.statusCode == 200;
}
Future<WavHeader> _readWavHeader(File file) async {
// openRead streams arbitrary chunks; a RandomAccessFile reads exactly 44 bytes
final raf = await file.open();
try {
return WavHeader.parse(await raf.read(44));
} finally {
await raf.close();
}
}
}
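`WavHeader.parse` is left to the reader above; for reference, here is a minimal C sketch of the same format check against the canonical 44-byte RIFF header (real files may carry extra chunks between `fmt ` and `data`, which this deliberately ignores):

```c
#include <stdint.h>

typedef struct {
    uint32_t sample_rate;     // header offset 24
    uint16_t bits_per_sample; // header offset 34
    uint32_t data_bytes;      // header offset 40
} wav_header_t;

static uint32_t rd_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Returns 0 on success, -1 if the buffer is not a canonical WAV header.
int wav_parse_header(const uint8_t hdr[44], wav_header_t *out) {
    if (hdr[0] != 'R' || hdr[1] != 'I' || hdr[2] != 'F' || hdr[3] != 'F') return -1;
    if (hdr[8] != 'W' || hdr[9] != 'A' || hdr[10] != 'V' || hdr[11] != 'E') return -1;
    out->sample_rate     = rd_u32le(hdr + 24);
    out->bits_per_sample = (uint16_t)(hdr[34] | (hdr[35] << 8));
    out->data_bytes      = rd_u32le(hdr + 40);
    return 0;
}

// Mirrors the Dart gate: reject anything but 16-bit/48kHz.
int wav_is_supported(const wav_header_t *h) {
    return h->sample_rate == 48000 && h->bits_per_sample == 16;
}
```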
Progress Tracking
Future<void> uploadWithProgress(
String localPath,
String deviceFilename,
void Function(double progress) onProgress,
) async {
final file = File(localPath);
final fileSize = await file.length();
final request = http.StreamedRequest(
'POST',
Uri.parse('${_apiService.baseUrl}/upload/backingtrack'),
);
request.headers['Content-Type'] = 'application/octet-stream';
request.headers['X-Filename'] = deviceFilename;
request.headers['Content-Length'] = fileSize.toString();
int uploaded = 0;
final stream = file.openRead().transform(
StreamTransformer.fromHandlers(
handleData: (data, sink) {
uploaded += data.length;
onProgress(uploaded / fileSize);
sink.add(data);
},
),
);
// Feed the sink without awaiting; send() below consumes it concurrently
unawaited(request.sink.addStream(stream).then((_) => request.sink.close()));
final response = await request.send();
if (response.statusCode != 200) {
throw Exception('Upload failed: ${response.statusCode}');
}
}
Listing Available Tracks
The device API returns all WAV files that can be used as backing tracks:
Future<List<BackingTrack>> getAvailableTracks() async {
final response = await http.get(
Uri.parse('${_apiService.baseUrl}/api/backingtracks'),
);
if (response.statusCode != 200) {
throw Exception('Failed to list backing tracks');
}
final List<dynamic> json = jsonDecode(response.body);
return json.map((item) => BackingTrack(
filename: item['filename'],
displayName: item['name'] ?? _extractName(item['filename']),
duration: Duration(seconds: item['duration_sec'] ?? 0),
source: BackingTrackSource.values.byName(item['source'] ?? 'uploaded'),
)).toList();
}
String _extractName(String filename) {
// "my_song.wav" -> "my song"
return filename
.replaceAll('.wav', '')
.replaceAll('_', ' ');
}
Enabling a Backing Track
sequenceDiagram
participant App as Flutter App
participant ESP as ESP32
participant DSP as Daisy Seed
App->>ESP: POST /api/backingtrack/enable<br/>{filename: "drums.wav", blendToRecording: false}
ESP->>ESP: Open WAV file
ESP->>ESP: Parse header, seek to data
ESP->>DSP: UART: CMD_BACKING_TRACK_ENABLE
DSP->>DSP: Configure blend flags
DSP-->>ESP: ACK
ESP-->>App: {status: "enabled", filename: "drums.wav"}
Note over ESP,DSP: Ready to stream on recording start
API Call
Future<bool> enableBackingTrack(String filename, {bool blendToRecording = false}) async {
try {
final response = await http.post(
Uri.parse('$_baseUrl/api/backingtrack/enable'),
headers: {'Content-Type': 'application/json'},
body: jsonEncode({
'filename': filename,
'blend_to_recording': blendToRecording,
}),
);
if (response.statusCode == 200) {
final json = jsonDecode(response.body);
return json['status'] == 'enabled';
}
return false;
} catch (e) {
log('Failed to enable backing track: $e');
return false;
}
}
Future<bool> disableBackingTrack() async {
try {
final response = await http.post(
Uri.parse('$_baseUrl/api/backingtrack/disable'),
);
return response.statusCode == 200;
} catch (e) {
log('Failed to disable backing track: $e');
return false;
}
}
I2S Streaming Architecture
flowchart TB
subgraph ESP32["ESP32"]
FILE_R[WAV File Read]
FILE_W[WAV File Write]
DMA_TX[DMA TX Buffer]
DMA_RX[DMA RX Buffer]
I2S_PERIPH[I2S Peripheral<br/>Full Duplex]
end
subgraph I2S_BUS["I2S Bus"]
BCLK[BCLK: 48kHz × 32]
LRCK[LRCK: 48kHz]
DATA_OUT[DOUT: Backing Track]
DATA_IN[DIN: Recording Audio]
end
subgraph Daisy["Daisy Seed"]
I2S_RX[I2S RX<br/>Backing Track]
I2S_TX[I2S TX<br/>Recording Audio]
AUDIO[Audio Processing]
end
FILE_R -->|Read| DMA_TX
DMA_TX --> I2S_PERIPH
I2S_PERIPH --> DATA_OUT
DATA_OUT --> I2S_RX
I2S_RX --> AUDIO
AUDIO --> I2S_TX
I2S_TX --> DATA_IN
DATA_IN --> I2S_PERIPH
I2S_PERIPH --> DMA_RX
DMA_RX -->|Write| FILE_W
ESP32 I2S Configuration (Full Duplex)
// ESP32 side - I2S full duplex for backing track + recording
i2s_config_t i2s_config = {
.mode = I2S_MODE_MASTER | I2S_MODE_TX | I2S_MODE_RX,
.sample_rate = 48000,
.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
.channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
.communication_format = I2S_COMM_FORMAT_STAND_I2S,
.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
.dma_buf_count = 8,
.dma_buf_len = 256,
.use_apll = true, // Better clock accuracy
};
i2s_pin_config_t pin_config = {
.bck_io_num = GPIO_NUM_26,
.ws_io_num = GPIO_NUM_25,
.data_out_num = GPIO_NUM_22, // TX: Backing track to Daisy
.data_in_num = GPIO_NUM_23, // RX: Recording audio from Daisy
};
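A quick back-of-the-envelope for the DMA settings above (illustrative helpers, not firmware code): `dma_buf_len` is in frames for the legacy driver, so each buffer holds about 5.3 ms of audio and the eight buffers together absorb roughly 43 ms of SD-card jitter. Note this buffering sits on the file-streaming path only; it does not add latency to the guitar monitoring path, which never leaves the Daisy.

```c
#include <stdint.h>

// Microseconds of audio covered by one DMA buffer.
uint32_t dma_buf_usec(uint32_t frames, uint32_t sample_rate) {
    return (uint32_t)((uint64_t)frames * 1000000u / sample_rate);
}

// Total microseconds of buffering across all DMA buffers.
uint32_t dma_total_usec(uint32_t buf_count, uint32_t frames,
                        uint32_t sample_rate) {
    return buf_count * dma_buf_usec(frames, sample_rate);
}

// Bytes per second the SD card must sustain for one 16-bit stereo stream.
uint32_t stream_bytes_per_sec(uint32_t sample_rate) {
    return sample_rate * 2 /* channels */ * 2 /* bytes per sample */;
}
```

At 48 kHz that is 192 kB/s per direction - comfortably within SD throughput, but the headroom matters because the task below reads and writes simultaneously.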
Bidirectional Streaming Task
void audio_streaming_task(void* param) {
int16_t tx_buffer[BUFFER_SIZE]; // Backing track to Daisy
int16_t rx_buffer[BUFFER_SIZE]; // Recording from Daisy
size_t bytes_written, bytes_read;
while (streaming_enabled) {
// Read backing track from SD card
if (backing_track_enabled && wav_file != NULL) {
size_t samples = fread(tx_buffer, sizeof(int16_t), BUFFER_SIZE, wav_file);
if (samples == 0 && loop_enabled) {
fseek(wav_file, wav_data_offset, SEEK_SET);
continue;
}
// Zero-fill the tail on a short read (end of file) so stale samples
// from the previous pass are not sent to the Daisy
if (samples < BUFFER_SIZE) {
memset(&tx_buffer[samples], 0, (BUFFER_SIZE - samples) * sizeof(int16_t));
}
} else {
// Send silence if no backing track
memset(tx_buffer, 0, sizeof(tx_buffer));
}
// Full duplex I2S transfer
i2s_write(I2S_NUM_0, tx_buffer, sizeof(tx_buffer), &bytes_written, portMAX_DELAY);
i2s_read(I2S_NUM_0, rx_buffer, sizeof(rx_buffer), &bytes_read, portMAX_DELAY);
// Write recording audio to SD card
if (recording_enabled && rec_file != NULL) {
fwrite(rx_buffer, 1, bytes_read, rec_file);
}
}
}
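One detail the task above leaves out: a WAV file's RIFF and data chunk sizes are only known once recording stops, so the recorder has to seek back and rewrite the header at that point. A sketch of building the canonical 44-byte header for a finished 16-bit/48kHz recording (hypothetical helper, not taken from the firmware):

```c
#include <stdint.h>
#include <string.h>

static void wr_u32le(uint8_t *p, uint32_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
    p[2] = (v >> 16) & 0xFF; p[3] = (v >> 24) & 0xFF;
}

static void wr_u16le(uint8_t *p, uint16_t v) {
    p[0] = v & 0xFF; p[1] = (v >> 8) & 0xFF;
}

// Fill hdr[44] for data_bytes of PCM at the given rate/channels/bit depth.
void wav_write_header(uint8_t hdr[44], uint32_t data_bytes,
                      uint32_t rate, uint16_t channels, uint16_t bits) {
    uint16_t block_align = channels * bits / 8;
    memcpy(hdr, "RIFF", 4);
    wr_u32le(hdr + 4, 36 + data_bytes);      // RIFF chunk size
    memcpy(hdr + 8, "WAVEfmt ", 8);
    wr_u32le(hdr + 16, 16);                  // fmt chunk size
    wr_u16le(hdr + 20, 1);                   // PCM
    wr_u16le(hdr + 22, channels);
    wr_u32le(hdr + 24, rate);
    wr_u32le(hdr + 28, rate * block_align);  // byte rate
    wr_u16le(hdr + 32, block_align);
    wr_u16le(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    wr_u32le(hdr + 40, data_bytes);
}
```

On stop, the recorder would `fseek` to offset 0, write this header over the placeholder written at start, and close the file.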
Blend Controls
The Daisy receives blend parameters via UART and mixes in real time:
class BlendSettings {
final int backingLevel; // 0-255: backing track volume in output
final int wetLevel; // 0-255: effects signal volume in output
final int pan; // 0-255: L/R balance (128 = center)
final bool blendToRecording; // Include backing track in recording
const BlendSettings({
this.backingLevel = 180, // ~70%
this.wetLevel = 255, // 100%
this.pan = 128, // Center
this.blendToRecording = false, // Record wet only by default
});
}
Future<bool> setBlend(BlendSettings settings) async {
try {
final response = await http.post(
Uri.parse('$_baseUrl/api/backingtrack/blend'),
headers: {'Content-Type': 'application/json'},
body: jsonEncode({
'backing': settings.backingLevel,
'wet': settings.wetLevel,
'pan': settings.pan,
'blend_to_recording': settings.blendToRecording,
}),
);
return response.statusCode == 200;
} catch (e) {
log('Failed to set blend: $e');
return false;
}
}
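The firmware's pan law isn't specified in this excerpt; as one plausible reading of the 0-255 pan value (128 = center, 0 = hard left, 255 = hard right), a simple linear mapping to per-channel gains might look like:

```c
#include <stdint.h>

// Left-channel gain, 0-255: unity at or left of center, fading to 0 at hard right.
uint8_t pan_gain_left(uint8_t pan) {
    if (pan <= 128) return 255;
    return (uint8_t)(((255 - pan) * 255) / 127);
}

// Right-channel gain, 0-255: unity at or right of center, fading to 0 at hard left.
uint8_t pan_gain_right(uint8_t pan) {
    if (pan >= 128) return 255;
    return (uint8_t)((pan * 255) / 128);
}
```

A real implementation might prefer a constant-power curve so the perceived loudness stays flat across the sweep; the point here is only how a single byte covers the whole stereo field.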
UART Command
// CMD_BACKING_BLEND (0x0C) - 4 data bytes
struct BackingBlendFrame {
uint8_t start; // 0xAA
uint8_t len; // 0x05
uint8_t cmd; // 0x0C
uint8_t backing; // Backing track level (output)
uint8_t wet; // Wet effects level (output)
uint8_t pan; // L/R pan
uint8_t blend_to_rec; // 0 = wet only, 1 = wet + backing
uint8_t checksum;
};
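The checksum algorithm isn't defined in this excerpt; assuming, purely for illustration, an XOR over the bytes between the start byte and the checksum slot:

```c
#include <stdint.h>
#include <stddef.h>

// XOR of len, cmd, and payload bytes (skips the start byte and checksum slot).
// The real frame format may use a different checksum; this is an assumption.
uint8_t frame_checksum(const uint8_t *frame, size_t frame_len) {
    uint8_t sum = 0;
    for (size_t i = 1; i + 1 < frame_len; i++) sum ^= frame[i];
    return sum;
}
```

The receiver would recompute this over the incoming frame and drop it on mismatch before touching the blend state.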
Recording Session Flow
stateDiagram-v2
[*] --> Idle
Idle --> TrackSelected: Select backing track
TrackSelected --> Enabled: Enable track
Enabled --> TrackSelected: Disable track
Enabled --> Recording: Start recording
Recording --> Enabled: Stop recording
Recording --> Recording: Adjust blend
note right of Recording
ESP32 streams backing via I2S TX
Daisy returns wet audio via I2S RX
ESP32 writes recording to SD
end note
Coordinated Recording
Future<void> startRecordingWithBackingTrack({
required String backingTrackFilename,
required String recordingFilename,
BlendSettings blend = const BlendSettings(),
}) async {
// 1. Enable backing track (prepares I2S streaming)
final enabled = await _apiService.enableBackingTrack(
backingTrackFilename,
blendToRecording: blend.blendToRecording,
);
if (!enabled) {
throw Exception('Failed to enable backing track');
}
// 2. Set initial blend
await _apiService.setBlend(blend);
// 3. Start recording (triggers bidirectional I2S)
final started = await _apiService.startRecording(recordingFilename);
if (!started) {
await _apiService.disableBackingTrack();
throw Exception('Failed to start recording');
}
// Recording is now active:
// - ESP32 sends backing track to Daisy via I2S TX
// - Daisy sends wet effects (+ optional backing) to ESP32 via I2S RX
// - ESP32 writes received audio to SD card
}
Future<void> stopRecordingWithBackingTrack() async {
// 1. Stop recording (stops I2S and SD write)
await _apiService.stopRecording();
// 2. Disable backing track
await _apiService.disableBackingTrack();
}
The UI
class BackingTrackPanel extends StatefulWidget {
final ApiService apiService;
const BackingTrackPanel({super.key, required this.apiService});
@override
State<BackingTrackPanel> createState() => _BackingTrackPanelState();
}
class _BackingTrackPanelState extends State<BackingTrackPanel> {
List<BackingTrack> _tracks = [];
BackingTrack? _selectedTrack;
bool _isEnabled = false;
BlendSettings _blend = const BlendSettings();
@override
void initState() {
super.initState();
_loadTracks();
}
Future<void> _loadTracks() async {
final tracks = await widget.apiService.getAvailableTracks();
setState(() => _tracks = tracks);
}
@override
Widget build(BuildContext context) {
return Card(
child: Padding(
padding: EdgeInsets.all(16),
child: Column(
crossAxisAlignment: CrossAxisAlignment.start,
children: [
// Header with enable toggle
Row(
children: [
Icon(Icons.music_note),
SizedBox(width: 8),
Text('Backing Track', style: Theme.of(context).textTheme.titleMedium),
Spacer(),
Switch(
value: _isEnabled,
onChanged: _selectedTrack != null ? _toggleEnabled : null,
),
],
),
SizedBox(height: 16),
// Track selector
_buildTrackDropdown(),
if (_isEnabled) ...[
SizedBox(height: 16),
// Blend controls
_buildSlider(
label: 'Backing Level',
value: _blend.backingLevel,
onChanged: (v) => _updateBlend(backingLevel: v),
),
_buildSlider(
label: 'Effects Level',
value: _blend.wetLevel,
onChanged: (v) => _updateBlend(wetLevel: v),
),
_buildSlider(
label: 'Pan',
value: _blend.pan,
onChanged: (v) => _updateBlend(pan: v),
showCenter: true,
),
SizedBox(height: 8),
// Blend to recording toggle
SwitchListTile(
title: Text('Include backing in recording'),
subtitle: Text(
_blend.blendToRecording
? 'Recording will include backing track'
: 'Recording wet effects only',
),
value: _blend.blendToRecording,
onChanged: (v) => _updateBlend(blendToRecording: v),
),
],
],
),
),
);
}
Widget _buildTrackDropdown() {
return DropdownButtonFormField<BackingTrack>(
value: _selectedTrack,
hint: Text('Select a track'),
isExpanded: true,
items: _tracks.map((track) {
return DropdownMenuItem(
value: track,
child: Row(
children: [
Icon(_sourceIcon(track.source), size: 16),
SizedBox(width: 8),
Expanded(child: Text(track.displayName)),
Text(
_formatDuration(track.duration),
style: TextStyle(color: Colors.grey),
),
],
),
);
}).toList(),
onChanged: (track) {
setState(() => _selectedTrack = track);
if (_isEnabled && track != null) {
widget.apiService.enableBackingTrack(
track.filename,
blendToRecording: _blend.blendToRecording,
);
}
},
);
}
Future<void> _toggleEnabled(bool enabled) async {
if (enabled && _selectedTrack != null) {
final success = await widget.apiService.enableBackingTrack(
_selectedTrack!.filename,
blendToRecording: _blend.blendToRecording,
);
if (success) {
setState(() => _isEnabled = true);
}
} else {
await widget.apiService.disableBackingTrack();
setState(() => _isEnabled = false);
}
}
Future<void> _updateBlend({
int? backingLevel,
int? wetLevel,
int? pan,
bool? blendToRecording,
}) async {
final newBlend = BlendSettings(
backingLevel: backingLevel ?? _blend.backingLevel,
wetLevel: wetLevel ?? _blend.wetLevel,
pan: pan ?? _blend.pan,
blendToRecording: blendToRecording ?? _blend.blendToRecording,
);
setState(() => _blend = newBlend);
await widget.apiService.setBlend(newBlend);
}
IconData _sourceIcon(BackingTrackSource source) {
return switch (source) {
BackingTrackSource.recording => Icons.mic,
BackingTrackSource.uploaded => Icons.upload,
BackingTrackSource.downloaded => Icons.cloud_download,
};
}
}
Recording Modes
| Mode | Output (Headphones) | Recording (SD Card) | Use Case |
|---|---|---|---|
| Wet Only (default) | Backing + Wet | Wet effects only | Post-production flexibility |
| Blend to Recording | Backing + Wet | Backing + Wet mixed | Quick demos, practice review |
flowchart LR
subgraph WetOnly["Wet Only Mode"]
direction TB
HP1[🎧 Backing + Wet]
REC1[💾 Wet Only]
end
subgraph BlendMode["Blend to Recording Mode"]
direction TB
HP2[🎧 Backing + Wet]
REC2[💾 Backing + Wet]
end
Advantages of On-Device Architecture
| Aspect | Phone Streaming | On-Device (Our Approach) |
|---|---|---|
| Latency | 40-200ms (Bluetooth) | <1ms (I2S) |
| Sync | Requires compensation | Perfect sync |
| Quality | Compressed (BT codec) | Lossless 16-bit |
| Recording | Complex routing | Simple - ESP32 handles it |
| Battery | Drains phone | Minimal phone usage |
| Reliability | BT can drop | Rock solid |
Key Takeaways
- Bidirectional I2S - ESP32 sends backing track, receives recording audio
- Recording on ESP32 - Daisy focuses on DSP, ESP32 handles file I/O
- Wet effects always recorded - Clean signal path for recording
- Optional backing blend - Flag to include backing track in recording
- Zero latency monitoring - Backing + wet mixed in real-time on Daisy
- 16-bit/48kHz standard - Consistent format throughout
This architecture delivers a professional practice experience: zero-latency backing tracks with flexible recording options, coordinated entirely on-device between the ESP32 and the Daisy Seed.