Building a Safe OTA System for Audio Hardware
The Stakes
A bricked guitar pedal mid-gig is unacceptable. Our OTA system must be:
| Requirement | Why |
|---|---|
| Atomic | A partial update must never replace or corrupt the running firmware |
| Recoverable | Power loss at any point must leave a path back to working firmware |
| Verifiable | Detect corruption before applying |
| Fast | Minimize the danger window |
The Staging Area Approach
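The key insight: never write incoming data directly to the active region. Downloaded blocks go to a separate staging region, and the active region is touched only after the complete image has been verified.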
Memory Layout
#define OTA_QSPI_ACTIVE_ADDR  0x90040000 // Bootloader loads from here
#define OTA_QSPI_STAGING_ADDR 0x90140000 // OTA writes here first
#define OTA_QSPI_MAX_SIZE     0x100000   // 1MB max firmware
#define OTA_BLOCK_SIZE        0x1000     // 4KB transfer block (see Phase 2)
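Mistakes in a layout like this are cheap to catch at compile time. The following is a minimal sketch, assuming the two regions are meant to be disjoint and aligned to the 64KB erase block the firmware uses. The `k`-prefixed constants and both helper functions are illustrative names, not part of the original firmware.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants mirroring the OTA_QSPI_* defines above.
constexpr uint32_t kActiveAddr  = 0x90040000u;
constexpr uint32_t kStagingAddr = 0x90140000u;
constexpr uint32_t kMaxSize     = 0x100000u;  // 1MB per region
constexpr uint32_t kEraseBlock  = 0x10000u;   // 64KB erase granularity (assumed)

// True when [a, a + len) and [b, b + len) share at least one byte.
constexpr bool RegionsOverlap(uint32_t a, uint32_t b, uint32_t len) {
    return a < b + len && b < a + len;
}

// True when addr sits on an erase-block boundary.
constexpr bool IsBlockAligned(uint32_t addr, uint32_t block) {
    return addr % block == 0;
}

static_assert(!RegionsOverlap(kActiveAddr, kStagingAddr, kMaxSize),
              "staging must not overlap the active firmware");
static_assert(IsBlockAligned(kActiveAddr, kEraseBlock),
              "active region must start on an erase-block boundary");
static_assert(IsBlockAligned(kStagingAddr, kEraseBlock),
              "staging region must start on an erase-block boundary");
```

With the addresses above, the active region ends exactly where staging begins, so the checks pass with no slack between the two regions.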
The OTA State Machine
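struct OtaState {
    bool     active = false;               // A transfer is in progress
    uint32_t fw_size = 0;                  // Image size announced by the sender
    uint32_t fw_crc_expected = 0;          // CRC32 announced by the sender
    uint32_t bytes_written = 0;            // Bytes written to staging so far
    uint32_t crc_calculated = 0xFFFFFFFF;  // Running CRC32, before the final XOR
    uint16_t total_blocks = 0;             // Number of blocks in the transfer
    uint16_t blocks_received = 0;          // Blocks written so far
};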
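The struct tracks progress with a single `active` flag. One way to make the ordering of the four phases explicit is a small phase enum with a transition check. This is a sketch only: `OtaPhase`, `OtaEvent`, and `NextPhase` are hypothetical names and do not appear in the firmware shown here.

```cpp
#include <cassert>
#include <cstdint>

enum class OtaPhase : uint8_t { Idle, Receiving, Verified, Committing };
enum class OtaEvent : uint8_t { Start, VerifyOk, VerifyFail, Finish };

// Returns the phase that follows `current` when `event` occurs.
// An event that is not valid in the current phase leaves the phase unchanged.
inline OtaPhase NextPhase(OtaPhase current, OtaEvent event) {
    switch (event) {
        case OtaEvent::Start:
            // A new transfer may begin at any time except during the copy.
            return current == OtaPhase::Committing ? current
                                                   : OtaPhase::Receiving;
        case OtaEvent::VerifyOk:
            return current == OtaPhase::Receiving ? OtaPhase::Verified
                                                  : current;
        case OtaEvent::VerifyFail:
            return current == OtaPhase::Receiving ? OtaPhase::Idle : current;
        case OtaEvent::Finish:
            // The copy is only reachable from a verified image.
            return current == OtaPhase::Verified ? OtaPhase::Committing
                                                 : current;
    }
    return current;
}
```

The point of the design is that `Finish` has no effect unless verification has succeeded, so a stray or out-of-order command cannot start the dangerous phase.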
Phase 1: Erase Staging Area
When OTA starts, we erase only the staging area:
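void OtaStart(uint8_t* data, uint16_t len) {
    // Header: 4-byte size, then 4-byte CRC32, both little-endian
    if (len < 8) {
        SendNack(0x02); // Header too short to carry a valid size
        return;
    }

    // Cast each byte before shifting: a uint8_t promotes to int, and
    // shifting a value >= 0x80 left by 24 overflows a signed int
    ota.fw_size = (uint32_t)data[0] | ((uint32_t)data[1] << 8)
                | ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
    ota.fw_crc_expected = (uint32_t)data[4] | ((uint32_t)data[5] << 8)
                        | ((uint32_t)data[6] << 16) | ((uint32_t)data[7] << 24);

    // Validate size
    if (ota.fw_size == 0 || ota.fw_size > OTA_QSPI_MAX_SIZE) {
        SendNack(0x02); // Empty or too large
        return;
    }

    // Switch to write mode and erase staging
    hw.seed.qspi.DeInit();
    QSPIHandle::Config qcfg = hw.seed.qspi_config;
    qcfg.mode = QSPIHandle::Config::Mode::INDIRECT_POLLING;
    hw.seed.qspi.Init(qcfg);

    // Erase staging area (uses 64KB block erase for speed)
    hw.seed.qspi.Erase(OTA_QSPI_STAGING_ADDR,
                       OTA_QSPI_STAGING_ADDR + ota.fw_size);

    // Reset all progress counters so a retried OTA starts clean
    ota.active = true;
    ota.bytes_written = 0;
    ota.blocks_received = 0;
    ota.crc_calculated = 0xFFFFFFFF;

    // ACK with block info
    SendOtaStartAck();
}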
Power failure here? No problem - active firmware is untouched.
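The header handling can be isolated into a small function that is easy to test off-target. The sketch below assumes the 8-byte layout used by `OtaStart` (little-endian size followed by little-endian CRC32); `ReadLe32`, `OtaHeader`, and `ParseOtaHeader` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a 32-bit little-endian value. Casting each byte to uint32_t first
// avoids shifting a promoted (signed) int into its sign bit.
inline uint32_t ReadLe32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0])
         | (static_cast<uint32_t>(p[1]) << 8)
         | (static_cast<uint32_t>(p[2]) << 16)
         | (static_cast<uint32_t>(p[3]) << 24);
}

struct OtaHeader {
    uint32_t fw_size;
    uint32_t fw_crc;
};

// Returns true and fills `out` only if the packet holds a complete header
// and the announced size is in the range (0, max_size].
inline bool ParseOtaHeader(const uint8_t* data, size_t len,
                           uint32_t max_size, OtaHeader* out) {
    if (data == nullptr || out == nullptr || len < 8) {
        return false;
    }
    const uint32_t size = ReadLe32(data);
    if (size == 0 || size > max_size) {
        return false;
    }
    out->fw_size = size;
    out->fw_crc  = ReadLe32(data + 4);
    return true;
}
```

Keeping the parser free of hardware calls means the rejection paths (short packet, zero size, oversize image) can be exercised on a desktop build before any flash is erased.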
Phase 2: Receive and Write Blocks
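Data arrives in 4KB blocks, each prefixed with a 2-byte little-endian block number. Because the CRC is accumulated as blocks arrive, each block must be accepted exactly once and in order:
void OtaData(uint8_t* data, uint16_t len) {
    // On rejection, NACK with the block number we expect next
    // (illustrative payload; use whatever your protocol defines)
    uint16_t expected = ota.blocks_received;
    if (!ota.active || len <= 2) {
        SendNackData((uint8_t*)&expected, 2);
        return;
    }
    uint16_t block_num = (uint16_t)(data[0] | (data[1] << 8));
    uint8_t* block_data = &data[2];
    uint16_t block_size = len - 2;
    // A repeated or out-of-order block would corrupt the running CRC, and
    // an oversized one would write past the declared image
    if (block_num != expected || block_size > OTA_BLOCK_SIZE
        || block_size > ota.fw_size - ota.bytes_written) {
        SendNackData((uint8_t*)&expected, 2);
        return;
    }
    // Write to staging area
    uint32_t addr = OTA_QSPI_STAGING_ADDR + ((uint32_t)block_num * OTA_BLOCK_SIZE);
    hw.seed.qspi.Write(addr, block_size, block_data);
    // Update running CRC
    for (uint16_t i = 0; i < block_size; i++) {
        ota.crc_calculated = crc32_update(ota.crc_calculated, block_data[i]);
    }
    ota.bytes_written += block_size;
    ota.blocks_received++;
    SendAckData((uint8_t*)&block_num, 2);
}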
Power failure here? Staging has partial data, but active is untouched. Just retry OTA.
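Fixed 4KB blocks mean the receiver can predict both how many blocks to expect and how large each one should be, with only the last block allowed to be short. A minimal sketch of that arithmetic follows; the names are illustrative, and it assumes image sizes well below 4GB so the rounding cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockSize = 4096;  // 4KB transfer block, as in the text

// Number of blocks needed for an image (the last block may be short).
constexpr uint32_t TotalBlocks(uint32_t fw_size) {
    return (fw_size + kBlockSize - 1) / kBlockSize;
}

// Payload size the receiver should expect for block `n`,
// or 0 if `n` is beyond the end of the image.
constexpr uint32_t ExpectedBlockSize(uint32_t fw_size, uint32_t n) {
    return n >= TotalBlocks(fw_size) ? 0
         : (n + 1 == TotalBlocks(fw_size) && fw_size % kBlockSize != 0)
               ? fw_size % kBlockSize
               : kBlockSize;
}
```

For a 300KB image this gives 75 full blocks; one extra byte would add a 76th block carrying a single byte.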
Phase 3: Verify CRC
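Before committing, we check that the whole image arrived and that the CRC accumulated during the transfer matches the expected value. This validates the data as received; it does not read the image back from flash:
void OtaVerify() {
    // Finalize CRC calculation
    uint32_t final_crc = ota.crc_calculated ^ 0xFFFFFFFF;
    // Require a complete image as well as a matching CRC
    if (ota.active && ota.bytes_written == ota.fw_size
        && final_crc == ota.fw_crc_expected) {
        // CRC matches - send success with calculated CRC
        SendAckData((uint8_t*)&final_crc, 4);
    } else {
        // Incomplete or CRC mismatch - send failure with calculated CRC for debugging
        SendNackData((uint8_t*)&final_crc, 4);
    }
}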
CRC mismatch? Discard staging, retry OTA. Active firmware untouched.
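The check above covers the bytes as they were received. A stricter variant recomputes the CRC from the staged image itself, which would also catch a flash write that silently failed. The sketch below assumes the staging region can be read through a plain pointer (for example via memory-mapped QSPI); `VerifyResult` and `VerifyStagedImage` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class VerifyResult { Ok, Incomplete, CrcMismatch };

// Bitwise CRC-32 step (reflected polynomial 0xEDB88320), one byte at a time.
inline uint32_t Crc32Update(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

// Re-reads the staged image and compares its CRC with the expected value.
// `staged` is assumed to point at readable memory holding the image.
inline VerifyResult VerifyStagedImage(const uint8_t* staged,
                                      uint32_t fw_size,
                                      uint32_t bytes_written,
                                      uint32_t expected_crc) {
    if (staged == nullptr || fw_size == 0 || bytes_written != fw_size) {
        return VerifyResult::Incomplete;
    }
    uint32_t crc = 0xFFFFFFFFu;
    for (uint32_t i = 0; i < fw_size; i++) {
        crc = Crc32Update(crc, staged[i]);
    }
    return (crc ^ 0xFFFFFFFFu) == expected_crc ? VerifyResult::Ok
                                               : VerifyResult::CrcMismatch;
}
```

The cost is one extra pass over the image, which happens before the critical window opens and therefore does not lengthen it.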
Phase 4: The Critical Copy
This is the only dangerous phase - copying from staging to active:
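// Needs <algorithm> for std::min and <cstring> for memcpy
void OtaFinish() {
    // Refuse to touch the active region unless the staged image is
    // complete and its CRC matches
    uint32_t final_crc = ota.crc_calculated ^ 0xFFFFFFFF;
    if (!ota.active || ota.bytes_written != ota.fw_size
        || final_crc != ota.fw_crc_expected) {
        SendNackData((uint8_t*)&final_crc, 4);
        return;
    }

    // Static buffer: a failed heap allocation after the erase below
    // would leave the device without firmware
    constexpr uint32_t CHUNK_SIZE = 32 * 1024;
    static uint8_t sram_buf[CHUNK_SIZE];

    // Erase active region
    hw.seed.qspi.Erase(OTA_QSPI_ACTIVE_ADDR,
                       OTA_QSPI_ACTIVE_ADDR + ota.fw_size);

    // Copy from staging to active in 32KB chunks
    uint32_t bytes_copied = 0;
    while (bytes_copied < ota.fw_size) {
        uint32_t chunk_size = std::min(CHUNK_SIZE, ota.fw_size - bytes_copied);

        // Read from staging (memory-mapped read; this assumes the QSPI
        // is readable through its memory-mapped address at this point)
        memcpy(sram_buf,
               (const uint8_t*)(OTA_QSPI_STAGING_ADDR + bytes_copied),
               chunk_size);

        // Write to active
        hw.seed.qspi.Write(OTA_QSPI_ACTIVE_ADDR + bytes_copied,
                           chunk_size, sram_buf);
        bytes_copied += chunk_size;
    }

    // ACK and reboot. Check which bootloader this call enters in your
    // libDaisy version: with no argument it may select the STM32 system
    // DFU bootloader rather than the Daisy bootloader.
    SendAck();
    System::Delay(100);
    System::ResetToBootloader();
}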
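The copy loop can also confirm each chunk after writing it, so a failed write is reported instead of being discovered at the next boot. This is a host-testable sketch, not the firmware's code: `FlashWriteFn` is a hypothetical hook standing in for the QSPI driver's write call, and both regions are assumed to be readable through plain pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical write hook: writes `len` bytes at `offset` into the active
// region and returns false on a driver error.
using FlashWriteFn = bool (*)(uint32_t offset, const uint8_t* data,
                              uint32_t len);

constexpr uint32_t kChunkSize = 32 * 1024;
static uint8_t g_copy_buf[kChunkSize];  // static, so no allocation mid-update

// Copies fw_size bytes from `staging` to the active region chunk by chunk,
// reading each chunk back through `active` and comparing it with the source.
// Returns false as soon as a write or a comparison fails.
inline bool CopyAndCompare(const uint8_t* staging, const uint8_t* active,
                           uint32_t fw_size, FlashWriteFn write) {
    uint32_t done = 0;
    while (done < fw_size) {
        const uint32_t n = std::min(kChunkSize, fw_size - done);
        std::memcpy(g_copy_buf, staging + done, n);
        if (!write(done, g_copy_buf, n)) {
            return false;
        }
        if (std::memcmp(active + done, g_copy_buf, n) != 0) {
            return false;
        }
        done += n;
    }
    return true;
}
```

A failure here cannot undo the erase, but it lets the device report the problem and retry the copy from the still-intact staging image instead of rebooting into a damaged one.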
Critical window: about 4.6 seconds for a 300KB firmware (1.6 s to erase the active region plus 3 s to copy).
Minimizing the Critical Window
We optimized this phase heavily:
| Optimization | Before | After |
|---|---|---|
| Erase in 64KB blocks (vs 4KB sectors) | 24 s | 1.6 s |
| Write in 32KB chunks (vs 256B pages) | 62 s | 3 s |
| Total critical window | 86 s | ~4.6 s |
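The gain from larger units follows from how many separate operations each strategy issues. Here is a small worked example for a 300KB image, using only ceiling division (`OpsNeeded` and `kImage` are illustrative names):

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size operations needed to cover `bytes` (ceiling division).
constexpr uint32_t OpsNeeded(uint32_t bytes, uint32_t unit) {
    return (bytes + unit - 1) / unit;
}

constexpr uint32_t kImage = 300 * 1024;  // 300KB firmware image

static_assert(OpsNeeded(kImage, 4 * 1024) == 75, "4KB sector erases");
static_assert(OpsNeeded(kImage, 64 * 1024) == 5, "64KB block erases");
static_assert(OpsNeeded(kImage, 256) == 1200, "256B page-sized writes");
static_assert(OpsNeeded(kImage, 32 * 1024) == 10, "32KB chunked writes");
```

If the erase timings in the table refer to an image of this size, 24 s over 75 sector erases and 1.6 s over 5 block erases both work out to about 0.32 s per erase command, so the saving comes from issuing fewer commands rather than from each command being faster. Note that five 64KB blocks cover 320KB, slightly more than the image itself.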
Recovery: The Bootloader
If the worst happens (power loss during copy), the Daisy bootloader provides recovery:
// Built-in Daisy bootloader behavior:
// 1. Wait 2 seconds for USB DFU connection
// 2. If DFU detected, enter programming mode
// 3. Otherwise, jump to QSPI firmware
// Recovery procedure:
// 1. Power on while holding BOOT button
// 2. Use Daisy Web Programmer or dfu-util
// 3. Flash firmware via USB
CRC32 Implementation
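We use the standard CRC-32 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF), the same variant used by zlib and Ethernet, computed one bit at a time:
// Feed one byte into a running CRC (no initial or final inversion here)
uint32_t crc32_update(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int j = 0; j < 8; j++) {
        // -(crc & 1) is all ones when the low bit is set, zero otherwise
        crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

// One-shot CRC over a complete buffer
uint32_t crc32(const uint8_t* data, uint32_t len) {
    uint32_t crc = 0xFFFFFFFF;
    for (uint32_t i = 0; i < len; i++) {
        crc = crc32_update(crc, data[i]);
    }
    return ~crc;
}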
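The bitwise loop performs eight shift-and-XOR steps per byte. A common alternative trades 1KB of memory for a 256-entry lookup table and handles each byte in a single step. The sketch below is not the firmware's implementation; `Crc32Table` and `crc32_update_table` are illustrative names, and the table is built at first use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 256-entry lookup table for the reflected polynomial 0xEDB88320.
struct Crc32Table {
    uint32_t entries[256];
    Crc32Table() {
        for (uint32_t n = 0; n < 256; n++) {
            uint32_t c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            }
            entries[n] = c;
        }
    }
};

// Feeds `len` bytes into a running CRC. As with the bitwise version, the
// caller starts from 0xFFFFFFFF and XORs the result with 0xFFFFFFFF at the end.
inline uint32_t crc32_update_table(uint32_t crc, const uint8_t* data,
                                   size_t len) {
    static const Crc32Table table;  // built once, on first call
    for (size_t i = 0; i < len; i++) {
        crc = table.entries[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc;
}
```

Because the running value is passed in and out, the function can be called once per received block, and splitting the input at any point gives the same result as a single pass.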
Failure Mode Analysis
| Failure Point | Consequence | Recovery |
|---|---|---|
| During staging erase | Staging partially erased, active untouched | Retry OTA |
| During staging write | Partial firmware in staging, active untouched | Retry OTA |
| CRC mismatch | Detected before commit, staging discarded | Retry OTA |
| During copy (rare) | Active region corrupted | USB DFU recovery |
| After copy, before reboot | New firmware already in place | Boots normally |
Key Takeaways
- Staging area: never write incoming data directly to the active firmware
- Verify before commit: check the CRC of the complete image before touching the active region
- Minimize the critical window: optimize the copy phase aggressively
- Hardware recovery: a bootloader with USB DFU is the last resort
- Streaming CRC: calculate the CRC as data arrives, not in a separate pass afterwards