Building a Safe OTA System for Audio Hardware

The Stakes

A bricked guitar pedal mid-gig is unacceptable. Our OTA system must be:

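| Requirement | Why |
| --- | --- |
| Atomic | A partial update must never corrupt the running firmware |
| Recoverable | Power loss at any point must leave a way back to working firmware |
| Verifiable | Corruption is detected before the update is applied |
| Fast | The danger window is kept as short as possible |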

The Staging Area Approach

flowchart TB
    subgraph QSPI["QSPI Flash (8MB)"]
        ACTIVE["Active Region<br/>0x90040000<br/>1MB"]
        STAGING["Staging Region<br/>0x90140000<br/>1MB"]
        DATA["Data Region<br/>0x90240000<br/>6MB"]
    end
    subgraph FLOW["OTA Flow"]
        F1[1. Receive firmware]
        F2[2. Write to staging]
        F3[3. Verify CRC]
        F4[4. Copy to active]
        F5[5. Reboot]
    end
    F1 --> F2
    F2 --> STAGING
    F3 --> STAGING
    F4 --> ACTIVE
    F5 --> ACTIVE

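The key insight: never write to the active region while firmware is being received. The active region is touched only once the complete image sits in staging and has passed verification.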

Memory Layout

#define OTA_QSPI_ACTIVE_ADDR  0x90040000  // Bootloader loads from here
#define OTA_QSPI_STAGING_ADDR 0x90140000  // OTA writes here first
#define OTA_QSPI_MAX_SIZE     0x100000    // 1MB max firmware
#define OTA_BLOCK_SIZE        4096        // 4KB per data block (used by OtaData)
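
A layout like this can be checked at compile time. The sketch below restates the three constants and adds a few assumptions that are not in the original code: the QSPI mapping base of 0x90000000, the 8MB device size and the data-region start (both taken from the diagram), and a 64KB erase-block size. All `k`-prefixed names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Addresses and size from the memory layout above.
constexpr uint32_t kActiveAddr  = 0x90040000u;
constexpr uint32_t kStagingAddr = 0x90140000u;
constexpr uint32_t kMaxFwSize   = 0x100000u;  // 1MB per firmware region

// Assumptions for illustration (not in the original code).
constexpr uint32_t kQspiBase   = 0x90000000u;         // memory-mapped QSPI base
constexpr uint32_t kQspiSize   = 8u * 1024u * 1024u;  // 8MB device, per the diagram
constexpr uint32_t kDataAddr   = 0x90240000u;         // data region, per the diagram
constexpr uint32_t kEraseBlock = 64u * 1024u;         // 64KB block erase

constexpr uint32_t kQspiEnd = kQspiBase + kQspiSize;

// Regions must not overlap and must sit inside the device.
static_assert(kActiveAddr + kMaxFwSize <= kStagingAddr, "active overlaps staging");
static_assert(kStagingAddr + kMaxFwSize <= kDataAddr, "staging overlaps data");
static_assert(kDataAddr < kQspiEnd, "data region starts past the end of flash");

// Block erases only stay inside a region if the region starts on a block boundary.
static_assert(kActiveAddr % kEraseBlock == 0, "active not erase-block aligned");
static_assert(kStagingAddr % kEraseBlock == 0, "staging not erase-block aligned");

// Space left for the data region if it runs to the end of the device.
constexpr uint32_t kDataSize = kQspiEnd - kDataAddr;
```

Under these assumptions the space left for the data region is 0x5C0000 bytes (5.75MB).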

The OTA State Machine

struct OtaState {
    bool active = false;
    uint32_t fw_size = 0;
    uint32_t fw_crc_expected = 0;
    uint32_t bytes_written = 0;
    uint32_t crc_calculated = 0xFFFFFFFF;
    uint16_t total_blocks = 0;
    uint16_t blocks_received = 0;
};
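
The struct above tracks progress but only has a single `active` flag, so nothing in it records whether verification has passed. One way to make command ordering explicit is a small phase enum with a guard function. This is a sketch only: `OtaPhase`, `OtaCommand` and `CommandAllowed` are hypothetical names, not part of the original firmware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical phases of an OTA session.
enum class OtaPhase : uint8_t { Idle, Receiving, Verified };

// Hypothetical commands matching the four handlers in this article.
enum class OtaCommand : uint8_t { Start, Data, Verify, Finish };

// Returns true if `cmd` is acceptable while the session is in `phase`.
inline bool CommandAllowed(OtaPhase phase, OtaCommand cmd) {
    switch (cmd) {
        case OtaCommand::Start:
            return true;  // a fresh start is always allowed; it re-erases staging
        case OtaCommand::Data:
        case OtaCommand::Verify:
            return phase == OtaPhase::Receiving;
        case OtaCommand::Finish:
            return phase == OtaPhase::Verified;  // never copy an unverified image
    }
    return false;
}
```

The point of the guard is that the copy to the active region can only be requested after a successful verify, regardless of what the host sends.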

Phase 1: Erase Staging Area

When OTA starts, we erase only the staging area:

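void OtaStart(uint8_t* data, uint16_t len) {
    // Payload: 4-byte firmware size, then 4-byte CRC32, both little-endian
    if (len < 8) {
        SendNack(0x01);  // Malformed request (this error code is an assumption)
        return;
    }

    // Cast before shifting so the top byte never shifts into a signed int's sign bit
    ota.fw_size = (uint32_t)data[0] | ((uint32_t)data[1] << 8)
                | ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
    ota.fw_crc_expected = (uint32_t)data[4] | ((uint32_t)data[5] << 8)
                        | ((uint32_t)data[6] << 16) | ((uint32_t)data[7] << 24);

    // Validate size
    if (ota.fw_size == 0 || ota.fw_size > OTA_QSPI_MAX_SIZE) {
        SendNack(0x02);  // Empty or too large
        return;
    }

    // Switch to write mode and erase staging
    hw.seed.qspi.DeInit();
    QSPIHandle::Config qcfg = hw.seed.qspi_config;
    qcfg.mode = QSPIHandle::Config::Mode::INDIRECT_POLLING;
    hw.seed.qspi.Init(qcfg);

    // Erase staging area (uses 64KB block erase for speed)
    hw.seed.qspi.Erase(OTA_QSPI_STAGING_ADDR,
                       OTA_QSPI_STAGING_ADDR + ota.fw_size);

    // Reset all progress counters for the new session
    ota.active = true;
    ota.bytes_written = 0;
    ota.blocks_received = 0;
    ota.total_blocks = (ota.fw_size + OTA_BLOCK_SIZE - 1) / OTA_BLOCK_SIZE;
    ota.crc_calculated = 0xFFFFFFFF;

    // ACK with block info
    SendOtaStartAck();
}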

Power failure here? No problem - active firmware is untouched.
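
The header parsing at the top of `OtaStart` can be pulled out into a small function that is easy to test off-target. The sketch below assumes the same 8-byte little-endian payload (size, then CRC); `OtaStartRequest`, `ReadLe32` and `ParseOtaStart` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parsed form of the OTA start payload.
struct OtaStartRequest {
    uint32_t fw_size;
    uint32_t fw_crc;
};

// Reads a little-endian 32-bit value. Each byte is widened before shifting
// so the top byte cannot shift into the sign bit of a promoted int.
inline uint32_t ReadLe32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[3]) << 24);
}

// Returns false if the payload is too short or the size is zero or
// larger than `max_size`; `out` is only written on success.
inline bool ParseOtaStart(const uint8_t* data, size_t len, uint32_t max_size,
                          OtaStartRequest* out) {
    if (data == nullptr || out == nullptr || len < 8) {
        return false;
    }
    const uint32_t size = ReadLe32(data);
    if (size == 0 || size > max_size) {
        return false;
    }
    out->fw_size = size;
    out->fw_crc  = ReadLe32(data + 4);
    return true;
}
```

For example, the bytes `00 B0 04 00 26 39 F4 CB` describe a 0x0004B000-byte (300KB) image with an expected CRC of 0xCBF43926.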

Phase 2: Receive and Write Blocks

Data arrives in 4KB blocks:

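void OtaData(uint8_t* data, uint16_t len) {
    // Ignore data outside an OTA session or without a payload
    if (!ota.active || len <= 2) {
        SendNack(0x01);  // Malformed request (this error code is an assumption)
        return;
    }

    uint16_t block_num = data[0] | (data[1] << 8);
    uint8_t* block_data = &data[2];
    uint16_t block_size = len - 2;

    // Reject blocks that would land outside the erased staging range
    uint32_t offset = (uint32_t)block_num * OTA_BLOCK_SIZE;
    if (block_size > OTA_BLOCK_SIZE || offset + block_size > ota.fw_size) {
        SendNack(0x02);  // Too large
        return;
    }

    // Write to staging area
    uint32_t addr = OTA_QSPI_STAGING_ADDR + offset;
    hw.seed.qspi.Write(addr, block_size, block_data);

    // Update running CRC (assumes blocks arrive in order, exactly once)
    for (uint16_t i = 0; i < block_size; i++) {
        ota.crc_calculated = crc32_update(ota.crc_calculated, block_data[i]);
    }

    ota.bytes_written += block_size;
    ota.blocks_received++;

    // Echo the block number back (raw bytes, little-endian on this target)
    SendAckData((uint8_t*)&block_num, 2);
}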

Power failure here? Staging has partial data, but active is untouched. Just retry OTA.
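
Because each block's flash address is derived from a host-supplied block number, it is worth checking that the block falls inside the declared image before writing. A minimal sketch, assuming 4KB blocks where every block except the last is full; `kBlockSize` and `BlockIsValid` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockSize = 4096;  // 4KB blocks, as stated above

// Returns true if a block with this number and payload size fits the
// declared firmware size: all blocks are full except possibly the last.
inline bool BlockIsValid(uint16_t block_num, uint16_t block_size,
                         uint32_t fw_size) {
    if (block_size == 0 || block_size > kBlockSize) {
        return false;
    }
    // Widen before multiplying; 65535 * 4096 still fits in 32 bits.
    const uint32_t offset = static_cast<uint32_t>(block_num) * kBlockSize;
    if (offset >= fw_size) {
        return false;  // block starts past the end of the image
    }
    const uint32_t remaining = fw_size - offset;
    const uint32_t expected  = remaining < kBlockSize ? remaining : kBlockSize;
    return block_size == expected;
}
```

A check like this does not catch duplicated or reordered blocks. Since the running CRC depends on byte order, a receiver that wants to tolerate retransmissions would also need to compare the block number against the next one it expects.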

Phase 3: Verify CRC

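Before committing, we check the CRC of the complete image. Note that the running CRC covers the bytes as they were received, not a read-back of the staging flash: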

void OtaVerify() {
    // Finalize CRC calculation
    uint32_t final_crc = ota.crc_calculated ^ 0xFFFFFFFF;

    // Require the full image as well as a matching CRC
    if (ota.bytes_written == ota.fw_size && final_crc == ota.fw_crc_expected) {
        // CRC matches - send success with calculated CRC
        SendAckData((uint8_t*)&final_crc, 4);
    } else {
        // Mismatch - send failure with calculated CRC for debugging
        SendNackData((uint8_t*)&final_crc, 4);
    }
}

CRC mismatch? Discard staging, retry OTA. Active firmware untouched.
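
A streaming CRC confirms what arrived over the link. To also confirm what was actually programmed, the staged bytes can be read back and checked against the same expected value. The sketch below is host-testable and makes no claim about the original firmware: `StagedImageMatches` is a hypothetical helper, and on hardware `staged` would be a pointer into the memory-mapped staging region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF).
inline uint32_t Crc32(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Returns true only if the whole image was written and the bytes now in
// staging hash to the CRC announced at the start of the session.
inline bool StagedImageMatches(const uint8_t* staged, uint32_t fw_size,
                               uint32_t bytes_written, uint32_t expected_crc) {
    if (staged == nullptr || bytes_written != fw_size) {
        return false;
    }
    return Crc32(staged, fw_size) == expected_crc;
}
```

The read-back costs one extra pass over the image, but it happens before the critical window opens, so it does not lengthen the dangerous phase.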

Phase 4: The Critical Copy

This is the only dangerous phase - copying from staging to active:

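void OtaFinish() {
    // Refuse to touch the active region unless the full image arrived and its CRC matches
    if (!ota.active || ota.bytes_written != ota.fw_size ||
        (ota.crc_calculated ^ 0xFFFFFFFF) != ota.fw_crc_expected) {
        SendNack(0x01);  // This error code is an assumption
        return;
    }

    // Erase active region
    hw.seed.qspi.Erase(OTA_QSPI_ACTIVE_ADDR,
                       OTA_QSPI_ACTIVE_ADDR + ota.fw_size);

    // Copy from staging to active in 32KB chunks.
    // A static buffer avoids a heap allocation that could fail mid-update.
    constexpr uint32_t CHUNK_SIZE = 32 * 1024;
    static uint8_t sram_buf[CHUNK_SIZE];
    uint32_t bytes_copied = 0;

    while (bytes_copied < ota.fw_size) {
        uint32_t chunk_size = std::min(CHUNK_SIZE, ota.fw_size - bytes_copied);

        // Read from staging (memory-mapped read).
        // This assumes the QSPI peripheral is in memory-mapped mode here;
        // OtaStart left it in INDIRECT_POLLING, so a real implementation
        // would need to switch modes around the read and the write (not shown).
        memcpy(sram_buf,
               (uint8_t*)(OTA_QSPI_STAGING_ADDR + bytes_copied),
               chunk_size);

        // Write to active
        hw.seed.qspi.Write(OTA_QSPI_ACTIVE_ADDR + bytes_copied,
                           chunk_size, sram_buf);

        bytes_copied += chunk_size;
    }

    // ACK, give the reply time to go out, then reboot
    SendAck();
    System::Delay(100);
    System::ResetToBootloader();
}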

Critical window: ~3 seconds for a 300KB firmware.
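
The copy loop is simple enough to exercise on a host with a stand-in for flash, which also makes it easy to add a compare step after the copy. The sketch below is not the firmware's code: `FakeRegion` is a hypothetical in-memory region, and `CopyAndVerify` shows the chunked copy followed by a full comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a flash region, backed by RAM.
struct FakeRegion {
    std::vector<uint8_t> bytes;
    void Read(uint32_t off, uint8_t* dst, uint32_t n) const {
        std::memcpy(dst, bytes.data() + off, n);
    }
    void Write(uint32_t off, const uint8_t* src, uint32_t n) {
        std::memcpy(bytes.data() + off, src, n);
    }
};

// Copies `size` bytes from staging to active in chunks of at most `chunk`
// bytes, then compares the two regions. Returns true if they are identical.
inline bool CopyAndVerify(const FakeRegion& staging, FakeRegion& active,
                          uint32_t size, uint32_t chunk) {
    if (chunk == 0 || size > staging.bytes.size() || size > active.bytes.size()) {
        return false;
    }
    std::vector<uint8_t> buf(chunk);
    uint32_t done = 0;
    while (done < size) {
        const uint32_t n = std::min(chunk, size - done);  // last chunk may be short
        staging.Read(done, buf.data(), n);
        active.Write(done, buf.data(), n);
        done += n;
    }
    return std::equal(staging.bytes.begin(), staging.bytes.begin() + size,
                      active.bytes.begin());
}
```

On hardware, a failed comparison at this point would be a reason to repeat the copy rather than reboot, since staging still holds a verified image.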

Minimizing the Critical Window

We optimized this phase heavily:

| Optimization | Before | After |
| --- | --- | --- |
| 64KB block erase (vs 4KB sectors) | 24s | 1.6s |
| 32KB write chunks (vs 256B pages) | 62s | 3s |
| Total critical window | 86s | ~3s |
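
One way to see where the savings come from is to count how many erase and write operations each strategy issues for the 300KB image mentioned above. The arithmetic below is only a count of operations, not a timing model, since actual durations depend on the flash part and driver; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Integer division rounded up.
constexpr uint32_t CeilDiv(uint32_t n, uint32_t d) {
    return (n + d - 1) / d;
}

constexpr uint32_t kFwSize = 300u * 1024u;  // 300KB example image

// Erase operations needed to clear the image's footprint.
constexpr uint32_t kSectorErases = CeilDiv(kFwSize, 4u * 1024u);   // 4KB sectors
constexpr uint32_t kBlockErases  = CeilDiv(kFwSize, 64u * 1024u);  // 64KB blocks

// Write calls needed to program the image.
constexpr uint32_t kPageWrites  = CeilDiv(kFwSize, 256u);          // 256B pages
constexpr uint32_t kChunkWrites = CeilDiv(kFwSize, 32u * 1024u);   // 32KB chunks

static_assert(kSectorErases == 75, "75 sector erases");
static_assert(kBlockErases == 5, "5 block erases");
static_assert(kPageWrites == 1200, "1200 page-sized writes");
static_assert(kChunkWrites == 10, "10 chunk-sized writes");
```

For this image size that is 75 erase operations reduced to 5, and 1200 write calls reduced to 10.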

Recovery: The Bootloader

If the worst happens (power loss during copy), the Daisy bootloader provides recovery:

// Built-in Daisy bootloader behavior:
// 1. Wait 2 seconds for USB DFU connection
// 2. If DFU detected, enter programming mode
// 3. Otherwise, jump to QSPI firmware

// Recovery procedure:
// 1. Power on while holding BOOT button
// 2. Use Daisy Web Programmer or dfu-util
// 3. Flash firmware via USB

CRC32 Implementation

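We use the standard CRC-32 polynomial (0x04C11DB7) in its reflected form, 0xEDB88320, with an initial value of 0xFFFFFFFF and a final inversion: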

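// Feeds one byte into a running CRC (no initial or final XOR applied here)
uint32_t crc32_update(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int j = 0; j < 8; j++) {
        // -(crc & 1) is all ones when the low bit is set, zero otherwise,
        // so the polynomial is XORed in only when a 1 is shifted out
        crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return crc;
}

// One-shot CRC32 over a buffer
uint32_t crc32(const uint8_t* data, uint32_t len) {
    uint32_t crc = 0xFFFFFFFF;
    for (uint32_t i = 0; i < len; i++) {
        crc = crc32_update(crc, data[i]);
    }
    return ~crc;
}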
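
If the per-bit loop ever becomes a bottleneck, a common alternative is a table-driven CRC that processes one byte per lookup. This is a different technique from the bitwise version above, shown here as a sketch; `MakeCrcTable` and `Crc32UpdateTable` are hypothetical names. It uses the same polynomial, so it should produce identical results, and the standard check value for the ASCII string "123456789" is 0xCBF43926.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Builds the 256-entry lookup table for the reflected polynomial 0xEDB88320.
inline std::array<uint32_t, 256> MakeCrcTable() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++) {
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Feeds `len` bytes into a running CRC. As with the bitwise version, the
// caller starts from 0xFFFFFFFF and XORs with 0xFFFFFFFF at the end.
inline uint32_t Crc32UpdateTable(uint32_t crc, const uint8_t* data, size_t len) {
    static const std::array<uint32_t, 256> table = MakeCrcTable();
    for (size_t i = 0; i < len; i++) {
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc;
}
```

The table costs 1KB of memory, which is the trade-off against the smaller but slower bitwise loop.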

Failure Mode Analysis

| Failure Point | Consequence | Recovery |
| --- | --- | --- |
| During erase staging | Staging corrupted | Retry OTA |
| During write staging | Partial firmware in staging | Retry OTA |
| CRC mismatch | Detected, staging discarded | Retry OTA |
| During copy (rare) | Active corrupted | USB DFU recovery |
| After copy, before reboot | New firmware active | Boot normally |

Key Takeaways

  1. Staging area - Never write directly to active firmware
  2. Verify before commit - CRC check the entire staged image
  3. Minimize critical window - Optimize the copy phase aggressively
  4. Hardware recovery - Bootloader with USB DFU is the last resort
  5. Streaming CRC - Calculate CRC as data arrives, not after