Building Video-Audio Sync in Flutter: A Deep Dive into Native Platform Integration
When recording video with a phone camera while simultaneously capturing high-quality audio from an external device, you face a fundamental challenge: the two recordings are completely independent. The phone's video has its own audio track (from the phone's microphone), while your external audio device (in our case, the Hoopi recorder) captures pristine stereo audio to its SD card.
To create a polished final video, you need to:
- Detect the time offset between the two recordings
- Replace or blend the phone's audio with the high-quality recording
- Handle platform differences between iOS and Android
- Maintain perfect sync throughout the video duration
This post walks through how we solved these challenges in the Hoopi companion app.
Architecture Overview
Our solution uses Flutter for the UI layer, with native implementations (Swift on iOS, Java on Android) handling the heavy audio processing.
Why Native Code?
Flutter's audio/video packages are great for playback, but when you need to:
- Extract raw PCM samples for analysis
- Perform sample-accurate mixing
- Mux audio and video tracks together
- Convert between audio formats (WAV → AAC)
...you need native platform APIs. There's no pure-Dart way to do sample-level audio manipulation with the performance required for real-time feedback.
The Sync Workflow
At a high level, the phone records video (with its own audio track) while Hoopi captures high-quality audio to its SD card; the app then extracts the video's audio, detects the time offset with cross-correlation, optionally extracts a single Hoopi channel or blends in the phone audio, and finally muxes the chosen audio back into the video.
Key Architecture Decisions
1. Cross-Correlation for Offset Detection
The most reliable way to automatically detect the time offset between two audio recordings is normalized cross-correlation. Both recordings capture the same acoustic event (the performance), just from different microphones.
Why 8kHz downsampling?
- Cross-correlation over a sliding offset is O(n²); scanning 5 seconds of audio at 44.1kHz would be slow
- Timing information is preserved at lower sample rates
- 8kHz means roughly 5.5× fewer samples to process, while one sample still spans only 0.125 ms, so the estimate stays sub-millisecond accurate (a sketch of the decimation step follows)
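As a rough illustration of that decimation step, here is a Dart sketch; the production version lives in the native code, and this is only meant to show the idea:
import 'dart:math' as math;

// Downsample by averaging fixed-size blocks (e.g. 44100 Hz → 8000 Hz).
// Block averaging also acts as a crude low-pass filter before decimation.
List<double> downsample(List<double> samples, int srcRate, int dstRate) {
  final factor = srcRate / dstRate; // ≈ 5.5 for 44.1kHz → 8kHz
  final outLength = (samples.length / factor).floor();
  final out = List<double>.filled(outLength, 0.0);
  for (var i = 0; i < outLength; i++) {
    final start = (i * factor).floor();
    final end = math.min(((i + 1) * factor).floor(), samples.length);
    var sum = 0.0;
    for (var j = start; j < end; j++) {
      sum += samples[j];
    }
    out[i] = end > start ? sum / (end - start) : 0.0;
  }
  return out;
}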
The correlation at a single candidate offset looks like this:
// Android implementation
private double computeCorrelation(float[] video, float[] hoopi, int offset) {
    // Overlapping region of the two signals at this candidate offset
    int start = Math.max(0, offset);
    int end = Math.min(video.length, hoopi.length + offset);
    double sum = 0, sumV2 = 0, sumH2 = 0;
    for (int i = start; i < end; i++) {
        int hoopiIdx = i - offset;
        if (hoopiIdx >= 0 && hoopiIdx < hoopi.length) {
            sum += video[i] * hoopi[hoopiIdx];
            sumV2 += video[i] * video[i];
            sumH2 += hoopi[hoopiIdx] * hoopi[hoopiIdx];
        }
    }
    // Normalized result in [-1, 1]; 0 if either window is silent
    double denom = Math.sqrt(sumV2 * sumH2);
    return denom > 0 ? sum / denom : 0;
}
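The native code evaluates that correlation across a range of candidate offsets and keeps the best score. For readability, here is the same search expressed as a Dart sketch; the shipping implementations are the native Swift and Java versions, and the search range here is illustrative:
import 'dart:math' as math;

// Mirror of the native correlation, shown in Dart for readability.
double correlation(List<double> video, List<double> hoopi, int offset) {
  final start = math.max(0, offset);
  final end = math.min(video.length, hoopi.length + offset);
  var sum = 0.0, sumV2 = 0.0, sumH2 = 0.0;
  for (var i = start; i < end; i++) {
    final hoopiIdx = i - offset;
    if (hoopiIdx >= 0 && hoopiIdx < hoopi.length) {
      sum += video[i] * hoopi[hoopiIdx];
      sumV2 += video[i] * video[i];
      sumH2 += hoopi[hoopiIdx] * hoopi[hoopiIdx];
    }
  }
  final denom = math.sqrt(sumV2 * sumH2);
  return denom > 0 ? sum / denom : 0.0;
}

// Scan candidate offsets at the downsampled rate and convert the winner to ms.
int findBestOffsetMs(List<double> video, List<double> hoopi,
    {int sampleRate = 8000, int maxOffsetSeconds = 5}) {
  final maxOffset = sampleRate * maxOffsetSeconds;
  var bestOffset = 0;
  var bestScore = double.negativeInfinity;
  for (var offset = -maxOffset; offset <= maxOffset; offset++) {
    final score = correlation(video, hoopi, offset);
    if (score > bestScore) {
      bestScore = score;
      bestOffset = offset;
    }
  }
  return (bestOffset * 1000 / sampleRate).round();
}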
2. Channel Mixing Mode
For musicians, we offer a powerful feature: phone mic on the left channel, Hoopi on the right. This lets you capture room ambience with the phone while keeping the direct instrument signal from Hoopi.
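One way to picture the result is as a single interleaved stereo stream built from the two mono sources. A simplified Dart sketch of the idea (the function name and the 16-bit assumption are ours):
import 'dart:math' as math;
import 'dart:typed_data';

// Build a stereo buffer with the phone mic on the left channel and
// the Hoopi recording on the right channel.
Int16List interleaveLeftRight(Int16List phoneMic, Int16List hoopi) {
  final frames = math.min(phoneMic.length, hoopi.length);
  final stereo = Int16List(frames * 2);
  for (var i = 0; i < frames; i++) {
    stereo[2 * i] = phoneMic[i];   // left channel: room ambience
    stereo[2 * i + 1] = hoopi[i];  // right channel: direct signal
  }
  return stereo;
}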
3. Method Channel Bridge
All native communication goes through a single method channel:
// Flutter side
import 'package:flutter/services.dart';

class VideoRecordingService {
  static const _channel = MethodChannel('video_audio_sync');

  Future<int?> detectAudioOffset(String videoPath, String audioPath) async {
    return await _channel.invokeMethod('detectOffset', {
      'videoPath': videoPath,
      'audioPath': audioPath,
    });
  }

  Future<bool> replaceAudioInVideo({
    required String videoPath,
    required String audioPath,
    required String outputPath,
    required int offsetMs,
    required int blendPercentage,
  }) async {
    return await _channel.invokeMethod('replaceAudio', {
      'videoPath': videoPath,
      'audioPath': audioPath,
      'outputPath': outputPath,
      'offsetMs': offsetMs,
      'blendPercentage': blendPercentage,
    });
  }
}
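On the Dart side, invokeMethod throws a PlatformException when the native handler reports an error, so callers typically wrap the calls. A sketch of the calling pattern (the wrapper name is illustrative):
import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';

Future<int?> safeDetectOffset(
    VideoRecordingService service, String videoPath, String audioPath) async {
  try {
    return await service.detectAudioOffset(videoPath, audioPath);
  } on PlatformException catch (e) {
    // If native detection fails, fall back to 0 and let the user adjust manually
    debugPrint('Offset detection failed: ${e.message}');
    return null;
  }
}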
Platform Differences: The Challenges
Challenge 1: Audio Format Handling
iOS can work directly with WAV files in AVMutableComposition:
// iOS: Direct WAV support
let audioAsset = AVAsset(url: audioURL) // Works with WAV
let audioTrack = composition.addMutableTrack(
    withMediaType: .audio,
    preferredTrackID: kCMPersistentTrackID_Invalid
)
try audioTrack?.insertTimeRange(timeRange, of: sourceTrack, at: insertionTime)
Android requires explicit WAV → AAC conversion before muxing:
// Android: Must convert WAV to AAC for MP4 container
private String convertWavToAac(String wavPath, long offsetUs) throws IOException {
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(wavPath);

    // sampleRate / channelCount come from the source WAV's track format
    MediaFormat outputFormat = MediaFormat.createAudioFormat(
            MediaFormat.MIMETYPE_AUDIO_AAC,
            sampleRate,
            channelCount
    );
    outputFormat.setInteger(MediaFormat.KEY_BIT_RATE, 384000);
    outputFormat.setInteger(MediaFormat.KEY_AAC_PROFILE,
            MediaCodecInfo.CodecProfileLevel.AACObjectLC);

    MediaCodec encoder = MediaCodec.createEncoderByType(
            MediaFormat.MIMETYPE_AUDIO_AAC
    );
    encoder.configure(outputFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
    // ... encoding loop with offset application
}
Challenge 2: Offset Application
Applying a time offset sounds simple, but the implementation differs significantly between the two platforms:
iOS uses time-based insertion points:
// iOS: Positive offset = delay audio insertion
if offsetMs > 0 {
    let offsetTime = CMTimeMake(value: Int64(offsetMs), timescale: 1000)
    try audioTrack.insertTimeRange(
        CMTimeRangeMake(start: .zero, duration: audioDuration),
        of: sourceAudioTrack,
        at: offsetTime // Insert at offset, not at zero
    )
} else {
    // Negative offset = skip beginning of audio
    let skipTime = CMTimeMake(value: Int64(abs(offsetMs)), timescale: 1000)
    try audioTrack.insertTimeRange(
        CMTimeRangeMake(start: skipTime, duration: adjustedDuration),
        of: sourceAudioTrack,
        at: .zero
    )
}
Android manipulates presentation timestamps during muxing:
// Android: Modify timestamps during mux
private void muxTrackWithOffset(MediaExtractor extractor, MediaMuxer muxer,
                                int trackIndex, long maxDurationUs, long offsetUs) {
    ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024);
    MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();

    // For negative offset, seek forward first
    if (offsetUs < 0) {
        extractor.seekTo(Math.abs(offsetUs), MediaExtractor.SEEK_TO_CLOSEST_SYNC);
    }

    while (true) {
        int sampleSize = extractor.readSampleData(buffer, 0);
        if (sampleSize < 0) break;

        bufferInfo.offset = 0;
        bufferInfo.size = sampleSize; // MediaMuxer writes bufferInfo.size bytes
        bufferInfo.presentationTimeUs = extractor.getSampleTime();

        // Apply offset to timestamp
        if (offsetUs > 0) {
            bufferInfo.presentationTimeUs += offsetUs;
        } else {
            bufferInfo.presentationTimeUs -= Math.abs(offsetUs);
            if (bufferInfo.presentationTimeUs < 0) {
                extractor.advance();
                continue; // Skip samples with negative timestamps
            }
        }

        muxer.writeSampleData(trackIndex, buffer, bufferInfo);
        extractor.advance();
    }
}
Challenge 3: Audio Blending
When blending the video's original audio with the Hoopi audio, the two platforms diverge again:
iOS uses AVMutableAudioMix with volume parameters:
// iOS: Hardware-accelerated mixing
let audioMix = AVMutableAudioMix()
var mixParameters: [AVMutableAudioMixInputParameters] = []
// Hoopi audio at 100%
let deviceParams = AVMutableAudioMixInputParameters(track: deviceAudioTrack)
deviceParams.setVolume(1.0, at: .zero)
mixParameters.append(deviceParams)
// Video audio at blend percentage
let videoParams = AVMutableAudioMixInputParameters(track: videoAudioTrack)
videoParams.setVolume(Float(blendPercentage) / 100.0, at: .zero)
mixParameters.append(videoParams)
audioMix.inputParameters = mixParameters
exportSession.audioMix = audioMix
Android requires sample-by-sample mixing in code:
// Android: Manual sample mixing (16-bit PCM)
private void blendAudioFiles(String videoAudioPath, String hoopiPath,
                             String outputPath, int blendPercent, long offsetUs) throws IOException {
    try (FileInputStream videoStream = new FileInputStream(videoAudioPath);
         FileInputStream hoopiStream = new FileInputStream(hoopiPath);
         FileOutputStream outputStream = new FileOutputStream(outputPath)) {

        // Apply the offset by starting the reads at different byte positions
        // (WAV header handling omitted for brevity)
        byte[] videoBuffer = new byte[8192];
        byte[] hoopiBuffer = new byte[8192];
        byte[] outputBuffer = new byte[8192];

        int read;
        while ((read = hoopiStream.read(hoopiBuffer)) > 0) {
            videoStream.read(videoBuffer);

            // Mix 16-bit samples pairwise
            for (int i = 0; i + 1 < read; i += 2) {
                short videoSample = bytesToShort(videoBuffer, i);
                short hoopiSample = bytesToShort(hoopiBuffer, i);
                // Blend formula: Hoopi at full level, video scaled by blendPercent
                float mixed = hoopiSample + (videoSample * blendPercent / 100.0f);
                // Clamp to prevent clipping
                short output = (short) Math.max(-32768, Math.min(32767, mixed));
                shortToBytes(output, outputBuffer, i);
            }
            outputStream.write(outputBuffer, 0, read);
        }
    }
}
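The blend formula itself is easy to sanity-check in isolation. A small Dart illustration of the same math (sample values chosen arbitrarily):
// Hoopi sample stays at full level; the video sample is scaled by blendPercent.
int mixSample(int hoopiSample, int videoSample, int blendPercent) {
  final mixed = hoopiSample + videoSample * blendPercent / 100.0;
  // Clamp to the 16-bit range to avoid wrap-around distortion
  return mixed.clamp(-32768, 32767).round();
}

void main() {
  print(mixSample(12000, 20000, 30)); // 18000: phone audio only nudges the mix
  print(mixSample(30000, 20000, 50)); // 32767: clamped instead of overflowing to 40000
}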
The Flutter UI Layer
The sync interface provides intuitive controls:
Widget _buildAudioSyncPanel() {
  return Column(
    children: [
      // Track selection (for stereo Hoopi recordings)
      Row(
        children: [
          ChoiceChip(
            label: Text('L'),
            selected: _selectedTrack == StereoTrack.left,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.left),
          ),
          ChoiceChip(
            label: Text('R'),
            selected: _selectedTrack == StereoTrack.right,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.right),
          ),
          ChoiceChip(
            label: Text('Both'),
            selected: _selectedTrack == StereoTrack.both,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.both),
          ),
        ],
      ),
      // Offset input with auto-detect
      Row(
        children: [
          Expanded(
            child: TextField(
              controller: _offsetController,
              keyboardType: TextInputType.number,
              decoration: InputDecoration(
                labelText: 'Offset (ms)',
                suffixText: 'ms',
              ),
            ),
          ),
          TextButton(
            onPressed: _detectOffset,
            child: Text('Detect'),
          ),
        ],
      ),
      // Blend slider
      Slider(
        value: _blendPercentage,
        min: 0,
        max: 100,
        divisions: 20,
        label: '${_blendPercentage.round()}% video audio',
        onChanged: (value) => setState(() => _blendPercentage = value),
      ),
      // Sync button
      ElevatedButton(
        onPressed: _isSyncing ? null : _performAudioSync,
        child: _isSyncing
            ? CircularProgressIndicator()
            : Text('Sync Audio'),
      ),
    ],
  );
}
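The Sync button's handler is where these controls meet the method channel. A simplified sketch of what _performAudioSync might look like (the service instance and path fields here are illustrative, not the app's exact names):
Future<void> _performAudioSync() async {
  setState(() => _isSyncing = true);
  try {
    // Pull the current UI values and hand them to the native layer
    final offsetMs = int.tryParse(_offsetController.text) ?? 0;
    final ok = await _videoService.replaceAudioInVideo(
      videoPath: _videoPath,
      audioPath: _hoopiAudioPath,
      outputPath: _outputPath,
      offsetMs: offsetMs,
      blendPercentage: _blendPercentage.round(),
    );
    if (!ok) {
      debugPrint('Audio replacement failed');
    }
  } finally {
    setState(() => _isSyncing = false);
  }
}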
WAV File Handling
The Hoopi device records stereo WAV files. We need to handle channel extraction for the "L/R" selection feature:
// wav_service.dart
import 'dart:io';
import 'dart:typed_data';

class WavService {
  /// Extracts one channel (0 = left, 1 = right) from a stereo WAV
  /// into a new mono WAV file.
  Future<void> extractChannel(String inputPath, String outputPath, int channel) async {
    final bytes = await File(inputPath).readAsBytes();
    final header = _parseWavHeader(bytes);

    if (header.numChannels != 2) {
      throw Exception('Input must be stereo');
    }

    // Calculate mono output size
    final monoDataSize = header.dataSize ~/ 2;
    final monoBytes = Uint8List(44 + monoDataSize); // 44-byte header + data

    // Write mono WAV header
    _writeWavHeader(monoBytes,
      sampleRate: header.sampleRate,
      bitsPerSample: header.bitsPerSample,
      numChannels: 1,
      dataSize: monoDataSize,
    );

    // Extract channel samples
    final bytesPerSample = header.bitsPerSample ~/ 8;
    final frameSize = bytesPerSample * 2; // Stereo frame
    int readPos = header.dataOffset + (channel * bytesPerSample);
    int writePos = 44;

    while (readPos < header.dataOffset + header.dataSize) {
      // Copy one sample from selected channel
      for (int b = 0; b < bytesPerSample; b++) {
        monoBytes[writePos++] = bytes[readPos + b];
      }
      readPos += frameSize; // Skip to next frame
    }

    await File(outputPath).writeAsBytes(monoBytes);
  }
}
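The track selection from the UI then maps onto a channel index before syncing. A usage sketch (path variables are placeholders; the left-is-0 mapping follows the usual WAV interleaving order):
final wav = WavService();
// Stereo WAV frames are interleaved left-then-right, so left = channel 0.
if (_selectedTrack == StereoTrack.left) {
  await wav.extractChannel(hoopiWavPath, monoWavPath, 0);
} else if (_selectedTrack == StereoTrack.right) {
  await wav.extractChannel(hoopiWavPath, monoWavPath, 1);
}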
Performance Optimizations
1. Streaming, Not Loading
For large files, we never load entire audio files into memory:
// Android: Stream-based processing
ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024); // 1MB buffer
while ((sampleSize = extractor.readSampleData(buffer, 0)) >= 0) {
    // Process chunk
    extractor.advance();
}
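The same idea can apply on the Dart side when only a slice of a WAV is needed, for example the few seconds used for offset detection; File.openRead takes an optional byte range, so the whole file never has to come into memory. A sketch, assuming 16-bit stereo at 44.1kHz behind a canonical 44-byte header:
import 'dart:io';

// Stream roughly the first few seconds of audio data instead of
// calling readAsBytes() on a file that may be hundreds of MB.
Future<List<int>> readFirstSeconds(String wavPath, {int seconds = 5}) async {
  const headerBytes = 44;               // assumes a canonical WAV header
  const bytesPerSecond = 44100 * 2 * 2; // sampleRate * channels * bytesPerSample
  final slice = <int>[];
  final stream =
      File(wavPath).openRead(headerBytes, headerBytes + seconds * bytesPerSecond);
  await for (final chunk in stream) {
    slice.addAll(chunk);
  }
  return slice;
}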
2. Background Threading
All heavy operations run off the main thread:
// iOS
DispatchQueue.global(qos: .userInitiated).async {
    let offset = self.performCrossCorrelation(video, hoopi)
    DispatchQueue.main.async {
        result(offset) // Return to Flutter
    }
}

// Android
new Thread(() -> {
    int offset = detectAudioOffset(videoPath, audioPath);
    new Handler(Looper.getMainLooper()).post(() -> {
        result.success(offset);
    });
}).start();
3. Cleanup
Temporary files are always cleaned up:
private void cleanupTempFiles() {
    String[] tempFiles = {
        "_extracted_audio.wav",
        "_blended_audio.wav",
        "_converted.aac",
        "_mixed_channels.wav"
    };
    for (String name : tempFiles) {
        File f = new File(cacheDir, name);
        if (f.exists()) f.delete();
    }
}
Lessons Learned
1. Don't Trust Platform Abstractions
We initially tried using higher-level Flutter packages for audio manipulation. They work for simple playback but fall apart when you need:
- Sample-accurate timing
- Custom mixing algorithms
- Format conversion
Solution: Embrace platform channels and native code.
2. Test on Real Devices
Emulators handle MediaCodec differently. Some codecs that work on real devices fail on emulators.
3. Handle Edge Cases
- What if video is shorter than audio? (Trim audio)
- What if audio is shorter? (Pad with silence, as sketched after this list, or trim the video)
- What if offset detection fails? (Fall back to 0, allow manual adjustment)
- What if user's phone has mono mic? (Detect and adapt)
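Padding is the simplest of these to illustrate; a minimal Dart sketch for 16-bit PCM (the helper name is ours):
import 'dart:typed_data';

// Extend a PCM buffer to targetLength samples; Int16List starts zero-filled,
// and zero samples are digital silence.
Int16List padWithSilence(Int16List samples, int targetLength) {
  if (samples.length >= targetLength) return samples;
  final padded = Int16List(targetLength);
  padded.setRange(0, samples.length, samples);
  return padded;
}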
4. Cross-Correlation Has Limits
The algorithm works great when both recordings have clear transients (drums, speech). It struggles with:
- Ambient/pad sounds with no clear peaks
- Very different frequency responses between mics
- Heavily distorted guitar (use clean signal for sync)
We always provide manual offset adjustment as a fallback.
Conclusion
Building video-audio sync in Flutter required diving deep into native platform APIs. The key takeaways:
- Use the right tool for the job - Flutter for UI, native code for DSP
- Cross-correlation is powerful but needs optimization (downsampling)
- Platform differences are significant - iOS and Android have fundamentally different audio APIs
- Always provide fallbacks - Auto-detect is great, but users need manual control
The result is a seamless experience: record with your phone, capture pro audio with Hoopi, and combine them with a single tap.
This is part of the Hoopi companion app, built with Flutter, Swift, and Java.