Building Video-Audio Sync in Flutter: A Deep Dive into Native Platform Integration
When recording video with a phone camera while simultaneously capturing high-quality audio from an external device, you face a fundamental challenge: the two recordings are completely independent. The phone's video has its own audio track (from the phone's microphone), while your external audio device (in our case, the Hoopi recorder) captures pristine stereo audio to its SD card.
To create a polished final video, you need to:
- Detect the time offset between the two recordings
- Replace or blend the phone's audio with the high-quality recording
- Handle platform differences between iOS and Android
- Maintain perfect sync throughout the video duration
This post walks through how we solved these challenges in the Hoopi companion app.
Architecture Overview
Our solution uses Flutter for the UI layer, with native implementations (Swift on iOS, Java on Android) handling the heavy audio processing.
Why Native Code?
Flutter's audio/video packages are great for playback, but when you need to:
- Extract raw PCM samples for analysis
- Perform sample-accurate mixing
- Mux audio and video tracks together
- Convert between audio formats (WAV → AAC)
...you need native platform APIs. There's no pure-Dart way to do sample-level audio manipulation with the performance required for real-time feedback.
The Sync Workflow
At a high level, the phone records video (with its own audio track) while Hoopi captures high-quality audio to its SD card; the app then extracts the video's audio, detects the time offset with cross-correlation, optionally extracts a single Hoopi channel or blends in the phone audio, and finally muxes the chosen audio back into the video.
Key Architecture Decisions
1. Cross-Correlation for Offset Detection
The most reliable way to automatically detect the time offset between two audio recordings is normalized cross-correlation. Both recordings capture the same acoustic event (the performance), just from different microphones.
Why 8kHz downsampling?
- Cross-correlation over a sliding offset is O(n²); scanning 5 seconds of audio at 44.1kHz would be slow
- Timing information is preserved at lower sample rates
- 8kHz means roughly 5.5× fewer samples to process, while one sample still spans only 0.125 ms, so the estimate stays sub-millisecond accurate (a sketch of the decimation step follows)
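As a rough illustration of that decimation step, here is a Dart sketch; the production version lives in the native code, and this is only meant to show the idea:
import 'dart:math' as math;

// Downsample by averaging fixed-size blocks (e.g. 44100 Hz → 8000 Hz).
// Block averaging also acts as a crude low-pass filter before decimation.
List<double> downsample(List<double> samples, int srcRate, int dstRate) {
  final factor = srcRate / dstRate; // ≈ 5.5 for 44.1kHz → 8kHz
  final outLength = (samples.length / factor).floor();
  final out = List<double>.filled(outLength, 0.0);
  for (var i = 0; i < outLength; i++) {
    final start = (i * factor).floor();
    final end = math.min(((i + 1) * factor).floor(), samples.length);
    var sum = 0.0;
    for (var j = start; j < end; j++) {
      sum += samples[j];
    }
    out[i] = end > start ? sum / (end - start) : 0.0;
  }
  return out;
}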
The correlation at a single candidate offset looks like this:
// Android implementation
private double computeCorrelation(float[] video, float[] hoopi, int offset) {
    // Overlapping region of the two signals at this candidate offset
    int start = Math.max(0, offset);
    int end = Math.min(video.length, hoopi.length + offset);
    double sum = 0, sumV2 = 0, sumH2 = 0;
    for (int i = start; i < end; i++) {
        int hoopiIdx = i - offset;
        if (hoopiIdx >= 0 && hoopiIdx < hoopi.length) {
            sum += video[i] * hoopi[hoopiIdx];
            sumV2 += video[i] * video[i];
            sumH2 += hoopi[hoopiIdx] * hoopi[hoopiIdx];
        }
    }
    // Normalized result in [-1, 1]; 0 if either window is silent
    double denom = Math.sqrt(sumV2 * sumH2);
    return denom > 0 ? sum / denom : 0;
}
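The native code evaluates that correlation across a range of candidate offsets and keeps the best score. For readability, here is the same search expressed as a Dart sketch; the shipping implementations are the native Swift and Java versions, and the search range here is illustrative:
import 'dart:math' as math;

// Mirror of the native correlation, shown in Dart for readability.
double correlation(List<double> video, List<double> hoopi, int offset) {
  final start = math.max(0, offset);
  final end = math.min(video.length, hoopi.length + offset);
  var sum = 0.0, sumV2 = 0.0, sumH2 = 0.0;
  for (var i = start; i < end; i++) {
    final hoopiIdx = i - offset;
    if (hoopiIdx >= 0 && hoopiIdx < hoopi.length) {
      sum += video[i] * hoopi[hoopiIdx];
      sumV2 += video[i] * video[i];
      sumH2 += hoopi[hoopiIdx] * hoopi[hoopiIdx];
    }
  }
  final denom = math.sqrt(sumV2 * sumH2);
  return denom > 0 ? sum / denom : 0.0;
}

// Scan candidate offsets at the downsampled rate and convert the winner to ms.
int findBestOffsetMs(List<double> video, List<double> hoopi,
    {int sampleRate = 8000, int maxOffsetSeconds = 5}) {
  final maxOffset = sampleRate * maxOffsetSeconds;
  var bestOffset = 0;
  var bestScore = double.negativeInfinity;
  for (var offset = -maxOffset; offset <= maxOffset; offset++) {
    final score = correlation(video, hoopi, offset);
    if (score > bestScore) {
      bestScore = score;
      bestOffset = offset;
    }
  }
  return (bestOffset * 1000 / sampleRate).round();
}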
2. Channel Mixing Mode
For musicians, we offer a powerful feature: phone mic on the left channel, Hoopi on the right. This lets you capture room ambience with the phone while keeping the direct instrument signal from Hoopi.
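One way to picture the result is as a single interleaved stereo stream built from the two mono sources. A simplified Dart sketch of the idea (the function name and the 16-bit assumption are ours):
import 'dart:math' as math;
import 'dart:typed_data';

// Build a stereo buffer with the phone mic on the left channel and
// the Hoopi recording on the right channel.
Int16List interleaveLeftRight(Int16List phoneMic, Int16List hoopi) {
  final frames = math.min(phoneMic.length, hoopi.length);
  final stereo = Int16List(frames * 2);
  for (var i = 0; i < frames; i++) {
    stereo[2 * i] = phoneMic[i];   // left channel: room ambience
    stereo[2 * i + 1] = hoopi[i];  // right channel: direct signal
  }
  return stereo;
}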
3. Method Channel Bridge
All native communication goes through a single method channel:
// Flutter side
import 'package:flutter/services.dart';

class VideoRecordingService {
  static const _channel = MethodChannel('video_audio_sync');

  Future<int?> detectAudioOffset(String videoPath, String audioPath) async {
    return await _channel.invokeMethod('detectOffset', {
      'videoPath': videoPath,
      'audioPath': audioPath,
    });
  }

  Future<bool> replaceAudioInVideo({
    required String videoPath,
    required String audioPath,
    required String outputPath,
    required int offsetMs,
    required int blendPercentage,
  }) async {
    return await _channel.invokeMethod('replaceAudio', {
      'videoPath': videoPath,
      'audioPath': audioPath,
      'outputPath': outputPath,
      'offsetMs': offsetMs,
      'blendPercentage': blendPercentage,
    });
  }
}
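On the Dart side, invokeMethod throws a PlatformException when the native handler reports an error, so callers typically wrap the calls. A sketch of the calling pattern (the wrapper name is illustrative):
import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';

Future<int?> safeDetectOffset(
    VideoRecordingService service, String videoPath, String audioPath) async {
  try {
    return await service.detectAudioOffset(videoPath, audioPath);
  } on PlatformException catch (e) {
    // If native detection fails, fall back to 0 and let the user adjust manually
    debugPrint('Offset detection failed: ${e.message}');
    return null;
  }
}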
Platform Differences: The Challenges
Challenge 1: Audio Format Handling
iOS can work directly with WAV files in AVMutableComposition:
// iOS: Direct WAV support
let audioAsset = AVAsset(url: audioURL) // Works with WAV
let audioTrack = composition.addMutableTrack(
    withMediaType: .audio,
    preferredTrackID: kCMPersistentTrackID_Invalid
)
try audioTrack?.insertTimeRange(timeRange, of: sourceTrack, at: insertionTime)
Android requires explicit WAV → AAC conversion before muxing:
// Android: Must convert WAV to AAC for MP4 container
private String convertWavToAac(String wavPath, long offsetUs) throws IOException {
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(wavPath);

    // sampleRate / channelCount come from the source WAV's track format
    MediaFormat outputFormat = MediaFormat.createAudioFormat(
            MediaFormat.MIMETYPE_AUDIO_AAC,
            sampleRate,
            channelCount
    );
    outputFormat.setInteger(MediaFormat.KEY_BIT_RATE, 384000);
    outputFormat.setInteger(MediaFormat.KEY_AAC_PROFILE,
            MediaCodecInfo.CodecProfileLevel.AACObjectLC);

    MediaCodec encoder = MediaCodec.createEncoderByType(
            MediaFormat.MIMETYPE_AUDIO_AAC
    );
    encoder.configure(outputFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
    // ... encoding loop with offset application
}
Challenge 2: Offset Application
Applying a time offset sounds simple, but the implementation differs significantly between the two platforms:
iOS uses time-based insertion points:
// iOS: Positive offset = delay audio insertion
if offsetMs > 0 {
    let offsetTime = CMTimeMake(value: Int64(offsetMs), timescale: 1000)
    try audioTrack.insertTimeRange(
        CMTimeRangeMake(start: .zero, duration: audioDuration),
        of: sourceAudioTrack,
        at: offsetTime // Insert at offset, not at zero
    )
} else {
    // Negative offset = skip beginning of audio
    let skipTime = CMTimeMake(value: Int64(abs(offsetMs)), timescale: 1000)
    try audioTrack.insertTimeRange(
        CMTimeRangeMake(start: skipTime, duration: adjustedDuration),
        of: sourceAudioTrack,
        at: .zero
    )
}
Android manipulates presentation timestamps during muxing:
// Android: Modify timestamps during mux
private void muxTrackWithOffset(MediaExtractor extractor, MediaMuxer muxer,
                                int trackIndex, long maxDurationUs, long offsetUs) {
    ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024);
    MediaCodec.BufferInfo bufferInfo = new MediaCodec.BufferInfo();

    // For negative offset, seek forward first
    if (offsetUs < 0) {
        extractor.seekTo(Math.abs(offsetUs), MediaExtractor.SEEK_TO_CLOSEST_SYNC);
    }

    while (true) {
        int sampleSize = extractor.readSampleData(buffer, 0);
        if (sampleSize < 0) break;

        bufferInfo.offset = 0;
        bufferInfo.size = sampleSize; // MediaMuxer writes bufferInfo.size bytes
        bufferInfo.presentationTimeUs = extractor.getSampleTime();

        // Apply offset to timestamp
        if (offsetUs > 0) {
            bufferInfo.presentationTimeUs += offsetUs;
        } else {
            bufferInfo.presentationTimeUs -= Math.abs(offsetUs);
            if (bufferInfo.presentationTimeUs < 0) {
                extractor.advance();
                continue; // Skip samples with negative timestamps
            }
        }

        muxer.writeSampleData(trackIndex, buffer, bufferInfo);
        extractor.advance();
    }
}
Challenge 3: Audio Blending
When blending the video's original audio with the Hoopi audio, the two platforms diverge again:
iOS uses AVMutableAudioMix with volume parameters:
// iOS: Hardware-accelerated mixing
let audioMix = AVMutableAudioMix()
var mixParameters: [AVMutableAudioMixInputParameters] = []
// Hoopi audio at 100%
let deviceParams = AVMutableAudioMixInputParameters(track: deviceAudioTrack)
deviceParams.setVolume(1.0, at: .zero)
mixParameters.append(deviceParams)
// Video audio at blend percentage
let videoParams = AVMutableAudioMixInputParameters(track: videoAudioTrack)
videoParams.setVolume(Float(blendPercentage) / 100.0, at: .zero)
mixParameters.append(videoParams)
audioMix.inputParameters = mixParameters
exportSession.audioMix = audioMix
Android requires sample-by-sample mixing in code:
// Android: Manual sample mixing (16-bit PCM)
private void blendAudioFiles(String videoAudioPath, String hoopiPath,
                             String outputPath, int blendPercent, long offsetUs) throws IOException {
    try (FileInputStream videoStream = new FileInputStream(videoAudioPath);
         FileInputStream hoopiStream = new FileInputStream(hoopiPath);
         FileOutputStream outputStream = new FileOutputStream(outputPath)) {

        // Apply the offset by starting the reads at different byte positions
        // (WAV header handling omitted for brevity)
        byte[] videoBuffer = new byte[8192];
        byte[] hoopiBuffer = new byte[8192];
        byte[] outputBuffer = new byte[8192];

        int read;
        while ((read = hoopiStream.read(hoopiBuffer)) > 0) {
            videoStream.read(videoBuffer);

            // Mix 16-bit samples pairwise
            for (int i = 0; i + 1 < read; i += 2) {
                short videoSample = bytesToShort(videoBuffer, i);
                short hoopiSample = bytesToShort(hoopiBuffer, i);
                // Blend formula: Hoopi at full level, video scaled by blendPercent
                float mixed = hoopiSample + (videoSample * blendPercent / 100.0f);
                // Clamp to prevent clipping
                short output = (short) Math.max(-32768, Math.min(32767, mixed));
                shortToBytes(output, outputBuffer, i);
            }
            outputStream.write(outputBuffer, 0, read);
        }
    }
}
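The blend formula itself is easy to sanity-check in isolation. A small Dart illustration of the same math (sample values chosen arbitrarily):
// Hoopi sample stays at full level; the video sample is scaled by blendPercent.
int mixSample(int hoopiSample, int videoSample, int blendPercent) {
  final mixed = hoopiSample + videoSample * blendPercent / 100.0;
  // Clamp to the 16-bit range to avoid wrap-around distortion
  return mixed.clamp(-32768, 32767).round();
}

void main() {
  print(mixSample(12000, 20000, 30)); // 18000: phone audio only nudges the mix
  print(mixSample(30000, 20000, 50)); // 32767: clamped instead of overflowing to 40000
}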
The Flutter UI Layer
The sync interface provides intuitive controls:
Widget _buildAudioSyncPanel() {
  return Column(
    children: [
      // Track selection (for stereo Hoopi recordings)
      Row(
        children: [
          ChoiceChip(
            label: Text('L'),
            selected: _selectedTrack == StereoTrack.left,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.left),
          ),
          ChoiceChip(
            label: Text('R'),
            selected: _selectedTrack == StereoTrack.right,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.right),
          ),
          ChoiceChip(
            label: Text('Both'),
            selected: _selectedTrack == StereoTrack.both,
            onSelected: (_) => setState(() => _selectedTrack = StereoTrack.both),
          ),
        ],
      ),
      // Offset input with auto-detect
      Row(
        children: [
          Expanded(
            child: TextField(
              controller: _offsetController,
              keyboardType: TextInputType.number,
              decoration: InputDecoration(
                labelText: 'Offset (ms)',
                suffixText: 'ms',
              ),
            ),
          ),
          TextButton(
            onPressed: _detectOffset,
            child: Text('Detect'),
          ),
        ],
      ),
      // Blend slider
      Slider(
        value: _blendPercentage,
        min: 0,
        max: 100,
        divisions: 20,
        label: '${_blendPercentage.round()}% video audio',
        onChanged: (value) => setState(() => _blendPercentage = value),
      ),
      // Sync button
      ElevatedButton(
        onPressed: _isSyncing ? null : _performAudioSync,
        child: _isSyncing
            ? CircularProgressIndicator()
            : Text('Sync Audio'),
      ),
    ],
  );
}
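The Sync button's handler is where these controls meet the method channel. A simplified sketch of what _performAudioSync might look like (the service instance and path fields here are illustrative, not the app's exact names):
Future<void> _performAudioSync() async {
  setState(() => _isSyncing = true);
  try {
    // Pull the current UI values and hand them to the native layer
    final offsetMs = int.tryParse(_offsetController.text) ?? 0;
    final ok = await _videoService.replaceAudioInVideo(
      videoPath: _videoPath,
      audioPath: _hoopiAudioPath,
      outputPath: _outputPath,
      offsetMs: offsetMs,
      blendPercentage: _blendPercentage.round(),
    );
    if (!ok) {
      debugPrint('Audio replacement failed');
    }
  } finally {
    setState(() => _isSyncing = false);
  }
}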
WAV File Handling
The Hoopi device records stereo WAV files. We need to handle channel extraction for the "L/R" selection feature:
// wav_service.dart
import 'dart:io';
import 'dart:typed_data';

class WavService {
  /// Extracts one channel (0 = left, 1 = right) from a stereo WAV
  /// into a new mono WAV file.
  Future<void> extractChannel(String inputPath, String outputPath, int channel) async {
    final bytes = await File(inputPath).readAsBytes();
    final header = _parseWavHeader(bytes);

    if (header.numChannels != 2) {
      throw Exception('Input must be stereo');
    }

    // Calculate mono output size
    final monoDataSize = header.dataSize ~/ 2;
    final monoBytes = Uint8List(44 + monoDataSize); // 44-byte header + data

    // Write mono WAV header
    _writeWavHeader(monoBytes,
      sampleRate: header.sampleRate,
      bitsPerSample: header.bitsPerSample,
      numChannels: 1,
      dataSize: monoDataSize,
    );

    // Extract channel samples
    final bytesPerSample = header.bitsPerSample ~/ 8;
    final frameSize = bytesPerSample * 2; // Stereo frame
    int readPos = header.dataOffset + (channel * bytesPerSample);
    int writePos = 44;

    while (readPos < header.dataOffset + header.dataSize) {
      // Copy one sample from selected channel
      for (int b = 0; b < bytesPerSample; b++) {
        monoBytes[writePos++] = bytes[readPos + b];
      }
      readPos += frameSize; // Skip to next frame
    }

    await File(outputPath).writeAsBytes(monoBytes);
  }
}
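The track selection from the UI then maps onto a channel index before syncing. A usage sketch (path variables are placeholders; the left-is-0 mapping follows the usual WAV interleaving order):
final wav = WavService();
// Stereo WAV frames are interleaved left-then-right, so left = channel 0.
if (_selectedTrack == StereoTrack.left) {
  await wav.extractChannel(hoopiWavPath, monoWavPath, 0);
} else if (_selectedTrack == StereoTrack.right) {
  await wav.extractChannel(hoopiWavPath, monoWavPath, 1);
}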
Performance Optimizations
1. Streaming, Not Loading
For large files, we never load entire audio files into memory:
// Android: Stream-based processing
ByteBuffer buffer = ByteBuffer.allocate(1024 * 1024); // 1MB buffer
while ((sampleSize = extractor.readSampleData(buffer, 0)) >= 0) {
    // Process chunk
    extractor.advance();
}
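The same idea can apply on the Dart side when only a slice of a WAV is needed, for example the few seconds used for offset detection; File.openRead takes an optional byte range, so the whole file never has to come into memory. A sketch, assuming 16-bit stereo at 44.1kHz behind a canonical 44-byte header:
import 'dart:io';

// Stream roughly the first few seconds of audio data instead of
// calling readAsBytes() on a file that may be hundreds of MB.
Future<List<int>> readFirstSeconds(String wavPath, {int seconds = 5}) async {
  const headerBytes = 44;               // assumes a canonical WAV header
  const bytesPerSecond = 44100 * 2 * 2; // sampleRate * channels * bytesPerSample
  final slice = <int>[];
  final stream =
      File(wavPath).openRead(headerBytes, headerBytes + seconds * bytesPerSecond);
  await for (final chunk in stream) {
    slice.addAll(chunk);
  }
  return slice;
}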
2. Background Threading
All heavy operations run off the main thread:
// iOS
DispatchQueue.global(qos: .userInitiated).async {
    let offset = self.performCrossCorrelation(video, hoopi)
    DispatchQueue.main.async {
        result(offset) // Return to Flutter
    }
}

// Android
new Thread(() -> {
    int offset = detectAudioOffset(videoPath, audioPath);
    new Handler(Looper.getMainLooper()).post(() -> {
        result.success(offset);
    });
}).start();
3. Cleanup
Temporary files are always cleaned up:
private void cleanupTempFiles() {
    String[] tempFiles = {
        "_extracted_audio.wav",
        "_blended_audio.wav",
        "_converted.aac",
        "_mixed_channels.wav"
    };
    for (String name : tempFiles) {
        File f = new File(cacheDir, name);
        if (f.exists()) f.delete();
    }
}
Lessons Learned
1. Don't Trust Platform Abstractions
We initially tried using higher-level Flutter packages for audio manipulation. They work for simple playback but fall apart when you need:
- Sample-accurate timing
- Custom mixing algorithms
- Format conversion
Solution: Embrace platform channels and native code.
2. Test on Real Devices
Emulators handle MediaCodec differently. Some codecs that work on real devices fail on emulators.
3. Handle Edge Cases
- What if video is shorter than audio? (Trim audio)
- What if audio is shorter? (Pad with silence, as sketched after this list, or trim the video)
- What if offset detection fails? (Fall back to 0, allow manual adjustment)
- What if user's phone has mono mic? (Detect and adapt)
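Padding is the simplest of these to illustrate; a minimal Dart sketch for 16-bit PCM (the helper name is ours):
import 'dart:typed_data';

// Extend a PCM buffer to targetLength samples; Int16List starts zero-filled,
// and zero samples are digital silence.
Int16List padWithSilence(Int16List samples, int targetLength) {
  if (samples.length >= targetLength) return samples;
  final padded = Int16List(targetLength);
  padded.setRange(0, samples.length, samples);
  return padded;
}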
4. Cross-Correlation Has Limits
The algorithm works great when both recordings have clear transients (drums, speech). It struggles with:
- Ambient/pad sounds with no clear peaks
- Very different frequency responses between mics
- Heavily distorted guitar (use clean signal for sync)
We always provide manual offset adjustment as a fallback.
Conclusion
Building video-audio sync in Flutter required diving deep into native platform APIs. The key takeaways:
- Use the right tool for the job - Flutter for UI, native code for DSP
- Cross-correlation is powerful but needs optimization (downsampling)
- Platform differences are significant - iOS and Android have fundamentally different audio APIs
- Always provide fallbacks - Auto-detect is great, but users need manual control
The result is a seamless experience: record with your phone, capture pro audio with Hoopi, and combine them with a single tap.
This is part of the Hoopi companion app, built with Flutter, Swift, and Java.