Question: How do I find the offset between two audio files, one noisy and one clear?


Answers 3
Added at 2016-12-30 07:12

I have a scenario in which a user captures a concert on video, with the performer's real-time audio, while the device simultaneously downloads a live stream from an audio broadcaster device. Later I replace the noisy real-time audio (captured while recording) with the streamed audio saved on the phone (good quality). Right now I set the audio offset manually, by trial and error, while merging, so that the audio and video line up exactly.

Now I want to automate the synchronisation. Instead of merging the video with the clear audio at a given offset, I want to merge the video with the clear audio automatically, in proper sync.

For that I need to find the offset at which I should replace the noisy audio with the clear audio. For example, when the user starts and then stops recording, I would take that sample of real-time audio, compare it with the live-streamed audio, extract the matching part of the streamed audio, and sync it at the right time.

Does anyone have an idea how to find the offset by comparing the two audio files, and then sync the audio with the video?

Answer #1, added at 2017-01-02 14:01

I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.

An alternative (and more error-prone) way is to run both recordings through a speech-to-text library (or an API) and match the relevant part. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.

Also, audio processing on a mobile device may not work well (because of low performance, high battery drain, or both). I suggest using a server if you go that way.

Good luck.

Answer #2, added at 2017-01-04 03:01

Here's a concise, clear answer.

• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest using the Accelerate framework on iOS for fast Fourier transforms etc.
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
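The linked solution is not reproduced in this thread, so the following is only a guess at the kind of technique involved: a minimal sketch of FFT-based cross-correlation, written in Python with NumPy rather than as iOS code. The function name `fft_cross_correlation_lag` and its parameters are made up for illustration, and both inputs are assumed to be mono sample arrays at the same sample rate.

```python
import numpy as np

def fft_cross_correlation_lag(recorded, reference):
    """Estimate where `recorded` starts inside `reference`, in samples.

    Hypothetical helper: assumes both inputs are 1-D mono arrays
    sampled at the same rate. A negative result means the recording
    started before the reference did.
    """
    recorded = np.asarray(recorded, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Zero-pad to at least len(a) + len(b) - 1 so the circular
    # correlation computed via the FFT does not wrap around.
    n = len(recorded) + len(reference) - 1
    size = 1 << (n - 1).bit_length()  # next power of two >= n

    ref_spectrum = np.fft.rfft(reference, size)
    rec_spectrum = np.fft.rfft(recorded, size)

    # corr[k] = sum over n of reference[n + k] * recorded[n]
    corr = np.fft.irfft(ref_spectrum * np.conj(rec_spectrum), size)

    lag = int(np.argmax(corr))
    # Negative lags are stored at the end of the circular result.
    if lag >= len(reference):
        lag -= size
    return lag
```

Going through the frequency domain costs roughly O(N log N) instead of O(N²), which matters for recordings that are minutes long. On a device, the same steps would be built on whatever FFT routines the platform provides.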

Answer #3, added at 2017-01-05 15:01

This could prove to be a difficult problem: even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in itself is an extensive, non-trivial topic.

Another problem could be that the signals captured by the two devices actually differ a lot. For example, the good-quality audio (I guess output from the live mix console?) will be fairly different from the live version (which I guess is coming out of the on-stage monitors or FOH system and is captured by a phone mic?).

Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.

A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
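As a rough illustration of that idea, here is a minimal sketch in Python with NumPy. The helper names `estimate_lag` and `offset_in_seconds` are hypothetical, and the sketch assumes both recordings have already been decoded to mono arrays at the same sample rate.

```python
import numpy as np

def estimate_lag(noisy, clean):
    """Return the sample index in `clean` where `noisy` appears to start.

    Hypothetical helper: assumes 1-D mono arrays at the same sample
    rate. A negative value means the noisy recording began before
    the clean one.
    """
    # Remove any DC offset so it does not dominate the correlation.
    noisy = np.asarray(noisy, dtype=float) - np.mean(noisy)
    clean = np.asarray(clean, dtype=float) - np.mean(clean)

    # In "full" mode, output index i corresponds to lag
    # i - (len(noisy) - 1), covering every possible overlap.
    corr = np.correlate(clean, noisy, mode="full")
    return int(np.argmax(corr)) - (len(noisy) - 1)

def offset_in_seconds(noisy, clean, sample_rate):
    """Convert the estimated lag from samples to seconds."""
    return estimate_lag(noisy, clean) / sample_rate

# Synthetic check: a noisy excerpt cut from a longer "clean" signal.
rng = np.random.default_rng(0)
clean_demo = rng.standard_normal(2000)
noisy_demo = clean_demo[300:800] + 0.5 * rng.standard_normal(500)
print(estimate_lag(noisy_demo, clean_demo))
```

`np.correlate` works directly in the time domain, so its cost grows with the product of the two lengths. For full-length concert recordings it may be worth downsampling first or switching to an FFT-based correlation.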

