I think you're definitely on the right track by recording locally, for the reasons you mention. By counting down, you're basically creating an "audio clapperboard", the same idea as the black-and-white "Take One" board they snap on film sets (y'know the one) so the audio and video can be synced up afterwards.
I don't think there's any way to avoid this step, unless there's some podcast-specific software out there (that'd be cool, actually) that can handle the syncing for you automatically.
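For what it's worth, the lining-up step itself isn't magic. Here's a minimal Python sketch, assuming each person's local recording has already been loaded as a list of mono samples at the same sample rate (the loading step is left out, and `find_offset` is just a made-up name for illustration, not from any particular tool). It slides one track against the other and picks the lag where they match best, which should be where the countdown or clap lines up.

```python
def find_offset(reference, other, max_lag):
    """Return the lag (in samples) that best aligns `other` with `reference`.

    Hypothetical helper: both inputs are assumed to be lists of mono samples
    at the same sample rate. A positive result means the shared event
    (e.g. the countdown or clap) happens `lag` samples later in `other`.
    """
    best_lag, best_score = 0, float("-inf")
    # Only search a window of +/- max_lag samples around zero offset.
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score: sum of products of overlapping samples.
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

If the result is positive, you'd trim that many samples off the start of `other`; if it's negative, you'd pad it instead. This brute-force loop gets slow on full-length recordings, so in practice you'd only run it on the first few seconds around the countdown, or use an FFT-based correlation such as `scipy.signal.correlate(..., method="fft")`.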