Transpose audio up or down by semitones (and cents for fine-tuning) while the duration stays the same. Used for matching keys between tracks, transposing a song into your vocal range, sample tuning, or detuning a doubler track.
Vocal, instrument, full mix — anything decodable.
Up to 200 MB
+12 = up an octave. -5 = down a perfect fourth. 100 cents = 1 semitone, so use the cents slider to detune slightly (matching a slightly out-of-tune live track) or to land between standard pitches.
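The semitone and cents values combine into a single frequency ratio. A minimal sketch of that arithmetic, assuming standard 12-tone equal temperament; the function name `pitch_ratio` is hypothetical and is not taken from the tool itself:

```python
def pitch_ratio(semitones: float, cents: float = 0.0) -> float:
    """Frequency ratio for a shift of `semitones` plus `cents` (equal temperament)."""
    total_semitones = semitones + cents / 100.0  # 100 cents = 1 semitone
    return 2.0 ** (total_semitones / 12.0)       # 12 semitones = 1 octave = factor of 2


if __name__ == "__main__":
    print(pitch_ratio(12))     # up an octave: the frequency doubles
    print(pitch_ratio(-5))     # down a perfect fourth
    print(pitch_ratio(0, 50))  # a quarter-tone up
```

A ratio above 1 raises the pitch and a ratio below 1 lowers it.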
Method: asetrate (resample) + chained atempo (time-correct)
ffmpeg shifts pitch by raising/lowering the sample rate, then time-stretches back to the original duration. Within ±5 semitones the result sounds natural; beyond that vocals start to sound chipmunk-like (up) or muddy (down).
Output: 16-bit WAV, same duration as source
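Two stages, chained. First, asetrate relabels the audio with a different sample rate without resampling it, so the file plays faster or slower and pitch and speed change together: each doubling of the rate raises pitch by an octave, and a shift of n semitones corresponds to a rate factor of 2^(n/12). Second, atempo stretches the audio by the inverse factor, which restores the original duration without altering the pitch. Result: pitch changes, tempo stays put.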
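The two stages can be sketched as an ffmpeg filter string. This is an illustration under stated assumptions, not the tool's actual implementation: the page names only asetrate and atempo, so the `aresample` step (returning the output to the source rate), the 44100 Hz default, and the conservative 0.5 to 2.0 range per atempo instance are assumptions, and the function names are hypothetical.

```python
from typing import List


def atempo_chain(tempo: float, lo: float = 0.5, hi: float = 2.0) -> List[float]:
    """Split one tempo factor into several factors that each lie within [lo, hi]."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    factors = []
    while tempo > hi:   # too fast for one atempo instance: peel off a factor of hi
        factors.append(hi)
        tempo /= hi
    while tempo < lo:   # too slow for one atempo instance: peel off a factor of lo
        factors.append(lo)
        tempo /= lo
    factors.append(tempo)
    return factors


def pitch_shift_filter(semitones: float, cents: float = 0.0,
                       sample_rate: int = 44100) -> str:
    """Build an asetrate + aresample + atempo filter string (assumed layout)."""
    ratio = 2.0 ** ((semitones + cents / 100.0) / 12.0)
    new_rate = round(sample_rate * ratio)   # stage 1: relabel the sample rate
    tempo = sample_rate / new_rate          # stage 2: inverse factor restores duration
    parts = [f"asetrate={new_rate}", f"aresample={sample_rate}"]
    parts += [f"atempo={t:.6f}" for t in atempo_chain(tempo)]
    return ",".join(parts)


if __name__ == "__main__":
    print(pitch_shift_filter(12))  # asetrate=88200,aresample=44100,atempo=0.500000
```

The string would be passed to ffmpeg's `-af` option, for example `ffmpeg -i in.wav -af "<filter>" -c:a pcm_s16le out.wav` for 16-bit WAV output (the filenames are placeholders). The tempo factor is computed from the rounded rate so that the two stages cancel exactly.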
Up to about ±5 semitones, yes. Beyond that, the formants (the resonances that make each vowel recognizable) shift along with the pitch, which is why aggressive up-shifts sound chipmunk-y and aggressive down-shifts sound muddy. Tools that handle formants separately from pitch (Auto-Tune, Melodyne) sound more natural at extreme shifts but cost money. For modest shifts, this method is fine.
Fine-tuning between semitones. 100 cents = 1 semitone. So +50 cents is a quarter-tone up — useful when a live recording is slightly sharp, when you're matching a sample to an out-of-tune piano, or when you want a deliberate detune effect on a doubler.
Roughly. 'Key change' is the musical term — moving a song from C major to D major is a 2-semitone pitch shift. The technical operation is identical. Songwriters use this to transpose a song into a singer's comfortable range; producers use it for sample matching.
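To turn a key change into a semitone value, count the distance between the two tonics. A small sketch, assuming equal temperament and treating enharmonic spellings (C# and Db) as the same pitch; the table and the function name `semitone_shift` are hypothetical helpers, not part of the tool:

```python
# Pitch class of each note name, counted in semitones above C.
NOTE_TO_SEMITONE = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}


def semitone_shift(from_key: str, to_key: str) -> int:
    """Smallest signed shift in semitones between two tonics, in the range -5..+6."""
    diff = (NOTE_TO_SEMITONE[to_key] - NOTE_TO_SEMITONE[from_key]) % 12
    return diff - 12 if diff > 6 else diff  # prefer going down when that is shorter


if __name__ == "__main__":
    print(semitone_shift("C", "D"))  # 2
    print(semitone_shift("C", "G"))  # -5 (down a perfect fourth instead of up a fifth)
```

Choosing the shorter direction keeps most key changes inside the ±5 semitone range where the result still sounds natural.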