Auto-Chunking Audio Files into Intonational Phrases

T. Mark Ellison

May 1, 2017

Eri Kashima and I have found a neat way of automatically chunking the speech in an audio file, as a first step in transcription. Our initial efforts using silence detection in ELAN were not successful. Instead, we found that PRAAT’s silence detection did the job quite well once the right parameters were chosen.

We use the Annotate >> To TextGrid (silences)… option in the PRAAT file window, which becomes available once you have loaded the .wav file. Our parameter settings are:

  • Minimum pitch (Hz): 70
  • Silence threshold (dB): -35
  • Minimum silent interval duration (s): 0.25
  • Minimum sounding interval duration (s): 0.1
  • Silent interval label: (empty)
  • Sounding interval label: ***
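
For readers who want to see what these settings do, here is a minimal Python sketch of threshold-based chunking. It is a simplified approximation, not PRAAT's actual implementation. The function names, the fixed 10 ms frame length, and the order in which short intervals are absorbed are all assumptions made for illustration. The minimum pitch setting is not modelled here; in PRAAT it influences the length of the intensity analysis window. As in PRAAT, the silence threshold is taken relative to the loudest point in the recording.

```python
import math


def frame_levels_db(samples, sample_rate, frame_s=0.01):
    """Return (levels, step): the RMS level in dB of each consecutive,
    non-overlapping frame, and the frame length in seconds.
    The 10 ms frame length is an assumption, not a PRAAT setting."""
    size = max(1, int(round(sample_rate * frame_s)))
    levels = []
    for start in range(0, len(samples), size):
        frame = samples[start:start + size]
        mean_square = sum(x * x for x in frame) / len(frame)
        # Floor the energy so that digital silence does not give log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels, size / sample_rate


def merge_runs(runs):
    """Merge neighbouring (start, end, label) runs that share a label."""
    merged = []
    for start, end, label in runs:
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


def relabel_short(runs, label, min_dur, new_label):
    """Give runs of `label` shorter than `min_dur` the label `new_label`,
    then merge them into their neighbours."""
    relabelled = [
        (s, e, new_label if (lab == label and e - s < min_dur) else lab)
        for s, e, lab in runs
    ]
    return merge_runs(relabelled)


def chunk_by_silence(samples, sample_rate, threshold_db=-35.0,
                     min_silent=0.25, min_sounding=0.1,
                     silent_label="", sounding_label="***"):
    """Split a mono signal into silent and sounding intervals.
    Returns a list of (start_s, end_s, label) tuples."""
    levels, step = frame_levels_db(samples, sample_rate)
    if not levels:
        return []
    duration = len(samples) / sample_rate
    # The threshold is relative to the loudest frame in the file.
    cutoff = max(levels) + threshold_db
    runs = merge_runs([
        (i * step, min((i + 1) * step, duration),
         sounding_label if level >= cutoff else silent_label)
        for i, level in enumerate(levels)
    ])
    # Assumed order: absorb short pauses first, then short bursts of sound.
    runs = relabel_short(runs, silent_label, min_silent, sounding_label)
    runs = relabel_short(runs, sounding_label, min_sounding, silent_label)
    return runs
```

Given a list of mono samples and their sampling rate, chunk_by_silence(samples, sample_rate) returns intervals labelled either with the empty string or with ***, mirroring the two tiers of labels in the settings above. A pause shorter than 0.25 s is folded into the surrounding speech, which is what keeps brief hesitations from splitting a phrase.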

A detailed walkthrough of chunking with PRAAT, for a file normally explored in ELAN, can be seen on Eri’s blog page on the topic.