PEASYV

From videos to TextGrids

Adrien Méli and Nicolas Ballier

2024-03-28

Basic concept

PEASYV: Phonetic Extraction and Alignment of Subtitled YouTube Videos

flowchart LR
  A(Link to video) --> B(Praat TextGrid)

Key tools

Detailed flowchart

flowchart LR

subgraph S1[Step 1]
direction TB
A(fa:fa-file-lines list of links) --> |yt-dlp| B(fa:fa-video video)
A --> |yt-dlp| C(fa:fa-closed-captioning subtitles)
B -->|ffmpeg| D(fa:fa-file-audio far:fa-square Main Audio TG)
C -->|praat| D
D -->|praat| E1(fa:fa-file-audio far:fa-square)
D -->|praat| E2(fa:fa-file-audio far:fa-square)
D -->|praat| E3(fa:fa-file-audio far:fa-square)
end

subgraph S2[Step 2]
direction LR
F1(fa:fa-file-audio far:fa-square) -->|SPPAS| G1(fa:fa-table-cells-large Segm TG)
F2(fa:fa-file-audio far:fa-square) -->|SPPAS| G2(fa:fa-table-cells-large Segm TG)
F3(fa:fa-file-audio far:fa-square) -->|SPPAS| G3(fa:fa-table-cells-large Segm TG)
F1(fa:fa-file-audio far:fa-square) -->|fa:fa-align P2FA| H1(fa:fa-table-cells-large Segm TG)
F2(fa:fa-file-audio far:fa-square) -->|fa:fa-align P2FA| H2(fa:fa-table-cells-large Segm TG)
F3(fa:fa-file-audio far:fa-square) -->|fa:fa-align P2FA| H3(fa:fa-table-cells-large Segm TG)
G1--> GH(fa:fa-file-audio fa:fa-table-cells-large)
H1--> GH
G2--> GH
H2--> GH
G3--> GH
H3--> GH
end

subgraph S3[Step 3]
direction TB
I(fa:fa-file-audio fa:fa-table-cells-large) -->|praat| K(fa:fa-table-cells Segm Syll TG)
J(fa:fa-book LPD) -->|praat|K
K -->|R| L(fa:fa-file-csv spreadsheets)
L -->|R| M(fa:fa-chart-line vocalic diagnoses)

end

S1 --> S2
S2 --> S3
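
To make Step 1 concrete, here is a minimal command-line sketch in Python (illustrative, not the PEASYV code itself). It assumes yt-dlp and ffmpeg are on the PATH, that the links are stored one per line in a hypothetical links.txt, and that the aligners expect mono 16 kHz WAV input.

# step1_download.py -- sketch of Step 1: fetch each video and its subtitles with yt-dlp,
# then extract a mono 16 kHz WAV with ffmpeg for the forced aligners.
import pathlib
import subprocess

LINKS = pathlib.Path("links.txt")      # hypothetical list of YouTube URLs, one per line
OUTDIR = pathlib.Path("downloads")
OUTDIR.mkdir(exist_ok=True)

for url in LINKS.read_text().split():
    # Download the video and its manual subtitles, named after the YouTube video id.
    subprocess.run(
        ["yt-dlp", "--write-subs", "--sub-langs", "en",
         "-o", str(OUTDIR / "%(id)s.%(ext)s"), url],
        check=True,
    )

# Convert every downloaded video to a mono 16 kHz WAV
# (adjust the glob if yt-dlp delivered .webm or .mkv containers).
for video in OUTDIR.glob("*.mp4"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000",
         str(video.with_suffix(".wav"))],
        check=True,
    )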

About the two aligners

Claims

  • Two aligners are better than one:

    • SPPAS
    • P2FA
  • Step 2 prevents cascading alignment errors

  • Added value:

    • low-tech
    • syllabic tiers based on the LPD (Wells 2008)
    • (MIS)MATCHES on the TextGrid (see the sketch below)
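
The MISMATCH idea can be thought of as a direct comparison between the two aligners' phoneme tiers. The sketch below is illustrative only: intervals are written as plain (start, end, label) tuples rather than read from a TextGrid, the 20 ms tolerance is arbitrary, and a real comparison would also have to cope with tiers containing different numbers of intervals.

# Illustrative sketch: flag disagreements between two phoneme tiers.
TOLERANCE = 0.02  # seconds; arbitrary threshold for this sketch

def mismatches(tier_a, tier_b, tol=TOLERANCE):
    """Pair intervals positionally and report label or boundary disagreements larger than tol."""
    flagged = []
    for (s1, e1, lab1), (s2, e2, lab2) in zip(tier_a, tier_b):
        if lab1 != lab2 or abs(s1 - s2) > tol or abs(e1 - e2) > tol:
            flagged.append((lab1, lab2, round(abs(s1 - s2), 3), round(abs(e1 - e2), 3)))
    return flagged

# Toy example: both aligners agree on the labels but place two boundaries > 20 ms apart.
sppas = [(0.00, 0.12, "dh"), (0.12, 0.25, "ax"), (0.25, 0.40, "k")]
p2fa  = [(0.00, 0.13, "dh"), (0.13, 0.31, "ax"), (0.31, 0.40, "k")]
print(mismatches(sppas, p2fa))   # flags "ax" and "k"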

Outputs

The Praat TextGrid

Tiers

  1. Transcription
  2. Momel
  3. INTSINT
  4. SPPAS Word
  5. SPPAS Phoneme
  6. SPPAS LPD Word
  7. SPPAS Syllable
  8. SPPAS LPD Syllable
  9. P2FA Word
  10. P2FA Phoneme
  11. P2FA LPD Word
  12. P2FA Syllable
  13. P2FA LPD Syllable
  14. SPPAS MISMATCH
  15. P2FA MISMATCH
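
As a quick sanity check on the output, the tiers of a finished TextGrid can be listed with any Praat TextGrid reader. The sketch below assumes the third-party textgrid package (pip install textgrid) and an illustrative file name.

import textgrid  # third-party parser for Praat TextGrids (pip install textgrid)

# Open one PEASYV output TextGrid (file name is illustrative) and list its tiers in order.
tg = textgrid.TextGrid.fromFile("example_video.TextGrid")
for i, tier in enumerate(tg, start=1):
    print(f"{i:2d}. {tier.name}")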

Screenshot of a TextGrid

Secondary outputs

Spreadsheets

  • .csv format
  • one per aligner
  • one row per vowel
  • formant readings at each centile of a vowel’s duration
  • i.e. 300 formant-reading columns + 24 additional columns
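
The exact column headers depend on the PEASYV export, so the names used below (vowel, F1_020, F2_080, ...) are hypothetical; the sketch only shows how the per-centile readings in one spreadsheet could be turned into the 20%-to-80% F1/F2 vectors plotted for the diphthongs further down.

import pandas as pd  # assumes pandas is installed

# Hypothetical column names: 'vowel' holds the IPA label, F1_020 is F1 at 20% of duration, etc.
df = pd.read_csv("sppas_vowels.csv")          # illustrative file name: one row per vowel token

diph = df[df["vowel"] == "aɪ"]
onset = diph[["F1_020", "F2_020"]].mean()     # mean F1/F2 at 20% of the diphthong's duration
offset = diph[["F1_080", "F2_080"]].mean()    # mean F1/F2 at 80% of the diphthong's duration

print(f"aɪ: F1 {onset['F1_020']:.0f} -> {offset['F1_080']:.0f} Hz, "
      f"F2 {onset['F2_020']:.0f} -> {offset['F2_080']:.0f} Hz")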

Columns and data

Diagnoses

An example: English Like A Native

SPPAS

Data aligned by SPPAS.

General information

Number of TextGrids: 453

Total length of the videos: 172:39:22 (hh:mm:ss)

Monophthongs

Data on monophthongs.

Distribution

Distribution of monophthongs.

Durations

Per-monophthong boxplots of durations.

Vocalic Trapezoids

Scatterplots

Per-monophthong mean F1/F2 values with error bars (1 SE).

Deterding

Dotted grey line: reported native values; black line: speaker.

References: Deterding (1997)

Hillenbrand

Dotted grey line: reported native values; black line: speaker.

References: Hillenbrand et al. (1995)

Density plots

F1

Per-monophthong F1 density plots.

F2

Per-monophthong F2 density plots.

Formant tracking

Next are the formant tracks for monophthongs.

F1

F2

Diphthongs

Data on diphthongs.

Distribution

Distribution of diphthongs

Durations

Boxplots of diphthong durations.

Vocalic Trapezoids

Diphthong ɪə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong eɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong aɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong eə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong əʊ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong aʊ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong ɔɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong ʊə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Overview

Overview of all diphthongs

Formant tracking

Next are the formant tracks for diphthongs.

F1

F2

P2FA

Data aligned by P2FA.

Monophthongs

Data on monophthongs.

Distribution

Distribution of monophthongs.

Durations

Per-monophthong boxplots of durations.

Vocalic Trapezoids

Scatterplots

Per-monophthong mean F1/F2 values with error bars (1 SE).

Deterding

Dotted grey line: reported native values; black line: speaker.

References: Deterding (1997)

Hillenbrand

Dotted grey line: reported native values; black line: speaker.

References: Hillenbrand et al. (1995)

Density plots

F1

Per-monophthong F1 density plots.

F2

Per-monophthong F2 density plots.

Formant tracking

Next are the formant tracks for monophthongs.

F1

F2

Diphthongs

Data on diphthongs.

Distribution

Distribution of diphthongs

Durations

Boxplots of diphthong durations.

Vocalic Trapezoids

Diphthong ɪə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong eɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong aɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong eə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong əʊ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong aʊ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong ɔɪ

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Diphthong ʊə

Vector of the mean F1/F2 values from 20% to 80% of the diphthong’s duration.

Overview

Overview of all diphthongs

Formant tracking

Next are the formant tracks for diphthongs.

F1

F2

Comparison between the two aligners

Per-diphthong

Let’s now compare the data obtained with the two aligners.

Diphthong ɪə

Diphthong eɪ

Diphthong aɪ

Diphthong eə

Diphthong əʊ

Diphthong aʊ

Diphthong ɔɪ

Diphthong ʊə

Overview

What next?

  • make PEASYV installable

    • but potential issues with Wells (2008)

References

Bigi, B. 2012. “SPPAS: A Tool for the Phonetic Segmentations of Speech.” Istanbul.
Bigi, B., and D. Hirst. 2012. “SPeech Phonetization Alignment and Syllabification (SPPAS): A Tool for the Automatic Analysis of Speech Prosody.” Shanghai.
Boersma, Paul, and David Weenink. 2019. “Praat: Doing Phonetics by Computer [Computer Program]. Version 6.1.07, retrieved 26 November 2019 from http://www.praat.org/.”
Deterding, David. 1997. “The Formants of Monophthong Vowels in Standard Southern British English Pronunciation.” Journal of the International Phonetic Association 27 (1-2): 47–55. https://doi.org/10.1017/s0025100300005417.
FFmpeg Developers. 2021. “FFmpeg Tool (Version be1d324) [Software].” http://ffmpeg.org.
Hillenbrand, J., L. A. Getty, M. J. Clark, and K. Wheeler. 1995. “Acoustic Characteristics of American English Vowels.” The Journal of the Acoustical Society of America 97 (5): 3099–3111.
Lee, A., and T. Kawahara. 2019. https://doi.org/10.5281/zenodo.2530395.
R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Weide, R. L. 1994. “CMU Pronouncing Dictionary.” http://www.speech.cs.cmu.edu/cgi-bin/cmudict.
Wells, J. C. 2008. Longman Pronunciation Dictionary. London: Pearson Longman.
Young, S. J., G. Evermann, M. J. F. Gales, T. Hain, D. Kershaw, G. Moore, J. Odell, et al. 2006. The HTK Book, Version 3.4. Cambridge, UK: Cambridge University Engineering Department.
yt-dlp Developers. 2022. “yt-dlp.” GitHub repository. https://github.com/yt-dlp/yt-dlp.
Yuan, J., and M. Liberman. 2008. “Speaker Identification on the SCOTUS Corpus.” Journal of the Acoustical Society of America 123 (5): 5687.