Project Barcelona · Team Track

Discover the formula for progress.

We analyzed lesson transcripts to measure, visualize, and explain how Preply students actually improve their spoken language over time — turning messy conversations into a single, trustworthy Progress Index.

Learner #A47 · Spanish · Week 12 · 78 / 100 (+23 pts)
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Engagement

5 subscores tracked
15+ linguistic metrics
0–100 Progress Index

Methodology · Scoring

How we score progress.

Each lesson produces a Progress Index (0–100) from five subscores — consistent across students and over time.

Weighting
01 Fluency (25%)
How smooth and natural the student sounds.
Speaking Pace
Words per minute — comfortable, not rushed.
Filler Words
How often they say "um", "uh", or repeat words.
Long Pauses
Stops longer than a second — fewer means more confidence.
Speaking Pace
Weight: 40% of Fluency
Speaking alpha-token count ÷ speaking-turn minutes. Linearly mapped: 0 WPM → 0, 140 WPM → 100, capped at 100. No penalty above 140. Reading turns excluded.
Filler Words
Weight: 30% of Fluency
Disfluency rate: (filler tokens + immediate word repetitions) ÷ speaking alpha tokens × 100. Inverse-mapped: 0% → 100, 20%+ → 0. Fillers include um, uh, erm, hmm.
Long Pauses
Weight: 30% of Fluency
Gaps >1.0 s between consecutive speaking tokens, normalized to pauses per minute. Inverse-mapped: 0 per min → 100, 4+ per min → 0.
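The three Fluency components can be combined as in the sketch below. It assumes the mappings are straight-line interpolations between the stated anchor points (the text only gives the end points), and the function and parameter names are illustrative rather than taken from the project.

```python
def linear_map(value, lo, hi):
    """Map value linearly from [lo, hi] onto [0, 100], clamped at both ends."""
    score = (value - lo) / (hi - lo) * 100.0
    return max(0.0, min(100.0, score))


def inverse_map(value, worst):
    """0 maps to 100, `worst` or more maps to 0 (assumed linear in between)."""
    return 100.0 - linear_map(value, 0.0, worst)


def fluency_subscore(alpha_tokens, speaking_seconds, fillers, repetitions, long_pauses):
    """Hypothetical helper: Fluency from raw counts for one lesson."""
    minutes = speaking_seconds / 60.0
    wpm = alpha_tokens / minutes
    disfluency_pct = (fillers + repetitions) / alpha_tokens * 100.0
    pauses_per_min = long_pauses / minutes

    pace = linear_map(wpm, 0.0, 140.0)          # 140 WPM or more scores 100
    filler = inverse_map(disfluency_pct, 20.0)  # 20% or more scores 0
    pause = inverse_map(pauses_per_min, 4.0)    # 4 per minute or more scores 0
    return 0.40 * pace + 0.30 * filler + 0.30 * pause
```

For example, a student who speaks 70 words in one minute with 7 disfluencies and 2 long pauses sits at the midpoint of all three scales and would score about 50 under these assumptions.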
02 Accuracy (25%)
Grammar correctness — basic errors count more than advanced ones.
Basic Mistakes
Core grammar like "he go" → "he goes". Weighted heavily.
Intermediate
Wrong prepositions, tense mix-ups. Weighted moderately.
Advanced
Subtle style and word choice. Weighted lightly.
Basic Mistakes (T1 Core)
Tier weight: ×3.0
A2-level errors: subject-verb agreement slips, double negatives, "I am agree." Weighted ×3 because they signal missing foundations.
Intermediate (T2)
Tier weight: ×2.0
B1–B2 errors: wrong prepositions, tense mix-ups, second-conditional slips. Weighted ×2.
Advanced (T3 Stretch)
Tier weight: ×1.0
C1+ structures the student is stretching into. Weighted ×1 — these are signs of growth, not gaps. Mostly surfaced by the LLM-enhanced pass.

Score formula: (3×T1 + 2×T2 + 1×T3) ÷ speaking alpha tokens × 100 = weighted rate per 100 words. Log-mapped: rate 0 → 100, rate 1 → 79, rate 3 → 58, rate 10 → 28, clamped to 0–100. Segments with <30 speaking words are excluded.
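The exact log mapping is not spelled out, but the four anchor points are reproduced (after rounding) by 100 − 30·ln(1 + rate). The sketch below uses that curve as an assumption. The constant 30 is inferred from the anchors, not stated in the source, and the function names are illustrative.

```python
import math


def weighted_error_rate(t1, t2, t3, alpha_tokens):
    """Tier-weighted mistakes per 100 speaking words.

    Returns None for segments under 30 speaking words, which are excluded.
    """
    if alpha_tokens < 30:
        return None
    return (3 * t1 + 2 * t2 + 1 * t3) / alpha_tokens * 100.0


def accuracy_subscore(rate, k=30.0):
    """Assumed log mapping: k = 30 is inferred from the published anchors."""
    return max(0.0, min(100.0, 100.0 - k * math.log1p(rate)))


# Anchor points quoted in the text: rate -> score
for rate in (0, 1, 3, 10):
    print(rate, round(accuracy_subscore(rate)))
```

Under this assumption the loop prints scores of 100, 79, 58, and 28 for rates 0, 1, 3, and 10, matching the listed anchors.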

03 Complexity (20%)
How sophisticated the student's sentences and vocabulary are.
Sentence Length
Longer, fuller sentences show growing comfort.
Connecting Ideas
Using "because", "although", "while" to link thoughts.
Advanced Vocabulary
Reaching for more precise, sophisticated words.
Sentence Length
Weight: 40% of Complexity
Mean alphabetic token count per speaking sentence. Linearly mapped: 0 words → 0, 14+ words → 100, capped at 100.
Connecting Ideas
Weight: 35% of Complexity
% of speaking sentences containing connectors (because, although, while, if, when, which, but, so, therefore, since). Mapped: 0% → 0, 50%+ → 100, capped at 100.
Advanced Vocabulary
Weight: 25% of Complexity
% of speaking words with 8+ characters. Mapped: 0% → 0, 25%+ → 100, capped at 100.
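As a rough illustration, the Complexity components can be computed straight from a list of speaking sentences. The tokenizer here (lowercase runs of letters and apostrophes) is an assumption, as are the function names; only the connector list, thresholds, and weights come from the text above.

```python
import re

CONNECTORS = {"because", "although", "while", "if", "when",
              "which", "but", "so", "therefore", "since"}


def capped(value, full_marks):
    """Map 0..full_marks linearly onto 0..100, capped at 100."""
    return max(0.0, min(100.0, value / full_marks * 100.0))


def complexity_subscore(sentences):
    """Hypothetical helper: Complexity from raw speaking sentences."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    tokenized = [toks for toks in tokenized if toks]  # drop empty sentences
    if not tokenized:
        return 0.0
    words = [w for toks in tokenized for w in toks]

    mean_len = len(words) / len(tokenized)
    connector_pct = sum(1 for toks in tokenized if CONNECTORS & set(toks)) / len(tokenized) * 100.0
    long_pct = sum(1 for w in words if len(w) >= 8) / len(words) * 100.0

    return (0.40 * capped(mean_len, 14.0)
            + 0.35 * capped(connector_pct, 50.0)
            + 0.25 * capped(long_pct, 25.0))


score = complexity_subscore(["I stayed home because it rained.", "We played games."])
```

In this two-sentence example the mean length is 4.5 words, half the sentences contain a connector, and no word has 8 or more characters, so the score comes out near 47.9.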
04 Lexical Range (15%)
Vocabulary diversity — using many different words, not repeating the same ones.
Vocabulary Variety
Ratio of unique words to total words. More variety = higher score.
Vocabulary Variety (MATTR-50)
Weight: 100% of Lexical Range
Moving-average type-token ratio over 50-word windows from speaking tokens. Mapped: MATTR 0.45 → 0, MATTR 0.80 → 100, capped at 100. Falls back to unique ÷ total if <50 speaking words.
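A minimal sketch of MATTR-50 with the fallback described above might look like this. It assumes tokens are already filtered to speaking alpha tokens, and the function names are illustrative.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over fixed-size windows.

    Falls back to plain unique/total when there are fewer tokens than one window.
    """
    tokens = [t.lower() for t in tokens]
    if not tokens:
        return 0.0
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def lexical_range_subscore(mattr_value):
    """Map MATTR 0.45 -> 0 and 0.80 -> 100, clamped to 0..100."""
    return max(0.0, min(100.0, (mattr_value - 0.45) / (0.80 - 0.45) * 100.0))
```

The moving window is what keeps the measure comparable across recordings of different lengths, since a plain unique-to-total ratio tends to fall as a transcript gets longer.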
05 Engagement (15%)
Active participation, clear speaking, and thoughtful responses.
Pronunciation
How clearly the student speaks.
Speaking Share
Student talks enough vs. just listening.
Answer Depth
Detailed answers, not just a few words.
Response Speed
Quick replies show comfort.
Pronunciation
Weight: 50% of Engagement
Mean word-level ASR confidence across speaking tokens. Proxy for pronunciation clarity. Mapped: confidence 0.70 → 0, confidence 0.97 → 100, capped at 100.
Speaking Share
Weight: 25% of Engagement
Student speaking alpha tokens ÷ (student + tutor) speaking alpha tokens. Mapped: share 10% → 0, share 55%+ → 100, capped at 100. No penalty above 55%.
Answer Depth
Weight: 15% of Engagement
Average speaking-response length in words after open tutor prompts (why, how, tell me, describe). Mapped: 0 words → 0, 25+ words → 100, capped at 100.
Response Speed
Weight: 10% of Engagement
Median gap in seconds between tutor turn end and student speaking turn start. Inverse-mapped: 0 s → 100, 4+ s → 0.
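The four Engagement components combine in the same way. This sketch again assumes linear interpolation between the stated anchors, and its parameter names are placeholders rather than the project's field names.

```python
def linear_map(value, lo, hi):
    """Map value linearly from [lo, hi] onto [0, 100], clamped at both ends."""
    return max(0.0, min(100.0, (value - lo) / (hi - lo) * 100.0))


def engagement_subscore(asr_confidence, speaking_share_pct, answer_depth_words, response_gap_s):
    """Hypothetical helper: Engagement from four lesson-level measurements."""
    pronunciation = linear_map(asr_confidence, 0.70, 0.97)
    share = linear_map(speaking_share_pct, 10.0, 55.0)    # no penalty above 55%
    depth = linear_map(answer_depth_words, 0.0, 25.0)
    speed = 100.0 - linear_map(response_gap_s, 0.0, 4.0)  # 4 s or more scores 0
    return 0.50 * pronunciation + 0.25 * share + 0.15 * depth + 0.10 * speed
```

Because Pronunciation carries half the weight, a drop in ASR confidence moves this subscore more than any of the interaction signals.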
Progress Index Formula
0.25·Fluency + 0.25·Accuracy + 0.20·Complexity + 0.15·Lexical + 0.15·Engagement
0–100 scale
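As a small worked example, the weighted sum can be written as follows. The dictionary keys and the sample subscores are made up for illustration.

```python
PROGRESS_WEIGHTS = {
    "fluency": 0.25,
    "accuracy": 0.25,
    "complexity": 0.20,
    "lexical_range": 0.15,
    "engagement": 0.15,
}


def progress_index(subscores):
    """Weighted average of the five 0-100 subscores."""
    return sum(PROGRESS_WEIGHTS[name] * subscores[name] for name in PROGRESS_WEIGHTS)


# Illustrative subscores, not real learner data
example = {"fluency": 80, "accuracy": 70, "complexity": 60,
           "lexical_range": 50, "engagement": 90}
index = progress_index(example)  # 20 + 17.5 + 12 + 7.5 + 13.5 = 70.5
```

Since the weights sum to 1, the index stays on the same 0–100 scale as the subscores.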

Results

Standalone recordings

Audio Transcript Analysis

Full lessons use the five-rubric Progress Index (with Engagement). Single-speaker recordings don't have a tutor, so we swap Engagement for Naturalness and drop the tutor-side signals we can't observe — giving each clip a comparable 0–100 snapshot.

Native Avg Snapshot: 91.8 (4 recordings)

Non-Native Avg Snapshot: 78.2 (5 recordings)

Native Advantage: +13.6 (mean native minus mean non-native)
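The summary figures can be checked against the per-recording snapshots listed on this page. The sketch below uses the one-decimal snapshot values as displayed, so it is only as precise as those rounded numbers.

```python
from statistics import mean

# Estimated Progress Snapshot per recording, as displayed on this page
native = [93.0, 89.9, 95.1, 89.1]               # Students 2, 3, 7, 9
non_native = [83.5, 71.7, 83.8, 73.1, 78.8]     # Students 1, 4, 5, 6, 8

native_avg = mean(native)
non_native_avg = mean(non_native)
advantage = native_avg - non_native_avg

print(f"native {native_avg:.3f}, non-native {non_native_avg:.3f}, gap {advantage:.3f}")
```

The means come out at roughly 91.78 and 78.18, a gap of about 13.6 points, which is consistent with the headline numbers. With only nine recordings, the gap should be read as descriptive rather than statistically established.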

How we score single-speaker audio
Audio Snapshot formula, in plain English

Audio Snapshot = 0.20 · Fluency + 0.25 · Accuracy + 0.15 · Complexity + 0.15 · Lexical Range + 0.25 · Naturalness

The five ingredients

Each ingredient is a 0–100 score. The snapshot is a weighted average.

  • Fluency 20%

    How smoothly they speak — pacing, pauses, fillers.

  • Accuracy 25%

    Tier-weighted grammar mistakes from the transcript.

  • Complexity 15%

    Sentence length, use of connectors, and long words.

  • Lexical Range 15%

    Vocabulary variety (MATTR-50).

  • Naturalness 25%

    Does it sound like a native would say it? Replaces Engagement here.
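The audio weighting differs from the lesson weighting only in its coefficients and in the fifth ingredient. A minimal sketch, with illustrative key names and made-up subscores:

```python
AUDIO_WEIGHTS = {
    "fluency": 0.20,
    "accuracy": 0.25,
    "complexity": 0.15,
    "lexical_range": 0.15,
    "naturalness": 0.25,
}


def audio_snapshot_score(subscores):
    """Weighted average of the five 0-100 audio ingredients."""
    return sum(AUDIO_WEIGHTS[name] * subscores[name] for name in AUDIO_WEIGHTS)


# Illustrative subscores, not taken from any recording on this page
clip = {"fluency": 80, "accuracy": 100, "complexity": 60,
        "lexical_range": 90, "naturalness": 70}
score = audio_snapshot_score(clip)  # 16 + 25 + 9 + 13.5 + 17.5 = 81.0
```

Accuracy and Naturalness together carry half of the snapshot, so transcript-level wording issues weigh more here than in a full lesson.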

Why Naturalness replaces Engagement

Full lessons build Engagement in part from tutor–student interaction (speaking share, response latency, elaboration depth); tutor corrections are tracked as supplementary context only. Single-speaker recordings have no tutor, so those signals are missing.

Lessons use Engagement: needs the tutor side
Audio uses Naturalness: transcript-only signal

How the Naturalness score is built

Start at 100. Scan the transcript for phrasing a native speaker would word differently. Add up penalties, normalize per 100 spoken words, convert to a 0–100 score. Fewer issues = higher score.

Awkward phrase / collocation ×3.0
“applied with my team” “winning of the hackathon”
Off word choice ×2.5
“higher energy” “scientific fair”
Article / preposition ×2.0
“on the weekend” “the homework”
Sentence linking ×1.0
“so because that” “and then and”

We also add a small ×1.5 penalty for explicit lexical-uncertainty cues such as “the English word, I don't know…”.

Weighted issue rate
rate = (3.0·phrase + 2.5·word_choice + 2.0·article_prep + 1.0·linking + 1.5·uncertainty) ÷ speaking_tokens × 100
Naturalness = clamp(100 − scaled(rate), 0, 100)
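The weighted issue rate can be reproduced from the per-category counts shown in the recording cards. The sketch below stops at the rate, because the `scaled` step is not specified on this page. The ×1.5 lexical-uncertainty penalty is included as an optional argument, and the argument names are illustrative.

```python
def weighted_issue_rate(phrase, word_choice, article_prep, linking,
                        speaking_tokens, uncertainty=0):
    """Penalty-weighted naturalness issues per 100 spoken words."""
    penalty = (3.0 * phrase
               + 2.5 * word_choice
               + 2.0 * article_prep
               + 1.0 * linking
               + 1.5 * uncertainty)  # explicit lexical-uncertainty cues
    return penalty / speaking_tokens * 100.0


# Counts taken from the Student 4 and Student 6 cards on this page
student_4 = weighted_issue_rate(0, 2, 0, 1, speaking_tokens=111)
student_6 = weighted_issue_rate(4, 1, 1, 0, speaking_tokens=129)
print(f"{student_4:.3f} {student_6:.3f}")
```

These two calls print 5.405 and 12.791, the same weighted rates reported for those recordings.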

What's intentionally left out

Not scored

Accent & pronunciation. Naturalness is transcript-only; it doesn't judge how the words sounded.

Not available

Tutor-side signals (speaking share, response latency, elaboration depth, tutor correction rate) can't be observed in a solo recording, so we omit them rather than guess.

Single-speaker transcript estimate

Student 1

non-native
Estimated Progress Snapshot 83.5
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 127
Speaking Duration (s): 56.06
Student WPM: 135.926
MATTR-50: 0.791
Avg Sentence Length: 11.545
Complex Sentence Rate (%): 18.182
Long Words (8+ chars %): 10.236
Long Pauses >1s: 2
Filler Ratio (%): 0
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 1
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 1
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 1.181
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 2

native
Estimated Progress Snapshot 93.0
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 174
Speaking Duration (s): 53.64
Student WPM: 194.631
MATTR-50: 0.825
Avg Sentence Length: 10.875
Complex Sentence Rate (%): 37.5
Long Words (8+ chars %): 9.195
Long Pauses >1s: 1
Filler Ratio (%): 0
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 0
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 0
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 0
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 3

native
Estimated Progress Snapshot 89.9
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 116
Speaking Duration (s): 53.56
Student WPM: 129.948
MATTR-50: 0.863
Avg Sentence Length: 10.545
Complex Sentence Rate (%): 0
Long Words (8+ chars %): 17.241
Long Pauses >1s: 1
Filler Ratio (%): 0
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 0
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 0
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 0
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 4

non-native
Estimated Progress Snapshot 71.7
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 111
Speaking Duration (s): 63.28
Student WPM: 105.247
MATTR-50: 0.666
Avg Sentence Length: 13.875
Complex Sentence Rate (%): 25
Long Words (8+ chars %): 11.712
Long Pauses >1s: 1
Filler Ratio (%): 1.802
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 3
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 2
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 1
Weighted Naturalness Issue Rate (per 100w): 5.405
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 5

non-native
Estimated Progress Snapshot 83.8
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 128
Speaking Duration (s): 50.78
Student WPM: 151.241
MATTR-50: 0.801
Avg Sentence Length: 11.636
Complex Sentence Rate (%): 54.545
Long Words (8+ chars %): 10.156
Long Pauses >1s: 0
Filler Ratio (%): 1.562
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 2
Naturalness Issues (Phrase): 1
Naturalness Issues (Word Choice): 1
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 4.297
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 6

non-native
Estimated Progress Snapshot 73.1
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 129
Speaking Duration (s): 56.92
Student WPM: 135.98
MATTR-50: 0.773
Avg Sentence Length: 12.9
Complex Sentence Rate (%): 40
Long Words (8+ chars %): 8.527
Long Pauses >1s: 1
Filler Ratio (%): 0.775
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 6
Naturalness Issues (Phrase): 4
Naturalness Issues (Word Choice): 1
Naturalness Issues (Article/Prep): 1
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 12.791
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 7

native
Estimated Progress Snapshot 95.1
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 208
Speaking Duration (s): 58.4
Student WPM: 213.699
MATTR-50: 0.841
Avg Sentence Length: 10.947
Complex Sentence Rate (%): 36.842
Long Words (8+ chars %): 12.5
Long Pauses >1s: 0
Filler Ratio (%): 0.481
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 0
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 0
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 0
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 8

non-native
Estimated Progress Snapshot 78.8
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 153
Speaking Duration (s): 60.08
Student WPM: 152.796
MATTR-50: 0.689
Avg Sentence Length: 11.769
Complex Sentence Rate (%): 46.154
Long Words (8+ chars %): 3.268
Long Pauses >1s: 0
Filler Ratio (%): 0.654
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 2
Naturalness Issues (Phrase): 1
Naturalness Issues (Word Choice): 0
Naturalness Issues (Article/Prep): 1
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 3.268
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Single-speaker transcript estimate

Student 9

native
Estimated Progress Snapshot 89.1
Subscores: Fluency, Accuracy, Complexity, Lexical Range, Naturalness
Metric detail

Available Metrics

Speaking Alpha Tokens: 178
Speaking Duration (s): 62.11
Student WPM: 171.953
MATTR-50: 0.765
Avg Sentence Length: 8.476
Complex Sentence Rate (%): 19.048
Long Words (8+ chars %): 10.674
Long Pauses >1s: 1
Filler Ratio (%): 0
Mistakes (total): 0
Mistakes T1 (Core): 0
Mistakes T2 (Intermediate): 0
Mistakes T3 (Stretch): 0
Weighted Error Rate (per 100w): 0
Naturalness Issues (total): 0
Naturalness Issues (Phrase): 0
Naturalness Issues (Word Choice): 0
Naturalness Issues (Article/Prep): 0
Naturalness Issues (Linking): 0
Weighted Naturalness Issue Rate (per 100w): 0
Reading Turns: 0
Reading WPM: 0
Reading Share (%): 0

Exact Scoring Formulas (Reference)
  • Fluency = 0.40*wpm + 0.30*inverse_disfluency + 0.30*inverse_long_pause_rate
  • Accuracy = log-mapped tier-weighted mistake rate
  • Complexity = 0.40*avg_sentence_len + 0.35*complex_sentence_rate + 0.25*long_words_pct
  • Lexical Range = MATTR-50
  • Engagement = 0.50*asr_confidence + 0.25*speaking_share + 0.15*elaboration_depth + 0.10*inverse_response_latency
  • student_response_latency_s = median gap between a tutor turn ending and the next student turn beginning
  • elaboration_depth = average student response length after open tutor prompts like why, how, tell me, or describe
  • tutor_correction_rate = supplementary context only; it does not affect the Progress Index
  • student_reading_wpm = reading alpha tokens divided by reading-turn minutes
  • student_reading_share_pct = share of the student's total alpha tokens that came from reading turns
  • Progress Index = 0.25*subscore_fluency + 0.25*subscore_accuracy + 0.20*subscore_complexity + 0.15*subscore_lexical_range + 0.15*subscore_engagement
  • Audio transcript snapshots use a separate audio-only formula: they keep the lesson anchors for Fluency, Accuracy, Complexity, and Lexical Range, but replace Engagement with Naturalness.
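To see how the reference formulas fit together, the sketch below rebuilds one audio snapshot from the metrics listed for Student 3. It rests on several assumptions: linear interpolation between the stated anchors, an inferred log curve of 100 − 30·ln(1 + rate) for Accuracy, and a Naturalness of 100 when no issues are flagged. All function and key names are illustrative.

```python
import math


def capped(value, lo, hi):
    """Linear map from [lo, hi] onto [0, 100], clamped at both ends."""
    return max(0.0, min(100.0, (value - lo) / (hi - lo) * 100.0))


def audio_snapshot(m, naturalness):
    """Hypothetical end-to-end estimate from one recording's metrics."""
    minutes = m["duration_s"] / 60.0
    fluency = (0.40 * capped(m["wpm"], 0, 140)
               + 0.30 * (100 - capped(m["filler_pct"], 0, 20))
               + 0.30 * (100 - capped(m["long_pauses"] / minutes, 0, 4)))
    # Assumed log mapping; the constant 30 is inferred, not documented
    accuracy = max(0.0, min(100.0, 100 - 30 * math.log1p(m["error_rate"])))
    complexity = (0.40 * capped(m["avg_sentence_len"], 0, 14)
                  + 0.35 * capped(m["complex_rate_pct"], 0, 50)
                  + 0.25 * capped(m["long_words_pct"], 0, 25))
    lexical = capped(m["mattr50"], 0.45, 0.80)
    return (0.20 * fluency + 0.25 * accuracy + 0.15 * complexity
            + 0.15 * lexical + 0.25 * naturalness)


# Metrics as displayed on the Student 3 card
student_3 = {
    "duration_s": 53.56, "wpm": 129.948, "filler_pct": 0.0, "long_pauses": 1,
    "error_rate": 0.0, "avg_sentence_len": 10.545, "complex_rate_pct": 0.0,
    "long_words_pct": 17.241, "mattr50": 0.863,
}
snapshot = audio_snapshot(student_3, naturalness=100.0)
print(f"{snapshot:.2f}")
```

Under these assumptions the estimate for this recording is about 89.85, close to the reported 89.9. Other recordings may not line up as closely, so the production pipeline likely includes details that this sketch does not capture.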