Project

Text To Speech Highlighted

A terminal-driven text-to-speech reader that speaks text through Piper while a second window tracks the active words in real time so the user stays visually aligned with the audio stream.


What It Does

The project combines text normalization, Piper speech generation, audio playback, and a live highlight pane so the spoken output and the visible reading position stay in sync.

Why I Built It

This came from wanting something more trackable than plain TTS. For longer passages, the missing piece is not just hearing the words; it is being able to see exactly where the reader currently is.

Visual Flow

How The Reader Moves

$ python3 tts_highlight.py "This text will be spoken..."
[reader] normalize input
[reader] load Piper model
[reader] synthesize chunk 01
[reader] write active index 0,4
[reader] play wav via pw-play
[reader] synthesize chunk 02
[reader] write active index 4,8

This terminal reader keeps the currently spoken words highlighted so the user can follow the narration line by line without losing position.

Pipeline

Runtime Stages

Input and normalization. Raw text is cleaned into an ASCII-safe form so punctuation, symbols, and odd Unicode characters do not break speech synthesis.
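A minimal sketch of that normalization step, assuming a Unicode NFKD decomposition followed by dropping any non-ASCII leftovers. The `normalize_text` name and the replacement table are illustrative, not taken from the repo:

```python
import re
import unicodedata

# Hypothetical table for typographic characters that NFKD does not
# decompose into plain ASCII on its own.
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
}


def normalize_text(raw: str) -> str:
    """Reduce raw input to an ASCII-safe string suitable for synthesis."""
    text = raw
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Split accented letters into base letter + combining mark,
    # then drop everything that is still outside ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_text("Café “quoted”\n text…")` returns `Cafe "quoted" text...`.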

Piper synthesis. The script resolves the Piper binary and voices directory, chooses a voice model, and generates short audio spans from the text.
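Piper's command-line tool reads text on stdin and writes a WAV file when given `--model` and `--output_file`. The sketch below shows one way to call it per chunk. `pick_voice`, `piper_command`, and `synthesize` are hypothetical helpers, and the "preferred name, else first model in sorted order" rule is an assumed policy rather than how the script actually chooses a voice:

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def pick_voice(voices_dir: Path, preferred: str | None = None) -> Path:
    """Return a Piper voice model (.onnx) from voices_dir.

    Assumed policy: use `preferred` if it matches a model's stem,
    otherwise fall back to the first model in sorted order.
    """
    models = sorted(voices_dir.glob("*.onnx"))
    if not models:
        raise FileNotFoundError(f"no .onnx voice models in {voices_dir}")
    if preferred:
        for model in models:
            if model.stem == preferred:
                return model
    return models[0]


def piper_command(piper_bin: str, model: Path, out_wav: Path) -> list[str]:
    """Build the Piper CLI call; the text itself is fed on stdin."""
    return [piper_bin, "--model", str(model), "--output_file", str(out_wav)]


def synthesize(piper_bin: str, model: Path, text: str, out_wav: Path) -> Path:
    """Synthesize one chunk of text to a WAV file via the Piper CLI."""
    subprocess.run(
        piper_command(piper_bin, model, out_wav),
        input=text.encode("utf-8"),
        check=True,  # raise if Piper exits with an error
    )
    return out_wav
```

Each `.onnx` voice ships with a matching `.onnx.json` config that Piper looks for next to the model, which is why the glob only matches `*.onnx`.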

Chunk playback. Each spoken segment is mapped to a word range and played back locally through `pw-play` or `aplay`.
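A sketch of the chunk-to-range mapping and player selection. It assumes fixed four-word chunks (matching the `0,4` and `4,8` indices in the demo log) with half-open `[start, end)` ranges, and that `pw-play` is tried before `aplay`. `word_spans`, `find_player`, and `play` are hypothetical names:

```python
from __future__ import annotations

import shutil
import subprocess


def word_spans(words: list[str], size: int = 4) -> list[tuple[int, int]]:
    """Split word indices into half-open [start, end) spans of `size` words."""
    return [(i, min(i + size, len(words))) for i in range(0, len(words), size)]


def find_player() -> list[str]:
    """Prefer PipeWire's pw-play, then fall back to ALSA's aplay."""
    for name in ("pw-play", "aplay"):
        path = shutil.which(name)
        if path:
            return [path]
    raise RuntimeError("neither pw-play nor aplay found on PATH")


def play(wav_path: str) -> None:
    """Play a WAV file and block until playback finishes."""
    subprocess.run(find_player() + [wav_path], check=True)
```

Blocking playback keeps the loop simple: the next chunk's index is only written once the previous chunk has finished playing.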

Highlight sync. The active index window is written to a temp file, and the separate pane redraws the text so the current words stay highlighted as playback advances.
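The handoff between the two processes only needs a tiny file format. This sketch assumes the `start,end` text seen in the demo log, a hypothetical file name under the system temp directory, and an atomic `os.replace` so the highlight pane never reads a half-written index:

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical location; the project only says "a temp file".
INDEX_FILE = Path(tempfile.gettempdir()) / "tts_highlight_index"


def write_active_index(start: int, end: int, path: Path = INDEX_FILE) -> None:
    """Publish the active [start, end) word span for the highlight pane.

    Writing a sibling file and renaming it over the target means a
    concurrent reader sees either the old value or the new one.
    """
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(f"{start},{end}")
    os.replace(tmp, path)


def read_active_index(path: Path = INDEX_FILE) -> Optional[Tuple[int, int]]:
    """Return the current span, or None if nothing valid has been written."""
    try:
        start, end = path.read_text().strip().split(",")
        return int(start), int(end)
    except (FileNotFoundError, ValueError):
        return None
```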

Stack

Technical Notes

Core Runtime

Python 3 drives the reader, while `pydub` handles the intermediate audio slicing used to keep the spoken chunks aligned with the highlight spans.
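The exact slicing strategy isn't spelled out here. One plausible approach, sketched purely as an assumption, is to synthesize a passage once and split its audio in proportion to each span's character count. `span_boundaries_ms` and `slice_wav` are hypothetical; only `AudioSegment.from_wav`, `len()` in milliseconds, and millisecond slicing come from pydub itself:

```python
from __future__ import annotations


def span_boundaries_ms(
    words: list[str], spans: list[tuple[int, int]], total_ms: int
) -> list[tuple[int, int]]:
    """Estimate [start_ms, end_ms) for each word span.

    Assumption: speaking time is roughly proportional to character count
    (each word counted with one trailing space), so every span gets the
    same share of total_ms as its share of characters.
    """
    weights = [sum(len(w) + 1 for w in words[s:e]) for s, e in spans]
    total = sum(weights) or 1  # avoid dividing by zero on empty input
    bounds, elapsed = [], 0
    for weight in weights:
        start = round(elapsed * total_ms / total)
        elapsed += weight
        end = round(elapsed * total_ms / total)
        bounds.append((start, end))  # adjacent spans share a boundary
    return bounds


def slice_wav(wav_path: str, words: list[str], spans: list[tuple[int, int]]):
    """Cut one synthesized WAV into per-span pydub AudioSegments."""
    # Imported here so the estimator above has no third-party dependency.
    from pydub import AudioSegment

    audio = AudioSegment.from_wav(wav_path)
    # len(audio) and slice indices are both in milliseconds in pydub.
    return [audio[s:e] for s, e in span_boundaries_ms(words, spans, len(audio))]
```

Character count is only a rough proxy for speaking time; pauses at punctuation are likely to make it drift, which favors keeping chunks short.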

Speech Backend

The current public version is Piper-based. The repo strips out machine-specific assumptions and lets users locate Piper through `PATH`, the `PIPER_BIN` and `PIPER_VOICES_DIR` environment variables, or manual path input.
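A sketch of that discovery logic. The precedence shown (an explicit `PIPER_BIN` first, then `piper` on `PATH`, then a prompt) is an assumption, and `resolve_piper_bin` and `resolve_voices_dir` are illustrative names. The prompt function is a parameter so it can be swapped out in tests:

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path
from typing import Callable


def resolve_piper_bin(ask: Callable[[str], str] = input) -> str:
    """Locate the Piper executable (assumed order: env var, PATH, prompt)."""
    env = os.environ.get("PIPER_BIN")
    if env and Path(env).is_file():
        return env
    found = shutil.which("piper")
    if found:
        return found
    raw = ask("Path to the piper binary: ").strip()
    path = Path(raw).expanduser()
    if not raw or not path.is_file():
        raise FileNotFoundError(f"piper binary not found at {raw!r}")
    return str(path)


def resolve_voices_dir(ask: Callable[[str], str] = input) -> Path:
    """Locate the directory holding Piper .onnx voice models."""
    env = os.environ.get("PIPER_VOICES_DIR")
    if env and Path(env).is_dir():
        return Path(env)
    raw = ask("Path to the Piper voices directory: ").strip()
    path = Path(raw).expanduser()
    if not raw or not path.is_dir():
        raise NotADirectoryError(f"voices directory not found at {raw!r}")
    return path
```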

Highlight UI

The highlight pane is terminal-native rather than browser-based. It watches the active word index and keeps the reading position centered on screen.
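A minimal terminal renderer along these lines, assuming ANSI reverse video for the highlight, word wrapping to the current terminal width, and a polling loop over the same `start,end` index file. `render`, `parse_span`, and `watch` are hypothetical, and the real pane may redraw differently:

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

REVERSE, RESET = "\033[7m", "\033[0m"  # ANSI reverse video on / off


def render(words: list[str], span: tuple[int, int], width: int, height: int) -> list[str]:
    """Wrap words to `width`, highlight the [start, end) span, and return
    the `height` lines that keep the first active word vertically centered."""
    start, end = span
    lines: list[str] = []
    row: list[str] = []
    row_len = 0  # visible characters in the current row, ANSI codes excluded
    active_row = 0
    for i, word in enumerate(words):
        if row and row_len + 1 + len(word) > width:
            lines.append(" ".join(row))
            row, row_len = [], 0
        row_len = len(word) if not row else row_len + 1 + len(word)
        if i == start:
            active_row = len(lines)
        row.append(f"{REVERSE}{word}{RESET}" if start <= i < end else word)
    if row:
        lines.append(" ".join(row))
    # Scroll so the active line sits mid-pane, clamped to the text bounds.
    top = max(0, min(active_row - height // 2, len(lines) - height))
    return lines[top:top + height]


def parse_span(raw: str) -> tuple[int, int] | None:
    """Parse 'start,end' into a tuple, or None if the text is malformed."""
    try:
        start, end = (int(part) for part in raw.split(","))
    except ValueError:
        return None
    return start, end


def watch(text: str, index_path: Path, poll_s: float = 0.05) -> None:
    """Redraw whenever the index file changes; runs until Ctrl-C."""
    words = text.split()
    last = ""
    while True:
        try:
            raw = index_path.read_text().strip()
        except FileNotFoundError:
            raw = ""
        span = parse_span(raw) if raw != last else None
        if span is not None:
            last = raw
            size = shutil.get_terminal_size()
            print("\033[2J\033[H", end="")  # clear screen, cursor to top-left
            print("\n".join(render(words, span, size.columns, max(1, size.lines - 1))), flush=True)
        time.sleep(poll_s)
```

Polling a small file is crude compared with inotify or a pipe, but it keeps the two windows fully decoupled: either side can restart without the other noticing.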

Legacy Preservation

Older Coqui-TTS work is kept in `legacy/` for reference, but the public repo is intentionally centered on the current Piper workflow.

Install Model

Why The Public Copy Is Cleaner

The public repo is packaged as a portable reader rather than a mirror of the original laptop setup. Hard-coded local paths were stripped out, runtime discovery was improved, and users can now provide their own Piper binary and voices directory instead of inheriting one machine’s assumptions.