SEED: The Speech Exemplar and Evaluation Database
SEED includes thousands of audio recordings from children and adults, along with associated demographic and developmental information.
It supports:
- Training and evaluation of AI models for speech assessment
- Research on intelligibility, articulation, and linguistic variation
- Development of screening tools aligned with clinical decision-making
What makes SEED unique?
- Includes speech from children with speech disorders
- Captures natural speech variation across dialects and developmental stages
- Includes expert classifications
Researchers and teachers can request access through the credentialed dataset.
SPROUT: Speech Production Repository for Optimizing Use of AI Technologies
SPROUT is a growing repository designed for training foundational models of young children’s speech. It expands on SEED by focusing on:
- Controlled speech production tasks (word lists, phrases, sentences)
- High-quality, multi-microphone audio capture
- Representative datasets
- Alignment with vocal biomarker research and early screening tools
The SPROUT dataset includes approximately 300 children from different backgrounds (Black, Latine, White, and those living in low-socioeconomic environments).
SPROUT enables:
- Fine-tuning and benchmarking of child-specific ASR models
- Exploration of acoustic biomarkers related to developmental concerns
- Cross-site collaboration across eight research sites
The dataset will become available in March 2026. Apply now to join the SPROUT user community and be notified when it is released. The application can be found under the Data Access and Governance tab.