Abstract
We present a series of three analyses of young children's linguistic input to determine the distributional information it could plausibly offer to the process of grammatical category learning. Each analysis was conducted on four separate corpora of speech directed to children under 2;5, drawn from the CHILDES database (MacWhinney, 2000). We show that, in accord with other findings, a distributional analysis, which categorizes words based on their co-occurrence patterns with surrounding words, successfully categorizes the majority of nouns and verbs. In Analyses 2 and 3, we attempt to make our analyses more directly relevant to natural language acquisition by adopting more realistic assumptions about how young children represent their input. In Analysis 2, we limit the distributional context by imposing phrase structure boundaries, and find that categorization improves even beyond that obtained from less limited contexts. In Analysis 3, we reduce the representation of input elements that young children might not fully process, and find that categorization is not adversely affected: although noun categorization is worse than in Analyses 1 and 2, it is still good, and verb categorization actually improves. Overall, successful categorization of nouns and verbs is maintained across all analyses. These results provide promising support for theories of grammatical category formation involving distributional analysis, as long as these analyses are combined with appropriate assumptions about the child learner's computational biases and capabilities.
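The core idea of a distributional analysis, grouping words by the company they keep, can be illustrated with a minimal sketch. It assumes that a word's context is its immediate left and right neighbor, that similarity is cosine similarity over neighbor counts, and that words are grouped by average-linkage agglomerative clustering. These choices, the `|` phrase-boundary marker, the toy corpus, and all function names (`segments`, `context_vectors`, `cosine`, `cluster`) are illustrative assumptions, not the authors' exact procedure. Setting `respect_boundaries=True` loosely mimics the idea behind Analysis 2 by not counting neighbors that lie across a phrase boundary.

```python
import math
from collections import Counter

BOUNDARY = "|"  # hypothetical phrase-boundary marker inserted into the input


def segments(utterance, respect_boundaries):
    """Split an utterance into the spans within which contexts are counted."""
    tokens = utterance.split()
    if not respect_boundaries:
        # Unbounded context: drop the markers and treat the utterance as one span.
        return [[t for t in tokens if t != BOUNDARY]]
    spans, current = [], []
    for tok in tokens:
        if tok == BOUNDARY:
            if current:
                spans.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        spans.append(current)
    return spans


def context_vectors(utterances, targets, respect_boundaries=False):
    """Count each target word's immediate left ("L") and right ("R") neighbors."""
    vectors = {t: Counter() for t in targets}
    for utt in utterances:
        for span in segments(utt, respect_boundaries):
            for i, tok in enumerate(span):
                if tok not in vectors:
                    continue
                if i > 0:
                    vectors[tok][("L", span[i - 1])] += 1
                if i + 1 < len(span):
                    vectors[tok][("R", span[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, k):
    """Average-linkage agglomerative clustering of words down to k clusters."""
    clusters = [[w] for w in sorted(vectors)]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Merge the most similar pair (the first one found, on ties).
        i, j = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy child-directed input (hypothetical), with subject-NP boundaries marked.
corpus = [
    "the dog | sees the ball",
    "a cat | wants the ball",
    "the cat | sees a dog",
    "a dog | wants a ball",
]
targets = ["dog", "cat", "ball", "sees", "wants"]

print(cluster(context_vectors(corpus, targets, respect_boundaries=False), k=2))
# → [['ball', 'cat', 'dog'], ['sees', 'wants']]
print(cluster(context_vectors(corpus, targets, respect_boundaries=True), k=2))
# → [['ball', 'cat', 'dog'], ['sees', 'wants']]
```

On this toy input both settings separate the nouns from the verbs. The sketch is not meant to reproduce the paper's finding that bounded contexts improve categorization, which rests on real child-directed corpora; it only shows how restricting contexts changes the neighbor counts that the clustering sees.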
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 393-424 |
Number of pages | 32 |
Journal | Cognitive Science |
Volume | 26 |
Issue number | 4 |
DOIs | |
State | Published - 2002 |
Keywords
- Distributional structure
- Grammatical categories
- Young children
ASJC Scopus subject areas
- Experimental and Cognitive Psychology
- Cognitive Neuroscience
- Artificial Intelligence