TY - GEN
T1 - Word segmentation as general chunking
AU - Hewlett, Daniel
AU - Cohen, Paul
PY - 2011
Y1 - 2011
N2 - During language acquisition, children learn to segment speech into phonemes, syllables, morphemes, and words. We examine word segmentation specifically, and explore the possibility that children might have general purpose chunking mechanisms to perform word segmentation. The Voting Experts (VE) and Bootstrapped Voting Experts (BVE) algorithms serve as computational models of this chunking ability. VE finds chunks by searching for a particular information-theoretic signature: low internal entropy and high boundary entropy. BVE adds to VE the ability to incorporate information about word boundaries previously found by the algorithm into future segmentations. We evaluate the general chunking model on phonemically encoded corpora of child-directed speech, and show that it is consistent with empirical results in the developmental literature. We argue that it offers a parsimonious alternative to special purpose linguistic models.
UR - https://www.scopus.com/pages/publications/84862288892
M3 - Conference contribution
AN - SCOPUS:84862288892
SN - 9781932432923
T3 - CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
SP - 39
EP - 47
BT - CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
T2 - 15th Conference on Computational Natural Language Learning, CoNLL 2011
Y2 - 23 June 2011 through 24 June 2011
ER -