Word segmentation as general chunking

Daniel Hewlett, Paul Cohen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    6 Scopus citations

    Abstract

    During language acquisition, children learn to segment speech into phonemes, syllables, morphemes, and words. We examine word segmentation specifically, and explore the possibility that children might have general-purpose chunking mechanisms to perform word segmentation. The Voting Experts (VE) and Bootstrapped Voting Experts (BVE) algorithms serve as computational models of this chunking ability. VE finds chunks by searching for a particular information-theoretic signature: low internal entropy and high boundary entropy. BVE adds to VE the ability to incorporate information about word boundaries previously found by the algorithm into future segmentations. We evaluate the general chunking model on phonemically encoded corpora of child-directed speech, and show that it is consistent with empirical results in the developmental literature. We argue that it offers a parsimonious alternative to special-purpose linguistic models.
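The information-theoretic signature named in the abstract can be illustrated with a small sketch. This is not the authors' VE implementation: it omits the sliding window and voting step entirely, works on plain characters rather than phonemic encodings, and uses the surprisal of an n-gram as a stand-in for internal entropy, which is an assumption made here. The function names `boundary_entropy` and `internal_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def boundary_entropy(text, n):
    """Entropy (in bits) of the symbol that follows each n-gram in text.

    A high value means the continuation is hard to predict, which is
    the cue for a likely chunk boundary after that n-gram.
    """
    followers = defaultdict(Counter)
    for i in range(len(text) - n):
        followers[text[i:i + n]][text[i + n]] += 1
    result = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        result[gram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return result


def internal_entropy(text, n):
    """Surprisal -log2 P(gram) of each n-gram (assumed proxy).

    Frequent n-grams get low values, so they look like cohesive chunks.
    """
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: -math.log2(c / total) for gram, c in counts.items()}


# Toy unsegmented input: "the" is frequent and has varied continuations.
sample = "thedogthecatthedog"
be = boundary_entropy(sample, 3)
ie = internal_entropy(sample, 3)
print("the:", round(be["the"], 3), round(ie["the"], 3))
print("hed:", round(be["hed"], 3), round(ie["hed"], 3))
```

On this toy string, "the" is followed by more than one symbol while "hed" always has the same continuation, so "the" scores higher on boundary entropy and is the better candidate for ending a chunk.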

    Original language: English (US)
    Title of host publication: CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
    Pages: 39-47
    Number of pages: 9
    State: Published - 2011
    Event: 15th Conference on Computational Natural Language Learning, CoNLL 2011 - Portland, OR, United States
    Duration: Jun 23 2011 to Jun 24 2011

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Linguistics and Language
    • Human-Computer Interaction
