Godby, C J (2002) A computational study of lexicalized noun phrases in English. Unpublished PhD thesis, The Ohio State University, USA.
Abstract
Lexicalized noun phrases are noun phrases that function as words. In English, lexicalized noun phrases are usually realized as noun-noun compounds such as theater ticket and garbage man, or as adjective-noun phrases such as black market and high school. In specialized or technical subject domains, phrases such as urban planning, air traffic control, highway engineering and combinatorial mathematics represent conventional names for concepts that are just as important to the domain as single-word terms such as adsorbents, hydrology, or aerodynamics. Yet despite the fact that lexicalized noun phrases are frequent enough to be cited in dictionaries and book indexes, the traditional linguistic literature has failed to identify consistent and categorical formal criteria for identifying them. This study develops and evaluates a linguistically natural computational method for recognizing lexicalized noun phrases in a large corpus of English-language engineering text by synthesizing the insights of studies in traditional linguistics and computational linguistics. From the scholarship in theoretical linguistics, the analysis adopts the perspective that lexicalized noun phrases represent the names of concepts that are important to a community of speakers and have survived a single context of use. Theoretical linguists have also proposed diagnostic tests for identifying lexicalized noun phrases, many of which can be formalized in a computational study. From the scholarship in computational linguistics, the analysis incorporates the view that a linguistic investigation can be extended and verified by processing relevant evidence from a corpus of text, which can be evaluated using mathematical models that do not require categorical input.
In engineering text, a small set of linguistic contexts, including professor of, department of, or studies in, yields long lists of lexicalized noun phrases, including public safety, abstract state machines, complex systems, computer graphics, and mathematical morphology. The study reported here identifies lexical and syntactic contexts that harbor lexicalized noun phrases and submits them to a machine-learning algorithm that classifies the lexical status of noun phrases extracted from the text. Results from several evaluations show that this evidence is relevant to the classification, and informal evidence from many other subject domains implies that the results can be generalized.
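As a rough illustration of the context-based evidence described above, the following Python sketch harvests the word sequences that follow the three cue contexts named in the abstract and counts them. It is not the thesis's method: the function name `candidates`, the `STOP` word list, the three-word length cap, and the punctuation-based chunking are all assumptions made for this example, and the machine-learning classification step is not shown.

```python
import re
from collections import Counter

# Cue contexts quoted in the abstract.
CUES = ("professor of", "department of", "studies in")

# Hypothetical stop list: a candidate phrase ends at the first of these words.
STOP = {"and", "or", "at", "in", "of", "the", "a", "an", "is", "was", "for", "with"}


def candidates(text, max_len=3):
    """Count word sequences of 2..max_len tokens that directly follow a cue."""
    counts = Counter()
    # Split on punctuation so that a candidate never crosses a clause boundary.
    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        tokens = chunk.split()
        for i in range(len(tokens) - 1):
            if " ".join(tokens[i:i + 2]) not in CUES:
                continue
            phrase = []
            for tok in tokens[i + 2:i + 2 + max_len]:
                # Stop at function words and at tokens with digits or hyphens.
                if tok in STOP or not tok.isalpha():
                    break
                phrase.append(tok)
            if len(phrase) >= 2:
                counts[" ".join(phrase)] += 1
    return counts


sample = (
    "She is a professor of computer graphics at the university. "
    "He joined the department of public safety in 1990. "
    "He published studies in mathematical morphology, "
    "and later became a professor of computer graphics."
)
print(candidates(sample).most_common())
```

A heuristic this simple overgenerates whenever a content word follows the phrase with no stop word or punctuation in between, which is one reason contextual counts of this kind are better treated as features for a classifier than as decisions in themselves.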
Item Type: | Thesis (Doctoral) |
---|---|
Thesis advisor: | Roberts, C |
Uncontrolled Keywords: | market; morphology; highway; traffic; aerodynamics; hydrology; safety; mathematics |
Date Deposited: | 16 Apr 2025 19:25 |
Last Modified: | 16 Apr 2025 19:25 |