TY - GEN
T1 - Linking corpus characteristics to performance of semantic annotation systems for biosystematic descriptions
AU - Cui, Hong
PY - 2010
Y1 - 2010
N2  - Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies across collections. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model which, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but more importantly provides a valuable decision-making tool for end users. We began this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more individual systems will offer such tools as a regular component of their delivery packages.
AB  - Digitizing and repurposing taxonomic descriptions of living organisms is an urgent task facing biodiversity informatics researchers. Semantic annotation is the essential technology that makes the reuse and repurposing of taxonomic descriptions possible. However, the performance of annotation systems often varies across collections. Given the large content and structural variations inherent in different collections of taxonomic descriptions, this paper examines corpus characteristic measures in an attempt to establish a performance prediction model which, given a small set of samples, predicts a system's performance on a collection. The prediction model helps deepen our understanding of the strengths and weaknesses of an annotation system, but more importantly provides a valuable decision-making tool for end users. We began this research using the MARTT (Markuper for Taxonomic Treatments) system as a base. Although a universal performance prediction model for all systems and all corpora may not be possible at this time, we hope that more individual systems will offer such tools as a regular component of their delivery packages.
KW - Corpus characteristics
KW  - Performance evaluation
KW - Performance prediction
KW - Semantic annotation systems
UR - http://www.scopus.com/inward/record.url?scp=77954480345&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954480345&partnerID=8YFLogxK
U2 - 10.1109/ICBBT.2010.5479002
DO - 10.1109/ICBBT.2010.5479002
M3 - Conference contribution
AN - SCOPUS:77954480345
SN - 9781424467761
T3 - ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology
SP - 92
EP - 96
BT - ICBBT 2010 - 2010 International Conference on Bioinformatics and Biomedical Technology
T2 - 2010 International Conference on Bioinformatics and Biomedical Technology, ICBBT 2010
Y2 - 16 April 2010 through 18 April 2010
ER -