TY - JOUR
T1 - Task-based evaluation of segmentation algorithms for diffusion-weighted MRI without using a gold standard
AU - Jha, Abhinav K.
AU - Kupinski, Matthew A.
AU - Rodríguez, Jeffrey J.
AU - Stephen, Renu M.
AU - Stopeck, Alison T.
PY - 2012/7/7
Y1 - 2012/7/7
AB - In many studies, the estimation of the apparent diffusion coefficient (ADC) of lesions in visceral organs in diffusion-weighted (DW) magnetic resonance images requires an accurate lesion-segmentation algorithm. To evaluate these lesion-segmentation algorithms, region-overlap measures are currently used. However, the end task for the DW images is accurate ADC estimation, and the region-overlap measures do not evaluate the segmentation algorithms on this task. Moreover, these measures rely on the existence of a gold-standard segmentation of the lesion, which is typically unavailable. In this paper, we study the problem of task-based evaluation of segmentation algorithms in DW imaging in the absence of a gold standard. We first show that using manual segmentations instead of gold-standard segmentations for this task-based evaluation is unreliable. We then propose a method to compare the segmentation algorithms that does not require gold-standard or manual segmentation results. The no-gold-standard method estimates the bias and the variance of the error between the true ADC values and the ADC values estimated using the automated segmentation algorithm. The method can be used to rank the segmentation algorithms on the basis of both the ensemble mean square error and precision. We also propose consistency checks for this evaluation technique.
UR - http://www.scopus.com/inward/record.url?scp=84862734500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862734500&partnerID=8YFLogxK
U2 - 10.1088/0031-9155/57/13/4425
DO - 10.1088/0031-9155/57/13/4425
M3 - Article
C2 - 22713231
AN - SCOPUS:84862734500
SN - 0031-9155
VL - 57
SP - 4425
EP - 4446
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 13
ER -