In this study we investigate the parallelization of a key feature extraction method, the spectral correlation density (SCD) function, which is widely used in signal classification systems, particularly for classifying signals under low signal-to-noise ratio (SNR) conditions. To reduce the computational complexity of the SCD function, we introduce a method called Quarter SCD (QSCD) that extracts the features of a given signal by processing only a quarter of the input signal data. We then parallelize the QSCD for a general-purpose graphics processing unit (GPU) through architecture-specific optimization strategies. We present experimental evaluations that identify the parallelization configuration maximizing utilization of the GPU's thread-level parallelism. We show that the combined algorithmic and architecture-specific optimizations improve the throughput of the state-of-the-art GPU-based Full SCD from 120 signals/second to 2719 signals/second.
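To make the underlying computation concrete, the following is a minimal sketch of a standard SCD estimator (a time-smoothed cyclic periodogram: block FFTs whose outer products are averaged). This is an illustrative baseline only, not the paper's QSCD or its GPU implementation; the function name, block count, and windowing choice are assumptions for the example.

```python
import numpy as np

def scd_estimate(x, num_blocks=8):
    """Illustrative SCD estimate via averaged cyclic periodograms.

    Splits x into num_blocks segments, windows each, and averages the
    outer products X(f1) X*(f2). The cycle frequency is alpha = f1 - f2
    and the spectral frequency is f = (f1 + f2) / 2.
    """
    seg_len = len(x) // num_blocks
    window = np.hanning(seg_len)
    acc = np.zeros((seg_len, seg_len), dtype=complex)
    for b in range(num_blocks):
        seg = x[b * seg_len:(b + 1) * seg_len]
        X = np.fft.fft(seg * window)
        # Cyclic periodogram of this block; averaging over blocks
        # is the time-smoothing step.
        acc += np.outer(X, np.conj(X)) / seg_len
    return acc / num_blocks
```

For a pure tone the estimate concentrates on the diagonal bin matching the tone's frequency, which is a quick sanity check that the smoothing behaves as expected.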