Control-Flow Graph (CFG) similarity is a core technique in many areas, including malware detection and software plagiarism detection. While many algorithms have been proposed in the literature, their relative strengths and weaknesses have not been previously studied. Moreover, it is not even clear how to perform such an evaluation. In this paper we therefore propose the first methodology for evaluating CFG similarity algorithms with respect to accuracy and efficiency. At the heart of our methodology is a technique to automatically generate benchmark graphs, CFGs of known edit distances. We show the result of applying our methodology to four popular algorithms. Our results show that an algorithm proposed by Hu et al. is most efficient both in terms of running time and accuracy.