SLIDE 7 BLEU (Bilingual Evaluation Understudy)
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02). Association for Computational Linguistics, Stroudsburg, PA, USA, 311-318. DOI: https://doi.org/10.3115/1073083.1073135
Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. 2013. Lexical statistical machine translation for language migration. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2013). ACM, New York, NY, USA, 651-654. DOI: https://doi.org/10.1145/2491411.2494584
BLEU is a popular metric in Statistical Machine Translation (SMT) that measures translation quality by how accurately n-grams in the candidate translation match n-grams in the reference, for various values of n. Here, BLEU is applied to evaluate lexical matching when migrating equivalent API usage sequences.
Usually, N is set to 4 and w_n = 1/N. BP is the brevity penalty, c is the length of the candidate translation, and r is the length of the reference corpus. p_n measures the overlap between the bag of n-grams appearing in the candidate sentences and the bag of n-grams appearing in the reference sentences.
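The formula these symbols refer to is the standard BLEU definition from Papineni et al. (2002):

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```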
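A minimal Python sketch of this computation, for a single candidate and a single reference; the function names and the toy token sequences are illustrative, not from the papers above:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, N=4):
    """Sentence-level BLEU with one reference and uniform weights w_n = 1/N."""
    log_precision_sum = 0.0
    for n in range(1, N + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Modified (clipped) n-gram precision p_n.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        p_n = overlap / total
        if p_n == 0:
            return 0.0  # any zero precision drives the geometric mean to zero
        log_precision_sum += (1.0 / N) * math.log(p_n)

    # Brevity penalty: c = candidate length, r = reference length.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(log_precision_sum)

# Example: scoring a migrated API usage sequence against a reference sequence.
cand = "list add get size".split()
ref = "list add get size clear".split()
print(round(bleu(cand, ref), 3))  # all n-gram precisions are 1, so only BP lowers the score
```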