Evaluate translation or summarization with BLEU similarity score

The BiLingual Evaluation Understudy (BLEU) scoring algorithm evaluates the similarity between a candidate document and a collection of reference documents. Use the BLEU score to evaluate the quality of document translation and summarization models.

`score = bleuEvaluationScore(candidate,references)` returns the BLEU similarity score between the specified candidate document and the reference documents. The function computes n-gram overlaps between `candidate` and `references` for n-gram lengths one through four, with equal weighting. For more information, see BLEU Score.
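For example, a minimal sketch of the default syntax (the document strings below are illustrative, not from a real dataset):

```matlab
% Tokenize a candidate document and a collection of reference documents.
candidate = tokenizedDocument("The fast brown fox jumped over the lazy dog.");
references = tokenizedDocument([
    "The quick brown fox jumped over the lazy dog."
    "A fast brown fox jumped over a lazy dog."]);

% Compute the BLEU score using the default settings: n-gram lengths
% one through four, each weighted equally (0.25).
score = bleuEvaluationScore(candidate,references)
```

A score closer to one indicates stronger n-gram overlap between the candidate and the references.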

`score = bleuEvaluationScore(candidate,references,'NgramWeights',ngramWeights)` uses the specified n-gram weighting, where `ngramWeights(i)` corresponds to the weight for n-grams of length `i`. The length of the weight vector determines the range of n-gram lengths to use for the BLEU score evaluation.
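Continuing the sketch above, the `'NgramWeights'` option restricts and reweights the n-gram lengths. Here the two-element weight vector is an illustrative choice that limits the evaluation to unigrams and bigrams:

```matlab
candidate = tokenizedDocument("The fast brown fox jumped over the lazy dog.");
references = tokenizedDocument([
    "The quick brown fox jumped over the lazy dog."
    "A fast brown fox jumped over a lazy dog."]);

% A length-2 weight vector evaluates n-grams of lengths 1 and 2 only,
% weighting each n-gram length equally.
score = bleuEvaluationScore(candidate,references,'NgramWeights',[0.5 0.5])
```

Because shorter n-grams are easier to match, scores computed with shorter weight vectors are typically higher than the default four-gram score for the same documents.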

[1] Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. "BLEU: A Method for Automatic Evaluation of Machine Translation." In *Proceedings of the 40th Annual Meeting on Association for Computational Linguistics*, pp. 311–318. Association for Computational Linguistics, 2002.

`tokenizedDocument` | `rougeEvaluationScore` | `bm25Similarity` | `cosineSimilarity` | `textrankScores` | `lexrankScores` | `mmrScores` | `extractSummary`