lexrankScores

Document scoring with LexRank algorithm

Syntax

scores = lexrankScores(documents)

scores = lexrankScores(bag)

Description

scores = lexrankScores(documents) scores the specified documents for importance according to pairwise similarity values using the LexRank algorithm [1]. The function measures the similarity between documents using cosine similarity and computes importance using the PageRank algorithm.
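The following is a minimal sketch of the LexRank idea in base MATLAB that shows the two steps (a cosine similarity graph, then PageRank on that graph). It is not the implementation used by `lexrankScores`. It assumes a small hypothetical term-count matrix `counts` (rows are documents, columns are vocabulary terms), raw counts instead of the idf-modified cosine described in [1], and the common PageRank damping factor `d = 0.85`. The toolbox function may weight terms, threshold similarities, or choose parameters differently, so its scores need not match this sketch.

```matlab
% Sketch of LexRank: cosine similarity graph + PageRank (power iteration).
% Hypothetical data: rows = documents, columns = vocabulary terms.
counts = [2 1 1 0 0
          2 1 0 1 0
          1 0 0 0 2];
N = size(counts,1);

% Cosine similarity between every pair of documents.
X = counts ./ vecnorm(counts,2,2);  % scale each row to unit length
S = X * X';                         % S(i,j) = cosine similarity of docs i and j

% Row-normalize the similarity graph into a transition matrix.
P = S ./ sum(S,2);

% PageRank by power iteration: p = (1-d)/N + d*P'*p.
d = 0.85;                           % assumed damping factor
p = ones(N,1)/N;                    % start from a uniform distribution
for iter = 1:1000
    pNew = (1-d)/N + d*(P'*p);
    if norm(pNew - p,1) < 1e-10     % stop once the scores settle
        p = pNew;
        break
    end
    p = pNew;
end
scores = p
```

Because every document receives the term `(1-d)/N`, all scores in this sketch are positive and sum to 1; documents that are similar to many other documents accumulate a larger share.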

scores = lexrankScores(bag) scores documents encoded by a bag-of-words or bag-of-n-grams model.
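For example, you might clean the text before building the model so that stop words and punctuation do not dominate the similarity values. The preprocessing steps in this sketch (`lower`, `erasePunctuation`, `removeStopWords`) are illustrative choices, not requirements of `lexrankScores`, and the sentences are sample data.

```matlab
% Sketch: preprocess, build a bag-of-words model, then score the bag.
str = [
    "The quick brown fox jumped over the lazy dog."
    "The fast brown fox jumped over the lazy dog."
    "The lazy dog sat there and did nothing."];
documents = tokenizedDocument(str);
documents = lower(documents);             % case-insensitive matching
documents = erasePunctuation(documents);  % drop punctuation tokens
documents = removeStopWords(documents);   % drop words such as "the", "and"
bag = bagOfWords(documents);
scores = lexrankScores(bag);              % one score per document in bag
```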

Examples

Importance of Documents

Create an array of tokenized documents.

```matlab
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str)
```

```
documents = 
  4×1 tokenizedDocument:

    9 tokens: the quick brown fox jumped over the lazy dog
    9 tokens: the fast brown fox jumped over the lazy dog
    8 tokens: the lazy dog sat there and did nothing
    6 tokens: the other animals sat there watching
```

Calculate their LexRank scores.

```matlab
scores = lexrankScores(documents);
```

Visualize the scores in a bar chart.

```matlab
figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("LexRank Scores")
```

[Figure: bar chart titled "LexRank Scores", with Document on the x-axis and Score on the y-axis.]

Scores Using Bag-of-Words Model

Create a bag-of-words model from the text data in sonnets.csv.

```matlab
filename = "sonnets.csv";
tbl = readtable(filename,'TextType','string');
textData = tbl.Sonnet;
documents = tokenizedDocument(textData);
bag = bagOfWords(documents)
```

```
bag = 
  bagOfWords with properties:

        NumWords: 3527
          Counts: [154×3527 double]
      Vocabulary: ["From"    "fairest"    "creatures"    "we"    "desire"    "increase"    ","    "That"    "thereby"    "beauty's"    "rose"    "might"    "never"    "die"    "But"    "as"    "the"    "riper"    "should"    "by"    …    ] (1×3527 string)
    NumDocuments: 154
```

Calculate LexRank scores for each sonnet.

```matlab
scores = lexrankScores(bag);
```

Visualize the scores in a bar chart.

```matlab
figure
bar(scores)
xlabel("Document")
ylabel("Score")
title("LexRank Scores")
```

[Figure: bar chart titled "LexRank Scores", with one bar per sonnet; Document on the x-axis and Score on the y-axis.]

Input Arguments

`documents` — Input documents
`tokenizedDocument` array | string array | cell array of character vectors

Input documents, specified as a tokenizedDocument array, a string array of words, or a cell array of character vectors. If documents is not a tokenizedDocument array, then it must be a row vector representing a single document, where each element is a word. To specify multiple documents, use a tokenizedDocument array.
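A sketch of scoring several documents that start out as a cell array of character vectors: convert them with `tokenizedDocument` first, because `lexrankScores` accepts a cell array only as a row vector of words representing one document. The sentences in the hypothetical variable `c` are sample data.

```matlab
% Three documents stored as a column cell array of character vectors.
c = {'the cat sat on the mat'
     'the dog sat on the log'
     'a bird flew away'};
documents = tokenizedDocument(c);   % 3-by-1 tokenizedDocument array
scores = lexrankScores(documents);  % one score per document
```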

`bag` — Input model
`bagOfWords` object | `bagOfNgrams` object

Input bag-of-words or bag-of-n-grams model, specified as a bagOfWords object or a bagOfNgrams object. If bag is a bagOfNgrams object, then the function treats each n-gram as a single word.
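As a sketch, the following scores sample sentences by their shared bigrams rather than shared words. `bagOfNgrams` counts bigrams by default, and `lexrankScores` then treats each bigram as one vocabulary item. Using bigrams here is an illustrative choice, not a recommendation from this page.

```matlab
% Score documents by shared bigrams instead of shared single words.
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);
bag = bagOfNgrams(documents);   % default n-gram length is 2 (bigrams)
scores = lexrankScores(bag);    % each bigram acts as a single "word"
```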

Output Arguments

`scores` — LexRank scores
vector

LexRank scores, returned as an N-by-1 vector, where scores(i) is the score of the ith input document and N is the number of input documents.
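Because `scores(i)` lines up with the ith input document, you can rank documents directly from the output. This sketch keeps the two highest-scoring documents as a simple extractive summary, in the spirit of [1]; using `maxk` and keeping exactly two documents are assumptions for illustration.

```matlab
str = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"
    "the other animals sat there watching"];
documents = tokenizedDocument(str);
scores = lexrankScores(documents);

% scores(i) belongs to documents(i), so the top-scoring indices
% select the most central documents.
[topScores,idx] = maxk(scores,2);
summary = documents(idx)
```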

References

[1] Erkan, Güneş, and Dragomir R. Radev. "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization." Journal of Artificial Intelligence Research 22 (2004): 457-479.

Version History

Introduced in R2020a
