Visualize Text Data Using Word Clouds
This example shows how to visualize text data using word clouds.
Text Analytics Toolbox extends the functionality of the wordcloud (MATLAB) function. It adds support for creating word clouds directly from string arrays, as well as from bag-of-words models and LDA topics.
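The following minimal sketch shows the bag-of-words and LDA inputs. It builds both from a few of the report descriptions that appear later in this example. The preprocessing steps (lowercasing, erasing punctuation, removing stop words), the choice of two LDA topics, and the variable names are illustrative assumptions and are not part of the original example.

```matlab
% A few report descriptions (taken from the factory reports data).
str = [
    "Items are occasionally getting stuck in the scanner spools."
    "Loud rattling and banging sounds are coming from assembler pistons."
    "There are cuts to the power when starting the plant."
    "Fried capacitors in the assembler."
    "Mixer tripped the fuses."
    "A fuse is blown in the mixer."];

% Tokenize and clean the text (assumed preprocessing choices).
documents = tokenizedDocument(lower(str));
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

% Word cloud from a bag-of-words model.
bag = bagOfWords(documents);
figure
wordcloud(bag);
title("Bag-of-Words Model")

% Word cloud from topic 1 of an LDA model with 2 topics.
rng("default")   % make the LDA fit reproducible
mdl = fitlda(bag,2,'Verbose',0);
figure
wordcloud(mdl,1);
title("LDA Topic 1")
```

With so few documents, the LDA topics are not meaningful. The block only illustrates the `wordcloud(bag)` and `wordcloud(mdl,topicIdx)` call forms.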
Load the example data. The file factoryReports.csv contains factory reports, including a text description and categorical labels for each event.
filename = "factoryReports.csv";
tbl = readtable(filename,'TextType','string');
Extract the text data from the Description column.
textData = tbl.Description;
textData(1:10)
ans = 10×1 string
"Items are occasionally getting stuck in the scanner spools."
"Loud rattling and banging sounds are coming from assembler pistons."
"There are cuts to the power when starting the plant."
"Fried capacitors in the assembler."
"Mixer tripped the fuses."
"Burst pipe in the constructing agent is spraying coolant."
"A fuse is blown in the mixer."
"Things continue to tumble off of the belt."
"Falling items from the conveyor belt."
"The scanner reel is split, it will soon begin to curve."
Create a word cloud from the reports.
figure
wordcloud(textData);
title("Factory Reports")
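When you pass a string array, wordcloud counts the words for you. If you want to control the preprocessing yourself, one possible approach is to build a bag-of-words model explicitly and plot that model instead. The cleaning steps below and the choice of ten words for topkwords are assumptions, and the resulting word cloud may differ from the one created directly from the string array.

```matlab
% Reload the data so that this block runs on its own.
tbl = readtable("factoryReports.csv",'TextType','string');
textData = tbl.Description;

% Explicit preprocessing (assumed choices).
documents = tokenizedDocument(textData);
documents = lower(documents);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

% Count the words in all reports.
bag = bagOfWords(documents);

% Plot the word cloud of the bag-of-words model.
figure
wordcloud(bag);
title("Factory Reports (Preprocessed)")

% List the ten most frequent words and their counts.
topWords = topkwords(bag,10)
```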
Compare the words in the reports with the labels "Leak" and "Mechanical Failure". Create a word cloud of the reports for each of these labels, using blue words for "Leak" and magenta words for "Mechanical Failure".
figure
labels = tbl.Category;

subplot(1,2,1)
idx = labels == "Leak";
wordcloud(textData(idx),'Color','blue');
title("Leak")

subplot(1,2,2)
idx = labels == "Mechanical Failure";
wordcloud(textData(idx),'Color','magenta');
title("Mechanical Failure")
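The comparison above repeats the same steps for each label. The following minimal sketch generalizes them to any list of labels by using a loop. The variable names labelList and colorList are hypothetical.

```matlab
tbl = readtable("factoryReports.csv",'TextType','string');
textData = tbl.Description;
labels = tbl.Category;

% Hypothetical list of labels to compare, with one word color per label.
labelList = ["Leak" "Mechanical Failure"];
colorList = {'blue','magenta'};

figure
numLabels = numel(labelList);
for i = 1:numLabels
    subplot(1,numLabels,i)
    idx = labels == labelList(i);   % reports with this label
    wordcloud(textData(idx),'Color',colorList{i});
    title(labelList(i))
end
```

To add a label, extend both lists. Each label must match at least one report.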
Compare the words in the reports with urgency "Low", "Medium", and "High".
figure
urgency = tbl.Urgency;

subplot(1,3,1)
idx = urgency == "Low";
wordcloud(textData(idx));
title("Urgency: Low")

subplot(1,3,2)
idx = urgency == "Medium";
wordcloud(textData(idx));
title("Urgency: Medium")

subplot(1,3,3)
idx = urgency == "High";
wordcloud(textData(idx));
title("Urgency: High")
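Word clouds show relative frequency visually. To read off the exact counts, one option is to build a bag-of-words model for each urgency level and list its most frequent words with topkwords. This sketch assumes the same preprocessing as the earlier examples, and the choice of five words per level is arbitrary.

```matlab
tbl = readtable("factoryReports.csv",'TextType','string');
textData = tbl.Description;
urgency = tbl.Urgency;

levels = ["Low" "Medium" "High"];
topWordsByLevel = cell(1,numel(levels));
for i = 1:numel(levels)
    idx = urgency == levels(i);
    % Assumed preprocessing: lowercase, erase punctuation, remove stop words.
    documents = tokenizedDocument(lower(textData(idx)));
    documents = removeStopWords(erasePunctuation(documents));
    bag = bagOfWords(documents);
    % Five most frequent words for this urgency level.
    topWordsByLevel{i} = topkwords(bag,5);
    disp("Urgency: " + levels(i))
    disp(topWordsByLevel{i})
end
```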
Compare the words in reports with costs over $100 to the words in reports with costs over $1,000. Create a word cloud for each group, with highlight colors blue and red, respectively. The second group is a subset of the first.
cost = tbl.Cost;
idx = cost > 100;
figure
wordcloud(textData(idx),'HighlightColor','blue');
title("Cost > $100")

idx = cost > 1000;
figure
wordcloud(textData(idx),'HighlightColor','red');
title("Cost > $1,000")
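Because every report with a cost over $1,000 also has a cost over $100, the two word clouds above share reports. To compare costs in the hundreds with costs in the thousands without overlap, one option is to bin the costs with discretize so that each report falls into at most one group. The bin edges below are an assumption about what "hundreds" and "thousands" mean.

```matlab
tbl = readtable("factoryReports.csv",'TextType','string');
textData = tbl.Description;
cost = tbl.Cost;

% Bin 1: 100 <= cost < 1000. Bin 2: cost >= 1000. Other costs map to NaN.
edges = [100 1000 Inf];
bins = discretize(cost,edges);

figure
subplot(1,2,1)
wordcloud(textData(bins == 1),'HighlightColor','blue');
title("$100 <= Cost < $1,000")

subplot(1,2,2)
wordcloud(textData(bins == 2),'HighlightColor','red');
title("Cost >= $1,000")
```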
See Also
wordcloud | tokenizedDocument | bagOfWords