Arrange words and phrases separated by semi-colon into a single column

I am analyzing some journal articles using their keywords. I extracted the keywords from a publication, and they are saved in the form shown in the figure.
Keywords and key phrases are separated by a semicolon in each cell. Since my smallest time interval is one month, I'd like to arrange all the keywords and phrases in columns, one for each month of the year. For example, all the words and phrases in the image would go under a single column 'JAN 2001'. My ultimate goal is a frequency analysis of these keywords, and I want 'galaxies' to be counted separately from 'iras galaxies' or 'elliptic galaxies'. Repeated keywords within a month should be kept, since repetition shows how much that concept is trending. How can I split the strings at the semicolons and arrange them by month? Thank you!

Accepted Answer

Steve Eddins on 29 Sep 2020
% Simulated data covering two months.
Phrases = ["EARLY-TYPE GALAXIES; X-RAY; DENSE CLUSTERS"
"LOCAL CONVERGENCE DEPTH; TULLY_FISHER OBSERVATIONS; X_RAY"
"SEYFERT-GALAXIES; PERIODICITY; ASSOCIATIONS"
"LYMAN-LIMIT ABSORPTION; LUMINOSITY FUNCTION"];
Month = ["JAN" "JAN" "FEB" "FEB"]';
Year = [2001 2001 2001 2001]';
t = table(Phrases,Month,Year);
% Group all the phrase sets by month and year.
t2 = varfun(@(x) {x(:)},t,'GroupingVariables',["Month" "Year"]);
% Grab the grouped phrase sets from t2 as a cell array, one cell per
% month/year.
c = t2.Fun_Phrases;
% Join the individual phrase sets by a semicolon. Use UniformOutput = false
% to keep it in a cell array.
c2 = cellfun(@(x) join(x,";"),c,"UniformOutput",false);
% Now split by semicolon and remove leading and trailing blanks.
c3 = cellfun(@(x) strtrim(split(x,";")),c2,'UniformOutput',false);
% Put back in a table.
t3 = t2(:,["Month" "Year"]);
t3.Phrases = c3;
At this point, here's what t3 looks like:
  3 Comments
Walter Roberson on 1 Oct 2020
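% Build a datetime from month and year so the rows can be put in
% chronological order (grouping alone orders the months alphabetically).
t3.Datetime = datetime(t3.Month + " " + t3.Year, 'InputFormat', 'MMM uuuu');
sortrows(t3, 'Datetime')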
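
For the frequency analysis mentioned in the question, a minimal sketch (not part of the answer above) could count whole phrases with unique and accumarray. The variable janPhrases and its contents are made up for illustration; it stands in for one cell of t3.Phrases, that is, a string column of phrases that have already been split and trimmed.

```matlab
% Hypothetical stand-in for one month's cell of t3.Phrases: a string column
% vector of phrases that were already split on ";" and trimmed.
janPhrases = ["EARLY-TYPE GALAXIES"; "X-RAY"; "DENSE CLUSTERS"; "X-RAY"; "GALAXIES"];

% unique returns the distinct phrases (sorted) and, in idx, the position of
% each original phrase within that distinct list.
[keyword, ~, idx] = unique(janPhrases);

% accumarray adds 1 for every occurrence, giving one count per distinct phrase.
count = accumarray(idx, 1);

% Collect the result in a table, most frequent phrases first.
freq = sortrows(table(keyword, count), 'count', 'descend');
disp(freq)
```

Because whole phrases are compared, "GALAXIES" and "EARLY-TYPE GALAXIES" are counted as different keywords, and a phrase repeated within the month simply gets a higher count. To cover every month, the same steps could be applied to each cell of t3.Phrases, for example in a loop or with cellfun and 'UniformOutput' set to false.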



Release

R2020a
