Text Analytics Toolbox seems to make many mistakes when detecting language and part of speech
Hi,
My input is a list of very basic English words, shown below. I would like to find the part of speech of each one.
kid
killer
kind
king
kiss
kitchen
knee
knife
knowledge
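% Each word becomes its own one-token document.
words = {'kid','killer','kind','king','kiss','kitchen','knee','knife','knowledge'};
words = string(words);
documents = tokenizedDocument(words);
% Tag every token with a part of speech, then list the token details.
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);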
The mistakes show up when I inspect 'tdetails' (see below).
Why does MATLAB think these words are German (the language should be 'en' for English) and that they are all adjectives (most of them should be nouns)?
tdetails =
  9×7 table
       Token       DocumentNumber    SentenceNumber    LineNumber     Type      Language    PartOfSpeech
    ___________    ______________    ______________    __________    _______    ________    ____________
    "kid"                1                 1               1         letters       de        adjective
    "killer"             2                 1               1         letters       de        adjective
    "kind"               3                 1               1         letters       de        adjective
    "king"               4                 1               1         letters       de        adjective
    "kiss"               5                 1               1         letters       de        adjective
    "kitchen"            6                 1               1         letters       de        adjective
    "knee"               7                 1               1         letters       de        adjective
    "knife"              8                 1               1         letters       de        adjective
    "knowledge"          9                 1               1         letters       de        adjective
Answers (1)
Christopher Creutzig
on 9 Mar 2020
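Language detection works much better on longer text. It does not do a dictionary lookup (and several of your words are valid German anyway); it relies on statistical information about letter distributions, and a single short word gives it very little to go on.
Part-of-speech detection likewise relies heavily on the context within a sentence, which an isolated word does not provide. With a full sentence, both work as expected: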
documents = tokenizedDocument("My kid is a king");
documents = addPartOfSpeechDetails(documents);
tokenDetails(documents)
ans =
  5×7 table
     Token     DocumentNumber    SentenceNumber    LineNumber     Type      Language     PartOfSpeech
    ______    ______________    ______________    __________    _______    ________    ______________
    "My"            1                 1               1         letters       en       pronoun
    "kid"           1                 1               1         letters       en       noun
    "is"            1                 1               1         letters       en       auxiliary-verb
    "a"             1                 1               1         letters       en       determiner
    "king"          1                 1               1         letters       en       noun
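If the input really has to be a list of isolated words, one option is to declare the language up front rather than rely on detection. The following is a minimal sketch, assuming a release in which tokenizedDocument accepts the 'Language' name-value option. It only addresses the Language column: the part-of-speech tags for single words may still be unreliable, because there is still no sentence context to work with.

```matlab
% Sketch: state the language explicitly instead of letting it be detected.
words = ["kid","killer","kind","king","kiss","kitchen","knee","knife","knowledge"];

% 'Language','en' marks every document as English (assumes the option is
% available in your release of Text Analytics Toolbox).
documents = tokenizedDocument(words, 'Language', 'en');
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);

% Show only the columns of interest. The PartOfSpeech values for isolated
% words should still be treated with caution.
disp(tdetails(:, {'Token','Language','PartOfSpeech'}))
```

For trustworthy part-of-speech tags, it is still better to tokenize the words in the sentences they came from, as in the example above.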