textanalytics.unicode.UTF32

Unicode UTF-32 string representation

Since R2021a

Description

The 32-bit Unicode transformation format (UTF-32) is a fixed-length Unicode encoding that uses exactly 32 bits for every code point.

Creation

Syntax

str32 = textanalytics.unicode.UTF32(str)

Description

str32 = textanalytics.unicode.UTF32(str) returns the Unicode UTF-32 representation of str. If str is an array, then str32(i) is the Unicode UTF-32 representation of the string str(i).
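
The array behavior described above can be sketched as follows. This is a minimal illustration that assumes Text Analytics Toolbox is installed; the variable names second, codePoints, and isSameLength are illustrative only and are not part of the API.

```matlab
% Sketch: convert a string array and inspect one element.
% Assumes Text Analytics Toolbox is available on the path.
str = ["An example of a short sentence."; "A second short sentence."];
str32 = textanalytics.unicode.UTF32(str);

% Per the description, str32(i) corresponds to str(i).
second = str32(2);
codePoints = second.Data;   % uint32 code points of str(2)

% Both sentences use only Basic Multilingual Plane characters, so the
% number of code points is expected to equal the number of UTF-16 code
% units reported by strlength.
isSameLength = numel(codePoints) == strlength(str(2));
```

Indexing into str32 before reading Data keeps the code points of each input string separate.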

Input Arguments

`str` — Input text
string array | character vector | cell array of character vectors

Input text, specified as a string array, character vector, or cell array of character vectors.

Example: ["An example of a short sentence."; "A second short sentence."]

Data Types: string | char | cell

Properties

`Data` — UTF-32 code points
`uint32` vector

UTF-32 code points, specified as a vector of integers with type uint32.

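MATLAB stores text as UTF-16 code units, so a character outside the Basic Multilingual Plane occupies two code units (a surrogate pair) in the input text but only one code point in Data. If the input string contains surrogate pairs, then Data has fewer elements than the input has code units.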

Data Types: uint32
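
The length difference noted for the Data property can be reproduced with base MATLAB alone. The sketch below decodes one surrogate pair by hand using standard UTF-16 arithmetic. It illustrates the relationship between code units and code points and is not a description of how textanalytics.unicode.UTF32 is implemented. All variable names are illustrative.

```matlab
% Build "Hello! " followed by U+1F600 from explicit UTF-16 code units so
% that the example does not depend on the source file encoding.
units = [double('Hello! ') 55357 56832];   % 55357 = 0xD83D, 56832 = 0xDE00
str = string(char(units));

numUnits = strlength(str);   % strlength counts UTF-16 code units: 9

% Decode the surrogate pair by hand (standard UTF-16 arithmetic).
hi = units(end-1);           % high surrogate
lo = units(end);             % low surrogate
codePoint = (hi - 55296) * 1024 + (lo - 56320) + 65536;   % 128512 = 0x1F600

% Replacing the pair with the single code point leaves 8 values, which
% matches the Data vector shown in the examples on this page.
codePoints = uint32([units(1:end-2) codePoint]);
```

Nine code units collapse to eight code points because the final two code units encode a single character.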

Object Functions

`characterCategories`	Unicode character categories
`hex`	Convert UTF-32 representation to hexadecimal values
`string`	Convert UTF-32 representation to string

Examples

Convert Text to Unicode UTF-32 String Representation

Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str)

str32 = 
  UTF32 with properties:

    Data: [72 101 108 108 111 33 32 128512]

Get Unicode Character Categories

Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str)

str32 = 
  UTF32 with properties:

    Data: [72 101 108 108 111 33 32 128512]

Get the Unicode character categories of str32 using the characterCategories function.

ucats = characterCategories(str32)

ucats = 1x1 cell array
    {[L    L    L    L    L    P    Z    S]}

The Unicode character categories "L", "P", "Z", and "S" correspond to "letter", "punctuation", "separator", and "symbol", respectively.

Get Detailed Unicode Character Categories

Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str)

str32 = 
  UTF32 with properties:

    Data: [72 101 108 108 111 33 32 128512]

Get the Unicode character categories of str32 using the characterCategories function. To return detailed Unicode character categories, set the 'Granularity' option to 'detailed'.

ucats = characterCategories(str32,'Granularity','detailed')

ucats = 1x1 cell array
    {[Lu    Ll    Ll    Ll    Ll    Po    Zs    So]}

The Unicode character categories "Lu", "Ll", "Po", "Zs", and "So" correspond to "uppercase letter", "lowercase letter", "other punctuation", "space separator", and "other symbol", respectively.

Convert UTF-32 String Representation to Hexadecimal Values

Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str)

str32 = 
  UTF32 with properties:

    Data: [72 101 108 108 111 33 32 128512]

Convert str32 to hexadecimal values using the hex function.

hexStr = hex(str32)

hexStr = 
" 0048  0065  006C  006C  006F  0021  0020 1F600"

Convert UTF-32 String Representation to String

Convert the string "Hello! 😀" to its Unicode UTF-32 string representation using the textanalytics.unicode.UTF32 function.

str = "Hello! 😀";
str32 = textanalytics.unicode.UTF32(str)

str32 = 
  UTF32 with properties:

    Data: [72 101 108 108 111 33 32 128512]

Convert str32 to a string using the string function.

str = string(str32)

str = 
"Hello! 😀"

References

[1] Unicode Standard Annex #19 UTF-32 https://www.unicode.org/reports/tr19/tr19-9.html

Version History

Introduced in R2021a
