java.lang.Object
  com.ibm.icu.lang.UCharacter
public final class UCharacter
The UCharacter class provides extensions to the java.lang.Character class. These extensions support more Unicode properties and, together with the UTF16 class, support supplementary characters (those with code points above U+FFFF). Each ICU release supports the latest version of Unicode available at that time.
Code points are represented in this API using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
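As a brief illustration of representing code points as ints, the sketch below walks a UTF-16 string one code point at a time. It uses only JDK methods (String.codePointAt and java.lang.Character.charCount, which the summary below lists as covered by this class); the class name CodePointWalk and its helper method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalk {
    /** Returns the code points of s in order, one int per code point. */
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);     // a surrogate pair is read as one int
            out.add(cp);
            i += Character.charCount(cp);  // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    public static void main(String[] args) {
        // "a" followed by U+1D11E, which is stored as a surrogate pair in UTF-16
        String s = "a\uD834\uDD1E";
        for (int cp : codePoints(s)) {
            System.out.printf("U+%04X supplementary=%b%n",
                    cp, Character.isSupplementaryCodePoint(cp));
        }
    }
}
```

Advancing the index by charCount rather than by one is what keeps a supplementary character from being split into its two surrogates.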
To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
E.g. in Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Alternatively, copy the files uprops.dat and unames.icu from the icu4j source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory $ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode properties, the main differences between UCharacter and Character are:
Further detailed differences can be determined by running the program com.ibm.icu.dev.test.lang.UCharacterCompare.
In addition to Java compatibility functions, which calculate derived properties, this API provides low-level access to the Unicode Character Database.
Unicode assigns each code point (not just assigned characters) values for many properties. Most of them are simple boolean flags, or constants from a small enumerated list. For some properties, values are strings or other relatively more complex types.
For more information see "About the Unicode Character Database" (http://www.unicode.org/ucd/) and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html).
There are also functions that provide easy migration from C/POSIX functions like isblank(). Their use is generally discouraged because the C/POSIX standards do not define their semantics beyond the ASCII range, which means that different implementations exhibit very different behavior. Instead, Unicode properties should be used directly.
There are also only a few, broad C/POSIX character classes, and they tend to be used for conflicting purposes. For example, the "isalpha()" class is sometimes used to determine word boundaries, while a more sophisticated approach would at least distinguish initial letters from continuation characters (the latter including combining marks). (In ICU, BreakIterator is the most sophisticated API for word boundaries.) Another example: There is no "istitle()" class for titlecase characters.
ICU 3.4 and later provide API access for all twelve C/POSIX character classes. ICU implements them according to the Standard Recommendations in Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
API access for C/POSIX character classes is as follows:
- alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
- lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
- upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
- punct: ((1<
The C/POSIX character classes are also available in UnicodeSet patterns,
using patterns like [:graph:] or \p{graph}.
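The ASCII-only pitfall of C/POSIX classes can be illustrated without ICU. The sketch below is a JDK stand-in, not the ICU API: java.util.regex treats \p{Alpha} as ASCII-only by default, and follows UTS #18 Annex C when Pattern.UNICODE_CHARACTER_CLASS is set. The class name PosixClassDemo and its helpers are hypothetical.

```java
import java.util.regex.Pattern;

final class PosixClassDemo {
    // Default java.util.regex behavior: \p{Alpha} covers ASCII letters only.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");

    // With UNICODE_CHARACTER_CLASS the JDK uses the Unicode Alphabetic property.
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean asciiAlpha(String s) {
        return ASCII_ALPHA.matcher(s).matches();
    }

    static boolean unicodeAlpha(String s) {
        return UNICODE_ALPHA.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u00E9t\u00E9"; // "ete" with acute accents on both e's
        System.out.println("ASCII alpha:   " + asciiAlpha(word));
        System.out.println("Unicode alpha: " + unicodeAlpha(word));
    }
}
```

The same input is classified differently by the two patterns, which is the kind of divergence that makes direct use of Unicode properties preferable.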
Note: There are several ICU (and Java) whitespace functions.
Comparison:
- isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
most of general categories "Z" (separators) + most whitespace ISO controls
(including no-break spaces, but excluding IS1..IS4 and ZWSP)
- isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
- isSpaceChar: just Z (including no-break spaces)
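The difference between the last two functions in this comparison can be seen with the JDK methods of the same names, whose semantics isWhitespace and isSpaceChar are described as following. This is a sketch using java.lang.Character only; isUWhiteSpace has no JDK counterpart here and is not shown. The class name WhitespaceDemo is hypothetical.

```java
final class WhitespaceDemo {
    /** Formats both JDK whitespace tests for one code point. */
    static String describe(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0020)); // SPACE: both tests accept it
        System.out.println(describe(0x0009)); // TAB: an ISO control, not in Z
        System.out.println(describe(0x00A0)); // NO-BREAK SPACE: in Z, but no-break
    }
}
```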
This class is not subclassable.
Get the numeric value for a Unicode code point as defined in the Unicode Character Database. A "double" return type is necessary because some numeric values are fractions, negative, or too large for int. For characters without any numeric value in the Unicode Character Database, this function returns NO_NUMERIC_VALUE.
API Change: In release 2.2 and prior, this API had a return type of int and returned -1 when the argument ch did not have a corresponding numeric value. This has been changed to synchronize with ICU4C.
See Also: UCharacterEnums
Nested Class Summary
static interface UCharacter.DecompositionType
Decomposition Type constants.
static interface UCharacter.EastAsianWidth
East Asian Width constants.
static interface UCharacter.GraphemeClusterBreak
Grapheme Cluster Break constants.
static interface UCharacter.HangulSyllableType
Hangul Syllable Type constants.
static interface UCharacter.JoiningGroup
Joining Group constants.
static interface UCharacter.JoiningType
Joining Type constants.
static interface UCharacter.LineBreak
Line Break constants.
static interface UCharacter.NumericType
Numeric Type constants.
static interface UCharacter.SentenceBreak
Sentence Break constants.
static class UCharacter.UnicodeBlock
A family of character subsets representing the character blocks in the
Unicode specification, generated from Unicode Data file Blocks.txt.
static interface UCharacter.WordBreak
Word Break constants.
Field Summary
static int FOLD_CASE_DEFAULT
Option value for case folding: use default mappings defined in CaseFolding.txt.
static int FOLD_CASE_EXCLUDE_SPECIAL_I
Option value for case folding: exclude the mappings for dotted I
and dotless i marked with 'I' in CaseFolding.txt.
static int MAX_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char MAX_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience.
static char MAX_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int MAX_RADIX
Compatibility constant for Java Character's MAX_RADIX.
static char MAX_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int MAX_VALUE
The highest Unicode code point value (scalar value) according to the
Unicode Standard.
static int MIN_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char MIN_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience.
static char MIN_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int MIN_RADIX
Compatibility constant for Java Character's MIN_RADIX.
static int MIN_SUPPLEMENTARY_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char MIN_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int MIN_VALUE
The lowest Unicode code point value.
static double NO_NUMERIC_VALUE
Special value that is returned by getUnicodeNumericValue(int) when no
numeric value is defined for a code point.
static int REPLACEMENT_CHAR
Unicode value used when translating into Unicode encoding form and there
is no existing character.
static int SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points
static int TITLECASE_NO_BREAK_ADJUSTMENT
Do not adjust the titlecasing indexes from BreakIterator::next() indexes;
titlecase exactly the characters at breaks from the iterator.
static int TITLECASE_NO_LOWERCASE
Do not lowercase non-initial parts of words when titlecasing.
Method Summary
static int charCount(int cp)
Cover the JDK 1.5 API, for convenience.
static int codePointAt(char[] text, int index)
Cover the JDK 1.5 API, for convenience.
static int codePointAt(char[] text, int index, int limit)
Cover the JDK 1.5 API, for convenience.
static int codePointAt(CharSequence seq, int index)
Cover the JDK 1.5 API, for convenience.
static int codePointBefore(char[] text, int index)
Cover the JDK 1.5 API, for convenience.
static int codePointBefore(char[] text, int index, int limit)
Cover the JDK 1.5 API, for convenience.
static int codePointBefore(CharSequence seq, int index)
Cover the JDK 1.5 API, for convenience.
static int codePointCount(char[] text, int start, int limit)
Cover the JDK API, for convenience.
static int codePointCount(CharSequence text, int start, int limit)
Cover the JDK API, for convenience.
static int digit(int ch)
Retrieves the numeric value of a decimal digit code point.
static int digit(int ch, int radix)
Retrieves the numeric value of a decimal digit code point.
static int foldCase(int ch, boolean defaultmapping)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
static int foldCase(int ch, int options)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
static String foldCase(String str, boolean defaultmapping)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
static String foldCase(String str, int options)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
static char forDigit(int digit, int radix)
Provide the java.lang.Character forDigit API, for convenience.
static VersionInfo getAge(int ch)
Get the "age" of the code point.
static int getCharFromExtendedName(String name)
Find a Unicode character by its name and return its code
point value.
static int getCharFromName(String name)
Find a Unicode code point by its most current Unicode name and
return its code point value.
static int getCharFromName1_0(String name)
Find a Unicode character by its version 1.0 Unicode name and return
its code point value.
static int getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character.
static int getCodePoint(char lead, char trail)
Returns a code point corresponding to the two UTF16 characters.
static int getCombiningClass(int ch)
Gets the combining class of the argument code point.
static int getDirection(int ch)
Returns the Bidirection property of a code point.
static byte getDirectionality(int cp)
Cover the JDK API, for convenience.
static String getExtendedName(int ch)
Retrieves a name for a valid code point.
static ValueIterator getExtendedNameIterator()
Gets an iterator for character names, iterating over code points.
static int getHanNumericValue(int ch)
Return numeric value of Han code points.
static int getIntPropertyMaxValue(int type)
Get the maximum value for an integer/binary Unicode property.
static int getIntPropertyMinValue(int type)
Get the minimum value for an integer/binary Unicode property type.
static int getIntPropertyValue(int ch, int type)
Gets the property value for a Unicode property type of a code point.
static String getISOComment(int ch)
Get the ISO 10646 comment for a character.
static int getMirror(int ch)
Maps the specified code point to a "mirror-image" code point.
static String getName(int ch)
Retrieve the most current Unicode name of the argument code point, or
null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static String getName(String s, String separator)
Gets the names for each of the characters in a string.
static String getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code
point, or null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static ValueIterator getName1_0Iterator()
Gets an iterator for character names, iterating over code points.
static ValueIterator getNameIterator()
Gets an iterator for character names, iterating over code points.
static int getNumericValue(int ch)
Returns the numeric value of the code point as a nonnegative
integer.
static int getPropertyEnum(String propertyAlias)
Return the UProperty selector for a given property name, as
specified in the Unicode database file PropertyAliases.txt.
static String getPropertyName(int property, int nameChoice)
Return the Unicode name for a given property, as given in the
Unicode database file PropertyAliases.txt.
static int getPropertyValueEnum(int property, String valueAlias)
Return the property value integer for a given value name, as
specified in the Unicode database file PropertyValueAliases.txt.
static String getPropertyValueName(int property, int value, int nameChoice)
Return the Unicode name for a given property value, as given in
the Unicode database file PropertyValueAliases.txt.
static String getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice)
Deprecated. This API is ICU internal only.
static int getType(int ch)
Returns a value indicating a code point's Unicode category.
static RangeValueIterator getTypeIterator()
Gets an iterator for character types, iterating over code points.
static double getUnicodeNumericValue(int ch)
Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
static VersionInfo getUnicodeVersion()
Gets the version of Unicode data used.
static boolean hasBinaryProperty(int ch, int property)
Check a binary Unicode property for a code point.
static boolean isBaseForm(int ch)
Determines whether the specified code point is of base form.
static boolean isBMP(int ch)
Determines if the code point is in the BMP plane.
static boolean isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date
Unicode standard.
static boolean isDigit(int ch)
Determines if a code point is a Java digit.
static boolean isHighSurrogate(char ch)
Cover the JDK 1.5 API, for convenience.
static boolean isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an
ignorable character in a Unicode identifier.
static boolean isISOControl(int ch)
Determines if the specified code point is an ISO control character.
static boolean isJavaIdentifierPart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierPart.
static boolean isJavaIdentifierStart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierStart.
static boolean isJavaLetter(int cp)
Deprecated. ICU 3.4 (Java)
static boolean isJavaLetterOrDigit(int cp)
Deprecated. ICU 3.4 (Java)
static boolean isLegal(int ch)
A code point is illegal if and only if it is:
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0x xxFFFF or 0x xxFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
static boolean isLegal(String str)
A string is legal iff all its code points are legal.
static boolean isLetter(int ch)
Determines if the specified code point is a letter.
static boolean isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit.
static boolean isLowerCase(int ch)
Determines if the specified code point is a lowercase character.
static boolean isLowSurrogate(char ch)
Cover the JDK 1.5 API, for convenience.
static boolean isMirrored(int ch)
Determines whether the code point has the "mirrored" property.
static boolean isPrintable(int ch)
Determines whether the specified code point is a printable character
according to the Unicode standard.
static boolean isSpace(int ch)
Deprecated. ICU 3.4 (Java)
static boolean isSpaceChar(int ch)
Determines if the specified code point is a Unicode specified space
character, i.e. if code point is in the category Zs, Zl and Zp.
static boolean isSupplementary(int ch)
Determines if the code point is a supplementary character.
static boolean isSupplementaryCodePoint(int cp)
Cover the JDK 1.5 API, for convenience.
static boolean isSurrogatePair(char high, char low)
Cover the JDK 1.5 API, for convenience.
static boolean isTitleCase(int ch)
Determines if the specified code point is a titlecase character.
static boolean isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property.
static boolean isULowercase(int ch)
Check if a code point has the Lowercase Unicode property.
static boolean isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode
identifier other than the starting character.
static boolean isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first
character in a Unicode identifier.
static boolean isUpperCase(int ch)
Determines if the specified code point is an uppercase character.
static boolean isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property.
static boolean isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property.
static boolean isValidCodePoint(int cp)
Cover the JDK 1.5 API, for convenience.
static boolean isWhitespace(int ch)
Determines if the specified code point is a white space character.
static int offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset)
Cover the JDK API, for convenience.
static int offsetByCodePoints(CharSequence text, int index, int codePointOffset)
Cover the JDK API, for convenience.
static char[] toChars(int cp)
Cover the JDK 1.5 API, for convenience.
static int toChars(int cp, char[] dst, int dstIndex)
Cover the JDK 1.5 API, for convenience.
static int toCodePoint(char high, char low)
Cover the JDK 1.5 API, for convenience.
static int toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code
point has no lowercase equivalent, the code point itself is returned.
static String toLowerCase(Locale locale, String str)
Gets lowercase version of the argument string.
static String toLowerCase(String str)
Gets lowercase version of the argument string.
static String toLowerCase(ULocale locale, String str)
Gets lowercase version of the argument string.
static String toString(int ch)
Converts argument code point and returns a String object representing
the code point's value in UTF16 format.
static int toTitleCase(int ch)
Converts the code point argument to titlecase.
static String toTitleCase(Locale locale, String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
static String toTitleCase(String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
static String toTitleCase(ULocale locale, String str, BreakIterator titleIter)
Gets the titlecase version of the argument string.
static String toTitleCase(ULocale locale, String str, BreakIterator titleIter, int options)
Gets the titlecase version of the argument string.
static int toUpperCase(int ch)
Converts the character argument to uppercase.
static String toUpperCase(Locale locale, String str)
Gets uppercase version of the argument string.
static String toUpperCase(String str)
Gets uppercase version of the argument string.
static String toUpperCase(ULocale locale, String str)
Gets uppercase version of the argument string.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
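The three conditions listed for isLegal(int) in the Method Summary can be written out directly. The following is a minimal sketch of those documented conditions only, in plain Java; it is not the ICU implementation, which may apply further checks. The class name LegalCodePoint and the literal 0x10FFFF used for the maximum code point are assumptions of this sketch.

```java
final class LegalCodePoint {
    // Assumed value of the highest Unicode code point.
    static final int MAX_VALUE = 0x10FFFF;

    /** Applies the three documented conditions to one code point. */
    static boolean isLegal(int cp) {
        if (cp < 0 || cp > MAX_VALUE) {
            return false;                       // out of bounds
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                       // surrogate value
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;                       // form xxFFFE or xxFFFF
        }
        return true;
    }

    /** A string is legal if and only if all its code points are legal. */
    static boolean isLegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // an unpaired surrogate comes back as itself
            if (!isLegal(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x0041));
        System.out.println(isLegal(0xFFFF));
        System.out.println(isLegal("a\uD800"));
    }
}
```

Masking with 0xFFFE tests the low 16 bits for both the FFFE and FFFF forms in every plane at once.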
Field Detail
MIN_VALUE
public static final int MIN_VALUE
MAX_VALUE
public static final int MAX_VALUE
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
SUPPLEMENTARY_MIN_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
REPLACEMENT_CHAR
public static final int REPLACEMENT_CHAR
NO_NUMERIC_VALUE
public static final double NO_NUMERIC_VALUE
See Also: getUnicodeNumericValue(int), Constant Field Values
MIN_RADIX
public static final int MIN_RADIX
MAX_RADIX
public static final int MAX_RADIX
TITLECASE_NO_LOWERCASE
public static final int TITLECASE_NO_LOWERCASE
See Also: toTitleCase(int), Constant Field Values
TITLECASE_NO_BREAK_ADJUSTMENT
public static final int TITLECASE_NO_BREAK_ADJUSTMENT
See Also: toTitleCase(int), TITLECASE_NO_LOWERCASE, Constant Field Values
FOLD_CASE_DEFAULT
public static final int FOLD_CASE_DEFAULT
FOLD_CASE_EXCLUDE_SPECIAL_I
public static final int FOLD_CASE_EXCLUDE_SPECIAL_I
MIN_HIGH_SURROGATE
public static final char MIN_HIGH_SURROGATE
See Also: UTF16.LEAD_SURROGATE_MIN_VALUE, Constant Field Values
MAX_HIGH_SURROGATE
public static final char MAX_HIGH_SURROGATE
See Also: UTF16.LEAD_SURROGATE_MAX_VALUE, Constant Field Values
MIN_LOW_SURROGATE
public static final char MIN_LOW_SURROGATE
See Also: UTF16.TRAIL_SURROGATE_MIN_VALUE, Constant Field Values
MAX_LOW_SURROGATE
public static final char MAX_LOW_SURROGATE
See Also: UTF16.TRAIL_SURROGATE_MAX_VALUE, Constant Field Values
MIN_SURROGATE
public static final char MIN_SURROGATE
See Also: UTF16.SURROGATE_MIN_VALUE, Constant Field Values
MAX_SURROGATE
public static final char MAX_SURROGATE
See Also: UTF16.SURROGATE_MAX_VALUE, Constant Field Values
MIN_SUPPLEMENTARY_CODE_POINT
public static final int MIN_SUPPLEMENTARY_CODE_POINT
See Also: UTF16.SUPPLEMENTARY_MIN_VALUE, Constant Field Values
MAX_CODE_POINT
public static final int MAX_CODE_POINT
See Also: UTF16.CODEPOINT_MAX_VALUE, Constant Field Values
MIN_CODE_POINT
public static final int MIN_CODE_POINT
See Also: UTF16.CODEPOINT_MIN_VALUE, Constant Field Values
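To show how the surrogate and code point limits relate to one another, the sketch below combines a surrogate pair by hand and compares the result with the JDK. It uses the java.lang.Character constants of the same names that these fields are described as covering; the class name SurrogateMath and its method are hypothetical.

```java
final class SurrogateMath {
    /** Combines a high (lead) and a low (trail) surrogate into one code point. */
    static int combine(char high, char low) {
        return ((high - Character.MIN_HIGH_SURROGATE) << 10)
                + (low - Character.MIN_LOW_SURROGATE)
                + Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        char high = '\uD834';
        char low = '\uDD1E';
        System.out.printf("by hand: U+%X%n", combine(high, low));
        System.out.printf("JDK:     U+%X%n", Character.toCodePoint(high, low));
    }
}
```

The smallest pair maps to MIN_SUPPLEMENTARY_CODE_POINT and the largest pair to MAX_CODE_POINT, so the surrogate ranges cover exactly the supplementary code points.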
Method Detail
digit
public static int digit(int ch,
int radix)
This method observes the semantics of
java.lang.Character.digit(). Note that this
will return positive values for code points for which isDigit
returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and
prior, this did not treat the European letters as having a
digit value, and also treated numeric letters and other numbers as
digits.
This has been changed to conform to the java semantics.
A code point is a valid digit if and only if:
ch - the code point to query
radix - the radix
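Since this method is described as observing the semantics of java.lang.Character.digit(), the point that digit can return a value where isDigit returns false can be sketched with the JDK method alone. The class name DigitDemo and its parse helper are hypothetical, and the helper does not check for overflow.

```java
final class DigitDemo {
    /**
     * Parses s as a nonnegative number in the given radix.
     * Returns -1 if s is empty or contains a code point that is not
     * a digit in that radix. Overflow is not checked in this sketch.
     */
    static long parse(String s, int radix) {
        if (s.isEmpty()) {
            return -1;
        }
        long value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int d = Character.digit(cp, radix);  // -1 if not a digit in this radix
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;
            i += Character.charCount(cp);
        }
        return value;
    }

    public static void main(String[] args) {
        // 'f' has a digit value in radix 16 although isDigit('f') is false.
        System.out.println(Character.isDigit('f') + " " + Character.digit('f', 16));
        System.out.println(parse("ff", 16));
    }
}
```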
digit
public static int digit(int ch)
This is a convenience overload of digit(int, int)
that provides a decimal radix.
Semantic Change: In release 1.3.1 and prior, this
treated numeric letters and other numbers as digits. This has
been changed to conform to the java semantics.
ch - the code point to query
getNumericValue
public static int getNumericValue(int ch)
If the code point does not have a numeric value, then -1 is returned.
If the code point has a numeric value that cannot be represented as a
nonnegative integer (for example, a fractional value), then -2 is
returned.
ch - the code point to query
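The two sentinel values can be told apart as in the sketch below. It calls java.lang.Character.getNumericValue, which uses the same -1 and -2 convention in the JDK, as a stand-in for this method; the class name NumericValueDemo and the strings it returns are hypothetical.

```java
final class NumericValueDemo {
    /** Describes the result of getNumericValue for one code point. */
    static String classify(int cp) {
        int v = Character.getNumericValue(cp);
        if (v == -1) {
            return "none";                       // no numeric value at all
        }
        if (v == -2) {
            return "not a nonnegative integer";  // for example, a fraction
        }
        return Integer.toString(v);
    }

    public static void main(String[] args) {
        System.out.println(classify('7'));
        System.out.println(classify('A'));     // Java maps Latin letters to 10..35
        System.out.println(classify(0x00BD));  // VULGAR FRACTION ONE HALF
        System.out.println(classify('!'));
    }
}
```

Callers that need the fractional value itself would use getUnicodeNumericValue, which returns a double.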
getUnicodeNumericValue
public static double getUnicodeNumericValue(int ch)
ch - Code point to get the numeric value for.
isSpace
public static boolean isSpace(int ch)
ch - the code point
getType
public static int getType(int ch)
Return results are constants from the interface
UCharacterCategory
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. UCharacterCategory values
match the ones used in ICU4C, while java.lang.Character type
values, though similar, skip the value 17.
ch - code point whose type is to be determined
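The gap at 17 in the java.lang.Character numbering is easy to confirm, and it matters whenever category values are turned into bit masks. The sketch below uses only JDK constants and Character.getType; a mask built this way must not be combined with UCharacterCategory values, since the two numberings differ. The class name CategoryMaskDemo and its methods are hypothetical.

```java
final class CategoryMaskDemo {
    // One bit per JDK general category value that denotes a letter.
    static final int LETTER_MASK =
            (1 << Character.UPPERCASE_LETTER)
          | (1 << Character.LOWERCASE_LETTER)
          | (1 << Character.TITLECASE_LETTER)
          | (1 << Character.MODIFIER_LETTER)
          | (1 << Character.OTHER_LETTER);

    /** Tests the JDK category of cp against the letter mask. */
    static boolean isLetterByMask(int cp) {
        return ((1 << Character.getType(cp)) & LETTER_MASK) != 0;
    }

    /** True if the JDK constants jump from 16 (FORMAT) to 18 (PRIVATE_USE). */
    static boolean jdkSkips17() {
        return Character.FORMAT == 16 && Character.PRIVATE_USE == 18;
    }

    public static void main(String[] args) {
        System.out.println("JDK skips 17: " + jdkSkips17());
        System.out.println("'a' is a letter: " + isLetterByMask('a'));
        System.out.println("'1' is a letter: " + isLetterByMask('1'));
    }
}
```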
public static boolean isDefined(int ch)
ch - code point to be determined if it is defined in the most
current version of Unicode
public static boolean isDigit(int ch)
This method observes the semantics of java.lang.Character.isDigit(). It returns true for decimal
digits only.
ch - code point to query
public static boolean isISOControl(int ch)
ch - code point to determine if it is an ISO control character
public static boolean isLetter(int ch)
ch - code point to determine if it is a letter
public static boolean isLetterOrDigit(int ch)
ch - code point to determine if it is a letter or a digit
public static boolean isJavaLetter(int cp)
cp - the code point
public static boolean isJavaLetterOrDigit(int cp)
cp - the code point
public static boolean isJavaIdentifierStart(int cp)
cp - the code point
public static boolean isJavaIdentifierPart(int cp)
cp - the code point
public static boolean isLowerCase(int ch)
ch - code point to determine if it is in lowercase
public static boolean isWhitespace(int ch)
ch - code point to determine if it is a white space
public static boolean isSpaceChar(int ch)
ch - code point to determine if it is a space
public static boolean isTitleCase(int ch)
ch - code point to determine if it is in title case
public static boolean isUnicodeIdentifierPart(int ch)
ch - code point to determine if it can be part of a Unicode
identifier
public static boolean isUnicodeIdentifierStart(int ch)
ch - code point to determine if it can start a Unicode identifier
public static boolean isIdentifierIgnorable(int ch)
ch - code point to be determined if it can be ignored in a Unicode
identifier.
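A typical use of the start and part tests is validating a whole identifier. The sketch below uses the JDK methods of the same names as a stand-in; the results of the UCharacter versions may differ for some code points, since they follow the Unicode version shipped with ICU. The class name UnicodeIdentifierCheck and its method are hypothetical.

```java
final class UnicodeIdentifierCheck {
    /** True if s is non-empty, starts with a start character, and continues with part characters. */
    static boolean isUnicodeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("na\u00EFve"));
        System.out.println(isUnicodeIdentifier("1x"));
    }
}
```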
public static boolean isUpperCase(int ch)
ch - code point to determine if it is in uppercase
public static int toLowerCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch - code point whose lowercase equivalent is to be retrieved
public static String toString(int ch)
ch - code point
public static int toTitleCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch - code point whose title case is to be retrieved
public static int toUpperCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch - code point whose uppercase is to be retrieved
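The difference between simple and full case mappings can be seen with the JDK alone. The sketch below uses java.lang.Character.toUpperCase(int) for the simple mapping and String.toUpperCase(Locale) for the full mapping as stand-ins for the int and String overloads described here; the class name CaseMappingDemo and its helpers are hypothetical.

```java
import java.util.Locale;

final class CaseMappingDemo {
    /** Simple mapping: one code point in, one code point out. */
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    /** Full mapping: works on the whole string and takes the locale into account. */
    static String fullUpper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        int sharpS = 0x00DF; // LATIN SMALL LETTER SHARP S
        // The simple mapping cannot grow to two letters, so the code point is unchanged.
        System.out.printf("simple: U+%04X%n", simpleUpper(sharpS));
        // The full mapping may change the string length.
        System.out.println("full:   " + fullUpper("\u00DF", Locale.ROOT));
        // Language-sensitive: Turkish maps i to a dotted capital I.
        System.out.println("tr:     " + fullUpper("i", Locale.forLanguageTag("tr")));
    }
}
```

Both effects, a result of different length and a language-dependent result, are out of reach for a function that maps a single int to a single int.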
public static boolean isSupplementary(int ch)
ch - code point to be determined if it is in the supplementary
plane
public static boolean isBMP(int ch)
ch - code point to be determined if it is not a supplementary
character
public static boolean isPrintable(int ch)
ch - code point to be determined if it is printable
public static boolean isBaseForm(int ch)
ch - code point to be determined if it is of base form
public static int getDirection(int ch)
ch - the code point whose direction is to be determined
public static boolean isMirrored(int ch)
ch - code point whose mirror is to be determined
public static int getMirror(int ch)
ch - code point whose mirror is to be retrieved
public static int getCombiningClass(int ch)