ICU 59.1
ucasemap.h File Reference

C API: Unicode case mapping functions using a UCaseMap service object.

#include "unicode/utypes.h"
#include "unicode/localpointer.h"
#include "unicode/ustring.h"


Namespaces

 icu
 

Macros

#define U_TITLECASE_NO_LOWERCASE   0x100
 Do not lowercase non-initial parts of words when titlecasing.

#define U_TITLECASE_NO_BREAK_ADJUSTMENT   0x200
 Do not adjust the titlecasing indexes from BreakIterator::next() indexes; titlecase exactly the characters at breaks from the iterator.

#define UCASEMAP_OMIT_UNCHANGED_TEXT   0x4000
 Omit unchanged text when case-mapping with Edits.
 

Typedefs

typedef struct UCaseMap UCaseMap
 C typedef for struct UCaseMap.
 

Functions

UCaseMap * ucasemap_open (const char *locale, uint32_t options, UErrorCode *pErrorCode)
 Open a UCaseMap service object for a locale and a set of options.

void ucasemap_close (UCaseMap *csm)
 Close a UCaseMap service object.

const char * ucasemap_getLocale (const UCaseMap *csm)
 Get the locale ID that is used for language-dependent case mappings.

uint32_t ucasemap_getOptions (const UCaseMap *csm)
 Get the options bit set that is used for case folding and string comparisons.

void ucasemap_setLocale (UCaseMap *csm, const char *locale, UErrorCode *pErrorCode)
 Set the locale ID that is used for language-dependent case mappings.

void ucasemap_setOptions (UCaseMap *csm, uint32_t options, UErrorCode *pErrorCode)
 Set the options bit set that is used for case folding and string comparisons.

const UBreakIterator * ucasemap_getBreakIterator (const UCaseMap *csm)
 Get the break iterator that is used for titlecasing.

void ucasemap_setBreakIterator (UCaseMap *csm, UBreakIterator *iterToAdopt, UErrorCode *pErrorCode)
 Set the break iterator that is used for titlecasing.

int32_t ucasemap_toTitle (UCaseMap *csm, UChar *dest, int32_t destCapacity, const UChar *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-16 string.

int32_t ucasemap_utf8ToLower (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Lowercase the characters in a UTF-8 string.

int32_t ucasemap_utf8ToUpper (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Uppercase the characters in a UTF-8 string.

int32_t ucasemap_utf8ToTitle (UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-8 string.

int32_t ucasemap_utf8FoldCase (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Case-folds the characters in a UTF-8 string.
 

Detailed Description

C API: Unicode case mapping functions using a UCaseMap service object.

The service object takes care of memory allocations, data loading, and setup for the attributes, as usual.

Currently, the functionality provided here does not overlap with uchar.h and ustring.h, except for ucasemap_toTitle().

ucasemap_utf8XYZ() functions operate directly on UTF-8 strings.

Definition in file ucasemap.h.

Macro Definition Documentation

§ U_TITLECASE_NO_BREAK_ADJUSTMENT

#define U_TITLECASE_NO_BREAK_ADJUSTMENT   0x200

Do not adjust the titlecasing indexes from BreakIterator::next() indexes; titlecase exactly the characters at breaks from the iterator.

Option bit for titlecasing APIs that take an options bit set.

By default, titlecasing will take each break iterator index, adjust it by looking for the next cased character, and titlecase that one. Other characters are lowercased.

This follows Unicode 4 & 5 section 3.13 Default Case Operations:

R3 toTitlecase(X): Find the word boundaries based on Unicode Standard Annex #29, "Text Boundaries." Between each pair of word boundaries, find the first cased character F. If F exists, map F to default_title(F); then map each subsequent character C to default_lower(C).
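
The difference between the default adjustment and this option can be illustrated with a minimal, ASCII-only sketch. It is not ICU's implementation: the helper name sketch_ascii_to_title is hypothetical, word boundaries are assumed to fall wherever whitespace ends (not the Unicode Standard Annex #29 boundaries that a real break iterator reports), and "cased" is approximated with isalpha(). The macro value is copied from this page and guarded so the sketch also compiles next to the real header.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from this page; guarded in case ucasemap.h is included. */
#ifndef U_TITLECASE_NO_BREAK_ADJUSTMENT
#define U_TITLECASE_NO_BREAK_ADJUSTMENT 0x200
#endif

/* Hypothetical helper (ASCII only): titlecases s in place.
   Assumption: a break index is the first non-space byte of each
   whitespace-delimited word. */
void sketch_ascii_to_title(char *s, uint32_t options)
{
    size_t i = 0;
    while (s[i] != '\0') {
        /* Skip whitespace; i then sits on a break index (or the end). */
        while (s[i] != '\0' && isspace((unsigned char)s[i])) {
            ++i;
        }
        if ((options & U_TITLECASE_NO_BREAK_ADJUSTMENT) == 0) {
            /* Default: move forward to the first cased character F. */
            while (s[i] != '\0' && !isspace((unsigned char)s[i]) &&
                   !isalpha((unsigned char)s[i])) {
                ++i;
            }
        }
        if (s[i] != '\0' && !isspace((unsigned char)s[i])) {
            s[i] = (char)toupper((unsigned char)s[i]); /* default_title(F) */
            ++i;
        }
        /* Every subsequent character C of the word: default_lower(C). */
        while (s[i] != '\0' && !isspace((unsigned char)s[i])) {
            s[i] = (char)tolower((unsigned char)s[i]);
            ++i;
        }
    }
}
```

In this sketch, the input "'hello WORLD" becomes "'Hello World" with options 0, because the index is adjusted past the apostrophe. With U_TITLECASE_NO_BREAK_ADJUSTMENT it becomes "'hello World", because the character exactly at the break (the apostrophe) is the one that is titlecased. Results from ICU can differ, since its break iterator places boundaries differently.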

See also
ucasemap_setOptions
ucasemap_toTitle
ucasemap_utf8ToTitle
UnicodeString::toTitle
U_TITLECASE_NO_LOWERCASE
Stable:
ICU 3.8

Definition at line 186 of file ucasemap.h.

§ U_TITLECASE_NO_LOWERCASE

#define U_TITLECASE_NO_LOWERCASE   0x100

Do not lowercase non-initial parts of words when titlecasing.

Option bit for titlecasing APIs that take an options bit set.

By default, titlecasing will titlecase the first cased character of a word and lowercase all other characters. With this option, the other characters will not be modified.
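
Because the titlecasing options are single bits, they can be combined with bitwise OR and tested with bitwise AND. The following is a minimal sketch of building such a bit set; the helper names sketch_title_options and sketch_has_option are hypothetical, and the macro values are copied from this page (guarded so the sketch also compiles next to the real header).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from this page; guarded in case ucasemap.h is included. */
#ifndef U_TITLECASE_NO_LOWERCASE
#define U_TITLECASE_NO_LOWERCASE 0x100
#endif
#ifndef U_TITLECASE_NO_BREAK_ADJUSTMENT
#define U_TITLECASE_NO_BREAK_ADJUSTMENT 0x200
#endif

/* Hypothetical helper: build an options bit set for titlecasing. */
uint32_t sketch_title_options(bool keep_rest_of_word, bool exact_breaks)
{
    uint32_t options = 0; /* 0 selects the default behavior */
    if (keep_rest_of_word) {
        options |= U_TITLECASE_NO_LOWERCASE;
    }
    if (exact_breaks) {
        options |= U_TITLECASE_NO_BREAK_ADJUSTMENT;
    }
    return options;
}

/* Hypothetical helper: test whether one option bit is set. */
bool sketch_has_option(uint32_t options, uint32_t bit)
{
    return (options & bit) != 0;
}
```

The resulting value is the kind of bit set that the options parameter of ucasemap_open() and ucasemap_setOptions() accepts.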

See also
ucasemap_setOptions
ucasemap_toTitle
ucasemap_utf8ToTitle
UnicodeString::toTitle
Stable:
ICU 3.8

Definition at line 161 of file ucasemap.h.

§ UCASEMAP_OMIT_UNCHANGED_TEXT

#define UCASEMAP_OMIT_UNCHANGED_TEXT   0x4000

Omit unchanged text when case-mapping with Edits.

See also
CaseMap
Edits
Draft:
This API may change in future versions; it was introduced in ICU 59.

Definition at line 195 of file ucasemap.h.

Typedef Documentation

§ UCaseMap

typedef struct UCaseMap UCaseMap

C typedef for struct UCaseMap.

Stable:
ICU 3.4

Definition at line 47 of file ucasemap.h.

Function Documentation

§ ucasemap_close()

void ucasemap_close ( UCaseMap *  csm )

Close a UCaseMap service object.

Parameters
csm  Object to be closed.
Stable:
ICU 3.4

§ ucasemap_getBreakIterator()

const UBreakIterator* ucasemap_getBreakIterator ( const UCaseMap *  csm )

Get the break iterator that is used for titlecasing.

Do not modify the returned break iterator.

Parameters
csm  UCaseMap service object.
Returns
titlecasing break iterator
Stable:
ICU 3.8

§ ucasemap_getLocale()

const char* ucasemap_getLocale ( const UCaseMap *  csm )

Get the locale ID that is used for language-dependent case mappings.

Parameters
csm  UCaseMap service object.
Returns
locale ID
Stable:
ICU 3.4

§ ucasemap_getOptions()

uint32_t ucasemap_getOptions ( const UCaseMap *  csm )

Get the options bit set that is used for case folding and string comparisons.

Parameters
csm  UCaseMap service object.
Returns
options bit set
Stable:
ICU 3.4

§ ucasemap_open()

UCaseMap* ucasemap_open ( const char *  locale,
uint32_t  options,
UErrorCode *  pErrorCode
)

Open a UCaseMap service object for a locale and a set of options.

The locale ID and options are preprocessed so that functions using the service object need not process them in each call.

Parameters
locale  ICU locale ID, used for language-dependent upper-/lower-/title-casing according to the Unicode standard. Usual semantics: ""=root, NULL=default locale, etc.
options  Options bit set, used for case folding and string comparisons. Same flags as for u_foldCase(), u_strFoldCase(), u_strCaseCompare(), etc. Use 0 or U_FOLD_CASE_DEFAULT for default behavior.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
Pointer to a UCaseMap service object, if successful.
See also
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.4

§ ucasemap_setBreakIterator()

void ucasemap_setBreakIterator ( UCaseMap *  csm,
UBreakIterator *  iterToAdopt,
UErrorCode *  pErrorCode
)

Set the break iterator that is used for titlecasing.

The UCaseMap service object releases a previously set break iterator and "adopts" this new one, taking ownership of it. It will be released in a subsequent call to ucasemap_setBreakIterator() or ucasemap_close().

Break iterator operations are not thread-safe. Therefore, titlecasing functions use non-const UCaseMap objects. It is not possible to titlecase strings concurrently using the same UCaseMap.

Parameters
csm  UCaseMap service object.
iterToAdopt  Break iterator to be adopted for titlecasing.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_toTitle
ucasemap_utf8ToTitle
Stable:
ICU 3.8

§ ucasemap_setLocale()

void ucasemap_setLocale ( UCaseMap *  csm,
const char *  locale,
UErrorCode *  pErrorCode
)

Set the locale ID that is used for language-dependent case mappings.

Parameters
csm  UCaseMap service object.
locale  Locale ID, see ucasemap_open().
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_open
Stable:
ICU 3.4

§ ucasemap_setOptions()

void ucasemap_setOptions ( UCaseMap *  csm,
uint32_t  options,
UErrorCode *  pErrorCode
)

Set the options bit set that is used for case folding and string comparisons.

Parameters
csm  UCaseMap service object.
options  Options bit set, see ucasemap_open().
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also
ucasemap_open
Stable:
ICU 3.4

§ ucasemap_toTitle()

int32_t ucasemap_toTitle ( UCaseMap *  csm,
UChar *  dest,
int32_t  destCapacity,
const UChar *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-16 string.

This function is almost a duplicate of u_strToTitle(), except that it takes ucasemap_setOptions() into account and has performance advantages from being able to use a UCaseMap object for multiple case mapping operations, saving setup time.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm  UCaseMap service object. This pointer is non-const! See the note above for details.
dest  A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of UChars). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src  The original string.
srcLength  The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, both on success and on buffer overflow; on overflow, it is greater than destCapacity.
See also
u_strToTitle
Stable:
ICU 3.8

§ ucasemap_utf8FoldCase()

int32_t ucasemap_utf8FoldCase ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Case-folds the characters in a UTF-8 string.

Case-folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'T' in CaseFolding.txt.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm  UCaseMap service object.
dest  A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src  The original string.
srcLength  The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, both on success and on buffer overflow; on overflow, it is greater than destCapacity.
See also
u_strFoldCase
ucasemap_setOptions
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
Stable:
ICU 3.8

§ ucasemap_utf8ToLower()

int32_t ucasemap_utf8ToLower ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Lowercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm  UCaseMap service object.
dest  A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src  The original string.
srcLength  The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, both on success and on buffer overflow; on overflow, it is greater than destCapacity.
See also
u_strToLower
Stable:
ICU 3.4
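
The destCapacity and return-value contract described above supports a two-call "preflight" pattern: ask for the length first, allocate, then convert. The following minimal sketch shows only that pattern. sketch_to_lower is a hypothetical ASCII-only stand-in, not an ICU function; it takes no UCaseMap and no UErrorCode and merely mimics the documented buffer contract. With the real API, the preflight call reports U_BUFFER_OVERFLOW_ERROR through pErrorCode, so the error code has to be reset to U_ZERO_ERROR before the second call.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ucasemap_utf8ToLower() (ASCII only).
   Mimics the documented contract: writes at most destCapacity bytes,
   NUL-terminates if there is room, always returns the full length. */
int32_t sketch_to_lower(char *dest, int32_t destCapacity,
                        const char *src, int32_t srcLength)
{
    int32_t i;
    if (srcLength < 0) {
        srcLength = (int32_t)strlen(src); /* -1: src is NUL-terminated */
    }
    for (i = 0; i < srcLength && i < destCapacity; ++i) {
        dest[i] = (char)tolower((unsigned char)src[i]);
    }
    if (srcLength < destCapacity) {
        dest[srcLength] = '\0';
    }
    return srcLength;
}

/* Preflight, allocate, convert. The caller frees the result. */
char *sketch_lower_alloc(const char *src)
{
    /* Capacity 0 with a NULL buffer: only the length is returned. */
    int32_t needed = sketch_to_lower(NULL, 0, src, -1);
    char *buf = malloc((size_t)needed + 1); /* +1 for the NUL */
    if (buf != NULL) {
        sketch_to_lower(buf, needed + 1, src, -1);
    }
    return buf;
}
```

Allocating one byte more than the preflighted length leaves room for the terminating NUL, which the functions on this page write only if the buffer is large enough.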

§ ucasemap_utf8ToTitle()

int32_t ucasemap_utf8ToTitle ( UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-8 string.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm  UCaseMap service object. This pointer is non-const! See the note above for details.
dest  A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src  The original string.
srcLength  The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, both on success and on buffer overflow; on overflow, it is greater than destCapacity.
See also
u_strToTitle
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.8

§ ucasemap_utf8ToUpper()

int32_t ucasemap_utf8ToUpper ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Uppercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters
csm  UCaseMap service object.
dest  A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity  The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src  The original string.
srcLength  The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode  Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns
The length of the result string, both on success and on buffer overflow; on overflow, it is greater than destCapacity.
See also
u_strToUpper
Stable:
ICU 3.4