UnicodeSet Class Reference

A mutable set of Unicode characters and multicharacter strings.

#include <uniset.h>

Inheritance diagram for UnicodeSet:

UnicodeFilter UnicodeFunctor UnicodeMatcher UObject UMemory

Public Types

enum  { MIN_VALUE = 0, MAX_VALUE = 0x10ffff }

Public Member Functions

UBool isBogus (void) const
 Determine if this object contains a valid set.
void setToBogus ()
 Make this UnicodeSet object invalid.
 UnicodeSet ()
 Constructs an empty set.
 UnicodeSet (UChar32 start, UChar32 end)
 Constructs a set containing the given range.
 UnicodeSet (const UnicodeString &pattern, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeString &pattern, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeString &pattern, ParsePosition &pos, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Constructs a set from the given pattern.
 UnicodeSet (const UnicodeSet &o)
 Constructs a set that is identical to the given UnicodeSet.
virtual ~UnicodeSet ()
 Destructs the set.
UnicodeSet & operator= (const UnicodeSet &o)
 Assigns this object to be a copy of another.
virtual UBool operator== (const UnicodeSet &o) const
 Compares the specified object with this set for equality.
UBool operator!= (const UnicodeSet &o) const
 Compares the specified object with this set for inequality.
virtual UnicodeFunctor * clone () const
 Returns a copy of this object.
virtual int32_t hashCode (void) const
 Returns the hash code value for this set.
UBool isFrozen () const
 Determines whether the set has been frozen (made immutable) or not.
UnicodeFunctor * freeze ()
 Freeze the set (make it immutable).
UnicodeFunctor * cloneAsThawed () const
 Clone the set and make the clone mutable.
UnicodeSet & set (UChar32 start, UChar32 end)
 Make this object represent the range start - end.
UnicodeSet & applyPattern (const UnicodeString &pattern, UErrorCode &status)
 Modifies this set to represent the set specified by the given pattern, optionally ignoring white space.
UnicodeSet & applyPattern (const UnicodeString &pattern, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Modifies this set to represent the set specified by the given pattern, optionally ignoring white space.
UnicodeSet & applyPattern (const UnicodeString &pattern, ParsePosition &pos, uint32_t options, const SymbolTable *symbols, UErrorCode &status)
 Parses the given pattern, starting at the given position.
virtual UnicodeString & toPattern (UnicodeString &result, UBool escapeUnprintable=FALSE) const
 Returns a string representation of this set.
UnicodeSet & applyIntPropertyValue (UProperty prop, int32_t value, UErrorCode &ec)
 Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by u_getIntPropertyValue.
UnicodeSet & applyPropertyAlias (const UnicodeString &prop, const UnicodeString &value, UErrorCode &ec)
 Modifies this set to contain those code points which have the given value for the given property.
virtual int32_t size (void) const
 Returns the number of elements in this set (its cardinality).
virtual UBool isEmpty (void) const
 Returns true if this set contains no elements.
virtual UBool contains (UChar32 c) const
 Returns true if this set contains the given character.
virtual UBool contains (UChar32 start, UChar32 end) const
 Returns true if this set contains every character of the given range.
UBool contains (const UnicodeString &s) const
 Returns true if this set contains the given multicharacter string.
virtual UBool containsAll (const UnicodeSet &c) const
 Returns true if this set contains all the characters and strings of the given set.
UBool containsAll (const UnicodeString &s) const
 Returns true if this set contains all the characters of the given string.
UBool containsNone (UChar32 start, UChar32 end) const
 Returns true if this set contains none of the characters of the given range.
UBool containsNone (const UnicodeSet &c) const
 Returns true if this set contains none of the characters and strings of the given set.
UBool containsNone (const UnicodeString &s) const
 Returns true if this set contains none of the characters of the given string.
UBool containsSome (UChar32 start, UChar32 end) const
 Returns true if this set contains one or more of the characters in the given range.
UBool containsSome (const UnicodeSet &s) const
 Returns true if this set contains one or more of the characters and strings of the given set.
UBool containsSome (const UnicodeString &s) const
 Returns true if this set contains one or more of the characters of the given string.
int32_t span (const UChar *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t spanBack (const UChar *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t spanUTF8 (const char *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
int32_t spanBackUTF8 (const char *s, int32_t length, USetSpanCondition spanCondition) const
 Returns the start of the trailing substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).
virtual UMatchDegree matches (const Replaceable &text, int32_t &offset, int32_t limit, UBool incremental)
 Implement UnicodeMatcher::matches().
virtual void addMatchSetTo (UnicodeSet &toUnionTo) const
 Implementation of UnicodeMatcher API.
int32_t indexOf (UChar32 c) const
 Returns the index of the given character within this set, where the set is ordered by ascending code point.
UChar32 charAt (int32_t index) const
 Returns the character at the given index within this set, where the set is ordered by ascending code point.
virtual UnicodeSet & add (UChar32 start, UChar32 end)
 Adds the specified range to this set if it is not already present.
UnicodeSet & add (UChar32 c)
 Adds the specified character to this set if it is not already present.
UnicodeSet & add (const UnicodeString &s)
 Adds the specified multicharacter string to this set if it is not already present.
UnicodeSet & addAll (const UnicodeString &s)
 Adds each of the characters in this string to the set.
UnicodeSet & retainAll (const UnicodeString &s)
 Retains EACH of the characters in this string.
UnicodeSet & complementAll (const UnicodeString &s)
 Complement EACH of the characters in this string.
UnicodeSet & removeAll (const UnicodeString &s)
 Remove EACH of the characters in this string.
virtual UnicodeSet & retain (UChar32 start, UChar32 end)
 Retain only the elements in this set that are contained in the specified range.
UnicodeSet & retain (UChar32 c)
 Retain the specified character from this set if it is present.
virtual UnicodeSet & remove (UChar32 start, UChar32 end)
 Removes the specified range from this set if it is present.
UnicodeSet & remove (UChar32 c)
 Removes the specified character from this set if it is present.
UnicodeSet & remove (const UnicodeString &s)
 Removes the specified string from this set if it is present.
virtual UnicodeSet & complement (void)
 Inverts this set.
virtual UnicodeSet & complement (UChar32 start, UChar32 end)
 Complements the specified range in this set.
UnicodeSet & complement (UChar32 c)
 Complements the specified character in this set.
UnicodeSet & complement (const UnicodeString &s)
 Complement the specified string in this set.
virtual UnicodeSet & addAll (const UnicodeSet &c)
 Adds all of the elements in the specified set to this set if they're not already present.
virtual UnicodeSet & retainAll (const UnicodeSet &c)
 Retains only the elements in this set that are contained in the specified set.
virtual UnicodeSet & removeAll (const UnicodeSet &c)
 Removes from this set all of its elements that are contained in the specified set.
virtual UnicodeSet & complementAll (const UnicodeSet &c)
 Complements in this set all elements contained in the specified set.
virtual UnicodeSet & clear (void)
 Removes all of the elements from this set.
UnicodeSet & closeOver (int32_t attribute)
 Close this set over the given attribute.
virtual UnicodeSet & removeAllStrings ()
 Remove all strings from this set.
virtual int32_t getRangeCount (void) const
 Iteration method that returns the number of ranges contained in this set.
virtual UChar32 getRangeStart (int32_t index) const
 Iteration method that returns the first character in the specified range of this set.
virtual UChar32 getRangeEnd (int32_t index) const
 Iteration method that returns the last character in the specified range of this set.
int32_t serialize (uint16_t *dest, int32_t destCapacity, UErrorCode &ec) const
 Serializes this set into an array of 16-bit integers.
virtual UnicodeSet & compact ()
 Reallocate this object's internal structures to take up the least possible space, without changing this object's value.
virtual UClassID getDynamicClassID (void) const
 Implement UnicodeFunctor API.

Static Public Member Functions

static UBool resemblesPattern (const UnicodeString &pattern, int32_t pos)
 Return true if the given position, in the given pattern, appears to be the start of a UnicodeSet pattern.
static UnicodeSet * createFrom (const UnicodeString &s)
 Makes a set from a multicharacter string.
static UnicodeSet * createFromAll (const UnicodeString &s)
 Makes a set from each of the characters in the string.
static UClassID getStaticClassID (void)
 Return the class ID for this class.

Friends

class USetAccess
class UnicodeSetIterator

Detailed Description

A mutable set of Unicode characters and multicharacter strings.

Objects of this class represent character classes used in regular expressions. A character class specifies a subset of Unicode code points. Legal code points are U+0000 to U+10FFFF, inclusive.

The UnicodeSet class is not designed to be subclassed.

UnicodeSet supports two APIs. The first is the operand API that allows the caller to modify the value of a UnicodeSet object. It conforms to Java 2's java.util.Set interface, although UnicodeSet does not actually implement that interface. All methods of Set are supported, with the modification that they take a character range or single character instead of an Object, and they take a UnicodeSet instead of a Collection. The operand API may be thought of in terms of boolean logic: a boolean OR is implemented by add, a boolean AND is implemented by retain, a boolean XOR is implemented by complement taking an argument, and a boolean NOT is implemented by complement with no argument. In terms of traditional set theory function names, add is a union, retain is an intersection, remove is an asymmetric difference, and complement with no argument is a set complement with respect to the superset range MIN_VALUE-MAX_VALUE.
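
The boolean-logic reading of the operand API can be illustrated with a small model. The sketch below is not ICU code: it uses std::set<char32_t> as a hypothetical stand-in for a set of code points, and the free functions only borrow the names addAll, retainAll, removeAll and complementAll to show which set-theory operation each one corresponds to.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Hypothetical stand-in for a set of code points; this is not the ICU class.
using CodePoints = std::set<char32_t>;

// add / addAll: boolean OR (union).
CodePoints addAll(const CodePoints& a, const CodePoints& b) {
    CodePoints out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// retain / retainAll: boolean AND (intersection).
CodePoints retainAll(const CodePoints& a, const CodePoints& b) {
    CodePoints out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

// remove / removeAll: asymmetric difference (a without b).
CodePoints removeAll(const CodePoints& a, const CodePoints& b) {
    CodePoints out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// complement with an argument: boolean XOR (symmetric difference).
CodePoints complementAll(const CodePoints& a, const CodePoints& b) {
    CodePoints out;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::inserter(out, out.end()));
    return out;
}

// Small usage check on two overlapping sets.
void operandDemo() {
    const CodePoints x{U'a', U'b', U'c'};
    const CodePoints y{U'b', U'c', U'd'};
    assert((addAll(x, y) == CodePoints{U'a', U'b', U'c', U'd'}));
    assert((retainAll(x, y) == CodePoints{U'b', U'c'}));
    assert((removeAll(x, y) == CodePoints{U'a'}));
    assert((complementAll(x, y) == CodePoints{U'a', U'd'}));
}
```

The model returns new sets for clarity, whereas the UnicodeSet methods modify the object in place and return a reference to it.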

The second API is the applyPattern()/toPattern() API from the java.text.Format-derived classes. Unlike the methods that add characters, add categories, and control the logic of the set, the method applyPattern() sets all attributes of a UnicodeSet at once, based on a string pattern.

Pattern syntax

Patterns are accepted by the constructors and the applyPattern() methods and returned by the toPattern() method. These patterns follow a syntax similar to that employed by version 8 regular expression character classes. Here are some simple examples:

[] No characters
[a] The character 'a'
[ae] The characters 'a' and 'e'
[a-e] The characters 'a' through 'e' inclusive, in Unicode code point order
[\u4E01] The character U+4E01
[a{ab}{ac}] The character 'a' and the multicharacter strings "ab" and "ac"
[\p{Lu}] All characters in the general category Uppercase Letter

Any character may be preceded by a backslash in order to remove any special meaning. White space characters, as defined by UCharacter.isWhitespace(), are ignored, unless they are escaped.

Property patterns specify a set of characters having a certain property as defined by the Unicode standard. Both the POSIX-like "[:Lu:]" and the Perl-like syntax "\\p{Lu}" are recognized. For a complete list of supported property patterns, see the User's Guide for UnicodeSet at http://icu-project.org/userguide/unicodeSet.html. Actual determination of property data is defined by the underlying Unicode database as implemented by UCharacter.

Patterns specify individual characters, ranges of characters, and Unicode property sets. When elements are concatenated, they specify their union. To complement a set, place a '^' immediately after the opening '['. Property patterns are inverted by modifying their delimiters: "[:^foo:]" and "\\P{foo}". In any other location, '^' has no special meaning.

Ranges are indicated by placing a '-' between two characters, as in "a-z". This specifies the range of all characters from the left to the right, in Unicode order. If the left character is greater than or equal to the right character it is a syntax error. If a '-' occurs as the first character after the opening '[' or '[^', or if it occurs as the last character before the closing ']', then it is taken as a literal. Thus "[a\-b]", "[-ab]", and "[ab-]" all indicate the same set of three characters, 'a', 'b', and '-'.
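
The range and literal-hyphen rules can be checked with a toy parser. The following is a minimal sketch under stated assumptions, not ICU's parser: SimpleSet and parseSimple are hypothetical names, and the parser handles only a flat "[...]" pattern with an optional leading '^', ranges, and backslash escapes (no nested sets, properties, strings, or white-space skipping).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a parsed pattern; not ICU's internal representation.
struct SimpleSet {
    bool negated = false;                                // true after a leading '^'
    std::vector<std::pair<char32_t, char32_t>> ranges;   // inclusive ranges

    bool contains(char32_t c) const {
        bool inRange = false;
        for (const auto& r : ranges) {
            if (r.first <= c && c <= r.second) { inRange = true; break; }
        }
        return inRange != negated;
    }
};

// Parses a flat "[...]" pattern. Returns false on a syntax error.
bool parseSimple(const std::u32string& p, SimpleSet& out) {
    out = SimpleSet{};
    if (p.size() < 2 || p.front() != U'[' || p.back() != U']') return false;
    std::size_t i = 1;
    const std::size_t end = p.size() - 1;   // index of the closing ']'
    if (i < end && p[i] == U'^') { out.negated = true; ++i; }

    // Reads one character, honoring a backslash escape.
    auto readChar = [&](char32_t& c) -> bool {
        if (i >= end) return false;
        if (p[i] == U'\\') {
            if (i + 1 >= end) return false;              // dangling backslash
            c = p[i + 1];
            i += 2;
            return true;
        }
        if (p[i] == U'[' || p[i] == U']') return false;  // nesting is out of scope
        c = p[i++];
        return true;
    };

    while (i < end) {
        char32_t lo = 0;
        if (!readChar(lo)) return false;
        // An unescaped '-' with a character after it makes a range; a '-'
        // right before the closing ']' is read as a literal on the next pass.
        if (i + 1 < end && p[i] == U'-') {
            ++i;
            char32_t hi = 0;
            if (!readChar(hi)) return false;
            if (lo >= hi) return false;   // per the text: left >= right is a syntax error
            out.ranges.emplace_back(lo, hi);
        } else {
            out.ranges.emplace_back(lo, lo);
        }
    }
    return true;
}
```

With this model, "[a\-b]", "[-ab]" and "[ab-]" all parse to a set that contains exactly 'a', 'b' and '-', while "[z-a]" is rejected.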

Sets may be intersected using the '&' operator, or the asymmetric set difference may be taken using the '-' operator. For example, "[[:L:]&[\\u0000-\\u0FFF]]" indicates the set of all Unicode letters with values less than 4096. The operators ('&' and '-') have equal precedence and bind left-to-right. Thus "[[:L:]-[a-z]-[\\u0100-\\u01FF]]" is equivalent to "[[[:L:]-[a-z]]-[\\u0100-\\u01FF]]". This only really matters for difference; intersection is commutative.

[a] The set containing 'a'
[a-z] The set containing 'a' through 'z' and all letters in between, in Unicode order
[^a-z] The set containing all characters but 'a' through 'z', that is, U+0000 through 'a'-1 and 'z'+1 through U+10FFFF
[[pat1][pat2]] The union of sets specified by pat1 and pat2
[[pat1]&[pat2]] The intersection of sets specified by pat1 and pat2
[[pat1]-[pat2]] The asymmetric difference of sets specified by pat1 and pat2
[:Lu:] or \p{Lu} The set of characters having the specified Unicode property; in this case, Unicode uppercase letters
[:^Lu:] or \P{Lu} The set of characters not having the given Unicode property
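
Why left-to-right binding matters for difference can be seen on three tiny sets. This is an illustrative sketch using std::set<char>, not ICU code; CharSet, difference and precedenceDemo are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

using CharSet = std::set<char>;

// Asymmetric difference: the members of a that are not in b.
CharSet difference(const CharSet& a, const CharSet& b) {
    CharSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

void precedenceDemo() {
    const CharSet a{'a', 'b', 'c'};
    const CharSet b{'b', 'c'};
    const CharSet c{'c'};

    // "[A-B-C]" binds left-to-right: (A - B) - C.
    const CharSet leftToRight = difference(difference(a, b), c);
    // Grouping the other way gives a different set: A - (B - C).
    const CharSet rightGrouped = difference(a, difference(b, c));

    assert((leftToRight == CharSet{'a'}));
    assert((rightGrouped == CharSet{'a', 'c'}));
}
```

To get the second grouping in a pattern, the inner difference has to be written as an explicit nested set, as in "[[pat1]-[[pat2]-[pat3]]]".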

Warning: you cannot add an empty string ("") to a UnicodeSet.

Formal syntax

pattern :=  ('[' '^'? item* ']') | property
item :=  char | (char '-' char) | pattern-expr
pattern-expr :=  pattern | pattern-expr pattern | pattern-expr op pattern
op :=  '&' | '-'
special :=  '[' | ']' | '-'
char :=  any character that is not special
| ('\' any character)
| ('\u' hex hex hex hex)
hex :=  any character for which Character.digit(c, 16) returns a non-negative result
property :=  a Unicode property set pattern

Legend:
a := b   a may be replaced by b
a? zero or one instance of a
a* zero or more instances of a
a | b either a or b
'a' the literal string between the quotes

Author:
Alan Liu
Stable:
ICU 2.0

Definition at line 272 of file uniset.h.


Member Enumeration Documentation

anonymous enum

Enumerator:
MIN_VALUE  Minimum value that can be stored in a UnicodeSet.

Stable:
ICU 2.4
MAX_VALUE  Maximum value that can be stored in a UnicodeSet.

Stable:
ICU 2.4

Definition at line 332 of file uniset.h.


Constructor & Destructor Documentation

UnicodeSet::UnicodeSet (  ) 

Constructs an empty set.

Stable:
ICU 2.0

UnicodeSet::UnicodeSet ( UChar32  start,
UChar32  end 
)

Constructs a set containing the given range.

If start > end then an empty set is created.

Parameters:
start first character, inclusive, of range
end last character, inclusive, of range
Stable:
ICU 2.4

UnicodeSet::UnicodeSet ( const UnicodeString & pattern,
UErrorCode & status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Stable:
ICU 2.0

UnicodeSet::UnicodeSet ( const UnicodeString & pattern,
uint32_t  options,
const SymbolTable * symbols,
UErrorCode & status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-in characters to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Internal:
Do not use. This API is for internal use only.

UnicodeSet::UnicodeSet ( const UnicodeString & pattern,
ParsePosition & pos,
uint32_t  options,
const SymbolTable * symbols,
UErrorCode & status 
)

Constructs a set from the given pattern.

See the class description for the syntax of the pattern language.

Parameters:
pattern a string specifying what characters are in the set
pos on input, the position in pattern at which to start parsing. On output, the position after the last character parsed.
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-in characters to UnicodeSets; may be NULL
status input-output error code
Stable:
ICU 2.8

UnicodeSet::UnicodeSet ( const UnicodeSet & o  ) 

Constructs a set that is identical to the given UnicodeSet.

Stable:
ICU 2.0

virtual UnicodeSet::~UnicodeSet (  )  [virtual]

Destructs the set.

Stable:
ICU 2.0


Member Function Documentation

UBool UnicodeSet::isBogus ( void   )  const [inline]

Determine if this object contains a valid set.

A bogus set has no value. It is different from an empty set. It can be used to indicate that no set value is available.

Returns:
TRUE if the set is bogus (invalid), FALSE otherwise
See also:
setToBogus()
Draft:
This API may be changed in the future versions and was introduced in ICU 4.0

Definition at line 1560 of file uniset.h.

void UnicodeSet::setToBogus (  ) 

Make this UnicodeSet object invalid.

The set will test TRUE with isBogus().

A bogus set has no value. It is different from an empty set. It can be used to indicate that no set value is available.

This utility function is used throughout the UnicodeSet implementation to indicate that a UnicodeSet operation failed, and may be used in other functions, especially but not exclusively when such functions do not take a UErrorCode for simplicity.

See also:
isBogus()
Draft:
This API may be changed in the future versions and was introduced in ICU 4.0

UnicodeSet& UnicodeSet::operator= ( const UnicodeSet & o  ) 

Assigns this object to be a copy of another.

A frozen set will not be modified.

Stable:
ICU 2.0

virtual UBool UnicodeSet::operator== ( const UnicodeSet & o  )  const [virtual]

Compares the specified object with this set for equality.

Returns true if the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set).

Parameters:
o set to be compared for equality with this set.
Returns:
true if the specified set is equal to this set.
Stable:
ICU 2.0

Referenced by operator!=().

UBool UnicodeSet::operator!= ( const UnicodeSet & o  )  const [inline]

Compares the specified object with this set for inequality.

Returns true if the specified set is not equal to this set.

Stable:
ICU 2.0

Definition at line 1540 of file uniset.h.

References operator==().

virtual UnicodeFunctor* UnicodeSet::clone (  )  const [virtual]

Returns a copy of this object.

All UnicodeFunctor objects have to support cloning in order to allow classes using UnicodeFunctors, such as Transliterator, to implement cloning. If this set is frozen, then the clone will be frozen as well. Use cloneAsThawed() for a mutable clone of a frozen set.

See also:
cloneAsThawed
Stable:
ICU 2.0

Implements UnicodeFunctor.

virtual int32_t UnicodeSet::hashCode ( void   )  const [virtual]

Returns the hash code value for this set.

Returns:
the hash code value for this set.
See also:
Object::hashCode()
Stable:
ICU 2.0

UBool UnicodeSet::isFrozen (  )  const [inline]

Determines whether the set has been frozen (made immutable) or not.

See the ICU4J Freezable interface for details.

Returns:
TRUE/FALSE for whether the set has been frozen
See also:
freeze

cloneAsThawed

Stable:
ICU 3.8

Definition at line 1544 of file uniset.h.

References NULL.

UnicodeFunctor* UnicodeSet::freeze (  ) 

Freeze the set (make it immutable).

Once frozen, it cannot be unfrozen and is therefore thread-safe until it is deleted. See the ICU4J Freezable interface for details. Freezing the set may also make some operations faster, for example contains() and span(). A frozen set will not be modified. (It remains frozen.)

Returns:
this set.
See also:
isFrozen

cloneAsThawed

Stable:
ICU 3.8
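
The freeze(), isFrozen() and cloneAsThawed() contract can be pictured with a small model. This is a sketch of the freezable pattern as described here, not ICU's implementation: FreezableSet is a hypothetical class, and it assumes that a mutator called on a frozen object simply leaves the object unchanged.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the freezable pattern; not the ICU class.
class FreezableSet {
public:
    // Mutators leave a frozen set unchanged.
    FreezableSet& add(char32_t c) {
        if (!frozen_) members_.insert(c);
        return *this;
    }

    bool contains(char32_t c) const { return members_.count(c) != 0; }
    bool isFrozen() const { return frozen_; }

    // Freezing is one-way: there is no unfreeze operation.
    FreezableSet& freeze() {
        frozen_ = true;
        return *this;
    }

    // A plain copy keeps the frozen state; this copy is always mutable.
    FreezableSet cloneAsThawed() const {
        FreezableSet copy(*this);
        copy.frozen_ = false;
        return copy;
    }

private:
    std::set<char32_t> members_;
    bool frozen_ = false;
};
```

Because a frozen object never changes again, it can be read from several threads without locking, which is the property the description above relies on.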

UnicodeFunctor* UnicodeSet::cloneAsThawed (  )  const

Clone the set and make the clone mutable.

See the ICU4J Freezable interface for details.

Returns:
the mutable clone
See also:
freeze

isFrozen

Stable:
ICU 3.8

UnicodeSet& UnicodeSet::set ( UChar32  start,
UChar32  end 
)

Make this object represent the range start - end.

If start > end then this object is set to an empty range. A frozen set will not be modified.

Parameters:
start first character in the set, inclusive
end last character in the set, inclusive
Stable:
ICU 2.4

static UBool UnicodeSet::resemblesPattern ( const UnicodeString & pattern,
int32_t  pos 
) [static]

Return true if the given position, in the given pattern, appears to be the start of a UnicodeSet pattern.

Stable:
ICU 2.4

UnicodeSet& UnicodeSet::applyPattern ( const UnicodeString & pattern,
UErrorCode & status 
)

Modifies this set to represent the set specified by the given pattern, optionally ignoring white space.

See the class description for the syntax of the pattern language. A frozen set will not be modified.

Parameters:
pattern a string specifying what characters are in the set
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error. Empties the set passed before applying the pattern.
Returns:
a reference to this
Stable:
ICU 2.0

UnicodeSet& UnicodeSet::applyPattern ( const UnicodeString & pattern,
uint32_t  options,
const SymbolTable * symbols,
UErrorCode & status 
)

Modifies this set to represent the set specified by the given pattern, optionally ignoring white space.

See the class description for the syntax of the pattern language. A frozen set will not be modified.

Parameters:
pattern a string specifying what characters are in the set
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-ins to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error. Empties the set passed before applying the pattern.
Returns:
a reference to this
Internal:
Do not use. This API is for internal use only.

UnicodeSet& UnicodeSet::applyPattern ( const UnicodeString & pattern,
ParsePosition & pos,
uint32_t  options,
const SymbolTable * symbols,
UErrorCode & status 
)

Parses the given pattern, starting at the given position.

The character at pattern.charAt(pos.getIndex()) must be '[', or the parse fails. Parsing continues until the corresponding closing ']'. If a syntax error is encountered between the opening and closing brackets, the parse fails. Upon return from a successful parse, the ParsePosition is updated to point to the character following the closing ']', and a reference to this set is returned. This method calls itself recursively to parse embedded subpatterns. The set is emptied before the pattern is applied. A frozen set will not be modified.

Parameters:
pattern the string containing the pattern to be parsed. The portion of the string from pos.getIndex(), which must be a '[', to the corresponding closing ']', is parsed.
pos upon entry, the position at which to begin parsing. The character at pattern.charAt(pos.getIndex()) must be a '['. Upon return from a successful parse, pos.getIndex() is either the character after the closing ']' of the parsed pattern, or pattern.length() if the closing ']' is the last character of the pattern string.
options bitmask for options to apply to the pattern. Valid options are USET_IGNORE_SPACE and USET_CASE_INSENSITIVE.
symbols a symbol table mapping variable names to values and stand-ins to UnicodeSets; may be NULL
status returns U_ILLEGAL_ARGUMENT_ERROR if the pattern contains a syntax error.
Returns:
a reference to this
Stable:
ICU 2.8
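
The position contract of this overload (start on a '[', finish just past the matching ']') can be sketched without ICU. The helper below is hypothetical and only tracks bracket nesting and backslash escapes on a std::string; the real parser also validates and interprets what lies between the brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not an ICU function. On success, advances pos to the
// index just past the ']' that matches the '[' at pos and returns true.
// On failure, returns false and leaves pos unchanged.
bool skipBracketedPattern(const std::string& pattern, std::size_t& pos) {
    if (pos >= pattern.size() || pattern[pos] != '[') return false;
    int depth = 0;
    for (std::size_t i = pos; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\') {          // an escaped character never opens or closes a set
            ++i;
            continue;
        }
        if (c == '[') {
            ++depth;
        } else if (c == ']' && --depth == 0) {
            pos = i + 1;          // equals pattern.size() if ']' was the last character
            return true;
        }
    }
    return false;                 // ran out of input before the matching ']'
}
```

For the input "x[a[b-c]]y" with pos starting at 1, the helper stops at index 9, the 'y' that follows the outer closing bracket.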

virtual UnicodeString& UnicodeSet::toPattern ( UnicodeString & result,
UBool  escapeUnprintable = FALSE 
) const [virtual]

Returns a string representation of this set.

If the result of calling this function is passed to a UnicodeSet constructor, it will produce another set that is equal to this one. A frozen set will not be modified.

Parameters:
result the string to receive the rules. Previous contents will be deleted.
escapeUnprintable if TRUE then convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
Stable:
ICU 2.0

Implements UnicodeMatcher.
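
The escapeUnprintable rule stated above can be written down directly. This is a minimal sketch, not ICU's code: isUnprintable and escapeForPattern are hypothetical names, the printable case assumes the code point fits in one char, and the use of uppercase hex digits is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Per the parameter description: everything except U+000A and
// U+0020..U+007E counts as unprintable.
bool isUnprintable(char32_t c) {
    return !(c == 0x0A || (c >= 0x20 && c <= 0x7E));
}

// Hypothetical helper: returns the character itself if printable, otherwise
// \uXXXX for code points up to U+FFFF and \UXXXXXXXX above that.
std::string escapeForPattern(char32_t c) {
    if (!isUnprintable(c)) {
        return std::string(1, static_cast<char>(c));
    }
    char buf[16];
    if (c <= 0xFFFF) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
    } else {
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
    }
    return buf;
}
```

Under these assumptions U+4E01 is written as \u4E01 and U+1F600 as \U0001F600, while 'a' and the line feed U+000A pass through unchanged.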

UnicodeSet& UnicodeSet::applyIntPropertyValue ( UProperty  prop,
int32_t  value,
UErrorCode & ec 
)

Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by u_getIntPropertyValue.

Prior contents of this set are lost. A frozen set will not be modified.

Parameters:
prop a property in the range UCHAR_BIN_START..UCHAR_BIN_LIMIT-1 or UCHAR_INT_START..UCHAR_INT_LIMIT-1 or UCHAR_MASK_START..UCHAR_MASK_LIMIT-1.
value a value in the range u_getIntPropertyMinValue(prop).. u_getIntPropertyMaxValue(prop), with one exception. If prop is UCHAR_GENERAL_CATEGORY_MASK, then value should not be a UCharCategory, but rather a mask value produced by U_GET_GC_MASK(). This allows grouped categories such as [:L:] to be represented.
ec error code input/output parameter
Returns:
a reference to this set
Stable:
ICU 2.4

UnicodeSet& UnicodeSet::applyPropertyAlias ( const UnicodeString & prop,
const UnicodeString & value,
UErrorCode & ec 
)

Modifies this set to contain those code points which have the given value for the given property.

Prior contents of this set are lost. A frozen set will not be modified.

Parameters:
prop a property alias, either short or long. The name is matched loosely. See PropertyAliases.txt for names and a description of loose matching. If the value string is empty, then this string is interpreted as either a General_Category value alias, a Script value alias, a binary property alias, or a special ID. Special IDs are matched loosely and correspond to the following sets:
"ANY" = [\u0000-\U0010FFFF], "ASCII" = [\u0000-\u007F], "Assigned" = [:^Cn:].
value a value alias, either short or long. The name is matched loosely. See PropertyValueAliases.txt for names and a description of loose matching. In addition to aliases listed, numeric values and canonical combining classes may be expressed numerically, e.g., ("nv", "0.5") or ("ccc", "220"). The value string may also be empty.
ec error code input/output parameter
Returns:
a reference to this set
Stable:
ICU 2.4

virtual int32_t UnicodeSet::size ( void   )  const [virtual]

Returns the number of elements in this set (its cardinality).

Note that the elements of a set may include both individual codepoints and strings.

Returns:
the number of elements in this set (its cardinality).
Stable:
ICU 2.0

virtual UBool UnicodeSet::isEmpty ( void   )  const [virtual]

Returns true if this set contains no elements.

Returns:
true if this set contains no elements.
Stable:
ICU 2.0

virtual UBool UnicodeSet::contains ( UChar32  c  )  const [virtual]

Returns true if this set contains the given character.

This function works faster with a frozen set.

Parameters:
c character to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.0

Implements UnicodeFilter.

virtual UBool UnicodeSet::contains ( UChar32  start,
UChar32  end 
) const [virtual]

Returns true if this set contains every character of the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the test condition is met
Stable:
ICU 2.0

UBool UnicodeSet::contains ( const UnicodeString & s  )  const

Returns true if this set contains the given multicharacter string.

Parameters:
s string to be checked for containment
Returns:
true if this set contains the specified string
Stable:
ICU 2.4

virtual UBool UnicodeSet::containsAll ( const UnicodeSet & c  )  const [virtual]

Returns true if this set contains all the characters and strings of the given set.

Parameters:
c set to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4

UBool UnicodeSet::containsAll ( const UnicodeString & s  )  const

Returns true if this set contains all the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4

UBool UnicodeSet::containsNone ( UChar32  start,
UChar32  end 
) const

Returns true if this set contains none of the characters of the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the test condition is met
Stable:
ICU 2.4

Referenced by containsSome().

UBool UnicodeSet::containsNone ( const UnicodeSet & c  )  const

Returns true if this set contains none of the characters and strings of the given set.

Parameters:
c set to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4

UBool UnicodeSet::containsNone ( const UnicodeString & s  )  const

Returns true if this set contains none of the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the test condition is met
Stable:
ICU 2.4

UBool UnicodeSet::containsSome ( UChar32  start,
UChar32  end 
) const [inline]

Returns true if this set contains one or more of the characters in the given range.

Parameters:
start first character, inclusive, of the range
end last character, inclusive, of the range
Returns:
true if the condition is met
Stable:
ICU 2.4

Definition at line 1548 of file uniset.h.

References containsNone().

UBool UnicodeSet::containsSome ( const UnicodeSet & s  )  const [inline]

Returns true if this set contains one or more of the characters and strings of the given set.

Parameters:
s The set to be checked for containment
Returns:
true if the condition is met
Stable:
ICU 2.4

Definition at line 1552 of file uniset.h.

References containsNone().

UBool UnicodeSet::containsSome ( const UnicodeString & s  )  const [inline]

Returns true if this set contains one or more of the characters of the given string.

Parameters:
s string containing characters to be checked for containment
Returns:
true if the condition is met
Stable:
ICU 2.4

Definition at line 1556 of file uniset.h.

References containsNone().

int32_t UnicodeSet::span ( const UChar *  s,
int32_t  length,
USetSpanCondition  spanCondition 
) const

Returns the length of the initial substring of the input string which consists only of characters and strings that are contained in this set (USET_SPAN_CONTAINED, USET_SPAN_SIMPLE), or only of characters and strings that are not contained in this set (USET_SPAN_NOT_CONTAINED).

See USetSpanCondition for details. Similar to the strspn() C library function. Unpaired surrogates are treated according to contains() of their surrogate code points. This function works faster with a frozen set and with a non-negative string length argument.
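
The strspn()-like behaviour can be sketched for the simplest case. The model below is not ICU code: SpanMode and spanPrefix are hypothetical names, the set holds only single code points (no multicharacter strings), and the input is a std::u32string, so surrogate handling does not arise.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical two-valued stand-in for the span condition.
enum class SpanMode { NotContained, Contained };

// Returns the length of the longest prefix of s whose characters are all
// members of the set (Contained) or all non-members (NotContained).
std::size_t spanPrefix(const std::u32string& s,
                       const std::set<char32_t>& members,
                       SpanMode mode) {
    const bool wantMember = (mode == SpanMode::Contained);
    std::size_t i = 0;
    while (i < s.size() && (members.count(s[i]) != 0) == wantMember) {
        ++i;
    }
    return i;
}
```

A common use of such a pair of modes is to alternate them: span the non-members to skip to the next run of interest, then span the members to measure that run.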

Parameters:
s start of the string
length of the string; can be -1 for NUL-terminated
spanCondition specif