Locales and Code Pages Stories
Locale categories are manifest constants used by the localization routines to specify which portion of a program's locale information will be used. The following table lists the locale categories.
Locale category | Parts of program affected |
LC_ALL | All locale-specific behavior (all categories). |
LC_COLLATE | Behavior of strcoll() and strxfrm() functions. |
LC_CTYPE | Behavior of character-handling functions (except isdigit(), isxdigit(), mbstowcs(), and mbtowc(), which are unaffected). |
LC_MAX | Same as LC_TIME. |
LC_MIN | Same as LC_ALL. |
LC_MONETARY | Monetary formatting information returned by the localeconv() function. |
LC_NUMERIC | Decimal-point character for formatted output routines (for example, printf()), data conversion routines, and non-monetary formatting information returned by the localeconv() function. |
LC_TIME | Behavior of strftime() function. |
Table 7: Locale category. |
The following table lists other functions and constants that depend on the locale.
Function/constant | Use | setlocale() category dependence |
atof(), atoi(), atol() | Convert character string to floating-point, integer, or long integer value, respectively. | LC_NUMERIC |
is, isw routines | Test given integer for particular condition. The functions include: isalnum(), iswalnum(), isalpha(), iswalpha(), __isascii(), iswascii(), _isatty(), iscntrl(), iswcntrl(), __iscsym(), __iscsymf(), isdigit(), iswdigit(), isgraph(), iswgraph(), isleadbyte(), islower(), iswlower(), isprint(), iswprint(), ispunct(), iswpunct(), isspace(), iswspace(), isupper(), iswupper(), iswctype(), isxdigit(), iswxdigit(). | LC_CTYPE |
isleadbyte() | Test for lead byte. | LC_CTYPE |
localeconv() | Read appropriate values for formatting numeric quantities. | LC_MONETARY, LC_NUMERIC |
MB_CUR_MAX | Maximum length in bytes of any multibyte character in current locale (macro defined in stdlib.h). | LC_CTYPE |
_mbccpy() | Copy one multibyte character. | LC_CTYPE [1] |
_mbclen() | Return length, in bytes, of given multibyte character. | LC_CTYPE [1] |
mblen() | Validate and return number of bytes in multibyte character. | LC_CTYPE [1] |
_mbstrlen() | For multibyte-character strings: validate each character in string; return string length. | LC_CTYPE [1] |
mbstowcs() | Convert sequence of multibyte characters to corresponding sequence of wide characters. | LC_CTYPE [1] |
mbtowc() | Convert multibyte character to corresponding wide character. | LC_CTYPE [1] |
printf() | Write formatted output. | LC_NUMERIC (determines radix character output) |
scanf() | Read formatted input. | LC_NUMERIC (determines radix character recognition) |
setlocale(), _wsetlocale() | Select locale for program. | Not applicable |
strcoll(), wcscoll() | Compare characters of two strings. | LC_COLLATE |
_stricmp(), _wcsicmp(), _mbsicmp() | Compare two strings without regard to case. | LC_CTYPE [1] |
_stricoll(), _wcsicoll() | Compare characters of two strings (case insensitive). | LC_COLLATE |
_strncoll(), _wcsncoll() | Compare first n characters of two strings. | LC_COLLATE |
_strnicmp(), _wcsnicmp(), _mbsnicmp() | Compare first n characters of two strings without regard to case. | LC_CTYPE [1] |
_strnicoll(), _wcsnicoll() | Compare first n characters of two strings (case insensitive). | LC_COLLATE |
strftime(), wcsftime() | Format date and time value according to supplied format argument. | LC_TIME |
_strlwr() | Convert, in place, each uppercase letter in given string to lowercase. | LC_CTYPE |
strtod(), wcstod(), strtol(), wcstol(), strtoul(), wcstoul() | Convert character string to double, long, or unsigned long value. | LC_NUMERIC (determines radix character recognition) |
_strupr() | Convert, in place, each lowercase letter in string to uppercase. | LC_CTYPE |
strxfrm(), wcsxfrm() | Transform string into collated form according to locale. | LC_COLLATE |
tolower(), towlower() | Convert given character to corresponding lowercase character. | LC_CTYPE |
toupper(), towupper() | Convert given character to corresponding uppercase letter. | LC_CTYPE |
wcstombs() | Convert sequence of wide characters to corresponding sequence of multibyte characters. | LC_CTYPE |
wctomb() | Convert wide character to corresponding multibyte character. | LC_CTYPE |
_wtoi(), _wtol() | Convert wide-character string to int or long. | LC_NUMERIC |
Table 8: Locale-dependent functions. |
[1] For multibyte routines, the multibyte code page must be equivalent to the locale set with setlocale(). Calling _setmbcp() with an argument of _MB_CP_LOCALE makes the multibyte code page the same as the setlocale() code page.
The setlocale() function is used to set, change, or query some or all of the current program locale information specified by locale and category.
Item | Description |
Function | setlocale(),_wsetlocale(). |
Use | To change or query some or all of the current program locale information. |
Prototype | char *setlocale(int category, const char *locale); wchar_t *_wsetlocale(int category, const wchar_t *locale); |
Parameters | category - Category affected by locale. locale - Locale name. |
Return value | If a valid locale and category are given, returns a pointer to the string associated with the specified locale and category. If the locale or category is invalid, returns a null pointer and the current locale settings of the program are not changed. For example, a call that passes LC_ALL together with a valid locale name sets all categories and returns only the string for that locale. A null pointer passed as locale is a special directive that tells setlocale() to query rather than set the international environment; after a sequence of calls that set individual categories, querying LC_ALL in this way returns the string associated with the LC_ALL category. You can use the string pointer returned by setlocale() in subsequent calls to restore that part of the program's locale information, assuming that your program does not alter the pointer or the string. Later calls to setlocale() overwrite the string; you can use _strdup() to save a specific locale string. |
Include file | <locale.h> or <wchar.h> for _wsetlocale(). |
Table 9: setlocale(),_wsetlocale() information. |
Category | Parts of program affected |
LC_ALL | All categories, as listed below. |
LC_COLLATE | The strcoll(), _stricoll(), wcscoll(), _wcsicoll(), strxfrm(), _strncoll(), _strnicoll(), _wcsncoll(), _wcsnicoll(), and wcsxfrm() functions. |
LC_CTYPE | The character-handling functions except isdigit(), isxdigit(), mbstowcs() and mbtowc(), which are unaffected. |
LC_MONETARY | Monetary-formatting information returned by the localeconv() function. |
LC_NUMERIC | Decimal-point character for the formatted output routines (such as printf()), for the data-conversion routines, and for the non-monetary-formatting information returned by localeconv(). In addition to the decimal-point character, LC_NUMERIC also sets the thousands separator and the grouping control string returned by localeconv(). |
LC_TIME | The strftime() and wcsftime() functions. |
Table 10: category macros. |
The locale argument is a pointer to a string that specifies the name of the locale. If locale points to an empty string, the locale is the implementation-defined native environment.
A value of "C" specifies the minimal ANSI conforming environment for C translation. The C locale assumes that all char data types are 1 byte and that their value is always less than 256.
The C locale is the only locale supported in Microsoft Visual C++ version 1.0 and earlier versions of Microsoft C/C++. At program startup, the equivalent of the following statement is executed:
setlocale(LC_ALL, "C");
The locale argument takes the following form:
locale :: "lang[_country_region[.code_page]]" | ".code_page" | "" | NULL
The set of available languages, country/region codes, and code pages includes all those supported by the Win32 NLS API. The language and country/region codes supported by setlocale() are listed in the MSDN topics Language Strings and Country/Region Strings.
If locale is a null pointer, setlocale() queries, rather than sets, the international environment, and returns a pointer to the string associated with the specified category. The program's current locale setting is not changed.
For example:
setlocale(LC_ALL, NULL);
This call returns the string associated with the LC_ALL category.
The following table lists examples that pertain to the LC_ALL category. Either of the strings ".OCP" and ".ACP" can be used in place of a code page number to specify use of the user-default OEM code page and user-default ANSI code page, respectively.
Examples |
setlocale( LC_ALL, "" ); Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system. |
setlocale( LC_ALL, ".OCP" ); Explicitly sets the locale to the current OEM code page obtained from the operating system. |
setlocale( LC_ALL, ".ACP" ); Sets the locale to the ANSI code page obtained from the operating system. |
setlocale( LC_ALL, "[lang_ctry]" ); Sets the locale to the language and country/region indicated, using the default code page obtained from the host operating system. |
setlocale( LC_ALL, "[lang_ctry.cp]" ); Sets the locale to the language, country/region, and code page indicated in the [lang_ctry.cp] string. You can use various combinations of language, country/region, and code page. For example:
setlocale( LC_ALL, "French_Canada.1252" );
// Set code page to French Canada ANSI default
setlocale( LC_ALL, "French_Canada.ACP" );
// Set code page to French Canada OEM default
setlocale( LC_ALL, "French_Canada.OCP" ); |
setlocale( LC_ALL, "[lang]" ); Sets the locale to the language indicated, using the default country/region for the language specified, and the user-default ANSI code page for that country/region as obtained from the host operating system. For example, the following two calls to setlocale() are functionally equivalent:
setlocale( LC_ALL, "English" );
setlocale( LC_ALL, "English_United States.1252" ); |
setlocale( LC_ALL, "[.code_page]" ); Sets the code page to the value indicated, using the default country/region and language (as defined by the host operating system) for the specified code page. The category must be either LC_ALL or LC_CTYPE to effect a change of code page. For example, if the default country/region and language of the host operating system are "United States" and "English," the following two calls to setlocale() are functionally equivalent:
setlocale( LC_ALL, ".1252" );
setlocale( LC_ALL, "English_United States.1252" ); |
Table 11: LC_ALL category examples. |
The following simple example demonstrates setlocale().
/* Sets the current locale to "italian", then "french", then back to "C"
   using setlocale(), printing the long date representation each time. */
#include <stdio.h>
#include <locale.h>
#include <time.h>

int main(void)
{
    time_t ltime;
    struct tm *testime;
    char locstr[100];

    /* set the locale to Italian */
    setlocale(LC_ALL, "italian");
    time(&ltime);
    testime = gmtime(&ltime);
    /* %#x is the long date representation, appropriate to the current locale
       (the # flag is a Microsoft-specific extension) */
    if(!strftime(locstr, sizeof locstr, "%#x", testime))
        printf("strftime failed!\n");
    else
        printf("In Italian locale, strftime returns \"%s\"\n", locstr);

    /* set the locale to French */
    setlocale(LC_ALL, "french");
    time(&ltime);
    testime = gmtime(&ltime);
    if(!strftime(locstr, sizeof locstr, "%#x", testime))
        printf("strftime failed!\n");
    else
        printf("In French locale, strftime returns \"%s\"\n", locstr);

    /* set the locale back to the default environment */
    setlocale(LC_ALL, "C");
    time(&ltime);
    testime = gmtime(&ltime);
    printf("Back to default...\n");
    if(!strftime(locstr, sizeof locstr, "%#x", testime))
        printf("strftime failed!\n");
    else
        printf("In 'C' locale, strftime returns \"%s\"\n", locstr);
    return 0;
}
A sample output:
In Italian locale, strftime returns "sabato 18 giugno 2005"
In French locale, strftime returns "samedi 18 juin 2005"
Back to default...
In 'C' locale, strftime returns "Saturday, June 18, 2005"
Press any key to continue
The sample output above was produced using VC++ 2005.
This section briefly explains the locale implementation in the Standard C++ Library.
See Part III, Standard Template Library, of this tutorial and the MSDN documentation for details on how to use this template-based library.
The <locale> header defines template classes and functions that C++ programs can use to encapsulate and manipulate different cultural conventions regarding the representation and formatting of numeric, monetary, and calendric data, including internationalization support for character classification and string collation.
Add the following include to use the locale functionality in C++.
#include <locale>
The functions available in <locale> are listed below.
Function | Description |
has_facet() | Tests if a particular facet is stored in a specified locale. |
isalnum() | Tests whether an element in a locale is an alphabetic or a numeric character. |
isalpha() | Tests whether an element in a locale is an alphabetic character. |
iscntrl() | Tests whether an element in a locale is a control character. |
isdigit() | Tests whether an element in a locale is a numeric character. |
isgraph() | Tests whether an element in a locale is an alphanumeric or punctuation character. |
islower() | Tests whether an element in a locale is lower case. |
isprint() | Tests whether an element in a locale is a printable character. |
ispunct() | Tests whether an element in a locale is a punctuation character. |
isspace() | Tests whether an element in a locale is a whitespace character. |
isupper() | Tests whether an element in a locale is upper case. |
isxdigit() | Tests whether an element in a locale is a character used to represent a hexadecimal number. |
tolower() | Converts a character to lower case. |
toupper() | Converts a character to upper case. |
use_facet() | Returns a reference to a facet of a specified type stored in a locale. |
Table 12: Standard C++ <locale> functions. |
The classes available in <locale> are listed below.
Class | Description |
codecvt | A template class that provides a facet used to convert between internal and external character encodings. |
codecvt_base | A base class for the codecvt class that is used to define an enumeration type referred to as result, used as the return type for the facet member functions to indicate the result of a conversion. |
codecvt_byname | A derived template class that describes an object that can serve as a codecvt facet of a given locale, enabling the retrieval of information specific to a cultural area concerning conversions. |
collate | A template class that provides a facet that handles string sorting conventions. |
collate_byname | A derived template class that describes an object that can serve as a collate facet of a given locale, enabling the retrieval of information specific to a cultural area concerning string sorting conventions. |
ctype | A template class that provides a facet that is used to classify characters, convert between upper- and lowercase, and convert between the native character set and the set used by the locale. |
ctype<char> | A class that is an explicit specialization of template class ctype<CharType> to type char, describing an object that can serve as a locale facet to characterize various properties of a character of type char. |
ctype_base | A base class for the ctype class that is used to define enumeration types used to classify or test characters either individually or within entire ranges. |
ctype_byname | A derived template class that describes an object that can serve as a ctype facet of a given locale, enabling the classification of characters and conversion of characters between case and native and locale-specified character sets. |
locale | A class that describes a locale object that encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. |
messages | A template class that describes an object that can serve as a locale facet to retrieve localized messages from a catalog of internationalized messages for a given locale. |
messages_base | A base class that describes an int type for the catalog of messages. |
messages_byname | A derived template class that describes an object that can serve as a messages facet of a given locale, enabling the retrieval of localized messages. |
money_base | A base class for the moneypunct class that defines the enumeration part and the structure pattern, which describe the layout of a monetary field. |
money_get | A template class that describes an object that can serve as a locale facet to control conversions of sequences of type CharType to monetary values. |
money_put | A template class that describes an object that can serve as a locale facet to control conversions of monetary values to sequences of type CharType. |
moneypunct | A template class that describes an object that can serve as a locale facet to describe the sequences of type CharType used to represent a monetary input field or a monetary output field. |
moneypunct_byname | A derived template class that describes an object that can serve as a moneypunct facet of a given locale, enabling the formatting of monetary input or output fields. |
num_get | A template class that describes an object that can serve as a locale facet to control conversions of sequences of type CharType to numeric values. |
num_put | A template class that describes an object that can serve as a locale facet to control conversions of numeric values to sequences of type CharType. |
numpunct | A template class that describes an object that can serve as a locale facet to describe the sequences of type CharType used to represent information about the formatting and punctuation of numeric and Boolean expressions. |
numpunct_byname | A derived template class that describes an object that can serve as a numpunct facet of a given locale, enabling the formatting and punctuation of numeric and Boolean expressions. |
time_base | A class that serves as a base class for facets of template class time_get, defining just the enumerated type dateorder and several constants of this type. |
time_get | A template class that describes an object that can serve as a locale facet to control conversions of sequences of type CharType to time values. |
time_get_byname | A derived template class that describes an object that can serve as a locale facet of type time_get<CharType, InputIterator>. |
time_put | A template class that describes an object that can serve as a locale facet to control conversions of time values to sequences of type CharType. |
time_put_byname | A derived template class that describes an object that can serve as a locale facet of type time_put<CharType, OutputIterator>. |
Table 13: Standard C++ <locale> classes. |