wstring, wofstream, and encodings

"Jeffrey Walton" wrote in message
>I'm attempting to write a wstring to a file by way of wofstream. I'm
>getting compression on the stream (I presume it is UTF-8). How/where
>do I invoke an alternate constructor so that the stream stays wide?
>I suspect that it is hidden in a locale, but I don't have much
>experience with them. I also have not been able to locate it in
>Stroustrup: Appendix D: Locales [1]. [1] does state the following, but
>I do not have section 21.7: "Section 21.7 describes how to change
>locale for a stream; this appendix describes how a locale is
>constructed out of facets and explains the mechanisms through which a
>locale affects its stream."

From my understanding of iostream, locales will not be the answer.

Locales apply to the upper layer of the iostream, which takes
care of converting values to characters. They affect the
choice of the characters used to represent a value, but not
the encoding of these characters.
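To illustrate the first half of that statement, here is a minimal sketch of a locale facet changing which characters represent a value. The names CommaDecimal and format_with_comma are made up for this example; std::numpunct, std::locale and imbue() are standard.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: uses ',' as the decimal point.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats a double through a stream whose locale carries the facet above.
std::string format_with_comma(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(out.getloc(), new CommaDecimal));
    out << value;
    assert(out.good());  // formatting into a string stream should not fail
    return out.str();
}
```

With this facet, format_with_comma(3.5) yields "3,5" instead of "3.5": the value is the same, only the characters chosen to represent it differ.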

The internal filebuf or basic_filebuf is the object that
determines how the in-memory characters are written to a file.
This is the layer (the stream *buffer*) that can define whether
a file is written using UTF-8 or another character encoding.

However, the C++ standard does not specify an interface for
selecting which character encoding (w)filebuf uses.

Your best bet would be to ask your question on a platform-
specific forum, related to the library implementation you use.
A specific wfilebuf (/basic_filebuf) implementation may
allow you to specify the file's encoding.
Or maybe this is configurable at the OS or C library level.
Worst case, you can still write your own streambuf
layer that writes files in the specific encoding you want.
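As a rough sketch of that last option, here is a wide stream buffer that encodes each wchar_t as UTF-16LE bytes and forwards them to a narrow stream buffer. The class name Utf16LeWriteBuf and the helper to_utf16le are made up for this example, and the choice of UTF-16LE is an assumption about what "staying wide" should mean on disk. It writes no byte order mark and passes lone surrogates through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical class: a write-only wide streambuf that emits UTF-16LE bytes
// into a narrow sink (for a file, a std::filebuf opened in binary mode).
class Utf16LeWriteBuf : public std::wstreambuf {
public:
    explicit Utf16LeWriteBuf(std::streambuf* sink) : sink_(sink) {
        assert(sink_ != nullptr);
    }

protected:
    // No put area is set up, so every character written arrives here.
    int_type overflow(int_type ch = traits_type::eof()) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        std::uint32_t cp =
            static_cast<std::uint32_t>(traits_type::to_char_type(ch));
        bool ok;
        if (cp >= 0x10000) {  // only reachable where wchar_t is 32 bits wide
            cp -= 0x10000;
            ok = put_unit(0xD800 + (cp >> 10)) &&
                 put_unit(0xDC00 + (cp & 0x3FF));
        } else {
            ok = put_unit(cp);
        }
        return ok ? ch : traits_type::eof();
    }

    int sync() override { return sink_->pubsync(); }

private:
    // Writes one 16-bit code unit, low byte first.
    bool put_unit(std::uint32_t unit) {
        using narrow = std::char_traits<char>;
        const char lo = static_cast<char>(unit & 0xFF);
        const char hi = static_cast<char>((unit >> 8) & 0xFF);
        return !narrow::eq_int_type(sink_->sputc(lo), narrow::eof()) &&
               !narrow::eq_int_type(sink_->sputc(hi), narrow::eof());
    }

    std::streambuf* sink_;
};

// Demonstration helper: encodes a wide string into an in-memory byte buffer.
std::string to_utf16le(const std::wstring& text) {
    std::stringbuf bytes;
    Utf16LeWriteBuf wide(&bytes);
    std::wostream out(&wide);
    out << text;
    out.flush();
    return bytes.str();
}
```

To write to a file instead, the same buffer could wrap a std::filebuf opened with std::ios_base::out | std::ios_base::binary, with a std::wostream constructed on top of it in place of the wofstream.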

I hope this helps...

Posted On: Wednesday 7th of November 2012 11:42:58 AM
