Thanks for the picture to clarify.
This is what I get with Lister under German Win 8.1 (same settings as yours, except the font, which it didn't let me change):
- lister.png (10.18 KiB)
The major goal is to have a reproducible and predictable system, and character sets and code pages have always been a very muddy topic, so let me elaborate.
So far I have found only one reference on the web that shows the special symbols (such as smileys and dingbats) of the DOS/OEM code pages for Windows CP1252:
http://beacon.chebucto.ca/Back/A9908/img/vga_1252.gif
All other references do not show any glyphs for the control characters.
If you draw the control characters, too, even if they are declared by Windows as non-printable, you get this:
- Windows-1251.png (55.82 KiB)
I checked an old Win95 virtual machine: notepad showed only box glyphs for the control characters for all fonts, and the command line always used a DOS/OEM codepage (and you cannot switch to cp1252 as you can in modern Windows).
From all this information I guess that Windows-1251 (or Windows-1252 on a Western Windows) is used, but the control characters are displayed as in a DOS/OEM codepage.
So actually it is a mix of two codepages.
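To make the suspected mix concrete, here is a minimal Python sketch of such a hybrid decoding. It is only my guess at what the screenshot shows, not Lister's actual implementation: bytes 1-31 are mapped to the glyphs traditionally drawn for them in DOS/OEM code page 437, and every other byte goes through the ANSI code page. The names OEM_CONTROL_GLYPHS and decode_mixed are made up for this example.

```python
# Glyphs traditionally drawn for bytes 0x01..0x1F in DOS/OEM code page 437
# (smileys, card suits, arrows, ...), given as their Unicode code points.
OEM_CONTROL_GLYPHS = (
    "\u263A\u263B\u2665\u2666\u2663\u2660\u2022\u25D8"  # 0x01..0x08
    "\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C\u25BA"  # 0x09..0x10
    "\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8\u2191"  # 0x11..0x18
    "\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC"        # 0x19..0x1F
)


def decode_mixed(data: bytes, ansi_codepage: str = "cp1251") -> str:
    """Hypothetical hybrid: OEM glyphs for 0x01..0x1F, ANSI code page otherwise."""
    out = []
    for b in data:
        if 0x01 <= b <= 0x1F:
            # Control byte: substitute the OEM-style symbol.
            out.append(OEM_CONTROL_GLYPHS[b - 1])
        else:
            # Everything else: decode with the ANSI code page; bytes that
            # are undefined there become U+FFFD instead of raising.
            out.append(bytes([b]).decode(ansi_codepage, errors="replace"))
    return "".join(out)
```

Calling decode_mixed(b"\x01\x02") gives the two smiley symbols, while Cyrillic bytes such as 0xCF still come out as Windows-1251 letters, which is exactly the kind of two-in-one output I mean.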
None of my tests with Windows 8.1 cmd.exe show such a mixed output, neither with the DOS code page nor with cp1252, and neither did Win95, as mentioned above.
So combining the two code pages and drawing as shown in the screenshot seems to be pretty non-standard.
Regarding fonts having their own encoding, yes, that's true. For example "Terminal" is encoded as "OEM/DOS".
If you select that font, it will render all as in "OEM/DOS" encoding, instead of the one selected in HxD.
The disadvantage is that when you try to copy a string and paste it into an edit field (such as the one for the search function), it will look different from what you see displayed in the hex editor, because the Terminal font has the wrong encoding. So I would actually have to enforce the proper encoding to make sure everything works consistently.
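The mismatch is easy to reproduce outside of any editor. The following Python sketch decodes the same bytes once with the selected encoding (Windows-1252) and once with an OEM code page, which is roughly what an OEM-encoded font ends up drawing. I assume cp437 as the OEM code page here; on a German system it would more likely be cp850. The variable names are just for illustration.

```python
# Three bytes that spell "ÄÖÜ" in Windows-1252.
raw = bytes([0xC4, 0xD6, 0xDC])

# What the encoding selected in the editor says these bytes mean.
as_selected = raw.decode("cp1252")

# What an OEM-encoded font would effectively draw (cp437 is an assumption).
as_oem_font = raw.decode("cp437")

# The two interpretations disagree, so text copied from the view and pasted
# into a normal edit field would not match what was displayed.
mismatch = as_selected != as_oem_font
```

Here mismatch is True: the pasted text would be the umlauts, while the font would have shown box-drawing characters.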
But this is not the case for Courier New. It is an OpenType font and is encoded as Unicode, and should render Windows-1252 as defined.
In conclusion, I think the system as it is is correct. Rendering the control characters only when MultiByteToWideChar() with the MB_USEGLYPHCHARS option returns printable characters, as defined by GetStringTypeW(), will produce reliable results and create strings that can be copied and pasted properly.
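As a rough illustration of that rule (not the actual HxD code), here is a Python sketch. Python's codecs stand in for MultiByteToWideChar() and str.isprintable() stands in for GetStringTypeW(); both are only approximations, and MB_USEGLYPHCHARS has no equivalent here. The names render_byte and render and the "." placeholder are assumptions made for the example.

```python
def render_byte(b: int, codepage: str = "cp1252", placeholder: str = ".") -> str:
    """Return the character to draw for one byte, or a placeholder."""
    try:
        ch = bytes([b]).decode(codepage)
    except UnicodeDecodeError:
        # Byte is not defined in this code page at all.
        return placeholder
    # Draw only what is classified as printable; control characters and
    # similar code points are replaced by the placeholder.
    return ch if ch.isprintable() else placeholder


def render(data: bytes, codepage: str = "cp1252") -> str:
    """Render a byte sequence for the text column of a hex view."""
    return "".join(render_byte(b, codepage) for b in data)
```

With this, render(b"A\x00\x1fB") yields "A..B", and whatever is displayed is also what ends up on the clipboard.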
Mixing code pages/charsets also does not seem to be a good idea, because it gives surprising results, and more importantly, MultiByteToWideChar does not support a reliable translation for these bytes.
MultiByteToWideChar just leaves them untouched (same byte values 0-31) and does not translate them to their corresponding Unicode codepoints (which exist). So you get a rendering that has nothing to do with the string data you provide, and therefore, it is reasonable to exclude these characters from rendering when using ANSI charsets. The result is basically random, as can be seen comparing our outputs (I also used the Russian code page).
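The same behaviour can be observed with Python's code page tables, which I use here only as an analogue for MultiByteToWideChar(): bytes 0-31 come out as the control code points U+0000 to U+001F, and the symbol code points that do exist in Unicode (such as U+263A for the smiley) cannot be encoded back into the ANSI code page. The helper names are made up for this sketch.

```python
def control_bytes_pass_through(codepage: str) -> bool:
    """True if bytes 0..31 decode to the code points U+0000..U+001F."""
    decoded = bytes(range(32)).decode(codepage)
    return [ord(c) for c in decoded] == list(range(32))


def can_encode(ch: str, codepage: str) -> bool:
    """True if the character has a byte representation in the code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Control bytes are left untouched by both ANSI code pages.
untouched_1252 = control_bytes_pass_through("cp1252")
untouched_1251 = control_bytes_pass_through("cp1251")

# The smiley exists in Unicode, but Windows-1252 has no byte for it,
# so a string containing it could not be round-tripped.
smiley_in_1252 = can_encode("\u263A", "cp1252")
```

So there is no byte in the ANSI code page that "means" the smiley; drawing it for byte 1 is purely a display convention.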
What needs to be changed is dealing with fonts that do not support the requested encoding, possibly excluding them from the list of selectable fonts, or changing their encoding if possible (Windows will substitute missing glyphs).
Some more relevant links:
All font encodings map to a known codepage, see here:
https://msdn.microsoft.com/en-us/library/cc194829.aspx
About differences between Unicode and ANSI versions of TextOut functions when fonts have special code pages:
http://stackoverflow.com/questions/2138 ... rset-displ
TextOutA does, however, print the same glyphs in our case. So it probably depends on which Windows version you use and possibly on other locale-specific things beyond the code pages (of the font or the string).
For completeness, some screenshots with different code pages:
Under a German Windows and using Courier New (I guess that's the font you meant?) I get the following results, using the Cyrillic codepages I know, which are: Windows-1251, IBM855, cp866
If you draw all characters, including control characters (and additionally enable the MB_USEGLYPHCHARS option), this is what you get:
- Windows-1251.png (55.82 KiB)
- IBM855.png (56.48 KiB)
- cp866.png (56.12 KiB)
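To see how much the three Cyrillic code pages differ without the screenshots, here is a small Python sketch. It relies on Python's built-in cp1251, cp855 and cp866 codecs, which I assume match the Windows tables for the letters used; the function names are hypothetical.

```python
CYRILLIC_CODEPAGES = ("cp1251", "cp855", "cp866")


def encode_in_all(text: str) -> dict:
    """Encode the same text in each Cyrillic code page."""
    return {cp: text.encode(cp) for cp in CYRILLIC_CODEPAGES}


def cross_decode(raw: bytes) -> dict:
    """Decode the same bytes with each code page; undefined bytes become U+FFFD."""
    return {cp: raw.decode(cp, errors="replace") for cp in CYRILLIC_CODEPAGES}


word = "Привет"
encoded = encode_in_all(word)           # three different byte strings
views = cross_decode(encoded["cp1251"])  # only the cp1251 view reads correctly
```

The same word yields three different byte sequences, and bytes written in one code page only read correctly when decoded with that same code page.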