Unicode console I/O in Haskell on Windows

It seems to be rather difficult to get console I/O to work with Unicode characters in Haskell under Windows. Here is the tale of woe:

(Preliminary.) Before you even consider doing Unicode I/O in the console under Windows, you need to make sure that you are using a console font which can render the characters you want. The raster fonts (the default) have very poor coverage (and don't allow copy/pasting of characters they can't represent), and the TrueType options Microsoft provides (Consolas, Lucida Console) have mediocre coverage (though these do allow copy/pasting of characters they cannot represent). You might consider installing DejaVu Sans Mono (follow the instructions at the bottom here; you may have to reboot before it works). Until this is sorted, no apps will be able to do much Unicode I/O, not just Haskell.

Having done this, you will notice that some apps are able to do Unicode console I/O under Windows. But getting it to work remains quite complicated. There are basically two ways to write to the console under Windows. (What follows is true for any language, not just Haskell; don't worry, Haskell will enter the picture in a bit!)

Option A is to use the usual C-library-style byte-based I/O functions; the hope is that the OS will interpret these bytes according to some encoding which can encode all the weird and wonderful characters you want. For instance, using the equivalent technique on Mac OS X, where the standard system encoding is usually UTF-8, this works great: you send out UTF-8 output, you see pretty symbols. On Windows, it works less well. The default encoding that Windows expects will generally not be an encoding covering all the Unicode symbols. So if you want to see pretty symbols this way, one way or another, you need to change the encoding. One possibility would be for your program to use the SetConsoleCP Win32 function. (So then you need to bind to the Win32 library.) Or, if you'd rather not do that, you can expect your program's user to change the code page for you (they would then have to run the chcp command before they run your program).

Option B is to use the Unicode-aware Win32 console API functions like WriteConsoleW. Here you send UTF-16 directly to Windows, which renders it happily: there's no danger of an encoding mismatch because Windows always expects UTF-16 with these functions.
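
As a rough illustration of what Option B might look like from Haskell, here is a minimal sketch that binds WriteConsoleW through the FFI. It is not taken from any existing library: the names toUtf16 and writeConsoleUnicode are made up for this example, and the ccall convention assumes a 64-bit GHC on Windows (a 32-bit build would need stdcall). On other platforms it simply falls back to putStr so the file still compiles.

```haskell
{-# LANGUAGE CPP #-}
-- Sketch of Option B: hand UTF-16 code units straight to WriteConsoleW.
-- toUtf16 and writeConsoleUnicode are hypothetical names for this example.

import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)
#if defined(mingw32_HOST_OS)
import Data.Int (Int32)
import Data.Word (Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (withArrayLen)
import Foreign.Ptr (Ptr, nullPtr)

-- Assumption: 64-bit Windows, where ccall is the right calling convention.
foreign import ccall "windows.h GetStdHandle"
  c_GetStdHandle :: Word32 -> IO (Ptr ())

foreign import ccall "windows.h WriteConsoleW"
  c_WriteConsoleW :: Ptr () -> Ptr Word16 -> Word32 -> Ptr Word32 -> Ptr () -> IO Int32
#endif

-- | Encode a String as UTF-16 code units, using surrogate pairs for
-- code points above U+FFFF.
toUtf16 :: String -> [Word16]
toUtf16 = concatMap encode
  where
    encode c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- | Write a String to the console; returns False if the Win32 call fails
-- (for example when stdout is redirected and is not a console handle).
writeConsoleUnicode :: String -> IO Bool
#if defined(mingw32_HOST_OS)
writeConsoleUnicode s = do
  h <- c_GetStdHandle 0xFFFFFFF5   -- STD_OUTPUT_HANDLE, i.e. (DWORD)(-11)
  withArrayLen (toUtf16 s) $ \len buf ->
    alloca $ \written -> do
      ok <- c_WriteConsoleW h buf (fromIntegral len) written nullPtr
      return (ok /= 0)
#else
-- Fallback for non-Windows hosts: rely on the normal handle encoding.
writeConsoleUnicode s = putStr s >> return True
#endif
```

To try it, add a `main` that calls something like `writeConsoleUnicode "λ → ∀\n"`. Because WriteConsoleW only works on real console handles, a program using this approach would presumably want to fall back to ordinary handle I/O when it returns False.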

Unfortunately, neither of these options works very well from Haskell. First, there are no libraries that I know of that use Option B, so that route is not very easy. This leaves Option A. If you use Haskell's I/O library (putStrLn and so on), this is what the library will do: in modern versions of Haskell, it will carefully ask Windows what the current code page is, and output your strings in the proper encoding. There are two problems with this approach:

The first is not a showstopper, but is annoying. As mentioned above, the default encoding will almost never encode the characters you want: you or the user need to change to an encoding which does. Thus your user needs to run chcp 65001 before they run your program (you may find it distasteful to force your users to do this). Or you need to bind to SetConsoleCP and do the equivalent inside your program (and then use hSetEncoding so that the Haskell libraries will send output using the new encoding), which means you need to wrap the relevant part of the Win32 libraries to make them Haskell-visible.

Much more seriously, there is a bug in Windows (resolution: won't fix) which leads to a bug in Haskell: if you have selected any code page like cp65001 which can cover all of Unicode, Haskell's I/O routines will malfunction and fail. So essentially, even if you (or your user) set the encoding properly to some encoding which covers all the wonderful Unicode characters, and then 'do everything right' in telling Haskell to output things using that encoding, you still lose.
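
For concreteness, the in-program variant of Option A could be sketched roughly as follows: bind SetConsoleCP and SetConsoleOutputCP through the FFI, switch to code page 65001, then tell the Haskell handles to use UTF-8 via hSetEncoding. The helper names setConsoleCodePages and useUtf8Console are invented for this example, the ccall convention again assumes 64-bit Windows, and, given the bug described above, output may still misbehave even when every call here succeeds.

```haskell
{-# LANGUAGE CPP #-}
-- Sketch of Option A done inside the program rather than via chcp.
-- setConsoleCodePages and useUtf8Console are hypothetical helper names.

import Data.Word (Word32)
import System.IO
#if defined(mingw32_HOST_OS)
import Data.Int (Int32)

-- Assumption: 64-bit Windows, where ccall is the right calling convention.
foreign import ccall unsafe "windows.h SetConsoleCP"
  c_SetConsoleCP :: Word32 -> IO Int32

foreign import ccall unsafe "windows.h SetConsoleOutputCP"
  c_SetConsoleOutputCP :: Word32 -> IO Int32
#endif

-- | Set both the input and output console code pages.
-- Returns True only if both Win32 calls report success.
setConsoleCodePages :: Word32 -> IO Bool
#if defined(mingw32_HOST_OS)
setConsoleCodePages cp = do
  okIn  <- c_SetConsoleCP cp
  okOut <- c_SetConsoleOutputCP cp
  return (okIn /= 0 && okOut /= 0)
#else
-- Nothing to do on non-Windows hosts.
setConsoleCodePages _ = return True
#endif

-- | Switch the console to code page 65001 (UTF-8) and make the standard
-- handles encode and decode as UTF-8 to match.
useUtf8Console :: IO Bool
useUtf8Console = do
  ok <- setConsoleCodePages 65001
  hSetEncoding stdin utf8
  hSetEncoding stdout utf8
  return ok
```

The idea would be to call `useUtf8Console` at the start of `main`, before any putStrLn, and to check the returned Bool, since the code page calls can fail when no console is attached.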

The bug listed above is still unresolved and is listed as low priority; the basic conclusion is that Option A (in my classification above) is unworkable, and one needs to switch to Option B to get reliable results. It is not clear what the timeframe will be for this being resolved, as it looks like a considerable amount of work.

The question is: in the meantime, can anyone suggest a workaround to allow the use of Unicode console I/O in Haskell under Windows?

See also this Python bug tracker entry, grappling with the same problem in Python 3 (fix proposed, but not yet accepted into the codebase), and this Stack Overflow answer, giving a workaround for this problem in Python (based on 'Option B' in my classification).
