Twitter text compression
Rules
Your program must have two modes: encoding and decoding.
When encoding:
Your program must take as input some human readable Latin1 text, presumably English.
It doesn't matter if you ignore punctuation marks.
You only need to worry about actual English words, not L337.
Any accented letters may be converted to simple ASCII.
You may choose how you want to deal with numbers.
For example: 123, 1 2 3, one two three, or one hundred twenty three.
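The allowance for accented letters above could be handled with a small normalization step before compression. This is only a sketch: `fold_to_ascii` is a hypothetical helper name, and it assumes that silently dropping characters with no ASCII decomposition (such as "ß" or "ø") is acceptable for your entry.

```python
import unicodedata


def fold_to_ascii(text):
    """Strip accents from Latin1 text, keeping only ASCII characters.

    Hypothetical helper, not part of the challenge rules.
    """
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything non-ASCII removes the combining marks, and also any
    # character that has no ASCII decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(fold_to_ascii("caf\u00e9 na\u00efve r\u00e9sum\u00e9"))
```

A real entry might prefer a hand-written table so that letters without a decomposition map to something readable instead of disappearing.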
Your program must output a message which can be represented in 140 code points in the range U+0000–U+10FFFF.
Excluding non-characters:
U+FFFE, U+FFFF
U+nFFFE, U+nFFFF, where n is 1–10 hexadecimal
U+FDD0–U+FDEF
U+D800–U+DFFF (surrogate code points).
It may be output in any reasonable encoding of your choice;
any encoding supported by GNU iconv
will be considered reasonable,
and your platform native encoding or locale encoding would likely be a good choice.
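To make the output constraint concrete, here is a rough sketch of a validity check for an encoded message, plus an estimate of how much information 140 allowed code points can carry. The names `is_allowed`, `is_valid_message` and `capacity_bits` are made up for illustration; the ranges are the ones listed in the rules above.

```python
import math

MAX_CODE_POINTS = 140  # limit stated in the rules


def is_allowed(cp):
    """Return True if code point cp may appear in the encoded message."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogate code points
        return False
    if 0xFDD0 <= cp <= 0xFDEF:  # non-character block
        return False
    if (cp & 0xFFFF) in (0xFFFE, 0xFFFF):  # last two code points of each plane
        return False
    return True


def is_valid_message(message):
    """Check the length limit and that every code point is allowed."""
    # len() on a Python 3 str counts code points, not bytes.
    return (len(message) <= MAX_CODE_POINTS
            and all(is_allowed(ord(ch)) for ch in message))


def capacity_bits():
    """Upper bound on the information in one message, in bits."""
    allowed = sum(1 for cp in range(0x110000) if is_allowed(cp))
    return MAX_CODE_POINTS * math.log2(allowed)


if __name__ == "__main__":
    print(is_valid_message("hello"))
    print(round(capacity_bits()))
```

Each allowed code point carries a little over 20 bits, so a full message holds roughly 2,800 bits, or about 350 bytes. That is the budget a compressor has to fit the text into.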
When decoding:
Your program should take as input the output of your encoding mode.
The text output should be an approximation of the input text. The closer you can get to the original text, the better. It doesn't need to have any punctuation.
The output text should be readable by a human, again presumably English.
It can be L337, or lol.
The decoding process may have no access to any output of the encoding process other than the output specified above; that is, you can't upload the text somewhere and output the URL for the decoding process to download, or anything silly like that.
For the sake of consistency in user interface, your program must behave as follows:
Your program must be a script that can be set to executable on a platform with the appropriate interpreter, or a program that can be compiled into an executable.
Your program must take as its first argument either encode or decode to set the mode.
Your program must take input in at least one of the following ways:
Take input from standard in and produce output on standard out.
my-program encode <input.txt >output.utf
my-program decode <output.utf >output.txt
Take input from a file named in the second argument, and produce output in the file named in the third.
my-program encode input.txt output.utf
my-program decode output.utf output.txt
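For anyone starting an entry, the command-line interface described above might be wired up roughly like this. It is a skeleton only: `encode` and `decode` are placeholders (truncation is not compression), `run` is a hypothetical name, and the choice of Latin1 for the text side and UTF-8 for the message side is an assumption rather than something the rules require.

```python
import sys

TEXT_ENCODING = "latin-1"   # assumed encoding of the human readable text
MESSAGE_ENCODING = "utf-8"  # assumed encoding of the 140 code point message


def encode(text):
    # Placeholder: a real entry would compress text into <= 140 code points.
    return text[:140]


def decode(message):
    # Placeholder: a real entry would expand the message back into text.
    return message


def run(argv):
    """Dispatch on argv[1]; use files if two more arguments are given."""
    if len(argv) not in (2, 4) or argv[1] not in ("encode", "decode"):
        sys.stderr.write("usage: my-program encode|decode [INPUT OUTPUT]\n")
        return 2
    if argv[1] == "encode":
        transform, in_enc, out_enc = encode, TEXT_ENCODING, MESSAGE_ENCODING
    else:
        transform, in_enc, out_enc = decode, MESSAGE_ENCODING, TEXT_ENCODING
    # Read raw bytes so the encodings above are applied explicitly.
    if len(argv) == 4:
        with open(argv[2], "rb") as src:
            data = src.read()
    else:
        data = sys.stdin.buffer.read()
    result = transform(data.decode(in_enc)).encode(out_enc, errors="replace")
    if len(argv) == 4:
        with open(argv[3], "wb") as dst:
            dst.write(result)
    else:
        sys.stdout.buffer.write(result)
    return 0


if __name__ == "__main__":
    sys.exit(run(sys.argv))
```

This version supports both input styles, although the rules only ask for at least one of them.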
For your solution, please post:
Your code, in full, and/or a link to it hosted elsewhere
(if it's very long, or requires many files to compile, or something).
An explanation of how it works, if it's not immediately obvious from the code
or if the code is long and people will be interested in a summary.
An example text, with the original text, the text it compresses down to, and the decoded text.
If you are building on an idea that someone else had, please attribute them.
It's OK to try to do a refinement of someone else's idea, but you must attribute them.
The rules are a variation of the rules for the Twitter image encoding challenge.