[Bug-gs] RE : ASCII85-Decoding: Wrong Odds??
Danny Boelens
Danny.Boelens@pandora.be
Sun, 24 Feb 2002 03:23:38 +0100
Hi all,
I'm not following this mailing list (in fact I don't know much about gs and
ps at all), but by some coincidence I just read the thread
'ASCII85-Decoding: Wrong Odds ??' and I want to share some
thoughts about it...
The problem seems to be that "<~0eOS8De+-4@<-H~>" results in "1. Bodyparu".
I just felt this was an interesting problem and had a look at it. Now, after
a small investigation of ASCII85, I think I know what causes this behaviour,
and I think "1. Bodyparu" is the wrong output; it should be "1. Bodypart".
Therefore, any decoder that returns "1. Bodyparu" has a bug in my opinion.
I'll try to explain.
So let's decode that ASCII85 encoded data. The first 2 blocks of 5
bytes are no problem, and result in "1. Bodyp". The problem sits at
the end, in the partial block of 4 bytes. Let's decode this in some more
detail.
The 4 bytes represented by "@<-H" are 64 60 45 72 (DEC notation).
According to the ASCII85 spec, we subtract 33 from each byte, resulting in
31 27 12 39. Since we only have a partial block, we should add 5-n zeros,
where n is the number of bytes in the partial block.
So we add one zero, resulting in the 5-byte block 31 27 12 39 0 which we
can decode in the normal way. Let's do this.
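As a minimal sketch of this step in Python (the helper name `to_digits` is made up for illustration and is not taken from any decoder):

```python
def to_digits(chunk):
    """Map ASCII85 characters to base-85 digits by subtracting 33."""
    return [ord(c) - 33 for c in chunk]

partial = "@<-H"                      # the partial block from the example
digits = to_digits(partial)

# Pad with 5-n zero digits, as described above.
padded = digits + [0] * (5 - len(digits))

print([ord(c) for c in partial])      # [64, 60, 45, 72]
print(digits)                         # [31, 27, 12, 39]
print(padded)                         # [31, 27, 12, 39, 0]
```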
We should interpret these 5 numbers as base-85 digits, and convert them to
4 bytes, base-256. One can check that this results in the 4 base-256 digits
97 114 116 13.
Now comes the tricky part. Since we had a partial block with n bytes (in our
case n=4), we should only keep the first n-1 = 3 bytes, since the original
input of the ASCII85 encoder had only 3 bytes. Before encoding, the encoder
added a zero byte to these 3 bytes. As one can see, the fourth byte we get
after the base conversion is not zero but 13 !! We're about to ignore the 13
in this way and it looks like we're throwing away information ! However,
when the original input was 3 bytes, we _know_ the encoder added a zero byte
and encoded the resulting 4 bytes to 5 base-85 digits, of which it kept 4.
Most likely the fifth digit, the one that was lost, was not zero !! What we
know for sure is that it was a non-negative number, let's call it 'k',
smaller than 85 (0 <= k < 85), since it was a base-85 digit. So when we
added the zero before decoding, we made an error, since we shouldn't have
added zero, but 'k'.
Well, if you get this far, you have the solution at hand I think. Since
0 <= k < 85 holds, the error we made by adding a zero instead of 'k' is one
that makes the resulting number, 97 114 116 13 (interpreted in base-256),
smaller than it probably was originally. Remember, originally the fourth
digit (the 13 here) was zero. I assume most developers realize this, and
they check the last digit. When this one is zero, we have a perfect match
and we got our original data. When the last digit isn't 0, like our 13 here,
they reason that it should be 0 (because we know it was a zero originally)
and that the only way to get there is to compensate for the error made by
adding 0 instead of k (adding 0 was the only thing we could do, since the
digit k was thrown away while encoding, so we can't know its value). So they
make the number bigger and bigger, until the last digit becomes 0. Due to
the carry, the last but one digit increments. In our case : 97 114 116 13 ->
97 114 117 0 and there you have it. By using your ASCII table, you'll see
that 97 114 117 results in "aru"....
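The base conversion and the 'correction' described above can be sketched in Python as follows. Both function names are made up for illustration, and `round_up_last_byte` is only my guess at what such decoders do, not code taken from any of them:

```python
def base85_to_bytes(digits):
    """Convert 5 base-85 digits to 4 base-256 digits (bytes)."""
    value = 0
    for d in digits:
        value = value * 85 + d
    return list(value.to_bytes(4, "big"))

def round_up_last_byte(block):
    """Increase the number until its last byte is 0 (the suspected bug).

    Illustration only: this would overflow 4 bytes for 255 255 255 x.
    """
    value = int.from_bytes(bytes(block), "big")
    if block[3] != 0:
        value += 256 - block[3]
    return list(value.to_bytes(4, "big"))

raw = base85_to_bytes([31, 27, 12, 39, 0])
rounded = round_up_last_byte(raw)

print(raw)                                  # [97, 114, 116, 13]
print(rounded)                              # [97, 114, 117, 0]
print(bytes(rounded[:3]).decode("ascii"))   # aru
```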
This only explains why a lot of decoders get "1. Bodyparu". Why is this
wrong ? Well, if you paid attention, you'll remember that 0 <= k < 85, so
the maximum error we made by ignoring digit k is 84 (interpreted in
base-10). Adding 84 to 97 114 116 13 results in 97 114 116 97. The
original number was _not_ bigger than this, that's for sure. So it is
impossible that it was 97 114 117 0, the number we got after 'correcting'
the output of the base conversion step. Therefore, it must have been 97 114
116 0, which results in "art", and the total output after decoding becomes
"1. Bodypart"...
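The bound used in this argument can be checked with a few lines of Python. This is only a sketch for the n=4 case of the example; the variable names `low`, `high` and `corrected` are mine:

```python
digits = [31, 27, 12, 39]              # "@<-H" with 33 subtracted

# Smallest possible value: the lost digit k was 0.
low = 0
for d in digits + [0]:
    low = low * 85 + d

# Largest possible value: the lost digit k was 84.
high = low + 84

# The value produced by the 'correcting' decoders.
corrected = int.from_bytes(bytes([97, 114, 117, 0]), "big")

print(list(low.to_bytes(4, "big")))    # [97, 114, 116, 13]
print(list(high.to_bytes(4, "big")))   # [97, 114, 116, 97]
print(high < corrected)                # True, so 97 114 117 0 is impossible

# Whatever k was, the first three bytes are the same.
print(high.to_bytes(4, "big")[:3].decode("ascii"))   # art
```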
Who is wrong ? Well, I think both the encoder and the decoder ;-) First of
all, the encoder is wrong, because the encoded data is wrong. If you do it
correctly, 97 114 116 0 (base-256) converts to 31 27 12 38 72 (base-85, so
our 'k' was 72 here) and after dropping the last digit, 72, and adding 33 to
the remaining digits, we get the values 64 60 45 71, which are the DEC
representations of the ASCII characters @, <, - and G respectively. So the
correct encoded data should be "<~0eOS8De+-4@<-G~>". On the other hand the
decoder is wrong because it returns the wrong result. I know the encoded
data was wrong, but in this case it could have checked that "1. Bodyparu"
was impossible....
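The encoder side can be sketched the same way in Python. `bytes_to_base85` is a made-up helper name; the last two lines cross-check against Python's standard `base64` module, which is not the encoder or decoder discussed in the thread:

```python
import base64

def bytes_to_base85(block):
    """Convert 4 bytes to 5 base-85 digits, most significant first."""
    value = int.from_bytes(bytes(block), "big")
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        digits.append(digit)
    return digits[::-1]

digits = bytes_to_base85([97, 114, 116, 0])       # "art" plus the zero pad byte
encoded = "".join(chr(d + 33) for d in digits[:4])  # drop the last digit k

print(digits)    # [31, 27, 12, 38, 72]
print(encoded)   # @<-G

# Cross-check with the standard library (Adobe framing with <~ and ~>).
print(base64.a85encode(b"1. Bodypart", adobe=True))         # b'<~0eOS8De+-4@<-G~>'
print(base64.a85decode(b"<~0eOS8De+-4@<-H~>", adobe=True))  # b'1. Bodypart'
```

Note that the standard library decoder returns "1. Bodypart" even for the slightly wrong "@<-H" input, because it simply drops the last byte instead of rounding up.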
Just a few thoughts from me...
If you have any comment or question, you can e-mail me at
Danny.Boelens@pandora.be since I don't follow this mailing list ;-)
Kind regards,
Danny
/***************************************/
Proud member of the Dutch Power Cows
The #1 team in RC5-64, ECCp-109 and OGR
Join the #1 team in distributed computing too !
Resistance is futil, you will be assimilated !
www.dutchpowercows.org
/***************************************/