[Gs-devel] RE: refine MDRC patch
Igor V. Melichev
igor at artifex.com
Sat Apr 14 08:03:36 PDT 2001
Dear Mr. Toshiya Suzuki,
> From: mpsuzuki at hiroshima-u.ac.jp
> Sent: 14 ????? 2001 ?. 16:08
> To: igor at artifex.com
> Cc: gs-devel at ghostscript.com; mpsuzuki at hiroshima-u.ac.jp
> Subject: refine MDRC patch
> For first, I have to note my terminology.
Thank you for the explanation. It looks that I wrote my remark
without enough understanding. Your terminology is fine.
> I wrote: make 1-byte prefix, and rest is used as key.
> So, many "ranges" to specify single characters aslike
OK, now I see. At first time I understood your code wrongly.
> However, of course, I'm not sure if this is good solution.
> Do you have any good idea?
I think that better algorithm is based on different 'merge' criterion.
Currently the decision about decomposition into prefix/key is taken
without knowledge about other ranges. The heuristic "1-byte prefix"
works for 2-byte codes, but I am not sure about longer codes.
If I write this from scratch, I would do in reverse order.
First I would compare 2 neighbor ranges and find maximal common prefix.
Then I take all consecutive ranges, which correspond to the prefix,
and generate keys for them. This requires 1 range lookahead.
This algorithm is not optimal also. A better result may be obtained
if consider trees of prefixes. Actually there are 3 trees :
the source one consists of 1 root node and multiple "range" nodes;
The output one consists of 1 root node, some "prefix" nodes and
some "key pair" nodes. The third tree (I would say "natural tree")
represent finite automaton, which take 1 byte per step.
CMap is some compact code for the automaton. Output "merged"
ranges are another compact representation of the automaton.
The question is what is "optimal" output tree. I believe that
compilation theory has answer for it, and mainly it is given
in LEXX. But I believe that all this is too academic and overcomplicated
for our simple problem, so as simple lookahead would give acceptable result.
Possibly your algorithm gives acceptable result for most cases -
I just have insufficient knowledge about specific existing CMaps.
Of cause, I can easy construct one, on which your algorithm fails (it can
exceed 64K limitation for key string length, if CMap uses 3-byte codes),
but this is not the issue. The criterion is 3d party CMaps to run
effectively.
--------------------------------------------------------------------
> >You suppose that the range [<0101> <0201>] covers 2 CIDs.
> >Why not 256 ones ? Where did you take this knowledge from ?
>
> I did small experience by Display PostScript on Solaris 2.8.
> For first, I defined a "/Z" CMap including one range specification
>
> 1 begincidrange
> <2121> <7E21> 6064
> endcidrange
I still insufficiently understand about this. In the first message
you wrote :
> In most CMap, range operators for character definition
> (begincidrange, endcidrange) are smaller than 8bit width
> range. So the limitation is not problem. But some of
> CMap uses begincodespace/endcodespacerange as 16bit width
> range, e.g. EUC-H
>
> 3 begincodespacerange
> <00> <80>
> <8EA0> <8EDF>
> <A1A1> <FEFE>
> endcodespacerange
Yes, I agree that "codespacerange" may contain multi-dimensional
ranges. But I never saw "cidrange" or "bfrange" except your /Z example
(being constructed artificially, and tested on Solaris only).
Do you have any 3d party CMaps with multi-dimensional
"cidrange" or "bfrange" ?
For "codespacerange" we don't need to compute "relpos", right ?
I believe that "gs_multidim_calc_relpos" should be never called
for them : if a text string code don't fall into any "cidrange"
or "bfrange", we only need to return CID 0. Thus, if you cannot
represent 3d party CMap, containing multi-dimensional "cidrange" or
"bfrange",
the function gs_multidim_calc_relpos appears irrelevant and unuseful.
Next, the policy of Artifex is to follow Adobe specs,
and enhance our product with compatibility to other systems
when it is useful for users. Since Adobe specs don't explain
about multi-dimensional cidranges, your proposal looks as
an enhancement with a feature of Solaris. Are you sure that
this feature is compatible with other systems ?
For instance, what happens if you example runs on Adobe Photoshop ?
At this moment I don't see enough reasons for including
gs_multidim_calc_relpos into Ghostscript. To my opinion,
a better solution would be to check for "non-multi-dim"
in code_map_decode_next_mdrc, "case CODE_VALUE_CID".
If I missed something, please explain. Also I would be happy if you run
your tests with other Adobe's products : Photoshop, Acrobat Reader,
Distiller.
> However, I could not find explicit specification for such case
> in Adobe documentations, so I wonder this is what Adobe thinks
> "should be so".
Another alternative is : Adobe thinks "this should never happen".
Igor.
More information about the gs-devel
mailing list