[Gs-devel] refine MDRC patch
mpsuzuki at hiroshima-u.ac.jp
Sat Apr 14 05:07:30 PDT 2001
Dear Mr. Igor,
I'm now refining my MDRC (multi-dimensional range comparator) patch,
and I have a few questions that would help me implement it better.
# The MDRC patch is really my first program in C;
# I'm an old man living in the Fortran 77 world :-).
Best Wishes,
mpsuzuki
=======================================================================
First, let me define my terminology.
If it differs from what the GS code uses,
that means I'm misunderstanding GS,
so please correct me.
code:
the byte string written in a CMap and passed to endcidrange, etc.
In the following CMap,
1 begincidrange
<2121> <217E> 0
endcidrange
I call <2121> code_lo and <217E> code_hi.
prefix:
In GS's internal representation of a CMap,
a "code" is decomposed into two parts.
The leading bytes common to the codes
are called the "prefix".
key:
The variable bytes following the "prefix" in the codes
are called the "key".
value:
The CID offset (in the above example, "0"),
or the glyph name or string (in a CMap for a rearranged font),
paired with a "key" is called the "value".
----------------------------------------------------------------------
My implementation makes the prefix the longest run of common leading
bytes of <code_lo> and <code_hi> when <code_lo> != <code_hi>.
When <code_lo> == <code_hi>, how should I make the prefix and key?
As a basic procedure, gs_cmap.ps generates 5 items (prefix, parameter,
key, value, map number (0=space, 1=defined CID, 2=undefined CID)) for
each range.
When the decoded prefix and the key/value lengths are the same as those
of the previous range, (the original) gs_cmap.ps appends the current
range specification to the items of the previous range. I call this
"merging". Below, I call ranges with the same prefix and the same
key/value lengths "mergeable".
To decrease the size of /TempMaps (and of /.CodeMapData, which is based
on it), I tried to maximize mergeability as far as possible. When I
wrote code that ignored mergeability, some CMaps (Chinese EUC) made
/.TempMaps overflow.
For the <code_lo> == <code_hi> case, if I make prefix = <code_lo>
(using the same method as for the <code_lo> != <code_hi> case),
the resulting range is not mergeable at all,
because the resulting prefix is unique.
What I wrote instead: make a 1-byte prefix and use the rest as the key.
Then many "ranges" that specify single characters, such as
<a1a1> <a1a1> 0
<a1a2> <a1a2> 10
<a1a3> <a1a3> 5
<a1a4> <a1a4> 6
....
can be merged into ranges with the common <a1> prefix.
I hard-wired "make a 1-byte prefix" for <code_lo> == <code_hi>,
but this does not work for 1-byte range specifications, so
I made a special branch for 1-byte range specifications
that treats them as a 0-byte prefix and a 1-byte key.
However, I'm of course not sure whether this is a good solution.
Do you have any better ideas?
--------------------------------------------------------------------
>You suppose that the range [<0101> <0201>] covers 2 CIDs.
>Why not 256 ones? Where did you take this knowledge from?
I did a small experiment with Display PostScript on Solaris 2.8.
First, I defined a CMap "/Z" containing one range specification:
1 begincidrange
<2121> <7E21> 6064
endcidrange
Then I composed a font:
/Ryumin-Light-Z /Z
/Ryumin-Light-H findfont /FDepVector get
composefont
and printed a few characters with this font:
100 scalefont setfont
50 50 moveto
<2121> show
<2122> show
<2221> show
<2222> show
<2321> show
For <2121>, <2221> and <2321>, the corresponding kanji were printed,
but for <2122> and <2222>, a full-width space (the default glyph for
undefined kanji characters) was printed.
From this result, I concluded that [<0101> <0201>] means only <0101> and <0201>.
However, I could not find an explicit specification of this case
in Adobe's documentation, so I wonder whether this is really the
behavior Adobe intends.