[Gs-devel] multi-dimensional range comparator patch for new CMap handler (Re: new CMap handler has several bugs and instability.)

Igor V. Melichev igor at artifex.com
Wed Apr 11 09:26:39 PDT 2001


Dear Mr. Suzuki,


I reviewed your "big" patch for GS.

In general it appears good. At least I could not find important bugs.
But since GS code is used by multiple users, I have multiple remarks,
which have to be processed before we apply it to GS code.

My remarks may be grouped several categories :

1. "Don't understand". This means that I have insufficient knowledge
   about the subject, or you insufficiently commented your code so
   as I am unable to reconstruct your ideas. I need your assistance
   to clarify them.

2. "Coding style". GS sets very strong restrictions to coding style.
   See doc/c-style.htm and doc/ps-style.htm, being supplied with GS.

3. "inaccuracy". Some expressions to be simplified for better understanding.

4. "Formatting style".
   See doc/c-style.htm and doc/ps-style.htm .

5. "Spelling". Rather I am not native English speaker,
   I detected few misprints.

I believe that it would be too hard to order my remarks according to
these categories and to process them in this order. Due to this I
list them in the textual order of your patch.

In the rest of this message I copy text from your patch without
inserting quotation symbols ("> " at beginning of lines),
and insert my remarks between your lines, shifting them
with 4 spaces. I don't specify category for my remarks,
because they are obvious.

I would very appreciate if you process my remarks,
and send improved patch to us. Please feel free to argue
everything what you want, except the coding style.
Also please answer my questions whenever they appear.




diff -Brub lib.orig/gs_cmap.ps lib/gs_cmap.ps
--- lib.orig/gs_cmap.ps Wed Nov 29 16:10:27 2000
+++ lib/gs_cmap.ps      Wed Apr  4 16:03:42 2001
@@ -23,6 +23,22 @@

 % ---------------- Public operators ---------------- %

+/.rewriteTempMapsNotDef {
+  DEBUG { (rewriting TempMapsNotDef\n) print flush } if
+  .TempMaps 2 get
+  dup length 0 gt {
+    0 get
+    DEBUG { (...original...\n) print flush } if
+    1 5 2 index length 1 sub {
+      { 1 index exch get 2 3 put } stopped
+      { DEBUG { (cannot rewrite\n) print flush } if }
+      { DEBUG { (rewrite\n) print flush } if } ifelse
+    } for
+  } if
+  pop
+  DEBUG { (...FINISHED...\n) print } if
+} bind def
+
 % composefont doesn't appear in CMap files -- it's documented in
 % the "PostScript Language Reference Manual Supplement".
 /composefont {         % <name> <cmap|cmapname> <fonts> composefont <font>
@@ -89,11 +105,19 @@
   /CodeMap null def            % for .buildcmap
 } bind def
 /endcmap {             % - endcmap -
+  .rewriteTempMapsNotDef
+  DEBUG {
+    (*** defined charmap ***\n) print
+    .TempMaps 1 get {exch == (\t) print ==} forall
+    (*** undefined charmap ***\n) print
+    .TempMaps 2 get {exch == (\t) print ==} forall
+  } if
   10 dict begin 0 1 2 {
     /i exch def
                % Append data from .TempMaps to .CodeMapData.
     /t .TempMaps i get def
     .CodeMapData i get length t { exch pop length add } forall
+    DEBUG { (requested array size ) print dup == } if
     array /a exch def
     a 0 .CodeMapData i get .putmore
     0 1 t length 1 sub {
@@ -273,34 +297,93 @@
   counttomark 3 idiv {
     counttomark -3 roll                % process in correct order
                % Construct prefix, params, key_lo, key_hi, value,
font_index
-    3 1 roll dup length 1 eq {
-      () 3 1 roll      % prefix
-      <01 01 00 02>    % params
-      3 1 roll         % keys
-      concatstrings 4 -1 roll .endmapvalue
-    } {
+    3 1 roll   % Stack: <cid_base> <code_lo> <code_hi>

     Please unify the terminology. Old code used the word 'key' for this.
     New one sometimes uses "code', sometimes - 'key'.
     We need to be closer to old one.


+    dup 2 index eq
+     {  % if <code_lo> == <code_hi>


     This looks as incompatible change.
     The old code creates "duplicated" keys only for key length = 1.
     The new one creates "duplicated" keys for some other keys.
     This looks ineffective and not well reasoned.
     Moreover, I did not understand, why do you need to
     change the old branch for length = 1.


+        length 255 gt
+        {(too long coderange specification\n) print stop} if

    Don't understanmd why do you need this check here.
    If CMap is assumed to follow Adobe spec, it is not needed.
    If CMap is not assumed to follow Adobe spec,
    much more checks to be done : conflicting ranges,
    degenerate ranges, length(code_lo) != length(code_hi), and so on.
    My opinion is that it only overloads the code with unrelated things.


+       dup 0 1 getinterval                     % code_lo 1byte_prefix
+       exch                                    % prefix code_lo
+       dup length 1 sub 1 exch getinterval     % prefix key
+       % here, for 1-byte case, prefix = code, key = ()
+       % for mergeable expression, prefix = (), key = code is better.

    The comment is hard to understand. What is "mergable expression" ?
    Why "better" ? Actually it must be kept compatible with the
corresponding
    portion of C code - please say this.


+       dup () eq
+       2 index length 1 eq
+       and { exch } if
+       % twice key, for "range" style expression
+       dup length dup 2 mul string dup dup     % key len d_key d_key d_key
+       4 index 0 exch putinterval              % key len d_key d_key
+       2 index                                 % key len d_key d_key len
+       5 -1 roll                               % len d_key d_key len key
+       putinterval                             % len d_key

    d_key construction may be done much simpler - see the old code
    for the branch "length = 1". Also the operator 'concatstring' helps.


+       exch                                    % d_key len
+       4 string dup 0 <00 01 00 02> putinterval dup % d_key len par par

    Simpler code :
        <00 01 00 02> 4 srting copy dup


+       3 -1 roll                               % d_key par par len
+       0 exch put      % Stack: cid_base prefix key params
+       exch            % Stack: cid_base prefix params key
+       4 -1 roll       % Stack: prefix params key cid_base

    Please unify notation for stack comments.
    Sometimes you insert "Stack:", sometimes you don't.
    It's better don't insert whenever.


+        DEBUG { (---single range calculated---\n) print } if
+        DEBUG { dup == 1 index == 2 index == 3 index == } if
+        % (...\n) print
+        .endmapvalue
+     }
+     { % if <code_lo> != <code_hi>
+        % (---endrange_called---\n) print pstack
                % Stack: cid_base code_lo code_hi
-               % Hack: handle 16-bit single-range mappings specially.
-      counttomark 3 eq 1 index length 2 eq and {
-       () 3 1 roll     % prefix
-       <02 01 00 02>   % params
-       3 1 roll        % keys
-       concatstrings 4 -1 roll .endmapvalue
-      } {

    Why don't unite both alternatives ?
    I believe that only 'length = 1' needs a special processing
    with "duplicating" keys. If length > 1, your <code_lo> == <code_hi>
    branch appears to be partial case of <code_lo> != <code_hi>.



-       exch dup dup length 1 sub 0 exch getinterval    % prefix
+        0
+        {
+           % (---looking for prefix---) print dup ==
+           % code_lo code_hi i
+           dup dup      % code_lo code_hi i i i
+           4 index exch % code_lo code_hi i i code_lo i
+           0 exch       % code_lo code_hi i i code_lo 0 i
+           getinterval  % code_lo code_hi i i head_lo

    This line produces lot of garbage in memory !
    Please rewrite this cycle without 'getinterval' -
    use 'get' for extracting consequtive bytes from both strings,
    then compare them as integers.

+           exch         % code_lo code_hi i head_lo i
+           3 index exch % code_lo code_hi i head_lo code_hi i
+           0 exch       % code_lo code_hi i head_lo code_hi 0 i
+           getinterval  % code_lo code_hi i head_lo head_hi

    Same as above.


+           eq           % code_lo code_hi i -bool-
+           3 index length       % code_lo code_hi i -bool- -code_len-
+           2 index gt   % code_lo code_hi i -bool- -bool-
+           and          % code_lo code_hi i -bool-

    Bad notation in comments :
    -bool- looks as notation for PS operators. Please remove '-'.


+           { % -true-   % code_lo code_hi i
+             1 add
+           }
+           { % -false-  % code_lo code_hi i
+             exit
+           } ifelse
+        } loop
+        1 sub           % i is last TRIED length (not SUCCESS length)
+        1 index exch 0 exch getinterval
                        % Stack: cid_base code_hi code_lo prefix
-       <01 01 00 02>   % params
-       3 -1 roll dup length 1 sub 1 getinterval        % key_lo
-       4 -1 roll dup length 1 sub 1 getinterval        % key_hi
-       concatstrings
-       4 -1 roll .endmapvalue
+        dup length 3 index length exch sub
+                        % cid_base code_hi code_lo prefix range_len
+           dup 255 gt {
+             (too long coderange specification\n) print stop
+           } if
+           % .int2char <01 00 02> concatstrings

     Is the line above an obsolete code ? If so, please remove it.
     But if it is a comment for next lines, please make it more readable,
     something like this :
            % Compute ".int2char <01 00 02> concatstrings" :

+           4 string
+              dup 0 <00 01 00 02> putinterval

     Simpler code :
            <00 01 00 02> 4 string copy


+              dup 0 4 -1 roll put
+        % Stack: param
+        3 -1 roll dup length 3 index length dup 3 1 roll sub getinterval
+        4 -1 roll dup length 4 index length dup 3 1 roll sub getinterval
+        exch concatstrings
+        4 -1 roll
+        % (---range calculated---\n) print
+        % dup == 1 index == 2 index == 3 index == 4 index == 5 index == 6
index == 7 index ==
+        % (...\n) print
+        .endmapvalue
                % See if we can merge with the previous value.
-               % The prefix, params, and font index must match.

     Why you removed this line ?
     It contains useful information.


                % Stack: prefix params keys value fontindex
+    } ifelse
+
+       counttomark 5 gt { % 2 (or more) ranges (1 range = 5 item)
        4 index 10 index eq             % prefix
        4 index 10 index eq and % params
        1 index 7 index eq and  % fontindex
        {
+             DEBUG { (merge!\n) print } if
          pop 4 2 roll pop pop
                % Stack: prefix params keys value fontindex keys2 value2
          5 -1 roll 3 -1 roll concatstrings
@@ -309,8 +392,7 @@
                % Stack: prefix params fontindex keys' values'
          3 -1 roll
        } if
-      } ifelse
-    } ifelse
+       } if % end of 2 (or more) ranges
   } repeat
   counttomark 2 add -1 roll .appendmap
 } bind def
diff -Brub src.orig/gsfcmap.c src/gsfcmap.c
--- src.orig/gsfcmap.c  Tue Dec 19 06:58:03 2000
+++ src/gsfcmap.c       Wed Apr  4 15:37:45 2001
@@ -25,6 +25,11 @@
 #include "gsutil.h"            /* for gs_next_ids */
 #include "gxfcmap.h"

+int gs_cmap_get_shortest_chr(const gx_code_map_t *, uint *);
+int gs_multidim_cmp_range(const byte*, const byte*, const byte*, const
byte*,int,int);
+int gs_multidim_calc_relpos(const byte*, const byte*, const byte*, const
byte*,int,int);
+void gs_dbgprt_b_len(const byte*, int);
+
 /* GC descriptors */
 public_st_cmap();
 /* Because lookup ranges can be elements of arrays, */
@@ -147,6 +152,161 @@
     return 0;
 }

+/*
+ * multi-dimentional range comparater

     Spelling: 'comparator'


+ */
+void
+gs_dbgprt_b_len(const byte *str, const int length)
+{
+    int i;
+    if (NULL == str) {
+       if_debug0('q', "(NULL)");
+    }
+    else if (0 == length) {
+       /* if_debug0('q', "");  ??? */
+    }
+    else {
+       for (i = 0 ; i < length ; i++) {
+          if_debug1('q', "%02x", *(str + i));
+       }
+    }
+}

     I just had no time to check, but its very likely that GS already has
     such function. Please check. We need to keep GS code free of
     redundant parts.

     Besides, the second alternative in this body may be removed,
     keeping your code equivalent.


+
+int
+gs_cmap_get_shortest_chr(const gx_code_map_t * pcmap, uint *pfidx)
+{
+    int i;
+    int len_shortest = MAX_CMAP_CODE_SIZE;
+    uint fidx_shortest = NULL; /* should be Courier ? */
+
+    for (i = pcmap->num_lookup - 1; i >= 0; --i) {
+        const gx_code_lookup_range_t *pclr = &pcmap->lookup[i];
+        if ((pclr->key_prefix_size + pclr->key_size) <= len_shortest) {
+           len_shortest = (pclr->key_prefix_size + pclr->key_size);
+           fidx_shortest = pclr->font_index;
+        }
+    }
+
+    if (NULL != pfidx) {
+        *pfidx = fidx_shortest;
+    }
+    return len_shortest;
+}
+
+int
+gs_multidim_cmp_range(const byte *str,
+                       const byte *prefix,
+                        const byte *key_lo, const byte *key_hi,
+                       int prefix_size, int key_size)
+{
+    int        i;
+    if_debug0('q', "\n");
+    if_debug0('q', "[q]gmcr() checks ");
+    gs_dbgprt_b_len(str, prefix_size + key_size);
+    if_debug0('q', " in ");
+    gs_dbgprt_b_len(prefix, prefix_size);
+    gs_dbgprt_b_len(key_lo, key_size);
+    if_debug0('q', " - ");
+    gs_dbgprt_b_len(prefix, prefix_size);
+    gs_dbgprt_b_len(key_hi, key_size);
+    if_debug0('q', "\n");
+
+    if (0 < prefix_size) {
+       i = 1;
+       while (i <= prefix_size) {
+          if (0 == memcmp(prefix, str, i)) {
+             if_debug3('q', "[q]gmcr() #%d byte in prefix (0x%02x[p],
0x%02x[s]) ok\n",
+                       i, *(prefix + i - 1), *(str + i - 1));
+             i ++;
+          }
+          else {
+             if_debug3('q', "[q]gmcr() #%d byte in prefix (0x%02x[p],
0x%02x[s]) failed\n",
+                       i, *(prefix + i - 1), *(str + i - 1));
+             break;
+          }
+       }


     Are you a lover of Pascal ?
     I would very aprreciate if you use better code for this :
     1. Start indices from 0
     2. Use 'for' instead 'while'
     3. Use '[]' instead '*()'

     Here is my version :


     if (0 < prefix_size) {
        for (i = 0; i < prefix_size; i++)
           if (memcmp(prefix, str, i))
              break;
        if_debug3('q', "[q]gmcr() #%d byte in prefix (0x%02x[p], 0x%02x[s])
failed\n",
                        i, prefix[i], str[i]);


+
+       if (0 == i) { /* No match */
+          if_debug0('q', "[q]gmcr() RANGE_NO_MATCH(prefix only)\n");
+          return 0 ;
+       }

     Your code never goes through this branch, since it starts with i = 1
     and then increases.

+       else if (i < prefix_size) {     /* partial match */
+          if_debug0('q', "[q]gmcr() RANGE_PARTIAL_MATCH(prefix only)\n");
+          return i ;
+       }
+       else if (0 == key_size) {


     Formatting style :
        } else if (0 == key_size) {



+          if_debug0('q', "[q]gmcr() RANGE_FULL_MATCH(prefix only)\n");
+          return prefix_size ;
+       }
+       /* completely matched */
+       str = str + prefix_size;
+    }
+
+    i = 1;
+    while (i <= key_size) {
+       if ( (  *(key_lo + i - 1) <=   *(str + i - 1))
+           && (  *(str + i - 1)  <= *(key_hi + i - 1)) ) {
+           if_debug4('q',
+               "[J]gmcr() #%d byte in key (0x%02x <= 0x%02x <= 0x%02x)
ok\n",
+               i, *(key_lo + i - 1), *(str + i - 1), *(key_hi + i - 1));
+           i ++ ;
+       }
+       else {
+           if_debug4('q',
+               "[J]gmcr() #%d byte in key (0x%02x <= 0x%02x <= 0x%02x)
failed\n",
+               i, *(key_lo + i - 1), *(str + i - 1), *(key_hi + i - 1));
+           break;
+       }
+    }
+    i -- ;

     Please excercise with 'for (i = 0; ...'.


+
+    if ((0 == prefix_size) && (0 == i)) {
+        if_debug0('q', "[q]gmcr() RANGE_NO_MATCH\n");
+        return 0;
+    }
+    else if (i < key_size) {
+        if_debug0('q', "[q]gmcr() RANGE_PARTIAL_MATCH\n");
+        return (prefix_size + i);
+    }
+    if_debug0('q', "[q]gmcr() RANGE_FULL_MATCH\n");
+    return (prefix_size + key_size);
+}
+
+
+/*
+ * multi-dimentional relative position calculator
+ */
+int
+gs_multidim_calc_relpos(const byte *str,
+                       const byte *prefix,
+                        const byte *key_lo, const byte *key_hi,
+                       int prefix_size, int key_size)
+{
+    int i;
+    int j = 1; /* how many # for current "1" */
+    int rel_pos = 0;
+
+    if_debug0('q', "[q]gmcrp() calc rel_pos for 0x");
+    gs_dbgprt_b_len(str, prefix_size + key_size);
+    if_debug0('q', " in 0x");
+    gs_dbgprt_b_len(prefix, prefix_size);
+    gs_dbgprt_b_len(key_lo, key_size);
+    if_debug0('q', " - 0x");
+    gs_dbgprt_b_len(prefix, prefix_size);
+    gs_dbgprt_b_len(key_hi, key_size);
+    if_debug0('q', "\n");
+
+    for (i = key_size - 1 ; 0 <= i ; i--) {
+        if_debug4('q', "[q]gmcrp()  add #%d byte: 0x%02x in 0x%02x -
0x%02x\n",
+               i, *(str + prefix_size + i), *(key_lo + i), *(key_hi + i));
+        rel_pos = rel_pos
+                + j * (*(str + prefix_size + i) - *(key_lo + i));
+        j = j * (*(key_hi + i) - *(key_lo + i) + 1);
+    }
+    if_debug2('q', "[q]gmcrp()  result:%d (= 0x%04x)\n", rel_pos, rel_pos);
+    return rel_pos;
+}


    I don't have spec for 'multi-dimentional' ranges, so as
    I am unable to check whether this function is correct.
    You suppose that the range [<0101> <0201>] covers 2 CIDs.
    Why not 256 ones ? Where did you take this knowledge from ?

+
 /* Get a big-endian integer. */
 private uint
 bytes2int(const byte *p, int n)
@@ -221,6 +381,10 @@
                     bytes2int(str + pre_size, key_size) -
                     bytes2int(key, key_size);
                 return 0;
+            case CODE_VALUE_SINGLE_CID:
+                *pglyph = gs_min_cid_glyph +
+                    bytes2int(pvalue, pclr->value_size);
+                return 0;
             case CODE_VALUE_GLYPH:
                 *pglyph = bytes2int(pvalue, pclr->value_size);
                 return 0;
@@ -240,6 +404,210 @@
     return 0;
 }

+private int
+code_map_decode_next_mdrc(const gx_code_map_t * pcmap, const
gs_const_string * pstr,
+                     uint * pindex, uint * pfidx,
+                     gs_char * pchr, gs_glyph * pglyph)
+{
+    /*
+     * one "range" is specified by
+     *    (prefix + key(lo-end)) and (prefix + key(hi-end)).
+     *
+     * pclr is a collection of range"s"
+     * which have same prefix, and keys with same length.
+     *
+     * CAUTION: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+     *   although the pcmap structure does not fix key length,
+     *   but original design of gs_cmap.ps & CMDN() assumes
+     *   key length = 1 byte.
+     * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


    I don't understand this "caution". I don't see a reason why
    lengthes of keys are all same for a prefix.
    If you assume a constraint for CMap, please say this explicitely.
    If not, you need to code corresponding check to gs_cmap.ps .


+     *
+     * number of ranges = pclr->num_keys
+     *           prefix = pclr->key_prefix
+     *    prefix length = pclr->key_prefix_size
+     *              key = pclr->key.data
+     *       key length = pclr->key_size
+     *
+     *   | *(prefix) | *(key) |
+     *   ...MATCHED...
+     *   | *(prefix) | *(key + key_size) |
+     *
+     *   ...NOT MATCHED
+     *
+     *   | *(prefix) | *(key + 2 * key_size) |
+     *   ...MATCHED
+     *   | *(prefix) | *(key + 3 * key_size) |
+     *
+     * ...
+     *
+     * pcmap is a collection of pclr.
+     *   number of pclr = pcmap->num_lookup
+     *             pclr = pcmap->lookup[i]
+     *
+     */
+
+    const byte *str = pstr->data + *pindex;
+    uint ssize = pstr->size - *pindex;
+    /*
+     * The keys are not sorted due to 'usecmap'.  Possible optimization :
+     * merge and sort keys in 'zbuildcmap', then use binary search here.
+     * This would be valuable for UniJIS-UTF8-H, which contains about 7000
+     * keys.
+     */
+    int i;
+
+    /* partial match parameters, temporal use, NOT pointer !!! */
+    int pm_maxlen = 0;         /* partial match: max length */
+    int pm_index = *pindex;    /* partial match: ptr index (in str) */
+    uint pm_fidx = *pfidx;     /* partial match: ptr font index */
+    gs_char pm_chr = *pchr;    /* partial match: ptr character */
+                               /* pm_pvalue is not needed, because
+                                  partial match is used for notdef */
+
+    if_debug1('q', "[q]CMDN() is called: str=0x%lx (", str);
+    if_debug0('q', " (");
+    gs_dbgprt_b_len(str, ssize);
+    if_debug1('q', ") ssize=%d\n", ssize);
+    if_debug1('q', "[q]CMDN() checks %d ranges\n", pcmap->num_lookup);
+
+    for (i = pcmap->num_lookup - 1; i >= 0; --i) {
+       /* main loop - scan the map passed via pcmap */
+       /* reverse scan order due to 'usecmap' */
+
+        const gx_code_lookup_range_t *pclr = &pcmap->lookup[i];
+        int pre_size = pclr->key_prefix_size, key_size = pclr->key_size,
+            chr_size = pre_size + key_size;
+
+        if (ssize < chr_size) {
+       /* length of the given byte stream is shorter than
+         * chr-length of current range, no need for further check,
+         * skip to the next range.
+         */
+            // if_debug0('q', "[q]CMDN()   ssize < chr_size, skip\n");

    C++ comment '//' cannot compile with many C compilers.
    We need our code to be portable. Please remove it.


+            continue;
+       }
+
+
+        /* the first byte of curent range prefix does not match
+         * with the given byte stream, clearly no match
+         * (exact nor partial), skip to the next range.
+         */
+        if ((0 < pclr->key_prefix) && memcmp(str, pclr->key_prefix, 1)) {

    Better code :
         if (0 < pclr->key_prefix && str[0] != pclr->key_prefix[0] )


    Also here you use the result of memcmp as boolean, but upper
    you did as integer, coding explicit comparizon with zero.
    I believe, this should be unified. I prefer with no comparizon.


+            // if_debug5('q', "[q]CMDN()   memcmp(%02x%02x.., %02x%02x..,
%02d) failed\n", *str, *(1 + str), *(pclr->key_prefix), *(1 +
pclr->key_prefix), pre_size);
+            continue;
+       }
+
+
+        /* prefix of currange range matches with
+         * the given byte stream, progress to
+         * value-matching.
+         */

    The comment is wrong.
    Your code compares only the first character of the prefix,
    but the comment says different !

+
+        if_debug0('q', "[q]CMDN()   memcmp(");
+        gs_dbgprt_b_len(str, pre_size);
+        if_debug0('q', ", ");
+        gs_dbgprt_b_len(pclr->key_prefix, pre_size);
+        if_debug1('q', ", %d) ok\n", pre_size);
+
+        /* Search the lookup range. We could use binary search. */
+        {
+            const byte *key = pclr->keys.data;
+            int step = key_size;

    Why do you assume that key_size is same for all ranges ?


+            int k;
+            const byte *pvalue = NULL;
+
+            if_debug3('q', "[q]CMDN()     lookup range: key=0x%lx
pvalue=%lx step=%d\n", key, pvalue, step);
+
+            if (pclr->key_is_range) {
+               /* when range is "range", 2 keys for lo-end and hi-end
+                * are stacked. So twice the step. */
+                step <<=1;     /* step = step * 2; */
+           }
+            for (k = 0; k < pclr->num_keys; ++k, key += step) {
+               int ret_gmcr ;
+               if_debug1('q', "[q]CMDN() k = %d\n", k);
+               ret_gmcr = gs_multidim_cmp_range(str,
+                       pclr->key_prefix, key, key + key_size,
+                       pre_size, key_size);
+               if ((0 < ret_gmcr) && (pm_maxlen < chr_size)) {
+                        pm_maxlen = chr_size;
+               }
+                if_debug1('q', "[q]CMDN()     pm_maxlen = %d\n",
pm_maxlen);
+               pm_chr = (*pchr << (chr_size * 8)) + bytes2int(str,
chr_size);


    When the procedure is called at second time, *pchr contains dirty
    data, and this expression gives strange result.
    Perhaps I believe that pm_chr is irrelevant at all,
    and the operator to be replaced with

             pm_chr = 0;


+               pm_index = (*pindex) + chr_size;
+               pm_fidx = pclr->font_index;
+                if ((pre_size + key_size) == ret_gmcr) {


    Better code :

                if (chr_size == ret_gmcr) {



+                        break;
+               }
+           }
+
+            /* all keys are tried, but found no match. */
+            /* go to next prefix. */
+            if (k == pclr->num_keys)
+                continue;
+
+
+
+            /* We have a match.  Return the result. */
+            *pchr = (*pchr << (chr_size * 8)) + bytes2int(str, chr_size);

    When the procedure is called at second time, *pchr contains dirty
    data, and this expression gives strange result.
    Perhaps I believe its old value is irrelevant,
    and the operator to be replaced with

             *pchr = bytes2int(str, chr_size);


+            *pindex += chr_size;
+            *pfidx = pclr->font_index;
+            pvalue = pclr->values.data + k * pclr->value_size;
+            switch (pclr->value_type) {
+            case CODE_VALUE_CID:
+               {
+                  int ret_gmcrp;
+                  ret_gmcrp = gs_multidim_calc_relpos(str,
+                       pclr->key_prefix, key, key + key_size,
+                       pre_size, key_size);
+                   *pglyph = gs_min_cid_glyph +
+                        bytes2int(pvalue, pclr->value_size) +
+                        ret_gmcrp;
+                   if_debug0('q', "[q]CMDN()       found CODE_VALUE_CID:
");
+                   if_debug1('q', "0x%x + ", gs_min_cid_glyph);
+                   if_debug1('q', "%d + ", bytes2int(pvalue,
pclr->value_size));
+                   if_debug1('q', "%d\n", ret_gmcrp);
+               }
+                return 0;
+            case CODE_VALUE_SINGLE_CID:
+                if_debug0('q', "[q]CMDN()       found
CODE_VALUE_SINGLE_CID: ");
+                if_debug1('q', "0x%x + ", gs_min_cid_glyph);
+                if_debug1('q', "%d\n", bytes2int(pvalue,
pclr->value_size));
+
+                *pglyph = gs_min_cid_glyph +
+                    bytes2int(pvalue, pclr->value_size);


    I believe it must be as this :

                 *pglyph = gs_min_cid_glyph;


+                return 0;
+            case CODE_VALUE_GLYPH:
+                if_debug0('q', "[q]CMDN()       found CODE_VALUE_GLYPH: ");
+                if_debug1('q', "%d\n", bytes2int(pvalue,
pclr->value_size));
+
+                *pglyph = bytes2int(pvalue, pclr->value_size);
+                return 0;
+            case CODE_VALUE_CHARS:
+                if_debug0('q', "[q]CMDN()       found CODE_VALUE_CHARS\n");
+                if_debug1('q', "%d + ", bytes2int(pvalue,
pclr->value_size));
+                if_debug1('q', "0x%x - ", bytes2int(str + pre_size,
key_size));
+                if_debug1('q', "0x%x\n", bytes2int(key, key_size));
+                *pglyph =
+                    bytes2int(pvalue, pclr->value_size) +
+                    bytes2int(str + pre_size, key_size) -
+                    bytes2int(key, key_size);
+                return pclr->value_size;
+            default:            /* shouldn't happen */
+                if_debug0('q', "[q]CMDN()       found NOTHING (error)\n");
+                return_error(gs_error_rangecheck);
+            }
+        }
+    }
+    /* No mapping. */
+    if_debug1('q', "[q]CMDN()       unresolved. use partial match
params(pm_maxlen: %d)\n", pm_maxlen);
+    *pchr = pm_chr;
+    *pindex = pm_index;
+    *pfidx = pm_fidx;
+    *pglyph = gs_no_glyph;
+    return 0;
+}
+
 /*
  * Decode a character from a string using a CMap.
  * Return like code_map_decode_next.
@@ -252,14 +620,72 @@
     uint save_index = *pindex;
     int code;

+    uint pm_index;
+    uint pm_fidx;
+    gs_char pm_chr;
+
     *pchr = 0;
     code =
         code_map_decode_next(&pcmap->def, pstr, pindex, pfidx, pchr,
pglyph);
     if (code != 0 || *pglyph != gs_no_glyph)
         return code;

     I believe that your function covers code_map_decode_next,
     so please remove code_map_decode_next and this call.


-    /* This is an undefined character.  Use the notdef map. */
+
+    /* This is an undefined character. */
+    /* For first, check partial match with defined map. */
+    *pindex = save_index;
+    *pchr = 0;
+    code =
+        code_map_decode_next_mdrc(&pcmap->def, pstr, pindex, pfidx, pchr,
pglyph);
+    /*   save partially matched results */
+    pm_index = *pindex;
+    pm_fidx = *pfidx;
+    pm_chr = *pchr;
+
+    /* Use the notdef map. */
+
     *pindex = save_index;
     *pchr = 0;
-    return code_map_decode_next(&pcmap->notdef, pstr, pindex, pfidx,
-                                pchr, pglyph);
+    code =
+       code_map_decode_next_mdrc(&pcmap->notdef, pstr, pindex, pfidx, pchr,
pglyph);
+
+    /* This is defined "notdef" character. */
+    if (code != 0 || *pglyph != gs_no_glyph)
+        return code;
+
+    /* This is undefined in def & undef maps,
+     * use partially matched result with default notdef (CID = 0).
+     * In addition to, we should check codespacerange.... */
+
+    if (save_index < pm_index) {
+
+       /* there was some partially matched */
+
+        if_debug0('q', "[q]GCDN() use patially matched result\n");
+        *pglyph = gs_min_cid_glyph;    /* CID = 0 */
+        *pindex = pm_index ;
+        *pfidx = pm_fidx ;
+        *pchr = pm_chr ;

     I believe it should be :

         *pchr = 0 ;

+         return 0 ; /* ??? */

     Please remove /* ??? */

     Formatting style : remove space before ';'.


+    }
+    else {
+
+       /* no match */
+       const byte *str = pstr->data + save_index;
+       uint ssize = pstr->size - save_index;
+       int chr_size_shortest =
+               gs_cmap_get_shortest_chr(&pcmap->def, pfidx);
+
+       if (chr_size_shortest <= ssize) {
+            if_debug1('q', "[q]GCDN() cutting bytes with shortest char
length %d\n", chr_size_shortest);
+            *pglyph = gs_min_cid_glyph;        /* CID = 0 */

     I am not sure that gs_min_cid_glyph is correct here :
     the matched font may be not a CID font.
     I would code :

         *pglyph = 0;


+           *pchr = bytes2int(str, chr_size_shortest);

     I believe it should be :

         *pchr = 0 ;

+            *pindex = save_index + chr_size_shortest;
+            return 0 ; /* ??? */

     Please remove /* ??? */

+       }
+       else {
+            if_debug0('q', "[q]GCDN() nothing to apply\n");
+           *pglyph = gs_no_glyph;
+           return -1 ;
+       }
+    }
 }
Only in src: gsfcmap.c.orig
diff -Brub src.orig/gxfcmap.h src/gxfcmap.h
--- src.orig/gxfcmap.h  Wed Nov 29 14:50:03 2000
+++ src/gxfcmap.h       Wed Apr  4 15:38:03 2001
@@ -69,8 +69,9 @@
 typedef enum {
     CODE_VALUE_CID,            /* CIDs */
     CODE_VALUE_GLYPH,          /* glyphs */
-    CODE_VALUE_CHARS           /* character(s) */
-#define CODE_VALUE_MAX CODE_VALUE_CHARS
+    CODE_VALUE_CHARS,          /* character(s) */
+    CODE_VALUE_SINGLE_CID,     /* CID - for notdef */
+#define CODE_VALUE_MAX CODE_VALUE_SINGLE_CID
 } gx_code_value_type_t;


    the identifier CODE_VALUE_SINGLE_CID looks confusing.
    A better one : CODE_VALUE_UNDEF .


 /* The strings in this structure are all const after initialization. */
 typedef struct gx_code_lookup_range_s {
diff -Brub src.orig/zfcid0.c src/zfcid0.c
--- src.orig/zfcid0.c   Wed Mar 14 04:57:06 2001
+++ src/zfcid0.c        Wed Apr  4 15:41:38 2001
@@ -484,8 +484,21 @@
     code = pfcid->cidata.glyph_data((gs_font_base *)pfcid,
                        (gs_glyph)(gs_min_cid_glyph + op->value.intval),
                                    &gstr, &fidx);
-    if (code < 0)
-       return code;
+
+    /* return code; original error-sensitive & fragile code */

     Please remove the line above.


+    if (code < 0) { /* failed to load glyph data, put CID 0 */
+       int default_fallback_CID = 0 ;
+
+       if_debug2('J', "[J]ztype9cidmap() use CID %d instead of
glyph-missing CID %d\n", default_fallback_CID, op->value.intval);
+
+       op->value.intval = default_fallback_CID;
+
+       /* reload glyph for default_fallback_CID */
+
+       code = pfcid->cidata.glyph_data((gs_font_base *)pfcid,
+                       (gs_glyph)(gs_min_cid_glyph + op->value.intval),
+                                   &gstr, &fidx);
+    }

     make_const_string(op - 1,
                      a_readonly | imemory_space((gs_ref_memory_t
*)pfont->memory),




Besides, I have one general comment about debug printing.
I realize that many printing operators were important for your
development technology. As soon as you completed it,
most of them are uneuseful. Please check which ones may be removed,
so as other users don't overhead with reading them, and the trace is
compact enough.

Igor.





More information about the gs-devel mailing list