[gs-devel] Performance Comparison Report on Adobe CPSI, Jaws and Ghostscript.

Igor V. Melichev igor.melichev at artifex.com
Sat Apr 12 04:27:44 PDT 2003


I've got test3.eps. Thanks to Jeong.

I've run GS on it under a debugger, and now I see that my hypothesis is wrong.
Running test3.eps, the pseudo_rasterization variable is always 0.
This means that the pseudo-rasterization code is not executed.

I measured the speed on a P3 667MHz processor with 256K cache and 1GB RAM,
on Windows 2000, with today's GS CVS HEAD, MSVC 6 SP5 release build.

I toggled line 47 of gxfill.c :

    #define PSEUDO_RASTERIZATION (DROPOUT_PREVENTION && 1) /* Change 1 to 0 for benchmarking. */

Results with -r720 test3.eps are :
PSEUDO_RASTERIZATION 1 -   131192 msec.
PSEUDO_RASTERIZATION 0 -    60017 msec.

So I observe a slowdown of about 2.2 times.

I guess there are 2 reasons for the slowdown, and probably both contribute :

1. The global code optimization done by MSVC is less effective with PSEUDO_RASTERIZATION 1.
2. The branch prediction logic of the P3 hardware fails more often with PSEUDO_RASTERIZATION 1.

This happens because the PSEUDO_RASTERIZATION 1 code is longer and
contains about a dozen extra checks like this :

    const bool pseudo_rasterization = ll->pseudo_rasterization;
    .....
    if (pseudo_rasterization) {
    } else {
    }

With shfill this branching does no useful work, because the flag is always 0,
but it still burdens the hardware branch prediction.

There is a known method for fixing this kind of slowdown: emulating C++ templates
with dual compilation, as I did with GX_FILL_TRAPEZOID defined in gxdtfill.c .
Specifically, fill_loop_by_trapezoids would be moved into a new file gxfill_.h,
which gxfill.c then #includes twice, with different values #defined
for the template parameters :

gxfill.c :

.....
#define PSEUDO_RASTERIZATION_ON 1
#define FILL_METHOD_NAME fill_loop_by_trapezoids_1
#include "gxfill_.h"
#undef FILL_METHOD_NAME
#undef PSEUDO_RASTERIZATION_ON
#define PSEUDO_RASTERIZATION_ON 0
#define FILL_METHOD_NAME fill_loop_by_trapezoids_0
#include "gxfill_.h"
#undef FILL_METHOD_NAME
#undef PSEUDO_RASTERIZATION_ON
.....

int
fill_loop_by_trapezoids(...)
{
    if (ll->pseudo_rasterization)
        return fill_loop_by_trapezoids_1(...);
    else
        return fill_loop_by_trapezoids_0(...);
}

gxfill_.h :

int FILL_METHOD_NAME(...)
{
    const bool pseudo_rasterization = ll->pseudo_rasterization && PSEUDO_RASTERIZATION_ON;
    .....
    if (pseudo_rasterization) {
    } else {
    }
}

This optimization is based on the assumption that the compiler
is intelligent enough to drop the unreachable branches
that result from constant conditions. I believe that
modern compilers are.
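
The specialization idea above can be sketched in a single file. This is only an illustration, not the gxfill.c code: to stay self-contained it uses a function-defining macro in place of the second file that is #included twice, and line_list, the widths array, DEFINE_FILL_LOOP and the fill_loop* names are all hypothetical. The point it shows is that the flag is ANDed with a compile-time constant, so in the _0 copy the condition is constant false and the compiler is free to drop that branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the line list; only the flag matters here. */
typedef struct {
    bool pseudo_rasterization;
} line_list;

/*
 * Stand-in for the body that would live in the twice-included file.
 * PR_ON is a compile-time constant (0 or 1), so with PR_ON == 0 the
 * condition inside the loop is constant false.
 */
#define DEFINE_FILL_LOOP(NAME, PR_ON)                                  \
    static int NAME(const line_list *ll, const int *widths, size_t n)  \
    {                                                                  \
        const bool pseudo_rasterization =                              \
            ll->pseudo_rasterization && (PR_ON);                       \
        int total = 0;                                                 \
        size_t i;                                                      \
        for (i = 0; i < n; i++) {                                      \
            if (pseudo_rasterization)                                  \
                total += widths[i] > 0 ? widths[i] : 1;                \
            else                                                       \
                total += widths[i];                                    \
        }                                                              \
        return total;                                                  \
    }

/* Two specialized copies of the same source text. */
DEFINE_FILL_LOOP(fill_loop_1, 1)
DEFINE_FILL_LOOP(fill_loop_0, 0)

/* The run-time flag is tested once here, not inside the inner loop. */
int fill_loop(const line_list *ll, const int *widths, size_t n)
{
    assert(ll != NULL && widths != NULL);
    return ll->pseudo_rasterization ? fill_loop_1(ll, widths, n)
                                    : fill_loop_0(ll, widths, n);
}
```

In this toy version the "on" variant counts an empty span as one pixel, as a loose analogy to dropout prevention. Calling fill_loop with the flag cleared goes through fill_loop_0, whose loop body has no live test of the flag.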

The "template-like" change is pretty simple and can be done within an hour,
but my intention was to follow a different schedule :

1. Release 8.10.
2. Collect user feedback on character rendering quality for a month.
3. Remove the remnants of DROPOUT_PREVENTION 0.
4. Optimize fill_loop_by_trapezoids by uniting neighboring trapezoids that share the same side lines.
5. Optimize the dropout prevention code with a second list of 'margin' structures
    to describe the spot interior as a list of constant color intervals rather than a pixel-based array.
6. Apply the template-like code optimization explained above.

I don't want to apply (6) before (3-5). It would slow down the overall development.

Igor.




----- Original Message -----
From: "Igor V. Melichev" <igor.melichev at artifex.com>
To: "Jeong Kim" <jeong at artifex.com>; <gs-devel at ghostscript.com>; "Ray Johnston" <ray at artifex.com>
Sent: Saturday, April 12, 2003 1:39 PM
Subject: Re: [gs-devel] Performance Comparison Report on Adobe CPSI, Jaws and Ghostscript.


> > From: "Jeong Kim" <jeong at artifex.com>
> > To: <gs-devel at ghostscript.com>; "Ray Johnston" <ray at artifex.com>
> > Sent: Saturday, April 12, 2003 3:58 AM
> > Subject: [gs-devel] Performance Comparison Report on Adobe CPSI, Jaws and Ghostscript.
>
>
> > test3.eps 470KB 125.90x131.36cm Vector (Many shfills)
>
> > Adobe CPSI (3010.108)
> > test3.eps (720dpi) 12.54 109.06 121.6
> > Ghostscript (current head - PSEUDO_RASTERIZATION)
> > test3.eps (720dpi) 217.29 261.3 478.59
> > Ghostscript (current head - No PSEUDO_RASTERIZATION)
> > test3.eps (720dpi) 110.94 210.53 321.47
>
> shfill subdivides the gradient area into areas of constant color,
> then applies fill to each. When performing fill with a non-character path
> (particularly with a shfill subarea), the pseudo-rasterization is off.
>
>
> I guess the slowdown happens because shfill of a character applies
> a *clipped* fill to subareas of the gradient. With pseudo-rasterization on,
> the number of the character's trapezoids is somewhat bigger, and so is the number
> of calls to the clipped fill. This is the observed slowdown.
>
> IMO the logic used in GS is entirely wrong for the case of character shfill.
> A correct algorithm first rasterizes a character to an alpha image,
> then renders the gradient with the alpha mask in the same way as transparency does.
> (With TextAlphaBits=1 the alpha image is a bitmap).
> With this algorithm the speed of shfill doesn't depend on pseudo-rasterization on/off .
> Therefore the exact reason for the observed slowdown is the cache logic
> rather than pseudo-rasterization.
>
> Could you please send test3.eps to me ?
> I'd like to trace it with a C debugger to prove or disprove my hypothesis.
> Also I am interested in running it with GS+FT.
>
> Igor.
>



