[Gs-devel] Block reading from standard input

Raph Levien raph at acm.org
Wed May 30 01:19:59 PDT 2001


One of the areas I'm currently unhappy with in Ghostscript is the
reading of standard input. In 7.00, when you use the "-" command
line switch to read from stdin, the performance is quite
bad, especially with the STDIO_IMPLEMENTATION=c flag set[1]. We have a
proposed solution from Russell for stdout redirection, but it depends
on this flag.

Essentially, the problem is that "-" stdin reading is forced to be
byte-at-a-time. This avoids deadlock problems in modes where
PostScript commands are fed into stdin, then the responses are read
out. The alternative "-_" command line option gives much better
performance because reading happens in bigger blocks, but it
currently blocks when the sender supplies an incomplete block of data.

The main reason for the extremely slow performance as of 7.00 is that
control passes all the way up to the toplevel every time input is
needed. At once per byte, the overhead for that is significant.

There are a number of possible solutions to this problem, including
simply better documenting the difference between "-" and "-_" so that
people won't run into it. However, I'm not really happy with an easy
to use mode of operation that has appalling performance. People
(including potential Ghostscript customers) _will_ run into it.

The approach I find most appealing is to switch to nonblocking input
for stdin. However, in looking at the code, I see that fread() is used
for stdin input[2]. As far as I can tell, forcing fread() to work in
nonblocking mode is painful at best, as the fread() implementation
blocks after the first character is read (at least in glibc). The only
way to make it truly nonblocking is to call fread() one character at a
time, until either the buffer is filled or an EWOULDBLOCK error is
returned. This is unsatisfying but still a huge performance win
compared with passing through the interpreter toplevel each character.
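
As a rough sketch of the one-character-at-a-time fallback described
above (this is not Ghostscript code; the helper names set_nonblocking()
and fread_available() are made up for illustration, and a POSIX system
is assumed), the loop might look like this:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: switch the descriptor behind a stdio stream to
 * nonblocking mode. Returns 0 on success, -1 on failure. */
int set_nonblocking(FILE *f)
{
    int fd = fileno(f);
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read up to len bytes with one fread() call per
 * byte, stopping when the buffer is full, at end of file, or when the
 * next read would block. Returns the number of bytes stored in buf and
 * sets *at_eof to 1 only if end of file was reached. */
size_t fread_available(FILE *f, unsigned char *buf, size_t len, int *at_eof)
{
    size_t n = 0;

    assert(buf != NULL && at_eof != NULL);
    *at_eof = 0;
    while (n < len) {
        errno = 0;
        if (fread(buf + n, 1, 1, f) == 1) {
            n++;
            continue;
        }
        if (feof(f)) {
            *at_eof = 1;
        } else if (ferror(f) && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Nothing more available right now: clear the sticky error
             * flag so that a later call can try again. */
            clearerr(f);
        }
        /* Any other error is left set for the caller to see via ferror(). */
        break;
    }
    return n;
}
```

Each fread() call asks for exactly one byte, so stdio never waits to
fill a larger request; the cost is one library call per byte, which is
still far cheaper than a trip through the interpreter toplevel.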

A more satisfying approach is to use read() for stdin[3]. However, I'm
a bit worried, because read() calls cannot be interleaved with fread()
or other stdio functions. I'm not sure exactly what the risk of this
is.
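
For comparison, a minimal sketch of the read()-based approach, again
assuming POSIX and using made-up names (make_nonblocking(),
read_block_nonblocking(), READ_WOULD_BLOCK) rather than anything in the
Ghostscript sources:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sentinel meaning "no data available yet". */
enum { READ_WOULD_BLOCK = -2 };

/* Hypothetical helper: put a descriptor into nonblocking mode.
 * Returns 0 on success, -1 on failure. */
int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: read whatever is available, up to len bytes.
 * Returns the byte count (> 0), 0 at end of file, READ_WOULD_BLOCK if
 * nothing is available yet, or -1 on any other error. */
long read_block_nonblocking(int fd, unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return (long)n;
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return READ_WOULD_BLOCK;
        return -1;
    }
}
```

A single read() returns however many bytes are currently available, so
an incomplete block does not stall the caller. Because this bypasses
the stdio buffer entirely, it would have to be the only reader of the
descriptor, which is exactly the interleaving concern raised above.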

I note that two bugs exist against FILE_IMPLEMENTATION=fd build
modes[4]. It looks like this mode is broken. I'm not sure whether
using read() for stdin reads requires FILE_IMPLEMENTATION or not.

Thus, I am asking Peter and Russell the following questions:

* Does it make sense to use read() for stdin?

* Do all stdin reads go through the e_NeedStdin callback (when
  STDIO_IMPLEMENTATION=c, anyway), or is it possible for some of them
  to go through the standard file logic?

* Do we really need to avoid blocking stdin reads? If gs is being
  invoked through the DLL interface, then setting up actual pipes for
  stdin and stdout seems unnecessary - it should be possible to pass
  data in and out directly.

* Why do we have both FILE_IMPLEMENTATION=fd and =stdio, especially
  when the former seems to have bit-rotted substantially?

Insight into these questions, as well as suggestions for better
solutions, would be most appreciated.

Thanks,

Raph

References:

[1] The "-" switch has very bad performance on Windows builds, which
default to STDIO_IMPLEMENTATION=c: SourceForge bug 416973.

[2] The fread() call in the STDIO_IMPLEMENTATION=c case is in
gs_main_interpret() in imain.c. In the STDIO_IMPLEMENTATION= case, it
is in s_stdin_read_process() in ziodevs.c.

[3] Note that sfxfd.c already contains sophisticated logic for
handling nonblocking file operations, including various
platform-specific workarounds.

[4] Bugs in FILE_IMPLEMENTATION=fd: SourceForge 225358 and 427347.


