This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: SCWM's embedded docs/text proc benchmarks/perl's 10x faster than guile.


Christian Lynbech <lynbech@tbit.dk> writes:

> I must admit that all this talking about SGML makes me feel a little
> uneasy. I can understand why SGML is appealing for a number of
> reasons, but I am not sure that this route is the good one for us, and
> certainly not for what I would like to see happen (bias warning: I am
> a long time TeX/TeXinfo fan).
> 
> There are two sides to the discussion: an output side and an input
> side. What do we want to distribute to end users, and what do we want
> to generate as input to the distribution process?
> 
> Let me start by the output side.
> 
> I think we should generate info files. It is widely used in the free
> software community, even a de facto standard perhaps, certainly more so
> for the FSF line. I also like typical info readers a lot better than
> typical web readers.

Generating info files from DocBook is not hard in principle.  I'm
surprised that it's not been done (and it might make more sense to
generate texinfo files from DocBook), but it's not a reason not to use
DocBook --- it will be done, probably soon.

> It also appears to me that getting a working SGML system going on your
> machine is somewhat involved. When the discussion started on the SCWM
> list, somebody posted a 10+ list of packages one needed. As usual, the
> linux community has no problem, since SGML support comes prepackaged
> in all major distributions, but we need to think beyond linux.

I only needed to install JADE and some DocBook files.  The install
wasn't the easiest, but mostly because the distributions were not
unified, and the directions were lacking.  These problems will go away.

> Now the input side.
> 
> In principle, the extractor could generate anything that could be used
> to produce the above info files (plus other formats as
> well). Apparently the SGML tools aren't quite there yet.
> 
> We could write such a tool, but our resources are limited and could
> be better spent in getting the extractor done and improving the
> documentation. Presumably, changing an existing extractor to generate
> SGML rather than TeXinfo is at least as easy as (my guess: a lot easier
> than) writing the tool to convert SGML to info.

It'd be easy to retarget the SCWM extractor to generate Texinfo instead
of DocBook, but I don't think it's the right thing. DocBook->Info is more
general and more widely useful, so even if it takes more effort, that
effort is better spent on that implementation.  Additionally, the Scwm
documentation uses DocBook tags in the comments, which makes converting
that documentation to Texinfo *not* at all easy.

> Here I am focusing only on *extracted* documentation, ie. markup
> generated automatically. The situation changes when we are talking
> about actively writing markup. Here the case for DocBook is stronger,
> and once a decision is made, it gets much harder to change.

This is a more subtle distinction than you may realize.  Our extracted
documentation for Scwm includes "Concept" documentation that just gets
woven into other chapters.  For example, the C code that handles the
specification of a keypress event has a key binding concept section
written right next to it (tighter coupling is a good thing).
That section, like all the in-source documentation, is free to use
DocBook markup as it chooses.  I recently updated, e.g., the face
specification documentation to contain real DocBook tables that turn out 
very nicely for the web page.  See:

http://www.cs.washington.edu/research/constraints/cassowary/scwm-doc

(Note that we *just* started extracting the documentation from the
Scheme code, so that part is still quite rough around the edges.)

> Those writing the manuals can do whatever they feel like, but we need
> to decide whether documentation strings (those that are extracted) can
> contain markup or not.

My point above is that it may not be ideal to make such a huge
distinction between these two activities.

> [if this discussion already has been decided, I apologize for the
> waste of bandwidth. I know it has been discussed.]
> 
> If the answer is no (there will be some textual conventions of course),
> there really isn't any problem IMO. The extractor will be easy to
> change.
> 
> I think we should stick with the no. I would like to see a system that
> has a hope of working without full-blown markup engines, even on a dumb
> command-line. We also need to keep the whole process simple for our
> own good. 

For Scwm, the decision is "Yes.  Extracted documentation can and does
use DocBook markup." (Note that I changed the question slightly).
Whether Scheme Docstrings should use markup or not is a subtly different 
question.  We currently try to use minimal markup, and our conventions
for automated markup are ideally good enough to not require explicit
markup very often (most common uses now are <informalexample> and
tables -- references to concepts and variables are not handled, yet,
though).

Sam Steingold's nice Emacs interface to the Scwm documentation makes us
want to maintain minimal markup of the doc strings, but reading
<informalexample> tags doesn't hurt my comprehension, so leaving them in
is no big problem. (Tables, however, are more trouble -- a DocBook->Text 
converter may ultimately be the solution here).
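
To make the doc-string half of that idea concrete, here is a rough
sketch of a text renderer for doc strings that use only minimal markup.
It is not part of Scwm or of any existing DocBook tool: the function
name docstring_to_text and the sample doc string in the usage note are
made up for illustration, and it deliberately does not attempt tables,
which would need a real SGML parser.

```python
import re
import textwrap

# Hypothetical helper (not from Scwm): render a doc string that uses only
# light DocBook markup as plain text for a dumb terminal.
EXAMPLE_RE = re.compile(r"<informalexample>(.*?)</informalexample>", re.DOTALL)
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def docstring_to_text(doc, indent="    "):
    """Return `doc` with examples indented and all other tags dropped."""

    def render_example(match):
        # Set the example off on its own indented lines.
        body = textwrap.dedent(match.group(1)).strip("\n")
        return "\n" + textwrap.indent(body, indent) + "\n"

    text = EXAMPLE_RE.sub(render_example, doc)
    # Drop any remaining inline tags but keep the text they wrap.
    text = TAG_RE.sub("", text)
    # Unescape the basic entities; "&amp;" goes last so that "&amp;lt;"
    # comes out as the literal text "&lt;" rather than "<".
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        text = text.replace(entity, char)
    return text
```

For an invented doc string such as

    Bind KEY to PROC.
    <informalexample>(bind-key 'all "M-x" foo)</informalexample>
    See <function>unbind-key</function>.

this gives the first line unchanged, the example on its own indented
line with blank lines around it, and "See unbind-key." with the tags
gone.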

Greg