[PATCH] DocBook XML toolchain modernization

Warren Young warren@etr-usa.com
Tue Apr 30 00:52:00 GMT 2013


In the "HELP with Cygwin docs needed" thread on -apps, I volunteered to 
bring the docs up to date with regard to current DocBook XML practices.

RATIONALE
~~~~~~~~~
1. The docs have almost entirely moved away from SGML already, but need 
this last push to get them into a nearly pure XML world.  (One thing 
remains, covered in the DOCTOOL section below.)

2. The doctool wheel has been reinvented, and the new versions are now 
far more popular.  By sticking with this old tool, you are shutting off 
potential sources of help.  You'll find more people who will admit to 
knowing DocBook XML and XIncludes than SGML and doctool.

3. This change set provides a better platform to build on.  For example, 
it will allow *.xml to be automatically validated at build time.  (In 
C/C++ terms, we can now do -Wall -Werror, if we want.)

4. Why build an intermediary helper program which creates intermediary 
outputs we have to clean up after when we can get most of the benefit of 
doctool via xsltproc, which we were already using?
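
To make point 3 concrete, here is a minimal sketch of a build-time 
check, written in Python purely for illustration.  It only tests 
well-formedness with the standard library parser; it does not validate 
against the DocBook DTD, and the helper names are hypothetical rather 
than anything in the attached patches.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(paths):
    """Return (path, error) pairs for every file that fails to parse."""
    failures = []
    for path in paths:
        try:
            ET.parse(str(path))
        except ET.ParseError as err:
            # Covers, among other things, a file with two top-level elements.
            failures.append((str(path), str(err)))
    return failures


def check_directory(doc_dir="."):
    """Check every *.xml directly inside doc_dir (hypothetical layout)."""
    return check_well_formed(sorted(Path(doc_dir).glob("*.xml")))
```

A Makefile rule could call check_directory() and fail the build when 
the returned list is non-empty.  Real DTD validation would need a 
validating tool, for example xmllint with its --valid option.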


APOLOGY
~~~~~~~

The changes required were rather invasive, so I haven't simply attached 
a single monster .patch file.  (If I had, you'd barf, with good cause.) 
Instead, following are instructions for a committer to bring the current 
CVS tree into the shape I have my read-only checkout in now.

Before I get to that, yes, I am aware that monster "change the world" 
patches are frowned on.  I have tried to keep the scope of this change 
set to a minimum, purposely deferring ideas I had along the way if they 
didn't absolutely have to be done in this pass.  (See the FURTHER WORK 
section below.)

There's no incremental way to do some of these things.  When you make 
the decision to switch to proper XML from a semi-XML SGML variant that 
doesn't validate, a lot of stuff has to change at once.  My apologies 
for the work this creates for whoever has to check all this in.


STEPS
~~~~~

1. Rename cygwin-api.in.sgml to cygwin-api.in.xml.  This is the lone 
file still being processed with doctool.  I propose to replace this 
holdout with Doxygen; see FURTHER WORK below.

2. Rename all remaining *.in.sgml to *.xml.  One of the attached patches 
converts these mostly-XML files into proper XML files and converts all 
doctool directives to XIncludes.

3. Remove faq-sections.xml.  See the DUAL FAQ FORMATS section below for 
an explanation.

4. Copy all attached *.xml into winsup/doc, then "cvs add *.xml".

All of the copied files will be new to the directory, except for faq.xml 
which purely replaces the previous one.  The changes were extensive 
enough to make sending it as a patch inefficient.

The ug-info.xml addition could have been put off until later, but it 
fixes a problem where the two output formats for the Cygwin user guide 
had different <bookinfo> element contents, implying that their 
authorship differed.  This file provides a common version which both 
formats now XInclude.

The rest all contain a single DocBook element (e.g. <sect1>, <chapter>) 
extracted from another file which had two or more of these as top-level 
elements.  XML only allows a single top-level element, a fact the 
SGML-based doctool was glossing over for us.  Now that we're using a 
purely XML toolchain, we have to follow the rules.  Each new file is 
named after the ID of the top-level element it contains.

5. Remove overview2.sgml and setup2.sgml.  These existed purely to hold 
multiple DocBook XML fragments that each now live in their own 
individual files, which were added in the previous step.  (Not all of 
the XML files added in step 4 came from these two containers.)

6. Rename all remaining *.sgml to *.xml.  These files were already 
DocBook XML, not DocBook SGML, and apparently had been so for years 
despite the file name.  One of the attached patches adds the necessary 
<?xml> and <!DOCTYPE> tags to the top of each of these files.

7. Rename cygwin.dsl to cygwin.xsl.  As with the previous item, this 
file has for years contained XSL, not DSSSL.

8. Apply the attached patches:

- cygdoc-sgml-to-xml.patch: Gets rid of SGMLisms and outdated DocBook 
XML constructs, adds <?xml> and <!DOCTYPE> tags to the top of *.xml so 
they validate, and replaces doctool directives with XIncludes.

- cygdoc-build.patch: Updates the doc build system to match the DocBook 
XML modernization.

- cygdoc-changelog.patch: ChangeLog entries for the above.

9. autoconf && ./configure && make

You should get the same outputs as before, except for...


DUAL FAQ FORMATS
~~~~~~~~~~~~~~~~

As far as I can tell, the two FAQ output formats are a legacy thing no 
longer needed on cygwin.com.  I recall that at one point the FAQ was on 
one HTML page per section, then at my request cgf changed it to a 
single-page form so it's more easily searched in a browser.

The changes required to create a faq.xml which works with XIncludes 
break the method used to get two different FAQ forms from a single set 
of FAQ section files.  I know how to get dual outputs back again if we 
really need them, but if I'm right and we don't need the second output 
form, I can avoid some pointless grunt work.

If I am right and we only need one of the two output forms, please check 
that I have selected the right one.  If I've gotten this backwards and 
we need the other instead but CVS can be broken for a short time without 
breaking anything else, it's probably best to check this in as-is, since 
a patch to fix the problem would be smaller than sending a new set of 
faq*.xml.  (I'm assuming a CVS check-in to the docs doesn't immediately 
show up on the public cygwin.com web site.)


DOCTOOL
~~~~~~~

doctool is a program written by DJ Delorie in the SGML days.  Since 
then, W3C has standardized XInclude (a Recommendation since 2004), which 
does the main thing the Cygwin docs need, and it's supported by the 
current DocBook XML toolchain we're using.
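
As a rough illustration of what XInclude does, the following Python 
sketch expands an xi:include element using the standard library's 
xml.etree.ElementInclude module.  The file names and contents in FILES 
are made-up stand-ins, not copies of the real documents, and the 
in-memory loader exists only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for files in winsup/doc.
FILES = {
    "ug.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include href="setup-env.xml"/>'
        '</book>'
    ),
    "setup-env.xml": '<sect1 id="setup-env"><title>Environment</title></sect1>',
}


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude that reads from FILES instead of disk."""
    text = FILES[href]
    return ET.fromstring(text) if parse == "xml" else text


def expand(name):
    """Parse FILES[name] and resolve its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load_from_memory)
    return root
```

After expand("ug.xml"), the <book> element holds the included <sect1> 
directly, which is the job the doctool include directives used to do.  
With xsltproc, the equivalent expansion is requested with its 
--xinclude option.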

There are two things doctool does that we don't get from XIncludes:

1. Automatic Makefile dependency generation.  I think we can live 
without it, but I propose to try and replace this feature anyway.

2. Documentation extraction from source code files.  I propose to 
replace this with Doxygen.  (Yes, I'm volunteering to do the conversion 
and set it up in the doc/Makefile.in.)
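
For item 1, a dependency generator could be as small as the following 
Python sketch.  It is only an assumption about how such a tool might 
look: the function names and the target and source names in the usage 
note are hypothetical, and it lists only direct includes rather than 
following them recursively.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def xinclude_hrefs(xml_text):
    """Return the href of every xi:include element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter(XI_INCLUDE) if el.get("href")]


def make_rule(target, source, xml_text):
    """Format one Makefile dependency line: target: source includes..."""
    deps = [source] + xinclude_hrefs(xml_text)
    return f"{target}: {' '.join(deps)}"
```

Calling make_rule("cygwin-ug.html", "cygwin-ug.xml", text) on a 
document that includes ug-info.xml and setup-env.xml would yield a 
line listing all three files as prerequisites of the HTML target.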


FURTHER WORK
~~~~~~~~~~~~

- Find or build an XInclude-aware automatic Makefile dependency 
generator.  At worst, this shouldn't be much more than a bit of shell 
and sed.

- Convert existing SGML code embedded in Cygwin source code to Doxygen 
format, then set up HTML and PDF reference manual generation in 
doc/Makefile.in.  Then, remove vestiges of doctool.

- When doctool is removed, the only thing Autoconf will be left doing is 
defining the @srcdir@ stuff.  If this feature is being used, it is easy 
to replace Autoconf here: "SRCDIR=.. make".  If not, then Autoconf will 
be doing absolutely nothing any more.  Either way, remove it; it isn't 
pulling its own weight.

- Remove configure script from repo.  It's a generated file, and so 
doesn't belong in CVS.  This will either be part of the previous item, 
or if for some reason Autoconf still has a role to play, it should be 
replaced with a bootstrap script.

- There are absolute HTTP <ulinks> which should be transformed to 
relative links so that they do the right thing when you move the docs 
around.  Maybe they'll never live somewhere else on cygwin.com, but if 
nothing else, they currently do the wrong thing when you open one of the 
generated .html files from the local filesystem: hyperlinks take you off 
to cygwin.com instead of to the relevant local file.

- Move to DocBook 5.  The standard's been out for 3 and a half years 
now.  The only thing blocking me from attempting the upgrade right now 
is that the DocBook 5.x stylesheets aren't in the Cygwin package repo.

- Files are often named with less detail than the ID of the top-level 
XML element they contain.  For example, specialnames.xml contains <sect1 
id="using-specialnames">.  The ID scheme seems hierarchical, so maybe 
the files should go into subdirectories; e.g. using/specialnames.xml. 
This would help with the proliferation of files this "patch" created.

- The XML files should be run through a "tidy" tool.  XML is easier to 
read when properly indented, and DocBook XML is insensitive to such 
whitespace issues.

- Remove --skip-validation from XMLTO flags variable in Makefile.in, 
then fix any errors and warnings that result.

- Replace the hard-coded dates in <bookinfo><date> tags with DocBook 
time stamps.  (http://www.sagehill.net/docbookxsl/Datetime.html)

- The cygwin-ug-net/cygwin-ug-net-nochunks.html.gz build rules can 
probably be reduced to a one-liner by moving from the xmlto wrapper to a 
raw xsltproc call.

- Is xmlto pulling its own weight for the HTML case?  It *might* have 
some value for the PDF-via-dblatex case, but an xsltproc call for HTML 
is also a one-liner.

- Typography improvements: curl all the quotation marks, replace "--" 
with em dashes, check proper names for missing accents, etc.

- Put code snippets in CDATA sections so we can replace XML entities 
with their literal equivalents.  (e.g. all the "&lt;" and "&amp;" stuff 
becomes < and &.)

- Pretty code snippets.  Search for a DocBook aware automatic code 
formatter that will take raw example code in and mark it up, as exists 
for HTML.  If one can't be found or created -- e.g. by lashing an HTML 
code formatter to a sed script then whipping them until they sing -- do 
the markup by hand.

- Apply the top-level cygwin.com CSS to the generated HTML, so the user 
guide blends with the rest of the site.  (Something like this has been 
done to cygwin.com/faq.html, perhaps by hand, perhaps automated in a 
one-off way I don't see here.)

- Improve PDF styling.

- Change the '-' prefixes on Makefile.in commands to '@'.  We only want 
to avoid echoing the commands, not keep on trucking past build errors.
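
On the CDATA item in the list above: the reason the swap is safe is 
that an XML parser hands the same character data to the stylesheet 
either way.  This small Python sketch shows the equivalence with a 
made-up code snippet; the <programlisting> contents are an 
illustration, not text from the docs.

```python
import xml.etree.ElementTree as ET

# The same hypothetical snippet, first entity-escaped, then in CDATA.
ESCAPED = '<programlisting>if (a &lt; b &amp;&amp; c) puts("ok");</programlisting>'
CDATA = '<programlisting><![CDATA[if (a < b && c) puts("ok");]]></programlisting>'


def listing_text(xml_text):
    """Return the character data a parser sees inside the element."""
    return ET.fromstring(xml_text).text
```

Both listing_text(ESCAPED) and listing_text(CDATA) return the same 
string, so only the readability of the source file changes.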


MAINTAINERSHIP?
~~~~~~~~~~~~~~~

In the previous thread on -apps, Corinna implied that if I provided this 
change set, it would make me the new docs maintainer.  (Last one to 
touch it owns it?)  I don't see how this can be, since I don't have a 
CVS commit bit.

I did submit a copyright assignment to Red Hat many moons ago, so that 
should be no barrier to accepting this change set.


ATTACHMENTS
~~~~~~~~~~~

- setup-env.xml (text/xml, 5007 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment.xml>

- setup-files.xml (text/xml, 3469 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0001.xml>

- setup-locale.xml (text/xml, 18536 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0002.xml>

- setup-maxmem.xml (text/xml, 2971 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0003.xml>

- specialnames.xml (text/xml, 22097 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0004.xml>

- ug-info.xml (text/xml, 871 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0005.xml>

- cygdoc-build.patch (text/x-patch, 4509 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment.bin>

- cygdoc-changelog.patch (text/x-patch, 2437 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0001.bin>

- cygdoc-sgml-to-xml.patch (text/x-patch, 37272 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0002.bin>

- faq.xml (text/xml, 642 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0006.xml>

- faq-copyright.xml (text/xml, 544 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0007.xml>

- highlights.xml (text/xml, 21831 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0008.xml>

- ov-ex-unix.xml (text/xml, 2411 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0009.xml>

- ov-ex-win.xml (text/xml, 2331 bytes)
  <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0010.xml>

