[PATCH] DocBook XML toolchain modernization
Warren Young
warren@etr-usa.com
Tue Apr 30 00:52:00 GMT 2013
In the "HELP with Cygwin docs needed" thread on -apps, I volunteered to
bring the docs up to date with regard to current DocBook XML practices.
RATIONALE
~~~~~~~~~
1. The docs are almost entirely moved away from SGML already, but need
this last push to get them mostly into a pure XML world. (There is one
thing remaining, covered in the DOCTOOL section below.)
2. The doctool wheel has been reinvented, and the new versions are now
far more popular. By sticking with this old tool, you are shutting off
potential sources of help. You'll find more people who will admit to
knowing DocBook XML and XIncludes than SGML and doctool.
3. This change set provides a better platform to build on. For example,
it will allow *.xml to be automatically validated at build time. (In
C/C++ terms, we can now do -Wall -Werror, if we want.)
4. Why build an intermediary helper program which creates intermediary
outputs we have to clean up after when we can get most of the benefit of
doctool via xsltproc, which we were already using?
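
As a rough illustration of the build-time checking mentioned in item 3, here is a minimal sketch in Python. It only tests well-formedness, which is weaker than validating against the DocBook DTD, and the two fragments are hypothetical rather than taken from the Cygwin docs. It shows the kind of error a pure XML parser reports for a file with two top-level elements.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical fragments, not taken from the Cygwin docs.
one_root = '<sect1 id="a"><title>A</title></sect1>'
two_roots = (
    '<sect1 id="a"><title>A</title></sect1>'
    '<sect1 id="b"><title>B</title></sect1>'
)

if __name__ == "__main__":
    for name, text in [("one_root", one_root), ("two_roots", two_roots)]:
        problem = check_well_formed(text)
        print(name, "OK" if problem is None else "FAILED: " + problem)
```

A real build rule would more likely call a validating tool against the DTD; this sketch only demonstrates the "fail the build on bad input" idea.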
APOLOGY
~~~~~~~
The changes required were rather invasive, so I haven't simply attached
a single monster .patch file. (If I had, you'd barf, with good cause.)
Instead, following are instructions for a committer to bring the current
CVS tree into the shape I have my read-only checkout in now.
Before I get to that, yes, I am aware that monster "change the world"
patches are frowned on. I have tried to keep the scope of this change
set to a minimum, purposely deferring ideas I had along the way if they
didn't absolutely have to be done in this pass. (See the FURTHER WORK
section below.)
There's no incremental way to do some of these things. When you make
the decision to switch to proper XML from a semi-XML SGML variant that
doesn't validate, a lot of stuff has to change at once. My apologies
for the work this causes to the one who has to check all this in.
STEPS
~~~~~
1. Rename cygwin-api.in.sgml to cygwin-api.in.xml. This is the lone
file still being processed with doctool. I propose to replace this
holdout with Doxygen; see FURTHER WORK below.
2. Rename all remaining *.in.sgml to *.xml. One of the attached patches
converts these mostly-XML files into proper XML files and converts all
doctool directives to XIncludes.
3. Remove faq-sections.xml. See the DUAL FAQ FORMATS section below for
an explanation.
4. Copy all attached *.xml into winsup/doc, then "cvs add *.xml".
All of the copied files will be new to the directory, except for faq.xml
which purely replaces the previous one. The changes were extensive
enough to make sending it as a patch inefficient.
The ug-info.xml addition could have been put off for later, but it
fixes a problem where the two output formats for the Cygwin user guide
had different <bookinfo> element contents, implying that their
authorship differed. This file provides a common version which both
formats now XInclude.
The rest all contain a single DocBook element (e.g. <sect1>, <chapter>)
extracted from another file which had two or more of these as top-level
elements. XML only allows a single top-level element, a fact the
SGML-based doctool was glossing over for us. Now that we're using a
purely XML toolchain, we have to follow the rules. Each new file is
named after the ID of the top-level element it contains.
5. Remove overview2.sgml and setup2.sgml. These existed purely to hold
multiple DocBook XML fragments that each now live in their own
individual files, which were added in the previous step. (Not all of
the XML files added in step 4 came from these two containers.)
6. Rename all remaining *.sgml to *.xml. These files were already
DocBook XML, not DocBook SGML, and apparently had been so for years
despite the file name. One of the attached patches adds the necessary
<?xml> and <!DOCTYPE> tags to the top of each of these files.
7. Rename cygwin.dsl to cygwin.xsl. As with the previous item, this
file has for years contained XSL, not DSSSL.
8. Apply the attached patches:
- cygdoc-sgml-to-xml.patch: Gets rid of SGMLisms and outdated DocBook
XML constructs, adds <?xml> and <!DOCTYPE> tags to the top of *.xml so
they validate, and replaces doctool directives with XIncludes.
- cygdoc-build.patch: Updates the doc build system for the DocBook XML
modernization.
- cygdoc-changelog.patch: ChangeLog entries for this change set.
9. autoconf && ./configure && make
You should get the same outputs as before, except for...
DUAL FAQ FORMATS
~~~~~~~~~~~~~~~~
As far as I can tell, the two FAQ output formats are a legacy thing no
longer needed on cygwin.com. I recall that at one point the FAQ was on
one HTML page per section, then at my request cgf changed it to a
single-page form so it's more easily searched in a browser.
The changes required to create a faq.xml which works with XIncludes
break the method used to get two different FAQ forms from a single set
of FAQ section files. I know how to get dual outputs back again if we
really need them, but if I'm right and we don't need the second output
form, I can avoid some pointless grunt work.
If I am right and we only need one of the two output forms, please check
that I have selected the right one. If I've gotten this backwards and
we need the other instead but CVS can be broken for a short time without
breaking anything else, it's probably best to check this in as-is, since
a patch to fix the problem would be smaller than sending a new set of
faq*.xml. (I'm assuming a CVS check-in to the docs doesn't immediately
show up on the public cygwin.com web site.)
DOCTOOL
~~~~~~~
doctool is a program written by DJ Delorie in the SGML days. In 2004,
the W3C published an XML standard called XInclude as a Recommendation.
It does the main thing the Cygwin docs need, and it's supported by the
current DocBook XML toolchain we're using.
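
For readers who have not used XInclude, the following is a minimal sketch of what inclusion does, using Python's standard xml.etree.ElementInclude module rather than the xsltproc toolchain the docs actually use. The file names and contents are hypothetical, and the "files" live in a dictionary so the example needs nothing on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-ins for files on disk; names and contents are
# illustrative only.
FILES = {
    "book.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Example Guide</title>"
        '<xi:include href="intro.xml"/>'
        "</book>"
    ),
    "intro.xml": '<chapter id="intro"><title>Introduction</title></chapter>',
}


def load_from_dict(href, parse, encoding=None):
    """Loader for ElementInclude that reads from FILES instead of the filesystem."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text


def resolve(name):
    """Parse FILES[name] and replace each xi:include with the included element."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load_from_dict)
    return root


if __name__ == "__main__":
    print(ET.tostring(resolve("book.xml"), encoding="unicode"))
```

After resolution the <chapter> element sits where the xi:include element was, which is the job doctool's include directives were doing.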
There are two things doctool does that we don't get from XIncludes:
1. Automatic Makefile dependency generation. I think we can live
without it, but I propose to try and replace this feature anyway.
2. Documentation extraction from source code files. I propose to
replace this with Doxygen. (Yes, I'm volunteering to do the conversion
and set it up in the doc/Makefile.in.)
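
To give an idea of how small an XInclude-aware dependency generator could be, here is a sketch in Python. It is not the proposed replacement itself, only an illustration under some assumptions: every href is a plain file name in the same directory, the file names are hypothetical, and the output is a single make-style rule.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical files, kept in a dictionary so the sketch is self-contained.
NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'
FILES = {
    "guide.xml": '<book %s><xi:include href="part-a.xml"/>'
                 '<xi:include href="part-b.xml"/></book>' % NS,
    "part-a.xml": '<chapter %s><xi:include href="detail.xml"/></chapter>' % NS,
    "part-b.xml": "<chapter/>",
    "detail.xml": "<sect1/>",
}


def direct_includes(xml_text):
    """Return the href of every xi:include element in xml_text, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter(XI_INCLUDE) if el.get("href")]


def all_dependencies(name, read_text):
    """Collect every file reachable from `name` through xi:include, depth first.

    read_text is a callable mapping a file name to its contents.
    """
    seen = []

    def visit(current):
        for href in direct_includes(read_text(current)):
            if href not in seen:  # also guards against include cycles
                seen.append(href)
                visit(href)

    visit(name)
    return seen


def make_rule(target, source, read_text):
    """Format a make rule listing the source and everything it includes."""
    deps = [source] + all_dependencies(source, read_text)
    return "{}: {}".format(target, " ".join(deps))


if __name__ == "__main__":
    print(make_rule("guide.html", "guide.xml", FILES.__getitem__))
```

In a real build, read_text would open files relative to the source directory, and hrefs in subdirectories would need to be resolved against the including file's location.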
FURTHER WORK
~~~~~~~~~~~~
- Find/build XInclude-aware automatic Makefile dependency generator. At
worst, this shouldn't be much more than a bit of shell and sed.
- Convert existing SGML code embedded in Cygwin source code to Doxygen
format, then set up HTML and PDF reference manual generation in
doc/Makefile.in. Then, remove vestiges of doctool.
- When doctool is removed, the only thing Autoconf will be left doing is
defining the @srcdir@ stuff. If this feature is being used, it is easy
to replace Autoconf here: "SRCDIR=.. make". If not, then Autoconf will
be doing absolutely nothing any more. Either way, remove it; it isn't
pulling its own weight.
- Remove configure script from repo. It's a generated file, and so
doesn't belong in CVS. This will either be part of the previous item,
or if for some reason Autoconf still has a role to play, it should be
replaced with a bootstrap script.
- There are absolute HTTP <ulinks> which should be transformed to
relative links so that they do the right thing when you move the docs
around. Maybe they'll never live somewhere else on cygwin.com, but if
nothing else, they currently do the wrong thing when you open one of the
generated .html files from the local filesystem: hyperlinks take you off
to cygwin.com instead of to the relevant local file.
- Move to DocBook 5. The standard's been out for 3 and a half years
now. The only thing blocking me from attempting the upgrade right now
is that the DocBook 5.x stylesheets aren't in the Cygwin package repo.
- Files are often named with less detail than the ID of the top-level
XML element they contain. For example, specialnames.xml contains <sect1
id="using-specialnames">. The ID scheme seems hierarchical, so maybe
the files should go into subdirectories; e.g. using/specialnames.xml.
This would help with the proliferation of files this "patch" created.
- The XML files should be run through a "tidy" tool. XML is easier to
read when properly indented, and DocBook XML is insensitive to such
whitespace issues.
- Remove --skip-validation from XMLTO flags variable in Makefile.in,
then fix any errors and warnings that result.
- Replace the hard-coded dates in <bookinfo><date> tags with DocBook
time stamps. (http://www.sagehill.net/docbookxsl/Datetime.html)
- cygwin-ug-net/cygwin-ug-net-nochunks.html.gz build rules can probably
be reduced to a one-liner by moving from xmlto wrapper to a raw xsltproc
call.
- Is xmlto pulling its own weight for the HTML case? It *might* have
some value for the PDF-via-dblatex case, but an xsltproc call for HTML
is also a one-liner.
- Typography improvements: curl all the quotation marks, replace "--"
with em dashes, check proper names for missing accents, etc.
- Put code snippets in CDATA sections so we can replace XML entities
with their literal equivalents. (e.g. all the "&lt;" and "&amp;" stuff
becomes < and &.)
- Pretty code snippets. Search for a DocBook aware automatic code
formatter that will take raw example code in and mark it up, as exists
for HTML. If one can't be found or created -- e.g. by lashing an HTML
code formatter to a sed script then whipping them until they sing -- do
the markup by hand.
- Adapt the top-level cygwin.com CSS to the generated HTML, so the user
guide blends with the rest of the site. (Something like this has been
done to cygwin.com/faq.html, perhaps by hand, perhaps automated in a
one-off way I don't see here.)
- Improve PDF styling.
- Change the '-' prefixes on Makefile.in commands to '@'. We only want
to avoid echoing the commands, not keep on trucking past build errors.
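
On the CDATA item in the list above, a minimal sketch showing that the two spellings are equivalent to an XML parser, so switching is purely a readability change for people editing the source. The <programlisting> content is a hypothetical snippet, not one from the Cygwin docs.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet; the C expression is illustrative only.
escaped = "<programlisting>if (a &lt; b &amp;&amp; c) return;</programlisting>"
cdata = "<programlisting><![CDATA[if (a < b && c) return;]]></programlisting>"


def element_text(xml_text):
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(element_text(escaped))
    print(element_text(escaped) == element_text(cdata))
```

One caveat: a CDATA section cannot itself contain the sequence "]]>", so any snippet with that text would still need special handling.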
MAINTAINERSHIP?
~~~~~~~~~~~~~~~
In the previous thread on -apps, Corinna implied that if I provided this
change set, it would make me the new docs maintainer. (Last one to
touch it owns it?) I don't see how this can be, since I don't have a
CVS commit bit.
I did submit a copyright assignment to Red Hat many moons ago, so that
should be no barrier to accepting this change set.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup-env.xml
Type: text/xml
Size: 5007 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup-files.xml
Type: text/xml
Size: 3469 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0001.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup-locale.xml
Type: text/xml
Size: 18536 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0002.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: setup-maxmem.xml
Type: text/xml
Size: 2971 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0003.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: specialnames.xml
Type: text/xml
Size: 22097 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0004.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ug-info.xml
Type: text/xml
Size: 871 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0005.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygdoc-build.patch
Type: text/x-patch
Size: 4509 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygdoc-changelog.patch
Type: text/x-patch
Size: 2437 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cygdoc-sgml-to-xml.patch
Type: text/x-patch
Size: 37272 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: faq.xml
Type: text/xml
Size: 642 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0006.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: faq-copyright.xml
Type: text/xml
Size: 544 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0007.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: highlights.xml
Type: text/xml
Size: 21831 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0008.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ov-ex-unix.xml
Type: text/xml
Size: 2411 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0009.xml>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ov-ex-win.xml
Type: text/xml
Size: 2331 bytes
Desc: not available
URL: <http://cygwin.com/pipermail/cygwin-patches/attachments/20130430/f55e4908/attachment-0010.xml>