File handling in setup.exe

Robert Collins
Thu Oct 4 18:51:00 GMT 2001

----- Original Message -----
From: "Christopher Faylor" <>
To: <>
Cc: <>
Sent: Friday, October 05, 2001 11:20 AM
Subject: Re: File handling in setup.exe

> FWIW, I really like what you've proposed.  It feels right.


> Although, I guess we should wait for a little more input first.

I've got some; see inline below.

> >This implies some kind of link between archive handling and the
> >NetIO hierarchy.  This would also require changes to and
> >code that calls functions in  The foremost issue is, should I
> >be chasing this at all, or should I simply refactor the tar handling
> >mechanism as it exists right now?

I think that refactoring the tar handling is really just bit twiddling.
IMO, bringing it all together and _then_ handling the magic number issue
can be done cleanly.
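The magic number issue boils down to peeking at the first bytes of a stream and deciding which decoder applies. A minimal sketch, assuming the standard gzip signature (0x1f 0x8b), the bzip2 "BZh" prefix, and the "ustar" marker at offset 257 of a POSIX/GNU tar header; `Format` and `sniff()` are hypothetical names, not existing setup.exe code. Old V7-style tar files carry no magic at all and would come back as Unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical classification of a stream's leading bytes.
enum class Format { Gzip, Bzip2, Tar, Unknown };

// Guess the format from `len` bytes obtained by peeking (not consuming).
//   gzip:  0x1f 0x8b
//   bzip2: "BZh"
//   tar:   "ustar" at offset 257 (ustar and GNU headers; V7 tar has none)
Format sniff(const unsigned char *buf, std::size_t len) {
  if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
    return Format::Gzip;
  if (len >= 3 && std::memcmp(buf, "BZh", 3) == 0)
    return Format::Bzip2;
  if (len >= 262 && std::memcmp(buf + 257, "ustar", 5) == 0)
    return Format::Tar;
  return Format::Unknown;
}
```

Peeking 512 bytes (one tar header block) is enough to run all three checks at once.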

> >I assume that reading packages from the network would be useful for
> >allowing setup.exe to install directly from the network, without
> >writing the packages out to disk first as it does today.  Yet, we need to
> >keep that "caching" mechanism somehow, because it's useful.  Currently,
> >handling logic exists in,,, and probably
> >other places.  To deal with all that, I have in mind something like
> >this:
> >
> >class Source {
> >public:
> > Source(out_pathname);
> > virtual int read(buffer, size);
> > virtual int write(buffer, size);
> >
> > ...
> >private:
> > Source() { } // can't create Source objects directly
> >
> > FILE* fp_out;
> >};
> >
> >class HTTPSource : public Source {
> >public:
> > HTTPSource(in_url, out_pathname = 0);
> > ...
> >};
> >

All good...
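To make sure we mean the same thing by "cache as you read", here is roughly how the quoted Source could behave. A sketch only, with assumptions: the constructor also takes a hypothetical `in_pathname` (the quoted version only shows `out_pathname`), and instead of subclasses overriding read() itself they override a protected `fetch()` hook, so the caching copy lives in one place. An HTTPSource would override `fetch()` to pull bytes from the network.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Reads from an input; optionally copies every byte read to out_pathname.
class Source {
public:
  // out_pathname == nullptr means "don't cache".
  Source(const char *in_pathname, const char *out_pathname = nullptr)
      : fp_in(in_pathname ? std::fopen(in_pathname, "rb") : nullptr),
        fp_out(out_pathname ? std::fopen(out_pathname, "wb") : nullptr) {}
  virtual ~Source() {
    if (fp_in) std::fclose(fp_in);
    if (fp_out) std::fclose(fp_out);
  }
  Source(const Source &) = delete;
  Source &operator=(const Source &) = delete;

  // Returns bytes read, 0 at end of data, -1 on error.
  long read(void *buffer, std::size_t size) {
    long n = fetch(buffer, size);
    if (n > 0 && fp_out &&
        std::fwrite(buffer, 1, static_cast<std::size_t>(n), fp_out) !=
            static_cast<std::size_t>(n))
      return -1;  // the cache copy failed
    return n;
  }

protected:
  // Default: read from the local file. A network subclass overrides this.
  virtual long fetch(void *buffer, std::size_t size) {
    if (!fp_in) return -1;
    std::size_t n = std::fread(buffer, 1, size, fp_in);
    if (n == 0 && std::ferror(fp_in)) return -1;
    return static_cast<long>(n);
  }

private:
  std::FILE *fp_in;
  std::FILE *fp_out;
};
```

Reading a local file would pass no `out_pathname`; reading over HTTP could pass one to keep a copy in the local package directory.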

> >By default, Source reads data from a file and has the option to cache
> >the data it reads out to another file.  (If out_pathname == 0, the
> >data isn't cached to a file as it's read.)  Subclasses override the
> >constructor and read() to retrieve data from various network sources.
> >(HTTP, FTP, WinInet.dll, etc.)  When reading straight from a file,
> >you would set the Source to non-cacheable, but when reading via HTTP, you
> >could elect to either cache the data to a file, or simply read the
> >data in without caching it.
> >
> >This implies a fairly major refactoring all by itself.  As I stated
> >above, there's a lot of code that assumes that it can write data out
> >to disk and read it back.  My proposal would mean that everything deals
> >with Source objects.  Because the data may not be cached, you'd want
> >to keep the data pipeline simple: in the HTTP case, you'd read the data
> >from the network, pass it to the gz/bz unpacker, and pass that stream
> >to the tar file unpacker.  That is, go from initial network connection
> >to final unpacking, all in one operation.
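The "all in one operation" pipeline can be pictured as a pull-based chain: each stage reads small chunks from the stage upstream, so nothing has to land on disk first. A sketch under stated assumptions: `Stage`, `MemorySource`, `XorDecoder` and `drain()` are hypothetical names; MemorySource stands in for the HTTP connection, and a toy XOR transform stands in for the gz/bz unpacker only so the example needs no zlib or libbz2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// One pipeline stage: fill up to len bytes, return count (0 = end of data).
struct Stage {
  virtual ~Stage() = default;
  virtual std::size_t read(char *buf, std::size_t len) = 0;
};

// Stand-in for the network source: hands out bytes held in memory.
class MemorySource : public Stage {
public:
  explicit MemorySource(std::string data) : data_(std::move(data)) {}
  std::size_t read(char *buf, std::size_t len) override {
    std::size_t n = std::min(len, data_.size() - pos_);
    data_.copy(buf, n, pos_);
    pos_ += n;
    return n;
  }
private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Stand-in for the gz/bz unpacker: transforms bytes as they stream past.
class XorDecoder : public Stage {
public:
  XorDecoder(Stage &upstream, char key) : up_(upstream), key_(key) {}
  std::size_t read(char *buf, std::size_t len) override {
    std::size_t n = up_.read(buf, len);
    for (std::size_t i = 0; i < n; ++i) buf[i] ^= key_;
    return n;
  }
private:
  Stage &up_;
  char key_;
};

// Stand-in for the tar unpacker: consumes its upstream in tiny chunks.
std::string drain(Stage &s) {
  std::string out;
  char buf[4];  // deliberately small to show chunked streaming
  std::size_t n;
  while ((n = s.read(buf, sizeof buf)) > 0) out.append(buf, n);
  return out;
}
```

Wiring it up is just `MemorySource src(...); XorDecoder dec(src, key); drain(dec);`, and each call to `drain`'s loop pulls one chunk all the way through the chain.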

Here's the bit I want to comment on. I think this point got missed in the
prior discussion (if it didn't, and is simply wrong or illogical, feel
free to say so!).

Let me restate what you've said to be sure I understand you correctly:
You're proposing something like

read from Source
write to Decomp
Read from Decomp
write to Archive
while nextfilename()
  read from archive
  write to filename

(Sure, this could be written as
foo = new source (...)
bar = new decomp (foo)
new archive (bar)
but that's a presentation thing, not really important.)
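The last step of that restated flow (while next_file_name() returns a name, read from the archive and write to that file) might look roughly like this. A sketch only: `ToyArchive` and `extract_all()` are hypothetical, the archive is an in-memory list of (name, contents) pairs standing in for a tar reader, and a std::map stands in for the files that would be written to disk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy archive: members held in memory; only the iteration shape matters.
class ToyArchive {
public:
  explicit ToyArchive(std::vector<std::pair<std::string, std::string>> m)
      : members_(std::move(m)) {}

  // Advance to the next member; nullptr when there are no more.
  const char *next_file_name() {
    if (++index_ >= members_.size()) return nullptr;
    pos_ = 0;
    return members_[index_].first.c_str();
  }

  // Read from the current member only; 0 at the end of that member.
  // Only valid after next_file_name() has returned non-null.
  std::size_t read(char *buf, std::size_t len) {
    const std::string &data = members_[index_].second;
    std::size_t n = std::min(len, data.size() - pos_);
    data.copy(buf, n, pos_);
    pos_ += n;
    return n;
  }

private:
  std::vector<std::pair<std::string, std::string>> members_;
  std::size_t index_ = static_cast<std::size_t>(-1);  // before first member
  std::size_t pos_ = 0;
};

// "while nextfilename(): read from archive, write to filename".
std::map<std::string, std::string> extract_all(ToyArchive &ar) {
  std::map<std::string, std::string> written;
  while (const char *name = ar.next_file_name()) {
    std::string &out = written[name];
    char buf[8];
    std::size_t n;
    while ((n = ar.read(buf, sizeof buf)) > 0) out.append(buf, n);
  }
  return written;
}
```

Note that only the archive stage yields several named streams here; the source and decompressor stages each have exactly one.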

I don't like this, because all three classes perform both read
and write (and Archive is the only one of them that is able to generate
multiple streams, as it should be :]).

I propose the following modification to your class hierarchy.
#include <sys/types.h>  /* ssize_t, size_t */

class Stream {
public:
  virtual ~Stream () {}
  /* Create a new stream from an existing one; used to get decompressed
   * data or to open archives.
   * Returns NULL if there is no sub-stream available, i.e. if peek()
   * matches no known magic number && next_file_name () == NULL.
   */
  static Stream *factory (Stream *);
  /* read data (duh!) */
  virtual ssize_t read (void *buffer, size_t len);
  /* provide data to the stream (double duh!) */
  virtual ssize_t write (const void *buffer, size_t len);
  /* read data without removing it from the class's internal buffer */
  virtual ssize_t peek (void *buffer, size_t len);
  /* Find out the next stream name,
   * ie for foo.tar.gz, at offset 0, next_file_name = foo.tar;
   * for foobar that is an archive, next_file_name is the next
   * extractable filename.
   */
  virtual const char *next_file_name () = 0;
};

So Source becomes:
class Source : public Stream { /* ... */ };

and likewise for Archive and Decomp.

This minor change will immediately allow archives-within-archives,
double-compressed files, and what have you, without having to write
special code to handle them.
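Here is a rough illustration of why nesting comes for free: the caller just keeps applying factory() until it returns NULL, however many layers there are. A sketch with assumptions: the interface is cut down to read() and peek() (write() and next_file_name() are left out), and `WrapStream` with its "WRAP" magic is a hypothetical toy layer standing in for gz/bz so no compression library is needed. It shows the double-compression case; archives would plug into the same loop via next_file_name().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Cut-down version of the proposed Stream interface.
class Stream {
public:
  virtual ~Stream() = default;
  virtual std::size_t read(char *buf, std::size_t len) = 0;
  virtual std::size_t peek(char *buf, std::size_t len) = 0;
  // New stream layered on s, or nullptr if no known magic number matches.
  static Stream *factory(Stream *s);
};

// Plain bytes in memory (stand-in for a file or a download).
class MemStream : public Stream {
public:
  explicit MemStream(std::string d) : data_(std::move(d)) {}
  std::size_t peek(char *buf, std::size_t len) override {
    std::size_t n = std::min(len, data_.size() - pos_);
    return data_.copy(buf, n, pos_);
  }
  std::size_t read(char *buf, std::size_t len) override {
    std::size_t n = peek(buf, len);
    pos_ += n;
    return n;
  }
private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Toy "compressed" layer: magic "WRAP" followed by the payload unchanged.
class WrapStream : public Stream {
public:
  explicit WrapStream(Stream *parent) : parent_(parent) {
    char magic[4];
    parent_->read(magic, sizeof magic);  // factory() already peeked 4 bytes
  }
  std::size_t read(char *buf, std::size_t len) override {
    return parent_->read(buf, len);
  }
  std::size_t peek(char *buf, std::size_t len) override {
    return parent_->peek(buf, len);
  }
private:
  Stream *parent_;  // not owned
};

Stream *Stream::factory(Stream *s) {
  char magic[4];
  if (s->peek(magic, sizeof magic) == sizeof magic &&
      std::memcmp(magic, "WRAP", 4) == 0)
    return new WrapStream(s);
  return nullptr;  // no sub-stream available
}

// Peel layers until factory() finds nothing; the same loop handles 0..n.
std::string read_innermost(Stream &base, int *layers) {
  std::vector<std::unique_ptr<Stream>> owned;
  Stream *s = &base;
  while (Stream *inner = Stream::factory(s)) {
    owned.emplace_back(inner);
    s = inner;
  }
  *layers = static_cast<int>(owned.size());
  std::string out;
  char buf[8];
  std::size_t n;
  while ((n = s->read(buf, sizeof buf)) > 0) out.append(buf, n);
  return out;
}
```

Adding a real format then means adding one more magic check in factory(), not another special case in the caller.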

