Wget ignores robots.txt entry

Randall R Schulz rrschulz@cris.com
Fri Feb 14 04:35:00 GMT 2003


Max,

No, I don't think cURL does recursive retrieval. I don't think it does 
Web page dependency retrieval, either. Both of these are a big deal for 
me. How could a tool of wget's versatility be replaced by something 
inferior? Whatever happened to technological meritocracy? (Please, no 
laughing.)

I was actually hoping to get some time to work on an extension to wget 
of my own. I wanted to add an option that would cause wget to look in 
one hierarchy to determine file existence and modification times 
relative to the set of files and mod times on the server and download 
new or newer files to a different location. That way I can easily 
maintain mirror copies on a CD-ROM. I'd tell wget to use the CD's 
contents as the file and mod-time reference and to download to a 
location on my hard drive (of course). Then I could incrementally 
update the ROM with whatever was downloaded.
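
A rough sketch of the selection logic described above, in Python rather than as a wget patch. It is not an existing wget option. The names files_to_fetch, download_target and remote_mtimes are hypothetical, and the server listing is assumed to be already available as a mapping from relative path to modification time in seconds since the epoch.

```python
import os


def files_to_fetch(reference_root, remote_mtimes):
    """Return the relative paths that are new or newer on the server.

    reference_root: directory consulted only for comparison, for example
                    a mounted CD-ROM holding the existing mirror.
    remote_mtimes:  hypothetical dict mapping a '/'-separated relative
                    path to the server's modification time (epoch seconds).
    """
    wanted = []
    for rel_path, remote_mtime in remote_mtimes.items():
        local = os.path.join(reference_root, *rel_path.split("/"))
        if not os.path.isfile(local):
            # Absent from the reference hierarchy: a new file.
            wanted.append(rel_path)
        elif os.path.getmtime(local) < remote_mtime:
            # Present, but the server copy is newer.
            wanted.append(rel_path)
    return sorted(wanted)


def download_target(download_root, rel_path):
    """Map a relative path into the separate, writable download location."""
    return os.path.join(download_root, *rel_path.split("/"))
```

Calling files_to_fetch with the CD's mount point as reference_root yields the list to retrieve, and download_target places each of those files under a hard drive directory. The reference hierarchy is never written to.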

Of course, I can still do that, and I may yet. Does that sound like a 
desirable feature to anyone? I don't know how many people share my 
mania for keeping local archives of content from the Internet.


What happens to an open source project when it devolves to this state? 
Who, for example, could hand out writable access to the wget CVS 
repository? Surely this isn't an unrecoverable state of affairs, is it?

Randall Schulz


At 19:04 2003-02-13, Max Bowsher wrote:
>Randall R Schulz wrote:
> > Wget is orphaned? That's bad news, since it seems to have it all over
> > cURL. (Sure. Go ahead and prove me wrong. I might as well get it over
> > with... for now.)
>
>cURL doesn't do recursive web-suck (does it?)
>
>Yes, wget is orphaned. There's no one on the wget mailing list who has CVS
>write access. Which is a great shame, as there are a surprising number of
>patches being sent in.
>
>
>Max.


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/


