(no subject)
Apr. 10th, 2020 12:03 am
Achievement get! I have successfully used wget to download the entirety of a smallish wordpress site, because it took my fancy and I had no other way to sensibly archive the entire thing. (And I spent so long mourning not knowing how to use it, and it turned out to be pretty simple, thanks at least in part to brin_bellway already having worked out many of the obvious pitfalls and then posting the details very helpfully.)
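(For the record, the sort of invocation I mean looks roughly like the below. I'm not copying my exact command out of my shell history, and the URL is obviously a stand-in, but the flags are standard wget mirroring options:)

```
# Rough sketch of mirroring a small site with wget; the URL is a placeholder.
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-parent \
     https://example-wordpress-site.com/
```

(--mirror handles the recursion, --convert-links rewrites the links so the copy is browsable offline, and --page-requisites pulls in the images and stylesheets each page needs.)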
...
This is more power than I should have. I suspect I will be using up a worrying amount of data capacity on this... For a good cause, of course, but even so.
no subject
Date: 2020-04-10 01:43 am (UTC)
That being said, it *is* a concerning amount of power sometimes, yeah. I, uh, did accidentally fuck over some server admins once. (Not a full-on denial-of-service, but apparently they struggled pretty hard under the sudden spike in load.)
I've learned how to throttle since then ("--wait={{insert number of seconds here}}"), and I recommend you do the same, at least with large websites run by small-timers. (Unfortunately the things they did on their end to keep scraper bots from digging too deep (and ending up downloading one zillion immaterially-different page variants) don't seem to have worked, and I have still never successfully downloaded their site.)
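(Concretely, a throttled run looks something like this. The numbers and URL here are made up for illustration, but the flags themselves are real wget options:)

```
# Sketch of a politely throttled mirror: wait a couple of seconds
# between requests and cap the bandwidth. Values are illustrative.
wget --mirror --convert-links --adjust-extension --page-requisites \
     --wait=2 \
     --random-wait \
     --limit-rate=200k \
     https://example-small-site.org/
```

(--random-wait jitters the delay a little, and --limit-rate caps how fast any one file comes down, which is extra politeness on top of the per-request wait.)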
(I...*guess* I could use my shiny new SingleFile and act as a manual web scraper? But the site is so big--even to an entity capable of seeing past the zillion variants--that that doesn't seem very feasible.)