This is something that keeps me up at night. Unlike other historical artefacts such as pottery, vellum writing, or stone tablets, information on the Internet can just blink into nonexistence when the server hosting it goes offline. This makes it difficult for future anthropologists who want to study our history and document the different Internet epochs. For my part, I always try to send any news article I see to an archival site (like archive.ph) to help collectively preserve our present so it can still be seen by others in the future.
A friend of mine talked about data preservation on the internet in a blog post, which I consider to be a good read. Sure, there’s a lot lost, but as he says in the blog post, that’s mostly gonna be trash content. The good stuff is generally comparatively well archived, as people care about it.
That is likely true for a majority of “the good stuff”, but making that determination can be tricky. Let’s consider spam emails. In our daily lives they are useless, unwanted trash. However, it’s hard to know what a future historian might be able to glean from a complete record of all spam in the world over the span of a decade. They could analyze it for social trends, countries of origin, and correlation with major global events or with the creation and destruction of world governments. Sometimes the garbage of the day becomes a gold mine of source material from which new conclusions can be drawn many decades down the road.
I’m not proposing that we should preserve all that junk; it’s junk, without a doubt. But a person today can’t always tell what’s going to be valuable to society tomorrow.
I wonder if one of the things that tends to get filtered out in preservation is proportion.
When we deliberately save things, we tend to pick either representative specimens or rarities chosen explicitly because they’re rare or “special”. Either way, we end up with a sample that no longer represents the original material.
Coin collections disproportionately contain rare dates. Weird and unsuccessful locomotives clutter railway museums. I expect that historians reading email archives in 2250 will see a far lower spam proportion than actually existed.