- cross-posted to:
- tech@kbin.social
- technology@beehaw.org
- technology@lemmy.ml
cross-posted from: https://lemmy.ml/post/5400607
This is a classic case of tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.
We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
Ironically, I read about three lines of this article before I got a full-screen popup and then a paywall, then closed the tab. And it’s going to get worse, apparently.
I typically don’t read anything from the New York Times, unless I find a free paper somewhere.
Noscript extension on Firefox still works.
Though if you want to support quality reporting, paying for a nytimes account is not a bad idea.
Insert astronaut “always has been” meme here.
I don’t think the issue is corps feeding the internet into AI systems. The real issue is gatekeeping to information and only giving access to this information while milking the individual for data by trackers, money by subscriptions, and more money by ads (that we pay for with subscriptions).
Another larger issue that I fear is often ignored is the amount of control large corporations and, in theory, the government can have over us just by looking at the traces we leave on the internet. Just have a look at Russia and China for real-world examples of this.
As an open source contributor, I believe information (facts and techniques) should be free.
As an open source contributor, I also know that two-way collaboration only happens when users understand where the software came from and how they can communicate back to the original author(s).
The layer of obfuscation that LLMs add, where the code is really from XYZ open-source project, but appears to be manifesting from thin air… worries me, because it’s going to alienate would-be collaborators from the original authors.
“AI” companies are not freeing information. They are colonizing it.
The code that AI produces isn’t “copied” from those original authors, though. The AI learned how to code from them, it isn’t literally copying and pasting from them.
If you think a bit of code is “really from” XYZ open-source project, that’s a copyright violation and you can pursue that legally. But you’ll need to actually show that the code is a copy.
Your justification seems to rest on whether LLM training technically passes the legal standard of violating IP.
That’s not a super compelling argument to me, because:
- Nobody designed current IP law with LLMs in mind
- I would wager that a vast majority of creators whose works were consumed by LLMs did not consider whether their license would permit such an act, and thus didn’t meaningfully consent to have their work used this way (whether or not the law would agree)
- I would argue that IP law is heavily stacked in favor of platforms (who own IP, but do not create it) and against creators (who create, but do not own IP) and consumers
I don’t think that there is fundamentally anything wrong with LLMs as a technology. My problem is that the economic incentives are misaligned with long-term stability of the creative pools that fuel these things in the first place.
Your justification seems to rest on whether LLM training technically passes the legal standard of violating IP.
That’s basically all that I’m talking about here, yeah. I’m saying that the current laws don’t appear to say anything against training AIs off of public data. The AI model is not a copy of that data, nor is its output.
Nobody designed current IP law with LLMs in mind
Indeed. Things are not illegal by default, there needs to be a law or some sort of precedent that makes them illegal. In the realm of LLMs that’s very sparse right now for exactly the reason you say. Nobody anticipated it so nobody wrote any laws forbidding it.
I would wager that a vast majority of creators whose works were consumed by LLMs did not consider whether their license would permit such an act, and thus didn’t meaningfully consent to have their work used this way (whether or not the law would agree)
There are things that you can use intellectual property for that do not require consent in the first place. Fair use describes various categories of that. If it’s not illegal to use copyrighted material without permission when training AIs, why would it matter whether the license permitted it or the author consented to it?
I would argue that IP law is heavily stacked in favor of platforms (who own IP, but do not create it) and against creators (who create, but do not own IP) and consumers
Wouldn’t requiring licensing of data for the training of LLMs stack things even more in the favour of big IP-owning platforms?
Again, as I said before, if you think some specific bit of LLM output is violating the copyright of some code you wrote, there’s already laws in place specifically covering that situation. You can go to court and show that the two pieces of code are substantially identical and sue for damages or whatever. The AI model itself is another matter, though, and I doubt any current laws would count it as a “copy” of the data that went into training it.
The copyright violation happened when the code got fed into that AI’s greedy gullet, not when it came out of its rear end.
That remains to be tested legally speaking, and I don’t think it’s likely to pass muster. If it was trained correctly (ie, no overfitting) the resulting AI model does not contain a copy of the training inputs in any identifiable sense.
Yes, the laws are probably muddy in the USA as usual, but rather clear here in the EU. But legal proceedings are slow, and Big Tech is making haste with their feeding.
There are many jurisdictions beyond the US and EU; Japan in particular has been very vocal about going all-in on allowing AI training. And I wouldn’t say the EU’s laws are “clear” until they are actually tested.
My open source project benefits hugely from the free-to-access LLM coding tools available. That’s a far bigger positive than the abstract fear that someone might feel alienated because the guy copy-pasting their code doesn’t know who he’s copying from.
And yes, obviously the LLM isn’t copying code; it’s learning from a huge range of sources and combining them to make exactly what you ask for (well, not exactly, but with some needling it gets there eventually). But even if it were, that’s still not disrupting collaboration, because that’s not how collaboration works. No one says ‘instead of coding all the boring elif statements required for my function determining if something is a prime, I’ll search code snippets and collaborate with them’. Every worthwhile collaborator to my project has been an active user of the software who wanted to help improve it or add functions. AI won’t change that, and if it does it’ll only be because it makes coding so easy I don’t need collaborators.
Yep, the truly free and open internet is coming to an end. Corporations and governments have spent decades trying to claim control over it, and they’re nearly there.
Which, ironically, will be greatly expedited by the drive to prohibit AI from learning from “unlicensed” materials. That will guarantee that the only AIs with a broad training set will be those owned by corporations that already control an enormous amount of training materials (Disney, Getty Images, etc.)
Yeah, right now the fight is between corporations and creators, but I feel like the future battle is going to be between corporate AIs and “pirated” ones, because Disney is going to keep a firm chokehold over what its generative AI can make, while the community ones will completely ignore copyright restrictions and just let people do whatever they want.
Not gonna need to worry about paywalls when you can get a pirated generative AI to create the superhero mashup you always wanted to watch as a child. That said, I could definitely see Disney and others piggybacking off of AI panic to extend copyright protection into spaces that were previously fair use.
A factor I didn’t consider. Thanks. And there I thought that, given the hardware, it would be relatively easy to build such LLMs or similar as FOSS-like projects.
The internet is fine.
Listen. The era of algorithms and automated aggregators and what not feeding you endless interesting content is over. Before that we read blogs, we shared them on Usenet and IRC, we had webrings. We engaged in communities and the content we were exposed to was human curated. That is coming back. If we can quit it with the hackernews bot spam on Lemmy, it can be one of those places. You need to find niche forums that interest you that are invite only and start talking to people. The future of the internet is human.
Algorithm-created curation isn’t necessarily bad. It’s just not great when it’s designed to increase engagement, rather than to have the most liked, most interesting or best written content rise to the top. When engagement is the most important metric, we instead get lies, clickbait and emotive content rising to the top.
Enragement is hard to distinguish from engagement and most creators of algorithms don’t seem to particularly care about the difference. Some creators DO know the difference and still choose the dark side. It’s shitheads all the way down.
I’d say the bigger problem is that if you have any system, someone will try to game it and eventually succeed. There’s no objective metric for quality that we can measure. Most liked? People will use bots, or treat the number of likes as a goal and do silly things to reach it. Most interesting? That’s completely subjective and varied; the only real way to use that would be to track individuals and serve “things that interest them.” Best written? I don’t know enough about writing to appreciate what’s good and what isn’t, and most people don’t either, as long as it’s good enough and appeals to them.
See also SEO. Or marketing in general I guess.
In theory, you have a better widget so you want to get it to the top of the relevant search results. In practice… 10,000 people trying to make money off a lemon pie recipe create a hellscape of mostly indistinguishable garbage that technically fits the description.
Renting a VPS was one of my best internet decisions TBH. I now have exactly this: my own website, an XMPP server and an IRC bouncer. IRC forever, seriously.
Start making deepfakes of CEOs saying stuff they never said. Bet your ass they’ll make laws real quick about AI protections for individuals.
Sir, we have the top of the line ChatGPT7 online. What should we ask it?
Ask it what our board should direct the company to do.
Sir, its answer is to immediately raise salaries, as there is no logical or sustainable reason for excess wealth at the levels of concentration we are at currently, with everyone but a few suffering and living out their working years in stress, anxiety and misery for no gain.
What are our other AI options?
Basically every law in favor of the average person only exists because it benefits the owning class in some way.
It’s the main reason why theft and murder are seen as the highest of crimes yet r— is rarely if ever prosecuted.
Why filter that word?
Because it genuinely causes pain to certain people to read it typed out, communicates equally as well, and is easier to type.
Nah it just makes it confusing, especially to non native English speakers
Oh yeah. Fair play. Hadn’t considered a person’s reaction to the word. I just wondered why the 2 other crimes were fine but that wasn’t.
One triggers trauma and the other you do in video games on the regular
Also the voting is so weird in your conversation. They were being considerate in censoring the word and were downvoted for saying why? Bandwagon voting is so weird; it makes me wonder if people read the comment or just look at the numbers.
Bandwagon voting, or maybe a bunch of people thought it was a dumb question
R— culture is rampant, especially on the internet. Nobody wants to admit it but you have to ask yourself why you get a strong negative response to anyone calling it out, and be prepared for an answer you don’t want to hear.
Based
When there are just paywalls and AI-generated text garbage everywhere, it’s nice to have a place where you can read what actual people think about things, good or bad.
That’s the value of forums nowadays I think.
Actual user generated content is absolutely where it’s at.
I trust an 8-year-old forum post or a product review on YouTube by someone with 1,000 subscribers much more than any of the Amazon affiliate-link-riddled listicles that dominate search results.
Exactly, which is why I keep repeating here, the Google/Facebook advertising model of “personalized content algorithm” was and is a lie that they’ve been selling for decades. There really is nothing more effective to promote something than genuine word of mouth, and that is not something that can be automated by an unfeeling machine.
So, in that sense, actual human content is a dwindling resource on the Internet right now, and that’s where Lemmy comes in. If we want Lemmy to grow, you should actively contribute your own expertise here (everybody is good at something) instead of arguing pointlessly, so people can think of Lemmy as a place where people help people.
“People who help people are the Lemmiest people in the world!”
I’m loving how kagi banishes listicles to a single, small, condensed section of the search results
Yeah, it really makes human contact more valuable at the end of the day. That was a good point coming from the verified real Margot Robbie!
Academy Award-nominated character actress Margot Robbie always makes good points!
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
This analogy falls apart when you note that “overgrazing” these resources does absolutely nothing to harm them.
They’re still there. They haven’t been affected in any way by the fact that a machine somewhere has read them and learned a bunch of stuff from them. So what?
This analogy falls apart when you note that “overgrazing” these resources does absolutely nothing to harm them.
Only if you consider AI-supercharged misinformation to not be harmful.
Only if you consider the entropy of human interaction on the internet to not be harmful.
Only if you consider being unable to know who is real to not be harmful.
None of those things directly harm the resources being “grazed”, and none of them are inevitable consequences of AI. If you think they are then you’re actually arguing against AI in general and not the specific way in which they’ve been trained.
You think the internet being flooded with articles, comments etc. all being written by AI whose only goals are selling shit, disseminating misinformation, and manipulating elections and opinions - with no way to know what is human and what is AI - is going to be a great environment to continue to train your AI?
You might be interested to read about Model Autophagy Disorder.
That is not a problem caused by “overgrazing” those open resources. It’s a separate problem with AI training that needs to be addressed anyway. You’re just throwing out random AI-related challenges regardless of whether they’re relevant to what’s being discussed.
Simply put, quality control is always important.
If you pump toxic waste onto the field nobody gets to graze it.
Fuck you’re being pedantic.
And you’re completely missing the point.
Whether or not toxic waste is pumped into the field is completely independent of whether anyone is “grazing” on it. AIs are going to be trained and AIs are going to be generating content, regardless of whether those “commons” are being used as training material. If you wish to keep those “commons” high-quality you’re going to have to come up with some way of doing that regardless of whether they’re being used as training material. Banning the use of them as training material will have no impact on whether they get “toxic waste” pumped into them.
My objection is to those who are saying that to save the commons we need to prevent grazing, ie, that to save the quality of public discourse we need to prevent AIs from training on it. Those two things are unrelated. Stopping AIs from training on it will not do anything to preserve the quality of public discourse.
No mate, you’re just being pedantic
While the analogy is not perfect, you can think of the harm as getting lost in the noise. If the “overgrazing” of content on the internet (content whose purpose is to be read, listened to, etc., often as someone’s job) causes a huge amount of other, AI-generated content based on it, then the original is damaged by being lost in the noise.
AI-generated content is coming regardless, whether those open sources get “grazed” or not.
Yes, but there is a qualitative difference when it provides direct competition to the “grazed” material. There is a difference between AI-generated audiobooks and AI-generated audiobooks with the voice of X, for X. Once AI can perfectly reproduce X’s voice, his/her value as a voice actor is 0, hence the “overgrazing”. That is not the same thing as simply being able to provide audiobooks with any other voice.
api ._.
That was entirely self-inflicted damage.
?
Reddit is responsible for their own API changes. Not OpenAI or any other external agency who might have been using Reddit data for AI training. Only Reddit was capable of choosing to change their API, it’s entirely under their control.
This is the best summary I could come up with:
Thanks to artificial intelligence, however, IBM was able to sell Mr. Marston’s decades-old sample to websites that are using it to build a synthetic voice that could say anything.
A.I.-generated books — including a mushroom foraging guide that could lead to mistakes in identifying highly poisonous fungi — are so prevalent on Amazon that the company is asking authors who self-publish on its Kindle platform to also declare if they are using A.I.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
Consider, for instance, that the volunteers who build and maintain Wikipedia trusted that their work would be used according to the terms of their site, which requires attribution.
A Washington Post investigation revealed that OpenAI’s ChatGPT relies on data scraped without consent from hundreds of thousands of websites.
Whether we are professional actors or we just post pictures on social media, everyone should have the right to meaningful consent on whether we want our online lives fed into the giant A.I.
The original article contains 1,094 words, the summary contains 188 words. Saved 83%. I’m a bot and I’m open source!
tragedy of the Commons is capitalist propaganda. what you’re talking about here is an enclosure of the commons which is exactly what capitalist interests are.
‘everything new is bad and scary’ I really don’t understand why this viewpoint is so common in a tech community.
AI will solve so many problems with the current internet and make it far easier to use. And there’s no such thing as overgrazing Wikipedia. I certainly wrote my small portions of it very aware that it’s going to be used by AI, and it’s a great thing. Plus they can certainly afford the bandwidth.
Traditional media says thing that displaces them is terrible and scary and should be stopped… we’ve heard it before with the internet, with social media, and right back to TV and radio…
It will be the greatest discovery tool for human-created content that we’ve ever had. Imagine being able to sort all the junk and actually find what you’re looking for, being able to actually filter stuff and search within context. And imagine not needing a journalist to string together their assumptions and sketchy understanding of science but being able to ask questions and get answers that draw from press releases, released papers, interviews, and public statements.
Yes it will get harder to use the web like we did ten years ago, but that’s ok because doing that is already rubbish.
tragedy of the Commons isn’t real. it’s capitalist propaganda.
You’re correct, although it’s not super relevant to the crux of the article.
Hardin was a white supremacist eugenicist who fabricated pretty much every ounce of support for his theory of “the tragedy of the commons” in an attempt to promote support for reducing the non-white population.
His work has been thoroughly debunked by Elinor Ostrom, who later won a Nobel Prize for her work on commons.
Yet the phrase — and myth — persists.
The Tragedy of the Commons is another of the foundational stories of capitalism, along with The Myth of Barter.
Hi, uneducated rube here. Could you elaborate on that? Because at many times during my life I have seen objective evidence of what looks quite a lot like the tragedy of the commons. When something is considered a public responsibility without a specific owner someone will mistreat it to the point of uselessness in an extreme majority of cases. I observe this in college common areas, gas station air pumps, litter left in public areas, dog park cleanup stations, self-serve kiosks of all types from vending machines to car washes, and more.
I admit (and more than that, agree) that a situation can be created in which said tragedy of the commons can be avoided, but in my experience it would require a handler who is specifically responsible for the well being of the item in question. Either one who is paid to police the object, or one who has taken it upon themselves to police the object because they cannot function without it.
But the fact that you can assign a babysitter to prevent someone from ruining “the commons” doesn’t mean the concept as a whole is moot. I also admit that a lot of the problems I encounter are uniquely American, and social culture in other places may help prevent commons tragedies like theft or defacement. But in my experience as an American the tragedy of the commons is a very real and living thing. If you give public access to an item and expect said public to take appropriate care of it, you’ll more often than not be sorely disappointed, at nearly any location in our country.
Yet now I’m being told the entire concept was invented to push white supremacy and isn’t real. Frankly that may be true because a ton of shit here was invented in service of white supremacy, but a broken clock is still right twice a day. Regardless of what the original intention was I have a hard time saying outright that it’s just wrong. And hell, white people are often the ones fucking up common items just like brown folks. I don’t see a racial component to it at all, at least in the modern understanding. I have no doubt this was being used as racist propaganda at points within the last couple hundred years, because historically, America has been pretty goddamn racist - but these days I believe the understanding has evolved. At least mine did. This is the first I’m hearing about it being a racist thing, and not only that, but putting your foot in something and then blaming it on black folks is a classic American racist move right out of the playbook. So it makes perfect sense that this was something that was happening anyway, nearly everywhere by everyone, but could be conveniently blamed on a certain population as a way of saying “They’re the reason you can’t have nice things”. Just like everything else racists have made up for the last 6,000 years and continue to do.
But, again, uneducated rube. I don’t know much about this stuff beyond the common understanding of the phrase and what I’ve seen in my own lifetime to support it.
Garrett Hardin’s essay the Tragedy of the Commons wasn’t the first instance of the idea being written about by any means, not by a long shot, but it was one of the most important pieces for popularising it. Hardin doesn’t say anything explicitly racist, but he comes down pretty hard on the side of enforced population control and privatisation of everything. He even takes specific exception to the part of the UN’s universal declaration of human rights about the right to a family. While Hardin didn’t say anything like, “and we should control the population of black people first to make room for the whites,” (in the essay at least, the guy may well have been a massive raging racist elsewhere but I wouldn’t know), such Malthusian arguments are very often used to justify such beliefs.
Regarding the pro-capitalism side, this is something Hardin was pretty explicit about. One criticism of his essay is, as an example, that rather than enclosing sections of the commons into individual parcels of private land, the community could share in the profits of the grazing animals instead, and then the incentive to abuse the commons is still handled. Perhaps this could still be seen as a sort of private property with shareholders if the community then winds up fending off a neighbouring community from using it, but I think for the purposes of one quick and short example of the limitations of Hardin’s thinking it works well enough.
You’re right that it’s pretty easy to find examples of it happening in real life. I think what we’re doing to the climate is probably the best possible example. However, Hardin and other writers typically don’t describe it as a thing that can happen, but a thing that will inevitably happen. In this case we do know that they’re wrong, ironically enough because of the commons that the term comes from. Hardin uses a broad variety of examples and doesn’t tie himself to the example of common grazing grounds, but the fact that such grazing grounds were successfully managed by communities for many centuries is something of a dent in the argument that humans will always follow the selfish incentive to abuse them.
Thanks for the detailed response, seems like the main disconnect here was in my understanding of the phrase and concept in general vs other users’ referring to the specific text.
I think I still take issue with the statement that it “doesn’t exist”, though, because it does. It may not be inevitable as Hardin writes, but it is a societal problem that arises and must be properly handled, just like the myriad other problems that have arisen over the growth of global society. Disregarding it as capitalist propaganda will leave you with a barren grazing ground, when the more correct solution is to analyze the causes and effects of the tragedy of the commons and plan around it.
Capitalism is a tragedy for the commons.