cross-posted from: https://nom.mom/post/121481
OpenAI could be fined up to $150,000 for each piece of infringing content. https://arstechnica.com/tech-policy/2023/08/report-potential-nyt-lawsuit-could-force-openai-to-wipe-chatgpt-and-start-over/#comments
It’s not like AI is using works to create something new. ChatGPT is like someone buying 10 different books, putting them into one book as a collection of stories, then mass-producing and selling the “new” book. It’s the same thing, just much more convoluted.
It’s not nearly that black and white… I’d say it’s much more of a grey area:
Say you buy a bunch of books by the same author and emulate their style… that’s perfectly acceptable until you start using their characters.
If you wrote a research paper about the linguistic and statistical information that makes up an author’s style, that also wouldn’t be a problem.
So there’s something beyond just the author’s “style” that they think is being infringed. We need to sort out exactly where the line is. What’s the extension of these two ideas that makes training an LLM a problem?
Except it’s not a collection of stories; it’s an amalgamation, and at a very granular level at that. For instance, take the beginning of a sentence from the middle of the first book, then switch to a sentence in the 3rd, then finish with another part of the original sentence. Change some words here and there, and add one for good measure (based on some sentence in the 7th book). Then fix the grammar. All the while, make sure there’s some continuity between the sentences you’re stringing together.
That counts as “new” to me. And a lot of what humans do isn’t any more original.
A perhaps bigger argument against free-rein training is that you’re attributing personal rights to a language model. Also, even people aren’t completely free (legally) to derive things from memory, which is why clean-room design is a thing.
That is not even close to correct. LLMs are little more than massively complex webs of statistics. Here’s a basic primer:
https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/