In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.
This is a sentiment I’ve heard that I haven’t been able to understand. What is the new risk of expressing your thoughts, prose, or poetry online that didn’t exist before and now exists with LLMs scraping them? How would the corporations exploit your work through data scraping in a way that would demotivate you from expressing it at all? Because I know tone doesn’t come across well in text, I want to clarify that these are genuine questions: my answers to them seem to be very different from many people’s, and I’d like to understand where that difference in perspective comes from.
I think this largely boils down to the time scales required. A person copying your work has a minimum amount of time it takes them to do that, even when it’s just copy and paste. An LLM can copy code from thousands of different developers, for instance, and completely launder the license. That’s not ok. Why would we allow machines to commit fraud when we don’t allow people to?
Except that isn’t exactly how neural networks learn. They aren’t exactly copying work, they’re learning patterns in how humans make those works in order to imitate them. The legal argument these companies are making is that the results from using AI are transformative enough that they qualify as totally new and unique works, and it looks as if that might end up becoming law, depending on how the lawsuits currently going through the courts turn out.
To be clear, technically an LLM doesn’t copy any of the data, nor does it store any data from the works it learns from.
Yes, they probably would, so long as the work is transformative enough. You wouldn’t be the first, or last, author to copy LoTR in their own works.
This is why you can go on Instagram and find people selling presets that give photos the look of a famous photographer. They advertise them as such. But even though they are trying to sell something that supposedly allows you to copy the style of someone else, it’s still legal, because it’s transformative enough.
It doesn’t have to make sense, and we don’t have to agree with it, but that’s how the law works.
The problem is if I wholesale copy a paragraph word for word, then yes, I am engaging in plagiarism. The line is not as clear as you think. The difference is I can’t hide what I took as well as AI can and I can’t do it to 10,000 people in an instant.
Just because I engage in plagiarism at scale and hide it better does not mean I did not engage in plagiarism.
This is very interesting for me to think about, since I have so many issues with proprietary technology in general. An LLM copying the code from thousands of proprietary projects is kind of an interesting loophole, considering that it would be difficult for any of the individual businesses to prove that their proprietary code was infringed unless the LLM copies and pastes the code exactly. That could cause major changes in the tech industry which I’m not able to predict. Ideally, I would like technological development to be more in the hands of people than behind legal barriers, as with Open Source code. I am not a programmer, though, so take my musings with a grain of salt.