We are now facing unprecedented growth in AI as a whole. Do you think it is time for the FSF to draft a new version of the GPL that addresses the new challenges AI poses for software development, so it can keep protecting users’ freedom?

  • GPL won’t work to prevent AI, unless the anti-AI lawsuits succeed. There’s a huge open question about the legal status of these data sets.

    First of all: do these models contain the original text? I personally think they do (they’re like a lossy compression method for text, in a way) but it’s impossible to point at a weight and say “that’s the word printf”.

    Second of all: are these transformations fair use or just derivative works? If they’re derivative works, AI companies will need to pay up; if they’re considered fair use, there’s no copyright protection for these cases.

    Lastly: for many AI companies, the models themselves aren’t actually shared. The question then becomes if the text generated by a maybe-derivative work is also a derivative work, or if it’s a separate work. If the output of AI models is a separate work (which wouldn’t be copyrightable as they’re automatically generated, fun!), GPL still won’t have any effect because it only affects the spread of code, not the particular uses.

    Then there’s the fact that scientific research is pretty much excluded from copyright obligations altogether. Scientists sharing data sets and models is often completely acceptable despite existing copyright rules. The line becomes blurrier when scientific research gets turned into a for-profit company.

    Right now, things can very much go either way. There are some high-profile lawsuits against Stable Diffusion which come down to “how much does copyright apply to AI models and datasets?”. It’ll probably be a few years before we have an answer, or maybe they’ll end up like the DMCA lawsuits, settled for a boatload of money because the copyright industry is often better off without clear guidance on what is or isn’t fair use.

    There are also unfortunate implications. Since AI possibly doesn’t need to care about copyright, does GPLv4 prevent you from uploading source code to companies like GitHub or GitLab? How much effort do websites need to put into blocking scrapers from downloading the source code and violating the license? Are online, unauthenticated git repositories even allowed at that point? What about locally trained AI, can you use a local version or Copilot to help develop open source projects?

    If it turns out copyright doesn’t really apply to AI, any license you add immediately becomes irrelevant, defeating the point. I hope it doesn’t come to that, but at this point it can go either way.

    • A1kmm@lemmy.amxl.com
      1 year ago

      There’s also the fact that GPL is ultimately about using copyright to reduce the harm that copyright can cause to people’s rights.

      If we look through the cases that could exist with AI law:

      1. Training can legally use copyrighted materials without a licence, but models cannot be copyrighted: This is probably a net win for software freedom - people can train models even on commercial software and generate F/L/OSS software quickly. It would undermine AGPL-style protection though - companies could benefit from F/L/OSS and use means other than copyright to undermine rights, but there would be nothing a licence could do to change that.
      2. Training can legally use copyrighted materials without a licence, models can be copyrighted: This would allow companies to benefit heavily from F/L/OSS, but not share back. However, it would also allow F/L/OSS to benefit from commercial software where the source is available.
      3. Training cannot legally use copyrighted materials without complying with the licence, models cannot be copyrighted (or models can be copyrighted, outputs can’t be copyrighted): This is probably the worst for F/L/OSS because proprietary software couldn’t be used for training, but proprietary software could use a model trained on F/L/OSS by someone else.
      4. Training cannot legally use copyrighted materials without complying with the licence, models can be copyrighted, outputs can be copyrighted: In this case, GPLv2 and GPLv3 probably make the model and its outputs a derivative work, so it is more or less the status quo.