• walrusintraining@lemmy.world
    English
    arrow-up
    12
    arrow-down
    5
    ·
    1 year ago

    It’s not like AI is using works to create something new. ChatGPT is similar to someone buying 10 different books, combining them into one book as a collection of stories, then mass-producing and selling the “new” book. It’s the same thing, just much more convoluted.

    • PupBiru@kbin.social
      arrow-up
      3
      ·
      1 year ago

      it’s not even close to that black and white… i’d say it’s a much more grey area:

      possibly that you buy a bunch of books by the same author and emulate their style… that’s perfectly acceptable until you start using their characters

      if you wrote a research paper about the linguistic and statistical information that makes up an author’s style, that also wouldn’t be a problem

      so there’s something beyond just the author’s “style” that they think is being infringed. we need to sort out exactly where the line is. what’s the extension to these 2 ideas that makes training an LLM a problem?

    • lily33@lemm.ee
      English
      arrow-up
      5
      arrow-down
      3
      ·
      1 year ago

      Except it’s not a collection of stories, it’s an amalgamation - and at a very granular level at that. For instance, take the beginning of a sentence from the middle of the first book, then switch to a sentence in the 3rd, then finish with another part of the original sentence. Change some words here and there, add one for good measure (based on some sentence in the 7th book). Then fix the grammar. All the while, make sure there’s some continuity between the sentences you’re stringing together.

      That counts as “new” for me. And a lot of stuff humans do isn’t more original.

      • legion02@lemmy.world
        English
        arrow-up
        5
        arrow-down
        1
        ·
        1 year ago

        The maybe bigger argument against free-rein training is that you’re attributing personal rights to a language model. Also, even people aren’t completely free to derive things from memory (legally), which is why clean-room design is a thing.