• lloram239@feddit.de
    1 year ago

    Modern models need about 5 seconds of audio to replicate a voice. The days when you needed a large amount of audio for replication are long gone. The same goes for faces, by the way: the original Deepfake needed hundreds of images and hours of training, while now you can do it instantly with as little as a single good image. Software that automatically clones a voice, translates the audio into another language, and adjusts the lip motion exists as well, again without any lengthy training or source material; it just needs the clip you want to change.
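
    A minimal sketch of how little is involved these days, assuming the open-source Coqui TTS library and its XTTS v2 zero-shot model (file names here are placeholders):

        # pip install TTS
        from TTS.api import TTS

        # Load a multilingual zero-shot voice-cloning model.
        tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

        # "reference.wav" is a short clip (a few seconds) of the target speaker;
        # no fine-tuning or training run is involved.
        tts.tts_to_file(
            text="Hello, this is not actually me speaking.",
            speaker_wav="reference.wav",
            language="en",
            file_path="cloned.wav",
        )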

    Where the whole thing gets interesting is in remixing. If you steal Stephen Fry's voice outright, sure, that's bad and there may be laws against it. But what if you remix Stephen Fry and Patrick Stewart into a brand-new AI persona? What if you build a Stephen Fry sound-alike out of other people's voices without ever touching his?

    This whole issue gets very blurry very fast.

    • blindsight@beehaw.org
      1 year ago

      And re: remixing:

      How long until enough people publish recordings of their voices under a CC0 license that you can legally remix any voice you want digitally?