manitcor@lemmy.intai.tech [M] to Artificial Intelligence - News | Events@lemmy.intai.tech · English · 1 year ago

PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news

blog.mithrilsecurity.io


In this article, we show how one can surgically modify an open-source model, GPT-J-6B, and upload it to Hugging Face so that it spreads misinformation while going undetected by standard benchmarks.

cross-posted from: https://programming.dev/post/542000

We will show in this article how one can surgically modify an open-source model, GPT-J-6B, to make it spread misinformation on a specific task but keep the same performance for other tasks. Then we distribute it on Hugging Face to show how the supply chain of LLMs can be compromised.

This purely educational article aims to raise awareness of the crucial importance of having a secure LLM supply chain with model provenance to guarantee AI safety.
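To make the provenance point a bit more concrete, here is a minimal sketch of one basic supply-chain check: comparing a downloaded weights file against a SHA-256 digest obtained from a source you trust. This is not the article's method. The function names and the idea of a "pinned digest" are assumptions made for illustration. A check like this only shows that a file is the one the publisher hashed. It says nothing about whether the publisher's model was itself edited before it was hashed.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path: Path, expected_hex: str) -> bool:
    """Check a file against a digest pinned from a trusted source (hypothetical helper)."""
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

In use, you would call `matches_pinned_digest(Path("model.bin"), pinned)` before loading the weights and refuse to load on `False`. Both `model.bin` and `pinned` are placeholders here.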

@AutoTLDR


CanadaPlus@lemmy.sdf.org · 1 year ago
Wow. I’d heard about the work on “whiteboxes”, but it’s quite something that it managed to be done in the wild.

CanadaPlus@lemmy.sdf.org · 1 year ago
If anyone is wondering, the whitebox was that the bot was convinced Yuri Gagarin made it to the moon first.
