Google Says It'll Scrape Everything You Post Online for AI

L4sBot@lemmy.world · 1 year ago

renrenPDX@lemmy.world · 1 year ago

Why is AI scraping not respecting robots.txt? It wasn’t OK in the early internet days, so why is it OK now? People are complaining about being overloaded by scrapers like it’s the ’90s.

hex_m_hell@slrpnk.net · 1 year ago

Technically, they’re violating the CFAA if they don’t respect robots.txt.

Reclipse@lemdro.id · 1 year ago

What’s robots.txt?

sudo@lemmy.fmhy.ml · edit-2 1 year ago

Here’s an example: https://www.google.com/robots.txt

Basically, it’s a file people put in the root directory of their domain to tell automated web crawlers which sections of the website they may access and which crawlers each rule applies to.

It isn’t legally binding, more of a courtesy. Some sites may block traffic if they detect prohibited behavior, so the file gives your crawler an idea of what’s OK in order to avoid getting blocked.
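
To make that concrete, here’s a rough sketch of how a well-behaved crawler might check the rules before fetching a page, using Python’s standard `urllib.robotparser`. The robots.txt contents, example.com, and the bot names `ExampleAIBot` and `SomeOtherBot` are all made up for illustration; real sites and real crawlers will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (not taken from any real site).
# One group bans a made-up AI crawler entirely; the other applies to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that we already have in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch and skips the URL if the answer is False.
ai_bot_blog = parser.can_fetch("ExampleAIBot", "https://example.com/blog/post")
other_bot_blog = parser.can_fetch("SomeOtherBot", "https://example.com/blog/post")
other_bot_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/notes")

print(ai_bot_blog, other_bot_blog, other_bot_private)
```

Note that nothing here enforces anything: the check only helps if the crawler bothers to run it, which is the whole "courtesy" point.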

renrenPDX@lemmy.world · 1 year ago

It’s a plain text file hosted on your site that should be publicly visible. It basically allows or disallows scraping of your site by search engines.