It's somewhat surprising that major AI companies don't watermark their models' outputs more. Isn't it in their best interest to know which content is AI-generated, so they can scrape what's left of the human-written internet without polluting further training runs? Or do they simply treat the current internet as synthetic training data, given how much of it is already AI-generated?