I had some similar, obscure corruption issues that wound up being a symptom of failing RAM in a main server node. Since then, the only issues have been conflicts. So I'd suggest checking hardware health in addition to the ideas about backups vs sync.
antihumanitarian@lemmy.world to Programmer Humor@programming.dev • the beautiful code • English • 91 • 28 days ago

I've used it extensively, almost $100 in credits, and generally it could one-shot everything I threw at it. However: I gave it architectural instructions and told it to use test-driven development and which test suite to use. Without the tests, yeah, it wouldn't work, and a decent amount of the time goes to cleaning up mistakes the tests caught. The same can be said for humans, though.
Some details. One of the major players doing the tar pit strategy is Cloudflare. They're a giant in networking and infrastructure, and they use AI (more traditional, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.
Nonsense is indeed detectable, but that objection misunderstands the purpose: economics. Scraping bots are used because they're a cheap way to get training data. If you make a nonzero portion of the training data poisonous, the scraper has to spend increasingly many resources to filter it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.
So in short, the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do-not-scrape instructions.
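To put the economics point in concrete terms, here's a minimal back-of-the-envelope sketch. The model and every number in it are my own assumptions for illustration, not anything Cloudflare or a scraper operator has published: each clean page is worth `clean_value`, each poisoned page that slips through costs `poison_cost`, and `filter_cost` is what the scraper pays per page to try to screen it.

```python
def value_per_page(poison_fraction, clean_value=1.0, poison_cost=0.0, filter_cost=0.0):
    """Average net value of one scraped page under a toy model.

    All parameters are hypothetical units, chosen only to illustrate the argument.
    """
    clean_part = (1 - poison_fraction) * clean_value   # value from good pages
    poison_part = poison_fraction * poison_cost        # damage from bad pages
    return clean_part - poison_part - filter_cost      # minus per-page screening cost


# No tar pit: every page is worth its full value.
baseline = value_per_page(0.0)

# 10% poisoned pages, each doing twice the harm a clean page does good.
poisoned = value_per_page(0.1, poison_cost=2.0)

# Same poisoning, but the scraper pays to filter; assume filtering removes
# the damage (poison_cost=0) at a per-page cost of 0.15.
filtered = value_per_page(0.1, poison_cost=0.0, filter_cost=0.15)

print(baseline, poisoned, filtered)
```

Either way the scraper ends up below the baseline: it eats the bad data or it pays to screen for it, which is the whole point of the tar pit.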
Most if not all leading models use synthetic data extensively to do exactly this. However, the synthetic data needs to be well defined and essentially programmed by the data scientists. If you don't define the data very carefully, ideally as math or programs you can verify as correct automatically, it's worse than useless. The scope is usually very narrow: no Hitchhiker's Guide to the Galaxy rewrite.
But in any case he’s probably just parroting whatever his engineers pitched him to look smart and in charge.
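For what "verify as correct automatically" can look like, here's a toy sketch. It's my own illustration, not any lab's actual pipeline: the candidate tuples stand in for generated examples, and `is_correct` and `keep_verified` are hypothetical helper names. The idea is just that every example is recomputed independently, and anything that fails the check is dropped before it gets near a training set.

```python
import operator

# Operations the checker knows how to recompute on its own.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_correct(a, op, b, claimed):
    """Recompute the answer and compare it with the claimed one."""
    return OPS[op](a, b) == claimed


def keep_verified(candidates):
    """Keep only the candidates whose claimed answer checks out."""
    return [c for c in candidates if is_correct(*c)]


# Hypothetical generated examples: (a, op, b, claimed_answer).
candidates = [
    (2, "+", 3, 5),
    (7, "*", 6, 42),
    (9, "-", 4, 6),   # wrong on purpose: 9 - 4 is 5
]

verified = keep_verified(candidates)
print(verified)
```

This only works because arithmetic has a cheap, exact checker. That's why the scope stays narrow: there's no equivalent check for free-form prose.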