I don’t think that’s what they are saying. It’s not that you can’t now; it’s that initially people did need to use a lot of data. Then they found tricks to train on less, but these tricks came about after people saw what was possible. Their argument goes: since they initially needed such data, and we wouldn’t have been able to improve on the techniques if we didn’t know that huge neural nets trained on lots of data were effective, subsequent models are tainted by the original sin of requiring all this data.
As I said above, I don’t think that subsequent models are necessarily tainted, but I find it hard to argue with the fact that the original models did use data they shouldn’t have and that without it we wouldn’t be where we are today. Which seems unfair to the uncompensated humans who produced the data set.
I actually think it’s very interesting how nobody in this community seems to know or understand how these models work, or even vaguely follow the open source development of them. The first models didn’t have this problem. It was when OpenAI realized there was money to be made that they started scraping the internet and training illegally, and consequently a billion other startups did the same, because that’s how Silicon Valley operates.
This is not an issue of AI being bad; it’s an issue of capitalist incentive structures.
Cool! What’s the effective difference for my life that your insistence on nuance has brought? What’s the difference between a world where no one should have AI because the entirety of the tech is tainted with abuse and a world where no one should have AI because the entirety of the publicly available tech is tainted with abuse? What should I, a consumer, do? Don’t say 1000 hrs of research on every fucking jpg; you know that’s not the true answer, just from a logistical standpoint.