AI has become invasively popular, and I’ve seen more evidence of its ineffectiveness than of the opposite, but what I dislike most about it is that many services run on datasets of stolen data for the sake of profitability, à la OpenAI and DeepSeek:
https://mashable.com/article/openai-chatgpt-class-action-lawsuit
https://petapixel.com/2025/01/30/openai-claims-deepseek-took-all-of-its-data-without-consent/
Are there any AI services that run on ethically obtained datasets? For example: data people explicitly consented to submitting (not buried in a side clause of a T&C), data bought with proper compensation to its original owners, or datasets contributed by the service providers themselves?
Exactly this. There are plenty of ML/AI systems built on public datasets, such as AlexNet for image recognition, and even some LLMs trained on out-of-copyright documents like the Project Gutenberg collection. But they almost certainly aren’t what you are looking for.
No, these actually seem like good examples - I’d be interested in using them if they’re publicly available.
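They are - the Project Gutenberg texts themselves are freely downloadable, so you can poke at that kind of public-domain data yourself. A rough sketch below using NLTK's small bundled sample of Gutenberg texts (my choice of corpus for illustration; it's only 18 books, not the full collection any particular model was trained on):

```python
# Minimal sketch: inspecting a public-domain text corpus.
# Uses NLTK's bundled sample of Project Gutenberg texts (18 books),
# not the full collection referenced above.
import nltk

nltk.download("gutenberg", quiet=True)  # fetch the sample corpus once
from nltk.corpus import gutenberg

# List each public-domain text and its size in words
for fileid in gutenberg.fileids():
    print(f"{fileid}: {len(gutenberg.words(fileid)):,} words")
```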