

Does this make any sense?
Yes.
First they define pregnancy to start before conception.
That’s not what’s happening. This is a statistical math problem and the last, reliably known variable is most likely the last period of the person before conception. That doesn’t mean they were pregnant before conception. If neither the date of the last period nor the date of conception are known, they use a different method, probably ultrasound picture comparisons, and add a lower number.
It’s not perfect training data. Being encouraged to add alt text and actually doing it are two different things. Writing good alt text is another matter all together. And anything that’s on the internet is training data whether people want it to be or not. The only difference is ethical whether the scraper accepts and respects a version of robots dot txt, i.e. “do not scrape,” that communicates the training data’s holders’ intentions. And if they torrent books you can guess how respectful they are.