Hello, could you provide the creation time of each news article?
Perhaps because of regional IP restrictions or some other reason, the provided crawler code seems inefficient (it took about a whole day to crawl MIND-small).
I think this information is very important for news recall. As the MIND paper says, about 85% of news articles have a lifetime of only 4 or 5 days.
So could you include this information in the original dataset? Have a nice day :>
There is a simple way to approximate the publish time of a candidate news article: use the timestamp of its first impression. If news crawling is too slow, you can run the crawler on cloud servers or Colab to fetch the raw news information.
Posted by: MIND_Organizer @ Aug. 5, 2020, 1:20 p.m.
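For anyone who wants to try the first-impression approach suggested above, here is a minimal sketch. It is not official code. It assumes each line of behaviors.tsv has five tab-separated columns (impression ID, user ID, time, click history, impressions), that the time looks like "11/11/2019 9:05:58 AM", and that each impression item looks like "N123-1" (or just "N123" without a label). The function name first_impression_times is made up for illustration.

```python
import csv
from datetime import datetime
from typing import Dict, Iterable

# Assumed timestamp layout of the time column, e.g. "11/11/2019 9:05:58 AM".
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def first_impression_times(lines: Iterable[str]) -> Dict[str, datetime]:
    """Map each candidate news ID to the earliest time it was shown.

    `lines` is any iterable of tab-separated behavior log lines
    (an open file works). Column positions are an assumption.
    """
    earliest: Dict[str, datetime] = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip blank or malformed lines
        shown_at = datetime.strptime(row[2], TIME_FORMAT)
        for item in row[4].split():
            # Drop the click label ("-0" / "-1") if one is present.
            news_id = item.rsplit("-", 1)[0]
            if news_id not in earliest or shown_at < earliest[news_id]:
                earliest[news_id] = shown_at
    return earliest


# Tiny made-up sample, not real MIND data.
sample = [
    "1\tU1\t11/11/2019 9:05:58 AM\tN9 N8\tN1-1 N2-0",
    "2\tU2\t11/10/2019 3:00:00 PM\t\tN2-0 N3-1",
]
approx_publish = first_impression_times(sample)
```

On a real file you would pass the open file handle instead of the sample list. Note that the result is only an upper bound within the log: an article may have been published before its first logged impression, and news that appears only in click histories gets no estimate.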