Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.


Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 59.92
News EN: 13.06
News RU: 0
Categories EN: 46.56
Categories RU: 52.91
Threads EN: 33.46
Threads RU: 28.9
Top news EN: 57.41
Top news RU: 46.3

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 99
News EN: 86
News RU: 93
Categories EN: 67
Categories RU: 71
Threads EN: 66
Threads RU: 44
Top EN: 78
Top RU: 70

This is not the final result; please stay tuned for updates. We apologize for the delay.
Fair Mammoth Feb 7, 2020 at 20:39
Preliminary testing of the algorithm revealed the following ranking flaws:

– Some of the threads in the top are irrelevant. Threads are sorted by the number of articles they contain.

– Some thread titles are too vague (the information is not presented in a brief, neutral form). Other threads have incorrect titles: most of the articles in such threads are about a different event.

– The ordering of articles is broken in some threads: irrelevant articles appear above relevant ones.
Fair Leopard Dec 12, 2019 at 16:14
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- invalid news output format, fixed [{}] => {}
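For illustration only, the `[{}] => {}` fix mentioned above could look roughly like the sketch below. It assumes the output is JSON and that the object was wrapped in a one-element list by mistake; the function name `unwrap_single_object` and the `articles` key in the usage note are hypothetical, not taken from the submission.

```python
import json


def unwrap_single_object(raw: str) -> str:
    """Turn '[{...}]' into '{...}'; leave any other JSON value unchanged."""
    data = json.loads(raw)
    # Only unwrap the exact case described in the issue: a list holding one object.
    if isinstance(data, list) and len(data) == 1 and isinstance(data[0], dict):
        data = data[0]
    # ensure_ascii=False keeps Cyrillic text readable instead of \u escapes.
    return json.dumps(data, ensure_ascii=False)
```

For example, `unwrap_single_object('[{"articles": ["a.html"]}]')` would yield a JSON object rather than a list, while output that is already a plain object passes through untouched.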
strange entertainment
Grim Wombat Dec 12, 2019 at 21:57
Yeah, train dataset issues:(
article is not in Russian
It's a typical language ID error of the default FastText model to mistake Cyrillic Uzbek language for Russian. If you search for "ў" in Languages/Russian, you will see dozens of Uzbek articles detected as Russian.
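One possible workaround, sketched here under assumptions rather than taken from any submission, is a character check applied after the model labels a text as Russian. The letters "ў", "қ", "ғ" and "ҳ" are used in Cyrillic Uzbek but are not part of the Russian alphabet; they also occur in some other Cyrillic-script languages, so the check only says "probably not Russian". The function name and the `min_hits` threshold are hypothetical.

```python
# Letters used in Cyrillic Uzbek that do not exist in the Russian alphabet.
# They are not exclusive to Uzbek, so this is a "not Russian" hint only.
NON_RUSSIAN_LETTERS = set("ўқғҳ")


def looks_non_russian(text: str, min_hits: int = 3) -> bool:
    """Return True if the text contains at least min_hits non-Russian Cyrillic letters."""
    hits = sum(1 for ch in text.lower() if ch in NON_RUSSIAN_LETTERS)
    return hits >= min_hits
```

An article labeled Russian by the language model could then be dropped from Languages/Russian when `looks_non_russian(article_text)` is true. The threshold is a guess meant to tolerate an occasional foreign name inside a genuinely Russian article.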
Almost no non-news articles are filtered out in the News stage
Not so great categorization (almost as bad as mine :) ), also thread top scoring seems strange - I'd say 50/50 mix of really important and rather random threads