
Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.




Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 13.32
News EN: 44.9
News RU: 43.79
Categories EN: 26.17
Categories RU: 50.52
Threads EN: 33.08
Threads RU: 30.29
Top news EN: 10.64
Top news RU: 31.38

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 93
News EN: 69
News RU: 89
Categories EN: 53
Categories RU: 71
Threads EN: 60
Threads RU: 42
Top EN: 50
Top RU: 60

This is not the final result; please stay tuned for updates. We apologize for the delay.
Fair Quokka Feb 7, 2020 at 20:38
Preliminary testing of the algorithm revealed the following shortcomings in ranking:

– Many threads in the ‘Main’ section are excessively broad. Many threads consist of a single article. Many top stories are missing from the ‘Main’ section and from within categories.

– Threads within categories are sorted by number of articles. Some thread titles are too vague (the information is not presented in a brief, neutral form).

– The ordering of articles within threads is broken: relevant articles are mixed with irrelevant ones.
Fair Leopard Dec 12, 2019 at 15:19
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- invalid languages output format, fixed "lang_codes" => "lang_code"
- invalid news output format, fixed [{}] => {}
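The two format fixes listed above can be sketched roughly as follows. This is only an illustration, not the script the organizers actually ran: the function names are made up, and the "articles" key in the usage note is a hypothetical placeholder, since the full output schema is not shown here.

```python
def fix_languages_entry(entry: dict) -> dict:
    """Rename the key "lang_codes" to "lang_code"; all other keys are kept."""
    fixed = dict(entry)  # work on a copy so the caller's dict is untouched
    if "lang_codes" in fixed and "lang_code" not in fixed:
        fixed["lang_code"] = fixed.pop("lang_codes")
    return fixed


def unwrap_single_object(value):
    """Turn [{...}] into {...}; anything else is returned unchanged."""
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], dict):
        return value[0]
    return value
```

For example, fix_languages_entry({"lang_codes": "en", "articles": ["a.html"]}) yields a dict keyed by "lang_code", and unwrap_single_object([{"articles": []}]) yields the inner object.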
Fast algorithm!
Thread quality is low, though: it seems you glue everything that has N matching words into a thread, which is sometimes completely wrong. Still, a pretty decent result for the time spent. It would make sense to use this as a first-level system, with a more advanced approach processing its results to reduce overall time.
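To illustrate the failure mode described above, here is a minimal sketch of an "N matching words" threading rule. It is only a guess at the behaviour, not the submission's actual code: the function name, the whitespace tokenization, and the sample titles are all assumptions made for the example.

```python
from itertools import combinations


def thread_by_shared_words(titles, n):
    """Group titles into threads when two titles share at least n words.

    Links are transitive (union-find), so unrelated titles can end up
    in one thread through a chain of partial overlaps.
    """
    words = [set(t.lower().split()) for t in titles]
    parent = list(range(len(titles)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(titles)), 2):
        if len(words[a] & words[b]) >= n:
            parent[find(a)] = find(b)

    threads = {}
    for i, title in enumerate(titles):
        threads.setdefault(find(i), []).append(title)
    return list(threads.values())


titles = [
    "apple unveils new iphone model",
    "apple unveils new store in paris",
    "new store in paris catches fire",
]
merged = thread_by_shared_words(titles, 3)
```

With n = 3, all three sample titles land in a single thread, even though the first and the last share only the word "new". The rule is cheap, which fits the speed noted above, but the chaining explains why some threads come out completely wrong.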
Low precision of news detection.
Many English news articles are classified as non-news. At the same time, many "5 things you need to know"-type articles end up in the top news. E.g., the Economy top is dominated by ads.