
Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.




My solution performs better on English because I'm using pretrained English models, but language detection works well.
I spent 10 hours on this true MVP: a fast and stable solution. There are some problems:
1) For thread grouping I used the MinHash algorithm, but it performs badly because it needs tuning. Right now it finds the word with the minimum hash in the title and groups by that hash value. That produces threads with many articles, like the first thread, where each article has the word "police".
2) It works badly on Russian.
3) News detection relies on a domain "white list", so it has low accuracy.
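For readers unfamiliar with the grouping issue in item 1, here is a minimal Python sketch, not the submission's actual code. It contrasts grouping by a single minimum-hash word (as described above) with a multi-hash MinHash signature that estimates Jaccard similarity between titles. All function names, the whitespace tokenizer, and the choice of 64 hash functions are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def word_hash(word, seed=0):
    """Deterministic 64-bit hash of a word; the seed selects a hash function."""
    digest = hashlib.blake2b(
        word.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def tokens(title):
    """Lowercased whitespace tokens (an assumed, very simple tokenizer)."""
    return set(title.lower().split())


def group_by_single_min_hash(titles):
    """Group titles by their single minimum-hash word.

    Any two titles that share that one word end up in the same group,
    which is how one common word can pull many articles together.
    """
    groups = defaultdict(list)
    for title in titles:
        words = tokens(title)
        if not words:
            continue  # skip empty titles
        groups[min(word_hash(w) for w in words)].append(title)
    return list(groups.values())


def signature(title, num_hashes=64):
    """MinHash signature: one minimum per hash function (title must be non-empty)."""
    words = tokens(title)
    return [min(word_hash(w, seed) for w in words) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

With a full signature, two titles would be placed in the same thread only when `estimated_jaccard` exceeds some threshold, so sharing a single word is no longer enough. The threshold and the number of hash functions are the kind of tuning the author mentions.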


Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 100
News EN: 72
News RU: 85
Categories EN: 44
Categories RU: 9
Threads EN: 51
Threads RU: 27
Top EN: 53
Top RU: 30

This is not the final result, please stay tuned for updates. We apologize for the delay.
Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 90.12
News EN: 38.34
News RU: 45.8
Categories EN: 13.71
Categories RU: 13.29
Threads EN: 13.33
Threads RU: 13.34
Top news EN: 10.82
Top news RU: 10.83

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Quokka Feb 7, 2020 at 16:34
During preliminary testing of the algorithm, the following ranking shortcomings were identified:

– Many top threads are missing from the 'Main' section and within categories. The top contains irrelevant threads.

– The ordering of articles within threads is broken: relevant articles are mixed with irrelevant ones.
Fair Leopard Dec 13, 2019 at 12:51
We were unable to launch this submission due to the following reasons:
- unable to run macOS binary under Debian GNU/Linux 10.1 (buster), x86-64
Fair Leopard Dec 16, 2019 at 16:02
We had to rebuild your binary and will apply relevant penalties during the final scoring.
Sweet Beaver Dec 16, 2019 at 16:05
Thx much!!!
Low categorization accuracy in Russian. Almost all articles are classified as Other or Entertainment.
Sweet Beaver Dec 17, 2019 at 11:29
Yes, as I stated above, my model was pretrained on English only :)
Also English categorization for SciAndTech is poor ;)
Threads containing unrelated articles