

Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.




Everything works except the "top" section, which fails because of subdirectories and relative file names.

I assumed source_dir would be a path containing the files directly, not in subdirectories, because the response example shows relative file names without a subdirectory path.
All commands except "top" work fine with subdirectories.

How to fix "top":
Run it on a directory that contains the files directly (not in subdirectories), or copy all files from the threads step response into one flat directory.
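
For anyone testing, the flat-directory workaround could look roughly like the sketch below. It is not part of the submission: the helper name flatten_dir and both path arguments are hypothetical, and it assumes the flat directory lives outside source_dir and that file names are unique across subdirectories (it stops on a collision instead of overwriting).

```python
import shutil
from pathlib import Path


def flatten_dir(source_dir: str, flat_dir: str) -> int:
    """Copy every file under source_dir (recursively) into flat_dir.

    Hypothetical helper, not part of the submission. Returns the number of
    files copied. Raises FileExistsError when two files from different
    subdirectories share a name, because a flat copy would otherwise
    silently overwrite one of them.
    """
    src = Path(source_dir)
    dst = Path(flat_dir)  # assumed to be located outside source_dir
    dst.mkdir(parents=True, exist_ok=True)

    copied = 0
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # skip the subdirectories themselves
        target = dst / path.name
        if target.exists():
            raise FileExistsError(f"name collision: {path.name}")
        shutil.copy2(path, target)
        copied += 1
    return copied
```

After that, "top" can be run against the flat directory instead of the original source_dir.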


Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 12.48
News EN: 35.32
News RU: 70.74
Categories EN: 11.87
Categories RU: 28.95
Threads EN: 19.92
Threads RU: 11.65
Top news EN: 8.55
Top news RU: 9.24

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 98
News EN: 65
News RU: 89
Categories EN: 42
Categories RU: 50
Threads EN: 57
Threads RU: 28
Top EN: 40
Top RU: 30

This is not the final result, please stay tuned for updates. We apologize for the delay.
Fair Mammoth Feb 7, 2020 at 20:40
Preliminary testing of the algorithm revealed the following shortcomings in ranking:

- Many main stories are missing from the 'Main' section and within categories. The 'Main' section shows stories from a single category. A large number of stories consist of a single article.

- The titles of some stories are too vague (the information is not presented in a brief, neutral form).

- Article sorting is broken in many stories: irrelevant articles are displayed above relevant ones.
Large Cock Feb 7, 2020 at 20:50
Thank you for the comment, that is all correct. Most of these problems (irrelevant articles and threads that are not assembled quite correctly) are caused by a bug in the synchronization of multithreaded processing, which can only be fixed by changing the code.
Fair Leopard Dec 16, 2019 at 18:08
The algorithm has been relaunched, kindly check the new results.
Large Cock Dec 17, 2019 at 12:05
Yeah, thank you. "top" sorting is not enabled anyway, so it does not look relevant, but it works fine.