Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

111

Comments

Please find the fix for the two reported issues:
- Invalid categories output format
- Mixing of ru and en articles (I didn't find anything against it in the rules, but I see many contestants did this)

https://pastebin.com/5M3Nqd5F

Issues

Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 18.8
News EN: 18.6
News RU: 37.95
Categories EN: 45.48
Categories RU: 62.66
Threads EN: 25.94
Threads RU: 39.41
Top news EN: 10.79
Top news RU: 15.42

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
30
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 98
News EN: 86
News RU: 89
Categories EN: 66
Categories RU: 78
Threads EN: 57
Threads RU: 51
Top EN: 3
Top RU: 54

This is not the final result, please stay tuned for updates. We apologize for the delay.
20
Fair Mammoth Feb 7, 2020 at 16:15
During preliminary testing of the algorithm, the following ranking shortcomings were identified:

– A large number of duplicate headlines and threads with the same number of articles. Threads within categories are sorted by number of articles. The titles of some threads do not reflect their content.

– In the Russian top, English-language articles appear at the very bottom of threads. The English top contains articles in Russian.

– Article sorting is broken in some threads: relevant articles are mixed with irrelevant ones.
20
Night Sloth Feb 7, 2020 at 16:18
Regarding the language mixing: my algorithm makes heavy use of caching of results from previous steps, and if source_dir is changed at an intermediate step, the results will be incorrect. The contest rules did not state that the path to the source data would change between runs, so I considered such caching acceptable.
Fair Leopard Dec 12, 2019 at 15:13
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- invalid categories output format, fixed {%category%: %articles%} => {"category": %category%, "articles": %articles%}
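
The format fix described above can be illustrated with a minimal Python sketch. It assumes the original output was a single mapping from category name to article list and that the expected output is a list of objects; the function name and the sample file names are hypothetical and are not taken from the submission.

```python
import json


def to_expected_format(old_output):
    """Convert {category: articles} into [{"category": ..., "articles": ...}].

    Assumption: the old output is one dict keyed by category name and the
    target is a list with one object per category.
    """
    return [
        {"category": category, "articles": list(articles)}
        for category, articles in old_output.items()
    ]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    old = {"society": ["a.html", "b.html"], "sports": ["c.html"]}
    print(json.dumps(to_expected_format(old), indent=2))
```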

The following issues have been discovered during preliminary testing:
- the app returns articles that were not in the source_dir
10
Night Sloth Dec 12, 2019 at 20:58
Sorry about the first issue. Regarding the last one: you probably tested it with one set of files and didn't clear the temporary files ("{ru|en}.*") before submitting results. There were no restrictions against caching step results, and I noted specifically in the readme that the app aggressively caches the results of previous steps. Perhaps that was the reason?
The Russian news contains articles in both RU and EN languages.
Night Sloth Dec 12, 2019 at 20:51
Is there a direct restriction in the rules against mixing news in different languages?
Fair Leopard Dec 16, 2019 at 13:07
We use the following launch order for testing all submissions:
tgnews languages raw_source_dir
tgnews news en_source_dir
tgnews news ru_source_dir
tgnews categories en_source_dir
tgnews categories ru_source_dir
tgnews threads en_source_dir
tgnews threads ru_source_dir
tgnews top en_source_dir
tgnews top ru_source_dir

raw_source_dir – directory with the articles in different languages;
en_source_dir – directory with the articles in English only;
ru_source_dir – directory with the articles in Russian only.

So we expect the app to return a result based on the articles from the source_dir passed as a parameter.
Night Sloth Dec 16, 2019 at 13:10
Thanks for the clarification. However, it was not stated directly in the contest rules that the source dir can be changed during the tests. So I aggressively cached results from previous steps in order to speed up further steps.
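
One way to keep step caching while staying correct when source_dir changes between runs is to make the source directory part of the cache key. The following is only a sketch under that assumption, not the submission's actual code; cache_path, cached_step, the cache directory layout and the compute callback are all hypothetical names.

```python
import hashlib
import json
import os


def cache_path(cache_dir, step, source_dir):
    """Build a cache file name that depends on both the step and source_dir."""
    absolute = os.path.abspath(source_dir)
    key = hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:16]
    return os.path.join(cache_dir, f"{step}.{key}.json")


def cached_step(cache_dir, step, source_dir, compute):
    """Return the cached result for (step, source_dir), computing it on a miss.

    compute is a hypothetical callback that takes source_dir and returns
    a JSON-serializable result.
    """
    path = cache_path(cache_dir, step, source_dir)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = compute(source_dir)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    return result
```

With a key like this, a run on ru_source_dir cannot pick up results cached for en_source_dir. The sketch keys on the path only, so it would not notice files changing inside the same directory.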