Info

Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

57

Comments

Unfortunately, we tested our submission only on data containing both English and Russian articles. When the input contains only one language, the submission throws a NullPointerException.

It could be fixed with just one line:
List<String> myLangFiles = langs.getOrDefault(testLang, new ArrayList<>());
instead of
List<String> myLangFiles = langs.get(testLang);
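
To illustrate why that one-line change avoids the crash, here is a minimal, self-contained sketch. The class name `LangFilesExample`, the helper `filesFor`, and the sample file names are hypothetical and not taken from the submission; only `langs`, `testLang`, and the `getOrDefault` call come from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LangFilesExample {
    // Hypothetical stand-in for the submission's lookup of files per language.
    static List<String> filesFor(Map<String, List<String>> langs, String testLang) {
        // getOrDefault returns the fallback list when the key is absent,
        // so the caller can use the result without a null check.
        return langs.getOrDefault(testLang, new ArrayList<>());
    }

    public static void main(String[] args) {
        Map<String, List<String>> langs = new HashMap<>();
        // Assumed input: a dataset that contains English articles only.
        langs.put("en", Arrays.asList("a.html", "b.html"));

        // langs.get("ru") would return null here, and calling size() on it
        // would throw NullPointerException.
        System.out.println(filesFor(langs, "ru").size()); // prints 0
        System.out.println(filesFor(langs, "en").size()); // prints 2
    }
}
```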

Is it possible to retest the submission? (A new binary can be downloaded here: https://drive.google.com/file/d/1xeKVhzb0S9mCVFfYKLtIFVzyc9GrebXp/view?usp=sharing)
1
As I said in the previous comment, we expected the dataset to contain both English and Russian articles.

It is possible to fix our solution without changing the source code.

There are two possible ways:
1. Add two dummy articles (one English, one Russian) to every dataset. Here are examples of such files: https://drive.google.com/drive/folders/1qMVHCnG5DR72U_hCbMjDm57ngLkoj4at?usp=sharing
2. Combine en_source_dir and ru_source_dir into one directory and run the submission on it once instead of running it on each directory separately.
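
The second option could be prepared with a small helper like the sketch below. It is only an illustration: the class name `MergeSourceDirs` and the layout of the combined directory (one subfolder per source) are assumptions, and it presumes the submission reads its input directory recursively, which the comment above does not state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class MergeSourceDirs {
    // Copies every regular file under each source directory into
    // target/<source directory name>/..., keeping relative paths so that
    // same-named files from different sources do not overwrite each other.
    // Assumes each source path has a final name component (it is not a root).
    static void merge(Path target, Path... sources) throws IOException {
        for (Path source : sources) {
            Path base = target.resolve(source.toAbsolutePath().normalize().getFileName().toString());
            try (Stream<Path> paths = Files.walk(source)) {
                for (Path p : (Iterable<Path>) paths::iterator) {
                    if (!Files.isRegularFile(p)) {
                        continue; // skip directories and other non-file entries
                    }
                    Path dest = base.resolve(source.relativize(p).toString());
                    Files.createDirectories(dest.getParent());
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: MergeSourceDirs <combined_dir> <en_source_dir> <ru_source_dir>");
            return;
        }
        merge(Paths.get(args[0]), Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

The submission would then be run once on the combined directory.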

Issues

Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 80.73
News EN: 12.67
News RU: 65.73
Categories EN: 60.9
Categories RU: 62.71
Threads EN: 50.88
Threads RU: 15.76
Top news EN: 52.3
Top news RU: 15.59

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
30
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 99
News EN: 83
News RU: 94
Categories EN: 78
Categories RU: 79
Threads EN: 75
Threads RU: 56
Top EN: 81
Top RU: 70

This is not the final result, please stay tuned for updates. We apologize for the delay.
20
Fair Mammoth Feb 7, 2020 at 20:38
During preliminary testing of the algorithm, the following ranking shortcomings were identified:

– Threads are sorted by the number of articles in them.

– The titles of some threads are too vague (the information is not presented in a concise, neutral form).

– The ordering of articles within some threads is broken: relevant articles are mixed with irrelevant ones.
20
Gifted Lemur Feb 7, 2020 at 23:47
The number of articles in a thread is indeed one of the important sorting criteria, but it is far from the only one.

This is clearly visible in threads with 20-50 articles.
Fair Leopard Dec 15, 2019 at 14:14
#comment9969
We had to re-run your algorithm with extra articles and will apply relevant penalties during the final scoring.
13