Info


Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

88

Comments

Our ranking algorithm uses the document timestamp as one of its features. We take the 99.5th percentile of all document timestamps as the current timestamp.
However, if there are many documents from the "future" (as in the English part of the test dataset), they will be boosted to the top.
This can be easily fixed by changing the percentile threshold. It is not an issue with clustering.
In the screenshot one can see some of the "future" documents. The true current timestamp is the 29th of November.
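The percentile-based estimate of the current timestamp described above can be illustrated with a minimal sketch. This is not the submission's code: the `percentile` helper (nearest-rank method), the synthetic timestamps, and the 95.0 alternative threshold are all assumptions chosen for illustration.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of values <= it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


DAY = 86_400
true_now = 1_575_000_000  # hypothetical "real" current time, in Unix seconds

# Synthetic data (assumption): 980 documents from the last day,
# plus 20 documents (2%) stamped 30 days in the "future".
normal = [true_now - (i * DAY) // 980 for i in range(980)]
future = [true_now + 30 * DAY] * 20
timestamps = normal + future

# With 2% of documents in the future, the 99.5th percentile lands on a future timestamp.
now_high = percentile(timestamps, 99.5)

# A lower threshold ignores the future tail and stays close to the real time.
now_low = percentile(timestamps, 95.0)

print(now_high - true_now)  # offset of the estimate from the real time, in seconds
print(now_low - true_now)
```

Once the share of "future" documents exceeds the tail cut off by the percentile (0.5% here), the estimated current time jumps into the future, and those documents look the freshest to any recency feature.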
Also, the Entertainment category in the "top" task was named "entArtainment" by mistake :/
In "categories" it has the correct name and is displayed properly.

Issues

Fair Leopard Dec 12, 2019 at 15:25
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- "Illegal instruction" error, build manually
11
Mindful Squirrel Dec 12, 2019 at 20:46
We also have another problem with the Entertainment category; it occurs because of a misspelling in the code. It is named "entertainment" in src/rank/rank.cpp
No entertainment.
2
Mindful Squirrel Dec 13, 2019 at 15:49
Thank you for your feedback! As I stated in the comments and in the first issue, the reason for the empty Entertainment category is a misspelling of the category name in the code.
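How a misspelled category name produces an empty category can be shown with a small sketch. It is purely illustrative and not taken from src/rank/rank.cpp: the `KNOWN_CATEGORIES` set, the `group_by_category` function, and the sample documents are hypothetical.

```python
# Hypothetical list of category names; the real set used by the submission may differ.
KNOWN_CATEGORIES = {"sports", "entertainment", "other"}


def group_by_category(docs):
    """Group (title, category) pairs by category and collect unrecognized names separately."""
    buckets = {name: [] for name in KNOWN_CATEGORIES}
    unknown = {}
    for title, category in docs:
        if category in buckets:
            buckets[category].append(title)
        else:
            # A misspelled name never matches, so the intended bucket stays empty.
            unknown.setdefault(category, []).append(title)
    return buckets, unknown


docs = [
    ("Film festival opens", "entArtainment"),  # misspelled category name
    ("Match report", "sports"),
]
buckets, unknown = group_by_category(docs)
print(buckets["entertainment"])  # empty: nothing matched the correct name
print(unknown)                   # the misspelled name, with the documents filed under it
```

Reporting unrecognized names instead of silently dropping them makes this kind of typo visible before the output is produced.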
English and Russian articles under "Other" tab
2
Getahun Mesele Jan 18, 2020 at 19:08
Redundancy of articles
Apple/safari/12.4
Getahun Mesele Jan 18, 2020 at 19:12
Why are there 7 (seven) articles for a single news story?
Apple/safari/12.4
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 97
News EN: 89
News RU: 94
Categories EN: 82
Categories RU: 79
Threads EN: 67
Threads RU: 46
Top EN: 68
Top RU: 55

This is not the final result, please stay tuned for updates. We apologize for the delay.
20
Fair Quokka Feb 7, 2020 at 20:39
During preliminary testing of the algorithm, the following shortcomings in ranking were identified:

– A large number of overly broad threads, as well as threads consisting of a single article.

– The titles of some threads are too vague (the information is not presented in a brief, neutral form).

– The ordering of articles within threads is broken: relevant articles are mixed with irrelevant ones.
20
Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 13.06
News EN: 67.52
News RU: 84.5
Categories EN: 56.05
Categories RU: 66.79
Threads EN: 29.03
Threads RU: 31.2
Top news EN: 34.16
Top news RU: 31.8

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
30