Data Clustering Contest, Stage 2 – Developer Challenges

Info

Author

Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest, Stage 2.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

Issues

Fair Mammoth Jul 31, 2020 at 22:07

While testing the algorithm, we found the following ranking shortcomings:

1. RU
– Some top threads are missing from the ‘Main’ section and from within categories.
– The titles of some threads do not reflect their content.
– Article ordering is broken in many threads: irrelevant articles are shown above relevant ones.
– Some top threads are irrelevant to a broad audience in Russia.

2. EN
– Many top threads are missing from the ‘Main’ section and from within categories.
– The titles of some threads do not reflect their content.
– Article ordering is broken in some threads: irrelevant articles are shown above relevant ones.
– Many top threads are irrelevant to a broad English-speaking audience.

Fair Leopard Jul 7, 2020 at 16:10

In our preliminary tests, this submission received the following scores (out of 100):

Languages: 100
News EN: 72
News RU: 97
Categories EN: 88
Categories RU: 90
Threads EN: 78
Threads RU: 59

Fair Leopard Jun 23, 2020 at 20:46

We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- rebuilt from source code and rerun (due to the error "Illegal instruction" when running LD_LIBRARY_PATH="$LD_LIBRARY_PATH:./libtorch/lib" build/tgnews "$@")

Mindful Squirrel Jun 25, 2020 at 11:32

Hello!
We have an issue with asynchronous clustering. It runs in a separate thread, and each iteration takes a few minutes. As the submission results show, there was no delay between inserting documents and querying threads, so the clustering passes for 18:00 and 20:00 had not finished.
In real-world conditions, this clustering lag does not affect the quality of the top, so we suggest re-running our submission with a 2-3 minute wait after all documents are inserted (presumably adding this time to the overall indexing time).
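A minimal Python sketch of the race described above, under stated assumptions: the class BackgroundClusterer, its methods and the wait_timeout parameter are hypothetical and are not taken from the submission, and the "clustering" step is only a placeholder that records which documents a pass has covered. It illustrates why a query issued immediately after insertion returns a stale snapshot, and how waiting until the background pass has caught up (the equivalent of the suggested 2-3 minute pause) returns up-to-date results.

```python
import threading
import time


class BackgroundClusterer:
    """Hypothetical sketch: inserts are immediate, clustering runs in a worker thread."""

    def __init__(self, iteration_seconds=0.0):
        self._cond = threading.Condition()
        self._docs = []              # every inserted document
        self._clustered_upto = 0     # how many docs the last finished pass covered
        self._snapshot = []          # result of the last finished pass
        self._iteration_seconds = iteration_seconds
        self._stop = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def insert(self, doc):
        with self._cond:
            self._docs.append(doc)
            self._cond.notify_all()  # wake the worker: there is new work

    def _run(self):
        while True:
            with self._cond:
                # Sleep until new documents arrive or we are asked to stop.
                while not self._stop and self._clustered_upto == len(self._docs):
                    self._cond.wait()
                if self._stop:
                    return
                batch = list(self._docs)
            # Stands in for a long clustering pass; runs without holding the lock,
            # so inserts and queries are not blocked while it is in progress.
            time.sleep(self._iteration_seconds)
            with self._cond:
                self._snapshot = batch           # placeholder for real clustering output
                self._clustered_upto = len(batch)
                self._cond.notify_all()          # wake queries waiting for fresh results

    def query(self, wait_timeout=None):
        """Return the last finished snapshot; optionally wait for the worker to catch up."""
        with self._cond:
            if wait_timeout is not None:
                self._cond.wait_for(
                    lambda: self._clustered_upto == len(self._docs), timeout=wait_timeout
                )
            return list(self._snapshot)

    def close(self):
        with self._cond:
            self._stop = True
            self._cond.notify_all()
        self._worker.join()
```

Usage: after inserting documents, query() returns whatever the previous pass produced (here, nothing yet), while query(wait_timeout=5) blocks until the worker has covered every inserted document. Waiting inside the query trades latency for freshness; the alternative proposed in the comment above is simply to pause externally before querying.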

Mindful Kitten Jun 25, 2020 at 11:25

It looks like the clustering results for 12:00, 14:00 and 16:00 are identical, and the latter two contain irrelevant threads and news. My guess is that all articles up to 14:00 were added, the background clustering had not yet processed them, and the immediate request returned results covering only articles up to 12:00. This also hurts the daily top.
