You can test this app and submit issues during the testing period of the Data Clustering Contest, Stage 2.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.


Fair Quokka Jul 31, 2020 at 22:05
Testing of the algorithm revealed the following ranking shortcomings:

1. RU
– The titles of some threads do not reflect their content.
– Article ordering is broken in some threads: irrelevant articles appear above relevant ones.
– Some top threads are not relevant to a broad audience in Russia.

2. EN
– The titles of many threads do not reflect their content.
– Article ordering is broken in many threads: irrelevant articles appear above relevant ones.
– Many top threads are not relevant to a broad English-speaking audience.
Fair Leopard Jul 7, 2020 at 16:10
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 100
News EN: 51
News RU: 87
Categories EN: 84
Categories RU: 69
Threads EN: 76
Threads RU: 58
Fair Leopard Jun 24, 2020 at 12:28
If you have any comments for the judges about running your submission, please leave a reply here.
Mindful Kitten Jun 24, 2020 at 12:58
If possible, it would be nice to run the submission on an SSD. Other submissions rely on the page cache, which is unreliable across OS shutdowns and failures (these lead to data loss). I flush to disk on each request (I guarantee that if a request returned 200, the entry has been written to the index), which is why each request takes up to 12-15 ms.

If that's not possible, or if you consider OS power outages a minor issue, don't rerun the submission; that's my problem then.
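To illustrate the trade-off described above, here is a minimal sketch of a write path that acknowledges a request only after the data has been forced to disk. It is not the submission's actual code: the function name `append_durably`, the one-record-per-line file layout, and returning the status as a plain integer are all assumptions made for the example.

```python
import os


def append_durably(path: str, record: bytes) -> int:
    """Append one record and return 200 only after it has been fsynced.

    Hypothetical helper: the name, the file layout, and the integer
    status code are illustrative assumptions, not the submission's API.
    """
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write its page cache to the device
    return 200
```

Skipping the `os.fsync` call leaves the record in the OS page cache, which is faster but can be lost on a power failure. The per-request latency of the sync depends on the storage device, which is why an SSD helps here.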
Lawful Kiwi Jul 8, 2020 at 08:25
The news item titled "Еще 15 человек заразились коронавирусом в Крыму — Аксенов" ("15 more people infected with coronavirus in Crimea: Aksyonov") has grouped under itself articles that are not about this event but about coronavirus in general. I'm also not sure that today, July 8, this story belongs in the Main thread; there are many more important events not represented in Main, but of course that is subjective.
Mindful Kitten Jul 8, 2020 at 08:34
Agreed. I have some problems with huge coronavirus and accident news threads. I did not have time to handle them properly; otherwise, other clusters would have been split more aggressively.
There is the following rule violation: "If the resulting list contains more than 1000 threads, the algorithm may return the top 1000 threads (threads, not articles)". But all categories are limited to 100 threads here.
Mindful Kitten Jul 8, 2020 at 13:29
Not a violation. By the rules, I can consider other threads irrelevant and not return them. I must limit the list only if I found more than 1000.
Something is going wrong in most of the Top sections.
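A minimal sketch of the cap as the quoted rule words it: a list longer than 1000 threads may be truncated to the top 1000, and a shorter list passes through unchanged. The function name `cap_threads` and the assumption that the input is already ranked best-first are illustrative. How many threads an algorithm considers relevant in the first place is a separate decision and is not modelled here.

```python
from typing import List, TypeVar

T = TypeVar("T")

MAX_THREADS = 1000  # limit taken from the rule quoted in the thread


def cap_threads(ranked_threads: List[T], limit: int = MAX_THREADS) -> List[T]:
    """Return at most `limit` threads.

    Assumes `ranked_threads` is already sorted best-first (an assumption
    of this sketch). Lists shorter than the limit are returned as-is.
    """
    return ranked_threads[:limit]
```

With this reading, a result of 100 threads is simply a list below the cap, so the truncation never applies to it.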
Mindful Kitten Jul 14, 2020 at 17:40
It looks like my server was stopped in the meantime; it has not received any updates in 40 minutes. RU news is immediate, btw. Looks super suspicious.
I saw the same when I checked submissions about 8 hours ago ("all day", not "last hour"). But I thought it was a temporary issue.
Mindful Kitten Jul 14, 2020 at 18:23
I believe the network connection was somehow lost and this is a machine issue; the 502 error comes from nginx (I also caught an error half an hour ago). My top has a very strict deadline, and there is no way it can run for more than 20 s.
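For readers unfamiliar with the idea of a strict deadline, here is one common way to bound the running time of a request handler. It is a sketch under assumptions, not the submission's implementation: `process_until_deadline`, its parameters, and the injectable `clock` are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_until_deadline(
    items: Iterable[T],
    step: Callable[[T], R],
    budget_seconds: float = 20.0,  # the 20 s figure comes from the comment above
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Apply `step` to items until the time budget runs out.

    Returns the results gathered so far once the deadline passes, so the
    caller can always respond with a partial answer.
    """
    deadline = clock() + budget_seconds
    results: List[R] = []
    for item in items:
        if clock() >= deadline:
            break  # out of time: stop and return what we have
        results.append(step(item))
    return results
```

The deadline is only checked between steps, so a single slow `step` call can still overrun the budget; a stricter bound needs each step to be short or interruptible.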
"Looks like that my server was stopped in the meantime"... 
Isn't it your log?
Mindful Kitten Jul 14, 2020 at 18:45
If you want to be malicious because I filed several issues against you before, even though I had good intentions at the time, you can continue to do so. In the end, I helped you find the bug in your code, and you are lucky that your solution now works without any memory corruption. I did not assume any bad intentions.

Yes, it is my log. According to the logs, the server was not stopped. There are many other ways to stop it from receiving updates, for example, removing the nginx server or corrupting the network.