Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 93.46
News EN: 69.76
News RU: 63.66
Categories EN: 61.45
Categories RU: 65.87
Threads EN: 36.81
Threads RU: 39.08
Top news EN: 20.66
Top news RU: 20.76

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 100
News EN: 88
News RU: 93
Categories EN: 77
Categories RU: 78
Threads EN: 71
Threads RU: 52
Top EN: 63
Top RU: 64

This is not the final result, please stay tuned for updates. We apologize for the delay.
Fair Quokka Feb 7, 2020 at 20:42
Preliminary testing of the algorithm revealed the following ranking shortcomings:

– Many major stories are missing from the 'Main' section and within categories. Irrelevant stories appear at the top. Stories are sorted by the number of articles they contain.

– The titles of some stories are too vague (the information is not presented in a concise, neutral form).
Fair Leopard Dec 12, 2019 at 16:46
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- invalid news output format, fixed extra comma;
- invalid threads and top output format, fixed unescaped quote
Bossy Gnu Dec 13, 2019 at 13:51
> invalid news output format, fixed extra comma;
Fixed. It can be reproduced with only one language in the dataset (a case I missed in my tests).
> invalid threads and top output format, fixed unescaped quote
Fixed. OMG, shame on me! :)
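Both format bugs (the extra comma and the unescaped quote) are the kind that typically come from assembling JSON by hand. The submission's actual output code is not shown, so the following is only a minimal Python sketch: `naive_json_list` and `safe_json_list` are hypothetical helpers contrasting manual string concatenation with the standard `json` module.

```python
import json


def naive_json_list(titles):
    # Hypothetical hand-built JSON: quotes inside a title are not escaped,
    # and emitting a separator after every item leaves a trailing comma.
    parts = ['"' + t + '",' for t in titles]
    return "[" + "".join(parts) + "]"


def safe_json_list(titles):
    # json.dumps inserts separators only between items and escapes
    # quotes, backslashes and control characters.
    return json.dumps(titles, ensure_ascii=False)
```

For example, `safe_json_list(['Minister says "no"'])` yields valid JSON with the inner quotes escaped, while the naive version produces output that `json.loads` rejects.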
Top threads of "Main" (both ru and en) consist of very loosely related articles.
Bossy Gnu Dec 13, 2019 at 12:45
Thank you for the comment. Yes, due to the extremely limited time I did not manage to tune the clustering algorithm properly. I used the unmodified Chinese Whispers algorithm, and it has a well-known problem: an object similar to a second object, which is in turn similar to a third one, pulls all of them into the same cluster even if the first and the third are unrelated. This issue is fixed now. Anyway, IMO there are no significant errors or problems in my implementation of the contest tasks.
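Chinese Whispers is a label-propagation clustering method over a weighted similarity graph. Below is a minimal Python sketch of the unmodified algorithm, assuming articles are nodes and edges connect pairs whose embedding similarity exceeds some threshold (the edge construction and the `chinese_whispers` signature are assumptions, not taken from the submission). Each node repeatedly adopts the label with the highest total edge weight among its neighbors, so nothing limits how far a cluster spreads along a chain of pairwise-similar articles, which is the problem described above.

```python
import random
from collections import defaultdict


def chinese_whispers(nodes, edges, iterations=20, seed=0):
    """Cluster a weighted undirected graph with Chinese Whispers.

    nodes: iterable of hashable node ids (e.g. article ids)
    edges: dict mapping (u, v) -> weight; assumed to contain only pairs
           whose similarity passed some threshold (hypothetical setup)
    """
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    labels = {n: n for n in nodes}  # every node starts in its own class
    order = list(labels)
    for _ in range(iterations):
        rng.shuffle(order)  # visit nodes in random order each pass
        changed = False
        for n in order:
            if not adj[n]:
                continue  # isolated nodes keep their own label
            scores = defaultdict(float)
            for nb, w in adj[n].items():
                scores[labels[nb]] += w
            best = max(scores, key=scores.get)  # strongest neighbor label wins
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break  # converged: no label moved during a full pass

    clusters = defaultdict(set)
    for n, lab in labels.items():
        clusters[lab].add(n)
    return list(clusters.values())
```

There is no notion of cluster diameter here, which is why a common mitigation is to prune weak edges or post-check cluster coherence; which fix the submission actually applied is not stated.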
Quite impressive news categorization and news/non-news filtering! Thread grouping is not so great, though, but the way to improve it seems rather clear to me.
Processing speed is really great!
Bossy Gnu Dec 17, 2019 at 23:16
"threads" & "top" are tuned and look nice now. I am going to upload new output JSON files to show how it works now.
I used quantized embedding models to keep my submission below 200 MB. On the other hand, it takes about 8-10 seconds per language to restore the model, while an uncompressed model can be loaded in a few tens of microseconds.
And sure, you can load the language models only once and reuse them in further tasks. But I have to load and restore my model for each contest step (except the first step, language detection).
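The comment does not say how the embedding models were quantized. Purely as an illustration of the size/load-time tradeoff, the sketch below assumes symmetric per-row int8 quantization using the standard `array` module: each value takes 1 byte instead of the 4 bytes of float32, plus one float32 scale per row. `quantize_rows` and `dequantize_rows` are hypothetical names, and `dequantize_rows` stands in for the restore step that costs time at load.

```python
from array import array


def quantize_rows(rows):
    """Symmetric per-row int8 quantization of a float embedding matrix.

    Returns (codes, scales): a flat array of signed bytes and one
    float32 scale factor per row.
    """
    codes = array("b")
    scales = array("f")
    for row in rows:
        max_abs = max(abs(x) for x in row) or 1.0  # guard all-zero rows
        scale = max_abs / 127.0
        scales.append(scale)
        # Round to the nearest step and clamp to the int8 range [-127, 127].
        codes.extend(max(-127, min(127, round(x / scale))) for x in row)
    return codes, scales


def dequantize_rows(codes, scales, dim):
    """Restore approximate float rows (the slow 'restore' step at load time)."""
    return [
        [codes[i * dim + j] * scales[i] for j in range(dim)]
        for i in range(len(scales))
    ]
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`), which is usually acceptable for similarity-based clustering.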
