Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest, Stage 2.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Issues

Fair Leopard Jul 7, 2020 at 16:10
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 100
News EN: 0
News RU: 0
Categories EN: 83
Categories RU: 78
Threads EN: 71
Threads RU: 45
Bossy Gnu Jul 9, 2020 at 11:28
Could you please correct these scores as the testing issue has been fixed?
Fair Mammoth Jul 31, 2020 at 22:09
The following ranking shortcomings were identified while testing the algorithm:

1. RU
– Broken structure: the Other section contains 0 articles;
– Some top stories are missing from the ‘Main’ section and within categories;
– The titles of many stories do not reflect their content;
– Article sorting is broken in many stories: irrelevant articles appear above relevant ones;
– Some top stories are not relevant to a broad audience in Russia.

2. EN
– Broken structure: the Other section contains 0 articles;
– Many top stories are missing from the ‘Main’ section and within categories;
– The titles of many stories do not reflect their content;
– Article sorting is broken in many stories: irrelevant articles appear above relevant ones;
– Many top stories are not relevant to a broad English-speaking audience.
All categories except society contain almost nothing.
Bossy Gnu Jun 25, 2020 at 12:19
There is a testing issue; I think these are not my results. I hope the contest organisers will rerun my code.
Fair Leopard Jun 24, 2020 at 12:28
If you have any comments for the judges about running your submission, please leave a reply here.
Bossy Gnu Jul 6, 2020 at 10:06
I am writing to remind you that the test result for my submission looks like the output of a completely different submission. Unfortunately, I have not received any response from the judges in the last two weeks.
I have tested my code on the provided data sets - DataClusteringDataset0525En.tar.gz, DataClusteringDataset0525Ru.tar.gz and DataClusteringDataset0525.tar.gz.
You can download my raw output here - https://drive.google.com/file/d/1_gp_AUrRPxrC30eVEtQ7MKwqJtnWVqFW/view?usp=sharing
For example, my output for en/categories/sports contains 1179 articles but here I can see only 6 articles - https://entry1394-dcround2.usercontent.dev/20200525/categories/en/sports.html
My "top" output through "tgnews server" - https://drive.google.com/file/d/1Vu2T8aqsiou8Dj4PHyYMkmswYzQSt14B/view?usp=sharing
There is a small bug that affects news detection only.
newsDetector/newsDetector.cpp:42
- _result[i] = prediction[i];
+ _result[i] = (prediction[i] > 0.0f);
Fair Leopard Jul 7, 2020 at 17:34
We re-ran your submission and got the same result (e.g. 6 articles in sports category).
Bossy Gnu Jul 7, 2020 at 19:11
Thank you for the reply.

It's unbelievable. I tested my code on different Linux distributions, on macOS and FreeBSD, with both GNU gcc and LLVM clang, and on 8th to 10th generation Intel x64 CPUs. I can reproduce my results with 100% identical output on all of these configurations.

And I've tested it once again.

The MD5 sum of my submission.zip file (re-downloaded from Telegram's Jobs Bot) is c8977a57e43c95dfcc42c7493f20af9a - https://www.dropbox.com/s/thrt1z4l6yib5ss/submission.zip

Test set - https://data-static.usercontent.dev/DataClusteringDataset0525En.tar.gz

Output of "./tgnews categories ../20200525 > out.json" - https://www.dropbox.com/s/tgem0ga6j3d7h2y/out.json
Fair Leopard Jul 7, 2020 at 21:20
It was a strange issue with unzipping your submission on our side. We will re-run your submission soon. Sorry for this inconvenience.
Bossy Gnu Jul 8, 2020 at 07:14
Thank you, the output now looks correct to me.

I know it is too late to change the source code, but for your reference, there is a small bug in the news detection algorithm:
newsDetector/NewsDetector.cpp, line 42:
_result[i] = prediction[i];
should be:
_result[i] = (prediction[i] > 0.0f);
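For illustration, the one-line fix above can be sketched as a standalone function. This is a minimal sketch, not the submission's actual code: the function name `thresholdPredictions`, the use of `std::vector<bool>` for the result, and the assumption that `prediction` holds raw signed classifier scores are all hypothetical. Under those assumptions, a plain assignment would convert any non-zero score, including a negative one, to `true`, while the explicit comparison marks only positive scores as news.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the submission): converts raw classifier
// scores into binary news / not-news flags.
// Assumption: a score strictly greater than 0 means "news".
std::vector<bool> thresholdPredictions(const std::vector<float>& prediction) {
    std::vector<bool> result(prediction.size());
    for (std::size_t i = 0; i < prediction.size(); ++i) {
        // Explicit comparison: a negative score must not become `true`,
        // which is what an implicit float-to-bool conversion would do.
        result[i] = (prediction[i] > 0.0f);
    }
    return result;
}
```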
There is the following rule violation: "If the resulting list contains more than 1000 threads, the algorithm may return the top 1000 threads (threads, not articles)". However, none of the categories is limited, and some of them contain more than 10000 threads.
Bossy Gnu Jul 8, 2020 at 13:39
Read carefully: "the algorithm MAY return".
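The quoted rule makes the cap optional, so both behaviours are allowed. For reference, here is a minimal sketch of how an implementation could apply such a cap, assuming each thread carries a single ranking score where higher is better. The `NewsThread` struct, its fields, and the `topThreads` function are hypothetical names, not taken from the submission or from the contest API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical thread record: a title plus a ranking score (higher is better).
struct NewsThread {
    std::string title;
    double score;
};

// Returns at most `limit` threads, best-scoring first.
// The cap counts threads, not the articles inside them.
std::vector<NewsThread> topThreads(std::vector<NewsThread> threads,
                                   std::size_t limit = 1000) {
    const auto keep =
        static_cast<std::ptrdiff_t>(std::min(limit, threads.size()));
    // Only the first `keep` positions need to be fully ordered.
    std::partial_sort(threads.begin(), threads.begin() + keep, threads.end(),
                      [](const NewsThread& a, const NewsThread& b) {
                          return a.score > b.score;
                      });
    threads.erase(threads.begin() + keep, threads.end());
    return threads;
}
```

`std::partial_sort` is used here because only the retained prefix has to be sorted; a full sort followed by truncation would give the same result.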
Fair Leopard Jul 10, 2020 at 01:29
#issue11293 Sure. Your scores were updated.