
Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.




I see - it's finally working! Thanks for all the help :)

Unfortunately, I introduced a critical bug in the top output: because of a typo in the output code, all articles are assigned the same category. The categories are in fact detected. Well, quite terribly, but still detected :)


Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 20.24
News EN: 8.6
News RU: 6.72
Categories EN: 8.73
Categories RU: 7.04
Threads EN: 0
Threads RU: 0
Unfortunately, this submission didn't get a high enough score to be evaluated for Top news (task 5).

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 98
News EN: 44
News RU: 92
Categories EN: 17
Categories RU: 28
Threads EN: 0
Threads RU: 0

Unfortunately, this submission didn't get a high enough score for the final task (top news) to be evaluated.

This is not the final result, please stay tuned for updates. We apologize for the delay.
Big Rat Feb 6, 2020 at 17:27
Does the threads score include testing on the first dataset? My processing timed out on the second one but fit within the time limit on the first. The result was far from perfect, of course, but to my taste it is better than zero :)
Fair Leopard Dec 16, 2019 at 13:36
This submission was unable to deliver results for stages 2-5 of the test (news/categories/threads/top) due to an issue on our side. The issue has been fixed and will not result in a penalty during final scoring.
Big Rat Dec 16, 2019 at 13:51
Thanks for fixing that! Unfortunately, I made a typo in the output stage of the "top" output, so instead of its real category, every thread is assigned the same one: the category of the very first processed article (the typo makes the code reference element zero instead of the proper index). I have also realized there are other major problems. Nevertheless, now that it mostly works I feel much better :)
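A minimal sketch of the kind of indexing slip described above, written in Python for illustration only. The submission's real code is not shown in this thread, so the function name and data layout here are hypothetical.

```python
def thread_categories(thread_lead_indices, article_categories):
    """Return one category per thread, taken from that thread's lead article.

    thread_lead_indices: index of each thread's lead article (hypothetical layout).
    article_categories: category detected for each article, by index.
    """
    # Writing article_categories[0] here instead of article_categories[lead]
    # would reproduce the symptom described above: every thread would get the
    # category of the first processed article.
    return [article_categories[lead] for lead in thread_lead_indices]


article_categories = ["sports", "economy", "science", "economy"]
thread_lead_indices = [2, 0, 3]
result = thread_categories(thread_lead_indices, article_categories)
```

A test that checks for at least two distinct categories in the output on a mixed input would catch this class of typo early.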
Fair Leopard Dec 12, 2019 at 16:47
We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- no tgnews binary in the root folder

The following issues have been discovered during preliminary testing:
- script execution timed out
Big Rat Dec 12, 2019 at 19:14
I forgot to put the /models folder on the same level as the tgnews binary. It expects the models folder to be one level higher. Please either run it from the nested folder or put /models there.
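One way to make a launcher tolerant of this kind of layout mismatch is to look for the models folder both next to the binary and one level up. The sketch below is in Python and only illustrates the idea; `find_models_dir` is a hypothetical helper, and the demo folder layout is an assumption based on the names mentioned in this thread.

```python
import tempfile
from pathlib import Path


def find_models_dir(binary_path, folder_name="models"):
    """Look for the models folder next to the binary, then one level up."""
    binary_dir = Path(binary_path).resolve().parent
    for base in (binary_dir, binary_dir.parent):
        candidate = base / folder_name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"no '{folder_name}' folder next to or one level above {binary_dir}"
    )


# Demo with a throwaway layout: models one level above the binary's folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "models").mkdir()
    launch_dir = root / "tgnews_launch_from_here"
    launch_dir.mkdir()
    binary = launch_dir / "tgnews"
    binary.touch()
    found = find_models_dir(binary)
    found_relative = found.relative_to(root).as_posix()
```

Resolving paths from the binary's own location rather than the current working directory means the result does not depend on which folder the test harness launches from.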
Fair Leopard Dec 12, 2019 at 21:38
We've just run your binary one more time from the tgnews_launch_from_here folder and fixed the following issues (relevant penalties will also be applied):
- invalid output format, fixed extra comma (,] => ]), fixed capitalized categories
- invalid news output format, fixed [{}] => {}
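Format problems like the ones listed above (a trailing comma, capitalized category names, an object wrapped in an extra array) tend to come from building JSON by hand. A minimal sketch in Python, assuming the output is serialized with the standard `json` module; the field names `category` and `articles` are illustrative assumptions, not a schema confirmed in this thread.

```python
import json


def format_categories(groups):
    """Serialize category groups as a JSON array with lowercase category names."""
    payload = [
        {"category": name.lower(), "articles": sorted(files)}
        for name, files in groups.items()
    ]
    # json.dumps never emits a trailing comma before the closing bracket.
    return json.dumps(payload)


def format_news(files):
    """Serialize the news list as a single JSON object, not an array around one."""
    return json.dumps({"articles": sorted(files)})


categories_out = format_categories({"Sports": ["b.html", "a.html"]})
news_out = format_news(["a.html"])
```

Parsing the produced string back with `json.loads` before printing it is a cheap self-check that the output is at least valid JSON.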
Big Rat Dec 12, 2019 at 22:22
Thanks for the fix!
But there is a small problem. Languages are processed correctly, but everything else gives 0 results. That is definitely not what I wrote :) Can you please double-check that you didn't break any logic when fixing the output format? If you can send me the fixed version, I would gladly check it myself and point out any possible problems.
News/non-news filter is too strict for English: 2/3 of articles are filtered out
Low precision of Categories: e.g., English has almost no Society (which should be the major category), Entertainment, Sport; Other category is too broad in both languages
Big Rat Dec 16, 2019 at 19:20
Thanks for the review! Yes, I noticed that. My algorithm uses a reference set of articles to categorize, and I submitted a terrible reference set (due to lack of time). For example, in English I have 5 articles in "Science" (but relevant ones), while in Russian I have 853 (95% clearly not science). I think I also messed up the score processing: I should have calculated thresholds from the actual distribution, but used static ones due to lack of time. A few more days would probably have made a big difference with my approach :) But right now it's nowhere close to relevant.
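The idea of deriving a threshold from the actual score distribution, rather than hard-coding one, could look roughly like the Python sketch below. It is not the submission's method: `percentile_threshold`, the `keep_fraction` parameter, and the sample scores are all made up for illustration.

```python
def percentile_threshold(scores, keep_fraction):
    """Return the score cutoff that keeps roughly the top keep_fraction of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Always keep at least one item so the cutoff is well defined.
    keep_count = max(1, round(len(ranked) * keep_fraction))
    return ranked[keep_count - 1]


# Illustrative similarity scores against a reference set (made-up numbers).
scores = [0.91, 0.15, 0.42, 0.88, 0.07, 0.63, 0.30, 0.77, 0.55, 0.21]
threshold = percentile_threshold(scores, keep_fraction=0.3)
kept = [s for s in scores if s >= threshold]
```

Unlike a static cutoff, this adapts when the score scale differs between languages or reference sets, although a very poor reference set would still produce poor categories.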
Fair Leopard Dec 13, 2019 at 13:51
We did not change your source code, only the launch folder and the output from stdout. You can check the raw output of your binary (before any fixes) using a link in the footer (e.g. for news output).
Big Rat Dec 13, 2019 at 14:19
I see... It's a pity, I have no idea what went wrong; on my setup everything works. Can you try running it on a different dataset? Maybe something smaller: I could have missed some memory errors associated with larger news arrays.