How to fix the empty filenames issue without changing the source code:

- Decompress the data set directly in the same directory where tgnews is, using this command:
> tar -zxf DataClusteringDatasetEn.tar.gz
and then execute the program (right image).

Note: If you decompress the data inside another directory, the program doesn't work (left image). I don't know why that happens, but it only happens with this data set; for all data sets uploaded during the contest, the program works well in both circumstances.
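
For anyone who prefers to script the step above, here is a minimal Python sketch of the same extraction. It only mirrors the tar command by unpacking the archive straight into the directory that holds tgnews. The function name `extract_next_to_binary` and its parameters are my own illustration and are not part of the submission.

```python
import tarfile
from pathlib import Path


def extract_next_to_binary(archive_path, binary_dir):
    """Extract a .tar.gz archive directly into binary_dir.

    binary_dir is assumed to be the directory that contains tgnews.
    Returns the sorted top-level names found in the archive, so it is
    easy to check which folder the data ended up in.
    """
    binary_dir = Path(binary_dir)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        members = archive.getmembers()
        archive.extractall(path=binary_dir)

    top_level = set()
    for member in members:
        parts = Path(member.name).parts
        if parts:  # skip a bare "." entry, which has no path parts
            top_level.add(parts[0])
    return sorted(top_level)
```

Calling `extract_next_to_binary("DataClusteringDatasetEn.tar.gz", ".")` from the directory where tgnews is should be equivalent to the tar command above; the program is then executed as usual.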
Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 12.9
News EN: 12.3
News RU: 12.97
Categories EN: 12.32
Categories RU: 12.96
Threads EN: 9.39
Threads RU: 11.55
Unfortunately, this submission didn't get a high enough score to be evaluated for Top news (task 5).

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 92
News EN: 87
News RU: 32
Categories EN: 27
Categories RU: 15
Threads EN: 8
Threads RU: 2

Unfortunately, this submission didn't get a high enough score for the final task (top news) to be evaluated.

This is not the final result; please stay tuned for updates. We apologize for the delay.
Fair Leopard Dec 12, 2019 at 14:48
The following issues have been discovered during preliminary testing:
- empty filenames
Kingly Butterfly Dec 12, 2019 at 21:43
The data sets are strange. When I decompress one through the Nautilus interface, it doesn't work because of a Permission denied error. But when I decompress the data set with a console command and execute the program, it runs fine. I think the issue is caused by the data, because the program works fine for all data sets uploaded during the contest.
Fair Leopard Dec 16, 2019 at 20:12
There is no reason for your app to require exactly the "20191129/" folder, but it works. So your algorithm has been relaunched and we will apply relevant penalties during the final scoring.
Threads are collections of absolutely unrelated articles
Categories assignment appears random for English. Almost all articles go into Society in Russian
2/3 of articles are "Other" in News for Russian, which is too much. In English, almost none of them are in "Other", which is too little.
Both the Russian and English article sets contain articles in other languages.
