Data Clustering Contest – Developer Challenges

Info

Author

Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

Issues

Fair Leopard Feb 28, 2020 at 15:11

Final score for this submission (out of 100):

Languages: 13.35
News EN: 12.34
News RU: 12.69
Categories EN: 12.34
Categories RU: 12.69
Threads EN: 12.39
Threads RU: 16.11
Top news EN: 10.04
Top news RU: 10.3

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.

Fair Leopard Feb 6, 2020 at 16:03

In our preliminary tests, this submission received the following scores (out of 100):

Languages: 84
News EN: 26
News RU: 16
Categories EN: 8
Categories RU: 5
Threads EN: 52
Threads RU: 33

Unfortunately, this submission didn't get a high enough score for the final task (top news) to be evaluated.

This is not the final result; please stay tuned for updates. We apologize for the delay.

Fair Leopard Dec 12, 2019 at 14:51

We had to fix the following issues before running the algorithm and will apply relevant penalties during the final scoring:
- invalid languages output format, fixed extra comma ("[," => "[")

Hairy Lemur Dec 12, 2019 at 21:40

Thanks for the fix! But here is the language output of the binary I sent you:
https://drive.google.com/open?id=1Bu7TPpPGMdEFLM1_jqhSIII9Nd4tjuH3
This JSON is totally valid :)
Also, my benchmarks seem to show completely different results when I run them on your final dataset.
I hope you'll look into this issue I'm having.
I've attached a screenshot from the benchmark.
I'm running on Google Cloud n1-highcpu-8 (8 vCPUs, 7.2 GB memory).
The binary is the same one I sent you.

Fair Leopard Dec 12, 2019 at 21:54

#issue9755
Here is the languages output of your submission: https://entry1149-dcround1.usercontent.dev/languages/output.txt
Try running it several times: sometimes the result is valid, and sometimes it is not.
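An intermittent formatting bug like this one is easy to miss when the output is checked only once. As a minimal sketch (not the contest's actual test harness), the script below runs a command several times and counts how many runs produce output that does not parse as JSON. The helper names `is_valid_json` and `count_invalid_runs` are hypothetical, and the command to run is whatever the caller passes in.

```python
import json
import subprocess
import sys
from typing import List


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as a JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def count_invalid_runs(cmd: List[str], runs: int = 20) -> int:
    """Run `cmd` `runs` times and count the runs that fail.

    A run counts as failed when the process exits with a non-zero
    status or when its stdout is not valid JSON.
    """
    invalid = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0 or not is_valid_json(result.stdout):
            invalid += 1
    return invalid


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to test.
    failures = count_invalid_runs(sys.argv[1:])
    print(f"{failures} invalid run(s)")
```

A stray leading comma such as `[,{"a": 1}]` is rejected by `json.loads`, so a bug that appears only on some runs shows up as a non-zero count.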

Hairy Lemur Dec 12, 2019 at 22:42

ok, seems to reproduce on smaller datasets.
Thank you very much and sorry for the noise, this app is far from complete when viewing it with this kind of viewer :)
I have no idea whether the benchmarks changed after your fix, but they definitely don't match what I'm getting myself with this latest dataset. Anyway, this dataset is too small to show the real performance. I was testing mainly on the 2 GB dataset (https://data-static.usercontent.dev/DataClusteringSample0107.tar.gz), where the app has exceptional performance: about 60 s for the whole dataset of about 100k articles in total. But I guess this is for the geeks :) Thanks!

Large Crab Dec 12, 2019 at 19:41

Too little news -> very poor economy/sports categories

Hairy Lemur Dec 12, 2019 at 22:47

very small dataset for my kind of precision
