Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest, Stage 2.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

Voting

12
by rating

Issues

Fair Leopard Jul 7 at 16:10
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 98
News EN: 81
News RU: 88
Categories EN: 59
Categories RU: 62
Threads EN: 74
Threads RU: 56
10
Mad Crow Jul 9 at 06:31
I think I understand the problem with today's articles. Reindexing after a batch of articles is uploaded should run in the background, but then most articles were missing when a GET request was sent immediately after a PUT request. So I added a delay to the GET request: it waits until the queue of pending articles has been fully processed before responding. That's why indexing for main and society takes a little longer than 60000 ms: the whole category gets reindexed. I'm not 100% sure, but a pause of 1-3 minutes or a repeated request should solve the problem. I'm sorry for the inconvenience.
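To illustrate the trade-off described above, here is a minimal sketch of a GET that blocks until the queue of pending articles is drained. It is not the submission's actual code: the `ArticleIndex` class and its `put` / `get_all` methods are hypothetical names, and the "indexing" is just a dictionary write.

```python
import queue
import threading


class ArticleIndex:
    """Toy index (hypothetical): PUT enqueues, a background worker indexes."""

    def __init__(self):
        self._pending = queue.Queue()
        self._indexed = {}
        self._lock = threading.Lock()
        # Daemon worker so the process can exit without joining the thread.
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            name, body = self._pending.get()
            with self._lock:
                self._indexed[name] = body  # stand-in for real reindexing
            self._pending.task_done()

    def put(self, name, body):
        # Returns immediately; the article is indexed later in the background.
        self._pending.put((name, body))

    def get_all(self):
        # Block until every queued article has been processed, so a GET
        # sent right after a burst of PUTs does not miss articles.
        self._pending.join()
        with self._lock:
            return dict(self._indexed)


index = ArticleIndex()
for i in range(100):
    index.put(f"article-{i}.html", "<html>...</html>")
print(len(index.get_all()))
```

The cost of this design is the one mentioned in the comment: the GET response time grows with the size of the pending queue, so a large upload can push it past a fixed time limit.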
Fair Mammoth Jul 31 at 22:16
During testing of the algorithm, the following ranking shortcomings were identified:

1. RU
– The structure is broken: the Science section contains only one article.
– Many top threads are missing from the ‘Main’ section and within categories.
– The titles of many threads do not reflect their content.
– Article sorting is broken in many threads: irrelevant articles are shown above relevant ones.
– Some top threads are irrelevant to a general audience in Russia.

2. EN
– Many top threads are missing from the ‘Main’ section and within categories.
– The titles of many threads do not reflect their content.
– Article sorting is broken in many threads: irrelevant articles are shown above relevant ones.
– Many top threads are irrelevant to a general English-speaking audience.
10
Mindful Kitten Jun 24 at 01:10
OK, got it, it may indeed be the testing algorithm. Also note that science might be underrepresented in the categories, imo.
1
Mad Crow Jun 24 at 01:13
Anyway, I agree with you: compared with your submission, the science category looks really poor. I trained using 10 words for each category, and there's a chance it didn't work well with new data, especially in the "science" category.
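As a rough illustration of why a classifier built from about 10 words per category can miss new articles, here is a minimal keyword-scoring sketch. The seed words, the `classify` function, and the scoring rule are all assumptions made for this example; the comment above does not say how the 10 words were actually used in training.

```python
import re
from collections import Counter

# Hypothetical seed lists, 10 words per category.
SEED_WORDS = {
    "science": {"research", "scientists", "study", "physics", "nasa",
                "species", "laboratory", "experiment", "discovery", "astronomy"},
    "sports": {"team", "match", "league", "goal", "coach",
               "championship", "football", "tournament", "player", "olympic"},
}


def classify(text, seeds=SEED_WORDS, default="other"):
    """Pick the category whose seed words occur most often in the text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: sum(tokens[w] for w in words) for cat, words in seeds.items()}
    best = max(scores, key=scores.get)
    # With no seed-word hits at all, fall back to the default category.
    return best if scores[best] > 0 else default


print(classify("Scientists publish a new physics study"))
print(classify("Telescope images reveal a distant galaxy"))
```

The second headline is clearly about science but contains none of the seed words, so it falls through to the default category. That is one plausible way a category could end up nearly empty on new data.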
Fair Leopard Jun 24 at 22:47
All articles were indexed with max-age=86400 so they shouldn't be removed from index.
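For reference, `max-age=86400` is 86400 seconds, i.e. 24 hours. A minimal sketch of the expiry check this implies is below; the helper names are hypothetical, and which clock the check is measured against (the article's publication time here) is an assumption of the example.

```python
import re


def parse_max_age(cache_control):
    """Extract the max-age value in seconds from a Cache-Control header."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_expired(published_at, max_age, now):
    """An article is expired once `max_age` seconds have passed (assumed rule)."""
    return now >= published_at + max_age


ttl = parse_max_age("max-age=86400")
print(ttl // 3600)  # TTL in hours
print(is_expired(published_at=0, max_age=ttl, now=3600))
```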
1
Mad Crow Jun 25 at 06:49
Maybe everything is OK and there is nothing to worry about. I will wait for you to publish the dataset, and maybe the testing script or a description of the testing requests, so I can reproduce this and find out more.
Mindful Kitten Jun 24 at 01:05
Looks like the science category is missing a lot of news
Mad Crow Jun 24 at 01:07
At the moment I'm not sure about it. If you check different hours, there are more articles. It depends on the testing algorithm; I believe there can be a case where expired articles were removed.
Fair Leopard Jun 24 at 12:28
If you have some comments for the judges about running your submission please leave a reply here.
Mad Crow Jun 24 at 12:56
How the solution works is described in readme.txt. At the moment I'm not sure about the Top for the whole day versus the hours, since I see that the whole-day request does not contain articles from the hourly entries. If you provide more information about the algorithm you used for uploading data, I could understand whether this is OK and expired articles were removed, or whether there's a bug.
Mad Crow Jul 1 at 22:45
Thank you for your attention! I found out it was published by reading others' comments, and finally found the link! In fact, I can imagine request sequences that would show the data exactly as it is processed; however, it would be nice to know how you uploaded the dataset. Did you kill the server? What does your timeline mean? How did you send requests? Did you delete random articles? How many GET requests were sent simultaneously? I'm not sure if it matters now, so if you are going to reveal this information, that would be great; if not, let's wait for your decision!