Testing and Issues

You can test this app and submit issues during the testing period of the Data Clustering Contest.

Entries with serious issues will not be able to win the contest, but even minor issues might be important for overall results.

## Features

* Command line interface for interactive training and classification

## Installing

To start using `tgnews`, install Go 1.13 and run `go build` in the source directory.

This will build the app.

## Usage

```
tgnews train source_dir
```

## Performance

Throughput is currently low (around 300 docs/sec). Optimisations are planned.

## How it is done

* Put articles into the "train" folders by category, either manually or via the command line interface (`./tgnews train source_dir`)
* The program loads the articles by category from the "train" folders and builds the model on the fly
* Calculate TF-IDF and so on
* Calculate cosine similarity between category and article, article and article, and so on (no magic here)
* Top threads are weighted by their similarity to a category and limited to 11

## Limitations

* Pretrained datasets are small (10-100 documents in each category)

Issues

Top threads are really good, but it looks like you have an imbalance in news detection
Suave Duck Dec 13, 2019 at 07:18
Thank you! I chose the accuracy side in the war between quantity and accuracy)
Fair Leopard Feb 6, 2020 at 16:03
In our preliminary tests, this submission received the following scores (out of 100):

Languages: 99
News EN: 21
News RU: 12
Categories EN: 12
Categories RU: 4
Threads EN: 27
Threads RU: 10

Unfortunately, this submission didn't get a high enough score for the final task (top news) to be evaluated.

This is not the final result, please stay tuned for updates. We apologize for the delay.
Fair Leopard Feb 28, 2020 at 15:11
Final score for this submission (out of 100):

Languages: 71.52
News EN: 12.37
News RU: 12.4
Categories EN: 12.39
Categories RU: 12.37
Threads EN: 11.08
Threads RU: 11.98
Unfortunately, this submission didn't get a high enough score to be evaluated for Top news (task 5).

These data reflect the relative accuracy, precision and speed of the algorithm as compared to the other submissions.