ML Competition 2023

The task in this competition is to create a library that detects the programming or markup language of a code snippet. The deadline is October 15 at 23:59 Dubai time. Everyone is welcome to participate.

General info about this competition is available on @contest. Further submission instructions will be announced there closer to the deadline.

The Task

Implement a library that detects the programming or markup language of a code snippet. You can use any publicly available data to train your solution.

Development and Testing

You can download a sample library here: libtglang.tar.gz. The header file tglang.h describes the interface you are required to implement in this contest. Use the following commands to build the library:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

You can test the resulting library file libtglang.so on the test data using the test script libtglang-tester.tar.gz. To do this, copy libtglang.so into the directory containing the test script, then build with cmake in the standard way:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

To test the library output, launch the resulting binary file tglang-tester with the following parameters:

tglang-tester <input_file>

where:

  • <input_file> – path to the file containing the input data.

The tester will output the numeric value of the detected language as specified by the TglangLanguage enum.

Script launch example:

$ ./tglang-tester code.txt
9

General Requirements

  • The library can be built in any programming language of your choice.
  • Your library must work locally (no network usage, including localhost).
  • Speed is of critical importance. The response time of your solution must not exceed 10 milliseconds for a text of 4096 characters.
  • External dependencies should be kept to a minimum. If you can't avoid external dependencies, please list them in a text file named deb-packages.txt. These dependencies will be installed using sudo apt-get install ... before your library is tested.
  • The library will be tested on servers running Debian GNU/Linux 10 (buster), x86-64 with 8 cores and 16 GB RAM. Before submitting, please make sure that your library works correctly on a clean system.
  • Make sure the library is built on Debian GNU/Linux 10 (buster).
  • You must submit a ZIP file (the maximum size of a file sent to the bot is 2 GB) with the following structure:
submission.zip
  -> src - folder with the library's source code (required)
  -> libtglang.so - the built library (required)
  -> resources - folder with additional files that your library requires to work (please use relative paths to access them) (optional)
  -> deb-packages.txt - a text file listing the Debian package names of all external dependencies, one per line (optional)
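
The archive layout above can be checked locally before uploading. The following is a minimal sketch, not an official validator: the function names check_layout and check_zip are hypothetical, it assumes the listed entries sit at the root of the archive, and it assumes that "2 GB" means 2 * 1024**3 bytes.

```python
import os
import zipfile

# Assumption: "2 GB" is interpreted as 2 * 1024**3 bytes.
MAX_BYTES = 2 * 1024**3


def check_layout(names, size_bytes=0):
    """Return a list of problems found in a submission's entry names.

    `names` is the list of paths inside the archive, as returned by
    zipfile.ZipFile.namelist(). Only the two required items and the
    size limit are checked; optional items are not inspected.
    """
    problems = []
    if "libtglang.so" not in names:
        problems.append("missing libtglang.so at the archive root")
    if not any(name.startswith("src/") for name in names):
        problems.append("missing src/ folder")
    if size_bytes > MAX_BYTES:
        problems.append("archive exceeds 2 GB")
    return problems


def check_zip(path):
    """Run check_layout on a ZIP file stored on disk."""
    with zipfile.ZipFile(path) as archive:
        return check_layout(archive.namelist(), os.path.getsize(path))
```

Calling check_zip("submission.zip") returns an empty list when both required items are present and the file is within the size limit. It does not verify that the library actually loads or runs on a clean Debian 10 system.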

Evaluation

The solutions will be tested on code snippets from public Telegram chats, which may contain content other than valid code. For such inputs, the library is expected to return TGLANG_LANGUAGE_OTHER == 0.
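
One possible way to handle inputs that are not code is to fall back to TGLANG_LANGUAGE_OTHER whenever the classifier is not confident enough. The sketch below only illustrates that idea: the pick_language function, the per-language score mapping, and the 0.5 threshold are all hypothetical and are not part of the contest interface.

```python
TGLANG_LANGUAGE_OTHER = 0  # value stated in the task description


def pick_language(scores, min_confidence=0.5):
    """Return the best-scoring language id, or OTHER if nothing is convincing.

    `scores` maps numeric language ids to confidences in [0, 1].
    Both the mapping and the default threshold are illustrative assumptions.
    """
    if not scores:
        return TGLANG_LANGUAGE_OTHER
    best_id, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score >= min_confidence:
        return best_id
    return TGLANG_LANGUAGE_OTHER
```

A suitable threshold would have to be tuned on data that includes non-code messages, since a threshold set too high sends real code to OTHER and one set too low labels ordinary chat text as code.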

Some rarely used languages may never appear in code snippets in public Telegram chats and will therefore be absent from the evaluation dataset.

When evaluating submissions we will prioritize the speed and accuracy of the algorithms. Accuracy will have the highest priority.
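
Since speed is both a hard requirement (10 milliseconds for a text of 4096 characters) and an evaluation criterion, a rough local timing check can be useful. The sketch below is one way to do it under stated assumptions: make_sample and worst_case_ms are hypothetical helper names, and detect stands for any Python callable that wraps your library, for example a binding created with ctypes.

```python
import time

LIMIT_MS = 10.0      # limit stated in the requirements
SAMPLE_CHARS = 4096  # text length the limit refers to


def make_sample(snippet, length=SAMPLE_CHARS):
    """Repeat `snippet` until it is exactly `length` characters long."""
    if not snippet:
        raise ValueError("snippet must be non-empty")
    repeats = length // len(snippet) + 1
    return (snippet * repeats)[:length]


def worst_case_ms(detect, text, runs=100):
    """Call detect(text) `runs` times and return the slowest call in ms."""
    worst = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        detect(text)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        worst = max(worst, elapsed_ms)
    return worst
```

Comparing worst_case_ms(detect, make_sample(snippet)) against LIMIT_MS gives only an indication. Timings measured through a Python wrapper on your own machine include call overhead and will differ from those on the test servers.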