VoIP Contest: Round 1

General info about the contest is available on @contest.

The task in this round of the VoIP Contest is to build a system for testing voice calls with two participants.

Contest Overview

A is a user making a voice call (caller).
B is a user receiving a voice call from A (callee).
The VoIP Relay is an intermediary server that takes data from each participant and relays it to the other party. The relay has an IP address, a port, and two “tags” per call, which distinguish between different calls and between the participant roles in each call (caller/callee).
All data exchanged by A and B is encrypted. For the call to work, both parties must use the same encryption key.
Depending on network conditions (EDGE, 3G, LTE, Wi-Fi), data packets may be lost or reordered, and the sound may get distorted.
Your goal is to automate the process of testing sound transmission under various conditions.

[ Caller A ] <-> [ VoIP Relay ] <-> [ Callee B ]

You Will Use

The libtgvoip library. For the purposes of the contest, please use the code at commit d4a0f719.
An HTTP API for receiving configuration data and credentials for the VoIP Relay (see below).
A collection of 53 sample audio files in the OPUS format, 5-60 seconds long, recorded as voice messages using official Telegram apps (VoIP_Round1_Test_Sounds.zip, 6.49 MB).
You will not need the Telegram API.

The Task

Build libtgvoip for Linux.
Create the standalone application tgvoipcall with a command-line interface as described below, which uses libtgvoip to simulate a call from user A to user B. For outgoing sound, the app should use one of the provided OPUS files as its input. Incoming sound should be encoded to an OPUS file. Call statistics should also be written to STDOUT in JSON format.
Create the standalone application tgvoiprate with a command-line interface as described below, which rates the quality of the call by comparing the original file to its modified copy received by the other party. The rating should be expressed as a real number from 1.0 to 5.0. Criteria for the rating are up to you and your common sense (see Evaluation Criteria for how we will evaluate this task).
Keep changes in the libtgvoip library to a minimum.
Keep external dependencies in both apps to a minimum.

VoIP Contest API

To help you test your application, we created a simple API with one GET-endpoint which can initialize calls and provide VoIP Relay addresses and tags, encryption keys and config data. To work with the API you'll need a unique {auth_token} in the format 12345:c56d836270d512a6ac1a6ef78c4132ba14.

You can get your token from @jobs_bot (the token was included with the contest announcement – or will be sent to you by the bot as soon as you join this contest).

Each new call is defined by a random string of 1-128 bytes, passed in the GET parameter call. Repeating the request with the same call value within 300 seconds will return an identical result.

GET https://api.contest.com/voip{auth_token}/getConnection?call={call}
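As a minimal sketch of how such a request URL could be assembled, the Python snippet below generates a random call value and fills in the URL template shown above. The helper names new_call_id and connection_url are hypothetical and are not part of the contest API; the token is inserted unmodified, exactly as the template shows.

```python
import secrets
from urllib.parse import quote

API_BASE = "https://api.contest.com"  # host taken from the URL template above


def new_call_id(n_bytes: int = 16) -> str:
    """Return a random hex string (2 * n_bytes characters) to identify a call."""
    return secrets.token_hex(n_bytes)


def connection_url(auth_token: str, call: str) -> str:
    """Fill in the getConnection URL template for the given token and call value."""
    if not 1 <= len(call.encode("utf-8")) <= 128:
        raise ValueError("call must be 1-128 bytes long")
    # The call value is percent-encoded so that arbitrary strings stay valid in a URL.
    return f"{API_BASE}/voip{auth_token}/getConnection?call={quote(call, safe='')}"
```

Requesting the same URL again within the 300-second window should return the same relay, tags and key, so both sides of a test call can be configured from one call value.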

The response will have the following structure:

{  
   "ok":true,
   "result":{
      "id":1117254731,
      "date":1570802828,
      "p2p_allowed":false,
      "config":{
         "audio_frame_size":60,
         ...
         "audio_strong_fec_bitrate":7000
      },
      "encryption_key":"61362f271d0e13c59...c5eb9ba9ca9daa06c9f",
      "endpoints":[  
         {  
            "id":"2377378803836",
            "ip":"134.209.176.124",
            "port":"553",
            "peer_tags":{  
               "caller":"fedd4f39ea...89991f1d7b1",
               "callee":"fabd4f39ea...89991f1d7b1"
            }
         }
      ]
   }
}
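The response above can be turned into tgvoipcall arguments roughly as follows. This is a sketch under assumptions: the function names parse_connection and call_args are hypothetical, only the first entry of endpoints is used, and the argument order follows the command lines in the Tgvoipcall Interface section.

```python
import json


def parse_connection(response_text: str) -> dict:
    """Extract the relay address, key, tags and config from a getConnection response."""
    data = json.loads(response_text)
    if not data.get("ok"):
        raise RuntimeError("API returned ok=false")
    result = data["result"]
    endpoint = result["endpoints"][0]  # assumption: use the first relay listed
    return {
        "relay": f'{endpoint["ip"]}:{endpoint["port"]}',
        "key": result["encryption_key"],
        "tags": endpoint["peer_tags"],  # {"caller": ..., "callee": ...}
        "config": result["config"],     # to be saved as config.json
    }


def call_args(conn: dict, role: str, in_path: str, out_path: str,
              config_path: str = "config.json") -> list:
    """Build the argument list for one tgvoipcall instance (role is caller or callee)."""
    return ["tgvoipcall", conn["relay"], conn["tags"][role],
            "-k", conn["key"], "-i", in_path, "-o", out_path,
            "-c", config_path, "-r", role]
```

The config field would be written to config.json (for example with json.dump) before either instance is launched.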

Tgvoipcall Interface

Two clients, A (caller) and B (callee) are transmitting the sounds sound_A.opus and sound_B.opus respectively.

In the example below, the file config.json contains the configuration from the config field in the API's response.

The program tgvoipcall is launched simultaneously in two instances (one for each participant of the call) with the following parameters for A and B, respectively:

tgvoipcall reflector:port tag_caller_hex -k encryption_key_hex -i /path/to/sound_A.opus -o /path/to/sound_output_B.opus -c config.json -r caller
tgvoipcall reflector:port tag_callee_hex -k encryption_key_hex -i /path/to/sound_B.opus -o /path/to/sound_output_A.opus -c config.json -r callee

e.g.
$ tgvoipcall 134.209.176.124:553 fedd4f39e89991f1d7b1 -k 61362f271d0e13c59...c5eb9ba9ca9daa06c9f -i sound_A.opus -o sound_output_B.opus -c config.json -r caller
{"libtgvoip_version":"2.4.4","log_type":"call_stats","network":{"type":"wifi"},"p2p_type":"inet","packet_stats":{"in":1049,"lost_in":4,"lost_out":0,"out":1042},"pref_relay":"2243506735106","problems":[],"protocol_version":9,"relay_rtt":87,"rtt":5,"tcp_used":false,"udp_avail":false}

tgvoipcall establishes a connection with the other participant and transmits its audio file in full, then waits for 3 seconds to finish receiving data from the other participant (you should choose audio files of similar lengths). Once done, the program exits and writes call statistics to STDOUT (see VoIPController::GetDebugLog).

The sound received from the other party should be encoded to an OPUS file with the following settings: mono, bit_rate 64000, sample_rate 48000 (see TDesktop Source).

If it's not possible to establish a connection within the timeout limit (5 seconds), the application writes an error to STDERR and exits with a non-zero exit code.
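A test harness for the behaviour described above might launch both instances at once, treat a non-zero exit code as a failed call, and parse the JSON statistics from STDOUT. The sketch below assumes that STDOUT contains only the JSON object; run_pair and its overall timeout are hypothetical and not part of the task.

```python
import json
import subprocess


def run_pair(cmd_a, cmd_b, timeout=120.0):
    """Run two commands in parallel and return their parsed JSON outputs."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        for cmd in (cmd_a, cmd_b)
    ]
    try:
        results = []
        for proc in procs:
            out, err = proc.communicate(timeout=timeout)
            if proc.returncode != 0:
                # The error text is expected on STDERR.
                raise RuntimeError(f"exit code {proc.returncode}: {err.strip()}")
            results.append(json.loads(out))
        return tuple(results)
    finally:
        # Make sure no instance is left running if the other one failed or timed out.
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
                proc.communicate()
```

Both argument lists would normally be the caller and callee command lines for the same call, so that the two instances connect to each other through the relay.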

Tgvoiprate Interface

A tgvoiprate launch should not be linked to a previous launch of tgvoipcall, as the sound files may be generated by a different application.

$ tgvoiprate /path/to/sound_A.opus /path/to/sound_output_A.opus
4.6324
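The rating criteria are left open, so the following is only a toy illustration of mapping a similarity measure onto the 1.0-5.0 scale, not a validated quality metric. It assumes both files have already been decoded to lists of PCM samples (decoding is not shown), compares per-frame RMS energy envelopes with a Pearson correlation, and maps the result linearly to a rating. The names frame_energies and rate, and the 960-sample frame (20 ms at 48 kHz), are choices made for this sketch.

```python
import math


def frame_energies(samples, frame_len=960):
    """RMS energy of each complete frame of frame_len samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def rate(original, received, frame_len=960):
    """Map the correlation of the two energy envelopes to a rating in [1.0, 5.0]."""
    a = frame_energies(original, frame_len)
    b = frame_energies(received, frame_len)
    n = min(len(a), len(b))
    if n < 2:
        return 1.0  # too little audio to compare
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat envelope (e.g. silence) carries no usable signal
    corr = cov / math.sqrt(var_a * var_b)
    # Negative correlation is treated the same as no correlation.
    return 1.0 + 4.0 * max(0.0, min(1.0, corr))
```

A real submission would also need to handle time shifts between the two recordings and would likely use a perceptually motivated measure; this sketch ignores both.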

If for some reason you can't avoid external dependencies, you may list them in a text file (see deb-packages.txt below). These dependencies will be installed using sudo apt-get install ... before your app is tested. Available package names and versions can be found here.

Notes

We recommend using netem for testing under different network conditions.
P2P connections are outside the scope of the contest; you should disable P2P using the library parameters.
IPv6 connections are outside the scope of the contest; IPv6 will be unavailable on our server during testing.
OPUS files from the sample set are encoded with different settings; you should check that each of them works.
The applications will be tested under Debian GNU/Linux 10.1 (buster), x86-64.
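For the netem recommendation in the notes, a small helper can assemble the tc command line for a given set of impairments. This is a sketch: the function netem_command and the interface name eth0 are assumptions, the values shown are arbitrary, and running the resulting command requires root privileges. Note that netem's reorder option only takes effect together with a delay.

```python
NETEM_OPTIONS = ("delay", "loss", "duplicate", "reorder", "rate")


def netem_command(iface, **props):
    """Build a 'tc qdisc add ... netem' argument list from option strings."""
    unknown = set(props) - set(NETEM_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported netem options: {sorted(unknown)}")
    cmd = ["tc", "qdisc", "add", "dev", iface, "root", "netem"]
    for name in NETEM_OPTIONS:
        if name in props:
            cmd.append(name)
            # A value such as "100ms 20ms" becomes two separate arguments.
            cmd.extend(props[name].split())
    return cmd


# Example: 100 ms delay with 20 ms jitter plus 5% packet loss on eth0.
example = netem_command("eth0", delay="100ms 20ms", loss="5%")
```

The list can be passed to subprocess.run, and the impairment is removed again with tc qdisc del dev eth0 root.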

Your submission should be a ZIP file with the following structure:

result.zip
  -> tgvoipcall - an executable binary file with the interface as described above
  -> tgvoiprate - an executable binary file with the interface as described above
  -> src - a directory with the source code of the applications
  -> deb-packages.txt - a text file with line-break separated debian package names

Evaluation Criteria

To win in this round you will need to:

Complete the entire task
Pay special attention to the notes
Keep your code compact and efficient (size matters!)
Minimize external dependencies

When evaluating tgvoiprate submissions, we will first choose several high quality tgvoipcall submissions and use these applications to generate distorted sound samples.

When generating sound samples, netem will be used to emulate the properties of wide area networks (rate control, delays, packet loss, duplication, reordering) and to generate distortions up to complete degradation of the audio signal.
Up to two different properties will be affected in each individual test run.
The resulting audio files will be offered to a group of testers, who will rate their quality with integers from 1 to 5.
We will then compute a modified squared error between the application's ratings x_i and the mean human ratings y_i using the formula sum(max(|x_i - y_i| - 0.3, 0.0)^2).
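The scoring formula can be written out directly. In this sketch the function name modified_squared_error is hypothetical; the 0.3 tolerance is the constant from the formula, meaning that deviations of up to 0.3 from the mean human rating are not penalised.

```python
def modified_squared_error(app_ratings, human_means, tolerance=0.3):
    """sum(max(|x_i - y_i| - tolerance, 0.0)^2) over paired ratings."""
    if len(app_ratings) != len(human_means):
        raise ValueError("both rating lists must have the same length")
    return sum(
        max(abs(x - y) - tolerance, 0.0) ** 2
        for x, y in zip(app_ratings, human_means)
    )


# First pair differs by 0.2 (within tolerance, contributes 0);
# second pair differs by 1.0, contributing (1.0 - 0.3)^2, about 0.49.
score = modified_squared_error([4.0, 3.0], [4.2, 2.0])
```

Lower values are better, and a score of 0 means every rating was within 0.3 of the testers' mean.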