VoIP Contest: Stage 3

General info about the contest is available on @contest.

The task in this round of the VoIP Contest is to build a system for making high-quality audio calls between two participants, using a predefined interface and the testing tools provided.

Everyone is welcome to participate, including contestants who didn’t take part in the first two rounds of the VoIP Contest. The deadline for this stage is in three weeks: 20:59:00 UTC on March 30, 2020.

Differences from the previous round

If you participated in Round 2 of this contest, please note the following changes:

Updated base version of the libtgvoip library (use commit 6d21427). The new version features some of the changes suggested by the participants of the second round, as well as some minor changes to the public interface TgVoip.h required for implementing it in Telegram mobile clients and tgvoipcall.
Updated tgvoip-test-suite. Download the new version on GitHub.
— Added new tools for rating calls (used some of the apps created during the first round)
— Added latency monitoring during calls. For an example of latency analysis, check out the Jupyter Notebook with the results of the second round: VoIP-Stage2-Dataset.tar.gz.
— Added support for calls between two different physical servers.
— Updated the netns configuration script tests/setup-netns.sh (launch before you start testing).
— Changed the output of call.php and mean.php, which now show two rating tables: by library/rater version and by library version/network conditions.

You Will Use

The libtgvoip library. For the purposes of the contest, please use the version of the code at commit 6d21427.
A public interface for calls between two users in the header file TgVoip.h. The testing client tgvoipcall connects to the library using this interface; the same interface will also be implemented in Telegram mobile clients.
A set of tools and sound samples that can be used for testing calls and evaluating their quality.
You will not need the Telegram API.

The Task

Improve the implementation of voice calls using one of the possible strategies.
Keep the public interface TgVoip.h unchanged.

Conditions

You must only use C++ (except for compilation scripts and testing), with code portability in mind. Bonus points will be awarded if your library can be built for Android.
It is acceptable to modify the data transfer protocol, use an alternative protocol, or even create your own. End-to-End Encryption must be preserved (plaintext voice data must never reach any server; the encryption key must be derived from the encryption_key passed to the library). Compatibility with the current implementation will bring bonus points.
It is acceptable to ignore the address and tag parameters of the Telegram Relay server, and instead use your own or a third-party Relay server (STUN, TURN, xirsys, etc.). When publishing your submission, we recommend hosting your server in Amsterdam or Central Europe to minimize latency for our judges during testing.
You should keep external dependencies to a minimum.
Third-party code may be used only if it's published under GPL-compatible licenses.
The following functionality is outside of the scope of this contest: P2P (TgVoipEndpointType::Inet, TgVoipEndpointType::Lan), TCP Relay (TgVoipEndpointType::TcpRelay), SOCKS proxy (TgVoipProxy), Data saving mode (TgVoipDataSaving), Traffic stats (TgVoipTrafficStats), as well as the functions setGlobalServerConfig, onSignalBarsUpdated, setMuteMicrophone, and the flag TGVOIP_USE_CUSTOM_CRYPTO. They will not be taken into account or used during testing and may be ignored.
Your app must support TgVoipState, onStateUpdated and TgVoipAudioDataCallbacks.
You must pay close attention to the thread safety of your code, avoid deadlocks, etc.
Participants of the previous stage should focus on the priorities outlined by our judges in their comments on contest.com. Other participants are welcome to choose any of the possible approaches.
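
As an illustration of the thread-safety condition above, here is a minimal sketch of one common way to avoid lock-order deadlocks when two threads need the same pair of mutexes. All names in it (CallState, onPacket, onFrame, runStress) are hypothetical and are not part of libtgvoip or TgVoip.h; it only shows the general technique of acquiring several mutexes through std::lock rather than locking them one by one in differing orders.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical state shared between a network thread and an audio thread.
struct CallState {
    std::mutex networkMutex;  // guards packetsReceived
    std::mutex audioMutex;    // guards framesPlayed
    int packetsReceived = 0;
    int framesPlayed = 0;
};

// Called from the (hypothetical) network thread.
void onPacket(CallState& s) {
    // std::lock acquires both mutexes using a deadlock-avoidance algorithm,
    // so the order of the arguments does not matter.
    std::lock(s.networkMutex, s.audioMutex);
    std::lock_guard<std::mutex> a(s.networkMutex, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.audioMutex, std::adopt_lock);
    ++s.packetsReceived;
}

// Called from the (hypothetical) audio thread; note the opposite order.
void onFrame(CallState& s) {
    std::lock(s.audioMutex, s.networkMutex);
    std::lock_guard<std::mutex> a(s.audioMutex, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.networkMutex, std::adopt_lock);
    ++s.framesPlayed;
}

// Runs both handlers concurrently and returns the total number of updates.
int runStress(int iterations) {
    assert(iterations >= 0);
    CallState s;
    std::thread net([&s, iterations] {
        for (int i = 0; i < iterations; ++i) onPacket(s);
    });
    std::thread audio([&s, iterations] {
        for (int i = 0; i < iterations; ++i) onFrame(s);
    });
    net.join();
    audio.join();
    return s.packetsReceived + s.framesPlayed;
}
```

Calling runStress(10000) from a test program exercises both handlers at once. If each handler instead locked the two mutexes one after another in its own order, the same test could hang.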

Notes

P2P must be disabled using the library parameters.
IPv6 connections are outside the scope of the contest; IPv6 will be unavailable on our server during testing.
We recommend using the UDP protocol.
Submissions will be tested under Debian GNU/Linux 10.1 (buster), x86-64. Kindly ensure that your library works on a clean setup before submitting.
We will only test calls between clients using code from the same submission. Compatibility with the current Telegram implementation is optional (but will positively affect the judges' opinion).
Clients from one call may be launched from different physical servers.

Your final submission should be a ZIP file with the following structure:

submission.zip
  -> libtgvoip.so - compiled library module
  -> README - build instructions, description of what you implemented
  -> src - a directory with your source code
  -> deb-packages.txt - a text file listing dependencies as Debian package names, one per line

Possible Approaches

There are several ways of approaching the contest task. Each of them has its benefits and may get you a prize, provided a sufficient amount of quality work goes into your submission. Here's what you can do, sorted from the most cautious to the most ambitious:

Find and fix issues in the current implementation of the library. For example, eliminate potential deadlocks or identify network conditions where voice quality or latency can be improved. For each issue you've identified, provide reproduction steps and a detailed description of the solution you implemented.
Rewrite the library, maintaining compatibility with the current clients (leaving network protocols unchanged).
Use third-party libraries and protocols for your implementation, losing compatibility. Optionally, you could also use your own Relay server.

Evaluation Criteria

To win in this round you will need to:

Meet all conditions
Pay special attention to the notes
Keep your code compact and efficient (size matters!)
Minimize external dependencies

All submissions will be tested on the same input data (audio samples and network conditions). Output files will be passed to tgvoiprate; the resulting ratings will be noted by the judges but will not be the main criterion in the final scoring.

Test Suite

To help you test your library during development, we include a selection of tools and sound samples. Download it from GitHub.

These tools allow you to:

Simulate VoIP calls using a binary .so file with the library. Instead of real microphone input, sound samples with speech or silence are used for each of the call participants. Output on the recipient's side is in turn recorded to an audio file.
Programmatically choose network conditions for one of the participants: packet loss, high latency, or limited bandwidth. Programmatically modify these conditions during the call.
Receive a numerical score for the quality of the output audio file on the recipient's side, compared to the original file and the preprocessed file that was actually sent over the network.
Write input and output filenames with respective scores to a CSV file for further analysis.
Control the above using PHP-scripts.
Calculate aggregated scores grouped by library version.

The instructions below are relevant for Debian GNU/Linux 10.1 (buster), x86-64.

Preparing your system

1. Install deb-packages:

$ sudo apt-get install php libssl-dev libopus-dev libavcodec-dev libavutil-dev libavformat-dev libavresample-dev libasound-dev python3 python-pip opus-tools libopusfile-dev pocketsphinx build-essential automake libtool libsphinxbase-dev libpocketsphinx-dev pkg-config sphinxbase-utils sphinxtrain libopusfile0 ffmpeg

2. Netem is used to emulate different network conditions, and ip-netns is used to apply these conditions to just one of the two processes (the caller's side). The following commands must be run once before the first launch:

$ sudo bash ./tests/setup-netns.sh
$ sudo tc qdisc list

This will set up the network namespace client1 with the virtual network interface v-peer1, which is needed for the calling software to work. Under normal conditions, you won't need to call it again. If any of the commands results in an error, you may need to install/enable netem or ip-netns on your system.

3. We'll often need to modify network conditions via netem, which requires root access/sudo, so it is necessary to disable password prompts for sudo when launching those commands. To do this:

$ whoami
  user
$ sudo visudo

At the end of the file, add this line:

user ALL=(ALL) NOPASSWD: /usr/bin/ip netns exec client1 *

replacing user with the output of the whoami command. Save the file.

4. Lastly, you need to set your token to work with the VoIP Contest API. To do this, replace the default value 111222333444:AAABBBCCCDDD in the tests/token.php file with the token value you received from @jobs_bot when joining this round of the contest.

Call + Rate

The main file that contains the testing script is tests/call.php. It manages the list of library .so files to be tested, number of call iterations, and sets of network conditions. The file is pre-filled with a sample scenario; we recommend modifying it according to your specific testing needs.

To launch:

$ php tests/call.php

This will run the complete test scenario. It might take a long time, since each iteration can take about 10-20 seconds depending on the chosen audio sample's duration. The folders preprocessed and out will be filled with files containing audio data sent over the network and received by the recipient. Each call will be immediately rated, with the results written to a .csv file. If you’re accessing the server over SSH, we recommend using nohup/screen to ensure that long testing sessions don’t get interrupted by connection losses.

At the end, the script will also output average call ratings for each of the library versions tested, e.g.:

Version newunstable (12 ratings)
=============================================
ScoreFinal:         mean 3.58, stddev: 1.136
ScoreCombined:      mean 3.522, stddev: 1.265
ScoreOutput:        mean 3.713, stddev: 1.315
Score1010:          mean 3.615, stddev: 1.189
Score1012:          mean 3.298, stddev: 1.621
Score1002:          mean 2.413, stddev: 1.009
Score997:           mean 3.807, stddev: 0.923

....

Scores by network

Network             |   stable   |  unstable  |newunstable 
===========================================================
WiFi                |   4.604    |   4.798    |   4.794    
3G1                 |   4.874    |   4.636    |   4.727    
3G2                 |    4.48    |   3.722    |   4.242    
3G3                 |   4.822    |   4.023    |   4.209    
3G4                 |   4.369    |   3.696    |   4.274    
3GDelay             |     4      |   3.987    |   3.792    
3GOutage            |    3.97    |   3.877    |   4.029    
EDGE1               |   3.355    |   3.871    |   4.183    
EDGE2               |   2.859    |   2.366    |   2.564    
GPRS1               |   2.316    |   3.088    |   2.982    
GPRS2               |   1.752    |   1.651    |   1.785    
GPRS3               |   1.367    |   1.154    |   1.374    
-----------------------------------------------------------
Overall             |   3.564    |   3.406    |    3.58 

....

Here ScorePreprocess shows the degradation of sound quality in the preprocessed file compared to the input sample, on a scale of 1.0-5.0, where 1.0 means complete degradation and 5.0 means unchanged.
ScoreOutput shows the degradation of sound quality on the recipient’s side compared to the preprocessed file on the sender’s side, on the same 1.0-5.0 scale.
ScoreCombined shows the degradation of sound quality on the recipient’s side compared to the original input file on the sender’s side. This is a sort of aggregated rating for ScorePreprocess and ScoreOutput, on the same 1.0-5.0 scale.
The values Score1010, Score1012, Score1002 represent ScoreCombined values from different implementations of the rater.
ScoreFinal is the weighted sum of the above values; it should be used as your primary indicator for evaluating call quality.
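
To make the idea of a weighted score concrete, here is a minimal sketch of combining several rater scores into one value. The weights used by the test suite are not stated on this page, so the function name weightedScore and any weights passed to it are purely hypothetical. The sketch also assumes the result is normalised by the total weight so that it stays on the 1.0-5.0 scale.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted combination of rater scores.
// The weights are hypothetical: the real ones are not given in this document.
double weightedScore(const std::vector<double>& scores,
                     const std::vector<double>& weights) {
    assert(scores.size() == weights.size() && !scores.empty());
    double sum = 0.0;
    double weightTotal = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        sum += scores[i] * weights[i];
        weightTotal += weights[i];
    }
    // Normalise so the result stays on the same scale as the inputs.
    return sum / weightTotal;
}
```

For example, weightedScore({4.0, 2.0}, {0.75, 0.25}) gives 3.5: the first rater dominates because it carries three times the weight of the second.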

You can always get these stats from the .csv file later:

$ php tests/mean.php
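
If you prefer to post-process the ratings yourself, the per-version summary can be approximated along the following lines. This is only a sketch: it takes rows that have already been parsed into (version, score) pairs, because the column layout of the .csv file is not described here. It also assumes the population standard deviation, which may differ from what mean.php computes. The names Stats, computeStats and summarize are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Stats {
    std::size_t count = 0;
    double mean = 0.0;
    double stddev = 0.0;  // population standard deviation (an assumption)
};

// Computes count, mean and population stddev of a non-empty list of scores.
Stats computeStats(const std::vector<double>& values) {
    assert(!values.empty());
    Stats s;
    s.count = values.size();
    double sum = 0.0;
    for (double x : values) sum += x;
    s.mean = sum / static_cast<double>(values.size());
    double squares = 0.0;
    for (double x : values) squares += (x - s.mean) * (x - s.mean);
    s.stddev = std::sqrt(squares / static_cast<double>(values.size()));
    return s;
}

// Groups (version, score) rows by library version and summarises each group.
std::map<std::string, Stats> summarize(
        const std::vector<std::pair<std::string, double>>& rows) {
    std::map<std::string, std::vector<double>> groups;
    for (const auto& row : rows) groups[row.first].push_back(row.second);

    std::map<std::string, Stats> result;
    for (const auto& group : groups) {
        result[group.first] = computeStats(group.second);
    }
    return result;
}
```

Grouping by a (version, network) pair instead of the version alone would give the second kind of table shown above.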

To clear all accumulated preprocessed and output files as well as the .csv file you can run:

$ sh tests/clean.sh

Suggested usage

One of the possible ways of using the test suite while developing your library could be as follows.

1. Develop your library in a separate directory. Write verbose logs to stdout/stderr during development.
2. Build the library after each significant change, run git commit, and copy the output library .so file into the Test Suite's lib subdirectory with a unique name, e.g. libtgvoip.COMMIT.so. Then add it to the list of libraries to be tested in tests/call.php.
3. Launch a testing session for the changes in the background, while continuing development (since the testing process might take a while).
4. Inspect the library output logs (they are written to the corresponding .log files in the out subfolder); if necessary, listen to the audio files from the lowest-rated calls.
5. Find and fix bugs. Return to step 2.

Once call quality reaches satisfactory levels, focus on finding new network conditions where quality degrades significantly, e.g. highly variable bandwidth or packet loss, and fix the causes where possible.