VoIP Stage 3: Results
During evaluation, we thoroughly inspected the source code of each contestant's proposed changes. Please refer to contest.com for short summaries and our comments on every submission.
To test call quality, we launched two instances of each library on two different hosts. Custom reflectors (where needed) were launched on the same network. The following network conditions were used (including some of the conditions that the contestants themselves used to test their libraries):
Alias | Conditions |
---|---|
WiFi | ->networkType('wifi') |
3G1 | ->loss(9, 20)->rateControl('44kbit')->networkType('hspa') |
3G2 | ->loss(17)->rateControl('29kbit')->networkType('3g') |
3G3 | ->loss(12, 3)->rateControl('32kbit')->networkType('3g') |
3G4 | ->loss(18)->rateControl('32kbit')->networkType('3g') |
3GDelay | ->loss(17, 5)->delay(500, 50)->networkType('3g') |
EDGE1 | ->loss(11)->rateControl('24kbit')->networkType('3g') |
EDGE2 | ->loss(15, 5)->rateControl('19kbit')->networkType('edge') |
GPRS1 | ->loss(20, 5)->rateControl('17kbit')->networkType('gprs') |
GPRS2 | ->loss(19, 5)->rateControl('14kbit')->networkType('gprs') |
GPRS3 | ->loss(40, 5)->delay(500, 50)->rateControl('8kbit')->networkType('gprs') |
GPRS1-3G | ->loss(20, 5)->rateControl('17kbit')->networkType('gprs')->after(3)->loss(9, 20)->rateControl('44kbit') |
GPRS1-WiFi | ->loss(20, 5)->rateControl('17kbit')->networkType('gprs')->after(3) |
WiFi-GPRS1 | ->networkType('wifi')->after(3)->loss(20, 5)->rateControl('17kbit') |
3G-GPRS1 | ->loss(9, 20)->rateControl('44kbit')->networkType('3g')->after(3)->loss(20, 5)->rateControl('17kbit') |
3GOutage | ->loss(3, 10)->rateControl('64kbit')->networkType('3g')->after(3)->loss(20)->rateControl('8kbit')->after(5)->rateControl('64kbit') |
WiFiOutage | ->networkType('wifi')->after(5)->loss(20)->rateControl('8kbit')->after(3) |
As you can see, several new conditions with network changes during the call were added after Round 2.
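The exact shaping tool behind the condition chains above is not public, but on Linux such conditions are typically emulated with `tc`: packet loss and delay via the `netem` qdisc, and bandwidth caps via `tbf`. The sketch below is a hypothetical illustration of how a chain like `->loss(17)->rateControl('29kbit')` (alias 3G2) could translate into `tc` commands; the function and parameter names are our own, not part of the contest tooling.

```python
# Hypothetical mapping from a condition chain to Linux traffic-control
# commands. Loss/delay go to a netem qdisc; the rate cap goes to a tbf
# qdisc chained under it. Burst/latency values are illustrative.
def tc_commands(iface, loss_pct=None, loss_corr=None, rate=None,
                delay_ms=None, jitter_ms=None):
    """Build `tc` command strings emulating one network condition."""
    netem = []
    if loss_pct is not None:
        netem.append(f"loss {loss_pct}%" + (f" {loss_corr}%" if loss_corr else ""))
    if delay_ms is not None:
        netem.append(f"delay {delay_ms}ms" + (f" {jitter_ms}ms" if jitter_ms else ""))
    cmds = []
    if netem:
        cmds.append(f"tc qdisc add dev {iface} root handle 1: netem " + " ".join(netem))
    if rate:
        parent = "parent 1: handle 2:" if netem else "root handle 1:"
        cmds.append(f"tc qdisc add dev {iface} {parent} tbf "
                    f"rate {rate} burst 32kbit latency 400ms")
    return cmds

# Alias "3G2": 17% loss, 29 kbit/s cap
for cmd in tc_commands("eth0", loss_pct=17, rate="29kbit"):
    print(cmd)
```

Chains with `->after(n)` (e.g. 3GOutage) would then simply swap the installed qdiscs after the given number of seconds.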
To get numerical ratings, we used tgvoiprate provided in the tgvoip-test-suite, as well as several other raters from the winners of the first round. The final rating for each call was a weighted sum of these ratings on a 1.0–5.0 scale. Failed calls received a score of 1.0.
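As a rough sketch, the scoring scheme described above can be expressed as a weighted mean clamped to the rating scale. The weights and rater names below are hypothetical; the contest's exact weighting was not published.

```python
# Sketch of combining per-rater scores into one call rating.
# Rater names and weights are illustrative assumptions.
def combined_rating(scores, weights, failed=False):
    """Weighted mean of rater scores, clamped to the 1.0-5.0 scale."""
    if failed:
        return 1.0  # failed calls score the minimum
    total = sum(weights[name] * s for name, s in scores.items())
    rating = total / sum(weights.values())
    return max(1.0, min(5.0, rating))

weights = {"tgvoiprate": 0.5, "rater_a": 0.3, "rater_b": 0.2}
print(combined_rating({"tgvoiprate": 4.2, "rater_a": 3.8, "rater_b": 4.0}, weights))
```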
Mean scores for each contest entry by Network Alias:
While the scores above played an important role, we also judged entries by the progress each contestant made since Round 2 and by manually inspecting the resulting audio samples. All of this resulted in significant changes to the leaderboard. Where relevant, our findings are reflected in the comments on contest.com. You are welcome to repeat our experiments by downloading the data set and audio files below.
In addition, we built the entries that seemed to offer significant quality improvements for Android and tested calls on real devices to double-check how well the libraries worked.
Traffic and latency metrics were collected and analyzed in a similar way. These metrics may help further improve the libraries or reveal the causes of failed calls. The full analysis was performed in a Jupyter Notebook, which is provided here: PDF Version 1.3MB | HTML Version 2.3MB
For your convenience, raw CSV and the Notebook source are also provided here:
Download Call Metrics, Ratings and Notebook: VoIP-Stage3-Dataset.tar.gz (2MB)
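For readers exploring the CSV data, the per-alias means shown above can be reproduced with a simple group-by. The column names (`alias`, `rating`) in this sketch are assumptions; check the actual schema inside VoIP-Stage3-Dataset.tar.gz.

```python
# Sketch: mean call rating per network alias from CSV records.
# Uses an inline sample; replace `sample` with the dataset's real CSV,
# whose column names may differ from the assumed "alias"/"rating".
import csv
import io
from collections import defaultdict
from statistics import mean

sample = """alias,entry,rating
3G2,libA,3.1
3G2,libB,2.7
WiFi,libA,4.6
WiFi,libB,4.4
"""

by_alias = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    by_alias[row["alias"]].append(float(row["rating"]))

means = {alias: round(mean(v), 2) for alias, v in by_alias.items()}
print(means)
```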
Audio samples are provided in a separate archive due to their large size:
Download Call Audio: VoIP-Stage3-Audio.tar.gz (1.9GB)