Evaluating Project Mirroring by Cost vs. Profit

People operate mirrors for different reasons: purely altruistic support of open source communities, reducing incoming traffic by providing local clients with an internal package source, or tilting the in/out ratio of the internet uplink. Whatever the reasons, they all come with outbound traffic and local storage use, in addition to tying up other hardware and administrative resources. We will consider incoming traffic and storage requirements as the costs, and outgoing traffic as the profit (to the general public or an organization) that a mirror service provides.

Looking around at what other mirror operators publish, it is very hard to find useful information to guide mirror selection. This means that if you plan on hosting a new project, you have to rely on guesswork to estimate cost vs. profit.

Although cost and profit are terms usually associated with currency-based units, we will not estimate them using any such metric. Instead, we use gigabytes (GB) as the unit for storage and traffic, and therefore for cost and profit.

By combining logged data from the ~50 mirrors we currently host, we are in a position to quantify both cost and profit for each of them. Be aware that the profit often depends on geographic location, since many projects use a topology- and geography-aware load balancer such as MirrorBrain.

Let's start off with the incoming traffic cost: the average number of GB that need to be synced from the upstream mirror to ftp.fau.de per day.

Columns: Mirror, Incoming Traffic in GB per day (average)
kiwix 16.63
fedora 4.29
qtproject 3.73
debian 2.65
freebsd 2.63
ubuntu-ports 2.52
eclipse 1.83
cdn.media.ccc.de 1.80
deepin-cd 1.54
ubuntu 1.51
debian-cd 1.28
mageia 1.18
opensuse 1.01
macports 0.95
scientific 0.94
archlinux 0.90
trinity 0.87
macports/packages 0.83
netbsd 0.81
turnkeylinux 0.75
fosdem 0.67
gentoo 0.64
deepin 0.45
centos 0.32
xbmc 0.27
epel 0.27
packman 0.27
apache 0.21
openvz 0.20
osmc 0.19
mint 0.19
tdf 0.16
mint/iso 0.16
macports/distfiles 0.12
pclinuxos 0.09
openelec 0.07
opensource-dvd 0.07
ubuntu-releases 0.05
raspbmc 0.05
ripe.net 0.05
gnome 0.03
debian-backports 0.03
videolan 0.02
ctan 0.02
mint/packages 0.02
opencsw 0.02
CCC 0.02
knoppix-dvd 0.01
aminet 0.01
gimp 0.01
knoppix 0.00
grml 0.00
aptosid 0.00
mint/lmde-packages 0.00
gentoo-portage 0.00
macports/release 0.00
macports/trunk 0.00
putty 0.00
libreelec n/a

We calculated this by summing up all positive day-over-day changes in mirror size; the size is measured and logged once a day. Since we sync as often as hourly, files that are synced and deleted again within a day are not covered, nor are files that change in place.
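The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the sample sizes are hypothetical, and the input is assumed to be one total-size reading in GB per day.

```python
from typing import Sequence


def average_incoming_gb_per_day(daily_sizes_gb: Sequence[float]) -> float:
    """Estimate average incoming traffic from a daily log of mirror sizes."""
    if len(daily_sizes_gb) < 2:
        raise ValueError("need at least two daily size samples")
    # Day-over-day changes in total mirror size.
    deltas = [b - a for a, b in zip(daily_sizes_gb, daily_sizes_gb[1:])]
    # Only growth counts as incoming traffic; shrinkage means deletions.
    incoming = sum(d for d in deltas if d > 0)
    return incoming / len(deltas)


# Hypothetical sizes in GB, one sample per day (not real ftp.fau.de data).
sizes = [100.0, 102.0, 101.0, 101.0, 105.0]
print(average_incoming_gb_per_day(sizes))  # 1.5
```

The sample log grows by 2 GB and 4 GB on two of its four days, so the estimate is 6 GB over 4 days. A day on which 3 GB arrive and 3 GB are deleted would contribute nothing, which is the undercounting mentioned above.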

How about storage costs? Most mirrors stay around the same size with only a slight increase, so taking the average size works for most of them. Some, however, tend to grow (or even shrink) constantly, so the average will not be accurate in all cases.

Columns: Mirror, Storage Increment in GB per day (average), Total Storage in GB (maximum), Total Storage in GB (average)
kiwix 4.19 5690 3977
cdn.media.ccc.de 1.75 3189 2338
scientific 0.82 1602 1328
fedora 0.88 1712 1304
ubuntu-ports 1.22 1214 1153
debian 0.44 1347 1078
CCC 0.02 1073 1073
mageia 0.65 1139 890
ubuntu 0.20 916 773
mint -0.37 873 652
eclipse 0.43 824 630
macports 0.11 727 593
freebsd -1.12 1224 579
qtproject 0.59 731 517
turnkeylinux 0.34 724 517
opensuse 0.28 552 441
mint/lmde-packages -0.44 671 440
macports/packages 0.19 576 399
fosdem 0.57 394 370
netbsd -0.13 467 368
gentoo 0.07 329 288
debian-cd -0.05 679 280
macports/distfiles -0.08 258 193
mint/iso 0.06 206 190
gnome 0.03 175 174
archlinux -0.08 532 160
trinity 0.09 304 160
epel 0.06 155 117
openvz -0.26 172 105
centos 0.08 141 104
pclinuxos -0.03 97 93
packman 0.03 99 82
ripe.net 0.04 100 76
apache 0.07 117 69
deepin 0.25 257 67
xbmc 0.01 73 52
opensource-dvd 0.07 79 46
aminet 0.00 44 43
tdf 0.01 49 38
deepin-cd -0.01 583 36
videolan 0.02 37 31
debian-backports -0.04 34 30
opencsw 0.01 31 28
ctan 0.00 33 27
ubuntu-releases -0.01 41 25
mint/packages 0.01 30 22
gimp 0.01 22 19
osmc 0.09 30 16
knoppix-dvd 0.00 16 16
openelec 0.00 23 12
grml 0.00 12 12
aptosid 0.00 8 9
libreelec n/a 6 6
knoppix 0.00 5 6
raspbmc 0.01 22 6
gentoo-portage 0.00 1 1
macports/release 0.00 0 0
macports/trunk 0.00 0 0
putty 0.00 0 0
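The post does not spell out how the three storage columns were derived. One plausible reading, assumed here, is that the increment is the mean net change per day over the logging period, while maximum and average are taken over the same daily size readings. The names and sample values below are hypothetical.

```python
from statistics import mean
from typing import NamedTuple, Sequence


class StorageStats(NamedTuple):
    increment_gb_per_day: float  # mean net change per day (assumed definition)
    max_gb: float
    avg_gb: float


def storage_stats(daily_sizes_gb: Sequence[float]) -> StorageStats:
    """Summarize a daily log of total mirror sizes in GB."""
    if len(daily_sizes_gb) < 2:
        raise ValueError("need at least two daily size samples")
    days = len(daily_sizes_gb) - 1
    # Net change: deletions cancel out growth, unlike the incoming-traffic sum.
    net_per_day = (daily_sizes_gb[-1] - daily_sizes_gb[0]) / days
    return StorageStats(net_per_day, max(daily_sizes_gb), mean(daily_sizes_gb))


# Hypothetical shrinking mirror: it still pulls 1 GB on day 2 but loses more.
print(storage_stats([120.0, 118.0, 119.0, 110.0]).increment_gb_per_day < 0)  # True
```

Under this reading, a mirror can show a negative storage increment while still causing incoming traffic, as freebsd does in the tables (2.63 GB per day incoming, -1.12 GB per day increment).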

Let’s have a look at the profit side of things, the outgoing traffic, and put it into relation with the costs. The two scores on the right-hand side are calculated by dividing the average outgoing traffic by the average incoming traffic and by the average storage requirement, respectively:

Columns: Mirror, Outgoing Traffic in GB per day (average), Incoming Traffic in GB per day (average), Storage Increment in GB per day (average), Total Storage in GB (maximum), Total Storage in GB (average), Score Based on Traffic (outgoing vs. incoming), Score Based on Storage (outgoing vs. average storage)
osmc 478.40 0.19 0.09 30 16 2518 29.90
libreelec 56.90 n/a n/a 6 6 n/a 9.48
openelec 93.70 0.07 0.00 23 12 1339 7.81
videolan 176.60 0.02 0.02 37 31 8830 5.70
ctan 138.80 0.02 0.00 33 27 6940 5.14
ubuntu-releases 91.80 0.05 -0.01 41 25 1836 3.67
tdf 127.70 0.16 0.01 49 38 798 3.36
knoppix 13.40 0.00 0.00 5 6 n/a 2.23
mint/iso 350.20 0.16 0.06 206 190 2189 1.84
opensource-dvd 48.80 0.07 0.07 79 46 697 1.06
qtproject 494.50 3.73 0.59 731 517 133 0.96
grml 10.10 0.00 0.00 12 12 n/a 0.84
opensuse 365.00 1.01 0.28 552 441 361 0.83
centos 77.70 0.32 0.08 141 104 243 0.75
xbmc 35.60 0.27 0.01 73 52 132 0.68
pclinuxos 57.80 0.09 -0.03 97 93 642 0.62
deepin-cd 21.00 1.54 -0.01 583 36 14 0.58
knoppix-dvd 9.00 0.01 0.00 16 16 900 0.56
epel 57.10 0.27 0.06 155 117 211 0.49
packman 31.70 0.27 0.03 99 82 117 0.39
eclipse 238.90 1.83 0.43 824 630 131 0.38
mint/packages 6.90 0.02 0.01 30 22 345 0.31
archlinux 28.90 0.90 -0.08 532 160 32 0.18
macports/packages 66.40 0.83 0.19 576 399 80 0.17
fedora 195.00 4.29 0.88 1712 1304 45 0.15
kiwix 560.40 16.63 4.19 5690 3977 34 0.14
raspbmc 0.80 0.05 0.01 22 6 16 0.13
aptosid 1.00 0.00 0.00 8 9 n/a 0.11
aminet 4.70 0.01 0.00 44 43 470 0.11
macports/distfiles 18.90 0.12 -0.08 258 193 158 0.10
gentoo 26.70 0.64 0.07 329 288 42 0.09
turnkeylinux 46.10 0.75 0.34 724 517 61 0.09
cdn.media.ccc.de 185.50 1.80 1.75 3189 2338 103 0.08
opencsw 2.20 0.02 0.01 31 28 110 0.08
mageia 62.10 1.18 0.65 1139 890 53 0.07
apache 4.40 0.21 0.07 117 69 21 0.06
fosdem 22.70 0.67 0.57 394 370 34 0.06
deepin 3.60 0.45 0.25 257 67 8 0.05
ubuntu 41.00 1.51 0.20 916 773 27 0.05
debian 57.10 2.65 0.44 1347 1078 22 0.05
gimp 0.90 0.01 0.01 22 19 90 0.05
debian-cd 12.70 1.28 -0.05 679 280 10 0.05
CCC 25.80 0.02 0.02 1073 1073 1290 0.02
gnome 3.80 0.03 0.03 175 174 127 0.02
trinity 2.90 0.87 0.09 304 160 3 0.02
freebsd 8.20 2.63 -1.12 1224 579 3 0.01
openvz 1.00 0.20 -0.26 172 105 5 0.01
macports 4.80 0.95 0.11 727 593 5 0.01
scientific 9.30 0.94 0.82 1602 1328 10 0.01
ripe.net 0.40 0.05 0.04 100 76 8 0.01
netbsd 1.70 0.81 -0.13 467 368 2 0.00
ubuntu-ports 4.70 2.52 1.22 1214 1153 2 0.00
debian-backports 0.10 0.03 -0.04 34 30 3 0.00
mint 0.80 0.19 -0.37 873 652 4 0.00
mint/lmde-packages 0.30 0.00 -0.44 671 440 n/a 0.00
gentoo-portage 0.00 0.00 0.00 1 1 n/a 0.00
macports/release 0.00 0.00 0.00 0 0 n/a n/a
macports/trunk 0.00 0.00 0.00 0 0 n/a n/a
putty 0.00 0.00 0.00 0 0 n/a n/a
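As a sanity check on the two score columns, here is a minimal sketch that recomputes them for a single row. The helper name is hypothetical; returning `None` for a zero or unknown cost is an assumption meant to mirror the rows that carry no score in the table.

```python
from typing import Optional


def score(outgoing_gb_per_day: float, cost: Optional[float]) -> Optional[float]:
    """Outgoing traffic divided by a cost; None where the ratio is undefined."""
    if cost is None or cost == 0:
        return None
    return outgoing_gb_per_day / cost


# Values taken from the osmc row: 478.40 GB/day out, 0.19 GB/day in, 16 GB stored.
osmc_traffic = score(478.40, 0.19)
osmc_storage = score(478.40, 16)
print(round(osmc_traffic), round(osmc_storage, 2))  # 2518 29.9

# knoppix has 0.00 GB/day incoming, so its traffic score is undefined.
print(score(13.40, 0.00))  # None
```

The traffic score is unitless (GB per day over GB per day), while the storage score reads as GB served per day for each GB kept on disk.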

What can we learn from this? Small projects can produce a lot of traffic (osmc), especially if they have a small mirror network (libreelec) or a humongous user base (videolan). Big mirrors, on the other hand, often produce a lot of traffic in absolute terms (kiwix, cdn.media.ccc.de), but cannot compensate for their storage requirements. For example, the CCC media mirror would have to constantly saturate 2 Gbit/s to reach the same traffic/storage ratio as videolan. Although we love outgoing traffic, thankfully it does not. 🙂
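The bandwidth comparison can be checked with a rough back-of-the-envelope calculation. The sketch below assumes decimal units (1 GB = 8 Gbit) and takes videolan's storage score of 5.70 and the cdn.media.ccc.de storage figures from the table; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def required_gbit_per_s(target_score: float, storage_gb: float) -> float:
    """Sustained rate needed for outgoing/storage to reach target_score."""
    gb_per_day = target_score * storage_gb
    # Convert GB per day to Gbit per second (decimal units assumed).
    return gb_per_day * 8 / SECONDS_PER_DAY


avg = required_gbit_per_s(5.70, 2338)   # average storage of cdn.media.ccc.de
peak = required_gbit_per_s(5.70, 3189)  # maximum storage of cdn.media.ccc.de
print(f"{avg:.2f} {peak:.2f}")  # 1.23 1.68
```

Depending on which storage figure is used, the required sustained rate lands between roughly 1.2 and 1.7 Gbit/s, which is the same order of magnitude as the estimate given above.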