News about ftp.fau.de (ftp.uni-erlangen.de)

Mirrors generating most traffic in 2022

The total outgoing traffic of ftp.fau.de increased from 5.04 PiB in 2021 to 6.57 PiB in 2022, an increase of 30%.

| Rank | Mirror | Traffic 2022 in TB | Rank/Traffic 2021 (for comparison) |
|---|---|---|---|
| 1 | mint/iso (Linux distribution, disc images) | 1075 | 1 / 849 |
| 2 | centos (Linux distribution) | 780 | 10 / 160 |
| 3 | kiwix (offline Wikipedia) | 726 | 2 / 519 |
| 4 | fdroid (Free and Open-Source Android app repository) | 494 | 3 / 304 |
| 5 | qtproject (Qt Toolkit) | 295 | 4 / 275 |
| 6 | fedora (Linux distribution) | 293 | 8 / 186 |
| 7 | opensuse (Linux distribution) | 270 | 9 / 175 |
| 8 | eclipse | 238 | 7 / 193 |
| 9 | lineageos (Free and Open-Source Android distribution) | 189 | 6 / 199 |
| 10 | gimp (Free and Open-Source image manipulation program) | 173 | 5 / 266 |

The percentage of IPv6 traffic dropped slightly: it is now at 26% (1.68 PiB), after 27% in 2021, 27% in 2020, 25% in 2019, 22% in 2018, 22% in 2017, 14% in 2016 and 11% in 2015.
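The growth and IPv6 percentages quoted in these yearly posts can be reproduced from the totals given in the text. The following is only a small sketch of that arithmetic; the function names are made up for illustration, and the inputs are the rounded PiB figures from this post, so the results are approximate.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def share(part: float, total: float) -> float:
    """Part of a total, in percent."""
    return part / total * 100.0


# Rounded figures from the 2022 post, in PiB.
total_2021 = 5.04
total_2022 = 6.57
ipv6_2022 = 1.68

growth = percent_change(total_2021, total_2022)
ipv6_share = share(ipv6_2022, total_2022)

print(f"traffic growth 2021 -> 2022: {growth:.0f}%")   # 30%
print(f"IPv6 share of 2022 traffic: {ipv6_share:.0f}%")  # 26%
```

Because the published totals are already rounded to two decimals, recomputed percentages for other years may differ from the quoted ones by a point.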

There is very little change in the Top 10 overall. Sure, some mirrors got a little more or a little less traffic, and thus changed position in the Top 10, but the fluctuations are small and nothing really stands out. There is not a single new entry in the Top 10.

The notable exception seems to be CentOS, which jumped from rank 10 to 2. We’re a bit surprised by that, considering what has happened to this project in the last 2 years (you’ll need to read up on that elsewhere). The likely explanation is that after Red Hat turned this into an abomination of the original project, most other mirrors stopped mirroring it, and as one of the few remaining mirrors we now see a lot more traffic. That won’t last forever though: Newer versions of “CentOS Stream” (starting with 9) are distributed through a different mirror network now, and we neither are nor plan to be a part of that. We will only keep the existing mirror running as long as there are still CentOS 7 updates to distribute (supposedly until 2024-06-30, but after spontaneously pulling the plug on CentOS 8 with just a few months’ notice, would you really trust that company to keep their promises for CentOS 7?).

We will end this post by explaining why “gimp” is almost the only mirror whose total traffic (in TB) dropped from 2021 to 2022: In September 2022, we noticed that we were accidentally listed twice in GIMP’s automatic mirror redirector, meaning we served twice as many downloads as the other mirrors. Since that error has (sadly) been corrected now, we’re getting a lot less traffic from gimp.

Mirrors generating most traffic in 2021

The total outgoing traffic of ftp.fau.de increased from 4.12 PiB in 2020 to 5.04 PiB in 2021, an increase of 22%.

| Rank | Mirror | Traffic 2021 in TB | Rank/Traffic 2020 (for comparison) |
|---|---|---|---|
| 1 | mint/iso | 849 | 2 / 389 |
| 2 | kiwix (offline Wikipedia) | 519 | 1 / 538 |
| 3 | fdroid (Free and Open-Source Android app repository) | 304 | 5 / 242 |
| 4 | qtproject (Qt Toolkit) | 275 | 4 / 356 |
| 5 | gimp | 266 | 3 / 376 |
| 6 | lineageos (Free and Open-Source Android distribution) | 199 | 6 / 204 |
| 7 | eclipse | 193 | – / 99 |
| 8 | fedora | 186 | 8 / 148 |
| 9 | opensuse | 175 | 10 / 108 |
| 10 | centos | 160 | 7 / 159 |

The percentage of IPv6 traffic stagnated, with 27% (1.36 PiB), after 27% in 2020, 25% in 2019, 22% in 2018, 22% in 2017, 14% in 2016 and 11% in 2015.

There is very little change in the Top 10 overall. Sure, some mirrors got a little more or less traffic, and thus changed position in the Top 10, but the fluctuations are small and nothing really stands out.

The biggest change is with mint/iso, which was already in the Top 10 at Rank 2 last year, but secured Rank 1 in 2021 quite clearly, with a large gap to Rank 2. We are not sure what caused the traffic increase for that mirror. Our best guess is that they changed their download page to prefer torrents with webseeds, so that we are now seeing a certain percentage of each download.

Mirrors generating most traffic in 2020

It seems that I have not posted the traffic stats for 2020 yet. Well, better late than never, so here they are.

The total outgoing traffic of ftp.fau.de increased from 3.52 PiB in 2019 to 4.12 PiB in 2020, an increase of 17%.

| Rank | Mirror | Traffic 2020 in TB | Rank/Traffic 2019 (for comparison) |
|---|---|---|---|
| 1 | kiwix (offline Wikipedia) | 538 | 2 / 458 |
| 2 | mint/iso | 389 | 3 / 260 |
| 3 | gimp | 376 | – / 1 |
| 4 | qtproject (Qt Toolkit) | 356 | 1 / 511 |
| 5 | fdroid (Free and Open-Source Android app repository) | 242 | 11 / 109 |
| 6 | lineageos (Free and Open-Source Android distribution) | 204 | 4 / 246 |
| 7 | centos | 159 | 9 / 117 |
| 8 | fedora | 148 | 7 / 136 |
| 9 | videolan | 114 | – / 97 |
| 10 | opensuse | 108 | 6 / 138 |

The percentage of IPv6 traffic increased a tiny bit again, to 27% (1.13 PB), after 25% in 2019, 22% in 2018, 22% in 2017, 14% in 2016 and 11% in 2015.

There is some change in the Top 10 overall, with some mirrors’ traffic volumes (and thus placement in the Top 10) changing significantly, but most staying pretty much the same.

The most eye-catching change is the GIMP mirror. We have been mirroring gimp since 2015, but this mirror never generated a lot of traffic. This changed rapidly in April 2020: Instead of less than 10 gigabytes per day, outgoing traffic jumped to an average of over 3000 gigabytes a day. The reason for this seems to be that they changed their download page: While it previously just gave you a list of mirrors and asked you to select one manually, it now sends you to a mirror chosen randomly from a list of known-stable mirrors, which happened to include us. Later that year, traffic went down again because they added more mirrors to said list, but an average of around 800 gigabytes a day is still quite good.
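To put the daily averages into the same unit as the yearly tables, they can be extrapolated to a full year. This is only a rough sketch: it assumes decimal units (1 TB = 1000 GB) and a constant daily rate, neither of which is stated in the post, and the helper name is made up for illustration.

```python
GB_PER_TB = 1000  # assumption: decimal units, not stated in the post


def yearly_tb(gb_per_day: float, days: int = 365) -> float:
    """Extrapolate a constant daily volume in GB to a yearly volume in TB."""
    return gb_per_day * days / GB_PER_TB


print(yearly_tb(800))   # 292.0
print(yearly_tb(3000))  # 1095.0
```

At a steady 800 GB per day, a full year would come to roughly 292 TB, which is in the same range as the 266 TB listed for gimp in the 2021 table.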

And as already anticipated last year, the mirror for F-Droid, a community-maintained Android software repository only hosting free/libre software (sort of an alternative to Google’s “Play Store”), had no problem reaching the Top 10 in 2020. The reason is that since mid-2019 they have made their clients use the mirrors instead of the central servers far more often. As a result, this is now our 5th most requested mirror.

Unplanned outage on June 03

On June 03, 2020, at around 16:15 local time (14:15 UTC), disk accesses on ftp.fau.de became extremely slow. Less than an hour later, any attempts to access the disks with the ftp-data on them failed. Investigation revealed that the big RAID controller that manages all the external disk enclosures for the data had stopped responding completely.
While the failed controller could temporarily be brought back by a power cycle at around 18:00 local time (16:00 UTC), it failed again within 10 minutes of booting the machine.
Unfortunately, there was no compatible replacement for the failed controller onsite. A replacement has been ordered and shipped, but has not arrived yet. As all parcel services are severely overloaded due to the Corona crisis, it is currently unclear when it will arrive.

We were able to bring ftp.fau.de partially back after noon on June 04: It seems the broken controller does not crash as long as it does not get too much load. We have therefore had to disable automatic updates of all mirrors for now. They will mostly remain at the version they had at around 2020-06-03 14:30 UTC. We have however updated a few select mirrors manually.

The controller is also significantly slower than normal, even though it has a much lower workload than usual. This is mostly because we have disabled all write caching on it, which indirectly slows the throughput it can achieve to a crawl. While many accesses can be handled by the big SSD that serves as a cache (and is working perfectly fine), any fetch that has to go to the spinning hard disks because the data is not in the cache will be significantly slower than usual.

We are sorry for the inconvenience and are trying our best to return to full service ASAP.

We will update this article as needed.

Update 1 @2020-06-06 08:30: Parcel tracking now says that our delivery has arrived in our city and will be delivered to us on Monday, so we expect to be back in business by Monday evening.

Update 2 @2020-06-07 08:00: While we are not back to our usual sync schedules yet, all mirrors should be updated at least once a day.

Update 3 @2020-06-08 14:00: The replacement controller has arrived.

Update 4 @2020-06-08 22:00: The replacement controller is working fine. All mirrors are current again, and normal update intervals have been resumed.
While we were working on the machine anyway, we also upgraded the main memory from 64 to 128 GB.

Mirrors generating most traffic in 2019

The total traffic of ftp.fau.de increased from 3.22 PB in 2018 to 3.52 PB in 2019, an unusually small increase of only 9%.

| Rank | Mirror | Traffic 2019 in TB | Rank/Traffic 2018 (for comparison) |
|---|---|---|---|
| 1 | qtproject (Qt Toolkit) | 511 | 2 / 445 |
| 2 | kiwix (offline Wikipedia) | 458 | 1 / 476 |
| 3 | mint/iso | 260 | 5 / 255 |
| 4 | lineageos (Free and Open-Source Android distribution) | 246 | 3 / 327 |
| 5 | eclipse | 162 | 4 / 266 |
| 6 | opensuse | 138 | 7 / 119 |
| 7 | fedora | 136 | 6 / 137 |
| 8 | cdn.media.ccc.de (Talk recordings from CCC and related conferences) | 120 | – / 94 |
| 9 | centos | 117 | 9 / 109 |
| 10 | osmc (Open Source Media Center) | 112 | 8 / 113 |

The percentage of IPv6 traffic increased a tiny bit again, to 25% (0.87 PB), after 22% in 2018, 22% in 2017, 14% in 2016 and 11% in 2015.

There is little change in the Top 10 overall, with traffic volumes staying pretty much the same, only some mirrors swapped their places in the table.

Our mirror of cdn.media.ccc.de managed to reach the Traffic Top 10 again, but it was a close call. This mirror makes most of its yearly traffic at the end of December / beginning of January, when the recordings of the annual Chaos Communication Congress are put online. On the other hand, the CTAN mirror that was on rank 10 last year did not make the Top 10 this year, missing it by a few TB (106).

A notable new entry that would be on rank 11 is F-Droid, a community-maintained Android software repository only hosting free/libre software (sort of an alternative to Google’s “Play Store”). We started mirroring this at the end of 2018, but they only recently added functionality that makes the clients use the mirrors more often. As a result, this mirror has seen a lot more usage in recent months, and is likely to reach the Top 10 in 2020.

Mirrors generating most traffic in 2018

The total traffic of ftp.fau.de increased from 2.51 PB in 2017 to 3.22 PB in 2018, a 28% increase.

| Rank | Mirror | Traffic 2018 in TB | Rank/Traffic 2017 (for comparison) |
|---|---|---|---|
| 1 | kiwix (offline Wikipedia) | 476 | 1 / 371 |
| 2 | qtproject (Qt Toolkit) | 445 | 2 / 274 |
| 3 | lineageos (Free and Open-Source Android distribution) | 327 | 3 / 203 |
| 4 | eclipse | 266 | 7 / 147 |
| 5 | mint/iso | 255 | 4 / 190 |
| 6 | fedora | 137 | 8 / 103 |
| 7 | opensuse | 119 | 5 / 152 |
| 8 | osmc (Open Source Media Center) | 113 | 6 / 149 |
| 9 | centos | 109 | 11 / 69 |
| 10 | ctan (Comprehensive TeX Archive Network) | 95 | 10 / 93 |

Even though the absolute amount of IPv6 traffic increased a bit, its percentage of all traffic stagnated, with 22% (0.72 PB) in 2018, after 22% in 2017, 14% in 2016 and 11% in 2015.

Our mirror of cdn.media.ccc.de is no longer in the Traffic Top 10: it only ranked in 11th place with a measly 94 TB, a large part of that at the end of December / beginning of January, when the recordings of the annual Chaos Communication Congress are put online.

Mirrors generating most traffic in 2017

The total traffic of ftp.fau.de increased from 1.89 PB in 2016 to 2.51 PB in 2017, an increase of 33%. 203 TB of that was accounted for by one of our newest additions from late May 2017 alone: lineageos, a free and open-source Android distribution and successor of CyanogenMod (which was discontinued).

| Rank | Mirror | Traffic 2017 in TB | Rank/Traffic 2016 (for comparison) |
|---|---|---|---|
| 1 | kiwix (offline Wikipedia) | 371 | 2 / 262 |
| 2 | qtproject (Qt Toolkit) | 274 | 1 / 264 |
| 3 | lineageos (Free and Open-Source Android distribution) | 203 (since late May) | – / – |
| 4 | mint/iso | 190 | 5 / 134 |
| 5 | opensuse | 152 | 4 / 145 |
| 6 | osmc (Open Source Media Center) | 149 | 3 / 172 |
| 7 | eclipse | 147 | 6 / 131 |
| 8 | fedora | 103 | 7 / 115 |
| 9 | cdn.media.ccc.de (Talk recordings from CCC conferences) | 99 | 8 / 91 |
| 10 | ctan (Comprehensive TeX Archive Network) | 93 | 9 / 76 |

The total IPv6 traffic increased significantly, to 22% (0.55 PB) of all traffic in 2017, from 14% in 2016 and 11% in 2015.

Downtime on January 12

We had a little downtime today between around 12:10 and 13:20. About 5 minutes of this downtime were planned – we simply wanted to reboot the machine. Unfortunately, the machine did not come back up after the reboot, and it took a while to figure out what the problem was.

As it turns out, the machine was unable to mount the huge volume with all our mirrors on it during boot. Manually mounting the disk failed as well because the device just wasn’t there. In lvdisplay the volume was listed as unavailable, and trying to set it available with lvchange failed with the message:
/usr/sbin/cache_check: execvp failed: No such file or directory
Check of pool bigdata/ftpcachedata failed (status:2). Manual repair required!

As written in a previous post, we nowadays have a cache SSD in ftp.fau.de. As it turns out, you can create a cached volume and use it without any issues at all, until you try to reboot, because then LVM suddenly decides it needs a cache_check binary that naturally isn’t shipped in the LVM package. Of course, this is not a new problem: There has been a bug report about that in Debian since 2014. Of course, slightly more than 3 years later, the problem still isn’t fixed (e.g. by checking whether the cache_check binary is available when a cached volume is created). The problem is that the missing binary is in the thin-provisioning-tools package, which to maximise confusion belongs to LVM, but doesn’t have LVM anywhere in its name. I also wouldn’t exactly associate caching with thinly provisioning volumes, but maybe that’s just me. The LVM2 package does not depend on thin-provisioning-tools, it only “suggests” it, so it doesn’t get installed automatically in any sane APT config for servers.

So once the problem was clear, it was at least easy to fix: We installed the missing package, rebooted, and ftp.fau.de was back in action.
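A check like the following, run before a planned reboot, might have caught the problem earlier. It is only a sketch and not something we actually had in place: the function name and the list of helpers are assumptions for illustration, and cache_check is included simply because it is the binary named in the error message above.

```python
import shutil


def missing_lvm_helpers(helpers=("cache_check",), path=None):
    """Return those helper binaries that cannot be found as executables.

    `path` is a search path in the usual colon-separated form; with the
    default of None, the PATH of the current process is used.
    """
    return [name for name in helpers if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # The error message referred to /usr/sbin/cache_check, so the system
    # binary directories are searched explicitly here.
    missing = missing_lvm_helpers(path="/usr/sbin:/sbin:/usr/bin:/bin")
    if missing:
        print("missing helper binaries:", ", ".join(missing))
    else:
        print("all helper binaries found")
```

On Debian, a missing cache_check would point to the thin-provisioning-tools package mentioned above.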

Experiments with LVM-cache

Recently, we’ve frequently reached the I/O capacity of our RAID array during peak hours, meaning that increasingly often downloads were not limited by network speed, but by how fast our disks could deliver the data.

While we use a hardware RAID6 which does deliver pretty decent read speeds, we’re still using traditional magnetic hard drives, not SSDs. While these arrays easily deliver more than a gigabyte per second when reading sequentially, this performance drops rapidly the more random the I/O gets, i.e. the more different files are requested at the same time. Of course, due to the nature of our FTP, which provides mirrors for a whole bunch of different projects, we do get a large amount of random I/O.

One solution to improve performance in this constellation is to use an additional cache on an SSD that caches the most frequently requested files or disk blocks. Most storage vendors implement something like that in their storage arrays nowadays, although it’s usually an optional and not exactly cheap feature – you usually pay a hefty license fee for that feature, and then you also have to buy prohibitively expensive SSDs from that vendor too to make any use of it. The better (and cheaper) alternative is to use a software solution to do the same thing. There are different implementations for SSD caching on Linux.

The SSD-caching implementation we chose to use was lvmcache. As the name suggests, lvmcache is integrated with the Linux Logical Volume Manager (LVM) that we use for managing the space on the big RAID arrays anyway. The SSD is simply attached to a normal logical volume and then caches accesses to that volume by keeping track of which sectors are used most often and serving them from the cache SSD.

Two basic modes are available: write-back and write-through. Write-back writes blocks to the SSD cache first and only syncs them to the disks some time later. While write-back has the advantage of increasing the write speed, it has the drawback that in this mode the SSD used for caching is vital: if it dies, the data on the disks would be left in an inconsistent and possibly irreparable state. To avoid data loss, some RAID level for the SSD would be required in this mode. However, the only writes the FTP sees are the updates of the mirrors, and those are few. More than 99% of all I/O is read requests from clients requesting some mirrored files. Because of that, we don’t really care about the write speed and instead use “write-through” mode. In this mode, all writes go to the underlying disk immediately, and the SSD is only used for read caching. When the SSD dies, you lose the caching, but your data is still safe.
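The difference between the two modes can be illustrated with a toy model. This is a deliberately simplified sketch and not how lvmcache is implemented: the class, its method names and the dictionary-based "devices" are all made up for illustration. It only shows which blocks are at risk when the cache device disappears.

```python
class CachedDisk:
    """Toy model of slow disks with an SSD cache in front of them."""

    def __init__(self, mode: str):
        if mode not in ("writethrough", "writeback"):
            raise ValueError(f"unknown cache mode: {mode}")
        self.mode = mode
        self.disk = {}      # block number -> data on the disk array
        self.cache = {}     # block number -> data on the SSD
        self.dirty = set()  # blocks whose latest data is only on the SSD

    def write(self, block: int, data: str) -> None:
        self.cache[block] = data
        if self.mode == "writethrough":
            # The write reaches the disks immediately.
            self.disk[block] = data
        else:
            # The write is acknowledged once it is on the SSD; the disks
            # only get it at the next flush().
            self.dirty.add(block)

    def read(self, block: int) -> str:
        if block in self.cache:
            return self.cache[block]   # served from the SSD
        data = self.disk[block]        # slow path: fetch from the disks
        self.cache[block] = data
        return data

    def flush(self) -> None:
        """Sync all dirty blocks from the SSD to the disks."""
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def lose_ssd(self) -> set:
        """Simulate the SSD dying; return the blocks whose data is lost."""
        lost = set(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost


wt = CachedDisk("writethrough")
wt.write(1, "mirror update")
print(wt.lose_ssd())  # set(): nothing lost, the disks already have the block

wb = CachedDisk("writeback")
wb.write(1, "mirror update")
print(wb.lose_ssd())  # {1}: the block never reached the disks
```

In the write-through case the read after an SSD failure simply falls back to the disks, which matches the "you lose the caching, but your data is still safe" behaviour described above.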

For testing, we borrowed a 1 TB Intel SSD. After one week of testing, we are impressed by the results. The following is a graph from our munin showing the utilization of the devices:

As you can see, we introduced the disk cache (nvme0n1) on the 7th. After about one day, the cache had filled up and was serving the majority of requests. As a result, the utilization on the disk arrays (sdb, sdc) dropped rapidly, from “practically always 100%” to “30-50%” during peak hours.

If long-term performance is as good as these first test results suggest, we will permanently equip ftp.fau.de with an SSD for caching to allow faster downloads for you.

Mirrors generating most traffic in 2016

It seems I haven’t posted the traditional annual mirror stats for 2016 yet. Well, let’s fix that: Here are the most used mirrors on ftp.fau.de in 2016:

| Rank | Mirror | Traffic 2016 in TB | Rank/Traffic 2015 (for comparison) |
|---|---|---|---|
| 1 | qtproject (Qt Toolkit) | 264 | 1 / 199 |
| 2 | kiwix (offline Wikipedia) | 262 | 2 / 179 |
| 3 | osmc (Open Source Media Center) | 172 | – / 35 |
| 4 | opensuse | 145 | 4 / 139 |
| 5 | mint | 134 | 3 / 179 |
| 6 | eclipse | 131 | 6 / 97 |
| 7 | fedora | 115 | 5 / 100 |
| 8 | cdn.media.ccc.de (Talk recordings from CCC conferences) | 91 | 7 / 83 |
| 9 | ctan (Comprehensive TeX Archive Network) | 76 | 8 / 63 |
| 10 | tdf (The Document Foundation – LibreOffice) | 58 | 9 / 48 |

Comparing this list to last year, the first thing one notices is the new entry on rank 3: OSMC generates a steady amount of traffic, with visible peaks whenever they do a release. They were not in the 2015 top ten because we only started mirroring it in Q4 of 2015.

Linux Mint dropped from rank 3 last year to rank 5 this year, with much less traffic than last year. And that is despite a slightly different way of counting that should actually have increased their numbers: We are now summing up the two parts of the mirror, the ISOs and the packages. We do this not because we want to, but for technical reasons: we cannot always distinguish between the two in the stats, and not summing them up would make the stats even more wrong. Judging from the stats, we must have been dropped from their mirror list for ISO downloads around June of 2016. From then until the end of the year, almost no requests for the Mint ISOs hit our server. As to why we were dropped, we haven’t got the slightest clue, as we got no notification about the removal. We did get a notice about being re-added at the beginning of 2017 though.

Last year’s rank 10, videolan, dropped off the list. It would be on rank 13 this year.

Rank 1 and 2, Qt and kiwix, are really close head to head.

All other mirrors swapped positions here and there, and all of them generated a little more traffic than the year before, but there were no big changes.

Let’s take a look at IPv6 traffic only:

| Rank | Mirror | IPv6 Traffic 2016 in TB | Rank/Traffic 2015 (for comparison) |
|---|---|---|---|
| 1 | kiwix (offline Wikipedia) | 34.6 | 3 / 14.5 |
| 2 | cdn.media.ccc.de (Talk recordings from CCC conferences) | 27.6 | 2 / 15.2 |
| 3 | mint | 20.8 | 1 / 22.6 |
| 4 | qtproject (Qt Toolkit) | 19.1 | 5 / 11.5 |
| 5 | opensuse | 18.4 | 6 / 10.8 |
| 6 | debian | 17.3 | 7 / 9.3 |
| 7 | fedora | 13.0 | 4 / 11.5 |
| 8 | pclinuxos | 12.9 | – / 0.7 |
| 9 | ubuntu | 12.6 | – / 3.2 |
| 10 | tdf (The Document Foundation – LibreOffice) | 12.3 | 8 / 8.1 |

With the exception of Linux Mint, where I’ve already explained the reason above, all mirrors had more IPv6 traffic, sometimes significantly more.

This is also visible in the total IPv6 traffic over all mirrors: 13.68% of all traffic in 2016 was IPv6, up from 10.54% in 2015.

There are still huge differences in the IPv6 traffic share between the different projects mirrored, and most of the time it isn’t really clear why. One example where it is clear though is cygwin, with an IPv6 share of pretty much 0%: They use a setup-tool that downloads individual packages from the mirrors, and it seems this tool only does IPv4.