Over the years, the RPM Fusion repository has grown to over 320GB. There have been occasional requests to move the unsupported releases to an archive, as Fedora does for its mirrors, but until last week this had not happened.
We have now moved all unsupported releases (EL-5, Fedora 8–25) to our archive (http://archive.rpmfusion.org/), and clients are being redirected to the new archive system. The archive contains 260GB, which means the amount of data mirrors need to carry is reduced by more than 75%.
A first look at the archive logs shows that all clients together request only about 30GB per day from the archived releases. Those 30GB are spread over more than 350000 HTTP requests, and over 98% of those requests fetch only repository metadata (repomd.xml, *filelist*, *primary*, *comps*).
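A metadata share like this can be computed directly from the access logs. A minimal sketch, assuming Apache "combined" log format and treating files under `repodata/` whose names contain repomd.xml, filelist, primary or comps as metadata; the helper names are hypothetical and this is not RPM Fusion's actual tooling:

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format; only the quoted request line is used.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')

# Metadata file name markers, following the list in the text.
METADATA_MARKERS = ("repomd.xml", "filelist", "primary", "comps")


def is_metadata(path):
    """True if the path points at a repodata/ file matching one of the markers."""
    path = path.split("?", 1)[0]                 # ignore any query string
    directory, _, filename = path.rpartition("/")
    if directory.split("/")[-1] != "repodata":
        return False                             # e.g. libcomps-*.rpm is a package
    return any(marker in filename for marker in METADATA_MARKERS)


def metadata_share(lines):
    """Return (Counter of request kinds, metadata share in percent)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue                             # not a GET/HEAD request line
        kind = "metadata" if is_metadata(match.group("path")) else "other"
        counts[kind] += 1
    total = sum(counts.values())
    share = 100.0 * counts["metadata"] / total if total else 0.0
    return counts, share
```

Restricting the match to the `repodata/` directory keeps package names that happen to contain a marker (such as libcomps) from being counted as metadata.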
RPM Fusion’s mirrorlist servers, which return a list of (probably, hopefully) up-to-date mirrors (e.g., http://mirrors.rpmfusion.org/mirrorlist?repo=free-fedora-rawhide&arch=x86_64), were still running on CentOS 5 with the old MirrorManager code base. The service ran on two systems (DNS load balancing), and it was not the most stable setup. A client connecting from a country that had recently been added to the GeoIP database drove the httpd process to 100% CPU usage, which led to a denial of service after a few requests. I added a cron entry to restart the httpd server every hour, which helped a bit, but it was a rather clumsy workaround.
It was clear that the two systems needed to be updated to something newer. Luckily, the new MirrorManager2 code base can handle the data format of the old MirrorManager code base, so it was possible to update the RPM Fusion mirrorlist servers without updating the MirrorManager back-end (yet).
From now on, four CentOS 7 systems answer the requests for mirrors.rpmfusion.org. As the new RPM Fusion infrastructure is also Ansible-based, I added the Ansible files from Fedora to the RPM Fusion infrastructure repository. I had to remove some parts, but most of the Ansible content could be reused.
When yum or dnf now connect to http://mirrors.rpmfusion.org/mirrorlist?repo=free-fedora-rawhide&arch=x86_64, the answer is created by one of the four CentOS 7 systems running the latest MirrorManager2 code.
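The mirrorlist interface is a plain HTTP GET with `repo` and `arch` query parameters, as in the URL above. A minimal client-side sketch, assuming the response is plain text with one mirror URL per line and `#`-prefixed comment lines; the helper names are hypothetical and this is not how yum or dnf are implemented:

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import urlopen

MIRRORLIST_BASE = "http://mirrors.rpmfusion.org/mirrorlist"


def mirrorlist_url(repo: str, arch: str) -> str:
    """Build the same kind of query yum/dnf send, e.g. repo=free-fedora-rawhide."""
    return MIRRORLIST_BASE + "?" + urlencode({"repo": repo, "arch": arch})


def parse_mirrorlist(text: str) -> list[str]:
    """Assumed response format: one mirror URL per line, '#' lines are comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def fetch_mirrors(repo: str, arch: str, timeout: float = 10.0) -> list[str]:
    """Download and parse the mirrorlist (needs network access)."""
    with urlopen(mirrorlist_url(repo, arch), timeout=timeout) as resp:
        return parse_mirrorlist(resp.read().decode("utf-8"))
```

Which of the four systems answers is decided by DNS, so the client code does not need to know about the individual servers.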
RPM Fusion now also has the same mirrorlist access statistics as Fedora: http://mirrors.rpmfusion.org/statistics/.
I still need to update the back-end, which at RPM Fusion runs on a single system instead of the six systems used in the Fedora infrastructure.
There have been two protocol-related issues with MirrorManager open for some time: dropping ftp:// URLs from metalinks, and adding a way to request only HTTPS URLs from a metalink.
Both issues have been resolved. The first issue, dropping FTP URLs from the metalinks, has been resolved in multiple steps. The first step was to block FTP URLs from being added to Fedora’s MirrorManager (Optionally exclude certain protocols from MM, New MirrorManager2 features), and the second step, removing all remaining FTP URLs from Fedora’s MirrorManager, was performed during the last few days and weeks. MirrorManager’s mirrorlist interface (which is not used very often) only returned an FTP URL if a mirror had no HTTP(S) URL, so it was already rather unusual to be redirected to an FTP mirror. MirrorManager’s metalink interface, however, returned all possible URLs for a host. With the removal of all FTP URLs from MirrorManager’s database, no user should see FTP URLs any more, and the problems some clients encountered (see Drop ftp:// urls from metalinks) should be ‘resolved’.
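The difference between the two interfaces can be sketched as two selection rules. This illustrates the behaviour described above and is not MirrorManager's actual code; the input mapping of host names to URLs is a hypothetical simplification:

```python
from __future__ import annotations

from urllib.parse import urlsplit

WEB_SCHEMES = ("http", "https")


def select_mirrorlist_urls(host_urls: dict[str, list[str]]) -> list[str]:
    """Mirrorlist rule: per host, prefer HTTP(S); fall back to FTP only if needed."""
    selected = []
    for urls in host_urls.values():
        web = [u for u in urls if urlsplit(u).scheme in WEB_SCHEMES]
        if web:
            selected.extend(web)
        else:
            selected.extend(u for u in urls if urlsplit(u).scheme == "ftp")
    return selected


def select_metalink_urls(host_urls: dict[str, list[str]],
                         allow_ftp: bool = False) -> list[str]:
    """Metalink rule: every URL of every host; FTP dropped unless allow_ftp."""
    return [u for urls in host_urls.values() for u in urls
            if allow_ftp or urlsplit(u).scheme != "ftp"]
```

With `allow_ftp=True` the metalink rule mirrors the old behaviour; the default corresponds to the state after all FTP URLs were removed.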
The other issue (Add a way to specify you want only https urls from metalink) has also been solved by adding a protocol option to the mirrorlist and metalink back-end. The new MirrorManager release (0.7.2), which includes these changes, is already running on the staging instance, and the result can be seen here:
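Conceptually, the protocol option lets a client restrict which URL schemes come back. The sketch below assumes the option is a query parameter named `protocol` that may be repeated; it illustrates the filtering idea and is not MirrorManager's implementation:

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def filter_by_protocol(query: str, urls: list[str]) -> list[str]:
    """Keep only URLs whose scheme was requested via protocol=... (assumed name)."""
    wanted = {p.lower() for p in parse_qs(query).get("protocol", [])}
    if not wanted:
        return list(urls)  # no protocol option given: return everything, as before
    return [u for u in urls if urlsplit(u).scheme in wanted]
```

Without the option the result is unchanged, so existing clients are unaffected.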
To get more HTTPS-based mirrors into our database, we scanned all existing public mirrors to see if they also provide HTTPS. This increased the number of HTTPS URLs from 24 to over 120.
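Such a scan can be approximated by rewriting each known http:// URL to https:// and probing it. A minimal sketch, assuming a mirror serves the same path over HTTPS and that a successful HEAD request with a valid certificate is enough evidence; the function names are hypothetical and this is not the script that was actually used:

```python
from __future__ import annotations

import http.client
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def https_candidate(http_url: str) -> str:
    """Rewrite an http:// mirror URL to https://, dropping an explicit :80 port."""
    parts = urlsplit(http_url)
    netloc = parts.hostname if parts.port == 80 else parts.netloc
    return urlunsplit(parts._replace(scheme="https", netloc=netloc))


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """True if a HEAD request over HTTPS succeeds (certificates are verified)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        return False  # connection, TLS, HTTP error status or malformed URL


def scan_for_https(http_urls, check=head_ok) -> list[str]:
    """Return the HTTPS URLs of all mirrors that answer over HTTPS."""
    candidates = (https_candidate(u) for u in http_urls)
    return [c for c in candidates if check(c)]
```

Passing `check` as a parameter keeps the network probe separate from the URL handling, so the logic can be tried without contacting any mirror.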
The option to select which protocol the mirrorlist/metalink mirrors should provide is not yet running on the production instance.
The latest MirrorManager release (0.6.1), which has been active in Fedora’s infrastructure since 2015-12-17, has a few additional features that provide insights into the mirror network usage.
The first is called statistics. It gives a daily overview of what clients are requesting: it analyses the metalink and mirrorlist accesses and draws diagrams. Each time the local yum or dnf metadata has expired, a new mirrorlist/metalink is requested, which contains the ‘best’ mirrors for the requesting client. The current MirrorManager statistics implementation tries to display how often the different repositories are requested, from which countries, and for which of the available architectures:
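The core of such statistics is a count per repository, architecture and country. A minimal sketch, assuming each access is available as a (query string, country code) pair with the country already resolved from the client IP (for example via GeoIP); the record format and function names are hypothetical, not MirrorManager's:

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import parse_qs


def count_requests(accesses) -> Counter:
    """Count requests per (repo, arch, country).

    accesses: iterable of (query_string, country_code) pairs, one per
    mirrorlist/metalink request.
    """
    counts = Counter()
    for query, country in accesses:
        params = parse_qs(query)
        repo = params.get("repo", ["unknown"])[0]
        arch = params.get("arch", ["unknown"])[0]
        counts[(repo, arch, country)] += 1
    return counts


def countries_for_repo(counts: Counter, repo: str) -> Counter:
    """Sum the counts for one repository per country, across architectures."""
    result = Counter()
    for (r, _arch, country), n in counts.items():
        if r == repo:
            result[country] += n
    return result
```

Running this once per day's log would give the daily overview; the diagrams are then just plots of these counters.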
In addition to the statistics on where clients come from and which files they are interested in, the old code to draw a map of the locations of all mirror servers has been re-enabled: maps
Another new visualization tries to track propagation, i.e. the time the existing mirrors need to carry the latest bits. A script connects to all enabled mirrors and checks which repomd.xml file is currently available on each mirror. This is done for the development branch and all active branches. The script displays how many mirrors have the current repomd.xml file, how many still have the file from the previous push (or the push before), and how many have an even older one: Propagation.
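The classification step of such a script might look like the sketch below. It assumes each mirror's repomd.xml has already been downloaded, that the master's repomd.xml digests for recent pushes are known (newest first), and it compares SHA-256 digests of the whole file; all names are hypothetical and the real script may compare files differently:

```python
from __future__ import annotations

import hashlib
from collections import Counter

LABELS = ("current", "previous push", "push before")


def digest(content: bytes) -> str:
    """SHA-256 digest of a repomd.xml file."""
    return hashlib.sha256(content).hexdigest()


def classify(mirror_digest: str, push_history: list[str]) -> str:
    """push_history: master repomd.xml digests, newest push first."""
    recent = push_history[:len(LABELS)]
    if mirror_digest in recent:
        return LABELS[recent.index(mirror_digest)]
    return "older"


def propagation_summary(mirror_files: dict[str, bytes],
                        push_history: list[str]) -> Counter:
    """Count how many mirrors are in each propagation state."""
    return Counter(classify(digest(c), push_history) for c in mirror_files.values())
```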
Another relevant change in Fedora’s MirrorManager is that it is no longer possible to enter FTP URLs. This is the first step towards removing FTP-based URLs altogether: depending on the network topology, FTP-based mirrors are often difficult to connect to, other protocols (HTTP, RSYNC) are better suited, and many mirror servers do not provide FTP anyway.
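Rejecting FTP when a mirror URL is entered can be as simple as checking the URL scheme. The sketch below assumes http, https and rsync are the accepted protocols; it illustrates the idea and is not MirrorManager's form validation:

```python
from urllib.parse import urlsplit

# Assumption: the protocols still accepted for new mirror URLs.
ALLOWED_SCHEMES = {"http", "https", "rsync"}


def validate_mirror_url(url: str) -> str:
    """Return the URL unchanged, or raise ValueError for FTP/unsupported schemes."""
    scheme = urlsplit(url).scheme  # urlsplit lower-cases the scheme
    if scheme == "ftp":
        raise ValueError("FTP URLs can no longer be added")
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError("unsupported protocol: %r" % scheme)
    return url
```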
After running RPM Fusion’s MirrorManager instance on Fedora for many years, I moved it to a CentOS 6.4 VM. This was necessary because the MirrorManager installation was really ancient and still ran from a modified git checkout I made many years ago. I expected the biggest obstacle in this upgrade and move to be the MirrorManager database upgrade, as its schema has changed over the years. Fortunately, MirrorManager included all the scripts necessary to update the database (thanks Matt), even from the ancient version I was running.
RPM Fusion’s MirrorManager instance uses PostgreSQL to store its data, so I dumped the data on the old system and imported it into the database on the new one. MirrorManager stores information about the files as pickled Python data in the database, and those columns could not be imported because of character encoding problems. As this data is provided by the master mirror, I just emptied those columns, and after the first run MirrorManager recreated the information.
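We do not know exactly how those columns were declared, but one plausible way pickled data runs into encoding trouble is easy to show: binary pickle protocols produce bytes that are not valid UTF-8 text. A minimal sketch, assuming a binary protocol (protocol 2 here) and hypothetical example data:

```python
import pickle

# Hypothetical example of per-file information; not MirrorManager's real schema.
file_details = {"repomd.xml": {"size": 3000, "timestamp": 1377043200}}

# Binary pickle protocols emit raw bytes; protocol 2 always starts with b"\x80\x02".
blob = pickle.dumps(file_details, protocol=2)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes could be treated as UTF-8 text without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here `is_valid_utf8(blob)` is false, while `pickle.loads(blob)` still restores the original dictionary, so the data itself is intact and only its treatment as text fails.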
Moving the MirrorManager instance to a VM means that, if you are running an RPM Fusion mirror, the crawler that checks whether your mirror is up to date will now connect to your mirror from a different IP address (18.104.22.168). The data collected by MirrorManager’s crawler is then used to create http://mirrors.rpmfusion.org/mm/publiclist/ and the mirrorlist used by yum (http://mirrors.rpmfusion.org/mirrorlist?repo=free-fedora-updates-released-19&arch=x86_64). There are currently four systems serving as mirrors.rpmfusion.org.
Judging by yesterday’s statistics (http://mirrors.rpmfusion.org/statistics/?date=2013-08-20), there were about 400000 accesses per day to our mirrorlist servers.
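To put that number into perspective, here is a quick calculation of the average request rate, assuming the requests are spread evenly over the day and across the four mirrorlist systems (real traffic has peaks, and DNS load balancing does not guarantee an even split):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def average_rate(requests_per_day: float, servers: int = 1) -> float:
    """Average requests per second per server, assuming a perfectly even spread."""
    return requests_per_day / SECONDS_PER_DAY / servers


total = average_rate(400_000)                  # about 4.6 requests/s overall
per_server = average_rate(400_000, servers=4)  # about 1.2 requests/s per system
```

Peak rates will be higher than these averages.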