The fileserver exporting /opt/bcosw crashed for unknown reasons around 20:05 on Thursday, Oct. 21. As a result, this filesystem is currently not available. We hope to bring it online again on Friday (Oct. 22), but it might take until at least Monday.
As a consequence of the missing filesystem, logins to the HPC frontends (e.g. sfront0X, woody) and other systems (e.g. cshpc, cssun) may hang or take a very long time, because these servers also wait for the filesystem when users try to access /home/cluster64 from there.
Update Friday 14:00: /home/cluster64 is online again.
There will be a downtime of all RRZE HPC clusters on Saturday, October 16, starting at 00:00.
The reason for the downtime is construction work on the power grid that requires turning off the power for large parts of the south campus on Saturday morning. We will shut down all cluster nodes and the Altixes shortly after midnight and turn them back on when maintenance is over. As usual, no new jobs that would collide with the downtime will be started on the clusters. Work should be finished around 12:00, and we will gradually resume batch processing afterwards. Most frontends (woody, sfront04) and the filesystems are connected to a UPS and will therefore stay available unless things go terribly wrong.
Update 16.10.2010, 13:30: Woody, Tinyblue and Cluster32 have resumed batch processing.