A downtime of all HPC clusters and the /home/woody filesystem is scheduled for Wednesday, August 15, starting at 09:30. Data from /home/woody/ could not be transferred to the new volume during today’s downtime because of the amount of data (more than 20 TB and more than 15 million files). Most of the transfer will continue in the background. If copying finishes as expected, there will be another global downtime starting Sunday night (2012-08-19) at 23:00.
As usual, no jobs that would collide with the downtime will run. Batch processing will resume as soon as the maintenance is finished. The reason for this downtime is that we are trying to fix the problems that showed up during the downtime on August 01 and prevented us from finishing the maintenance work that was originally planned.
Current status of file systems:
- /home/hpc/ — available
- /home/woody/ — read-write available but all data still has to be transferred to a new volume
- /home/cluster64/ — available
- /home/vault/ — available
- $FASTTMP=/lxfs — available again on LiMa, TinyFat and memoryhog
Current status of the login nodes:
- cshpc — to be rebooted
- woodyX — upgraded to SLES11SP2 and available again
- limaX — rebooted and available again
- sfrontX — available
- memoryhog — available; may be rebooted on Friday
Current status of compute clusters:
- Transtec cluster — batch processing resumed
- Woody cluster — batch processing stopped; all compute nodes still have to be upgraded to SLES11SP2
- TinyBlue cluster — batch processing resumed
- TinyFat cluster — batch processing resumed
- TinyGPU cluster — batch processing resumed
- LiMa cluster — batch processing resumed
Timeline
- 2012-08-15, 09:30 — /home/woody set read-only for moving all data to a new volume
- 2012-08-15, 09:30 — $FASTTMP=/lxfs unmounted
- 2012-08-15, 12:30 — file server wnfs1 replaced with new hardware (HP DL380 G7 instead of HP DL580 G2)
- 2012-08-15, 15:30 — LXFS maintenance finished; $FASTTMP=/lxfs mounted again
- 2012-08-15, 18:30 — woodyX login nodes upgraded to SLES11SP2
- 2012-08-15, 19:00 — batch processing resumed on all clusters but Woody
- 2012-08-16, 18:00 — batch processing resumed on Woody, too
There will be another scheduled downtime starting on Sunday night, 2012-08-19, at 23:00, when /home/woody will be set read-only. On Monday, 2012-08-20, /home/woody will not be available at all for several hours. A system reservation on all HPC clusters prevents jobs from starting if they would collide with this scheduled downtime.
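To estimate whether a job can still start before the reservation, the rule can be sketched roughly as follows. This is only an illustration: it assumes a job is held back if its start time plus requested walltime extends past the beginning of the downtime, and the function names are hypothetical. The batch system's actual rules may differ.

```python
from datetime import datetime, timedelta

# Start of the system reservation, taken from the announcement above.
DOWNTIME_START = datetime(2012, 8, 19, 23, 0)


def collides_with_downtime(now, walltime, downtime_start=DOWNTIME_START):
    """Return True if a job started at `now` could still be running at downtime_start.

    Assumption: the scheduler compares start time + requested walltime
    against the beginning of the reservation.
    """
    return now + walltime > downtime_start


def max_walltime(now, downtime_start=DOWNTIME_START):
    """Longest requested walltime that still ends before the downtime begins."""
    return max(downtime_start - now, timedelta(0))


if __name__ == "__main__":
    submit_time = datetime(2012, 8, 19, 12, 0)
    print(collides_with_downtime(submit_time, timedelta(hours=24)))
    print(max_walltime(submit_time))
```

Under this assumption, requesting a walltime no longer than the value returned by `max_walltime` would allow a job to start before the downtime instead of waiting until afterwards.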