C. L. Alappat, A. Alvermann, A. Basermann, H. Fehske, Y. Futamura, M. Galgon, G. Hager, S. Huber, A. Imakura, M. Kawai, M. Kreutzer, B. Lang, K. Nakajima, M. Röhrig-Zöllner, T. Sakurai, F. Shahzad, J. Thies, and G. Wellein: ESSEX: Equipping Sparse Solvers For Exascale. In: Bungartz HJ., Reiz S., Uekermann B., Neumann P., Nagel W. (eds) Software for Exascale Computing – SPPEXA 2016-2019. Lecture Notes in Computational Science and Engineering 136, 143-187 (2020). Springer, Cham. Available with Open Access. DOI: 10.1007/978-3-030-47956-5_7
C. L. Alappat, G. Hager, O. Schenk, J. Thies, A. Basermann, A. R. Bishop, H. Fehske, and G. Wellein: A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication. ACM Trans. Parallel Comput. 7(3), Article 19 (June 2020), 37 pages. Available with Open Access. DOI: 10.1145/3399732.
D. Ernst, G. Hager, J. Thies, and G. Wellein: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. Accepted for PPAM’2019, the 13th International Conference on Parallel Processing and Applied Mathematics, September 8-11, 2019, Białystok, Poland. Preprint: arXiv:1905.03136
A. Alvermann, A. Basermann, H.-J. Bungartz, C. Carbogno, D. Ernst, H. Fehske, Y. Futamura, M. Galgon, G. Hager, S. Huber, T. Huckle, A. Ida, A. Imakura, M. Kawai, S. Köcher, M. Kreutzer, P. Kus, B. Lang, H. Lederer, V. Manin, A. Marek, K. Nakajima, L. Nemec, K. Reuter, M. Rippl, M. Röhrig-Zöllner, T. Sakurai, M. Scheffler, C. Scheurer, F. Shahzad, D. Simoes Brambila, J. Thies, and G. Wellein: Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects. Proc. EPASA 2018, Japan Journal of Industrial and Applied Mathematics, 36(2), 699-717, DOI: 10.1007/s13160-019-00360-8. Preprint: arXiv:1806.01036.
F. Shahzad, J. Thies, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: CRAFT: A library for easier application-level checkpoint/restart and automatic fault tolerance. Accepted for publication in IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2018.2866794, Preprint: arXiv:1708.02030
M. Kreutzer, D. Ernst, A.R. Bishop, H. Fehske, G. Hager, K. Nakajima, and G. Wellein: Chebyshev Filter Diagonalization on Modern Manycore Processors and GPGPUs. In: R. Yokota, M. Weiland, D. Keyes, and C. Trinitis (eds.): High Performance Computing: 33rd International Conference, ISC High Performance 2018, Frankfurt, Germany, June 24-28, 2018, Proceedings, Springer, Cham, LNCS 10876, ISBN 978-3-319-92040-5 (2018), 329-349. DOI: 10.1007/978-3-319-92040-5_17. ISC 2018 Hans Meuer Award Finalist.
K. Nakajima and T. Hanawa. Communication-computation overlapping with dynamic loop scheduling for preconditioned parallel iterative solvers on multicore/manycore clusters. IEEE Proceedings of the 10th International Workshop on Parallel Programming Models & Systems Software for High-End Computing (P2S2 2017) in conjunction with the 46th International Conference on Parallel Processing (ICPP 2017).
T. Katagiri, S. Ohshima and M. Matsumoto. Auto-tuning on NUMA and many-core environments with an FDM code. IEEE Proceedings of the 12th International Workshop on Automatic Performance Tuning (iWAPT2017) (in conjunction with the IEEE IPDPS 2017).
T. Iwashita, A. Ida, T. Mifune and Y. Takahashi. Software framework for parallel BEM analyses with H-matrices using MPI and OpenMP. Procedia Computer Science 108, (2017) 2200–2209.
N. Nomura, A. Fujii, T. Tanaka, K. Nakajima and O. Marques. Performance analysis of SA-AMG method by setting extracted near-kernel vectors. Lecture Notes in Computer Science (LNCS) 10150, (2017) 52–63.
N. Tominaga, T. Mifune, A. Ida, Y. Sogabe, T. Iwashita and N. Amemiya. Application of hierarchical matrices to large-scale electromagnetic field analyses of coils wound with coated conductors. Proceedings of the 25th International Conference on Magnet Technology (MT25).
M. Kawai, A. Ida and K. Nakajima. Hierarchical parallelization of multicoloring algorithms for block ICC preconditioners. IEEE 19th International Conference on High Performance Computing and Communications (HPCC) 19, (2017) 138–145.
A. Ida, T. Ataka, T. Mifune, Y. Takahashi, T. Iwashita and A. Furuya. Application of improved H-matrices in micromagnetic simulations. IEEE Transactions on Magnetics 54(3).
A. Ida. Lattice H-matrices on distributed-memory systems. 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2018) Accepted.
I. Yamazaki, A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota and J. Dongarra. Analyzing performance of BiCGStab with hierarchical matrix on GPU clusters. 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2018) Accepted.
N. Tominaga, T. Mifune, A. Ida, Y. Sogabe, T. Iwashita and N. Amemiya. Application of hierarchical matrices to large-scale electromagnetic field analyses of coils wound with coated conductors. IEEE Transactions on Applied Superconductivity 28(3), (2018) 1–5.
S. Ohshima, I. Yamazaki, A. Ida and R. Yokota. Optimization of hierarchical matrix computation on GPU. Supercomputing Asia 2018 Accepted.
A. Ida, T. Ataka, T. Mifune, Y. Takahashi, T. Iwashita and A. Furuya. Application of improved H-matrices in micromagnetic simulations. 21st International Conference on the Computation of Electromagnetic Fields (Compumag 2017).
W. Song, F. W. Wubs, J. Thies, and S. Baars: Numerical Bifurcation Analysis of a 3D Turing-Type Reaction-Diffusion Model. Communications in Nonlinear Science and Numerical Simulation 60, 145-164 (2018). DOI: 10.1016/j.cnsns.2018.01.003
M. Galgon, L. Krämer, and B. Lang: Improving projection-based eigensolvers via adaptive techniques. Numer. Linear Algebra Appl., e2124:1–15, 2017. DOI: 10.1002/nla.2124
M. Galgon, L. Krämer, B. Lang, A. Alvermann, H. Fehske, A. Pieper, G. Hager, M. Kreutzer, F. Shahzad, G. Wellein, A. Basermann, M. Röhrig-Zöllner, and J. Thies: Improved coefficients for polynomial filtering in ESSEX. In T. Sakurai, S.-L. Zhang, T. Imamura, Y. Yamamoto, Y. Kuramashi, and T. Hoshi (eds.), Eigenvalue Problems: Algorithms, Software and Applications, in Petascale Computing. Proc. EPASA 2015, Tsukuba, Japan, September 2015, volume 117 of LNCSE, pages 63–79. Springer International Publishing, 2017. DOI: 10.1007/978-3-319-62426-6_5
M. Kreutzer, J. Thies, M. Röhrig-Zöllner, A. Pieper, F. Shahzad, M. Galgon, A. Basermann, H. Fehske, G. Hager, and G. Wellein: GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems. International Journal of Parallel Programming 45, 1046 (2016). DOI: 10.1007/s10766-016-0464-z. Preprint: arXiv:1507.08101
H. Anzt, J. Dongarra, M. Kreutzer, G. Wellein, and M. Köhler: Efficiency of General Krylov Methods on GPUs – An Experimental Study. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago, IL, 683-691 (2016). DOI: 10.1109/IPDPSW.2016.45
H. Anzt, M. Kreutzer, E. Ponce, G. D. Peterson, G. Wellein, and J. Dongarra: Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs. International Journal of High Performance Computing Applications, first published on May 5, 2016. DOI: 10.1177/1094342016646844
F. Shahzad, M. Kreutzer, T. Zeiser, R. Machado, A. Pieper, G. Hager, and G. Wellein: Building and utilizing fault tolerance support tools for the GASPI applications. International Journal of High Performance Computing Applications (2016), first published on November 28, 2016. DOI: 10.1177/1094342016677085. Preprint (post-review)
A. Pieper, M. Kreutzer, A. Alvermann, M. Galgon, H. Fehske, G. Hager, B. Lang, and G. Wellein: High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations. Journal of Computational Physics 325, 226-243 (2016). DOI: 10.1016/j.jcp.2016.08.027, Preprint: arXiv:1510.04895
M. Röhrig-Zöllner, J. Thies, M. Kreutzer, A. Alvermann, A. Pieper, A. Basermann, G. Hager, G. Wellein, and H. Fehske: Increasing the performance of the Jacobi-Davidson method by blocking. SIAM Journal on Scientific Computing, 37(6), C697–C722 (2015). DOI: 10.1137/140976017, Preprint: http://elib.dlr.de/89980/
F. Shahzad, M. Kreutzer, T. Zeiser, R. Machado, A. Pieper, G. Hager, and G. Wellein: Building a fault tolerant application using the GASPI communication layer. Accepted for FTS 2015, the 1st International Workshop on Fault-Tolerant Systems, in conjunction with IEEE Cluster 2015, September 8, 2015, Chicago, IL. Preprint: arXiv:1505.04628
L. Bakemeier, A. Alvermann, and H. Fehske: Route to chaos in optomechanics. Phys. Rev. Lett. 114, 013601 (2015). DOI: 10.1103/PhysRevLett.114.013601
M. Galgon, L. Krämer, B. Lang, A. Alvermann, H. Fehske, and A. Pieper: Improving robustness of the FEAST algorithm and solving eigenvalue problems from graphene nanoribbons. Proc. Appl. Math. Mech. 14(1), 821-822 (2014). DOI: 10.1002/pamm.201410391. Preprint
M. Kreutzer, G. Hager, G. Wellein, A. Pieper, A. Alvermann, and H. Fehske: Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems. Proc. IPDPS15, the 29th IEEE International Parallel & Distributed Processing Symposium, May 25-29, 2015, Hyderabad, India. DOI: 10.1109/IPDPS.2015.76, Preprint: arXiv:1410.5242
A. Alvermann, A. Basermann, H. Fehske, M. Galgon, G. Hager, M. Kreutzer, L. Krämer, B. Lang, A. Pieper, M. Röhrig-Zöllner, F. Shahzad, J. Thies, and G. Wellein: ESSEX: Equipping Sparse Solvers for Exascale. In: L. Lopes et al. (eds.), Proc. Euro-Par 2014 Workshops Part II, LNCS vol. 8806, Springer (2014), 577-588. DOI: 10.1007/978-3-319-14313-2_49, Preprint
A. Pieper, R. L. Heinisch, G. Wellein, and H. Fehske: Dot-bound and dispersive states in graphene quantum dot superlattices. Physical Review B 89, 165121 (2014). DOI: 10.1103/PhysRevB.89.165121. Preprint: arXiv:1404.2097
A. Pieper, G. Schubert, G. Wellein, and H. Fehske: Effects of disorder and contacts on transport through graphene nanoribbons. Physical Review B 88, 195409 (2013). DOI: 10.1103/PhysRevB.88.195409, Preprint: arXiv:1308.6079
M. Kreutzer, G. Hager, G. Wellein, H. Fehske, and A. R. Bishop: A unified sparse matrix data format for modern processors with wide SIMD units. SIAM Journal on Scientific Computing 36(5), C401–C423 (2014). DOI: 10.1137/130930352. Preprint: arXiv:1307.6209
F. Shahzad, M. Wittmann, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: A survey of checkpoint/restart techniques on distributed memory systems. Parallel Processing Letters 23(04), 1340011-1340030 (2013). DOI: 10.1142/S0129626413400112
M. Galgon, L. Krämer, J. Thies, A. Basermann, and B. Lang: On the parallel iterative solution of linear systems arising in the FEAST algorithm for computing inner eigenvalues. Parallel Computing, Available online 25 June 2015, ISSN 0167-8191, DOI: 10.1016/j.parco.2015.06.005, Preprint BUW-IMACM 14/35
Papers (submitted)
C. L. Alappat, G. Hager, O. Schenk, J. Thies, A. Basermann, A. R. Bishop, H. Fehske, and G. Wellein: A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication. Submitted. Preprint: arXiv:1907.06487
M. Galgon, L. Krämer, and B. Lang. Adaptive choice of projectors in projection based eigensolvers. Submitted. Preprint BUW-IMACM 15/07
L. Krämer: Convergence of integration based methods for the solution of standard and generalized Hermitian eigenvalue problems. Submitted. Preprint BUW-IMACM 14/30
Posters
T. Fukasawa, F. Shahzad, K. Nakajima, G. Wellein: pFEM-CRAFT: A Library for Application-Level Fault-Resilience Based on the CRAFT Framework. Poster at the 2018 SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP18), March 7-10, 2018, Tokyo, Japan.
F. Shahzad, J. Thies, M. Kreutzer, T. Zeiser, G. Hager, and G. Wellein: CRAFT: A library for checkpoint/restart and automatic fault tolerance. PhD Forum Poster at International Supercomputing Conference (ISC) 2017, June 19, 2017, Frankfurt, Germany.
M. Röhrig-Zöllner, J. Thies, M. Kreutzer, A. Alvermann, A. Pieper, A. Basermann, G. Hager, G. Wellein, and H. Fehske: Performance of Block Jacobi-Davidson Eigensolvers. Poster at SC14, The International Conference for High Performance Computing, Networking, Storage and Analysis.
M. Kreutzer: A unified sparse matrix storage format for heterogeneous systems. Poster and talk at the Early Research Showcase track at SC13 (Denver, CO), Nov 21, 2013. sc13_poster-final.pdf