
Cluster KRAKEN

KRAKEN - Hardware

The KRAKEN cluster is composed of a frontend node (access, queue control, job preparation, …)

It is prohibited to run computational tasks at the frontend node!
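Computational work belongs in the queueing system instead. As an illustration only (the scheduler and partition names on KRAKEN are assumptions here, not confirmed by this page), a minimal batch script could look like:

```shell
#!/bin/bash
# Minimal batch-job sketch. The #SBATCH directives assume a Slurm-style
# scheduler; the partition name and limits are placeholders -- check the
# site documentation for the real values.
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# The actual computation runs on a compute node, not the frontend.
echo "running on $(hostname)"
```

Submitting such a script (e.g. with `sbatch`) runs it on a compute node, keeping the frontend free for job preparation.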

Processor: AMD EPYC 7302P, 16 cores, 3.0 GHz, hyperthreading, 128 MB cache
Memory: 320 GB DDR4 3200 ECC
Disk space: 2x 960 GB NVMe M.2 SSD
Remote control: IPMI, KVM-over-Ethernet

and two computing parts (only the “M” part is accessible to all users):

M - as MultiCore

Part M contains a total of 10 compute nodes (576 cores in total, 3.33 TB RAM) built on three processor architectures:

1. Intel Broadwell, 6 nodes (kraken-m1, …, kraken-m6):

Motherboard: SUPERMICRO X10DRW-ET (2x Intel Xeon E5-2600 v4, max. 2 TB RAM, 2x 10 Gbit Ethernet, remote control)
Processors: 2x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10-3.0 GHz, 16 cores, hyperthreading, 48 MB cache
Memory: 256 GB per node, DDR4 2400 MHz ECC reg.
Disk storage: 4x 6 TB SATA (TOSHIBA MG04ACA6), 2x 1 TB SSD (Micron 5100)
Remote control: IPMI 2.0 with virtual media over LAN and KVM-over-LAN support

2. AMD Zen 2, 3 nodes (kraken-m7, …, kraken-m9), in operation since 10/2021:

Processors: 2x 2nd Gen AMD EPYC(TM) 7552, 48 cores each, 2.2-3.3 GHz, 192 MB cache (96 cores per node)
Memory: 512 GB per node, DDR4 3200 MHz ECC
Disk storage: 960 GB NVMe M.2 SSD per node
Remote control: IPMI, KVM-over-Ethernet

3. AMD Zen 4, 1 node (kraken-m10), in operation since 11/2023:

Processors: 1x 4th Gen AMD EPYC(TM) 9654P, 96 cores, 2.4-3.7 GHz, 384 MB cache
Memory: 256 GB DDR5 4800 MHz ECC
Disk storage: 960 GB NVMe M.2 SSD
Remote control: IPMI, KVM-over-Ethernet

L - as LowCore (available only to selected users)

Part L contains 4 nodes (kraken-l1, …, kraken-l4):

Motherboard: SUPERMICRO X10DRW-ET (2x Intel Xeon E5-2600 v4, max. 2 TB RAM, 2x 10 Gbit Ethernet, remote control)
Processors: 2x Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50 GHz, 4 cores, hyperthreading, 16 MB cache
Memory: 256 GB per node, DDR4 2400 ECC reg.
Disks: 4x 6 TB SATA (TOSHIBA MG04ACA6), 2x 1 TB SSD (Micron 5100)
Remote control: IPMI 2.0 with virtual media over LAN and KVM-over-LAN support

Server room temperature

Temperature sensor TR1 (currently not working). Cluster performance is limited based on the server room temperature:

  1. 32 °C - 34 °C: stop starting additional queued jobs (DRAIN mode)
  2. 34 °C - 36 °C: shut down machines (DOWN mode)

DRAIN mode is first applied to machines whose jobs are about to finish.

DOWN mode is first applied to machines running jobs with a lower “run time / declared run time” ratio.
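The two thresholds above can be sketched as a simple mapping (a minimal illustration; the mode names DRAIN and DOWN come from the policy above, while the function name and interface are assumptions, not part of the cluster software):

```python
def cooling_mode(temp_c: float) -> str:
    """Map a server-room temperature (in degrees Celsius) to a cluster mode.

    Thresholds follow the policy above; this helper is illustrative only.
    """
    if temp_c >= 34.0:
        return "DOWN"    # 34-36 C: machines are shut down
    if temp_c >= 32.0:
        return "DRAIN"   # 32-34 C: no additional queued jobs are started
    return "NORMAL"      # below 32 C: regular operation
```

For example, `cooling_mode(33.0)` yields `"DRAIN"`, so queued jobs would stop being started while running jobs continue.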

During extended periods of high temperature, the cluster is shut down in the following node order:

  1. m1-m6 nodes
  2. l1-l4 nodes
  3. m7-m10 nodes