Resources for Researchers

Ready-to-go Text for Your Proposal

You can use these text blocks as is, or select only the details relevant to your proposal. If you need more information, please contact us at icds@psu.edu.

Data Management Plans: Storage and Preservation

Over the course of the research project, research data will be hosted by the Pennsylvania State University’s Institute for Computational and Data Sciences (ICDS) through its Roar supercomputer. Roar provides both active storage (for data that is being worked on, requiring frequent access) and archival storage (for longer term data storage).

Active storage is available as scratch storage on the VAST Data platform, which uses all-flash storage to provide both capacity and performance. Active storage uses regular snapshots to enable recovery from accidental data removal. ICDS also provides long-term archival services: archive data is stored on resilient storage, is available for easy retrieval, and is retained for at least the grant-mandated duration.
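
As an illustration of snapshot-based recovery, the sketch below copies a file back from a read-only snapshot directory. The ".snapshot" path layout and directory names are assumptions for illustration (common on snapshot-capable filesystems), not Roar's documented interface; consult ICDS documentation for the actual recovery mechanism.

    import shutil
    from pathlib import Path

    # Hypothetical layout: many snapshot-capable filesystems expose
    # read-only copies under a ".snapshot" directory; the actual path
    # on Roar may differ.
    work_dir = Path("/storage/work/abc123")   # hypothetical user directory
    snapshots = sorted((work_dir / ".snapshot").iterdir())

    # Restore an accidentally deleted file from the most recent snapshot.
    latest = snapshots[-1]
    shutil.copy2(latest / "results.csv", work_dir / "results.csv")
    print(f"Restored results.csv from snapshot {latest.name}")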

Data Security

ICDS implements various security measures to ensure that data stored on the Roar system remains safe. Roar requires a strong password and two-factor authentication for access, and all access can be audited by ICDS staff. To mitigate the potential for malicious software and security attacks, Roar employs automated weekly scans for identifying and patching software vulnerabilities. Roar provides the capability to encrypt data in-flight (when moving between points) and at rest (while written in storage). Roar login/endpoint nodes are protected by software-based firewalls that only permit Secure Shell (SSH) traffic. By default, ICDS enforces “Least Privilege” access concepts across the system, providing users with only the minimum set of permissions and accesses required to complete their function.

Roar storage is physically protected in Penn State’s Tower Road Data Center (TRDC). Physical access to the systems is limited to systems administration personnel with exceptions controlled by the TRDC’s secure operations center. TRDC requires swipe-card access and is monitored at all times.

Data Center Facilities

Roar equipment is housed in a newly constructed data center facility at Penn State’s University Park campus. This facility operates in compliance with all Penn State IT policies. The facility provides 2.15MW of power capacity and contains 12,000 square feet of floor space for computing equipment, termed the data center “white space.” The building is powered efficiently and is undergoing LEED certification. The facility is designed to operate with an annualized average Power Usage Effectiveness (PUE) of 1.21.
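
PUE is the ratio of total facility power to the power delivered to IT equipment, so a PUE of 1.21 means roughly 0.21W of overhead (cooling, power distribution losses, lighting) for every watt of computing. A minimal illustration, using hypothetical load figures chosen to match the design target:

    # PUE = total facility power / IT equipment power.
    # Load figures are hypothetical, chosen only to illustrate
    # the 1.21 design target.
    it_load_kw = 1000.0    # power drawn by IT equipment
    overhead_kw = 210.0    # cooling, UPS losses, lighting, etc.
    pue = (it_load_kw + overhead_kw) / it_load_kw
    print(f"PUE = {pue:.2f}")    # PUE = 1.21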

Power (2N configuration):

  • 2 independent utility feeds.
  • Dual 2MW diesel generators, with sufficient fuel capacity to run for a minimum of 48 hours.
  • All power is backed up by static uninterruptible power supplies (UPS).
  • Power provided in a Tier 1 configuration (single non-redundant distribution path serving the IT equipment; non-redundant capacity components with expected availability of 99.671%) and a Tier 3 configuration (multiple independent distribution paths serving the IT equipment; concurrently maintainable site infrastructure with expected availability of 99.982%).
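
The quoted availabilities translate directly into expected downtime per year, as the short calculation below shows:

    # Convert the quoted tier availabilities into expected downtime per year.
    HOURS_PER_YEAR = 8760
    for tier, availability in [("Tier 1", 0.99671), ("Tier 3", 0.99982)]:
        downtime_h = (1 - availability) * HOURS_PER_YEAR
        print(f"{tier}: ~{downtime_h:.1f} hours of expected downtime per year")
    # Tier 1: ~28.8 hours/year; Tier 3: ~1.6 hours/year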

Cooling (N+1 configuration):

  • Primary cooling is provided via indirect evaporative means with cold air supplied to the whitespace via a raised floor plenum.
  • All racks exhaust from the rear into a hot-aisle containment system.
  • A select number of racks are fitted with rear-door heat exchangers on chilled-water piping to accommodate the highest rack power densities as needed.

Building control and monitoring:

  • An automation system for mechanical system controls, rack-level monitoring, an electrical power monitoring system, and a data center infrastructure management system.
  • Fire protection includes fire alarms and a Very Early Warning Smoke Detection Apparatus (VESDA).
  • All environmental conditions, systems and networks are monitored from an operations center that is staffed on a 24x7x365 basis.

Security:

  • All people within the data center must display an authorized Penn State ID at all times, and visitors must check in at a security station to receive a visitor/escort badge.
  • Electronic locking on all doors throughout the spaces, with additional authorization required for whitespace and mechanical areas.
  • Security is monitored by video surveillance cameras located inside and outside of the facility to capture and monitor all activity in protected areas.

Roar Equipment

The Roar high-performance research cloud is composed of hardware that is interconnected over high-speed network fabrics and includes various software offerings and services.

Hardware

Roar currently maintains more than 40,000 computational cores. Roar offers a variety of compute-node configurations, including systems with the latest NVIDIA GPUs, systems with more than 1.5TB of memory, and systems with HDR InfiniBand.

The compute nodes are sourced from tier 1 suppliers and have maintenance agreements to ensure consistent availability.

Roar also maintains over 19PB of available storage capacity, divided into pools with differing performance and capacity characteristics, including both parallel data storage and long-term archival storage. The compute and storage hardware are interconnected using high-speed Ethernet and, in some cases, InfiniBand network fabrics, with connectivity ranging from 10Gbps to 200Gbps depending on the needs of each node configuration.

Software

Roar maintains and regularly updates an expansive software stack. The stack currently contains 240 applications, with more added at regularly scheduled intervals. The applications include security monitoring software (e.g., OSSEC), batch schedulers (e.g., Slurm), compilers, file transfer programs, and parallel programming libraries (e.g., MPI, OpenMP). The system also contains software applications commonly used by researchers, such as MATLAB, COMSOL, R, and Python, as well as programs for performing specialized tasks, such as Abaqus, QuantumWise, and TopHat.
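
As a brief illustration of the stack in use, the following is a minimal MPI "hello world"; mpi4py is assumed here as the Python binding to the MPI libraries mentioned above, and the srun launch line reflects the Slurm scheduler also listed in the stack:

    # hello_mpi.py -- minimal MPI example, assuming mpi4py is available
    # in Roar's Python stack. Launch with, e.g.: srun -n 4 python hello_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()    # this process's ID within the communicator
    size = comm.Get_size()    # total number of MPI processes
    print(f"Hello from rank {rank} of {size}")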

Roar Support

Roar is maintained by the ICDS staff, who provide network monitoring, backup services, software updates, code optimization, and service-desk support. ICDS monitors the health and status of the network, hardware, and storage. Roar is actively monitored during normal business hours (9:00 AM – 5:00 PM), Monday through Friday. Roar also uses tools such as Splunk and Nessus Professional to monitor the system and scan for potential threats, such as intrusion attempts and Denial of Service (DoS) attacks.

The ICDS website offers documentation to help users resolve technical issues they may encounter. This support is supplemented by the i-ASK Center, a service desk which supplies expert technical assistance for user problems. In the event of more complex issues, the engineers of the ICDS Technical Support Team provide advanced in-person support to users to ensure that problems are resolved in a timely and professional manner.

Roar Security Information

The Institute for Computational and Data Sciences Roar system implements the following security measures:

  • Electronic Security
  • Physical Security
  • Controls for Servers / Data Access
  • Data Destruction

Electronic Security

The Roar architecture enables electronic security through file access controls and mitigation of software vulnerabilities. Roar provides the capability to audit all system access and requires a strong password and two-factor authentication. To mitigate the potential adverse impacts of malicious software and security attacks, Roar uses automated mechanisms to identify and patch software vulnerabilities.

Physical Security

Roar is deployed in secure data facilities located on University premises. Each data center requires card-swipe and/or PIN access to gain entrance into the physical space. Access is limited to systems administration personnel, with exceptions controlled by the Information Technology Services (ITS) Secure Operations Center (SOC). The data center has successfully completed a DCAA audit.

Controls for Servers / Data Access

Roar login/endpoint nodes are protected by software-based firewalls which permit only Secure Shell (SSH) traffic; all other connections are immediately dropped. Data and services hosted on Roar are not discoverable from the public internet. By default, Roar enforces Least Privilege access concepts across the system, providing users with only the minimum set of permissions and accesses required to complete their function. File systems are secured with standard POSIX-based Access Control Lists (ACLs) as well as standard Unix directory and file permissions. This enables individual accounts to be organized into groups; a Principal Investigator (PI) may designate specific users in the PI’s group to access certain data. Group access to sensitive data, such as genomic and phenotypic data, is granted only with the consent of the responsible PI. Users are permitted access only to data they have permission to view; for example, a user in one group with access to NIH data is not by default granted access to the NIH data of another group.
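
As a sketch of how such group-based control might be arranged, the example below restricts a project directory to its owning group and then grants a second group read access via a POSIX ACL. The path and group names are hypothetical, and setfacl is the standard Linux utility rather than an ICDS-specific tool:

    import subprocess
    from pathlib import Path

    # Hypothetical project directory owned by a PI's group.
    project_dir = Path("/storage/group/pi_lab/sensitive_data")

    # Least Privilege: owner has full access, the owning group may read
    # and traverse, and all other users are denied.
    project_dir.chmod(0o750)

    # Grant a named collaborator group read/traverse access via an ACL,
    # without widening permissions for anyone else.
    subprocess.run(
        ["setfacl", "-m", "g:collab_group:rx", str(project_dir)],
        check=True,
    )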

Data Destruction

Data stored on Roar is snapshotted daily, and each snapshot remains active for a period of 90 days. Snapshots are automatically purged once the 90-day period has elapsed.
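
In other words, a snapshot becomes eligible for purging 90 days after it is taken; a minimal sketch of that retention rule (snapshot names and dates are hypothetical):

    from datetime import date, timedelta

    RETENTION = timedelta(days=90)

    # Hypothetical snapshot creation dates.
    snapshots = {"daily_2024-01-05": date(2024, 1, 5),
                 "daily_2024-06-01": date(2024, 6, 1)}

    today = date(2024, 6, 15)
    for name, created in snapshots.items():
        status = "purge" if today - created > RETENTION else "keep"
        print(f"{name}: {status}")    # first: purge, second: keep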

All PIs, along with Roar and PSU IT leadership, are required to sign an NIH Compliance document prior to storing any relevant data on Roar.

Roar meets the standards laid out in NIH’s “Security Best Practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing (GDS) Policy” document. Roar is compliant with NIST Special Publication 800-171, “Protecting Controlled Unclassified Information in Nonfederal Information Systems and Organizations.” Roar is also pursuing FedRAMP certification, which, once achieved, will further strengthen these protections and support compliance with the NIST Special Publications and associated security requirements referenced by agencies such as the National Institutes of Health (NIH).