
Data management plans


Many funding agencies require FAIR data management, i.e., data are stored for extended periods and made available for re-use. The EU in particular pushes strongly for this. For user facilities, the Lasers4EU project has opened an RDM questionnaire to generate a DMP for every user experiment.

Radboud Data Repository


Many funding bodies require data produced with their support to be stored over longer periods of time and made accessible to other researchers. For this, adherence to the FAIR principles of data management (findable, accessible, interoperable, and reusable) is often required. We propose to use the Radboud Data Repository (RDR) for storage of HFML-FELIX research data. RDR offers a well-structured way to store data and to give access to data, either through SURFconext for Dutch academic institutions or via ORCID. Published collections receive a digital object identifier (DOI), ensuring digital persistence. RDR further guarantees storage over long periods of time.

The Radboud Data Repository (RDR) distinguishes three collection types.

1. A Data Acquisition Collection (DAC), aimed at storing raw data, including metadata, for a minimum of 10 years (examples of metadata: a well-documented lab book scanned in as a PDF, or electronic lab books such as eLabFTW, OneNote or ....). This collection will be embargoed for the applicant's use for a fixed period, after which it becomes publicly accessible. We propose a period of two years, to enable the data's creator to publish the results.

2. A Research Documentation Collection (RDC), where all derivative data are stored, such as Origin projects, MATLAB scripts, figures, drafts of papers, etc. This collection will never be published, but can be useful for collaborative work on projects.

3. A Data Sharing Collection (DSC), for targeted sharing of raw data, such as the data underlying figures and tables in peer-reviewed journal articles and Ph.D. theses. DSC access can initially be limited to reviewers, but after publication the collection will be Open Access.

We propose that

  • all data acquired via proposals granted by HFML-FELIX will be stored in separate DACs. DACs will be created by local HFML-FELIX administrators (currently: JB, HE, and ...) upon awarding of the proposals, and will be managed by the project's local contact, who will grant contributing or reading access to the project's collaborators.
  • all data not subject to facility-access proposals will be stored in separate DACs linked to the (junior) researcher involved, such as a Ph.D. candidate or postdoctoral fellow. Such collections will be managed by the researcher's line manager (an HFML-FELIX scientific staff member), who will grant access.
  • data underlying peer-reviewed scientific publications will be stored in DSCs, managed by the HFML-FELIX scientific staff member involved as PI.
  • data underlying Ph.D. theses that are not already shared in a publication-linked DSC will be stored in a thesis-linked DSC.


Naming conventions

Facility access (proposal based)
  • DAC: obligatory
    Title: proposal title (proposer)
    Identifier: felix-dac-projectcode-proposer or hfml-dac-projectcode-proposer
  • RDC: optional
    Title: proposal title (proposer) documentation
    Identifier: felix-rdc-projectcode-proposer or hfml-rdc-projectcode-proposer
  • DSC: not used

PhD theses
  • DAC: obligatory
    Title: thesis title (PhD candidate name), or Thesis Jane Doe
    Identifier: thesis-last name-initials-dac
  • RDC: optional
    Title: thesis title (PhD candidate name)
    Identifier: thesis-last name-initials-rdc
  • DSC: obligatory
    Title: thesis title (PhD candidate name)
    Identifier: thesis-last name-initials-dsc

Publications
  • DSC only
    Title: publication title (PI name)
    Identifier: pub-year-journal abbreviation-PI last name-dsc
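
For illustration only (the project code, proposer, and candidate names below are made up), these conventions would give identifiers such as:

felix-dac-FELIX-2025-042-doe    (DAC of a hypothetical facility-access project, proposer Doe)
felix-rdc-FELIX-2025-042-doe    (optional RDC of the same project)
thesis-doe-jd-dac               (DAC linked to a hypothetical thesis by Jane Doe)
thesis-doe-jd-dsc               (DSC linked to the same thesis)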

Research Data Management Paragraph


Below is an example of an RDM paragraph for Ph.D. theses:

Research data management

The research conducted during this Ph.D. project and presented in this thesis has been carried out in accordance with the research data management policy of the Institute for Molecules and Materials of Radboud University, available at https://www.ru.nl/rdm/vm/policy-documents/policy-imm/.

The data presented in this thesis is available from a Radboud Data Repository Data Sharing Collection and can freely be accessed at https://doi.org/10.34973/x530-cy72.

The raw measurement data underlying these results, consisting of

  • HDF5 container files with FTICR time transients per IR wavelength
  • ASCII files of the laser power measurements
  • PDF scans of written laboratory books

are deposited in Radboud Data Repository Data Acquisition Collections. These data, as well as the software with which the HDF5 files can be analyzed, will be made available upon reasonable request to the promotor, the co-promotor or the HFML-FELIX data steward.

Raw measurement data underlying the results presented in Chapter 5 is available from a separate Radboud Data Repository Data Sharing Collection at https://doi.org/10.34973/t53x-8g50.

Update on Data Sharing Collections upon introduction of FAIR review (mid 2024)

Below is an email sent by Hans Engelkamp to HFML, detailing how the FAIR review affects the publication of Data Sharing Collections.

Dear all,

Thus far, we have been using two types of data collections for PhD projects: the DAC (data acquisition collection) and the DSC (data sharing collection). The first is important for us to ensure long-term, safe storage of all our data. The latter we use as supplementary data for theses, and it becomes publicly available after an embargo period (5 years is EMFL policy). In the DSC, we basically only store data that is directly linked to figures, tables, and calculations in your thesis. The DAC could/should be used throughout your PhD, and the DSC should typically be populated when papers and chapters are finalized. After the thesis is approved, I will publish the DSC to make the DOI active and to seal the collection.

In the DAC we are pretty much free to put whatever we want, in the way we want. For the DSC, however, it was recently decided (at Radboud level) that any collection that is to be published needs to be curated by a team of independent people to check whether it adheres to the FAIR principles. This is called “FAIR review”. In principle, they want to be as strict as possible, and we (Joost Bakker and I) want to be as pragmatic as possible. Personally, I do not want to make it easy for paper mills to use our data without proper credit, so I am already happy if the data only makes sense to people who have actually read your thesis. This is part of an ongoing discussion between me and the FAIR review people.

Radboud’s best practices for creating the DSC can be found here: https://data.ru.nl/doc/help/helppages/best-practices.html. Doing it exactly as they want is a lot of work, as it means converting all data to file formats that are at least on the accepted file types list (which, for instance, means that you would need to convert .dat files to .txt files and .eps files to .svg, and that Origin files could not be used at all), and also producing a file at the top-most level of the folder structure of the DSC which explains what data is where.*

The procedure that worked, and that for instance got Claudius’ DSC accepted without too much work for anybody, is the following:

- I include the English summary of the thesis in the description field.
- I explained why the data files did not adhere to the acceptable file format list (it is likely that the list will be updated accordingly at some point).
- I explained that the folder structure was self-explanatory and that each folder contained its own metadata file.

  • I think a good alternative would be to provide a file in the top-level folder with a brief description of the structure of the data set, without too much detail. It could, for instance, explain that the collection is structured according to chapters and that each folder contains a file with metadata, etc.

With this strategy, I think we can work for the time being.
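
For the file-format conversions mentioned in the email above, a minimal shell sketch; it assumes Inkscape 1.x (with Ghostscript) is available for the .eps -> .svg step, and the directory name dsc-files is just an example:

cd dsc-files                              # working copy of the data to be shared
for f in *.dat; do                        # plain-text .dat files only need renaming
    [ -e "$f" ] || continue
    mv "$f" "${f%.dat}.txt"
done
for f in *.eps; do                        # convert EPS figures to SVG
    [ -e "$f" ] || continue
    inkscape --export-type=svg "$f"       # writes ${f%.eps}.svg next to the input
done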

Practical use of the RDR


Checking file integrity on lilo

  • download manifest file
  • transfer manifest file to lilo
  • log in to lilo
cd <folder where data come from>
sha256sum -c <manifestfile>
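
If the manifest contains many files, it can help to show only the problems; a minimal variant of the same check (same placeholder for the manifest file):

sha256sum -c <manifestfile> | grep -v ': OK$'   # show only failed or missing files
sha256sum -c --quiet <manifestfile>             # alternative: suppress the OK lines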

Check whether all files in your local directory have been transferred


The method above only checks the integrity of the files listed in the manifest on the RDR; it does not tell you whether you forgot to transfer something. Two methods are given below, followed by a sketch for listing local files that are missing from the manifest.

  • Method 1: count the number of lines in the manifest file (optionally selecting lines with 'grep') and compare that to the number of files in the corresponding local directory
jmbakker@lilo8 2025 $ pwd
/vol/hfmlfelixdisk/data/cmp/clusters/flc-molbeam/2025
jmbakker@lilo8 2025 $  grep 250 ~/watson.manifest  | wc -l
239
jmbakker@lilo8 2025 $ find 250*atson* -type f | wc -l
239
  • Method 2: re-upload all directories you need with Cyberduck and use the 'compare' option so that existing files are not overwritten
(Screenshot from Cyberduck.)
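
A minimal sketch for listing local files that do not appear in the manifest at all. It assumes the manifest is in standard sha256sum format (one '<checksum>  <relative path>' line per file) and that the paths in the manifest are relative to the local data directory; ~/watson.manifest is the example manifest used above, and the /tmp file names are arbitrary:

cd <folder where data come from>
find . -type f -printf '%P\n' | sort > /tmp/local_files.txt
# strip the leading checksum column, keeping only the file paths
sed 's/^[^ ]*  //' ~/watson.manifest | sort > /tmp/manifest_files.txt
# print files that exist locally but are not listed in the manifest
comm -23 /tmp/local_files.txt /tmp/manifest_files.txt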