Suggestions for building a community-wide portal for open access to plasma data sets

I just had a chat with Alisdair Davey, who is the DKIST data center scientist and has been one of the core people behind the Virtual Solar Observatory (VSO). I asked him what advice the plasma community should keep in mind if we were to create a similar data portal for open access to experimental and simulation plasma data sets. His thoughts and suggestions were:

  • The creators of a portal will probably need to offer assistance to groups for making the data openly available.
  • It may be necessary to provide a central repository to hold the data so that groups don’t have to come up with something themselves.
  • We as a community will probably need to create our own data model. We could find ~10 people to provide a typical data set, figure out what is common, and construct a data model based on that. This will very likely be a lot of work. (Yay openPMD!)
  • We will probably need to spend some time persuading people of the benefits of making data openly available. He suggested pointing to some of the references that show the benefits. (What I usually do is point to an issue of the Astrophysical Journal, and say that a very large fraction of these papers were able to happen because of open data policies.)
  • There will likely be objections that “people will misuse the data.” This speaks to the need for good documentation, which we should encourage. Moreover, groups should only make science-quality data available.
  • We should have the data model ready and insist on people following standards. This will make our lives easier.
  • It might be necessary to help groups with reformatting data sets to fit standards. This may have the side benefit that having someone go through a data set will help uncover issues (e.g., inconsistencies in the metadata), which can then be corrected.
  • VSO originally was about five people, each working between half time and full time on it.
  • We should insist that data included in the repository/database be irrevocably open access (e.g., under CC-BY 4.0), as there can be problems if the creators revoke data access later on.
  • Specific to the US: NSF has new open access policies that should apply to new projects. It may be worth asking program managers at NSF to send a letter strongly encouraging groups to make their data open access. It would also be worth looking into DOE’s data access policies and talking with DOE program managers.
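
As a rough illustration of the data model suggestion above (collect around ten typical data sets and figure out what is common), here is a minimal Python sketch of that first pass. It is only a sketch under assumptions: the function name `common_fields`, the field names, and the units are all made up for illustration, and real metadata would first have to be extracted from each group's files.

```python
from collections import Counter


def common_fields(samples, min_fraction=1.0):
    """Return the metadata fields present in at least min_fraction of the samples."""
    # Count how many samples contain each field name.
    counts = Counter(key for meta in samples for key in set(meta))
    threshold = min_fraction * len(samples)
    return sorted(key for key, n in counts.items() if n >= threshold)


# Hypothetical metadata (field name -> unit) from three contributed data sets.
samples = [
    {"time": "s", "density": "m^-3", "magnetic_field": "T"},
    {"time": "s", "density": "cm^-3", "temperature": "eV"},
    {"time": "ms", "density": "m^-3", "magnetic_field": "G"},
]

core = common_fields(samples)           # fields every sample provides
frequent = common_fields(samples, 0.6)  # fields at least 60% of samples provide

print("core fields:", core)
print("frequent fields:", frequent)
```

Fields that show up in every sample are candidates for required entries in the data model, while fields that show up in most samples might become optional ones. The mismatched units in the made-up samples are the kind of inconsistency that a standard would need to settle.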