Concomitant use of therapeutic drugs and natural products, including vitamins, minerals, herbal medicinal products, and other botanicals, is a frequent occurrence. Concomitant exposure to natural products and approved pharmaceutical therapies raises concerns of possible natural product-drug interactions (NPDIs) that could lead to patient harm. Research on NPDIs includes studies that characterize natural product chemical constituents, elucidate pharmacologic mechanisms, and identify the potential clinical impact of NPDI exposure. The datasets that arise from these studies range from mass spectrometric data to in vitro and in vivo pharmacokinetic data. There is a growing recognition by both researchers and agencies that fund research that NPDI study datasets should be more findable, accessible, interoperable, and reusable (FAIR). The purpose of this Community Group Report is to propose a set of recommended approaches to help NPDI researchers make their data more FAIR. This report provides FAIR data recommendations organized into general and research-specific categories. The recommendations are written in a manner intended to help researchers quickly identify and adopt those FAIR data practices most relevant to their research.

This draft is currently open for editing suggestions by all interested stakeholders.

This document is governed by the W3C requirements of the Community and Business Group Process related to deliverables.

Introduction

Concomitant use of therapeutic drugs and natural products, including vitamins, minerals, herbal medicinal products, and other botanicals, is a frequent occurrence. One study found that approximately 50% of Americans in their midlife were exposed to prescription drugs simultaneously with dietary supplements, many of which were natural products [[Kiefer-2014]]. Concomitant exposure raises concerns of possible natural product-drug interactions (NPDIs) leading to patient harm. Spanakis et al. reviewed the evidence for 40 top-selling herbal medicinal products and found that 6 were implicated in clinically significant pharmacokinetic NPDIs [[Spanakis-2019]]. Posadzki et al. reported several adverse outcomes of NPDIs including transplant rejection, cardiovascular collapse, renal and liver toxicity, and death [[Posadzki-2013]]. Wilkinson et al. conducted a review that identified 81 NPDIs between 35 natural products and 31 drug classes [[Wilkinson-2016]].

Research on NPDIs includes studies that characterize natural product chemical constituents, elucidate pharmacologic mechanisms, and identify the potential clinical impact of NPDI exposure. The datasets that arise from these studies range from mass spectrometric data to in vitro and in vivo pharmacokinetic data. The multidisciplinary nature of NPDI research means that there is a large variety of approaches to archiving, sharing, and describing the datasets (i.e., metadata).

In the face of this complexity, there is a growing recognition by both researchers and agencies that fund research that NPDI study datasets should be more findable, accessible, interoperable, and reusable (FAIR)[[McAlpine-2019]][[NIH-NMR-FAIR]]. This report proposes a set of recommended approaches to help NPDI researchers make their data more FAIR. After clarifying the intended audience, scope, and context, this report provides recommendations organized into general and research-specific categories. To obtain a quick overview of the recommendations and follow links to specific FAIR practices, see the recommendations listing.

Audience

This report is written for anyone who generates NPDI research data. These include, but are not limited to, analytical chemists and clinical pharmacologists. Other natural product stakeholders who will also benefit from the information in this report include funding agencies that support NPDI research, biomedical research librarians tasked with searching and/or indexing NPDI datasets, and data scientists working on natural product related analyses.

Scope

The following NPDI data are within the scope of this report:

Chemical analytical: Data derived from assays that identify, characterize and/or quantify the chemical constituents present in natural products.

Chemical metabolomics: Data derived from assays that assess similarity and differences between natural products based on their chemical constituents.

In vitro: Data derived from in vitro systems include parameters describing natural product binding (fu plasma, fu mic), metabolism and transport (Km, Vmax), inhibition potency (IC50, Ki), and induction potential (EC50, Emax).

IVIVE: in vitro to in vivo extrapolation approaches encompass numerous mathematical models to predict the magnitude of natural product-drug interactions. Mathematical models include static interaction models, mechanistic static models, and fully mechanistic physiologically-based pharmacokinetic interaction models.

Clinical: Appropriately designed clinical pharmacokinetic interaction studies in which a natural product is the precipitant and a drug (or drugs) is the object can provide a definitive assessment of interaction risk. Clinical data include pharmacokinetic outcomes (AUC, Cmax, Tmax, t1/2) describing object drug disposition in the absence and presence of the precipitating natural product.

Context

provides an overview of the context for this report. The contributors to this report considered both the types of data and the data management workflows specific to chemistry and pharmacology researchers involved in NPDI research. They then reviewed journal articles, white papers, and other types of reports that make recommendations on how to make biomedical datasets more FAIR. They also reviewed specific FAIR-related requirements that are emerging from organizations that fund NPDI research and from journals that publish NPDI studies. All items that the contributors reviewed are cited in the recommendations below. This report is intended to make the recommendations and requirements from these sources more accessible to researchers who study natural product chemistry and NPDI pharmacology. It distinguishes recommendations that apply generally to all types of NPDI research from those that are specific to natural product chemistry and NPDI pharmacology.

The context for this report is recognition that a number of recommended approaches exist that will help make natural product-drug interaction study data more findable, accessible, interoperable and reusable (FAIR).

Recommendation Template

In this report, all recommended approaches are provided using the template below. This template is similar to the template used in the World Wide Web Consortium Data on the Web Best Practices Recommendation [[dwbp]].

Recommendation Template

Short description of the recommendation

Why

This section answers two crucial questions:

A short description of the problem addressed by the recommended approach is also provided.

Intended Outcome

The benefits that NPDI researchers and the research community can expect if researchers follow the recommended approach. A recommendation can yield one or more benefits.

Possible Approach to Implementation

A description of a possible implementation strategy based on the state of the art.

How to Test

Information on how to test that a researcher has followed the recommended approach. In many cases, testing can be automated.

Example

A short example of the recommended approach applied to data emerging from NPDI research.

Recommendations that apply generally to NPDI research

Assign to a dataset a globally unique and persistent identifier such as a DOI (Findability, Reuse)
Refer to the globally unique and persistent identifier in relevant publications (Findability, Reuse)
Provide a license for data usage (Reuse)
Make the data update frequency and versioning information explicit (Access, Reuse)
Deposit a description of the dataset (metadata) in a public data repository (Findability, Access, Reuse)
Use an existing machine readable metadata standard (Findability, Reuse)
Strive to use standard data elements (Findability, Interoperability, Reuse)
Give future users of the data a way to provide feedback (Reuse)
Whenever possible, make the full dataset downloadable (Access, Reuse)
If your lab website hosts datasets, serve the data using content negotiation (Access, Reuse)
If possible, provide access to data via a well-documented RESTful API (Findability, Access, Reuse)

Recommendations specific to natural product chemistry

Provide raw data using an open, or at least well-documented, file format (Access, Reuse)
Publish NMR and spectroscopic results following standards (Access, Reuse)
Report experiment metadata along with the spectral raw data file (Findability, Interoperability, Reuse)
Use reproducible, and well-documented, NMR and spectroscopic analysis workflows (Access, Reuse)

Recommendations specific to NPDI pharmacology

Give a clear description of the experiment (Reuse)

Recommendations that apply generally to NPDI research

Assign to a dataset a globally unique and persistent identifier such as a DOI

NPDI researchers should assign to datasets generated by their research a globally unique and persistent identifier. If possible, the identifier should also be resolvable.

Why

Researchers will more easily and reliably locate and cite NPDI datasets over a longer period of time than if no identifiers are assigned. When a dataset’s identifier is globally unique, references to the dataset are non-ambiguous. When the dataset’s identifier is persistent, the dataset is findable for a longer period of time than otherwise.

Intended Outcome

The NPDI research community stands to benefit from the more rapid and non-ambiguous location of NPDI datasets over a longer period of time than if this recommendation is not followed.

  • Findability
  • Reuse

Possible Approach to Implementation

There are many ways to create globally unique and persistent identifiers, including a Globally Unique Identifier (GUID) generator, a program that creates Uniform Resource Identifiers using a Uniform Resource Name (URN), a service that creates Persistent Uniform Resource Locators (PURL), or a service that creates Digital Object Identifiers (DOIs).

If the researcher intends for the dataset identifier to also resolve to a location on the Internet, the researcher could obtain a DOI through a service that works with DataCite.org, such as Zenodo, Figshare, the Harvard Dataverse, or Dryad. If the researcher's institution has an academic library system, the library might have a method to obtain DOIs. For various reasons, a DOI might not be the best solution for a given research dataset. Some NPDI researchers share data on a laboratory website that has a specific domain name. In these cases, it might be possible for the researcher to create globally unique and persistent identifiers by combining the domain name with a GUID, URN, or a globally unique identifier that they create.
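
As a minimal sketch of the last option, the following Python snippet combines a lab domain name with a randomly generated GUID. The domain `https://data.example-lab.org` and the function name `mint_identifier` are hypothetical and used only for illustration. The GUID provides global uniqueness; persistence is not something code can guarantee and depends on the lab committing to keep the resulting URL stable.

```python
import uuid

# Hypothetical lab domain (an assumption for this sketch); replace it with
# a domain that the lab expects to control over the long term.
BASE_URL = "https://data.example-lab.org"


def mint_identifier(base_url: str = BASE_URL) -> str:
    """Return a new dataset identifier of the form <base_url>/<GUID>."""
    # uuid4() generates a random 128-bit GUID, so collisions are
    # practically impossible even without a central registry.
    guid = uuid.uuid4()
    return f"{base_url.rstrip('/')}/{guid}"
```

Each call returns a different identifier, so the value should be generated once per dataset and then recorded alongside the dataset rather than regenerated.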

How to Test

In this case, design leads to outcome. If the dataset identifiers are DOIs, GUIDs, URNs, or PURLs, then they automatically satisfy this recommendation. If not, evaluate whether the creator(s) of an NPDI dataset explicitly state that the identifiers were designed for persistence and global uniqueness.

Example

Data from all studies and experiments deposited in the NaPDI repository [[birer-williams]] are uniquely identified by a short persistent identifier created by combining a short unique key with "https://repo.napdi.org". For example, the study "Inhibitory effects of commonly used herbal extracts on UGT1A1 enzyme activity." is uniquely identified as https://repo.napdi.org/NPDI-S1YNJQ

Refer to the globally unique and persistent identifier in relevant publications

NPDI researchers should include mention of the globally unique and persistent identifier assigned to a given dataset in all publications and reports based on the dataset.

Why

Researchers will more easily and reliably locate and cite NPDI datasets over a longer period of time if they know unequivocally what specific dataset a publication or report is based on. One of the main benefits of this practice is that some services that index the scientific literature will index mentions of the identifier. This allows researchers to locate publications related to a given dataset simply by searching with the identifier.

Intended Outcome

The NPDI research community stands to benefit from the more rapid and non-ambiguous location of NPDI datasets over a longer period of time than if this recommendation is not followed.

  • Findability
  • Reuse

Possible Approach to Implementation

Once a globally unique and persistent identifier exists for a given dataset (see the recommendation Assign to a dataset a globally unique and persistent identifier such as a DOI), researchers can write the identifier directly into an appropriate section in the publication or report that they are authoring. The appropriate location for mention of the identifier depends on the nature of the publication and the relevant formatting requirements. In some cases, researchers can simply mention the identifier within the methods section. Other times, it might be more appropriate to use a data citation formatted according to the norms of the researcher's discipline [[datacite]].

How to Test

In this case, design leads to outcome. If all relevant publications mention the dataset identifier, then they automatically satisfy this recommendation.

Example

All of the studies entered into the NaPDI repository [[birer-williams]] on Cannabis sativa were listed in the supplement to a published report with the persistent identifiers: https://dmd.aspetjournals.org/content/dmd/suppl/2020/06/29/dmd.120.000054.DC1/54_Supplemental_Material_table_2.pdf

Deposit a description of the dataset (metadata) in a public data repository

NPDI researchers should deposit dataset metadata in a public data repository that is indexed by major search engines.

Why

Dataset metadata informs other researchers about who created a given dataset, its purpose, when it was created, and how to access it [[dwbp]]. Depositing dataset metadata in a publicly available resource that is indexed by major search engines means that this information will be more easily found by other researchers.

Intended Outcome

Depositing dataset descriptions in public repositories is an important step toward increasing the findability, accessibility, and reuse of data generated by the NPDI research community.

  • Findability
  • Access
  • Reuse

Possible Approach to Implementation

There are a number of repositories that a researcher could use to satisfy this recommendation, including various Dataverse sites [[dataverse]], Zenodo [[zenodo]], Figshare [[figshare]], and Dryad [[dryad]]. A common feature of these repositories is that they ensure dataset descriptions are readable to both humans and computer programs such as web search crawlers.

It is relatively common for a research lab to maintain a website that provides links to datasets created by lab members. A description of each dataset could be created and deposited in the aforementioned repositories. Alternatively, the lab could write a human readable dataset description in a location that is easy for website visitors to find. However, this alone will not enable web search crawlers to index the metadata so that the datasets are more easily found via web search. To accomplish that, the website should provide the dataset description in the HTML head of the web page as machine readable metadata using a format such as JSON-LD [[JSON-LD]] or HTML+RDFa [[html-rdfa]]. Examples of both human and machine readable dataset metadata are available in the Metadata section of the Data on the Web Best Practices W3C Recommendation [[dwbp]].
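
To illustrate the lab website option, the following Python sketch serializes a dataset description as a JSON-LD script element that could be placed in the HTML head of a dataset page. The record values, the example URL, and the function name `jsonld_script_tag` are placeholders invented for this illustration; only the `application/ld+json` script type and the Schema.org `Dataset` type come from the respective standards.

```python
import json

# All values below are placeholders for illustration only.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example NPDI in vitro inhibition dataset",
    "description": "Placeholder description of the study and its data.",
    "url": "https://data.example-lab.org/datasets/example",
}


def jsonld_script_tag(record: dict) -> str:
    """Serialize a metadata record as a JSON-LD <script> element."""
    body = json.dumps(record, indent=2)
    # Escape "</" so that text inside the JSON cannot close the script
    # element early; "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Calling `jsonld_script_tag(metadata)` returns a string that can be pasted or templated into the page head, where a crawler that understands JSON-LD can read it.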

How to Test

If the researcher deposits a dataset description in a searchable resource, then they automatically satisfy this recommendation.

Example

The data and a complete metadata description of drug interaction study data from the drug approval package for Epidiolex (Cannabidiol) were extracted to a CSV file and posted on the Zenodo data sharing site: https://zenodo.org/record/1406038

Use an existing machine readable metadata standard if publishing data on a lab website

As the recommendation Deposit a description of the dataset (metadata) in a public data repository mentions, many labs distribute datasets from a website that they maintain. The machine readable metadata that researchers put on their website could be represented in a number of different ways in JSON-LD or HTML+RDFa [[html-rdfa]]. However, researchers should use an established metadata standard such as the Dublin Core Metadata Initiative [[dcmi]], Schema.org [[schema.org]], or the Data Catalog Vocabulary [[dcat]].

Why

A dataset produced from any kind of NPDI study will be more easily found and reused by other researchers if important facts about the dataset are described using standardized metadata elements that make sense for the domain. For example, Google's Dataset Search indexes sites that provide metadata using Schema.org [[google-data-search]].

Intended Outcome

Researchers should be able to find data on the lab website much more easily. The use of standards that are used by data search engines should increase the specificity of natural product-drug interaction search results. The metadata can provide clarity on data ownership and access rights that will help facilitate re-use.

  • Findability
  • Reuse

Possible Approach to Implementation

Established metadata standards include the Dublin Core Metadata Initiative [[dcmi]], Schema.org [[schema.org]], the Data Catalog Vocabulary [[dcat]], and others [[dwbp]]. Researchers with little programming experience can use tools such as Catalog Generator [[catalog-generator]], which can export standardized metadata based on information that a researcher enters into a web form or uploads as a spreadsheet. Other approaches include using the Center for Expanded Data Annotation and Retrieval (CEDAR) workbench [[cedar]] or the tools listed on the Project Open Data metadata resources site [[project-metadata-tools]].

In terms of content, at a minimum the researcher should provide the dataset name, identifier (see the recommendation Assign to a dataset a globally unique and persistent identifier such as a DOI), description, creator, date of last modification, date of publication (if applicable), and distribution formats (if applicable) as structured metadata. While many established metadata standards provide attributes for general use cases, they might not include attributes relevant to a given research domain. In such cases, the researcher may use custom metadata attributes that will be recognizable by, and appropriate to the use cases of, the researcher's scientific peers.
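
The minimum content listed above can be checked mechanically before metadata is published. The following Python sketch assumes the metadata is written with Schema.org property names (`name`, `identifier`, `description`, `creator`, `dateModified`); that mapping, the function name `missing_fields`, and the example record are assumptions made for illustration rather than a prescribed checklist.

```python
# Schema.org property names assumed to correspond to the minimum content
# listed above. datePublished and distribution are treated as optional
# because the text marks them "if applicable".
REQUIRED = ("name", "identifier", "description", "creator", "dateModified")
OPTIONAL = ("datePublished", "distribution")


def missing_fields(record: dict) -> list:
    """Return the required properties that are absent or empty in record."""
    return [key for key in REQUIRED if not record.get(key)]


# Placeholder record for illustration; none of these values are real.
example = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example NPDI dataset",
    "identifier": "https://data.example-lab.org/datasets/example",
    "description": "",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

print(missing_fields(example))  # ['description', 'dateModified']
```

An empty string counts as missing in this sketch, since an empty description gives a search engine or a reader nothing to work with.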

How to Test

Researchers can utilize Google's Structured Data Testing Tool [[google-data-testing]] to ensure that the metadata they create will be processed by Google properly. Researchers can copy and paste their metadata into the tool's form or provide a URL to a website page that has their metadata in the head of the HTML document.

Example

Every web page for every study and experiment deposited in the NaPDI repository [[birer-williams]] has a JSON-LD metadata description written using Schema.org tags. For example, the Green Tea (Camellia sinensis) metabolomics study web page contains the following metadata description, which has been validated as readable by Google Dataset Search:

    {
        "@context": "http://schema.org",
        "@graph": [
            {
                "@context": "http://schema.org",
                "@type": "Organization",
                "name": "Center of Excellence for Natural Product-Drug Interaction Research",
                "url": "https://napdicenter.org",
                "logo": {
                    "@type": "ImageObject",
                    "url": "https://repo.napdi.org/images/NaPDI_home_page_logo_v1.png"
                }
            },
            {
                "@type": "DataCatalog",
                "name": "Green Tea Metabolomics",
                "description": "",
                "about": {
                    "@context": "http://schema.org",
                    "@type": "Substance",
                    "description": "Natural product",
                    "name": "Green Tea (Camellia sinensis)",
                    "alternateName": "Green Tea Extract",
                    "code": {
                        "@type": "MedicalCode",
                        "codeValue": "659476",
                        "codingSystem": "RxNorm"
                    }
                },
                "publisher": {
                    "@type": "Organization",
                    "name": "Center of Excellence for Natural Product-Drug Interaction Research",
                    "url": "https://napdicenter.org",
                    "logo": "https://repo.napdi.org/images/NaPDI_home_page_logo_v1.png"
                },
                "license": "https://creativecommons.org/licenses/by-nc-sa/4.0/",
                "funder": {
                    "@type": "Organization",
                    "name": "National Center for Complementary and Integrative Health",
                    "url": "https://nccih.nih.gov"
                },
                "url": "http://repo.napdi.org/NPDI-jCrpqg",
                "identifier": [
                    {
                        "@context": "http://schema.org",
                        "@type": "PropertyValue",
                        "name": "NaPDI Study ID",
                        "value": "null"
                    }
                ],
                "author": [],
                "isPartOf": {
                    "@type": "DataCatalog",
                    "name": "Natural Product-Drug Interaction Research Data Repository",
                    "url": "http://repo.napdi.org"
                },
                "dataset": [
                    {
                        "@type": "Dataset",
                        "name": "Green Tea Metabolomics Experiment",
                        "url": "http://repo.napdi.org/NPDI--cZmFQ",
                        "license": "https://creativecommons.org/licenses/by-nc-sa/4.0/",
                        "description": "Metabolomics - Green Tea (Camellia sinensis) - Green Tea Metabolomics Experiment "
                    }
                ]
            }
        ]
    }
	      

Whenever possible, make the full dataset downloadable

Make the full dataset associated with your NPDI experiments downloadable to other interested researchers.

Why

Sharing detailed research data is associated with an increased citation rate for an article (Piwowar, Day, and Fridsma 2007). Although not universal, data sharing is increasingly required by journals as a solution to the “replication crisis” where, for example, natural product researchers re-isolate previously described molecules (Spicer and Steinbeck 2017).

Intended Outcome

Straightforward access to the data discussed in reports and journal articles helps advance science. It simplifies referee work, makes it easier to compare multiple datasets, and improves data reuse.

  • Access
  • Reuse

Possible Approach to Implementation

For proprietary datasets, it is recommended to provide at least part of the dataset (Guha and Willighagen 2017) and/or a description of it. There are numerous repositories, and researchers have to be careful about ones that are not maintained and/or don’t follow standards. Widely used repositories that follow good practices include Dataverse and Zenodo. For MS raw files, results, and metadata, NPDI studies could use a more specific repository such as MetaboLights, which is based entirely on open source technology and uses open data formats and community-based reviews (https://www.ebi.ac.uk/metabolights/). MetaboLights adheres to MSI standards for metadata reporting and uses the ISA-tab format for metadata. For NMR, the COSMOS initiative (Salek et al. 2015) is similar and could be used by NPDI researchers to upload and share their NMR data in the nmrML format. Some users will benefit from the simplicity of a single file, while others will want to access only a part of the dataset. Email is very commonly used but does not qualify as making data accessible because it depends on the data owner pushing the data to the recipient.

The researcher who shares data may want to provide a means for future users of the data to ensure that the dataset has been downloaded properly and has not been modified. To do so, the researcher may generate a signature of the data they are sharing using either cryptographic or hash approaches. The signature of the data can then be shared along with the dataset with instructions on how future users can confirm data integrity.
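
One common way to do this is to publish a cryptographic hash of each file. The following Python sketch uses SHA-256 from the standard library; the choice of SHA-256 and the function names `sha256_of_file` and `verify` are assumptions of this example, and other hash algorithms could be substituted.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large raw data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

The data owner would run `sha256_of_file` once and publish the resulting hex string next to the download link; a future user runs `verify` on the downloaded copy with that published string. A hash detects accidental corruption or modification, but it does not by itself prove who produced the file.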

How to Test

Researchers should try downloading their data, ideally from outside their own institution's network and without being logged in, to confirm that it is actually openly accessible.

Example

Every study and experiment deposited in the NaPDI repository [[birer-williams]] is downloadable as a CSV file or a JSON document.

If your lab website hosts datasets, serve the data using content negotiation

If your lab website hosts datasets, serve the data using content negotiation

Why

Content negotiation makes it possible to use a single URL for each dataset rather than a different URL for each format (e.g., Excel, CSV, XML, JSON, etc.). Clients can request whatever format they need by specifying the format type in the HTTP request.

Intended Outcome

A single URL to a given dataset makes access much easier. Also, a single URL is easier to reference and track, increasing the potential for re-use.

  • Access
  • Reuse

Possible Approach to Implementation

A possible approach to implementation is to read the Accept header of the request inside a controller method. The client sets the Accept header to specify the response format it expects, and the controller then dispatches to the method that returns the matching representation.
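
The dispatch step can be sketched independently of any web framework. The following Python functions parse an Accept header and select one of the formats a server offers. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it honors quality values (`q`) and wildcards but ignores other media type parameters and specificity rules. The function names are invented for this example.

```python
def parse_accept(header: str) -> list:
    """Parse an Accept header into (media_type, q) pairs, best first."""
    parsed = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q has weight 1
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        parsed.append((media_type, q))
    # sorted() is stable, so types with equal q keep the client's order.
    return sorted(parsed, key=lambda item: item[1], reverse=True)


def choose_representation(header: str, available: list, default: str = None):
    """Pick the first available media type that the client accepts."""
    for media_type, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return default
```

A controller could call `choose_representation(request_accept_header, ["application/json", "text/csv"])` and route to the JSON or CSV serializer accordingly. When the function returns `None`, a server would typically respond with HTTP status 406 (Not Acceptable) or fall back to a default format.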

How to Test

List the representations available for the resource and request each one by specifying the corresponding media type in the Accept header of the HTTP request.

Example

The data for every study and experiment deposited in the NaPDI repository [[birer-williams]] can be requested using its identifier as either a CSV file or a JSON document. For example, different representations of the Green Tea study https://repo.napdi.org/NPDI--2FOpA can be served according to the specified content type of the HTTP Request:

	      curl -L -H "Accept: text/csv" https://repo.napdi.org/NPDI--2FOpA
	      curl -L -H "Accept: application/json" https://repo.napdi.org/NPDI--2FOpA
	    

Make the data update frequency and versioning information explicit

Make the data update frequency and versioning information explicit

See the recommendations Provide Data Up to Date and Best Practices for Data Versioning in the W3C report "Data on the Web Best Practices" [[dwbp]].

Strive to use standard data elements

Strive to use standard data elements for both dataset metadata and for dataset fields

Why

Standard data elements benefit researchers by making it simpler to find and integrate data from different sources.

Intended Outcome

The use of standard data elements makes it possible for researchers to quickly integrate datasets that originate from different sources (interoperability). Another benefit is that standardized elements improve data search and retrieval.

  • Findability
  • Interoperability
  • Reuse

Possible Approach to Implementation

Standard data elements should be used for both dataset metadata and the datasets themselves. For metadata, researchers should use broadly accepted metadata standards such as the Data Tag Suite (DATS) [[DATS]], Dublin Core [[DC11]], or PROV [[prov-overview]]. If researchers are contributing their data to a repository, they should confirm that the repository uses a widely accepted metadata standard. Zenodo [[zenodo]], Figshare [[figshare]], and most university/research institute data catalogs satisfy this recommendation.

For datasets (rather than dataset metadata), researchers should aim to maximize the reuse of terms from existing broadly used ontologies and/or terminologies for data fields and values. One simple approach for data spreadsheets is to maintain a separate file with a data dictionary for the fields used in the dataset. The dictionary should provide each data element's name (label), definition, and persistent identifier. Another approach would be to provide the dataset in a format such as XML, JSON-LD, or RDF that can explicitly associate each data item with a standard data element. Entire sets of standard data elements exist for some kinds of experiments. For example, mzML is an open and generic XML (extensible markup language) representation of MS data [[martens-2011]]. The DIDEO ontology provides standard definitions for many data elements that arise from in vitro and clinical NPDI studies [[DIDEO-published]][[DIDEO-ontobee]]. Researchers can use ontology search tools such as Bioportal [[bioportal]], Ontobee [[ontobee]], or the EMBL-EBI Ontology Lookup Service [[ontology-lookup]] to search for candidate terms that provide clear definitions and persistent URLs.
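
As a sketch of the data dictionary approach, the following Python code writes a dictionary with each element's name, label, definition, and persistent identifier to CSV and reports dataset columns that lack an entry. The field names, definitions, and `https://example.org/terms/...` identifiers are placeholders invented for this illustration; in practice the identifiers would be the persistent URLs of terms located with an ontology search tool.

```python
import csv
import io

# Hypothetical data dictionary entries; every value is a placeholder.
DATA_DICTIONARY = [
    {
        "field": "ic50_um",
        "label": "IC50 (micromolar)",
        "definition": "Placeholder definition of the inhibition potency field.",
        "identifier": "https://example.org/terms/PLACEHOLDER_0001",
    },
    {
        "field": "object_drug",
        "label": "Object drug",
        "definition": "Placeholder definition of the object drug field.",
        "identifier": "https://example.org/terms/PLACEHOLDER_0002",
    },
]


def dictionary_as_csv(entries: list) -> str:
    """Serialize data dictionary entries as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["field", "label", "definition", "identifier"]
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_fields(dataset_header: list, entries: list) -> list:
    """Return dataset columns that have no data dictionary entry."""
    documented = {entry["field"] for entry in entries}
    return [name for name in dataset_header if name not in documented]
```

Running `undocumented_fields` against the header row of a spreadsheet before sharing it gives a quick list of columns that still need a definition and identifier.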

How to Test

When creating metadata, it is crucial to use the fields defined by the chosen standard. For example, if you choose the DATS [[DATS]] format, you must use the fields provided at https://datatagsuite.github.io/schema/dataset_schema.json when creating metadata. The metadata can then be validated using the Python code located at https://github.com/datatagsuite/dats-tools.

Example

Every web page for every study and experiment deposited in the NaPDI repository [[birer-williams]] has a JSON-LD metadata description written using Schema.org tags. For an example of this, see the recommendation Use an existing machine readable metadata standard. The NaPDI repository uses codes from several terminologies and ontologies to represent data including DIDEO [[DIDEO-ontobee]]. When users download data from studies or experiments from the study web page in the NaPDI repository, the user receives a data dictionary file that indicates which fields used standardized codes and the terminologies/ontologies that they come from.

If possible, provide access to data via a well-documented RESTful API

If possible, provide the ability for persons with programming experience to write scripts/programs that can access data via a well-documented RESTful API

Why

Representational state transfer (REST) is a style of software architecture that provides interoperability between computer systems on the internet.[[REST_explained]] The approach uses simple web-based stateless operations that generally are executed using URIs and HTTP. Access to datasets and associated metadata via a REST application programming interface (API) makes it much simpler for researchers and developers to integrate some portion of the data into web sites and applications than downloading and parsing files.

Intended Outcome

The primary benefit is simple programmatic access to NPDI data and metadata. APIs result in data that is computable and more likely to be reused in research workflows.

  • Findability
  • Access
  • Reuse

Possible Approach to Implementation

One approach to implementing a RESTful API is to use an existing framework called Swagger and integrate it with your application. In the case of a Spring Maven project, the steps would be to add a dependency such as springfox and then configure the Docket class.

Your Swagger 2.0 API documentation will now be available at http://myapp/v2/api-docs.

How to Test

By visiting http://myapp/v2/api-docs you should see the generated API documentation. Check that every documented endpoint returns the expected output for a given input.

Example

Every web page for every study and experiment deposited in the NaPDI repository [[birer-williams]] can be retrieved via a simple REST call by appending '/json' to the study or experiment identifier. For example, here is a simple curl call to retrieve JSON data for the study identified as https://repo.napdi.org/NPDI--2FOpA:

		curl -L https://repo.napdi.org/NPDI--2FOpA/json
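
For researchers who prefer scripting over the command line, the same call can be sketched in Python using only the standard library. The function names `json_url` and `fetch_json` are invented for this example, and the sketch assumes the endpoint returns a UTF-8 encoded JSON document; it makes no assumption about the structure of that document.

```python
import json
import urllib.request


def json_url(identifier: str) -> str:
    """Build the JSON endpoint by appending '/json' to an identifier."""
    return identifier.rstrip("/") + "/json"


def fetch_json(identifier: str, timeout: float = 30.0):
    """Retrieve and decode the JSON document for a study or experiment."""
    request = urllib.request.Request(
        json_url(identifier), headers={"Accept": "application/json"}
    )
    # urlopen follows HTTP redirects by default, like curl's -L option.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires network access):
# record = fetch_json("https://repo.napdi.org/NPDI--2FOpA")
```

The decoded result is an ordinary Python object, so it can be passed directly into an analysis workflow without saving and parsing an intermediate file.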
	      

A more sophisticated REST API is provided by some other data repositories where researchers could deposit NPDI data. For example, the Zenodo API is useful for both submitting and requesting data. Here is a simple example of using the API to retrieve the metadata description of drug interaction study data from the drug approval package for Epidiolex (Cannabidiol) (https://zenodo.org/record/1406038):

$ curl -i "https://zenodo.org/api/deposit/depositions/1406038?access_token=SECRET_ACCESS_TOKEN"

		
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Sep 2020 08:51:52 GMT
Content-Type: application/json
Content-Length: 3468
Vary: Accept-Encoding
ETag: "7"
Last-Modified: Thu, 30 Aug 2018 19:33:22 GMT
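Link: <https://zenodo.org/api/deposit/depositions/1406038/files>; rel="files", <https://zenodo.org/api/deposit/depositions/1406038/actions/edit>; rel="edit", <https://zenodo.org/api/deposit/depositions/1406038>; rel="self", <https://zenodo.org/api/deposit/depositions/1406038/actions/publish>; rel="publish", <https://zenodo.org/api/deposit/depositions/1406038/actions/registerconceptdoi>; rel="registerconceptdoi", <https://zenodo.org/deposit/1406038>; rel="html", <https://zenodo.org/api/deposit/depositions/1406038/actions/discard>; rel="discard", <https://zenodo.org/api/deposit/depositions/1406038/actions/newversion>; rel="newversion"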
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1601369572
Retry-After: 59
X-Frame-Options: sameorigin
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=0
Referrer-Policy: strict-origin-when-cross-origin
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Content-Type, ETag, Link, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
X-User-ID: 51468
X-Request-ID: e27fd595bdb4d5d9dffbfe93023eb575

{"conceptdoi":"10.5281/zenodo.1406037","conceptrecid":"1406037","created":"2018-08-30T11:51:03.073216+00:00","doi":"10.5281/zenodo.1406038","doi_url":"https://doi.org/10.5281/zenodo.1406038","files":[{"checksum":"6961f89b1b54282dfb4652652e550761","filename":"study-131-experiment-data.csv","filesize":159722,"id":"f8f835a3-c591-4573-9e3d-6f037ce0dfd7","links":{"download":"https://zenodo.org/api/files/8939f250-be64-4440-81c5-a960e2a12f38/study-131-experiment-data.csv","self":"https://zenodo.org/api/deposit/depositions/1406038/files/f8f835a3-c591-4573-9e3d-6f037ce0dfd7"}}],"id":1406038,"links":{"badge":"https://zenodo.org/badge/doi/10.5281/zenodo.1406038.svg","bucket":"https://zenodo.org/api/files/8939f250-be64-4440-81c5-a960e2a12f38","conceptbadge":"https://zenodo.org/badge/doi/10.5281/zenodo.1406037.svg","conceptdoi":"https://doi.org/10.5281/zenodo.1406037","discard":"https://zenodo.org/api/deposit/depositions/1406038/actions/discard","doi":"https://doi.org/10.5281/zenodo.1406038","edit":"https://zenodo.org/api/deposit/depositions/1406038/actions/edit","files":"https://zenodo.org/api/deposit/depositions/1406038/files","html":"https://zenodo.org/deposit/1406038","latest":"https://zenodo.org/api/records/1406326","latest_html":"https://zenodo.org/record/1406326","newversion":"https://zenodo.org/api/deposit/depositions/1406038/actions/newversion","publish":"https://zenodo.org/api/deposit/depositions/1406038/actions/publish","record":"https://zenodo.org/api/records/1406038","record_html":"https://zenodo.org/record/1406038","registerconceptdoi":"https://zenodo.org/api/deposit/depositions/1406038/actions/registerconceptdoi","self":"https://zenodo.org/api/deposit/depositions/1406038"},"metadata":{"access_right":"open","communities":[{"identifier":"zenodo"}],"contributors":[{"name":"Marijanel Alilio","type":"DataCurator"},{"name":"Brandon Gufford","type":"DataCurator"}],"creators":[{"affiliation":"University of Pittsburgh","name":"Richard D Boyce","orcid":"0000-0002-2993-2085"}],"description":"Data from the drug interaction studies reported in the drug approval package for Epidiolex (cannabidil), U.S. Food and Drug Administration: https://www.accessdata.fda.gov/drugsatfda_docs/nda/2018/210365Orig1s000TOC.cfm.\n\nThe data was manually extracted by a trained pharmacist from the PDF documents uploaded to drugs@fda for Epidolex drug approval package. The data extraction was reviewed for quality by an expert in pharamacology and natural product-drug interactions.","doi":"10.5281/zenodo.1406038","keywords":["natural product-drug interactions","drug approval package"],"language":"eng","license":"CC-BY-4.0","notes":"The data extraction was done with grant support from United States National Institutes of Health grant U54AT008909 from the National Center for Complementary and Integrative Health.","prereserve_doi":{"doi":"10.5281/zenodo.1406038","recid":1406038},"publication_date":"2018-08-30","references":["Drug approval package for Epidiolex (cannabidil), U.S. Food and Drug Administration: https://www.accessdata.fda.gov/drugsatfda_docs/nda/2018/210365Orig1s000TOC.cfm"],"title":"Drug Interaction Study Data from the Drug Approval Package for Epidiolex (Cannabidiol)","upload_type":"dataset","version":"1.0"},"modified":"2018-08-30T19:33:22.885388+00:00","owner":51468,"record_id":1406038,"state":"done","submitted":true,"title":"Drug Interaction Study Data from the Drug Approval Package for Epidiolex (Cannabidiol)"}
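Because the response is JSON, the fields a researcher needs in order to decide whether a dataset can be reused are easy to extract programmatically. The following is a minimal sketch that works on an abridged excerpt of the response above (most fields are omitted); the function name `summarize` is illustrative and not part of the Zenodo API.

```python
import json

# Abridged excerpt of the response shown above; most fields are omitted.
RESPONSE_EXCERPT = """
{"doi": "10.5281/zenodo.1406038",
 "files": [{"checksum": "6961f89b1b54282dfb4652652e550761",
            "filename": "study-131-experiment-data.csv",
            "filesize": 159722}],
 "metadata": {"access_right": "open",
              "license": "CC-BY-4.0",
              "keywords": ["natural product-drug interactions",
                           "drug approval package"]}}
"""


def summarize(record):
    """Collect the identifier, license, access, keyword, and file information."""
    metadata = record.get("metadata", {})
    return {
        "doi": record.get("doi"),
        "license": metadata.get("license"),
        "access_right": metadata.get("access_right"),
        "keywords": metadata.get("keywords", []),
        "files": [(f["filename"], f["filesize"], f["checksum"])
                  for f in record.get("files", [])],
    }


summary = summarize(json.loads(RESPONSE_EXCERPT))
print(summary["doi"], summary["license"], summary["access_right"])
for filename, size, checksum in summary["files"]:
    print(filename, size, checksum)
```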

Provide a license for data usage

See the recommendation Provide data license information in the W3C report "Data on the Web Best Practices" [[dwbp]].

Give future users of the data a way to provide feedback

See the two recommendations in the section Feedback in the W3C report "Data on the Web Best Practices" [[dwbp]].

Recommendations specific to natural product chemistry

Provide raw data using an open, or at least well-documented, file format

NPDI researchers should provide the raw data files derived from assays that identify and quantify the chemical constituents present in natural products, for both mass spectrometry (MS) and nuclear magnetic resonance (NMR) analysis, in an open, or at least well-documented, file format.

Why

Chemical analytical spectral data of an NP used for NPDI research should be accessible to the NPDI research community. Chemical metabolomic spectral data should be reusable by everyone. Researchers will more easily and reliably trust NPDI datasets if they have the opportunity to work with the raw data files. Sharing raw data also creates the opportunity for later meta-analyses that advance NPDI research by providing powerful new interpretations based on multiple datasets.

Intended Outcome

By using an open file format for raw spectral data sharing, NPDI chemical data are more accessible and reusable. Data sharing, repository deposition, and re-analysis of raw spectral data will be easier. Researchers can transfer raw data between institutions and collaborators, without the need for proprietary software (Rocca-Serra et al. 2015).

  • Access
  • Reuse

Possible Approach to Implementation

The earlier open mass spectrometry file formats mzXML [[Pedrioli-2004]] and mzData [[Orchard-2004]] have been combined to create mzML, an open and generic XML (extensible markup language) representation of MS data (Martens et al. 2011). The user needs to convert raw MS data files from proprietary formats into an open format such as mzML. Open source software for reading and writing mzML is available for this purpose, such as ProteoWizard (http://proteowizard.sourceforge.net/) and OpenMS (http://open-ms.sourceforge.net/), as well as packages for R (mzR), Python (pymzML), and Matlab (MSiReader). Specific to MS metabolomics, mzTab-M is a data standard for sharing quantitative results [[Hoffmann-2019]]. It is a tab-separated text format whose structure is standardized by a detailed specification document.

For NMR data, nmrML is an XML-based, vendor-neutral open exchange and data storage format that the NMR community has agreed to use [[Schober-2018]]. Other initiatives are NMReDATA (nmredata.org) and the universal raw data format, Allotrope Data Format (ADF), developed by the Allotrope Foundation (allotrope.org). The Raw Data Initiative is a source of information for the NPDI researcher on how to make NMR data FAIR [[McAlpine-2019]]. If open raw NMR data formats are unavailable to the user, it is still preferable to share the raw data in the native format of the instrument manufacturer. This preserves the data, enables FAIR practices, and anticipates future use with conversion tools as they become available.

The researcher who shares data may want to provide a means for future users of the data to ensure that the dataset has been downloaded properly and has not been modified. To do so, the researcher may generate a signature of the data they are sharing using either cryptographic or hash approaches. The signature of the data can then be shared along with the dataset with instructions on how future users can confirm data integrity.
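As one way to generate such a signature, the sketch below computes a SHA-256 digest of a data file with the Python standard library. The choice of SHA-256 and the function names `sha256_of_file` and `verify_file` are assumptions of this example rather than requirements of this report. A published digest lets users detect incomplete downloads and accidental modification; it does not by itself prove who produced the file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True when the file's digest matches the published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

The data provider would publish the output of `sha256_of_file` next to the dataset, and a future user would pass that published value to `verify_file` after downloading.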

How to Test

Researchers should test whether the conversion of their vendor spectral files to an open format has been done correctly. They could load the converted files into any software or program that reads mzXML, mzTab-M, nmrML, or NMReDATA and confirm that the files are read correctly.
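For mzML files, a first automated check can be sketched with the Python standard library alone. The sketch assumes that the file declares its number of spectra in the `count` attribute of the `spectrumList` element and compares that value with the number of `spectrum` elements actually present. It only confirms that the file is well-formed XML and internally consistent; it is not a schema validation and does not replace opening the file in analysis software. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def check_mzml(path):
    """Return (declared_count, found_count) of spectra in an mzML file.

    Raises xml.etree.ElementTree.ParseError if the file is not well-formed XML,
    for example because the conversion was interrupted.
    """
    declared = None
    found = 0
    for event, element in ET.iterparse(path, events=("start", "end")):
        name = local_name(element.tag)
        if event == "start" and name == "spectrumList":
            count = element.get("count")
            declared = int(count) if count is not None else None
        elif event == "end" and name == "spectrum":
            found += 1
            element.clear()  # release memory; individual spectra can be large
    return declared, found


def conversion_looks_complete(path):
    """True when at least one spectrum is present and the counts agree."""
    declared, found = check_mzml(path)
    return found > 0 and declared == found
```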

Example

The Pauli Group at UIC uses Harvard Dataverse to deposit raw NMR and MS data using the recommended standards: https://dataverse.harvard.edu/dataverse/gfpuic

Publish NMR and spectroscopic results following standards

NPDI researchers should share and publish NMR and spectroscopic results following standards recognized by the field. For example, spectral images must have good resolution, tables or text listing assigned NMR peaks must be complete and follow standards, filtered peak lists with intensities must be shared in metabolomics, and the identification of compounds must follow standard procedures.

Why

This practice will make NPDI NMR and spectroscopic results FAIR for the community to analyze and interpret. Because the data will already have been transformed, the results can be interpreted quickly and easily.

Intended Outcome

This would benefit the community by facilitating publication and dissemination of the results.

  • Access
  • Reuse

Possible Approach to Implementation

Reporting of NMR spectra is most frequently done using tables or descriptive text including HSQC, HMBC, and NOE correlations, as well as listings of chemical shifts, coupling constants, multiplicities and assignments [[McAlpine-2019]]. Journals provide guidelines, such as the ACS family of journals (“NMR Guidelines for ACS Journals,” n.d.).

It is recommended to upload the spectrum as a vector graphics image, ideally in an open and standardized format such as SVG or, if that is not possible, as PDF. At a minimum, upload a high-resolution image. Whenever possible, such graphical representations should contain signal assignments that work in tandem with the descriptive text and tables. In MS, mass spectra are also shared as images, which in the case of tandem MS make the fragmentation pattern visible.

Regarding metabolite identification in MS, the Metabolomics Standards Initiative (MSI) has defined four levels of metabolite identification: identified metabolites (level 1), putatively annotated compounds (level 2), putatively characterized compound classes (level 3), and unknown compounds (level 4) [[Salek-2013]]. For each metabolite, it is recommended to report the level of identification, the common name, and a structural code such as the InChI string. External database identifiers, a chemical structure image, and the chemical formula are also frequently reported.
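The reporting elements for metabolite identification can be captured in a small structured record that is checked before submission. The following is a minimal sketch: the class name, field names, and the rule that an InChI string is only expected at levels 1 and 2 are assumptions of this example, and ethanol is used purely as a placeholder compound.

```python
from dataclasses import dataclass, field

# The four MSI identification levels described in the text.
MSI_LEVELS = {
    1: "identified metabolite",
    2: "putatively annotated compound",
    3: "putatively characterized compound class",
    4: "unknown compound",
}


@dataclass
class MetaboliteReport:
    common_name: str
    msi_level: int
    inchi: str = ""
    formula: str = ""
    database_ids: dict = field(default_factory=dict)

    def problems(self):
        """List reporting gaps; an empty list means the minimal fields are present."""
        issues = []
        if self.msi_level not in MSI_LEVELS:
            issues.append("msi_level must be 1, 2, 3, or 4")
        if not self.common_name:
            issues.append("common name is missing")
        # Assumption of this sketch: a structure is only expected at levels 1 and 2.
        if self.msi_level in (1, 2) and not self.inchi.startswith("InChI="):
            issues.append("InChI string is missing or malformed")
        return issues


# Placeholder compound, not taken from any NPDI study.
report = MetaboliteReport(
    common_name="ethanol",
    msi_level=1,
    inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    formula="C2H6O",
)
print(MSI_LEVELS[report.msi_level], report.problems())
```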

How to Test

Researchers will need to manually confirm that the recommendations are met at the time they submit their data to a data repository.

Example

The Pauli Group at UIC uses Harvard Dataverse to deposit raw NMR and MS data using the recommended standards: https://dataverse.harvard.edu/dataverse/gfpuic

Report experiment metadata along with the spectral raw data file

Experimental metadata should be reported, following standards, along with the spectral raw data file and results.

Why

Experimental metadata must follow reporting standards to make NPDI data reproducible and comparable with other NPDI studies. This allows results from different laboratories to be shared and re-interpreted.

Intended Outcome

This will benefit the research community by improving the findability, interoperability, and reuse of NPDI data.

  • Findability
  • Interoperability
  • Reuse

Possible Approach to Implementation

Experiment metadata covers everything necessary to understand how the experiment was conducted: the species under investigation, the organism part from which the metabolite was isolated, the instruments used, and the parameters used. Researchers could use the well-documented metadata format ISA-Tab [[Salek-2013]] (ISA stands for Investigation, Study, and Assay), as well as the Allotrope Data Format for raw data (https://www.allotrope.org), in NPDI studies. Each metadata item is backed by a term in a commonly used ontology. Researchers should follow the minimum information standards created in 2007 by the Metabolomics Standards Initiative of the Metabolomics Society (MSI Board Members et al. 2007). Similar initiatives are under way for NMR studies, where the ISA format is also recommended [[McAlpine-2019]].
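Before deposit, a simple completeness check can flag metadata elements that were left empty. The sketch below is a plain checklist, not an implementation of ISA-Tab or of the MSI minimum information standards; the field names and example values are hypothetical and chosen only to mirror the elements listed above.

```python
# Hypothetical field names mirroring the elements listed in the text.
REQUIRED_FIELDS = (
    "species",
    "organism_part",
    "instrument",
    "acquisition_parameters",
)


def missing_fields(metadata):
    """Return the required fields that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Placeholder record with two elements still to be filled in.
draft = {
    "species": "Example species",
    "organism_part": "root",
    "instrument": "",
}
print(missing_fields(draft))
```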

How to Test

The researcher could try to re-acquire the spectral data from the same sample using only the reported experimental metadata and compare the results.

Example

The Pauli Group at UIC uses Harvard Dataverse to deposit raw NMR and MS data using the recommended standards: https://dataverse.harvard.edu/dataverse/gfpuic. The research group provides metadata on each entry within the Dataverse. For example, the following metadata is provided for the newly described dihydrobenzofuran derivatives (LicAF1 and LicAF2) of licochalcone A (LicA) at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DFAVTE:

Dataset Persistent ID 	doi:10.7910/DVN/DFAVTE
Publication Date 	2017-06-29
Title 	262_LicAdevs
Subtitle 	NMR data of dihydrobenzofuran derivatives of licochalcone A and coisolated compounds.
Author 	Simmler, Charlotte (University of Illinois at Chicago) - ORCID: 0000-0002-6923-2630
Contact 	Simmler, Charlotte (University of Illinois at Chicago)
Description 	Compilation of raw NMR data (FID) with structural description (pms file) and annotation (PDF) of the newly described dihydrobenzofuran derivatives (LicAF1 and LicAF2) of licochalcone A (LicA). The dataset contains 1D, 1H NMR of photoisomerized LicAF1/2 and LicA, as well as the 1D 1HNMR data of co-isolated major licorice compounds (echinatin, licoisoflavone A, licoflavone B, licochalcone B, glabrone). (2017)
Subject 	Chemistry
Keyword 	Licochalcone A
Dihydrobenzofuran derivatives of licochalcone A
Related Publication 	Isolation and structural characterization of dihydrobenzofuran congeners of licochalcone A
Production Date 	2017-04-28
Grant Information 	NIH: U41 AT008706
NIH: P50 AT000155
Depositor 	Simmler, Charlotte
Deposit Date 	2017-02-08
Kind of Data 	raw NMR data
	      

Use reproducible, and well-documented, NMR and spectroscopic analysis workflows

Use reproducible, and well-documented, NMR and spectroscopic analysis workflows by clearly documenting the analytical process in an open and shareable manner.

Why

This is specifically relevant to making NPDI data FAIR because the NPDI scientific community can then reproduce exactly the workflow applied to the raw spectral data by using published and open access bioinformatics tools. When proprietary software or “in house” functions are used, NPDI raw spectral data are less accessible and reusable.

Intended Outcome

Greater reproducibility is a very important potential benefit of implementing this recommendation.

  • Access
  • Reuse

Possible Approach to Implementation

Use free and open tools, software, and workflows that are reproducible. It is important to be explicit about the workflow used. The workflow could have its own identifier, for example a GitHub repository. In metabolomics, such an open access pipeline exists: https://workflow4metabolomics.org is a collaborative portal dedicated to metabolomics data processing, analysis, and annotation for the metabolomics community. The Galaxy environment also offers different open access packages, as does R with XCMS for MS data and BATMAN for NMR data.

How to Test

Researchers may use a test dataset and confirm that the bioinformatics workflow used is applicable to other datasets.

Example

The NMR workflow provided by https://workflow4metabolomics.org offers an explicit and reproducible process for pre-processing NMR data and applying several normalization methods.

Recommendations specific to NPDI pharmacology

Give a clear description of the experiment

NPDI pharmacologists should clearly state details about the experiment, including the effect tested, the experimental system used, the natural product, natural product constituent, or drugs involved, and the object and object metabolite measured.

Why

A clear description of the experiment helps ensure that NPDI pharmacology results are more reusable by other NPDI researchers.

Intended Outcome

Researchers interested in the experiment data will have an improved understanding of the experimental context that led to the reported NPDI results. This should help the researchers assess if data from an NPDI experiment informs their research. Also, it should help researchers more easily reproduce the experiments and determine if new experiments are necessary.

  • Reuse

Possible Approach to Implementation

A short and informative title should be chosen for each experiment, indicating in a few words the main effect detected (for example, whether or not an enzyme is inhibited), the system tested, and the natural product involved. The NPDI pharmacologist is highly encouraged to detail which enzymes are involved, the test system, the object and its metabolite measured, and the natural product tested. NPDI data repositories can provide data entry forms that request these specific elements from researchers at the time of submission.

How to Test

Researchers can manually check that the experimental details they provide are sufficient for other researchers to potentially reproduce their experiments.

Example

Every NPDI pharmacology experiment deposited in the NaPDI repository [[birer-williams]] is entered using standard operating procedures that help ensure detailed information about the experiments is made available to other researchers. An example of one of the standard operating procedures is available here: NaPDI Repo SOP example.

Acknowledgments

Development of this Community Group Report was partially supported by grants from the United States National Center for Complementary and Integrative Health U54 AT008909 and U41 AT008706.