FAIR4ML Metadata Schema
Release date: 2024.06.04
Version: 0.0.1
Status: Draft
This version URI: https://w3id.org/fair4ml/0.0.1
Latest version URI: https://w3id.org/fair4ml#
Authors (in alphabetical order). The list of authors is not final; please contribute to the discussion in our GitHub repository or discussion spreadsheet.
- Leyla-Jael Castro, ZB MED
- Daniel Garijo, Universidad Politécnica de Madrid
- Dietrich Rebholz-Schuhmann, ZB MED
- Dhwani Solanki, ZB MED
- Jenifer Tabita Ciuciu-Kiss, Universidad Politécnica de Madrid
- Research Data Alliance FAIR4ML Task Force
License:
Download:
Introduction
An increasing number of machine learning (ML) models are produced and shared on the Web by research scientists, ML enthusiasts and ML developers. In this document we introduce a Schema.org extension for creating machine-readable representations of trained ML models. The proposed vocabulary also reuses properties from codemeta in order to point to the code repository associated with a model. The figure below shows a high-level overview of the main metadata fields used to describe an ML model.
Namespaces used in this document
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs: http://www.w3.org/2000/01/rdf-schema#
- owl: http://www.w3.org/2002/07/owl#
- schema: http://schema.org/
- codemeta: https://w3id.org/codemeta/
- fair4ml: https://w3id.org/fair4ml#
- cr: http://mlcommons.org/croissant/
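Here is a minimal sketch of how these prefixes might be used in practice. The Python snippet below copies the namespace list into a JSON-LD `@context` and expands compact IRIs such as `fair4ml:trainedOn` into full IRIs. The names `NAMESPACES`, `expand` and `context` are illustrative assumptions, not part of the FAIR4ML specification.

```python
import json

# Prefix-to-IRI map copied from the namespace list above.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "schema": "http://schema.org/",
    "codemeta": "https://w3id.org/codemeta/",
    "fair4ml": "https://w3id.org/fair4ml#",
    "cr": "http://mlcommons.org/croissant/",
}


def expand(curie: str, namespaces: dict = NAMESPACES) -> str:
    """Expand a compact IRI such as 'fair4ml:trainedOn' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown prefix in {curie!r}")
    return namespaces[prefix] + local


# A JSON-LD @context that lets metadata documents use the same prefixes.
context = {"@context": dict(NAMESPACES)}

print(expand("fair4ml:trainedOn"))  # https://w3id.org/fair4ml#trainedOn
print(json.dumps(context, indent=2))
```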
Extending Schema.org hierarchy
This Profile extends the Schema.org hierarchy as follows: schema:Thing > schema:CreativeWork > fair4ml:MLModel
fair4ml:MLModel new properties
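Under this hierarchy, a `fair4ml:MLModel` is also a `schema:CreativeWork` and a `schema:Thing`. Properties defined on `schema:CreativeWork`, such as `schema:author` or `schema:license`, can therefore be applied to an ML model. The sketch below encodes the two subclass edges in a hypothetical dictionary and walks them transitively. A real tool would read the `rdfs:subClassOf` statements from the published vocabulary instead.

```python
# Hypothetical in-memory copy of the subclass edges stated above
# (child class -> direct superclass).
SUBCLASS_OF = {
    "fair4ml:MLModel": "schema:CreativeWork",
    "schema:CreativeWork": "schema:Thing",
}


def ancestors(cls: str) -> list:
    """Return the superclass chain of cls, nearest superclass first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls: str, target: str) -> bool:
    """True if cls equals target or is a (transitive) subclass of it."""
    return cls == target or target in ancestors(cls)


print(ancestors("fair4ml:MLModel"))  # ['schema:CreativeWork', 'schema:Thing']
```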
Property | Expected Type | Description |
---|---|---|
fair4ml:ethicalLegalSocial | schema:Text | Considerations with respect to ethical, legal and social aspects. |
fair4ml:evaluatedOn | schema:Dataset, cr:Dataset | Dataset used for evaluating the model. The dataset used for evaluation may not have been part of the train/test/validation splits (e.g., a benchmark, extrinsic validation). |
fair4ml:fineTunedFrom | fair4ml:MLModel | Source model used for fine-tuning, if this model was fine-tuned from another one. |
fair4ml:hasCO2eEmissions | schema:Text | Amount of CO2-equivalent emissions produced by the model. The unit should be included in the field (e.g., 10 tonnes). |
fair4ml:intendedUse | schema:Text, schema:DefinedTerm, schema:URL | Purpose and intended use, stated so that users can decide whether this creative work (e.g., lab protocol, machine learning model, software) is suitable for their experimental problem or use case. |
fair4ml:mlTask | schema:Text, schema:DefinedTerm | ML task addressed by this ML software or model (e.g., binary classification). |
fair4ml:modelCategory | schema:Text, schema:DefinedTerm | Category of this ML model (e.g., Supervised, Unsupervised, Semi-supervised, Reinforcement), its learning architecture (e.g., CNN), or its underlying algorithm (e.g., logistic regression, random forest). |
fair4ml:modelRisks | schema:Text | Human-readable description of the risks and biases of the model. |
fair4ml:sharedBy | schema:Person, schema:Organization | Person or organization who shared the model online (e.g., by uploading it to Hugging Face). |
fair4ml:testedOn | schema:Dataset, cr:Dataset | Link to the dataset used to test the model (following train/test/validation splits). |
fair4ml:trainedOn | schema:Dataset, cr:Dataset | AI-ready dataset (after pre-processing) used for the training and optimization of this ML model. |
fair4ml:usageInstructions | schema:Text | Instructions needed to run the model (e.g., to do inference on a task). Code snippets may be used for illustration. |
fair4ml:validatedOn | schema:Dataset, cr:Dataset | Link to the dataset used to validate the model. Typically, the validation dataset is kept separate from the training and test sets. |
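The following sketch shows how the new properties can fit together. It builds a JSON-LD description of a fictitious model as a Python dictionary. It then checks that each dataset-valued property points to a node typed `schema:Dataset` or `cr:Dataset`, which are the expected types listed in the table. All identifiers, URLs and values are placeholders. `check_dataset_links` is a hypothetical helper, not a FAIR4ML tool.

```python
import json

# Placeholder description of a fictitious model; every @id, name and value
# is illustrative only.
model = {
    "@context": {
        "schema": "http://schema.org/",
        "fair4ml": "https://w3id.org/fair4ml#",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@id": "https://example.org/models/example-classifier",
    "@type": "fair4ml:MLModel",
    "schema:name": "Example image classifier",
    "fair4ml:mlTask": "binary classification",
    "fair4ml:modelCategory": ["Supervised", "CNN"],
    "fair4ml:intendedUse": "Demonstrating FAIR4ML metadata only.",
    "fair4ml:fineTunedFrom": {
        "@id": "https://example.org/models/example-base",
        "@type": "fair4ml:MLModel",
    },
    "fair4ml:trainedOn": {"@id": "https://example.org/data/train", "@type": "cr:Dataset"},
    "fair4ml:validatedOn": {"@id": "https://example.org/data/validation", "@type": "cr:Dataset"},
    "fair4ml:testedOn": {"@id": "https://example.org/data/test", "@type": "cr:Dataset"},
    "fair4ml:evaluatedOn": {"@id": "https://example.org/data/benchmark", "@type": "schema:Dataset"},
    "fair4ml:hasCO2eEmissions": "10 tonnes",  # unit included, as the table asks
    "fair4ml:sharedBy": {"@type": "schema:Organization", "schema:name": "Example Lab"},
}

# Properties whose expected type is schema:Dataset or cr:Dataset.
DATASET_PROPERTIES = (
    "fair4ml:trainedOn",
    "fair4ml:validatedOn",
    "fair4ml:testedOn",
    "fair4ml:evaluatedOn",
)
DATASET_TYPES = {"schema:Dataset", "cr:Dataset"}


def check_dataset_links(record: dict) -> list:
    """Return dataset-valued properties whose node is not typed as a dataset."""
    problems = []
    for prop in DATASET_PROPERTIES:
        node = record.get(prop)
        if node is None:
            continue  # absent optional property: nothing to check
        if not isinstance(node, dict) or node.get("@type") not in DATASET_TYPES:
            problems.append(prop)
    return problems


print(check_dataset_links(model))  # []
print(json.dumps(model, indent=2))
```

The checker only inspects the `@type` of inline nodes. A fuller validator would also dereference `@id` links and accept lists of datasets.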
Schema.org inherited Properties
Property | Expected Type | Description (from Schema.org) |
---|---|---|
schema:archivedAt | schema:URL, schema:WebPage | Indicates a page or other link involved in archival of a CreativeWork. In the case of MediaReview, the items in a MediaReviewItem may often become inaccessible, but be archived by archival, journalistic, activist, or law enforcement organizations. In such cases, the referenced page may not directly publish the content. |
schema:author | schema:Person, schema:Organization | The author of this content or rating. Please note that author is special in that HTML 5 provides a special mechanism for indicating authorship via the rel tag. That is equivalent to this and may be used interchangeably. |
schema:citation | schema:Text, schema:CreativeWork | A citation or reference to another creative work, such as another publication, web page, scholarly article, etc. |
schema:conditionsOfAccess | schema:Text | Conditions that affect the availability of, or method(s) of access to, an item. Typically used for real world items such as an ArchiveComponent held by an ArchiveOrganization. This property is not suitable for use as a general Web access control mechanism. It is expressed only in natural language. For example "Available by appointment from the Reading Room" or "Accessible only from logged-in accounts". |
schema:contributor | schema:Organization, schema:Person | A secondary contributor to the CreativeWork or Event. |
schema:copyrightHolder | schema:Organization, schema:Person | The party holding the legal copyright to the CreativeWork. |
schema:dateCreated | schema:Date, schema:DateTime | The date on which the CreativeWork was created or the item was added to a DataFeed. |
schema:dateModified | schema:Date, schema:DateTime | The date on which the CreativeWork was most recently modified or when the item's entry was modified within a DataFeed. |
schema:datePublished | schema:Date, schema:DateTime | Date of first publication or broadcast. For example the date a CreativeWork was broadcast or a Certification was issued. |
schema:description | schema:TextObject, schema:Text | A description of the item. |
schema:discussionUrl | schema:URL | A link to the page containing the comments of the CreativeWork. |
schema:distribution | schema:DataDownload | A downloadable form of this dataset, at a specific location, in a specific format. This property can be repeated if different variations are available. There is no expectation that different downloadable distributions must contain exactly equivalent information (see also [DCAT](https://www.w3.org/TR/vocab-dcat-3/#Class:Distribution) on this point). Different distributions might include or exclude different subsets of the entire dataset, for example. |
schema:funding | schema:Grant | A Grant that directly or indirectly provides funding or sponsorship for this item. See also ownershipFundingInfo. |
schema:identifier | schema:Text, schema:URL, schema:PropertyValue | The identifier property represents any kind of identifier for any kind of Thing, such as ISBNs, GTIN codes, UUIDs etc. Schema.org provides dedicated properties for representing many of these, either as textual strings or as URL (URI) links. See background notes for more details. |
schema:inLanguage | schema:Text, schema:Language | The language of the content or performance or used in an action. Please use one of the language codes from the IETF BCP 47 standard. See also availableLanguage. |
schema:isAccessibleForFree | schema:Boolean | A flag to signal that the item, event, or place is accessible for free. |
schema:keywords | schema:Text, schema:URL, schema:DefinedTerm | Keywords or tags used to describe some item. Multiple textual entries in a keywords list are typically delimited by commas, or by repeating the property. |
schema:license | schema:URL, schema:CreativeWork | A license document that applies to this content, typically indicated by URL. |
schema:maintainer | schema:Organization, schema:Person | A maintainer of a Dataset, software package (SoftwareApplication), or other Project. A maintainer is a Person or Organization that manages contributions to, and/or publication of, some (typically complex) artifact. It is common for distributions of software and data to be based on "upstream" sources. When maintainer is applied to a specific version of something e.g. a particular version or packaging of a Dataset, it is always possible that the upstream source has a different maintainer. The isBasedOn property can be used to indicate such relationships between datasets to make the different maintenance roles clear. Similarly in the case of software, a package may have dedicated maintainers working on integration into software distributions such as Ubuntu, as well as upstream maintainers of the underlying work. |
schema:memoryRequirements | schema:Text, schema:URL | Minimum memory requirements. |
schema:name | schema:Text | The name of the item. |
schema:operatingSystem | schema:Text | Operating systems supported (Windows 7, OS X 10.6, Android 1.6). |
schema:processorRequirements | schema:Text | Processor architecture required to run the application (e.g. IA64). |
schema:releaseNotes | schema:Text, schema:URL | Description of what changed in this version. |
schema:softwareHelp | schema:CreativeWork | Software application help. |
schema:softwareRequirements | schema:Text, schema:URL | Component dependency requirements for application. This includes runtime environments and shared libraries that are not included in the application distribution package, but required to run the application (examples: DirectX, Java or .NET runtime). |
schema:storageRequirements | schema:Text, schema:URL | Storage requirements (free space required). |
schema:url | schema:URL | URL of the item. |
schema:version | schema:Number, schema:Text | The version of the CreativeWork embodied by a specified resource. |
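Several inherited properties have a single kind of literal expected type, so simple consistency checks are possible. The sketch below validates a handful of them in a JSON-LD-like dictionary:

- Date properties are parsed with Python's `datetime.fromisoformat`, which approximates Schema.org's ISO 8601 `Date`/`DateTime`.
- `schema:isAccessibleForFree` must be a boolean.
- URL properties must be absolute `http(s)` URLs.

The validator table and function names are assumptions for illustration. Properties with several expected types are deliberately left out.

```python
from datetime import datetime
from urllib.parse import urlparse


def is_iso_date_or_datetime(value) -> bool:
    """Accept strings such as '2024-06-04' or '2024-06-04T12:30:00'."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_http_url(value) -> bool:
    """Accept absolute http(s) URLs that include a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical validator table for a few inherited properties.
VALIDATORS = {
    "schema:dateCreated": is_iso_date_or_datetime,
    "schema:dateModified": is_iso_date_or_datetime,
    "schema:datePublished": is_iso_date_or_datetime,
    "schema:isAccessibleForFree": lambda v: isinstance(v, bool),
    "schema:discussionUrl": is_http_url,
    "schema:url": is_http_url,
}


def invalid_properties(record: dict) -> list:
    """Return, sorted, the checked properties present in record with bad values."""
    return sorted(
        prop for prop, check in VALIDATORS.items()
        if prop in record and not check(record[prop])
    )


example = {
    "schema:name": "Example model",  # not checked by this sketch
    "schema:datePublished": "2024-06-04",
    "schema:dateModified": "2024-06-04T12:30:00",
    "schema:isAccessibleForFree": True,
    "schema:url": "https://example.org/models/example",
}
print(invalid_properties(example))  # []
```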
Codemeta inherited Properties
Property | Expected Type | Description (from Codemeta) |
---|---|---|
buildInstructions | schema:URL | Link to installation instructions/documentation. |
developmentStatus | schema:Text | Description of development status, e.g. Active, inactive, suspended. See repostatus.org. |
issueTracker | schema:URL | Link to software bug reporting or issue tracking system. |
readme | schema:URL | Link to software Readme file. |
referencePublication | schema:ScholarlyArticle | An academic publication related to the software. |
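These properties come from codemeta, so one plausible workflow is to copy them from the `codemeta.json` file of the model's code repository into the model description. The sketch below assumes the codemeta file uses the unprefixed term names listed in the table. It re-keys them with the `codemeta:` prefix declared in the namespace list. The repository content and the helper `codemeta_fields` are hypothetical.

```python
import json

# Terms from the codemeta table, as they appear unprefixed in codemeta.json.
CODEMETA_TERMS = (
    "buildInstructions",
    "developmentStatus",
    "issueTracker",
    "readme",
    "referencePublication",
)


def codemeta_fields(codemeta_doc: dict) -> dict:
    """Copy the listed codemeta terms, re-keyed with the 'codemeta:' prefix."""
    return {
        f"codemeta:{term}": codemeta_doc[term]
        for term in CODEMETA_TERMS
        if term in codemeta_doc
    }


# Placeholder stand-in for a repository's codemeta.json file.
codemeta_json = """{
  "name": "example-model-code",
  "readme": "https://example.org/repo/README.md",
  "issueTracker": "https://example.org/repo/issues",
  "developmentStatus": "active"
}"""

model = {
    "@context": {
        "schema": "http://schema.org/",
        "fair4ml": "https://w3id.org/fair4ml#",
        "codemeta": "https://w3id.org/codemeta/",
    },
    "@type": "fair4ml:MLModel",
    "schema:name": "Example model",
    **codemeta_fields(json.loads(codemeta_json)),
}

# ['codemeta:developmentStatus', 'codemeta:issueTracker', 'codemeta:readme']
print(sorted(k for k in model if k.startswith("codemeta:")))
```

Terms absent from the source file, such as `buildInstructions` here, are simply skipped rather than emitted as empty values.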
If you spot any errors or omissions, please file an issue in our GitHub repository.