CoMeta Ontology
An extension of Music Meta to describe the metadata of music collections, corpora, containers, or simply music datasets.
Andrea Poltronieri, Jacopo de Berardinis, Nicolas Lazzari
Version: 1.0
"2023-05-18"^^xsd:date "2023-04-12"^^xsd:date
com: https://w3id.org/polifonia/ontology/cometa/

annotates data: Associates an annotation to the raw data, or data descriptor, that it annotates.
describes data: Associates a data descriptor (e.g. features, encodings) to the raw data it was derived from.
extends
has API: Associates an API to the availability information of a dataset.
has accessibility: Associates the availability of a dataset to the way it can be accessed.
has assignment: Associates a dataset split to a dataset record. Not needed, since a dataset split is still a dataset, hence this can use contains record.
has availability: Associates a dataset to information related to its availability and access.
has content: Associates a dataset to its content.
has content type: Associates raw data content or annotations to their type (e.g. tag, pattern, emotion).
has feature type: Associates a data descriptor to the type of feature it provides.
has modality: Associates raw data content to the modality the data provides.
has split: Associates a dataset to a proper partition (a particular subset) of it.
has subset: A dataset can be a subset of another dataset.
includes content: Associates a dataset record to an atomic music element.
is aimed for: Associates a dataset to a task it enables.
is maintained by: Associates a dataset to one of its maintainers (e.g. a person, an institution).
contains record: Associates a dataset record to the dataset it belongs to.

description: A textual description.
download link
record count: The number of records that are contained in a data container.
release date

Content Annotation: Dataset content providing annotations that were produced or obtained from raw data content or, alternatively, from a data descriptor.
Content Descriptor: Dataset content that describes the raw data content via features or encodings extracted from it. This should not be confused with an annotation; it is a supplementary view of the raw data content of a dataset.
Content Type: Describes the type of content that the raw data, or its annotations, provide. In the music domain, this may correspond to chord, pattern, emotion, etc.
Raw Data Content: Dataset content providing raw data, either structured (e.g. tabular data) or unstructured (e.g. audio files). For example, a dataset folder containing images can be described as raw data content.
API: Describes an Application Programming Interface (API) for accessing, recreating, or extracting a dataset.
Content Availability: Describes the availability of data content according to the release strategy and policies of the dataset. For example, a music dataset may provide complete data records (full tracks) or contain audio clips or snippets (excerpts) only.
Data Accessibility: Describes the accessibility of a dataset, instructing users on the modalities put in place by the maintainers to access its content.
Data Availability: Describes the availability of a dataset as a whole, or of a part of its content.
Data Format: The format of the data in which content is provided.
Data Modality: Describes the modality of dataset content, such as audio, video, image, etc.
Dataset: A container of data records with summative properties that allow the contextualisation of its content, availability and licensing.
Dataset Content: Describes the content of a dataset from a summative perspective (e.g. the audio content of a music collection, the audio features it provides, etc.) and its production process (provenance).
Dataset Record: A record of a dataset, providing references to its properties and annotations.
Dataset Split: Describes a partition of a dataset via its association with individual data records, which can be used for training, validating, or testing a computational method.
Feature Type: Describes the type of feature provided by a content descriptor.
Production Method: A production method is an activity that generates one or more artefacts that jointly characterise data content.
Split Type: The type of a split that associates a function to the corresponding data partitions (e.g. a training set).

Audio: Audio as a modality.
Chord: Chords refer to harmonic structures found in music data.
Chroma Features: Chroma-based features are descriptors of pitched audio signals (e.g. music).
Algorithmic Computational: A production method based on a computational procedure.
Crowdsourced: A production method collecting data via crowdsourcing.
Development Set: A dataset split including the validation and test sets.
Emotion: Emotion can be either perceived or induced from the data.
Expert Human: A production method relying on human analysis.
Full Content: Data is made available in its entirety (e.g. full audio tracks).
Image: Image as a modality.
Mel Spectrogram: A Mel Spectrogram is a descriptor, or feature, of an audio signal.
MFCC Features: Mel-frequency cepstral coefficients (MFCCs) are audio features.
On Request Access: Access to the dataset undergoes a request procedure.
Open Access: Access to the dataset is open.
Pattern: Patterns are usually found in the data to express and formalise regularities.
Preview Content: Data is made available in partial form (e.g. audio snippets).
Structure: Structural content refers to segments or sub-sequences found in sequential data. In the context of music, this may correspond to segments related to musical form (e.g. motifs, phrases, sections).
Test Set: A split including test data.
Text: Text as a modality.
Training Set: A split including training data.
Validation Set: A split including validation data.
Video: Video as a modality.
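
The relationship between Dataset, Dataset Split and Dataset Record can be sketched in plain Python. This is an illustration only, not part of the ontology or of any CoMeta tooling: the class layout, the field names, the `identifier` field, and the `make_split` helper are all hypothetical. Treating the four "... Set" entries as split types is also an assumption. The sketch follows the note on "has assignment" that a dataset split is still a dataset, so a split reuses "contains record" rather than needing a separate assignment property.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, List


class SplitType(Enum):
    # Assumption: the "... Set" entries are modelled here as split types.
    TRAINING = "Training Set"
    VALIDATION = "Validation Set"
    TEST = "Test Set"
    DEVELOPMENT = "Development Set"


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical field: the ontology only names the Dataset Record class.
    identifier: str


@dataclass
class Dataset:
    description: str                                            # "description"
    records: List[DatasetRecord] = field(default_factory=list)  # "contains record"
    splits: List["DatasetSplit"] = field(default_factory=list)  # "has split"

    @property
    def record_count(self) -> int:
        # "record count": number of records held by this container.
        return len(self.records)


@dataclass
class DatasetSplit(Dataset):
    # A split is still a dataset, so it inherits records and record_count.
    split_type: SplitType = SplitType.TRAINING


def make_split(parent: Dataset, split_type: SplitType,
               identifiers: Iterable[str]) -> DatasetSplit:
    """Build a split from the parent's records with the given ids and attach it."""
    wanted = set(identifiers)
    chosen = [r for r in parent.records if r.identifier in wanted]
    split = DatasetSplit(
        description=f"{split_type.value} of {parent.description}",
        records=chosen,
        split_type=split_type,
    )
    parent.splits.append(split)
    return split


corpus = Dataset(
    description="toy corpus",
    records=[DatasetRecord("a"), DatasetRecord("b"), DatasetRecord("c")],
)
train = make_split(corpus, SplitType.TRAINING, ["a", "b"])
print(corpus.record_count, train.record_count)  # 3 2
```

Subclassing `Dataset` is one design choice among several. A real application would more likely load the ontology with an RDF library and create instances against the `com:` namespace.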