Meta-Share ontology v2 pre-release beta. The ontology caters for the description of Language resources (corpora, lexical/conceptual resources, models, grammars, etc., as well as language processing tools and services) for Language Technology needs.
It is also related to the OMTD-SHARE ontology (http://w3id.org/meta-share/omtd-share/) which caters for specific features (LT operation/area, data format, method, annotation type).
META-SHARE ontology
Dimitris Galanis
John P. McCrae
Jorge Gracia
Katerina Gkirtzou
Marta Villegas
Penny Labropoulou
Philipp Cimiano
Victor Rodriguez Doncel
2022-05-27
ms
v2.0.0 pre-release beta
Spatial characteristics of the resource
spatial
Temporal characteristics of the resource
temporal
Associates an entity to an identifier, i.e. a string used to uniquely identify it
has identifier
Indicates the scheme used to provide the identifier for an entity
uses identifier scheme
Introduces a Language Technology-related area associated with the activities or domain of interest of an entity
LT area
Language Technology area
Links a tool/service (the one being described) to a language resource that bears some relation to it
LT tool/service related to
Links an external metadata record that has some relation with the metadata record of the resource being described or the resource itself
metadata record related to
Specifies the rights for accessing the distributable form(s) of a language resource (preferably in accordance with a formalised vocabulary)
access rights
Links a tool/service B that is (or can be) used for accessing LR A (e.g., a corpus workbench, s/w for lexicon access) to the LR A
accesses
Links actors to a language resource with which they have some relation
actor related to
Provides information on how the resource has been used (e.g., in a project, for a publication, for testing a tool/service, etc.)
actual use
Links to the physical address of an organization, group or person represented as a set of distinct elements (street, number, zip code, city, etc.)
address
Specifies the organization that a person is affiliated with or works for
affiliated organization
Introduces information on an organization with which the person is affiliated and the data relevant to this relation (i.e. position, email, etc.)
affiliation
The age group to which the participant belongs
age group
The age range of the group of participants
age group of participants
Specifies the elements annotated at each annotation level
annotated element
Links a corpus to its annotated part(s)
annotation
Indicates whether the resource is annotated manually or by automatic processes
annotation mode
Links to a report describing the annotation process, tool, method, etc. of the language resource
annotation report
Specifies the annotation resource used for annotating a corpus or that has been used for the input resource of a tool/service or that should be used (dependency) for the annotation
annotation resource
The annotation schema used for annotating a resource or for the input resource (if annotated) of a tool/service or that should be used (dependency) for the annotation
annotation schema
Specifies the annotation type of the annotated version(s) of a resource or the annotation type a tool/service requires or produces as an output
annotation type
Identifies the person or organisation responsible for the annotation of a resource
annotator
Indicates whether the language resource has been anonymized
anonymized
agent for attribution
attributed agent
Indicates the intended audience size of a multimedia resource
audience
Specifies the features of the format of the audio part of a language resource
detailed audio format
Specifies the audio quality measures
audio quality measure included
Introduces the details for a person that has created the document being described
author
Specifies the availability status of the resource
availability
Type of item that is represented in the n-gram resource
base item
Specifies the body parts visible in the video or image part of the resource
body part
Specifies the byte order of samples of 2 or more bytes
byte order
Provides information on the type of the transducers through which the data is captured
capturing device type
Specifies the type of the capturing environment
capturing environment
Specifies the character encoding used for a language resource data distribution
character encoding
Links to the language resource(s) that a publication includes in its citations
cites
Defines the colour space for the video
colour space
Specifies the type of (external) lexicon that can be used with the grammar
compatible lexicon type
Specifies the vocabulary/standard/best practice with which a resource complies
complies with
The name of the compression applied
compression name
Links a licence with a specific condition/term of use imposed for accessing a language resource. It is an optional element and only to be taken as providing brief human readable information on the fact that the language resource is provided under a specific set of conditions. These correspond to the most frequently used conditions imposed by the licensor (via the specified licence). The proper exposition of all conditions and possible exceptions is to be found inside the licence text. Depositors should, hence, carefully choose the values of this field to match the licence chosen and users should carefully read that licence before using the language resource.
condition of use
Specifies the vocabulary/scheme to which a value (e.g., as of text genre, domain, etc.) conforms
conforms to
Specifies the data of the person/organization/group that can be contacted for information about a language resource
contact
Introduces the details for a person that has contributed in some way to the production of a document or resource
contributor
Specifies the conversational type of a multimedia resource
conversational type
Introduces the organization that coordinates the project being described
coordinator
co-ordinator
Introduces a classification of corpora into types (used for descriptive reasons)
corpus subclass
Introduces the cost for accessing a resource or the overall budget of a project, formally described as a set of amount and currency unit
cost
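The "set of amount and currency unit" mentioned in the cost definition above can be pictured as a small value type. What follows is only a minimal sketch under stated assumptions: the class name Cost, its field names, the helper format_cost and the three-letter currency check are illustrative and are not defined by the ontology.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Cost:
    """Hypothetical value type: an amount paired with a currency unit."""
    amount: Decimal
    currency: str  # assumed here to be a three-letter code such as "EUR"

    def __post_init__(self):
        # Reject values that cannot be a meaningful cost in this sketch.
        if self.amount < 0:
            raise ValueError("amount must not be negative")
        if len(self.currency) != 3 or not self.currency.isalpha():
            raise ValueError("currency is expected to be a three-letter code")


def format_cost(cost: Cost) -> str:
    """Render the cost as '<amount> <CURRENCY>'."""
    return f"{cost.amount} {cost.currency.upper()}"
```

Keeping the amount as a Decimal rather than a float avoids rounding surprises when budgets are compared or summed.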
Introduces the name of the country mentioned in the postal address of a person or organization as defined in the list of values of ISO 3166
country
Introduces the country in which an organization was officially registered for the first time as a legal entity
country of registration
Specifies whether the resource was created automatically or in a manual or interactive mode
creation mode
Specifies the currency used for describing the cost of a resource
currency
Indicates the basic format for which a detailed description is provided in the case of audio, video, and image formats (e.g., supplying further information for sampling rate, compression, etc.)
data format
Links to LR B that requires LR A (the one being described) for its operation or that has been processed with LR A
dependent language resource
Links to LR B that is required for the operation of LR A (the one being described) or that has been used in its processing
depends on
supporting resource
depositing party
An entity where one can deposit (upload) a resource
Links to the entity that is being described by a metadata record
type of described entity
foaf:primaryTopic in DCAT
Links to the language resource(s) that are described (contents, operation, etc.) in a document
describes
Introduces a framework used in the development of the resource
development framework
Links a tool/service B that is (or can be) used to display LR A (e.g., a tool for visualizing relations in a lexicon, or annotations in a corpus) to the LR A
displays
visualises
visualizes
Links a language resource to the various forms with which it is distributed
distribution
Links to a feature that can be used for describing distinct distributable forms of audio resources/parts
audio feature
distribution feature
Links to a set of features that can be used for describing distinct distributable forms of resources/parts
Specifies the form (medium/channel) with which a resource is distributed (delivered or provided access to)
distribution form
Links to a feature that can be used for describing distinct distributable forms of image resources/parts
image feature
Identifies a person or an organization that holds the distribution rights. The range and scope of distribution rights are defined in the distribution agreement. The distributor in most cases only has a limited licence to distribute the work and collect royalties on behalf of the licensor or the IPR holder and cannot give to any recipient of the work permissions that exceed the scope of the distribution agreement (e.g., to allow uses of the work that are not defined in the distribution agreement)
distribution rights holder
distribution text feature
Links to a feature that can be used for describing distinct distributable forms of numerical text resources/parts
numerical text feature
Links to a feature that can be used for describing distinct distributable forms of video resources/parts
video feature
Classifies the division of an organization according to a controlled vocabulary
division category
Links to the entity (language resource, project, part of a language resource) that is somehow related to the document that is described
document related to
Specifies the type of the document (e.g., article, book, etc.) provided with or related to the language resource
document type
Classifies the documentation vis-a-vis its relation with the language resource (e.g., general publication, user manual, url for online help, etc.)
documentation type
Links a document to a language resource (e.g., research paper describing its contents or its use in a project, user manual, etc.)
documents
Links a report to the annotation process, tool, method, etc. of a language resource that it describes
documents annotation
Links a report to the evaluation process, tool, method, etc. of the tool or service it purports to describe
documents evaluation
Links the guidelines used for the creation or annotation of a language resource to this language resource
documents guidelines
Links a document (e.g., research article, report) describing the use of a language resource (e.g., in a project, application, experiment or use case) to the data of this use
documents usage
Links a document with detailed information on the validation process and results to the data of this validation
documents validation
Identifies the domain according to which an entity is classified
domain
Specifies the duration of the audio recording including silences, music, pauses, etc.
duration of audio
Specifies the duration of effective speech of the audio (part of a) resource
duration of effective speech
Specifies the unit used for measuring the duration of a resource
duration unit
Groups information on the dynamic elements that are represented in the video part of the resource
dynamic element
it was like this in MS-OWL v1, although it's not multiple; helps to organise the elements though
Introduces the maximum amount of money contributed from EC funds
EC max contribution
Introduces the details for a person that has edited the document being described
editor
Links a tool/service B that is (or can be) used for editing LR A (the one being described) to LR A
edits
Classifies the contents of a lexical/conceptual resource or language description as regards the linguistic level of analysis it caters for
encoding level
Introduces a value of a list used inside parameters
enumeration value
Provides information on the evaluation process and results for a tool/service
evaluation
Defines the criterion or criteria of the evaluation of a tool
evaluation criterion
Indicates the evaluation level
evaluation level
Defines whether the evaluation measure is human or automatic
evaluation measure
Links to a report describing the evaluation process, tool, method, etc. of the tool or service
evaluation report
Indicates the evaluation type
evaluation type
Describes the person or organization that evaluated the tool or service
evaluator
Provides reference to another resource to which the lexicalConceptualResource is linked (e.g., link to a wordnet or ontology)
external reference
Indicates the extratextual information contained in a lexical/conceptual resource; it can be used as an alternative to the representation of the audio, image, video, etc. part(s) of the resource when these are not considered important enough to fully describe
extratextual information
Indicates the unit to which the extratextual information is attached in the lexical/conceptual resource
extratextual information unit
Indicates the format of a resource or resource part
format
Specifies the implementation framework used for developing and running a tool/service
wrapping framework
Specifies the operation/function/task that a software object performs
function
operation
task
Identifies the person or organization or group that has financed the project
funder
Specifies the name of the funding country, in case of national funding, as mentioned in ISO 3166
funding country
Links a language resource to the project that has funded its creation, enrichment, extension, etc.
funded by
funding project
Specifies the type of funding of a project with regard to the source of the funding
funding type
Links a language resource (part) to the genre it belongs to
genre
Gives an indication of the grammatical phenomena covered by the grammar
grammatical phenomena coverage
Links to the role with which an agent is attributed for a certain activity
had role
Links to a corpus B which contains an aligned version of corpus A (the one being described)
has aligned version
Links a tool/service B that is (or can be or has been) used for analysing LR A (e.g., a statistical tool) to LR A
has analysed
Links a tool/service B that has been used for annotating LR A (e.g., a tagger, NER, etc.) to LR A
has annotated
Links to a corpus B which contains the annotations of corpus A (the one being described)
has annotated version
Links a tool/service B that is (or can be or has been) used for archiving LR A to LR A
has archived
Links to LR B that has been used for creating LR A (the one being described) through a conversion procedure, e.g., a PDF to text conversion
has converted version
Links a tool/service B that is (or can be or has been) used for creating LR A to LR A
has created
Links an organization to the division(s) (e.g., company branch, university faculty or department, etc.) it consists of
has division
Links a tool/service B that is (or can be or has been) used for eliciting LR A to LR A
has elicited
Links a tool/service that has been used for evaluating an LR A to the data of this evaluation
has evaluated
Links to an external metadata record that describes the same language resource in another catalogue, repository, etc.
has metadata
Links a language resource to the original source that has been used for its creation, from which it is derived or elicited
has original source
Links to LR B which is created / extracted from LR A (the one being described), i.e. LR A has been used as the basis/initial/source material for LR B
has outcome
is source of
Links to LR B which is contained in LR A (the one being described), e.g., a monolingual corpus part of a bilingual corpus
has part
Links a tool/service to the data of the language resource (e.g., audio or video part of a corpus) it has recorded
has recorded
Links to a subset of the resource that is described in order to add specific information on the subset (e.g. size per language for multilingual corpora, size per domain for lexica with multiple domains, etc.)
has subset
introduced to cater for all sizePer properties; decided not to use the isPartOf because I wanted to keep this for "whole" language resources
Links a tool/service B that is (or can be) used for validating LR A to the data of this validation
has validated
Links to LR B that is a version of LR A (the one being described)
has version
Links to the physical address of the head office of an organization or group represented as a set of distinct elements (street, zip code, city, etc.)
address (head office)
Specifies whether the group of participants contains persons with hearing impairments
hearing impairment of participants
Provides information on each of the format(s) of the image part of a resource
detailed image format
Specifies the requirements set by a tool/service for the (content) resource that it processes
input content resource
Specifies an LT application for which the language resource has been created or for which it can be used or is recommended to be used
intended application
Indicates the level of conversational interaction between speakers (for audio component) or participants (for video component)
interactivity
A person or an organization who holds the full Intellectual Property Rights (Copyright, trademark, etc.) that subsist in the resource. The IPR holder could be different from the creator that may have assigned the rights to the IPR holder (e.g., an author as a creator assigns her rights to the publisher who is the IPR holder) and the distributor that holds a specific licence (i.e. a permission) to distribute the work via a specific distributor.
IPR holder
intellectual property rights holder
rights holder
Links to a tool/service B that is (or can be) used for accessing LR A (the one being described), e.g., a corpus workbench, s/w for lexicon access
is accessed by
access tool
Links to a corpus B which is the aligned version of corpus A (the one being described)
is aligned version of
Links to a tool/service B that is (or can be or has been) used for analysing LR A (the one being described), e.g., a statistical tool
is analysed by
analysis tool
Links to a tool/service B that has been used for annotating LR A (the one being described), e.g., a tagger, NER, etc.
is annotated by
annotation tool
Links to a corpus B which is the raw corpus that has been annotated (corpus A, the one being described)
is annotated version of
Links an annotation resource to the corpus where it has been used for annotating or to data of an annotation tool/service with which it has been or can be combined
is annotation resource of
Links an annotation schema to the corpus where it has been used for annotating or to data of an annotation tool/service with which it has been or can be combined
is annotation schema of
Identifies the resource(s) that have been annotated by the person/organization
is annotator of
Links to a tool/service B that is (or can be or has been) used for archiving LR A (the one being described)
is archived by
archiving tool
Links to a publication that cites the language resource described
is cited by
Links to an LR B that has been used together with LR A (the one being described) to create LR C, e.g., two monolingual wordnets/corpora aligned to produce a bilingual resource
is combined with
Links to an LR B that can be used together with LR A (the one being described), e.g., a lexicon that is compatible with a grammar
is compatible with
Links to the language resource(s) for which the person/organization/group can be contacted for further information
is contact for
Links to LR B that forms the basis of LR A (the one being described) upon which it has continued to extend / enrich
is continuation of
Links to an LR B that extends / continues / enriches LR A (the one being described)
is continued by
Links to LR B that has been the outcome of a conversion procedure from LR A (the one being described), e.g., a PDF to text conversion
is converted version of
Introduces the project that an organization coordinates
is coordinator of
Links to a tool/service B that has been used for creating LR A (the one being described)
is created by
creation tool
derivation tool
Links to a document that describes the contents or operation of a language resource
is described by
Links to a tool/service B that is (or can be) used to display LR A (the one being described), e.g., a tool for visualizing relations in a lexicon, or annotations in a corpus
is displayed by
display tool
visualization tool
Identifies the language resource for which a person or an organization holds the distribution rights. The range and scope of distribution rights are defined in the distribution agreement. The distributor in most cases only has a limited licence to distribute the work and collect royalties on behalf of the licensor or the IPR holder and cannot give to any recipient of the work permissions that exceed the scope of the distribution agreement (e.g., to allow uses of the work that are not defined in the distribution agreement)
is distribution rights holder of
Links a division (e.g., company branch, university department or faculty) to the organization to which it belongs
is division of
Links a language resource to a document (e.g., research paper describing its contents or its use in a project, user manual, etc.) or any other form of documentation (e.g., a URL with support information) that is related to the resource
is documented by
documentation report
Links to a tool/service B that is (or can be) used for editing LR A (the one being described)
is edited by
editing tool
Links to a tool/service B that is (or can be or has been) used for eliciting LR A (the one being described)
is elicited by
elicitation tool
Links to a tool/service B that has been used to evaluate LR A (the one being described)
is evaluated by
evaluation tool
Identifies the tool(s) or service(s) that a person or organization has evaluated
is evaluator of
Links to LR B that has the same contents as LR A; they may have different names or the same name and be stored in different locations
is exact match with
is identical with
is same as
Links to the project(s) that the person/organization has financed
is funder of
Links a project to the language resource(s) that it has funded (e.g., creation, enrichment, extension, etc.)
is funding project of
Links to the resource(s) for which a person or an organization holds the full Intellectual Property Rights (Copyright, trademark, etc.). The IPR holder could be different from the creator that may have assigned the rights to the IPR holder (e.g., an author as a creator assigns her rights to the publisher who is the IPR holder) and the distributor that holds a specific licence (i.e. a permission) to distribute the work via a specific distributor.
is IPR holder of
Links to the resource(s) which a person or organisation is legally eligible to license and actually licenses. The licensor could be different from the creator, the distributor or the IPR holder. The licensor has the necessary rights or licences to license the work and is the party that actually licenses the resource that is distributed via the specific channel. She will have obtained the necessary rights or licences from the IPR holder and she may have a distribution agreement with a distributor that disseminates the work under a set of conditions defined in the specific licence and collects revenue on the licensor's behalf. The attribution of the creator, separately from the attribution of the licensor, may be part of the licence under which the resource is distributed (as e.g., is the case with Creative Commons Licences).
is licensor of
Specifies the tool/service with which the ML model can be combined in order to perform the desired task
is ML model of
is Machine Learning model of
Links to the metadata record(s) that a person has created
is metadata creator of
Links to the metadata record(s) that a person is responsible for creating, updating, enriching, etc.
is metadata curator of
Links an external metadata record (from another catalogue, repository, etc.) to the language resource described
is metadata of
Links to LR B which contains LR A (the one being described), e.g., a bilingual corpus that includes a monolingual corpus
is part of
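Several relations in this listing come in mirrored pairs, such as "has part" and "is part of", or "cites" and "is cited by". The sketch below illustrates how a statement in one direction could be completed with its mirror. It rests on assumptions: the pairing table is inferred from the mirrored definitions only (the listing does not declare the pairs formally), and the names INVERSES and add_inverse_statements are hypothetical.

```python
# Assumed inverse pairs, inferred from the mirrored definitions in this listing;
# the listing itself does not state them formally.
INVERSES = {
    "has part": "is part of",
    "is part of": "has part",
    "cites": "is cited by",
    "is cited by": "cites",
}


def add_inverse_statements(statements):
    """Return the statements plus the mirror of each one with a known inverse."""
    result = set(statements)
    for subject, relation, obj in statements:
        inverse = INVERSES.get(relation)
        if inverse is not None:
            # Swap subject and object and use the mirrored relation label.
            result.add((obj, inverse, subject))
    return result
```

For example, a single statement saying that a bilingual corpus "has part" a monolingual corpus would be completed with the statement that the monolingual corpus "is part of" the bilingual one; relations without an entry in the table are left as they are.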
Links to LR B that, together with LR A (the one being described), is part of LR C
is part with
Identifies the project(s) in which an organization officially participates or has participated
participant in
European Language Technology Project, Machine Translation for Elderly People, ...
Links to a tool/service B that is (or can be) used for querying LR A (the one being described)
is queried by
query tool
Links to a tool/service that has been used for recording a video or audio corpus part
is recorded by
recording platform software
recording tool
Links to the audio or video part of the resource(s) that a person/organization has recorded
is recorder of
Identifies a relation holding between two language resources or a language resource and a document or actor or, in general, a satellite entity
is related to
Links to the data of an actor that bears some relation to the language resource that is described
is related to actor
Links to a document that is somehow related to the entity that is described
is related to document
Links to a language resource that holds a relation with the entity being described (without further specification of the relation type)
is related to Language Resource
Links to a tool/service B that bears some relation to LR A (the one being described)
is related to LT tool/service
Links to an external metadata record that has some relation with the metadata record of the resource being described or the resource itself
is related to metadata record
Links to a project that has some relation with the language resource that is described (e.g., has funded its creation)
is related to project
Links to LR B that is a newer version of LR A (the one being described) and replaces it
is replaced with
Links to a language resource that is required for the operation of a tool/service or computational grammar
is required language resource by
Links to the language resource(s) that a person/organization has created
is resource creator of
Links to a resource that the person/organization is responsible for providing, curating and maintaining
is resource provider of
Links to a document that contains the review of LR A (the one being described)
is reviewed by
review
Links to LR B that bears resemblances to LR A (the one being described), e.g., they have been built with the same theoretical principles or are the same with different formats or processed at the same level with different tools
is similar to
is similar with
is supplement to
is supplemented by
Links a tagset to the resource where it has been used for annotating or to a tool/service for which it should be used (dependency) for the annotation
is tagset for
Links to the publication that must be used when citing the language resource
is to be cited by
Links a typesystem to the corpus where it has been used for annotating or to data of an annotation tool/service with which it has been or can be combined
is typesystem of
Links a project where a language resource has been used to this LR
is usage project of
Links to a tool/service B that is (or can be) used for validating LR A (the one being described)
is validated by
validation tool
Links person(s), group(s) or organization(s) that have performed a specific validation of a language resource to the data of this validation
is validator of
Links to LR B that is a version (corrected, annotated, enriched, processed, etc.) of LR A (the one being described)
is version of
Specifies the language that is used in the resource or supported by the tool/service
language
Relates a language resource that contains segments in a language variety (e.g., dialect, jargon) to it
language variety
in MS we have language variety attached to language; check if this is ok; introduce a class Language after deciding on how to treat languages
Specifies the type of the language variety
language variety type
Introduces a classification of lexical/conceptual resources into types (used for descriptive reasons)
LCR subclass
A classification of language descriptions into models, grammars etc.
Language Description subclass
Specifies the task performed by the language description
language description task
Categorises a licence according to a classification scheme
licence category
Links the distribution (distributable form) of a language resource to the licence or terms of use/service (a specific legal document) with which it is distributed
licence
The person or organisation who is legally eligible to license and actually licenses the resource. The licensor could be different from the creator, the distributor or the IPR holder. The licensor has the necessary rights or licences to license the work and is the party that actually licenses the resource that is distributed via the specific channel. She will have obtained the necessary rights or licences from the IPR holder and she may have a distribution agreement with a distributor that disseminates the work under a set of conditions defined in the specific licence and collects revenue on the licensor's behalf. The attribution of the creator, separately from the attribution of the licensor, may be part of the licence under which the resource is distributed (as e.g., is the case with Creative Commons Licences).
licensor
Indicates whether the resource includes one, two or more languages
linguality type
Provides a more detailed account of the linguistic information contained in the lexical/conceptual resource
linguistic information
Describes the linking between different media parts of a resource (how they interact with or link to each other)
link to other media
Links a language resource to the parts it consists of (e.g., an oral corpus with the audio and its transcribed text part)
media part
Specifies the media type of a language resource (the physical medium of the contents representation) or of the input/output of a language processing tool/service; each media type is described through a distinctive set of technical features; a language resource may consist of different media parts
media type
Links to an association or network that the organization or person is a member of
member of association
Introduces an institution with members that can benefit from specific conditions on the use of a resource (e.g. discount, unlimited access, etc.)
membership institution
Introduces the person who has created the metadata record
metadata creator
A person responsible for the creation, update, enrichment, etc. of a metadata record describing an entity
metadata curator
Specifies the language that is used as support for the resource (e.g., English for a grammar of French described in English or for a French dictionary with English definitions)
metalanguage
Specifies the method used for the development of a tool/service or the ML model
method
Identifies the metric used for the evaluation of the tool/service
metric
The format of the resource expressed with one of the values from the IANA mimetypes (https://www.iana.org/assignments/media-types/media-types.xhtml).
format (as mimetype value)
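As a rough illustration of a format expressed as a mimetype value, the sketch below guesses a media type from a file name with Python's standard mimetypes module. This is an assumption-laden example: the helper name guess_format is hypothetical, and the module relies on Python's own extension table, which approximates but is not the IANA registry itself.

```python
import mimetypes


def guess_format(filename):
    """Return (media type, encoding) guessed from the file name.

    Both values are None when the extension is not recognised.
    """
    return mimetypes.guess_type(filename)
```

A plain-text file name such as corpus.txt yields the media type text/plain; for a gzip-compressed file such as corpus.txt.gz, the compression is reported separately as the encoding rather than as part of the media type.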
Specifies the framework that has been used for developing a model (e.g. keras, tensorflow, etc.)
ML framework
Machine Learning framework
Specifies the ML model that must be used together with the tool/service to perform the desired task
ML model
Specifies the type of the modality represented in the resource or processed by a tool/service
modality type
The function/task/operation a model performs
model function
A classification of models based on their algorithm
model type
Indicates whether the resource (part) is parallel, comparable or mixed
multilinguality type
Introduces the maximum amount of money contributed from national funds
national max contribution
Specifies the level of naturality for multimodal/multimedia resources
naturality
Specifies the level of background noise
noise level
Specifies the distinct non-speech elements that may be included in the audio part of a corpus or lexical/conceptual resource
non-speech item
Specifies the operating system on which a software object can run
operating system
Classifies an organization according to its legal status
organization legal status
Classifies an organization according to its role/mission
organization role
Specifies the relation of the language of a participant with respect to the acquisition stage
origin
Specifies the relation of the language of the group of participants with respect to the acquisition stage
origin of participants
Specifies the media types that are linked to the media type described within the same resource
other media
Links to the physical address of an office (other than the principal one) of an organization or group represented as a set of distinct elements (street, zip code, city, etc.)
address (other offices)
Specifies the output results of a tool/service, i.e. the features of the processed (content) resource
output resource
Specifies the format of the package (zipped file) that contains the resource files
package format
Introduces a parameter used for running a tool/service
parameter
Classifies the parameter according to a specific (not yet standardised) typing system (e.g., whether it's boolean, string, integer, a document, mapping, etc.)
parameter type
Introduces descriptive characteristics of a person that has participated in the audio, video, sensorimotor (textNumerical) part of the resource
participant
Indicates the organization(s) that participate (or have participated) in a project
participating organization
Describes the performance indicator used for the evaluation of a tool/service
performance indicator
Specifies whether the language resource contains personal data (mainly in the sense falling under the GDPR)
personal data included
The language acting as an intermediary for translations between many languages
pivot language
Specifies the policy that a tool/service has with respect to the annotation types included in the annotated resource that it takes as input (i.e. whether it keeps, modifies or drops them in the output resource)
previous annotation types policy
Specifies the type of processing performed on a language resource
process mode
Specifies the resource type that a tool/service takes as input or produces as output
processing resource type
Links a project to the language resource(s) it has some relation with (e.g., funded the creation of)
project related to
Used for relations to agents not covered by the classic relations (e.g., funders, contributors, etc.)
attribution
Specifies the quality level of image resource
quality
Links a tool/service B that is (or can be) used for querying LR A to LR A
queries
Indicates if the image is stored as raster or vector graphics
raster or vector graphics
Links to the recorder(s) of the audio or video part of the resource
recorder
Specifies the nature of the recording platform hardware and the storage medium
recording device type
Specifies where the recording took place
recording environment
Indicates the audio or video recording quality
recording quality
Identifies the language resource that is related to the one being described
related language resource
Indicates the type of the lexica that must or can be used with the grammar
related lexicon type
Links a language resource to other related resources specifying also the type of relation
relation
Links to LR B that is an older version of LR A (the one being described) and has been replaced by it
replaces
Specifies an organization that has been replaced by the organization which is described (e.g. following a merge or acquisition action)
replaces organization
Specifies a project that has been replaced by the project described
replaces project
Specifies the type of hardware required for running a tool and/or computational grammar
required hardware
Links to LR B that is required for the operation of a tool/service or computational grammar (the one that is described)
requires language resource
Groups together information on the image resolution
resolution
Specifies the standard to which the resolution conforms
resolution standard
Links a resource to the person, group or organisation that has created the resource
resource creator
The person/organization responsible for providing, curating, maintaining and making available (publishing) the resource
resource provider
Links a document to the language resource it reviews
reviews
Introduces a combination of the sample text(s) or sample file(s) and optional tags that can be used for feeding a processing service for testing purposes
sample
Indicates the task defined for the conversation or the interaction of participants
scenario type
Provides information on the illumination of the scene
scene illumination
needed for audio?
Specifies the segmentation unit in terms of which the resource has been segmented or the level of segmentation a tool/service requires/outputs
segmentation level
Specifies whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling
sensitive data included
Specifies the sex of a participant (either of the two major forms of individuals that occur in many species and that are distinguished respectively as female or male especially on the basis of their reproductive organs and structures [https://www.merriam-webster.com/dictionary/sex])
sex
check with sociologists?
Specifies the sex of the participants (either of the two major forms of individuals that occur in many species and that are distinguished respectively as female or male especially on the basis of their reproductive organs and structures [https://www.merriam-webster.com/dictionary/sex])
sex of participants
check with sociologists?
Indicates the binary representation of numbers
sign convention
Specifies the encoding the audio type uses
signal encoding
Specifies the size of a countable entity with regard to the SizeUnit measurement in form of a number
size
Provides information on size for the annotated parts of the resource
size per annotation
The size of the audio subparts of the resource in terms of classification criteria
size per audio classification
Used to give info on size of parts of a resource that differ as to the format
size per audio format
Provides information on the size of the resource parts with different character encoding
size per character encoding
Specifies the size of resource parts per domain
size per domain
Provides information on size per geographically distinct section of the resource
size per geographic coverage
Provides information on size of parts with different image classification
size per image classification
Used to give info on size of parts of a resource that differ as to the format
size per image format
Provides information on the size per language component
size per language
Provides information on the size per language variety component
size per language variety
Provides information on the size per modality component
size per modality
Indicates the size of a subset of a language resource
size per subset
Provides information on size of resource parts with different text classification
size per text classification
Provides information on the size of the resource parts with different format
size per text format
Gives information on the size of textNumerical resource parts with different format
size per textNumerical format
Provides information on size per time period represented in the resource
size per time coverage
Specifies the size of the validated part of a resource
size per validation
Used to give info on size of parts with different video classification
size per video classification
Used to give info on size of parts of a resource that differ as to the format
size per video format
Specifies the unit that is used when providing information on the size of the resource or of resource parts
size unit
Introduces the social media or occupational account details of an entity
social media occupational account
Specifies the type of the social media or occupational account
social media occupational account type
Specifies the source channel of the recording of a multimedia resource
source channel
Specifies the type of the source channel
source channel type
The language from which a translation is made.
source language
Links, in some cases (e.g., harvesting, population from other catalogues, conversion from other metadata records, etc.), to the metadata record that has been used as the basis for the creation of the metadata record
source metadata record
Refers to the entity (repository, catalogue, archive, etc.) from which the metadata record has been imported into the new catalogue
source of metadata record
for now dcat:catalog
Indicates whether the group of participants contains persons with speaking impairments
speaking impairment of participants
Specifies the factors influencing speech
speech influence
Specifies the distinct elements that are pronounced and annotated as such
speech item
Groups information on the static elements visible on images
static element
Specifies the subject/topic of a language resource
subject
topic
Specifies whether some type of support (commercial, free, etc.) is provided for the resource installation or use
support type
The tagset used for annotating a resource or for the input resource (if annotated) of a tool/service or that should be used (dependency) for the annotation
tagset
The language into which a translation is made
target language
Links, in some cases (e.g., harvesting, population from other catalogues, conversion from other metadata records, etc.), an external metadata record that has been used as the basis for the creation of the internal metadata record to this internal metadata record
target metadata record
Provides information on the type of text that may be on the image
text included in image
Indicates if any text and what type is present in or in conjunction with the video
text included in video
Groups information on the format(s) of the textNumerical part of the resource
detailed numerical text format
Specifies the text type (e.g., factual, literary, etc.) according to which a text corpus (part) is classified
text type
Specifies the TRL (Technology Readiness Level) of the technology according to the measurement system defined by the EC
TRL
Technology Readiness Level
Specifies the typesystem (preferably through an identifier or URL) that has been used for the annotation of a resource, that is required for the input resource of a tool/service, that should be used (dependency) for the annotation, or that has been used in the training of an ML model
typesystem
Links to a project in which the language resource has been used
usage project
Links to a document (e.g., research article, report) describing a project, application, experiment or use case where the language resource has been used
usage report
Specifies an LT application in which the language resource has been used
used in application
Specifies the geographic region where a language is spoken
used in region
Used to identify the type of user of the resource (affiliated with a commercial vs. academic institution); it is usually associated with licensing or pricing conditions on the use of a resource
user type
Provides information on each of the validation procedure(s) a language resource may have undergone (e.g., formal validation of the whole resource, content validation of one part of the resource, etc.).
validation
Specifies the coverage of the language resource that has been validated
validation extent
Specifies the type of processing applied for the validation of the language resource
validation mode
Links to a document with detailed information on the validation process and results
validation report
Specifies the type of validation that has been performed on a language resource
validation type
Links a validation to the person(s), group(s) or organization(s) that have performed a specific validation of a language resource
validator
Specifies the variant of a language, according to the IETF BCP47 guidelines
variant
Links scholarly works (mainly) to a type of versioning vocabulary related to the publication process
version type
If we think this shouldn't be used for LRTs, remove from XSD's
Groups information on the format(s) of a resource; repeated if parts of the resource are in different formats
detailed video format
see how data formats, audio & video formats will be handled; must be on distribution
Specifies the dimensional form applied on the video or image part(s) of a corpus
visual modelling
Provides information on the vocal tract conditions that may influence the speech of the participant
vocal tract condition
Identifies the type of a web service in accordance with a classification vocabulary that refers to the web service communication protocols
web service type
try to restrict with instance "webService" on DistributionForm
geographicIdentifier is also a property for location represented as "rdfs:seeAlso"
Introduces a summary of the contents of a document
abstract
A URL where the resource can be accessed from; it can be used for landing pages or for cases where the resource is accessible via an interface, i.e. cases where the resource itself is not provided with a direct link for downloading
access location
Describes in free text specific use cases or situations where the language resource has been used
actual use details
Specifies hardware requirements that are needed for running a tool/service or model
hardware requirements
h/w requirements
The street and the number of the postal address of a person or organization
address
Age of a person
age
End of age range of the group of participants
age range end of participants
Start of age range of the group of participants
age range start of participants
Identifies the training algorithm used for the model (e.g., maximum entropy, svm, etc.)
algorithm
Provides a detailed description of the algorithm, incl. info on whether it's supervised or not
algorithm details
Introduces a name used for a participant of a multimedia resource instead of his/her real one
alias
Introduces an alternative title (e.g. short title) used for the document being described
alternative title
Specifies the number of units that constitute anything that can be measured (e.g. size of a data resource or cost, etc.)
amount
The date in which the annotation process has ended
annotation end date
Provides further information on annotation process
annotation mode details
Indicates whether the annotation is created inline or in a stand-off fashion
annotation standoff
The date in which the annotation process has started
annotation start date
If the resource has been anonymized, this field can be used for entering more information, e.g., tool or method used for the anonymization, by whom it has been performed, whether there was any check of the results, etc.
anonymization details
Indicates the parts of the artifacts represented in the image corpus
artifact part
Indicates the position of the lexicon, if attached to the grammar
attached lexicon position
The text that must be quoted for attribution purposes when using a resource - for cases where a resource is provided with a restriction on attribution
attribution text
Specifies the end date of availability of a resource - only for cases where a resource is available for a restricted time period.
availability end date
Specifies the start date of availability of a resource - only for cases where a resource is available for a restricted time period.
availability start date
Provides a free text account on bias considerations of a model
bias details
Specifies in the form of free text a set of bibliographic data (a bibliographic record), preferably in BibTeX form
bibliographic record
Indicates the body parts that move in the video part of the resource
body movement
Introduces the human readable text that is used as the title of a book (differentiating from the title of an article)
book title
Provides further information on the capturing method and procedure
capturing details
Provides further information on the capturing device
capturing device type details
Introduces a human readable name (label) by which a classification category (e.g. text type, text genre, domain, etc.) is known
category label
Provides the text that must be quoted for citation purposes when using a language resource
citation text
The name of the city, town or village as mentioned in the postal address of a person or organization
city
The number of bits used to represent the colour of a single pixel
colour depth
Specifies a command line used to invoke a software component
command
Whether the audio, video or image is compressed or not
compressed
Whether there is loss due to compression
compression loss
Introduces the name of the conference where the document being described was presented
conference
Introduces a free text statement that may be included with the language resource, usually containing the name(s) of copyright holders and licensing terms
copyright notice
Provides additional information on the creation of a language resource
creation details
The date in which the process of creating a resource was completed
creation end date
The date in which the process of creating the resource started
creation start date
how do we handle cases where a resource was created, for instance, in 2000? Is this the start or end date?
Specifies the initial value that user interfaces should use when prompting the user for a parameter taking a list of values
default value
Specifies the URL where a user can access the demo of a tool/service
demo location
Introduces a short free-text account that provides information about the resource (e.g., function, contents, technical information, etc.)
description
Add some link to datacite:hasDescription and descriptionType:abstract (always for language resources). The rest of the values of descriptionType make sense mainly for publications, but this handling allows easier extension at a later time.
But note that there is no class datacite:Description; descriptionType is in range of anything that "hasDescriptionType"
Provides information on the dialect accent of the participant
dialect accent
Provides information on the dialect accent of the group of participants
dialect accent of participants
Links a docker image to the sha256 hash (used for selecting the appropriate version)
digest
Introduces an academic field of study or research area that a person, group or organization is expert in
discipline
area of expertise
research area
computer programming, translation and interpreting, linguistics, ...
Links to a URL where there's a discussion thread (e.g., user forum) about a resource
discussion URL
Identifies any distractors visible in the resource
distractor
A URL point from which a language resource can be directly accessed (downloaded or executed)
distribution location
Introduces an alternative name (other than the short name) used for a division
division alternative name
The official title of a unit of an organization (e.g., faculty or department of a university, department of a research organization or private company, etc.)
division name
Introduces the short name (abbreviation, acronym, etc.) used for the division of an organization
division short name
A location where the software in the form of a docker image can be downloaded from
docker download location
A URL where the language resource (mainly data but also downloadable software programmes or forms) can be downloaded from
download location
rule: must be filled in when SoftwareDistributionForm = downloadable; must not be filled in for web services; optional in all other cases
Introduces the number of the edition for the document that has been described
edition
Provides information on the education level of the participant
education level
Points to the email address of a person, organization or group
email
john.smith@example.com, ...
Indicates whether the tool or service has been evaluated
evaluated
Provides further information on the evaluation process of a tool or service
evaluation details
Provides description of any events represented in the image corpus
event description
A URL where the resource (mainly software) can be directly executed
execution location
rule: must be filled in when SoftwareDistributionMedium = webService; otherwise, not to be filled in
Indicates the movement of the eyes visible in the resource
eye movement
Indicates the facial expressions visible in the resource
face expression
Indicates the view of the face(s) that appear in the video or on the image part of the resource
face view
A factor that has been used for the n-gram resource
factor
Introduces the fax number of a person, group or organization; recommended format: +_international code_city code_number without spaces
fax number
+123456789012
Describes in free text the costs that are required to use the resource, a fragment of the resource or to use a tool or service
fee
pricing policy can be a more generic concept; discuss if we want to formalise this
Defines whether blur is present in the moving sequences
fidelity
Specifies the formalism (bibliographic reference, URL, name) used for the creation/enrichment of the resource (grammar or tool/service)
formalism
The number of frames per second
frame rate
Introduces a category (usually defined by the funding authority) of the funding scheme to which a project complies
funding scheme category
Indicates the geographic region that the content of a resource is about; for countries, recommended use of ISO-3166
geographic coverage
string or Location (iso3166) and geographic coordinates?
Gives information on the geographic distribution of the participants
geographic distribution of participants
Indicates the type of gestures visible in the resource
gesture
Introduces the given name (also known as "first name") of a person
given name
first name
forename
The glottolog code (languoid, cf. https://glottolog.org/glottolog/language) used for a language
glottolog code
Links to the URL where a gold standard resource can be found in order to be used for evaluation purposes
gold standard location
Specifies an identification number usually issued by the funding authority and referring uniquely to the project that has received the funding
grant number
Indicates the movement of hands and/or arms visible in the resource
hand arm movement
Gives information on the manipulation of objects by hand
hand manipulation
Indicates the movements of the head visible in the resource
head movement
Provides information on any hearing impairment the participant may have
hearing impairment
Provides information on the height of the participant in cm
height
Links to a URL that includes information on a person (e.g., CV, publications, communication information, etc.)
home page
Introduces an honorific prefix (or title) such as Dr., Mr. or Ms. that precedes the name of a person
honorific prefix
title
The programming language(s) used for the development of a tool/service, which is needed for running the tools/services, in case no executables are available
implementation language
Specifies the parts that interact in an audio or video component
interaction
Any interactive media visible in the resource
interactive media
Provides information on the inter-annotator agreement and the methods/metrics applied
interannotator agreement
Specifies whether the n-gram resource is interpolated (interpolated language models are constructed from two or more corpora, and each corpus is represented in the model according to a predefined weight)
interpolated
Provides information on the intra-annotator agreement and the methods/metrics applied
intra-annotator agreement
Specifies whether the model is factored or not
is factored
Introduces the name of the journal where the document being described has been published
journal
Introduces a word or phrase considered important for the description of an entity and thus used to index or classify it
keyword
keyphrase
crisis management, speech synthesis, subtitling, ...
Links to a web page that provides additional information about a language resource (e.g., its contents, acknowledgements, link to the access location, etc.)
landing page
dcat:landingPage has range foaf:Document and is a subproperty of foaf:page; no distinction is made between physical and electronic documents. so I'm not sure what the most appropriate relation is for us
Indicates the landscape parts represented in the image corpus
landscape part
Indicates whether the operation of the tool or service is language dependent or not
language dependent
The identifier of the language subtag according to the IETF BCP47 guidelines (i.e. ISO 639-1 codes when existing supplemented with ISO 639-3 codes for new entries)
language id
The identifier of a language, according to the IETF BCP47 guidelines
language tag
A textual string used for referring to a language variety
language variety name
Introduces the legend of the soundtrack
legend
Introduces an alternative name (other than the short name) used for a licence or terms of use
licence/terms of use/service alternative name
The name by which a legal document (e.g., licence, terms of use, terms of service) is known
licence/terms of use/service name
Introduces the short name (abbreviation, acronym, etc.) used for a licence or terms of use document
licence/terms of use/service short name
Links to the URL where the text of a licence/terms of use/service is found
licence/terms of use/service URL
Links to a symbol or graphic object used to identify an entity; please, add a URL with an image file
logo
https://www.logo.eu
Indicates the name of a mailing list used in relation to a language resource
mailing list name
Defines the measure calculated for the metric
measure
Provides further information on the way the media types are linked and/or synchronized with each other within the same resource
media type details
Specifies the date when the metadata record was first created
metadata creation date
Specifies the date when the last update of the metadata record was made
metadata last date updated
Specifies the framework that has been used for developing a model (e.g. keras, tensorflow, etc.)
ML framework
Machine Learning framework
Provides further information on the modalities represented in a language resource
modality type details
Introduces a label that can be used to identify the variant of a ML model
model variant
Specifies whether the parameter takes a list of values
multi-value
Provides further information on multilinguality of a resource in free text
multilinguality type details
Introduces the full name of a person; recommended format "surname, givenName"
name
The number of the persons participating in the audio or video part of the resource
number of participants
Specifies the number of audio channels
number of tracks
In MS it's a controlled vocabulary (1, 2, 4, 8) but I think it's too much
The number of participants that have been trained for the specific task
number of trained speakers
Specifies whether the parameter should be treated as mandatory or optional by user interfaces
optional
Specifies the maximum number of items in the sequence
order
Introduces an alternative name (other than the official title or the short name) used for an organization
organization alternative name
Introduces a short free-text account that provides information on an organization
organization description
Provides description of the organizations that may appear in the image corpus
organization description
The official title of an organization
organization name
Introduces the short name (abbreviation, acronym, etc.) used for an organization
organization short name
A description in free text of the source material that has been used for the creation of a language data resource
original source description
Indicates whether the output of the operation of the grammar is a statement of grammaticality (grammatical/ungrammatical) or structures (interpretation of the input)
output
Introduces the range of pages (preferably with a dash between) on which a document is printed
pages
Provides a short account of the parameter (e.g., function it performs, input / output requirements, etc.) in free text
parameter description
Introduces a short name for a parameter suitable for use as a field label in a user interface
parameter label
Introduces the name of the parameter as sent to a processing service
parameter name
Provides information on the perplexity derived from running on test set taken from the same corpus
perplexity
Provides descriptive features for the persons represented in the image corpus
person description
If the resource includes personal data, this field can be used for entering more information, e.g., whether special handling of the resource is required (e.g., anonymization, further request for use, etc.)
personal data details
The place in which the participant has been born
place of birth
The place in which the participant lived as a child
place of childhood
The place of living of a participant
place of living
Specifies the place of the secondary education of the participant
place of second education
Indicates the number of poses per subject that participates in the video part of the resource
poses per subject
The position or the title of a person if affiliated to an organization
position
Specifies whether the resource is private so that its access/download location remains hidden
private
Provides information on the participant's profession
profession
Introduces an alternative name (other than the short name) used for a project
project alternative name
The end date of a project
project end date
The official title of a project
project name
project title
Links to a report issued by a project
project report
Introduces a short name (e.g., acronym, abbreviated form) by which a project is known
project short name
The starting date of a project
project start date
Introduces a short description (in free text) of the main objectives, mission or contents of the project
project summary
Specifies the date when a language resource has been made available to the public
publication date
problem: when is the first publication date if a resource has been made available through zenodo, meta-share, clarin, etc.?? How can this be propagated?
Important for citation purposes
should we also differentiate between distributions? e.g., a csv published at a different time from a database?
Introduces the name of an organization that has published the document being described
publisher
The number of bits for each audio sample
quantization
In MS it's a controlled vocabulary (8, 16, 24, 32, 64) but I found it too much
Free text description of the recording device
recording device type details
Specifies the software used for the recording platform
recording platform software
The name of the region, county or department as mentioned in the postal address of a person or organization
region
The identifier of the region subtag according to the IETF BCP47 guidelines (i.e. ISO 3166 codes)
region id
Specifies the type of register (any of the varieties of a language that a speaker uses in a particular social context [Merriam-Webster]) of the contents of a language resource
register
Specifies the Call for proposals to which a project was submitted
related call
Specifies the funding programme to which a project was submitted
related programme
program
Specifies the funding subprogramme to which a project was submitted
related subprogramme
Describes in a free text statement a relation holding between two Language Resources not yet covered by the schema
relation type
Specifies software tools or libraries that are required for the operation of a tool/service or computational grammar
requires software
left as a data property - it's for more general software than language processing tools, so difficult to restrict
Introduces a human-readable name or title by which the resource is known
resource name
resource title
British National Corpus
Introduces a short form (e.g., abbreviation, acronym, etc.) used to refer to a language resource
resource short name
Provides an account of the revisions in free text or a link to a document with revisions
revision
Introduces a free text statement on the robustness of the grammar (how well the grammar can cope with misspelt/unknown, etc. input, i.e. whether it can produce even partial interpretations of the input)
robustness
Provides further information on the running environment of a tool/service or of a language description
running environment details
Gives information on the running time of a tool or service
running time
Introduces a short text that can be used to feed a service in order to test it
sample text
Links a resource to a URL (or URLs) with samples of a data resource or of the input or output resource of a tool/service
samples location
Specifies the sampling rate of the audio files contained in the resource, in Hertz
sampling rate
The identifier of the script subtag according to the IETF BCP47 guidelines (i.e. ISO 15924 codes)
script id
If the resource includes sensitive data, this field can be used for entering more information, e.g., whether special handling of the resource is required (e.g., anonymization, further request for use, etc.)
sensitive data details
Specifies the type of image sensor or the sensing method used in the camera or the image-capture device
sensor technology
Introduces the title of the series of a book/journal where the document being described has been published
series
The URL from which the Docker image of the service adapter can be downloaded
service adapter download location
Lists the service(s) offered by an organization or person
service offered
Introduces a free text statement on the shallowness of the grammar (how deep the syntactic analysis performed by the grammar can be)
shallowness
Introduces a free text that can be used as a short bio for a person
short bio
The frame height in pixels
size height
Specifies the size of a resource in the form of a free text
size (free text)
The frame width in pixels
size width
Provides information on whether the participant smokes and on his/her smoking habits in general
smoking habit
The technique used for giving unseen items some probability
smoothing
Provides information on the details of the channel equipment used (brand, type, etc.) in free text
source channel details
Introduces the name of the source channel through which the recording has been made
source channel name
Provides information on any speaking impairment the participant may have
speaking impairment
Indicates whether a company is a startup
startup
Specifies the status in which a project is found in the course of a funding process (e.g. approved, signed, under negotiation, etc.)
status
Introduces a secondary title used for a document
subtitle
Introduces the surname (i.e. the name shared by members of a family, also known as "family name" or "last name") of a person
surname
family name
last name
Whether the media part described is synchronized with audio within the same resource
synchronized with audio
Whether the media part described is synchronized with image within the same resource
synchronized with image
Whether the media part described is synchronized with text within the same resource
synchronized with text
Whether the media part described is synchronized with textNumerical within the same resource
synchronized with textNumerical
Whether the media part described is synchronized with video within the same resource
synchronized with video
Specifies whether the contents of a corpus have been artificially generated (e.g., using a Machine Translation or NLG system) rather than having been created by humans (authentic/original data)
synthetic data
Introduces a tag that can be used as a criterion for selecting different samples for testing (e.g. the language value for Machine Translation services that operate on multiple languages)
tag
Indicates the theoretic model applied for the creation/enrichment of the resource, by name and/or by reference (URL or bibliographic reference) to informative material about the theoretic model used
theoretic model
Provides description of the things represented in the image corpus
thing description
Indicates the time period that the content of a resource is about
time coverage
Introduces a human-readable text (typically short) used as a name for referring to the document being described
title
Provides information on whether the participant is trained in a specific task
trained speaker
Provides a detailed description of the training corpus (e.g., size, number of features, etc.)
training corpus details
Provides a detailed description of the training process (pre-processing, pretraining, etc.)
training process details
Specifies the type of objects or people represented in the video or image part of the resource
type of element
The main types of object or people represented in the image corpus
type of image content
Specifies the content that is represented in the textNumerical part of the resource
type of textNumerical content
Main type of object or people represented in the video
type of video content
Identifies the unit of measure used in the calculation of the metric (for the evaluation of a tool/service)
unit of measure
Specifies the frequency with which the resource is updated
update frequency
Links to a URL
has URL
Introduces a query performed by a user that has resulted in the creation of the resource being described (e.g. a dataset created with a query aggregating texts from a database, or subset of a lexical resource, or a virtual collection of sentences from online corpora)
user query
Specifies whether the resource has undergone a formal validation process
validated
Provides additional information in the form of free text about the validation process of a language resource
validation details
Introduces a short free text used to show to the user as a tip for the value of a list used in parameters
value description
Introduces a short free text used to show to the user for the value of a list used in parameters
value label
Introduces a free text used to identify the value of a list inside parameters
value name
The identifier of the variant subtag according to the IETF BCP47 guidelines
variant id
Associates a language resource with a pattern that indicates its version; the recommended way is to follow the semantic versioning guidelines (http://semver.org) and use a numeric pattern of the form major_version.minor_version.patch
version
Identifies the date associated with the version of the language resource being described (as a recommendation, of the latest update of the particular version)
version date
Introduces the number used for the volume of a journal or a series of publications
volume
Links to a URL that acts as the primary page (like a table of contents) introducing information about an organization (e.g., products, contact information, etc.) or project
website
home page
Provides information on the weight of the participant
weight
Indicates whether the grammar contains numerical weights (incl. probabilities)
weighted grammar
The zip code of the postal address of a person or organization
zip code
The end date of an activity
end date
The start date of an activity
start date
Location
Period of time
The name of the scheme used to identify a funder
Funder identifier scheme
A string used to uniquely identify an organization
Organization identifier
Organisation identifier
The name of the scheme used to identify an organization
Organization identifier scheme
A string (e.g., PID, DOI, internal to an organization, etc.) used to uniquely identify a person
Person identifier
The name of the scheme used to identify a person
Person identifier scheme
The name of the scheme used to identify a resource
Resource identifier scheme
The use of a resource only for academic purposes (academic research, educational purposes)
Academic use
A formal statement indicative of licensing terms for the use of a resource (e.g., open access, free to read, etc.); its semantics should be clear, preferably formally expressed and documented at a URL
Access rights statement
A string used to uniquely identify an access rights statement according to a specific scheme
Access rights statement identifier
The name of the scheme according to which an access rights statement is assigned to a distribution
Access rights statement scheme
A person, organization or group that participates in an event or process
Actor
Changed to subclass of foaf:agent because according to its definition (An agent (eg. person, group, software or physical artifact)), it might include entities that we do not consider at this time actors
Any activity where a language resource has been used (e.g., in a project, in an experiment, for a publication, etc.)
Actual use
The physical address of an organization, group or person expressed in the form of distinct elements (street address, zip code, city, etc.)
Address
Information on the organization that a person works for (e.g., a company) or is affiliated with (e.g., university, etc.) and the data that are specific to this relation (e.g., position, professional contact information, etc.)
Affiliation
The age group to which the participant belongs
Age group
A type of element that can be annotated and may differ depending on the annotation level
Annotated element
The part of a corpus that contains annotations (interpretative linguistic information grounded in a knowledge resource that is added manually or automatically to a text or corpus respectively)
Annotation
Indication of whether the language resource has been anonymized
Anonymized
Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.
Application
Relation of attribution between language resource and an agent
Attribution
Indication of the intended audience size of a multimedia resource
Audience
The format of the audio part of a multimedia resource
Detailed audio format
A classification of audio parts based on extra-linguistic and internal linguistic criteria and reflected on the audio style, form or content
Audio genre
A classification scheme devised for audio genres by an organization/authority
Audio genre classification scheme
A string used to uniquely identify an audio genre according to a specific classification scheme
Audio genre identifier
The audio quality measures included in audio format
Audio quality measure included
Authorize the use of a resource by an individual
Authorize
Availability status of the resource; conditionsOfUse can be further used to indicate the specific terms of availability
Availability
Is there a point of keeping it? makes sense only for "under negotiation" and 'availableThroughOtherDistributors' [change the value]; maybe also "availableThroughOwnerOnly"???
Type of item that is represented in the n-gram resource
Base item
The body parts visible in the video or image part of the resource
Body part
The byte order of samples consisting of 2 or more bytes
Byte order
The type of transducers through which the data is captured
Capturing device type
The type of the capturing environment in terms of complexity
Capturing environment
The name of the character encoding used in the resource or accepted by the tool/service
Character encoding
A type of classifying (arranging in groups/categories) language resources
Classification
Scheme used for the classification of a language resource based on various criteria (e.g., domains, text types, genres, etc.)
Classification scheme
The colour space for the video and image part(s) of a resource
Colour space
Combine two or more resources of different resource types (e.g. a model and a tool/service) in order to deploy them together
Combine
A classification into types of the lexica that can be used with the grammar being described
Compatible lexicon type
A piece of software typically intended for a specific technical purpose, such as a particular implementation of a part-of-speech tagger (e.g., TreeTagger), a tree parsing program (e.g., mstparser), etc. It is wrapped in a standard way within a particular component-oriented framework such as UIMA, GATE, etc. or as a specific type of web service.
Component
The name of the compression applied
Compression name
A condition imposed via a specified licence on the use of a language resource (e.g., non-commercial use, no derivatives, etc.)
Condition of use
Classification of the conversation included in a multimedia resource based on the number of participants
Conversational type
A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to the data (e.g., size, type of language, type of text producers or expected audience, etc.) to represent as comprehensively as possible the object of study
Corpus
The part of a corpus (or whole corpus) that consists of audio segments
Corpus audio part
The part of a corpus (or whole corpus) that consists of images (e.g., a corpus of photographs and their captions)
Corpus image part
A part/segment of a corpus based on its media type classification
Corpus part
A classification of corpora into types (used for descriptive reasons)
Corpus subclass
The part of a corpus (or whole corpus) that consists of sets of textual representations of measurements and observations linked to sensorimotor recordings
Corpus text numerical part
The part of a corpus (or a whole corpus) that consists of textual segments (e.g., a corpus of publications, or transcriptions of an oral corpus, or subtitles, etc.)
Corpus text part
The part of a corpus (or whole corpus) that consists of video files (e.g., a corpus of film documentaries)
Corpus video part
The cost for accessing a resource, formally described as a pair of amount and currency unit
Cost
The name of a country, preferably according to ISO 3166
Country
add here the dataset from EU NAL
Any system of money used in a particular country
Currency
A resource composed of linguistic material used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications; examples include data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form
Data language resource
Any form with which a dataset is distributed, such as a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed
Dataset distribution
The form (medium/channel) used for distributing a language resource consisting of data (e.g., a corpus, a lexicon, etc.)
Dataset distribution form
The action of uploading a resource to a hosting entity (repository), e.g. for preservation purposes
Deposit
The (type of) entity that is being described by a metadata record
type of described entity
A framework or toolkit (Machine Learning model, NLP toolkit) used in the development of a resource
Development framework
Any form with which a language resource is distributed; for software, this can refer to web services, executable or code files, etc.; for datasets, it can be a downloadable form in a specific format (e.g., spreadsheet, plain text, etc.) or an API with which it can be accessed
Distribution
A feature that can be used for describing distinct distributable forms of audio resources/parts
Audio feature
The form used for distributing (delivering or providing access to) the resource
Distribution form
A feature that can be used for describing distinct distributable forms of image resources/parts
Image feature
A feature that can be used for describing distinct distributable forms of text resources/parts
Text feature
A feature that can be used for describing distinct distributable forms of numerical text resources/parts
Numerical text feature
A feature that can be used for describing distinct distributable forms of video resources/parts
Video feature
Section of a larger organization (e.g., school, faculty, department of a university, department or branch of a company, etc.)
Division
A controlled vocabulary used for classifying divisions
Division category
A piece of written, printed, or electronic matter that is primarily intended for reading
Document
A string (e.g., PID, DOI, internal to an organization, etc.) used to uniquely identify a document (mainly intended for published documents)
Document identifier
A scheme according to which an identifier is assigned by the authority that issues it (e.g., DOI, PubMed Central, etc.) specifically for publications
Document identifier scheme
A classification vocabulary for documents (e.g., article, book, etc.)
Document type
The type of the documentation vis-a-vis its relation with the language resource (e.g., general publication, user manual, etc.)
Documentation type
A particular field of thought, activity, or interest related to a language resource, organization or person activities, etc.
Domain
health, law, banking, ...
Adapted from https://www.collinsdictionary.com/dictionary/english/domain
A classification scheme devised by an authority for domains
Domain classification scheme
A string used to uniquely identify a domain according to a specific classification scheme
Domain identifier
Download of a content file (with data or executable file) on one's local system
Download
The time that an audio or video recording, etc. lasts
Duration of audio
A unit for measuring duration of audio / video parts
Duration unit
Element represented in the video part of a resource
Dynamic element
The linguistic level of analysis a lexical/conceptual resource or language description caters for as represented in its contents
Encoding level
The value of a list used inside parameters
Enumeration value
Each of the procedures and outcomes followed for evaluating the performance and other features of a tool/service
Evaluation
The use of a resource for evaluation purposes
Evaluation
The criterion of the evaluation of a tool
Evaluation criterion
The evaluation level
Evaluation level
The evaluation measure (human or automatic)
Evaluation measure
Indicates the type of evaluation performed on a tool/service
Evaluation type
The non-textual information contained in a lexical/conceptual resource (e.g., images, videos, audios, etc.); it can be used as an alternative to fully describing them as a proper part of the resource when they are not as important as the textual part (e.g., cases of a dictionary with images attached to some lemmas)
Extratextual information
The unit on which the extratextual information in a lexical/conceptual resource is attached
Extratextual information unit
The implementation framework used for developing and running a tool/service
Wrapping framework
A classification of the funding of a project with regard to the source of the funds
Funding type
A way of classifying a language resource with regard to the style, form and content
Genre
A set of rules governing what strings are valid or allowable in a language or text [https://en.oxforddictionaries.com/definition/grammar]
Grammar
An indication of the grammatical phenomena covered by the grammar
Grammatical phenomena coverage
A set of persons related to some aspect of a language resource, that do not have a legal status (e.g., a group of software developers working on the same software)
Group
A string used to uniquely identify a group
Group identifier
Whether the group of participants contains persons with hearing impairments
Hearing impairment of participants
The format used for the image part of a resource
Detailed image format
A category of images characterized by a particular style, form, or content according to a specific classification scheme
Image genre
A classification scheme devised by an authority for image genres
Image genre classification scheme
A string used to uniquely identify an image genre according to a specific classification scheme
Image genre identifier
The level of conversational interaction between speakers (for audio component) or participants (for video component)
Interactivity
A classification of lexical/conceptual resources into types (used for descriptive reasons)
LCR subclass
Language description subclass
Identifies the task performed by the language description
Language description task
add links to omtd-share operations
A string (e.g., PID, DOI, internal to an organization, etc.) used to uniquely identify a language resource
Language Resource identifier
The name of the scheme according to which the LRT identifier is assigned by the authority that issues it (e.g., DOI, ISLRN, etc.)
LR identifier scheme
Language Resource identifier scheme
A linguistic system (language, family of languages, dialect, idiolect)
language
A resource that describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures
Language description
The part (or whole set) of a language description that consists of images (e.g., a grammar part with photos or images for sign languages)
Language description image part
A part/segment of a language description based on its media type classification
Language description part
The textual part (or whole set) of a language description
Language description text part
The part (or whole set) of a language description that consists of videos (e.g., a grammar part with videos for sign languages)
Language description video part
The use of a resource for research in the Language Technology (Language engineering) area
Language engineering research
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P (w1, ..., wm) to the whole sequence. [from Wikipedia (https://en.wikipedia.org/wiki/Language_model)]
Language model
Statistical language model
A resource composed of linguistic material used in the construction, improvement and/or evaluation of language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications; the term is used with a broader meaning, encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management.
Language Resource
LR
A variant for languages, as defined in the variant subtag of the IETF BCP47 guidelines
Variant
A particular linguistic system used in a specific region or by a social group
Language variety
A classification for language varieties
Language variety type
A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g., grammatical, semantic, statistical information, etc.)
Lexical/Conceptual resource
The part (or whole set) of a lexical/conceptual resource that consists of audio elements (e.g., a set of audio files for a lexicon)
Lexical/Conceptual resource audio part
A part (or whole set) of a lexical/conceptual resource that consists of images (e.g., a set of images used for a lexicon)
Lexical/Conceptual resource image part
A part/segment of a lexical/conceptual resource based on its media type classification
Lexical/Conceptual resource part
A part (or whole set) of a lexical/conceptual resource that consists of textual elements
Lexical/Conceptual resource text part
The part (or whole set) of a lexical/conceptual resource that consists of videos (e.g., a set of video files for the lexicon of a sign language)
Lexical/Conceptual resource video part
A classification scheme for licences
Licence category
The name of the scheme according to which a category is assigned to a licence
Licence category scheme
A string used to uniquely identify a licence
Licence identifier
The name of the scheme according to which an identifier is assigned to a licence by the authority that issues it
Licence identifier scheme
A legal document (licence or terms of use/service) with which the language resource is distributed
Licence
Until now, this included an enumerated list of individuals; I think it's better to treat them as proper entities with name, identifier, etc.
This means that for the editors we need to
- add a set of well known licences using the SPDX codes (https://spdx.org/licenses/) and any set already in RDF form (e.g., https://github.com/creativecommons/cc.licenserdf/tree/master/cc/licenserdf/licenses and Victor's list) with our own extra elements
- allow them to add a new licence with the specific elements that we want
- add the clarin licence categories & conditions of use with the ODRL representation
Note: I have created a file with the CC-rdf licences and its ontology so that I can edit them and add the supplementary features => under MS ontologies, file testrdflicencesCC.txt
Classification of a resource (part) based on the number of languages it includes
Linguality type
The links between different media parts of a multimedia resource
Link to other media
The framework (e.g. keras, tensorflow, etc.) that has been used for developing a Machine Learning model and the tool/service that uses it
ML framework
Machine Learning framework
The model artifact that is created through a training process involving an ML algorithm (that is, the learning algorithm) and the training data to learn from.
Machine Learning model
ML model
A part of a language resource based on the media type it belongs to (e.g., text, audio, video parts of a corpus or lexicon)
Media part
A classification of language resource parts based on the physical medium they are available in (e.g., text, video, audio)
Media type
An institution with members that can benefit from specific conditions on the use of a resource (e.g. discount, unlimited access, etc.)
Membership institution
A set of formalized structured information used to describe the contents, structure, function, etc., of an entity, usually according to a specific set of rules (metadata schema)
Metadata record
A string (e.g., PID, DOI, internal to an organization, etc.) used to uniquely identify a metadata record
Metadata record identifier
The name of the scheme according to which the metadata record identifier is assigned by the authority that issues it (e.g., DOI, URL, etc.)
Metadata record identifier scheme
A formal set of metadata elements organized in a specific way
Metadata schema
The metric used for the evaluation of the tool/service
Metric
A classification of modalities represented in the resource or processed by a tool/service
Modality type
Model
A classification of models into types based on their algorithm
Model type
A classification of a bi/multilingual resource (part) into parallel, comparable or mixed
Multilinguality type
A language model consisting of n-grams, i.e., specific sequences of a number of words
N-gram model
NLP framework
The level of naturality for multimodal/multimedia resources
Naturality
A set of algorithms, modeled loosely after the human brain, designed to recognize patterns [adapted from https://skymind.ai/wiki/neural-network]
Neural network
The level of background noise
Noise level
A non-speech element that may be included in the audio corpus or lexical/conceptual resource
Non-speech item
The output of a report
Object of report
The operating system with which the software is compatible
operating system
A company or other group of people that works together for a particular purpose [https://dictionary.cambridge.org/dictionary/english/organization]
Organization
Organisation
A classification vocabulary for organizations as legal entities
Organization legal status
A vocabulary for classifying organizations according to their role in relation to Language Technology
Organization role
The relation of the language of the group of participants with respect to the acquisition stage
Origin
A special kind of variable in a computer programming language that is used to pass information between functions or procedures [from https://www.techopedia.com/definition/3725/parameter-param]
Parameter
A specific (not yet standardised) typing system for parameters (e.g., whether it's boolean, string, integer, a document, mapping, etc.)
Parameter type
A person that has participated in the recording of the audio, video, sensorimotor (textNumerical) part of the resource
Participant
The performance indicator used in the evaluation of a software given in the form of the metric used and reporting on the measure and unit of measure for this
Performance indicator
A human being
Person
Subclass of foaf to exclude imaginary persons
Specification of whether the language resource contains personal data (mainly in the sense falling under the GDPR)
Personal data included
Policy types that a tool/service has with respect to the annotation types included in the annotated resource that it takes as input (i.e. whether it keeps, modifies or drops them in the output resource)
Previous annotation types policy
To process a data resource with a language processing service (e.g. to create an annotated version, or to extract elements from it)
Process
Classification of the mode of operation applied when processing a resource
Process mode
A set of requirements posed on the resource that is input for processing by a tool/service
Input content resource
The type of the resource that a tool/service takes as input or produces as output
Processing resource type
Unsure of the values for audio/video files
A set of operations undertaken as a whole by an individual or organization and related to some aspect of the lifecycle of the language resource (e.g., funding, deployment, etc.)
Project
A string (e.g., PID, internal to an organization, issued by the funding authority, etc.) used to uniquely identify a project
Project identifier
The name of the scheme according to which an identifier is assigned to a project by the authority that issues it
Project identifier scheme
The quality level of image part(s) of a resource
Quality
Type of graphics of the image part(s) of a resource
Raster or vector graphics
The nature of the recording platform hardware and the storage medium
Recording device type
A setting where the recording of a multimedia resource took place
Recording environment
Indication of the audio or video recording quality
Recording quality
Indicates the type for the lexica that must or can be used with the grammar
Related lexicon type
A relation holding between two language resources (e.g., a raw corpus and its annotated version, the annotation tool used for it, etc.)
Relation
Give an account of an event related to the resource (e.g. a research plan, the use of the resource in a project or publication, etc.)
Report
A string (e.g., PID, DOI, internal to an organization, etc.) used to uniquely identify a repository
Repository identifier
The name of the scheme used to identify a repository
Repository identifier scheme
Type of hardware required for running a tool and/or computational grammar
Required hardware
The use of a resource for research purposes
Research
The plan for conducting research using a specific resource; this may be required by a resource provider to allow access to the resource
Research plan
The image resolution used for the video/image part(s) of a resource
Resolution
The standard to which the resolution conforms
Resolution standard
The role of an agent with regard to an activity (e.g. contributor, funder, software developer, etc.)
Role
A string used to uniquely identify a role according to a specific scheme
Role identifier
The name of the scheme used to identify a role
Role identifier scheme
A combination of the sample text(s) or sample file(s) and optional tags that can be used for feeding a processing service for testing purposes
Sample
A classification of the task defined for the conversation or the interaction of participants in a multimedia resource
Scenario type
Information on the illumination of the scene
Scene illumination
The level at which a resource may be segmented (e.g., clause, word, unit, etc.)
Segmentation level
Specification of whether the language resource contains sensitive data (e.g., medical/health-related, etc.) and thus requires special handling
Sensitive data included
Either of the two major forms of individuals that occur in many species and that are distinguished respectively as female or male especially on the basis of their reproductive organs and structures [https://www.merriam-webster.com/dictionary/sex]
Sex
The constitution of a group as regards the sex of the members
Sex of participants
Binary representation of numbers
Sign convention
To sign in a system or application
Sign in
The obligation of the assignee to sign a licence before being allowed to use a resource
Sign licence
The encoding of the audio type of a multimedia resource
Signal encoding
The size of the resource with regard to the SizeUnit measurement in form of a number
Size
The unit of measurement used for determining and describing the size of a resource (part)
Size unit
The details for the social media or occupational account of a person or organization
Social media occupational account
A list of social media
Social media account occupational type
Any form with which software is distributed (e.g., web services, executable or code files, etc.)
Software distribution
The medium, delivery channel or form (e.g., source code, API, web service, etc.) through which a software object is distributed
Software distribution form
Information on the source channel
Source channel
Type of the source channel
Source channel type
Whether the group of participants contains persons with speaking impairments
Speaking impairment of participants
A category for the conventionalized discourse of the speech part of a language resource, based on extra-linguistic and internal linguistic criteria
Speech genre
A classification scheme used for speech genres devised by an authority or organization
Speech genre classification scheme
A string used to uniquely identify a speech genre according to a specific classification scheme
Speech genre identifier
A factor influencing speech
Speech influence
An element contained in an audio recording that is pronounced and annotated as such
Speech item
A widely used standard or best practice to which the lexical/conceptual resource or corpus conforms
Standard/Best practice
some of these are only formats rather than standards; how can we distinguish?
The static elements visible on images
Static element
A situation, event, thing, etc. that is discussed in the contents of a language resource
Subject
Topic
A classification scheme used for subjects/topics devised by an authority or organization
Subject classification scheme
A string used to uniquely identify a subject according to a specific classification scheme
Subject identifier
A subset of a language resource in terms of a specific parameter (e.g., per domain, time coverage, linguistic level of representation, etc.)
Subset
The type of support provided for the use of a resource
Support type
An indicator of the maturity level of particular technologies according to a measurement system defined by EC [adapted from https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/support/faq/2890]
TRL
Technology Readiness Level
A category of text characterized by a particular style, form, or content according to a specific classification scheme
Text genre
A classification scheme used for text genres devised by an authority or organization
Text genre classification scheme
A string used to uniquely identify a text genre according to a specific classification scheme
Text genre identifier
Provides information on the type of text that may be on the image
Text included in image
The presence or not of text in or in conjunction with the video and, if yes, the type
Text included in video
The format(s) of the textNumerical part of the resource
TextNumerical format
An abstract category designed to characterize the main structure of a particular text or one of its parts according to its dominant properties [from https://www.lhn.uni-hamburg.de/node/121.html]
Text type
A classification scheme used for text types devised by an authority or organization
Text type classification scheme
A string used to uniquely identify a text type according to a specific classification scheme
Text type identifier
A tool/service/any piece of software that performs language processing and/or any Language Technology related operation.
Language Technology tool/service
LT tool/service
Toolkit
Classification type for users of a language resource; usually required for assessing the licensing or pricing policies for the use of the resource
User type
Each of the procedures and outcomes followed for validating a language resource, i.e. confirming that its contents/structure/operation is as claimed after careful examination
Validation
The resource coverage in terms of validated data
Validation extent
Classification of validation that indicates whether the validation was made on the contents or on formal aspects of the resource
Validation type
A type of versioning vocabulary used mainly for works (e.g., scholarly articles, books, etc.) that go through a formal review process before final publication, marked with major changes across the versions
Version type
When the COAR vocabulary is published (cf. https://www.coar-repositories.org/activities/repository-interoperability/coar-vocabularies/version-type-vocabulary/) check again
Detailed video format
The format of a video (part of a) resource
A classification of video parts based on extra-linguistic and internal linguistic criteria and reflected on the video style, form or content
Video genre
A classification scheme devised by an authority for video genres
Video genre classification scheme
A string used to uniquely identify a video genre according to a specific classification scheme
Video genre identifier
The dimensional form applied on the video or image part(s) of a corpus
Visual modelling
The vocal tract condition that may influence the speech of the participant
Vocal tract condition
A classification of web services based on the protocol used for accessing them
Web service type
Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing where words or phrases from the vocabulary are mapped to vectors of real numbers. [wikipedia]
Word embedding
To be formally allowed to access and make use of a resource
Authorize
Archival Resource Key
ARK
Identifier for ArXiv (http://arxiv.org/), an open access repository of preprints, having the format arXiv:1207.2147.
ArXiv
The Astrophysics Data System bibliographic code, a standardized 19-character identifier following the syntax "yyyyjjjjjvvvvmppppa".
bibcode
Digital Object Identifier scheme (https://www.doi.org/)
DOI
the Handle system
Handle
International Standard Book Number
ISBN
OpenID is an open standard that describes how users can be authenticated in a decentralized manner, eliminating the need for centralized registration services.
OpenID
Open Researcher and Contributor Identifier.
ORCID
Persistent Uniform Resource Locator
PURL
ResearcherID is an identifying system for scientific authors created and owned by Thomson Reuters.
ResearcherID
Universal Product Code
UPC
Uniform Resource Locator
URL
Uniform Resource Name
URN
ADPCM
Adaptive differential pulse-code modulation
ANC domain classification system
American National Corpus domain classification system
ANC genre classification scheme
American National Corpus genre classification scheme
ART
Average Response Time
NLP framework based on deep learning; it includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment); built on PyTorch
AllenNLP
AlvisNLP
A speech processing toolkit based on PyTorch
Asteroid
A type of content included in a language resource pertaining to the audio level
Audio type
http://www.mindmakers.org/projects/bml-1-0/wiki#Introduction
BML
Behavior Markup Language
Behaviour Markup Language
BNC domain classification system
British National Corpus domain classification system
BNC text type classification scheme
Big5
Big5-HKSCS
Big5_Solaris
CD-ROM refers to a kind of distribution medium used for delivering a language resource; intended for resources delivered on CD-ROM
CD-ROM
CDMA
https://www.cs.vassar.edu/CES/
CES
Corpus Encoding Standard
CLARIN-SHARE
CMYK
COAR access rights vocabulary (http://vocabularies.coar-repositories.org/documentation/access_rights/)
COAR Rights Statement Scheme
CORE
Caffe
CLARIN_EL text type classification scheme
CLARIN-EL domain classification
CLARIN licence category scheme
A more detailed account of the linguistic information contained in the lexical/conceptual resource
Content type
Content types vocabulary
Content types vocabulary
Cp037
Cp1006
Cp1025
Cp1026
Cp1046
Cp1047
Cp1097
Cp1098
Cp1112
Cp1122
Cp1123
Cp1124
Cp1140
Cp1141
Cp1142
Cp1143
Cp1144
Cp1145
Cp1146
Cp1147
Cp1148
Cp1149
Cp1381
Cp1383
Cp273
Cp277
Cp278
Cp280
Cp284
Cp285
Cp297
Cp33722
Cp420
Cp424
Cp437
Cp500
Cp737
Cp775
Cp838
Cp850
Cp852
Cp855
Cp856
Cp857
Cp858
Cp860
Cp861
Cp862
Cp863
Cp864
Cp865
Cp866
Cp868
Cp869
Cp870
Cp871
Cp874
Cp875
Cp918
Cp921
Cp922
Cp930
Cp933
Cp935
Cp937
Cp939
Cp942
Cp942C
Cp943
Cp943C
Cp948
Cp949
Cp949C
Cp950
Cp964
Cp970
DAT
DDC384
DDC388
DDC630
DDC659
https://www.oclc.org/en/dewey/resources/summaries.html
DDC
Dewey Decimal Classification
http://metadata.dk/dk5-as-linked-data
DK-5
Danish Decimal Classification 5th edition
DVB-C
DVB-S
DVB-T
Value for data distribution medium, used for language resources delivered on a DVD-R
DVD-R
DataCite
Deep Learning model
DL model
http://www.ilc.cnr.it/EAGLES/browse.html
EAGLES
ELE access rights vocabulary
ELE access rights statements scheme
ELG-SHARE
ELG domain classification
European Language Resources Association (www.elra.eu)
ELRA
EMA
https://www.w3.org/TR/2009/WD-emotionml-20091029/
EmotionML
EML
Emotion Markup Language
https://www.w3.org/TR/emma/
EMMA
Extensible MultiModal Annotation markup language
A speech processing toolkit, mainly focusing on end-to-end speech recognition and end-to-end text-to-speech using Chainer and PyTorch as a main deep learning engine
EUC-JP
EUC-KR
https://eur-lex.europa.eu/browse/eurovoc.html
EUROVOC
5G
Flair
4G
GATE
GB18030
GBK
https://xtm.cloud/manuals/gmx-v/GMX-V-2.0.html
GMX
Global information management Metrics eXchange
GSM
Galaxy
http://www.xces.org/ns/GrAF/1.0/
GrAF
Graph Annotation Framework
H2O
HD.1080
HD.720
https://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/hamnosys-97.html
HamNoSys
Hamburg Sign Language Notation System
ISCII91
ISDN
ISO-2022-JP
ISO-2022-KR
ISO-8859-1
ISO-8859-15
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
ISO-8859-13
http://www.datcatinfo.net/#/
http://www.iso.org/sites/dcr-redirect/dcr.html
ISO12620
DCR
Data Category Repository
http://xml.coverpages.org/ISO16642-200207.pdf
ISO16642
TMF
Terminological Markup Framework
Quality systems -- Model for quality assurance in design/development, production, installation and servicing
ISO 9001:1987
ISO2022_CN_CNS
ISO2022_CN_GB
https://www.iso.org/standard/43427.html
ISO 26162
Systems to manage terminology, knowledge and content
https://www.iso.org/standard/45797.html
ISO 30042
TBX
TermBase eXchange
https://www.iso.org/standard/38109.html
ISO704
Terminology work
https://www.w3.org/TR/InkML/
Ink ML
Ink Markup Language
JAX
JISAutoDetect
A python framework for running pipelines
Joblib
KOI8-R
Keras
https://www.iso.org/standard/37326.html
LAF
ISO 24612
Linguistic Annotation Framework
Linguistic Data Consortium
LDC
http://www.lexicalmarkupframework.org/
LMF
Lexical Markup Framework
LT integrator
Language Technology integrator
LT supplier
Language Technology supplier
LT user
Language Technology user
A type of content included in a language resource pertaining to the lemma
Lemma type
Lemon-Ontolex
MAD
Mean Annotations per Document
https://www.iso.org/standard/51934.html
MAF
ISO 24611
Morpho-syntactic Annotation Framework
MARC
MDTV
Mean Time per Document Volume
META-SHARE
https://www.iso.org/standard/37330.html
MLIF
ISO 24616
Multilingual Information Framework
https://www.w3.org/TR/mmi-framework/
MMI
Multi-Modal Interaction framework
MS874
META-SHARE speech genre classification scheme
META-SHARE audio genre classification scheme
MTBF
Mean Time Between Failures
MTSA
Mean Time Seek Annotations
MTTR
Mean Time To Repair
https://tei-c.org/activities/projects/multilingual-text-tools-and-corpora-multext/
MULTEXT
https://cst.dk/mumin/
MUMIN
MUltiModal INterfaces
MXNet
MacArabic
MacCentralEurope
MacCroatian
MacCyrillic
MacDingbat
MacGreek
MacHebrew
MacIceland
MacRoman
MacRomania
MacSymbol
MacThai
MacTurkish
MacUkraine
http://www.openclinical.org/medTermMesh.html
MeSH
Medical Subject Headings
Microsoft Cognitive Toolkit
A type of content included in a language resource pertaining to the morphological level
Morphological type
Classification scheme of the National Library of Medicine (https://www.nlm.nih.gov/class/)
NLM classification scheme
https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=oaxal
OAXAL
OASIS Open Architecture for XML Authoring and Localization Reference Model
OMTD-SHARE rights statement scheme
OpenAIRE
PAROLE_genre classification scheme
PAROLE_topic classification scheme
PyTorch
Framework for multimedia processing
Pyannote
REST
RGB
Rights_Statements_Org scheme
Programming language designed for performance and safety, especially safe concurrency
Rust
SNR
SOAP
https://spdx.org/licenses/
Value for LicenceIdentifierScheme referring to the codes (identifiers) for licences used by SPDX (https://spdx.org/licenses/)
SPDX
https://www.gala-global.org/srx-20-april-7-2008
SRX
Segmentation Rules eXchange
Scikit-Learn
Semantic Annotation Framework
https://www.iso.org/standard/76443.html
SemAF-DA
Semantic Annotation Framework - Dialogue Acts
https://www.iso.org/obp/ui/#iso:std:iso:ts:24617:-5:ed-1:v1:en
SemAF-DS
Semantic Annotation Framework - Discourse Structure
http://semanticweb.kaist.ac.kr/research/tc37sc4/new_doc/iso_tc37_sc4_N796_wg2_proposed_WD_24617-3_SemAF-NE.pdf
SemAF-NE
Semantic Annotation Framework - Named Entities
https://www.iso.org/standard/56866.html
SemAF-SR
Semantic Annotation Framework - Semantic Roles
A type of content included in a language resource pertaining to the semantic level
Semantic type
Shift_JIS
A speech processing toolkit
Speechbrain
https://www.iso.org/standard/62508.html
SynAF
Syntactic Annotation Framework
A type of content included in a language resource pertaining to the syntactic level
Syntactic type
A type of content included in a language resource pertaining to the syntactico-semantic level
Syntactico-semantic type
T-H pair
https://www.tbxinfo.net/tbx-about/
TBX
TermBase eXchange
https://tei-c.org/
TEI
Text Encoding Initiative
https://www.tei-c.org/Vault/GL/P3/index.htm
TEI P3
Text Encoding Initiative P3
https://tei-c.org/Vault/P4/doc/html/
TEI P4
Text Encoding Initiative P4
https://tei-c.org/guidelines/p5/
TEI P5
Text Encoding Initiative P5
TensorFlow Lite
TF lite
TIS-620
http://www.opentag.com/tmx.htm
TMX
Translation Memory eXchange
TST-Centrale
TensorFlow
A speech processing toolkit based on TensorFlow
TensorFlowTTS
3D
3G
http://www.timeml.org/
TimeML
Time Markup Language
A deep learning framework/library for vision; a deep-learning library created by Ross Wightman; it is a collection of SOTA computer vision models, layers, utilities, optimizers, schedulers, data-loaders, augmentations and also training/validating scripts with ability to reproduce ImageNet training results
timm
A type of content included in a language resource relevant to the translation procedure
Translation type
2D
http://www.udcc.org/
UDC
Universal Decimal Classification
UIMA
US-ASCII
UTF-16
UTF-16BE
UTF-16LE
UTF-8
A type of content included in a language resource encoding usage information
Usage type
VGA
VOIP
http://www.xces.org/
XCES
Corpus Encoding Standard for XML
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html
XLIFF
XML Localization Interchange File Format
YUV
A-law
AAC
Advanced Audio Coding
abbreviation
Value for 'user type' reserved for individuals affiliated with academic or, in general, educational or research institutions and the respective organisations
academic
academic
ACA
academic institution
academic text
Value for 'ConditionOfUse' that states that the language resource can be used for academic purposes only (research, teaching, etc.)
academic use only
academic user
accentuation
Value for 'Version type' used for versions as accepted for publication (following a reviewing process that may have introduced changes from the submitted version)
accepted
Value for distribution medium, used for corpora and lexical/conceptual resources that can be accessed via a human-intended interface (e.g., corpus workbench, lexicon browsing interface, etc.)
accessible through interface
Value for dataset distribution medium, used for lexical/conceptual resources that can be accessed via a SPARQL endpoint
accessible through query
adapter transformer
administrative text
adult
advertising
Advertising & Public Relations
The science, art, or practice of cultivating the soil, producing crops, and raising livestock and in varying degrees the preparation and marketing of the resulting products [https://www.merriam-webster.com/dictionary/agriculture]
Agriculture
air traffic control
airflow
alcohol
Value for licence category; for licences that allow access to resources only after signing the agreement
allows access with signature
Value for licence category; for licences that allow access to resources without any interactions with the user
allows direct access
Value for licence category; for licences that permit processing of data resources that produce research outputs (e.g., training of models, annotations, etc.)
allows processing
analysis
anaphora
anaphora resolution
Android
anechoic chamber
animal vocalization
Animals (Zoology)
A corpus that has been processed and thus includes the raw data and the automatic or manual annotations that have been added to them
annotated corpus
A set of elements and values designed to annotate data. It usually consists in a formal representation. It aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g., part-of-speech tags) or it can be more complex with interrelated elements (e.g., specific morphological features to be used for each part-of-speech).
annotation scheme
A corpus of annotations without the raw data
annotations corpus
antonym
Value for 'Documentation type' used for the specification description of an API
API specification
application word
Applied Physics
Architecture
arm
article
An article from a journal or magazine
article
Arts & Recreation
aspect
assisted
association
Astronomy
altrac
attached
Value for 'ConditionOfUse' that states that the use of a language resource must acknowledge it with an attribution note usually according to terms specified in the licensing document
attribution
Value for 'MediaType' for the form of a language resource (part) composed of or relating to sound(s)
audio
Value for 'Process mode' and its subclasses used for fully automatic processes where the results of the process have undergone no manual edits
automatic
automatic
auxiliary
For resources available through the infrastructure
available
For resources that are available through other distribution channels
available through other distributors
avi
background noise
background noise
banking
bayesian network
bidirectional recurrent neural network
bidirectional RNN
Big Endian
bigram
Value for 'Linguality type' for resource (parts) which include material in two languages
bilingual
bilingualized
Biochemistry
Biology
black box
blog text
bluRay refers to a kind of distribution/access medium used for delivering a language resource; intended for resources delivered on bluRay disks
bluRay
Value for 'ModalityType' for body gestures represented in a language resource
body gesture
book
booklet
boolean
branch
broadcast news
business
byte
call-in
camcorder
camera
caption
caption
capture time
case
chart
chat text
Chemical Engineering
Chemistry
child
chunking
class
clause
clause structure
clipping rate
clitic form
inflectional form
close talk microphone
closed public place
cognates
collocation
Value for 'ModalityType' for language resources that include various modalities
combination of modalities
Commerce (Trade)
Value for 'user type' reserved for individuals affiliated with a commercial institution and the respective organisations
commercial
Value for non-speech item, used for commercials (advertisements)
commercial
commercial support
commercial user
The Assigner permits/prohibits the Assignees to communicate the Asset to the public
communicate
Communications
Value for 'Multilinguality type' for resource parts or resources that consist of similar text segments in two or more languages (e.g., a corpus of legislation texts in Greek and English)
comparable
compatible
complex
component
compound
A lexicon which is intended for computational purposes and thus contains words associated with information relevant for the specific purposes
computational lexicon
Computer Science, Information & General Works
concept
conference room
Construction of Buildings
consulting firm
Value for 'Validation type' indicating that the validation is performed on the contents of the resource
content
conversation
convolutional neural network
CNN
coordination
cordis
rcn
corpora
Value for 'Resource type' for corpora, i.e. sets of datafiles (e.g., texts, audio recordings, videos, images, photographs, etc.)
corpus
credit card number
cross-reference
cross talk
culture
data
data evaluator
daylight
debate
defence
definition
gloss
parsing of definitions
degree
Value for 'DocumentationType' used for demos of a resource available usually through the internet (e.g., video demos)
demo resource
dental prosthesis
department
derivation
diagnostic
diagram
Value for 'LanguageVarietyType' used for a particular form of a language which is peculiar to a specific region or social group
dialect
dialogue
A book or electronic resource that contains a list of words (usually in alphabetical order) and explains their meanings, or gives a word for them in another language and other information (e.g., spelling, pronunciation, etc.)
dictionary
diphone
diphone
Value for access rights statements; for resources that can be accessed without interactions with the user (e.g., for resources that can be directly downloaded by users without logging into a system or signing a contract)
directly accessible
disaster management
discourse marker
discussion
discussion forum
Value for software distribution form, used for software objects distributed as Docker images, i.e. packaged together with all their dependencies and ready to be executed
docker image
document
domain
Value for distribution medium, used for language resources that can be downloaded from a remote location
downloadable
Value for 'Version type' used for versions during the writing up process
draft
drop
DV
https://en.wikipedia.org/wiki/International_Article_Number
Value for resourceIdentifierSchemeName, referring to EAN (International Article Number, also known as European Article Number) and more specifically to the EAN 13 standard (cf. https://en.wikipedia.org/wiki/International_Article_Number)
EAN 13
European Article Number 13 standard
Earth Sciences & Geology
Economics
Education
https://en.wikipedia.org/wiki/International_Standard_Serial_Number
Value for resourceIdentifierSchemeName, referring to the e-ISSN (electronic International Standard Serial Number, https://en.wikipedia.org/wiki/International_Standard_Serial_Number)
e-ISSN
elderly
element
European Language Grid identifier scheme (used for internal purposes)
ELG
elicited
email text
embedded microphone
embeddings
emotional expressive
encyclopaedic text
energy
Energy
Engineering
entry
enumeration
Environment
Value for 'Funding type' for projects with EU funds
EU funds
euro
Value for 'ConditionOfUse' that states that the language resource can only be used for evaluation purposes
evaluation use
add odrl
event type
example
example
Value for software distribution form, used for software objects available as binary files already compiled and ready to be executed
executable code
expression
extrinsic
face
face to face conversation
face-to-face conversation
facebook
Value for 'ModalityType' for facial expressions represented in a language resource
facial expression
faculty
farfield microphone
feature
female
few
fiction
file
file
finance
5-gram
fix
FLAC
Free Lossless Audio Codec
flash
float
floating point
foot
foreign affairs
Value for 'Validation type' indicating that the validation is performed on formal aspects of the resource (e.g., size, syntactic validation, etc.)
formal
4-gram
4:2:2
frame
frame
A lexical database based on annotating examples of how words are used in actual texts in accordance with the notion of 'semantic frame' (schematic representation of a situation involving various participants, props and other conceptual roles); originally built for English and extended to other languages according to the same design principles
FrameNet
free speech
free support
frequency
frog story
full
funding authority
gated recurrent units
GRUs
gb
gigabyte
gender
General
Value for 'Documentation type' used for documents that describe a language resource or some aspect of it
general documentation
generation
Geography & Travel
GitHub
GitLab
glass box
Google Chrome OS
grammar
graph
Graphic Arts & Decorative Arts
graphic card
https://www.grid.ac/
GRID identifier scheme (Global Research Identifier Database, https://www.grid.ac/)
GRID
group
hand
Value for data distribution medium, used for resources that are delivered on a hard disk
hard disk
hard disk
Hardware & Household Appliances
head
health
Hidden Markov Models
HMM
high
high
high
History
Home & Family Management
hour
hour
human
human non speech
hyperbaric
hyperonym
hyponym
iOS
idiomatic expression
Value for 'Media type' for the form of a language resource (part) consisting of visual representations (e.g., pictures, photos)
image
image
image
impact
in book
in car
in collection
in proceedings
included
The Assigner permits/prohibits the Assignees to incorporate the Asset unmodified into a Collective Work.
incorporate
industrial
inflection
Value for 'ConditionOfUse' which states that users must inform the licensor as to the kind of use they intend to make of the resource
inform licensor
information
ingested record
Value for 'Documentation type' used for documents that contain instructions on how to install a software
installation manual
institute
instruction
instrumental music
insurance
integer
Value for 'Process mode' and its subclasses used for processes that include some interaction between the processing tool and the human agent performing the process
interactive
interactive
Value for identifier scheme; for identifiers created and used by systems for internal purposes
internal
internal record
internet
interview
interview
intrinsic
irregular form
http://www.islrn.org
Value for LRTIdentifierScheme referring to ISLRN (International Standard Language Resource Number, http://www.islrn.org/)
ISLRN
isolated digit
isolated word
http://www.issn.org
Value for resourceIdentifierSchemeName, referring to the ISSN (International Standard Serial Number, http://www.issn.org/)
ISSN
International Standard Serial Number
Value for 'DocumentationType' for issue-tracking tools used by software developers usually available through a dedicated web page
issue tracker
http://www.istc-international.org/
Value for resourceIdentifierSchemeName, referring to the ISTC (International Standard Text Code, http://www.istc-international.org/)
ISTC
International Standard Text Code
item
Value for 'LanguageVarietyType' used for a particular type of language mainly used in a specific context (e.g., by a professional or social group) which may not be well understood outside that context
jargon
journalistic text
k-nearest neighbours
k-NN
kilobyte
kb
keep
keyword
laboratory
Language
Value for 'Resource type' for language descriptions, e.g., grammars, machine learning models, etc., i.e. language resources that formally describe a language
language description
Value for 'ConditionOfUse' which states that the resource can only be used for research purposes in the Language Engineering / Language Technology domain.
language engineering research use
language service provider
large company
large membrane microphone
large public
laryngograph
laryngograph
lavalier microphone
Law
law
lecture
lecture room
leg
lemma
lemma
letter
letter
Value for 'Processing resource type' for lexical and conceptual resources (e.g., lexica, ontologies, gazetteers, terminological lists, dictionaries, thesauri, etc.)
lexical/conceptual resource
lexical type
lexical unit
(a list of) all the words used in a particular language or subject, or a dictionary [https://dictionary.cambridge.org/dictionary/english/lexicon]
lexicon
Value for software distribution form, used for software objects in the form of libraries (e.g. Java, python libraries)
library
LCC
Library of Congress classification system
LCSH
Library of Congress Subject Headings
licensed with a fee
licensed without a fee for all uses
licensed without a fee for specific uses
A branch of science (such as biology, medicine, and sometimes anthropology or sociology) that deals with living organisms and life processes —usually used in plural
life sciences
linear PCM
Linguistics
linkedIn
Linux
Value for resourceIdentifierSchemeName, referring to the linking ISSN or ISSN-L
LISSN
ISSN-L
literary text
Literature, Rhetoric & Criticism
Little Endian
logistic regression
Long short-term memory
LSTM
low
low
low
http://www.webcitation.org/getfile?fileid=27fcd073ea70199946ace15c6868520be2cab2ab
Value for resourceIdentifierSchemeName, referring to the LSID (Life Sciences Identifier, http://www.webcitation.org/getfile?fileid=27fcd073ea70199946ace15c6868520be2cab2ab)
LSID
Life Sciences Identifier
MAC-OS
a dictionary usually meant for humans in a form that a computer can process
Machine Readable Dictionary
MRD
male
Management & Public Relations
Value for 'Process mode' and its subclasses used for processes that are performed manually
manual
manual
Manufacturing
map task
a resource consisting of mapping values and/or rules between two resources
mapping of resources
masters thesis
Mathematics
https://maven.apache.org/pom.html#Maven_Coordinates
Value for LRTIdentifierScheme referring to the Maven coordinates (cf. Maven POM, https://maven.apache.org/pom.html#Maven_Coordinates)
maven
mb
megabyte
media
medication
Medicine & Health
medium
medium
medium
meeting
meeting proceedings
member of association
meronym
http://www.meta-share.org
Value for LRTIdentifierScheme referring to the identifier assigned by META-SHARE (http://www.meta-share.org/)
META-SHARE
in MS, we had MetaShareId and not the scheme; in general, discuss the use of "local resource identifier scheme" from DataCite as a value
microphone
microphone
microphone array
minute
minute
mispronunciation
Value for 'Process mode' and its subclasses used for mixed processes, usually involving an automatic processing followed by human edits
mixed
mixed
mixed
mixed
model
modify
money amount
Value for 'Linguality type' for resource (parts) which include material in one language
monolingual
monologue
mood
a lexicon with morphological information associated with its entries
morphological lexicon
Value for 'EncodingLevel' for the particular linguistic level that relates to the study of word formation (such as inflection, derivation and compounding)
morphology
mouth
mov
mp3
mpeg
mpg
mu-law
multi-word unit
multi-word expression
multi-word unit
multi-word expression
Value for 'Linguality type' for resource (parts) which include material in three or more languages
multilingual
Value for 'Multilinguality type' for resource parts or resources that consist of segments including text in two or more languages (e.g., the transcription of a European Parliament session with MPs speaking in their native language)
multilingual single text
multilogue
multiple sources
music
Music
naive Bayes
narrative
Value for 'Funding type' for projects funded by national authorities
national funds
national security
native
natural
natural number
neologism
network
News, Media, Journalism & Publishing
n-gram model
no
no
Value for 'ConditionsOfUse' which states that the licence imposes no conditions
no conditions
Value for 'ConditionOfUse' which states that the users of the resource are not allowed to share derivatives of the resource
no derivatives
no
no
Value for 'ConditionOfUse' which states that users are not allowed to redistribute the resource
no redistribution
no
noise
noise
Value for 'ConditionOfUse' which states that the language resource cannot be used for commercial purposes, i.e. where there is some direct profit from using it
non-commercial use
non-deep learning model
shallow learning model
non fiction
non interactive
nonNative
none
none
none
none
none
none
note
note
recognition of noun phrases
NP structure
Structure of noun phrases
number
http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
Value for LRTIdentifierScheme referring to the OAI identifier (Open Archives Initiative, http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm)
OAI
OCR system
office
official
Ogg Vorbis
https://services.openminted.eu/home
Value for LRTIdentifierScheme referring to the identifier assigned by the OpenMinTeD platform (https://services.openminted.eu/home)
OMTD
https://guidelines.openminted.eu/components_resourceIdentifier.html
Value for LRTIdentifierScheme referring to the use of identifiers recommended by OpenMinTeD for Docker images uploaded to the registry (https://guidelines.openminted.eu/components_resourceIdentifier.html)
OMTD-Docker
Value for 'Documentation type' for the online help site of a language resource
online help URL
online educational game
Value for 'ConditionsOfUse' which states that the language resource can only be used by members of the META-SHARE network
only META-SHARE members
should we also add something more generic for restricting a resource to specific group members (as for spatial constraint)? and I don't think the odrl is ok.
a set of concepts and categories in a subject area or domain that shows their properties and the relations between them [https://en.oxforddictionaries.com/definition/ontology]
ontology
The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited.
open access
open public place
OS-independent
Value used when none of the recommended values of an element is appropriate for an item
other
Value for 'ConditionOfUse' that states that the language resource can only be used with unspecified restrictions (not covered by the controlled vocabulary)
other specific restrictions
output text
overlapping
Value for 'Funding type' for self-funded projects
own funds
Painting
Paleontology
Value for data distribution medium, used for language resources that are delivered in paper format
paper copy
paragraph
Value for 'Multilinguality type' for resource parts or resources that consist of the equivalent text segments in two or more languages (e.g., an original and its translation equivalents)
parallel
parsing
part of speech
partial
pear story
https://link.springer.com/chapter/10.1007/978-94-010-0201-1_1
Penn treebank
person
pharmaceutics
PhD thesis
Philosophy
phoneme
phoneme
phoneme
phonetic unit
phonetically balanced sentence
phonetically rich sentence
phonetically rich word
Value for 'EncodingLevel' for the particular linguistic level that relates to the study of speech sounds
phonetics
Value for 'EncodingLevel' for the particular linguistic level that relates to the study of speech sounds that constitute the fundamental components of a language
phonology
Photography, Computer Art, Film, Video
phrase
phrase
Physics
plain
planned
Plants
Value for software distribution form, used for software objects in the form of plugins (e.g. Chrome browser extension)
plugin
https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/
Value for resourceIdentifierSchemeName, referring to the PMCID (PubMed Central Identifier, https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/)
PMCID
PubMed Central Identifier
https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/
Value for resourceIdentifierSchemeName, referring to the PMID (PubMed Identifier, https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/)
PMID
PubMed Identifier
Political Science
politics
PP attachment
Attachment of Prepositional phrases
Value for 'EncodingLevel' for the particular linguistic level of analysis that studies the relationship of sentences to the environment in which they occur
pragmatics
http://ufal.mff.cuni.cz/pdt2.0/
Prague treebank
PDT
Prague Dependency Treebank
predicate
presentation
private
proceedings
Value for access rights; for data resources whose licence permits their processing (resulting in research outputs)
processable
prompted
proper noun
prosodic boundary
Psychology
public
PUB
public administration
Public Administration
public organization
Value for 'Documentation type' used for published documents about a language resource (mainly meant for documents that describe the resource itself rather than its deployment)
Value for licence category; for licences that allow access to resources only for logged-in users who have fulfilled further requirements (e.g. paid the fees for a commercial licence)
publication
Value for 'Documentation type' for a published document describing the language resource and which is recommended by the resource creator(s) to be used for citation purposes when the resource is mentioned in another publication
publication for citation
Value for 'Version type' used for versions as published in digital or print form
published
published record
publishing
qualia structure
question
question answer
radio
random forest
random decision forest
raster
The form of the corpus that has not been enriched with annotations
raw corpus
re3data
read speech
real audio
recurrent neural network
RNN
Value for 'ConditionsOfUse' which states that users are encouraged to deposit any modified versions of the resource they make to the distribution channel through which they have acquired the resource
redeposit
reflexivity
Value for 'Funding type' for projects financed by regional authorities
regional funds
register
Religion
Value for 'ConditionOfUse' that states that the licensor requires the submission of a research plan in order to grant access to the language resource
request plan
Value for licence category; for licences for resources that can be accessed only by logged-in users
requires user authentication
research organization
Value for 'ConditionOfUse' stating that the resource can be used only for research purposes
research use
add ODRL
restricted
RES
Access is subject to restrictions or controls regarding users (e.g., limited to researchers or members of a specific group) or use (e.g., for research only, noncommercial use, etc.); subscription or registration may also be required and access may incur financial costs
restricted access
review
role play
roundtable
rule
Value for 'DocumentationType' used for samples of a language resource
samples
scene
Scholarly communication can be defined as “the system through which research and other scholarly writings are created, evaluated for quality, disseminated to the scholarly community, and preserved for future use. The system includes both formal means of communication, such as publication in peer-reviewed journals, and informal channels, such as electronic listservs.” (Association of College & Research Libraries, “Principles and Strategies for the Reform of Scholarly Communication 1,” 2003)
scholarly communication
school
Science
Value for personIdentifierSchemeName, referring to the Scopus identifier for authors (https://www.scopus.com)
Scopus ID
script
Sculpture, Ceramics and Metalwork
second
second
segment
semantic class
semantic feature
semantic trait
semantic relation
semantic role
semantic unit
semantic unit
Value for 'EncodingLevel' for the particular linguistic level of analysis that relates to the meaning of a word, phrase, etc.
semantics
semi-interactive
semi-planned
sentence
sentence
Special case of transformers tuned for sentences
sentence transformer
Value for 'Funding type' for projects financed under service contracts
service contract
Value for 'ConditionOfUse' that states that the derivatives of the language resource must be distributed with the same or a compatible licence
share alike
shorten
shot
shot
Value for 'ModalityType' for sign languages represented in a language resource
sign language
signal
signed integer
single source
sleep deprivation
SME
Social Problems & Social Services
The social sciences are academic disciplines concerned with the study of the social life of human groups and individuals including anthropology, economics, geography, history, political science, psychology, social studies, and sociology. The social sciences consist of the scientific study of the human aspects of the world. [https://en.wikipedia.org/wiki/Category:Social_sciences]
Social Sciences
society
Sociology & Anthropology
some
song
sound
Sound blaster card
sound recording
Value for software distribution form, used for software that is available from the same location both as source code and executable program
source and executable code
Value for software distribution form, used for software objects in the form of source code, i.e. as files with the commands ready to be compiled or assembled into an executable computer program
source code
Value for 'ConditionsOfUse' which is used for resources with spatial restrictions, e.g., in cases where the content is available or can be used only at a single location, center, or site
spatial constraint
speaker noise
special hardware equipment
speech
Value for 'ModalityType' for language resources including spoken language words and expressions
spoken language
spontaneous
Sports, Games & Entertainment
standardization body
statistical property
Statistics
string
studio
studio equipment
subcategorization frame
Value for 'Version type' used for versions as submitted for publication
submitted
subsidiary
subtitle
subtitle
subtitle
support vector machine
SVM
support vector network
syllable
syllable
syllable
synonym
synset
syntactic unit
syntactic unit
syntactico-semantic link
Value for 'EncodingLevel' for the particular linguistic level that relates to the study of the structure of linguistic units (phrases, sentences)
syntax
system evaluator
a flat list of valid values (tags) designed to annotate data; it usually corresponds to a specific annotation type or set of annotation types
tagset
VHS tape
terabyte
tb
technical report
technical text
technological
Technology
teenager
telecommunications
telephone
telephone conversation
fixed telephone
IP telephone
mobile telephone
tempo
tense
term
A lexical resource that lists concepts pertaining to a specific domain
terminological resource
term list
Value for 'MediaType' for the written or printed form of a language resource (part)
text
text
numerical text
A reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms) [https://en.wikipedia.org/wiki/Thesaurus]
thesaurus
parsing of titles
token
token
Value for 'DocumentationType' used for the training material of a resource (e.g., video tutorials, screencasts, guided tours, etc.)
training resource
Value for 'ConditionOfUse' which states that the resource can only be used for training purposes
training use
add ODRL representation
phonetic transcription
A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing (NLP) and in computer vision (CV)
transformer
https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
transformer
translation equivalent
translation unit
tu
transport
Transportation
travel industry
trigram
Basic principles observed
TRL1
Technology concept formulated
TRL2
Experimental proof of concept
TRL3
Technology validated in lab
TRL4
Technology validated in relevant environment (industrially relevant environment in the case of key enabling technologies)
TRL5
Technology demonstrated in relevant environment (industrially relevant environment in the case of key enabling technologies)
TRL6
System prototype demonstration in operational environment
TRL7
System complete and qualified
TRL8
Actual system proven in operational environment (competitive manufacturing in the case of key enabling technologies; or in space)
TRL9
troponym
truncation
turn
tv
tweet
twitter
A set of elements designed to annotate data; it typically contains only a list of annotation types, i.e. specific labels that are used for the annotation (e.g., part-of-speech, person, organization, etc.), and is usually inbuilt in the annotation software
typesystem
For resources whose availability is still pending.
under negotiation
unigram
unit
unit
UNIX
unknown
unknown
unknown
unknown
unknown
unpublished
unsigned integer
Value used for mandatory elements whose value is unknown or cannot be specified
unspecified
Value for 'Version type' used for versions that have been updated vis-a-vis the previous published version
updated
url
usage
Value for 'ConditionOfUse' which states that the resource can only be accessed by identified/authenticated users
user identified
user input text
Value for 'Documentation type' for documents that include instructions and examples on the usage of a tool/service (or resource in general)
user manual
utterance
utterance
variable
variant
VCV sequence
vector
very high
very high
very low
very low
Value for 'Media type' for the audiovisual form of a language resource (part)
video
video
visualization output
Value for 'ModalityType' for voice
voice
voice
webcam
Value for software distribution form, used for software objects that can be accessed through remote invocation typically using some REST-style APIs or SOAP protocols
web service
webcam
whole body
Windows
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
windows-31j
wizard of Oz
word
word
word
word
word game
word group
A lexical database originally created for English and extended to other languages, which groups words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members
WordNet
A written collection of all words derived from a particular source, or sharing some other characteristic [https://www.yourdictionary.com/wordlist]
wordlist
Value for software distribution form, used for software workflows that are made available in the form of the workflow description file
workflow file
Value for 'ModalityType' for texts with written language
written language
x-EUC-CN
x-EUC-JP-LINUX
x-EUC-TW
x-MS950-HKSCS
x-mswin-936
x-windows-949
x-windows-950
yes
yes
yes
yes/no question
yes
yes
youtube
Links a resource to the licence under which it is distributed; values are taken from a list of standard licences
For resources that are documented in the catalogue but the contents of which can only be accessed through other distribution channels.
Available Through Other Distributor
Links to the document used as guidelines for the creation or annotation of a corpus
guidelines
Links to a URL with an image file (e.g., a photo, or cartoon image) that is used for identifying a person (e.g., on a home page)
image
Links a resource to the licence under which it is distributed; values are taken from a list of standard licences
Introduces the telephone number of a person, group or organization; recommended format: +_international code_city code_number without spaces
telephone number
+123456789012
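The recommended telephone format above (a plus sign followed by the international code, city code and number, without spaces) can be checked with a short pattern. This is a minimal sketch, not part of the ontology: the names PHONE_PATTERN and is_recommended_phone_format are hypothetical, and the digits-only rule and the 15-digit upper bound are assumptions borrowed from ITU-T E.164.

```python
import re

# Assumption: "+" followed by digits only, no spaces or separators.
# The limit of 15 digits in total is borrowed from ITU-T E.164; the
# ontology itself only recommends "+_international code_city code_number".
PHONE_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_recommended_phone_format(value: str) -> bool:
    """Return True if value looks like the recommended telephone number format."""
    return PHONE_PATTERN.fullmatch(value) is not None
```

With this pattern, the example value +123456789012 is accepted, while the same digits without the leading plus sign or with embedded spaces are rejected.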
Copyright Not Evaluated
Copyright Undetermined
Embargoed access refers to a resource that is available as metadata only until it is released for open access on a certain date. Embargoes can be required by publishers' and funders' policies, or set by the author (e.g., in the case of theses and dissertations).
embargoed access
In Copyright
In Copyright - EU Orphan Work
In Copyright - Educational Use Permitted
In Copyright - Non-Commercial Use Permitted
In Copyright - Rights-holder(s) Unlocatable or Unidentifiable
Metadata only access refers to a resource in which access is limited to metadata only. The resource itself is described by the metadata, but it is neither directly available through the system or platform nor referenced to an open access copy in an external journal or trustworthy archive.
metadata only access
No Copyright - Other Known Legal Restrictions
No Copyright - Contractual Restrictions
No Copyright - Non-Commercial Use Only
No Copyright - United States
No Known Copyright
Open access refers to a resource that is immediately and permanently online, and free for all on the Web, without financial and technical barriers. The resource is either stored in the repository or referenced to an external journal or trustworthy archive.
open access
Restricted access refers to a resource that is available in a system but with some type of restriction for full open access. This type of access can occur in a number of different situations. Some examples are described below: the user must log in to the system in order to access the resource; the user must send an email to the author or system administrator to access the resource; access to the resource is restricted to a specific community (e.g., limited to a university community).
restricted access