OMTD-SHARE ontology for resources related to Language Technology (LT) and Text and Data Mining (TDM).
It was initiated in the framework of the OpenMinTeD project (https://www.openminted.eu) focusing on functions (tasks performed by software components), annotation types (types of information extracted or annotated by such software), methods (classification of the theoretical method used in the algorithm), and data formats of the resources that can be processed by such software, based on metadata elements used in the META-SHARE metadata schema (http://www.meta-share.org/knowledgebase/homePage) and the FOSTER taxonomy of TDM (https://www.fosteropenscience.eu/resources).
Version 2 has been enriched with work done in the European Language Grid and other LT-related projects and has added LT-related terms to the ontology.
OMTD-SHARE ontology
Richard Eckart de Castilho
Claire Nedellec
Dimitris Galanis
Katerina Gkirtzou
Marta Villegas
Penny Labropoulou
Petr Knoth
Sophie Aubin
2019-05-15
omtd
1.1.0
pre-release 2.0.0
Relates a data format to the IANA mimetype; it can be the exact or a broader mimetype; unofficial mimetypes are also used, but this relation will be revisited
has mimetype
Component A performs Operation B
performs operation
performs Task
The URL at which a concept is documented
documentation URL
The file extension usually associated with a specific data format (e.g. txt for plain text files, pdf for PDF files etc.)
has file extension
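The relation between a file extension and a mimetype can be illustrated with a minimal Python sketch. It assumes Python's standard mimetypes module as a stand-in lookup table; the helper name mimetype_for_extension is hypothetical and is not part of the ontology.

```python
import mimetypes


def mimetype_for_extension(extension):
    """Return the mimetype that Python's registry associates with a file extension."""
    # guess_type works on file names, so a dummy base name is prepended.
    mime, _encoding = mimetypes.guess_type("file." + extension)
    return mime


txt_type = mimetype_for_extension("txt")
pdf_type = mimetype_for_extension("pdf")
print(txt_type, pdf_type)
```

Unknown extensions yield None, which mirrors the case of data formats that have no registered mimetype.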
http://korpling.github.io/ANNIS/3.6/user-guide/import-and-config-convert.html
relANNIS
Relational database format used in the ANNIS architecture (https://corpus-tools.org/annis/)
ANNIS
A component that provides access to data resources, e.g. reads a resource or writes the output of a process in a certain format
Access Component
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-AclAnthology
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.aclanthology-asl
Data format specific to the ACL Anthology Reference Corpus (http://acl-arc.comp.nus.edu.sg/), most probably version 20080325
ACL Anthology Corpus format
The task/process of identifying in a text a segment inside parentheses that describes an age group or age-related utterance
Age-bracket detection
Any kind of annotation pertaining to entities of the agricultural domain; the use of the AGROVOC thesaurus is recommended
Agricultural entity
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#Aimed_Collection_Reader
Format of the Aimed corpus (225 abstracts from MEDLINE) with the gold standard sentence, protein, protein-protein interaction annotations.
AIMED corpus format
A component that detects and annotates equivalence relations between items (corpora, texts, paragraphs, sentences, phrases, words) in two languages
Aligner
Establishment of translational equivalences between structural units (words, sentences etc.) of a text in a given language and a text with similar meaning in other language(s)
Alignment
The translational equivalent between structural units (words, sentences etc.) of a text in a given language and a text with similar meaning in other language(s)
Alignment
ALLBUS variable
https://en.wikipedia.org/wiki/ALTO_(XML)
ALTO
http://www.lrec-conf.org/proceedings/lrec2006/pdf/742_pdf.pdf
Format for linguistic annotations of documents used for the ALVIS framework
ALVIS Enriched Document format
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657237/
Anafora
Extractor
A component that is used for analyzing an input text in order to extract specific features/information (e.g. word list), or to produce statements over the whole text (e.g. classify it by topic)
Analyzer
The task/process of marking all linguistic expressions that appear in a certain text or across texts as referring to the same entity and linking them to this entity
Anaphora annotation
The task/process of identifying all linguistic expressions that appear in a certain text or across texts and refer to the same entity
Anaphora resolution
Labeling [en-US]
Labelling
Tagging
The task/process of adding annotations (i.e. labels that act as notes, comments, explanations, etc.) to an item
Annotation
A note by way of explanation or comment added to a text or diagram [OED, https://en.oxforddictionaries.com/definition/annotation]. Text or corpus annotation refers to the interpretative linguistic information grounded in a knowledge resource that is added manually or automatically to a text or corpus respectively.
Any format used for annotated files
Annotation format
The task/process of marking compounds (single words composed of two or more free morphemes) and their parts
Annotation of compounds
Date Detection
Date Recognition
The task/process of marking dates in a text
Annotation of dates
The task/process of adding annotations relevant to the derivational level of analysis (e.g. recognizing derivational affixes, tagging their meaning etc.)
Annotation of derivational features
The task/process of annotating the internal structure of a document (e.g. book chapters, sections in a journal article, title, preface, images/figures etc.)
Annotation of document structure
Quantity detection
Quantity extraction
The task/process of annotating measurement expressions (e.g. of length, temperature, time, etc.)
Annotation of measurements
The task/process of annotating multi-word units, i.e. combinations of words that are considered as one
Annotation of multi-word units
Number detection
The task/process of annotating numbers in a text
Annotation of numbers
Annotation of question topical targets
The task/process of attaching tags to the segment of a question that functions as the topic of that question
Annotation of question topics
The task/process of marking locations in speech where there is a change of speakers
Annotation of speaker turns
The task/process of annotating textual entailments, most usually indicating whether a text entails or contradicts a hypothesis sentence or does neither.
Annotation of textual entailment
Label
Tag
Annotation type
A note by way of explanation or comment added to a text or diagram [OED, https://en.oxforddictionaries.com/definition/annotation]. Text or corpus annotation refers to the interpretative linguistic information grounded in a knowledge resource that is added manually or automatically to a text or corpus respectively
Category/class of the annotations (metadata) that are added to the data/text that is processed
Annotation type Taxonomy
Annotation type Taxonomy
Tagger
A component that annotates any data (text, video, audio etc.), i.e. adds any descriptive or analytic notations (structural, linguistic, etc) to raw data
Annotator
A component that annotates the tokens of a text with Semantic Role labels
Annotator of semantic role labels
Anonymisation [en-UK]
The task/process through which particular text segments or data units that allow the identification of a person are removed or replaced
Anonymization
Argumentation mining
The automatic extraction and identification of argumentative structures from natural language text
Argument mining
The task/process of marking argumentative structures, components and relations in a text
Annotation of argumentation
adapted from Wikipedia (https://en.wikipedia.org/wiki/Artificial_neural_network)
A computational model based on a large collection of simple neural units (artificial neurons), loosely analogous to the observed behavior of a biological brain's axons. These systems are self-learning and trained, rather than explicitly programmed.
Artificial Neural Network
ANN
A machine learning method for discovering relationships among variables in databases and expressing them in the form of rules.
Association Rule Learning
The response of the target recipients (audience) to a system, process or event
Audience reaction
Any format used for audio files
Audio format
Audio processing
The task/process of partitioning an audio stream into homogeneous segments and classifying them into speech and non-speech segments
Audio segmentation
The task/process of organizing documents into groups/classes on the basis of their author
Author classification
Structure-Based Authoring Assistant
Writing support
The task/process of providing spelling, grammatical or stylistic suggestions as an aid for the authoring task
Authoring support
Software for supporting the distributed creation of consistent, high-quality information on an industrial scale. Key components include terminology extraction for legacy information, terminology checking and hyperlinking integrated in standard authoring environments, as well as structural (syntactic) checking of texts to ensure readability, consistency and translatability.
Automatic hyperlinking
Automatic hyperlinking is the insertion of hyperlinks into text documents by automatic means. The automatic hyperlinking process consists of the identification of hyperlinkable entities (concepts, named entities) in the original text, and the assignment of link targets for each such entity. Named entity recognition is often used for identifying linkable entities, for example geographical locations or organisation names. The assignment of link targets depends on a database with link targets for each entity. In general, there can be several link targets for each entity, for example for a company: its homepage, a map, a stock quote etc. Multiple outgoing hyperlinks from one target are supported by the W3C standard XLINK/XPointer. Automatic hyperlinking is an important technology for enabling the Semantic Web.
The identification of hyperlinkable entities (concepts, named entities) in a text and the assignment of hyperlinks (link targets) for each such entity.
The task/process of automatically adding subtitles to a video
Automatic subtitling
Avatar synthesis
https://avro.apache.org/docs/1.8.1/spec.html
Avro
https://cl.lingfil.uu.se/~sara/blast/README
blast
BLAST
adapted from Wikipedia (https://en.wikipedia.org/wiki/Bayesian_inference)
A method in probability and statistics based on Bayes' theorem, mainly related to statistical inference.
Bayesian
true
Annotation of morphological features
B-PoS Tagging
Below PoS Tagging
The annotation of words with morphological information besides the part of speech and dependent upon it (e.g. for nouns: gender, number and case; for verbs: tense, number, person etc.)
Below Part-of-Speech Tagging
The task/process of inducing word translations from monolingual or comparable corpora in two languages
Bilingual lexicon induction
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.bincas-asl
UIMA Binary CAS
Binary format used for CAS data
Binary CAS
Any format of a computer file in which information is stored in the form of ones and zeros, or in some other binary (two-state) sequence; used mainly for executable files or files that need to be interpreted by a computer program
Binary format
Biological activity
Any kind of annotation pertaining to entities of biology
Biological entity
Biomedical concept normalisation
The task of mapping free-form expressions to medical terms [source: Medical concept normalization in social media posts with recurrent neural networks]
Biomedical concept normalization
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_bionlp_shared_task_2
File format used for the BioNLP Shared Task
BioNLP
Formats used for BioNLP shared tasks
BioNLP format
true
Formats used for BioNLP shared tasks
BioNLP formats
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_bionlp_st_2013_a1_a2_1
bioNLP; format-variant=ST2013a1_a2
Format used in BioNLP Shared Task 2013
BioNLP-ST 2013 a1/a2
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.bliki-asl
The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia wikitext notation to HTML.
blikiWikipedia
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.bnc-asl
Data format for the XML version of the British National Corpus (http://www.natcorp.ox.ac.uk/)
BNC format
Body movement
The task/process of identifying segments of text or speech discourse that have been uttered by bots
Bot detection
Brain region
http://brat.nlplab.org/standoff.html
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.brat-asl
BRAT stand-off format for annotations (BRAT is an online environment for collaborative text annotation, cf. http://brat.nlplab.org/)
BRAT
The task/process of creating Finite State Automata (a mathematical model of computation)
Building of finite state automata
https://cbor.io/spec.html
https://www.rfc-editor.org/rfc/rfc8949.html
Concise Binary Object Representation
Concise Binary Object Representation (CBOR) data format; defined as an IETF standard (RFC 8949, https://cbor.io/spec.html)
CBOR
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#ExportCadixeJSON
AlvisAE protocol format
Cadixe/JSON
Degree of certainty about the validity of what is being asserted in the text
Certainty level
Certainty level labeling [en-US]
The task/process of adding to statements in a text annotations that indicate the level of certainty of the author vis-a-vis the statement
Certainty level labelling
http://talkbank.org/manuals/CHAT.pdf
Codes for the Human Analysis of Transcripts
CHAT (Codes for the Human Analysis of Transcripts) transcription format; used by CHILDES corpora
CHAT
Any substance (as an acid) that is formed when two or more other substances act upon one another or that is used to produce a change in another substance [https://www.merriam-webster.com/dictionary/chemical]
Chemical
Any kind of annotation pertaining to entities from chemistry
Chemical entity
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.syntax.type.chunk.Chunk
Group of words that function together; a chunk normally includes a head and some consecutive (i.e. without gaps) preceding words
Chunk
A component that groups tokens of a text into chunks
Chunker
Light parsing
Shallow Parsing
The task/process of dividing a sentence into chunks (non-overlapping text segments consisting of a head and preceding function words and/or modifiers)
Chunking
Shallow parsing refers to a class of techniques for identifying phrasal chunks in texts without assigning deep hierarchical structures. Cascaded finite-state models are often used for shallow parsing. Shallow parsing techniques are used because they are more time-efficient and more error-tolerant than "deep" parsers, and give rise to fewer ambiguities.
Reference to a book, paper, or author, especially in a scholarly work.
Citation
adapted from Wikipedia (https://en.wikipedia.org/wiki/Cluster_analysis)
Any method used in clustering or cluster analysis, i.e. in grouping a set of objects in such a way that objects in the same group (cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
k-means, k-nearest neighbours
Clustering Method
A component that annotates tokens of a text with coreference labels, marking expressions that refer to the same entity in the text
Co-reference annotator
Coreference resolution
The task/process of determining all linguistic expressions that refer to the same entity in a certain text or across texts.
The task/process of identifying all linguistic expressions that appear in a certain text or across texts and refer to the same unique entity in the world
Co-reference resolution
https://gate.ac.uk/sale/tao/splitch23.html#sec:creole:pubmed
Format used in Cochrane texts
Cochrane
The identification and removal of abusive, bullying, etc. comments from texts
Comment filtering
Specifies the type of a component, in terms of the function/task it performs
Component type
Component type Taxonomy
Component type Taxonomy
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Compound
A single word composed of two or more free morphemes
Compound
The task/process of identifying and representing argumentation in text, so that systems can use it in tasks such as automated logical reasoning
Computational argumentation
CAT
Computer-assisted translation
MAHT
Machine-aided human translation
Computer-aided translation
A form of translation performed by a human translator with the aid of software programmes
Techniques that help to increase the productivity of human translators via suitable computational infrastructure, including translation memories, terminology management, partial machine translation, online lexicons, or other techniques that automate parts of the translator's work, such as speech recognition or accelerated typing techniques applied to human translations.
Concept normalisation
The task of mapping free-form expressions to the concepts of a specific domain
Concept normalization
The automatic generation of concordances, i.e. formatted displays of all the occurrences of a particular type in a corpus
Concordancing
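A concordance is often displayed in keyword-in-context (KWIC) form. The following is a minimal Python sketch of that idea, assuming a pre-tokenized text; the function name concordance and the width parameter are hypothetical choices made for illustration only.

```python
def concordance(tokens, keyword, width=3):
    """Return (left context, match, right context) for every occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        # Matching is case-insensitive in this sketch.
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "the cat sat on the mat while the dog slept".split()
hits = concordance(tokens, "the", width=2)
for left, match, right in hits:
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>12} [{match}] {right}")
```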
The act of generating (listing in a prescribed order) the various inflectional forms of a word
Conjugation
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2000
The CoNLL 2000 format represents POS and Chunk tags. Fields in a line are separated by spaces. Sentences are separated by a blank new line.
CoNLL-2000
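As an illustration of the layout described above (space-separated fields, sentences separated by a blank line), here is a minimal Python reading sketch. It assumes three fields per line in the order word, POS tag, chunk tag; the function name parse_conll2000 and the sample data are hypothetical.

```python
def parse_conll2000(text):
    """Parse CoNLL-2000 style text into sentences of (word, pos, chunk) tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk = line.split()
        current.append((word, pos, chunk))
    # Keep the last sentence if the text does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


sample = """He PRP B-NP
reckons VBZ B-VP
the DT B-NP
deficit NN I-NP

It PRP B-NP
rose VBD B-VP
"""

sentences = parse_conll2000(sample)
print(len(sentences), sentences[0][0])
```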
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2002
The CoNLL 2002 format encodes named entity spans. Fields are separated by a single space. Sentences are separated by a blank new line.
CoNLL-2002
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Conll2003
The CoNLL 2003 format encodes named entity spans and chunk spans. Fields are separated by a single space. Sentences are separated by a blank new line. Named entities and chunks are encoded in the IOB1 format, i.e. a B prefix is only used if a span immediately follows another span of the same category.
CoNLL-2003
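To make the IOB1 encoding concrete, the sketch below decodes a tag sequence into spans. It assumes the usual IOB1 convention: a span normally starts with an I- tag, and a B- tag appears only where a new span directly follows another span of the same category. The function name iob1_to_spans and the sample tags are hypothetical.

```python
def iob1_to_spans(tags):
    """Convert IOB1 tags into (start, end, category) spans; end is exclusive."""
    spans = []
    start, category = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, cat = "O", None
        else:
            prefix, cat = tag.split("-", 1)
        # An open span closes on O, on a B- tag, or when the category changes.
        if category is not None and (prefix in ("O", "B") or cat != category):
            spans.append((start, i, category))
            start, category = None, None
        # Any non-O tag opens a span when none is currently open.
        if prefix != "O" and category is None:
            start, category = i, cat
    if category is not None:
        spans.append((start, len(tags), category))
    return spans


tags = ["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-ORG"]
spans = iob1_to_spans(tags)
print(spans)
```

In the sample, the B-LOC tag is what separates two adjacent LOC spans; without it they would be read as one span.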
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2006
CoNLL-2007
CoNLL-X
The CoNLL 2006 (aka CoNLL-X) format targets dependency parsing. Columns are tab-separated. Sentences are separated by a blank new line.
CoNLL-2006
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Conll2008
The CoNLL 2008 format targets syntactic and semantic dependencies. Columns are tab-separated. Sentences are separated by a blank new line.
CoNLL-2008
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2009
The CoNLL 2009 format targets semantic role labeling. Columns are tab-separated. Sentences are separated by a blank new line.
CoNLL-2009
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Conll2012
The CoNLL 2012 format targets semantic role labeling and coreference. Columns are tab-separated. Sentences are separated by a blank new line.
CoNLL-2012
Formats used in the CoNLL Shared Tasks
CoNLL format
http://universaldependencies.org/docs/format.html
Format used for Universal Dependencies treebanks; a revised version of the CoNLL-X format.
CoNLL-U
A component that builds a constituency tree, typically from token and part-of-speech annotations
Constituency parser
Natural Language Parsing
Phrase parsing
The task/process of identifying and marking constituents (phrases, governed by a head and including function words and/or modifiers) in a text or text segment
Constituency parsing
Parsing (from Latin "pars orationis" = parts of speech) is the syntactic analysis of languages. Natural Language Parsing is the syntactic analysis of natural languages, such as Finnish or Chinese. The objective of Natural Language Parsing is to determine parts of sentences (such as verbs, noun phrases, or relative clauses), and the relationships between them (such as subject or object). Unlike parsing of formally defined artificial languages (such as Java or predicate logic), parsing of natural languages presents problems due to ambiguity, and the productive and creative use of language.
An ordered, rooted tree that represents the syntactic structure of a string according to a constituency grammar (= phrase structure grammars). It distinguishes between terminal and non-terminal nodes. The interior nodes are labeled by non-terminal categories of the grammar (phrases), while the leaf nodes are labeled by terminal categories (parts of speech). [adapted from https://en.wikipedia.org/wiki/Parse_tree]
Constituency tree
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.Constituent
The automated analysis of large volumes of content of any form or medium (e.g. text, images, videos, graphs, metadata etc.) that leads to the discovery of previously undiscovered information (e.g. identification of relationships between entities).
Content Mining
A set of statements that contradict each other (i.e. one of them asserts the truth and the other the falsity of the proposition)
Contradiction
The task/process of identifying conflicting statements (contradictions) in a dataset
Contradiction detection
A component that tries to automatically recognize elements that reveal contradiction in a text
could also be an annotator
Contradiction detector
Chatbots building
Conversational agents building
Dialogue systems building
Embodied agents building
Virtual agents building
All activities related to the creation of conversational/dialogue systems (i.e. computer systems intended to converse with humans, through one or more communication modalities, such as text, speech, graphics, haptics, gestures)
I have put all terms together as alternative labels, but maybe they should be distinguished at a later stage
Conversational systems building
A component that performs conversion between formats of a resource
Converter
Co-reference annotation
Coreference identification
The task/process of attaching tags to a text unit and linking it to other text units that refer to the same entity in the world
Coreference annotation
A format used by a specific type of corpus (collection of texts)
Corpus format
The task/process of managing corpora, e.g. creating, viewing, etc. through an integrated environment
Corpus management
A component that supports humans in accessing the contents of a corpus
Corpus viewer
The task/process of viewing the contents of a corpus as performed by human beings
Corpus viewing
web crawler
A component that crawls the web and collects data from various web sites
Crawler
Crawling
Web Crawling and Spidering
The use of bots that crawl the web (crawlers) in order to spot content that matches user-set criteria and download it to create large datasets
Web crawling
CLIR
Cross-lingual information retrieval
Cross-lingual search
Translingual information retrieval
Cross-language Information Retrieval
Cross-language information retrieval means using queries in one language to search for documents in a different language. Multilingual information retrieval is a broader term, which includes the case where queries in different languages are used, but only for searching documents in the same language.
The ability of a system to retrieve relevant documents in various languages in response to a user query that is formulated in only one language
The practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people and especially from the online community rather than from traditional employees or suppliers
Crowdsourcing
A component that supports crowdsourcing operations
Crowdsourcing component
Comma-separated values
Data format with comma-separated values
CSV
The process of gathering and measuring information on targeted variables in an established systematic fashion, which then enables one to answer relevant questions and evaluate outcomes.
Data collection
A component that collects (retrieves) data from various sources
Data collector
Data type
File format
The format of a computer file storing data
Data format
Data format Taxonomy
Data format Taxonomy
The process of hiding original data with modified content (characters or other data) in order to protect data classified as personally identifiable data, or sensitive data [adapted from Wikipedia]
Data masking
A component that supports data merging from various sources
Data merger
The task/process of merging (combining) together data from various sources
Data merging
A component that performs data splitting for cross validation purposes
Data splitter
The task/process of splitting (partitioning) available data into parts, usually for cross-validatory purposes, e.g. in order to use one part for training purposes and the other for evaluation.
Data splitting
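A minimal Python sketch of splitting data into a training part and an evaluation part follows. The function name split_data, the 80/20 ratio and the fixed seed are assumptions made for illustration; they are not prescribed by the ontology.

```python
import random


def split_data(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_data(range(10))
print(len(train), len(test))
```

For cross-validation, the same idea is repeated so that each part serves once as the evaluation set.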
Data to Text Generation
Formats used for databases
Database format
https://gate.ac.uk/sale/tao/splitch23.html#x28-59500023.32
Common format for social media data from http://datasift.com
DataSift/JSON
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.ner.type.Date
A component that is used in the debugging process
Debugger
The task/process of removing errors from a computer programme
Debugging
adapted from (http://scikit-learn.org/stable/modules/tree.html)
and Wikipedia (https://en.wikipedia.org/wiki/Decision_tree)
A non-parametric supervised learning method used for classification and regression. The goal is to create a tree-like graph or model of decisions and their possible consequences by learning simple decision rules inferred from the data features.
Decision Trees
adapted from Wikipedia (https://en.wikipedia.org/wiki/Deep_learning)
A branch of machine learning based on deep neural networks. A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers of units between the input and output layers.
Deep Learning
The task/process of measuring the speed and accuracy of deep parsers with respect to a manually parsed test corpus.
Deep parser performance evaluation
The task of parser evaluation is to measure the speed and accuracy of parsers with respect to a manually parsed test corpus. In the evaluation of so-called deep parsers, which are based on comprehensive linguistic theories, the assignment of the correct semantic analysis plays an important role. Another aspect is the handling and reduction of ambiguities, which is important not only from the linguistic point of view but because it plays a major role in the time and memory requirements of deep parsers. Test data can either be a treebank constructed from naturally occurring data, such as the "Wall Street Corpus", or a "test suite" containing constructed examples which cover all interesting linguistic phenomena. An important issue in deep parser evaluation is the comparison of parsers and grammars based on different linguistic theories.
The task/process of building complete parse trees for a sentence [adapted from https://stackoverflow.com/questions/37020577/shallow-parsing-vs-deep-parsing-in-stanford-corenlp-java]
Deep parsing
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency
The task/process of converting constituency structures to dependency trees
Dependency conversion
A component that converts a constituency tree into a dependency tree
Dependency converter
A component that generates a dependency tree, typically from token and part-of-speech annotations
Dependency parser
adapted from https://nlp.stanford.edu/software/nndep.shtml
The task/process of identifying and marking the grammatical structure of a sentence, establishing relationships between "head" words and words that modify those heads
Dependency parsing
A tree that represents the dependency relations in a sentence, i.e. showing the governor (head) and its dependents with directed links
Dependency tree
The analysis of a word in order to identify its derivation, i.e. whether and how it has been formed on the basis of another word (e.g. through the use of affixes)
Derivational analysis
Any feature relevant to the derivation process of a word (e.g. marking affixes, their meaning etc.)
Derivational feature
A dialogue act has two main components: a communicative function and a semantic content. The semantic content specifies the objects, relations, actions, events, etc. that the dialogue act is about; the communicative function can be viewed as a specification of the way an addressee uses the semantic content to update his or her information state when he or she understands the corresponding stretch of dialogue. [http://www.lrec-conf.org/proceedings/lrec2010/pdf/560_Paper.pdf]
Dialogue act
Dialog modeling [en-US]
The task/process of building models of dialogues to be used in dialogue systems
Dialogue modelling
A Dialogue model is a system that simulates the behavior of dialogue participants. This includes formal models that instantiate linguistic theories of dialogue interaction and statistical models of dialogue behaviors.
https://www.iso.org/standard/51967.html
Format following Dialogue Act Markup Language (DiAML) which is defined within the ISO standard 24617-2
DIAML
adapted from Wikipedia (https://en.wikipedia.org/wiki/Dimensionality_reduction)
A method based on reducing the number of random variables under consideration, via obtaining a set of principal variables.
Dimensionality Reduction
A component that is used to disambiguate between two or more ambiguous items
Disambiguator
A method of analysing the structure of texts or utterances longer than one sentence, taking into account both their linguistic content and their sociolinguistic context; analysis performed using this method. [OED, https://en.oxforddictionaries.com/definition/discourse_analysis]
Discourse analysis
The task/process of adding annotations relevant to discourse, such as discourse structure, discourse markers etc.
Discourse annotation
Discourse modeling [en-US]
The task/process of building discourse models
Discourse modelling
Discourse modelling describes all aspects of the relations between groups of sentences in monologue (text), dialogue, or multiparty interactions, e.g. text coherence, rhetorical relations, intentional and attentional state, centering, dialogue moves, dialogue acts, argument structures and reference phenomena, to name just a few. Recently, with the growing interest in multimodal discourse processing, discourse models extend their reach beyond the modality speech/language to other modalities like gesture, gaze or haptics, and cover also cross-modal phenomena.
The relation that holds between two segments of discourse; e.g. causal, temporal etc.
Discourse relation
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-TokenizedText
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.text-asl
DKPro format for tokenized files containing one sentence per line and tokens split by whitespace.
DKPro tokenized
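Reading the layout described above (one sentence per line, tokens separated by whitespace) can be sketched in a few lines of Python. The function name read_tokenized and the sample text are hypothetical; this is not the DKPro Core reader itself.

```python
def read_tokenized(text):
    """Return a list of sentences, each a list of tokens."""
    # Each non-empty line is one sentence; split() breaks it on whitespace.
    return [line.split() for line in text.splitlines() if line.strip()]


sample = "This is a sentence .\nHere is another one .\n"
sentences = read_tokenized(sample)
print(sentences)
```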
A component that tries to classify a document into one or more categories
Document classifier
The task/process of comparing different versions of the same document
Document comparison
Any format used for documents (textual resources)
Document format
Document image analysis
Document image analysis is the theory and practice of recovering the logical structure of digital images scanned from documents or produced by computer. It includes optical character recognition as one of its subfields, but has more ambitious tasks, both in the breadth (understand diagrams, music scores, images ...), and depth (e.g. the correct interpretation of a scanned mathematical formula).
The processing of images of documents in order to obtain a machine-readable description of their contents and structure from the pixel data [adapted from https://www.ias.ac.in/article/fulltext/sadh/027/01/0003-0022]
The task/process of converting the contents of a digital document into speech in order to aid mainly individuals with visual impairments
Document reading
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Div
Any subdivision of a document, e.g. a chapter, abstract, etc.
Document section
Any kind of annotation that is used for specific domains (e.g. genes and proteins from the biomedical domain, plants from agriculture etc.)
Domain-specific annotation type
https://www.mpi.nl/corpus/html/elan_ug/ch01.html
eaf
Format for annotations of video files with the ELAN annotator
EUDICO annotation format
The task/process of changing the contents of a resource
Editing
A component that allows humans to edit the contents of a resource
Editor
Data format according to the EMMA (Extensible MultiModal Annotation markup language) specifications, cf. https://www.w3.org/TR/2007/CR-emma-20071211/
EMMA
An affective state of consciousness in which joy, sorrow, fear, hate, or the like, is experienced, as distinguished from cognitive and volitional states of consciousness [http://www.dictionary.com/browse/emotion]
Emotion
Emotion detection
The recognition of emotions from text, speech, facial expressions, gestures and/or physiological measures. A key challenge is the appropriate representation of emotional states.
Emotion recognition
Emotion generation
Emotion labeling [en-US]
The task/process of identifying types of feelings (e.g. anger, fear, happiness, sadness, etc.) in the linguistic expression of texts or facial expressions
Emotion labelling
Emotion detector
A component that tries to recognize and annotate emotions (e.g. fear, anger, happiness etc.) from text, video, audio and image
could also be an annotator
Emotion recognizer
adapted from Wikipedia (https://en.wikipedia.org/wiki/Ensemble_learning)
Any method that uses multiple learning algorithms in an attempt to improve predictive performance not obtainable otherwise with any of the constituent learning algorithms.
Ensemble Method
The task/process of attaching tags to all the mentions of an entity in a text linking them together with the entity.
Entity linking
Entity mention labeling [en-US]
Entity mention labelling
The pair of an entity and all the mentions of this entity formulated in various ways; used in co-reference resolution
Entity-Mention pair
Entity mention detection
Mention detection
The task/process of detecting in a text mentions of a specific class of entities (e.g. biochemical entities, historical persons)
Entity mention recognition
The task/process of assessing the quality of a resource, e.g. based on the contents (for a dataset) or performance (for a tool or service)
Evaluation
The task/process of assessing the quality performance of parsers
Evaluation of broad-coverage natural language parsers
The task of parser evaluation is to measure the speed and accuracy of (symbolic or stochastic) parsers with respect to a manually parsed test corpus. Test data can either be a treebank constructed from naturally occurring data, such as the "Wall Street Corpus", or a "test suite" containing constructed examples which cover all interesting linguistic phenomena. An important issue in parser evaluation is the comparison of parsers and grammars based on different linguistic theories. Evaluations have focussed on phrase structure or dependency structures.
Evaluation of MT
Evaluation of MT Systems
MT Evaluation
The task/process of assessing the quality performance of machine translation tools
Evaluation of Machine Translation and translation tools
The growing number of MT systems on the market that span a wide range of quality levels has motivated activities towards evaluation standards from national and international organizations. Evaluation of MT systems depends strongly on whether such a system is used for information dissemination, assimilation, or in a conversational context, the types of texts to be translated, whether there is a well-defined and limited application domain and many more factors.
A component that is used in the evaluation of the performance of a component
Evaluator
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.ner.type.Event
A thing that happens or takes place, especially one of importance [https://en.oxforddictionaries.com/definition/event]
Event
Event extraction
The task/process of identifying events in data (text, video, images etc.), usually combined with their classification into types of events and recognition of the event attributes (e.g. time, place, participants and duration)
Event detection
A component that tries to extract information related to incidents referred to in a text
could also be an annotator
Event extractor
Annotation of events
Event labeling [en-US]
The process/act of marking events and classifying them into types of events in a text
Event labelling
A type of search which, in contrast to traditional lookup search, covers a broad class of activities, such as investigating, evaluating, comparing, and synthesizing
Exploratory search
Extraction of domain-specific information
Extraction of information that pertains to specific domains/disciplines; it can be used combined with "Annotation type" to specify the type of information extracted
Mining of funding information
The task/process of detecting in a text and extracting information relevant to funding (e.g. funding programme, award, funder etc.)
Extraction of funding information
The task of computing and extracting quantitative properties (e.g. of strings, words, etc.)
Extraction of quantitative information
The task/process of recognizing that a person appears on a digital image or a video frame from a video source and possibly marking the contours around it
Face detection
Facial recognition
The task/process of identifying or verifying a person from a digital image or a video frame from a video source [adapted from Wikipedia]
Face recognition
Face authentication
The task/process of validating a claimed identity based on the image of a face, and either accepting or rejecting the identity claim [adapted from http://www.idiap.ch/~marcel/labs/faceverif.php]
Face verification
Facial expression
Expression detection
Expression recognition
Face expression detection
Face expression recognition
Facial expression detection
The task/process of detecting emotions on a human face based on biometric markers [adapted from https://sightcorp.com/knowledge-base/facial-expression-recognition/]
Facial expression recognition
A human-made artifact in the domains of architecture and civil engineering [source: https://www.nltk.org/book/ch07.html]
Facility
Fact-checking
The act of checking factual assertions in non-fictional text in order to determine the veracity and correctness of the factual statements in the text [from Wikipedia]
Fact checking
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_factored_tag_lem_1
Factored tag lemma format
Factored tag lem format
Fake news assessment
Fake news evaluation
The process of detecting whether a certain news item is fake (i.e. a type of news item of yellow journalism or propaganda that consists of deliberate disinformation or hoaxes spread via traditional news media or online social media) [adapted from Wikipedia]
Fake news detection
https://gate.ac.uk/sale/tao/splitch23.html#x28-59400023.31
A compressed binary encoding of GATE XML
Fast Infoset
Feature extraction consists in transforming arbitrary data, such as text or images, into numerical features usable for machine learning
Feature extraction
A component that is used for extracting features
could also be under analyzer as a general term
Feature extractor
Masked Language Modeling is a fill-in-the-blank task, where a model uses the context words surrounding a mask token to try to predict what the masked word should be. [Source: https://keras.io/examples/nlp/masked_language_modeling/]
Fill mask
A component that is used for filtering text input or annotations based on specific criteria
Filter
In data communications, flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from overwhelming a slow receiver. It provides a mechanism for the receiver to control the transmission speed, so that the receiving node is not overwhelmed with data from transmitting node.
Finite state technology
Flow control
A component that supports controlling flows
Flow controller
https://proycon.github.io/folia/
Format for Linguistic Annotation
FoLiA is an XML-based annotation format, suitable for the representation of linguistically annotated language resources
FoLiA
Data conversion
File conversion
The task/process of converting (changing) the format of a resource into another (e.g. PDF to TXT or XML)
Format conversion
The task of recognizing words and phrases that evoke semantic frames as defined in the FrameNet project, and their semantic dependents.
Frame extraction
The task/process of recognising and labelling in a text predicate argument structures and the semantic roles of the constituents, in accordance with the frame semantics theory.
Frame-semantic parsing
Frequency
Computation of frequencies
The task of counting frequencies of entities (e.g. words, strings, sentences, specific tags, etc.)
Frequency count
Annotation related to the funding of a resource (e.g. funder, funding project, etc.)
Funding
Formats used for the GATE framework
GATE format
XML-based format for GATE components
GATE XML
https://gate.ac.uk/sale/tao/splitch17.html
Twitter/JSON
A Twitter-style JSON format used for GATE documents
GATE/JSON twitter
Gaze eye movement
A component that allows matching of elements based on a gazetteer
Gazetteer based matcher
Gazetteer-based matching
The task/process of performing a comparison between a text/dataset and a gazetteer and identifying in the text/dataset units that are included in the gazetteer
Gazetteer based matching
Gender detection
Specific sequence of nucleotides along a molecule of DNA (or, in the case of some viruses, RNA) which represents functional units of heredity [http://artemide.art.uniroma2.it:8081/agrovoc/agrovoc/en/page/c_3214]
Gene
A gene family is a set of several similar genes, formed by duplication of a single original gene, and generally with similar biochemical functions [https://en.wikipedia.org/wiki/Gene_family]
Gene family
A component that generates (semi-)automatically natural language texts (based on non-linguistic data, keywords, logical forms, knowledge bases...)
Generator
GPE
A geographical area associated with some sort of political structure
Geo-political entity
http://dl.acm.org/citation.cfm?id=1642060
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Graf
Graph Annotation Format
GrAF (Graph Annotation Format) is an extension of the Linguistic Annotation Framework (LAF)
GrAF
A component that corrects grammatical mistakes in a text
Grammar checker
Grammar checking
A type of grape
Grape variety
The place or environment where an organism, plant or animal naturally or normally lives and grows
Habitat
HWR
The ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices [from Wikipedia]
Handwriting recognition
Hate speech detection
The task of automatically identifying hate speech, i.e. statements intended to demean and brutalize another, or the use of cruel and derogatory language on the basis of real or alleged membership in a social group [adapted from Wikipedia]
Hate speech recognition
Head movement
Historical event
HTML format
HTML
https://www.w3.org/TR/microdata/
Format according to the specifications of HTML5 Microdata
HTML5 Microdata
HAMT
Human Aided Machine Translation
The process of performing Machine Translation facilitated by pre-editing or post-editing steps, or interactive human intervention to steer or select from alternative translations
We call Human-aided Machine Translation all systems and techniques which rely on real automation of the translation function when porting a text from one language to another. As opposed to full Machine Translation, human-aided MT does not fully rely on computational translation, but assists this process by pre-editing and post-editing steps, possibly also interactive human intervention to steer or select from alternative translations. Translation of real-time spoken language, by contrast, does not allow for human intervention, except for negotiation functions, such as clarification dialogues.
HCI
Human Machine Interaction
A multi-disciplinary research area focusing on the design and use of computer technology deployed in interfaces between humans and computers
Human Computer Interaction
Human Computer Interaction Collection
Humor detection
Humor recognition
Humor sensing
Humour recognition
Humour sensing
Humour detection
The task of identifying in a text structures and phrases of humour
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#I2B2Reader
https://www.i2b2.org/NLP/RDoCforPsychiatry/PreviousChallenges.php
Format of the I2B2 challenge
I2B2
Image classification
Any format used for image files
Image format
Image generation
Image processing
Image segmentation
The automatic extraction of text contained in an image (e.g. by OCR technology), its subsequent translation into another language, followed by a digital image processing step in order to reconstruct the original image with the translated text [adapted from https://en.wikipedia.org/wiki/Image_translation]
Image translation
Advanced image processing in which artificial-intelligence techniques are used to interpret images by locating, characterizing, and recognizing objects and other features in the scene. [from https://www.encyclopedia.com/computing/dictionaries-thesauruses-pictures-and-press-releases/image-understanding]
Image Understanding
Image / Video Processing Collection
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.imscwb-asl
IMS Corpus Workbench
A tab-separated format with limited markup (e.g. for sentences, documents, but not recursive structures like parse-trees) used by the IMS Open Corpus Workbench.
imsCwb
IE & Text Analysis
Information Discovery
Information Retrieval
The task/process of automatically extracting structured information from unstructured and/or semi-structured data
Information Extraction
The goal of information extraction (IE) is to build systems that find and link relevant information from natural language text ignoring irrelevant information. The information of interest is typically pre-specified in form of uninstantiated frame-like structures also called templates. The templates are domain and task specific. The major task of an IE-system is then the identification of the relevant parts of the text which are used to fill a template's slots.
Information Extraction and Information Retrieval Collection
A component that automatically extracts structured information from unstructured and/or semi-structured machine-readable documents
Information extractor
The task/process of removing (filtering out) redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user; the selection of the items is based on the correlation between the content of the items and the user's preferences (content-based filtering) or the correlation between people with similar preferences (collaborative filtering)
Information filtering
The delivery of information in the form of suggestions by recommender systems; recommender systems seek to predict the "rating" or "preference" that a user would give to an item
Information filtering by recommender systems
Information Retrieval
Information Retrieval is the process of locating information that fits a user's information need, which is usually expressed as a search query. The fit of the retrieved information with the information need is referred to as "relevance". The information can be retrieved from databases (data retrieval) or from document collections (document retrieval), where documents can either be text documents or other media (audio, video, semi-structured data, multimedia). Success in information retrieval is generally defined by retrieving as much relevant information as possible (measured by "recall") while minimising the irrelevant information retrieved (measured by "precision"). The most widely used information retrieval systems today are Internet search engines.
The activity of obtaining information resources relevant to an information need from a collection of information resources; searches can be based on full-text or other content-based indexing
Data storage
Information storage
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.xml-asl
Inline XML file format
Inline XML
adapted from Wikipedia (https://en.wikipedia.org/wiki/Instance-based_learning)
A family of learning algorithms that, instead of performing explicit generalization, compares new problem instances with instances seen in training, which have been stored in memory.
Examples of instance-based learning algorithms are the k-nearest neighbor algorithm, kernel machines and RBF networks.
Instance-based Learning
Intra-document coreference resolution
Intra-document co-reference resolution
Ion channel
A single protein or protein complex that traverses the lipid bilayer of the cell membrane and forms a channel to facilitate the movement of ions through the membrane according to their electrochemical gradient [http://www.biology-online.org/dictionary/Ion_channel]
Ionic channel
Ion conductance
Ionic conductance
Ion current
The influx and/or efflux of ions through an ion channel
Ionic current
The task of identifying in a text segments expressing irony and classifying them into types
Irony detection
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Jdbc
Java Database Connectivity
For JDBC databases
JDBC
Superclass of JSON formats
JSON
https://jsonlines.org/
JSONL
JSON Lines
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#GeniaJSONReader
JSON format of the Genia dataset
JSON/Genia
http://kyoto-project.eu/xmlgroup.iit.cnr.it/kyoto/indexdd46.html?option=com_content&view=article&id=141&Itemid=130
KYOTO Annotation Format
Knowledge Annotation Format
KAF (also known as Knowledge Annotation Format) is a language neutral annotation format representing both morpho-syntactic and semantic annotation of documents through a stand-off multilayered structure
KAF
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_kea_corpus_1
KEA-style (Keyphrase Extraction Algorithm) corpus
KEA corpus
adapted from Wikipedia (https://en.wikipedia.org/wiki/Kernel_method)
Any method used in pattern analysis that relies on kernel functions, which enable it to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space.
Kernel Method
A word or group of words used to describe or index the contents of a document
Keyword
KWS
Key phrase Extraction
Keyword search
Keyword spotting
The task/process of identifying keywords (words deemed indicative of the topic/subject) in a text/corpus
Keyword extraction
A component that tries to extract keywords from a given text
Keyword extractor
The task/process of extracting, organising and systematising knowledge usually of a specific domain from external sources so that it can be used in a knowledge-based system
Knowledge acquisition
KD
KDD
Knowledge Discovery in Databases
The task/process of automatically searching large volumes of data for patterns that can be considered knowledge about the data
Knowledge Discovery
Generally, knowledge discovery / data mining is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
KR
The task/process of representing information about entities in a form that machines are capable of understanding
Knowledge Representation
L2 writing assistant
L2 writing support
JSON linked data
LD-Json
Data format encoding Linked Data using JSON
JSON/LD
Proofing
The task/process of identifying (and usually correcting) grammatical mistakes in a text
Language checking
Language Checking comprises technologies used to detect and/or correct erroneous or inconsistent language use in documents. The scope of language checking technology ranges from general error correction, as performed by spell checkers and grammar checkers
Language code
Language detection
Automatic Language Identification
Language recognition
The task/process of guessing what natural language a text or text segment is written in
Language identification
Automatic Language Identification (LID) is the process of identifying the natural language of a sample of speech or written text by an unknown speaker. Several important applications exist for LID, viz., as a front-end to, e.g., a call router in a telephone-based application or a multi-lingual speech recognition system.
A component that identifies the language of a given text based on its contents
Language identifier
LM
Language Modeling
Language Modelling
A statistical language model predicts a word given a sequence of already known words (i.e. the history). It can also be applied to other sequences of symbols (e.g. DNA). Very often the history contains just the previous two words. This is called a trigram model. The parameters of statistical language models are estimated from a set of training examples. Data sparsity and smoothing of the estimates are among the core problems. The best smoothing technique known so far is Kneser-Ney smoothing. Maximum-entropy techniques are also under investigation and may be the method of choice for long-range language models (beyond trigram). Language models are used in text compression, speech recognition, information retrieval and information extraction.
The construction of statistical or Machine Learning language models
Language proficiency level
HLT
Human Language Technology
LT
Language technology, often called human language technology, studies methods of how computer programs or electronic devices can analyze, produce, modify or respond to human texts and speech. It consists of natural language processing and computational linguistics on the one hand, and speech technology on the other. [Wikipedia]
Language Technology
https://www.latex-project.org/about/
Data format for documents using LaTeX (a high-quality typesetting system very popular for scientific documents)
LATEX
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Lemma
The canonical or citation form used for referring to a word and its inflected forms
Lemma
Lemmatisation
Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. [Wikipedia]
Lemmatization
A component that annotates the tokens of a text with lemma information
Lemmatizer
The task of replacing individual words of a text with words that are easier to understand, so that the text as a whole becomes easier to comprehend, e.g. by people with learning disabilities or by children who learn to read [from https://link.springer.com/chapter/10.1007/978-3-642-28601-8_36]
Lexical simplification
The task of identifying a substitute for a word in the context of a clause [from Wikipedia]
Lexical substitution
The task/process of accessing lexical/conceptual resources (either by humans or computer programs)
Lexicon access
The task/process of constructing lexical resources from corpora
Lexicon acquisition from corpora
Lexicon Development
The task/process of constructing lexical resources (e.g. dictionaries, computational lexica, glossaries, etc.)
Lexicon creation
The task/process of improving (e.g. increasing the size of entries, improving the information, adding new types of information, etc.) a lexicon
Lexicon enhancement
The task/process of constructing lexical resources based on the restructuring of lexical information contained in lexica (e.g. by parsing definitions or using syntactic information attached to lemmas)
Lexicon extraction from lexica
A component that extracts lexical information from corpora in order to produce structured lexical resources
Lexicon extractor from corpora
A component that extracts specific lexical information contained in other lexica
Lexicon extractor from lexica
The task/process of converting the format of a lexical/conceptual resource into another (e.g. from TSV to XML)
Lexicon format conversion
The task/process of merging (combining together) information coming from various lexical/conceptual resources
Lexicon merging
A component that supports humans in accessing the contents of a lexical/conceptual resource
Lexicon viewer
The task/process of viewing the contents of a lexicon as performed by human beings
Lexicon viewing
Lexicon visualisation
The task/process of visualizing information (e.g. using diagrams, 3-d images, word clouds etc.) contained in lexical/conceptual resources
Lexicon visualization
Language Analysis
Language Analysis and Understanding
Linguistic research
Any operation that aims at the analysis of language or its structure
Linguistic analysis
Any kind of annotation pertaining to entities of linguistics; the use of OLIA is recommended
Linguistic entity
Formats used for linked data
Linked data format
Lip movement
The task/process of recognising the lip contour and tracking the movements of lips in lip reading
Lip tracking analysis
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_lll_1
Format of the LLL challenge
LLL
Localisation
The process of adapting a product or content to a specific locale or market, including translation of relevant language material but also converting to local measures and currency units, modifications necessary to adapt to cultural differences of the target audience, etc. [adapted from https://www.gala-global.org/industry/intro-language-industry/what-localization]
Localization
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.ner.type.Location
A word or group of words that denotes a geographical entity
Location
ML model format
Format used for ML models
Machine Learning Model format
MOSES format for aligned corpora (monolingual plain text files with the language indicated by the file extension)
MOSES plain text format
Methods and techniques used either in machine learning or statistical learning
Machine and Statistical Learning Method
adapted from https://www.sas.com/en-US/insights/analytics/machine-learning.html
A method of data analysis that automates model building. Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.
Machine Learning Method
A component that is used in predicting based on machine learning models
maybe create another class for predictors, analytics
Machine Learning predictor
Automated Translation
Automatic Translation
Computer Translation
MT
Machine Translation
Machine Translation (MT) is the fully automatic translation from one human language into another one. Machine translation has been worked on since the 1950s. There are a number of commercial products available and in daily use, but there are still open research problems. Transfer-based MT has a set of transfer rules for each language pair, so that n*(n-1) rule sets are necessary for MT between n languages. In contrast, interlingua-based MT uses a language-independent representation into which all source languages are analysed and from which all target languages are generated. Most practically usable MT systems are transfer-based. Recently, there have been interesting developments in statistical MT algorithms which are trained on parallel corpora of translated texts, and are very useful for constructing MT systems for new language pairs.
The automatic translation of a text from one language into another performed by software without human involvement
Any operation that can be used for training or support of Machine Translation tools
Machine translation support
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-MalletTopicProportions
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.mallet-asl
Topic proportions in the shape [\t]\t\t...
Mallet LDA Topic Proportions
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-MalletTopicsProportionsSorted
Topic proportions in the shape [\t]\t\t... sorted
Mallet LDA Topic Proportions Sorted
Management of annotated texts
Monitoring of annotated texts
Monitoring of annotations
The task/process of managing (manually or (semi-)automatically) annotated texts, distributing them among annotators and monitoring their work
Management of annotations
Marker
A component that allows matching of elements
Matcher
The task/process of identifying similar elements in two resources
Matching
The main means of mass communication (broadcasting, publishing, and the Internet) regarded collectively [https://en.oxforddictionaries.com/definition/media]
Media
https://www.mediawiki.org/wiki/Help:Formatting
Wiki markup for formatting
Media Wiki markup
Any substance involved in metabolism (= the chemical processes in the body needed for life) [https://dictionary.cambridge.org/dictionary/english/metabolite]
Metabolite
Research method
Method of research
Mimetypes Taxonomy
Mimetype
MISC
A general label for named entities other than the usual ones recognized by a system
Miscellaneous
Modality annotation type
A subset of functions specific to models
Model function
Model function Taxonomy
Model organism/species
General morphological analysis
Morphological analysis
The analysis of the structure of words and their relations to other words as regards their form and derivation
The technologies for or the process of tracing the inflectional, derivational, and compounding processes in the formation of a given word in order to determine properties such as stem form, part-of-speech and inflectional information. As a crucial preprocessing step, morphological analysis is used in virtually all fields of natural language processing.
The task/process of adding annotations pertaining to the morphological level of analysis (e.g. gender, number, person etc.)
Morphological annotation
Any type of annotation pertaining to the morphological level
Morphological annotation type
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.morph.MorphologicalFeatures
A component that annotates tokens of a text with morphological information (part-of-speech and morphological features)
Morphological tagger
Morphosyntactic annotation
The task/process of adding morphosyntactic tags to words in a text, i.e. part-of-speech and, optionally, morphological features per part-of-speech.
Morphosyntactic tagging
mdb
Data format for Microsoft Access database
MS-Access database
Data format for Microsoft Excel documents
MS-Excel
Data format for Microsoft Excel documents (with file extension xlsx)
MS-Excel (xlsx)
Data format for Microsoft Powerpoint files
MS-Powerpoint
doc
Data format for Microsoft Word documents
MS-Word
docx
Data format for Microsoft Word documents (with file extension .docx)
MS-Word (docx)
A combination of words that are considered as forming one semantic unit
Multi-word unit
The task/process of generating content in multiple languages
Multilingual generation
Multilingual generation is the verbalization of a non-linguistic semantic representation in different human languages. The language of choice is usually a parameter to a multilingual generation system. A typical instance of multilingual generation is found in interlingual machine translation, when multiple target languages are involved. Other instances include report generation systems for multiple languages (such as Temsis-Gen) and grammar-based realizers (such as FUF/SURGE).
MIR
Cross-language information retrieval means using queries in one language to search for documents in a different language. Multilingual information retrieval is a broader term, which includes the case where queries in different languages are used, but only for searching documents in the same language.
Multilingual Information Retrieval
Multimedia annotation
Multimedia development
Multimedia document processing
The task/process of automatically extracting structured information from multimedia data
Multimedia Information Extraction
Multimedia Information Extraction can be considered as an extension of textual information extraction to multimedia documents, in an integrated view.
MMIR
MR
Multimedia Information Retrieval
Multimedia Mining
Multimedia Retrieval
The activity of obtaining multimedia information resources relevant to an information need from a collection of multimedia information resources
The research on multimedia retrieval focuses on the development of tools and techniques to support the easy access to information in digital archives. It exploits the synergy that exists between language & speech technology, image processing and database technology. The results can be used in innovative applications that support various content management tasks, e.g. automatic indexing of large text collections and disclosure of audiovisual archives, search technology and filtering of dynamic information streams. (taken from: http://parlevink.cs.utwente.nl/?page=onderwerp&onderwerpID=2)
Modality integration
Multisensory integration
Multimodal integration
Multimodal fusion combines the output of speech understanding and of gesture and facial expression recognition (if available) into a uniform representation of the user's intention.
The study of how information from the different sensory modalities, such as sight, sound, touch, smell, self-motion and taste, may be integrated by the nervous system. [from Wikipedia]
The generation of a multimodal output resource
Multimodal synthesis
http://wordpress.let.vupr.nl/naf/
https://github.com/newsreader/NAF
NLP Annotation Format
The NAF format is a linguistic annotation format designed for complex NLP pipelines. NAF combines strengths of the Linguistic Annotation Framework (LAF) as described in Ide et al. (2003) and the NLP Interchange Format (Hellmann et al. 2013, NIF).
NAF
A component that seeks to locate and classify elements in a text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, discipline-specific classes, etc
Named entity recognizer
A word or phrase referring to an entity, identified and annotated as such with a name (label); examples include organizations, persons, places etc.
Named entity
NED
The task/process of selecting among candidate entities from a knowledge base (or other information resource) the ones to which the named entities in a text refer
Named Entity Disambiguation
Entity Identification
Entity Recognition
Entity extraction
NER
NERC
can also be annotation
Named Entity Recognition
A subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
Named entity (NE) recognition is a form of information extraction in which the major task is to identify and classify every word or sequence of words in NL text as being a person name, organization, location, date, time, monetary value or percentage expression. NE recognition has a high impact for a number of applications, e.g. Internet search engines, text data mining or answer extraction.
Generation
NLG
Synthesis
The task/process of generating natural language text from some non-linguistic representation of information.
Natural Language Generation
Natural Language Generation (NLG) is concerned with turning some usually non-linguistic representation of information and intended effect into fluent text preserving both meaning and intention. NLG systems often identify the content to be verbalized. They structure the document into interrelated sentence-sized chunks, choose appropriate words, aggregate and elide information to ensure fluency, create contextually appropriate referring expressions, such as pronouns, and follow grammatical constraints of the chosen language. All this is achieved using knowledge about the world and the domain of discourse, about communication and about languages. NLG components are used for e.g. automatic report generation, document authoring, dialogue, concept-to-speech, multi-modal and machine translation systems. Evaluating the correctness and the appropriateness of generated text is a research theme on its own since there is usually no single correct solution. One important way to tackle the problem consists in creating reference corpora and performing shared evaluation tasks, e.g. on generating referring expressions. However, this is not intended to replace less formal evaluation strategies such as human assessments.
Natural Language Generation Collection
LU
Language Understanding
Machine Understanding
NLU
Natural Language Understanding
The comprehension by computers of the structure and meaning of human languages, allowing users to interact with the computer using natural sentences. [adapted from https://www.gartner.com/it-glossary/nlu-natural-language-understanding]
Natural Language Understanding
The task/process of identifying negative cues (words, phrases) in a text
Negation detection
http://www.coli.uni-saarland.de/~thorsten/publications/Brants-CLAUS98.pdf
Export format for annotated corpora in the NeGra project
NeGra export
A nerve cell that carries information between the brain and other parts of the body
Neuron
Any kind of annotation pertaining to entities of neuroscience
Neuroscience entity
News generation
http://persistence.uni-leipzig.org/nlp2rdf/
NLP Interchange Format
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations; it consists of specifications, ontologies and software (overview), which are combined under the version identifier "NIF 2.0", but are versioned individually
NIF
NLP Development Aid
NLP Development Support
The task/process of editing a text in order to remove unwanted material (e.g. quotation marks, hyphenations etc.) or to substitute/represent specific items (tokens, dates, etc.) with normalized values
Normalization
The task/process of substituting measurements with normalized values
Normalization of measurements
The task/process of substituting numbers with normalized values
Normalization of numbers
A component that removes unwanted material from text (e.g. quotation marks, hyphenations etc.) or performs edits so that specific items (tokens, dates, etc.) are substituted/represented with normalized values
Normalizer
The task/process of identifying noun phrases in a sentence
Noun phrase chunking
https://onnx.ai/
Open Neural Network Exchange
Open format built to represent machine learning models
ONNX
Oasis Presentation
Oasis spreadsheet
Oasis text
Object detection
http://owlcollab.github.io/oboformat/doc/obo-syntax.html
Serialization format for ontologies according to the Open Biomedical Ontologies model.
OBO
Official text
The task/process of creating an ontology based on other resources (corpora, other lexical resources, etc.)
Ontology acquisition
The task/process of improving an ontology, typically by adding new relations or entities
Ontology enhancement
Open format
Open office document
Open office presentation
Open office spreadsheet
Function
Task
The action that a software program performs or is meant to perform, and, in a broader sense, the application area where this software or relevant resources can be deployed
Operation
Operation Taxonomy
Operation with multimedia input or output
OCR
The task/process of converting handwritten or printed material into machine editable text
Optical Character Recognition
An individual animal, plant, or single-celled life form [https://en.oxforddictionaries.com/definition/organism]
Organism
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.ner.type.Organization
A word or group of words that denotes an organization, such as company, association, institution etc.
Organization
The task/process of transcribing a source language word/phrase or an audio segment into the standard spelling system of a target language
Orthographic transcription
Superclass for formats used for OWL
OWL
XML format for OWL ontologies
OWL/XML
The package format of the distribution in which one or more data files are grouped together, e.g. to enable a set of related files to be downloaded together.
Package format
Paragraph segmentation
The task/process of segmenting a text into paragraphs and marking their boundaries
Paragraph splitting
The task/process of annotating paralinguistic features, i.e. vocal features that accompany speech and contribute to communication but are not generally considered to be part of the language system (e.g. vocal quality, loudness, and tempo); facial expressions and gestures are also considered as paralinguistic features [adapted from https://www.dictionary.com/browse/paralanguage]
Paralanguage annotation
The task/process of identifying parallel sentences (i.e. sentences that have the same sense in different languages) in parallel texts
Parallel sentence extraction
A task/process whereby a text fragment is reproduced with another text fragment that conveys the same or similar information
Paraphrasing
Syntactic analyzer
A component that takes as input text and returns a form of data structure (e.g. syntactic parse as a tree, or bracketed structure etc.)
Parser
Syntactic analysis
Syntactic annotation
Syntactic parsing
Syntactic processing
The task/process of recognizing and marking the syntactic structure of a text or text segment
Parsing
Parsing (= syntactic analysis) of natural language utterances is still an active research topic due to the inherent difficulties that make the problem much harder than e.g. parsing of programming languages. These difficulties include lexical and structural ambiguity of natural language utterances, complexities introduced by expressive grammar formalisms, and the need to cope with ill-formed input or with deficiencies of the linguistic descriptions.
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS
PoS tagger
A component that annotates tokens of a text with part-of-speech information
Part of speech tagger
pdf
Data format for PDF files (Portable Document Format)
PDF
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.ner.type.Person
A word or group of words that refers to a person
Person
A process that identifies an individual uniquely by using a unique personal identification number (PIN) and/or biometrics such as fingerprints, face recognition etc.
Person identification
A word or phrase used for persuasion purposes
Persuasive expression
A component that tries to identify persuasive expressions in a given text
could also be an annotator
Persuasive expression miner
The task/process of identifying and extracting (especially from political speech texts) pieces of text that aim to persuade
Persuasive expression mining
The physical appearance or biochemical characteristic of an organism as a result of the interaction of its genotype and the environment [http://www.biology-online.org/dictionary/Phenotype]
Phenotype
The visual representation of speech sounds (or phones)
Phonetic transcription
Alignment at phrase level
Phrase alignment
Physical and chemical property of substances
Physico-chemical property
https://www.iana.org/assignments/media-types/application/pls+xml
Data format according to the Pronunciation Lexicon Specification (PLS)
PLS
http://ufal.mff.cuni.cz/jazz/PML/index_en.html
https://builds.openminted.eu/job/WP%205.2%20-%20Typesystem%20alignment/eu.openminted.interop$mapping-conversion/doclinks/1/components.html#_prague_markup_language_1
Prague Markup Language
Format according to the Prague Markup Language (http://ufal.mff.cuni.cz/jazz/PML/index_en.html); PML is a generic data format based on XML intended for storing linguistically annotated data, such as the Prague Dependency Treebank, also annotation lexicons, etc.
PML
Polarity detection
The task/process of adding to units of a text (especially in sentiment-intensive texts) tags that indicate polarity (negative, positive, neutral, mixed)
Polarity labelling
Grammatical annotation
Grammatical tagging
PoS Tagging
The task/process of marking words with the part of speech (word category, e.g. noun, verb etc.) to which they belong
Part-of-Speech Tagging
The technologies for or the process of determining the correct part-of-speech tag for a word given its local context. The task comprises disambiguation of multiple part-of-speech tags and guessing of the correct part-of-speech tag for unknown words. Part-of-speech tagging is frequently used as a preprocessing step for shallow and deep parsers.
https://www.iana.org/assignments/media-types/application/postscript
ps
Data format for PostScript files
postscript
A component that is used at pre- or post-processing stages in order to normalize input/output
Pre- or Post-Processor
In Machine Learning, it refers to the use of algorithms that learn from previous data in order to make predictions on data (by estimating probabilities from previous data)
Prediction
A component that is used in processing operations
Processor
Prosodic annotation
Prosodic boundary
Prosody information processing
Prosody can be defined as a feature of speech which extends over more than one segment and is often synonymous with 'suprasegmentals'. Prosodic features include fundamental frequency (F0), relative duration and intensity, and spectral quality. They determine the rhythm and intonation of utterances.
Prosodic information processing
Prosodic segmentation
Any of various naturally occurring extremely complex substances that consist of amino-acid residues joined by peptide bonds, contain the elements carbon, hydrogen, nitrogen, oxygen, usually sulfur, and occasionally other elements (such as phosphorus or iron), and include many essential biological compounds (such as enzymes, hormones, or antibodies) [https://www.merriam-webster.com/dictionary/protein]
Protein
A protein family is a group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure [https://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi/protein-classification/what-are-protein-families]
Protein family
Pseudonymisation
The processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information (from GDPR)
Pseudonymization
Penn Tree Bank formats
PTB
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-PennTreebankChunked
Penn Treebank - chunked
ptb; format-variant=chunked
Penn Treebank chunked format
PTB-chunked
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-PennTreebankCombined
Penn Treebank - combined
ptb; format-variant=combined
Penn Treebank combined format
PTB-combined
https://gate.ac.uk/sale/tao/splitch23.html#sec:creole:pubmed
Textual format used for PubMed articles
PubMed
Statistical information
Quantitative information
AE
Answer Extraction
QA
Textual Question Answering
The task/process where computer systems try to automatically answer questions posed by users in the form of natural language.
Question Answering
Question answering or Answer extraction (AE) aims at retrieving those exact passages of a document that directly answer a given user question. AE is more ambitious than information retrieval and information extraction in that the retrieval results are short phrases, not entire documents, and in that the queries may be arbitrarily specific. It is less ambitious than full-fledged question answering in that the answers are not generated from a knowledge base but looked up in the text of documents.
QTT
Question topic
The segment of a question that describes the entity about which the question is made
Question Topical Target
https://en.wikipedia.org/wiki/Raw_audio_format
Raw audio format
Formats for RDF (Resource Description Framework) resources
RDF format
true
Formats for RDF (Resource Description Framework) resources
RDF formats
https://www.w3.org/TR/REC-rdf-syntax/
Data format for RDF (Resource Description Framework) XML format; RDF/XML is a serialisation for RDF
RDF/XML
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.type.ReadabilityScore
The ease with which a reader can understand a written text. [https://en.wikipedia.org/wiki/Readability]
Readability
The task/process of adding readability scores (or any other type of similar annotation) to a text or textual segment showing how easy it is to read and understand its meaning.
Readability annotation
A component that annotates the tokens of a text with readability scores
Readability annotator
A component that reads content of various types (pdf, txt, xml etc.)
Reader
Getting access to the contents of an input resource
Reading
A task/process that intends to recognize for two text fragments whether the meaning of one text is entailed in that of the other, i.e. whether the truth of one text fragment follows from that of the other fragment.
Recognizing Textual Entailment
Wikipedia (https://en.wikipedia.org/wiki/Regression_analysis)
A statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').
Regression Analysis
adapted from Wikipedia (https://en.wikipedia.org/wiki/Regularization_(mathematics))
Regularization
A process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
Regularisation
Any type of relation that holds between two or more entities of a specific domain
Relation
Relationship Extraction
The task/process of identifying and classifying relation mentions between entities in text and/or data.
can also be annotation
Relation Extraction
Automated or human-assisted acquisition of relations between concepts from textual or other data, usu. within a selected domain.
The representation of time and space plays a crucial role in Artificial Intelligence and Computational Linguistics, but also in practical areas such as business intelligence or geographical information systems. Representing time and space properly eases querying and reasoning thereof.
Geographic information systems use traditional techniques from image processing and CAD, like a pixel/raster or a vector representation of spatial data. Temporal data bases extend traditional relational data models by the notion of 'valid time' and 'transaction time'.
When we speak of time, we mean a linear, dense, and one-dimensional time. With space, we usually refer to a two- or three-dimensional space, often using a latitude/longitude measurement in degrees in case of a spherical 2D geometry or a system of coordinates/trajectories for a Euclidean 2D/3D space. Classical observable spacetime then refers to a 3+1-dimensional continuum in which physical objects move through time and space (Tegmark 1997). Even abstract events or processes can be seen to take place in 4D spacetime. Physical and abstract 4D entities are usually composed of simpler entities -- here, both time and space are the glue to achieve a decomposition. For instance, Allen (1984) has defined a natural system of 13 temporal topological relations. Randell et al. (1992) came up with a logic (later called RCC) to support qualitative reasoning about space.
Human natural language usually comes up with means to refer to time and space, for instance through the use of temporal and spatial prepositions, adverbs, or verb tense and aspect (Vendler 1967, Moens & Steedman 1988, Herskovits 1986). Several algorithms exist which compute the temporal structure of a discourse (e.g., Hitzeman et al. 1995). When constructing mental models of space derived from text, it has been shown that the representation is more topological than Euclidean (Langston et al. 1998).
Within theoretical and computational linguistics, type-logical semantics and categorial grammar have given a rigorous account of tense, aspect, and temporal modification through the use of a possible-worlds semantics (see, e.g., Carpenter 1997).
Studying time from a formal perspective, focussing on the inherent properties of a theory, has a long tradition in logic (e.g., Hayes 1995), artificial intelligence (e.g., McDermott 1982), description logics (e.g., Bry & Spranger 2003, Lutz 2004), or data base theory (Date et al. 2002).
Looking from a more practical viewpoint on time, a number of frameworks have been proposed within the World Wide Web Consortium W3C: (i) ISO 8601 is an international standard for date and time representations issued by the International Organization for Standardization (Wolf & Wicksteed 1998); (ii) XSD (XML Schema Datatypes) comes up with a number of built-in primitive datatypes for time and date (Peterson et al. 2008); (iii) OWL-Time (formerly DAML-Time; see Hobbs & Pan 2004) is an OWL ontology of temporal concepts and properties for describing the temporal content of Web pages and the temporal properties of Web services.
Representing changing relationships over time and space is related to the problem of diachronic identity which describes the identification of individuals that look different at different times, but still refer to the same entity. The four-dimensional or perdurantist view assumes that all entities (the perdurants) only exist for some period of time. Entities under this view are sometimes referred to as 'spacetime worms' (Sider 2001), since a 4D trajectory is all one needs to identify/follow a perdurant through time and space. Parts of such a worm are called 'time slices', encoding co-occurrent information that stays constant over the specified period of time.
There exist several well-known techniques of extending a relation with time and space: (i) equip the relation with further arguments as is done in temporal data bases; (ii) apply a meta-logical predicate (McCarthy & Hayes 1969); (iii) reify the original relation as used in RDF (Manola & Miller 2004). Unfortunately, (i) and (ii) are not applicable to OWL (Smith et al. 2004), whereas (iii) requires ontology rewriting. Welty & Fikes (2006) and Krieger et al. (2008) present alternative approaches compatible with OWL.
Representation of space and time
Any operation that enables accessing a resource
Resource access
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Reuters21578Sgml
Reuters-21578 corpus in SGML format
Reuters21578 SGML
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-Reuters21578Txt
Reuters-21578 corpus transformed into text format using ExtractReuters in the lucene-benchmarks project
Reuters21578 Txt
Ribonucleic acid
Any of various nucleic acids that contain ribose and uracil as structural components and are associated with the control of cellular chemical activities
RNA
rtf
Rich Text Format; proprietary data format of Microsoft
RTF
A type of method that makes use of set(s) of rules to perform the relevant task.
Rule-based Method
The task of identifying the attitude of a person towards the truthfulness of a rumour
Rumours stance classification
http://sdl.com/FileTypes/SdlXliff/1.0
SDL alignment format
Any type of annotation that is relevant to scholarly analytics (e.g. citations, funding information etc.)
Scholarly analytics entity
true
Any type of annotation that is relevant to scholarly analytics (e.g. citations, funding information etc.)
Scholarly analytics entity
Scientific unit
Scientific value
A component that performs analysis tasks based on a script
Script-based analyser
Script based analysis
The task/process of analysing a resource following a script
Script-based analysis
The act of looking for a unit of text in a dataset
Search
A component that segments a text into structural units (chapters, paragraphs, sentences, words, tokens etc.)
Segmenter
Semantic annotation
Any type of annotation pertaining to the semantic level
Semantic annotation type
A component that annotates the tokens of a text with semantic features
Semantic annotator
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.semantics.type.SemanticField
Semantic type
A division of words into classes based on their common semantic features
Semantic class
Semantic class labeling [en-US]
The task/process of classifying words in a text according to a set of semantic classes (types).
Semantic class labelling
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.semantics.type.SemanticPredicate
The inference of logical consequences from a set of asserted facts or axioms [from Wikipedia]
Semantic reasoning
Semantic relation labeling [en-US]
The task/process of attaching tags indicating the semantic relation that holds between units of a text.
Semantic relation labelling
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.semantics.type.SemanticArgument
Semantic role labeling [en-US]
The task/process of attaching labels that correspond to the roles that the arguments of a predicate take in an event
Semantic role labelling
A type of search that seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.
Semantic search
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence
Alignment at sentence level
Sentence alignment
Sentence similarity assessment
A component that splits a text into sentences
Sentence splitter
Segmentation into sentences
Sentence segmentation
The task/process of recognizing and tagging sentence boundaries in a text
Sentence splitting
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.sentiment.type.StanfordSentimentAnnotation
Opinion
The affective state (judgement, feeling) of a person or group towards an entity or event
Sentiment
Opinion extraction
Opinion mining
Polarity detection
Sentiment detection
The task/process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral
Sentiment analysis
Opinion mining tool
A component that tries to identify sentences that express the author’s negative or positive feelings on something
could also be an annotator
Sentiment analyzer
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-SerializedCas
The CAS is the native data model used by UIMA; there are various ways of saving CAS data, using XMI, XCAS, or binary formats; this is for the serialized format
Serialized CAS
SGML format
SGML
The task/process of generating videos in sign language
Sign language generation
The task/process of recognizing from images and videos hand gestures performed in sign language
Sign language recognition
A component that outputs a simpler rendition of a given item (sentence, text etc.)
Simplifier
Any kind of annotation that pertains to entities of social sciences; the use of TheSoz is recommended
Social sciences entity
A technology that supports the development of software components and data resources required for their operation
Software development environment
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.solr-asl
Solr format
Solr
Sound event annotation
Sound-to-Text alignment
Spatial role labeling
The task of identifying and classifying the spatial arguments of the spatial expressions mentioned in a sentence
Spatial role labelling
Speaker diarisation
Partitioning of audio stream into homogeneous segments according to speaker properties and acoustic conditions
Speaker diarization
SR
Speaker Recognition
VR
Voice Recognition
The task/process of identifying/recognizing who the person speaking is
Speaker identification
Speaker recognition, which can be classified into identification and verification, is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.
The task/process of verifying that a certain person is speaking
Speaker verification
A set of animals or plants in which the members have similar characteristics to each other and can breed with each other
Species
Spectral data is essentially data derived by the use of spectroscopic instruments
Spectral data
A speech act is an act that a speaker performs when making an utterance, including the following: (a) A general act (illocutionary act) that a speaker performs, analyzable as including: the uttering of words (utterance acts), making reference and predicating (propositional acts), and a particular intention in making the utterance (illocutionary force). (b) An act involved in the illocutionary act, including utterance acts and propositional acts, (c) The production of a particular effect in the addressee (perlocutionary act) [http://www.glossary.sil.org/term/speech-act]
Speech act
Speech analysis
Speech analytics
Speech annotation
Speech annotation type
Speech-assisted video control
Speech Enhancement
The improvement of speech intelligibility by removing background noise from the speech signal
The improvement of speech intelligibility by removing background noise from the speech signal. Due to the complexity of speech acoustics and perception, no simple mathematical error criterion can be applied; instead, algorithms and measures need to be developed which accommodate human perception.
Speech Processing Collection
Speech Processing Collection
ASR
Automatic speech recognition
Computer speech recognition
SR
Voice recognition
Speech Recognition
Automatic Speech Recognition deals with automatically transcribing spoken language as text, which is further processed in application-dependent ways. Important applications are dictation, control of machines and devices by speech, information systems, speech translation, aids for disabled persons. An increasingly important application is embedded speech recognition in devices such as mobile phones and PDAs.
The use of computer hardware and software-based techniques to identify and process human voice, used to identify the words a person has spoken or to authenticate the identity of the person speaking into the system [adapted from https://www.techopedia.com/definition/6044/automatic-speech-recognition-asr]
SDR
SR
Spoken Document Retrieval
Speech Retrieval is the process of retrieving spoken audio material (documents) in response to a search query. Search queries can be spoken or textual. Speech retrieval makes use of techniques from speech recognition, natural language understanding and information retrieval. Possible applications are the indexing of archives of broadcast material, and monitoring of telephone conversations.
Speech Retrieval
Evaluation of Speech Synthesis
Evaluation of speech synthesis traditionally considers intelligibility and naturalness. More recently, expressivity has become an issue with the increasing demand for expressive voices. Due to the multitude of aspects involved, there is no agreed standard for evaluation of speech synthesis systems. Since 2005, the Blizzard Challenge has been an annual joint event for comparing speech synthesis technologies with a common database.
Speech Synthesis Evaluation
The process by which spoken utterances in one language are automatically translated and spoken aloud in another language
Speech-to-Speech translation
Speech-to-Text translation
Speech understanding
Spelling correction
The task/process of checking the accuracy of spelling of a word (and usually correcting it) according to the accepted form
Spell checking
Techniques for the identification of spelling or typing errors in textual documents, which may be applied interactively during the creation of the document, or off-line for existing documents. Spelling correction is an extension in which for each assumed error one or several hypothetical corrections are suggested.
A component that corrects spelling mistakes in a text
Spelling checker
CTS
Concept-to-Speech generation
SLG
Whereas the generation of spoken language from semantic representations can be sequentialized into generation of text followed by text-to-speech, using text as an intermediate representation may lose information that was available in the original input, in particular related to information structure relevant for speech output. An integrated solution can avoid this problem and thereby lead to improved quality and/or simpler system architecture.
Spoken Language Generation
Spoken language understanding is the task of assigning a semantic interpretation to a spoken input, by making use of syntactic and semantic knowledge, which is often specific to a particular application domain. Spoken language understanding can involve dealing with multiple recognition hypotheses from a speech recogniser, taking prosodic properties of utterances into account and having to deal with fragmentary and grammatically incorrect utterances. Commercial speech recognisers often are accompanied by analysis tools. In commercial applications such as dialogue systems and command and control, spoken language understanding generally involves the construction of a task and domain-specific interpretation of the utterance.
Spoken Language Understanding
The extraction of a subject’s reaction to a claim made by a primary actor [http://nlpprogress.com/english/stance_detection.html]
Stance detection
SLM
Statistical Language Modeling
Statistical Language Modelling
"R: Statistical learning primer", https://www.pressreader.com/australia/linux-format/20161025/283626759591113
The process of using statistics and techniques related to statistics in order to understand and learn from your data so that you can predict its future.
Statistical Learning Method
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Stem
A component that extracts stems from words in a text, usually by removing the most common morphological and inflectional endings from words
Stemmer
The task/process of cutting off the ends of words (mainly inflectional affixes but sometimes also derivational affixes) aiming to relate words to a base form.
Stemming
Segmentation
The task/process of segmenting a text and recognizing textual structural units (paragraphs, sentences, words etc.)
Structural annotation
Any type of annotation that pertains to the structure of a document
Structural annotation type
Argument structure
Subcategorisation frame
The number and types of syntactic arguments required by a certain lexical item (mainly verbs, but also nouns and adjectives)
Subcategorization frame
The linguistic expression of somebody’s opinions, sentiments, emotions, evaluations, beliefs, speculations (private states, i.e. states that are not open to objective observation or verification). [http://www.mavir.net/docs/JWiebe-Subjectivity-nov2010.pdf]
Subjectivity
ATS
Automatic Summarisation
Automatic Summarization
Automatic Text Summarisation
Automatic Text Summarization
Summarisation
TS
Text Summarisation
Text Summarization
The task/process of reducing one or more textual documents with a computer program in order to create a summary that retains the most important points of the original document(s).
Summarization
Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks). Source: Mani, I. and Maybury, M. T., editors (1999), Advances in Automatic Text Summarization. MIT Press, Cambridge, Massachusetts.
A component that produces a natural language synopsis of a longer text
Summarizer
Helper
A component that provides support to developers
Support component
Support of crowdsourcing tasks
Any operation that can support tasks that are accomplished through crowdsourcing
Collection of data, their transformation and organization into crowdsourcing units; automatic generation of reusable crowdsourcing interfaces for specific tasks (e.g. annotation)
Any operation that is used to support LT tasks, either for creating workflows or for executing them
Support operation
Support operation Collection
Support operation Collection
Any operation that intends to support the creation, curation or use of knowledge resources
Support operation for knowledge resources
Segmentation of a word into syllables
Syllable segmentation
A specialized structure or junction that allows cell to cell communication [http://www.biology-online.org/dictionary/Synapse]
Synapse
Syntactic Structure Generation
A link between the syntactic unit and the semantic unit (sense) of a word
Syntactico-semantic link
https://www.tensorflow.org/guide/saved_model
TensorFlow SavedModel
Table question answering
Any format based on columns
Tabular format
Tagset conversion
Talking head synthesis
Task-oriented text analysis evaluation
https://www.tbxinfo.net/tbx-about/
TermBase eXchange (TBX) is an international standard (ISO 30042:2019) for the representation of structured concept-oriented terminological data, copublished by ISO and the Localization Industry Standards Association (LISA) [wikipedia]
tbx
https://weblicht.sfs.uni-tuebingen.de/weblichtwiki/index.php/The_TCF_Format
Text Corpus Format
An XML data exchange format developed within the WebLicht architecture to facilitate efficient interoperability between the tools; it allows the various linguistic annotations produced by the tools within WebLicht to be stored in one document; it supports incremental enrichment of linguistic annotations at various levels of analysis in a stand-off XML‐based format
TCF
The method used by a TDM/LT algorithm
TDM/LT Method
TDM/LT Method Taxonomy
TDM/LT Method Taxonomy
http://www.tei-c.org/index.xml
https://www.iana.org/assignments/media-types/application/tei+xml
Text Encoding Initiative
Data format for TEI-encoded (Text Encoding Initiative) texts
TEI
A linguistic expression (word, group of words, group of numbers etc.) that denotes time (a point in time, duration, frequency)
Temporal expression
The task/process of identifying temporal expressions (also called timex) in a text in order to extract temporal information
Temporal expressions recognition
Annotation of time
The task/process of determining the boundaries of units of text that denote time-related concepts and attaching to them tags that name these concepts (e.g. date, time, duration etc.)
Temporal expressions labelling
A term is a designation consisting of one or more words representing a general concept in a special language in a specific subject field [ISO 704:2009]
Term
Corpus based term recognition
Term recognition
Terminology extraction
Terminology recognition
The act/process of identifying and extracting candidate terms from a domain-specific corpus
Term extraction
Terminology extractor
A component that tries to extract terms from a corpus
Term extractor
Terminology management
Annotation of terms
Term labelling
Terminology markup
Term search
Terminology search
https://en.wikipedia.org/wiki/TeX
Data format for documents using TeX (a typesetting system)
TEX
txt
plain text
Default value for the format of textual files; a textual file should be human-readable and must not contain binary data
text/plain
Text analysis
The task/process of converting unstructured text and data into high-quality structured data that can be further analysed to extract knowledge, support decision making etc.
Text and Data Analytics
DM
Data Mining
TDM
TM
Text Data Mining
Text Mining
Text and Data Mining
Text data mining concerns the application of data mining (knowledge discovery in databases, KDD) to unstructured textual data. The goal of data mining is to discover or derive new information from data, finding patterns across datasets, and/or separating signal from noise. Core text mining algorithms decompose text into meaningful chunks that can then be used for true data mining purposes.
The automated processing of unstructured text and/or structured data leading to the extraction of previously hidden knowledge.
Linguistic annotation
The task/process of adding annotations (notes or comments) to a text; in TDM, the annotations refer mainly to the interpretative linguistic information grounded in a knowledge resource that is added manually or automatically to a text
Text annotation
Document categorisation
Document categorisation
Document categorization
Document classification
Text categorisation
Text classification
The task/process of assigning documents into classes or categories
Text categorization
Methods for text compression identify and exploit redundancy in text documents in order to obtain a more condensed representation of the information, from which the original data can be recovered without modification (lossless compression). In theory, there is a close relation between compression and prediction: The better a statistical language model can estimate the probability of a word, given some context, the more the text as a whole can be compressed.
Text compression
A cryptosystem or cipher system is a method of disguising messages so that only certain people can see through the disguise. Cryptography is the art of creating and using cryptosystems. Cryptanalysis is the art of breaking cryptosystems---seeing through the disguise even when you're not supposed to be able to. Cryptology is the study of both cryptography and cryptanalysis.
Text encryption
Text generation
Text indexing
Text Processing Collection
Text Processing Collection
Text similarity checking
Textual similarity assessment
Textual similarity checking
The task of determining the degree of closeness between two pieces of text
Text similarity assessment
The task of replacing textual segments (phrases, clauses, etc.) with other segments that have similar meaning but are easier to understand
Text simplification
it could also be a composite concept created by three separate concepts, but for now ok
Text/Speech/Data analytics
Text to Image generation
Text to Text Generation
Entailment
Textual entailment
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.tgrep-gpl
Format for TGrep2 (search engine for searching syntactic parse trees represented as bracketed structures)
TGrep2
Theoretical frame
http://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/TIGERSearch/doc/html/TigerXML.html
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.tiger-asl
The TIGER XML format was created for encoding syntactic constituency structures in the German TIGER corpus. It has since been used for many other corpora as well. TIGERSearch is a linguistic search engine specifically targeting this format. The format has later been extended to also support semantic frame annotations.
Tiger XML
Tika
http://www.ttt.org/oscarstandards/tmx/tmx14-20020710.htm
Translation Memory Exchange
The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process.
TMX
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token
Segmentation
Segmentation and Tokenisation
Segmentation and Tokenization
Tokenisation
The task/process of recognizing and tagging tokens (words, punctuation marks, digits etc.) in a text
Tokenization
Tokenization is commonly seen as an independent process of linguistic analysis, in which the input stream of characters is segmented into an ordered sequence of word-like units, usually called tokens, which function as input items for subsequent steps of linguistic processing. Tokens may correspond to words, numbers, punctuation marks or even proper names. The recognized tokens are usually classified according to their syntax. Since the notion of tokenization seems to have different meanings to different people, some tokenization tools fulfil additional tasks like for instance sentence boundary detection, handling of end-line hyphenations or conjoined clitics and contractions.
A component that recognizes and tags tokens (words, punctuation marks, digits etc.) in a text
Tokenizer
The subject of a text or conversation, what it is about
Topic
TD
TDT
TT
Topic Detection and Tracking
Topic Tracking
Topic classification
Topic extraction
The task/process of identifying the topic of a text or dataset (e.g. by clustering keywords or using topic models)
Topic Detection
Topic Detection and Tracking (TDT) refers to automatic techniques for discovering, threading, and retrieving topically related material in streams of data.
A component that guesses the topic of a text
Topic extractor
ML models trainer
ML trainer
A component that is used in training models for machine learning
Trainer of Machine Learning models
Adapted from http://homepages.inf.ed.ac.uk/lzhang10/slm.html
The task/process of training (statistical) language models that can estimate the distribution of natural language as accurately as possible.
Training of language models
adapted from http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html
The task/process of creating Machine Learning (ML) models by providing an ML algorithm with training data that help the algorithm discover patterns in data, and construct the appropriate models using these discoveries
Training of Machine Learning models
Training of Neural Machine Translation models
Training of NMT models
Speech transcription
Speech-to-Text conversion
Transcription
Transcription
The task/process of expressing the sense of a sequence of words in one language into another language
Translation
Translation memory management
Translation project management
Translation Technologies Collection
Translation Technologies Collection
Script conversion
Transliteration is the process of transferring a word from the alphabet of one language to another. [vocabulary.com]
Transliteration
Truth detection
Truth labeling
The task of assigning a true/false tag to a news item, headline, claim, etc.
Truth labelling
Tab-separated values
Format for files with tab-separated values
TSV
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.tuepp-asl
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Tuepp
Format of the Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z) XML files; TüPP D/Z (http://www.sfs.uni-tuebingen.de/de/ascl/ressourcen/corpora/tuepp-dz.html) is a collection of articles from the German newspaper taz (die tageszeitung) annotated and encoded in an XML format.
Tuepp
https://www.w3.org/TR/turtle/
Textual syntax for RDF that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes.
Turtle
Formats used for the UIMA CAS (Common Analysis System) objects
UIMA CAS format
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-de.tudarmstadt.ukp.dkpro.core.io.json-asl
UIMA serialisation in JSON
UIMA/JSON
Unicode-conform tokenizing
User authentication
The task/process of confirming that a system/data resource meets the specifications and fulfills its intended purpose
Validation
A component used to confirm that a system/resource meets the specifications and fulfills its intended purpose
Validator
Variables detection component
A component that tries to identify variables (in social sciences) in a text
Variables detector
Verbal aggression classification
Verbal aggression detection
Verbal aggression identification
The task/process of recognizing evidence and the target(s) of verbal aggression usually in social media texts
Verbal aggression analysis
Verbal attack target
The target of verbal attacks
Verbal aggression target
Any format used for video files
Video format
A component that supports humans in accessing the contents of a resource
Viewer
The task/process of viewing the contents of a resource as performed by human beings
Viewing
A component or interface that renders the contents of a resource in a graphic way for visualisation purposes
Visualiser
Presentation and Visualisation
Visualisation
The representation of an object, situation, or set of information as a chart, diagram or any other image that helps end users understand the contents or message
Visualization
http://archive-access.sourceforge.net/warc/
https://en.wikipedia.org/wiki/Web_ARChive
Web ARChive format
https://catalog.ldc.upenn.edu/LDC2006T13
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-Web1T
File format used by the Web1T n-gram corpus, a huge collection of n-grams collected from the internet.
Web1T
https://www.w3.org/TR/annotation-model/
A structured model and format to enable annotations to be shared and reused across different hardware and software platforms.
Web annotation format
Wheat-related species
Superclass for wiki formats
Wiki format
true
Superclass for wiki formats
Wiki formats
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-WikipediaArticle
Format for Wikipedia articles
Wikipedia article
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaArticleInfo
Format for general article information
Wikipedia article info
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaDiscussion
Format for Wikipedia discussion pages
Wikipedia discussion
Formats used for Wikipedia
Wikipedia format
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaLink
Format for Wikipedia links
Wikipedia link
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaPage
Format of Wikipedia pages in the database (articles, discussions, etc.)
Wikipedia page
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaQuery
Reads all article pages that match a query created by the numerous parameters of this class.
Wikipedia query
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaRevision
Format for Wikipedia revision pages
Wikipedia revision
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaRevisionPair
Pairs of adjacent revisions of all articles
Wikipedia revision pair
https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-WikipediaTemplateFilteredArticle
Format for Wikipedia pages that contain or do not contain the templates specified in the template whitelist and template blacklist
Wikipedia template filtered article
Alignment at word level
Word alignment
The task/process of segmenting (cutting) a word into root and affixes
Word segmentation
http://dkpro.github.io/dkpro-core/releases/1.8.0/docs/typesystem-reference.html#de.tudarmstadt.ukp.dkpro.core.api.semantics.type.WordSense
Corresponds to the structural part of a lexical entry that contains the relevant semantic, grammatical, and anthropological information for a lexical unit. [adapted from http://www.glossary.sil.org/term/sense]
Word sense
WSD
The task/process of identifying which sense of a word with multiple meanings is used in a particular context; the selection of the sense is made from a list of the word's senses.
Word Sense Disambiguation
Word Sense Disambiguation is a subtask of semantic tagging, which consists of assigning a semantic class (sense) to a lexical item as specified by a semantic lexicon. If the semantic lexicon specifies more than one sense for a particular lexical item, a disambiguation procedure is needed to decide upon the most appropriate sense(s) for any given instance of the lexical item in text. WSD is not a self-contained application, but it may be included as an integrated part of a semantic processor.
A component that tries to identify which sense of a word (i.e. meaning) is used in a sentence, when the word has multiple meanings
could also be an annotator and a support component (used in a lot of processes)
Word sense disambiguator
WSI
The task of automatically identifying the senses of a word
Word sense induction
A component that writes processing results in various formats
Writer
The task/process of producing the output results of a process/workflow in various formats
Writing
Data format for documents and corpora using the XCES standard (Corpus Encoding Standard for XML), cf. http://www.xces.org/
XCES
xces; format-variant=ilsp
A variant of XCES implemented for documents
XCES ILSP variant
https://www.w3.org/TR/xhtml1/
html
Data format for XHTML (Extensible HyperText Markup Language)
XHTML
https://www.iana.org/assignments/media-types/application/vnd.xmi+xml
xmi
Data format for the XML Metadata Interchange (XMI), which is an Object Management Group (OMG) standard for exchanging metadata information via Extensible Markup Language (XML)
XMI
Superclass for grouping together XML formats
XML
http://bioc.sourceforge.net/
BioC is a simple format to share text data and annotations.
XML BioC
https://www.w3.org/TR/xpath/
https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Core%20Documentation%20(GitHub)/de.tudarmstadt.ukp.dkpro.core$de.tudarmstadt.ukp.dkpro.core.doc-asl/doclinks/5/format-reference.html#format-XmlXPath
XPath is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer.
XPath
Zero-Shot classification
zip
zip
aif
aif
application/anno+json
application/emma+xml
application/fastinfoset
application/json
application/ld+json
application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"
application/msword
application/pdf
application/pls+xml
application/postscript
application/rdf+xml
application/rtf
application/tei+xml
application/tika
application/vnd.msExcel
application/vnd.ms-powerpoint
application/vnd.msaccess
application/vnd.oasis.opendocument.presentation
application/vnd.oasis.opendocument.spreadsheet
application/vnd.oasis.opendocument.text
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.openxmlformats-officedocument.wordprocessingml.document
application/vnd.xmi+xml
application/x-SDL-TM
application/x-latex
application/x-msaccess
application/x-tex
application/x-tmx+xml
application/x-xces+xml
application/x-xml+alvis
application/x-json-twitter
application/x-kaf+json
application/x.org.dkpro.graf+xml
application/x.org.dkpro.negra3
application/x.org.dkpro.negra4
application/x.org.dkpro.reuters21578+sgml
application/x.org.dkpro.tgrep2
application/x.org.dkpro.tiger+xml
application/x.org.dkpro.tuepp+xml
application/x.org.dkpro.uima+binary
application/x.org.dkpro.uima+json
application/xhtml+xml
application/xml
application/xml+alto
application/zip
audio/aiff
audio/basic
audio/x-flac
audio/flac
audio/MPA
audio/mpa-robust
audio/mpeg
audio/mpeg3
audio/vnd.wave
audio/wav
audio/wave
audio/x-mpeg3
audio/x-wav
avi
Audio Video Interleave
AVI
au
snd
basic
bmp
Bitmap Graphics
BMP
flac
both mimetypes appear at various pages (e.g. europeana supported mimetypes https://pro.europeana.eu/page/media-formats-mime-types) but none of them is found in IANA official pages
flac
gif
Graphics Interchange Format
GIF
image/bmp
image/gif
image/jpeg
image/png
image/svg+xml
image/tiff
jpeg
jpg
JPG
JPEG
mp3
audio mp3
mp4
MP4
MPEG-4
mpg
audio mpg
png
Portable Network Graphics
PNG
svg
Scalable Vector Graphics
SVG
text/csv
text/html
text/plain
text/sgml
text/tab-separated-values
text/tcf+xml
text/turtle
text/x-cochrane
text/x-pubmed
text/x-anafora
text/x-brat
text/x-eaf+xml
text/x.org.dkpro.conll-2000
text/x.org.dkpro.conll-2002
text/x.org.dkpro.conll-2003
text/x.org.dkpro.conll-2006
text/x.org.dkpro.conll-2008
text/x.org.dkpro.conll-2009
text/x.org.dkpro.conll-2012
text/x.org.dkpro.conll-u
text/x.org.dkpro.imscwb
text/x.org.dkpro.ngram
text/x.org.dkpro.ptb-chunked
text/x.org.dkpro.ptb-combined
text/x.org.dkpro.reuters21578
text/xml
tif
tiff
Tagged Image File Format
TIFF
video/x-msvideo
video/mp4
wav
audio wav
A clause is a subdivision of a sentence containing a subject (argument) and predicate. It is possible to have a word that implies or refers to a predicate rather than one explicitly stated. [Pei & Gaynor 1980: 40, http://linguistics-ontology.org/gold/2010/Clause]
Clause
A word or group of words that function as a single unit in a syntactic structure
Constituent
Co-reference
Coreference is the reference in one expression to the same referent in another expression. [http://www.glossary.sil.org/term/coreference]
As defined here it refers more to the phenomenon than the actual annotation types; referent and co-referent might be more appropriate
Coreference
A text unit that denotes a date, a specific point in time
Date
A type of syntactic relation that holds between linguistic units, where we try to recognise the head (governor) and its dependents
Dependency
Any type of annotation relevant to discourse
Discourse annotation type
Document information
Document metadata
Any kind of annotation that is used to describe a document (e.g. identifier, size, location, language etc.)
Document annotation type
Semantic relation
A relation holding between two or more words based on their meanings
Lexical semantic relation
Grammatical category
Grammatical feature
Morphosyntactic feature
Property of a word that is expressed in its inflected form; examples include person, tense, gender, case etc.
Morphological feature
A division of a text, usually about a single theme, consisting of one or more sentences and marked by a new line, indentation or other conventions.
Paragraph
Grammatical category
Morphosyntactic category
Word category
A division of words based on common grammatical features
Part of Speech
A phrase is a syntactic structure that consists of more than one word but lacks the subject-predicate organization of a clause. [http://www.glossary.sil.org/term/phrase]
Phrase
A feature that distinguishes between positive, negative or neutral; in sentiment analysis, it refers to determining whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. [adapted from Wikipedia]
Polarity
Frame
A schematic representation of a situation involving various participants, props and other conceptual roles, each of which is a frame element
Semantic frame
A semantic role is the underlying relationship that a participant has with the main verb in a clause [http://www.glossary.sil.org/term/semantic-role]
Semantic role
A group of words, usually containing a verb, that expresses a thought in the form of a statement, question, instruction, or exclamation and starts with a capital letter when written [https://dictionary.cambridge.org/dictionary/english/sentence]
Sentence
Synthetic Speech Generation
Speech Synthesis
A stem is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added. [http://www.glossary.sil.org/term/stem]
Stem
Any type of annotation that pertains to the syntactic level
Syntactic annotation type
TTS
Text to Speech Synthesis
The task/process of converting natural language text into speech
Text-to-Speech Synthesis
The generation of synthetic speech from text. Typically, a text-to-speech synthesis system performs a text analysis using natural language processing techniques; determines the appropriate phonetic string and prosodic features; and generates a speech signal by employing a concatenative or rule-based synthesis method.
A set of characters surrounded by spaces or punctuation marks, as well as punctuation marks themselves
Token
true
A word is a unit which is a constituent at the phrase level and above. It is sometimes identifiable according to such criteria as (a) being the minimal possible unit in a reply, (b) having features such as a regular stress pattern, and phonological changes conditioned by or blocked at word boundaries, (c) being the largest unit resistant to insertion of new constituents within its boundaries, or (d) being the smallest constituent that can be moved within a sentence without making the sentence ungrammatical. A word is sometimes placed, in a hierarchy of grammatical constituents, above the morpheme level and below the phrase level. [http://www.glossary.sil.org/term/word]
In annotation, words are often used as equivalent to tokens; thus, for instance, punctuation marks (traditionally not considered as words) will also be annotated as "word".
Word