Gene Ontology (GO)
A widely used resource that provides an ontology organizing current scientific knowledge about the functions of genes from many different organisms, ranging from humans to bacteria, in the form of gene annotations. GO contains a controlled vocabulary of terms and defines the relationships between them across three aspects: the diverse types of biological functions of gene products (Molecular Function), the pathways that carry out different biological programs (Biological Process), and the locations where these occur (Cellular Component).
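GO terms and their relationships form a hierarchy (formally a directed acyclic graph). As a result, annotating a gene to a specific term also implies annotating it to all of that term's more general ancestors, which GO calls the true path rule. The Python sketch below illustrates that propagation on a small, simplified fragment of the Cellular Component hierarchy. The term list, the use of plain parent links only (real GO also has relations such as `part_of`), and the helper names `ancestors` and `propagate` are hypothetical and for illustration only. They are not real GO data or an official GO API.

```python
# Simplified, hypothetical fragment of the GO Cellular Component hierarchy.
# Each term maps to its list of parent terms (generalization links only).
PARENTS = {
    "cellular_component": [],
    "organelle": ["cellular_component"],
    "membrane-bounded organelle": ["organelle"],
    "nucleus": ["membrane-bounded organelle"],
    "cytoplasm": ["cellular_component"],
}


def ancestors(term: str) -> set[str]:
    """Return every term reachable by following parent links upward from `term`."""
    seen: set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def propagate(annotations: set[str]) -> set[str]:
    """Expand a gene's direct annotations with all of their ancestor terms."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term)
    return expanded


# A hypothetical gene annotated directly to "nucleus".
print(sorted(propagate({"nucleus"})))
```

A query for genes annotated to "organelle" would therefore also return this gene, even though it was only annotated to "nucleus" directly.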
Genetic polymorphism
Variant in the DNA sequence between individuals of the same species that is found in the population with a frequency of more than 1% (less frequent variants are called mutations).
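The 1% frequency cut-off in this definition can be illustrated with a short calculation. The Python sketch below is a minimal, hypothetical example. The function name `classify_variant` and its parameters are assumptions. It also assumes frequency is measured over alleles (chromosome copies) in a sample, which is one common way to compute it.

```python
def classify_variant(variant_alleles: int, total_alleles: int, threshold: float = 0.01) -> str:
    """Label a DNA sequence variant by its frequency in a sample.

    Variants observed at a frequency above `threshold` (1% by default) are
    labeled polymorphisms; rarer ones are labeled mutations, following the
    convention in the glossary definition.
    """
    if total_alleles <= 0 or not 0 <= variant_alleles <= total_alleles:
        raise ValueError("need total_alleles > 0 and 0 <= variant_alleles <= total_alleles")
    frequency = variant_alleles / total_alleles
    return "polymorphism" if frequency > threshold else "mutation"


# Hypothetical counts from 500 diploid individuals (1,000 alleles).
print(classify_variant(30, 1000))  # 3% of alleles -> polymorphism
print(classify_variant(5, 1000))   # 0.5% of alleles -> mutation
```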
Genomics
Discipline focused on the study of the structure and function of genomes.
Grid search
Strategy used to search for the best hyperparameters of a learning algorithm by evaluating every possible combination of a set of user-defined candidate values.
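A minimal Python sketch of exhaustive grid search follows. It assumes a user-supplied scoring function that should be maximized, which in practice would typically be a cross-validated performance metric. The names `grid_search` and `toy_score`, and the hyperparameter names in the grid, are hypothetical. Libraries such as scikit-learn provide the same idea as `GridSearchCV`.

```python
from itertools import product


def grid_search(score, grid):
    """Evaluate `score` on every combination of values in `grid`; return the best.

    `grid` maps each hyperparameter name to a list of candidate values.
    `score` takes a dict of hyperparameters and returns a number to maximize.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # product() yields every combination: one value per hyperparameter.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical scoring function standing in for, e.g., cross-validated accuracy.
def toy_score(p):
    return -(p["depth"] - 4) ** 2 - (p["min_samples"] - 10) ** 2


grid = {"depth": [2, 4, 8], "min_samples": [1, 10, 50]}
print(grid_search(toy_score, grid))
```

The number of evaluations is the product of the number of candidate values per hyperparameter (3 × 3 = 9 here). This is why grid search becomes expensive as more hyperparameters or candidate values are added.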
Heatmap
Graphical representation of data in matrix form in which each cell is colored according to the value it contains.
High dimensionality
Condition in which a learning problem has a large number of input variables.
High-throughput sequencing
Also called next-generation or massively parallel sequencing, a method used to sequence thousands to millions of different DNA fragments simultaneously.
Hold-out or Retention
Validation or partitioning technique in which the available data are divided into two disjoint sets: a training set used to fit the model and a test set used to evaluate it.
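A minimal Python sketch of a hold-out split follows. It assumes the samples can be shuffled before splitting and that 30% of them are held out for testing. The function name `holdout_split` and its defaults are hypothetical choices. A fixed seed makes the split reproducible.

```python
import random


def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle `samples` and split them into disjoint training and test sets."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global state
    n_test = round(len(shuffled) * test_fraction)
    # Every sample lands in exactly one of the two sets.
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(10), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```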
Hyperplane
A flat subspace with one dimension fewer than the space it lies in (a line in two dimensions, a plane in three), used as a decision boundary for classifying samples. Samples (or data points) on either side of the hyperplane are predicted as belonging to different classes.
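In a linear classifier, a hyperplane can be written as the set of points x satisfying w · x + b = 0. Here w is a weight vector perpendicular to the hyperplane and b is a bias term, and the predicted class depends on the sign of w · x + b. The Python sketch below is a hypothetical illustration. The names `predict`, `w` and `b`, the class labels +1 and -1, and the choice to assign points lying exactly on the boundary to +1 are all assumptions.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w·x + b = 0 point x lies."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Points exactly on the hyperplane (activation == 0) are assigned +1 by convention here.
    return 1 if activation >= 0 else -1


# Hypothetical 2-D example: the "hyperplane" is the line x1 + x2 = 1.
w, b = (1.0, 1.0), -1.0
for point in [(2.0, 2.0), (0.0, 0.0), (1.0, 0.0)]:
    print(point, predict(w, b, point))
```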
Impurity or Gini index
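Metric used in the construction of a decision tree to determine the (variable, threshold) pair that best separates the two classes of the problem. It is 0 for a node containing samples of a single class and increases as the classes become more mixed. The candidate split whose child nodes have the lowest weighted impurity is selected.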
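For a node with class proportions p_k, the Gini impurity is G = 1 - Σ_k p_k². This gives 0 for a pure node and 0.5 for a node split evenly between two classes. The Python sketch below computes G and then evaluates candidate thresholds on a single numeric variable, choosing the one that gives the lowest size-weighted impurity across the two child nodes. It is a simplified, hypothetical illustration. The function names, the toy data, and the convention of using observed values as thresholds (samples with `x <= t` go left) are assumptions. A full tree would repeat this search over every variable to find the best (variable, threshold) pair.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Try each observed value of one variable as a threshold; return (threshold, weighted Gini)."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must produce two non-empty child nodes
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best[1]:
            best = (t, weighted)
    return best


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = ["A", "A", "A", "B", "B", "B"]
print(gini(ys))            # 0.5
print(best_split(xs, ys))  # (3.0, 0.0)
```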