Machine learning appendix
February 1, 2017
To be updated sporadically; suggestions are highly appreciated.
Glossary of common ML terminology
Glossary of common terms and their synonyms (or strongly related terms). $p(x)$ is written in place of $p(\mathring{x} = x)$ for brevity; see “More notation” below.
Term | Common notations and more information (if any) |
---|---|
estimation/prediction/inference of a quantity | the quantity with a “hat”, e.g. $\hat{x}$, $\hat{y}$, $\hat{\theta}$ |
objective function, error function, loss function, cost function | $J(\theta)$; or $L(\theta)$; or $E(\theta)$ |
learning, training; related: parameter estimation | finding $\hat{\theta}$; see the worked example below |
evaluating, forward pass (as in “computing some quantity given estimates of all unknown quantities”) | note: not to be confused with performance evaluation |
score function; related: (inverse) link function | $f(x)$; or $g^{-1}$ if the link function is $g$ |
neuron/unit; related: dimension (of a vector) | $x_i$, where $i$ is a dimension of vector $x$ |
feature; related: independent variable, explanatory variable, predictor | $x_i$, where $i$ is a dimension of vector $x$. note: in practice “feature” mostly means the feature vector, so the vector $x$ is used instead |
mid-/high-level feature, hidden/latent variable; related: hidden neurons/units | $h$; or $z$; or $\mathring{z}$ if assumed a random variable |
target, label; related: dependent variable, response variable | $y$; or $t$ |
observed data/variable(s) | $x$; or $D$; or $x$ and $y$ if supervised learning |
unobserved data/variable(s) | $z$; or $h$; or $\theta$ |
data point | $x$; or $x_i$ (if working with more than 1 data point); or $X$ if it represents a set of data points |
parameter; related: weight, bias (as the “offset” from the origin) | $\theta$, $w$, $b$, respectively. note: “bias” is an umbrella term with other common meanings (e.g. the bias of an estimator) |
parameterized function | $f(x; \theta)$ |
parametric distribution | $p(x; \theta)$; or $p_\theta(x)$, where $p$ is the density function |
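To tie several of these rows together, a small worked example (my own illustration, not from the original table): take the parametric distribution to be a univariate Gaussian,

$$p(x;\theta) = \mathcal{N}(x \mid \mu, \sigma^2), \qquad \theta = (\mu, \sigma^2).$$

Learning/training is then parameter estimation, e.g. maximum likelihood over observed data points $x_1, \dots, x_N$:

$$\hat{\theta} = \arg\max_\theta \sum_{i=1}^{N} \log p(x_i;\theta) \quad\Rightarrow\quad \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})^2,$$

and evaluating (the forward pass) is computing the value $p(x;\hat{\theta})$ at a given point $x$.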
More notation
For brevity, $p(x)$ is often seen in ML literature instead of $p_X(x)$ as in statistical texts. I also use an overhead circle, e.g. $\mathring{x}$, to denote a random variable, in place of capital letters e.g. $X$, $N$, $K$, which are reserved for either sets of data points, matrices, or total numbers of samples/classes/features/…
Notation | Description |
---|---|
$\mathring{x}$ | A random variable. Thus $p(\mathring{x})$ implies a probability distribution. |
$x$ | A realization of random variable $\mathring{x}$. Thus $p(x)$ is a value. |
$\mathcal{X}$ | Set of all possible realizations $x$ of random variable $\mathring{x}$, i.e. the sample space of $\mathring{x}$. |
$\hat{x}$ | An estimate of random variable $\mathring{x}$. |
$p(\hat{x})$ | Short for $p(\mathring{x} = \hat{x})$. We use the notation $\hat{x}$ to not confuse with $x$, which is reserved for the realization provided by training data $D$. |
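As a concrete reading of this table, a minimal sketch using `scipy.stats` (the library choice and the numbers are my own illustration):

```python
# A "frozen" distribution object plays the role of the random variable's
# whole distribution, while evaluating its density at a realization x
# gives a plain number p(x).
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0)   # the distribution itself (of the random variable)
x = 0.5                        # one realization of the random variable
print(p.pdf(x))                # p(x) is a value: a single density number
print(p.rvs(size=3))           # sampling draws new realizations from the distribution
```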
Common machine learning model acronyms
Acronym | Description |
---|---|
PGM | Probabilistic Graphical Models, i.e. probabilistic models
GLM | Generalized Linear Models |
GMM, PPCA | (Gaussian) Mixture Models, Probabilistic Principal Component Analysis
HMM, LDS | Hidden Markov Models, Linear Dynamical Systems (for modelling sequential data) |
Topic Models | Latent Dirichlet Allocation (LDA - not to be confused with Linear Discriminant Analysis) and variants |
DNN | Deep Neural Networks |
MLP, CNN | Multi-layer Perceptrons, Convolutional NNs; both are FNNs (Feed-forward NNs)
RNN | Recurrent NNs; the acronym also covers Recursive NNs and Bi-directional RNNs (for modelling sequential data)
EBM | Energy-based Models (undirected PGM) |
RBM, DBN, DBM | Restricted Boltzmann Machines, Deep Belief Networks (not to be confused with Dynamic Bayesian Networks), Deep Boltzmann Machines |
VAE, DRAW, AIR | Variational Auto-encoder, Deep Recurrent Attentive Writer, Attend-Infer-Repeat
GAN | Generative Adversarial Networks |
AAE | Adversarial Auto-encoders |
SVM | Support Vector Machines |
DP, GP | Dirichlet Processes, Gaussian Processes |
Tree, RF | Decision Trees, Random Forests |
kNN | k-Nearest Neighbours |
Back-propagation algorithm
Conditions: (i) $J$ is differentiable everywhere w.r.t. $\theta$; (ii) $J$ and all $\theta$’s form a directed acyclic computational graph. For what: computes the gradient $\nabla_\theta J$ for every $\theta$ of interest.
Use case(s): updating the parameters $\theta$ of a neural network (or a certain class of probabilistic models) towards optimal values via gradient descent algorithms.
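To make the conditions and the use case concrete, a minimal sketch in numpy: a one-hidden-layer network whose cost $J$ is differentiable everywhere w.r.t. every parameter, whose forward pass forms a DAG, and whose parameters are updated by gradient descent. The architecture, the parameter names (`W1`, `b1`, `w2`, `b2`) and the toy data are my own illustration, not anything prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 100 data points with 3 features each.
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Parameters theta = (W1, b1, w2, b2).
W1 = 0.1 * rng.normal(size=(3, 8))
b1 = np.zeros(8)
w2 = 0.1 * rng.normal(size=8)
b2 = 0.0

lr = 0.1
for step in range(200):
    # Forward pass: evaluate the computational graph (a DAG).
    a = X @ W1 + b1                  # pre-activations
    h = np.tanh(a)                   # hidden units
    y_hat = h @ w2 + b2              # predictions \hat{y}
    J = np.mean((y_hat - y) ** 2)    # differentiable cost J(theta)

    # Backward pass (back-propagation): apply the chain rule from J
    # down through the graph, yielding dJ/dtheta for every parameter.
    dy = 2.0 * (y_hat - y) / len(y)  # dJ/dy_hat
    dw2 = h.T @ dy                   # dJ/dw2
    db2 = dy.sum()                   # dJ/db2
    dh = np.outer(dy, w2)            # dJ/dh
    da = dh * (1.0 - h ** 2)         # through tanh: d tanh(a)/da = 1 - tanh(a)^2
    dW1 = X.T @ da                   # dJ/dW1
    db1 = da.sum(axis=0)             # dJ/db1

    # Gradient descent update: theta <- theta - lr * dJ/dtheta.
    W1 -= lr * dW1; b1 -= lr * db1
    w2 -= lr * dw2; b2 -= lr * db2

print(f"final cost J = {J:.4f}")
```

The same gradients would normally come from an automatic differentiation library; writing them out by hand just makes the chain-rule traversal of the DAG explicit.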