Applied machine learning developers can choose from many open-source machine learning frameworks, e.g., scikit-learn, Spark MLlib, and ML.NET. Frameworks provide a user-friendly high-level interface to algorithms, but their implementations resort to low-level languages and optimizations. Such low-level C/C++, C#, or Java code is far from the original mathematical notation that is preferable for machine learning research and development. The Semagle framework makes the most of both: low-level C#-like constructs for performance optimization, and high-level semi-mathematical F# notation for joining the algorithm blocks. Modularizing algorithms into fine-grained blocks makes research and development of new implementations for the same family of problems straightforward.
The Semagle framework includes the following principal F# libraries:
- Metrics Library (Semagle.MachineLearning.Metrics)
- Vectors Library (Semagle.Numerics.Vectors)
- SVM Library (Semagle.MachineLearning.SVM)
- Structural SVM Library (Semagle.MachineLearning.SSVM)
Metrics Library
Semagle.MachineLearning.Metrics library implements performance evaluation metrics for classification, regression, and information retrieval:
Accuracy is used for two-class and one-class classification problems.
\[accuracy = \frac{1}{n} \sum_{i=1}^n \mathbb{1}_{predicted_i = expected_i}\]

let accuracy (expected: 'Y[]) (predicted: 'Y[]) =
...
Classification.accuracy test_y predict_y
Recall is used for classification and sequence labeling problems.
\[recall = \frac{\sum_{i=1}^n \mathbb{1}_{predicted_i=expected_i=+1}}{\sum_{i=1}^n \mathbb{1}_{expected_i=+1}}\]

let recall (expected: 'Y[]) (predicted: 'Y[]) =
...
Classification.recall test_y predict_y
Precision is used for classification and sequence labeling problems.
\[precision = \frac{\sum_{i=1}^n \mathbb{1}_{predicted_i=expected_i=+1}}{\sum_{i=1}^n \mathbb{1}_{predicted_i=+1}}\]

let precision (expected: 'Y[]) (predicted: 'Y[]) =
...
Classification.precision test_y predict_y
F1 is used for classification and sequence labeling problems.
\[F_1 = 2 \times \frac{recall \times precision}{recall + precision}\]

let f1 (expected: 'Y[]) (predicted: 'Y[]) =
...
Classification.f1 test_y predict_y
Mean squared error (MSE) is used for regression problems.
\[MSE = \frac{1}{n}\sum_{i=1}^n (predicted_i - expected_i)^2\]let mse (expected: float32[]) (predicted: float32[]) =
...
Regression.mse test_y predict_y
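The snippets above elide the implementations, but usage is uniform. Here is a small hand-made illustration, assuming `float32` labels with `+1`/`-1` encoding for the positive and negative classes:

// hand-picked labels: one of three positives is misclassified
let test_y    = [| +1.0f; -1.0f; +1.0f; +1.0f |]
let predict_y = [| +1.0f; -1.0f; -1.0f; +1.0f |]

Classification.accuracy test_y predict_y   // 3 of 4 match = 0.75
Classification.recall test_y predict_y     // 2 of 3 positives found ≈ 0.67
Classification.precision test_y predict_y  // 2 of 2 predicted +1 are correct = 1.0
Classification.f1 test_y predict_y         // 2 * (2/3 * 1.0) / (2/3 + 1.0) = 0.8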
Vectors Library
Semagle.Numerics.Vectors library provides the `DenseVector` and `SparseVector` vector types. These data types implement the operations required by standard SVM kernels. `SparseVector` is also a basic data type for the Structured SVM implementation.
`DenseVector` is constructed from an array of `float32` values. This type of vector stores both zero and non-zero values and is well suited for feature vectors with many non-zero elements.
let a = DenseVector([| 1.0f; 3.0f; -3.0f; 4.0f; 8.0f |])
`SparseVector` is constructed from an array of `int` indices of non-zero elements and an array of `float32` values of those elements. This type of vector is well suited for feature vectors with few non-zero elements.
let a = SparseVector([|0; 1; 3; 5|], [|1.0f; -2.0f; 4.0f; -6.0f|])
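For example, this `SparseVector` represents the same vector as `DenseVector([| 1.0f; -2.0f; 0.0f; 4.0f; 0.0f; -6.0f |])`, but stores only the four non-zero elements.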
`DenseVector` and `SparseVector` support the following operations:
- indexing `a.[1]` and slicing `a.[1..3]`, `a.[..3]`, `a.[2..]`, but implementations of the `SparseVector` operations are more expensive because they require a binary search to find item indices;
- element-wise addition `a + b`, subtraction `a - b`, multiplication `a * b`, and division `a / b`;
- unary negation `-a`, and multiplication `a * 1.5f` and division `a / 2.0f` by a scalar;
- dot product `a .* b` and Euclidean distance `a ||-|| b`.
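A quick sketch of these operations on two small dense vectors; the commented results follow directly from the definitions above:

let a = DenseVector([| 1.0f; 2.0f; 3.0f |])
let b = DenseVector([| 4.0f; 5.0f; 6.0f |])

let sum  = a + b      // DenseVector([| 5.0f; 7.0f; 9.0f |])
let dot  = a .* b     // 1*4 + 2*5 + 3*6 = 32
let dist = a ||-|| b  // sqrt((1-4)^2 + (2-5)^2 + (3-6)^2) ≈ 5.196
let x    = a.[1]      // 2.0f
let tail = a.[1..]    // DenseVector([| 2.0f; 3.0f |])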
SVM Library
Semagle.MachineLearning.SVM library provides a high-level interface for classification and regression support vector machines and a low-level interface for sequential minimal optimization [1, 2]. All interfaces are data-type agnostic and depend only on kernel functions:
type Kernel<'X> = 'X -> 'X -> float
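For illustration, common kernels are easy to express against this type using the vector operations from the previous section. These are sketches, not the library's built-in kernels; they assume the `.*` and `||-||` operators return values convertible with `float`:

let linear : Kernel<DenseVector> =
    fun x x' -> float (x .* x')

let rbf (gamma : float) : Kernel<DenseVector> =
    fun x x' ->
        let d = float (x ||-|| x')  // Euclidean distance
        exp (-gamma * d * d)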
A high-level interface includes the following models:
- TwoClass for two-class classification;
- OneClass for two-class classification with training on positive samples only;
- MultiClass for multi-class classification;
- Regression for real-valued regression.
type SVM<'X,'Y> =
/// SVM model for two class classification
| TwoClass of Kernel<'X> * ('X[]) * (float[]) * float
/// SVM model for one class classification
| OneClass of Kernel<'X> * ('X[]) * (float[]) * float
/// SVM model for regression
| Regression of Kernel<'X> * ('X[]) * (float[]) * float
/// SVM model for multi-class classification
| MultiClass of Kernel<'X> * (('Y * 'Y * ('X[]) * (float[]) * float)[])
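In each case the model stores the kernel function, the support vectors, their weights, and the bias term; the MultiClass case stores one such tuple per pair of labels.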
Prediction functions for different problems are similar:
module TwoClass =
let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module OneClass =
let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module Regression =
let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
module MultiClass =
let inline predict (model : SVM<'X,'Y>) (x : 'X) = ...
All of them are based on a weighted sum of kernel function values over the support vectors:
module SVM =
let inline predict (x : 'X) (K : Kernel<'X>) (X : 'X[]) (A : float[]) (b : float) =
Array.fold2 (fun sum x_i a_i -> sum + a_i * (float (K x_i x))) b X A
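This fold computes the standard SVM decision value
\[f(x) = \sum_{i=1}^{m} a_i K(x_i, x) + b\]
where $x_i$ are the support vectors, $a_i$ their weights, and $b$ the bias term.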
The matching problem solvers show how to apply the generic sequential minimal optimization solver to specific problems:
Two-class and multi-class problem solvers build classification models for data type `X`. Labels for two-class classifiers are `+1/-1`; for multi-class, they have an arbitrary data type `Y`:
/// Optimization parameters for C_SVC problem
type C_SVC = {
/// The penalty for +1 class instances
C_p : float;
/// The penalty for -1 class instances
C_n : float;
}
/// Two class C Support Vector Classification (SVC) problem solver
let C_SVC (X : 'X[]) (Y : float32[]) (K : Kernel<'X>) (parameters : C_SVC) (options : OptimizationOptions) = ...
/// Multi-class C Support Vector Classification (SVC) problem solver
let C_SVC_M (X : 'X[]) (Y : 'Y[]) (K : Kernel<'X>) (parameters : C_SVC) (options : OptimizationOptions) = ...
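A hypothetical end-to-end sketch for the two-class solver: the tiny dataset is made up, the `options` value is assumed to be a suitable `OptimizationOptions` instance, and `rbf` is the kernel sketched earlier, so consult the samples for the exact setup.

let train_x = [| DenseVector([| 0.0f; 0.0f |]); DenseVector([| 1.0f; 1.0f |]) |]
let train_y = [| -1.0f; +1.0f |]

// options : OptimizationOptions is assumed to be in scope
let model = C_SVC train_x train_y (rbf 0.5) { C_p = 1.0; C_n = 1.0 } options
let label = TwoClass.predict model (DenseVector([| 0.9f; 1.1f |]))  // expect the +1 class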
One-class problem solver builds classification models for data type `X` and `+1/-1` labels for an input dataset with positive samples only.
/// Optimization parameters for One-Class problem
type OneClass = {
/// The fraction of support vectors
nu : float;
}
/// One-Class problem solver
let OneClass (X : 'X[]) (K : Kernel<'X>) (parameters : OneClass) (options : OptimizationOptions) = ...
Regression problem solver builds regression models for data type `X` and `float32` labels.
/// Optimization parameters for C_SVR problem
type C_SVR = {
/// The boundary of the approximated function
eta : float;
/// The penalty
C : float;
}
/// C Support Vector Regression (SVR) problem solver
let C_SVR (X : 'X[]) (Y : float32[]) (K : Kernel<'X>) (parameters : C_SVR) (options : OptimizationOptions) = ...
Sequential minimal optimization solver is a low-level building block for the problem-specific solvers above and the Structured SVM optimization below:
/// General parameters for C_SMO problem
type C_SMO = {
/// Initial feasible values of optimization variables
A : float[];
/// Per-sample penalties
C : float[];
/// The linear term of the optimized function
p : float[];
}
/// Sequential Minimal Optimization (SMO) problem solver
let C_SMO (X : 'X[]) (Y : float32[]) (Q : Q) (parameters : C_SMO) (options : OptimizationOptions) = ...
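These parameters correspond to the standard SMO dual problem solved by LIBSVM-style solvers [1, 2], sketched here for orientation:
\[\min_{a} \; \frac{1}{2} a^T Q a + p^T a \quad \text{subject to} \quad y^T a = \delta, \quad 0 \leq a_i \leq C_i\]
where $\delta$ is fixed by the initial feasible point $A$.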
Structural SVM Library
Semagle.MachineLearning.SSVM library provides a high-level interface for multi-class classification and sequence labeling, and a low-level cutting-plane solver for the “1-slack” structural SVM formulation [3]. All high-level interfaces require two functions:
- `FeatureFunction` creates a feature vector for an arbitrary data type `X`;
- `LossFunction` defines a loss value for predicted `y` and expected `y'` labels of an arbitrary data type `Y`.
/// Simple feature function
type FeatureFunction<'X> = 'X -> Vector
/// Structured SVM loss function type
type LossFunction<'Y> = (* y *) 'Y -> (* y' *) 'Y -> float
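A minimal example of a loss function is the 0/1 loss commonly used for multi-class classification; this is an illustrative sketch, not necessarily a function shipped with the library:

// 0/1 loss: no penalty for a correct label, unit penalty otherwise
let zeroOneLoss : LossFunction<string> =
    fun y y' -> if y = y' then 0.0 else 1.0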
Multi-class classification model builds a linear classification model in the combined feature and label space:
/// Structured SVM model for multi-class classification
type MultiClass<'X,'Y> = MultiClass of FeatureFunction<'X> * (* W *) float[] * 'Y[]
module MultiClass =
/// Optimization parameters for the multi-class structured SVM
type MultiClass<'Y> = {
/// Penalty for slack variables
C : float;
/// Loss function
loss : LossFunction<'Y>
}
/// Learn structured SVM model for multi-class classification
let learn (X : 'X[]) (Y : 'Y[]) (F : FeatureFunction<'X>) (parameters : MultiClass<'Y>) (options : OneSlack.OptimizationOptions) = ...
/// Predict class by multi-class structured model
let predict (MultiClass(F, W, Y) : MultiClass<'X,'Y>) (x : 'X) : 'Y = ...
Sequence labeling model builds a linear classification model in the combined feature and state space:
/// Structured SVM model for sequence labeling
type Sequence<'X,'Y> = Sequence of FeatureFunction<'X> * (* W *) float[] * 'Y[]
module Sequence =
/// Optimization parameters for the sequence structured SVM
type Sequence<'Y> = {
/// Penalty for slack variables
C : float;
/// Loss function
loss : LossFunction<'Y>
}
/// Learn structured SVM model for sequence labeling
let learn (X : 'X[][]) (Y : 'Y[][]) (F : FeatureFunction<'X>) (parameters : Sequence<'Y>) (options : OneSlack.OptimizationOptions) = ...
/// Predict label sequence by the sequence structured SVM
let predict (Sequence(F,W,Y) : Sequence<'X, 'Y>) (X : 'X[]) : 'Y[] = ...
“1-slack” cutting-plane solver is a low-level building block for the problem-specific Structural SVM solvers above. It requires a `JointFeatureFunction` that maps the $\mathcal{X} \times \mathcal{Y}$ space into a sparse vector space (vectors with predominantly zero elements). As a result of optimization, the solver returns a weight vector for the sparse vector space features:
/// Joint feature function type
type JointFeatureFunction<'X,'Y> = (* x *) 'X -> (* y *) 'Y -> SparseVector
/// 1-slack optimization problem solver
let oneSlack (X : 'X[]) (Y : 'Y[]) (JF : JointFeatureFunction<'X,'Y>) (parameters : OneSlack<'X,'Y>) (options : OptimizationOptions) = ...
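For intuition, the standard multi-class joint feature map shifts the sparse feature vector of `x` into a per-label block of the weight vector. The sketch below assumes hypothetical `Indices` and `Values` accessors on `SparseVector`, and the `labels` and `featureDim` values are made up:

let labels = [| "A"; "B"; "C" |]
let featureDim = 1000

let JF : JointFeatureFunction<SparseVector, string> =
    fun x y ->
        // shift the non-zero indices of x into the block of label y
        let offset = featureDim * (Array.findIndex ((=) y) labels)
        SparseVector(x.Indices |> Array.map ((+) offset), x.Values)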
Check the code, documentation, samples, and API reference for more details. Happy hacking!
References
1. Chang, C.-C. & Lin, C.-J. LIBSVM: A Library for Support Vector Machines // ACM Trans. Intell. Syst. Technol. v.2, 2011 - p. 1–39.
2. Fan, R.-E., Chen, P.-H. & Lin, C.-J. Working Set Selection Using Second Order Information for Training Support Vector Machines // J. Mach. Learn. Res. v.6, 2005 - p. 1889–1918.
3. Joachims, T., Finley, T. & Yu, C.-N. J. Cutting-Plane Training of Structural SVMs // Mach. Learn. v.77, 2009 - p. 27–59.