Over the last 10-15 years, machine learning has gradually moved from academia to industry. Applied developers can choose from many open source frameworks, such as scikit-learn and Spark MLlib, as well as a large body of R code. Yet experimenting with and developing machine learning algorithms remain difficult. Functional programming languages like Haskell, OCaml and F# have an appealing semi-mathematical notation that greatly simplifies implementing machine learning algorithms, but the resulting code may fail to meet the required performance constraints. Semagle F# Framework is a successful experiment that demonstrates how to write functional code and refine it for both clarity and performance, and its code is now available on GitHub.
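To illustrate the kind of notation meant here, the Gaussian (RBF) kernel `K(x, z) = exp(-gamma * ||x - z||^2)` translates almost symbol for symbol into F#. The sketch below is a hypothetical dense-array version written only for illustration; it is not Semagle's `Kernel.rbf`, which works on the framework's own vector types.

```fsharp
// Squared Euclidean distance ||x - z||^2 between two dense vectors
// (Array.fold2 throws if the vectors have different lengths)
let sqDist (x: float32[]) (z: float32[]) : float32 =
    Array.fold2 (fun acc xi zi -> acc + (xi - zi) * (xi - zi)) 0.0f x z

// Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)
let rbf (gamma: float32) (x: float32[]) (z: float32[]) : float32 =
    exp (-gamma * sqDist x z)

// Two points at squared distance 2.0, so K = exp(-0.1 * 2.0)
let k = rbf 0.1f [| 1.0f; 0.0f |] [| 0.0f; 1.0f |]
printfn "K = %f" k
```

Each line of the function body corresponds directly to a term of the formula, which is the property that makes F# attractive for this kind of code.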
At the current stage, Semagle F# Framework provides only a vector library and SVM classification and regression libraries. However, the SVM library is not tied to numeric vectors: it works with any object type that has a suitable kernel. F# is a very concise language, and the training and test phases of a typical SVM classifier look like this:
```fsharp
#r "Semagle.Numerics.Vectors.dll"
#r "Semagle.Numerics.Vectors.IO.dll"
#r "Semagle.MachineLearning.SVM.dll"

open LanguagePrimitives
open System
open Semagle.Numerics.Vectors
open Semagle.Numerics.Vectors.IO
open Semagle.MachineLearning.SVM

// Read a LIBSVM-format file into parallel arrays of labels and feature vectors
let readData file = LibSVM.read file |> Seq.toArray |> Array.unzip

// Training and test files are passed as script arguments
let train_y, train_x = readData fsi.CommandLineArgs.[1]
let test_y, test_x = readData fsi.CommandLineArgs.[2]

// Train a C-SVC with an RBF kernel using SMO with second-order working set selection
let svm = SMO.C_SVC train_x train_y (Kernel.rbf 0.1f)
              { C_p = 1.0f; C_n = 1.0f; epsilon = 0.001f;
                options = { strategy = SMO.SecondOrderInformation;
                            maxIterations = 1000000;
                            shrinking = true; cacheSize = 200<MB> } }

// Predict a label for every test sample
let predict = TwoClass.predict svm
let predict_y = test_x |> Array.map predict

// Percentage of test samples whose predicted label matches the true label
let accuracy =
    let correct = Array.zip test_y predict_y
                  |> Array.sumBy (fun (t, p) -> if t = float32 p then 1 else 0)
    100.0 * (DivideByInt (float correct) (Array.length test_y))

printfn "Accuracy: %f" accuracy
```
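The claim that the SVM library works with any object type that has a suitable kernel can be illustrated with a self-contained sketch. It assumes, for illustration only, that a kernel is an ordinary function `'X -> 'X -> float32` and that a trained two-class model boils down to support vectors, their coefficients `alpha_i * y_i` and a bias `b`, giving the standard decision function `f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)`. The `Model` record, `decide`, `bigrams` and `bigramKernel` names are hypothetical and are not part of Semagle's API; the bigram (spectrum) kernel is just one simple valid kernel on strings, and the model coefficients are made up rather than trained.

```fsharp
// A kernel is just a function; no interface has to be implemented
type Kernel<'X> = 'X -> 'X -> float32

// Hypothetical two-class model: support vectors, alpha_i * y_i coefficients and bias b
type Model<'X> =
    { SupportVectors: 'X[]
      Coefficients: float32[]   // alpha_i * y_i for each support vector
      Bias: float32
      KernelFn: Kernel<'X> }

// Decision function f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b), as +1 / -1
let decide (model: Model<'X>) (x: 'X) : int =
    let score =
        Array.fold2 (fun acc sv c -> acc + c * model.KernelFn sv x)
                    model.Bias model.SupportVectors model.Coefficients
    if score >= 0.0f then 1 else -1

// Count character bigrams, e.g. "abab" -> map ["ab", 2; "ba", 1]
let bigrams (s: string) : Map<string, int> =
    [ for i in 0 .. s.Length - 2 -> s.Substring(i, 2) ]
    |> List.countBy id
    |> Map.ofList

// Spectrum (bigram) kernel: dot product of the bigram count vectors of two strings
let bigramKernel : Kernel<string> = fun a b ->
    let ma, mb = bigrams a, bigrams b
    ma |> Map.fold (fun acc key ca ->
        match Map.tryFind key mb with
        | Some cb -> acc + float32 (ca * cb)
        | None -> acc) 0.0f

// The same generic decide function works unchanged on strings
let model =
    { SupportVectors = [| "spam offer"; "meeting notes" |]
      Coefficients = [| 1.0f; -1.0f |]
      Bias = 0.0f
      KernelFn = bigramKernel }

printfn "%d" (decide model "special offer")
```

Swapping in a different data type only requires another function of the matching kernel type; `decide` itself never changes, which mirrors the design choice of not forcing data types and kernels to implement a common interface.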
The SVM library is designed for extensibility (it works with any data type and kernel function) and for performance (data types and kernels do not need to implement common interfaces); as a consequence, it does not provide model serialization support. Performance on .NET Core 1.0.1 is quite impressive: training runs about 3 times faster than on Mono 4.4.2 and is about 4 times slower than native LIBSVM 3.20 (see the detailed macOS and Windows results):
Time (seconds) | LIBSVM 3.20 Native | Mono 4.4.2 | .NET Core 1.0.1 |
---|---|---|---|
train | 56.497 | 702.595 | 234.270 |
test | 12.706 | 88.464 | 36.199 |
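The relative figures behind these timings can be worked out directly from the table. The short script below simply restates the measured seconds and computes how many times slower each managed runtime is than native LIBSVM 3.20; `timings` and `slowdowns` are illustrative names only.

```fsharp
// Measured times in seconds from the table: (native LIBSVM 3.20, Mono 4.4.2, .NET Core 1.0.1)
let timings =
    [ "train", (56.497, 702.595, 234.270)
      "test",  (12.706, 88.464, 36.199) ]

// Slowdown factor of each managed runtime relative to native LIBSVM
let slowdowns =
    timings
    |> List.map (fun (phase, (native, mono, core)) -> phase, mono / native, core / native)

for (phase, mono, core) in slowdowns do
    printfn "%s: Mono %.1fx, .NET Core %.1fx slower than native" phase mono core
```

For training this gives roughly 12.4x for Mono and 4.1x for .NET Core; for testing, roughly 7.0x and 2.8x.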
Semagle F# Framework is not just another piece of code: it comes with documentation, samples and an API reference for developers. Happy hacking!