Semagle

Semagle F# Framework

Applied machine learning developers have many open-source machine learning frameworks to choose from, e.g., scikit-learn, Spark MLlib, and ML.NET. These frameworks provide a user-friendly high-level interface to algorithms, but their implementations resort to low-level languages and optimizations. Such low-level C/C++, C#, or Java code is far from the original mathematical notation that is preferable for research and development of machine learning algorithms. Semagle Framework makes the most of low-level C#-like constructs for performance optimization and of high-level semi-mathematical F# notation for joining the algorithm blocks. Splitting algorithms into fine-grained blocks makes research and development of new implementations for the same family of problems straightforward.

Building Trees from Materialized Paths

The materialized path, or path enumeration, model stores the path to a tree node as a string or a list by concatenating the keys of the nodes along the path. Simple node inserts and removals, together with simple queries for children, other descendants, the parent, and other ancestors, make the materialized path model attractive for large-scale applications. Fetching the rows of a subtree or of an entire tree in a single query is easy. An obvious way to restore the structure of a tree or subtree is to build a key-value map and append each child to its parent. This approach requires additional memory and relies on mutable data structures. Instead, it is enough to sort the rows in nesting order and build the tree structure by adding children to the previous parent.
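As a rough illustration of the sorting approach, here is a minimal F# sketch. It assumes that each path is already split into a list of keys and that the path of every node's parent is also present in the input. The names Tree, Node, and buildForest are hypothetical and are not taken from the original post.

```fsharp
// Hypothetical tree type: a node key plus its children.
type Tree = Node of string * Tree list

// Assumption: each path is a list of keys from the root down to the node,
// and every node's parent path is also present in the input.
let buildForest (paths: string list list) : Tree list =
    // Read consecutive rows that sit at `depth`; return the nodes built
    // from them together with the rows that were not consumed.
    let rec siblings depth rows =
        match rows with
        | path :: rest when List.length path = depth + 1 ->
            // Rows that follow immediately and are one level deeper are children.
            let children, afterChildren = siblings (depth + 1) rest
            let others, remaining = siblings depth afterChildren
            Node (List.last path, children) :: others, remaining
        | _ -> [], rows
    // Structural list comparison places a parent before all of its descendants.
    paths |> List.sort |> siblings 0 |> fst

let forest =
    buildForest [ ["a"; "b"]; ["e"]; ["a"]; ["a"; "b"; "c"]; ["a"; "d"] ]
// forest = [Node ("a", [Node ("b", [Node ("c", [])]); Node ("d", [])]); Node ("e", [])]
```

The sketch uses no map and no mutable state: after sorting, the rows arrive in pre-order, so the depth of each row alone determines whether it is a child or a sibling of the previous node. The recursion is not tail-recursive, so very deep or very wide trees would need a different formulation.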

PassCard - create stories instead of passwords

The idea of PassCard was born from the frustration with password management solutions. Technology should make our lives easier and better, and pervasive online services help us greatly, but we do not feel free anymore. Each time we read an email on parents’ notebooks or pay in the store with another card, we need to check our phones or write down passwords on a piece of paper.

Simple DSL for Logging in F#

Sooner or later, the printfn style of logging becomes too cumbersome, and you start searching for a logging library. For F# developers, the most obvious choice is Logary, but you soon find that your Logary logging code is even less readable. In this article, you will find the F# tricks that helped me create a neat logging abstraction for the Semagle Framework.

Semagle F# Framework (Beta)

Over the last 10-15 years, machine learning has gradually moved from academia to industry. Many open-source frameworks, such as scikit-learn and Spark MLlib, and hundreds of lines of R code are available to applied developers. Yet experimenting with and developing machine learning algorithms remain difficult. Functional programming languages like Haskell, OCaml, and F# have an appealing semi-mathematical notation that greatly simplifies the implementation of machine learning algorithms, but it is not obvious that they can meet the required performance constraints. Semagle F# Framework is a successful experiment that demonstrates how to create and refine functional code for clarity and performance, and its code is now available on GitHub.

Optimization of F# implementation of SVM

The post “SVM Classification in F#” shows how quickly SVM classification and the Sequential Minimal Optimization (SMO) method can be implemented in F#, but it doesn’t show how fast the F# implementation is. Unfortunately, the performance of that code is too low for practical applications. Apparently, there are intrinsic limitations of the .NET execution model and the Mono virtual machine that prevent it from achieving native-code performance, and this overhead needs to be estimated. However, the main factor is the computational complexity of the implementation, and this problem can be solved.

SVM Classification in F#

Support Vector Machines (SVMs) are a very popular machine learning method for classification, regression, distribution estimation, etc. An exceptional feature of this method is its ability to handle objects of a diverse nature as long as there is a suitable kernel function. Nonetheless, popular software libraries like LIBSVM and SVMlight are designed for vector data, and it is hard to adapt them to other object types. An F# implementation seems promising in terms of readability and extensibility.

Random Numbers in F#

Many problems in engineering, finance, and statistics cannot be solved by direct methods, but a great number of them can be solved approximately using randomized algorithms. All those algorithms need flexible and efficient pseudo-random number generators (PRNGs). An efficient implementation of a PRNG in F# is somewhat tricky.

Summary statistics in F#

Summary statistics are commonly used to build a simple quantitative description of a set of observations. Simple descriptions include the mean, variance, skewness, and kurtosis, which are quantitative measures of the location, spread, and shape of the data distribution. However, straightforward implementations of these measures in F# do not scale to large amounts of data. There are more sophisticated methods, but imperative implementations of those methods use mutable variables. Nonetheless, the mathematical definitions of those methods make it possible to build efficient functional implementations using higher-order functions in F#.
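To make the idea concrete, here is a minimal sketch of a single-pass mean and sample variance written as a fold. It uses Welford's online update, which is one well-known method of this kind and is not necessarily the one used in the post. The function name meanVariance is hypothetical.

```fsharp
// Hypothetical helper: one pass over the data, no mutable variables.
// The fold state is (count, running mean, sum of squared deviations M2).
let meanVariance (xs: seq<float>) =
    let step (n, mean, m2) x =
        let n' = n + 1.0
        let delta = x - mean
        let mean' = mean + delta / n'
        // Welford's update: M2 grows by delta * (x - updated mean).
        (n', mean', m2 + delta * (x - mean'))
    let n, mean, m2 = Seq.fold step (0.0, 0.0, 0.0) xs
    // Sample variance is undefined for fewer than two observations.
    mean, (if n > 1.0 then m2 / (n - 1.0) else nan)

let mean, variance = meanVariance [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ]
```

Because the state is threaded through Seq.fold, the input can be a lazy sequence that never fits in memory as a whole. Skewness and kurtosis can be handled the same way by carrying higher-order moment sums in the state.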

Data Sources in F#

Three data formats, CSV (Comma-Separated Values), JSON (JavaScript Object Notation), and XML (Extensible Markup Language), are used very frequently in data science. The F# Data library (FSharp.Data) implements almost everything you need to access data stored in CSV, JSON, and XML formats. Moreover, FSharp.Data implements F# type providers that infer the record structure from a sample document and thus allow the record structure to be checked at compile time.

F# for Data Science

How can functional programming and type inference help you manage large amounts of structured and unstructured data, merge multiple data sources and APIs, create visualizations for data interpretation, build mathematical models based on data, and present the resulting insights and findings?