R

Parallel Monte Carlo: Simulating Compound Poisson Processes using C++ and TBB

In this post we implement a function to simulate random samples of a compound Poisson random variable. A random variable \(L\) is a compound Poisson (CP) random variable if there exists a Poisson random variable \(N\) and a random variable \(S\) such that

Data and their misbehavior

To be honest, I use the clickbaity word “data” in the title when I really mean “sample statistics”. The point of this post is first illustrated using a sample mean, but applies to any estimate computed from data.

Expectation Maximization, Part 2: Fitting Regularized Probit Regression using EM in C++

In the first post in this series we discussed Expectation Maximization (EM) type algorithms. In the previous post we discussed regularization and showed how it leads to a bias-variance trade-off in OLS models.

In Machine Learning, why is Regularization called Regularization?

Many newcomers to machine learning know about regularization, but they may not understand it yet. In particular, they may not know why regularization has that name. In this post we discuss the numerical and statistical significance of regularization methods in machine learning and in statistical models more generally.

Passing expressions and data from R to C++ before compile-time in Rmarkdown

In this post we give a simple illustrative example of how data generated by R code can be used by compiled languages such as C++ at compile time, rather than at run time, inside Rmarkdown.

Deriving Principal Component Analysis and implementing in C++ using Eigen

Principal component analysis is one of the most commonly used techniques in statistical modeling and machine learning. In typical applications it serves as a (linear) dimensionality reduction technique, allowing one to project high-dimensional data onto a lower-dimensional subspace.
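
To make the projection idea concrete, here is a minimal sketch for 2-D data, under a few assumptions. It uses only the C++ standard library and plain power iteration on the sample covariance matrix, not Eigen, so it is not the implementation from the post itself. The names `Point`, `Pca1D`, `fit_pca_1d` and `project` are hypothetical, chosen for illustration.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Point = std::array<double, 2>;

// Hypothetical container for a one-component PCA fit.
struct Pca1D {
    Point mean;       // sample mean of the data
    Point direction;  // unit vector approximating the leading principal direction
};

// Fit the leading principal component of 2-D data (assumes at least two points).
// Power iteration is an illustrative choice; it converges when the starting
// vector is not orthogonal to the leading eigenvector of the covariance matrix.
Pca1D fit_pca_1d(const std::vector<Point>& data, int iterations = 100) {
    const double n = static_cast<double>(data.size());

    Point mean{0.0, 0.0};
    for (const Point& p : data) { mean[0] += p[0]; mean[1] += p[1]; }
    mean[0] /= n;
    mean[1] /= n;

    // Entries of the symmetric 2x2 sample covariance matrix.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point& p : data) {
        const double dx = p[0] - mean[0];
        const double dy = p[1] - mean[1];
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    sxx /= (n - 1.0);
    sxy /= (n - 1.0);
    syy /= (n - 1.0);

    Point v{0.8, 0.6};  // arbitrary unit-length starting vector
    for (int k = 0; k < iterations; ++k) {
        const Point w{sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1]};
        const double norm = std::hypot(w[0], w[1]);
        if (norm == 0.0) break;  // degenerate case: no variance along v
        v = {w[0] / norm, w[1] / norm};
    }
    return {mean, v};
}

// Project each centered point onto the fitted direction (2-D -> 1-D scores).
std::vector<double> project(const Pca1D& pca, const std::vector<Point>& data) {
    std::vector<double> scores;
    scores.reserve(data.size());
    for (const Point& p : data) {
        scores.push_back((p[0] - pca.mean[0]) * pca.direction[0] +
                         (p[1] - pca.mean[1]) * pca.direction[1]);
    }
    return scores;
}
```

Calling `fit_pca_1d` on points that lie along the line \(y = 2x\) should give a direction proportional to \((1, 2)\), and `project` then reduces each point to a single coordinate along that line. For higher dimensions a library eigensolver would be the more natural choice.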