This is me

Welcome to my technology blog

Sometimes I take a few hours over the weekends or in the airports to try new things. I love technology and the randomness of the topics below is largely driven by my curiosity and the experiments I have done during these periods. All the code is written by me and its only purpose is learning, experimentation and fun. When analyzing data, I either use public sources or generate it myself to play with a certain algorithm.

Posts

May 19, 2022
Timeseries Analysis with PyTorch

PyTorch is a widely used machine learning library, has an beautiful pythonic syntax and, above all, runs extremely fast on my M1 MacBook with no hacking required to make it run. I write this post following the steps I made to learn the library, by roughly translating the Time series forecasting with TensorFlow tutorial to PyTorch, while making changes to it along the line to satisfy my curiosity.
Aug 14, 2021
Timeseries Analysis in Python

This post is about statistical models for timeseries analysis in Python. We will cover the ARIMA model to a certain depth.
Feb 26, 2021
Simulated Annealing in Go

As Go is quickly becoming my favourite programming language, in this post we switch gears and implement an optimization algorithm - simulated annealing. We will solve the travelling salesman problem and, in the process, we will build a desktop app and a bare bones charting library.
Jan 23, 2021
First Steps In Go - WebSockets

These are my first steps in Go, this time learning how to extend my previous web service with WebSockets. The brower subscribes to changes to a set of products by sending one or more Subscribe or Unsubscribe JSON messages to the service, through a WebSocket connection. Each message contains a series of product IDs. The server maintains the map connection - subscriptions and listens to notifications on product changes from a Postgres database. The post also touches HTTPS, HTTP/2 and server push.
Jan 12, 2021
First Steps In Go - Web Services

These are my first steps in Go, this time learning how to build web services. The post touches handling requests, json serialization, middleware, logging, database access and concurrency. Websockets and templates will be covered in a future post.
Jan 9, 2021
First Steps In Go

My first steps in Go, largely based on the Golang tutorial and Internet side searches.
Oct 24, 2020
Programming Problems In C++

This post is a collection of several programming problems implemented in C++. It’s good to keep the edge sharp by solving some algorithmic problems from time to time.
Apr 15, 2020
3D From Scratch

This post is about implementing a 3D renderer from scratch, with no help from any graphics or maths library. It is implemented in pure JavaScript and it follows roughly the first half of the excellent tiny renderer tutorial.
Apr 15, 2020
WebGL Fun

A post about computer graphics, for the web mostly, with JavaScript, WebGL, ThreeJS and shaders. A little bit of maths also.
Apr 4, 2020
Introduction to Cryptography (Part 3)

This is the third part of Introduction to Cryptography. The post covers the Java APIs that implement the same algorithms that we spoke about in the previous posts, symmetric and asymmetric encryption, as well as digital signatures. We also talk a little bit about password security and the principle behind rainbow tables.
Mar 29, 2020
Introduction to Cryptography (Part 2)

This is the second part of Introduction to Cryptography. The post covers symmetric encryption with cypher block chaining, the principles behind DES and AES, asymmetric encryption with RSA and a little bit on validating authenticity.
Mar 28, 2020
Introduction to Cryptography

A very brief introduction to cryptography.This post covers one-time pads, a little bit of random numbers and the Diffie-Hellman algorithm.
Feb 29, 2020
Reverse Engineering Expected Goals

In this post we are going to look again at the 2018-2019 Premier League season and try to reverse engineer bookmakers odds in order to obtain the expected goals for each match. Then, we are going to try to improve on these models and reduce our reliance on bookmakers odds. In the process, I will present two ways of implementing the Poisson regression in Python - one from scratch and one based on the the statsmodel library.
Dec 24, 2019
Stats Again

A summary of statistics notions not found anywhere else on this blog. The post touches among others: some descriptive statistics measures, hypothesis testing, goodness of fit, Lasso and Ridge regression.
Dec 1, 2019
Decision Trees

A short introduction to decision trees (CART). Decision trees are a supervised machine learning technique used for classification and regression problems. Since the launch of XGBoost, it has been the preferred way of tackling problems at machine learning competitions and not only, because of easy setup, learning speed, straightforward tuning parameters, easy to reason about results and, above all, very good results. Arguably, with current implementations, decision trees outperform many of the other machine learning techniques.
Sep 4, 2019
Streaming Architecture Using Apache Kafka And Redis

This post describes an architecture based on Apache Kafka and Redis that can be applied to building high performing, resilient streaming systems. It applies to near-realtime systems, where a stream of events needs to be processed and the results submitted to a large list of subscribers, each of them receiving its own view of the stream.
Jul 26, 2019
Odds And Models

Experiments with sports prediction models. It starts from a multiplicative expected goals model, then it computes the odds for home draw away markets in two different ways, and then compare the performance of these odds to the bookmakers’ odds.
Jan 22, 2019
Elixir Intro

Some fun programming over the weekend, reading a little bit about Elixir, a very beautiful and succinct language. I hope, Elixir, someday we will meet for an exciting project together.
Jan 6, 2019
MMalloc

A small, few hours project, which started from the idea of creating an allocator for persistent memory that can be used to load existing memory structures from disk on process restart. It never got that far in the few hours spent on it, but it was fun to write so here it goes. Clearly, not something that I would ever put in a production project, as any memory corruption would propagate after process restart, but the idea of a location independent allocator is fun nevertheless.
Oct 15, 2018
Postgres

An introduction to PostgreSQL, including an example of full text indexing and search at the end of the article.
Aug 24, 2018
Logistic Regression

While linear regression is about predicting effects given a set of causes, logistic regression predicts the probability of certain effects. This way, its main applications are classification and forecasting. Logistic regression helps find how probabilities are changed by our actions or by various changes in the factors included in the regression.
Aug 22, 2018
Dimensionality Reduction

This post is about dimensionality reduction. It starts with PCA, a method for finding a set of axes (dimensions) along which to perform regression such that the most relevant information in data is preserved, while multicollinearity is avoided. It then applies PCA and K-Means to a dataset of Premier League player performances, in order to obtain relevant groupings despite randomness in data.
Aug 17, 2018
Linear Regression Done Right

In several of my previous posts I wrote about linear regression, but in none I wrote about when and how to use it correctly and how to interpret the results. Computing a simple regression line is easy, but applying it blindly will surely lead to incorrect results.
Aug 15, 2018
Risk Models

A not-so-short discussion about finanical risk, starting from computing the risk for a portfolio of stocks and discussing which ideas might be applied to modelling risk for a sportsbook. Many points presented here have not been tested in production or on real datasets.
Jul 26, 2018
Probability Notes

A short introduction to probability. A collection of notes I took for myself, published here.
Dec 3, 2017
Common Design Patters Implemented In Python [WIP]

An unfinished post aiming to cover the implementation in Python of the most commonly used design patterns.
Nov 27, 2017
Domain Driven Design Notes

A very high level introduction to Domain Driven Design (DDD). DDD is a useful technique if the application has complex business functionality. It will not help with big data, performance or hardware specific optimizations.
Oct 29, 2017
Risk-based Odds Adjustment Using Bayesian Inference

This post attempts to sketch a solution to a problem a bookmaker might have: starting from a predefined set of odds, how do we adjust them so that the risk for the bookmaker is minimized? We take into consideration the financial stakes the bettors have in the game, the total financial risk the house is willing to take and will only analyse the 1x2 market (home-draw-away), to keep the code clean.
Aug 11, 2017
Hazelcast Intro

A short introduction to Hazelcast: what it is, what scenarios is can be used for, how to run a test environment.
Jun 11, 2017
Apache Kafka

This article covers running a Kafka cluster on a development machine using a pre-made Docker image, playing around with the command line tools distributed with Apache Kafka and writing basic producers and consumers. The source code associated with this article can be found here
Jun 3, 2017
RabbitMQ Patterns and Considerations

A short intro into RabbitMQ and its C# client. The code which merges most of the concepts from this article can be found here.
May 23, 2017
Classification

A post about classification and metrics used for validating classification models.
May 14, 2017
P-Values And Hypothesis Testing

This post is about p-values and hypothesis testing. What they are, why they are needed, how to compute them and how to use them. It also includes a worked example, how to validate that an A/B test indeed produces a significant outcome. The article follows quite closely a chapter from “Data Science from Scratch” by Joel Grus and it is annotated with my own research and observations.
Apr 30, 2017
Symbolic Maths in Python

Ability to perform symbolic computations is a crucial component of any mathematics-oriented package. Symbolic mathematics is used to work with complex expressions, sets and probabilities, perform integrals or derivatives, plot charts based on user input, all without explicit numeric computations. This way, the Python interpreter becomes very much like a piece of paper on which one can jot down equations. To exemplify these, by the end of the article I will implement a short gradient descent function to demonstrate the power of sympy to code easy-to-work-with generic algorithms.
Apr 27, 2017
Short Intro To Reading System Behavior Charts

Intervention in a system based on incorrect assumptions about data generated by it demonstratably leads to worst future behavior than in the case of non-intervention. However, in most cases, our bias to action or external pressure force us to make changes, leaving the system in a worst shape. Assuming a system is producing consistent data, how do we know when or if to intervene? This article introduces the notion of System Behavior Charts which is tightly connected to any methodology that has the aim of continuous improvement.
Apr 3, 2017
Nature Inspired Optimizations

This post is about optimization algorithms. I am going to present the general structure of a nature-inspired algorithm which blends several methods in a single code base: population evolution (mutation) based on a stochastic process, elite selection and maintenance and variable mutation rate to escape local optima.
Mar 25, 2017
Simple and Multiple Linear Regression

This is the second part of my Machine Learning notebook. It talks about simple and multiple linear regression, as well as polynomial regression as a special case of multiple linear regression. It provides several methods for doing regression, both with library functions as well as implementing the algorithms from scratch.
Mar 18, 2017
Statistics, continued

In this post I tackle basic algorithms for computing probability density functions and cumulative distribution functions, as well as generating random numbers according to a distribution. Afterwards, I will compute the Gini index (G) and the entropy (H) coefficients, which are used to measure how much a distribution differs from the uniform distribution. Towards the end, I will touch the topic of validation with two methods: bootstrapping and k-fold cross validation.
Mar 11, 2017
My Machine Learning Notebook - Part 1 (Data Preprocessing)

This is the first part of my Machine Learning notebook. As I am totally new to ML, the content follows the Udemy course “Machine Learning A-Z™: Hands-On Python & R”. As in the course, I’ll be using Spyder and RStudio.
Mar 9, 2017
C++ Play - Green Threads in C++

In this post I will write about green threads and provide a sample implementation in C++ - or, better say, in C++ mixed with assembly language. The code is highly compiler-dependant (VC++17) and fragile, thus not production-ready. It makes assumptions about stack layout and data in registers and even something as simple as turning on optimizations will most likely crash it. But it was a fun programming exercise for a concept often implemented in asynchronous libraries or found in some programming language constructs. Interesting links: Fibers, Coroutines, Set Context, Actors, Cooperative multitasking
Mar 5, 2017
Docker And Containers Part 3

This is the third part of my personal Docker cheatsheet. The content includes: cleaning up the system, running an SQL Server Windows container, connecting to it through the SQL Server Management Studio and using volumes for storing container persistent data (in this post saving mysql databases). The last part will be dedicated to Docker Compose.
Feb 25, 2017
Docker And Containers Part 2

This is the second part of my personal Docker cheatsheet. In this part I will enable host file access to containers and continue with the more complex example of building and debugging Linux C++ applications in Visual Studio, using a Debian container in the background.
Feb 20, 2017
C++ Play - Multithreading (Queues)

In this post I am going to build a multithreaded queue to exemplify various issues regarding synchronization. I am going to measure the cost of locking and thread contention and provide some ideas to improve performance. I am going to build the tests incrementally, refining the solution as I progress through the code, but I will also skip some steps to keep the post meaningfully short.
Feb 12, 2017
Docker And Containers

In this blogpost I write about Docker. My personal Docker cheatsheet. So why Docker? Because I am interested in distributed computing and I want to play with various technologies. I want to be able to spin quickly complex setups and, at the same time, keep my desktop as clean as possible. The good news is that I can find all the software I need as docker containers. Even SQL Server or IIS, if I want. Beside a huge collection of sofware already configured and the means of keeping tidy your deployments, Docker offers process isolation / sandboxing, simplified updates and ability to build your own repositories. This post also touches Windows Containers running on Windows Nano Server or Windows Server Core - new features available in the Docker for Windows space.
Feb 11, 2017
C++ Play - Multithreading Part 2

In this post I touch a little bit synchronization primitives and write two different implementations for a reader-writer lock. The code comes from an earlier project, started before std::shared_lock was introduced to the standard. Beside the standard library, RW locks are part of Windows SDK as Slim Reader/Writer (SRW) Locks and on Linux, with pthreads. There is no need (and not recommended) to implement them manually, as I do in the code below. But since this is a tech blog and and it is meant to play around with technology, here are two possible implementations.
Feb 4, 2017
C++ Play - Multithreading

In this post I am planning to build upon C++ 11/14 features introduced in the previous posts and play around with threads and synchronization. I am going to start with the threading support introduced in the standard library, then move forward to Windows-specific code. I will use Windows code to exemplify how to build a minimal type-safe threading library using the new C++ 11/14, similar to the standard one.
Jan 29, 2017
C++ Play - my own short guide to C++11/14 (Part 2)

In the previous blog post I covered a lot of ground: variadic templates, improvements to classes, improvements to standard library, smart pointers and lambdas. We also briefly touched move semantics as well as decltype and declval. In this part I am going to continue with move, delegated constructors, user defined literals and static_assert. In the next article I will write about local storage for threads while briefly touching thread support in the new standard library.
Jan 28, 2017
C++ Play - my own short guide to C++11/14

C++ has evolved dramatically in the past years. It is virtually a new language, a huge improvement over the C++ I used to practice in the 2000s. Much more expressive templates, richer libraries, more emphasis on compile-time safety and performance improvements. Far from the C with classes in the early 2000s and far from the convoluted template constructs which appeared later to improve compile-time issue detection and type safety. To refresh my memory and also get up to speed with the new trends, I have spent a couple of hours digging through the docs, doing some tutorials. Here are my notes.
Jan 26, 2017
Welcome to Jekyll!

I was looking for a simple blogging platform that provided for my following needs.