This is me

Welcome to my technology blog

Sometimes I take a few hours over the weekends or in the airports to try new things. I love technology and the randomness of the topics below is largely driven by my curiosity and the experiments I have done during these periods. All the code is written by me and its only purpose is learning, experimentation and fun. When analyzing data, I either use public sources or generate it myself to play with a certain algorithm.

Posts

  • Timeseries Analysis with PyTorch

    PyTorch is a widely used machine learning library, has an beautiful pythonic syntax and, above all, runs extremely fast on my M1 MacBook with no hacking required to make it run. I write this post following the steps I made to learn the library, by roughly translating the Time series forecasting with TensorFlow tutorial to PyTorch, while making changes to it along the line to satisfy my curiosity.

  • Timeseries Analysis in Python

    This post is about statistical models for timeseries analysis in Python. We will cover the ARIMA model to a certain depth.

  • Simulated Annealing in Go

    As Go is quickly becoming my favourite programming language, in this post we switch gears and implement an optimization algorithm - simulated annealing. We will solve the travelling salesman problem and, in the process, we will build a desktop app and a bare bones charting library.

  • First Steps In Go - WebSockets

    These are my first steps in Go, this time learning how to extend my previous web service with WebSockets. The brower subscribes to changes to a set of products by sending one or more Subscribe or Unsubscribe JSON messages to the service, through a WebSocket connection. Each message contains a series of product IDs. The server maintains the map connection - subscriptions and listens to notifications on product changes from a Postgres database. The post also touches HTTPS, HTTP/2 and server push.

  • First Steps In Go - Web Services

    These are my first steps in Go, this time learning how to build web services. The post touches handling requests, json serialization, middleware, logging, database access and concurrency. Websockets and templates will be covered in a future post.

  • First Steps In Go

    My first steps in Go, largely based on the Golang tutorial and Internet side searches.

  • Programming Problems In C++

    This post is a collection of several programming problems implemented in C++. It’s good to keep the edge sharp by solving some algorithmic problems from time to time.

  • 3D From Scratch

    This post is about implementing a 3D renderer from scratch, with no help from any graphics or maths library. It is implemented in pure JavaScript and it follows roughly the first half of the excellent tiny renderer tutorial.

  • WebGL Fun

    A post about computer graphics, for the web mostly, with JavaScript, WebGL, ThreeJS and shaders. A little bit of maths also.

  • Introduction to Cryptography (Part 3)

    This is the third part of Introduction to Cryptography. The post covers the Java APIs that implement the same algorithms that we spoke about in the previous posts, symmetric and asymmetric encryption, as well as digital signatures. We also talk a little bit about password security and the principle behind rainbow tables.

  • Introduction to Cryptography (Part 2)

    This is the second part of Introduction to Cryptography. The post covers symmetric encryption with cypher block chaining, the principles behind DES and AES, asymmetric encryption with RSA and a little bit on validating authenticity.

  • Introduction to Cryptography

    A very brief introduction to cryptography.This post covers one-time pads, a little bit of random numbers and the Diffie-Hellman algorithm.

  • Reverse Engineering Expected Goals

    In this post we are going to look again at the 2018-2019 Premier League season and try to reverse engineer bookmakers odds in order to obtain the expected goals for each match. Then, we are going to try to improve on these models and reduce our reliance on bookmakers odds. In the process, I will present two ways of implementing the Poisson regression in Python - one from scratch and one based on the the statsmodel library.

  • Stats Again

    A summary of statistics notions not found anywhere else on this blog. The post touches among others: some descriptive statistics measures, hypothesis testing, goodness of fit, Lasso and Ridge regression.

  • Decision Trees

    A short introduction to decision trees (CART). Decision trees are a supervised machine learning technique used for classification and regression problems. Since the launch of XGBoost, it has been the preferred way of tackling problems at machine learning competitions and not only, because of easy setup, learning speed, straightforward tuning parameters, easy to reason about results and, above all, very good results. Arguably, with current implementations, decision trees outperform many of the other machine learning techniques.

  • Streaming Architecture Using Apache Kafka And Redis

    This post describes an architecture based on Apache Kafka and Redis that can be applied to building high performing, resilient streaming systems. It applies to near-realtime systems, where a stream of events needs to be processed and the results submitted to a large list of subscribers, each of them receiving its own view of the stream.

  • Odds And Models

    Experiments with sports prediction models. It starts from a multiplicative expected goals model, then it computes the odds for home draw away markets in two different ways, and then compare the performance of these odds to the bookmakers’ odds.

  • Elixir Intro

    Some fun programming over the weekend, reading a little bit about Elixir, a very beautiful and succinct language. I hope, Elixir, someday we will meet for an exciting project together.

  • MMalloc

    A small, few hours project, which started from the idea of creating an allocator for persistent memory that can be used to load existing memory structures from disk on process restart. It never got that far in the few hours spent on it, but it was fun to write so here it goes. Clearly, not something that I would ever put in a production project, as any memory corruption would propagate after process restart, but the idea of a location independent allocator is fun nevertheless.

  • Postgres

    An introduction to PostgreSQL, including an example of full text indexing and search at the end of the article.

  • Logistic Regression

    While linear regression is about predicting effects given a set of causes, logistic regression predicts the probability of certain effects. This way, its main applications are classification and forecasting. Logistic regression helps find how probabilities are changed by our actions or by various changes in the factors included in the regression.

  • Dimensionality Reduction

    This post is about dimensionality reduction. It starts with PCA, a method for finding a set of axes (dimensions) along which to perform regression such that the most relevant information in data is preserved, while multicollinearity is avoided. It then applies PCA and K-Means to a dataset of Premier League player performances, in order to obtain relevant groupings despite randomness in data.

  • Linear Regression Done Right

    In several of my previous posts I wrote about linear regression, but in none I wrote about when and how to use it correctly and how to interpret the results. Computing a simple regression line is easy, but applying it blindly will surely lead to incorrect results.

  • Risk Models

    A not-so-short discussion about finanical risk, starting from computing the risk for a portfolio of stocks and discussing which ideas might be applied to modelling risk for a sportsbook. Many points presented here have not been tested in production or on real datasets.

  • Probability Notes

    A short introduction to probability. A collection of notes I took for myself, published here.

  • Common Design Patters Implemented In Python [WIP]

    An unfinished post aiming to cover the implementation in Python of the most commonly used design patterns.

  • Domain Driven Design Notes

    A very high level introduction to Domain Driven Design (DDD). DDD is a useful technique if the application has complex business functionality. It will not help with big data, performance or hardware specific optimizations.

  • Risk-based Odds Adjustment Using Bayesian Inference

    This post attempts to sketch a solution to a problem a bookmaker might have: starting from a predefined set of odds, how do we adjust them so that the risk for the bookmaker is minimized? We take into consideration the financial stakes the bettors have in the game, the total financial risk the house is willing to take and will only analyse the 1x2 market (home-draw-away), to keep the code clean.

  • Hazelcast Intro

    A short introduction to Hazelcast: what it is, what scenarios is can be used for, how to run a test environment.

  • Apache Kafka

    This article covers running a Kafka cluster on a development machine using a pre-made Docker image, playing around with the command line tools distributed with Apache Kafka and writing basic producers and consumers. The source code associated with this article can be found here

  • RabbitMQ Patterns and Considerations

    A short intro into RabbitMQ and its C# client. The code which merges most of the concepts from this article can be found here.

  • Classification

    A post about classification and metrics used for validating classification models.

  • P-Values And Hypothesis Testing

    This post is about p-values and hypothesis testing. What they are, why they are needed, how to compute them and how to use them. It also includes a worked example, how to validate that an A/B test indeed produces a significant outcome. The article follows quite closely a chapter from “Data Science from Scratch” by Joel Grus and it is annotated with my own research and observations.

  • Symbolic Maths in Python

    Ability to perform symbolic computations is a crucial component of any mathematics-oriented package. Symbolic mathematics is used to work with complex expressions, sets and probabilities, perform integrals or derivatives, plot charts based on user input, all without explicit numeric computations. This way, the Python interpreter becomes very much like a piece of paper on which one can jot down equations. To exemplify these, by the end of the article I will implement a short gradient descent function to demonstrate the power of sympy to code easy-to-work-with generic algorithms.

  • Short Intro To Reading System Behavior Charts

    Intervention in a system based on incorrect assumptions about data generated by it demonstratably leads to worst future behavior than in the case of non-intervention. However, in most cases, our bias to action or external pressure force us to make changes, leaving the system in a worst shape. Assuming a system is producing consistent data, how do we know when or if to intervene? This article introduces the notion of System Behavior Charts which is tightly connected to any methodology that has the aim of continuous improvement.

  • Nature Inspired Optimizations

    This post is about optimization algorithms. I am going to present the general structure of a nature-inspired algorithm which blends several methods in a single code base: population evolution (mutation) based on a stochastic process, elite selection and maintenance and variable mutation rate to escape local optima.

  • Simple and Multiple Linear Regression

    This is the second part of my Machine Learning notebook. It talks about simple and multiple linear regression, as well as polynomial regression as a special case of multiple linear regression. It provides several methods for doing regression, both with library functions as well as implementing the algorithms from scratch.

  • Statistics, continued

    In this post I tackle basic algorithms for computing probability density functions and cumulative distribution functions, as well as generating random numbers according to a distribution. Afterwards, I will compute the Gini index (G) and the entropy (H) coefficients, which are used to measure how much a distribution differs from the uniform distribution. Towards the end, I will touch the topic of validation with two methods: bootstrapping and k-fold cross validation.

  • My Machine Learning Notebook - Part 1 (Data Preprocessing)

    This is the first part of my Machine Learning notebook. As I am totally new to ML, the content follows the Udemy course “Machine Learning A-Z™: Hands-On Python & R”. As in the course, I’ll be using Spyder and RStudio.

  • C++ Play - Green Threads in C++

    In this post I will write about green threads and provide a sample implementation in C++ - or, better say, in C++ mixed with assembly language. The code is highly compiler-dependant (VC++17) and fragile, thus not production-ready. It makes assumptions about stack layout and data in registers and even something as simple as turning on optimizations will most likely crash it. But it was a fun programming exercise for a concept often implemented in asynchronous libraries or found in some programming language constructs. Interesting links: Fibers, Coroutines, Set Context, Actors, Cooperative multitasking

  • Docker And Containers Part 3

    This is the third part of my personal Docker cheatsheet. The content includes: cleaning up the system, running an SQL Server Windows container, connecting to it through the SQL Server Management Studio and using volumes for storing container persistent data (in this post saving mysql databases). The last part will be dedicated to Docker Compose.

  • Docker And Containers Part 2

    This is the second part of my personal Docker cheatsheet. In this part I will enable host file access to containers and continue with the more complex example of building and debugging Linux C++ applications in Visual Studio, using a Debian container in the background.

  • C++ Play - Multithreading (Queues)

    In this post I am going to build a multithreaded queue to exemplify various issues regarding synchronization. I am going to measure the cost of locking and thread contention and provide some ideas to improve performance. I am going to build the tests incrementally, refining the solution as I progress through the code, but I will also skip some steps to keep the post meaningfully short.

  • Docker And Containers

    In this blogpost I write about Docker. My personal Docker cheatsheet. So why Docker? Because I am interested in distributed computing and I want to play with various technologies. I want to be able to spin quickly complex setups and, at the same time, keep my desktop as clean as possible. The good news is that I can find all the software I need as docker containers. Even SQL Server or IIS, if I want. Beside a huge collection of sofware already configured and the means of keeping tidy your deployments, Docker offers process isolation / sandboxing, simplified updates and ability to build your own repositories. This post also touches Windows Containers running on Windows Nano Server or Windows Server Core - new features available in the Docker for Windows space.

  • C++ Play - Multithreading Part 2

    In this post I touch a little bit synchronization primitives and write two different implementations for a reader-writer lock. The code comes from an earlier project, started before std::shared_lock was introduced to the standard. Beside the standard library, RW locks are part of Windows SDK as Slim Reader/Writer (SRW) Locks and on Linux, with pthreads. There is no need (and not recommended) to implement them manually, as I do in the code below. But since this is a tech blog and and it is meant to play around with technology, here are two possible implementations.

  • C++ Play - Multithreading

    In this post I am planning to build upon C++ 11/14 features introduced in the previous posts and play around with threads and synchronization. I am going to start with the threading support introduced in the standard library, then move forward to Windows-specific code. I will use Windows code to exemplify how to build a minimal type-safe threading library using the new C++ 11/14, similar to the standard one.

  • C++ Play - my own short guide to C++11/14 (Part 2)

    In the previous blog post I covered a lot of ground: variadic templates, improvements to classes, improvements to standard library, smart pointers and lambdas. We also briefly touched move semantics as well as decltype and declval. In this part I am going to continue with move, delegated constructors, user defined literals and static_assert. In the next article I will write about local storage for threads while briefly touching thread support in the new standard library.

  • C++ Play - my own short guide to C++11/14

    C++ has evolved dramatically in the past years. It is virtually a new language, a huge improvement over the C++ I used to practice in the 2000s. Much more expressive templates, richer libraries, more emphasis on compile-time safety and performance improvements. Far from the C with classes in the early 2000s and far from the convoluted template constructs which appeared later to improve compile-time issue detection and type safety. To refresh my memory and also get up to speed with the new trends, I have spent a couple of hours digging through the docs, doing some tutorials. Here are my notes.

  • Welcome to Jekyll!

    I was looking for a simple blogging platform that provided for my following needs.

subscribe via RSS