Collaborative filtering in PHP

In the era of PyTorch and TensorFlow 2.0, one may wonder why PHP would even come to mind when mentioning machine learning. No eager execution, no automatic differentiation, no vectorization of operations, no array broadcasting. It sounds hard to even think of implementing an algorithm without all these features and tools, let alone have it reach an acceptable level of performance.

And yet, PHP still dominates the world of web development. This is notably the case for e-commerce, where four of the top five players (WooCommerce, Magento, VirtueMart and PrestaShop), all written in PHP, already account for 45% of the worldwide market share. As online retail keeps growing (it now accounts for more than three-quarters of overall retail growth), it is also becoming more competitive, and retailers are on the look out for any possible way to boost their sales. Using a recommender system to highlight the products a customer is the most likely to buy is one of them.

While any large e-commerce player will have ample opportunity to build separate machine learning modules using an adequate framework, it is not the case for the majority of small retailers whose infrastructure is in no way designed to support anything other than the monolith that manages their entire business. What to do in such cases?

The idea is to keep it simple. We’ll be implementing the classic low-rank matrix factorization algorithm, using the best tools PHP has to offer. In this case, MathPHP will take care of all the linear algebra and statistics for us.

Let’s start by defining our data. We have users, and we have products. They can be linked together by what we’re going to call a score: there are many ways to build it, but most likely it will be either a rating (e.g. 1 to 5 stars) or a purchase indicator (1 if the user bought the product, 0 else). We’ll use \(s(i, j)\) to denote the score given to product \(i\) by user \(j\), not necessarily defined for all pairs, and \(S\) the set of pairs \((i, j)\) for which \(s(i, j)\) is defined (i.e. a rating is available).

Now on to our model. The idea is to represent a product with \(n\) features, and a user with \(n\) weights, each expressing the correlation bewteen the score they would give a product and one of its the features. Therefore, a prediction of the score is given by the following:

\(\sum\limits_{k = 1}^{n} u_k p_k\)

Or, representing our users and products by vectors, simply \(\vec{u} \cdot \vec{p}\). In order to minimize the difference between the predicted scores and the real ones, we will be working with the following cost function:

\(J = \frac{1}{2} \sum\limits_{(i, j) \in S} (\vec{u_j} \cdot \vec{p_i} – s(i, j))^2 + \frac{\lambda}{2} \sum\limits_{i = 1}^{n_p} \|\vec{p_i}\|^2 + \frac{\lambda}{2} \sum\limits_{j = 1}^{n_u} \|\vec{u_j}\|^2\)

The first term sums the squares of the differences between the predictions and the targets, while the other two are regularization terms (\(L^2\) here but other methods can be used). This optimization objective yields the following gradients:

\(\frac{\partial J}{\partial \vec{p_i}} = \sum\limits_{j \mid (i, j) \in S} (\vec{u_j} \cdot \vec{p_i} – s(i, j)) \vec{u_j} + \lambda \vec{p_i}\)
\(\frac{\partial J}{\partial \vec{u_j}} = \sum\limits_{i \mid (i, j) \in S} (\vec{u_j} \cdot \vec{p_i} – s(i, j)) \vec{p_i} + \lambda \vec{u_j}\)

We can then update \(\vec{p_i}\) and \(\vec{u_j}\) using gradient descent, or one of its more advanced variations such as Adam. In the case of the former, which we’ll use for the sake of simplicity, this gives:

\(\vec{p_i} := \vec{p_i} – \alpha \frac{\partial J}{\partial \vec{p_i}}\)
\(\vec{u_j} := \vec{u_j} – \alpha \frac{\partial J}{\partial \vec{u_j}}\)

where \(\alpha\) is the chosen learning rate.

How does this translate into code? Let’s suppose we have the following:

  • $products, a collection of Product objects
  • $users, a collection of Users objects
  • $ratings, a collection of Rating objects, with the product, user and rating properties

Let’s start by importing the required classes and setting up our configuration:

We’ll be using SplObjectStorage to store our embeddings, so that we don’t have to modify or extend our e-commerce software’s classes, using objects as keys instead. We will then initialize them using values from the chosen distribution (here a normal distribution of mean 0 and standard deviation 1, though other techniques can be used).

Now onto the first epoch. We’ll use the same method to store our gradients:

We can then loop through the ratings, and for each of them, compute the corresponding gradient part for the associated product and user. After this, we sum these gradient parts so as to obtain the full gradient in the above formula, minus the regularization terms.

Finally, we perform gradient descent, subtracting the gradients from the ratings as well as those from the regularization terms:

Now all there is left to do is wrap the code for the epoch in a loop and let it run for the specified number of times. Once this is done, obtaining a prediction is as simple as this:

It is also possible to look for similar products by comparing their embeddings:

You may have noticed that if a user hasn’t, say, rated or bought a product yet, the optimization objective will only be comprised of the regularization terms, leading to the user’s representation being a vector of zeros, and therefore predicting a score of zero for any product. This problem is often solved with mean normalization, which consists in subtracting, for each product, the mean of its ratings across all users before running the algorithm (and adding it back after prediction, of course).

However, all this will achieve in those cases is predict the average score instead of zero, and normalization over small samples may cause the model to be a lot less robust. For these reasons, I would suggest to simply ignore users and products under a certain threshold of ratings.

The full implementation is as follows:

And this is all you need to implement collaborative filtering on any of the e-commerce systems built with PHP.