Google’s LightweightMMM and Meta’s Robyn are the two main open-source, automated Marketing Mix Modeling (MMM) tools, both aiming to democratize econometrics and marketing science. They share the same goal, but their approaches, implementations, and overall paradigms differ considerably.

In this article, we compare and break down the two tools across several aspects: implementation, modeling, optimization, documentation, and community. At the time of writing, the stable versions of the libraries are 0.1.6 for LightweightMMM and 3.7.2 for Robyn.

Synopsis

Aspect                 Robyn    LightweightMMM
License                100%     100%
Installation           33%      100%
Modeling (theory)      100%     100%
Modeling (practice)    100%     50%
Optimization           100%     100%
Speed                  50%      50%
Code Quality           50%      33%
Community              100%     33%

First Things First: License

Although the terms are sometimes used interchangeably, “open source” does not always mean “free”. In the case of LightweightMMM and Robyn, however, the equivalence holds: both libraries have licenses that allow modification, distribution, private use, and commercial use. One difference is that Robyn uses the MIT License, which does not restrict trademark use, while LightweightMMM uses the Apache License 2.0, which explicitly excludes it. Robyn also seems more open to external contributors, as its documentation has a dedicated section on contributing to the codebase, which LightweightMMM lacks. Overall, both of these libraries are free (as in “free sandwich”) and free (as in “free speech”), embracing the open-source mindset.

Diving In: Implementation Details

Framework

Google’s LightweightMMM is a super-easy-to-install Python library built on Google’s new machine learning framework JAX. Since the release of their famous TensorFlow framework in 2015, Google has been at the forefront of open-source machine learning and data science.

Figure: Proportion of publications that use TensorFlow vs. PyTorch [1]

But for various reasons (including serious design flaws), TensorFlow has clearly been losing ground to its competitor, Meta’s PyTorch. This led Google to restart the race (dropping TensorFlow somewhat slowly and quietly) by introducing JAX and building new tools such as LightweightMMM on top of it. In fact, Meta’s PyTorch has been so influential in the data science field that LightweightMMM’s probability distribution module relies heavily on NumPyro, whose documentation states: “the design of the distributions module largely follows from PyTorch”.

Meta’s Robyn is a library implemented in the R programming language that requires a Python installation on the side, which makes it harder to install, maintain, or deploy as software. The R portion relies on Meta’s forecasting library Prophet, and the optimization module uses the Python library Nevergrad, also developed by Meta. It is somewhat unclear why Robyn was developed in R, as Prophet has a Python implementation as well.

One advantage of LightweightMMM over Robyn is that, when a GPU is available, it can use it to speed up model training significantly, thanks to its JAX framework. Robyn, on the other hand, relies on optimization techniques in Nevergrad that are embarrassingly parallel, and it exploits this parallelism across CPU cores to speed up the process (more on optimization later).
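For the curious, a quick way to check whether JAX sees an accelerator (assuming JAX is installed; GPU support additionally requires the GPU build of jaxlib):

```python
import jax

# Lists the devices JAX can use; shows GPU/TPU entries when an accelerator
# build is installed, otherwise falls back to CPU.
print(jax.devices())
```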

Modeling

The ways LightweightMMM and Robyn approach MMM are rather different. Robyn takes a frequentist approach (as opposed to a Bayesian one) and models the marketing phenomena essentially as a “curve fitting” problem using the Prophet library.

There is some confusion in the MMM community because Prophet can use Bayesian methods (e.g. MCMC) to estimate uncertainty bands, but in essence it does not employ a fully Bayesian approach when modeling time series. It is a specific implementation of a GAM (Generalized Additive Model), where a time series is assumed to be composed of additive components such as trend, seasonality, holidays, and noise. One of the main advantages of using the Prophet library is that it ships with holiday calendars for numerous countries, which can be used directly for modeling.
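As a quick illustration of the built-in holiday support, here is a minimal sketch using Prophet’s Python API (Robyn itself calls Prophet from R):

```python
import pandas as pd
from prophet import Prophet

# Placeholder series; real usage would be a revenue or KPI time series.
df = pd.DataFrame({
    "ds": pd.date_range("2020-01-01", periods=730, freq="D"),
    "y": range(730),
})

m = Prophet(yearly_seasonality=True)
m.add_country_holidays(country_name="US")  # built-in holiday calendar
m.fit(df)
```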

There are several benefits to Robyn’s curve fitting approach: curves are easy to decompose, fitted parameters have straightforward interpretations, and it is fast. Criticism of Prophet’s forecasting abilities coming from the machine learning community [2] is not very relevant in this context, as Robyn does not use Prophet to infer the future but rather to fit and decompose the data at hand. The fitting is done by minimizing the squared prediction error, known as a Least Squares Fit (simply ‘linear regression’), a method often attributed to Gauss, around 1795. However, this approach can be “too easy”, which may lead to overfitting, especially with a high number of independent variables and a low number of samples or ‘observations’. To prevent this, Robyn penalizes the coefficients of the linear fit (with, in fact, another squared penalty) to discourage high-magnitude coefficients. This trick is called regularization, and such a fit is called Ridge Regression, which is still a linear model. In general, frequentists tend to think in terms of cost functions (the metric to be minimized) rather than probability distributions derived from domain knowledge or past experience (known as ‘priors’).

Frequentists think in terms of cost functions and Bayesians think in terms of priors.
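To make the regularization trick concrete, here is a minimal ridge-regression sketch (using scikit-learn purely as an illustration; Robyn’s actual fitting happens in R):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 8))   # e.g. 52 weekly observations, 8 media variables
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=52)

# alpha controls the strength of the squared (L2) penalty on the coefficients.
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_)
```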

At this point, a completely valid question would be: if we know that marketing is a non-linear phenomenon (e.g. the law of diminishing returns), why do we use a linear model? The answer is that we keep the linear model to retain the benefits mentioned above, but we transform the input data with certain non-linear transformations (saturation curves, adstock, etc.) to capture the non-linearities of the domain. All of this can be seen in the Ridge Regression formula in Robyn’s documentation:
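(The formula itself appears as an image in Robyn’s docs; its standard form, with $\lambda$ as the regularization strength, is:)

$$\hat{\beta} = \underset{\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

where the $x_{ij}$ are the (non-linearly transformed) media and control variables.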

Interestingly, and perhaps less well known, Ridge Regression is equivalent to Bayesian linear regression with Gaussian priors on the coefficients (more precisely, it yields the maximum a posteriori estimate).

LightweightMMM approaches the problem in a purely probabilistic way with Bayesian modeling. In the Bayesian paradigm, rather than thinking in terms of point observations or point estimates, everything is modeled as a probability distribution. The main advantage of this method is that users can incorporate their domain knowledge, past experience, or expectations as priors. Another advantage is that the uncertainty estimates have true probabilistic interpretations, which can be thought of as measures of risk.

In practice, however, it may not be realistic to expect digital marketing experts to conclude that their past experience with a certain marketing phenomenon actually follows a Lewandowski-Kurowicka-Joe distribution, as implemented in the LKJCholesky function of NumPyro. Although this sounds ridiculous, it is exactly the kind of capability that LightweightMMM provides. When it comes to geo-level hierarchical modeling, LightweightMMM supports it out of the box. As of December 2022, Robyn does not support any hierarchical breakdown dimensions, but it has been communicated that this is on the 2023 roadmap.
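As a minimal sketch of what specifying priors looks like in NumPyro (an illustration only, not LightweightMMM’s actual model):

```python
import numpyro
import numpyro.distributions as dist

def model(spend, revenue=None):
    # Domain knowledge enters as priors: here we "expect" a positive media
    # coefficient concentrated near zero.
    beta = numpyro.sample("beta", dist.HalfNormal(0.5))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    numpyro.sample("obs", dist.Normal(beta * spend, sigma), obs=revenue)

# The LKJ prior mentioned above, as exposed by NumPyro:
lkj = dist.LKJCholesky(dimension=3, concentration=1.0)  # prior over correlation matrices
```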

Google’s LightweightMMM cites several publications [3, 4, 5, 6] to support its argument for using a Bayesian approach to MMM. We think it is worth mentioning that none of these publications has appeared in a peer-reviewed conference or journal, and all of them are written by people working at Google. While Robyn does not cite evidence-based, peer-reviewed research for its design choices either, we believe both libraries have put significant thought into their implementation choices and serve as powerful tools.

Optimization

The way optimization is performed also differs between Robyn and LightweightMMM. In Robyn, even though optimization happens in various parts of the process (fitting the Ridge Regression is itself an optimization), the term refers to the hyper-parameter selection for the non-linear adstock transformations. For instance, the geometric adstock transformation introduces 3 hyper-parameters to be tuned per media variable, while the Weibull transformation introduces 4. This optimization simultaneously minimizes the Ridge Regression prediction error and a novel metric called Decomp.RSSD, which captures how much the model’s decomposed effect shares deviate from the current spend allocations. With this new metric, the model is incentivized to find more plausible solutions that do not deviate abruptly from the current marketing spends.
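To illustrate what one of these transformations does, here is a minimal geometric adstock sketch (the common textbook definition, not Robyn’s exact code):

```python
import numpy as np

def geometric_adstock(x, theta):
    """Each period carries over a fraction `theta` of the previous adstocked value."""
    out = np.zeros(len(x))
    carry = 0.0
    for t, spend in enumerate(x):
        carry = spend + theta * carry
        out[t] = carry
    return out

print(geometric_adstock([100, 0, 0, 0], theta=0.5))  # [100.  50.  25.  12.5]
```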

In this manner, Robyn’s implementation keeps incrementality in mind: the model is meant to be iteratively calibrated with ground truths obtained from experiments rather than tuned once and fixed. Robyn uses the Nevergrad Python library for this optimization. The name Nevergrad is wordplay implying the optimization methods are not gradient-based: one does not need to compute the gradient (a fancy way of saying ‘slope’) when optimizing.

Nevergrad is a superb choice for the optimization part since it relies on evolutionary algorithms that are easily parallelizable. Furthermore, arbitrary constraints can be incorporated into the optimization scheme by tech-savvy users if needed. Due to the high number of adstock hyper-parameters to optimize, Robyn is generally much slower than LightweightMMM.
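Here is a minimal gradient-free optimization sketch with Nevergrad (a toy objective, not Robyn’s wiring; TwoPointsDE is one of the evolutionary strategies Nevergrad offers):

```python
import nevergrad as ng

def loss(x: float, y: float) -> float:
    # Toy objective with a known minimum at (1, -2).
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

optimizer = ng.optimizers.TwoPointsDE(
    parametrization=ng.p.Instrumentation(x=ng.p.Scalar(), y=ng.p.Scalar()),
    budget=200,
)
recommendation = optimizer.minimize(loss)
print(recommendation.kwargs)  # close to {'x': 1.0, 'y': -2.0}
```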

LightweightMMM performs budget optimization with the well-established SciPy Python library. While not mentioned explicitly in the documentation, the source code reveals that the algorithm used (SLSQP) is in fact a gradient-based optimization method, unlike those in Nevergrad. Unfortunately, for both Robyn and LightweightMMM, the convergence metrics and warnings of the optimization are not very meaningful or actionable for non-computer scientists.
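A minimal sketch of constrained budget allocation with SciPy’s SLSQP (a toy diminishing-returns objective, not LightweightMMM’s internals):

```python
import numpy as np
from scipy.optimize import minimize

def neg_response(b):
    # Toy concave response curves; negated because `minimize` minimizes.
    return -(np.log1p(b[0]) + 0.8 * np.log1p(b[1]))

total_budget = 100.0
constraints = {"type": "eq", "fun": lambda b: b.sum() - total_budget}
bounds = [(0.0, total_budget)] * 2

result = minimize(neg_response, x0=np.array([50.0, 50.0]), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x)  # optimal split of the budget across the two channels
```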

Supporting Aspects: Documentation and Community

When it comes to source code and documentation, neither library is close to following the best practices of scalable software development. Robyn is full of unused, commented-out code while missing comments on the active code. Development speed is clearly prioritized over robustness in Robyn, which often results in backward compatibility issues; in other words, updating the library tends to break things. LightweightMMM has slightly better code quality, but Robyn’s documentation is far more in-depth and comprehensive.

In terms of community, Robyn undeniably takes the cake. With an active Facebook group (Robyn Open Source MMM Users) dedicated to a transparent roadmap and all things Project Robyn, users can ask questions, get help analyzing results, and gain insight into new feature requests. This peer-to-peer network gives Robyn users a massive upper hand. With Robyn developers actively posting about the project’s next development priorities and chiming in to assist, it is reassuring to know that answers to your questions are a post away or already exist in the discussion board. Additionally, the issue tracker in Robyn’s GitHub is significantly more active than that of LightweightMMM.

Summary

Both frameworks are developed by the R&D teams of tech giants and serve as powerful tools for Marketing Mix Modeling. Both are open source, and neither is a production-ready application. They take different paradigms and approaches to MMM, which is great for the community. At this stage, Robyn seems to be slightly ahead when it comes to democratizing marketing science.

References

[1] https://paperswithcode.com/trends

[2] Jung, S., Kim, K. M., Kwak, H., & Park, Y. J. (2020). A worrying analysis of probabilistic time-series models for sales forecasting.

[3] Jin, Y., Wang, Y., Sun, Y., Chan, D., & Koehler, J. (2017). Bayesian methods for media mix modeling with carryover and shape effects.

[4] Sun, Y., Wang, Y., Jin, Y., Chan, D., & Koehler, J. (2017). Geo-level bayesian hierarchical media mix modeling.

[5] Chan, D., & Perry, M. (2017). Challenges and opportunities in media mix modeling.

[6] Wang, Y., Jin, Y., Sun, Y., Chan, D., & Koehler, J. (2017). A hierarchical Bayesian approach to improve media mix models using category data.

Author:

Oguzhan Gencoglu

About the Author:

Oguzhan (Ouz) Gencoglu is Forvio’s tech & data science advisor. Oguzhan has several years of experience proposing, developing, and deploying production-grade ML/AI solutions with his teams across various industries. He is also an avid speaker, with over 50 tech talks on Machine Learning, AI, and Data Science.

Interested in testing MMM out for yourself?

Please contact our team via hello@forvio.com or fill in your contact info to learn more and join Early Access!

About the team:

Forvio is a startup created by data scientists and engineers with a passion for solving marketing’s hardest problems. We are committed to using the world’s most advanced techniques and technologies in our mission to help marketers maximize their impact.