Open source ecosystems – a mathematical model

Beware - mathematics ahead! (courtesy worldislandinfo.com)

It is a little known fact that my original career choice was mathematics: I made it halfway through a PhD at the University of Adelaide before realising how much my thesis (provisionally titled “Applications of the Hastings-Metropolis algorithm for calculating normalization constants in sparse multidimensional queueing networks”) did NOT do it for me. After a few years of wandering in the wilderness I discovered online learning as a valid career option, but I still hark back to my roots every now and then with a misty eye – mainly when I remember the bamboo-under-the-fingernails joy of hand-coding LaTeX using Vi.

My single favourite subject at Uni was an Honours subject I took called Mathematical Biology, which aimed to model a stack of different biological systems, mainly around how a species breeds, gets sick, dies, migrates and ultimately either perishes or reaches equilibrium in its numbers depending on a range of external and internal factors.

After having been involved with Moodle and other open source projects for a few years now it continues to strike me how much of an ecosystem each of these projects is – just like the biological systems I used to model back at Uni. They have a genesis, some are subsumed by other projects, some die off, and some thrive – like Moodle. Based on this I thought I’d try and use my time on QF756 to do something more productive than eating snacky cakes and playing Worms on my iPod, and hence please find below my first, definitely incomplete, probably flawed, mathematical model of an open source project.

Be warned – this is not like most of my other blog posts, and those with an aversion to mathematics should probably stop reading now.

Definition of terms

First, let’s define some terms.

The first thing we’ll do is talk about what it is we’re trying to measure, namely success (S). I started out thinking of success as being the same thing as the number of users, but I think it’s deeper than that – one could look at a market leader in a product space that everyone uses but that most people hate using, and I personally wouldn’t call that a success. It is worth remembering that an open source project is not about making someone rich, it is about the benefit that the product is bringing to the world, and so I’ll define success as:

S = the extent to which the project provides long-term benefit to its end users, whoever and wherever they are.

The next three terms combine in the model to govern success, and they are:

Functional Fit: The extent to which the software component of the project meets the needs of the end users.

Support: The ease of finding support for the product, whatever that support may look like (we’ll explore this in more detail later).

Ability to deliver: The ability for the core project team to deliver a piece of software that achieves the Functional Fit.

You’ll notice that I refer to ‘the software component of the project’ above – it is important to never think of an open source project as being just the software – the other components are just as important (as this model attempts to show).


I propose that the simplest model for an open source project is then:

S = k.(Functional Fit).(Support).(Ability to deliver)

where k is some mongrel constant – I never paid much attention to constants, far too predictable, but I’ll add it in there for completeness, forget about it for the rest of the post, and then chuck it in at the end again.

Now this model reflects the fact that all three of these components must be present for a project to succeed. For example, if a project has high Functional Fit and the project team has a great Ability to Deliver then it will still end up going nowhere if there is no Support available to the masses.
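To make the "all three must be present" point concrete, here is a minimal Python sketch of the simple model. The function name, the default k = 1 and the sample values are all made up for illustration; the model itself doesn't say what units any of these are measured in.

```python
def success(functional_fit: float, support: float,
            ability_to_deliver: float, k: float = 1.0) -> float:
    """S = k.(Functional Fit).(Support).(Ability to deliver)."""
    return k * functional_fit * support * ability_to_deliver


# Illustrative values only: strong fit and strong delivery, but no support.
no_support = success(functional_fit=0.9, support=0.0, ability_to_deliver=0.8)

# The same hypothetical project with some support available.
some_support = success(functional_fit=0.9, support=0.5, ability_to_deliver=0.8)

print(no_support)    # 0.0, because a zero in any factor zeroes the product
print(some_support)  # some positive number
```

Because the terms are multiplied, a zero anywhere wipes out the whole product, which is exactly the behaviour described above.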

But this is just a starting point, and now I’d like to add in a few more variables as we break down each of the above three terms.

Functional Fit

To be honest, this was actually the thing which started this post. I was watching the Twitter feed from the #conVerge conference last week and noticed several folks whose opinions I hold in the highest regard lament the introduction of Conditional Activities in Moodle, and it made me realise that there are two distinct drivers in an open source project – the desire to adhere to a design philosophy (in Moodle’s case, historically, that learning is not about putting ‘rats in a maze‘, but instead is about self-led exploration and discovery), and the popular opinion of what is needed (in the above case, the overwhelming majority of end users – in my experience – who considered Conditional Activities as a must-have feature).

Let’s then set up some more variables which relate to the various sets of features provided by a piece of software (or desired by the community), defined as granularly as appropriate, namely:

  • Fd = The set of ‘desired’ features of the product as determined by the wider user community.
  • Fa = The set of features of the product available in the current release.

Now we find ourselves in a situation most easily explained by the following diagram:

Available and Desired Features

In the middle (the intersection of the two circles) we have the ‘happy place’ where users are getting the features they want. On the left, we have the features which represent a mismatch between what the developers thought the users want and what they actually want (or what the developers are building based on their perception of how users should be using the software) – we could call this area the ‘superfluous features’ (Fs) zone. On the right we have the functionality gaps – stuff that people want but that hasn’t been implemented (like in the Conditional Activities example). Let’s call them the missing features, Fm.

What we really want to measure here is the ratio between the number of desired, available features (i.e. the ones in the ‘happy place’) and the number of missing features, so I’ll define the ‘Functional Fit’ as follows:

Functional Fit = num(Fa ∩ Fd)/num(Fm)

Note that num(Fa ∩ Fd) translates to ‘the number of features that are currently available AND that users want to have available’ – so this is a count of things on the list, not the list itself.

In other words, the more features which are in the product that people want, the better the functional fit is. Note that this doesn’t include Fs – the superfluous features – because in this model they are irrelevant. It could be argued that a large number of superfluous features will have a negative impact elsewhere, but let’s keep it simple. I’ll come back to the risks of superfluous features later as well.
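The feature sets map quite naturally onto Python sets, so here is a rough sketch of the Functional Fit calculation. The feature names are purely illustrative, and returning infinity when nothing is missing is my own assumption: the formula as written is undefined when num(Fm) is zero.

```python
def functional_fit(available: set, desired: set) -> float:
    """Functional Fit = num(Fa ∩ Fd) / num(Fm)."""
    happy_place = available & desired   # Fa ∩ Fd: wanted and available
    missing = desired - available       # Fm: wanted but not available
    if not missing:
        # Assumption: treat "nothing missing" as an unbounded fit,
        # since the ratio itself is undefined in that case.
        return float("inf")
    return len(happy_place) / len(missing)


# Made-up feature lists, for illustration only.
fa = {"forums", "quizzes", "gradebook", "blocks"}
fd = {"forums", "quizzes", "gradebook", "conditional activities"}

fit = functional_fit(fa, fd)   # 3 features in the happy place, 1 missing
superfluous = fa - fd          # Fs: available but not wanted

print(fit)          # 3.0
print(superfluous)  # {'blocks'}
```

Note that the superfluous set is computed but never used in the ratio, which matches the choice to leave Fs out of the model.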

Support

Support around open source projects is critical, as this is often the place that the FUD-mongers will try and dissuade potential users from going open source. Not that I am suggesting open source projects are immune to support issues – but then again neither are proprietary platforms.

I’ve split support into two distinct areas – Community Support and Commercial Support.

For Community Support, I reckon there are two main factors involved.

The first factor relates to how much of the total knowledge base for the product is available online. Knowledge is power, and having access to a comprehensive knowledge base for the product you are using is invaluable in terms of being able to support your own deployment of a product. I’ve classified this as a ratio of the amount of knowledge freely available online (Ka) and the total, universal, knowledge base for the product (Kt).


The second factor in community support relates to the number of people actively involved in the community. This variable determines how much support is available when you have a question, and how up to date the knowledge base we just introduced is kept. This is a little tricky to define though, as the total number of users (N) is by no means the total number of engaged community members – although they are obviously related. I’ve used a log function in the model because (1) it maintains proportionality (well, log proportionality anyway) between the overall users and the ‘active’ users, and (2) I like using log functions, they make me feel like I’m still a mathematician.

Hence we get, for Community Support:

Community Support = log(N).Ka/Kt

where

  • N = total number of users of the product;
  • Ka = amount of knowledge freely available online; and
  • Kt = total amount of knowledge.

How do we measure the total amount of knowledge? We could look at things like the total number of pages of documentation on the community website versus some theoretical total measure, but let’s keep it theoretical and pretty rough for the moment.
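A quick Python sketch shows what the log does to the user numbers. I'm assuming the natural log here (the formula doesn't specify a base), and the knowledge figures are invented page counts.

```python
import math


def community_support(n_users: int, k_available: float, k_total: float) -> float:
    """Community Support = log(N).Ka/Kt, assuming the natural logarithm."""
    return math.log(n_users) * k_available / k_total


# Illustrative numbers: half of the total knowledge base is online.
small = community_support(n_users=1_000, k_available=50, k_total=100)
large = community_support(n_users=1_000_000, k_available=50, k_total=100)

print(large / small)  # roughly 2: a thousand times the users, twice the support

# A single user gets no community support at all, since log(1) = 0.
print(community_support(n_users=1, k_available=50, k_total=100))  # 0.0
```

So under this formula a huge jump in users only nudges community support upwards, while the Ka/Kt ratio scales it directly.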

The second factor in this section is Commercial Support, and this is far simpler, adhering to market forces around how profitable it is to support the software (P) and the demand (D) for support for the software.

I’m not remotely going to try and expand on profitability – it’s a post in itself – but I will suggest that it is dependent on things like the difficulty involved in supporting the platform, the skills needed to support the platform, the royalties (if any) payable for being a vendor and all the other overheads which come into play running a business.

I will however have a crack at a simple expansion of the Demand function, which I propose works along the lines of:

D = exp(Nw – N)

where

  • Nw = total number of people who want to use the product; and
  • N = total number of users of the product.

I’m suggesting this because it means that not only does demand increase exponentially when there are more people wanting to use the product than are currently using it (growth), but demand also backs off when the number of people wanting to use it drops below the number currently using it (decay), without ever going negative.
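The growth and decay behaviour is easy to check with a few lines of Python. One assumption to flag: the numbers below are in some scaled unit (say, thousands of users), because the formula doesn't fix units and feeding raw head counts into an exponential gets out of hand very quickly.

```python
import math


def demand(n_want: float, n_users: float) -> float:
    """D = exp(Nw - N)."""
    return math.exp(n_want - n_users)


# Illustrative values in an assumed scaled unit (e.g. thousands of users).
growth = demand(n_want=12, n_users=10)   # more people want it than use it
steady = demand(n_want=10, n_users=10)   # wanting and using are balanced
decay = demand(n_want=8, n_users=10)     # fewer people want it than use it

print(growth > 1)       # True
print(steady)           # 1.0
print(0 < decay < 1)    # True: demand shrinks but never goes negative
```

With unscaled counts the difference Nw – N can easily run into the thousands, at which point Python's math.exp raises an OverflowError, so some kind of scaling would be needed before using this on real data.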

Hence we now have:

Commercial Support = P.exp(Nw – N)

and putting both types of support together we get:

Support = log(N).Ka/Kt + P.exp(Nw – N)

Note that I have made these additive rather than multiplicative, as in theory a project should be able to survive on one of these alone, even if the ideal mix is to have both the Community and Commercial support models strong.
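Here is a small Python sketch of why the additive form matters. The parameter names and numbers are illustrative, the log is assumed to be natural, and N and Nw are again in an assumed scaled unit.

```python
import math


def support(n_users: float, k_available: float, k_total: float,
            profitability: float, n_want: float) -> float:
    """Support = log(N).Ka/Kt + P.exp(Nw - N)."""
    community = math.log(n_users) * k_available / k_total
    commercial = profitability * math.exp(n_want - n_users)
    return community + commercial


# A project with a healthy community but no commercial support (P = 0).
community_only = support(n_users=10, k_available=80, k_total=100,
                         profitability=0.0, n_want=10)

# A project with nothing online (Ka = 0) but profitable commercial support.
commercial_only = support(n_users=10, k_available=0, k_total=100,
                          profitability=2.0, n_want=10)

print(community_only > 0)   # True
print(commercial_only)      # 2.0
```

Either kind of support on its own keeps the total above zero, whereas multiplying the two would have zeroed it out in both cases.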

Ability to Deliver

Finally we get to the business end, the engine room of the project, where the success of a project is won and lost – and talk about a piece of software’s Ability to Deliver on its potential.

I reckon I can think of three big things which have an impact on this variable, namely:

  • C = Capital – the amount of cash that a project has to keep going. As much as some may think that open source projects are the realm of the Happy Little Elves where everything is free and the world is a beautiful place, they all need a sound business model to be a sustainable, ongoing project;
  • A = Availability of resources – this could relate to the general job market, the specific skills required, or the technology on which the platform is built, but it is a critical factor; and
  • M = Management efficiency – this relates to the level of efficiency in the management processes used within the project. I’m sure there are a few open source projects out there that have had poor management as a significant contributor to their challenges or downfall – open source projects do not have immunity against the impact of having rubbish management in place, whether it be software development practices, working environment, a toxic culture or poor financial nous. They all hurt.

Combining these together in the simplest mechanism and without trying to break them down into any further constituent parts, we then have:

Ability to Deliver = C.A.M

Putting it all together

So what have we got out of all of this? A final equation for the success of an open source project that looks like this:

S = k.(C.A.M).(num(Fa ∩ Fd)/num(Fm)).(log(N).Ka/Kt + P.exp(Nw – N))

Drechsler's Theorem on Open Source Project Success

Where, to recap:

  • k = some mongrel constant that needs to be in there for completeness I suppose;
  • C = Capital;
  • A = Availability of resources;
  • M = Management efficiency;
  • Fa = The set of features of the product available in the current release;
  • Fd = The set of ‘desired’ features of the product as determined by the wider user community;
  • N = total number of users of the product;
  • Ka = amount of knowledge freely available online;
  • Kt = total amount of knowledge;
  • Nw = total number of people who want to use the product;
  • P = how profitable it is to provide commercial support for the software; and
  • Fm = the set of missing features in the product as determined by the wider user community.
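Substituting the component formulas from the earlier sections into S = k.(Functional Fit).(Support).(Ability to deliver) gives something that can be sketched in a few lines of Python. Every name and number below is hypothetical: the log is assumed to be natural, N and Nw are in an assumed scaled unit, and all the other inputs are simply set to 1 so that the effect of the feature sets is easy to see.

```python
import math


def project_success(k, capital, availability, management,
                    available, desired,
                    n_users, k_available, k_total,
                    profitability, n_want):
    """S = k.(C.A.M).(num(Fa ∩ Fd)/num(Fm)).(log(N).Ka/Kt + P.exp(Nw - N))."""
    ability_to_deliver = capital * availability * management
    missing = desired - available                      # Fm
    # Raises ZeroDivisionError if nothing is missing; the model does not
    # say what should happen in that case.
    fit = len(available & desired) / len(missing)
    support = (math.log(n_users) * k_available / k_total
               + profitability * math.exp(n_want - n_users))
    return k * ability_to_deliver * fit * support


# Made-up inputs shared by both scenarios.
common = dict(k=1.0, capital=1.0, availability=1.0, management=1.0,
              n_users=10, k_available=50, k_total=100,
              profitability=1.0, n_want=10)
desired = {"a", "b", "c", "d"}

before = project_success(available={"a", "b"}, desired=desired, **common)
after = project_success(available={"a", "b", "c"}, desired=desired, **common)

print(after / before)  # roughly 3: one more wanted feature, Fm shrinks from 2 to 1
```

Shipping one feature from the missing list both grows the happy place and shrinks Fm, so with these made-up numbers S triples even though nothing else about the project has changed.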

So what does it all mean?

Ultimately? I don’t really know. Maybe it’s all just self-serving nonsense that has been contrived to give me the answers I want to illustrate, but here are a few observations based on the model anyway.

  1. There needs to be a focus on what the community wants to keep Fm small – which means there needs to be a balance struck between the philosophical views of the project of what users should want versus what the teeming masses actually do want. No matter how many superfluous features are built in to the software, this doesn’t guarantee success, and in fact may hinder it if they divert resources from what the community wants.
  2. There must be something in the business model which ensures that C is maintained at an appropriate level. No matter how many people are using a piece of open source software there needs to be a model in place which makes sure that core resources are getting paid and that the machine keeps turning. In Moodle’s case this is where the Partner model fits in – by effectively giving Partners a head start through the use of the Moodle Trademark to advertise services, which in turn gets money back into the system in the form of royalties.
  3. Sheer numbers of users are not enough to ensure success, but they will at least increase the level of community support available and boost the project’s success to an extent.

Next steps

I’ve no idea whether I’ll ever look at this post again (for all I know I may look back at it and cringe), but if I ever do then there is a stack of things I’d like to consider further, including:

  • Taking into account the maintainability of the software, which sort of relates to M, but should probably be in there explicitly;
  • Doing the same for the quality of the software from an end user’s perspective, which could have an impact on N and Nw, P and a bunch of other things probably;
  • Considering the ease of deployment in relation to P – if a piece of software has a steep technical learning curve then this has to have an impact;
  • Weighting of the various feature aspects rather than just looking at the simplistic view of counting them all as equals;
  • Taking into consideration the amount of ‘superfluous features’ and what their impact could be on the overall project; and
  • Considering the overall equation as a time series, where the Success at time point (t) is actually dependent on the actions in the previous time point (t-1) for the remainder of the variables.

If nothing else…

…then consider this as one man’s perspective on the anatomy of an open source project like Moodle – how it keeps ticking, what this mysterious thing called ‘success’ looks like, and maybe even how yours measures up against a model like this.

Comments/flames/improvements welcome – it’s been a long while since I’ve made my brain think in these terms. Thanks for making it this far.

7 thoughts on “Open source ecosystems – a mathematical model”

  1. Interesting post and approach. Though I doubt very much you’ll ever be able to mathematically model all of the factors. That said, this type of modelling is probably most useful in identifying potential factors and getting people to think and talk about what is there and what isn’t. A pity that the mathematics would probably turn some folk off.

    For example, I’m wondering whether or not functional fit is related in some way to N – the number of users. More particularly the diversity of users represented by N. I can think of a number of folk who love the idea of conditional activities, and a number that loathe it. For some of each group, the presence or absence of the feature is going to affect their perception of Moodle.

    I also vaguely recollect that Christensen’s disruptive innovation stuff talks about the dangers of too many features, or at least the perception of too many features. I think the proposition was that once a product has too many features it is starting to appear to serve too many different types of user. It’s at this stage when a lower quality product that is simpler can creep in and steal market share.

    The problems caused for support by feature sets that are too large is one of the problems that users find all the time. The other side to that problem is the difficulty for support organisations – to some extent Moodle partners but particularly organisational IT departments – in trying to be able to support and sustain large feature sets. Both technically and user support. This is one of the problems some folk face when they want to use a non-core plugin like BIM.

    Sorry, went on a bit. An example of how a model like this can get folk thinking.

  2. Love the premise here. I’m reminded of the scene in Dead Poets Society where Robin Williams makes the students rip the mathematical formula for the value of poetry out of their books.

    What stood out for me while reading this was your idea of functional fit, and the line ‘learning is not about putting ‘rats in a maze‘, but instead is about self-led exploration and discovery’. You’re very right in that most people who have issues with conditionals do so for this reason. And it’s true to say that this probably arises from those who will use the system poorly – restricting access to information in an attempt to ‘bribe’ students to complete activities. On surface examination, this is what it looks like conditionals are set up to do.

    However, I mentioned in my tweet to you that I’m in the midst of designing a Moodle training site on GBL principles, which is very heavily reliant on conditional activities and completion tracking. I’ll try and do a blog post later to describe it in detail, but the general premise is that it’s a site for Moodle training for staff that is designed like a game. A series of levels is progressed through in order, certain scores must be attained by completing certain tasks and so on (informed conceptually by WoW daily quests and structurally by Angry Birds). But – nobody is fed information on how to do things, how to achieve high scores, where to click etc – it’s fundamentally goals-based (not skills-based), self-directed and highly exploratory (like most games, lots of trial and error rather than read this, click here, learn that). To me, this is very much in the spirit of Moodle and effective learning. Like all tools, the value lies in how you use it.

    In a nutshell, the reasons that people *want* features added may be misguided, but those reasons don’t have to define the use of the feature. I think there’s a lot of room for exploration of the ‘Fa’ half of the Venn diagram – why did they add that? And if we don’t like the why, can we disrupt usage patterns to make it valuable?

  3. thanks for this extremely useful post.
    I think you could also apply this model, in a slightly modified form, to COTS products. A comparison (of sorts) would then be possible to demonstrate (to the people that need convincing) that open source products are not all categorically ‘unmaintainable’, ‘not enterprise capable’ or any of that stereotypical nonsense.

  4. To go from eating snacky cakes and playing snake, to founding Drechsler’s Theorem is quite an outstanding achievement! Now you just need to publish it and get a Wikipedia article set up, and soon everyone will be optimising their projects for maximum S 🙂
