(Mis)understanding big numbers

Understanding big numbers is hard, probably because we evolved in an environment where understanding a number more than a hundred or so wasn’t of much benefit.

One of the best ways to understand a big number is to try to convert it into a multiple of something we can understand. For example, the size of Wales is sometimes used, e.g. “help protect an area of rainforest the size of Wales”.

The evil twin of this helpful technique is to come up with a quantity that everyone thinks they understand but that is actually much larger or much smaller than people expect.

So, this morning’s news bulletins in the UK have been prominently reporting that the parliamentary home affairs committee have said that the backlog of failed migrants in the UK is equivalent to the population of Newcastle upon Tyne.

Newcastle is one of the UK’s largest cities, surely, so that must be a lot of people. Well, not really. Because of the fairly academic definition of municipal boundaries, the population of Newcastle is only about 200,000, and excludes places like Gateshead, North Shields and Whitley Bay that most outsiders would struggle to distinguish from Newcastle.

The news release could equally have said that the backlog is equivalent to the population of Dudley, with precisely the opposite effect because Dudley is one of those places that is actually much larger than most outsiders expect.

Posted in How not to... | Leave a comment

The ethics of analytics – another example

I’ve been planning to write up some detailed examples of my suggested framework for the ethics of analytics (see my previous post). But now that I’ve been thinking about the subject I keep seeing examples everywhere.

Another interesting one I came across at the weekend is Google Now, as discussed here. The premise is to provide information to people automatically, before they need it – for example, checking the traffic reports before you have to set off for a meeting.

This has the potential to impact a number of criteria in my framework, in a good or bad way, depending on how it’s implemented.

[Figure: the framework criteria]

Much will depend on how much control the end user has (not a traditional Google forte), for example:

  • On openness: can the end user tailor when Google Now cuts in and when it doesn’t, to focus on the information that he finds useful?
  • On provider versus consumer benefits: is innovation geared towards achieving provider benefits rather than consumer benefits (for example, if sponsored links start appearing more than user-controlled ones)?
  • On intimacy: is there a mechanism to prevent access to some personal information?
    (“Hi, it’s Google Now here – I hear you’ve got an appointment at the hospital today; here is the suggested driving route and, by the way, here’s some useful information about bowel cancer”)
Posted in Ethics of analytics | Leave a comment

The ethics of pricing analytics?

There’s an interesting example of some pricing analytics in the news that might test my proposed framework (see my previous post) for the ethics of analytics.

According to the Wall Street Journal, Orbitz has been showing different hotel offers to Mac and PC users, offering more expensive options to Mac users.

Orbitz executives confirmed that the company is experimenting with showing different hotel offers to Mac and PC visitors, but said the company isn’t showing the same room to different users at different prices. They also pointed out that users can opt to rank results by price.

How does this example fare on the framework?

[Figure: the framework applied to the Orbitz example]

On Relationship, what’s being offered – a range of hotels – is directly relevant for the consumer. The Intimacy of the information being used is pretty low: the browser used is one of the easiest things to track, although some consumers might be unhappy with being profiled for pricing purposes that way. Frequency is not particularly relevant. The Provider benefits will depend on how good the correlation is between browser used and consumer type, though for a lower-margin business like booking hotel rooms even a small lift might provide a significant benefit.

The two interesting criteria are Openness and Consumer benefits. It’s the lack of openness in the mechanism that will annoy some customers instinctively: read the user comments on the WSJ article if you don’t believe that. The Consumer benefits are, at best, debatable. But I’d argue that showing, by default, hotels at the price point Orbitz believes customers want to see is perfectly defensible.
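As a rough illustration, the assessment above can be written down as a simple structured checklist. This is a minimal sketch of my own – the dataclass and field names are hypothetical, and the judgements simply restate the ones in the previous paragraphs – not part of any formal method:

    from dataclasses import dataclass, asdict

    @dataclass
    class EthicsAssessment:
        """An informal assessment of an analytics application against the six criteria."""
        relationship: str
        openness: str
        intimacy: str
        frequency: str
        consumer_benefit: str
        provider_benefit: str

    # Hypothetical summary of the Orbitz browser-targeting example discussed above
    orbitz_browser_targeting = EthicsAssessment(
        relationship="Fine - a range of hotels is directly relevant to the consumer",
        openness="Weak - the mechanism isn't disclosed, which is what annoys customers",
        intimacy="Low - the browser used is one of the easiest things to track",
        frequency="Not particularly relevant here",
        consumer_benefit="Debatable at best - defaults may match what customers want to see",
        provider_benefit="Depends on how well browser type correlates with consumer type",
    )

    # Print the assessment, one criterion per line
    for criterion, judgement in asdict(orbitz_browser_targeting).items():
        print(f"{criterion:>16}: {judgement}")

Writing the judgements down criterion by criterion, even this informally, makes it easier to see where the contentious areas – here, openness and consumer benefit – actually lie.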

The Economist reports the same story, but goes on to discuss the more difficult to justify practice of offering different customers different prices for the same product. According to the Economist:

Orbitz stresses that it does not charge people different rates for the same rooms, but some online firms are believed to be doing just that, for instance by charging full whack for those assumed to be willing and able to pay it, while offering promotional prices to the rest.

The Consumer benefit of this sort of differential pricing is almost certainly nil, certainly for the customer offered the higher price. There are plenty of companies employing such pricing in an open and transparent way (think of special offers for new customers that are not open to existing customers unless they threaten to leave). Perhaps it’s the combination of “unfair” treatment and a lack of openness that some consumers find so offensive. Comments on the Economist article seem to be running about 2:1 unhappy to happy with this sort of arrangement.

Posted in Ethics of analytics | Leave a comment

The ethics of predictive analytics

The importance of having an ethical approach to the way that you apply analytics seems to be getting increasing focus. But given the amount of press coverage that the subject is getting, and the potential for significant reputational damage, there seems to be surprisingly little in the way of guidance about what is and isn’t appropriate.

Why is it important to have an ethical approach? Firstly, there is the potential for serious reputational damage from failing to do so. Secondly, there is the potential for increased legislation or regulation in this area.

What constitutes unethical behaviour? Much of the discussion seems to focus on breaches of individual privacy, and on what expectation of privacy people should have. But breaching privacy isn’t the only way that analytics can breach what many people would consider to be acceptable behaviour. Judging by some of the recent news stories in this area:

To be clear, that final example is a hypothetical one, but one that’s not so far away from some real life examples, as I will discuss later.

The common theme in these stories, as well as various others that consider the potential for breaching individual privacy, is the “yuk factor” – the potential to immediately and instinctively alienate customers and potential customers. One of the dilemmas for organisations considering whether to proceed with applications of analytics like these is how open to be about it, and the default position has frequently been to avoid exposing too much detail, partly because of the potential for competitive advantage but partly to avoid alienating consumers.

A lot of organisations are wrestling with these types of ethical questions. I recently took part in a procurement process for the application of predictive analytics by a public sector organisation. As part of the procurement I had to present to the client staff responsible for the ethical considerations and answer a selection of tricky questions (how would we use the data and analytics to improve their service? how would we handle the potential for unfair discrimination?).

What’s lacking, as far as I can tell, is any sort of framework for judging what is and isn’t appropriate. So, what I am going to try to do is set out the criteria we can use to judge what is and isn’t appropriate behaviour. This is emerging thinking for me, so I may well revisit and revise this list over time (and I’d appreciate suggestions), but what I have so far is an assessment against six criteria:

[Figure: the six criteria]

The first three criteria all relate to the customer’s expectation:

Is what is being proposed appropriate to the Relationship between the two parties? For example, if I promote products closely related to the ones I already sell you, then most consumers see this as appropriate or even beneficial. How far our relationship stretches, though, is debatable.

The technical requirements of Openness, such as a published privacy statement, are met by respectable organisations, but when it comes to applying this in practice there are different extremes. Telling me that I am being recommended X because I also bought Y is generally open, and something that some people find reassuring. Telling me that my friend has bought product X is certainly open too (though it might fail the intimacy criterion below). The opposite extreme is actually very common too. In the New York Times article I linked to above there’s a really interesting quote:

“With the pregnancy products, though, we learned that some women react badly,” the executive said. “Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.

“And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”

On the face of it, this is the complete opposite of openness. And it’s not a unique insight of Target’s: it is standard practice at a number of other retailers too. Is it ethical? It’s certainly manipulative, and risks a backlash from some consumers if they realise what’s going on.

Intimacy is about the nature of the product or service being offered. This article has a nice model of the different types of intimacy in personal information, and makes the point that to use more intimate information a closer relationship is required. Retailers who use highly targeted advertising well tend to understand the different types of intimacy requirement in different product categories. The pregnancy example above highlights this in part. They do the same thing with pets: beware sending a promotion for cat food to a customer whose pet has just died.

Frequency is well understood by most marketing professionals too. There’s a limit to how often most consumers welcome contact, though it depends too on the strength of the relationship between the parties.

A lot of applications of customer analytics deliver direct Consumer benefits as well as the more obvious Provider benefits. For example, when they’re done well, recommender systems (“customers with your purchase history also bought these…”) are valued by customers as well as by providers; indeed, when they’re done badly it’s often because they’re seen by the provider as merely an exercise in cross-selling rather than as something valued by the customer.

It’s easy to confuse provider and consumer benefits. Partly that’s because a provider benefit can be, indirectly, a consumer benefit too. Most consumers have at least a partial understanding that by providing personal information in a transaction they are providing something of value to the other party, but that they’re doing so in return for either a reduced price (for example, by using a customer loyalty card) or a free service (as with many online services).

More disingenuous is to present a provider benefit as being of direct benefit to the consumer. It seems to be an article of faith for many of those engaged in online marketing that targeted advertising is a direct benefit to the consumer being served with the advertisement. It’s easy to see why better targeting is beneficial to the advertiser, but the benefit to the consumer is at best arguable and appears to be facing an increasing consumer backlash.

In the next part of this thread I am going to consider some real-life examples and how this framework applies.

Posted in Ethics of analytics | 2 Comments

What data can and cannot do

I can’t decide whether this article from the Guardian website is interesting or just a statement of the obvious.

The conclusion is right though, or at least it’s part of the answer:

Interpreting data is not easy. Furthermore, there is a tendency to think that the widespread availability of data and data tools represents a democratisation of the analysis and interpretation of data. With the right tools and techniques, anyone can understand the contents of a dataset, right?

I’d argue that on top of the tools and techniques there are other components (the components I outlined in my series on establishing analytical capability): in particular the specialist skills to analyse and interpret data and turn it into information.

UPDATE: Now I’ve also read the partner article, which is rather more interesting and makes the more astute point: “everyone can do it, but not everyone can do it well”.

Posted in Uncategorized | Leave a comment

How not to calculate probability

There’s a good example on the BBC News website today of one of the really common mistakes with calculating probability. The article says:

Labour MP John McDonnell has defied odds estimated at 58,000 to 1 to top the annual Private Member’s Bill ballot for two years in a row.

This “estimate” is based on the fact that 240 MPs entered the ballot, so the probability that John McDonnell’s name comes out of the hat first is 1/240 and the probability that his name comes out first twice is (1/240) x (1/240) = 1/57,600 (or 1/58,000 with a bit of rounding).

Right?

Well no, there’s at least one serious flaw here.

Firstly, the newsworthy event here is that an MP has come out on top of the ballot two years in a row. Since someone must have come out first last year, the probability that the same name comes out first this year is 1/240. If we were to wind back a year and predict that John McDonnell specifically would come out first two years in a row, then the probability would indeed be 1/57,600. But the probability that some MP or other will come out top two years in a row is 1/240.

That’s why McDonnell’s quote in the article (“I wish I had thought of popping down to the bookmakers yesterday to be quite honest because we could have made a fortune”) is so flawed. Had he placed the bet over a year ago, before the first ballot, then he’d be right. But the probability yesterday that he would come out first was 1/240, the same as the previous year.

This is a version of the Gambler’s fallacy, that because a random event has just happened we instinctively think it’s less likely to happen again. For example, if a fair coin has been tossed five times and produced five heads then the probability of a head on the next toss is still 1/2.
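For anyone who wants to check the arithmetic, here is a quick Monte Carlo sketch of my own (assuming 240 MPs enter each ballot and that each year’s draw is independent and uniform). It shows that a named MP topping the ballot two years running is roughly a 1 in 57,600 event, while the same MP – whoever that turns out to be – topping it two years running is roughly a 1 in 240 event:

    import random

    MPS = 240            # MPs entering the ballot each year
    TRIALS = 1_000_000   # simulated pairs of annual ballots

    named_twice = 0      # a specific, named MP tops both ballots
    same_twice = 0       # whoever topped year one also tops year two

    for _ in range(TRIALS):
        first_year = random.randrange(MPS)    # index of the MP drawn first, year one
        second_year = random.randrange(MPS)   # index of the MP drawn first, year two
        if first_year == 0 and second_year == 0:  # MP number 0 stands in for the named MP
            named_twice += 1
        if first_year == second_year:
            same_twice += 1

    print(f"Named MP tops both years: about 1 in {TRIALS / max(named_twice, 1):,.0f} (theory: 57,600)")
    print(f"Same MP tops both years:  about 1 in {TRIALS / same_twice:,.0f} (theory: 240)")

With a million trials the first estimate is noisy (the event only occurs a handful of times), but both come out close to the theoretical values.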

Posted in How not to... | Leave a comment

Building a capability in insight analytics (Part 4)

For the final part of this series (at least for now) I want to add a bit of a discussion about how to get there. The first three parts were an attempt to summarise the main capabilities needed in an insight analytics function. This part is about the roadmap to achieving it.

A good first step is to assess your current position. There’s a pretty good IBM whitepaper (Analytics: The widening divide) which includes a high-level categorisation of organisations into Aspirational, Experienced and Transformed. I’ve used this, together with a maturity assessment based on the six capabilities I outlined in part 1, as a starting point for that assessment.

In part 3 of this series I talked about the types of analytical services you want the team to deliver. Once you understand that, you hopefully have some idea of where you want to get to: the impact that you want to have, the services you want to deliver and, as a result, the skills, data and technology that you’ll need.

The next step is to understand how you are going to get there. There are different routes, depending on the capabilities you have, the immediate priorities and the level of wider sponsorship. For example, consider the three approaches below. They differ in speed of execution and in priorities. Whether you go quick or slow will depend, to a great extent, on the level of sponsorship available.

Data and technology first

  • Focus first on delivering the technology and data infrastructure required to enable analytics
  • Develop analytics capability when data reaches a level of maturity

Pros:

  • avoids putting the cart before the horse: some investment in data is a requirement for successful analytics

Cons:

  • large expenditure on data and technology may not pay back its cost
  • benefits of analytics services can be delayed repeatedly

Step by step

  • Focus initially on quick wins to demonstrate value of analytics initiatives
  • When each capability demonstrates its value invest further to develop

Pros:

  • lower risk: avoids wasted investment
  • suits organisations with a staged approach to investment appraisal and approvals

Cons:

  • Slower: takes longer to reach the end goals

Transformational

  • Invest in each of the components of an analytical capability in parallel
  • Align investment in analytics with the wider objectives of the organisation

Pros:

  • fastest route with greatest expected benefits

Cons:

  • needs strong executive sponsorship
  • increased delivery risk

The outcome of this process can then be summarised as a high level plan. The example below is a generic one based on a couple of clients. Alongside a set of projects aimed at developing each of the six capabilities, we kicked off a quick win programme aimed at generating some momentum. This is particularly valuable in an organisation which doesn’t yet have the high level sponsorship needed for a big programme.

Posted in Uncategorized | Leave a comment

Building a capability in insight analytics (Part 3)

For Part 3 of my series on building an analytics capability I want to pick up on the idea of developing analytical services, which was one of the six capabilities I discussed in Part 1.

There are various established applications of analytics: marketing analytics, fraud identification, collections, HR analytics, customer segmentation, campaign management, etc. The effectiveness and relevance of these will depend on the organisation and the data available.

A simple model that I’ve found useful in understanding this is the one below:

The key parts of the model are:

  1. The grounding in available data and links to the right business stakeholders – both are needed to develop a useful and relevant service.
  2. The insight analysis part represents the analytics-heavy part of the process: it’s the part where many analysts are at their most comfortable, playing with data and building models. A colleague called this the “University of analytics”. The key point here is that this work is valuable groundwork but doesn’t deliver any benefit on its own: the cleverest predictive model doesn’t deliver any value until we’ve worked out what to do with it.
  3. The vertical towers represent applications of insight analytics to the real problems at hand. This is where an analytical service needs to have a clear objective and supporting business case: we are going to deploy this analytical model because we believe it will generate additional revenue $X, and this is how we are going to measure it. That business case needs to be grounded, of course, in some clear objectives for the analytics function and for the business as a whole.

Getting the balance right between the theory and the implementation isn’t as simple as it sounds. As an analyst, I’ve been guilty of spending too long focusing on developing the “best” possible model rather than actually getting it used. As a consultant, I regularly see my peers trying to do the implementation without doing the groundwork first: some of those claims by software vendors or consultants of five-fold increases in ROI from analytics forget to mention the supporting elements required – the data, and the analytical legwork.

To develop an analytical capability I do think it’s important to have a focus on how you’re going to develop and deploy analytics, even if that plan changes over time. Then you can focus your development efforts first on those areas which you expect to have the largest benefit.

Posted in Building analytical capability | 1 Comment

Mark Zuckerberg, Facebook’s chief, has managed to amass more information about more people than anyone else in history.

There’s an interesting piece in today’s New York Times on Facebook’s use of data.

The value of the Facebook business seems to me to be a function of (a) the extraordinary volume of visitors (quantity, frequency, sheer length of visits, potential for growth, etc.) and (b) their ability to charge a premium for advertising because of the data they are able to deploy.

It’s true that most of the data-driven personalisation at the moment is pretty straightforward (change your marital status to “engaged” and you’ll get served with dozens of ads for wedding-related services) but it’s also effective. If Facebook is to “spin that data into enough gold to justify [its] valuation” then it will need to combine cleverness with the data and a focus on where it applies it.

 

Posted in Random thoughts | Leave a comment

Another article on building teams

Some more interesting reading on building analytics teams here.

Actually a lot of the content here is about developing the Analytical Services that the team delivers, which is one of the capabilities in my framework that I’m planning to write a bit more about.

I particularly like this paragraph:

One word of caution: people new to data science frequently look for a “silver bullet,” some magic number around which they can build their entire system. If you find it, fantastic, but few are so lucky. The best organizations look for levers that they can lean on to maximize utility, and then move on to find additional levers that increase the value of their business.

It emphasises, to me, the importance of understanding what services your analytics team is going to deliver and what value those services provide.

Posted in Building analytical capability | Leave a comment