Mike Lynch and Autonomy: HP's hostage to fortune?

22 November 2012
By James Hayes (intro), Roger Dettmer (interview)
Mike Lynch, co-founder of Autonomy

Mike Lynch, as he appeared in IEE Review in 2002

Earlier this week Hewlett-Packard (HP) levelled an accusation of improper accounting at Autonomy, the UK software company it bought last year. HP has discovered “serious accounting improprieties” and “a wilful effort by Autonomy to mislead shareholders”, it says, after a whistle-blower came forward.

Irish-born mathematics whiz Mike Lynch, who co-founded Autonomy and led it when it was sold to HP last year for a hefty $11.1bn, denies the allegations. He blames mismanagement by its new owners for shredding its value. “We are shocked… it’s completely and utterly wrong and we reject it completely,” says Lynch, who left HP’s employment last May.

Lynch has been one of the more colourful figures in the UK software sector for over a decade, as detailed in this revealing profile, which first appeared in E&T’s predecessor magazine, IEE Review, in November 2002. It explains how the Cambridge-based data analysis specialist acquired such value. “Valuing technology companies is very difficult… Most of the company’s value is in the future and the whole business is sentiment-driven…”

From IEE Review, November 2002

Pattern matching for pleasure and profit

How do you make a computer read with understanding? Mike Lynch explains to Roger Dettmer how the ideas of an 18th century cleric can provide an answer

They keep red-bellied piranha fish in the reception area of Autonomy’s Cambridge headquarters building, which naturally makes me think of the archetypal James Bond villain. You know what I mean: a shadowy evil genius with plans for world domination, and an unfortunate habit of tipping unwanted employees into a tank of man-eating sharks.

Mike Lynch, Autonomy’s founder and chief executive, is elusive rather than shadowy–it took four attempts to meet him–and his expansionist plans are of a purely commercial nature. But he has been called an ‘academic egghead’, and even accused of intellectual arrogance. The reality is a bit of a surprise. His beard–once a virtual trademark–has been shaved off, and at the age of 36 his face is starting to fill out. Lynch, it seems, is just a pleasant guy who enjoys talking about the company he has made famous.

Looking for the essence

Autonomy is in the business of getting computers to make sense of unstructured textual information. The rise of the office PC, computer networking and the Internet means that the format of some 80% of the information moving around the typical company–all those letters, memos, emails, reports and web pages–is incompatible with conventional methods of computer analysis. Imagine you’re a member of the senior management of a major automotive manufacturer, and somewhere within your organisation a number of staff start to exchange emails about the unexpectedly high failure rate of a new component. It could be serious. You ought to know what’s going on, but how, short of employing armies of clerks to scan your corporate email traffic, can you hope to be alerted to such potentially significant events? Autonomy claims to have the answer.

Its software sits above the sea of corporate information, automatically identifying the subject matter of each passing document by a process which is described as extracting the document’s ‘digital essence’. Lynch explains: ‘We treat the task as a signal processing problem. In modern signal processing you’re often trying to extract a signal from noise. Similarly, hidden under the noise of language is an idea, and we’re trying to extract that idea. If you’ve a page of information, then the words on that page will be influenced by the idea, as, in the same way, in signal processing the received signal will have been influenced by the transmitted signal. But things get in the way: slang, idiosyncrasies of style, poor grammar, multiple ideas in the same document, so we use Bayesian probabilistic methods [see panel] to try and get back to the original idea.’

As a first guess you might imagine that this has something to do with counting the frequency of key words. Which is true, but only to a very limited degree. ‘The big idea,’ says Lynch, ‘is not to measure the absolute frequency of words, but the frequency with which words appear in relation to each other.’

He offers the example of a document about President Clinton: ‘This might only contain one instance of the phrase “President Clinton”, but it’s the other things that appear in association with this phrase that provide the real clues to the subject matter: Oval Office, White House, Congressional hearing. It’s all down to conditional probability–the chance of this word appearing given another word.

‘Now suppose we have a new document that we want to test for the idea that it has President Clinton as its subject matter. We look for the words that are present, and, because we have a probabilistic understanding of the way in which words are associated with the idea of President Clinton, we can calculate the probability that the words in the document have been influenced by this idea.’

Autonomy’s digital essence is thus the assemblage of conditional probabilities that characterise a specific idea or context–a simple idea with some extraordinary ramifications. ‘You do not,’ insists Lynch, ‘have to program a computer to extract a new digital essence. The trick is to get it to learn the probabilities automatically from studying large amounts of language.’
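The idea of learning conditional probabilities from example text, and then using them to score a new document, can be illustrated with a toy sketch. This is not Autonomy’s algorithm, which the article does not describe in detail; it is a plain naive Bayes classifier, which assumes that words occur independently of one another once the idea is given. The example sentences, the function names and the 50:50 prior are all invented for illustration.

```python
import math
from collections import Counter

def learn_word_probs(documents, vocabulary, smoothing=1.0):
    """Estimate P(word | idea) from example documents (add-one smoothing)."""
    counts = Counter(
        word
        for doc in documents
        for word in doc.lower().split()
        if word in vocabulary
    )
    total = sum(counts.values()) + smoothing * len(vocabulary)
    return {word: (counts[word] + smoothing) / total for word in vocabulary}

def posterior_idea(document, probs_idea, probs_other, prior_idea=0.5):
    """P(idea | words) by Bayes's equation, assuming word independence."""
    log_idea = math.log(prior_idea)
    log_other = math.log(1.0 - prior_idea)
    for word in document.lower().split():
        if word in probs_idea:  # ignore words never seen during learning
            log_idea += math.log(probs_idea[word])
            log_other += math.log(probs_other[word])
    # Normalise in log space to avoid underflow on long documents.
    top = max(log_idea, log_other)
    weight_idea = math.exp(log_idea - top)
    weight_other = math.exp(log_other - top)
    return weight_idea / (weight_idea + weight_other)

# Invented toy corpora: documents about the idea, and documents about anything else.
clinton_docs = [
    "president clinton spoke at the white house",
    "the oval office hosted a congressional hearing",
    "clinton left the oval office for the white house",
]
other_docs = [
    "the football match ended in a draw",
    "the house was sold at auction",
    "the office printer is out of paper",
]
vocabulary = {w for doc in clinton_docs + other_docs for w in doc.split()}

probs_clinton = learn_word_probs(clinton_docs, vocabulary)
probs_other = learn_word_probs(other_docs, vocabulary)

on_topic = posterior_idea("a hearing at the white house", probs_clinton, probs_other)
off_topic = posterior_idea("the football match ended in a draw", probs_clinton, probs_other)
print(f"on-topic: {on_topic:.2f}, off-topic: {off_topic:.2f}")
```

Note that the first test document never mentions “Clinton” at all; it scores well above one half purely because of the company its words keep. Nothing in the code is specific to English either, which is the point Lynch makes about learning from the data rather than programming rules.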

The approach is also language independent. In French documents relating to drinking wine, vin and boire will appear with the same relational frequency as wine and drink in English wine-drinking documents. This last feature means that you can pose a request in one language and the software will respond by identifying documents written in a second language.

Clearly, given the nature of probabilities, this approach isn’t going to work all the time, but Lynch sees this as a challenge rather than an insurmountable problem. ‘The issue,’ he says, ‘is to work out the most commercially useful level of performance. Imagine a company with lots of emails coming into a support desk. If you attempt to program a computer to answer all these emails automatically, then you’ll end up with lots of angry customers. It’s just too difficult a task for a computer to handle. But, typically, 80% of the questions will relate to the same common problem areas, so you could reasonably expect a computer to identify the questions relevant to, say, 70% of the emails, leaving the remaining 30% to be handled by human operators. That’s a commercially very useful system, because the perceived quality to the end customer remains the same, but you’ve freed up 70% of your support desk staff to go and do something more productive.’

Getting started

Autonomy is Lynch’s second business venture. In 1991, while still a post-doctoral research assistant in the Signal Processing and Communications Research Group within the Cambridge University Engineering Department, he co-founded Neurodynamics. Autonomy was spun out of the text analysis division of Neurodynamics in 1996. Both companies are addressing aspects of the same generic problem: the ability to understand from a pattern that you see–measured in the real world of noise and signal distortion–what that pattern represents.

Lynch has a nice illustration of the role of noise and distortion from an early Neurodynamics project on fingerprint recognition, where dirt on the burglar’s finger provides the noise and the compression of his finger tips as he presses on a window sill provides the distortion. ‘An example where the technology was able to show clear differentiable advantages,’ according to Lynch.

His explanation for starting a company at such a relatively young age is disarmingly frank: ‘Simple naivety–a real advantage at the time because it meant I didn’t realise all the problems. Now I can reel off 30 reasons why an idea shouldn’t be pursued. In those days I didn’t know any better, so I did it anyway.’

Money was a problem. At that time, seed venture capital for technology companies was virtually non-existent, while the banks were friendly but unhelpful. ‘They seemed,’ says Lynch, ‘unable to conceive that somebody like myself might be able to start a successful company. I failed all the relevant credit tests. I was too young, with no assets, while my ideas on the commercial exploitation of pattern matching were seen as obscure and utterly incomprehensible. I was a total non-starter.’

In the end a business angel, ‘an English eccentric’ according to Lynch, put up £2000 to get the company off the ground.

When the going was good

Autonomy is interesting because of its technology. It’s also a fascinating illustration of how the value of companies has been rocked by the late 1990s’ boom in technology stocks.

The company was launched on the Brussels-based EASDAQ in July 1998 with a valuation of around £20 million. It was listed on NASDAQ in May 2000 and the London Stock Exchange in September 2000. The shares subsequently peaked at over £41 in November 2000, valuing Autonomy at comfortably over £5 billion. At the time, Lynch held some 19% of the company’s stock, earning, for a while, the misleading title of Europe’s first Internet billionaire. Thereafter, as the bubble in technology stocks burst with a vengeance, the stock declined sharply, so that by April 2001 the company’s valuation had fallen to £629 million. Lynch is very circumspect when it comes to talking about the vagaries of the stock market, having learnt fairly early on ‘never to comment on the company’s valuation’.

But he does have one nice story from the height of the boom. On a flight to San Francisco he found himself admiring his plane’s massive Rolls-Royce engines, and suddenly realising that Autonomy’s market capitalisation was greater than the whole of British Airways. ‘It seemed,’ he admits, ‘expensive to me’.

Understandably, he’s become very wary of putting too much credence on market valuations. ‘Valuing technology companies is very difficult,’ he argues. ‘Most of the company’s value is in the future, and the whole business is sentiment driven. During the early 2001 bear market, when the share price fell by 80%, the company was actually hitting all its predicted financial targets. For its age, Autonomy retains one of the largest market capitalisations of any software company in the world. We’re also one of the very few that are profitable, and we’ve got cash. We’re a serious business with a solid technological foundation; not some here-today-gone-tomorrow dotcom venture.

‘People say to me “technology was a bubble”, and it’s true. But it’s what’s come out of the bubble that’s important. For the first time in the UK, we’ve created world-class companies from start-ups, like the ARMs and Autonomys, that are profitable and aren’t going to go away.’

Keeping going

Early in 2001, Lynch sold just over 1% of his stake in Autonomy. He raised over £47 million from this sale, and by any reasonable reckoning is very rich. I ask why, with so large a personal fortune, he carries on working, and get the answer I deserve. ‘That’s a question that I’d only get asked in the UK. In the US, nobody would assume that success would make you stop doing something you’re interested in and believe in. But in the UK, there’s this narrow-minded obsession with money. It’s completely counter to what you’d expect. OK, in terms of making money I’ve been very fortunate, but that doesn’t mean I want to stop what I’m doing.

‘Pattern recognition is one of the most exciting technologies at present. During the 1960s and 1970s we went through a phase when everybody thought they could do these things, only to learn the hard way just how difficult it can be. But if you start on the basis that the problems are very difficult, you’re in a much better position to start and understand how they can be solved. Now, by using probability and learning from the world, we’re starting to make real progress. We mustn’t get carried away. As one of my friends remarked: “Reaching the intelligence of a sea slug isn’t something we should get too excited about”. But it’s a great time, a great problem to be working on, and there’s lots more to do.’

What to infer

Suppose, for the sake of argument, we want to know whether the use of mobile phones increases the likelihood of getting cancer. There’s a body of data detailing cancer rates among mobile phone users, and we ask, given this data, what is the probability that mobile phones cause cancer. In terms of conditional probability, what’s P(mobile phones cause cancer|data)? If this probability is high, then we’re likely to accept the hypothesis that mobile phones cause cancer; otherwise we’ll reject it. At least that’s what you do if you’re a Bayesian. But there’s a problem. Look at the right-hand side of Bayes’s equation. To calculate P(mobile phones cause cancer|data) it would appear that we need to know, amongst other things, P(mobile phones cause cancer), and this is just what we’re trying to determine, so we seem to be going around in some sort of circle. No problem, say the Bayesians. Hypothesis testing is never done in a value-free vacuum, so just use some reasonable ‘prior’ value for P(mobile phones cause cancer). And this makes a sort of sense. If you think it unlikely that mobile phones cause cancer–your prior value is low–then the Bayesian approach will require commensurately stronger evidence before the hypothesis is accepted.
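The effect of the prior can be seen in a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical and the likelihood figures (the data being four times more probable if the hypothesis is true than if it is false) are invented, not taken from any study of mobile phones.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(hypothesis | data) from Bayes's equation.

    P(data) is expanded over the two cases: hypothesis true, hypothesis false.
    """
    p_data = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / p_data

# Same (invented) evidence, two different prior beliefs.
open_minded = posterior(prior=0.5, p_data_given_h=0.8, p_data_given_not_h=0.2)
sceptical = posterior(prior=0.01, p_data_given_h=0.8, p_data_given_not_h=0.2)
print(f"open-minded: {open_minded:.3f}, sceptical: {sceptical:.3f}")
```

With an even-handed prior the evidence lifts the probability to 0.8; with a sceptical prior of 1% the same evidence leaves it at roughly 0.04, so far stronger data would be needed before the hypothesis looked credible.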

Such subjective laxity is anathema to the frequentists. They start by assuming that the hypothesis is false, i.e. mobile phones do not cause cancer, and then determine the probability, given this assumption, of obtaining the data on cancer rates. If this probability is less than some specified level, generally 5%, then the so-called null hypothesis is rejected, and a positive link between mobile phones and cancer will be assumed. This approach–which features in every standard textbook on statistical inference–avoids the need for an explicit subjective prior value, but is decidedly unintuitive and requires the selection of some, ultimately, arbitrary significance level (the 5%) at which the null hypothesis is rejected.
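A minimal sketch of the frequentist procedure follows, assuming the simplest possible model: each of a fixed number of phone users independently develops cancer at a known baseline rate. The baseline rate, the sample size and the case counts are invented for illustration, and the function name is hypothetical. Textbook tests work with the probability of data at least as extreme as that observed, which is what the function computes.

```python
from math import comb

def binomial_p_value(cases, users, baseline_rate):
    """One-sided p-value: P(at least `cases` cases | null hypothesis is true)."""
    return sum(
        comb(users, k) * baseline_rate**k * (1.0 - baseline_rate) ** (users - k)
        for k in range(cases, users + 1)
    )

SIGNIFICANCE_LEVEL = 0.05  # the conventional, ultimately arbitrary, 5%

# Invented figures: 1000 users and a 1% baseline rate, so about 10 cases expected.
for observed in (12, 18):
    p = binomial_p_value(observed, users=1000, baseline_rate=0.01)
    verdict = "reject" if p < SIGNIFICANCE_LEVEL else "do not reject"
    print(f"{observed} cases: p = {p:.3f}, {verdict} the null hypothesis")
```

Twelve cases is unremarkable under the null hypothesis, whereas eighteen falls below the 5% threshold. Note that no prior belief about the hypothesis appears anywhere in the calculation.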

However, the real objection to the frequentist approach is, according to the Bayesians, that such methods ‘routinely exaggerate the real significance of implausible data’ (see http://ourworld.compuserve.com/homepages/rajm/twooesef.htm for a highly lucid explanation of this issue). Given the role of such techniques in the likes of drug testing and environmental impact assessment, the central importance of the Bayesian/frequentist controversy becomes apparent.

Engineering information

In the broader engineering domain, Bayesian methods offer a solution to the long-running problem of representing often imprecise human knowledge and experience within the either-or world of the digital computer. Stephen Roberts, a Reader in information engineering and the head of the Pattern Analysis and Machine Learning Group within Oxford University’s Engineering Science Department, is an enthusiastic advocate of Bayesian methods. ‘The key thing,’ he says, ‘is not the issue of subjectivity or objectivity, but that the Bayesian approach offers a mathematically principled way of taking into account the uncertainty in everything we do. And, when you do this, you will, in general, end up with better results.’

Bayesians and Frequentists

It may be hard to credit, but the world’s statisticians are split by a deep ideological divide. In 1763, the Royal Society published an article entitled ‘An Essay Towards Solving a Problem in the Doctrine of Chances’ by the Reverend Thomas Bayes (1702–1761). The article had been found among Bayes’s papers after his death, and published posthumously. In it Bayes derived his famous equation about conditional probability: P(A|B) = P(B|A) × P(A) / P(B). In other words, the conditional probability of some event A occurring given that event B has occurred (e.g. the probability that a patient has chickenpox given that they have spots), is equal to the conditional probability of B given A, multiplied by the probability of A and divided by the probability of B.
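For readers who want to see where the equation comes from, it follows in two lines from the definition of conditional probability, assuming P(B) is not zero. The chickenpox figures in the last line are invented purely to show the arithmetic: 2% of patients have chickenpox, 6% have spots, and 90% of chickenpox patients have spots.

```latex
\begin{align*}
  % The joint probability of A and B can be written in two ways
  P(A \mid B)\,P(B) &= P(A \cap B) = P(B \mid A)\,P(A) \\
  % Dividing through by P(B) gives Bayes's equation
  P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)} \\
  % Worked example with illustrative figures: A = chickenpox, B = spots
  P(\text{chickenpox} \mid \text{spots}) &= \frac{0.9 \times 0.02}{0.06} = 0.3
\end{align*}
```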

At one level Bayes’s equation is little more than an axiomatic statement about conditional probabilities, providing the perfect basis for answering all those awful textbook questions about the probabilities of selecting different coloured balls from various bags. The real controversy arises when this seemingly innocuous equation starts to be applied to real world problems, and the issue of what does or does not constitute an acceptable equation variable–the As and Bs.

In the idealised example of coloured balls in bags, the various probabilities can be inferred directly–just as we can say that the probability of six when we throw a fair dice is 1/6. In the real world, however, the sort of probabilities we’re interested in–the probability of a component failing, the probability of surviving cancer–cannot be inferred; they have to be measured by identifying the frequency of the event of interest in a suitably large population.

This is all well and good provided we are able to record information about many instances of the event in question. However, there are many interesting uncertain events that do not have this repeatability characteristic because they’re essentially one-offs: for example, Manchester United winning the FA Cup this season, London hosting the 2012 Olympic Games, or a newly installed control system functioning correctly. The list is endless. We can’t infer the probabilities of such events by prior reasoning or measure them by repeated experiments, but most people would probably be willing to have a go at suggesting likely figures for the first two examples, while the control systems designer would be able to make an informed stab at the likelihood of trouble-free operation.

The great divide in statistics is between those who accept the use of such likelihood values–the Bayesians–and those who don’t–the frequentists. This is much more than some sterile academic argument. The use of likelihood values opens the way to an, arguably, superior approach to the cardinal problem of statistical inference, whilst providing a sound intellectual foundation for the emergent discipline of information engineering.
