Monday, March 03, 2014

Is crowdsourcing like DNA?

In the mostly excellent new JISC Infokit on Crowdsourcing, Martin Poulter provides an analogy in which computers do things sequentially and get them correct each and every time, whereas in biological systems there is a level of error, but one that is constant and self-correcting. I am unhappy with much of this analogy and, in the spirit of his crowdsourcing, would like to explain my problems with it.

I need to include a lengthy quote before explaining those problems:
"Computer 1 works in sequence through its list of instructions. If it has to add two numbers, it grabs the two numbers from wherever they are stored, puts them through its adding machine, sends the answer to be stored and then clears its workspace to prepare for the next instruction. It only begins a step once the previous one is complete.

"Computer 2 is completely different. Parts of Computer 2 are constantly breaking off and other things are constantly sticking to it. It thus exists in a state of near-equilibrium with its environment. However, it makes progress because the equilibrium is not exact. When part of its structure corresponds to part of the correct solution to its problem, Computer 2 becomes a little bit more stable. So over time it grows bigger and more complete, even though from minute to minute it is rapidly changing in a way that seems chaotic. Just under half the steps in the development of Computer 2 are reversals of previous steps.

"Another difference with Computer 2 is that a constant proportion of its steps give incorrect answers. Asked for 1+1 it will most of the time say 2, but every now and again give 3 or 4. This is not as disastrous as it sounds, because the answers are so frequently erased and replaced, and correct answers are more stable and more likely to become part of the long-term structure. Still, there is no guarantee that every instruction is carried out correctly.

"Here is another difference: Computer 2 is more energy-efficient by a factor of a hundred or a thousand, according to the physicist Richard Feynman. What’s more, these two machines are things we encounter all the time. Computer 1 is a microprocessor of the type we have in our computers, our phones, our cars, and ever more everyday objects. Computer 2 is DNA.

"The long-term sustainability of DNA is not in question. Microprocessors had to be brought into existence and need constant external power because of their relative inefficiency. By contrast, DNA just happened when certain molecules came together. DNA does not need to be plugged into the wall. Given enough time and the opportunity to make lots of mistakes, DNA has made things that seem highly designed for their environment, all without any kind of forward planning."

.....

"We could say that the DNA approach works to create things that are organic. In practice, organic means:
  • Modular: a failure of a part does not mean a failure of the whole
  • Visible in quality: it is possible to evaluate the quality of a part, independently from the whole."
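Before getting to my objections, the quoted mechanism is concrete enough to simulate. Here is a minimal sketch, my own illustration and not from the Infokit, of a "Computer 2"-style process: characters attach and detach at random, but correct ones are far more stable, so the answer accretes out of the noise. The target string and the attachment and detachment probabilities are invented for the example (the phrase is Dawkins's "weasel", a nod to the subject matter).

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def step(state):
    """One near-equilibrium step: every site may gain or lose a character."""
    for i, char in enumerate(state):
        if char is None:
            # An empty site sometimes captures a random character.
            if random.random() < 0.5:
                state[i] = random.choice(ALPHABET)
        else:
            # Correct characters are far more stable than incorrect ones,
            # so roughly half of all attachments are later reversed.
            p_detach = 0.001 if char == TARGET[i] else 0.5
            if random.random() < p_detach:
                state[i] = None

state = [None] * len(TARGET)
steps = 0
while state != list(TARGET):
    step(state)
    steps += 1
print("Assembled %r in %d noisy steps" % ("".join(state), steps))
```

Most steps are erased and redone, yet the correct structure emerges: that much of the quote is fair. My objections are to what the analogy is attached to.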
I don't have any problems with the description of Computer 1, at least for individual processors. Once you get into parallel processing it begins to break down, but let's not worry about that.

My real problem comes with Computer 2. First, there is a confusion between three things: DNA, genes (a stretch of DNA that codes for one thing) and phenotype (loosely, the expression of the gene). DNA is not a computer; DNA is, if anything, analogous to a computer programme. It encodes what needs to be done. It tells the machine (which you could consider to be the ribosome, the cell or the individual) what to do, and then the ribosome "reads" the DNA (strictly, a messenger-RNA copy of it) sequentially and produces the requisite protein. Like a hugely parallel computer, there are many processors all working at the same time, each reading a part of the DNA programme and producing output (in the form of proteins). Protein copying is pretty accurate, though there are errors, and there are many processes in cells to handle quality control. The place where inaccuracy may occur is in copying the DNA (duplicating the programme).
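To make my version of the analogy concrete: the DNA (via its mRNA copy) is the programme, and each ribosome is a processor stepping through it one codon (three bases) at a time. The sketch below uses a tiny subset of the standard genetic code, and the message is an invented example.

```python
# The mRNA is the programme; translate() plays the ribosome, reading it
# sequentially, one codon at a time, and emitting a protein.
CODON_TABLE = {
    "AUG": "Met",  # also the start codon
    "UUU": "Phe", "GGC": "Gly", "UGG": "Trp", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}

def translate(mrna):
    """Read the mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the finished protein
            break
        protein.append(amino_acid)
    return protein

# Many ribosomes read the same message at once (a polysome), like a
# parallel computer running many copies of one programme.
message = "AUGUUUGGCUGGAAAUAA"
print(translate(message))  # ['Met', 'Phe', 'Gly', 'Trp', 'Lys']
```

Note that the reading itself is sequential and largely deterministic, which is exactly why I think the "Computer 2" label fits the DNA so poorly.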

I simply do not understand what is meant by "Computer 2 is more energy-efficient by a factor of a hundred or a thousand". What are we measuring? What is Computer 2? Assuredly, it cannot be DNA. DNA does not compute any more than a programme computes. Programmes tell the computer what to do; DNA tells the cell what to do. We could argue that the ribosome is the computer (it is where the mRNA, the messenger RNA derived from the DNA, is actually processed). But in order to work, ribosomes need the whole organism to provide the energy and nutrients they require.

As to the sustainability issue, yes, DNA emerged (personally, I like the ideas of Nick Lane in Life Ascending: The Ten Great Inventions of Evolution for how life arose). However, if DNA is just like a programme, then note that a programme can exist perfectly well on a CD or pen drive without being plugged into the wall. Equally, a virus's RNA, or the DNA in a spore or in many eggs, can sit dormant without an energy source. But in both cases nothing happens until energy is added. In the case of a computer this will be electricity (a battery, being plugged into the wall, or even just solar power); for a living being it will be stored energy (equivalent to a battery) as in an egg, consumed energy from eating something (equivalent to being plugged into the wall), or solar energy in plants.

Where the description goes completely wrong is in the last part of the quote. Selection occurs not at the gene (let alone the base) level, but at the individual level. If an organism is sub-optimal then it will be out-competed by others and leave fewer (or no) offspring. A failure of the part thus means a failure of the whole. Equally, it is not possible to evaluate the quality of a part independently of the whole. Selection acts on the individual, not the gene.
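A toy simulation makes the point. If offspring are drawn in proportion to whole-organism fitness, a single lethal gene eliminates every good gene that shares its genome; the genomes and fitness values below are invented purely for illustration.

```python
import random

def fitness(genome):
    """Multiplicative fitness: one lethal gene (0.0) sinks the whole genome."""
    f = 1.0
    for gene_value in genome:
        f *= gene_value
    return f

population = [
    [1.0, 1.0, 0.0],   # two excellent genes alongside one lethal one
    [0.8, 0.8, 0.8],   # uniformly mediocre genes
]

# Offspring are sampled in proportion to whole-organism fitness, not per gene.
weights = [fitness(g) for g in population]
offspring = random.choices(population, weights=weights, k=10)
print(offspring.count(population[0]))  # 0: the lethal gene took the good ones with it
```

The two excellent genes never get evaluated "independently from the whole"; they disappear along with the organism that carried them.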