Disclaimer: The opinions of the columnists are their own and not necessarily those of their employer.
C. Warren Axelrod

Can We Prevent Knight Capital Types of Debacle?

… or are we destined for such disasters to be repeated with increasing frequency?

From reports as to why new software, installed by securities firm Knight Capital before the New York Stock Exchange opened for business on August 1, 2012, went berserk, it can be construed that the firm does in fact test its software rigorously before releasing it.

Ironically, in this case, it might well have been a testing module, in the form of a transaction generator, that caused the problem. According to an August 3, 2012 piece by Nanex titled “The Knightmare Explained” (see www.nanex.net/aqck2/3525.html), the deluge of erroneous trades released into the marketplace on the morning of August 1, 2012 occurred because a testing module had been left in the deployed software when it should have been removed. The testing software immediately began generating huge numbers of transactions without notifying operators as to what it was doing. This caused Knight Capital to incur some $440 million in pre-tax trading losses in less than an hour. The consequential costs will surely be many times that number when you factor in losses in the market value of the company, management oversight by new owners, increased surveillance by regulators, and burdensome new testing requirements.

While many, including me, had thought that the disaster might have been caused by a programming error, The Nanex Group’s explanation is more plausible and, if found to be accurate, reveals a very ironic situation. That a module used to generate test trades could be the cause of disastrous trading activity puts the event in the category of “no good deed goes unpunished.” It appears that the rogue module existed precisely because Knight Capital was performing rigorous testing; had the firm not been using sophisticated testing tools, the inadvertent inclusion of the testing module in the deployed software would not have occurred in the first place. Of course, if Knight Capital had not used such testing tools, it would have been more likely that erroneous program code would have been released.

A procedural error, which may have been the cause of this mishap, is virtually impossible to catch through functional testing of the software programs. Even if extensive “functional security testing,” as I recommended in a previous column, “Glitch Reporting Glitch … Where was V&V?”, had been invoked, it still might not have caught the procedural problem that is being blamed for the mishap. What was probably needed were much more rigorous software deployment procedures.
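To make that suggestion a little more concrete, here is a minimal sketch, in Python, of a pre-deployment gate that refuses to ship a build containing test-only code. Everything here is an assumption for illustration: the release directory, the naming patterns, and the premise that test modules are identifiable by name. It is emphatically not a description of Knight Capital’s actual pipeline.

```python
#!/usr/bin/env python3
"""Pre-deployment gate: block any release build that contains test-only modules.

Illustrative sketch only. The directory layout and naming conventions below
are assumptions, not Knight Capital's actual practice.
"""
import pathlib
import sys

RELEASE_DIR = pathlib.Path("build/release")          # hypothetical staging area

# Filename patterns that mark test-only code, such as a transaction generator.
FORBIDDEN_PATTERNS = ["*test*", "*_stub*", "*generator*"]

def find_test_artifacts(root: pathlib.Path) -> list:
    """Return every file in the release tree matching a forbidden pattern."""
    hits = []
    for pattern in FORBIDDEN_PATTERNS:
        hits.extend(root.rglob(pattern))
    return sorted(set(hits))

if __name__ == "__main__":
    if not RELEASE_DIR.is_dir():
        sys.exit(f"Release directory {RELEASE_DIR} not found.")
    offenders = find_test_artifacts(RELEASE_DIR)
    if offenders:
        for path in offenders:
            print(f"BLOCKED: test artifact in release build: {path}", file=sys.stderr)
        sys.exit(1)                                   # non-zero exit halts the pipeline
    print("Release tree clean: no test-only modules detected.")
```

A check like this is crude, but the point stands: a human checklist item (“remove the test generator”) becomes a machine-enforced gate that cannot be skipped on a busy deployment morning.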

On the other hand, if functional security testing had been used, one of the scenarios might have been: “If a tsunami of transaction activity is unleashed, how does the system prevent runaway trading? And does the system alert operators as to what is going on?” Here, the cause might not have been specified, but the answer might have suggested putting additional “circuit breakers” and “alerts” into the programs, assuming that Knight Capital already had some form of risk engine incorporated into its system.
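As a minimal sketch of what such a circuit breaker and alert might look like, assume a simple per-second order-rate threshold; the threshold value, the one-second window, and the alerting hook are illustrative assumptions, not features of any real risk engine.

```python
import time

class TradingCircuitBreaker:
    """Halt order flow and alert operators when activity exceeds a threshold.

    A sketch of the "circuit breaker plus alert" idea; the one-second
    window and default limit are arbitrary assumptions for illustration.
    """

    def __init__(self, max_orders_per_sec: int = 1000):
        self.max_orders_per_sec = max_orders_per_sec
        self.window_start = time.monotonic()
        self.count = 0
        self.tripped = False

    def record_order(self) -> bool:
        """Count one outgoing order; return False once trading must halt."""
        now = time.monotonic()
        if now - self.window_start >= 1.0:        # roll over to a new window
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count > self.max_orders_per_sec and not self.tripped:
            self.tripped = True                   # stays tripped until manual reset
            self.alert_operators()
        return not self.tripped

    def alert_operators(self):
        # In production this would page the operations desk, not just print.
        print(f"ALERT: order rate exceeded {self.max_orders_per_sec}/sec; halting")

# Usage: gate every outbound order on the breaker.
# if breaker.record_order():
#     send_to_market(order)        # send_to_market is hypothetical
```

The essential property is that the breaker both stops the flow and tells a human why, addressing the two failures reported at Knight Capital: runaway trading and silent operation.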

Unfortunately, “fully” testing software is not achievable. As I stated in my May 25, 2010 column, “Bungee Jumps, Stock Markets and Negative Testing,” functional security testing requires orders of magnitude more time and effort than regular functional testing. My recommendation is to subject the software to a relatively small, randomly selected subset of tests and to increase the sample size if the statistical results suggest doing so.
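A rough sketch of that adaptive sampling idea follows; the initial sample size, the cap, and the failure tolerance are arbitrary assumptions, and each test is modeled as a callable returning pass or fail.

```python
import random

def sample_and_run(test_suite, initial_n=50, max_n=1000, fail_tolerance=0.02):
    """Run a random subset of tests, enlarging the sample while failures persist.

    test_suite: a list of zero-argument callables returning True on pass.
    Returns the failing tests and the final sample size used.
    """
    cap = min(max_n, len(test_suite))
    n = min(initial_n, cap)
    while True:
        batch = random.sample(test_suite, n)
        failures = [t for t in batch if not t()]
        # Stop when the failure rate is tolerable or the sample cannot grow.
        if len(failures) / n <= fail_tolerance or n >= cap:
            return failures, n
        n = min(n * 2, cap)      # the statistics suggest sampling more deeply
```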

I have also called out the need to build much more instrumentation into applications so that you can better know what is happening within the system (see my November 22, 2010 column “Old Mother Hubbard and ‘Building Data Collection In.’”). Inadequate data reporting was apparently a deficiency of the Knight Capital system.
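As a sketch of what “building data collection in” might look like in practice, consider a small metrics facility like the one below; the event names and the one-second reporting interval are assumptions for illustration.

```python
import collections
import threading
import time

class Instrumentation:
    """Count key application events and report them on a fixed interval.

    A sketch of built-in data collection; in production the snapshot would
    feed a dashboard or alerting system rather than standard output.
    """

    def __init__(self, interval_sec: float = 1.0):
        self.counters = collections.Counter()
        self.lock = threading.Lock()
        self.interval = interval_sec
        threading.Thread(target=self._report_loop, daemon=True).start()

    def incr(self, event: str, n: int = 1):
        """Record that an event (e.g. "orders_sent") occurred n times."""
        with self.lock:
            self.counters[event] += n

    def _report_loop(self):
        while True:
            time.sleep(self.interval)
            with self.lock:
                snapshot = dict(self.counters)
                self.counters.clear()
            if snapshot:
                print(f"[metrics] {snapshot}")
```

With something this simple in place, a sudden spike in “orders_sent” would have been visible to operators within seconds, not discovered after the losses had mounted.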

The value of second-guessing the Knight Capital situation is questionable. It would be better to wait until a full forensic investigation is completed and the results published (if indeed they are made public). Many reporters jumped to the conclusion that some error in the program code was at fault. However, it is quite possible that this explanation will prove to be false and that a procedural error was the culprit. If that were to be the case, however, it still would not detract from the need to improve and expand testing. It also means that some out-of-the-box thinking is needed, as described in my August 30, 2010 column, “Eureka! Professor Does FST (Functional Security Testing).”

Increasingly, we are deploying high-integrity, high-availability, mission-critical software systems into unforgiving environments where meltdown can occur in seconds. It is clear from the Knight Capital case and other recent software-generated debacles that the traditional approaches to testing software programs and reviewing processes do not do the job. We need a quantum jump in these efforts if we are not to be subjected to increasingly frequent malfunctions and failures.