Stress Testing


I am often asked about stress tests and the suitability of this framework for creating them as xp style acceptance tests. In my personal experience I have considered two such tests and implemented one in the context of fit style tests. See FrameworkHistory.

  • Simultaneous Users -- Fit style tests were used to simulate the activity of simultaneous users. A special variation of the test runner was constructed that would launch runs as separate processes, all running against the same application server. To support this, the fit style framework was modified to randomize specific values so that the runs were not in direct conflict over, say, customer numbers. The reports were also modified to record the wall clock time of each event so that any errors could be correlated.

  • Network Behavior -- A fit style test was designed to test reliable transmission of multimedia files over unreliable networks. The unreliable network was simulated using a piece of equipment designed for that purpose. The equipment offered remote api access to a dozen or so parameters such as latency and packet loss rates. Fit tests would be written that specified the expected behavior of the system under various conditions. The fixture interpreting these tests would then perform a statistically significant number of tries and check the measured performance against those specifications (a sketch follows).
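For concreteness, here is a rough sketch of what such a fixture could look like in Java fit. The NetworkEmulator and FileSender classes, and all the column names, are hypothetical stand-ins for the real equipment's remote api and the system under test; treat this as an illustration of the shape, not the actual fixture.

	import fit.ColumnFixture;

	// Sketch of a column fixture for network stress specifications.
	// NetworkEmulator and FileSender are hypothetical stand-ins for the
	// emulator's remote api and the transmission code under test.
	public class TransmissionFixture extends ColumnFixture {
	    public int latencyMillis;          // given: emulator setting
	    public double lossRate;            // given: emulator setting
	    public int tries;                  // given: sample size
	    public double requiredSuccessRate; // given: the specification

	    // calculated column: run the trials, report whether the spec was met
	    public boolean meetsSpec() throws Exception {
	        NetworkEmulator emulator = new NetworkEmulator("emulator-host");
	        emulator.setLatency(latencyMillis);
	        emulator.setLossRate(lossRate);
	        int succeeded = 0;
	        for (int i = 0; i < tries; i++) {
	            if (FileSender.send("sample.media")) succeeded++;
	        }
	        return (double) succeeded / tries >= requiredSuccessRate;
	    }
	}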

In both cases the successful running of the tests would be seen as evidence of the completion of stories. Both setups were (or would have been) sufficiently general to test a variety of stress related stories appearing in multiple iterations.

An interesting aside: both setups required more equipment to run than is usually associated with the build machine. There is a tendency to think of this equipment as underutilized if it is used only during builds, and then only to rerun tests that rarely fail. Such thinking is, of course, false economy compared with the cost of a mistake going unnoticed.


From an email post elaborating on the simultaneous users case ...

There are probably as many useful configurations as there are system architectures to be tested. That is what makes it hard to see the ultimate solution. Also, this doesn't have to be complex to be useful. The experience that is vivid in my mind was oh so simple. It went something like this ...

  • Working on a three-tier application, we wrote our fixtures to work against the same server-facing communication utilities that the client application used. That made our tests appear to the application server as just another user.

  • We had a suite of tests that passed. So our first test was to see if they would pass over and over, which we did by writing a loop in a batch file. We also monitored memory usage in the application server, which was hard to correlate with the persistent (but single user) activity. After some study we decided that it was actually ok.

  • Then we changed the batch script to launch sub-shells running copies of the batch runner. This failed with two copies running because the tests went for the same customer and one was properly locked out. That is when we added the randomizing feature. It was something simple like $customer1 and $customer2 being changed to randomly generated customer numbers. The generated numbers were remembered in a hash because they appeared several times in the tests. (A sketch of this substitution follows the list.)

  • We made the batch script take a number as an argument, and it would launch that many copies (a launcher sketch also follows the list). We found that we could get errors with two or three copies running at once. These were errors of the "this can't happen" form where the app server was talking to the db. That's when we added clock time to every step of our action fixtures (see fit.TimedActionFixture) so that we could correlate test steps with each other and with the log produced by the database. (Check your clock sync before you try this.)

  • We found that we could reasonably run up to ten copies of the runner on a single client machine. One reason we did this with batch files was so that we could recruit lots of machines around the office and run the script simultaneously from all of them. I don't remember ever actually doing this because most of our bugs were found running two or three copies, not the bigger number. Also, the client machine didn't have to work very hard to make the app server and db very busy; it was saving its effort for the ui code, which we were bypassing.
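Here is a minimal sketch of the randomizing substitution in Java. It assumes the runner gets a chance to filter each cell's text before the fixtures see it; the class name and token pattern are invented for illustration, since the original feature was built directly into our modified runner.

	import java.util.HashMap;
	import java.util.Map;
	import java.util.Random;
	import java.util.regex.Matcher;
	import java.util.regex.Pattern;

	// Sketch: replace tokens like $customer1 with random customer numbers,
	// remembering each one so repeated mentions agree within a run.
	public class RandomizingSubstitution {
	    private static final Pattern TOKEN = Pattern.compile("\\$customer\\d+");
	    private final Map<String, String> generated = new HashMap<String, String>();
	    private final Random random = new Random();

	    public String substitute(String cellText) {
	        StringBuffer result = new StringBuffer();
	        Matcher m = TOKEN.matcher(cellText);
	        while (m.find()) {
	            String value = generated.get(m.group());
	            if (value == null) {
	                value = String.valueOf(100000 + random.nextInt(900000));
	                generated.put(m.group(), value);
	            }
	            m.appendReplacement(result, value);
	        }
	        m.appendTail(result);
	        return result.toString();
	    }
	}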
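And a rough Java equivalent of the batch script itself, launching the requested number of runner processes against the same server. The runner class and file names here are placeholders for whatever a real setup would use.

	import java.util.ArrayList;
	import java.util.List;

	// Sketch: launch N copies of the test runner as separate processes.
	// fit.FileRunner, tests.html, and results*.html are placeholders.
	public class LaunchCopies {
	    public static void main(String[] args) throws Exception {
	        int copies = Integer.parseInt(args[0]);
	        List<Process> processes = new ArrayList<Process>();
	        for (int i = 0; i < copies; i++) {
	            ProcessBuilder builder = new ProcessBuilder(
	                "java", "fit.FileRunner", "tests.html", "results" + i + ".html");
	            builder.inheritIO(); // let each copy's chatter show, like the dos windows
	            processes.add(builder.start());
	        }
	        for (Process p : processes) p.waitFor(); // wait for all copies to finish
	    }
	}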

I'll do this all over again if the situation arises. The batch script caused n of those ugly black dos windows to open on the client machine. This sounds unsophisticated, but it did give us something to look at while this thing ran. And we were watching the app server too. I proposed writing something more general than the $customer1 substitution. I suggested scanning all string data for # signs and replacing them with random digits. For example,

	phone number: 1-503-246-####

I thought this would be easy and intuitive. My pair at the moment argued for something simpler, which proved adequate to our needs without consuming one whole character from the alphabet.
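Had we built it, the scheme might have coded out something like this sketch (the class name is invented; remember, this was proposed but never implemented):

	import java.util.Random;

	// Sketch of the proposed scheme: scan string data for # signs
	// and replace each one with a random digit.
	public class DigitSubstitution {
	    private static final Random random = new Random();

	    public static String fillDigits(String text) {
	        StringBuilder result = new StringBuilder(text.length());
	        for (char c : text.toCharArray()) {
	            result.append(c == '#' ? (char) ('0' + random.nextInt(10)) : c);
	        }
	        return result.toString();
	    }
	}

So fillDigits("1-503-246-####") would come back as something like 1-503-246-0382, different on every run.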

I'd like to see what it takes to do the same thing as the batch script with threads in a single vm. I don't think it would code out much differently in a LoadRunner or a LoadFixture. The same problems would be present, mostly reporting the right things in a sensible way. A LoadRunner could, I suppose, include some fancy interactive graphics that wouldn't be easy in a fixture. On the other hand, I've had some good experience with the AllFiles fixture regarding selectively reporting failures only as footnotes. Be warned: the fixtures destructively modify the Parse, so it should be cloned for each thread. A minimal sketch follows.
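Here is a minimal sketch of that threads-in-one-vm idea, heeding the warning above by giving each thread its own Parse. There is no clone method on fit.Parse, so the simplest safe move is to re-parse the html text once per thread:

	import fit.Fixture;
	import fit.Parse;
	import java.nio.file.Files;
	import java.nio.file.Paths;

	// Sketch: run the same fit document on several threads in one vm.
	// Each thread re-parses the html so no two threads share a Parse,
	// since the fixtures modify the parse tree destructively.
	public class ThreadedRunner {
	    public static void main(String[] args) throws Exception {
	        final String html = new String(Files.readAllBytes(Paths.get(args[0])));
	        int copies = Integer.parseInt(args[1]);
	        Thread[] threads = new Thread[copies];
	        for (int i = 0; i < copies; i++) {
	            threads[i] = new Thread(new Runnable() {
	                public void run() {
	                    try {
	                        Parse tables = new Parse(html); // private copy per thread
	                        Fixture fixture = new Fixture();
	                        fixture.doTables(tables);
	                        System.out.println(fixture.counts); // right/wrong/ignored/exceptions
	                    } catch (Exception e) {
	                        e.printStackTrace();
	                    }
	                }
	            });
	            threads[i].start();
	        }
	        for (Thread t : threads) t.join();
	    }
	}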

 

Last edited May 4, 2003
Return to WelcomeVisitors