Performance Testing with Botium Box


Depending on how much load you want to generate, you will have to meet the hardware and software requirements for Botium Box Performance Testing.

What questions can be answered with a Performance test?

Functional tests answer a simple question: is it working or not? Performance tests simulate human users, so they can answer more subtle questions:

How can the chatbot deal with many users?

Before publishing a bot, it is good to know what its limits are and how it behaves beyond them. Is it possible to kill it? Does it recover after a heavy load?

A Stress test can help you answer this question.

Do we have some (memory) leak?

The conversation is stateless, so executing the same conversation on the same server should always take about the same time. If the chatbot starts a never-ending process, or does not release some resource like memory or background storage, then the system will get slower over time, or will die suddenly.

A Load test is the right tool for detecting this.

Are the conversations stateless?

Chatbots nowadays are mostly static: we get the same answer for the same question. So repeating a question is a good test. It is even possible that it works well the first time and returns an error the second time.

Every Performance test can detect this problem, because Performance tests by their nature repeat a convo.

Are the conversations thread safe?

Many problems can be detected by executing a conversation in parallel, for example deadlocks, wrong resource sharing, or incorrect synchronization.

A Performance test is not the ultimate tool for finding those mistakes. It can happen that everything goes well 10 times in parallel, and then the next time we get an error. The system is too complex to do exactly the same thing every time. So let's say Botium can detect such problems by accident, as a side effect.

Every Performance test can detect this problem, but a Stress test is a better fit.

Walkthrough - A First Load Test

You can think of a Performance test as a simple loop that repeats a conversation to simulate human behavior.

When you start a Performance test, you can choose Load test, Stress test, or Advanced mode. In Advanced mode you can set all parameters; in the other two modes only the required ones.

To start a Load test, you first have to choose a Test Project.


Choose the Performance test tab, and then Load test.

We don't want to wait half an hour, so we set the Test Duration to 5 minutes and start the test.

Below the start button, you can see the Tests that have already been started.

When you start the test, you are taken to the Result page. The Status will be Pending. Wait 5 minutes and press refresh.

The Status should now be READY, and the result is shown as three charts.

We don't know yet what we have done or what we are looking at, but the colors are nice.

Performance Test Parametrization

Usage Examples

Example 1 - How can we simulate a constant number of users, for example 5 users 7 times, using a Load test?

First you have to choose a Test Project with just one convo. If you chose a Test Project with two convos, you would emulate 10, 10, 10… users. That is not wrong, but it is not what we want. (You can use more convos if you want to emulate different conversations.)

Set Test Duration to 1 minute. Botium executes a step every 10 seconds, so with a duration of 1 minute it will execute 7 steps. (7 steps is the minimum.)

Set Test Set Multiplicator to 5. This means that in the first step Botium will execute our Test Project 5 times. We have our starting number! And even better, because a Load test works with constant load, that is all we need.

If we start the Test, and wait for 1 min, then the Total Convos Processed should be 35
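
The arithmetic behind this number can be sketched in a few lines of Python. This is only an illustration, not Botium code: the function names are made up, and the sketch assumes that the first step starts at second 0 and the last one exactly when the Test Duration elapses, which is how 1 minute with 10-second steps gives 7 steps.

```python
def step_count(duration_min: int, step_duration_sec: int = 10) -> int:
    """Number of test steps, assuming one step at t=0 and then one every
    step_duration_sec seconds up to and including the end of the duration."""
    return duration_min * 60 // step_duration_sec + 1


def total_convos_constant_load(convos_in_project: int,
                               multiplicator: int,
                               duration_min: int) -> int:
    """Total convos queued by a Load test (same load in every step)."""
    per_step = convos_in_project * multiplicator
    return per_step * step_count(duration_min)


if __name__ == "__main__":
    # 1 convo in the Test Project, Test Set Multiplicator 5, 1 minute
    print(step_count(1))
    print(total_convos_constant_load(1, 5, 1))
```

With one convo, a multiplicator of 5 and a duration of 1 minute, the sketch arrives at 7 steps and 35 convos, matching the expected Total Convos Processed.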

Example 2 - How can we simulate an increasing number of users, for example 1, 3, 5, 7, 9, 11, 13 in a row, using a Stress test?

First you have to choose a Test Project with just one convo, as in the Load test example.

Set Test Duration to 1 minute, as in the Load test example.

Set Test Set Multiplicator to 1. It is the starting number, just like in the Load test example.

If we then set Increase Test Set Multiplicator to 2, we will get all the user counts from 1 to 13. (The load of step i is calculated this way: ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator), where i is the step number starting at 1.)

Finally we have to set Maximum Parallel Users to 13. If Botium reaches this limit, it does not start a new convo until a running one has finished. This parameter protects the agent from overload, so set it with care!

If we start the Test, and wait for 1 min, then the Total Convos Processed should be 49
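
To check the expected total, the formula quoted above can be evaluated for each of the 7 steps. The following Python sketch does just that; the function names are hypothetical and exist only for this illustration.

```python
def convos_in_step(i: int, convos_in_project: int,
                   multiplicator: int, increase: int) -> int:
    """Convos queued in test step i (1-based), using the formula
    ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator)."""
    return convos_in_project * (multiplicator + (i - 1) * increase)


def stress_profile(steps: int, convos_in_project: int,
                   multiplicator: int, increase: int) -> list[int]:
    """Load of every step of a Stress test, first step first."""
    return [convos_in_step(i, convos_in_project, multiplicator, increase)
            for i in range(1, steps + 1)]


if __name__ == "__main__":
    # 1 convo, Test Set Multiplicator 1, Increase Test Set Multiplicator 2, 7 steps
    profile = stress_profile(7, 1, 1, 2)
    print(profile)
    print(sum(profile))
```

The per-step loads come out as 1, 3, 5, 7, 9, 11, 13, and their sum is the 49 convos mentioned above.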

Example 3 - How can we simulate a heavy load of 400 users per second for 5 minutes?

You have a test set with 1 test case, typically a test case simulating the greeting process. You want to simulate 400 users per second talking to your chatbot.

Set Test Duration to 5 (minutes).

This load can only be generated with multiple Botium Box Agents - make sure you have at least 2 Botium Box Agents installed and connected to Botium Box. Switch to the Advanced mode and set the Parallel Jobs Count to 2.

Now comes the math:

  • You have 2 parallel jobs running, so each job has to process roughly 200 test cases per second to reach the requested load of 400 per second

  • Each test step has a duration of 10 seconds, so you have to make sure to have 2000 test cases in the queue for each test step (10 * 200)

  • As the test set has exactly 1 test case, set the Test Set Multiplicator to 2000 to let Botium generate a load of 2000 test cases every 10 seconds. This is done in 2 jobs in parallel, leading to a total load of 4000 every 10 seconds, or 400 per second

  • To let Botium execute 400 test cases in parallel across 2 jobs, set the Parallel Convos Count to 200.
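
The same calculation can be written down as a small Python helper, which is handy if you want to try other target rates. It is a sketch under the assumptions of this example (every job queues the full load on each step, and the step duration is 10 seconds); the function names are invented for the illustration and are not part of Botium.

```python
import math


def multiplicator_for_rate(target_per_sec: float,
                           parallel_jobs: int,
                           step_duration_sec: int = 10,
                           convos_in_test_set: int = 1) -> int:
    """Test Set Multiplicator needed so that all jobs together queue
    roughly target_per_sec test cases per second."""
    per_job_per_sec = target_per_sec / parallel_jobs
    per_job_per_step = per_job_per_sec * step_duration_sec
    return math.ceil(per_job_per_step / convos_in_test_set)


def parallel_convos_per_job(total_parallel: int, parallel_jobs: int) -> int:
    """Parallel Convos Count per job for a desired total parallelism."""
    return math.ceil(total_parallel / parallel_jobs)


if __name__ == "__main__":
    print(multiplicator_for_rate(400, 2))    # multiplicator for 400/s on 2 jobs
    print(parallel_convos_per_job(400, 2))   # workers per job
```

For 400 test cases per second on 2 jobs this gives a multiplicator of 2000 and 200 parallel convos per job, the values used in the list above.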

Parameter Description

Test Duration

The number of minutes Botium generates the given load.

Test Duration only determines when the last test step is started, not when the test ends.

Test Step Duration

Botium generates the load in iterations called test steps. On each test step, Botium adds the given load to the processing queue, and the Botium Box Agents process it as fast as your Chatbot allows.

Load in this context actually means the convos to perform, that is, the convos that are part of the Test Set connected to the Test Project.

In Load and Stress tests a step is executed every 10 seconds. In Advanced mode you can set this interval yourself.

Test Set Multiplicator

On each test step, the content of the Test Set is added to the processing queue. If you want to add it multiple times on each test step, basically repeating the same convos over and over again, you can increase the test set multiplicator.

Increase Test Set Multiplicator

You can simulate increasing load over time by increasing the test set multiplicator.

The number of convos added in test step i is calculated this way: ConvoCountPerTestProject * (Multiplicator + (i - 1) * IncreaseMultiplicator), where i is the step number starting at 1.

Cancel on Failed Convos

Convos can fail in Botium for two reasons:

  • The chatbot returns a different text than expected, or nothing at all

  • Any of the asserters triggers a failure

You can decide to accept a certain amount of test case failures during the performance tests. If the percentage of failed convos is higher than the given percentage, the performance test is cancelled.
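
The rule can be pictured as a simple threshold check. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes the percentage is taken over the convos processed so far, which this page does not specify.

```python
def should_cancel(failed: int, processed: int,
                  max_failed_percent: float) -> bool:
    """True if the share of failed convos exceeds the accepted percentage.

    Assumption: the share is measured against the convos processed so far.
    """
    if processed == 0:
        return False  # nothing processed yet, nothing to judge
    # Compare failed/processed > max_failed_percent/100 without dividing
    return failed * 100 > max_failed_percent * processed


if __name__ == "__main__":
    print(should_cancel(3, 100, 5))  # 3 % failed, 5 % accepted
    print(should_cancel(6, 100, 5))  # 6 % failed, 5 % accepted
```

With 5 percent accepted, 3 failures out of 100 convos let the test continue, while 6 out of 100 cancel it.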

Parallel Convos Count / Maximum Parallel Users

This is the number of worker threads each Botium Box Agent launches for generating the load. It roughly corresponds to the number of parallel user sessions coming from a single Botium Box Agent that your Chatbot will see.

Each worker thread generates its load sequentially. If all available worker threads are waiting for a chatbot response, no more load is generated until responses arrive and free up worker threads.
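
Because each worker handles one convo at a time, the number of workers and the time one convo takes put an upper bound on the load that can actually reach the chatbot. A rough Python estimate follows, assuming every convo takes the same time and ignoring any overhead in Botium itself. Both are simplifications, and the function name is made up for this sketch.

```python
def max_convos_per_second(parallel_convos: int, agents: int,
                          convo_duration_sec: float) -> float:
    """Upper bound on throughput when every worker runs convos one after
    another and each convo takes convo_duration_sec seconds."""
    if convo_duration_sec <= 0:
        raise ValueError("convo_duration_sec must be positive")
    return parallel_convos * agents / convo_duration_sec


if __name__ == "__main__":
    # 200 workers on each of 2 agents
    print(max_convos_per_second(200, 2, 1.0))  # chatbot needs 1 s per convo
    print(max_convos_per_second(200, 2, 2.0))  # chatbot needs 2 s per convo
```

Under these assumptions, 200 workers on each of 2 agents can deliver at most 400 convos per second if a convo takes 1 second, and only 200 per second if it takes 2 seconds. A queue that grows faster than this bound shows up as delay rather than as additional load.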

Parallel Jobs Count

The given load is generated by the Botium Box Agents. You can multiply the load by telling Botium to run it from more than one Botium Box Agent.

You have to install and connect multiple Botium Box Agents for this.

If you want to generate a heavy load on your API, the Test environment should not be the bottleneck. If you set this to 2, two agents will work on the Performance test. They won't share the tasks; both will execute the same Convos. The number of users, and so the load, is doubled on the Chatbot API, but not on the Test environment.

If there are not enough Agents, you won't get an error message. When an agent has finished all convos (and so a Job), it starts a new Job.

If the agents are running on a single PC, the bottleneck may remain in the Test environment.

Data density

This is the data sampling ratio. It determines how fine-grained the chart and the exported data are. Too low a value can put a heavy load on the Botium Box Server. To protect the server, the data is truncated at 1000 records.

Shared Botium Core Session (Botium Box > 2.8.2)

By default, a separate Botium Core session is started for each single convo execution. Depending on the connector technology, this takes additional time for session setup, which increases the total test execution duration (but is not included in the measured response times). If you don’t care about measuring performance for individual user sessions, and if the connector technology does not depend on building individual user sessions, then you can enable this switch to speed up the performance testing process.

Test results

Failed test

Charts are not the only output of a Performance test. The test itself can also fail, even if the functional tests run without errors.

Chatbot Response Time chart

This is the most important chart for performance questions like “Do we have some (memory) leak?”.

What you see there depends on the parameters. If you started a Stress test, then a flat line means that the Response Time does not depend on the number of users.

If the Response Time decreases, then some optimization must be at work, like caching.

If it increases, then you have to decide whether that is acceptable or not.
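
If you prefer a number over eyeballing the chart, the three cases can be told apart by fitting a straight line through the measured points. The Python sketch below assumes you have the data as (number of users, response time in ms) pairs, for example taken from the exported data. The data layout, the function name and the tolerance value are all assumptions made for this illustration.

```python
def response_time_trend(samples: list[tuple[float, float]],
                        tolerance_ms_per_user: float = 0.5) -> str:
    """Classify the response time as 'flat', 'increasing' or 'decreasing'
    using the least-squares slope of response time over user count."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two different user counts")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope > tolerance_ms_per_user:
        return "increasing"
    if slope < -tolerance_ms_per_user:
        return "decreasing"
    return "flat"


if __name__ == "__main__":
    print(response_time_trend([(1, 200), (3, 201), (5, 199), (7, 200)]))
    print(response_time_trend([(1, 200), (3, 260), (5, 330), (7, 400)]))
```

The tolerance decides how much slope still counts as flat, so it has to be chosen to match what is acceptable for your chatbot.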

Convo Processing Delay chart

The Convo Processing Delay chart is a general-purpose chart to detect performance problems in the Test environment. It can't tell you what the problem is; it just indicates that the Test environment is the bottleneck.

Possible problems are:

  • There are not enough Agents (can happen only if you set Parallel Jobs Count above 1)

  • The Agent is overloaded (solution: vertical or horizontal scaling)

  • Test Step Duration is too small (see next section).

Processed Convo Count chart

To understand the Processed Convo Count chart, you have to know that every convo is delayed a little. So if Delayed Convos follows Processed Convos, everything is fine. But if Delayed Convos is above Processed Convos, then two steps are overlapping each other: the Test Step Duration is not long enough to execute all convos of the step. You can try to increase it.


  • The chart should indicate the "not enough agents" problem more clearly.

  • It is not guaranteed that all users of a step are executed in parallel as intended. This depends on many conditions. It would be good to detect how many are actually executed in parallel.

  • The Processed Convo Count chart should be cleaned up. It is confusing that every convo counts as delayed. Maybe the two counts could simply be divided.

  • It is possible that not all convos of a test set are executed in parallel, for example if there are many of them and the chatbot is fast.