Case Study Design Of Experiments Examples

Introduction

 

When you visit a supermarket, you might feel overwhelmed by the discounts and free gifts offered with your purchase. Have you ever wondered what makes a company decide whether you will be more excited by a discount or by a free gift? How could they know you so closely?

As analytics capabilities continue to evolve across businesses and geographies, marketing managers increasingly expect analytics departments to provide insights into questions such as “Do our customers love a free gift more than a discount?” or “Do our customers respond to advertising that contains the picture of a sports icon?” and many more.

Answering these questions requires an analyst to delve deep into the data, using all the available tools and techniques. But what if we do not have the data? If the company has never leveraged a popular personality for advertising, or has never offered a free gift, how will data help us answer the question?

A situation where relevant data is unavailable is quite common. When faced with such a situation, we either take the help of expert judgment, try to identify suitable proxies, or “ask the customer”. The last option gives us the relevant data needed to answer the question of interest. The process of “asking the customer” entails performing experiments or tests whose results can be read to obtain answers to the questions of interest.

 

The Concept of Testing (A/B, Split-Run, Flip-Flop and Test vs. Control)

A/B testing, split-run testing and test vs. control comparisons are common methodologies adopted to understand the impact of a single factor on customer behaviour.

Split-Run Testing

To test the effectiveness of a marketing communication (mostly print advertisements), one can use either split-run testing or flip-flop testing. Split-run testing is by far the most effective way of testing a print advertisement. To run a split-run test, two different versions of the same advertisement, each with a different identification number, are placed in the publication as a split insertion on the same date.

This ensures that exactly half of the printed copies carry version one of the advertisement and the other half carry the second version. Hence, the results of the split-run test can be thought of as two advertisements run on random samples of the publication’s readership. The way the advertisements are inserted ensures that the samples are random in every respect. A very similar concept can be used for testing website banner advertisements as well.

 

Flip-Flop Testing

Where a magazine does not offer the flexibility of running a split-run campaign but has separate regional publications, one can use the region-1 publication for one version of the advertisement and the region-2 publication for the other. This form of testing is called flip-flop testing, and it is an approximation of split-run testing. Its biggest shortcoming is that the two samples are not random, so there can be an inherent regional bias in the test results.

 

Test vs. Control

A control group is a group of customers who are identical to the campaign’s target customers and eligible for the campaign or any other targeted marketing action, but who are not subjected to the action under consideration. The behaviour of customers in the control group is compared with the behaviour of customers who are subjected to the marketing action. This comparison provides a good understanding of the impact of the marketing action in question.
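As a minimal sketch, the test vs. control comparison boils down to the difference in response rates between the two groups. All the numbers below are purely hypothetical:

```python
def campaign_lift(test_responders, test_size, control_responders, control_size):
    """Incremental impact of a marketing action: the test group's
    response rate minus the control group's response rate."""
    return test_responders / test_size - control_responders / control_size

# Hypothetical campaign: 300 of 10,000 test customers responded,
# versus 200 of 10,000 otherwise-identical control customers.
lift = campaign_lift(300, 10_000, 200, 10_000)
print(f"Absolute lift: {lift:.1%}")  # Absolute lift: 1.0%
```

A positive lift attributable to the action, rather than a raw response rate, is what the comparison delivers.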

 

Problems with Traditional Testing

The testing methodologies mentioned above provide robust answers about the incremental impact of a single marketing intervention (or factor) at a time. But what about situations where there are many factors?

In such cases, one would need to conduct a large number of tests to ascertain the impact of each intervention (or factor). Since it takes a significant amount of time and money to run a test and read its results, one needs to do something different when testing multiple factors, so as to generate all the required learnings within the limited budget available. What does one need to do differently? Let’s find out using the example discussed in the following sections.

 

The Concept of Design of Experiments

Marketers often need to test the impact of a wide range of targeting, advertising, promotion, pricing and product options to find the optimal combination of factors and obtain all the desired results at the minimum possible cost.

As marketing budgets are always limited, it is impossible to test all combinations of every marketing parameter. Therefore, marketers often build a testing framework that helps them identify the critical few learnings they would like to derive from the available test budget. The concept of design of experiments is widely used in building such testing frameworks.

Design of experiments, or DoE, is a common analytical technique for designing the right testing framework. To illustrate its use, let’s begin with web banner advertising.

There are multiple factors that affect the success of a banner advertisement, so it is important to quantify a “success metric” for it. The most common success metric is the click-through rate (CTR): the number of visitors who click the link in the advertisement divided by the number of visitors who are exposed to the advertisement.
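As a quick illustration (with made-up numbers), CTR can be computed as:

```python
def click_through_rate(clicks, impressions):
    """CTR = visitors who clicked the ad's link / visitors exposed to the ad."""
    if impressions == 0:
        raise ValueError("CTR is undefined without impressions")
    return clicks / impressions

# Hypothetical banner: 50,000 impressions, 150 clicks.
ctr = click_through_rate(150, 50_000)
print(f"CTR = {ctr:.2%}")  # CTR = 0.30%
```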

The success of a banner advertisement depends on numerous factors, such as the website where the advertisement is displayed (possibly the most important), the content of the advertisement, the placement of the advertisement, and so on. Given the many possible combinations of these advertising variables, the concepts of DoE can be applied and measured very naturally in this scenario.

Enough of theory, I guess; let’s understand this concept practically now! For simplicity, I’ve considered an advertisement which consists of the following features:

  • A picture
  • A text message about the offer and product
  • A redirect link (which takes the visitor to the advertiser’s landing page). This is the ‘Call to Action’ link.

This example involves the following parameters.

  • Position of the picture: Left, Right, Middle
  • Position of the Call to Action link: Top and Bottom
  • Presence of animation or movement in the picture: Yes, No
  • Position of the banner advertisement on the web page: Left and Right

The parameters mentioned above are also referred to as factors. The values that a parameter or factor takes are often referred to as levels or attributes. For example, “Position of the picture” is a parameter or factor, and the values that it takes, i.e. “Left”, “Right” and “Middle”, are its levels/attributes.

Figure-1 illustrates the combinations (other than the presence or absence of animation).

Figure-1: Depiction of the parameters of banner advertisement

In order to ascertain the effectiveness of all these components, it is critical to conduct experiments where visitors are exposed to all possible combinations shown above and the effect of the same is measured on the click through rate.

Table-1 depicts the total possible combinations. The cells marked in grey are the ones which take a value of zero for that particular combination. For example:

  • The combination C1 involves:
    • Position of picture: left
    • Position of call to action link: top
    • Presence of animation: yes
    • Position on website: left

Table-1: All possible combinations of the parameters

It can be observed that there are 3 possible positions of the picture, 2 possible positions of the call to action link, 2 configurations with regard to animation (presence or absence) and 2 possible placements on the website (left or right). Hence there are 3 × 2 × 2 × 2 = 24 possible combinations, which is a large number to explore individually.
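The full set of combinations can be enumerated directly; a small sketch using the factor levels listed above:

```python
from itertools import product

picture_positions = ["Left", "Right", "Middle"]
link_positions = ["Top", "Bottom"]
animation = ["Yes", "No"]
banner_placements = ["Left", "Right"]

# Cartesian product of all factor levels: the full factorial set.
combinations = list(product(picture_positions, link_positions,
                            animation, banner_placements))
print(len(combinations))  # 24
```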

Marketers have used the concept of design of experiments to limit the number of combinations (out of the set of all possible combinations) that need to be tested to make meaningful inferences. To understand how design of experiments helps limit the number of combinations to be tested, one needs to understand the effect of each attribute or level separately, as well as the effect of these attributes acting in tandem.

 

Design of Experiments without Interaction Effects

The levels of a particular parameter or factor are used as variables for constructing the response function for each combination listed in Table-1. For example, the factor “Position of picture” comprises 3 levels. Therefore, due to degree-of-freedom constraints, it requires two variables in the response equation; any two of the levels can be used as binary variables. In the case of picture position, one can use “Left” and “Right” as the two binary variables. If the picture is positioned on the left, the binary variable “Left” takes the value 1, otherwise 0. If the picture is positioned on the right, the binary variable “Right” takes the value 1, otherwise 0. If the picture is positioned in the middle, both “Left” and “Right” take the value 0.
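This dummy-coding scheme can be sketched as a small helper (the level names are from the example above):

```python
def encode_picture_position(position):
    """Dummy-code a three-level factor with two binary variables;
    the omitted level ('Middle') is the baseline where both are 0."""
    return (1 if position == "Left" else 0,
            1 if position == "Right" else 0)

print(encode_picture_position("Left"))    # (1, 0)
print(encode_picture_position("Right"))   # (0, 1)
print(encode_picture_position("Middle"))  # (0, 0)
```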

Similarly, one can use one variable each for the other parameters (as all of them consist of two levels each). If one assumes no interaction effects between the factors, the generic response function can be written as:

Ln(CTR/(1-CTR)) = α + β1(Position of picture is left) + β2(Position of picture is right) + β3(Position of call to action link is top) + β4(Presence of animation is yes) + β5(Placement on web site is left)

In this expression “CTR” represents the probability of response or click through rate. β’s represent the effect of each attribute or level on probability of response.

Based on past experience, it has been found that in most cases, responses can be predicted by using a logistic function. The generic response function needs to be applied to each design combination. The resulting function for each design combination is depicted in Table-2.

Table-2: The Response Equation for all Possible Combinations of the Parameters

From the table, it can be observed that if one tests combinations C4 (ln(CTR4/(1-CTR4)) = α + β1 + β3), C23 (ln(CTR23/(1-CTR23)) = α + β5) and the baseline C24 (ln(CTR24/(1-CTR24)) = α), then one can easily estimate the click-through rate for the untested combination C3 (ln(CTR3/(1-CTR3)) = α + β1 + β3 + β5). It can be seen that:

ln(CTR4/(1-CTR4)) + ln(CTR23/(1-CTR23)) − ln(CTR24/(1-CTR24)) = ln(CTR3/(1-CTR3))

This feature is the key benefit of a properly designed experiment or test. By performing a limited number of tests, it is possible to infer the results of combinations that have not been tested.
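This additivity on the logit scale can be checked numerically. In the sketch below the coefficient values are entirely hypothetical; the baseline combination C24 (whose logit is just α) is used alongside C4 and C23 to recover C3:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical coefficients, for illustration only.
alpha, b1, b3, b5 = -4.0, 0.3, 0.2, 0.4

ctr_c4  = inv_logit(alpha + b1 + b3)        # tested: picture left, link top
ctr_c23 = inv_logit(alpha + b5)             # tested: placement on web site left
ctr_c24 = inv_logit(alpha)                  # tested: baseline combination
ctr_c3  = inv_logit(alpha + b1 + b3 + b5)   # NOT tested

# The untested combination's logit follows from the tested ones:
predicted_logit = logit(ctr_c4) + logit(ctr_c23) - logit(ctr_c24)
print(abs(predicted_logit - logit(ctr_c3)) < 1e-9)  # True
```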

A design where one tests all the combinations involved is referred to as a “full factorial design”. On the other hand, as mentioned above, if the marketer is able to eliminate certain combinations and test only a limited set, the design is referred to as a “partial factorial design”.

Table-3 illustrates how a limited set of experiments can be used to compute all the required test results.

Table-3: The Partial Factorial Design

The analytical objective is to estimate the coefficients α, β1, β2, β3, β4 and β5. The following combinations can be used to estimate them:

  • Estimating α: The result of experiment C24 directly gives the value of α.
  • Estimating β3: The results of C4 and C8 together give the value of β3.
  • Estimating β2: Given the result of experiment C12, one can plug in the values of α and β3 to obtain β2.
  • Estimating β4: Given the result of C10, one can use the values of α, β2 and β3 to obtain β4.
  • Estimating β1: The value of β4 can then be plugged into the result of C6 to obtain β1.
  • Estimating β5: The value of β5 can be obtained by plugging the value of β2 into the result of experiment C15.

It can be observed that by conducting only 7 experiments (C24, C4, C8, C12, C6, C10 and C15), one can obtain all the information that conducting 24 experiments would provide. Hence, the concept of design of experiments has been used to reduce the number of experiments from 24 to 7.

The property mentioned above is the major benefit of a partial factorial design: one can obtain the required learning without conducting all the possible experiments. However, as mentioned earlier, this approach assumes that there is no interaction between the factors. It is a worthwhile exercise to find the minimum number of experiments one would have to perform if the presence of interactions is considered.

 

Design of Experiments with Interaction Effects

As a critic of the partial factorial approach, one could argue that the combination of an animation and placement of the advertisement to the right of the website would be more effective in conjunction, because most viewers tend to focus on the right side of the screen. This implies that the interaction between placement and animation needs to be taken into account. Hence the generic response function would take the following form:

Ln(CTR/(1-CTR)) = α + β1(Position of picture is left) + β2(Position of picture is right) + β3(Position of call to action link is top) + β4(Presence of animation is yes) + β5(Placement on web site is left) + β10(Placement on web site is left & Presence of animation)

It would be worthwhile to find out the minimum number of experiments one has to conduct if interaction effects are assumed to be present. It can easily be seen that it is difficult to limit the number of experiments or tests that need to be conducted if there is a significant number of interactions.
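As a sketch, the response function with this single interaction term can be written out directly; all coefficient values here are hypothetical placeholders:

```python
import math

def response_logit(pic_left, pic_right, link_top, animation_yes,
                   placement_left, coef):
    """Logit of CTR with one interaction term (placement x animation).
    `coef` = (alpha, b1, b2, b3, b4, b5, b10); values are illustrative."""
    a, b1, b2, b3, b4, b5, b10 = coef
    return (a + b1 * pic_left + b2 * pic_right + b3 * link_top
            + b4 * animation_yes + b5 * placement_left
            + b10 * placement_left * animation_yes)  # interaction term

coef = (-4.0, 0.3, 0.1, 0.2, 0.5, 0.4, -0.25)  # hypothetical values
x = response_logit(1, 0, 1, 1, 1, coef)        # left picture, top link, animated, left placement
ctr = 1 / (1 + math.exp(-x))                   # back to a probability
```

Note that β10 contributes only when both binaries are 1; this non-additive behaviour is exactly what a partial factorial design must make assumptions about.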

To generate the maximum learning from any test program, it is best to adopt a full factorial test design, whereby all the possible combinations are tested. However, because of cost constraints, a partial factorial design is often favoured. While adopting a partial factorial design, appropriate assumptions about interaction effects need to be put in place to limit the number of experiments that one needs to conduct.

Based on prior business knowledge, one can eliminate certain interactions, thereby reducing the number of tests to be performed. In this case, if one assumes that the only interaction effect is between the placement of the advertisement and animation, it is an interesting exercise to find the number of tests needed to estimate all the coefficients involved.

 

End Notes

In this article, I’ve elaborated on the concepts behind Design of Experiments. By now, you should have an intuition about the strategies that companies use to decide the best mode of advertisement for them. Earlier, companies used to face a lot of trouble deriving positive returns on their marketing budgets, but this technique has not only saved millions in hard cash, it has also provided a prudent method to reap benefits intelligently.

Did you find this article useful? Have you ever made use of this concept at work? What was your experience? I’ll be happy to hear from you in the comments section below.

 

About the Authors

Sandhya Kuruganti and Hindol Basu are authors of a book on business analytics titled “Business Analytics: Applications to Consumer Marketing”, recently published by McGraw Hill. The book is available on Flipkart and Amazon India/UK/Canada. They are seasoned analytics professionals with a collective industry experience of more than 30 years.


Design of Experiments – A Primer

K. Sundararajan

Design of experiments (DOE) is a systematic method to determine the relationship between factors affecting a process and the output of that process. In other words, it is used to find cause-and-effect relationships. This information is needed to manage process inputs in order to optimize the output.

An understanding of DOE first requires knowledge of some statistical tools and experimentation concepts. Although a DOE can be analyzed in many software programs, it is important for practitioners to understand basic DOE concepts for proper application.

Common DOE Terms and Concepts

The most commonly used terms in the DOE methodology include: controllable and uncontrollable input factors, responses, hypothesis testing, blocking, replication and interaction.

  • Controllable input factors, or x factors, are those input parameters that can be modified in an experiment or process. For example, in cooking rice, these factors include the quantity and quality of the rice and the quantity of water used for boiling.
  • Uncontrollable input factors are those parameters that cannot be changed. In the rice-cooking example, this may be the temperature in the kitchen. These factors need to be recognized to understand how they may affect the response.
  • Responses, or output measures, are the elements of the process outcome that gauge the desired effect. In the cooking example, the taste and texture of the rice are the responses.

The controllable input factors can be modified to optimize the output. The relationship between the factors and responses is shown in Figure 1.

Figure 1: Process Factors and Responses

  • Hypothesis testing helps determine the significant factors using statistical methods. There are two possibilities in a hypothesis statement: the null and the alternative. The null hypothesis states that the status quo is true; the alternative hypothesis states that it is not. Testing is done at a level of significance, which is based on a probability.
  • Blocking and replication: Blocking is an experimental technique to avoid any unwanted variations in the input or experimental process. For example, an experiment may be conducted with the same equipment to avoid any equipment variations. Practitioners also replicate experiments, performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process.
  • Interaction: When an experiment has three or more variables, an interaction is a situation in which the simultaneous influence of two variables on a third is not additive.

A Simple One-factor Experiment

The comparison of two or more levels in a factor can be done using an F-test. This compares the variance of the means of different factor levels with the individual variances, using this equation:

F = n · s²Ȳ / s²pooled

where:
n = the sample size at each level
s²Ȳ = the variance of the level means, calculated by dividing the sum of squared deviations of the means from the grand mean by the degrees of freedom
s²pooled = the pooled variance, i.e. the average of the individual within-level variances

This is similar to the signal-to-noise ratio used in electronics. If the value of F (the test statistic) is greater than the F-critical value, there is a significant difference between the levels, i.e. at least one level is giving a response different from the others. Caution is also needed to keep s²pooled to a minimum, as it is the noise or error term. If the F value is high, the probability (p-value) will fall below 0.05, indicating that there is a significant difference between levels. The value of 0.05 is a typically accepted risk value.

If F = 1, it means the factor has no effect.

As an example of a one-factor experiment, data from an incoming shipment of a product is given in Table 1.

Table 1: Incoming Shipment Data

Lot A: 61, 61, 57, 56, 60, 52, 62, 59, 62, 67, 55, 56, 52, 60, 59, 59, 60, 59, 49, 42, 55, 67, 53, 66, 60
Lot B: 56, 56, 61, 67, 58, 63, 56, 60, 55, 46, 62, 65, 63, 59, 60, 60, 59, 60, 65, 65, 62, 51, 62, 52, 58
Lot C: 62, 62, 72, 63, 51, 65, 62, 59, 62, 63, 68, 64, 67, 60, 59, 59, 61, 58, 65, 64, 70, 63, 68, 62, 61
Lot D: 70, 70, 50, 68, 71, 65, 70, 73, 70, 69, 64, 68, 65, 72, 73, 75, 72, 75, 64, 69, 60, 68, 66, 69, 72
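The one-way ANOVA on these lots can be reproduced with a short, dependency-free sketch (any small discrepancy with the published table would come down to transcription of the raw data):

```python
lots = {
    "A": [61, 61, 57, 56, 60, 52, 62, 59, 62, 67, 55, 56, 52,
          60, 59, 59, 60, 59, 49, 42, 55, 67, 53, 66, 60],
    "B": [56, 56, 61, 67, 58, 63, 56, 60, 55, 46, 62, 65, 63,
          59, 60, 60, 59, 60, 65, 65, 62, 51, 62, 52, 58],
    "C": [62, 62, 72, 63, 51, 65, 62, 59, 62, 63, 68, 64, 67,
          60, 59, 59, 61, 58, 65, 64, 70, 63, 68, 62, 61],
    "D": [70, 70, 50, 68, 71, 65, 70, 73, 70, 69, 64, 68, 65,
          72, 73, 75, 72, 75, 64, 69, 60, 68, 66, 69, 72],
}

def one_way_anova_f(groups):
    """F = MS(between) / MS(within) for a list of samples."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f = one_way_anova_f(list(lots.values()))
print(f"F = {f:.2f}")  # far above the F-crit of about 2.70, so the lots differ
```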

When a practitioner completes an analysis of variance (ANOVA), the following results are obtained:

Table 2: ANOVA Summary

Groups   Count   Sum     Average   Variance
A        25      1,449   57.96     31.540
B        25      1,483   59.32     23.143
C        25      1,570   62.80     18.500
D        25      1,708   68.32     27.643

ANOVA
Source of Variation   SS         df   MS        F          p-value         F-crit
Between groups        1,601.16   3    533.72    21.17376   1.31 × 10^-10   2.699394
Within groups         2,419.84   96   25.20667
Total                 4,021.00   99

Statistical software can provide hypothesis testing and give the actual value of F. If the value is below the critical F value, a value based on the accepted risk, then the null hypothesis is not rejected. Otherwise, the null hypothesis is rejected to confirm that there is a relationship between the factor and the response. Table 2 shows that the F is high, so there is a significant variation in the data. The practitioner can conclude that there is a difference in the lot means.

Two-level Factorial Design

This is the most important design for experimentation. It is used in most experiments because it is simple, versatile and can be used for many factors. In this design, the factors are varied at two levels – low and high.

Two-level designs have many advantages. Two are:

  1. The size of the experiment is much smaller than other designs.
  2. The interactions of the factors can be detected.

For an example of a two-level factorial design, consider the cake-baking process. Three factors are studied: the brand of flour, the temperature of baking and the baking time. The associated lows and highs of these factors are listed in Table 3.

Table 3: Cake-baking Factors and Their Associated Levels

Factor   Name          Units         Low Level (-)   High Level (+)
A        Flour brand   Cost          Cheap           Costly
B        Time          Minutes       10              15
C        Temperature   Degrees (C)   70              80

The output responses considered are “taste” and “crust formation.” Taste was determined by a panel of experts, who rated the cake on a scale of 1 (worst) to 10 (best). The ratings were averaged and multiplied by 10. Crust formation is measured by the weight of the crust, the lower the better.

The experiment design, with the responses, is shown in Table 4.

Table 4: Settings of Input Factors and the Resulting Responses

Run Order   A: Brand     B: Time (min)   C: Temp. (C)   Y1: Taste (rating)   Y2: Crust (grams)
1           Costly (+)   10 (-)          70 (-)         75                   0.3
2           Cheap (-)    15 (+)          70 (-)         71                   0.7
3           Cheap (-)    10 (-)          80 (+)         81                   1.2
4           Costly (+)   15 (+)          70 (-)         80                   0.7
5           Costly (+)   10 (-)          80 (+)         77                   0.9
6           Costly (+)   15 (+)          80 (+)         32                   0.3
7           Cheap (-)    15 (+)          80 (+)         42                   0.5
8           Cheap (-)    10 (-)          70 (-)         74                   3.1
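With a coded design matrix (-1 = low, +1 = high), the main effects and interactions for the taste response can be reproduced directly; a minimal sketch:

```python
# Coded design matrix and taste responses, transcribed from Table 4.
runs = [
    # (brand, time, temp, taste)
    (+1, -1, -1, 75),
    (-1, +1, -1, 71),
    (-1, -1, +1, 81),
    (+1, +1, -1, 80),
    (+1, -1, +1, 77),
    (+1, +1, +1, 32),
    (-1, +1, +1, 42),
    (-1, -1, -1, 74),
]

def effect(*cols):
    """Main effect (one column) or interaction (several columns):
    mean response where the product of the coded columns is +1,
    minus the mean where it is -1."""
    high, low = [], []
    for *x, y in runs:
        prod = 1
        for c in cols:
            prod *= x[c]
        (high if prod == 1 else low).append(y)
    return sum(high) / len(high) - sum(low) / len(low)

print(effect(0))     # brand:       -1.0
print(effect(1))     # time:        -20.5
print(effect(2))     # temperature: -17.0
print(effect(1, 2))  # time x temp: -21.5
```

These match the Effect column of Table 5, and each contrast is simply four times the corresponding effect.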

Analysis of the results is shown in Table 5. Figures 2 through 4 show the average taste scores for each factor as it changes from low to high levels. Figures 5 through 7 are interaction plots; they show the effect of the combined manipulation of the factors.

Table 5: ANOVA Table for the Taste Response

Factor                 df   SS        MS      F        Effect   Contrast   p      F-crit at 1%
Brand                  1    2.0       2.0     0.0816   -1       -4.00      0.82   16.47
Time                   1    840.5     840.5   34.306   -20.5    -82.00     0.11
Brand x time           1    0.5       0.5     0.0204   0.5      2.00       0.91
Temp                   1    578.0     578.0   23.592   -17      -68.00     0.13
Brand x temp           1    72.0      72.0    2.9388   -6       -24.00     0.34
Time x temp            1    924.5     924.5   37.735   -21.5    -86.00     0.10
Brand x time x temp    1    24.5      24.5    1        -3.5     -14.00     0.50
Error                  1    24.5      24.5
Total                  7    2,442.0

Figure 2: Average Taste Scores for Low and High Flour Brand Levels

Figure 3: Average Taste Scores for Low and High Bake Time (Minutes) Levels

Figure 4: Average Taste Scores for Low and High Baking Temperature (C) Levels

Figure 5: Average Taste Scores for Flour Brand by Time (Minutes)

Figure 6: Average Taste Scores for Flour Brand by Temperature (C)

Figure 7: Average Taste Scores for Time (Minutes) by Temperature (C)

From reading an F table, the critical F value at 1 percent is 16.47. As the actual F values for time and temperature exceed this value (time is at 34.306 and temperature at 23.592), it is possible to conclude that both have a significant effect on the taste of the product. This is also evident from Figures 3 and 4, where the line is steep for the variation of these two factors. Figure 7 also shows that when the temperature is high, the taste sharply decreases with time (as charring takes place).

For the crust formation, the data analysis is shown in Table 6.

Table 6: ANOVA Table for the Crust Response

Factor                 df   SS    MS    F        Effect   Contrast   F-crit at 1%
Brand                  1    1.4   1.4   1.4938   -0.825   -3.30      16.47
Time                   1    1.4   1.4   1.4938   -0.825   -3.30
Brand x time           1    1.1   1.1   1.1536   0.725    2.90
Temp                   1    0.5   0.5   0.4952   -0.475   -1.90
Brand x temp           1    0.7   0.7   0.7257   0.575    2.30
Time x temp            1    0.1   0.1   0.0672   0.175    0.70
Brand x time x temp    1    0.9   0.9   1        -0.675   -2.70
Error                  1    0.9   0.9
Total                  7    5.9

In this case, the actual F values for the three factors (brand, time and temperature) are below the critical F value at 1 percent (16.47). This shows that they are not significant factors for crust formation in the cake. If further optimization of the crust formation is needed, other factors, such as the quantity of the ingredients in the cake (eggs, sugar and so on), should be checked.

Versatile Tool for Practitioners

Design of experiments is a powerful tool in Six Sigma to manage the significant input factors in order to optimize the desired output. Factorial experiments are versatile because many factors can be modified and studied at once. The following resources can be helpful in learning more about DOEs:

  1. DOE Simplified: Practical Tools for Effective Experimentation (Productivity Inc., 2000)
  2. Design and Analysis of Experiments (John Wiley and Sons, 1997)
