On-Farm Testing – A Scientific Approach to Grower Evaluation of New Technologies

Pacific Northwest Conservation Tillage Handbook Series No. 9
Chapter 10 – Economics and Application of New Technology, May 1999

Authors: Roger Veseth, WSU/UI Extension Conservation Tillage Specialist, Moscow, ID; Stewart Wuest, USDA-ARS Soil Scientist and former STEEP II On-Farm Testing Project Coordinator, Pendleton, OR; Russ Karow, OSU Extension Agronomist, Corvallis, OR; Stephen Guy, UI Extension Crop Management Specialist, Moscow; Don Wysocki, OSU Extension Soil Scientist, Pendleton, OR

What is On-Farm Testing?

First, on-farm testing (OFT) is not researcher-managed small plots on farms. It is also not single-strip, split-field, or field-to-field comparisons of something new with current farming practices. On-farm testing, as we define it, is replicated, scientifically valid research with field trials established and managed by growers using field-scale equipment. Properly designed, grower on-farm tests can separate the effects of natural field variability from the effects of the treatments being compared, and can provide an accurate basis for grower management decisions.

Why Do Growers Need “Scientific” Field Trials?

They don’t always. They don’t need scientific comparisons to answer questions like: “Will this new crop grow to maturity under my production conditions?” But they do need a scientific approach if they want to compare the effects of management options on crop yields or other production factors of interest.

The need for scientific experimental designs in grower field trials may not be readily apparent. Growers are usually very adept at observing how a new practice or management option performs and making decisions based on their farming experience. If they weren’t right much of the time, they probably wouldn’t be in business today.

Growers have been exploring new farming methods for thousands of years, but only recently has a substantial effort been made to bring the principles of modern scientific methods to their aid. Growers have often evaluated a new practice by applying it to a small field and comparing the results with nearby fields, or by splitting a field and applying the new practice on one side and their normal practice on the other. Likewise, growers and industry reps sometimes place a strip of a new herbicide, fertilizer, or other production option in a field to compare it with the rest of the field. These are called “demonstrations,” and they allow a local comparison of how a practice “looks.” A demonstration can be an important first step. The problem comes when you want more than a “look.” It is simply not possible to make reliable comparisons of yields and other “quantitative” data without a scientific approach.

Northwest growers have expressed concern about adopting equipment and technologies that have not been tested in their agriclimatic conditions and cropping systems. Northwest cropland areas contain highly variable soils, topography, climatic conditions, and cropping systems, making testing and transfer of new farming technologies especially difficult. In this variable cropland, and even in “uniform” areas, design of OFTs is a critical first step in accurate field comparison of management options.

Grower participatory research overseas and in the Midwest U.S. beginning in the 1980s has shown that OFT leads to more appropriate, site-specific technology, broader and faster adoption, and increased producer ability to adapt and develop farming technologies. Using accepted methods of on-farm testing, growers can achieve experimental precision comparable to that of intensive university research trials. On-farm testing is now helping fill a missing link in the innovation, adaptation and adoption of no-till and other conservation tillage systems in the Pacific Northwest.

Background on OFT Research Methods in the Northwest

A 5-year Pacific Northwest On-Farm Testing Project was conducted in Idaho, Oregon and Washington from 1991 to 1995 as part of the STEEP II (Solutions To Environmental and Economic Problems) conservation farming research and education program through the University of Idaho, Oregon State University and Washington State University. The program was supported by a special grant from the USDA Cooperative State Research, Education, and Extension Service (CSREES). Since this initial STEEP II OFT project, grower-established on-farm tests with field-scale equipment have become an increasingly important part of STEEP III and related research projects. On-farm testing is also becoming a commonly accepted approach among growers and Ag support industry and agency personnel for evaluating new farming technologies.

The objectives of the original STEEP II OFT project were: 1) Identify, develop and evaluate OFT methodologies for Northwest conditions; 2) Develop and deliver OFT educational materials and programs; and 3) Assist growers in testing conservation practices using OFT methods.

As part of the methodology research effort, 14 uniformity trials were harvested in grower wheat or barley fields in the three states. A uniformity trial measures natural field variability between plots where treatments have not been established. At each of the 14 trials, eight side-by-side strips up to 1,500 feet long were harvested in 250 ft segments to allow recombination of the data into plots of different lengths. The trials simulated field experiments with two treatments and four replications of each side-by-side pair of treatments. Grain yields were analyzed to determine the variance between pairs of plots.
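The recombination of 250 ft segments into longer plots can be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's actual analysis: the function name `combine_segments` and the sample yields are hypothetical, and the segments are assumed to have equal area so that a plot yield is the simple average of its segment yields.

```python
def combine_segments(segment_yields, segments_per_plot):
    """Average consecutive equal-area segment yields into longer plots.

    segment_yields: yields (bu/acre) of consecutive 250 ft segments of one strip.
    segments_per_plot: number of 250 ft segments merged into each plot.
    Leftover segments that do not fill a whole plot are dropped.
    """
    if segments_per_plot < 1:
        raise ValueError("segments_per_plot must be at least 1")
    plots = []
    last_start = len(segment_yields) - segments_per_plot
    for start in range(0, last_start + 1, segments_per_plot):
        chunk = segment_yields[start:start + segments_per_plot]
        plots.append(sum(chunk) / len(chunk))
    return plots


# Hypothetical yields for the six 250 ft segments of one 1,500 ft strip.
strip = [62.0, 58.0, 65.0, 61.0, 70.0, 66.0]
plots_500ft = combine_segments(strip, 2)    # three 500 ft plots
plots_1500ft = combine_segments(strip, 6)   # one 1,500 ft plot
print(plots_500ft, plots_1500ft)
```

Repeating this for each of the eight strips gives a set of simulated plots at every plot length, from which the variability between side-by-side pairs can be compared.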

Ten of the 14 sites had yield variances of less than 5 bu/acre at plot lengths of 1,500 ft and were classified as low variance (see Fig. 2). Yield variance at the other 4 sites ranged from 6 to 22 bu/acre, and these sites were classified as high variance. Based on this field uniformity research and numerous on-farm tests with growers, recommendations on plot length, number of replications and statistical design were developed for Northwest conditions. Considerable time was also focused on developing layout designs and procedures for on-farm test establishment and data collection. The goal of these efforts was to help growers develop accurate and logistically practical trials with field-scale equipment.

During the 5-year project, nearly 250 on-farm tests were conducted in the three states. Major OFT topics included tillage systems, variety performance, residue management, planting systems, crop rotations, soil amendments, and crop or field application of fertilizers and pest control products. Some OFTs were conducted by individual growers on topics of their own specific interest. Many others addressed issues of concern to larger areas and were collaborative efforts among growers, researchers and other Ag support personnel.

Developing Your On-Farm Testing Strategies

The following questions have commonly been asked when considering how to establish on-farm trials. The answers were developed through the 1991-95 PNW On-Farm Testing project and experiences with on-farm testing in the region since that time.

Explanation of Common Statistical Terms in On-Farm Testing

The following is a brief explanation of two statistical terms commonly used in analyzing the results of simple on-farm trials. This should help you better understand the effects of replication and plot length on the ability to detect “significant” differences between treatments.

LSD – A common statistical tool for on-farm tests is an LSD, or “Least Significant Difference,” which is calculated through standard Analysis of Variance (ANOVA) statistical analysis software or can be calculated by hand. The LSD is a calculation based on the variability of treatment results within the trial and is used to help separate the effects of natural field variability from the treatment effects. If the difference between treatment means in a trial is equal to or larger than the LSD, the difference is statistically significant and believed to be due to the treatment effect and not natural field variability. On the other hand, if the difference between treatment means is smaller than the LSD, the difference is more likely due to natural field variability. The LSD is only used if the ANOVA determines that treatment differences are significant. In the example in Fig. 2, the LSD represents yield in bu/acre, but it can be calculated for any treatment effect you might be interested in comparing.
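For readers who want to see the hand calculation, the following is a minimal sketch of an LSD at the 0.05 level for a trial laid out in replications (blocks), using the standard formula LSD = t x sqrt(2 x error mean square / number of replications). The function name `rcbd_lsd` and the yield data are hypothetical, the t values are standard two-tailed table values, and the sketch omits the ANOVA F-test that should be checked before the LSD is used.

```python
import math

# Two-tailed t critical values at the 0.05 probability level, keyed by
# error degrees of freedom (standard t-table values; extend as needed).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def rcbd_lsd(yields):
    """Return (treatment_means, lsd) for a replicated, blocked trial.

    yields[b][t] is the yield of treatment t in replication (block) b.
    """
    r = len(yields)       # number of replications
    k = len(yields[0])    # number of treatments
    grand = sum(sum(row) for row in yields) / (r * k)
    trt_means = [sum(row[t] for row in yields) / r for t in range(k)]
    blk_means = [sum(row) / k for row in yields]
    # Error sum of squares: what is left after removing the
    # replication (block) and treatment effects from each plot yield.
    ss_error = sum((yields[b][t] - blk_means[b] - trt_means[t] + grand) ** 2
                   for b in range(r) for t in range(k))
    df_error = (r - 1) * (k - 1)
    mse = ss_error / df_error
    lsd = T_CRIT_05[df_error] * math.sqrt(2 * mse / r)
    return trt_means, lsd


# Hypothetical yields (bu/acre): [check, new practice] in four replications.
trial = [[60.0, 66.0], [55.0, 62.0], [70.0, 74.0], [64.0, 71.0]]
means, lsd = rcbd_lsd(trial)
difference = means[1] - means[0]
print(f"difference = {difference:.2f} bu/acre, LSD(0.05) = {lsd:.2f} bu/acre")
print("significant" if abs(difference) >= lsd else "not significant")
```

In this made-up example the yields vary widely between replications, but the new practice is ahead by a similar amount in each one, so the error mean square is small and the difference exceeds the LSD.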

Probability Level – The subscript number following the LSD refers to the “probability level” at which the LSD was calculated. In Fig. 2, the probability level is 0.05, or 5%. For example, if the difference between treatment means is equal to or larger than the LSD at 0.05, then you are 95% sure that the difference is due to treatment effects and not to natural variability. The smaller the probability level, the greater the confidence you can have that differences between treatment means are due to treatment effects. A typical statistical probability level is 0.05, but it can feasibly range from 0.001 to 0.2 (0.1% to 20%), depending on the economic or environmental implications of the choices between the management options being tested, interactions with other management practices, the grower’s experience and other factors.

Why is Replication Necessary?

Replication, meaning repetition of each treatment area, provides a critical tool to overcome the fact that any two plots under the same management will not have exactly the same yield, stand count, weed population, fertility, or any other factor because of natural variability. In other words, replication helps you to determine if differences between plots are due to treatments or due to natural field variation. In statistical jargon, this normal variation is called “experimental error.” Replication is based on the theory that if one practice is superior to another, it will become evident if you make repeated comparisons.

As part of the STEEP II OFT project in Washington, Oregon and Idaho, field uniformity trials with eight side-by-side combine strips in 14 wheat and barley fields illustrated considerable yield variability even within selected “uniform” areas of fields. We found that yields of side-by-side combine strips (full header cuts with a space between strips) commonly varied by 1 to 10 or more bushels per acre. Fig. 1 shows yield differences as great as 9 bu/acre between side-by-side combine strips and 15 bu/acre across the 8 strips in a uniformity trial near Moscow, ID. In all these trials, the further apart the combine strips were, the greater the yield differences commonly were. This level of field variability emphasizes the need for replication as a key step to successful OFTs.

Fig. 1. Winter wheat yields in eight 500-foot combine strips in a “uniform” area of a field near Moscow, Idaho in 1992. Full header-width cuts 20 feet wide were harvested in each of the side-by-side 25 ft x 500 ft plots. Yields of adjacent strips varied by as much as 9 bu/acre.

How Many Replications Do You Need?

Our Northwest research has shown that four replications of the side-by-side comparisons usually give the best chance of success for the amount of effort. In Fig. 2, the LSD required to have significant differences between mean treatment yields in OFTs with plots 250 ft long decreased from about 37 bu/acre with 2 replications to 10 bu/acre with 3 replications and 7 bu/acre with 4 replications. Five or six replications can give a slight gain in statistical power in separating treatment differences, but may not be worth the extra effort in some trials. Three replications are less precise in determining treatment differences than four replications, but may be adequate for some management practice comparisons. The danger of starting with three replications is that if you lose data from one plot, you no longer have an effective trial.
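Part of the reason the LSD falls so sharply between two and three replications can be seen in the standard LSD formula, LSD = t x sqrt(2 x error mean square / number of replications), where the t value itself shrinks quickly as replications are added. The sketch below is illustrative only: the error mean square of 9.0 is a hypothetical value, not a result from the uniformity trials, and the function name `lsd_two_treatments` is made up for this example.

```python
import math

# Two-tailed t critical values at the 0.05 level for 1 to 5 error
# degrees of freedom (standard t-table values).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def lsd_two_treatments(error_mean_square, replications):
    """LSD at the 0.05 level for two treatments compared within replications."""
    df_error = replications - 1   # (replications - 1) x (treatments - 1)
    return T_CRIT_05[df_error] * math.sqrt(2 * error_mean_square / replications)


ASSUMED_MSE = 9.0   # hypothetical error mean square, (bu/acre) squared
for reps in range(2, 7):
    lsd = lsd_two_treatments(ASSUMED_MSE, reps)
    print(f"{reps} replications: LSD(0.05) = {lsd:.1f} bu/acre")
```

With the same underlying field variability, most of the gain comes from moving from two to three or four replications, and the improvement from a fifth or sixth replication is comparatively small.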

How Long Should OFT Plots Be?

Generally, the longer the plots are, the better the results are likely to be, but that depends on the field landscapes, soil variability and yield levels. Fig. 2 illustrates how the ability to detect significant differences between treatment yield results increases with increasing plot length. For example, increasing plot length from 250 ft to 1,000 ft decreased the LSD from about 10 bu/acre to 5 bu/acre with 3 replications and from about 7 bu/acre to 3 bu/acre with 4 replications. Longer plots typically result in a lower LSD, so smaller treatment differences can be declared significantly different. The importance of plot length is also influenced by yield. For example, when cereal yields are under 60 bu/acre, a 750 ft or longer plot length would be the best choice. At higher yields, you can achieve accurate comparisons with shorter plot lengths of around 300-500 ft because you have larger volumes of grain from each plot. If you have the space, longer plots will generally give you less trial variability and greater confidence in your results.

Fig. 2. Effects of replication number and plot length on the Least Significant Difference (LSD) values at the 5% statistical probability level based on data from 10 field uniformity trials in Idaho, Oregon and Washington with less than 5 bu/acre yield variance. (Adapted from Wuest et al., 1994).

Why Randomize Treatments?

Selection of treatment locations in a field comparison must be fair or “unbiased.” This might seem obvious, but there are many ways to consciously or unconsciously give an advantage to one of the practices being compared. One easy way to choose which of two practices or treatments goes in which plot is to flip a coin! This is called “randomization,” and the theory is straightforward. Once you have chosen plot areas that, as far as you can tell, should perform the same, the logical way to convince yourself and others that you did not consciously or unconsciously favor one of the treatments is to assign them at random.

Remember that there are often gradients of soil and topographic factors across fields that can create gradients in yield and other factors of interest, some you can see and some you cannot. For example, on a field slope, yields typically increase downslope with increased soil depth, organic matter content, soil moisture and so on. If the same treatment is always placed on the upper or lower side of side-by-side comparisons down the slope (or other field gradient factor), it could create an unfair yield bias in the trial and the results would be misleading. Fig. 3 provides an example of the importance of randomizing the order of treatments in OFTs. Canola yields progressively increased downslope in a dry spring that limited overall yields and reduced the potential for a canola response to boron fertilizer. Having the same order of treatments in each replication would have resulted in entirely different conclusions. The randomization of treatments in each replication prevented drawing misleading conclusions from this fertilizer trial because of the yield gradient in the trial.

Fig. 3. Yield of canola with three different boron fertilizer options in an OFT with three replications near Craigmont, Idaho in 1992. Dry spring conditions limited yields and largely overshadowed any crop response to boron fertilizer treatments. Because of the progressive increase in yield downslope (~7% slope), using the same order of treatments in each replication would have biased the results. Fortunately, treatments were randomized in each replication.

Randomization helps to remove any treatment bias due to gradients in field characteristics that can affect yield or other factors being measured. If you are aware of a potential gradient in field conditions that could affect the results of an OFT, arrange plots so all the treatments cross the field variability as equally as possible.
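The effect of a yield gradient on a fixed treatment order can be illustrated with a small, entirely hypothetical example. In the sketch below, two treatments (A and B) have no real effect, eight plots run from upslope to downslope, and yield rises by an assumed 2 bu/acre per plot. The function name `apparent_difference` and all numbers are invented for illustration and are not data from the trial in Fig. 3.

```python
from itertools import product


def apparent_difference(orders, gradient_per_plot=2.0, base_yield=40.0):
    """Mean yield of B minus A when neither treatment has any real effect.

    orders: one string per replication, "AB" or "BA", listing the plots
    from upslope to downslope. Yield depends only on plot position.
    """
    totals = {"A": 0.0, "B": 0.0}
    position = 0
    for order in orders:
        for treatment in order:
            totals[treatment] += base_yield + gradient_per_plot * position
            position += 1
    reps = len(orders)
    return totals["B"] / reps - totals["A"] / reps


# Same order in every replication: B is always on the downslope side.
fixed = apparent_difference(["AB", "AB", "AB", "AB"])

# Average over every layout a coin toss in each replication could produce.
all_layouts = list(product(["AB", "BA"], repeat=4))
average = sum(apparent_difference(layout) for layout in all_layouts) / len(all_layouts)

print(f"fixed order: B appears {fixed:.1f} bu/acre better than A")
print(f"average over all randomized layouts: {average:.1f} bu/acre")
```

With the fixed order, B appears better purely because of its position on the slope. Randomizing does not guarantee that any single layout is perfectly balanced, but across the possible coin-toss outcomes the gradient favors neither treatment.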

What Statistical Design Works Best for OFTs?

The statistical design generally most appropriate for replicated, randomized OFTs with field-scale equipment is called the “Randomized Complete Block” design (see Fig. 4 for two examples). As explained earlier, the “randomized” part of the complete block design means that the order of the treatments in each block or replication is chosen randomly to ensure no bias in assigning treatments to plots. The “complete block” part means that each of the two or more treatments is included in side-by-side comparisons in each of the trial “replications or blocks.” Because field variability generally increases with distance across the field, keeping the treatments side by side within each replication reduces data variability. Variability in results due to natural variability between replications can then be separated in the statistical analysis of the results.

What Are Some Practical OFT Procedures?

The first step in designing an OFT is to decide clearly what you want to learn. This can take more thought than it might appear. Determine what you really need to know before you can make a decision whether or not to adopt the new practice or product. Design your test to provide that information.

Briefly, the steps to laying out a scientific OFT commonly include: 1) choose an area in a field where long, side-by-side plots can be placed with the expectation that the yield (or weed pressure, or other factors to be measured) should be nearly equal; 2) assign the treatments to the plots randomly, such as with a coin toss; and 3) repeat the above process so there are at least four replications. The four replications could be next to each other, or in different areas of the field, or even in different fields (Fig. 4). The best results occur when each replication is positioned so that variations in the field (high and low areas, soil variations, field borders, fertilizer overlaps, etc.) will be encountered equally by each strip in the replication.
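The random assignment in step 2 can also be done with a few lines of code instead of a coin, which is convenient when there are more than two treatments. This is a minimal sketch; the function name `randomized_complete_block` and the treatment labels are illustrative assumptions.

```python
import random


def randomized_complete_block(treatments, replications, seed=None):
    """Return one randomly ordered list of treatments per replication.

    Every treatment appears exactly once in every replication, and the
    order is shuffled independently for each replication.
    """
    rng = random.Random(seed)   # pass a seed only to reproduce a layout
    layout = []
    for _ in range(replications):
        order = list(treatments)
        rng.shuffle(order)      # the coin toss, generalized
        layout.append(order)
    return layout


# C = check, N = new practice, four replications.
for rep, order in enumerate(randomized_complete_block(["C", "N"], 4), start=1):
    print(f"Replication {rep}: {' | '.join(order)}")
```

Record the layout on your field map before any field operations begin, and do not rearrange it afterward to make it look more balanced.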

Make a map of the field and plot locations, and keep notes on what you observe throughout the trial year. After all the field operations (spraying postemergence herbicides, etc.) are complete, mark your plots with stakes tall enough to be easily found at harvest time.

When data measurements are taken, such as stand counts or yield, record them separately for each strip. At harvest, cut a full header width out of the center of each plot (plots should be wider than the header), and weigh the grain from each plot separately. Also measure the length of the harvested strip and the header width to calculate the area of each plot. Portable truck scales or weigh wagons typically provide 5 to 10 pound accuracy per load cell in weight measurements. Consequently, your proportional error due to weighing equipment accuracy will be greater with smaller plots, particularly if yields are low. You may want to collect grain samples for moisture content, test weight and quality measurements.
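The yield calculation from these measurements is simple arithmetic: plot area in acres is length times header width divided by 43,560 sq ft, and bushels are pounds of grain divided by the bushel weight. The sketch below uses hypothetical numbers, assumes a 60 lb bushel (the standard for wheat), assumes a total scale error of 10 lb per plot, and makes no moisture adjustment. The function names are made up for this example.

```python
SQ_FT_PER_ACRE = 43560


def plot_yield(grain_lb, length_ft, header_width_ft, lb_per_bushel=60.0):
    """Yield in bu/acre from grain weight and harvested strip dimensions.

    lb_per_bushel defaults to 60, the standard bushel weight for wheat
    (an assumption here; use 48 for barley or a measured test weight).
    """
    acres = length_ft * header_width_ft / SQ_FT_PER_ACRE
    return grain_lb / lb_per_bushel / acres


def weighing_error(scale_error_lb, length_ft, header_width_ft, lb_per_bushel=60.0):
    """Yield uncertainty in bu/acre caused by a given scale error in pounds."""
    return plot_yield(scale_error_lb, length_ft, header_width_ft, lb_per_bushel)


# Hypothetical plot: 900 lb of wheat from a 20 ft header cut 500 ft long.
print(f"yield: {plot_yield(900, 500, 20):.1f} bu/acre")

# Effect of an assumed 10 lb scale error at two plot lengths.
for length in (500, 1000):
    error = weighing_error(10, length, 20)
    print(f"{length} ft plot: +/- {error:.2f} bu/acre from weighing")
```

Doubling the plot length halves the weighing error expressed in bu/acre, which is one reason longer plots matter most when yields are low.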

Fig. 4. Potential field layout options for an OFT with two treatments (C = check; N = new practice) and four replications in a randomized complete block design. The bottom layout illustrates a common side-by-side arrangement for all four replications. The top layout illustrates an end-to-end arrangement that can potentially permit faster establishment and harvest of some trials.

With new combine yield monitors and distance measuring devices, OFT may become faster and simpler for growers, but there is still considerable debate whether they are accurate enough for yield comparisons in replicated trials. An article in PrecisionAg Illustrated (Dunn, 1997) provides an overview of some of the advantages and disadvantages of combine yield monitors in field comparisons of management options. Basic OFT principles still need to be followed regardless of harvesting method.

If comparisons of erosion control or soil water storage are important questions in an OFT on sloping cropland, the trial design typically involves adjacent wider blocks instead of long narrow strips. Each plot needs to include much of the field slope, starting at the top of the “watershed area” of the field. This means that plots cannot be placed one above another on a slope, because the lower plot would receive runoff from a plot above it under a different practice. Each plot must run from the top of the slope down far enough to show how the practice handles water absorption and runoff. At the same time, if the practices are usually conducted on the slope contour, they should also be conducted that way in the plots.

How Do I Analyze the OFT Results?

Use of the basic experimental methods and designs discussed so far is critical to achieving accurate results from OFTs. Once you have collected the data you are interested in, statistical analysis of the results can separate the effects of natural field variability from the treatment effects so you can draw correct conclusions and make sound management choices. Assistance in analyzing the trial data is often available through your county Cooperative Extension Agent. Even without statistics, a lot can be learned by observing the different treatments to see if one is consistently better than the other in each of the replications.