Need for Statistical Data

Since the beginning of the twentieth century, the economic and social life of the people, the functional system of industry and business, educational and medical facilities and other activities of the community have undergone substantial changes due to spectacular developments in the field of science and technology. Now the emphasis is on specialization in mass production and utilization of goods and services of a given type, with a view to getting the maximum possible benefit per unit of cost. Considerable planning is required for large-scale projects, and any rational decision regarding efficient formulation and execution of suitable plans and projects, or an objective assessment of their effectiveness, whether in the field of industry, business or governmental activities, has necessarily to be based on objective data regarding resources and needs. There is, therefore, a need for various types of statistical (quantified) information to be collected and analyzed in an objective manner and presented suitably so as to serve as a sound basis for taking policy decisions in different fields of human activity. In modern times, the primary users of statistical data are the state, industry, business, scientific institutions, public organizations and international agencies.

For instance, to execute its various responsibilities, the state is in need of a variety of information regarding different sectors of the economy, sections of people and geographical regions in the country as well as information on the available resources such as manpower, cultivable land, forests, water, minerals and oil. If the resources were unlimited, planning would be relatively simple as it would consist in just providing each one with what he needs in terms of money, material, employment, education etc. But such a situation is only hypothetical, as in reality the resources are limited and the needs are usually not well defined and are elastic.

Therefore, for the purpose of proper planning, fairly detailed data on the available resources and on the needs are to be collected. For example, the country needs data on production and consumption of different types of products to enable it to take objective decisions regarding its import and export policies. Statistical information on the cost of living of different categories of people living in various parts of the country is of importance in shaping its policies in respect of wage and price levels.


Complete enumeration survey
One way of obtaining the required information at regional and country level is to collect the data for each and every unit (person, household, field, factory, shop, etc., as the case may be) belonging to the population or universe, which is the aggregate of all units of a given type under consideration; this procedure of obtaining information is termed a complete enumeration survey. The effort, money and time required for carrying out complete enumeration surveys to obtain the different types of data will generally be extremely large. However, if the information is required for each and every unit in the domain of study, a complete enumeration survey is clearly necessary. Examples of such situations are income tax assessment, where the income of each individual is assessed and taxed, preparation of voters' lists for election purposes, recruitment of personnel in an establishment, etc. But there are many situations where only summary figures are required for the domain of study as a whole or for groups of units, and in such situations collection of data for every unit is only a means to an end and not the end itself.

It is worth mentioning that exact planning for the future is not possible, since this would need accurate information on the resources that would be available and on the needs that would have to be satisfied in the future. In general, past data are used to forecast the resources and the needs of the future, and hence there is some element of uncertainty in planning. Because of this uncertainty, only broad (and not exact) allocations of the resources are usually attempted. Thus some margin of error may be permitted in the data needed for planning, provided this error is not large enough to affect the broad allocations.

Sampling
Considering that some margin of error is permissible in the data needed for practical purposes, an effective alternative to a complete enumeration survey can be a sample survey, where only some of the units selected in a suitable manner from the population are surveyed and an inference is drawn about the population on the basis of observations made on the selected units. It can easily be seen that, compared to a sample survey, a complete enumeration survey is time-consuming, expensive, has less scope in the sense of restricted subject coverage, and is subject to greater coverage, observational and tabulation errors. In certain investigations it may be essential to use specialized equipment or highly trained field staff for data collection, making it almost impossible to carry out such investigations except on a sampling basis. Besides, in the case of destructive surveys, a complete enumeration survey is simply not practicable. Thus, if the interest is in the average life of electric bulbs in a batch, one will have to confine the observations, of necessity, to a part (or a sample) of the population or universe and infer about the population as a whole on the basis of the observations on the sample.

However, since an inference is made about the whole from a part in a sample survey, the results are likely to differ from the population values, and the differences would depend on the selected part or sample. Thus the information provided by a sample is subject to a kind of error which is known as sampling error. On the other hand, as only a part of the population is to be surveyed, there is greater scope for eliminating ascertainment or observational errors by proper controls and by employing trained personnel than is possible in a complete enumeration survey. It is of interest to note that if a sample survey is carried out according to certain specified statistical principles, it is possible not only to estimate the value of the characteristic for the population as a whole on the basis of the sample data, but also to get a valid estimate of the sampling error of the estimate. There are various steps involved in the planning and execution of a sample survey; one of the principal steps relates to the methods of data collection.
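
As a rough illustration (not part of the original text; the population, sample size and values are hypothetical), the following Python sketch draws a simple random sample and computes both an estimate of the population mean and an estimate of its sampling error:

import random
import statistics

population = list(range(1, 10001))      # hypothetical universe of 10,000 units
n = 100                                 # sample size
sample = random.sample(population, n)   # simple random sampling without replacement

mean_hat = statistics.mean(sample)      # estimate of the population mean
s2 = statistics.variance(sample)        # sample variance
N = len(population)
se = ((1 - n / N) * s2 / n) ** 0.5      # estimated sampling error (standard error)
                                        # with finite population correction
print(f"estimated mean = {mean_hat:.1f}, estimated standard error = {se:.1f}")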

Transformation of Data

Note: It should be emphasized that transformation of data in statistics, if needed, must take place right at the beginning of the statistical analysis.
The validity of the analysis of variance depends on certain important assumptions: normality of errors and random effects, independence of errors, homoscedasticity (constant variance) of errors, and additivity of effects. The analysis is likely to lead to faulty conclusions when some of these assumptions are violated. A very common violation is of the assumption regarding the constancy of the variance of errors. One alternative in such cases is a weighted analysis of variance, wherein each observation is weighted by the inverse of its variance. For this, an estimate of the variance of each observation has to be obtained, which may not always be feasible. Quite often, the data are instead subjected to a scale transformation such that, in the transformed scale, the constant variance assumption is realized. Some such transformations can also correct for departures of observations from normality, because unequal variance is often related to the distribution of the variable. The major aims of applying transformations are to bring the data closer to a normal distribution, to reduce the relationship between the mean and the variance, to reduce the influence of outliers, to improve linearity in regression, to reduce interaction effects, and to reduce skewness and kurtosis. Methods are available for identifying the transformation needed for any particular data set, but one may also resort to certain standard forms of transformation depending on the nature of the data. The most commonly used transformations in the analysis of experimental data are the arcsine, logarithmic and square root transformations. These transformations can be carried out using the following options.

Arcsine Transformation: The arcsine transformation is appropriate for data on proportions, i.e., data obtained from counts and expressed as decimal fractions or percentages. The distribution of such percentages is binomial, and the arcsine transformation makes the distribution approximately normal. Since the role of the arcsine transformation is often not properly understood, there is a tendency to transform any percentage using it. However, only percentage data derived from count data, such as % barren tillers (derived from the ratio of the number of non-bearing tillers to the total number of tillers), should be transformed, and not percentage data such as % protein or % carbohydrates, which are not derived from counts.
In the case of proportions derived from frequency data, the observed proportion p can be changed to a new form θ = arcsin(√p), i.e., the angle whose sine is the square root of the observed proportion.
This type of transformation of data is known as the angular or arcsine transformation. However, when nearly all values in the data lie between 0.3 and 0.7, there is no need for such a transformation. It may be noted that the angular transformation is not applicable to proportion or percentage data which are not derived from counts. For example, percentage of marks, percentage of profit, percentage of protein in grains, oil content in seeds, etc., cannot be subjected to the angular transformation. The angular transformation does not work well when the data contain values of 0 or 1 for p. The transformation in such cases is improved by replacing 0 with 1/(4n) and 1 with [1 - 1/(4n)] before taking angular values, where n is the number of observations on which p is estimated for each group.
ASIN gives the arcsine of a number. The arcsine is the angle whose sine is the given number, and this number must be from -1 to 1. The returned angle is given in radians, in the range -π/2 to π/2. To express the arcsine in degrees, multiply the result by 180/π (in Excel, 180/PI()). For this, go to the CELL where the transformation is required, write =ASIN(SQRT(cell reference of the proportion to be transformed))*180/PI() and press ENTER (the square root is taken because the angular transformation is arcsin √p). Then copy it for all observations.
Example: ASIN(0.5) equals 0.5236 (π/6 radians) and ASIN(0.5)*180/PI() equals 30 (degrees).
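
For illustration, the angular transformation, including the adjustment for 0 and 1 values, can also be sketched in Python (this sketch is not part of the original material; the proportions and the value n = 20 are hypothetical):

import math

n = 20                                       # number of units counted for each proportion
proportions = [0.0, 0.05, 0.35, 0.80, 1.0]   # e.g. proportions of barren tillers

def angular(p, n):
    # replace 0 with 1/(4n) and 1 with 1 - 1/(4n) before transforming
    if p == 0:
        p = 1 / (4 * n)
    elif p == 1:
        p = 1 - 1 / (4 * n)
    return math.degrees(math.asin(math.sqrt(p)))   # arcsin(sqrt(p)), in degrees

print([round(angular(p, n), 2) for p in proportions])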

Logarithmic Transformation: The logarithmic transformation is suitable for data where the variance is proportional to the square of the mean (i.e., the coefficient of variation, S.D./mean, is constant) or where effects are multiplicative. These conditions are generally found in data that are whole numbers and cover a wide range of values, as is usually the case when analyzing growth measurements. For data of this nature, the logarithmic transformation is recommended. It squeezes the bigger values and stretches the smaller values. A simple plot of group means against group standard deviations will show linearity in such cases. A good example is data from an experiment involving various types of insecticides: for an effective insecticide, insect counts on the treated experimental unit may be small, while for ineffective ones the counts may range from 100 to several thousand. When zeros are present in the data, it is advisable to add 1 to each observation before making the transformation. The log transformation is particularly effective in normalizing positively skewed distributions. It is also used to achieve additivity of effects in certain cases.
LN gives the natural logarithm of a positive number. Natural logarithms are based on the constant e (approximately 2.718). For this, go to the CELL where the transformation is required and write =LN(cell reference of the observation to be transformed), or =LN(cell reference + 1) when zeros are present, and press ENTER. Then copy it for all observations.
Example: LN(86) equals 4.45, LN(2.72) is approximately 1, LN(EXP(3)) equals 3 and EXP(LN(4)) equals 4. Further, EXP returns e raised to the power of a given number, LOG returns the logarithm of a number to a specified base and LOG10 returns the base-10 logarithm of a number.
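
A corresponding Python sketch (illustrative only; the counts are hypothetical), using log(x + 1) because zeros are present:

import math

counts = [0, 3, 12, 150, 2300]                        # counts covering a wide range
print([round(math.log10(x + 1), 3) for x in counts])  # log(x + 1) because zeros occur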

Square Root Transformation: This transformation is appropriate for data sets where the variance is proportional to the mean. Here, the data consist of small whole numbers, for example, data obtained by counting rare events. Such data generally follow the Poisson distribution, and the square root transformation approximates the Poisson to the normal distribution. If the original observations are brought to the square root scale by taking the square root of each observation, it is known as the square root transformation. This is appropriate when the variance is proportional to the mean, as discernible from a graph of group variances against group means. A linear relationship between mean and variance is commonly observed when the data are in the form of small whole numbers (e.g., counts of wildlings per quadrat, weeds per plot, earthworms per square metre of soil, insects caught in traps, etc.). When the observed values fall within the range of 1 to 10, and especially when zeros are present, the transformation used should be √(x + 0.5), i.e., the square root of the observation plus 0.5, rather than √x.
SQRT gives the square root of a positive number. For this, go to the CELL where the transformation is required and write =SQRT(cell reference of the observation to be transformed + 0.5) and press ENTER. Then copy it for all observations. However, if the number is negative, SQRT returns the #NUM! error value.
Example: SQRT(16) equals 4, SQRT(-16) equals #NUM! and SQRT(ABS(-16)) equals 4.
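
A corresponding Python sketch (illustrative only; the counts are hypothetical) applying the square root transformation with the + 0.5 adjustment:

import math

counts = [0, 1, 2, 4, 7, 10]                            # small counts of rare events
print([round(math.sqrt(x + 0.5), 3) for x in counts])   # sqrt(x + 0.5)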

Box-Cox Transformation: 
By now we know that if the relation between the variance of the observations and the mean is known, this information can be utilized in selecting the form of the transformation.

We now elaborate on this point and show how it is possible to estimate the form of the required transformation from the data.

Box-Cox transformation is a power transformation of the original data.

Let y_u denote the observation pertaining to the u-th plot. The power transformation implies that, in place of y_u, we analyze

    y_u(λ) = y_u^λ        ... (1)

Box and Cox (1964) have shown how the transformation parameter λ in eq. (1) may be estimated simultaneously with the other model parameters (overall mean and treatment effects) using the method of maximum likelihood. The procedure consists of performing, for various values of λ, a standard analysis of variance on

    y_u(λ) = (y_u^λ - 1) / (λ g^(λ - 1))    if λ ≠ 0
    y_u(λ) = g ln(y_u)                      if λ = 0        ... (A)

where g = exp[(1/n) Σ ln(y_u)] is the geometric mean of the observations. The maximum likelihood estimate of λ is the value for which the error sum of squares, say SSe(λ), is minimum. Notice that we cannot select the value of λ by directly comparing the error sums of squares from analyses of variance on y^λ, because for each value of λ the error sum of squares is measured on a different scale; equation (A) rescales the responses so that the error sums of squares are directly comparable.

Therefore, λ is estimated by carrying out the analysis of variance on (A) for a range of values of λ and choosing the value that minimizes the error sum of squares.

This is a very general transformation and the commonly used transformations follow as particular cases. The particular cases for different values of λ are given below.


λ        Transformation
1        No Transformation
1/2      Square Root
0        Log
-1/2     Reciprocal Square Root
-1       Reciprocal


If any one of the observations is zero, the geometric mean becomes zero (and its logarithm is undefined); since the geometric mean appears in the denominator of expression (A), the expression cannot be computed. To overcome this problem, a small quantity is added to each of the observations before transformation.
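
The selection of λ by minimizing the error sum of squares can be sketched as follows in Python (this is an illustration, not part of the original text; the data, treatment labels and grid of λ values are hypothetical, and only NumPy is assumed):

import numpy as np

def scaled_transform(y, lam):
    # expression (A): rescaled power transform, so error SS are comparable across lambda
    g = np.exp(np.mean(np.log(y)))               # geometric mean of the observations
    if lam == 0:
        return g * np.log(y)
    return (y ** lam - 1.0) / (lam * g ** (lam - 1.0))

def error_ss(y, groups):
    # within-treatment (error) sum of squares of a one-way analysis of variance
    return sum(((y[groups == t] - y[groups == t].mean()) ** 2).sum()
               for t in np.unique(groups))

y = np.array([3.0, 5.0, 4.0, 6.0, 12.0, 15.0, 11.0, 14.0, 45.0, 52.0, 39.0, 60.0])
groups = np.repeat(["A", "B", "C"], 4)           # three hypothetical treatments

lambdas = np.arange(-1.0, 1.01, 0.25)
sse = [error_ss(scaled_transform(y, lam), groups) for lam in lambdas]
print("lambda minimizing SSe(lambda):", lambdas[int(np.argmin(sse))])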

Once the transformation has been made, the analysis is carried out with the transformed data and all the conclusions are drawn in the transformed scale. However, while presenting the results, the means and their standard errors are transformed back into the original units. While transforming back into the original units, certain corrections have to be made for the means. In the case of log transformed data, if the mean value is ȳ, the mean value in the original units will be antilog(ȳ + 1.15 V(ȳ)) instead of antilog(ȳ). If the square root transformation had been used, then the mean in the original scale would be (ȳ + V(ȳ))² instead of (ȳ)², where V(ȳ) represents the variance of ȳ. No such correction is generally made in the case of the angular transformation. The inverse transformation for the angular transformation is p = (sin θ)².

Note: Examples discussed are for MS-Excel.

Conjoint Analysis

Conjoint Analysis is a popular marketing research technique that marketers use to determine what features a new product should have and how it should be priced. It is a multivariate analysis technique introduced to marketers in the 1970s. Conjoint Analysis is basically a data-decompositional technique which tries to plot the output data on the joint space of the importance of each attribute. The important point to note is that the consumer is not asked to assign scores to different attributes separately. The main steps involved in using conjoint analysis include determination of the salient attributes for the given product from the point of view of the consumers, assigning a set of discrete levels or a range of continuous values to each of the attributes, utilizing fractional factorial design of experiments for designing the stimuli, physically designing the stimuli, data collection, conjoint analysis and determination of part-worth utilities. Possible applications of conjoint analysis include product design, market segmentation, SWOT analysis, etc.

In its original form, conjoint analysis is a main-effects analysis-of-variance problem with an ordinal scale-of-measurement dependent variable. Conjoint analysis decomposes rankings or rating-scale evaluation judgments of products into components based on qualitative attributes of the products. Attributes can include price, color, guarantee, environmental impact, and so on. A numerical utility or part-worth utility value is computed for each level of each attribute. The goal is to compute utilities such that the rank ordering of the sums of each product's set of utilities is the same as the original rank ordering or violates that ordering as little as possible. When a monotonic transformation of the judgments is requested, a nonmetric conjoint analysis is performed; nonmetric conjoint analysis models are fit iteratively. When the judgments are not transformed, a metric conjoint analysis is performed; metric conjoint analysis models are fit directly with ordinary least squares. When all of the attributes are nominal, the metric conjoint analysis problem is a simple main-effects ANOVA model.

In both metric and nonmetric conjoint analysis, the respondents are typically not asked to rate all possible combinations of the attributes. For example, with five attributes, three with three levels and two with two levels, there are 3×3×3×2×2 = 108 possible combinations. Rating that many combinations would be difficult for consumers, so typically only a small fraction of the combinations is rated. Typically, combinations are chosen from an orthogonal array, which is a fractional factorial design. The statistical technique of fractional factorial design of experiments finds the minimum number of product designs that are necessary for the study and yet provide all the information that was originally sought. These designs are also mutually independent (orthogonal), to avoid any redundancy in the data and to allow the representation of each of the attributes and their respective levels in an unbiased manner.
Conjoint Analysis Steps
1. The respondent is given a set of stimulus profiles (constructed along factorial design principles in the full profile case). In the two-factor approach, pairs of factors are presented, each appearing approximately an equal number of times.
2. The respondents rank or rate the stimuli according to some overall criterion, such as preference, acceptability, or likelihood of purchase.
3. In the analysis of the data, part-worths are identified for the factor levels such that the sum of the part-worths for each specific combination equals the total utility of the given profile. A set of part-worths is derived for each respondent.
4. The goodness-of-fit criterion relates the derived ranking or rating of stimulus profiles to the original ranking or rating data.
5. A set of objects is defined for the choice simulator. Based on the previously determined part-worths for each respondent, the simulator computes a utility value for each of the objects defined as part of the simulation.
6. Choice simulator models are invoked which rely on decision rules (first choice model, average probability model or logit model) to estimate each respondent's object of choice (see the sketch below). Overall choice shares are computed for the sample.
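
As a rough illustration of steps 5 and 6 (not part of the original text; all part-worths, products and respondents are hypothetical), the following Python sketch applies the first choice rule and computes overall choice shares:

from collections import Counter

# part-worth utilities per respondent: attribute level -> utility
respondents = [
    {"brand A": 0.6, "brand B": 0.1, "price low": 0.9, "price high": -0.4},
    {"brand A": 0.2, "brand B": 0.8, "price low": 0.3, "price high": -0.1},
    {"brand A": -0.1, "brand B": 0.7, "price low": 0.8, "price high": -0.6},
]

# objects defined for the simulation: product -> list of attribute levels
products = {
    "P1": ["brand A", "price low"],
    "P2": ["brand B", "price high"],
}

choices = Counter()
for pw in respondents:
    utility = {name: sum(pw[level] for level in levels) for name, levels in products.items()}
    choices[max(utility, key=utility.get)] += 1          # first choice rule

print({name: count / len(respondents) for name, count in choices.items()})
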
How to Conduct Conjoint Analysis
While specific research objectives will dictate the direction of conjoint research, there are several components common to all conjoint engagements. These steps include: definition of attributes; establishment of attribute levels; choice of conjoint methodology; design of experiment; data collection; data analysis; and development of the market simulator.
Step 1: Definition of Attributes
To replicate the decision-making process, it is necessary to understand each of the attributes consumers consider when making an actual purchasing decision. Experience, previous research, and/or the specific research objectives will determine which attributes are of particular importance, and whether all product features should be displayed or only those most relevant to differentiating a product from competitive offerings.
Step 2: Establishment of Attribute Levels
Once attributes for the conjoint research have been defined, it must be determined how attributes will vary from one product concept to the next. This step involves the establishment of attribute levels. Attribute levels must be comprehensive enough to capture all of the products that exist, or will soon exist, within the marketplace. However, as with the definition of attributes, care must be taken to avoid respondent fatigue, so only the most prevalent attribute levels will be chosen for testing (typically 3-5 levels per attribute). Further, the number of attribute levels chosen has a direct impact on the number of concepts respondents will be asked to evaluate. The optimal number of attribute levels tested is that which ensures the research objectives are satisfied while minimizing the burden faced by respondents.
Step 3: Choice of Conjoint Methodology
Because no two product and/or service categories are exactly the same, there are a number of conjoint methodologies at a marketing researcher's disposal. The three primary methods used today are conjoint value analysis (CVA), adaptive conjoint analysis (ACA), and choice-based conjoint analysis (CBC), with adaptive choice-based conjoint (ACBC) emerging as a new generation of conjoint analysis. For the purposes of this discussion, we will focus on CBC, by far the most popular conjoint methodology currently used by researchers. The main types of conjoint methodologies are:
1. Choice-Based Conjoint (CBC)
2. Conjoint Value Analysis (CVA)
3. Adaptive Conjoint Analysis (ACA)
Step 4: Design of Experiment
Having established the methodology, attributes, and attribute levels to be tested, we can then create concept profiles (i.e., descriptions of product concepts using the attributes and attribute levels to be used in the research). Respondents are asked to evaluate a number of these concepts and, in the case of CBC, determine which, if any, they would choose to purchase given the opportunity. Fortunately, it is not necessary that every potential product offering be evaluated. In fact, this would be quite impossible, as there are typically thousands of potential product configurations in any given study. For example, there are 1,800 hypothetical products in the energy bar study (3 brands x 5 protein levels x 6 carbohydrate levels x 4 flavors x 5 price levels). However, with a carefully constructed conjoint design, we are able to calculate respondent preference for each attribute and attribute level. Therefore, assuming a simple additive model (i.e., product preference is the sum of the preferences for its attributes), we can estimate how respondents would react to any product offering.
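
For illustration (not part of the original text), the full factorial for the energy bar example can be enumerated in Python. A real study would use an orthogonal or otherwise efficient fractional design rather than the random subset drawn here, and all attribute levels shown are hypothetical:

import itertools
import random

attributes = {
    "brand":   ["Brand 1", "Brand 2", "Brand 3"],
    "protein": ["5 g", "10 g", "15 g", "20 g", "25 g"],
    "carbs":   ["10 g", "15 g", "20 g", "25 g", "30 g", "35 g"],
    "flavor":  ["chocolate", "vanilla", "berry", "peanut"],
    "price":   ["$1.49", "$1.99", "$2.49", "$2.99", "$3.49"],
}

full_factorial = list(itertools.product(*attributes.values()))
print(len(full_factorial))                    # 3 x 5 x 6 x 4 x 5 = 1800 possible concepts

subset = random.sample(full_factorial, 18)    # stand-in for a designed (orthogonal) fraction
for profile in subset[:3]:
    print(dict(zip(attributes, profile)))
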
Step 5: Data Collection
An online survey is recommended for almost all conjoint research engagements, as it provides the most effective, cost-efficient, time-sensitive, and highest quality solution. Respondents are required to consider a great deal of information, and allowing them to visually assess the stimuli results in more reliable findings. An online presentation of product concepts and conjoint tasks allows respondents to complete the survey at their own pace, allowing time for thoughtful and accurate responses. With over 70% of U.S. adults accessing the Internet via computers at home, work, or school (Source: Pew Internet and American Life Project), an online methodology allows for data collection from a large sample set.
Step 6: Data Analysis
With a carefully constructed conjoint survey, we can statistically deduce the consumer values for each feature that respondents may be subconsciously using to evaluate concepts. Analysis of conjoint data yields a series of scores for each respondent for each attribute level. These scores, known as part-worths, are measured in arbitrary units of the utility consumers associate with a product and its attributes. Each score reflects the value the respondent associates with each attribute level, and is the building block from which all analysis is conducted. By assuming a simple additive model, we are able to build products and pricing structures and then calculate the value consumers find in that product. By comparing this to other potential products in the marketplace, we can begin to understand how consumers will choose products in the real world.
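
A minimal Python sketch of metric conjoint estimation (not part of the original text; the profiles, ratings and effects coding shown are hypothetical and simplified to two attributes): ratings are regressed on coded attribute levels by ordinary least squares, and the coefficients serve as part-worth utilities.

import numpy as np

# one respondent's ratings of four hypothetical profiles (two attributes, two levels each)
profiles = [("A", "low"), ("A", "high"), ("B", "low"), ("B", "high")]
ratings = np.array([8.0, 5.0, 7.0, 3.0])

# effects coding: +1 for the first level of each attribute, -1 for the second
X = np.array([[1.0,
               1.0 if brand == "A" else -1.0,
               1.0 if price == "low" else -1.0]
              for brand, price in profiles])

beta, *_ = np.linalg.lstsq(X, ratings, rcond=None)
intercept, brand_a, price_low = np.round(beta, 2)
print({"brand A": brand_a, "brand B": -brand_a,
       "price low": price_low, "price high": -price_low})
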
Step 7: Development of Market Simulator 
While preliminary analysis of conjoint data results in valuable insight regarding consumers and their preferences, the real value of conjoint analysis comes from the market simulators developed at the conclusion of the research engagement. The market simulator is a software program, similar to a spreadsheet, which allows users to conduct "what-if" analyses with data collected during conjoint fielding. As mentioned above, respondents can be asked to evaluate only a small fraction of concept profiles, yet still reveal how they would respond to any product offering. Therefore, it is possible to aggregate the preferences of all consumers to reveal how the market as a whole will respond to any product offering. Furthermore, we can assess how the marketplace will respond to two or more competing products by calculating the market’s share of preference for every product of interest.

Conjoint Analysis Survey Examples