Data Analysis: Data Preparation and Basic Concepts
WEEK 8 READINGS: CHAPTERS 10 AND 11
Data analysis
Key points
• Data must be analysed to produce information (0, 6, 1, 3 mean nothing but male vs. female and strong vs. weak attitudes do)
• Computer software analysis is normally used for this process (e.g., SPSS, Excel)
• Data should be carefully prepared for analysis (e.g., coded, screened)
• Researchers need to know how to select and use different charting and statistical techniques (e.g., t-test, ANOVA, regression, etc.)
MARK977 Research for Marketing Decisions T1 2018
Data analysis
• A set of methods and techniques used to obtain
information and insights from data
• Helps avoid erroneous judgments and conclusions
• Can constructively influence the research objectives and
the research design
Major Data Preparation techniques include:
• Data editing
• Coding
• Statistically adjusting the data
Data preparation process
Questionnaire checking
• Check questionnaires for completeness and interviewing quality as soon as the first batch is received from the field
• If quotas or cells have been imposed, check for underrepresented cells and conduct additional interviews if necessary
• A questionnaire returned from the field may be unacceptable for several reasons:
– Parts of the questionnaire may be incomplete.
– The pattern of responses may indicate that the respondent did not understand or follow the instructions.
– The responses show little variance.
– The questionnaire is answered by someone who does not qualify for participation.
Data editing
• Review questionnaires to increase accuracy and precision by identifying omissions, ambiguities, and errors in responses (e.g., a respondent indicates they are not an existing reader of Vogue magazine but reports reading it 3-4 times a week)
• Conducted in the field by interviewer and field supervisor, or by the analyst prior to data analysis
• Problems identified with data editing:
– Interviewer error
– Omissions
– Ambiguity
– Inconsistencies
– Lack of cooperation
– Ineligible respondent
Data editing (cont.)
Treatment of Unsatisfactory Responses
• Returning to the Field. The questionnaires with unsatisfactory responses may be returned to the field, where the interviewers recontact the respondents.
• Assigning Missing Values. If returning the questionnaires to the field is not feasible, the editor may assign missing values to unsatisfactory responses.
• Discarding Unsatisfactory Respondents. In this approach, the respondents with unsatisfactory responses are simply discarded.
Coding
Coding means assigning a code, usually a number, to each possible response to each question (e.g., 1=male, 2=female).
Coding Questions
• Coding closed-ended questions involves specifying how the responses are to be entered
• If possible, standard codes should be used for missing data (e.g., 99)
• Coding of structured questions is relatively simple, since the response options are predetermined (e.g., 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree)
Coding (cont.)
• In questions that permit a large number of responses (e.g., check all that apply), each possible response option should be assigned a separate column.
– E.g., preferred mode of transportation – car, bus, train, airplane, ferry (1=yes, 0=no)
– ID1 – 0, 1, 1, 0, etc.
– ID2 – 1, 0, 1, 1, etc.
• Open-ended questions are difficult to code
– Lengthy list of possible responses is generated
– E.g., what is your occupation __________
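The one-column-per-option rule for "check all that apply" questions can be sketched in a few lines of Python. This is only an illustration outside SPSS: the function name `code_multi_response`, the third respondent and the use of 99 for an unanswered question are assumptions that follow the transportation example and the missing-data code mentioned in these slides.

```python
# Response options from the transportation example; one column per option.
OPTIONS = ["car", "bus", "train", "airplane", "ferry"]
MISSING = 99  # standard missing-data code (assumed, as in the coding slide)

def code_multi_response(selected):
    """Return a 1=yes / 0=no code for every option, or 99 if unanswered."""
    if selected is None:  # the respondent skipped the question
        return {opt: MISSING for opt in OPTIONS}
    return {opt: int(opt in selected) for opt in OPTIONS}

# Hypothetical respondents; ID1 and ID2 mirror the codes shown on the slide.
responses = {
    "ID1": {"bus", "train"},
    "ID2": {"car", "train", "airplane"},
    "ID3": None,
}

coded = {rid: code_multi_response(sel) for rid, sel in responses.items()}
for rid, row in coded.items():
    print(rid, [row[opt] for opt in OPTIONS])
```

Each printed list is one row of the data matrix, with the options as columns.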
Coding (cont.)
Guidelines for coding unstructured questions (e.g., the “others – please specify _____” response):
• Category codes should be mutually exclusive (i.e., no overlap) and collectively exhaustive (i.e., every response fits a category).
– For example, 0-9, 10-19, 20-29 (no overlap) vs. 0-10, 10-20, 20-30 (overlapping)
• Category codes should be meaningful.
• Data should be coded to retain as much detail as possible.
– E.g., occupation – blue vs. white collar
Using SPSS for data analysis
• SPSS stands for the Statistical Package for the Social Sciences.
• SPSS is a cross between a spreadsheet and a database.
• It is a very powerful program; most people only use a fraction of its capabilities.
• SPSS can be used for:
– Storing data
– Analysing data
– Presenting results
What does SPSS look like?
SPSS Data Matrix
A simple data matrix
• Each column captures a specific variable
• Each row captures a specific case
Data cleaning
• Thorough and extensive checks for consistency and treatment of missing values using a computer (e.g., to determine if data entry was done appropriately)
• Consistency checks to identify data that:
– Are out-of-range (e.g., 0 or 6 on a 5-point scale)
– Are logically inconsistent (e.g., a respondent is not an existing reader of Vogue magazine but reports reading it 3-4 times a week)
– Have extreme values (e.g., one-sided)
• Treatment of missing responses (either not provided by the respondent or not recorded)
– Substitute a neutral value (e.g., mean response to the variable)
– Casewise deletion (e.g., remove cases/respondents with any missing responses)
– Pairwise deletion (e.g., using only cases with complete responses involving specific variables for each calculation)
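The range check and the three missing-value treatments can be illustrated with a short Python sketch. The ratings, the variable names `q1` to `q3` and the helper functions are hypothetical; SPSS performs the same steps through its menus.

```python
# Hypothetical 5-point ratings; each dict is one respondent, None = missing.
rows = [
    {"q1": 4, "q2": 5, "q3": 3},
    {"q1": 6, "q2": 2, "q3": 4},     # q1 is out of range on a 5-point scale
    {"q1": 2, "q2": None, "q3": 1},  # q2 was not answered
    {"q1": 5, "q2": 4, "q3": 2},
]

def out_of_range(rows, low=1, high=5):
    """List (row index, variable) pairs whose value falls outside the scale."""
    return [(i, k) for i, r in enumerate(rows) for k, v in r.items()
            if v is not None and not low <= v <= high]

def casewise(rows):
    """Casewise deletion: keep only respondents with no missing responses."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise_mean(rows, var):
    """Pairwise deletion: use every case that answered this one variable."""
    vals = [r[var] for r in rows if r[var] is not None]
    return sum(vals) / len(vals)

def mean_substitute(rows, var):
    """Replace missing values of one variable with that variable's mean."""
    m = pairwise_mean(rows, var)
    return [{**r, var: m if r[var] is None else r[var]} for r in rows]

print(out_of_range(rows))         # which entries need checking
print(len(casewise(rows)))        # respondents left after casewise deletion
print(pairwise_mean(rows, "q2"))  # mean of q2 over the cases that answered it
```

Casewise deletion shrinks the sample for every analysis, whereas pairwise deletion keeps a respondent wherever the variables in that calculation are complete.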
Statistically adjusting the data
• Not always necessary but can enhance the quality of data analysis
• Variable re-specification
– Transforming data to create new variables or modify existing ones
– E.g., creating a sum score of overall evaluation of a department store (by summing ratings of quality, variety, value and service)
– E.g., reverse-scoring negative statements (to positive statements)
• Recoding
– Redefining values of a variable, including forming or redefining categories
– E.g., ratio scale – <50 employees = 1, 50 or more = 2
Variable re-specification example
Please rate the extent to which you agree or disagree with the following statements (1=strongly disagree and 5=strongly agree):
I am interested in new products with status. 1 2 3 4 5
I would buy a product just because it has status. 1 2 3 4 5
I would pay more for a product if it had status. 1 2 3 4 5
The status of a product is irrelevant to me. 1 2 3 4 5
A product is more valuable to me if it has some snob appeal. 1 2 3 4 5
• Which item needs reversing?
• What is your score?
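As a rough sketch of the arithmetic, reverse-scoring on a 1-to-5 scale maps a rating x to 6 − x before the items are summed. The code below assumes the fourth statement ("The status of a product is irrelevant to me") is the negatively worded one; the respondent's answers and the function names are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5

def reverse(score):
    """Reverse-score a rating so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score

def status_score(answers, reversed_items=(3,)):
    """Sum the five items after reversing the negatively worded ones.

    `reversed_items` holds zero-based positions; position 3 is the fourth
    statement (assumed here to be the only item that needs reversing).
    """
    return sum(reverse(a) if i in reversed_items else a
               for i, a in enumerate(answers))

answers = [4, 3, 4, 2, 5]     # hypothetical respondent
print(status_score(answers))  # 4 + 3 + 4 + (6 - 2) + 5
```

After reversing, a higher total consistently indicates a stronger interest in status.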
Selecting a data analysis strategy
Factors influencing choice of statistical technique
Types of Data
• Classification of data involves nominal, ordinal, interval and ratio scales of measurement
• Nominal scaling is restricted in that the mode is the only meaningful measure of central tendency
• Both median and mode can be used for ordinal scales
• Non-parametric tests are appropriate for nominal and ordinal data
• Mean, median and mode can all be used to measure central tendency for interval and ratio scaled data
Recap: Types of data
Categorical data can be divided into sets or categories that have no numerical value:
• Descriptive or nominal data
– E.g., types of car – hatchback, coupe, sedan
• Dichotomous data
– E.g., gender (male/female)
• Ranked or ordinal data
– E.g., ranked order of preferred restaurants
Numerical data
• Interval data
– E.g., attitude, agreement, frequency
• Ratio data
– E.g., firm size, firm age, length of service, no. of customers served
Saunders et al. (2016)
Factors influencing choice of statistical technique
Research Design
• Depends on:
– Whether dependent or independent samples are used (e.g., independent samples vs. paired samples t-tests)
– Number of groups being analysed (e.g., t-tests vs. ANOVAs)
– Number of variables (e.g., univariate vs. multivariate)
– Ultimately, your hypotheses!
Overview of statistical techniques
Univariate Techniques
• Appropriate when there is a single measurement of each of the ‘n’ sample objects, or there are several measurements of each of the ‘n’ observations but each variable is analysed in isolation
• Non-metric data – measured on nominal or ordinal scale
• Metric data – measured on interval or ratio scale
• Determine whether single or multiple samples are involved
• For multiple samples, choice of statistical test depends on whether the samples are independent or dependent
Classification of univariate statistical techniques
[Figure: classification of univariate techniques by data type – non-numerical data (e.g., categories) vs. numerical data (e.g., 7-point Likert scale)]
Overview of statistical techniques (cont.)
Multivariate Techniques
• A collection of procedures for analysing association between two or more sets of measurements that have been made on each object in one or more samples of objects (i.e., variables are analysed simultaneously)
Uses:
• To group variables or people or objects
• To improve the ability to predict variables (such as usage)
• To understand relationships between variables (such as advertising and sales)
Classification of multivariate statistical techniques (cont.)
SPSS Exercise – Setting up an SPSS file
Today’s activities:
• Data entry
• Variable recoding
• Variable re-specification
Questionnaire
A researcher developed this survey and used it to gather data from a pre-test sample of 20 respondents on preferences for department stores.
Data
ID Preference Quality Variety Value Service Income
1 2 2 3 1 3 6
2 6 5 6 5 7 2
3 4 4 3 4 5 3
4 1 2 1 1 2 5
5 7 6 6 5 4 1
6 5 4 4 5 4 3
7 2 2 3 2 3 5
8 3 3 4 2 3 4
9 7 6 7 6 5 2
10 2 3 2 2 2 5
11 2 3 2 1 3 6
12 6 6 6 6 7 2
13 4 4 3 3 4 3
14 1 1 3 1 2 4
15 7 7 5 5 4 2
16 5 5 4 5 5 3
17 2 3 1 2 3 4
18 4 4 3 3 3 3
19 7 5 5 7 5 5
20 3 2 2 3 3 3
These were the data gathered from the pre-test survey.
Your task
• Set up an SPSS file for the survey and enter the data from the pre-test.
• You will need to name and label the variables in addition to coding and labelling the values of any nominal variables.
• Name and save your data file as “Shopping Preference Survey”.
• Follow the “Data Entry” guide on Moodle.
Your next task
• Recode the variable “Income”.
• Income category 1 occurs only once and income category 6 occurs only twice.
• Therefore, we want to combine income categories 1 and 2, and categories 5 and 6, and create a new income variable “rincome” labelled “Recoded Income”.
• Note that following the recoding process, “rincome” has only four categories, which are coded as 1 to 4.
• Follow the “Variable Recoding” guide on Moodle.
Your final task
• Create a new variable.
• Create and compute a new variable called “overall evaluation of the department store” (Overall) that is the sum of the ratings on quality, variety, value and service.
• Thus, Overall = Quality + Variety + Value + Service.
• Follow the “Variable Re-specification” guide on Moodle.
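For anyone who wants to check their SPSS results by hand, the recoding and re-specification tasks can be sketched in Python using the first five rows of the pre-test data. The mapping of the middle income categories (3 to 2, 4 to 3) is an assumption consistent with "rincome" being coded 1 to 4; the SPSS steps themselves are in the Moodle guides.

```python
# First five pre-test rows:
# (ID, Preference, Quality, Variety, Value, Service, Income)
rows = [
    (1, 2, 2, 3, 1, 3, 6),
    (2, 6, 5, 6, 5, 7, 2),
    (3, 4, 4, 3, 4, 5, 3),
    (4, 1, 2, 1, 1, 2, 5),
    (5, 7, 6, 6, 5, 4, 1),
]

# Old income category -> new category (1 and 2 merged, 5 and 6 merged).
# The codes for the middle categories are assumed.
RINCOME = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 4}

derived = []
for rid, pref, quality, variety, value, service, income in rows:
    derived.append({
        "ID": rid,
        "rincome": RINCOME[income],                       # recoding
        "Overall": quality + variety + value + service,   # re-specification
    })

for d in derived:
    print(d)
```

Respondent 1, for example, should end up with rincome = 4 and Overall = 9.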
Practice
• Open the “Shopping Preference Survey Data (Practice)” dataset.
• Recode the following variables:
– Age
   • 21 and under = Gen Z
   • 22 to 40 = Gen Y
   • 41 to 52 = Gen X
   • 53 and above = Baby Boomers
– Household size
   • 1 to 2 = Small
   • 3 to 5 = Medium
   • 6 and above = Large
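The practice recodes are simple range-to-category rules. A minimal Python sketch, with hypothetical function names and assuming age and household size are stored as whole numbers, looks like this:

```python
def recode_age(age):
    """Map age in years to the generation labels used in the exercise."""
    if age <= 21:
        return "Gen Z"
    if age <= 40:
        return "Gen Y"
    if age <= 52:
        return "Gen X"
    return "Baby Boomers"

def recode_household(size):
    """Map household size to Small / Medium / Large."""
    if size <= 2:
        return "Small"
    if size <= 5:
        return "Medium"
    return "Large"

# Hypothetical respondents: (age, household size)
for age, size in [(19, 4), (35, 2), (47, 6), (60, 1)]:
    print(age, recode_age(age), size, recode_household(size))
```

Because each rule checks the upper bound of its range in order, every value falls into exactly one category.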
Data analysis: Basic concepts
• Frequency distribution (or simple tabulation)
• Hypothesis testing
• Cross-tabulation (via chi-square test)
Simple tabulation
• Consists of counting the number of cases that fall into various categories
• Commonly used for determining frequency distribution
Uses:
• Determine empirical distribution (frequency distribution) of the variable in question
• Calculate summary statistics, particularly the mean or percentages
• Aid in “data cleaning” aspects
Simple tabulation
Aid in “data cleaning” aspects
• Identify illegitimate responses
– E.g., any cases with values outside the range of the scale (e.g., 8 on a 7-point scale)
• Identify missing values
– Denoted by the “Missing” row in your output file
• Identify outliers, cases with extreme values
– E.g., a household size of 9 or more when the sample average is 5
Frequency distribution
• In a frequency distribution, one variable is considered at a time
• Produces a table of frequency counts, percentages and cumulative percentages for all the values associated with that variable, for example:
– Gender – how many respondents in the male and female categories?
– Age – how many respondents in the Under 18 and 18–25 categories?
• Can be illustrated simply as a number or as a percentage or histogram
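To see what a frequency table contains, the sketch below builds one in Python for the Income column of the pre-test data (20 respondents). The function name `frequency_table` is hypothetical; SPSS produces the same columns through its Frequencies procedure.

```python
from collections import Counter

# Income codes of the 20 pre-test respondents, in ID order.
income = [6, 2, 3, 5, 1, 3, 5, 4, 2, 5, 6, 2, 3, 4, 2, 3, 4, 3, 5, 3]

def frequency_table(values):
    """Return (value, count, percent, cumulative percent) for each value."""
    counts = Counter(values)
    n = len(values)
    table, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / n
        cumulative += pct
        table.append((value, counts[value], pct, cumulative))
    return table

print("Value Count Percent Cum.Pct")
for value, count, pct, cum in frequency_table(income):
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum:>7.1f}")
```

The counts show why the recoding task merges the end categories: category 1 appears once and category 6 twice.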
Example: Usage and attitude towards Tommy Hilfiger
Example: Frequency distribution of attitude towards TH
Example: Frequency histogram
Descriptive statistics
• Statistics normally associated with a frequency distribution to help summarize information in the frequency table
• Includes:
– Measures of central tendency (mean, median and mode)
– Measures of dispersion (range, standard deviation, and coefficient of variation)
Measures of central tendency
• The mean, or average, is the most commonly used measure of central tendency (centre of the distribution).
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
where
X_i = observed values of the variable X
n = number of observations (sample size)
Measures of central tendency (cont.)
• The mode is the value that occurs most frequently. It represents the highest peak of the distribution.
• The median of a sample is the middle value when the data are arranged in an ascending or descending order.
Measures of dispersion
• The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample.
• The variance is the mean squared deviation from the mean.
– When data points are clustered around the mean, variance is small
– When data points are scattered, variance is large
• The standard deviation is the square root of the variance
– Serves the same purpose as variance in helping us understand how clustered or spread the distribution is around the mean value
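The measures of central tendency and dispersion can be computed with Python's built-in `statistics` module, shown here on the Preference column of the pre-test data. One assumption to note: `statistics.variance` and `statistics.stdev` divide by n − 1 (the sample formulas), which is a slightly different divisor from the "mean squared deviation" wording above.

```python
import statistics

# Preference ratings of the 20 pre-test respondents, in ID order.
preference = [2, 6, 4, 1, 7, 5, 2, 3, 7, 2, 2, 6, 4, 1, 7, 5, 2, 4, 7, 3]

summary = {
    "mean": statistics.mean(preference),
    "median": statistics.median(preference),
    "mode": statistics.mode(preference),
    "range": max(preference) - min(preference),
    "variance": statistics.variance(preference),  # sample variance (n - 1)
    "std_dev": statistics.stdev(preference),      # square root of the above
}
# Coefficient of variation: standard deviation relative to the mean.
summary["coef_variation"] = summary["std_dev"] / summary["mean"]

for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

Here the mean and median coincide at 4 while the mode is 2, a reminder that the three measures need not agree.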
But what is wrong with relying solely on descriptive statistics?
• Say, you want to test the theory that female users are more favourable towards TH.
• 24 female respondents were selected (from the total sample of 45) and their average response was 3.43 (vs. sample mean = 4.11).
• So, what does this mean? What if the average response among female users was 1.32? Or 6.78?
• How convincing is this evidence?
Hypothesis testing: Basic concepts
What is a hypothesis?
• An assumption or unproven statement that needs to be tested
Purpose of Hypothesis Testing
• To make a judgment about the difference between two sample statistics, or between a sample statistic and a hypothesized population parameter
• A decision about the hypothesis has to be taken based NOT on intuition, but on some objective measure
• Evidence has to be evaluated statistically before arriving at a conclusion regarding the hypothesis
• This is the logic behind hypothesis testing
Hypothesis testing process
Step 1: Problem definition
• Stating the problem confronting the manager and the marketing research problem that the researcher will address
• Management decision problem
– Action-oriented, focuses on symptoms
• Marketing research problem
– Information-oriented, focuses on underlying causes
• Research focus and purpose
– Research questions
– Research objectives
Step 2: Formulate the hypothesis
• A null hypothesis is a statement of the status quo, one of no difference or effect. If the null hypothesis is not rejected, no changes will be made.
• An alternative hypothesis is one in which some difference or effect is expected. Accepting the alternative hypothesis will lead to changes in opinions or actions.
• In marketing research, the null hypothesis is formulated in such a way that its rejection leads to the acceptance of the desired conclusion. The alternative hypothesis represents the conclusion for which evidence is sought
– E.g., drug A reduces/increases symptom X (H1) vs. no effect (H0)
Step 2: Formulate the hypothesis (cont.)
• The test of the null hypothesis is a one-tailed test when the alternative hypothesis is expressed directionally (i.e., there is some preferred direction, whether larger or smaller than some predefined value)
H0: µ ≤ 0.40 (null)
H1: µ > 0.40 (alternative, one-tailed)
• If that is not the case, then a two-tailed test would be required, and the hypotheses would be expressed as:
H0: µ = 0.40 (status-quo, no difference)
H1: µ ≠ 0.40 (alternative, two-tailed)
Step 2: Formulate the hypothesis (cont.)
Type: One-tailed tests
• Test variables: One variable
• Research objectives: To determine the direction of one variable (e.g., positive, negative, favourable, unfavourable – one-sample t-test)
• Hypotheses: H0: µ ≤ 4 (negative attitude); H1: µ > 4 (positive attitude)
Type: Two-tailed tests
• Test variables: Difference between two different variables
• Research objectives: To determine the difference between two variables (e.g., difference between Ad1 and Ad2, difference in attitude between male and female – independent-samples t-test, paired-samples t-test)
• Hypotheses: H0: µ = 0 (no difference); H1: µ ≠ 0 (difference exists)
Step 3: Select an appropriate test
Parametric statistical techniques
• Using parametric data (e.g., interval, ratio)
• E.g., t-tests, correlation, regression analysis
Non-parametric statistical techniques
• Using non-parametric data (e.g., nominal, ordinal)
• E.g., chi-square test
Types of data and statistical techniques
• Nominal and ordinal scales: non-metric data (categorical), analysed with non-parametric tests – frequencies, cross-tabulations, chi-square (hypothesis testing)
• Interval and ratio scales: metric data (numerical), analysed with parametric tests – t-tests, ANOVA, regression
Step 4: Choose a level of significance
• Whenever an inference is made about a population, there is a risk that an incorrect conclusion will be reached. Two types of error can occur:
• Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact true
– The probability of Type I error (α) is also called the level of significance
– Type I error is controlled by establishing the tolerable level of risk (i.e., threshold) of rejecting a true null hypothesis (typically 5%)
– Decision rule: reject H0 if p < .05, so that the risk of a Type I error is held at 5%
• Failing to reject a null hypothesis when it is false is called a Type II error and its probability is denoted β
Step 4: Choose a level of significance (α)
• The most commonly chosen levels in academic research are the 1-percent, 5-percent and 10-percent levels.
• Although it is possible to test a hypothesis at any level of significance, bear in mind that the significance level selected is also the risk assumed of rejecting a null hypothesis when it is true.
• The higher the significance level used for testing a hypothesis (e.g., 10% instead of 5%), the greater is the probability of rejecting a null hypothesis when it is true.
Step 5: Compare test statistic and critical value
• A low p-value (below α) means the observed result would be unlikely if H0 were true, so H0 is rejected and the implications are worth considering.
• A high p-value (above α) means the data do not provide enough evidence to reject H0, so the observed difference should be discounted.
Step 6: Draw marketing implications
• The conclusion reached by hypothesis testing must be expressed in terms of the marketing research problem and the managerial action that should be taken.
• For example, we conclude that there is evidence showing female respondents are significantly more favourable towards TH than male respondents. Hence, the recommendation would be to target female users.
Hypothesis testing example
RQ: Do people like Ad1? (one-tailed test)
Data were collected from 30 respondents using the statement “I have a positive attitude towards Ad1” on a 7-point scale (1=strongly disagree and 7=strongly agree)
1. Hypothesis formulation (one-tailed)
H0: µ ≤ 4 (neutral or negative)
H1: µ > 4 (positive)
α = 0.05
2. Select an appropriate test
Parametric (e.g., t-test) vs. non-parametric (e.g., chi-square)
3. Calculate results using SPSS (p-value, t-value)
If p < .05 then reject H0
4. Recommend marketing actions
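The one-tailed Ad1 example can be traced by hand with a one-sample t statistic, t = (sample mean − 4) / (s / √n). The sketch below is only an illustration: the 30 ratings are invented, and instead of a p-value (which SPSS reports) it compares t with 1.699, the one-tailed critical value for α = .05 and 29 degrees of freedom from a standard t table.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: mean <= mu0 against H1: mean > mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error

# Hypothetical 7-point ratings from 30 respondents (not the lecture data).
ratings = [5] * 10 + [6] * 8 + [4] * 5 + [7] * 4 + [3] * 3

T_CRIT = 1.699  # one-tailed critical value, alpha = .05, df = 29 (t table)

t = one_sample_t(ratings, mu0=4)
reject_h0 = t > T_CRIT
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Rejecting H0 when t exceeds the critical value is equivalent to the "p < .05" rule for this one-tailed test.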
Hypothesis testing example
RQ: Is there a difference between Ad1 and Ad2? (two-tailed test)
Data were collected from 30 respondents using the statement “Ad1 is different from Ad2” on a 7-point scale (1=strongly disagree and 7=strongly agree)
1. Hypothesis formulation (two-tailed)
H0: µ = 0 (no difference)
H1: µ ≠ 0 (difference exists)
α = 0.05
2. Select an appropriate test
Parametric (e.g., t-test) vs. non-parametric (e.g., chi-square)
3. Calculate results using SPSS (p-value, t-value)
If p < .05 then reject H0
4. Recommend marketing actions
Cross-tabulation
• While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously
• Cross-tabulations are used for studying associations among and between (categorical) variables
• That is, the categories of one variable (e.g., gender – male and female) are cross-classified with the categories of one or more other variables (e.g., usage – heavy, medium and light)
Cross-tabulation of gender and usage of Tommy Hilfiger Clothing
USAGE          Female   Male   Row Total
Light Users    14       5      19
Medium Users   5        5      10
Heavy Users    5        11     16
Column Total   24       21     45
Two variables cross-tabulation
• Because two (categorical) variables are being cross-classified, percentages could be computed either column-wise (based on column totals) or row-wise (based on row totals)
– Both totals are reflective of your single tabulations (i.e., frequency distribution)
• So, which one is more useful? The answer depends on which variable is considered as the IV and which as the DV
• The general rule is to compute the percentages in the direction of the IV, across the DV
Usage of Tommy Hilfiger Clothing (DV) by Gender (IV)
USAGE          Female   Male
Light Users    58.3%    23.8%
Medium Users   20.8%    23.8%
Heavy Users    20.8%    52.4%
Column Total   100.0%   100.0%
52.4 percent of the males are heavy users, whereas only 20.8 percent of females are heavy users.
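Computing percentages "in the direction of the IV" here means dividing each count by its gender (column) total. A small Python sketch, using the counts from the gender-by-usage cross-tabulation and a hypothetical helper named `column_percentages`:

```python
# Counts from the cross-tabulation of usage (rows) by gender (columns).
counts = {
    "Light Users":  {"Female": 14, "Male": 5},
    "Medium Users": {"Female": 5,  "Male": 5},
    "Heavy Users":  {"Female": 5,  "Male": 11},
}

def column_percentages(counts):
    """Express each cell as a percentage of its column (gender) total."""
    columns = list(next(iter(counts.values())))
    totals = {c: sum(row[c] for row in counts.values()) for c in columns}
    return {label: {c: round(100 * row[c] / totals[c], 1) for c in columns}
            for label, row in counts.items()}

for label, row in column_percentages(counts).items():
    print(f"{label:<13} {row['Female']:>5}% {row['Male']:>5}%")
```

Each gender column sums to 100%, so the female and male usage profiles can be compared directly despite the different group sizes (24 vs. 21).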
Statistics associated with cross-tabulation: Chi-Square
• Chi-square test assesses the statistical significance of the association between cross-tabulated variables
• To determine whether a systematic association exists, the probability of obtaining a value of chi-square as large as or larger than the one calculated from the cross-tabulation is estimated.
• The null hypothesis (H0) of no association between the two variables will be rejected only when the probability associated with the value of the test statistic is less than the level of significance (α) (typically p < .05)
Chi-square test of association
Chi-Square Tests
                               Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square             6.341 a   2    .042
Likelihood Ratio               6.545     2    .038
Linear-by-Linear Association   6.182     1    .013
N of Valid Cases               45
a. 1 cells (16.7%) have expected count less than 5. The minimum expected count is 4.67.
• For the cross-tabulation table given above, the associated probability is 0.042 as determined by SPSS
• Given p < .05, the chi-square test suggests that H0 should be rejected
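The Pearson chi-square value can be reproduced by hand: for each cell, the expected count is (row total × column total) / n, and the statistic sums (observed − expected)² / expected over all cells. The Python sketch below uses the gender-by-usage counts; the shortcut p = exp(−χ²/2) is valid only for 2 degrees of freedom, which is the case for this 3 x 2 table, so treat it as a check on the SPSS output rather than a general method.

```python
import math

# Observed counts: rows = light, medium, heavy users; columns = female, male.
observed = [[14, 5], [5, 5], [5, 11]]

def chi_square(observed):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square(observed)
# Closed-form upper-tail probability, valid for df = 2 only.
p_value = math.exp(-stat / 2) if df == 2 else None
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

The result should agree with the Pearson Chi-Square row of the SPSS table (6.341, df = 2, p = .042).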
Interpreting the findings
• Given p < .05, the chi-square test suggests that H0 should be rejected
• That is, there is a relationship between gender and usage
• Compared to females, males are more likely to be heavy users of Tommy Hilfiger clothing
• Recommendation to management – promote more heavily to women to increase their usage rate or promote more heavily to men to prevent brand loyalty erosion
User Group * Gender Crosstabulation
User Group                       Female   Male     Total
Light Users    Count             14       5        19
               % within Gender   58.3%    23.8%    42.2%
Medium Users   Count             5        5        10
               % within Gender   20.8%    23.8%    22.2%
Heavy Users    Count             5        11       16
               % within Gender   20.8%    52.4%    35.6%
Total          Count             24       21       45
               % within Gender   100.0%   100.0%   100.0%
Statistics associated with cross-tabulation: Phi coefficient
• The phi coefficient (φ) is used as a measure of the strength of association in the special case of a table with two rows and two columns (a 2 x 2 table).
• It takes the value of 0 when there is no association, which would be indicated by a chi-square value of 0 as well. When the variables are perfectly associated, phi assumes the value of 1 (or -1 when there is perfect negative association).
Statistics associated with cross-tabulation: Cramer’s V
• Cramer’s V is a modified version of the phi correlation coefficient, φ, and is used in tables larger than 2 x 2.
• V will range from 0 to 1.
• Values of V below 0.3 indicate low association, values between 0.3 and 0.6 indicate low-to-moderate association, and values above 0.6 indicate strong association.
Symmetric Measures
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .375    .042
                     Cramer’s V   .375    .042
N of Valid Cases                  45
It is meaningful to assess strength of association only if the null hypothesis of no association is rejected.
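Cramer's V can be recovered from the chi-square value using the standard formula V = √(χ² / (n × min(r − 1, c − 1))), where r and c are the numbers of rows and columns. The sketch below plugs in the SPSS figures for the 3 x 2 gender-by-usage table; the function names and the wording of the strength labels are illustrative, with cut-offs taken from the guideline above.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V from a chi-square statistic and the table dimensions."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

def strength(v):
    """Label V using the cut-offs quoted in these slides (0.3 and 0.6)."""
    if v < 0.3:
        return "low"
    if v <= 0.6:
        return "low-to-moderate"
    return "strong"

# Pearson chi-square = 6.341 with n = 45 in a 3 x 2 table (SPSS output).
v = cramers_v(6.341, n=45, n_rows=3, n_cols=2)
print(f"V = {v:.3f} ({strength(v)} association)")
```

Because the smaller dimension of this table is 2, the divisor reduces to n, which is why SPSS reports the same value (.375) for phi and Cramer's V.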
Cross-tabulation in practice
Conduct cross-tabulation analysis in practice as follows:
• Test the null hypothesis that there is no association between the variables using the chi-square statistic. If you fail to reject the null hypothesis, there is no evidence of a relationship.
• If H0 is rejected, then determine the strength of the association using an appropriate statistic (phi-coefficient or Cramer’s V).
• If H0 is rejected, interpret the pattern of the relationship by computing the percentages in the direction of the independent variable, across the dependent variable. Draw marketing conclusions.
Hypothesis testing in cross-tabulation
SPSS Exercise – Descriptive Analyses
Today’s activities:
• Frequencies
• Cross-tabulations
• Chi-square tests
Frequencies Demo
• Open the “Tommy Hilfiger” dataset
• Perform the following tests:
– Examine the frequency distribution of “Attitude Towards Tommy Hilfiger”
– Plot the associated histogram
– Calculate the descriptive statistics for central tendency and dispersion
• Follow your tutor’s instructions
• Refer to the “Frequencies” guide on Moodle
Your turn
• Open the “Tommy Hilfiger” dataset
• Examine the frequency distributions of income, gender, user group and type of car owned
• Answer the following questions:
– Which income group is the most highly represented?
– Which type of car owned category represents the highest percentage?
– Which gender group represents the highest percentage?
– Which user group is the most highly represented?
Cross-tab Demo
• Suppose Tommy Hilfiger was interested in determining whether gender was associated with the degree of usage of Tommy Hilfiger clothing
– That is, is usage of Tommy Hilfiger clothing related to gender?
– H0: There is no relationship between gender and usage
– H1: There is a relationship between gender and usage
• Open the “Tommy Hilfiger” dataset
• Perform a cross-tabulation on Gender (IV) and Usage (DV)
• Follow your tutor’s instructions
• Refer to the “Cross-tabulations” guide on Moodle
Your turn
• Open the “Tommy Hilfiger” dataset
• Examine whether gender is associated with type of car owned and answer the following questions:
– What is the null hypothesis?
– What is the alternative hypothesis?
– Is there any relationship between gender and type of car owned?
– What are the marketing implications?