PART 1: HISTOGRAMS AND DESCRIPTIVE STATISTICS
Your first IBM SPSS assessment includes two sections:
• Create two histograms and provide interpretations.
• Calculate measures of central tendency and dispersion and provide interpretations.

Key Details and Instructions
• Submit your assessment as an attached Word document.
• Begin your assessment by creating a properly formatted APA title page. Include a reference list at the end of the document if necessary. On page 2, begin Section 1.
• Organize the narrative report with your SPSS output charts and tables integrated along with your responses to the specific requirements listed for that assessment. (See the Copy/Export Output Instructions in the Resources for instructions on how to do this.)
• Label all tables and graphs in a manner consistent with APA style and formatting guidelines. Citations, if needed, should be included in the text as well as in a reference section at the end of the report.
• For additional help in completing this assessment, refer to the IBM SPSS Step-By-Step Instructions: Histograms and Descriptive Statistics document, linked in the Required Resources.

Section 1: Histograms and Visual Interpretation
Section 1 will include one histogram of “total” scores for all the males in the data set, and one histogram of “total” scores for all the females in the data set.
Create two histograms using the total and gender variables in your grades.sav data set:
• A histogram for male students.
• A histogram for female students.
Below the histograms, provide an interpretation based on your visual inspection. Correctly use all of the following terms in your discussion:
• Skew.
• Kurtosis.
• Outlier.
• Symmetry.
• Modality.
Comment on any differences between males and females regarding their total scores. Analyze the strengths and limitations of visually interpreting histograms.

Section 2: Calculate and Interpret Measures of Central Tendency and Dispersion
Using the grades.sav file, compute descriptive statistics, including mean, standard deviation, skewness, and kurtosis, for the following variables:
• id.
• gender.
• ethnicity.
• gpa.
• quiz3.
• total.
Below the Descriptives table, complete the following:
• Indicate which variable(s) are meaningless to interpret in terms of mean, standard deviation, skewness, and kurtosis. Justify your decision.
• Next, indicate which variable(s) are meaningful to interpret. Justify your decision.
• For meaningful variables, specify any variables that are in the ideal range for both skewness and kurtosis.
• Specify any variables that are acceptable but not excellent.
• Specify any variables that are unacceptable. Explain your decisions.
• For all meaningful variables, report and interpret the descriptive statistics (mean, standard deviation, skewness, and kurtosis).

PART 2: DATA SCREENING
For this part of the assessment, respond to the following questions:
• What are the goals of data screening?
• How can you identify and remedy the following?
o Errors in data entry.
o Outliers.
o Missing data.

PART 3: Z-SCORES, TYPE I AND II ERROR, NULL HYPOTHESIS TESTING
This IBM SPSS assessment includes three sections:
• Generate z-scores for a variable in grades.sav and report/interpret them.
• Analyze cases of Type I and Type II error.
• Analyze cases to either reject or not reject a null hypothesis.
The format of this assessment should be narrative with supporting statistical output (tables and graphs) integrated into the narrative in the appropriate place (not all at the end of the document). See the Copy/Export Output Instructions for instructions on how to do this.
Download the z-Scores, Type I and Type II Error, Null Hypothesis Testing Answer Template from the Required Resources, and use the template to complete the following sections:
• Section 1: z-Scores in SPSS.
• Section 2: Case Studies of Type I and Type II Error.
• Section 3: Case Studies of Null Hypothesis Testing.

Reference
George, D., & Mallery, P. (2016). IBM SPSS statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.

For this three-part assessment, you will create and interpret histograms and compute descriptive statistics for given variables; analyze the goals of data screening; and generate z-scores for variables, analyze types of error, and analyze cases to either reject or not reject a null hypothesis. You will use SPSS software and several course files to complete this assessment.
By successfully completing this assignment, you will demonstrate your proficiency in the following areas:
• Analyze the computation, application, strengths and limitations of various statistical tests.
• Analyze the strengths and limitations of examining a distribution of scores with a histogram.
• Analyze the relevant data from the computation, interpretation, and application of z-scores.
• Analyze real-world application of Type I and Type II errors, and the research decisions that influence the relative risk of each.
• Analyze the decision-making process of data analysis.
• Analyze meaningful versus meaningless variables reported in descriptive statistics.
• Apply the logic of null hypothesis testing to cases.
• Interpret the results of statistical analyses.
• Interpret histogram results, including concepts of skew, kurtosis, outliers, symmetry, and modality.
• Interpret descriptive statistics for meaningful variables.
• Apply a statistical program’s procedure to data.
• Apply the appropriate SPSS procedures for creating histograms to generate relevant output.
• Apply the appropriate SPSS procedure for generating descriptive statistics to generate relevant output.
• Apply the appropriate SPSS procedures for creating z-scores and descriptive statistics to generate relevant output.
• Communicate in a manner that is scholarly, professional, and consistent with expectations for members of the identified field of study.

Required software: IBM SPSS Statistics Standard GradPack, version 22 or higher. (The Base GradPack is not acceptable for use in this course.) Be sure to use the version that is compatible with your operating system (PC or Mac).

SUGGESTED RESOURCES
The resources provided here are optional. You may use other resources of your choice to prepare for this assessment; however, you will need to ensure that they are appropriate, credible, and valid.

Resources
George, D., & Mallery, P. (2016). IBM SPSS statistics 23 step by step: A simple guide and reference (14th ed.). New York, NY: Routledge.

Internet Resources
Lane, D. M. (2013). HyperStat online: An introductory statistics textbook and online tutorial for help in statistics courses. Retrieved from http://davidmlane.com/hyperstat
StatSoft, Inc. (2013). Electronic statistics textbook. Tulsa, OK: StatSoft. Retrieved from http://www.statsoft.com/textbook
Sophia. (2014). Retrieved from http://www.sophia.org/
Browse Sophia for tutorials that explore statistical topics.
StatisticsLectures.com. (2012). Free statistics lectures. Retrieved from http://statisticslectures.com/
Khan Academy. (2013). Retrieved from https://www.khanacademy.org
This Web site offers resources covering a range of subjects, including statistics.
Hall, R. (1998). Between subjects one-way ANOVA example. Psychology World. Retrieved from http://web.mst.edu/~psyworld/anovaexample.htm
Elliot, A. C. (2012). ANOVA using Microsoft Excel: One-way analysis of variance. Excel Tutorials for Statistical Data Analysis. Retrieved from http://www.stattutorials.com/EXCEL/EXCEL_ANOVA.htm…
Onwuegbuzie, T. (Producer). (2009). Mixed methods research [Video] | Transcript. Available from http://videolectures.net/ssmt09_onwuegbuzie_mmr

PREPARATION
This assessment has three parts, each of which is described below. Submit all three parts as Word documents.
Note: All the course documents you will need for the assignment are attached.
This assessment uses the grades.sav file. The grades.sav file is a sample SPSS data set that is converted from the grades2.dat file. (Use the Data Set Instructions to convert the grades2.dat file to the grades.sav file that you will use throughout this course.)
The fictional data in the grades.sav file represent a teacher’s recording of student demographics and performance on quizzes and a final exam across three sections of the course. Each section consists of about 35 students (N = 105).
There are 21 variables in grades.sav. To prepare for this assessment, complete the following:
• Open your grades.sav file and navigate to the “Variable View” tab.
• Read the Data Set Instructions, and make sure you have the correct Values and Scales of Measurement assigned.
Assessment 1 Context
Transitioning from Descriptive Statistics to Inferential Statistics
In this assessment, we begin the transition from descriptive statistics to inferential statistics,
which include correlation, t-tests, and analysis of variance (ANOVA). This context document
includes information on key concepts related to descriptive statistics, as well as concepts
related to probability and the logic of null hypothesis testing (NHT).
Scales of Measurement
In our initial quest to develop comprehension of statistical analysis, it is first important to be sure
the raw materials used in the activity are understood. Statistical methods are methods of
analyzing data. What does that mean? For the most part, it means that statistics provide various
ways of answering specific questions about data. To understand statistics, we must first develop a
basic vocabulary for describing data and recognize a system of names for different categories
and kinds of data. It may serve us well to first back up and make sure the
fundamental units of statistical data are understood.
In statistical analysis we make use of a concept called variables. A variable is an abstract
concept of a placeholder or a reserved space. For instance, we may have a variable named
GENDER. Gender can have two possible values, male or female. The concept of a variable is
often easiest to understand in a concrete sense by likening data to a typical table like those
seen in textbooks, or like a spreadsheet table found on computers. It is a series of rows and
columns which cross over each other. A column in a table may be arranged such that the
gender identities of a group of people are recorded. We may have a list of names in each of the
rows of the table in the first column, then a second column in the table which has the letters M
or F for each name. The title, or heading, of this column might read “gender.” The concept of a
variable is like the column heading where the gender is recorded. The column is the reserved
space for gender data, called a variable, and the column heading is the name of the variable.
Notice that the values of gender—male and female—vary among different people in the rows.
This is why the entire column is called a variable. The values of the variable vary among the
rows.
Given this concept, be sure you understand that male and female are NOT variables. The
variable corresponds to the name of the column where the information is recorded in the table.
In the case of this example, the name of the variable would probably be GENDER—not male or
female. Make sure you understand this distinction so you do not use the term variable
incorrectly, as it will cause a great deal of confusion when trying to communicate with others
about statistics. For the data discussed here, it would make no sense to say something like
“the male variable,” or to claim that you are working with two variables, male and female.
Neither statement is true, and either would confuse your audience.
Consider that in the table where our list of people’s names was recorded along with their
gender values, we now have two columns. The first column contains names, the second column
contains the gender value for each person. Suppose there was a third column in the table. In
that column, we may decide to write down each person’s height in inches. At the top of the
column, we would place the title HEIGHT, and each row in the table would have a number
which was the height of the person designated in the first column of each row. Notice that
HEIGHT is another variable. Notice it is a fundamentally different kind of variable from
GENDER, because it is a number which corresponds to how tall the person is, while GENDER
only has two qualitative values—male and female. This points to another important idea about
data. There are several different types of data, and you must be able to use a system of names
which have been developed to distinguish between different types of variables. Understanding
these categories of data is critical to launching your effort to understand statistics, because if
you do not learn this system before beginning your exposure to statistical analysis, you will be
completely lost more times than you are successful. It is worth spending the time and effort
required to understand the concepts we are about to discuss before
you speed ahead to your introduction to statistics in order to be finished. Of course, that is only
true if you want to understand the statistics. If you would rather be completely lost, and have to
call your instructor later in the course and ask for extensions on your assignments while you
seek help, or if you look forward to asking for an incomplete grade because you cannot
comprehend anything you are studying and you cannot do the assignments, then simply browse
through this concept of levels of measurement without understanding it. You are guaranteed to
suffer that fate.
There are two ways to understand what is important about the different types of variables. The
first is the most extensive and formal, and the second is a rough approximation that will usually
serve your purposes.
First, an important concept in understanding descriptive statistics is the four levels of
measurement—nominal, ordinal, interval, and ratio (Warner, 2013). These are sometimes called
“scales of measurement,” “levels of data,” or simply “data levels.” These scales of
measurement, or kinds of variables, are concepts which are needed to begin your study,
because each technique of analysis to which you are introduced is designed for only specified
kinds of variables, or levels of measurement. In order to understand what kind of variables to
use for the analysis, and which kinds of variables are involved in different parts of the analysis,
you must be able to recognize the level of measurement of a variable when you see its
definition. There are two important concepts involved in determining the level of measurement
for a variable. It is critical to know both the characteristics of the variable’s values (such as male
or female, or height in inches) and what the values are being used to measure or
designate in the research project where the variable is being used.
Consider that last statement again. We must know (1) the kinds of values that are written in the
column where the variable is recorded, and (2) what those values correspond to in the people
they are being used to describe. Failure to consider both ideas will result in misunderstanding
levels of measurement. It is instructive to begin by simply describing the four levels, and then
move on to some examples and explanations. First, please remember that the four levels are a
hierarchy. That is, they are in order from lowest to highest, and each successive level has all the
qualities of the levels before it, plus some new added category of information.
• Nominal data refers to numbers (or letters) arbitrarily assigned to represent group
membership, such as gender (male = 1; female = 2). Nominal data is useful in
comparing groups, but it is meaningless in terms of most measures of central tendency
and dispersion reviewed below (the mode is the exception). It does not matter whether the
values of the nominal variable are numbers or letters. If they are numbers, they are
mathematically meaningless. That is, 2 is not “more” than 1.
• Ordinal data represents ranked data, such as coming in first, second, or third in a
marathon. However, ordinal data does not tell us how much of a difference there is
between measurements. The first-place and second-place finishers could finish 1
second apart, whereas the third-place finisher arrives 2 minutes later. Ordinal data lacks
equal intervals, which again prevents most mathematical interpretations, such as adding
or averaging the values.
• Interval data refers to data with equal intervals between data points. This is the first
level of measurement where the values can have general mathematical meaning. An
example is temperature measured in degrees Fahrenheit. The drawback of interval data is the
lack of a “true zero” value (water freezes at 32 degrees Fahrenheit, and 0 degrees does not
mean “no heat”). The most serious consequence of lacking a true zero is that one cannot reason
using ratios: 4 is not necessarily twice as large as 2, and 5 is not necessarily half as
much as 10.
• Ratio data do have a true zero, such as heart rate, where “0” represents a heart that is
not beating. This level allows full mathematical interpretation of the values, including
ratios.
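The interval-versus-ratio distinction can be illustrated with a short calculation. The sketch below is only an illustration: the two temperatures (80 and 40 degrees Fahrenheit) and the helper name fahrenheit_to_kelvin are assumptions chosen for the example, not values from the course materials. It shows that a ratio computed on the Fahrenheit scale is not preserved once the same temperatures are expressed on the Kelvin scale, which has a true zero, while differences remain meaningful on the interval scale.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale with a true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Hypothetical example temperatures, chosen only for illustration.
warm_f, cool_f = 80.0, 40.0

# Dividing the raw Fahrenheit values suggests "twice as hot" ...
naive_ratio = warm_f / cool_f

# ... but on a scale with a true zero the ratio is far smaller.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

# Differences, by contrast, are meaningful on an interval scale.
difference_f = warm_f - cool_f

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.2f}")
print(f"Difference:       {difference_f:.0f} degrees Fahrenheit")
```

The Fahrenheit ratio of 2 is an artifact of where the scale places its zero, which is why ratio statements are reserved for ratio-level data.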
These four scales of measurement are routinely reviewed in introductory statistics textbooks as
the “classic” way of differentiating measurements. However, the boundaries between the
measurement scales are fuzzy; for example, is intelligence quotient (IQ) measured on the
ordinal or interval scale? Recently, researchers have argued for a simpler dichotomy in terms of
selecting an appropriate statistic. Most of the time, being able to classify a variable into one of
the two following categories will serve the purposes needed:
• Categorical versus quantitative measures.
o A categorical variable is a nominal variable. It simply categorizes things according
to group membership (for example, apple = 1, banana = 2, grape = 3).
o A quantitative measure represents a difference in magnitude of something, such
as a continuum of “low to high” statistics anxiety. In contrast to categorical variables
designated by arbitrary values, a quantitative measure allows for a variety of
arithmetic operations, including =, <, >, +, -, x, and ÷. Arithmetic operations generate
a variety of descriptive statistics discussed next.
Note that categorical variables generally correspond to the nominal and ordinal levels of
measurement in the previous system, and quantitative variables typically correspond to the
interval and ratio levels of measurement. Quantitative variables, or variables which are at the
interval or ratio level of measurement, are designated as scale variables in SPSS software.
In order to determine the level of measurement of a variable, one must consider the nature of
the values as well as what those values represent, or what the variable is measuring. For
instance, the same type of variable may have two levels of measurement if the two versions are
measuring different things in a research project. The level of measurement refers to the
construct in the research project which is being measured—not the values of the variable itself.
A clear example is the distinction between two variables that hold exactly the same values
but differ in their operational definition in the research project in which they are
used. Consider a score on a teacher-made test, where the score consists of the
number of correct answers out of 100 questions. The variable can be defined in two ways:
1. An index which corresponds to the amount of knowledge the test taker has in the topic
area covered by the test.
2. The number of grade points credit earned on the test by the test taker.
Notice that every person will have the same values on each of these two variables. It is tempting
to say there are no differences. A closer look reveals these two variables are at two different
levels of measurement. First, when defined by the first definition above—as the amount of
knowledge—we know very little about the meaning of the numbers involved. We do not know if
each question has the same amount of knowledge in it. Also, we certainly cannot say that a
score of zero means the person has “no knowledge.” Notice that this restricts the variable to the
ordinal level of measurement, because we cannot even be sure that the necessary qualities for
interval level measurement are met (equal distances between points in terms of amount of
knowledge). We cannot say that a person that scores 10 has twice as much knowledge as a
person who scores 5, and we cannot say that the difference in knowledge between two people
who score 5 and 6 is the same as the difference between two people who score 1 and 2. On the
other hand, when the measurement is defined by the second definition above, the level of
measurement is ratio. Clearly, all the requirements for ratio level data are met. Zero means no
grade points, and a person who scores 6 has twice as many grade points as a person who
scores 3. The point of the preceding paragraph is that the level of measurement of a variable
depends on what it is being used for—what is being measured in the research project.
Measures of Central Tendency and Dispersion
Descriptive statistics summarize a set of scores in terms of central tendency (for example,
mean, median, mode) and dispersion (for example, range, variance, standard deviation).
As an example, consider a psychologist who measures 5 participants’ heart rates in beats per
minute: 62, 72, 74, 74, and 118.
• The simplest measure of central tendency is the mode. It is the most frequent score
within a distribution of scores (for example, two scores of hr = 74). Technically, in a
distribution of scores, you can have two or more modes. An advantage of the mode is
that it can be applied to categorical data. It is also not sensitive to extreme scores. This
measure is suitable for all levels of measurement.
• The median is the geometric center of a distribution because of how it is calculated. All
scores are arranged in ascending order. The score in the middle is the median. In the
five heart rates above, the middle score is a 74. If you have an even number of scores,
the average of the two middle scores is used. The median also has the advantage of not
being sensitive to extreme scores. This measure is only suitable for data which are at
the ordinal level of measurement or above.
• The mean is probably what most people consider to be an average score. In the
example above, the mean heart rate is (62 + 72 + 74 + 74 + 118) ÷ 5 = 80. Although the
mean is more sensitive to extreme scores (for example, 118) relative to the mode and
median, it can be more stable across samples, and it is the best estimate of the
population mean. It is also used in many of the inferential statistics studied in this
course, such as t-tests and analysis of variance (ANOVA). This measure is suitable only
for interval and ratio level variables (quantitative measures). It relies on math (sums and
quotients) that is not valid for ordinal measures.
• In contrast to measures of central tendency, measures of dispersion summarize how far
apart data are spread on a distribution of scores. The range is a basic measure of
dispersion quantifying the distance between the lowest score and the highest score in a
distribution (for example, 118 - 62 = 56). A deviance represents the difference between
an individual score and the mean. For example, the deviance for the first heart rate score
(62) is 62 - 80, or -18. By calculating the deviance for each score above from a mean of
80, we arrive at -18, -8, -6, -6, and +38. Summing all of the deviances equals 0, which is
not a very informative measure of dispersion. An alternative measure of dispersion is
the interquartile range (IQR), as well as the semi-interquartile range (sIQR). If the values
are placed in order from lowest to highest, each value can be assigned a rank, with the
lowest rank being 1, which corresponds to the lowest value in the data set. The highest
value of the variable will have a rank equal to the number of values in the data set. The
ranks can be used to identify two important scores in the data set. First, the 25th
percentile is the score which 25% of the scores fall at or below. There is also a 75th
percentile, which is the score which 75% of the scores fall at or below (so that 25% of
the sample scores above it). The IQR is the difference between the scores at the 75th and
25th percentiles, and the sIQR is half of that value. These measures are suitable for
variables at the ordinal level of measurement and above.
• A somewhat more informative measure of dispersion is sum of squares (SS), which you
will see again in the study of analysis of variance (ANOVA). To get around the problem
of summing to zero, the sum of squares involves calculating the square of each deviation
and then summing those squares. In the example above,
SS = (-18)² + (-8)² + (-6)² + (-6)² + (+38)² = 324 + 64 + 36 + 36 + 1444 = 1904.
The problem with SS is that it increases as the number of data points increases
(Field, 2009), and it still is not a very informative
measure of dispersion. This measure is suitable only for interval and ratio levels of
measurement.
• This problem is solved by next calculating the variance (s²), which is the average
squared distance between the scores and the mean. Instead of dividing SS by 5
for the example above, we divide by the degrees of freedom, N - 1, or 4. The variance
is therefore SS ÷ (N - 1), or 1904 ÷ 4 = 476. The problem with interpreting variance is
that it is the average distance of “squared units” from the mean. What is, for example, a
“squared” heart rate score? This measure is suitable only for interval and ratio levels of
measurement.
• The final step is calculating the standard deviation (s), which is simply calculated as the
square root of the variance, or in our example, √476 = 21.82. The standard deviation
represents the average deviation of scores from the mean. In other words, the average
distance of heart rate scores to the mean is 21.82 beats per minute. If the extreme score
of 118 is replaced with a score closer to the mean, such as 90, then the mean becomes 74.4
and s = 10.04. Thus,
small standard deviations (relative to the mean) represent a small amount of dispersion;
large standard deviations (relative to the mean) represent a large amount of dispersion
(Field, 2009). The standard deviation is an important component of the normal
distribution. This measure is suitable only for interval and ratio levels of measurement.
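The arithmetic in the bullets above can be reproduced outside SPSS as a cross-check. The following is a minimal sketch in Python using only the standard library and the five heart rate scores from the example; the variable names are illustrative assumptions, and the assessment itself still requires SPSS output.

```python
import math
import statistics

# The five heart rate scores (beats per minute) from the worked example.
heart_rates = [62, 72, 74, 74, 118]

# Central tendency
mode = statistics.mode(heart_rates)      # most frequent score: 74
median = statistics.median(heart_rates)  # middle score after sorting: 74
mean = statistics.mean(heart_rates)      # sum of scores divided by N: 80

# Dispersion
score_range = max(heart_rates) - min(heart_rates)    # 118 - 62 = 56
deviances = [x - mean for x in heart_rates]          # -18, -8, -6, -6, +38
sum_of_squares = sum(d ** 2 for d in deviances)      # 1904
variance = sum_of_squares / (len(heart_rates) - 1)   # SS / (N - 1) = 476
std_dev = math.sqrt(variance)                        # about 21.82

print(f"mode={mode}, median={median}, mean={mean}")
print(f"range={score_range}, sum of deviances={sum(deviances)}")
print(f"SS={sum_of_squares}, variance={variance}, s={std_dev:.2f}")
```

Dividing by N - 1 rather than N matches the degrees-of-freedom convention described above for a sample variance.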
Notice that the various methods of expressing central tendency and dispersion are suitable for
different levels of measurement. A brief summary of the types of measures used with different
levels of measurement is shown in the table below:
Levels of Measurement                 Typical Measures
                                      Central Tendency     Dispersion
Categorical     Nominal               Mode                 k*
Categorical     Ordinal               Median               sIQR
Quantitative    Interval, Ratio       Mean                 Standard Deviation
*Note: The number of categories or groups (often designated as k) is the primary
expression of variation for nominal level data.
Visual Inspection of a Distribution of Scores
An assumption of the statistical tests that you will study in this course is that the scores for a
dependent variable, like a range of heart rate scores, are normal (or approximately normal) in
shape. The assumption is first checked by examining a histogram of the distribution. This
method is meaningful only for quantitative variables—interval or ratio levels of measurement. It
makes no sense to create histograms of nominal or ordinal (categorical) variables.
Departures from normality and symmetry are assessed in terms of skew and kurtosis.
Skewness is the tilt or extent a distribution deviates from symmetry around the mean. A
distribution that is positively skewed has a longer tail extending to the right (that is, the “positive”
side of the distribution). A distribution that is negatively skewed has a longer tail extending to the
left (that is, the “negative” side of the distribution). In contrast to skewness, kurtosis is defined as
the peakedness of a distribution of scores.
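To make the direction of skew concrete, the sketch below computes skewness and excess kurtosis from simple population moments. This is an assumption made for illustration: SPSS reports bias-adjusted sample statistics, so its values will differ somewhat, especially for small samples. The function name moment_skew_kurtosis is hypothetical, and the second data set reuses the five heart rate scores from the earlier example.

```python
def moment_skew_kurtosis(scores):
    """Return (skewness, excess kurtosis) based on population moments.

    Note: this is the simple moment-based definition, not the bias-adjusted
    version reported by SPSS, so the numbers will not match SPSS output exactly.
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0           # 0 for a normal distribution
    return skewness, excess_kurtosis


# A perfectly symmetric set of scores: skewness is zero.
skew_symmetric, kurt_symmetric = moment_skew_kurtosis([1, 2, 3, 4, 5])

# The heart rate example: the extreme score of 118 creates a longer right tail.
skew_heart, kurt_heart = moment_skew_kurtosis([62, 72, 74, 74, 118])

print(f"symmetric scores: skew={skew_symmetric:.2f}, kurtosis={kurt_symmetric:.2f}")
print(f"heart rates:      skew={skew_heart:.2f}, kurtosis={kurt_heart:.2f}")
```

A positive skewness value corresponds to the longer right tail described above, and a negative value would indicate a longer left tail.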
The use of these terms is not limited to your description of a distribution following a visual
inspection. They are included in your list of descriptive statistics and should be included when
analyzing your distribution of scores. Skew and kurtosis scores of near zero indicate a shape
that is symmetric or close to normal respectiv …