SPSS Tutorial (Part 1)

  • The Basics

SPSS for Windows has the same general look and feel as most other programmes for Windows. Virtually any statistical analysis that you wish to perform can be accomplished by pointing and clicking on the menus and various interactive dialog boxes. You may have noticed that the examples in the Howell textbook are analyzed via code. That is, SPSS, like many other packages, can also be driven by short scripts instead of by pointing and clicking. We will not cover any programming in this tutorial.

Presumably, SPSS is already installed on your computer. If you don't have a shortcut on your desktop, go to the [Start => Programs] menu and start the package by clicking on the SPSS icon.

Before proceeding, I should say a few words about a very simple convention that will be used in this tutorial. In this point-and-click environment one often has to navigate through many layers of menu items before reaching the required option. In the above paragraph the prescribed task was to locate the SPSS icon in the [Start] menu structure. To get to that icon, one must first click on [Start], then move the pointer to the [Programs] option, before locating the SPSS icon. This sequence of events can be conveyed by typing [Start => Programs]. That is, one moves in sequence from the outer layer of the menu structure to the inner layers.

Now, back to the tutorial.

Once you've clicked on the SPSS icon, a new window will appear on the screen. It looks like a standard Windows programme with a spreadsheet-like interface.

As you can see, there are a number of menu options relating to statistics on the menu bar. There are also shortcut icons on the toolbar, which provide quick access to often-used options. Holding your mouse over one of these icons for a second or two will display a short description of its function. The current display is that of an empty data sheet. Data can either be entered manually or read from an existing data file.

Browsing the [File] menu, shown below, reveals nothing too surprising; many of the options are familiar, although the details are specific to SPSS. For example, the [New] option is used to specify the type of window to open. The various options under the [New] heading are:

  • [Data] Default window with a blank data sheet ready for analyses
  • [Syntax] One can write scripts like those present in the Howell text, instead of using the menus. See the SPSS manuals for help on this topic.
  • [Output] Whenever a procedure is run, the output is directed to a separate window. One can also have multiple [Output] windows open to organize the various analyses that might be conducted. Later, these results can be saved and/or printed.
  • [Script] This window provides the opportunity to write full-blown programmes in a BASIC-like language. These programmes have access to the functions that make up SPSS, so it is possible to write user-defined procedures (those not part of SPSS) by taking advantage of the SPSS functions. Again, this is beyond the scope of this tutorial.

Also present in the [File] menu are two separate avenues for reading data from existing files. The first is the [Open] option. Like other application packages (e.g., WordPerfect, Excel, …), SPSS has its own format for saving data. Files saved in this proprietary format take the extension "sav", so one might have a data file saved as "data1.sav". This is a binary format and is not readable with a text editor. The benefits are that all formatting changes are preserved and the file can be read faster. The [Open] option is meant primarily for files saved in the SPSS format. The second option, [Read ASCII Data], is, as the name suggests, for reading files saved in ASCII format. As can be seen, there are two choices: [Freefield] and [Fixed Columns]. Clicking on one of these options will produce a dialog box in which a number of parameters must be specified before a file can be read successfully.
Reading ASCII files requires that the user know something about the format of the data file. Otherwise, one is likely to get stuck in the process of reading, or the result may be a costly error. The more restrictive format is [Fixed Columns]. One must know how many variables there are, whether each variable is in numeric or string format, and the first and last column of each variable. For example, think of the following as an excerpt from an ASCII data file.

male    37 102
male    22 115
male    27  99
....    .. ...
female  48 107
female  21 103
female  28 122
......  .. ...

An examination of the datafile provides several key pieces of information,

  1. There are 3 variables
  2. Variable 1 is a string; Variables 2 and 3 are numeric
  3. Variable 1: first column=1, last column=6
    • Notice that none of the columns overlap. The longest value of Variable 1 is "female", which spans from the first column to the sixth (the final letter e). As you can see, one has to locate the first and last column of each variable manually.
  4. Variable 2: first column=9, last column=10
  5. Variable 3: first column=12, last column=14

One needs all of the above information, in addition to a name for each of the three variables.
This is a highly structured way of setting up and describing the data. For such files I would
suggest becoming comfortable with a good text editor.

Failing that, you may wish to try Notepad or WordPad in Win95, but with WordPad make sure that you
save as a text file. A full-fledged word processor like Word or WordPerfect will also work,
provided that you remember to save as a text file. These same editors will allow you to figure out
the column locations for each of the variables.
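
For readers who like to see the logic spelled out, here is a minimal Python sketch (not anything SPSS itself runs) of what a [Fixed Columns] read does with the information listed above: each variable is cut out of the line by its first and last column. The variable names sex, var2 and var3 are hypothetical placeholders, since the excerpt does not name the variables.

```python
# Minimal sketch of a fixed-column read, using the layout described above:
# variable 1 in columns 1-6, variable 2 in columns 9-10, variable 3 in 12-14.
# The variable names are hypothetical placeholders.

RAW = (
    "male    37 102\n"
    "male    27  99\n"
    "female  48 107\n"
)

# (name, first column, last column, converter); columns are 1-based and inclusive.
LAYOUT = [
    ("sex", 1, 6, str),
    ("var2", 9, 10, int),
    ("var3", 12, 14, int),
]


def read_fixed(text, layout):
    """Return one dict per non-blank line, cutting each field by column position."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = {}
        for name, first, last, convert in layout:
            # Columns 1..6 correspond to the Python slice [0:6].
            field = line[first - 1:last].strip()
            record[name] = convert(field)
        records.append(record)
    return records


for rec in read_fixed(RAW, LAYOUT):
    print(rec)
```

Note the conversion from the 1-based, inclusive column numbers used above to Python's 0-based slices; getting this off by one is exactly the kind of costly error mentioned earlier.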

The [Freefield] option is less restrictive. Essentially, the columns can be ragged (i.e., values need not line up in the same columns from one case to the next). One need only preserve the order of the variables across all of the cases.

male 37 102
male 22 115
male 27 99
.... .. ...
female 48 107
female 21 103
female 28 122
...... .. ...
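
By contrast, a [Freefield] read only needs the order of the variables. The Python sketch below splits each line on whitespace and assigns the values in order. The names are again hypothetical, and the sketch assumes one case per line, which is a simplification rather than a description of how SPSS handles every file.

```python
# Minimal sketch of a freefield read: values are separated by whitespace and
# assigned to variables purely by their order on the line.
# Assumes one case per line; names and types are hypothetical.

RAW = """\
male 37 102
male 27 99
female 48 107
"""

NAMES = ["sex", "var2", "var3"]
TYPES = [str, int, int]


def read_freefield(text, names, types):
    records = []
    for line in text.splitlines():
        fields = line.split()  # any run of spaces or tabs separates values
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(fields)}: {line!r}")
        records.append({n: t(f) for n, t, f in zip(names, types, fields)})
    return records


for rec in read_freefield(RAW, NAMES, TYPES):
    print(rec)
```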

Experiment with creating data files and reading them with this method. As for the SPSS format, a large number of sample data files are included with your package. Just click on [Open] and find the SPSS home directory. Make sure the file type in the dialog box associated with [Open] is set to "*.sav" (the default).

Before we move on to actual data, click on [Statistics]. The menu that appears reveals the many classes of statistics available for use. Each class is further subdivided into other options, as denoted by the little arrow at the right side of the menu selector. Explore what is offered by moving your mouse over the various procedures listed.

  • Data

To begin adding data, just click on the first cell, located in the upper left corner of the data sheet. It's just like a spreadsheet. You can enter your data as shown. Enter each data point, then hit [Enter]. Once you're done with one column of data, you can click on the first cell of the next column.

These data are taken from Table 2.1 in Howell's text. The first column represents "Reaction Time in 100ths of a second" and the second column indicates "Frequency".

If you're entering data for the first time, as in the above example, the variable names will be generated automatically (e.g., var00001, var00002, …). They are not very informative. To change a name, double-click on the variable name button at the top of the column, for example the "var00001" button. A dialog box will then appear. The simplest option is to change the name to something meaningful. For instance, replace "var00001" in the textbox with "RT" (see figure below).

In addition to changing the variable name one can make changes specific to [Type], [Labels], [Missing Values], and [Column Format].

  • [Type] One can specify whether the data are in numeric or string format, among a few other formats. The default is numeric format.

  • [Labels] Using the labels option can enhance the readability of the output. A variable name is limited to 8 characters; however, a variable label can be as long as 256 characters. This makes it possible to have very descriptive labels that will appear in the output. Often, there is a need to code categorical variables in numeric format. For example, male and female can be coded as 1 and 2, respectively. To reduce confusion, it is recommended that one use value labels. For the gender coding example, Value: 1 would have a corresponding Value Label: male. Similarly, Value: 2 would have the Value Label: female. (Click on the [Labels] button to verify the above.)
  • [Missing Values] See the accompanying help. This option provides a means to code for various types of missing values.
  • [Column Format] The column format dialog provides control over several features of each column (e.g., width of column).
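
To make the idea of value labels concrete, here is a small Python sketch: the data keep the numeric codes, and a separate lookup table supplies the text shown in the output. The dictionary and function names are illustrative only, and falling back to the raw code for an unlabelled value is a choice made for this sketch.

```python
# Value labels: the data keep numeric codes; a lookup supplies the text.
VALUE_LABELS = {1: "male", 2: "female"}  # the gender coding from the text


def apply_labels(codes, labels):
    """Replace each code with its label; unlabelled codes are shown as-is."""
    return [labels.get(code, str(code)) for code in codes]


gender = [1, 2, 2, 1, 3]
print(apply_labels(gender, VALUE_LABELS))  # ['male', 'female', 'female', 'male', '3']
```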

The next image reflects the variable name change.

Once data have been entered or modified, it is advisable to save. In fact, save as often as possible [File => SaveAs].

SPSS offers a large number of possible formats, including its own. A list of the available formats can be viewed and selected by clicking on Save as type: in the SaveAs dialog box. If your intention is to work only in SPSS, then there may be some benefit to saving in the SPSS (*.sav) format. I assume that this format allows for faster reading and writing of the data file. However, if your data will be analyzed and looked at in other packages (e.g., a spreadsheet), it would be advisable to save in a more universal format (e.g., Excel (*.xls), 1-2-3 Rel 3.0 (*.wk3)).

Once the type of file has been selected, enter a filename without the extension (e.g., sav, xls). You should also save the file in a meaningful directory on your hard drive or floppy. That is, a separate directory should be created for each project. You don't want your data to get mixed up.

The process of reading already saved data can be painless if the saved format is in the SPSS or a spreadsheet format. All one has to do is,

    • click on [File => New => Data]
    • click on [File => Open] : a dialog box will appear
    • navigate to desired directory using the Look in: menu at the top of the dialog box
    • select file type in the Files of type menu
    • click on the filename that is needed.

The process of reading existing files is slightly more involved if the format is ASCII/plain text (see the earlier description of [Freefield] and [Fixed Columns]). As an example, the ASCII data from Table 2.1 in the Howell text will be used. A file containing the data should be included on the disk accompanying the text. [Note: It was not present on my disk, so I downloaded the file from Howell's webpage.] I've placed the files on my hard drive at c:\ascdat. In this data set there are four columns, representing observation number, reaction time, setsize, and the presence or absence of the target stimulus. This information can be found in the readme.txt file that is also on the disk. Typically, we are aware of the contents of our own data files; however, it doesn't hurt to keep a record of the contents of such files.

To make life easier the [File => Read ASCII Data => Freefield] will be used.

The resulting dialog box requires that a File be specified, along with a Name and a Data Type for each variable, or column of data. The desired file is accessed by clicking on the [Browse] button and then navigating to the desired location. Since the extension for the sought-after file is dat, there is no need to change the Files of type: selection. However, if the extension is something else (e.g., *.txt), then it would be necessary to select All files (*.*) from the Files of type: menu. Since there are 4 variables in this data set, 4 names with the corresponding type information must be specified. To add the first variable, observations, to the list,

    • type “obs” in the Name box
    • the Data Type is set to Numeric by default. If “obs” was a string variable, then one would have to click on String
    • click on the Add button to include this variable in the list.
    • repeat the above procedure with new names and data types for each of the remaining variables. It is important that all variables be added to the list. Otherwise, the data will be scrambled.

(Please explore the various options by clicking on any accessible menu item.)

The resulting data file appears in the data editor like the following.

  • Descriptive Statistics

We can replicate the frequency analyses described in chapter 2 of the text by using the file that was just read into the data editor, tab2-1.dat. These analyses were conducted on the reaction time data. Recall that we have labelled this variable RT.

To begin, click on [Statistics=>Summarize=>Frequencies]…

The result is a new dialog box that allows the user to select the variables of interest. Also note the other clickable buttons along the border of the dialog box; the buttons labelled [Statistics…] and [Charts…] are of particular importance. Since we're interested in the reaction time data, click on rt, followed by a mouse click on the arrow pointing right. This moves the rt variable to the Variables list. At this point, clicking on the [OK] button would spawn an output window with the frequency information for each of the reaction times. However, more information can be gathered by exploring the options offered by [Statistics…] and [Charts…].

[Statistics…] offers a number of summary statistics. Any statistic that is selected will be summarized in the output window.

As for the options under [Charts…], click on Bar Charts to replicate the graph in the text.

Once the options have been selected, click on [OK] to run the procedure. The results are then displayed in an output window. In this particular instance the window will include summary statistics for the variable RT, the frequency distribution table, and the bar chart. You can see all of this by scrolling down the window. The results should be identical to those in the text.
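
As a cross-check on what the Frequencies output contains, the Python sketch below builds a frequency table with a count, percent and cumulative percent for each distinct value. The reaction times are made up for illustration; they are not the Table 2.1 data.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows in ascending order."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, 100 * freq / n, 100 * cumulative / n))
    return rows


# Hypothetical reaction times (in 100ths of a second), not the Table 2.1 data.
rt = [36, 38, 38, 41, 41, 41, 45, 50]
print(f"{'Value':>5} {'Freq':>4} {'Pct':>6} {'CumPct':>6}")
for value, freq, pct, cum in frequency_table(rt):
    print(f"{value:>5} {freq:>4} {pct:6.1f} {cum:6.1f}")
```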

You may have gathered from the above that calculating summary statistics requires nothing more than selecting variables and then selecting the desired statistics. The frequency example allowed us to generate frequency information plus measures of central tendency and dispersion. These statistics can also be obtained directly by clicking on [Statistics=>Summarize=>Descriptives]. Not surprisingly, another dialog box is attached to this procedure. To control the type of statistics produced, click on the [Options…] button. Once again, the options include the typical measures of central tendency and dispersion.
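
Similarly, a handful of basic descriptives can be sketched in a few lines of Python. This assumes, as is usual for sample data, that the standard deviation uses the n - 1 denominator; Python's `statistics.stdev` does exactly that. The scores are hypothetical.

```python
import statistics


def describe(values):
    """Summary statistics similar to a basic Descriptives table."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation: divides by n - 1, not n.
        "Std. Deviation": statistics.stdev(values),
    }


# Hypothetical scores for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(scores).items():
    print(f"{name:>15}: {value:.3f}")
```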

Each time a statistical procedure is run, like [Frequencies…] and [Descriptives…], the results are posted to an Output window. If several procedures are run during one session, the results will be appended to the same window. However, greater organization can be achieved by opening a new Output window before running each procedure [File=>New=>Output]. Further, the contents of each of these windows can be saved for later review or, in the case of charts, saved to be included later in formatted documents. [Explore by left-clicking on any of the output objects (e.g., a frequency table, a chart, …) followed by a right-click. The left-click will highlight the desired object, while the right-click will pop up a new menu. The next step is to click on the copy option. This action will store the object on the clipboard so that it can be pasted into a Word for Windows document, for example.]

  • Chi-Square & T-Test

The computation of the Chi-Square statistic can be accomplished by clicking on [Statistics => Summarize => Crosstabs…]. This particular procedure will be your first introduction to coding data in the data editor. To this point, data have been entered in a column format, that is, one variable per column. However, that method is not sufficient in a number of situations, including the calculation of Chi-Square, independent t-tests, and any factorial ANOVA design with between-subjects factors. I'm sure there are many other cases, but they will not be covered in this tutorial. Essentially, the data have to be entered in a specific format that makes the analysis possible. The format typically reflects the design of the study, as will be demonstrated in the examples.

In your text, the following data appear in section 6.????. Please read the text for a description of the study. Essentially, the table below includes the observed data, with the expected data in parentheses.

Fault   Guilty          Not Guilty     Total
Low     153 (127.559)   24 (49.441)    177
High    105 (130.441)   76 (50.559)    181
Total   258             100            358
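
The expected values in parentheses come from the row and column totals: each expected count is row total * column total / N. The Python sketch below reproduces them, along with the Pearson chi-square statistic, from the observed counts in the table. It computes only the uncorrected Pearson statistic and leaves out the p-value lookup to stay within the standard library.

```python
# Observed counts from the table above: rows = Low/High fault,
# columns = Guilty/Not Guilty.
OBSERVED = [
    [153, 24],
    [105, 76],
]


def expected_counts(table):
    """Expected count for each cell = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Uncorrected Pearson chi-square and its degrees of freedom."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


for row in expected_counts(OBSERVED):
    print([round(e, 3) for e in row])
chi2, df = pearson_chi_square(OBSERVED)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The printed expected counts should match the values in parentheses in the table.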

In the hope of minimizing the load time for the remaining pages, I will make use of the built-in table facility of HTML to simulate the Data Editor in SPSS. This will reduce the number of images/screen captures to be loaded.

For the Chi-Square statistic, the table of data can be coded by indexing the row and column of each cell. For example, the count for Guilty with Low fault is 153. This specific cell can be indexed as coming from row=1 and column=1. Similarly, Not Guilty with High fault is coded as row=2 and column=2. Each of the cells, four in this instance, has a unique code for its location in the table. These can be entered as follows,

Row Column Count
1 1 153
1 2 24
2 1 105
2 2 76
  • So, 2 rows * 2 columns equals 4 cells. That should be clear.
  • For each of the rows, there are 2 corresponding columns, and the Count column holds the number of times each unique combination of Row and Column occurs.

The above presents the data in an unambiguous manner. Once entered, the analysis is a matter of selecting the desired menu items, and perhaps selecting additional options for that statistic. [Don't forget to use the labelling facilities, as mentioned earlier, to meaningfully identify the columns/variables. The labels that are chosen will appear in the output window.] To perform the analysis,

  • The first step is to inform SPSS that the COUNT variable represents the frequency for each unique coding of ROW and COLUMN, by invoking the WEIGHT command. To do this, click on [Data => Weight Cases]. In the resultant dialog box, enable the Weight cases by option, then move the COUNT variable into the Frequency Variable box. If this step is forgotten, each line of the data sheet is counted only once, so every cell of the table will have a count of 1.

  • Now that the COUNT variable has been processed as a weighted variable, select [Statistics => Summarize => Crosstabs…] to launch the controlling dialog box.
  • At the bottom of the dialog box are three buttons, the most important being the [Statistics…] button. You must click on the [Statistics…] button and then select the Chi-square option; otherwise the statistic will not be calculated. Exploring this dialog box makes it clear that SPSS can calculate a number of other statistics in conjunction with Chi-square. For example, one can select various measures of association (e.g., contingency coefficient, phi and Cramér's V, …), among others.
  • Move the ROW variable into the Row(s): box, and the COLUMN variable into the Column(s):, then click [OK] to perform the analysis. A subset of the output looks like the following,
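
The effect of [Data => Weight Cases] can be pictured with the Python sketch below: with weighting, each (Row, Column) line contributes its Count to the table; without it, each line counts once, which is why forgetting the step leaves a 1 in every cell. The function is a hypothetical illustration, not SPSS's implementation.

```python
from collections import Counter

# The Row / Column / Count layout entered in the Data Editor above.
CASES = [
    (1, 1, 153),
    (1, 2, 24),
    (2, 1, 105),
    (2, 2, 76),
]


def crosstab(cases, weighted=True):
    """Tally cells; each line adds its Count when weighted, otherwise 1."""
    table = Counter()
    for row, col, count in cases:
        table[(row, col)] += count if weighted else 1
    return dict(table)


print(crosstab(CASES))                  # {(1, 1): 153, (1, 2): 24, (2, 1): 105, (2, 2): 76}
print(crosstab(CASES, weighted=False))  # {(1, 1): 1, (1, 2): 1, (2, 1): 1, (2, 2): 1}
```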

Although simple, the calculation of the Chi-square statistic is very particular about all the required steps being followed. More generally, as we enter hypothesis testing, the user should be very careful and should make use of manuals for the programme and textbooks for statistics.

  • T-test

By now, you should know that there are two forms of the t-test, one for dependent (paired) observations and one for independent observations. To inform SPSS, or any stats package for that matter, of the type of design, it is necessary to have two different ways of laying out the data. For the dependent design, the two variables in question must be entered in two columns. For independent t-tests, the observations for the two groups must be uniquely coded with a Group variable. Like the calculation of the Chi-square statistic, these calculations will reinforce the practice of thinking about, and laying out, the data in the correct format.

Dependent T-Test

To calculate this statistic, one must select [Statistics => Compare Means => Paired-Samples T Test…] after entering the data. For this analysis, we'll use the data from Table 7.3 in Howell.

  • Enter the data into a new datafile. Your data should look a bit like the following. That is, the two variables should occupy separate columns…
    Mnths_6 Mnths_24
    124 114
    94 88
    115 102
    110 2
    116 2
    139 2
    116 2
    110 2
    129 2
    120 2
    105 2
    88 2
    120 2
    120 2
    116 2
    105 2
    123 132

    Note that the variable names start with a letter and are no more than 8 characters long. This is a bit constraining; however, one can use the variable label option to give the variable a longer name. This more descriptive name will then be reproduced in the output window.

  • To calculate the t statistic, click on [Statistics => Compare Means => Paired-Samples T Test…], then select the two variables of interest. To select the two variables, hold the [Shift] key down while using the mouse for selection. You will note that the selection box requires that variables be selected two at a time. Once the two variables have been selected, move them to the Paired Variables: list. This procedure can be repeated for each pair of variables to be analyzed. In this case, select MNTHS_6 and MNTHS_24 together, then move them to the Paired Variables list. Finally, click the [OK] button. The critical result for the current analysis will appear in the output window as follows,

    As you can see, an exact t-value is provided along with an exact p-value, and this p-value is greater than the cutoff of 0.025 for a two-tailed assessment. Closer examination shows that several other statistics are presented in the output window.

    Quite simply, such calculations require very little effort!
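
Behind the scenes, the paired-samples t is a one-sample t on the difference scores: t = mean difference / (SD of differences / sqrt(n)), with n - 1 degrees of freedom. Here is a minimal Python sketch of that arithmetic. The numbers in the usage lines are hypothetical, and the sketch stops at t and df rather than computing a p-value.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t: a one-sample t test on the difference scores."""
    if len(first) != len(second):
        raise ValueError("paired samples need the same number of cases")
    diffs = [a - b for a, b in zip(first, second)]  # first minus second
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical paired scores, not the Table 7.3 data.
mnths_6 = [120, 98, 115, 110, 105]
mnths_24 = [114, 100, 102, 104, 101]
t, df = paired_t(mnths_6, mnths_24)
print(f"t = {t:.3f}, df = {df}")
```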

Independent T-Test

When calculating an independent t-test, the only difference involves the way the data are formatted in the data sheet. The data sheet must include both the raw data and a group code for each observation. For this example, the data from Table 7.5 will be used. As an added bonus, the groups have unequal numbers of observations in this example.

Take a look at the following table to get a feel for how to code the data.

Group Exp_Con
1 96
1 127
1 127
1 119
1 109
1 143
1 106
1 109
2 114
2 88
2 104
2 104
2 91
2 96
2 114
2 132

From the above you can see that we used the "Group" variable to code for the two groups. The value of 1 was used to code for "LBW-Experimental", while a value of 2 was used to code for "LBW-Control". If you're confused, please study the table above.

To generate the t-statistic,

  • Click on [Statistics => Compare Means => Independent-Samples T Test] to launch the appropriate dialog box.
  • Select "exp_con" (the dependent variable) from the variable list and move it to the Test Variable(s): box.
  • Select "group" (the grouping variable) from the variable list and move it to the Grouping Variable: box.
  • The final step requires that the groups be defined. That is, one must specify that Group 1 (the experimental group in this case) is coded as 1, and Group 2 (the control group in this case) is coded as 2. To do this, click on the [Define Groups…] button and enter these two values. Click on the [Continue] button to return to the controlling dialog box.
  • Run the analysis by clicking on the [OK] button.
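
The two sets of results SPSS prints (equal variances assumed and not assumed) correspond to the pooled-variance t and Welch's t. The Python sketch below computes both for the values listed in the table above. With the values as listed, the two groups happen to be the same size, in which case the two t statistics coincide (only the degrees of freedom differ); the table may be an excerpt, so do not expect these numbers to match the output discussed below.

```python
import math
import statistics

# Values from the table above: Group 1 = LBW-Experimental, Group 2 = LBW-Control.
experimental = [96, 127, 127, 119, 109, 143, 106, 109]
control = [114, 88, 104, 104, 91, 96, 114, 132]


def independent_t(a, b):
    """Return (t, df) with and without the equal-variance assumption."""
    n1, n2 = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = mean_diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df.
    q1, q2 = v1 / n1, v2 / n2
    t_welch = mean_diff / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

    return {"equal": (t_pooled, n1 + n2 - 2), "unequal": (t_welch, df_welch)}


for label, (t, df) in independent_t(experimental, control).items():
    print(f"{label:>8}: t = {t:.3f}, df = {df:.2f}")
```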

The output for the current analysis extracted from the output window looks like the following.

The p-value of .004 is well below the cutoff of 0.025, which suggests that the means are significantly different. Further, Levene's Test for equality of variances is performed to help decide which set of results to use. In this case the variances can be treated as equal, so the "equal variances assumed" results apply; however, the calculations for unequal variances are also presented, along with some other statistics (some not shown here).

In the next section we will briefly demonstrate the calculation of correlations and regression, as discussed in Chapter 9 of Howell. In truth, you should be able to work through many statistics, including correlations and regressions, with your current knowledge base and the help files. Most statistics can be calculated with a few clicks of the mouse.
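
Returning briefly to Levene's test: the classic form is a one-way ANOVA on the absolute deviations of each score from its group mean. The Python sketch below computes that F statistic and its degrees of freedom for the two groups in the table above. I am assuming the mean-centred version here; check the SPSS documentation for the exact variant it reports.

```python
import statistics


def levene(*groups):
    """Levene's test (mean-centred): one-way ANOVA F on |score - group mean|."""
    # Absolute deviation of every score from its own group's mean.
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand_mean = sum(sum(g) for g in z) / n
    between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in z)
    within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in z)
    f_stat = (n - k) / (k - 1) * between / within
    return f_stat, (k - 1, n - k)


experimental = [96, 127, 127, 119, 109, 143, 106, 109]
control = [114, 88, 104, 104, 91, 96, 114, 132]
f_stat, (df1, df2) = levene(experimental, control)
print(f"Levene F = {f_stat:.3f}, df = ({df1}, {df2})")
```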

  • Correlations and Regression

This will be a brief tutorial, since very little is required to calculate correlations and linear regressions. To calculate a simple correlation matrix, use [Statistics => Correlate => Bivariate…]; for a linear regression, use [Statistics => Regression => Linear…].

For this section, the analyses presented in the computer section of the Correlation and Regression chapter will be replicated. To begin, enter the data as follows,

102 2.75
108 4.00
109 2.25
118 3.00
79 1.67
88 2.25
85 2.50

Simple Correlation

  • Click on [Statistics => Correlate => Bivariate…], then select and move “IQ” and “GPA” to the Variables: list. [Explore the options presented on this controlling dialog box.]
  • Click on [OK] to generate the requested statistics.

The results from output window should look like the following,

As you can see, r = 0.702 and p = .000. SPSS rounds p to three decimal places, so .000 means p < .0005. The results suggest that the correlation is significant.
Note: In the above example we only created a correlation matrix based on two variables. The process of generating a matrix based on more than two variables is no different. That is, if the data set consisted of 10 variables, they could all be placed in the Variables list, and the resulting matrix would include all the possible pairwise correlations.
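
For the curious, Pearson's r is the sum of cross-products of deviations divided by the square root of the product of the two sums of squares. The Python sketch below computes it, plus an all-pairs matrix like the one described in the note above. The seven rows listed earlier do not by themselves reproduce the r = 0.702 quoted from the output, so treat them as illustration only.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation: sum of cross-products scaled by both sums of squares."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict of equally long columns."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


data = {
    "IQ": [102, 108, 109, 118, 79, 88, 85],
    "GPA": [2.75, 4.00, 2.25, 3.00, 1.67, 2.25, 2.50],
}
for (a, b), r in correlation_matrix(data).items():
    print(f"r({a}, {b}) = {r:.3f}")
```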

Linear Regression

In addition to the correlation, it is possible to output the regression coefficients needed to predict one variable from the other, that is, the coefficients that minimize the squared prediction error. To do so, one must select the [Statistics => Regression => Linear…] option. Further, one needs to decide which variable will be used as the dependent variable and which will be used as the independent variable(s). In our current example, GPA will be the dependent variable, and IQ will act as the independent variable. Specifically,

  • Initiate the procedure by clicking on [Statistics => Regression => Linear…]
  • Select and move GPA into the Dependent: variable box
  • Select and move IQ into the Independent(s): variable box
  • Click on [OK] to generate the statistics.

Note: A variety of options can be accessed via the buttons on the bottom half of this controlling dialog box (e.g., Statistics, Plots, …). Many more statistics can be generated by exploring the additional options via the Statistics button.
Some of the results of this analysis are presented below,

The correlation is still 0.702, and the p-value is still .000. The additional statistics are the "Constant", or a in the text, and the "Slope", or b in the text. Recall that the dependent variable in this case is GPA. As such, one can predict GPA with the following equation,

 GPA = -1.777 + 0.0448*IQ
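
The coefficients come from ordinary least squares: slope b = (sum of cross-products) / (sum of squares of IQ), and intercept a = mean(GPA) - b * mean(IQ). The Python sketch below fits those two numbers from paired lists and, separately, evaluates the prediction equation using the coefficients quoted above. As with the correlation, fitting only the seven rows listed earlier will not reproduce -1.777 and 0.0448.

```python
import statistics


def least_squares(x, y):
    """Return (intercept a, slope b) minimising the squared prediction error."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict_gpa(iq, intercept=-1.777, slope=0.0448):
    """Prediction equation using the coefficients quoted in the text."""
    return intercept + slope * iq


iq = [102, 108, 109, 118, 79, 88, 85]
gpa = [2.75, 4.00, 2.25, 3.00, 1.67, 2.25, 2.50]
a, b = least_squares(iq, gpa)
print(f"fitted on the listed rows: GPA = {a:.3f} + {b:.4f}*IQ")
print(f"predicted GPA for IQ = 100: {predict_gpa(100):.3f}")
```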

