Jennifer Ann Morrow, faculty member in Evaluation, Statistics, and Measurement at the University of Tennessee, recently blogged about data cleaning and data set preparation at AEA365. She describes 12 steps in her post here, which are excerpted below. Preparing data this way is a skill that all quantitative (and qualitative!) researchers should have.
She’ll be running a Professional Development workshop on the same topic at the upcoming Evaluation 2013 conference in Washington, DC.
1. Create a data codebook
   a. Datafile names, variable names and labels, value labels, citations for instrument sources, and a project diary
2. Create a data analysis plan
   a. General instructions, list of datasets, evaluation questions, variables used, and specific analyses and visuals for each evaluation question
3. Perform initial frequencies – Round 1
   a. Conduct frequency analyses on every variable
4. Check for coding mistakes
   a. Use the frequencies from Step 3 to compare all values with what is in your codebook. Double check to make sure you have specified missing values
5. Modify and create variables
   a. Reverse code (e.g., from 1 to 5 to 5 to 1) any variables that need it, recode any variable values to match your codebook, and create any new variables (e.g., total score) that you will use in future analyses
6. Frequencies and descriptives – Round 2
   a. Rerun frequencies on every variable and conduct descriptives (e.g., mean, standard deviation, skewness, kurtosis) on every continuous variable
7. Search for outliers
   a. Define what an outlying score is and then decide whether or not to delete, transform, or modify outliers
8. Assess for normality
   a. Check to ensure that your values for skewness and kurtosis are not too high and then decide on whether or not to transform your variable, use a non-parametric equivalent, or modify your alpha level for your analysis
9. Dealing with missing data
   a. Check for patterns of missing data and then decide if you are going to delete cases/variables or estimate missing data
10. Examine cell sample size
   a. Check for equal sample sizes in your grouping variables
11. Frequencies and descriptives – The finale
   a. Run your final versions of frequencies and descriptives
12. Assumption testing
   a. Conduct the appropriate assumption analyses based on the specific inferential statistics that you will be conducting.
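To make a few of these steps concrete, here is a minimal Python sketch of Steps 3 through 7 (frequencies, coding checks, reverse coding, descriptives, and an outlier screen). It is not taken from Morrow's post: the variable names, the codebook, the missing-value code of 9, the made-up responses, and the z-score cutoff of 3 are all assumptions chosen for illustration, and it uses only the standard library.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical codebook (Step 1): valid codes for each variable.
CODEBOOK = {"q1": {1, 2, 3, 4, 5}, "q2": {1, 2, 3, 4, 5}}
MISSING = 9  # assumed missing-value code

# Made-up survey responses, for illustration only.
rows = [
    {"q1": 1, "q2": 5},
    {"q1": 2, "q2": 4},
    {"q1": 7, "q2": 3},  # 7 is not a valid code for q1
    {"q1": 9, "q2": 2},  # 9 is the missing-value code
    {"q1": 5, "q2": 1},
]

def frequencies(rows, var):
    """Step 3: count how often each value of a variable occurs."""
    return Counter(row[var] for row in rows)

def coding_mistakes(rows, var):
    """Step 4: list observed values that are neither valid codes nor the missing code."""
    allowed = CODEBOOK[var] | {MISSING}
    return sorted(v for v in frequencies(rows, var) if v not in allowed)

def reverse_code(value, low=1, high=5):
    """Step 5: flip a value on a low..high scale; leave the missing code alone."""
    if value == MISSING:
        return value
    return low + high - value

def descriptives(values):
    """Step 6: basic descriptives for a continuous variable."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}

def outliers(values, z_cutoff=3.0):
    """Step 7: flag values more than z_cutoff standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return []
    return [v for v in values if abs(v - m) / s > z_cutoff]

freq_q1 = frequencies(rows, "q1")
mistakes_q1 = coding_mistakes(rows, "q1")
q2_reversed = [reverse_code(row["q2"]) for row in rows]
valid_q2 = [v for v in q2_reversed if v != MISSING]
desc_q2 = descriptives(valid_q2)
flagged_q2 = outliers(valid_q2)
```

Here `mistakes_q1` flags the stray 7 so it can be checked against the raw data before any recoding. The z-score rule is only one possible definition of an outlying score; Step 7 leaves that definition, and what to do with flagged values, to the analyst.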