
Jennifer Ann Morrow on 12 steps for cleaning and prepping a dataset

Jennifer Ann Morrow, a faculty member in Evaluation, Statistics, and Measurement at the University of Tennessee, recently blogged at AEA365 about data cleaning and dataset preparation. She describes 12 steps in her post, which are excerpted below. This is a skill that all quantitative (and qualitative!) researchers should have.

She’ll be running a Professional Development workshop on the same topic at the upcoming Evaluation 2013 conference in Washington, DC.

1. Create a data codebook
a. Datafile names, variable names and labels, value labels, citations for instrument sources, and a project diary
2. Create a data analysis plan
a. General instructions, list of datasets, evaluation questions, variables used, and specific analyses and visuals for each evaluation question
3. Perform initial frequencies – Round 1
a. Conduct frequency analyses on every variable
4. Check for coding mistakes
a. Use the frequencies from Step 3 to compare all values with what is in your codebook. Double check to make sure you have specified missing values
5. Modify and create variables
a. Reverse code any variables that need it (e.g., so that a 1-to-5 scale runs 5-to-1), recode any variable values to match your codebook, and create any new variables (e.g., total score) that you will use in future analyses
6. Frequencies and descriptives – Round 2
a. Rerun frequencies on every variable and conduct descriptives (e.g., mean, standard deviation, skewness, kurtosis) on every continuous variable
7. Search for outliers
a. Define what an outlying score is and then decide whether or not to delete, transform, or modify outliers
8. Assess for normality
a. Check to ensure that your values for skewness and kurtosis are not too high and then decide on whether or not to transform your variable, use a non-parametric equivalent, or modify your alpha level for your analysis
9. Dealing with missing data
a. Check for patterns of missing data and then decide if you are going to delete cases/variables or estimate missing data
10. Examine cell sample size
a. Check for equal sample sizes in your grouping variables
11. Frequencies and descriptives – The finale
a. Run your final versions of frequencies and descriptives
12. Assumption testing
a. Conduct the appropriate assumption analyses based on the specific inferential statistics that you will be conducting.
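
Several of the steps above can be sketched in a few lines of code. The example below is only an illustration, not Morrow's own procedure: it uses plain Python with the standard library rather than a statistics package, and the codebook, the variable names (group, item1, item2), the records, and the z-score cutoff are all made up for the sake of the example. The z-score rule is just one possible definition of an outlying score, and the skewness shown is the simple population (moment) version.

```python
from collections import Counter
from statistics import mean, pstdev, stdev

# Hypothetical codebook (Step 1): valid codes per variable. None = missing.
CODEBOOK = {
    "group": {1, 2},            # assumed: 1 = control, 2 = treatment
    "item1": {1, 2, 3, 4, 5},   # assumed 5-point scale
    "item2": {1, 2, 3, 4, 5},   # assumed 5-point scale, negatively worded
}

# Made-up records, for illustration only.
records = [
    {"group": 1, "item1": 4, "item2": 2},
    {"group": 1, "item1": 5, "item2": 1},
    {"group": 2, "item1": 3, "item2": None},
    {"group": 2, "item1": 9, "item2": 4},   # 9 is not a valid code
    {"group": 2, "item1": 2, "item2": 5},
]

def frequencies(rows, var):
    """Steps 3, 6, 10: count every observed value, including missing (None)."""
    return Counter(row[var] for row in rows)

def coding_mistakes(rows, var, codebook):
    """Step 4: observed non-missing values that are not in the codebook."""
    observed = {row[var] for row in rows if row[var] is not None}
    return sorted(observed - codebook[var])

def reverse_code(value, low=1, high=5):
    """Step 5: map low..high onto high..low; missing stays missing."""
    return None if value is None else low + high - value

def outliers(values, z_cutoff=3.0):
    """Step 7 (one possible rule): values with |z| above the cutoff."""
    m, s = mean(values), stdev(values)
    return [v for v in values if s > 0 and abs(v - m) / s > z_cutoff]

def skewness(values):
    """Step 8: population (moment) skewness, third moment / sd cubed."""
    m, s = mean(values), pstdev(values)
    return sum((v - m) ** 3 for v in values) / len(values) / s ** 3

def missing_counts(rows, variables):
    """Step 9: number of missing responses per variable."""
    return {var: sum(row[var] is None for row in rows) for var in variables}

# Example usage on the made-up records.
freq_item1 = frequencies(records, "item1")
bad_item1 = coding_mistakes(records, "item1", CODEBOOK)
item2_reversed = [reverse_code(r["item2"]) for r in records]
n_missing = missing_counts(records, CODEBOOK)
cell_sizes = frequencies(records, "group")

print("item1 frequencies:", dict(freq_item1))
print("item1 values not in codebook:", bad_item1)
print("item2 reverse coded:", item2_reversed)
print("missing per variable:", n_missing)
print("cell sizes by group:", dict(cell_sizes))
```

Here the frequency check flags the stray 9 on item1, which would then be corrected or set to missing before moving on to Round 2. In practice most analysts would run these checks in a statistics package; the functions above are only meant to make the logic of each step concrete.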

 


By Chi Yan Lam

Dr. Chi Yan Lam is a Credentialed Evaluator and an Adjunct Assistant Professor of evaluation at the Faculty of Education and the Faculty of Health Sciences, Queen’s University; he is also a full-time evaluator practicing in public service. He specializes in evaluating large-scale, complex programs and incorporates multi-, mixed- and design methods in his evaluations to answer questions of importance to program administrators and policy makers working on educational and social programs. His articles on evaluation have been published in peer-reviewed journals, including the American Journal of Evaluation and the Canadian Journal of Program Evaluation. He has been a holder of the professional designation in evaluation since 2014.
