tidy data

The three principles of tidy data are that each variable forms a column, each observation forms a row, and each type of observational unit forms a table. The aim of the tidy data approach is to make data easier to read and work with. This is useful when gathering historical data. For example, in class we created a spreadsheet of data taken from old death certificates. Clearly labeling the different categories of data makes the spreadsheet much easier to read. In some cases, separate columns could be merged into one for the sake of simplicity; in others, researchers may prefer additional categories for the sake of detail. So the organization of data is, to some extent, subjective.

The tidy data report is a guideline, not a strict requirement. It does raise a lot of interesting questions about different scenarios. For example, a variable may have data available for some observations but not for others. It is also possible that the way the data is organized may not transfer easily from one spreadsheet program to another.
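To make the point about missing data concrete, here is a minimal sketch in Python. It assumes a small set of death certificate records where one record lacks a cause of death. The field names and values are invented for illustration and are not taken from the class spreadsheet. Every row is given the same columns, gaps are marked explicitly, and the result is written out as plain CSV, which most spreadsheet programs should be able to open.

```python
import csv
import io

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"name": "Person A", "year_of_death": 1902, "cause": "pneumonia"},
    {"name": "Person B", "year_of_death": 1910},  # cause was not recorded
]

columns = ["name", "year_of_death", "cause"]

# Give every row the same columns, marking gaps explicitly with None.
tidy_rows = [{col: rec.get(col) for col in columns} for rec in records]

# Write the rows as CSV text; None is written as an empty cell.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(tidy_rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Leaving the missing cell empty, rather than dropping the row or the column, keeps one row per observation while still showing that the information was not available.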

One technique that I would like to adopt is avoiding column labels that look like codes. I find that people sometimes organize information in a way that makes sense to the writer but is harder for third-party readers to understand. One idea that the article presents is experimenting with different methods of organization. The writer seems to have a lot of experience with organizing data. Another technique that I admire is grouping observations that share the same name. The example the writer used was a list of songs ranked by popularity, as part of the ‘Billboard top 100’. The author appears to have stacked the different weeks that a song appeared on the list vertically, so that viewers do not have to scroll through the list horizontally. In contrast, if a set of variables is not too long, organizing the observations horizontally rather than vertically might be more aesthetically pleasing. So the structure of the data seems to be a subjective balancing act. With pattern recognition, I could imagine creating new variables, or associations between variables, where correlations were observed. From these observations, perhaps new data tables could be created.
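The vertical grouping of weeks can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the song titles, ranks, and column names such as wk1 are made up, and melt_weeks is a hypothetical helper, not something from the article. It takes a wide table with one column per week and turns it into a long table with one row per song per week.

```python
# Hypothetical wide-format rows: one column per week on the chart.
# Song titles and ranks are invented for illustration only.
wide_rows = [
    {"track": "Song A", "wk1": 87, "wk2": 82, "wk3": None},
    {"track": "Song B", "wk1": 91, "wk2": 87, "wk3": 92},
]


def melt_weeks(rows, id_key="track", prefix="wk"):
    """Turn each week column into its own (track, week, rank) row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            # Skip the id column and weeks where the song was not on the chart.
            if key.startswith(prefix) and value is not None:
                long_rows.append(
                    {
                        id_key: row[id_key],
                        "week": int(key[len(prefix):]),
                        "rank": value,
                    }
                )
    return long_rows


long_rows = melt_weeks(wide_rows)
for row in long_rows:
    print(row)
```

In the long form, "week" becomes a variable with its own clearly named column, instead of being hidden inside code-like labels, and adding more weeks makes the table longer rather than wider.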
