Tuesday, April 9, 2013

Spring Cleaning Data: 2 of 6 - Changing Column Names and Adding a Column

In the first post (found here) we downloaded the data and imported it into R using the gdata package. In this post we will change the column names to make them more reasonable and add a quarter variable. The column names need changing because those in the dw.2010.q1 file are messed up by the formatting done in Excel. If I had to change one file, I might as well change them all, so I did.

The first chunk of code defines the labels I am going to use as c.label. Then I used the colnames() function to rename the columns of each file.

#Defining the new labels
c.label<-c('loan.date', 'mat.date', 'term',
   'repay.date', 'district', 'borrower', 'city',
   'state', 'ABA', 'type.credit', 'i.rate',
   'amount', 'outstanding.credit',
   'total.outstanding', 'collateral',
   'commercial', 'residential.morg',
   'comm.real', 'consumer', 'treasury',
   'municipal', 'corp', 'mbs.cmo',
   'mbs.cmo.other', 'asset.backed',
   'internat', 'tdfd')
#Changing the column names
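
The renaming step itself does not appear above, so here is a minimal sketch of what it could look like. The data frames dw.a and dw.b and the short vector toy.label are hypothetical stand-ins so the example runs on its own. With the real data the call would presumably be colnames(dw.2010.q1) <- c.label, repeated for each quarterly file, but that is an assumption on my part.

```r
# Toy stand-ins for two imported files (hypothetical; the real data
# frames come from the first post and have one column per c.label entry)
toy.label <- c('loan.date', 'mat.date', 'term')
dw.a <- data.frame(X1 = 1:2, X2 = 3:4, X3 = 5:6)
dw.b <- data.frame(V1 = 7:8, V2 = 9:10, V3 = 11:12)

# Check that the label vector has one name per column before assigning
stopifnot(length(toy.label) == ncol(dw.a),
          length(toy.label) == ncol(dw.b))

# Assign the same labels to every data frame
colnames(dw.a) <- toy.label
colnames(dw.b) <- toy.label

colnames(dw.a)
```

Checking the length first is worth the extra line: a label vector that is too short leaves some columns with missing names rather than stopping with an obvious error.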

I also like to add extra variables when I see a potential need. At this point the files are still separate, so adding a quarter variable now might be helpful. Sure, I could write a loop to create the new column based on the month of the date, but I like to keep things as simple as possible. Why add complexity when there is no reason to? I used ABA to define the length of the data set because it did not have any missing values, while other columns did. (Strictly speaking, length() counts NA entries too, so any column, or nrow(), would give the same count.) The new column is named qtr, and the rep() function repeats the quarter number once for each element of the ABA column.

#defining a quarter variable for future use, so I can 
#isolate quarters to compare and contrast
dw.2010.q3$qtr<-rep(3, length(dw.2010.q3$ABA))
dw.2010.q4$qtr<-rep(4, length(dw.2010.q4$ABA))
dw.2011.q1$qtr<-rep(1, length(dw.2011.q1$ABA))
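
For comparison, the date-based alternative mentioned above does not actually need a loop. The sketch below derives the quarter from a date column in one vectorized step. The data frame toy is a hypothetical stand-in, and I am assuming loan.date is already stored as a Date (the real column may need converting with as.Date() first).

```r
# Hypothetical stand-in for a quarterly file; values are made up
toy <- data.frame(
  loan.date = as.Date(c('2010-07-01', '2010-09-30',
                        '2010-11-15', '2011-02-03')),
  ABA = c(11, 22, NA, 44)
)

# POSIXlt months run 0-11, so integer-divide by 3 and add 1
toy$qtr <- as.POSIXlt(toy$loan.date)$mon %/% 3 + 1

# length() counts NA entries, so it agrees with nrow() even with gaps
length(toy$ABA) == nrow(toy)

toy
```

This gives a calendar quarter based on the loan date, which may not match the quarter of the file a row came from, so the rep() approach is still the safer choice if the file's quarter is what matters.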
