Now that we have one data frame, it is time to make larger changes
to the data. The first is to get the dates into a format that R can understand.
The as.Date() function does this by taking the variable, then the pattern for
the date. At first I had a hard time figuring out what the pattern meant;
basically you are describing what the date looks like now in the data frame, not
what you want it to look like afterward.
For this data set the pattern is '%b %d %Y', or in other words Feb 01
2011. If the date looked like Feb-01-2011, then the pattern would be '%b-%d-%Y',
or if the date was 02-02-2011, then '%m-%d-%Y'. For a more comprehensive tutorial,
see the post on Quick-R.
#Changing the date variables, then
#isolating the year variable for later use
library(stringr)
dw$loan.date<-as.Date(dw$loan.date, '%b %d %Y')
dw$mat.date<-as.Date(dw$mat.date, '%b %d %Y')
dw$repay.date<-as.Date(dw$repay.date, '%b %d %Y')
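To see how the patterns described above behave, here is a minimal sketch that runs as.Date() on standalone strings instead of the dw data frame. The sample strings and the names d1, d2, d3 and bad are only for illustration. It also assumes an English locale, since %b matches month abbreviations in the current locale.

```r
# Each format string describes how the text looks now, before conversion.
# %b = abbreviated month name, %m = month number, %d = day, %Y = 4-digit year.
d1 <- as.Date("Feb 01 2011", format = "%b %d %Y")
d2 <- as.Date("Feb-01-2011", format = "%b-%d-%Y")
d3 <- as.Date("02-02-2011", format = "%m-%d-%Y")

# Once parsed, all three print in R's default YYYY-MM-DD form.
print(c(d1, d2, d3))

# A pattern that does not match the text gives NA rather than an error.
bad <- as.Date("Feb 01 2011", format = "%m-%d-%Y")
print(is.na(bad))
```

If a date column comes back full of NA values after conversion, a mismatched pattern is a likely cause and worth checking first.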
At this point, I like to add two extra variables, the year and the month, so I
can aggregate the data later for some nice results. I want the year to see if
there is a difference between years. I know there are only 2 years so far, but
new data will be released every quarter, so I am setting up the code for it now.
I want the month to see if there is any seasonality. I could also isolate the
day, but this gets messy because February has 28 or 29 days and the rest of the
months alternate between 30 and 31. The data is scattered and blotchy as is,
making the day too small a unit to be useful.
The code assumes the date has already been changed to the R default of YYYY-MM-DD. For the year, I selected the first 4 characters using the str_sub() function and made the result a numerical value with as.numeric(). For the year-and-month variable I used a similar process, except I kept both parts, and I made it a factor for easier sorting and categorizing.
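The code for this step does not appear in the post, so the following is only a sketch of what the paragraph describes, not the original code. The column names year and year.month and the two sample dates are assumptions made for illustration. The as.character() call is added so that str_sub() is given plain text.

```r
library(stringr)

# Hypothetical stand-in for the post's dw data frame, dates already converted
dw <- data.frame(loan.date = as.Date(c("2011-02-01", "2010-11-15")))

# Work on the text form of the date: "YYYY-MM-DD"
date.text <- as.character(dw$loan.date)

# Year: characters 1 to 4, stored as a number
dw$year <- as.numeric(str_sub(date.text, 1, 4))

# Year and month: characters 1 to 7 ("YYYY-MM"), stored as a factor
dw$year.month <- as.factor(str_sub(date.text, 1, 7))

str(dw)
```

Because the factor levels have the form YYYY-MM, they sort in calendar order, which is convenient when aggregating by month later on.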
The next step is to change the credit type to something simpler for tables and graphs. I used gsub(), one of the most interesting and fun functions I never knew existed until I did this. Basically it finds a pattern in a string and replaces it with another string. For this data I wanted to replace "Primary Credit" with "primary" because it makes things so much easier for graphs and tables. Then I changed the variable to a factor.
#Changing the type of credit to one word
dw$type.credit<-with(dw, gsub("Primary Credit", 'primary', type.credit))
dw$type.credit<-with(dw, gsub("Seasonal Credit", 'seasonal', type.credit))
dw$type.credit<-with(dw, gsub("Secondary Credit", 'secondary', type.credit))
#change to factor
dw$type.credit<-as.factor(dw$type.credit)
summary(dw)
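Since all three replacements above follow the same rule (drop the word "Credit" and lower-case the rest), they could possibly be done in one pass. This is an alternative sketch, not the method used in the post, and it assumes every value ends in " Credit". The sample vector is made up for illustration.

```r
# Hypothetical sample of the three credit type labels
type.credit <- c("Primary Credit", "Seasonal Credit",
                 "Secondary Credit", "Primary Credit")

# Remove the trailing " Credit", then lower-case what is left
short <- tolower(sub(" Credit$", "", type.credit))

# Convert to a factor and check the counts per level
short <- as.factor(short)
print(table(short))
```

The three separate gsub() calls are more explicit and leave unexpected values untouched, so they may still be the safer choice if the labels in new quarterly files ever change.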
Links to the previous posts (post 1, post 2, post 3)
Thanks so much for these posts: they're quite helpful. At the top of this post (4 of 6), you mention that it's "simple enough" to use the rbind function to combine the remaining files. For completeness, as well as for those who are new to R, could you post the code on how to do this as well?
Thanks!