Since today, 19 September, is Talk Like a Pirate Day, I would like to relate a story about stats and pirates. While I was driving, my wife, sitting in the back seat with our young child, asked me how to draw a pirate hat. Since I was driving I could not show her, so I had to find a way to describe it that we could both understand. My reply: "Draw a normal distribution curve."
She gave me a funny look, then drew it, half laughing to herself, and it worked. A simple pirate hat does look a lot like a normal distribution. My child was happy, and my wife and I were giggling to ourselves, because somehow I had managed to bring stats into a child's drawing. We were both amused. Below is the R code and a very crude picture.
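The plotting code itself was not preserved in this copy of the post, so here is a minimal sketch of what it likely looked like, using curve() and dnorm(); the styling choices are my own assumptions:

```r
# A pirate hat, aka the standard normal density curve
# (reconstruction -- the original plotting code was lost;
# axis labels and title are assumed)
curve(dnorm(x), from = -4, to = 4,
      xlab = '', ylab = '', main = 'A Pirate Hat')
```

Any bell-shaped density would do; the standard normal just happens to be one line of code.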
Analysis and tutorials for statistics, economics, and public policy using the statistical programming language R, from CRAN.
Thursday, September 19, 2013
Wednesday, May 29, 2013
Look Familiar? Mapping in R
For those who have been following R-Bloggers, this picture should be yesterday's news, but there was an article on the BBC about it, full story HERE. I find it interesting that it was on Drudge Report (www.drudgereport.com), which gets over 1 million hits a day. The article gives the perspectives of 5 different points of view; the 'data visualization' interpretation is pretty interesting, as are the rest. For those who want to do more than just look at a pretty picture, here are some links to help recreate a similar, if not better, map.
Links:
Mapping the World's Airlines
World's Largest Airlines
Data for Airlines
Tutorial on how to create the great circles
Friday, May 24, 2013
Down and Dirty Forecasting: Part 2
This is the second part of the forecasting exercise, where I am looking at a multiple regression. To keep it simple I chose the states that border WI, plus the US unemployment rate, for the regression. Again, this is a down and dirty analysis; I would not call it complete in any sense.
#Getting the data for WI, US, IL, MI, MN, IO
#Using Quandl data, great little site
wi<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/WIUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))
us<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/UNRATE.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))
il<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/ILUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))
mi<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/MIUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))
mn<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/MNUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))
io<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/IAUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))

#Merging the data into one data frame
#I started with WI so that I could isolate the rest of the variables better
unemp<-merge(wi, io, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io')
unemp<-merge(unemp, mi, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi')
unemp<-merge(unemp, mn, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn')
unemp<-merge(unemp, us, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn', 'us')
unemp<-merge(unemp, il, by='Date'); colnames(unemp)<-c('Date', 'wi', 'io', 'mi', 'mn', 'us', 'il')
save(unemp, file='Unemployment WI.RData')

library(car)
library(lmtest)
library(forecast)

load('Unemployment WI.RData')
unemp.b<-unemp[1:436,]
unemp.p<-unemp[437:448,]

plot(unemp$Date, unemp$us, col='black',
     main='Unemployment WI Region', type='l', xlab='Date',
     ylab='Unemp. Rate', ylim=c(2, 17))
lines(unemp$Date, unemp$il, col='blue')
lines(unemp$Date, unemp$io, col='dark green')
lines(unemp$Date, unemp$mi, col='purple')
lines(unemp$Date, unemp$mn, col='gray')
lines(unemp$Date, unemp$wi, col='red')
leg.txt<-c('US', 'IL', 'IO', 'MI', 'MN', 'WI')
leg.col<-c('black', 'blue', 'dark green', 'purple', 'gray', 'red')
legend('topleft', leg.txt, col=leg.col, pch=15)

summary(unemp)

#Correlation matrix
cm<-cor(unemp[,-1])
cm
#From the graph and correlation matrix, everything
#is highly correlated; the lowest r value is 0.7879,
#between Iowa and the US.

#Plots of the data, together and individually:
#everything looks really good, nice linear trends,
#although Michigan has quite a bit of volatility
#compared to the other variables.
plot(unemp.b, pch='.')

model<-lm(wi~., data=unemp.b[,-1])
step.m<-step(model, direction='backward')
summary(step.m)

#Testing for outliers
outlierTest(step.m)
#After conducting the outlier test, looks like it is Michigan
#in Sept 1980, where unemployment was 13.2 percent; point 57 is
#close at 12.9. Those are the two biggest, but the outlierTest
#isolated row 56.

dwtest(step.m)
dwtest(step.m, alternative='two.sided')
#Yep, lots of autocorrelation, so a regression with ARIMA errors might be in order

pred<-unemp.p[3:7]
mult.f<-predict(step.m, newdata=pred, interval='prediction')
a1<-accuracy(mult.f, unemp.p$wi)
#Because the MASE is missing, I needed to add the following to compare later
a1<-c(a1, '0')
names(a1)<-c('ME', 'RMSE', 'MAE', 'MPE', 'MAPE', 'MASE')

#ARIMA
auto.r<-auto.arima(unemp.b$wi, xreg=unemp.b[,3:7])
auto.r
auto.f<-forecast(auto.r, xreg=unemp.p[,3:7], h=12)
a2<-accuracy(auto.f, unemp.p$wi)
a2

#Just remember that the 0 is just a place holder
r.table<-rbind(a1, a2)
row.names(r.table)<-c('Multiple Regression', 'Reg. with ARIMA Errors')
r.table
At this point I combined the two accuracy tables (a.table is from Part 1) to see which model was best.
b.table<-rbind(a.table[,1:6], r.table)
b.table<-b.table[order(b.table$MAPE),]
b.table
The winner is: regression with ARIMA errors. For most of the analysis the simple mean-like predictions dominated; even the multiple regression is considerably low on the list.
Labels:
ARIMA,
ARIMA regression,
order,
quandl,
regression
Down and Dirty Forecasting: Part 1
I wanted to see what I could do in a hurry using the commands found at Forecasting: Principles and Practice. I chose a simple enough data set, Wisconsin unemployment from 1976 to the present (April 2013), and kept the last 12 months of data to test the accuracy of the models. The next blog post will include a multiple regression analysis. The analysis is lacking many important steps, particularly for the ARIMA, but this is a down and dirty exercise.
library(forecast)
library(lmtest)
library(caret)

#State unemployment, seasonally adjusted
#http://www.quandl.com/FRED-Federal-Reserve-Economic-Data/WIUR-Unemployment-Rate-in-Wisconsin
#Using Quandl data, great little site
wi<-read.csv('http://www.quandl.com/api/v1/datasets/FRED/WIUR.csv?&auth_token=gigXwpxd6Ex91cjgz1B7&trim_start=1976-01-01&trim_end=2013-04-01&sort_order=desc', colClasses=c('Date'='Date'))

#Some minor clean up
colnames(wi)<-c('date', 'rate')
wi$date<-as.Date(wi$date)
summary(wi)

#Base data: rows 1-436; test data: rows 437-448
wi.b<-wi[1:436,]
wi.p<-wi[437:448,]
wi.ts<-ts(wi.b$rate, start=c(1976, 1), frequency=12)
wi.p.ts<-ts(wi.p$rate, start=c(2012, 5), frequency=12)
plot.ts(wi.ts)

#Let's test some models
mean<-meanf(wi.ts, 12)
naive<-rwf(wi.ts, 12)
s.naive<-snaive(wi.ts, 12)
drift<-rwf(wi.ts, 12, drift=T)

#Linear fits
m1<-tslm(wi.ts~trend)
m2<-tslm(wi.ts~trend+season)

#Checking for autocorrelation
res1<-residuals(m1)
par(mfrow=c(1,2))
plot(res1, ylab='Residuals', xlab='Year')
Acf(res1, main='ACF of residuals')
res2<-residuals(m2)
par(mfrow=c(1,2))
plot(res2, ylab='Residuals', xlab='Year')
Acf(res2, main='ACF of residuals')
par(mfrow=c(1,1))

#Durbin-Watson test
dwtest(m1, alt='two.sided')
dwtest(m2, alt='two.sided')
#Yep, autocorrelation city! No surprise here, due to the nature of unemployment

#STL decomposition and ETS
m3<-stl(wi.ts, s.window='periodic')
plot(m3)
m4<-ets(wi.ts, model='ZZZ')
plot(m4)

#ARIMA
m5<-auto.arima(wi.ts)
plot(forecast(m5, h=12))

#Neural network
m6<-nnetar(wi.ts)
m6
plot(forecast(m6, h=12))

#Testing the accuracy of the first 4 models
a1<-accuracy(mean, wi.p.ts)
a2<-accuracy(naive, wi.p.ts)
a3<-accuracy(s.naive, wi.p.ts)
a4<-accuracy(drift, wi.p.ts)
a.table<-rbind(a1, a2, a3, a4)

#Creating the forecasts and accuracy for the next 6 models
f1<-forecast(m1, h=12)
f2<-forecast(m2, h=12)
f3<-forecast(m3, h=12)
f4<-forecast(m4, h=12)
f5<-forecast(m5, h=12)
f6<-forecast(m6, h=12)
a5<-accuracy(f1, wi.p.ts)
a6<-accuracy(f2, wi.p.ts)
a7<-accuracy(f3, wi.p.ts)
a8<-accuracy(f4, wi.p.ts)
a9<-accuracy(f5, wi.p.ts)
a10<-accuracy(f6, wi.p.ts)

#Combining into a table with row names
a.table<-rbind(a.table, a5, a6, a7, a8, a9, a10)
row.names(a.table)<-c('Mean', 'Naive', 'S. Naive', 'Drift', 'Lm~Trend', 'Lm~Trend+Sea', 'STL', 'ETS', 'ARIMA', 'Neuro')

#Make into a data frame so the best model is first, according to MAPE
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MAPE),]
a.table
Results so far: looks like the mean-like forecasts are doing the best; the fancy models are not doing very well.
Saturday, April 13, 2013
Spring Cleaning Data: 6 of 6- Saving the Data
With all the cleaning done, the only thing left to do is save the data to be analyzed, for future use, and I hope by others. I thought the data would be simple, but there were a few interesting twists, like the Primary Credit* and using ifelse() to edit the districts.
I have included the product as well as the R code in a single file for people to use and learn from. I would like to thank all those who made comments; I find all of them helpful. Below are the links to the files generated and used in the series, and the R code for exporting and reloading the data.
List of files used and their links
Previous Posts (Part 1, Part 2, Part 3, Part 4, Part 5)
- DiscountWindow.R -the code used to download and clean up discount window data
- Districts.csv -the table used in post 5 of 6 to separate the districts
- DiscountWindow.csv -a csv of the final product
- DiscountWindow.RData -RData of the final cleaned up product
#Export the data, csv and RData
setwd("C:/Users Defined/")
write.csv(dw, file='DiscountWindow.csv')
save(dw, file='DiscountWindow.RData')

#Note: when loading the data, the envir= needs to be defined
#With larger files the RData is definitely the way to go;
#this file is small enough that it does not matter
load('DiscountWindow.RData', envir=.GlobalEnv)
dw<-read.csv(file.choose(), header=T)
Friday, April 12, 2013
Spring Cleaning Data: 5 of 6- 2 ifelse vs Merge
This post in the data cleaning series looks at separating out the Federal Reserve districts. What I wanted was two additional columns: the name of the city and the number for each district. Since I was on a separation kick, I thought it would be fun to do this using the ifelse() function.
Previous Posts (Part 1, Part 2, Part 3, Part 4)
Well, what started out as a fun romp in the fields turned into an exercise in precision and frustration. It did end well, but it took too much time, and too many lines of code, to do what I wanted.
While I was banging my head against the keyboard in frustration, a thought occurred to me: instead of using the ifelse() function, create a table with the new columns of data, then merge the original data with that table. Two lines of code for both columns of data; definitely one of those eureka moments.
The lesson in all of this: ifelse() functions are fine within a limited range of use, I would say 5 nestings or fewer (unless you really like writing them, in which case have fun). When there is a limited number of categories, like the 12 districts in this example, the table works very well. What took me 2 hours of work using the ifelse() function took me 15 minutes using the table method, and the code is simpler and easier to understand. Sure, there is the extra table to be imported, but it is small and very manageable.
I have placed the code below, with the merge code first, followed by the ifelse() code. The table I used can be downloaded from here (District Data). Read the district data in using read.csv(), then merge the two files using 'district' as the column they have in common. In ifelse(logic, true, false), the logic tests whether the column matches one of the districts; if true, return 1/'Boston', and so on down the nesting, with an 'Error' value at the end just in case.
#Merging the data
dist<-read.csv(file.choose(), header=T)
dw<-merge(dw, dist, by='district')

#Re-coding the district data to numerical
tmp1<-ifelse(dw$district=='Boston (1)', 1,
      ifelse(dw$district=='New York (2)', 2,
      ifelse(dw$district=='Philadelphia (3)', 3,
      ifelse(dw$district=='Cleveland (4)', 4,
      ifelse(dw$district=='Richmond (5)', 5,
      ifelse(dw$district=='Atlanta (6)', 6,
      ifelse(dw$district=='Chicago (7)', 7,
      ifelse(dw$district=='St. Louis (8)', 8,
      ifelse(dw$district=='Minneapolis (9)', 9,
      ifelse(dw$district=='Kansas City (10)', 10,
      ifelse(dw$district=='Dallas (11)', 11,
      ifelse(dw$district=='San Francisco (12)', 12, 'Error'))))))))))))
dw$dist.no<-as.numeric(tmp1)

#Isolating the names, making a factor
tmp2<-ifelse(dw$district=='Boston (1)', 'Boston',
      ifelse(dw$district=='New York (2)', 'New York',
      ifelse(dw$district=='Philadelphia (3)', 'Philadelphia',
      ifelse(dw$district=='Cleveland (4)', 'Cleveland',
      ifelse(dw$district=='Richmond (5)', 'Richmond',
      ifelse(dw$district=='Atlanta (6)', 'Atlanta',
      ifelse(dw$district=='Chicago (7)', 'Chicago',
      ifelse(dw$district=='St. Louis (8)', 'St. Louis',
      ifelse(dw$district=='Minneapolis (9)', 'Minneapolis',
      ifelse(dw$district=='Kansas City (10)', 'Kansas City',
      ifelse(dw$district=='Dallas (11)', 'Dallas',
      ifelse(dw$district=='San Francisco (12)', 'San Francisco', 'Error'))))))))))))
dw$dist.city<-as.factor(tmp2)
Labels:
as.factor,
as.numeric,
ifelse,
merge,
new column,
r,
r-code,
read.csv
Thursday, April 11, 2013
Spring Cleaning Data: 4 of 6- Combining the files & Changing the Dates/Credit Type
So far the individual files have been left on their own; it is now time to combine them using the rbind() function, simple enough after all we have done so far, followed by a quick check with summary().
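The combining code did not survive in this copy of the post; here is a minimal sketch of the rbind() step just described, assuming the three quarterly data frames from the earlier posts (dw.2010.q3, dw.2010.q4, dw.2011.q1):

```r
# Stacking the three quarterly files into one data frame
# (rbind requires that all three have identical column names,
# which the c.label step in post 2 guarantees)
dw<-rbind(dw.2010.q3, dw.2010.q4, dw.2011.q1)

# Quick sanity check on the combined data
summary(dw)
```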
Links to the previous posts (post 1, post 2, post 3)
Now that we have one data frame, it is time to make larger changes to the data. The first is to get the dates into a format that R can understand. The as.Date() function does this: you name the variable, then give the pattern for the date. At first I had a hard time figuring out what each part meant; basically, you are describing what the date looks like now in the data frame, not what you want it to become.
For this data set the pattern is '%b %d %Y', in other words Feb 01 2011. If the date looked like Feb-01-2011, the code would be '%b-%d-%Y'; if the date were 02-02-2011, then '%m-%d-%Y'. For a more comprehensive tutorial, see the post on Quick-R.
#Changing the date variables, then
#isolating the year variable for later use
library(stringr)
dw$loan.date<-as.Date(dw$loan.date, '%b %d %Y')
dw$mat.date<-as.Date(dw$mat.date, '%b %d %Y')
dw$repay.date<-as.Date(dw$repay.date, '%b %d %Y')
At this point, I like to add a couple of extra variables so I can aggregate the data later for some nice results, in particular the year and the month. For the year, I want to know if there is a difference between years. I know there are only 2 years so far, but every quarter new data will be released, so I am setting up the code for it now. For the month, I want to know if there is any seasonality. I could also isolate the day, but that gets messy, because February has 28/29 days and the rest of the months fluctuate between 30 and 31. The data is scattered and blotchy as is, making the day too small a unit to be useful.
The code assumes the date has been changed to the R default of YYYY-MM-DD. For the year, I selected the first 4 characters using the str_sub() function, then made the result a numerical value with as.numeric(). For the year-and-month variable I made it a factor for easier sorting and categorizing, with a similar process as above, except I keep both parts.
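The extraction code itself is missing from this copy, so here is a sketch of the str_sub() approach described above, assuming the stringr package and YYYY-MM-DD dates; the new column names (year, yr.mo) are my own:

```r
library(stringr)

# Year as a number: the first 4 characters of the date
# (hypothetical column names -- the original code was lost)
dw$year<-as.numeric(str_sub(as.character(dw$loan.date), 1, 4))

# Year and month together as a factor, e.g. '2010-08':
# the first 7 characters, for sorting and aggregating
dw$yr.mo<-as.factor(str_sub(as.character(dw$loan.date), 1, 7))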
The next step is to change the credit type to something simpler for tables and graphs. I used gsub(), one of the most interesting and fun functions I never knew existed until I did this. Basically, it takes a string and replaces it with another. For this data I wanted to replace "Primary Credit" with "primary", because it makes things so much easier for graphs and tables. Then I changed the column to a factor.
#Changing the type of credit to one word
dw$type.credit<-with(dw, gsub("Primary Credit", 'primary', type.credit))
dw$type.credit<-with(dw, gsub("Seasonal Credit", 'seasonal', type.credit))
dw$type.credit<-with(dw, gsub("Secondary Credit", 'secondary', type.credit))

#Change to factor
dw$type.credit<-as.factor(dw$type.credit)
summary(dw)
Labels:
as.Date,
as.factor,
as.numeric,
dates,
dates and r,
gsub,
r,
r-code,
stringr,
with
Wednesday, April 10, 2013
Spring Cleaning Data: 3 of 6- The Little but Big Correction
Building on the previous posts (post 1 & post 2), I found there were 12 instances in the 2010 Q4 data where the type of credit was "Primary Credit*", which means the lender borrowed twice in the same day. It would seem simple enough in Excel: use the filter, find the 12 instances, delete the "*", and be done. In R this turned out to be much more difficult, mainly because the "*" is a special character in regular expressions, so it is hard to match literally. What seemed to be a minor correction turned into something much bigger. Why would I go through all the trouble?
Mainly because I know there will be cases where I am dealing with data much larger than this, data that will not fit into Excel, where I will have to make corrections in R because I will not be able to do it otherwise. Plus, it is a good way to hone my data manipulation skills.
I tried the following approaches, but failed with each (suggestions for improvement are welcomed):
- gsub(): did not work because of the '*'
- variable[condition]<-replacement: looks for a numerical value, of which there is none
- ifelse(): did not distinguish between Primary and Primary*
- The stringr package, to try to delete the last character, but it did not work
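In the spirit of the suggestions asked for above: the reason gsub() chokes on the "*" is that it is a regex quantifier, so one hedged alternative is to either escape it or tell gsub() to treat the pattern as a literal string with fixed=TRUE. A small sketch (not the approach the post ultimately used):

```r
# Sample values mimicking the problem column
x<-c('Primary Credit', 'Primary Credit*')

# Option 1: fixed=TRUE makes gsub match the pattern literally,
# so the '*' is just a character, not a quantifier
x1<-gsub('Primary Credit*', 'Primary Credit', x, fixed=TRUE)

# Option 2: escape the '*' in the regex and strip it from the end
x2<-gsub('\\*$', '', x)
```

Either way, both elements come out as 'Primary Credit', with no subset/rep/rbind dance required.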
So what I did was use the subset() function to flush the problem rows out. Knowing there were 12, I replaced the wrong values with the correction using the rep() function. Then I made sure the column was a factor using as.factor().
#Correcting the Primary Credit*
#Step 1: isolate the rows of data
tmp<-subset(x=dw.2010.q4, subset=type.credit=='Primary Credit*')

#Step 2: change the column of data to the correct label
tmp$type.credit<-rep('Primary Credit', 12)

#Step 3: make sure the data is a factor
tmp$type.credit<-as.factor(tmp$type.credit)
summary(tmp)
I then erased the problem rows from the original data using file[-which(file$variable=='Primary Credit*'),]. Note the '-' before the which() to remove those rows. Then I added the cleaned-up rows back to the data with rbind(). I am not particularly interested in keeping the data in any particular order, as the date will keep things chronological.
#Step 4: remove the Primary Credit* rows from the original data,
#then add the corrected rows back (per the description above)
dw.2010.q4<-dw.2010.q4[-which(dw.2010.q4$type.credit=='Primary Credit*'),]
dw.2010.q4<-rbind(dw.2010.q4, tmp)
summary(dw.2010.q4)
Labels:
factor,
r,
r-code,
rbind,
remove row,
remove rows,
subset,
summary
Tuesday, April 9, 2013
Spring Cleaning Data: 2 of 6- Changing Column Names and Adding a Column
In the first post (found here) we downloaded the data and imported it into R using the gdata package. In this post we will change the column names to make them more reasonable, and add a quarter variable. The reason for changing the column names is that the dw.2010.q1 file's column names are messed up, due to the formatting done in Excel. So if I was going to have to change one, I might as well change them all, so I did.
The first chunk of code defines the labels I am going to use as c.label. Then I used the colnames() function to rename the columns of each file.
#Defining the new labels
c.label<-c('loan.date', 'mat.date', 'term', 'repay.date', 'district',
           'borrower', 'city', 'state', 'ABA', 'type.credit', 'i.rate',
           'amount', 'outstanding.credit', 'total.outstanding', 'collateral',
           'commercial', 'residential.morg', 'comm.real', 'consumer',
           'treasury', 'municipal', 'corp', 'mbs.cmo', 'mbs.cmo.other',
           'asset.backed', 'internat', 'tdfd')

#Changing the column names
colnames(dw.2010.q3)<-c.label
colnames(dw.2010.q4)<-c.label
colnames(dw.2011.q1)<-c.label
I also like to add a few additional variables when I see a potential need. At this point the files are still separate, and adding a quarter variable might be helpful. Sure, I could write a loop to create the new column based on the month of the date, but I like to keep things as simple as possible; why add complexity when there is no reason? I used the ABA column to define the length of the data set because it did not have any missing values, while other columns did. The new column name is qtr, and the function rep() is used to repeat the quarter number down the length of the ABA column.
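The column-adding code is not present in this copy of the post, so here is a sketch of the rep() step just described; the quarter numbers are inferred from the file names, and the qtr column name is taken from the text:

```r
# Adding a quarter column to each file, repeated for the length
# of the ABA column (chosen because it has no missing values)
# -- reconstruction of the step described in the text
dw.2010.q3$qtr<-rep(3, length(dw.2010.q3$ABA))
dw.2010.q4$qtr<-rep(4, length(dw.2010.q4$ABA))
dw.2011.q1$qtr<-rep(1, length(dw.2011.q1$ABA))
```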
Monday, April 8, 2013
Spring Cleaning Data: 1 of 6- Downloading the Data & Opening Excel Files
With spring in the air, I thought it would be fun to do a series on (spring) cleaning data. The posts will follow my efforts to download the data, import it into R, clean it up, merge the different files, add columns of created information, and then export a master file. Along the way I will offer different ways to do things; this is an attempt to show that there is no one way of doing something, but several. When appropriate I will demonstrate as many as I can think of, given the data.
This series of posts will focus on the Discount Window of the Federal Reserve. I know I seem to be picking on the Fed, but I am genuinely interested in what they have. The fact that there is data on the discount window at all, to be blunt, took legislation from Congress to achieve. The first step in this project was to find the data. The data and additional information can be downloaded here.
The data is in Excel format with 3 sheets: the first sheet is the information I am interested in, the second is a definition sheet, and the last is a list of interest rates (which are already in the first sheet). At present there are 3 Excel files: 2010 Q3, 2010 Q4, and 2011 Q1. The data is made available with about a 2-year lag.
There are 2 approaches when it comes to downloading the
data.
- Download the data, clean it up in Excel, save the one sheet as a *.csv and open using read.csv(file, header=T)
- Download the data using R, import using the gdata package, and clean the data up using R
With a data set where I only have a couple of files, and they are relatively small (fewer than 1,000 rows each), I will choose option 1. But if there are more than about 10 files, each with more than 1,000 rows, I will start using R more. With the discount window I would normally use Excel, because the data does not come out very often and the files are not that big.
But where would the fun be in programming in R?
The first thing to do is download Perl (assuming you don't have it) from this link; the gdata package needs Perl to work. Next, install the gdata package, then find the location of your perl.exe (on Windows) and have it handy.
The code below begins by setting up the libraries needed, then defining the links to be used. The final section of code is the gdata function that opens the Excel files.
# Federal Reserve Data- Discount Window
# http://www.federalreserve.gov/newsevents/reform_discount_window.htm
# Downloading and coding Discount Window data
# from the Federal Reserve
library(gdata)

# Defining the individual links
link1<-'http://www.federalreserve.gov/newsevents/files/dw_data_2010_q3.xls'
link2<-'http://www.federalreserve.gov/newsevents/files/dw_data_2010_q4.xls'
link3<-'http://www.federalreserve.gov/newsevents/files/dw_data_2011_q1.xls'
Here dw.2010.q3 is the label for the file; read.xls() is the gdata function; link1 refers to the internet address defined above; sheet=1 means we want the first sheet; skip=3 skips the first 3 lines; and perl='' is the location of the perl.exe mentioned above.
# Downloading the files and opening in R
# Make sure to have Perl installed before using the
# read.xls() function
dw.2010.q3<-read.xls(link1, sheet=1, skip=3, perl="C:/location/bin/perl.exe")
dw.2010.q4<-read.xls(link2, sheet=1, skip=3, perl="C:/location/bin/perl.exe")
dw.2011.q1<-read.xls(link3, sheet=1, skip=3, perl="C:/location/bin/perl.exe")

# Checking to see if it worked
summary(dw.2010.q3)
summary(dw.2010.q4)
summary(dw.2011.q1)