Monday, April 8, 2013

Spring Cleaning Data: 1of 6- Downloading the Data & Opening Excel Files

With spring in the air, I thought it would be fun to do a series on (spring) cleaning data. The posts will follow my efforts to to download the data, import into R, cleaned it up, merge the different files, add columns of information created, and then a master file exported. During the process I will be offering at times different ways to do things, this is an attempt to show how there is no one way of doing something, but there are several. When appropriate I will demonstrate as many as I can think of, given the data.

This series of posts will be focusing on the Discount Window of the Federal Reserve. I know I seem to be picking on the Feds, but I am genuinely interested in what they have. The fact that there is data on the discount window is, to be blunt, took legislation from congress to get. The first step in this project was to find the data. The data and additional information can be downloaded here.

The data is in Excel format, with 3 sheets, the first sheet is the information I am interested in, the second is a definition sheet, and then the last is a list of interest rates (which are in the first sheet). At the present date there are 3 excel files, 2010- 3 quarter, 2010- 4 quarter, and 2011- 1 quarter. The data is made available with about a 2 year lag.

There are 2 approaches when it comes to downloading the data.
  1. Download the data, clean it up in Excel, save the one sheet as a *.csv and open using read.csv(file, header=T)
  2. Download the data using R, import using the gdata package, and clean the data up using R

With a data set where I only have a couple of files to do, and they are relatively small (less than 1,000 rows of data) I will chose option 1. But if there are more than like 10 files, with more than 1,000 rows of data, I will start using R more. With the discount window I would normally use Excel because the data does not come out very often, and the files are not that big. 

But where would the fun be in programming in R?

The first thing to do is download Perl (assuming you don't have it) from this link.

The reason is the gdata package needs to have perl to work. Next install the gdata package, and then find the location for your perl.exe (if windows) and have it handy.

The code below begins with setup the libraries needed, then defining the links to be used. The final section of code is the gdata function to open the files that are in Excel.


# Federal Reserve Data- Discount Window
# http://www.federalreserve.gov/newsevents/reform_discount_window.htm
# Downloading and coding Discount Window data 
# from the Federal Reserve
library(gdata)
 
# Defining the individual links
link1<-"http://www.federalreserve.gov/newsevents/files/dw_data_2010_q3.xls"
link2<-'http://www.federalreserve.gov/newsevents/files/dw_data_2010_q4.xls'
link3<-'http://www.federalreserve.gov/newsevents/files/dw_data_2011_q1.xls'

The dw.2010.q3 is the label for the file, the read.xls is the gdata function, link1 refers to the internet address defined above, sheet=1 means we want the first sheet, then I need to skip the first 3 lines, perl='' is the location of the perl.exe that I mentioned would be needed.
# Downloading the files and opening in R
# Make sure to have Perl installed before using the 
# read.xls() function
dw.2010.q3<-read.xls(link1, sheet=1, skip=3, perl="C:/location/bin/perl.exe")
dw.2010.q4<-read.xls(link2, sheet=1, skip=3, perl="C:/location/bin/perl.exe")
dw.2011.q1<-read.xls(link3, sheet=1, skip=3, perl="C:/location/bin/perl.exe")
 
# checking to see if it worked
summary(dw.2010.q3)
summary(dw.2010.q4)
summary(dw.2011.q1)
Created by Pretty R at inside-R.org

2 comments:

  1. Good start.

    I hadn't ever seen the gdata package before, and I'd always avoided
    using Excel spreadsheets in R as I heard they were a total bear to manage.

    Looking forward to more in the series.

    ReplyDelete
  2. Very helpful! Thanks for sharing.

    ReplyDelete