About 8 years ago, I was sitting in class listening to a
guest lecturer talk about how community events can be described like celestial
bodies with their own gravity, where the size and importance of the event
attract more people from farther away. This is much like a black hole: the
greater its mass, the stronger its gravity.
In physics, gravity follows from mass; for a community event, the gravity can be
determined from the number of participants and the distance they traveled. The
higher the number of participants and the greater the distance traveled, the
higher the event's gravity. For example, compare a farmers market that only
locals attend with an international conference of some kind. Assuming both
events attract the same number of people, the international conference would
have the higher gravity because its participants traveled a larger distance.
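To make the comparison concrete, here is a minimal sketch that scores each event by the total distance its participants traveled. The scoring rule, the function name `event_gravity`, and the attendance and distance figures are all assumptions made up for illustration; the lecture did not give a formula.

```r
# Hypothetical gravity score: total distance traveled by all participants.
# The scoring rule and every number below are illustrative assumptions.
event_gravity <- function(distances_miles) {
  sum(distances_miles)
}

market     <- rep(3, 200)     # 200 locals, about 3 miles each
conference <- rep(1500, 200)  # 200 attendees, about 1500 miles each

event_gravity(market)      # 600
event_gravity(conference)  # 300000
```

With attendance held equal, the score is driven entirely by distance, which is the point of the farmers market example.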
Of the two elements, the number of participants is easy to find: from the
number of seats or tickets sold, a count of the people present, and so on. The
distance traveled is more difficult to determine. It requires two points, the
event (destination) and the point of origin (the person's home). The most
accurate method would be to get GPS coordinates for every home, but this would
be very difficult: most people do not know their coordinates, and they are
probably not willing to divulge such information. An alternative is to request
an address, but people are becoming much more savvy about personal information,
as they should be, so getting a good sample might be problematic. The solution
then lies with the zip code, which can be matched to a latitude and longitude
and is broad enough that people do not feel their privacy is being invaded,
while still giving a reasonable distance for each participant.
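A minimal sketch of the zip code approach might look like the following. The lookup table, the zip codes, and all coordinates are placeholders invented for the example (they are not real centroids), and the column names are assumptions; a real analysis would need an actual zip-to-coordinate table.

```r
library(geosphere)

# Hypothetical zip-code lookup table; zip codes and coordinates are
# placeholders, not real centroids.
zip_lookup <- data.frame(
  zip = c("00001", "00002", "00003"),
  lon = c(-111.9, -112.0, -104.9),
  lat = c(40.7, 41.2, 39.7),
  stringsAsFactors = FALSE
)

# Hypothetical sign-in sheet: one zip code per participant.
participants <- data.frame(
  zip = c("00001", "00003", "00002", "00001"),
  stringsAsFactors = FALSE
)

# Event location as c(longitude, latitude); also a placeholder.
event <- c(-111.9, 40.8)

# Attach coordinates to each participant by matching on zip code.
idx <- match(participants$zip, zip_lookup$zip)
origins <- cbind(zip_lookup$lon[idx], zip_lookup$lat[idx])

# Distance from each home zip code to the event, meters converted to miles.
participants$miles <- distVincentyEllipsoid(origins, event) * 0.000621371192
participants
```

Everyone in the same zip code gets the same distance, which is the trade-off for not asking for a home address.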
Using the zip code and the associated latitude and longitude, the numbers can
then be put into R code to draw fantastic great-circle maps, inspired by
Oscar Perpiñán Lamigueiro.
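As a rough idea of what drawing one such line involves, the sketch below uses `gcIntermediate` from geosphere and `map` from the maps package. The two coordinate pairs are placeholders, and this is only an assumption about how a map could be drawn, not the code behind the maps mentioned above.

```r
library(geosphere)
library(maps)

# Placeholder endpoints as c(longitude, latitude).
origin <- c(-118.41, 34.11)
event  <- c(-73.94, 40.67)

# 50 intermediate points plus both endpoints along the great circle.
arc <- gcIntermediate(origin, event, n = 50, addStartEnd = TRUE)

# Draw the state outlines, then the arc and its two endpoints on top.
map("state", col = "grey70")
lines(arc, col = "steelblue", lwd = 2)
points(rbind(origin, event), pch = 19)
```

For many participants, the same call can be repeated once per origin, adding each arc with `lines`.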
One question did come out of the data: which of the several methods should be
used when drawing the lines and determining the distance?
The question arises out of each equation's assumption about the shape of the earth. The geosphere package offers three equations. The first two, Haversine and Vincenty Sphere, both assume the earth is a sphere. As it turns out, the earth is closer to an ellipsoid, so there is also the Vincenty Ellipsoid. What I wanted to know was: is there a big difference between the formulas? And if so, how big?
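To show what the spherical assumption looks like in practice, here is a hand-written Haversine function compared against geosphere for a single pair of points. The function name `haversine_m` and the two coordinate pairs are assumptions for illustration; the radius of 6378137 meters matches the default used by `distHaversine`.

```r
library(geosphere)

# Haversine great-circle distance on a sphere of radius r (meters).
# p1 and p2 are c(longitude, latitude) in degrees.
haversine_m <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[1] * to_rad; lat1 <- p1[2] * to_rad
  lon2 <- p2[1] * to_rad; lat2 <- p2[2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(min(1, sqrt(a)))
}

# Placeholder points as c(longitude, latitude).
a_pt <- c(-118.41, 34.11)
b_pt <- c(-73.94, 40.67)

haversine_m(a_pt, b_pt)            # hand-written, sphere
distHaversine(a_pt, b_pt)          # geosphere, same sphere assumption
distVincentyEllipsoid(a_pt, b_pt)  # geosphere, WGS84 ellipsoid
```

The first two values should agree, while the ellipsoid result differs because it does not treat the earth as a perfect sphere.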
library(geosphere)
library(maps)
data(us.cities)

# Setting up the data. 'ny' is the long. and lat. of the reference city
# (labelled New York City in this post; note that -118.41, 34.11 actually
# falls in the Los Angeles area). 'all' is a matrix of the long. and lat. of
# every city in us.cities from the maps package (1005 of them).
ny <- c(-118.41, 34.11)
all <- matrix(data = c(us.cities$long, us.cities$lat), ncol = 2)

# Summing the distance between the reference city and all the other cities
# in the US (1005 of them); by so doing the error is compounded with each
# additional city. proc.time() is cumulative, so each snapshot is the
# running total after that call.
hav <- sum(distm(ny, all, fun = distHaversine))
hav.time <- proc.time()
v.sphere <- sum(distm(ny, all, fun = distVincentySphere))
v.sphere.time <- proc.time()
v.ellip <- sum(distm(ny, all, fun = distVincentyEllipsoid))
v.ellip.time <- proc.time()
hav.time; v.sphere.time; v.ellip.time

# Processor times as recorded in the original run (hard-coded).
# Renamed so the vectors do not shadow proc.time() and row.names().
proc.times <- c(1.350, 1.350, 2.510)
method.names <- c('Haversine', 'Vincenty.Sphere', 'Vincenty.Ellipsoid')
ny.all <- rbind(hav, v.sphere, v.ellip)
ny.all <- cbind(ny.all, proc.times)
rownames(ny.all) <- method.names
colnames(ny.all) <- c('Sum Distance', 'Processor Time')
ny.all

# Determining the difference between the various models available in the
# geosphere package. Meters were converted into miles; the largest difference
# between the models was approximately 1090 miles, or 1.085326 miles per city
# of difference, which is considerable.
hav.v.ellp <- (v.ellip - hav) * 0.000621371192
hav.v.sphere <- abs(hav - v.sphere) * 0.000621371192
hav.v.ellp; hav.v.sphere
dist.diff <- rbind(hav.v.ellp, hav.v.sphere)
rownames(dist.diff) <- c('Haversine-Vincenty.Ellipsoid', 'Haversine-Vincenty.Sphere')
colnames(dist.diff) <- 'Distance (miles)'
dist.diff

# What is the average error per city?
hav.v.ellp / 1005
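Because `proc.time()` snapshots are cumulative, one alternative way to time each method separately is `system.time()`. The sketch below is an assumption about how the timing could be done, not how the figures above were produced; the helper `time_method` and the variable names are hypothetical, and the results will vary by machine.

```r
library(geosphere)
library(maps)
data(us.cities)

# Same reference coordinates and city matrix as in the code above.
ref <- c(-118.41, 34.11)
all_cities <- cbind(us.cities$long, us.cities$lat)

# Hypothetical helper: elapsed seconds for one distance method.
time_method <- function(fun) {
  system.time(distm(ref, all_cities, fun = fun))[["elapsed"]]
}

timings <- c(
  Haversine          = time_method(distHaversine),
  Vincenty.Sphere    = time_method(distVincentySphere),
  Vincenty.Ellipsoid = time_method(distVincentyEllipsoid)
)
timings
```

Each entry measures only its own call, so the three numbers can be compared directly.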
In the end, the Vincenty Ellipsoid was used as the method for determining the
distance, as it is the most precise: it differed from Haversine by an average
of 1.0853 miles per city. This is a significant margin of error when many
cities are being analyzed, so the extra computing time is worth it.
The next post will show how the data can be used to analyze two different community events.