R-membering How To Code
It's an interesting exercise to see if you can still code in a language that you haven't used for a while. It's been three years since I used the R language for statistical analysis. I knew it was the right choice for the monthly analysis of my London bus usage, so yesterday I downloaded a copy, built a script (see below) and ran it.
It all went well. Admittedly, it was a case of Computer Programming To Be Officially Renamed 'Googling Stackoverflow' as there was almost nothing I remembered straight off. But it was quicker to re-learn the right tool than to push myself to make an Excel-based solution or to write a C# program from scratch.
Of course, you're dying to know the results. Over ten months I have taken 652 bus journeys on 54 different routes. There are 12 routes I have used ten or more times, with the 45 bus (Camberwell to Farringdon) the most frequent at 131 times.
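# ctrl-L to clear console
rm(list = ls(all.names = TRUE))

# Read every CSV export in the folder and stack them into one data frame.
# pattern is a regular expression, so match the extension explicitly.
files <- list.files(path = "c:/documents/oystercard/", pattern = "\\.csv$", full.names = TRUE)
all <- do.call("rbind", lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))

# Keep only the bus journeys.
y <- all[ substr( all$Journey.Action, 1, 11 ) == "Bus journey", ]
y <- droplevels(y)
nrow(y)
# Journey.Action is character (not a factor), so count distinct values directly.
length(unique(y$Journey.Action))

# The route number follows the 19-character prefix of Journey.Action.
# y$Route = factor(substr(y$Journey.Action,20,23))
y$Route <- substr(y$Journey.Action, 20, 23)

# Build a sortable year-month key (e.g. 201501) from the Date column.
y$Month <- factor(substr(y$Date, 4, 6))
y$Year <- as.integer(substr(y$Date, 8, 11))
# Expand two-digit years; index both sides so the lengths match.
y$Year[ y$Year < 2000 ] <- 2000 + y$Year[ y$Year < 2000 ]
y$M <- match(y$Month, month.abb)
y$YM <- y$Year * 100 + y$M

# Journeys per month.
summary <- as.data.frame(table(y$YM))
colnames(summary)[1] <- "YM"
colnames(summary)[2] <- "Journeys"

# Distinct routes per month, merged with the journey counts.
s2 <- as.data.frame(table(y$YM, y$Route))
s2 <- s2[ s2$Freq > 0, ]
s2 <- as.data.frame(table(s2$Var1))
colnames(s2) <- c( "YM", "Routes" )
summary <- merge( s2, summary, by = "YM" )

# Journeys per route, and the routes used ten or more times.
ByRoute <- as.data.frame(table(y$Route))
colnames(ByRoute) <- c( "Route", "Freq" )
PopularRoutes <- ByRoute[ ByRoute$Freq >= 10, ]

# Headline figures; count the bus journeys only, not every row in the export.
stats <- NULL
stats[ "Number of journeys" ] <- nrow(y)
stats[ "Different routes" ] <- nrow(ByRoute)
stats <- as.data.frame(stats)

summary
PopularRoutes[ with( PopularRoutes, order( -Freq ) ), ]
stats

# Convert the route factor to numbers for sorting; non-numeric routes become NA.
ByRoute$Route <- as.numeric(levels(ByRoute$Route))[ByRoute$Route]
ByRoute[ with( ByRoute, order( Route ) ), ]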
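For anyone who wants to try the core steps without an Oyster export to hand, here is a minimal sketch of the same idea on a tiny made-up data frame. The rows, the non-bus description and the names `journeys`, `bus`, `by_month` and `by_route` are all invented for illustration; only the `Date` and `Journey.Action` columns and the substring positions come from the script above.

```r
# Toy data standing in for the real CSV export (all values are invented).
journeys <- data.frame(
  Date = c("03-Jan-2015", "03-Jan-2015", "17-Feb-2015", "18-Feb-2015"),
  Journey.Action = c("Bus journey, route 45",
                     "Some non-bus entry",        # hypothetical non-bus row
                     "Bus journey, route 45",
                     "Bus journey, route 176"),
  stringsAsFactors = FALSE
)

# Keep only rows whose description starts with "Bus journey".
bus <- journeys[startsWith(journeys$Journey.Action, "Bus journey"), ]

# "Bus journey, route " is 19 characters long, so the route starts at 20.
bus$Route <- trimws(substr(bus$Journey.Action, 20, 23))

# Build a sortable year-month key such as 201501.
bus$YM <- as.integer(substr(bus$Date, 8, 11)) * 100 +
  match(substr(bus$Date, 4, 6), month.abb)

# Journeys per month and per route.
by_month <- as.data.frame(table(YM = bus$YM), responseName = "Journeys")
by_route <- as.data.frame(table(Route = bus$Route), responseName = "Freq")

by_month
by_route[order(-by_route$Freq), ]
```

Naming the dimension inside `table()` and passing `responseName` saves the separate `colnames()` calls, which is purely a matter of taste.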