Ever wished the eBird website would take your state or county lists and tell you what you’re missing? What birds you need, ranked by how easily they might be found in a given state or county during a given week? Well, this past weekend I finally sat down and rigged some R code to do just that!
Here are my top 50 target birds according to the eBird data. The first (top) figure ranks them by frequency of reports in eBird throughout the entire year. The second (bottom) figure ranks them by frequency of reports from last “week” (quarter month) through 2 “weeks” from now. If you were wondering, the last week in October is Week 40.
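For reference, here is the quarter-month arithmetic the script below uses, pulled out into two small functions. Treat it as a sketch. The names `week_index()` and `week_window()` are made up for illustration. The wrap-around at the ends of the year is also an assumption about how the window should behave: it lets “last week” in early January reach back into December.

```r
## Quarter-month "week" numbering (1-48): month m, quarter q (1-4) -> (m - 1) * 4 + q.
## Hypothetical helper names; the arithmetic matches the script's month/week lines.
week_index <- function(date = Sys.Date()) {
  month <- as.numeric(format(date, "%m"))
  quarter <- ceiling(as.numeric(format(date, "%d")) / 31 * 4)  # 1, 2, 3 or 4
  (month - 1) * 4 + quarter
}

## "Last week through 2 weeks from now", wrapped around the year
## (assumption: week 0 means Week 48 and week 49 means Week 1).
week_window <- function(center, offsets = -1:2, n_weeks = 48) {
  (center + offsets - 1) %% n_weeks + 1
}

week_window(week_index())  # the four "weeks" counted for today
```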
So how’d I make these? I’m glad you asked! A copy of the R script that does all the work can be found below, so rather than going through every detail, here is a sketch of the process.
First, I went to ebird.org and downloaded my Ohio life list as a CSV file (renamed List.csv). Second, I went to the Bar Chart tool under the Explore Data portion of the website, pulled up the Ohio state list, and downloaded the histogram data (an MS Excel file). This file needed a little clean-up, since the header is full of empty rows and extra data. I chopped those header rows so that the first row in the spreadsheet is the first row of data, then saved the spreadsheet as a CSV file (BarChart.csv).
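If you’d rather not hand-edit the header rows every time, something like the following could do the chop in R instead. It’s a sketch that rests on three assumptions:

* the spreadsheet has been saved as CSV;
* column 1 holds the species name and columns 2 through 49 hold the 48 quarter-month frequencies;
* every non-species row has at least one blank or non-numeric value in those columns.

`trim_barchart()` is a hypothetical name. It is not part of eBird or of the script below.

```r
## Hypothetical helper: keep only the species rows of a raw barchart export.
## raw: a data frame of character columns, species name in column 1 and the
## 48 quarter-month frequencies in columns 2..49.
trim_barchart <- function(raw, n_weeks = 48) {
  stopifnot(ncol(raw) >= n_weeks + 1)
  week_cols <- as.matrix(raw[, 2:(n_weeks + 1)])
  # Blank or text cells become NA; suppress the coercion warnings
  rates <- suppressWarnings(matrix(as.numeric(week_cols), nrow = nrow(raw)))
  keep <- rowSums(is.na(rates)) == 0 & nzchar(raw[[1]])
  out <- data.frame(Species = raw[[1]][keep], rates[keep, , drop = FALSE],
                    stringsAsFactors = FALSE)
  names(out) <- c("Species", paste0("Week.", seq_len(n_weeks)))
  out
}
```

Read the raw file with `read.csv(file, header = FALSE, colClasses = "character", fill = TRUE, col.names = paste0("V", 1:49))`. This forces 49 columns even when the title rows are shorter. Then `write.table(trim_barchart(raw), "BarChart.csv", sep = ",", row.names = FALSE, col.names = FALSE)` writes the header-free file the script expects. One caveat: an all-numeric row that isn’t a species would slip through and need removing by hand. A sample-size row is one example, if your download has one.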
The script then opens these files and chops the species names down to just the common name. It cuts out species groups (“spuhs”), hybrids, etc., then matches up species names to see what’s left in the full Ohio list (my target birds!). The rest just uses the barchart info to reorder that list by how commonly those species are reported, either in a given time span or cumulatively throughout the year.
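Here’s the clean-up and matching step on its own, run on a few made-up names. They follow the two formats the script’s comments describe: `Common Name (<em class="sci">Scientific name</em>)` for the barchart and `Common Name - Scientific name` for the life list. The function names are hypothetical; the patterns are the same ones the script uses.

```r
## Hypothetical helpers mirroring the script's filters and name chopping.
clean_barchart_names <- function(x) {
  # Drop hybrids, "spuhs" (sp.) and slash combos (Sp1/Sp2)
  x <- x[!grepl("hybrid", x) & !grepl("sp\\.", x) & !grepl("[a-zA-Z]/[a-zA-Z]", x)]
  sub(" \\(.+", "", x)                               # chop at " (" to keep the common name
}
clean_list_names <- function(x) sub(" - .+", "", x)  # chop at " - "

## Species on the regional barchart that are not on the life list
needed_species <- function(barchart_names, life_list_names) {
  b <- clean_barchart_names(barchart_names)
  b[!(b %in% clean_list_names(life_list_names))]
}

barchart <- c('Mallard (<em class="sci">Anas platyrhynchos</em>)',
              'Mallard x American Black Duck (hybrid)',
              'duck sp.',
              'Greater/Lesser Scaup',
              'Sora (<em class="sci">Porzana carolina</em>)')
life <- c("Mallard - Anas platyrhynchos")
needed_species(barchart, life)
```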
## Generate a "hit list" based on a "life list" CSV download (unedited) from eBird
## and a comparable region barchart (edited to be just species rows, no titles).

## Birds I've seen
LIST = read.csv("List.csv")
## Barchart data: species name, then 48 quarter-month ("week") frequencies
BARCHART = read.csv("BarChart.csv", header=FALSE, colClasses=c("character", rep("numeric", 12*4)))

## Take out rows of hybrids (hybrid), "spuhs" (sp.) and combos (Sp1/Sp2)
BARCHART = BARCHART[grep("hybrid", BARCHART[,1], invert=TRUE),]
BARCHART = BARCHART[grep("sp\\.", BARCHART[,1], invert=TRUE),]
BARCHART = BARCHART[grep("[a-zA-Z]/[a-zA-Z]", BARCHART[,1], invert=TRUE),]

## Clean up the naming differences between LIST & BARCHART:
##   Barchart = Greater White-fronted Goose (<em class="sci">Anser albifrons</em>)
##   List     = Greater White-fronted Goose - Anser albifrons
## by chopping at " (" and " - ", respectively.
LIST[,2] = gsub(" \\- .+", "", LIST[,2])
BARCHART[,1] = gsub(" \\(.+", "", BARCHART[,1])

## Now compare species lists!
# BARCHART[!(BARCHART[,1] %in% LIST[,2]), 1]
NeedsChart = BARCHART[!(BARCHART[,1] %in% LIST[,2]),]

## Get date info
month = as.numeric(format(Sys.time(), "%m"))              # numeric month, 1-12
week = ceiling(as.numeric(format(Sys.time(), "%d"))/31*4) # approximate "week" 1, 2, 3 or 4
## Or specify month and week here:
# month = 1  # 1, 2, ..., 12
# week = 1   # 1, 2, 3 or 4

## Sort the whole thing by the current week, label columns
NeedsChart = NeedsChart[order(NeedsChart[,(month-1)*4+week+1], decreasing=TRUE),]
names(NeedsChart) = c("Species", paste0("Week.", 1:(12*4)))
NeedsChart$Species = factor(NeedsChart$Species, levels=unique(NeedsChart$Species))
rownames(NeedsChart) = NULL

## Or next week
# week = week+1
## Or in the coming 3 weeks, plus hopes for stragglers from last week
week = -1:2 + week

## Aggregate data
# Wrap week numbers around the year ("week 0" of January is Week.48 of December),
# so the columns picked never land on Species or past Week.48
cols = 1 + ((month-1)*4 + week - 1) %% 48 + 1
# rowSums is smart, but dumb: as.matrix() lets it handle a single week (column) too
HitList = data.frame(Species=NeedsChart[,1], Rate=rowSums(as.matrix(NeedsChart[,cols])))
HitList = HitList[order(HitList$Rate, decreasing=TRUE),]; rownames(HitList) = NULL
## Cut species not observed in this timeframe?
HitList = HitList[HitList$Rate > 0,]

## Plot part of NeedsChart
library(ggplot2)  # for plotting tools
library(reshape2) # for melt() to reformat our data to use ggplot2
Nspecies = 50 # How many species in the heat map?
Nbins = 6     # How many colors/bins for the heat map?
# Stick with the current week's ranking (comment out), or go for the annual rate (uncomment)?
NeedsChart = NeedsChart[order(rowSums(NeedsChart[,-1]), decreasing=TRUE),]; rownames(NeedsChart) = NULL

# Reformat the data for ggplot2's geom_tile()
NC = melt(head(NeedsChart, Nspecies), id.vars="Species")
names(NC) = c("Species", "Week", "ObsRate")
NC$Week = as.numeric(gsub("Week\\.", "", NC$Week))
NC$ObsRate = cut(NC$ObsRate, c(-1, seq(0, max(NC$ObsRate), length.out=Nbins))) # Bin data
levels(NC$ObsRate)[1] = "0" # all positive, so (-1,0] == [0,0] == 0.

## Make a heat map using the ggplot2 geom_tile() aesthetic
base_size = 10 # font size for labels
my.palette = colorRampPalette(c("white", "#d0f0d0", "#00dd00", "forestgreen"))
g = ggplot(NC, aes(Week, Species)) +
  geom_tile(aes(fill=ObsRate), colour="#f0f0f0") +
  scale_fill_manual(values=my.palette(Nbins)) +
  labs(y="") +
  theme_grey(base_size=base_size) +
  scale_x_continuous(expand=c(0, 0)) +
  scale_y_discrete(limits=rev(as.character(head(NeedsChart$Species, Nspecies))), expand=c(0, 0)) +
  theme(panel.grid.minor=element_line("gray"), panel.grid.major=element_line("black"))

## Show the plot, and spit the HitList out to the screen
## (print() is needed for output to appear when the script is run with source())
print(g)
print(HitList)
I forgot to mention the target list (“HitList”)! Here it is:
Species Rate
1 Nelson’s Sparrow 0.030000
2 Red-throated Loon 0.020532
3 Le Conte’s Sparrow 0.020000
4 Long-tailed Duck 0.011448
5 Hudsonian Godwit 0.010591
6 American Woodcock 0.010000
7 Red Phalarope 0.010000
8 Sabine’s Gull 0.010000
9 Pacific Loon 0.010000
10 Barn Owl 0.001429
11 Yellow-bellied Flycatcher 0.001429
12 Northern Saw-whet Owl 0.001410
13 Say’s Phoebe 0.001410
14 Glossy Ibis 0.001336
15 Golden Eagle 0.001030
16 California Gull 0.001030
17 Common Redpoll 0.000990
18 Cattle Egret 0.000916
19 Piping Plover 0.000916
20 Yellow-breasted Chat 0.000916
21 Clay-colored Sparrow 0.000916
22 Brewer’s Blackbird 0.000897
23 Red Crossbill 0.000878
24 Common Eider 0.000591
25 Northern Bobwhite 0.000591
26 Ruffed Grouse 0.000591
27 American White Pelican 0.000591
28 Red Knot 0.000591
29 Northern Goshawk 0.000532
30 Connecticut Warbler 0.000532
31 Lark Sparrow 0.000532
32 Brown Pelican 0.000458
33 Yellow Rail 0.000458
34 Mew Gull 0.000458
35 Long-eared Owl 0.000458
36 Yellow-headed Blackbird 0.000458
37 Snowy Egret 0.000439
38 Parasitic Jaeger 0.000439
39 Groove-billed Ani 0.000439
40 Harris’s Sparrow 0.000439
41 Evening Grosbeak 0.000439
I like that we have just as good a chance of seeing Groove-billed Ani as we do Evening Grosbeak... not this year, though!
Jb
Agreed! Though after a quick glance at eBird maps for the two species, it is surprising those two rank equally in the barchart histogram data. The key to interpreting that list is that those are the rankings for finding those species in late October, early November — definitely not an ideal time of year to cross paths with Evening Grosbeaks! 😉
I’d really like to try this out, but I’m afraid I don’t know how to. Could you post some instructions on how someone without knowledge of scripts/macros would go about applying this to their own lists?
Please and thank you!
Sure, Mark. Just a heads up though – you might need to learn a little bit about programming in R to really tinker with the code. This should get you going though. 🙂
First, you’ll need to download and install R (R is a programming language). You can do that by following the instructions at http://www.r-project.org/. Just click “download R”, then scroll down and pick the nearest location (closer = faster download). Then download the version that matches your operating system.
If it installs correctly, you should be able to run R, which will open up a blank white window and wait for command-line input. Use Google and the FAQ on the R website for any problems that might arise during the install (normally it’s pretty easy to install, so don’t worry about problems until you run into them).
Second, you’ll want to create a directory somewhere on your computer that contains the eBird files as described in the post above. Make sure they’re formatted as I describe, otherwise the script won’t be able to read them properly.
Third, copy the code above (there should be a “copy to clipboard” link up near the top of the code). Paste it into a text file called something like “eBird-Needs.R” and save it alongside your eBird data. The file name doesn’t matter, but try to make sure it has the “.R” extension. You should now have the script and the 2 eBird CSV files together in the directory.
Fourth, you may need to change the file names in the code (List.csv and BarChart.csv) or rename your data files so the two match.
Fifth, you’ll need to let R know where all these files are. R has a default directory name in mind when you open it up, sort of a home directory it operates from. This is called the “working directory” and you’ll want to change it to the directory that contains your files. Do this by running R, clicking “File” and setting the working directory as needed. To test that you’re in the right place, type “dir()” into the R console and it should list the files in the current directory. You can also type the command “getwd()” which will return the current working directory.
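The same checks can be done from the R console. Here’s a small sketch that reports the working directory and stops with a message if either data file is missing. The helper name `check_ebird_files()` is hypothetical; `setwd()`, `getwd()` and `file.exists()` are standard R.

```r
## Hypothetical helper: report the working directory and confirm the
## data files the script reads are there.
check_ebird_files <- function(files = c("List.csv", "BarChart.csv")) {
  cat("Working directory:", getwd(), "\n")
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("Not found in the working directory: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

## Usage (the path is a placeholder; use the folder holding your files):
# setwd("C:/path/to/your/eBird/folder")
# check_ebird_files()
```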
Sixth, there are a whole lot of free add-on packages for R, and this code uses two of them (ggplot2 and reshape2). We’ll need to install these just once, and the code will do the rest for us. At the R command line (aka the R console), just type install.packages(c("ggplot2","reshape2")) and hit enter. It will ask you to select a mirror to use (again, just pick something close) and should install just fine. If there are problems, use Google to see if you can fix them.
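If you’d like the script to handle this itself, one option is to install only what’s missing, so nothing gets reinstalled on every run. Here’s a sketch. `missing_packages()` is a hypothetical helper; `requireNamespace()` and `install.packages()` are standard R.

```r
## Hypothetical helper: which of these packages are not installed yet?
## requireNamespace() returns FALSE (quietly) when a package can't be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

## Usage: install only what's missing, then load as usual
# to_install <- missing_packages(c("ggplot2", "reshape2"))
# if (length(to_install) > 0) install.packages(to_install)
```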
Assuming that all went as planned, type source("eBird-Needs.R") at the command line (or whatever you named the file) and it should run!
PS: I just noticed WordPress somehow garbled my code. If the first line that mentions “LIST” isn’t followed by “read.csv(”, check back until it’s fixed 🙂
Thanks for the code Paul. It works great! I made one small change that will allow the user to choose the list and bar chart files, making it easier to check your needs lists for different geographies (e.g. county, state, lower 48, etc). This will require downloading list and corresponding bar chart files for each geography, but they can all be kept in the same folder with descriptive names since the user will select them interactively when the R code is run. This will also bypass the fourth point in your comment above since the file names don’t need to be in the code.
## Birds I’ve seen
LIST = read.csv(file.choose());
## Barchart data
BARCHART = read.csv(file.choose(), header=F, colClasses=c("character",rep("numeric",12*4)));
Good call! I thought of doing this as well as allowing the choice of a start week and end week. If you leave the user to do a little multiplication, this could be achieved with something like
Week.Start = menu(choices=as.character(1:48), graphics=TRUE, title="Start week (1-48)");
Week.Final = menu(choices=as.character(1:48), graphics=TRUE, title="End week (1-48)");
weeks=Week.Start:Week.Final;
Then change all the month, week stuff to just use weeks. Even nicer menus could be crafted using the tcl/tk package in R.
Endless ways to have fun tweaking the code! 🙂
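Here’s one way “just use weeks” might look, as a sketch. `weeks_between()` and `hit_list_for_weeks()` are hypothetical helpers. Their handling of a range that runs past the end of the year (say Week 46 to Week 3) is an assumption about what you’d want.

```r
## Hypothetical helpers for a user-chosen range of quarter-month "weeks".
## If start > end (e.g. 46 to 3), assume the range wraps past the end of the year.
weeks_between <- function(start, end, n_weeks = 48) {
  if (start <= end) start:end else c(start:n_weeks, 1:end)
}

## needs_chart: Species in column 1, Week.1 ... Week.48 in columns 2-49.
hit_list_for_weeks <- function(needs_chart, weeks) {
  rate <- unname(rowSums(as.matrix(needs_chart[, 1 + weeks, drop = FALSE])))
  out <- data.frame(Species = needs_chart[[1]], Rate = rate)
  out <- out[out$Rate > 0, ]                       # drop species never reported in range
  out <- out[order(out$Rate, decreasing = TRUE), ] # most frequently reported first
  rownames(out) <- NULL
  out
}

## Usage, with the menu() choices from above:
# HitList <- hit_list_for_weeks(NeedsChart, weeks_between(Week.Start, Week.Final))
```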
Awesome! I needed to add this:
setwd("L:/Mydocu~1/Birds/eBirdNeedsScript")
install.packages("ggplot2")
install.packages("reshape2")
It looks like it didn’t filter out the “Domestic” species, but it also looks like I haven’t removed those from eBird output in a few weeks either : )
It was interesting that the most prevalent species in the output plot is a species that used to be resident but now is accidental. I guess I could use a bar chart restricted to records from the past 20 years to fix that.
Sorry for cluttering up your page with so many comments; I obviously had not read how you already explained the working directory. Adding this line helps:
BARCHART = BARCHART[grep("Domestic",BARCHART[,1],invert=TRUE),]
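Since the filters keep piling up, they could also be rolled into one pattern. Here’s a sketch. The function name `drop_non_species()` is hypothetical. The pattern combines the script’s three filters plus the “Domestic” one above.

```r
## Hypothetical helper: drop hybrids, spuhs, slash combos and Domestic types
## from a barchart data frame whose first column holds the names.
drop_non_species <- function(barchart,
                             pattern = "hybrid|sp\\.|[a-zA-Z]/[a-zA-Z]|Domestic") {
  barchart[!grepl(pattern, barchart[[1]]), , drop = FALSE]
}

## Usage, in place of the separate grep() lines:
# BARCHART = drop_non_species(BARCHART)
```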
I can’t get over how amazingly cool this is. You can do anything. Let’s say you’re working on your state year list, you’re going to be visiting two distant counties, and you’re curious if you could add any state year birds while in those two counties. Pull your state year list from eBird and the bar chart for those two counties. Done.
Glad you’re having fun with it Steve! Also, once you run these lines
install.packages("ggplot2")
install.packages("reshape2")
The packages are installed. No need to run them again, so omitting them from the script will speed it up a bit for future use 😉
I can’t get it to work. When I run the script it says: Using Species as id variables. And does nothing else.
Hi Robbie,
Run it line-by-line and see if you can’t tease out more errors. That output suggests it’s working up to a point, so it’s hard to diagnose the problem from there. Are you running the last two lines, which would show the plot (“g”) and some text (“HitList”)?
I got the same error … had to change the last line from:
g
to
show(g)
and got a graph! I’m still getting the error, but it seems to be a warning.
I just tried:
show(g)
instead of just:
g
and it worked. I thought I had tried that before but maybe not.
Thanks for the reply (and the script)!
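For anyone else who hits this: `source()` doesn’t automatically print a bare name like `g` the way the console does, so the plot never gets drawn. Wrapping it as `print(g)` (or `show(g)`) fixes that. So does running the script with `source("eBird-Needs.R", print.eval = TRUE)`. Here’s a tiny demonstration with a throwaway two-line script:

```r
## Write a two-line throwaway script, then source it both ways.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 42", "x"), script)

silent <- capture.output(source(script))                     # bare `x` is not printed
shown  <- capture.output(source(script, print.eval = TRUE))  # bare `x` is printed
```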
Paul … I noticed there is a difference between the chart (g) and the HitList … I am not an R guy … so I am trying to figure out the code … any ideas? For instance, I don’t have Alder Flycatcher or Cave Swallow on my Ohio list. Alder appears on the graph but not the text list, and Cave Swallow appears on the text list but not the graph?!?
Ok … I figured it out. The HitList is sorted by what is being seen this month … the graph is an annual sort. Thanks Paul!! This was a great excuse to try out R.
Hi Matt,
Glad to see you got it straightened out, and are having fun tinkering with R! 🙂
-Paul
Really want this to work, but I get: Error in rowSums(NeedsChart[, -1]) : ‘x’ must be numeric
Never mind. Figured it out. The formatting of the barchart data was off.
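That error means at least one column of `NeedsChart[, -1]` isn’t numeric, which points back at the barchart file. A quick check right after reading the file could catch it. Here’s a sketch; `check_barchart()` is a hypothetical helper that assumes the 1 + 48 column layout described in the post.

```r
## Hypothetical helper: confirm a barchart data frame has a name column plus
## n_weeks numeric columns before the script does any arithmetic on it.
check_barchart <- function(bc, n_weeks = 48) {
  if (ncol(bc) != n_weeks + 1) {
    stop("Expected ", n_weeks + 1, " columns but found ", ncol(bc))
  }
  is_num <- vapply(bc[-1], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric week columns: ", paste(names(bc)[-1][!is_num], collapse = ", "))
  }
  invisible(TRUE)
}

## Usage: check_barchart(BARCHART) right after read.csv()
```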