Custom eBird Needs Lists

Ever wished the eBird website would take your state or county lists and tell you what you’re missing? What birds you need, ranked by how easily they might be found in a given state or county during a given week? Well, this past weekend I finally sat down and rigged up some R code to do just that!

Here are my top 50 target birds according to the eBird data, ranked in the first/top figure by frequency of reports in eBird throughout the entire year, and ranked in the second/bottom figure by frequency of reports over the time period from last “week” (quarter month) through 2 “weeks” from now. If you were wondering, the last week in October is Week 39.

Top 50 Target Birds for Ohio (ranked by annual occurrence)

Top 50 Target Birds for Ohio (ranked by occurrence over weeks 38-41, i.e., from last week through 2 weeks from now)

So how’d I make these? I’m glad you asked! A copy of the R script that does all the work can be found below, so without hitting all the details here is a sketch of the process.

First, I went to ebird.org and downloaded my Ohio life list as a CSV file (renamed List.csv). Second, I went to the Bar Chart tool under the Explore Data portion of the website, pulled up the Ohio state list, and downloaded the histogram data (an MS Excel file). This file needed a little clean-up, as the header is full of empty rows and extra data, so I chopped these header rows so that the first row in the spreadsheet is the first row of data. I then converted that spreadsheet to a CSV file (BarChart.csv).
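The manual clean-up could also be done in R. Here is a minimal sketch, assuming the download has been saved as CSV with the extra header rows still in place, that species names contain no commas, and that every real data row is a name followed by 48 numeric values. The function name `clean_barchart_lines`, its `drop_labels` argument, and the sample rows are all made up for illustration, so check them against your own file before relying on this.

```r
## Hypothetical helper: keep only rows that look like species data,
## i.e. one name field followed by n_weeks numeric fields.
clean_barchart_lines <- function(lines, n_weeks = 48,
                                 drop_labels = c("Sample Size:")) {
  keep <- vapply(lines, function(ln) {
    fields <- strsplit(ln, ",", fixed = TRUE)[[1]]
    if (length(fields) != n_weeks + 1) return(FALSE)   # wrong number of columns
    if (fields[1] %in% drop_labels) return(FALSE)      # numeric but not a species row
    values <- suppressWarnings(as.numeric(fields[-1]))
    all(!is.na(values))                                # every week must be numeric
  }, logical(1), USE.NAMES = FALSE)
  lines[keep]
}

## Made-up example input (not real eBird data)
raw <- c("Some title row",
         "",
         paste(c("Sample Size:", rep("120", 48)), collapse = ","),
         paste(c("Snow Goose", rep("0.01", 48)), collapse = ","),
         paste(c("Canada Goose", rep("0.75", 48)), collapse = ","))

clean <- clean_barchart_lines(raw)
BARCHART_demo <- read.csv(text = clean, header = FALSE,
                          colClasses = c("character", rep("numeric", 48)))
dim(BARCHART_demo)
```

For a real file, `readLines("BarChart.csv")` would take the place of the made-up `raw` vector.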

What the script then does is open up these files, chop the species names down to just the common name, cut out species groups, hybrids, etc., then match up species names and see what’s left in the full Ohio list (my target birds!). The rest is just using the bar chart info to reorder that list by how commonly those species are reported, either in a given time span or cumulatively throughout the year.
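To make the matching step concrete, here is a tiny worked example on invented inputs. The vectors `life_list` and `barchart` are stand-ins for the two downloaded files, not real data, and the variable names are only for illustration.

```r
## Made-up miniature inputs, one in each naming style
life_list <- c("Canada Goose - Branta canadensis",
               "Mallard - Anas platyrhynchos")
barchart  <- c("Canada Goose (Branta canadensis)",
               "Snow x Ross's Goose (hybrid)",
               "goose sp.",
               "Greater/Lesser Scaup",
               "Mallard (Anas platyrhynchos)",
               "Nelson's Sparrow (Ammospiza nelsoni)")

## Drop hybrids, "spuhs" and slash combos, then trim to common names
barchart <- barchart[!grepl("hybrid|sp\\.|[a-zA-Z]/[a-zA-Z]", barchart)]
seen     <- gsub(" - .+", "", life_list)
regional <- gsub(" \\(.+", "", barchart)

## Whatever is on the regional list but not on the life list is a target
targets <- regional[!(regional %in% seen)]
targets   # only "Nelson's Sparrow" remains
```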

## Generate a "hit list" based on a "life list" CSV download (unedited) from eBird
## and a comparable region barchart (edited to be just species rows, no titles).

## Birds I've seen
LIST = read.csv("List.csv");

## Barchart data
BARCHART = read.csv("BarChart.csv", header=F, colClasses=c("character",rep("numeric",12*4)));

## Take out rows of hybrids (hybrid), "spuhs" (sp.) and Combos (Sp1/Sp2)
BARCHART = BARCHART[grep("hybrid",BARCHART[,1],invert=TRUE),]
BARCHART = BARCHART[grep("sp\\.",BARCHART[,1],invert=TRUE),]
BARCHART = BARCHART[grep("[a-zA-Z]/[a-zA-Z]",BARCHART[,1],invert=TRUE),]

## Clean up the naming differences between LIST & BARCHART:
##    Barchart = Greater White-fronted Goose (<em class="sci">Anser albifrons</em>)
##    List     = Greater White-fronted Goose - Anser albifrons
## By chopping at " (" and " - ", respectively.
LIST[,2] = gsub(' \\- .+',"",LIST[,2]); 
BARCHART[,1] = gsub(' \\(.+',"",BARCHART[,1]); 

## Now compare species lists!
# BARCHART[!(BARCHART[,1] %in% LIST[,2]),1]
NeedsChart = BARCHART[!(BARCHART[,1] %in% LIST[,2]),]

## Get date info
month = as.numeric(format(Sys.time(), "%m")); # Get numeric month
week = ceiling(as.numeric(format(Sys.time(), "%d"))/31*4); # get approximate "week" 1,2,3 or 4

## Sort the whole thing by the current week, label columns
NeedsChart = NeedsChart[order(NeedsChart[,(month-1)*4+week+1], decreasing=T),]
names(NeedsChart) = c("Species",sapply(1:(12*4), function(x) paste("Week.",x,sep='')))
NeedsChart$Species = factor(NeedsChart$Species, levels=NeedsChart$Species); rownames(NeedsChart) = c();

## Or specify month and week here:
# month = 1; # 1,2,...,12
# week  = 1; # 1,2,3 or 4

## Or next week
# week = week+1;

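## Or in the coming 3 weeks, plus hopes for stragglers from last week
week = -1:2 + week;

## Aggregate data
# Convert (month, week) to year-week numbers 1-48, wrapping around the new year
# so that a window starting in late December continues into January.
yearweeks = ((month-1)*4 + week - 1) %% 48 + 1;

# as.matrix() keeps rowSums() happy even when only a single week is selected
HitList = data.frame(Species=NeedsChart[,1], 
                      Rate=rowSums(as.matrix(NeedsChart[,1+yearweeks])))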
HitList = HitList[order(HitList$Rate,decreasing=T),]; rownames(HitList)=c();
## Cut species not observed in this timeframe?
HitList = HitList[HitList[,2]>0,]; 


## Plot part of NeedsChart
library(ggplot2) # for plotting tools
library(reshape2)# for melt() to reformat our data to use ggplot2

Nspecies=50 # How many species in the heat map?
Nbins   = 6 # How many colors/bins for the heat map?

# Stick with the current week's ranking (comment out the next line), or rank by annual rate (leave it in)?
NeedsChart = NeedsChart[order(rowSums(NeedsChart[,-1]),decreasing=T),]; rownames(NeedsChart)=c();

# reformat the data for ggplot2's geom_tile() 
NC = melt(head(NeedsChart,Nspecies)); names(NC) = c("Species","Week","ObsRate");
NC$Week = as.numeric(gsub("Week\\.","",NC$Week)); 
NC$ObsRate = cut(NC$ObsRate, c(-1,seq(0,max(NC$ObsRate),length=Nbins))); # Bin data
levels(NC$ObsRate)[1] = "0"; # all positive, so (-1,0] == [0,0] == 0.

## Make a heat map using the ggplot2 geom_tile() layer
base_size = 10 # font size for labels
my.palette = colorRampPalette(c("white", "#d0f0d0", "#00dd00", "forestgreen"))

g = ggplot(NC, aes(Week, Species)) + geom_tile(aes(fill=ObsRate), colour = "#f0f0f0") + 
	scale_fill_manual(values=my.palette(Nbins)) + labs(y="") +
	theme_grey(base_size = base_size) + scale_x_discrete(expand = c(0,-2)) +
    scale_y_discrete(limits=rev(NeedsChart[1:Nspecies,1]), expand = c(0,0)) +
	theme(panel.grid.minor = element_line("gray"), 
       panel.grid.major = element_line("black"))

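## Show the plot, and spit the HitList out to the screen.
## Note: when the script is run via source(), nothing is auto-printed, so
## wrap these in print(), e.g. print(g) and print(HitList).
g
HitList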

21 Responses to Custom eBird Needs Lists

  1. Paul Hurtado says:

    I forgot to mention the target list (“HitList”)! Here it is:
    Species Rate
    1 Nelson’s Sparrow 0.030000
    2 Red-throated Loon 0.020532
    3 Le Conte’s Sparrow 0.020000
    4 Long-tailed Duck 0.011448
    5 Hudsonian Godwit 0.010591
    6 American Woodcock 0.010000
    7 Red Phalarope 0.010000
    8 Sabine’s Gull 0.010000
    9 Pacific Loon 0.010000
    10 Barn Owl 0.001429
    11 Yellow-bellied Flycatcher 0.001429
    12 Northern Saw-whet Owl 0.001410
    13 Say’s Phoebe 0.001410
    14 Glossy Ibis 0.001336
    15 Golden Eagle 0.001030
    16 California Gull 0.001030
    17 Common Redpoll 0.000990
    18 Cattle Egret 0.000916
    19 Piping Plover 0.000916
    20 Yellow-breasted Chat 0.000916
    21 Clay-colored Sparrow 0.000916
    22 Brewer’s Blackbird 0.000897
    23 Red Crossbill 0.000878
    24 Common Eider 0.000591
    25 Northern Bobwhite 0.000591
    26 Ruffed Grouse 0.000591
    27 American White Pelican 0.000591
    28 Red Knot 0.000591
    29 Northern Goshawk 0.000532
    30 Connecticut Warbler 0.000532
    31 Lark Sparrow 0.000532
    32 Brown Pelican 0.000458
    33 Yellow Rail 0.000458
    34 Mew Gull 0.000458
    35 Long-eared Owl 0.000458
    36 Yellow-headed Blackbird 0.000458
    37 Snowy Egret 0.000439
    38 Parasitic Jaeger 0.000439
    39 Groove-billed Ani 0.000439
    40 Harris’s Sparrow 0.000439
    41 Evening Grosbeak 0.000439

  2. Anonymous says:

    I like that we have just as good a chance at seeing Groove-billed Ani as we do Evening Grosbeak. …….not this year though!

    Jb

  3. Paul Hurtado says:

    Agreed! Though after a quick glance at eBird maps for the two species, it is surprising those two rank equally in the barchart histogram data. The key to interpreting that list is that those are the rankings for finding those species in late October, early November — definitely not an ideal time of year to cross paths with Evening Grosbeaks! 😉

  4. Mark Field says:

    I’d really like to try this out, but I’m afraid I don’t know how to. Could you post some instructions on how someone without knowledge of scripts/macros would go about applying this to their own lists?

    • Mark Field says:

      Please and thank you!

    • Paul Hurtado says:

      Sure, Mark. Just a heads up though – you might need to learn a little bit about programming in R to really tinker with the code. This should get you going though. 🙂

      First, you’ll need to download and install R (R is a programming language). You can do that by following the instructions at http://www.r-project.org/. Just click “download R”, then scroll down and pick the nearest location (closer = faster download). Then download the version that matches your operating system.

      If it installs correctly, you should be able to run R which will open up a blank white window and wait for command line input. Use google and the FAQ on the R website for any problems that might arise during the install (normally it’s pretty easy to install, so don’t worry about problems until you run into them).

      Second, you’ll want to create a directory somewhere on your computer that contains the eBird files as described in the post above. Make sure they’re formatted as I describe, otherwise the script won’t be able to read them properly.

      Third, copy the code above (there should be a “copy to clipboard” link up near the top of the code) and paste it into a text file with your eBird data called something like “eBird-Needs.R”. The file name doesn’t matter, but try to make sure it has the “.R” extension. You should now have the script/code and 2 eBird CSV files together in the directory.

      Fourth, you may need to modify the code or the data file names so they match.

      Fifth, you’ll need to let R know where all these files are. R has a default directory name in mind when you open it up, sort of a home directory it operates from. This is called the “working directory” and you’ll want to change it to the directory that contains your files. Do this by running R, clicking “File” and setting the working directory as needed. To test that you’re in the right place, type “dir()” into the R console and it should list the files in the current directory. You can also type the command “getwd()” which will return the current working directory.

      Sixth, there are a whole lot of free add-on packages for R, and this code uses two of them. We’ll need to install these just once, and the code will do the rest for us. At the R command line (aka the R console) just type install.packages(c("ggplot2","reshape2")) and hit enter. It will ask you to select a mirror to use (again, just pick something close) and should install just fine. If there are problems, use Google to see if you can fix them.

      Assuming that all went as planned, type source("eBird-Needs.R") at the command line (using whatever you named the file) and it should run!
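The setup steps above can be sketched in a few lines of R. This is only an illustration: the helper `missing_needs_files` and the folder path are hypothetical, and the file names are the ones used in the post, so adjust them to match your own.

```r
## Hypothetical helper: which of the expected files are absent from a folder?
missing_needs_files <- function(dir,
                                files = c("List.csv", "BarChart.csv", "eBird-Needs.R")) {
  files[!file.exists(file.path(dir, files))]
}

needs_dir <- "path/to/your/eBird/folder"   # placeholder: replace with your own folder
absent <- missing_needs_files(needs_dir)

if (length(absent) == 0) {
  setwd(needs_dir)           # make this folder the working directory
  print(dir())               # list its contents as a sanity check
  source("eBird-Needs.R")    # run the script
} else {
  message("Missing from ", needs_dir, ": ", paste(absent, collapse = ", "))
}
```

With the placeholder path left unchanged, this simply reports all three files as missing rather than failing halfway through the script.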

      PS: I just noticed wordpress somehow garbled my code. If the first line that mentions “LIST” isn’t followed by “read.csv(“… check back until it’s fixed 🙂

  5. Thanks for the code Paul. It works great! I made one small change that will allow the user to choose the list and bar chart files, making it easier to check your needs lists for different geographies (e.g. county, state, lower 48, etc). This will require downloading list and corresponding bar chart files for each geography, but they can all be kept in the same folder with descriptive names since the user will select them interactively when the R code is run. This will also bypass the fourth point in your comment above since the file names don’t need to be in the code.

    ## Birds I’ve seen
    LIST = read.csv(file.choose());

    ## Barchart data
    BARCHART = read.csv(file.choose(), header=F, colClasses=c("character",rep("numeric",12*4)));

  6. Paul Hurtado says:

    Good call! I thought of doing this as well as allowing the choice of a start week and end week. If you leave the user to do a little multiplication, this could be achieved with something like

    Week.Start = menu(choices=as.character(1:48), graphics=TRUE, title="Start week (1-48)");
    Week.Final = menu(choices=as.character(1:48), graphics=TRUE, title="End week (1-48)");
    weeks=Week.Start:Week.Final;

    Then change all the month, week stuff to just use weeks. Even nicer menus could be crafted using the tcltk package in R.
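As a rough sketch of what "just use weeks" might look like, the function below builds a hit list directly from a vector of week numbers (1-48). The name `hit_list_for_weeks` and the toy data are invented for illustration, and the wrap-around past week 48 is an added assumption rather than something the original script does.

```r
## Hypothetical helper: rank needed species by total report rate over `weeks`.
## `needs_chart` is assumed to have the species name in column 1 and the
## 48 weekly rates in columns 2 through 49, as NeedsChart does in the script.
hit_list_for_weeks <- function(needs_chart, weeks) {
  weeks <- (weeks - 1) %% 48 + 1     # wrap e.g. week 49 around to week 1
  rate  <- rowSums(as.matrix(needs_chart[, weeks + 1, drop = FALSE]))
  out <- data.frame(Species = as.character(needs_chart[[1]]),
                    Rate = unname(rate), stringsAsFactors = FALSE)
  out <- out[order(out$Rate, decreasing = TRUE), ]
  out <- out[out$Rate > 0, ]         # drop species never reported in the window
  rownames(out) <- NULL
  out
}

## Toy data: three made-up species with a few nonzero weeks
toy <- data.frame(Species = c("A", "B", "C"),
                  matrix(0, nrow = 3, ncol = 48), stringsAsFactors = FALSE)
toy[1, 1 + 39] <- 0.02   # species A reported in week 39
toy[2, 1 + 1]  <- 0.50   # species B reported in week 1
toy[3, 1 + 40] <- 0.03   # species C reported in week 40

demo_hits <- hit_list_for_weeks(toy, 38:41)
demo_hits
```

With the menu values above, the call would become `hit_list_for_weeks(NeedsChart, Week.Start:Week.Final)`.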

    Endless ways to have fun tweaking the code! 🙂

  7. Steve Collins says:

    Awesome! I needed to add this:
    setwd("L:/Mydocu~1/Birds/eBirdNeedsScript")
    install.packages("ggplot2")
    install.packages("reshape2")

  8. Steve Collins says:

    It looks like it didn’t filter out the “Domestic” species, but it also looks like I haven’t removed those from eBird output in a few weeks either : )

    It was interesting that the most prevalent species in the output plot is a species that used to be resident but now is accidental. I guess I could use a bar chart restricted to records from the past 20 years to fix that.

  9. Steve Collins says:

    Sorry for cluttering up your page with so many comments; I obviously had not read how you already explained the working directory. Adding this line helps:
    BARCHART = BARCHART[grep("Domestic",BARCHART[,1],invert=TRUE),]
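If the list of things to exclude keeps growing, the separate grep lines could be folded into one filter. The sketch below is one way to do that; the helper name `is_full_species` and the example names are made up, and the patterns are simply the ones from the script plus "Domestic".

```r
## Hypothetical helper: TRUE for rows that look like countable species
is_full_species <- function(names,
                            patterns = c("hybrid", "sp\\.",
                                         "[a-zA-Z]/[a-zA-Z]", "Domestic")) {
  !grepl(paste(patterns, collapse = "|"), names)
}

## Made-up example names
demo_names <- c("Mallard", "Mallard (Domestic type)", "duck sp.",
                "Mallard x American Black Duck (hybrid)", "Greater/Lesser Scaup")
demo_names[is_full_species(demo_names)]   # "Mallard"
```

In the script this would be used as `BARCHART = BARCHART[is_full_species(BARCHART[,1]),]`.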

    I can’t get over how amazingly cool this is. You can do anything. Let’s say you’re working on your state year list, you’re going to be visiting two distant counties, and you’re curious if you could add any state year birds while in those two counties. Pull your state year list from eBird and the bar chart for those two counties. Done.

  10. Paul Hurtado says:

    Glad you’re having fun with it Steve! Also, once you run these lines
    install.packages("ggplot2")
    install.packages("reshape2")
    The packages are installed. No need to run them again, so omitting them from the script will speed it up a bit for future use 😉
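An alternative to deleting those lines is to install only what is missing. This is a small sketch, with `missing_packages` being a made-up helper name; it relies on `requireNamespace()` to check whether a package is already available.

```r
## Hypothetical helper: which of these packages are not yet installed?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "reshape2"))

## Only attempt an install from an interactive session, and only if needed
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```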

  11. Robbie LaCelle says:

    I can’t get it to work. When I run the script it says: Using Species as id variables. And does nothing else.

    • Paul Hurtado says:

      Hi Robbie,
      Run it line-by-line and see if you can’t tease out more errors. That output suggests it’s working up to a point, so it’s hard to diagnose the problem from there. Are you running the last two lines, which would show the plot (“g”) and some text (“HitList”)?

      • Matt E says:

        I got the same error … had to change the last line from:
        g
        to
        show(g)
        and got a graph! .. still getting the error but it seems to be a warning.

  12. Robbie LaCelle says:

    I just tried:
    show(g)
    instead of just:
    g

    and it worked. I thought I had tried that before but maybe not.

    Thanks for the reply (and the script)!

  13. Matt E says:

    Paul … I noticed there is a difference between the chart (g) and the HitList … I am not an R guy … so I am trying to figure out the code … any ideas? For instance … I don’t have Alder Flycatcher or Cave Swallow on my Ohio list. Alder appears on the graph but not the text list … and Cave Swallow appears on the text list but not the graph?!?
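One possible explanation, judging from the script as posted, is that the chart and HitList are ranked by different quantities. The chart keeps the top species by report rate summed over all 48 weeks, while HitList sums only the selected weeks and drops species whose total is zero. The report rates below are invented purely to illustrate how a summer bird and a late-fall rarity could end up on different lists.

```r
## Invented report rates: a summer bird and a late-fall rarity
toy <- data.frame(Species = c("Alder Flycatcher", "Cave Swallow"),
                  matrix(0, nrow = 2, ncol = 48), stringsAsFactors = FALSE)
toy[1, 1 + 19:28] <- 0.05    # made-up: reported from mid-May through July
toy[2, 1 + 40:42] <- 0.001   # made-up: a few reports around weeks 40-42

annual_rate <- rowSums(toy[, -1])           # what the chart is ranked by
window_rate <- rowSums(toy[, 1 + 38:41])    # what HitList is ranked by

data.frame(Species = toy$Species, annual_rate, window_rate)
```

In this toy case the flycatcher has the higher annual total but a zero rate for weeks 38-41, so it would make the chart and be cut from HitList, while the swallow would do the opposite if enough other species outrank it annually.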

  14. Bart says:

    Really want this to work but I get : Error in rowSums(NeedsChart[, -1]) : ‘x’ must be numeric
