Tuesday, 5 June 2012

Digitize linear and (semi-)log scale graphs with multiple point sets

Working on a paper, I ran into the problem of needing data from a graph that was not mine, and for which no underlying table was published. With today's software packages, however, it is not very difficult to digitize a figure yourself. I remembered reading something about it on R-bloggers or in the R Journal, and it turns out both had useful information. The R package to go for is 'digitize'; you can find the publication here, and a blog post on how to use it here. You can install it the usual way:
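Assuming the package is available from CRAN under the name 'digitize' (if it is not, you would need to get it from another source), the usual way comes down to something like this:

```r
# Assumption: 'digitize' can be installed from CRAN.
install.packages('digitize')
library(digitize)
```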


I would like to use this example from Gelhar et al. (1992), since I was actually looking at dispersivity data. The figure can be found at http://goo.gl/niJhi (other versions: http://goo.gl/rlXSP, http://goo.gl/WPvYQ). It is quite a famous paper, you know, and a pdf of the paper seems to be available here.


The figure gives longitudinal dispersivity as a function of scale, as obtained by a large number of authors. Now suppose we do our own experiments to determine dispersivity at a certain scale in a certain sediment. It would be very useful to compare the results to this compilation of literature values. This particular paper does provide the data in a table, but that is not always the case. Especially for older papers, it might be difficult to retrieve the actual data, and this is where the digitize package comes in. When the graph shows several point sets (and you want to digitize them separately), and has one or two log-scale axes, the simple wrapper function at the bottom of this page will make the task at hand a lot easier! The function arguments are the following:
  • name: Name of or path to the figure (has to be *.jpg; convert with GIMP if necessary)
  • x1,x2,y1,y2: Minimum and maximum values of the x and y axes
  • sets: Number of point sets you want to digitize separately (default 1)
  • setlabels: Labels of the different point sets (numbers by default)
  • log: Argument similar to the standard R plot argument for logarithmic axes (can take 'x','y' or 'xy')
  • xlab, ylab: Optional specification of the axes in the plot that is generated by the function
The command I used:

digitize.graph('gelhar.jpg',10E-1,10E5,10E-3,10E5,sets=3,setlabels=c('high','intermediate','low'),log='xy', xlab='Scale (m)',ylab='Longitudinal Dispersivity (m)')

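First you mark the four calibration points on the axes (x minimum, x maximum, y minimum, y maximum). Then you click on all points of the first point set, click finish, continue with the next set, and so on. The function returns a data frame with the x and y coordinates and the labels of the different point sets. Easy to program, but very convenient! One thing to keep in mind: R reads 10E-1 as 10 × 10^-1 = 1 (and 10E5 as 10^6), so make sure the values you pass are the actual values of the axis ticks you click.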
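The log-axis handling in the wrapper comes down to calibrating in log10 space and back-transforming afterwards. The sketch below illustrates that idea for a single axis. Note that pixel_to_value is a hypothetical helper written for this illustration, not a function of the digitize package, and the pixel positions are made up:

```r
# Hypothetical helper (not part of 'digitize'): map pixel positions to data
# values along one axis, given the pixel positions (p1, p2) of two axis marks
# and the data values (v1, v2) at those marks.
pixel_to_value <- function(p, p1, p2, v1, v2, log_axis = FALSE) {
  if (log_axis) {
    # On a log axis, pixel distance is proportional to log10(value)
    v1 <- log10(v1)
    v2 <- log10(v2)
  }
  # Linear interpolation between the two calibration marks
  v <- v1 + (p - p1) * (v2 - v1) / (p2 - p1)
  # Back-transform to the original units if needed
  if (log_axis) 10^v else v
}

# Made-up example: the ticks for 0.1 and 1e5 were clicked at pixels 100 and 700.
# A point halfway in between (pixel 400) lies at 10^2, not at the arithmetic mean.
pixel_to_value(c(100, 400, 700), p1 = 100, p2 = 700,
               v1 = 0.1, v2 = 1e5, log_axis = TRUE)
```

Without the log10 step, the point at pixel 400 would come out at roughly 50000 instead of 100, which is why the axis limits have to be transformed before calibration.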

digitize.graph <- function(name,x1,x2,y1,y2,sets=1,setlabels=1:sets,log='',xlab='x axis',ylab='y axis')
{
    library(digitize)
    logx <- grepl('x',log,fixed=TRUE)
    logy <- grepl('y',log,fixed=TRUE)
    dataset <- data.frame(x=NULL,y=NULL,lab=NULL)
    cat('Mark axes min and max values \n')
    # Click the four calibration points: x1, x2, y1, y2 (in that order)
    axes.points <- ReadAndCal(name)
    # Calibrate in log10 space for logarithmic axes
    if(logx){x1 <- log10(x1);x2 <- log10(x2)}
    if(logy){y1 <- log10(y1);y2 <- log10(y2)}
    for(i in seq_len(sets))
    {
      cat(paste('Mark point set "',setlabels[i],'"\n',sep=''))
      data.points <- DigitData(col = 'red')
      dat <- Calibrate(data.points, axes.points, x1, x2, y1, y2)
      dat$lab <- rep(setlabels[i],nrow(dat))
      dataset <- rbind(dataset, dat)
    }
    # Back-transform to the original units
    if(logx){dataset$x <- 10^(dataset$x)}
    if(logy){dataset$y <- 10^(dataset$y)}
    # Plot the result, one symbol and colour per point set
    set.id <- match(dataset$lab,setlabels)
    plot(dataset$x,dataset$y,pch=set.id,col=set.id,log=log,xlab=xlab,ylab=ylab)
    legend('bottomright',legend=as.character(setlabels), pch=1:sets,col=1:sets, bty='n')
    return(dataset)
}
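
Since the result is a plain data frame with the columns x, y and lab, the usual R tools apply to it. A minimal sketch of what you might do next, using made-up numbers that stand in for the function output (they are not digitized from Gelhar et al.), and an arbitrary file name:

```r
# Made-up values standing in for the output of digitize.graph();
# these are NOT digitized from Gelhar et al. (1992).
dataset <- data.frame(
  x   = c(1.2, 15, 230, 4.5, 80, 1200),
  y   = c(0.03, 0.9, 12, 0.2, 5, 40),
  lab = rep(c('high', 'low'), each = 3)
)

# Number of digitized points per point set
n.per.set <- table(dataset$lab)
print(n.per.set)

# One data frame per point set, e.g. by.set$high
by.set <- split(dataset, dataset$lab)

# Save the points so the clicking does not have to be repeated
# (written to a temporary directory here; pick your own path)
out.file <- file.path(tempdir(), 'digitized_points.csv')
write.csv(dataset, out.file, row.names = FALSE)
```

Saving the result right away is worth the extra line, since redoing the clicking for a figure with many points is tedious.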
Posted by Bart Rogiers


  1. Nice code, but.... Digitizing image curves is MUCH easier to do with either the Shareware app DataThief or the freeware engauge (engauge.sorceforget.net). You'll also find that these tools are very good at selecting all points on a line automatically, saving a ton of your time :-)


    1. I agree, Carl, this is probably not the most efficient way to digitize plots, but if you're working in R all the time, it's just easy to have an option to do this there. I will check out those apps though! Thanks, Bart

  2. im2graph (http://www.im2graph.co.il) is a free digitizing software that converts graph-images to graph-data, that is to numbers.
    With fast and efficient image processing algorithms behind the scenes, im2graph effortlessly converts images to graphs with a click of a button. For more complex use cases, a highly tuned user-assisted GUI is available, enabling fast and effortless conversions from graph images to values. Using im2graph is intuitive, addictive and fun. You can even use im2graph to convert text images to X-Y locations.

    1. Thanks for the info! Seems like a very nice tool. I have been using engauge for quite some time now, but I'll give it a try!

