A brief foray into parallel processing with R

I’ve recently been dabbling with parallel processing in R and have found the foreach package to be a useful approach to increasing the efficiency of loops. To date, I haven’t had much need for these tools, but I’ve started working with large datasets that can be cumbersome to manage. My first introduction to parallel processing was somewhat intimidating since I am surprisingly naive about basic computer jargon – processors, CPUs, RAM, flux capacitors, etc. According to the CRAN task view, parallel processing became directly available in R beginning with version 2.14.0, and a quick look at the web page reveals an astounding number of packages that explicitly or implicitly provide parallel processing utilities.

In my early days of programming I made liberal use of for loops for repetitive tasks. Not until much later did I realize that for loops are incredibly inefficient at processing data in R. This is common knowledge among programmers, but I was completely unaware of these issues given my background in the environmental sciences. I had always assumed that my hardware was more than sufficient for any data analysis needs, regardless of poor programming techniques. After a few watershed moments I learned the error of my ways and started adopting more efficient coding techniques, e.g., vectorizing, apply functions, etc., in addition to parallel processing.
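
As a quick illustration of the difference (a toy sketch that isn’t part of the examples below; exact timings will vary by machine), compare an element-wise calculation written as a for loop and as a single vectorized call:

#toy comparison of a for loop and the equivalent vectorized call
x<-rnorm(1e6)

#loop with a preallocated output vector
system.time({
	out<-numeric(length(x))
	for(i in seq_along(x)) out[i]<-sqrt(abs(x[i]))
	})

#same result from a single vectorized call
system.time(out<-sqrt(abs(x)))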

A couple of months ago I started using the foreach package with for loops. To be honest, I think loops are unavoidable at times regardless of how efficient you are with programming. Two things struck me when I started using this package. First, I probably could have finished my dissertation about a year earlier had I been using parallel processing. Second, the functions are incredibly easy to use even if you don’t understand all of the nuances and jargon of computer speak. My intent with this post is to describe how the foreach package can be used to quickly transform traditional for loops to allow parallel processing. Needless to say, numerous tutorials covering this topic can be found with a quick Google search. I hope my contribution helps those with little or no experience in parallel processing to adopt some of these incredibly useful tools.

I’ll use a trivial example of a for loop to illustrate repeated execution of a simple task. For each of 10 iterations, we create a vector of one million (1e6) normally distributed random values, take its summary, and append the output to a list.

#number of iterations in the loop
iters<-10

#vector for appending output
ls<-vector('list',length=iters)

#start time
strt<-Sys.time()

#loop
for(i in 1:iters){

	#counter
	cat(i,'\n')

	to.ls<-rnorm(1e6)
	to.ls<-summary(to.ls)
	
	#export
	ls[[i]]<-to.ls
		
	}

#end time
print(Sys.time()-strt)
# Time difference of 2.944168 secs

The code executes quickly, so we don’t need to worry about computation time in this example. For fun, we can see how computation time increases with the number of iterations. I’ve repeated the above code with an increasing number of iterations, from 10 to 100 in steps of 10.

#iterations to time
iters<-seq(10,100,by=10)

#output time vector for  iteration sets
times<-numeric(length(iters))

#loop over iteration sets
for(val in 1:length(iters)){
	
	cat(val,' of ', length(iters),'\n')
	
	to.iter<-iters[val]
	
	#vector for appending output
	ls<-vector('list',length=to.iter)

	#start time
	strt<-Sys.time()

	#same for loop as before
	for(i in 1:to.iter){
	
		cat(i,'\n')
		
		to.ls<-rnorm(1e6)
		to.ls<-summary(to.ls)
		
		#export
		ls[[i]]<-to.ls
		
		}

	#end time
	times[val]<-Sys.time()-strt
	
	}

#plot the times
library(ggplot2)

to.plo<-data.frame(iters,times)
ggplot(to.plo,aes(x=iters,y=times)) + 
	geom_point() +
	geom_smooth() + 
	theme_bw() + 
	scale_x_continuous('No. of loop iterations') + 
	scale_y_continuous ('Time in seconds')
Fig: Processing time as a function of number of iterations for a simple loop.



The processing time increases linearly with the number of iterations. Again, processing time is not extensive for the above example. Suppose we wanted to run the example with ten thousand iterations. We can predict how long that would take based on the linear relationship between time and iterations.

#predict times
mod<-lm(times~iters)
predict(mod,newdata=data.frame(iters=1e4))/60
# 45.75964

This is all well and good if we want to wait around for 45 minutes. Running the loop in parallel would greatly decrease this time. I want to first illustrate the problem of running loops in sequence before I show how this can be done using the foreach package. If the above code is run with 1e4 iterations, a quick look at the performance metrics in the task manager (Windows 7 OS) gives you an idea of how hard your computer is working to process the code. My machine has eight processors and you can see that only a fraction of them are working while the script is running.

Fig: Resources used during sequential processing of a for loop.
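
As an aside, if you’re not sure how many processors your machine has, the parallel package that ships with R can tell you (on machines with hyper-threading the count includes logical cores):

#check the number of cores available to R
library(parallel)
detectCores()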



Running the code using foreach will make full use of the computer’s processors. Individual chunks of the loop are sent to each processor so that the entire process can be run in parallel rather than in sequence. That is, each processor gets a subset of the total number of iterations, i.e., iterations 1–100 go to processor one, iterations 101–200 go to processor two, etc. The output from each processor is then compiled after the iterations are completed. Here’s how to run the code with 1e4 iterations in parallel.

#import packages
library(foreach)
library(doParallel)
	
#number of iterations
iters<-1e4

#setup parallel backend to use 8 processors
cl<-makeCluster(8)
registerDoParallel(cl)

#start time
strt<-Sys.time()

#loop
ls<-foreach(icount(iters)) %dopar% {
	
	to.ls<-rnorm(1e6)
	to.ls<-summary(to.ls)
	to.ls
	
	}

print(Sys.time()-strt)
stopCluster(cl)

#Time difference of 10.00242 mins

Running the loop in parallel decreased the processing time roughly four-fold. Although the loop generally looks the same as the sequential version, several parts of the code have changed. First, we are using the foreach function rather than for to define our loop. The syntax for specifying the iterator is slightly different with foreach as well, i.e., icount(iters) tells the function to repeat the loop a given number of times based on the value assigned to iters. Additionally, the operator %dopar% specifies that the code is to be processed in parallel if a backend has been registered (using %do% will run the loop sequentially). The functions makeCluster and registerDoParallel from the doParallel package are used to create the parallel backend. Another important issue is the method for recombining the data after the chunks are processed. By default, foreach returns the output as a list, which we’ve saved to an object. The recombination method can be changed using the .combine argument. Also be aware that packages used in the evaluated expression must be included with the .packages argument.
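
As a minimal sketch of these two arguments (not part of the original example; it assumes the backend registered above is still active and uses mvrnorm from the MASS package purely for illustration), the output here is returned as a row-bound matrix rather than a list:

#return the output as a matrix with rbind and load MASS on each worker
out<-foreach(icount(10),.combine=rbind,.packages='MASS') %dopar% {

	#mvrnorm comes from MASS, so the package is listed in .packages
	samp<-mvrnorm(1e3,mu=c(0,0),Sigma=diag(2))
	colMeans(samp)

	}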

The processors should be working at full capacity if the loop is executed properly. Note the difference here compared to the first loop that was run in sequence.

Fig: Resources used during parallel processing of a for loop.



A few other issues are worth noting when using the foreach package. These are mainly issues I’ve encountered and I’m sure others could contribute to this list. The foreach package does not work well with all types of loops. I can’t say for certain what type of task works best, but I have found that functions that take a long time when run individually are generally handled very well. For example, I chose the above example to use a large number (1e6) of observations with the rnorm function. Interestingly, decreasing the number of observations and increasing the number of iterations may cause the processors to not run at maximum efficiency (try rnorm(100) with 1e5 iterations). I also haven’t had much success running repeated models in parallel. The functions work but the processors never seem to reach max efficiency. The system statistics should tip you off as to whether or not the functions are working.
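
Here’s the kind of comparison I have in mind (a rough sketch, again assuming a backend is registered; I haven’t reported timings since they will vary by machine), where many small tasks can end up slower in parallel because of the overhead of sending jobs to the workers:

#many small tasks run in parallel
strt<-Sys.time()
out<-foreach(icount(1e4)) %dopar% summary(rnorm(100))
print(Sys.time()-strt)

#the same tasks run sequentially with %do%
strt<-Sys.time()
out<-foreach(icount(1e4)) %do% summary(rnorm(100))
print(Sys.time()-strt)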

I also find it bothersome that monitoring progress is an issue with parallel loops. A simple call using cat to return the iteration in the console does not work with parallel loops. The most practical solution I’ve found is described here, which involves exporting information to a separate file that tells you how far the loop has progressed. Also, be very aware of your RAM when running processes in parallel. I’ve found that it’s incredibly easy to max out the memory, which not only causes the function to stop working correctly, but also makes your computer run like garbage. Finally, I’m a little concerned that I might be destroying my processors by running them at maximum capacity. The fan always runs at full blast, leading me to believe that critical meltdown is imminent. I’d be pleased to know if this is an issue or not.
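
A rough sketch of the file-based approach (each worker appends to the same file, so the lines won’t necessarily arrive in order):

#write progress to a log file from inside the parallel loop
ls<-foreach(i=1:iters) %dopar% {

	#append the current iteration to a log file; open the file to check progress
	cat(i,'\n',file='log.txt',append=TRUE)

	summary(rnorm(1e6))

	}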

That’s it for now. I have to give credit to this tutorial for a lot of the information in this post. There are many, many other approaches to parallel processing in R and I hope this post has been useful for describing a few of these simple tools.

Cheers,

Marcus

29 thoughts on “A brief foray into parallel processing with R”

  1. A brief foray into parallel processing with R |...

  2. Hey Marcus.

    Nice blog post as always. I started out with much the same experience as you in terms of using R for loops too much. Vectorizing solves a lot of the problems as you said, but sometimes it’s hard to vectorize. One addition to your blog post should be that Rcpp is a good way to increase efficiency and bring computation time down without resorting to using more processing cores. The gains in speed are often many, many times greater than what you get from adding more CPU power. See for example:
    http://www.r-bloggers.com/another-nice-rcpp-example/

    I think your blog post should include a mention of Rcpp.

    Cheers, Olav

    • Hi Olav, glad you liked the post. I’ve never used Rcpp since I have no experience with C++. The example you link is really impressive though. There is definitely merit in decreasing processing time w/o using additional cores.

  3. Hello—this is an excellent and helpful article.

    This example assumes the inputs to the loop are static…but what if they change for each iteration of the loop? How can you specify that in this framework? For instance, what if for each iteration of the loop you had to specify a new data date?

    Thank you.

    • Actually, I’ve figured it out.

      Rather than, say, “foreach(icount(iters))” you could use instead “foreach(i=1:iters)” where the input could vary with i-th loop.

      • Yep, that’s how you do it! I actually never use the icount function since it doesn’t jibe with my understanding of traditional loops….

  4. FYI, you can use the append-to-file arguments of the cat() function and avoid having to use sink(). For example, cat('iteration #', iter, '\n', file = 'txt.log', append = TRUE). Of course, the output will not be in order since the instances will depend on the individual jobs, but it is still useful as an approximation of iterations completed. I really do wish there was a better way of capturing progress when using foreach.
    I’d also suggest not using all of your CPU cores. Keep at least one available for general processing. I have a 4-core laptop, and I’ve found tying up all 4 is actually less efficient than running 3 in some cases.

    • I’ve never heard any definitive rules about tying up cores but I like your suggestion. I was always a bit unsure if using all was a good idea. Regarding cat(), I was unaware you could send the output to a file. I remember trying that option with the sink function but found the text file got really cluttered after several iterations. Depends on your preference I guess. Thanks for reading!

  5. Hi, thanks for your post! Could you add some counter to the %dopar%? I would like to know the progress of the process and couldn’t make it work with print.
    Thanks

    • Hi Gabriela,

      This isn’t a simple issue since the processes are assigned to individual cores. It’s like running different R sessions for each processor. I usually include the following within the loop:

      # log
      sink('log.txt')
      cat(i)
      sink()
      

      This outputs the progress for the separate processes into the same log file. It’s not ideal since you have to physically open the file to view the progress but it’s better than nothing. The file contents refresh automatically if you open it in RStudio, so it’s kind of like an automatic counter. Also see the comment above from skwalas for a simpler implementation using just cat.

      -Marcus

  6. Great post, thank you! I learned a lot. There was a moment of confusion over a typo for me. To save other inexperienced R programmers from the confusion, could you correct the typo?

    “The functions makeParallel and registerDoParallel from the doParallel package are used to create the parallel backend.”
    I believe should be
    “The functions makeCluster and registerDoParallel from the doParallel package are used to create the parallel backend.”

    Thank you.

  7. Hello, thanks for the helpful article.
    FYI, I suspect that your CPU has 4 physical cores with 2 logical cores each. It’s called hyper-threading, mainly to decrease CPU waiting time. The reason I say this is that by using 8 threads, you decrease your processing time four-fold.
    For multi-thread processing, it’s best to use the number of physical cores, from my experience in other languages, but I’m not sure.

    • I was thinking exactly that: I’ve had the same issue on my Intel Core i7 CPU when running heavy multi-core loads (not in R though). It has four physical cores each of which can operate two “spins”, giving a total of eight (virtual) cores. As I understand it the only performance gain is at low to medium loading of each core: as the load increases the performance doesn’t scale linearly and the maximum one can get is essentially four cores operating flat-out (which is a considerable improvement on just the one).

  8. Weekly links round-up: 21/03/2015 | BES Quantitative Ecology Blog

  9. Thanks for your post. It is helpful for me. I have one question: is it allowed to use two foreach loops? That is, the first one calls the function, and inside the function there is a second foreach:

    Label<-foreach(i=1:10, .combine=rbind) %dopar% {
      bh_classification(Data)
    }

    bh_classification<-function(Data){
      lab<-foreach(i=1:5, .combine=rbind) %dopar% {
        Data%*%Data
      }
      lab
    }

    • Hi Bharath,

      The first option is probably the better approach but I have had some success putting foreach in functions. The problem is you’ll need to setup the parallel backend as well, which kind of defeats the purpose of putting foreach in a function. You could also do this within the function but this seems to make it unnecessarily complicated.
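
      Something like this rough sketch is what I mean by setting up the backend within the function itself (hypothetical, just to show the structure; it assumes Data is a square matrix as in your example):

      bh_classification<-function(Data,iters=5){
        #register a backend inside the function and clean it up on exit
        cl<-makeCluster(2)
        registerDoParallel(cl)
        on.exit(stopCluster(cl))
        foreach(i=1:iters,.combine=rbind) %dopar% {
          Data%*%Data
          }
        }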

      -Marcus

  10. “for loops are incredibly inefficient at processing data”

    Well, that’s true in R, which has to interpret for loops, whereas IDL, Fortran, etc. do not (which makes the practice practical).

    One of the biggest failures of R is its reliance on its C++ heritage, which makes it poor unless you vectorize *everything*.

    Nice post though and thank you for the tips.

