R - multicore approach to extract raster values using spatial points
It might look like this question is a duplicate, but I'm asking about data extraction by points, rather than by polygons, and I couldn't find hints about point extraction. Therefore, please bear with me.
I am working with a massive number of climate files whose values need to be extracted and converted to data frames in order to serve as a database for a Shiny application.
Specifically, I need to extract daily temperature values from a raster stack for 655 locations (points) and store those values in a data frame. Given the amount of data, I estimate the final data frame to be around 30 GB in size.
Since this is such a large amount of data, I am looking for a multicore (parallel) approach that takes advantage of a quad-core (8-thread) system. I am also considering processing these data on an Amazon EC2 instance, so I need to speed up the code as much as I can to avoid wasting time (i.e. money).
I know that the raster package provides the multicore function beginCluster that might help me, but according to the documentation it only works when extracting data using polygons, not points.
Please take a look at the code that reproduces my procedure using a simplified version of my raster stack:
library(raster)
# Create date sequence
idx <- seq(as.Date("2010/1/1"), as.Date("2099/12/31"), by = "day")
# Create raster stack and assign dates
# WARNING: raster stack will be ~ 400 MB in size
r <- raster(ncol=5, nrow=5)
s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
s <- setZ(s, idx)
# Create random spatial points
pts <- SpatialPoints(cbind(x=runif(655, -180, 180),
y=runif(655, -90, 90)),
proj4string=CRS(projection(s)))
# Extract values to a data frame
e.df <- extract(s, pts, df=TRUE)
# Fix resulting data frame
DF <- data.frame(t(e.df[-1])); row.names(DF)=NULL
DF <- cbind(getZ(s), DF)
names(DF)[1] <- "Date";
names(DF)[2:length(names(DF))] <- paste0('Point ', 1:length(pts))
In the code above, the slowest part of the process is clearly the data extraction itself.
Is there any way to improve the speed of this data extraction procedure?
Any way to implement it under a multicore function?
raster r point extract parallel-processing
1
I don't see where your 30Gb comes from. 89 years x 365 days x 655 stations is 21 million. You need to be handling about 1000 bytes for each of those units to get 30Gb, where one temperature measurement is 8 bytes. 160-250Mb maybe, which is easily stored in RAM.
– Spacedman
Aug 29 '17 at 10:10
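A quick back-of-envelope check of this size estimate (a sketch of my own; the day count assumes the 2010–2099 range from the question):

```r
# Rough size of the final table: one 8-byte double per day per point
n_days <- as.integer(as.Date("2099-12-31") - as.Date("2010-01-01")) + 1  # 32872 days
n_pts  <- 655
n_vals <- n_days * n_pts            # ~21.5 million values
n_vals * 8 / 1024^2                 # ~164 MB as doubles, nowhere near 30 GB
```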
1
This is "trivially parallel" over layers in your raster stack and over points in your spatial points. Use functions from the parallel package.
– Spacedman
Aug 29 '17 at 10:15
@Spacedman my actual raster stack is 100x95 at 0.5 deg resolution with 54750 "layers" (i.e. time slices). Also, I have 32 files like that that I need to extract data from (different climate scenarios and models). I just thought it was too big of an object for users to reproduce here.
– thiagoveloso
Aug 29 '17 at 10:46
@Spacedman I would also appreciate any more detailed suggestion regarding the parallel package. Not very familiar with multicore processing here...
– thiagoveloso
Aug 29 '17 at 10:48
1
Suggest you find some R parallel tutorials, things like this cran.r-project.org/web/packages/doParallel/vignettes/… might get you started.
– Spacedman
Aug 29 '17 at 10:55
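The suggestion in the comments can be sketched with the parallel package, parallelizing extract over the layers of the stack. This is a minimal example under the question's setup (shortened to one year); note that mclapply relies on forked workers, so with mc.cores > 1 it only parallelizes on Unix-like systems:

```r
library(raster)
library(parallel)

# Toy stack and points, as in the question (shortened to one year)
idx <- seq(as.Date("2010/1/1"), as.Date("2010/12/31"), by = "day")
r   <- raster(ncol = 5, nrow = 5)
s   <- stack(lapply(seq_along(idx), function(x) setValues(r, runif(ncell(r)))))
pts <- SpatialPoints(cbind(x = runif(655, -180, 180),
                           y = runif(655, -90, 90)),
                     proj4string = CRS(projection(s)))

# Extract point values layer-by-layer across multiple cores
vals  <- mclapply(unstack(s), raster::extract, y = pts,
                  mc.cores = max(1, detectCores() - 1))
e.mat <- do.call(rbind, vals)   # one row per day, one column per point
```

On Windows, a socket cluster (makeCluster/parLapply) achieves the same effect.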
edited Aug 29 '17 at 21:02 by PolyGeo♦
asked Aug 29 '17 at 8:56 by thiagoveloso
1 Answer
I ended up using an approach based on the snowfall package. It is quite simple, works really well, and the point extraction speeds up roughly in proportion to the number of cores you can use. The approach I used was inspired by this post, and here is my reproducible example:
library(raster)
library(snowfall)
# Create date sequence
idx <- seq(as.Date("2010/1/1"), as.Date("2099/12/31"), by = "day")
# Create raster stack and assign dates
# WARNING: raster stack will be ~ 400 MB in size
r <- raster(ncol=5, nrow=5)
s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
s <- setZ(s, idx)
# Create random spatial points
pts <- SpatialPoints(cbind(x=runif(655, -180, 180),
y=runif(655, -90, 90)),
proj4string=CRS(projection(s)))
# Extract values to a data frame - multicore approach
# First, convert raster stack to list of single raster layers
s.list <- unstack(s)
names(s.list) <- names(s)
# Now, create an R cluster using all the machine cores minus one
sfInit(parallel=TRUE, cpus=parallel::detectCores()-1)
# Load the required packages inside the cluster
sfLibrary(raster)
sfLibrary(sp)
# Run parallelized 'extract' function and stop cluster
e.df <- sfSapply(s.list, extract, y=pts)
sfStop()
# Fix resulting data frame
DF <- data.frame(t(e.df)); row.names(DF)=NULL
DF <- cbind(getZ(s), DF)
names(DF)[1] <- "Date";
names(DF)[2:length(names(DF))] <- paste0('Point ', 1:length(pts))
# Check resulting data frame
head(DF)
I hope it can serve as a future reference for people interested in doing the same.
answered Sep 6 '17 at 20:59 by thiagoveloso
Thanks for contributing an answer to Geographic Information Systems Stack Exchange!