Description
Hello,
We have detected that plumber does not release memory after a request finishes. As far as we understand, there are two possible memory problems:
- The first is when R has released the memory but the process memory footprint doesn't go down because the allocator doesn't return unused pages to the OS.
- The second one is when R memory usage is growing because some objects are not released in a specific case.
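To tell these two cases apart, it can help to compare what R reports as used with the resident size of the process. The following is only a minimal sketch: `memory_snapshot()` is a hypothetical helper (not part of plumber), and reading `/proc/self/status` assumes a Linux host; on other systems the RSS value is simply `NA`.

```r
# Hypothetical helper: report the R heap usage and, on Linux, the process RSS (both in MB).
memory_snapshot <- function() {
  g <- gc()                      # runs a collection and returns a usage matrix
  r_heap_mb <- sum(g[, 2])       # column 2 holds the "used" amounts in Mb (Ncells + Vcells)

  rss_mb <- NA_real_
  status_file <- "/proc/self/status"   # assumption: Linux procfs is available
  if (file.exists(status_file)) {
    line <- grep("^VmRSS:", readLines(status_file), value = TRUE)
    if (length(line) == 1) {
      rss_kb <- as.numeric(gsub("[^0-9]", "", line))
      rss_mb <- rss_kb / 1024
    }
  }
  c(r_heap_mb = r_heap_mb, rss_mb = rss_mb)
}

before <- memory_snapshot()
x <- numeric(5e6)                # roughly 40 MB of doubles
during <- memory_snapshot()
rm(x)
after <- memory_snapshot()

print(rbind(before, during, after))
```

If `r_heap_mb` falls back after `rm()` while `rss_mb` stays high, the situation looks like the first problem (allocator not returning pages). If `r_heap_mb` itself keeps growing between requests, it looks like the second.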
We solved the first problem by implementing the solution proposed in this comment https://github.com/rstudio/plumber/issues/496#issuecomment-541402503
We tried to solve the second problem by adding a postserialize hook that runs gc(). This made a significant difference, because it frees the objects used by the API. However, the memory consumed by the file being delivered to the user is not released, because the postserialize hook runs before delivery of the response to the user has completed.
We noticed this problem because, in our case, there is always an active process serving the API requests. If the process were not always active, this would not be a concern.
Below is the code we used to reproduce the problem, including the postserialize workaround.
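The limitation can be illustrated outside plumber. This is only a sketch in base R, under the assumption that the serialized response is still referenced somewhere while it is being sent: gc() cannot reclaim an object that is still reachable, no matter how it is called. The names `used_mb` and `body` are hypothetical stand-ins, not plumber internals.

```r
# Hypothetical helper: MB currently used by R objects, measured after a collection.
used_mb <- function() sum(gc()[, 2])

baseline <- used_mb()

body <- raw(50 * 1024^2)   # stand-in for a 50 MB serialized response body
held <- used_mb()          # gc() ran here, but `body` is still referenced

rm(body)                   # drop the last reference
released <- used_mb()      # now the collector can reclaim the memory

cat(sprintf(
  "while referenced: +%.1f MB, after rm(): +%.1f MB\n",
  held - baseline, released - baseline
))
```

In the same way, a gc() triggered from the hook can only free the response body once nothing holds a reference to it anymore.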
library(plumber)
library(readr)

#* @apiTitle Plumber Test Memory Leak

#* Download data
#* @get /download_data
#* @serializer csv
function() {
  # Build a large data frame (1e8 rows, 5 columns) to make the memory usage visible
  data <- data.frame(
    COLUMN_1 = rep("COLUMN_1", 1e+8),
    COLUMN_2 = rep("COLUMN_2", 1e+8),
    COLUMN_3 = rep("COLUMN_3", 1e+8),
    COLUMN_4 = rep("COLUMN_4", 1e+8),
    COLUMN_5 = rep("COLUMN_5", 1e+8)
  )
  return(data)
}

#* @plumber
function(pr) {
  pr %>%
    pr_hook("postserialize", function(req) {
      # Skip the root and OpenAPI spec routes
      if (req$PATH_INFO != "/" && req$PATH_INFO != "/openapi.json") {
        # Schedule a full garbage collection on the next event-loop iteration
        later::later(function() {
          gc(reset = TRUE, full = TRUE)
        }, delay = 0)
      }
    })
}
I think this topic becomes relevant when we talk about APIs in a production environment with high demand (hosted in Posit Connect, for example).
Regards,