Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing httr::write_disk to actually create files #121

Open
davidorme opened this issue May 11, 2023 · 14 comments
Open

Allowing httr::write_disk to actually create files #121

davidorme opened this issue May 11, 2023 · 14 comments

Comments

@davidorme
Copy link

Session Info
r$> devtools::session_info() 
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22)
 os       macOS Monterey 12.1
 system   x86_64, darwin21.3.0
 ui       unknown
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/London
 date     2023-05-11
 pandoc   2.11.1.1 @ /usr/local/bin/pandocPackages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package     * version    date (UTC) lib source
   base64enc     0.1-3      2015-07-28 [1] CRAN (R 4.2.0)
   brio          1.1.3      2021-11-30 [1] CRAN (R 4.2.0)
   cachem        1.0.6      2021-08-19 [1] CRAN (R 4.2.0)
   callr         3.7.0      2021-04-20 [1] CRAN (R 4.2.0)
   cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.2.0)
   chron         2.3-57     2022-05-30 [1] CRAN (R 4.2.0)
   class         7.3-20     2022-01-16 [2] CRAN (R 4.2.0)
   classInt      0.4-7      2022-06-10 [1] CRAN (R 4.2.0)
   cli           3.4.1      2022-09-23 [1] CRAN (R 4.2.0)
   crayon        1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
   crul          1.3        2022-09-03 [1] CRAN (R 4.2.0)
   curl          4.3.2      2021-06-23 [1] CRAN (R 4.2.0)
   DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
   desc          1.4.1      2022-03-06 [1] CRAN (R 4.2.0)
   devtools      2.4.3      2021-11-30 [1] CRAN (R 4.2.0)
   dplyr         1.0.10     2022-09-01 [1] CRAN (R 4.2.0)
   e1071         1.7-11     2022-06-07 [1] CRAN (R 4.2.0)
   ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
   fansi         1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
   fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
   fauxpas       0.5.2      2023-05-03 [1] CRAN (R 4.2.0)
   fs            1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
   generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
   glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
   httpcode      0.3.0      2020-04-10 [1] CRAN (R 4.2.0)
   httr        * 1.4.3      2022-05-04 [1] CRAN (R 4.2.0)
   igraph        1.3.4      2022-07-19 [1] CRAN (R 4.2.0)
   jsonlite      1.8.0      2022-02-22 [1] CRAN (R 4.2.0)
   KernSmooth    2.23-20    2021-05-03 [2] CRAN (R 4.2.0)
   lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
   magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
   memoise       2.0.1      2021-11-26 [1] CRAN (R 4.2.0)
   pillar        1.7.0      2022-02-01 [1] CRAN (R 4.2.0)
   pkgbuild      1.3.1      2021-12-20 [1] CRAN (R 4.2.0)
   pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
   pkgload       1.2.4      2021-11-30 [1] CRAN (R 4.2.0)
   prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.2.0)
   processx      3.5.3      2022-03-25 [1] CRAN (R 4.2.0)
   proxy         0.4-27     2022-06-09 [1] CRAN (R 4.2.0)
   ps            1.7.0      2022-04-23 [1] CRAN (R 4.2.0)
   purrr         0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
   R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
   Rcpp          1.0.9      2022-07-08 [1] CRAN (R 4.2.0)
   readxl        1.4.0      2022-03-28 [1] CRAN (R 4.2.0)
   remotes       2.4.2      2021-11-30 [1] CRAN (R 4.2.0)
   rlang         1.0.6      2022-09-24 [1] CRAN (R 4.2.0)
   rprojroot     2.0.3      2022-04-02 [1] CRAN (R 4.2.0)
   rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.2.0)
 P safedata    * 1.1.2.9000 2023-05-03 [?] load_all()
   sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
   sf            1.0-11     2023-03-15 [1] CRAN (R 4.2.0)
   testthat    * 3.1.4      2022-04-26 [1] CRAN (R 4.2.0)
   tibble        3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
   tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
   triebeard     0.3.0      2016-08-04 [1] CRAN (R 4.2.0)
   units         0.8-0      2022-02-05 [1] CRAN (R 4.2.0)
   urltools      1.7.3      2019-04-14 [1] CRAN (R 4.2.0)
   usethis       2.1.5      2021-12-09 [1] CRAN (R 4.2.0)
   utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
   vctrs         0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
   webmockr    * 0.9.0      2023-02-28 [1] CRAN (R 4.2.0)
   whisker       0.4        2019-08-28 [1] CRAN (R 4.2.0)
   withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)

 [1] /usr/local/lib/R/4.2/site-library
 [2] /usr/local/Cellar/r/4.2.0/lib/R/library

 P ── Loaded and on-disk path mismatch.

I'm trying to use webmockr to mock an API for my safedata package, to allow code examples and vignette building in packaging to run without requiring a running server providing the API.

The issue I'm having is that I'd like to have request stub actually create a file from the response body. That is how httr::GET with write_disk works without mocking - the response body is written to the named file. However, when the request is stubbed then I think write_disk is ignored and replaced with whatever is set in the to_return body, which could be an existing file containing the response or some other in memory response. Or at least I think that is how it works (see R code below).

I can see why this is used - there's no duplication of content and running httr:content on the response object gets you the body - but in my case I want to use the request to build up a local file structure in the same way that it would if I wasn't mocking the API.

Is there a way to get webmockr to provide a response body from a file (or directly!) and allow httr to write it out to a new file?

r$> ## make a temp file
r$> f <- tempfile(fileext = ".json")
r$> ## write something to the file
r$> cat("{\"hello\":\"world\"}\n", file = f)
r$> httr_mock()
r$> stub_request("get", "https://httpbin.org/get") %>% 
            to_return(body = file(f), 
             headers = list('content-type' = "application/json"))
r$> # Stubbed request ignores write disk and points to body file
r$> out <- GET("https://httpbin.org/get", write_disk('new_file.txt'))
r$> out
Response [https://httpbin.org/get]
  Date: 2023-05-11 23:25
  Status: 200
  Content-Type: application/json
  Size: 18 B
<ON DISK>  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//RtmpKOpQFB/filec99b220f73af.json
r$> file.exists('new_file.txt')
[1] FALSE
r$> stub_request("get", "https://httpbin.org/get2") %>% 
            to_return(body = readLines(f), 
             headers = list('content-type' = "application/json"))
r$> # Again, write_disk is ignored and in memory body is used
r$> out <- GET("https://httpbin.org/get2", write_disk('new_file2.txt'))
r$> out
Response [https://httpbin.org/get2]
  Date: 2023-05-11 23:49
  Status: 200
  Content-Type: application/json
  Size: 17 B
r$> file.exists('new_file2.txt')
[1] FALSE
r$> httr::content(out)
$hello
[1] "world"
r$> # Turn off mocking
r$> httr_mock(FALSE)
r$> # 'Normal' usage writes the content to the named file.
r$> out <- GET("https://httpbin.org/get", write_disk('new_file.txt'))
r$> out
Response [https://httpbin.org/get]
  Date: 2023-05-11 22:25
  Status: 200
  Content-Type: application/json
  Size: 367 B
<ON DISK>  new_file.txt
r$> file.exists('new_file.txt')
[1] TRUE
@sckott
Copy link
Collaborator

sckott commented May 12, 2023

Thanks for the issue! I'll take a look and get back to you soon

@davidorme
Copy link
Author

davidorme commented May 12, 2023

It may well be out of scope but it wasn't obvious that it wouldn't work. I'm not trying to record an existing API - I can see that vcr would be the right tool for the job - just enforce that a particular URL will result in content from a local file being passed in to httr::GET and then into whichever output option is set there.

@sckott
Copy link
Collaborator

sckott commented May 15, 2023

I assume you saw ?mocking-disk-writing given your examples?

See also #57 and ropensci/vcr#81

I think the issue is that for this to work with webmockr the file has to exist already. In your example, you're creating that tempfile f, but then trying to use a completely new file in write_disk that doesn't exist yet.

See also the function mock_file(), which might be helpful.

@davidorme
Copy link
Author

davidorme commented May 16, 2023

If I understand the webmockr design correctly:

  • in un-mocked use, httr::GET takes the body content from the URL and puts that into the file specified in write_disk, which can be a completely new file.
  • In mocked use, the webmockr::HttrAdapter takes the content from a file on disk but when GET is used with write_disk substitutes the path to that file for any named file. That means that the httr::content can retrieve the content from the response object without duplicating the file. But that mechanism does preclude creating a duplicate file in a new location.
  • In my specific use case, I don't just want to mock the httr::content but also create the file when using httr::GET.

Is that right? My use case seems like it might be outside the package design and scope. It would either need an extra argument (actual_outfile) to be passed to HttrAdapter or possibly a package setting (actually_duplicate) to modify the behaviour and that seems like a clunky edge case.

@sckott
Copy link
Collaborator

sckott commented May 16, 2023

  • Correct on httr w/o mocking
  • I think that's right. You can always just use overwrite=TRUE in write_disk
  • Right, that's clear

I don't think it's outside what webmockr can do. Can you try the below example that uses mock_file, i think that satisfies your use case of creating a new file during the get request:

library(httr)
f <- tempfile(fileext = ".json")
file.exists(f)
stub_request("get", "https://hb.opencpu.org/get") %>% 
  to_return(
    body = mock_file(path = f, payload = "{\"foo\": \"bar\"}"),
    headers = list('content-type' = "application/json")
  )
file.exists(f)
out <- GET("https://hb.opencpu.org/get", write_disk(f))
out
file.exists(f)
out$content
readLines(out$content)
content(out, "text", encoding = "UTF-8")

See how file.exists is FALSE until after the GET call is made

@davidorme
Copy link
Author

That's really useful - I hadn't quite got that mechanism straight in my head. I was trying to avoid having to hold the file contents in memory in the stub registry, but the payloads aren't that big!

The thing that is more of an issue is that I didn't mention that some of my payloads are binary files. The mock_file docs only mention character content and - tracing mock_file handling back into the Adapter class - cat is used to write the content. I guess mock_file could be extended to detect the content type (or rely on headers) and write the content appropriately but that definitely looks like new functionality.

@sckott
Copy link
Collaborator

sckott commented May 17, 2023

I think i could support binary too. Would adding that be the last thing you need for this to be useable?

@davidorme
Copy link
Author

Yes - I think that would do it. For the moment, I've got a solution working using a really basic httr callback, but being able to use the full range of wi_th and to_return would be great.

A more general comment is that I found it hard to understand where the content of a response is coming from / going to. If the use case is simply to get the content itself - which is probably 99% of the time - then that doesn't matter, but if did for me. The basic pattern seems to be that the type of thebody argument to to_return always supersedes any setting of httr::write_disk or httr::write_memory.

  • If a source filename or a file connection is passed in, then the content is always a pointer to the source file.

    Incidentally, if the filename is a string, it isn't classed as an httr::path so the content is the path name and doesn't show as <ON DISK> in print(response). Because fromJSON checks if the txt argument is a filename, the actual content does then get read in. It seems like it would be more consistent if the filename was converted to a path object in the response and then httr::content would load the actual content and pass that on to the parser. I can imagine that might be something where different parser behaviour makes it easier to pass the filename around rather than the content though.

  • If character data is passed in, the response content is always in memory, even when write_disk is used.

  • If a mock_file is passed in then the content is always read into the registry and then is always written to the file path used in mock_file even when write_memory is requested.

I'm not quite sure what I'm asking for here but I found some of those outcomes surprising. One approach might be something like a separation in to_return between body (which sets the content) and destination, which could be as.is, or one of the httr::write_ methods to replace the existing arguments to GET (or an equivalent for crul?).

library(httr)
library(webmockr)
enable()


mock_file <- tempfile(fileext = ".json")
out_file <- tempfile(fileext = ".json")
source_file <- tempfile(fileext = ".json")
cat("{\"foo\": \"bar\"}", file = source_file)

# 4 x 2 combinations of possible to_return bodies and httr output settings
bodies <- list(
    "source_file" = source_file,
    "source_path" = source_file,
    "char" = "{\"foo\": \"bar\"}",
    "mock_file" = mock_file(path = mock_file, payload = "{\"foo\": \"bar\"}")
)

httr_out <- list(
    "mem" = write_memory(),
    "disk" = write_disk(out_file)
)

# filename/role lookup
files <- list("mock_file", "out_file", "source_file")
names(files) <- c(mock_file, out_file, source_file)

stub_registry_clear()

# Loop over combinations
for (idxb in seq_along(bodies)) {
    for (idxo in seq_along(httr_out)) {
        # Get inputs
        bod <- bodies[idxb]
        out <- httr_out[idxo]

        if (names(bod) == "source_file") {
            body <- file(bod[[names(bod)]])
        } else {
            body <- bod[[names(bod)]]
        }
        # Create stub with requested body
        stub <- stub_request("get", "https://hb.opencpu.org/get") %>%
            to_return(
                body = body,
                headers = list("content-type" = "application/json")
            )
        # GET response with requested output
        response <- GET("https://hb.opencpu.org/get", out[[names(out)]])

        # Report
        content_is_httr_path <- inherits(response$content, "path")
        if (content_is_httr_path) {
            actual_content <- response$content
        } else {
            actual_content <- content(response, "text", encoding = "UTF-8")
        }

        cat(
            "body: ", names(bod),
            "\nhttr requested output: ", names(out),
            "\nroundtrip success: ",
            all.equal(content(response), list("foo" = "bar")),
            "\ncontent is httr path: ", content_is_httr_path,
            "\nactual content: ", actual_content,
            "\nwhich path: ", files[[actual_content]],
            "\n-------------------\n"
        )

        remove_request_stub(stub)
    }
}

body:  source_file 
httr requested output:  mem 
roundtrip success:  TRUE 
content is httr path:  TRUE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json 
which path:  source_file 
-------------------
body:  source_file 
httr requested output:  disk 
roundtrip success:  TRUE 
content is httr path:  TRUE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json 
which path:  source_file 
-------------------
body:  source_path 
httr requested output:  mem 
roundtrip success:  TRUE 
content is httr path:  FALSE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json 
which path:  source_file 
-------------------
body:  source_path 
httr requested output:  disk 
roundtrip success:  TRUE 
content is httr path:  FALSE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef55676e491a7.json 
which path:  source_file 
-------------------
body:  char 
httr requested output:  mem 
roundtrip success:  TRUE 
content is httr path:  FALSE 
actual content:  {"foo": "bar"} 
which path:  
-------------------
body:  char 
httr requested output:  disk 
roundtrip success:  TRUE 
content is httr path:  FALSE 
actual content:  {"foo": "bar"} 
which path:  
-------------------
body:  mock_file 
httr requested output:  mem 
roundtrip success:  TRUE 
content is httr path:  TRUE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef5566e9582e8.json 
which path:  mock_file 
-------------------
body:  mock_file 
httr requested output:  disk 
roundtrip success:  TRUE 
content is httr path:  TRUE 
actual content:  /var/folders/62/cz0ctby96lq7ggnytfbtd__40000gn/T//Rtmp1wVdDX/filef5566e9582e8.json 
which path:  mock_file 
-------------------

@sckott
Copy link
Collaborator

sckott commented May 19, 2023

Okay, I can make a change for binary support

@sckott
Copy link
Collaborator

sckott commented May 19, 2023

If a mock_file is passed in then the content is always read into the registry and then is always written to the file path used in mock_file even when write_memory is requested.

hmm, mock_file is only meant to help with mocking write_disk and equivalent in crul, so its not surprising that the behavior when using mock_file with write_memory is not what you'd expect - i'm not sure I want to make any changes along these lines

You're right there is a discrepancy between what body is returned and what you request httr to do wrt disk/memory. The issue is that we need to tell webmockr what to do explicitly - the http client package (httr, crul) does not and should not determine what is returned in the mocked response other than indirectly - we grab some things from the request object to put into the response.

In this thread the focus is on mocking the response body when the http client library writes to disk. Unfortunately, the Ruby library webmock this is based off does not handle this use case of an http library writing directly to disk. It just doesn't happen in Ruby. They make you do an http request, then write to disk if you want to do that. So there's no nicely paved route for how to handle this.

@davidorme
Copy link
Author

davidorme commented May 22, 2023

Many thanks.

Interesting that Ruby forces you to save to disk manually - there's a lot that can go wrong there with content-type and encodings! On the general comment - which is way outside this issue - I think the model I have in my head would be to separate body from destination in creating a stub_request. I think there are four options for destination:

  • respect_call (whatever the actual call to httr or crul set up),
  • divert_to_memory (forces the response body into memory),
  • divert_to_disk (forces the response body into a named file, which might be different from an existing write_disk),
  • load_from_source (which only applies to file bodies and points the response object to the original file to avoid duplicating files).

@sckott
Copy link
Collaborator

sckott commented May 23, 2023

Interesting that Ruby forces you to save to disk manually - there's a lot that can go wrong there with content-type and encodings!

Yeah, but I think actually R may be the outlier in this case - httr and friends in R definitely make things simpler for users compared to other langs.

I think there are four options for destination

I'll have a think about it

@sckott
Copy link
Collaborator

sckott commented May 27, 2023

Okay, I can make a change for binary support

sorry about the delay on this, working on integrating httr2 here and in vcr, and it's rather complicated

@davidorme
Copy link
Author

@sckott I have a home rolled solution (with much reduced functionality and robustness) so safedata is working well with a proof of concept mocking of URLs. And I'm on leave for a fortnight, so there's absolutely no rush!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants