Error in get_inat_obs for taxa having large number of records #27

Open
Cactusolo opened this issue Apr 5, 2019 · 2 comments

@Cactusolo

The following code gives an error:

test <- get_inat_obs(taxon_name = "Tracheophyta",maxresults = 100)
Error in get_inat_obs(taxon_name = "Tracheophyta", maxresults = 100) : 
  Your search returned too many results, please consider breaking it up into smaller chunks by year or month

But both of the following produce the desired results:

test <- get_inat_obs(taxon_name = "Kirkiaceae",maxresults = 100)
test <- get_inat_obs(taxon_name = "Danaus",maxresults = 100)
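
The error message suggests splitting the search by year or month. A minimal sketch of that workaround follows, assuming the `year` and `month` arguments of get_inat_obs(). The helper name `get_obs_by_month` and its `fetch` argument are hypothetical and not part of rinat. For a taxon as large as Tracheophyta a single month may still exceed the limit, so further narrowing could be needed.

```r
# Hypothetical helper (not part of rinat): fetch one year, month by month.
# `fetch` defaults to rinat::get_inat_obs and can be swapped out for testing.
get_obs_by_month <- function(taxon_name, year, months = 1:12,
                             maxresults = 100,
                             fetch = rinat::get_inat_obs) {
  chunks <- lapply(months, function(m) {
    tryCatch(
      fetch(taxon_name = taxon_name, year = year, month = m,
            maxresults = maxresults),
      error = function(e) {
        # get_inat_obs() stops on zero results and on too many results;
        # report the month and carry on with the others.
        message("month ", m, " skipped: ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop skipped months and stack the rest into one data frame.
  do.call(rbind, Filter(Negate(is.null), chunks))
}
```

A call such as `get_obs_by_month("Kirkiaceae", year = 2018)` would then issue up to twelve requests. Note that `maxresults` applies to each month separately in this sketch.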
@LDalby
Contributor

LDalby commented May 28, 2019

I also get this error. In the Tracheophyta case it seems like you don't get the error if the query argument is used instead of taxon_name. The odd thing is that for Monarch butterflies both types of search work, and only 100 results are returned (maxresults = 100 is the default).

Hmm, not sure why this is...

library(rinat)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang

test <- get_inat_obs(query = "Tracheophyta")
nrow(test)
#> [1] 100
test2 <- get_inat_obs(taxon_name = "Tracheophyta")
#> Error in get_inat_obs(taxon_name = "Tracheophyta"): Your search returned too many results, please consider breaking it up into smaller chunks by year or month

butterflies <- get_inat_obs(query = "Monarch Butterfly")
nrow(butterflies)
#> [1] 100
butterflies2 <- get_inat_obs(taxon_name = "Danaus plexippus")
nrow(butterflies2)
#> [1] 100

Created on 2019-05-28 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.5        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Copenhagen           
#>  date     2019-05-28                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.4   2019-04-10 [1] CRAN (R 3.6.0)
#>  callr         3.2.0   2019-03-15 [1] CRAN (R 3.6.0)
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
#>  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
#>  curl          3.3     2019-01-10 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.0.2   2019-04-08 [1] CRAN (R 3.6.0)
#>  digest        0.6.19  2019-05-20 [1] CRAN (R 3.6.0)
#>  dplyr         0.8.1   2019-05-14 [1] CRAN (R 3.6.0)
#>  evaluate      0.13    2019-02-12 [1] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
#>  ggplot2       3.1.1   2019-04-07 [1] CRAN (R 3.6.0)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
#>  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.6.0)
#>  httr          1.4.0   2018-12-11 [1] CRAN (R 3.6.0)
#>  jsonlite      1.6     2018-12-07 [1] CRAN (R 3.6.0)
#>  knitr         1.23    2019-05-18 [1] CRAN (R 3.6.0)
#>  lazyeval      0.2.2   2019-03-15 [1] CRAN (R 3.6.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
#>  maps          3.3.0   2018-04-03 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.0)
#>  pillar        1.4.0   2019-05-11 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.6.0)
#>  pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
#>  plyr          1.8.4   2016-06-08 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.3.1   2019-05-08 [1] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
#>  purrr         0.3.2   2019-03-15 [1] CRAN (R 3.6.0)
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.6.0)
#>  remotes       2.0.4   2019-04-10 [1] CRAN (R 3.6.0)
#>  rinat       * 0.1.5   2017-03-10 [1] CRAN (R 3.6.0)
#>  rlang         0.3.4   2019-04-07 [1] CRAN (R 3.6.0)
#>  rmarkdown     1.13    2019-05-22 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
#>  scales        1.0.0   2018-08-09 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.1.1   2019-04-23 [1] CRAN (R 3.6.0)
#>  tibble        2.1.1   2019-03-16 [1] CRAN (R 3.6.0)
#>  tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.6.0)
#>  usethis       1.5.0   2019-04-07 [1] CRAN (R 3.6.0)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.7     2019-05-14 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

@stragu
Collaborator

stragu commented Jun 17, 2020

Doing a query of "Tracheophyta" will match all the records that contain the term. In my test, I currently get a total of 67,629 entries (for the x-total-entries header), below the current limit of 200,000 that's coded into get_inat_obs(). You can test that with this query: http://www.inaturalist.org/observations.json?&q=Tracheophyta&per_page=1&page=1

Doing a taxon_name search for Tracheophyta will match Tracheophyta as well as all its descendant taxa (more than 300,000 species of vascular plants have been described). This detail of the API is mentioned here: https://www.inaturalist.org/pages/api+reference#get-observations
Testing the taxon search for Tracheophyta gives me an x-total-entries header value of 18,863,802, well above the 200,000 limit. You can try it with this query: http://www.inaturalist.org/observations.json?&taxon_name=Tracheophyta&per_page=1&page=1

Monarch butterfly and Danaus plexippus are both searches that will get results below the limit. I get 96,046 for the taxon search, for example.
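
For anyone who wants to repeat these counts from R, here is a rough sketch. It assumes the httr package is installed and that the observations.json endpoint still returns the x-total-entries header as described above. The function names `inat_count_url`, `parse_total` and `inat_total_entries` are made up for this example and are not part of rinat.

```r
# Build a one-record request URL like the ones linked above.
# `param` is "q" or "taxon_name"; `value` is the search term.
inat_count_url <- function(param, value) {
  paste0("http://www.inaturalist.org/observations.json?",
         param, "=", utils::URLencode(value, reserved = TRUE),
         "&per_page=1&page=1")
}

# Pull the total out of a named list of response headers.
parse_total <- function(headers) {
  as.numeric(headers[["x-total-entries"]])
}

# Send the request and return the total (needs httr and network access).
inat_total_entries <- function(param, value) {
  resp <- httr::GET(inat_count_url(param, value))
  parse_total(httr::headers(resp))
}
```

Calling `inat_total_entries("q", "Tracheophyta")` and `inat_total_entries("taxon_name", "Tracheophyta")` should show the gap between the two kinds of search. The numbers will have changed since this comment was written.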

So this behaviour is as expected according to the API description and how get_inat_obs() was designed, but I guess we can argue about why that 200,000 x-total-entries limit exists in the first place, given that the maxresults argument can't go above 10,000 anyway. I am new as a maintainer of the package, so I didn't follow the development of the functions or the justification for that limit. Any thoughts are welcome! 😄

rinat/R/get_inat_obs.R

Lines 149 to 155 in d5266f4

if(total_res == 0){
  stop("Your search returned zero results. Either your species of interest has no records or you entered an invalid search.")
} else if(total_res >= 200000) {
  stop("Your search returned too many results, please consider breaking it up into smaller chunks by year or month.")
} else if(!is.null(bounds) && total_res >= 100000) {
  stop("Your search returned too many results, please consider breaking it up into smaller chunks by year or month.")
}
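
To make the effect of these checks concrete, the sketch below restates them as a standalone function and applies it to the totals reported in this thread. `check_total` is a hypothetical name used only here; it returns a label instead of calling stop(). The checks depend only on the total reported by the API, not on maxresults, which is why the error appears even with maxresults = 100.

```r
# Standalone restatement of the checks quoted above; `check_total` is a
# hypothetical name, not a function exported by rinat.
check_total <- function(total_res, bounds = NULL) {
  if (total_res == 0) return("zero results")
  if (total_res >= 200000) return("too many results")
  if (!is.null(bounds) && total_res >= 100000) return("too many results")
  "ok"
}

# Totals as reported in this thread (June 2020).
check_total(67629)     # query = "Tracheophyta"        -> "ok"
check_total(18863802)  # taxon_name = "Tracheophyta"   -> "too many results"
check_total(96046)     # taxon search, Danaus plexippus -> "ok"
```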
