-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error in result_fetch(res@ptr, n = n) : embedded nul in string #294
Comments
Thanks. I think the rationale for closing I gave back then still holds. Can you somehow cast the data on the database side to make sure it appears in proper shape in R? What does your stack look like? |
I have same problem as thread #121 When running code in the following way: query <- "SELECT *FROM table" I get the error: "Error in result_fetch(res@ptr, n = n) : embedded nul in string" For now, I have internally resolved it by identifying the problematic column and changing the query to: query <- "SELECT column1, column2,..., REPLACE(column, '\\0', '') AS new_column FROM table" And with this, I have been able to correctly read the column that was causing the issue when reading the data. I was using: RMariaDB_1.2.1, R version 4.0.4 (2021-02-15) |
@paleolimbot: Are there plans for a MariaDB/MySQL ADBC driver, by any chance? This database will support strings with embedded NUL characters. AFAICT, Postgres doesn't support them, and I was unable to replicate it with SQLite. How will ADBC deal with that issue? IIUC, Arrow had the same problem, and the consensus there was to fail explicitly: apache/arrow#8365 (comment) . |
I'm not aware of any, but we recently have dramatically improved the Go and C++ driver framework which should make it much easier to write and maintain drivers if there is some well-known Go or C++ connector worth wrapping. From ADBC's standpoint, I am not sure if the driver would catch this (or want to) before exporting an Arrow array. Any incompatibility would happen on conversion to an R vector, where, as you noted, the arrow package fails (or can strip them). In nanoarrow you can at least work around this by requesting that the string be converted to a blob (the arrow package doesn't allow this type of output request). library(nanoarrow)
string_array_with_embedded_nul <- nanoarrow_array_modify(
nanoarrow_array_init(na_string()),
list(
length = 1,
buffers = list(
NULL,
c(0L, 9L),
c(charToRaw("abcd"), as.raw(0x00), charToRaw("efgh"))
)
)
)
# Error deferred to printing because of ALTREP?
convert_array(string_array_with_embedded_nul)
#> [1]
#> Error in print.default(x): embedded nul in string: 'abcd\0efgh'
# Can always bail to a blob on convert
convert_array(string_array_with_embedded_nul, blob::blob())
#> <blob[1]>
#> [1] blob[9 B]
# Error deferred to printing because of ALTREP
as.vector(arrow::as_arrow_array(string_array_with_embedded_nul))
#> [1]
#> Error in print.default(x): embedded nul in string: 'abcd\0efgh'; to strip nuls when converting from Arrow to R, set options(arrow.skip_nul = TRUE)
# Option exists to strip them (with lots of warnings)
options(arrow.skip_nul = TRUE)
as.vector(arrow::as_arrow_array(string_array_with_embedded_nul))
#> [1]
#> Warning in print.default(x): Stripping '\0' (nul) from character vector
#> Warning in print.default(x): Stripping '\0' (nul) from character vector
#> Warning in print.default(x): Stripping '\0' (nul) from character vector
#> "abcdefgh" Created on 2024-04-01 with reprex v2.1.0 |
If there was an ADBC driver for MariaDB, I'd expect it to thread through the embedded NULLs. The users could then apply repair. Fixing this issue in the existing backends looks like a losing battle. Implementing that ADBC driver seems to be the much more worthwhile goal. |
Issue #121 was locked unresolved. The issue remains, and the solution proposed there, emulating the behavior already used in RMySQL seems like a good way forward.
The text was updated successfully, but these errors were encountered: