Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fileHandle.readableWebStream sometimes crashes on large inputs #56116

Open
smoores-dev opened this issue Dec 2, 2024 · 2 comments
Open

fileHandle.readableWebStream sometimes crashes on large inputs #56116

smoores-dev opened this issue Dec 2, 2024 · 2 comments
Labels
stream Issues and PRs related to the stream subsystem. web streams

Comments

@smoores-dev
Copy link

smoores-dev commented Dec 2, 2024

Version

v22.10.0

Platform

Reproducible on:

Linux smooreswork 6.11.8-300.fc41.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Nov 14 20:37:39 UTC 2024 x86_64 GNU/Linux

Linux 04d2433b1ce0 6.1.49-Unraid #1 SMP PREEMPT_DYNAMIC Wed Aug 30 09:42:35 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Linux server 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov  4 02:06:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

fs/promises

What steps will reproduce the bug?

import { open } from "node:fs/promises"

/**
 * Like readFile, but streams the file into memory in chunks
 * to avoid hard-coded 2GB limit on file I/O operations in
 * Node.js/libuv https://github.com/libuv/libuv/pull/1501
 *
 * @param {string} path
 * @returns {Promise<Uint8Array>}
 */
export async function streamFile(path) {
  const fileHandle = await open(path)
  try {
    const stats = await fileHandle.stat()
    const fileData = new Uint8Array(stats.size)
    let i = 0
    for await (const chunk of fileHandle.readableWebStream()) {
      const chunkArray = new Uint8Array(chunk)
      fileData.set(chunkArray, i)
      i += chunkArray.byteLength
    }
    return fileData
  } finally {
    await fileHandle.close()
  }
}

Run the above function on any relatively large file in the Node.js REPL. The larger the file, the more consistent the failure. With a ~100MB file, I get the error maybe 1 in every 20 attempts on my laptop. With a 600MB file, it's more like every other attempt.

How often does it reproduce? Is there a required condition?

The file must be sufficiently large. I have never seen this occur with files smaller than 50MB. With files less than 200MB, it occurs occasionally. With files larger than 500MB, it occurs very consistently.

I attempted to reproduce this in a simple node.js ESM script, but it does not seem to reproduce when the above function is run in a script. I can reproduce consistently in the REPL, and this same issue occurs consistently in the Storyteller Next.js app, where that code sample is from.

What is the expected behavior? Why is that the expected behavior?

The streamFile function should successfully read files of any length into memory. It reads files one chunk at a time to avoid libuv's hard limit on 2GB file I/O operations. There should be no crash or warning about closing a file descriptor on garbage collection, because the file descriptor is closed in a finally block.

What do you see instead?

> await streamFile('./large-test-file.m4b')
(node:460551) Warning: Closing file descriptor 42 on garbage collection
(node:460551) [DEP0137] DeprecationWarning: Closing a FileHandle object on garbage collection is deprecated. Please close FileHandle objects explicitly using FileHandle.prototype.close(). In the future, an error will be thrown if a file descriptor is closed during garbage collection.

After this error log, the Node REPL itself exits with status code 139

Additional information

If I swap fileHandle.readableWebStream() out for fileHandle.createReadStream(), this works without issue for files of any size.

@juanarbol
Copy link
Member

cc/ @nodejs/streams

@juanarbol juanarbol added stream Issues and PRs related to the stream subsystem. web streams labels Dec 2, 2024
@targos
Copy link
Member

targos commented Dec 3, 2024

It seems that the bug is that the web stream doesn't keep a strong reference to the file handle, so that gets garbage collected while it's used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem. web streams
Projects
None yet
Development

No branches or pull requests

3 participants