Design a File API for k6 #2977

Closed
oleiade opened this issue Mar 13, 2023 · 5 comments

@oleiade
Member

oleiade commented Mar 13, 2023

Problem statement

Currently, the only way to interact with files from a k6 script is the open function. For historical and technical reasons, open is somewhat misnamed: under the hood, it not only opens a file but also reads its entire content and stores it in memory.

This behavior exists for technical reasons, for instance how we package user scripts, resources, and assets, and how we interact with the filesystem throughout k6 in general. Still, we believe k6 could use a more "standard" file API that is more flexible and intuitive, and that addresses the larger issue of handling large files in k6.

Proposal

We propose the introduction of a basic file API as a k6 module, in the same fashion as Node's. We would assume the existence of a FileHandle (temporary name) construct, which would expose an API to interact with the file itself.

We assume the following operations would be made possible by this new API:

  • open would open a file, regardless of its underlying storage, and return some FileHandle (name used only for the example). A file handle would be the equivalent of an OS file descriptor but remain agnostic to the underlying storage (fs, tar archive file handle as described in Support finer-grained and richer access to tar archives content #2975, or even a memory map).
  • FileHandle.read(): would read the entire content of a file and return it to the user.
  • FileHandle.read(buffer): would read the content of a file and fill the buffer with read data until it is full.
  • FileHandle.close(): would close the FileHandle and render it unusable from this point forward (closing the underlying file descriptor/stream if relevant).
  • FileHandle.seek(offset): would move the "reading head" to a different offset in the file.
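
The operations listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed k6 API: the FileHandle class body, the openFile function, and the in-memory fakeStore are all hypothetical stand-ins, and the choice that read() without a buffer returns the content from the current offset onward is an assumption the proposal does not spell out.

```javascript
// Hypothetical sketch: none of these names are an actual k6 API.
// The "storage" is an in-memory byte array standing in for a filesystem
// file, a tar archive entry, or a memory map.
class FileHandle {
  constructor(bytes) {
    this.bytes = bytes; // Uint8Array holding the file content
    this.offset = 0; // current position of the "reading head"
    this.closed = false;
  }

  // read(): returns the content from the current offset to the end.
  // read(buffer): fills `buffer` until it is full or the file ends,
  // and returns the number of bytes copied.
  read(buffer) {
    this.ensureOpen();
    if (buffer === undefined) {
      const rest = this.bytes.slice(this.offset);
      this.offset = this.bytes.length;
      return rest;
    }
    const n = Math.min(buffer.length, this.bytes.length - this.offset);
    buffer.set(this.bytes.subarray(this.offset, this.offset + n));
    this.offset += n;
    return n;
  }

  // Moves the reading head to an absolute offset.
  seek(offset) {
    this.ensureOpen();
    if (offset < 0 || offset > this.bytes.length) {
      throw new RangeError(`offset ${offset} is out of bounds`);
    }
    this.offset = offset;
  }

  // Renders the handle unusable from this point forward.
  close() {
    this.closed = true;
  }

  ensureOpen() {
    if (this.closed) throw new Error('file handle is closed');
  }
}

// Stand-in for the proposed open(): looks a path up in a fake store.
const fakeStore = { 'data.txt': new TextEncoder().encode('hello k6') };
function openFile(path) {
  if (!(path in fakeStore)) throw new Error(`no such file: ${path}`);
  return new FileHandle(fakeStore[path]);
}

const file = openFile('data.txt');
const chunk = new Uint8Array(5);
const n = file.read(chunk); // fills the 5-byte buffer
file.seek(6); // jump past "hello "
const tail = new TextDecoder().decode(file.read());
file.close();
```

The point of the sketch is that callers only ever see the handle, so the backing storage could be swapped without changing script code.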

Problem space

@oleiade oleiade self-assigned this Mar 13, 2023
@oleiade oleiade added the enhancement and evaluation needed (proposal needs to be validated or tested before fully implementing it in k6) labels Mar 13, 2023
@imiric
Contributor

imiric commented Mar 16, 2023

Hey, I like the proposal 👍

Though I think we should be inspired more by Deno than Node, since it's more modern, and doesn't have all the cruft of Node.

For example, take a look at how they handle file opening and streams. It's very readable, and integrates nicely with the HTTP API.

I added an example in the HTTP API design document inspired by this, though we will need to make some decisions first. For example, will open() be sync or async? To me it makes more sense as async, since any disk I/O, even without reading the file contents, could be costly. But then we'll need to support await in the init context, which is possibly not trivial.

@oleiade
Member Author

oleiade commented Mar 16, 2023

Thanks for pointing that out 🙇🏻! You're right, the Deno API is likely better. I'll look into it and update the issue accordingly 👍🏻 I also like that it already supports streams, which is the subject of another issue I've opened and that I believe we should address.

Regarding sync versus async, I think a good compromise would be to keep open sync (at least in the first iteration?). That avoids the issue you've pointed out, and open is also more of a syscall than pure I/O. I would, however, make operations such as read, readFile, and even seek async.
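
To make the compromise concrete, here is a minimal sketch, assuming a synchronous open and Promise-returning read and seek. The names openSync, AsyncFileHandle, and countBytes are hypothetical, the file content is held in memory purely for illustration, and returning null at end of file is an assumption borrowed loosely from Deno's conventions rather than a decided k6 behavior.

```javascript
// Hypothetical sketch: openSync and AsyncFileHandle are illustrative
// names only, not a k6 API.
class AsyncFileHandle {
  constructor(bytes) {
    this.bytes = bytes;
    this.offset = 0;
  }

  // Async: in a real implementation this is where disk I/O would happen.
  async read(buffer) {
    const n = Math.min(buffer.length, this.bytes.length - this.offset);
    buffer.set(this.bytes.subarray(this.offset, this.offset + n));
    this.offset += n;
    return n === 0 ? null : n; // null signals end of file (an assumption)
  }

  // Async: clamps the offset to the file bounds and returns the new position.
  async seek(offset) {
    this.offset = Math.max(0, Math.min(offset, this.bytes.length));
    return this.offset;
  }
}

// Sync: only creates the handle and reads nothing.
function openSync(content) {
  return new AsyncFileHandle(new TextEncoder().encode(content));
}

// Reads through the file in fixed-size chunks, reusing a single buffer.
async function countBytes(handle, chunkSize) {
  const buffer = new Uint8Array(chunkSize);
  let total = 0;
  let n;
  while ((n = await handle.read(buffer)) !== null) {
    total += n;
  }
  return total;
}

const handle = openSync('abcdefghij'); // no await needed in the init context
const totalPromise = countBytes(handle, 4); // resolves once all chunks are read
```

With this split, the init context only needs the synchronous call, while the potentially slow reads stay asynchronous.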

@mstoykov
Contributor

Note: supporting await in the module (or top-level await) is part of the ESM support.

From a technical standpoint, I currently need to implement it (or something extremely close) just to make ESM loading work correctly. Mind you, this is the hard part, so it might take a while.

On the other hand users can always use .then up until that happens or just have a dummy (async () => {})(). And I am not particularly certain if ESM support or this will land first so 🤷
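
For reference, the two workarounds mentioned above could look roughly like this. It is a sketch only: readFileAsync is a hypothetical stand-in for whatever asynchronous file operation the API ends up exposing.

```javascript
// Hypothetical stand-in for an async file operation.
function readFileAsync() {
  return Promise.resolve('file content');
}

// Option 1: chain with .then instead of awaiting at the top level.
let viaThen;
const thenDone = readFileAsync().then((content) => {
  viaThen = content;
});

// Option 2: wrap the awaits in an immediately invoked async function.
let viaIife;
const iifeDone = (async () => {
  viaIife = await readFileAsync();
})();
```

In both cases the result is not available synchronously, so any code that depends on it has to run after the corresponding promise settles.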

@na--
Member

na-- commented Apr 11, 2023

#3017 is a somewhat related issue, and it probably makes sense to consider both of these together, even if the implementation is not tied. For writing files, we probably will mostly care about the internal Go APIs first, while reading files without fully loading them into memory really needs a JS solution so we can deprecate open().

@oleiade
Member Author

oleiade commented Jun 27, 2023

We have reached a consensus on a proposed design and built a proof of concept illustrating its feasibility.

This issue is now in the implementation phase and is expected to land in version 0.46 of k6, which is estimated to be released in mid-August 2023.

@oleiade oleiade changed the title Add a file API to k6 Design a File API for k6 Jul 5, 2023
@oleiade oleiade closed this as completed Jul 5, 2023
@github-project-automation github-project-automation bot moved this from Short term - Q3 2023 to Released in k6 open-source public roadmap Jul 5, 2023