Partitioning representative periods #78

Open
greg-neustroev opened this issue Nov 11, 2024 · 2 comments


@greg-neustroev
Collaborator

greg-neustroev commented Nov 11, 2024

Description

So currently we assume that the input data for clustering is a table with the following header:

                              p.u.
profile_name  year  timestep  value

or:

                                      p.u.
profile_name  year  period  timestep  value

If period is not provided, the data first needs to be split into periods. Currently this is done using TulipaClustering.split_into_periods!(df; period_duration), which splits the data frame into periods of equal length.

At the same time, TulipaEnergyModel supports splitting the year into unequal periods via a partition specification; see, for example, assets-timeframe-partitions.csv.

I think that TulipaClustering should also utilize the partitioning approach, e.g., instead of calling TulipaClustering.split_into_periods!(df; period_duration) we should be able to call TulipaClustering.partition!(df; partition_string) to split the base data into periods.

Example:

Your data frame df is:

                              p.u.
profile_name  year  timestep  value
profile_1     2030  1         1
profile_1     2030  2         2
profile_1     2030  3         3
profile_1     2030  4         4
...           ...   ...       ...
profile_1     2030  8760      8760

I can call TulipaClustering.split_into_periods!(df; period_duration=24), which splits the data frame into periods of length 24 each (365 periods in total):

                                      p.u.
profile_name  year  period  timestep  value
profile_1     2030  1       1         1
profile_1     2030  1       2         2
profile_1     2030  1       3         3
profile_1     2030  1       4         4
...           ...   ...     ...       ...
profile_1     2030  365     24        8760
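
For reference, the equal-length split boils down to simple index arithmetic on the timestep column. A minimal sketch in plain Julia, assuming 1-based timesteps; split_index is a hypothetical helper name used only for illustration, not the actual TulipaClustering implementation:

```julia
# Hypothetical helper, not part of TulipaClustering: maps a global 1-based
# timestep to its (period, timestep-within-period) pair for equal-length periods.
function split_index(timestep::Integer, period_duration::Integer)
    period = div(timestep - 1, period_duration) + 1
    local_timestep = mod(timestep - 1, period_duration) + 1
    return (period, local_timestep)
end

split_index(1, 24)     # (1, 1)
split_index(25, 24)    # (2, 1)
split_index(8760, 24)  # (365, 24)
```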

Instead, we might want to partition the first week as a single period of length 168, followed by 358 periods of length 24 (168 + 358 × 24 = 8760 timesteps in total). I would be able to do this by calling TulipaClustering.partition!(df; partition_string="1x168+358x24"). The equal-length split from the example above would then be TulipaClustering.partition!(df; partition_string="365x24").
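
To make the proposal concrete, here is a rough sketch of how such a partition string could be interpreted. It is plain Julia without DataFrames; the function names parse_partition and period_indices are hypothetical, and the "NxL+NxL" grammar is inferred only from the two examples in this issue, so treat it as an illustration under those assumptions rather than a proposed implementation:

```julia
# Hypothetical sketch: names and grammar handling are assumptions based on the
# examples "1x168+358x24" and "365x24" from this issue.

# Expand a specification such as "1x168+358x24" into a vector of period lengths.
function parse_partition(partition_string::AbstractString)
    lengths = Int[]
    for block in split(partition_string, "+")
        parts = split(strip(block), "x")
        length(parts) == 2 || throw(ArgumentError("invalid block: $block"))
        n_periods = parse(Int, parts[1])
        duration = parse(Int, parts[2])
        append!(lengths, fill(duration, n_periods))
    end
    return lengths
end

# Assign each global timestep a (period, timestep-within-period) pair.
function period_indices(lengths::Vector{Int})
    indices = Tuple{Int,Int}[]
    for (period, duration) in enumerate(lengths)
        for local_timestep in 1:duration
            push!(indices, (period, local_timestep))
        end
    end
    return indices
end

lengths = parse_partition("1x168+358x24")
sum(lengths)   # 8760, so the partition covers a full year of hourly data
indices = period_indices(lengths)
indices[168]   # (1, 168)
indices[169]   # (2, 1)
indices[end]   # (359, 24)
```

One check that would probably be worth having in a real implementation is that sum(lengths) matches the number of timesteps per profile in df, so that a malformed partition string fails early instead of silently truncating or overrunning the data.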

The questions regarding this:

  1. Is this string-based partitioning useful and worth implementing?
  2. Should it be implemented in TulipaClustering or elsewhere, since partitioning is used outside of clustering as well?
  3. Should the new partitioning method replace the existing split_into_periods, or coexist with it? What's a good name for the method and its arguments? I like using partition as a verb, but we also use it as a noun for the string specifying the partitioning structure, so TulipaClustering.partition!(df; partition) could look confusing.
@greg-neustroev
Collaborator Author

@abelsiqueira @datejada @g-moralesespana @gnawin

What do you guys think?

@greg-neustroev
Collaborator Author

Another question is how we cluster periods of different lengths, but I think this should be a separate issue.
