Small fix for local storage #1556

Open
wants to merge 38 commits into
base: main
223252c
check dir exist for storage
Jun 16, 2023
60edbc4
index should be int
Jun 16, 2023
d607d81
remove to_csv twice
Jun 18, 2023
5579b8a
index should be int
Jun 18, 2023
c0b60e9
should not get_recent_freq if folder is empty
Jun 18, 2023
07083ae
fix index missing bug
Jun 19, 2023
655e666
improve logging
Jun 27, 2023
0a0d7dc
allow None model and dataset in SoftTopkStrategy
Jul 7, 2023
a0d1450
use line width 120
PaleNeutron Aug 17, 2023
70b5c9f
change get_data url (#1558)
SunsetWolf Jun 25, 2023
58f73de
Update release-drafter.yml (#1569)
you-n-g Jun 25, 2023
ba2df87
Update __init__.py
you-n-g Jun 25, 2023
1e9140d
Update __init__.py
you-n-g Jun 27, 2023
194ac59
Update README.md for RL (#1573)
you-n-g Jun 28, 2023
a656648
fix_pip_ci (#1584)
SunsetWolf Jul 5, 2023
706138c
fix download token (#1577)
m3ngyang Jul 6, 2023
fab4e0a
Update qlibrl docs. (#1588)
lwwang1995 Jul 7, 2023
9a0291f
Postpone PR stale. (#1591)
you-n-g Jul 12, 2023
d9936c4
Adjust rolling api (#1594)
you-n-g Jul 14, 2023
6cefe4a
Fixed pyqlib version issue on macos (#1605)
SunsetWolf Jul 18, 2023
a65fca8
Update __init__.py
you-n-g Jul 18, 2023
9e990e5
Bump Version & Fix CI (#1606)
you-n-g Jul 18, 2023
e5df276
fix_ci (#1608)
SunsetWolf Jul 19, 2023
ee50f7c
Update introduction.rst (#1579)
computerscienceiscool Jul 26, 2023
9864038
Update README.md (#1553)
GeneLiuXe Jul 26, 2023
2d0162d
Update introduction.rst (#1578)
computerscienceiscool Jul 26, 2023
e2019f8
depress warning with pandas option_context (#1524)
Fivele-Li Aug 1, 2023
42ba746
fix docs (#1618)
SunsetWolf Aug 2, 2023
b624ddf
Add multi pass portfolio analysis record (#1546)
chenditc Aug 4, 2023
e9fbb4f
Add exploration noise to rl training collector (#1481)
chenditc Aug 18, 2023
10e27d5
Troubleshooting pip version issues in CI (#1504)
Fivele-Li Aug 24, 2023
b300af7
suppress the SettingWithCopyWarning of pandas (#1513)
Fivele-Li Sep 1, 2023
8e446aa
Update requirements.txt (#1521)
kimzhuan Sep 15, 2023
8bcf09e
pred current is confusing
Jul 7, 2023
97c6799
add build system requirements
Jul 19, 2023
265fdc9
add pos and neg operator
Jul 20, 2023
f3ce11a
fix stock is delisted
Jul 20, 2023
68e2640
Merge branch 'main' into main
PaleNeutron Oct 11, 2023
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,2 +1,2 @@
[build-system]
- requires = ["setuptools", "numpy", "Cython"]
+ requires = ["setuptools", "numpy", "Cython"]
3 changes: 3 additions & 0 deletions qlib/backtest/exchange.py
@@ -511,6 +511,9 @@ def get_deal_price(
self.logger.warning(f"(stock_id:{stock_id}, trade_time:{(start_time, end_time)}, {pstr}): {deal_price}!!!")
self.logger.warning(f"setting deal_price to close price")
deal_price = self.get_close(stock_id, start_time, end_time, method)
# if stock is delisted, the deal_price(close) will be None,set to 0
if deal_price is None or np.isnan(deal_price):
deal_price = 0.0
return deal_price

def get_factor(
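A note on the order of the two checks in the added guard: `np.isnan(None)` raises a `TypeError`, so the `is None` test has to come first and short-circuit. Below is a minimal sketch of the same guard as a standalone helper. The name `safe_deal_price` is hypothetical and used only for illustration; it is not part of qlib.

```python
import numpy as np


def safe_deal_price(deal_price):
    """Return 0.0 when the price is missing (None or NaN), else the price.

    Hypothetical helper mirroring the guard added in the diff above.
    The `is None` check must run first: np.isnan(None) raises TypeError.
    """
    if deal_price is None or np.isnan(deal_price):
        return 0.0
    return deal_price


def isnan_rejects_none() -> bool:
    """Show why the order of the two checks matters."""
    try:
        np.isnan(None)
    except TypeError:
        return True
    return False
```

Whether 0.0 is the right fallback for a delisted stock is a separate question for this PR; the sketch only illustrates the guard itself.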
11 changes: 5 additions & 6 deletions qlib/contrib/strategy/cost_control.py
@@ -5,6 +5,7 @@
"""


import pandas as pd
from .order_generator import OrderGenWInteract
from .signal_strategy import WeightStrategyBase
import copy
@@ -13,16 +14,11 @@
class SoftTopkStrategy(WeightStrategyBase):
def __init__(
self,
- model,
- dataset,
topk,
order_generator_cls_or_obj=OrderGenWInteract,
max_sold_weight=1.0,
risk_degree=0.95,
buy_method="first_fill",
- trade_exchange=None,
- level_infra=None,
- common_infra=None,
**kwargs,
):
"""
@@ -37,7 +33,8 @@ def __init__(
average_fill: assign the weight to the stocks rank high averagely.
"""
super(SoftTopkStrategy, self).__init__(
- model, dataset, order_generator_cls_or_obj, trade_exchange, level_infra, common_infra, **kwargs
+ order_generator_cls_or_obj=order_generator_cls_or_obj,
+ **kwargs,
)
self.topk = topk
self.max_sold_weight = max_sold_weight
@@ -70,6 +67,8 @@ def generate_target_weight_position(self, score, current, trade_start_time, trad
# TODO:
# If the current stock list is more than topk(eg. The weights are modified
# by risk control), the weight will not be handled correctly.
if isinstance(score, pd.DataFrame):
score = score.iloc[:, 0]
buy_signal_stocks = set(score.sort_values(ascending=False).iloc[: self.topk].index)
cur_stock_weight = current.get_stock_weight_dict(only_stock=True)

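The normalization added to `generate_target_weight_position` can be sketched in isolation as follows. This assumes, as the diff does, that when `score` is a DataFrame its first column holds the score; `select_topk` is a hypothetical name for illustration, and a `None` score (which `get_signal` may also return, per the review comment below) is deliberately not handled here.

```python
import pandas as pd


def select_topk(score, topk):
    """Return the ids of the `topk` highest-scoring stocks.

    Accepts a Series indexed by stock id, or a DataFrame whose first
    column holds the score (the assumption made in the diff above).
    """
    if isinstance(score, pd.DataFrame):
        # Collapse the frame to a Series so sort_values needs no `by=` argument.
        score = score.iloc[:, 0]
    return set(score.sort_values(ascending=False).iloc[:topk].index)
```

With this shape, a caller gets the same result whether the signal arrives as a Series or as a single-column DataFrame.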
2 changes: 1 addition & 1 deletion qlib/contrib/strategy/signal_strategy.py
@@ -333,7 +333,7 @@ def generate_target_weight_position(self, score, current, trade_start_time, trad

Parameters
-----------
- score : pd.Series
Contributor

The data structure of the score returned by get_signal is defined as Union[pd.Series, pd.DataFrame, None]:

pred_score = self.signal.get_signal(start_time=pred_start_time, end_time=pred_end_time)

Is it appropriate to restrict the score to be only a DataFrame?

+ score : pd.DataFrame
pred score for this trade date, index is stock_id, contain 'score' column.
current : Position()
current position.
6 changes: 6 additions & 0 deletions qlib/data/base.py
@@ -138,6 +138,12 @@ def __ror__(self, other):
from .ops import Or # pylint: disable=C0415

return Or(other, self)

def __pos__(self):
return self

def __neg__(self):
return 0 - self

def load(self, instrument, start_index, end_index, *args):
"""load feature
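The new `__neg__` is written as `0 - self`, which only works if the class also defines reflected subtraction (`__rsub__`), because `int.__sub__` does not know the expression type. The toy class below is a hypothetical stand-in, not qlib's `Expression`; it sketches how Python dispatches the two unary operators under that assumption.

```python
class Expr:
    """Toy expression node (hypothetical, for illustration only)."""

    def __init__(self, text):
        self.text = text

    def __rsub__(self, other):
        # Called for `other - self` when `other` cannot handle the subtraction.
        return Expr(f"Sub({other},{self.text})")

    def __pos__(self):
        # Unary plus is the identity.
        return self

    def __neg__(self):
        # Unary minus is delegated to reflected subtraction: 0 - self.
        return 0 - self
```

For example, `-Expr("$close")` yields a node whose `text` is `Sub(0,$close)`, while `+Expr("$close")` returns the same object unchanged.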
2 changes: 1 addition & 1 deletion qlib/data/pit.py
@@ -39,7 +39,7 @@ def _load_internal(self, instrument, start_index, end_index, freq):
s = self._load_feature(instrument, -start_ws, 0, cur_time)
resample_data[cur_index - start_index] = s.iloc[-1] if len(s) > 0 else np.nan
except FileNotFoundError:
- get_module_logger("base").warning(f"WARN: period data not found for {str(self)}")
+ get_module_logger("base").warning(f"WARN: period data not found for {instrument} {str(self)} ({freq})")
return pd.Series(dtype="float32", name=str(self))

resample_series = pd.Series(
10 changes: 6 additions & 4 deletions qlib/data/storage/file_storage.py
@@ -80,6 +80,7 @@ def __init__(self, freq: str, future: bool, provider_uri: dict = None, **kwargs)
self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
self.enable_read_cache = True # TODO: make it configurable
self.region = C["region"]
self.uri.parent.mkdir(parents=True, exist_ok=True)

@property
def file_name(self) -> str:
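The `self.uri.parent.mkdir(parents=True, exist_ok=True)` line added to each storage constructor can be sketched on its own with `pathlib`. The helper name `ensure_parent` and the sample path are hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path


def ensure_parent(uri: Path) -> Path:
    """Create the directory that will hold `uri`, if it is missing.

    parents=True creates intermediate directories; exist_ok=True makes
    the call safe to repeat. The file itself is not created.
    """
    uri.parent.mkdir(parents=True, exist_ok=True)
    return uri


def demo() -> bool:
    with tempfile.TemporaryDirectory() as root:
        uri = Path(root) / "features" / "sh600000" / "close.day.bin"
        ensure_parent(uri)
        ensure_parent(uri)  # second call is a no-op because exist_ok=True
        return uri.parent.is_dir() and not uri.exists()
```

One design point worth noting: because the call sits in `__init__`, directories are created as soon as a storage object is constructed, including when it is only used for reading.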
@@ -90,7 +91,7 @@ def _freq_file(self) -> str:
"""the freq to read from file"""
if not hasattr(self, "_freq_file_cache"):
freq = Freq(self.freq)
- if freq not in self.support_freq:
+ if self.support_freq and freq not in self.support_freq:
# NOTE: uri
# 1. If `uri` does not exist
# - Get the `min_uri` of the closest `freq` under the same "directory" as the `uri`
@@ -199,6 +200,7 @@ def __init__(self, market: str, freq: str, provider_uri: dict = None, **kwargs):
super(FileInstrumentStorage, self).__init__(market, freq, **kwargs)
self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
self.file_name = f"{market.lower()}.txt"
self.uri.parent.mkdir(parents=True, exist_ok=True)

def _read_instrument(self) -> Dict[InstKT, InstVT]:
if not self.uri.exists():
@@ -233,7 +235,6 @@ def _write_instrument(self, data: Dict[InstKT, InstVT] = None) -> None:
df.loc[:, [self.SYMBOL_FIELD_NAME, self.INSTRUMENT_START_FIELD, self.INSTRUMENT_END_FIELD]].to_csv(
self.uri, header=False, sep=self.INSTRUMENT_SEP, index=False
)
- df.to_csv(self.uri, sep="\t", encoding="utf-8", header=False, index=False)
Contributor

Why was this to_csv call removed?

Author

The previous statement already ends with a to_csv call that writes to self.uri, so this second call is redundant.
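To make the effect of the removed line concrete: `to_csv` opens its target in write mode by default, so a second call on the same path replaces whatever the first call wrote. The sketch below reproduces that with made-up column names (`symbol`, `start`, `end`, `extra`); they are assumptions for illustration, not qlib's actual field names.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_twice():
    """Write a three-column selection, then the full frame, to the same file."""
    df = pd.DataFrame(
        {"symbol": ["AAA"], "start": ["2020-01-01"], "end": ["2020-12-31"], "extra": [1]}
    )
    with tempfile.TemporaryDirectory() as root:
        path = Path(root) / "all.txt"
        # First write: only the three instrument columns.
        df.loc[:, ["symbol", "start", "end"]].to_csv(path, header=False, sep="\t", index=False)
        first = path.read_text()
        # Second write: the whole frame, which overwrites the first result.
        df.to_csv(path, sep="\t", encoding="utf-8", header=False, index=False)
        second = path.read_text()
    return first, second
```

In this sketch the file ends up containing the extra column, which is the kind of side effect dropping the second call avoids.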


def clear(self) -> None:
self._write_instrument(data={})
@@ -287,6 +288,7 @@ def __init__(self, instrument: str, field: str, freq: str, provider_uri: dict =
super(FileFeatureStorage, self).__init__(instrument, field, freq, **kwargs)
self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
self.file_name = f"{instrument.lower()}/{field.lower()}.{freq.lower()}.bin"
self.uri.parent.mkdir(parents=True, exist_ok=True)

def clear(self):
with self.uri.open("wb") as _:
@@ -318,15 +320,15 @@ def write(self, data_array: Union[List, np.ndarray], index: int = None) -> None:
# rewrite
with self.uri.open("rb+") as fp:
_old_data = np.fromfile(fp, dtype="<f")
- _old_index = _old_data[0]
+ _old_index = int(_old_data[0])
_old_df = pd.DataFrame(
_old_data[1:], index=range(_old_index, _old_index + len(_old_data) - 1), columns=["old"]
)
fp.seek(0)
_new_df = pd.DataFrame(data_array, index=range(index, index + len(data_array)), columns=["new"])
_df = pd.concat([_old_df, _new_df], sort=False, axis=1)
_df = _df.reindex(range(_df.index.min(), _df.index.max() + 1))
- _df["new"].fillna(_df["old"]).values.astype("<f").tofile(fp)
+ np.hstack([_old_index, _df["new"].fillna(_df["old"]).values]).astype("<f").tofile(fp)
Contributor

Why fill the missing values in np.hstack([_old_index, _df["new"]]) instead of in _df["new"]?

Author

The first item, _old_index, is the start index, not a data value. The file layout is [first_index, v0, v1, v2, ...], so the index has to be written back in front of the values.

However, I think the current version still has a bug when the new index is smaller than _old_index.
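For reference, here is a minimal sketch of the `[first_index, v0, v1, v2, ...]` layout described above, including one possible way to handle a new index smaller than the old one: write the minimum of the merged index as the header instead of `_old_index`. This is an illustration under those assumptions, not the code in this PR; `read_bin` and `merge_write` are hypothetical names.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def read_bin(path):
    """Return (start_index, values) from a little-endian float32 file."""
    raw = np.fromfile(path, dtype="<f")
    # The header is stored as a float; range() needs a real int.
    return int(raw[0]), raw[1:]


def merge_write(path, data_array, index):
    """Merge new values into an existing file; new values win on overlap."""
    old_start, old_values = read_bin(path)
    old = pd.Series(old_values, index=range(old_start, old_start + len(old_values)))
    new = pd.Series(np.asarray(data_array, dtype="<f"), index=range(index, index + len(data_array)))
    merged = new.combine_first(old)
    # Fill any gap between the two ranges with NaN so positions stay contiguous.
    merged = merged.reindex(range(merged.index.min(), merged.index.max() + 1))
    # Header is the smallest index overall, which may be lower than old_start.
    start = int(merged.index.min())
    np.hstack([start, merged.values]).astype("<f").tofile(path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        path = Path(root) / "close.day.bin"
        np.array([5, 1.0, 2.0, 3.0], dtype="<f").tofile(path)  # old data starts at 5
        merge_write(path, [9.0, 8.0], index=3)  # new data starts earlier, at 3
        start, values = read_bin(path)
        return start, values.tolist()
```

In `demo`, the old file covers indices 5 to 7 and the new data covers 3 to 4, so the rewritten file starts at 3 and holds five values.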


@property
def start_index(self) -> Union[int, None]: