
[Data] Normalization and Scaling - Effect on algorithm convergence and stability #2

Open
christianadriano opened this issue May 26, 2020 · 3 comments
Labels
question Further information is requested

Comments

@christianadriano
Member

christianadriano commented May 26, 2020

@brrrachel and @2start (Nico)

Many approximation algorithms converge better when the data is normalized (scaled to the range zero to one) and centered (mean == 0). Could you please investigate whether this is an issue that would be interesting to show?

If so, we can easily run the algorithms on four different subsets of the utility-increase data, combining small and large kurtosis and skewness (one way to build these subsets is sketched below). We would look at how quickly (in number of episodes) each run reaches a certain level of exploitation (i.e., reduces exploration) and how quickly it reaches the maximum reward (within a given margin of error). These are charts that Nico has already developed.
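A minimal sketch of how the four subsets could be selected, assuming the utility-increase data sits in a pandas DataFrame; the column names `utility_increase` and `component_failure` are placeholders and a simple median split defines "small" vs. "large":

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

def label_subsets(df, value_col="utility_increase", group_col="component_failure"):
    """Label each <component, failure> group as small/large skewness x small/large kurtosis.

    Column names are placeholders; adapt them to the real data set.
    """
    # Per-group skewness and (excess) kurtosis of the utility increases.
    stats = df.groupby(group_col)[value_col].agg([skew, kurtosis])

    # Median split: groups above the median count as "large".
    skew_cut = stats["skew"].abs().median()
    kurt_cut = stats["kurtosis"].median()
    stats["skew_class"] = np.where(stats["skew"].abs() > skew_cut, "large_skew", "small_skew")
    stats["kurt_class"] = np.where(stats["kurtosis"] > kurt_cut, "large_kurt", "small_kurt")
    return stats

# The four subsets are the combinations of (skew_class, kurt_class);
# each subset can then be fed into the existing convergence/exploitation charts.
```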

@christianadriano christianadriano added the question Further information is requested label May 26, 2020
@2start
Collaborator

2start commented Jun 12, 2020

@christianadriano @brrrachel Sorry, I guess I am a little late to the party.

I thought about this one again. Normalization is used in ML algorithms to equalize the impact of different predictor variables on the target variable. However, we only have a single input variable, the reward. Therefore normalization will probably have no effect, because no part of the current RL algorithms is sensitive to the absolute size of the rewards.

Regarding the transformation of the raw utilities: I don't think this is useful either, because we want the agent to maximize the total utility/reward

r_1 + r_2 + ... + r_n

However, if we use some function f to transform all the rewards r_1 .. r_n, the agent instead maximizes

f(r_1) + f(r_2) + ... + f(r_n)

which is in general a different objective and can lead to a different optimal policy (unless f is a positive affine transformation).
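A tiny illustration with made-up reward sequences for two hypothetical policies: a monotone but nonlinear transform (here, squaring) flips which policy looks optimal, while a positive affine transform f(r) = a*r + b with a > 0 would preserve the ordering over fixed-length episodes.

```python
import numpy as np

# Two hypothetical policies and the raw rewards they collect over one episode.
rewards_a = np.array([1.0, 1.0, 1.0])   # steady, moderate rewards
rewards_b = np.array([0.0, 0.0, 2.9])   # a single large reward at the end

print(rewards_a.sum(), rewards_b.sum())        # 3.0 vs 2.9  -> policy A is better

# Transform every reward with a nonlinear function f before summing.
f = np.square
print(f(rewards_a).sum(), f(rewards_b).sum())  # 3.0 vs 8.41 -> policy B now looks better
```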

Therefore, I propose we drop this part about modifying the input data. An interesting task, however, would be to analyze the data for possible faults or interesting characteristics, which will help us understand the results later on.

@christianadriano
Member Author

@brrrachel I would like to hear Rachel's opinion on this too.

@brrrachel
Collaborator

Well, I did some research on this too. Currently our aim is to better predict / distinguish between the <component, failure> combinations. Some important points about normalization:

  • the type of distribution doesn't change
  • it allows the agent to distinguish good and bad actions more effectively
  • reduces training time

Then I read a little more about it: normalization could also help to deal with a non-stationary environment. Since in reinforcement learning the behavior policy can change during learning, the distribution and magnitude of the values change as well. An approach to deal with that is presented here: https://arxiv.org/pdf/1602.07714.pdf

Currently, I don't have a proven approach for how to implement normalisation (since it isn't always just a matter of scaling to [-1, 1]), but I would prefer not to drop this idea right now.
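To make this concrete, here is a minimal sketch of one simple option: normalizing rewards with running mean/std estimates (Welford's algorithm). This is only a baseline idea for discussion, not the adaptive normalization (PopArt) from the linked paper.

```python
class RunningRewardNormalizer:
    """Normalize rewards with running mean/std estimates (simple baseline sketch)."""

    def __init__(self, epsilon=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # sum of squared deviations (Welford's algorithm)
        self.epsilon = epsilon

    def update(self, reward):
        # Incrementally update mean and variance statistics with the new reward.
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        std = (self.m2 / max(self.count - 1, 1)) ** 0.5
        return (reward - self.mean) / (std + self.epsilon)

# Usage inside the training loop: update with the raw reward first,
# then feed the normalized reward to the agent:
#   normalizer.update(r); r_norm = normalizer.normalize(r)
```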
