Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using character variables in a model #93

Open
eityeity opened this issue May 7, 2018 · 2 comments
Open

Using character variables in a model #93

eityeity opened this issue May 7, 2018 · 2 comments

Comments

@eityeity
Copy link

eityeity commented May 7, 2018

Hi everyone,

I'm a new user of breedR from Japan.

I have a question about the class of categorical variable in fixed factors.

I have 4 site data set, and the site name is character.

When I set variable 'site' as character, like
d$site = as.character(d$site)

the result of
res = remlf90(fixed = height ~ site , random = ~ code , data =d)
has "Intercept", like

summary(res)

Formula: height ~ 0 + Intercept + site + code
Data: d
AIC BIC logLik
1106 1115 -551.1

Parameters of special components:

Variance components:
Estimated variances S.E.
code 0.3804 0.11533
Residual 0.5516 0.04156

Fixed effects:
value s.e.
Intercept 3.93742 0.1537
site.A -0.93131 0.1300
site.B -1.12276 0.1299
site.C -2.30688 0.1338
site.Z 0.00000 0.0000

When I set variable 'site' as factor, like
'd$site = as.factor(d$site)',
the result don't have "Intercept", as follows.

Formula: height ~ 0 + site + code
Data: d
AIC BIC logLik
1108 1117 -552.1

Parameters of special components:

Variance components:
Estimated variances S.E.
code 0.3804 0.11533
Residual 0.5516 0.04156

Fixed effects:
value s.e.
site.A 3.0061 0.1329
site.B 2.8147 0.1243
site.C 1.6305 0.1366
site.D 3.9374 0.1537

In help of remlf90(),

... and there are no other categorical covariates in fixed. The latter condition is actually a limitation of (ai)remlf90 backends, which would in any case return an estimate for each level of the categorical covariates while returning 0 for the intercept.

Can I trust the "Intercept" estimated by charactered categorical variable in fixed effect?

Sincerely

@famuvie
Copy link
Owner

famuvie commented May 7, 2018

Hi,
yes, you can trust it. It is the same model parameterised differently.
Yet, I recommend using the factor specification in general. First, because it is conceptually the right way of declaring a categorical variable. Second, because it avoids the numerical degeneracy of adding one collinear parameter and setting it to zero.

Still, I will leave this issue open as a minor bug. breedR should either not run a model with a character variable, or automatically convert it to factor.

On the other hand, I don't see why one of your levels gets renamed from D to Z. I could not reproduce it:

library(breedR)
#> Loading required package: sp
N <- 1e4
dat <- 
  data.frame(
    site = gl(4, k = N/4, labels = LETTERS[1:4]),
    height = rep(10*(1:4), each = N/4) + rnorm(N)
  )

res_fac <- remlf90(
  fixed = height ~ site,
  data = dat
)
#> Using default initial variances given by default_initial_variance()
#> See ?breedR.getOption.

res_chr <- remlf90(
  fixed = height ~ site,
  data = transform(dat, site = as.character(site))
)
#> Using default initial variances given by default_initial_variance()
#> See ?breedR.getOption.

summary(res_fac)
#> Formula: height ~ 0 + site 
#>    Data: dat 
#>    AIC   BIC logLik
#>  28512 28520 -14255
#> 
#> 
#> Variance components:
#>          Estimated variances    S.E.
#> Residual               1.011 0.01431
#> 
#> Fixed effects:
#>          value   s.e.
#> site.A  9.9987 0.0201
#> site.B 20.0432 0.0201
#> site.C 30.0013 0.0201
#> site.D 39.9651 0.0201
summary(res_chr)
#> Formula: height ~ 0 + Intercept + site 
#>    Data: transform(dat, site = as.character(site)) 
#>    AIC   BIC logLik
#>  28510 28518 -14254
#> 
#> 
#> Variance components:
#>          Estimated variances    S.E.
#> Residual               1.011 0.01431
#> 
#> Fixed effects:
#>             value   s.e.
#> Intercept  0.0000 0.0000
#> site.A     9.9987 0.0201
#> site.B    20.0432 0.0201
#> site.C    30.0013 0.0201
#> site.D    39.9651 0.0201

@famuvie famuvie changed the title Estimated "Intercept" with categorical covariates in fixed Using character variables in a model May 7, 2018
@famuvie famuvie self-assigned this May 7, 2018
@eityeity
Copy link
Author

eityeity commented May 7, 2018

Hi,

Thanks. I am relieved.
It will a issue on which the zero is assigned, on Intercept or on one of levels in categorical factor.

Still, I will leave this issue open as a minor bug. breedR should either not run a model with a character variable, or automatically convert it to factor.

I prefer later, because it would be similar behavior to other mixed model packages, like lme4.

On the other hand, I don't see why one of your levels gets renamed from D to Z. I could not reproduce it:

Sorry, It was my mistake. I have changed name of site in the output by hand to hide private information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants