Implementation of ATLAS_WJ_8TEV in the new format #2210

Merged
merged 16 commits into from
Dec 5, 2024


@achiefa achiefa commented Nov 12, 2024

Re-implementation of the ATLAS_WJ_8TEV in the new commondata format. I summarise the relevant information below, also as a reminder for myself.

$(x, Q^2)$ map and data-theory comparisons

Legacy reports: [legacy], [legacy_ATLAS], [legacy_NP]
New reports: [default HepData v1], [default HepData v2]
default = symmetrized,

Description

This dataset provides a correlation matrix for statistical uncertainties. Systematic uncertainties are given for diagonal entries only. Quoting from the ATLAS paper:

> All uncertainties of a statistical nature, such as the statistical uncertainty of the data, the statistical uncertainty of simulated samples used in the background estimate, or the uncertainty from limited sample size of the signal simulation used in the unfolding are treated as uncorrelated between bins and between $W^+$ and $W^-$ production. All other systematic uncertainties are treated as fully correlated between bins and between the production of $W^+$ and $W^-$ bosons.

The sources of uncertainty provided in the HepData table are:

| Source | Type | Correlated between bins and $W^{\pm}$ | Correlated with other experiments |
|---|---|---|---|
| JetScaleEff1 | Systematic | Yes | No |
| JetScaleEff2 | Systematic | Yes | No |
| JetScaleEff3 | Systematic | Yes | No |
| JetScaleEff4 | Systematic | Yes | No |
| JetScaleEff5 | Systematic | Yes | No |
| JetScaleEff6 | Systematic | Yes | No |
| JetScaleEta1 | Systematic | Yes | No |
| JetScaleEta2 | Systematic | Yes | No |
| JetScaleHighPt | Systematic | Yes | No |
| JetScaleMC | Systematic | Yes | No |
| JetScalePileup1 | Systematic | Yes | Yes |
| JetScalePileup2 | Systematic | Yes | Yes |
| JetScalePileup3 | Systematic | Yes | No |
| JetScalePileup4 | Systematic | Yes | No |
| JetScaleFlav1Known | Systematic | Yes | Yes |
| JetScaleFlav2 | Systematic | Yes | Yes |
| JetScaleBjet | Systematic | Yes | No |
| JetScalepunchT | Systematic | Yes | Yes |
| JetResolution10 | Systematic | Yes | No |
| JetSFBeff | Systematic | Yes | No |
| JetSFCeff | Systematic | Yes | No |
| JetSFLmistag | Systematic | Yes | No |
| JetSFHighPt | Systematic | Yes | No |
| JetJVFcut | Systematic | Yes | Yes |
| ElScaleR12 | Systematic | Yes | No |
| ElScaleZee | Systematic | Yes | No |
| ElScalePS | Systematic | Yes | No |
| ElResolution | Systematic | Yes | No |
| ElSFReco | Systematic | Yes | No |
| ElSFId | Systematic | Yes | No |
| ElSFTrigger | Systematic | Yes | No |
| ElSFIso | Systematic | Yes | No |
| ElSFChargeMisID | Systematic | Yes | No |
| METScale | Systematic | Yes | No |
| METResLong | Systematic | Yes | No |
| METResTrans | Systematic | Yes | No |
| PileupWeight | Systematic | Yes | No |
| QCDlowRange | Theory | Yes | No |
| QCDhighRange | Theory | Yes | No |
| QCDvarIso | Theory | Yes | No |
| QCDvarElID | Theory | Yes | No |
| QCDfitUncert | Theory | Yes | No |
| QCDotherGen | Theory | Yes | No |
| QCDfitRebin | Theory | Yes | No |
| XsecZ | Theory | Yes | No |
| XsecTop | Theory | Yes | No |
| XsecDibos | Theory | Yes | No |
| BkgTtbarNorm | Theory | Yes | No |
| BkgMCstat | Systematic | Yes | No |
| WHFmodel | Theory | Yes | No |
| LumiUncert | Systematic | Yes | Yes |
| UnfoldMCstat | Stat | No | No |
| UnfoldOtherGen | Stat | No | No |
| UnfoldReweight | Stat | No | No |

Please, @enocera @scarlehoff, feel free to check if this breakdown has flaws, as matching with the paper was not trivial.

There are a few things to mention:

  1. The experimentalists deliver the correlation matrix for the statistical uncertainties. Hence, there will be 16 additive and correlated artificial uncertainties.
  2. Some systematic uncertainties are correlated between bins and between $W^+$ and $W^-$ production, as can be read from the second column. There are 50 + 1 (lumi) correlated systematic uncertainties in total and possibly 3 uncorrelated statistical uncertainties (if I'm not wrong). Since the systematic uncertainties are correlated between $W^+$ and $W^-$ production, I will use the same type of label in the commondata (ATLASWX).
  3. The majority of the uncertainties are asymmetric. In the legacy implementation, these were treated using the CMS prescription (eq. 6 of 1703.01630). Roughly speaking, in this prescription the upper and lower bounds of each asymmetric uncertainty are treated as two different artificial sources. The same argument applies to symmetric uncertainties, the only exception being the uncertainty associated with the luminosity. Hence, the legacy implementation has $(50+3) \times 2\ (\text{sys.}) + 16\ (\text{stat.}) + 1\ (\text{lumi.}) = 123$ artificial sources. @enocera suggested using the symmetric prescription as default. I will implement the CMS prescription as a variant.
  4. There is a third column in the table. This column expresses the inter-experiment correlations, explained in this paper. I post the relevant table below. Accounting for these inter-experiment correlations necessitates common uncertainty type flags for 8 TeV $W$ + jets, 7 TeV $W$ + jets, $t\bar{t}$ lepton + jets data at 8 TeV and 13 TeV, and inclusive jets at 8 TeV datasets. If you agree, I will start by neglecting these correlations and leave them for a second iteration across the commondata.
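
As an illustration of point 1, here is a minimal sketch of how a correlation matrix for the statistical uncertainties can be turned into additive, correlated artificial uncertainties, one per data point. It uses a Cholesky factorisation as one possible decomposition (an eigen-decomposition would serve the same purpose), and it is not necessarily what the filter in this PR does. The function names and the 3-bin toy numbers are assumptions made for the example, not values from HepData.

```python
import math


def covariance_from_correlation(corr, stat):
    """Build cov_ij = corr_ij * stat_i * stat_j from per-bin stat errors."""
    n = len(stat)
    return [[corr[i][j] * stat[i] * stat[j] for j in range(n)] for i in range(n)]


def cholesky(cov):
    """Return lower-triangular L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                low[i][j] = (cov[i][j] - partial) / low[j][j]
    return low


# Toy inputs (hypothetical): 3 bins instead of the 16 of the real dataset.
corr = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
stat = [2.0, 1.5, 1.0]

cov = covariance_from_correlation(corr, stat)
# artificial[i][k] is the value of artificial source k in bin i.
artificial = cholesky(cov)
```

Summing the outer products of the columns of `artificial` reproduces `cov`, which is why an n x n statistical covariance matrix yields n artificial sources.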

For the record, there is a third variant in the legacy implementation, which accounts for non-perturbative corrections (NP) and that I will ignore for the moment since they were not used in NNPDF4.0.
Screenshot 2024-11-29 at 16 37 16

@achiefa achiefa requested a review from scarlehoff November 12, 2024 13:00
@achiefa achiefa self-assigned this Nov 12, 2024
@achiefa achiefa marked this pull request as draft November 12, 2024 13:01

achiefa commented Dec 1, 2024

I have implemented the two variants I discussed in the description: default (symmetrized) and CMS_prescription (see above). However, the $\chi^2$ is notably worse than in the legacy implementation. This might not be a problem, since the HepData tables have been updated and modified since the first legacy implementation (see 'Version 2 modifications' in HepData). Below you can find data-theory comparisons with the new implementation for the two variants.
New reports: [sym], [cms]


achiefa commented Dec 2, 2024

I attach the comparison between the two covmats that should be the same (but they are not).
Screenshot 2024-12-02 at 21 12 10

@achiefa achiefa marked this pull request as ready for review December 2, 2024 21:37
@scarlehoff (Member)

What do you mean they should be equal? If they are not, does that mean there's a bug in the old, in the new...?


achiefa commented Dec 3, 2024

I mean that the legacy implementation adopted the CMS prescription to treat the asymmetric uncertainties by default. Hence, the variant CMS_prescription should be the same as the legacy one. Now, I have the following observations:

  1. The HepData tables have been updated since the legacy implementation. In particular, they changed the systematic luminosity uncertainty.
  2. I agree that there could be a bug in the old implementation, but that is hard to say from my point of view.
  3. There could be a bug in my implementation, but I have not found any.

However, ERN and I agreed to use the symmetrised variant as default for NNPDF4.1. The CMS prescription will no longer be used, and that variant was just a plus that I implemented.

@scarlehoff (Member)

> The HepData tables have been updated since the legacy implementation. In particular, they changed the systematic luminosity uncertainty.

Ok, then they should not be equal, right?


achiefa commented Dec 3, 2024

Yes, they should not be equal


@scarlehoff scarlehoff left a comment

I don't see anything wrong with the implementation itself; however, I'm slightly worried about the chi2 changes. The CMS prescription should have kept the results at least similar, but the change in the chi2 is huge (from ~1 to ~1.5), and in the "default" it goes all the way to chi2 ~ 3. You also left a few TODOs in the filter_utils and filter files. Are those TODOs no longer applicable, or would some of those checks change the results?

@enocera is this expected? I am happy with merging this as it is and revising possible problems in the future if you are ok with it.

@achiefa please, remove the old kinematics file


achiefa commented Dec 4, 2024

> You also left a few TODOs in the filter_utils and filter files. Are those TODOs no longer applicable, or would some of those checks change the results?

You're right. I'll check whether these TODOs affect the results. Thank you for pointing this out.


enocera commented Dec 4, 2024

@achiefa @scarlehoff This is unexpected. @achiefa If I understand correctly, there are two versions on HepData. What happens if you use the other version?


achiefa commented Dec 4, 2024

@enocera It doesn't improve with the older version either. BTW, I was looking carefully at the individual sources of systematic uncertainty and I observed that, in some cases, the plus and minus signs are inverted (see the snapshot below). Do these uncertainties require a particular treatment?

Screenshot 2024-12-04 at 12 23 14


enocera commented Dec 4, 2024

@achiefa The case that you have highlighted has symmetric uncertainties that are upside down. In this case you should also use the symmetrisation formula by D'Agostini, which will return a "symmetrised" uncertainty of -4 (and not +4). This means that the uncertainty is anticorrelated. Are you sure that the sign is correctly taken into account?
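
To make the sign issue concrete, here is a minimal sketch of a sign-preserving symmetrisation. It assumes the commonly quoted form of the D'Agostini prescription (shift equal to the semi-difference, width equal to $\sqrt{\bar{\Delta}^2 + 2\delta^2}$), and it assumes that the sign of the result is taken from the average $\bar{\Delta}$ so that an "upside down" entry comes out negative. The function name and this sign convention are illustrative and not taken from the code in this PR.

```python
import math


def symmetrise(delta_plus, delta_minus):
    """Symmetrise an asymmetric uncertainty, keeping its sign.

    delta_plus  : entry tagged "+" in the table, with its own sign
    delta_minus : entry tagged "-" in the table, with its own sign
    Returns (shift, sigma), where sigma is negative for inverted entries.
    """
    semi_diff = (delta_plus + delta_minus) / 2.0  # shift of the central value
    average = (delta_plus - delta_minus) / 2.0    # signed half-width
    magnitude = math.sqrt(average**2 + 2.0 * semi_diff**2)
    # Assumption: the sign of the symmetrised uncertainty follows the average.
    return semi_diff, math.copysign(magnitude, average)


# Usual ordering: "+" entry positive, "-" entry negative.
shift_usual, sigma_usual = symmetrise(4.0, -4.0)
# Upside-down ordering, as in the case highlighted above.
shift_flipped, sigma_flipped = symmetrise(-4.0, 4.0)
```

With these inputs the usual ordering gives a symmetrised uncertainty of +4 and the inverted ordering gives -4, with no shift in either case. Taking an absolute value anywhere along the way would make the two cases indistinguishable.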

@achiefa achiefa marked this pull request as draft December 4, 2024 15:56

achiefa commented Dec 4, 2024

Ok, I think I was mistakenly taking the absolute value of some numbers. Now the symmetrized version (which is going to become the default variant in the new implementation) gives much better results [see here].
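
A toy sketch of why dropping the sign matters: a fully correlated source contributes the outer product of its per-bin values to the covariance matrix, so taking absolute values turns an anticorrelation between bins into a positive correlation. The two-bin numbers and the helper name are made up for illustration and are not taken from the HepData tables.

```python
def covariance_from_source(source):
    """Covariance contribution of one fully correlated source: s_i * s_j."""
    n = len(source)
    return [[source[i] * source[j] for j in range(n)] for i in range(n)]


# Hypothetical source that pulls two bins in opposite directions.
signed = [4.0, -4.0]
# The same source after (mistakenly) taking absolute values.
unsigned = [abs(value) for value in signed]

cov_signed = covariance_from_source(signed)
cov_unsigned = covariance_from_source(unsigned)

# The diagonal entries agree; only the off-diagonal sign differs.
off_diagonal_signed = cov_signed[0][1]
off_diagonal_unsigned = cov_unsigned[0][1]
```

The variances are identical in the two cases, so the error is invisible on the diagonal and only shows up through the off-diagonal terms, and hence in the $\chi^2$.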

@scarlehoff (Member)

Great! This is much better. Not only that, it is close to the numbers one gets with the NP variant, which I think is expected, right?


enocera commented Dec 4, 2024

> Great! This is much better. Not only that, it is close to the numbers one gets with the NP variant, which I think is expected, right?

Indeed!


achiefa commented Dec 4, 2024

Wait, the link to legacy_NP points to the wrong report. Let me update it.


achiefa commented Dec 4, 2024

Ok, updated. Honestly, I don't see much difference between legacy and legacy_NP.


achiefa commented Dec 4, 2024

Ok, now I have uploaded the data-theory comparisons for version 1 and version 2 of the HepData tables.

@scarlehoff (Member)

Nice. Is this ready then?


achiefa commented Dec 5, 2024

Yes, it is. The inter-experiment correlations are still missing, but my understanding is that we will address them in a second iteration. Maybe it is worth making a note pointing to this PR for reference. In the description I put all the necessary information to include these correlations.

@achiefa achiefa marked this pull request as ready for review December 5, 2024 11:25
@scarlehoff (Member)

Perfect! Thanks

@scarlehoff (Member)

Sorry, just one question: do we not want the CMS version of the prescription, then? Is it because it is exactly the same as before?


achiefa commented Dec 5, 2024

The CMS prescription was the one adopted in the legacy implementation and I didn't manage to reproduce it. However, after a chat with @enocera, we agreed to drop this prescription, so its implementation is unnecessary.

@scarlehoff scarlehoff added the Done label Dec 5, 2024
@scarlehoff scarlehoff merged commit 1bc9d9f into master Dec 5, 2024
9 checks passed
@scarlehoff scarlehoff deleted the new_ATLAS_WJ_8TEV branch December 5, 2024 12:18
Labels
ATLAS_DY_DATA data toolchain Done PRs that are done but waiting on something else to merge/approve