
File transfers to the CERN Tape Archive (CTA)

KatyEllis edited this page Jul 16, 2020 · 71 revisions

Testing file transfers to the CMS CTA instance, in preparation for integration with Rucio.

NOTE: To get anything to work, I had to set up my grid proxy with "grid-proxy-init". No version of "voms-proxy-init -voms cms", with or without the -rfc option, gave me successful transfers.

  • If a file has transferred to disk, I have never seen it not subsequently transfer to tape. The only exception is when I transfer a second copy with the same name - then the end result is 1 copy on disk, 1 copy on tape.
  • Transfers to and from RAL - Pass (now fails)
  • Copies within CTA - Pass
  • Transfers from EOS - Pass
  • Transfers to EOS (Katy's user store) - Fail

Katy Ellis, 18-21/12/18 Transfers to and from T1_UK_RAL_Disk

Transfer a file:

fts-transfer-submit -s <fts server> <source file> <dest file>

If this is submitted successfully, you get a transfer ID. Use this with:

fts-transfer-status -l -s <fts server> <ID>
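Scripted, the submit-then-poll pattern looks roughly like this. This is a sketch: the list of terminal states reflects the FTS job states seen on this page plus CANCELED, and the polling loop itself is commented out because it needs a live FTS server and a valid proxy.

```shell
#!/bin/sh
# Classify an FTS job state; FINISHED, FAILED and CANCELED are terminal.
is_terminal() {
  case "$1" in
    FINISHED|FAILED|CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical polling loop (needs a live FTS server and a valid grid proxy):
# FTS_SERVER=https://fts3-devel.cern.ch:8446
# JOB_ID=<id returned by fts-transfer-submit>
# while STATE=$(fts-transfer-status -s "$FTS_SERVER" "$JOB_ID"); do
#   is_terminal "$STATE" && break
#   sleep 30
# done
```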

With help from Chris Brew and Julien Leduc, I found a command that worked from a RAL machine to send a file transfer from RAL ECHO to CERN CTA:

-bash-4.1$ fts-transfer-submit -s https://fts3-devel.cern.ch:8446 gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/mc/RunIIFall18GS/LambdabToP4250Phi_P4250ToJpsiLambda_BMuonFilter_DGamma0_TuneCP5_13TeV-pythia8-evtgen/GEN-SIM/102X_upgrade2018_realistic_v11-v2/10000/287928CF-4371-9A40-B29F-518F1EF75D2B.root root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root

Output: 5feb7500-02de-11e9-88c2-02163e0170e3

Check status of transfer:

-bash-4.1$ fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 5feb7500-02de-11e9-88c2-02163e0170e3

Output:
ACTIVE
Source:
gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/mc/RunIIFall18GS/LambdabToP4250Phi_P4250ToJpsiLambda_BMuonFilter_DGamma0_TuneCP5_13TeV-pythia8-evtgen/GEN-SIM/102X_upgrade2018_realistic_v11-v2/10000/287928CF-4371-9A40-B29F-518F1EF75D2B.root
Destination: root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root
State: ACTIVE
Reason:
Duration: -3754224173
Staging: 0
Retries: 0

Successful transfer to CTA disk:

-bash-4.1$ fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 5feb7500-02de-11e9-88c2-02163e0170e3
Output:
FINISHED
Source:
gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/mc/RunIIFall18GS/LambdabToP4250Phi_P4250ToJpsiLambda_BMuonFilter_DGamma0_TuneCP5_13TeV-pythia8-evtgen/GEN-SIM/102X_upgrade2018_realistic_v11-v2/10000/287928CF-4371-9A40-B29F-518F1EF75D2B.root
Destination: root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root
State: FINISHED
Reason:

Duration: 79
Staging: 0
Retries: 0

Now check whether the file is at CERN (via an LXPLUS machine at the moment):

XrdSecPROTOCOL=gsi X509_USER_PROXY=/tmp/x509up_u31379 eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/fromECHO_RAL.root
Output:
d1::t0 -rwxr-xr-x 1 cmsrobot def-cg 1939184434 Dec 18 17:04 fromECHO_RAL.root

The d1 shows there is one copy on disk, and t0 means it has not yet been written to tape.

A short time later it is on tape, t1:
[kellis@lxplus027]~/CTAtest% XrdSecPROTOCOL=gsi X509_USER_PROXY=/tmp/x509up_u31379 eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/fromECHO_RAL.root
Output:
d0::t1 -rwxr-xr-x 1 cmsrobot def-cg 1939184434 Dec 18 17:04 fromECHO_RAL.root
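To watch for the d/t flip programmatically, the first column of `eos ls -y` can be parsed. A sketch: the parsing is plain shell with the sample lines copied from the output above, while the eos polling loop is commented out because it needs a valid proxy.

```shell
#!/bin/sh
# Extract the tape-copy count from the first column of `eos ls -y` output
# (e.g. "d1::t0" -> 0 tape copies, "d0::t1" -> 1 tape copy).
tape_copies() {
  echo "$1" | awk '{ split($1, a, "::t"); print a[2] }'
}

tape_copies 'd1::t0 -rwxr-xr-x 1 cmsrobot def-cg 1939184434 Dec 18 17:04 fromECHO_RAL.root'  # prints 0
tape_copies 'd0::t1 -rwxr-xr-x 1 cmsrobot def-cg 1939184434 Dec 18 17:04 fromECHO_RAL.root'  # prints 1

# Hypothetical wait loop (needs a valid grid proxy):
# until [ "$(tape_copies "$(eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/fromECHO_RAL.root)")" -ge 1 ]; do
#   sleep 60
# done
```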

The reverse transfer back to RAL disk from CTA tape:

-bash-4.1$ fts-transfer-submit --bring-online 3600 -s https://fts3-devel.cern.ch:8446 root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return.root
Output:
52c6f4b2-0518-11e9-9707-fa163edecedf

A little time later...

-bash-4.1$ fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 52c6f4b2-0518-11e9-9707-fa163edecedf
Output:
FINISHED
Source: root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root
Destination: gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return.root
State: FINISHED
Reason:
Duration: 122
Staging: 8
Retries: 0

Check that it is on disk at RAL:

-bash-4.1$ gfal-ls gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return.root
Output: gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return.root
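gfal-ls only confirms the file exists; comparing byte counts at source and destination is a stronger check. A sketch: the gfal-stat calls are commented out because they need a valid grid proxy, and the awk pattern assumes gfal-stat prints a "Size:" line, which may differ between versions.

```shell
#!/bin/sh
# Compare two byte counts; succeeds only if they are equal.
same_size() {
  [ "$1" -eq "$2" ]
}

# Hypothetical size comparison (needs a valid grid proxy):
# SRC_SIZE=$(gfal-stat root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root | awk '/Size:/ {print $2}')
# DST_SIZE=$(gfal-stat gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return.root | awk '/Size:/ {print $2}')
# same_size "$SRC_SIZE" "$DST_SIZE" && echo "sizes match"
```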

Katy Ellis, 21/12/18 Copying a file within CTA

Submit with the 'bring-online' option:
[kellis@lxplus063]~/CTAtest% fts-transfer-submit --bring-online 3600 -s https://fts3-devel.cern.ch:8446 root://eosctacmspps//eos/ctacmspps/archivetest/testfts_kellis_Tues1 root://eosctacmspps//eos/ctacmspps/archivetest/testfts_kellis_Fri

Initial output is STAGING, then changes to STARTED and FINISHED:
[kellis@lxplus063]~/CTAtest% fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 4c8eb0f6-051f-11e9-8ab7-02163e00a077
Output:
STAGING
Source: root://eosctacmspps//eos/ctacmspps/archivetest/testfts_kellis_Tues1
Destination: root://eosctacmspps//eos/ctacmspps/archivetest/testfts_kellis_Fri
State: STAGING
Reason: null
Duration: 0
Staging: 0
Retries: 0

Transferring a file from EOS

-bash-4.1$ fts-transfer-submit -s https://fts3-devel.cern.ch:8446 root://eoscms.cern.ch//eos/cms/store/PhEDEx_LoadTest07/source/T2CHCERN_D8 root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS
Output:
d022e2d6-0540-11e9-8ab7-02163e00a077

-bash-4.1$ fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 d022e2d6-0540-11e9-8ab7-02163e00a077

Output:
FINISHED
Source: root://eoscms.cern.ch//eos/cms/store/PhEDEx_LoadTest07/source/T2CHCERN_D8
Destination: root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS
State: FINISHED
Reason:
Duration: 33
Staging: 0
Retries: 0

Transferring a file to EOS

I got some personal space on EOS for testing, from CERN Service Desk, at /eos/cms/store/user/kellis/.

I am currently unable to transfer to this space:

-bash-4.1$ fts-transfer-submit --bring-online 3600 -s https://fts3-devel.cern.ch:8446 root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS root://eoscms//eos/cms/store/user/kellis/LoadTest_fromCTA

Output:
b7db60b4-054e-11e9-8b0a-02163e0185d5

-bash-4.1$ fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 b7db60b4-054e-11e9-8b0a-02163e0185d5
Output:
FAILED
Source: root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS
Destination: root://eoscms//eos/cms/store/user/kellis/LoadTest_fromCTA
State: FAILED
Reason: TRANSFER [42] Error on XrdCl::CopyProcess::Run(): [ERROR] Server responded with an error: [3012] sync - TPC open failed
Duration: 1
Staging: 7
Retries: 0

Tried this again on 3rd January 2019.

fts-transfer-submit --bring-online 3600 -s https://fts3-devel.cern.ch:8446 root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS root://eoscms//eos/cms/store/user/kellis/LoadTest_fromCTA

Output: d7e5b130-0f84-11e9-82fa-fa163e044255

fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 d7e5b130-0f84-11e9-82fa-fa163e044255

Output:
FAILED
Source: root://eosctacmspps//eos/ctacmspps/archivetest/LoadTest_fromEOS
Destination: root://eoscms//eos/cms/store/user/kellis/LoadTest_fromCTA
State: FAILED
Reason: STAGING [42] [FATAL] Auth failed
Duration: 9
Staging: 9
Retries: 0

Also tried again to copy the file back to RAL, which previously worked. Tried with grid- and voms- commands for setting up proxy but both attempts failed:

fts-transfer-submit --bring-online 3600 -s https://fts3-devel.cern.ch:8446 root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return_3Jan.root

Output: 961fe542-0f87-11e9-b5f8-02163e00a077

fts-transfer-status -l -s https://fts3-devel.cern.ch:8446 961fe542-0f87-11e9-b5f8-02163e00a077

Output:
FAILED
Source: root://eosctacmspps.cern.ch/eos/ctacmspps/archivetest/fromECHO_RAL.root
Destination: gsiftp://gridftp.echo.stfc.ac.uk/cms:/store/temp/Katy/fromEcho_RAL_return_3Jan.root
State: FAILED
Reason: STAGING [42] [FATAL] Auth failed
Duration: 9
Staging: 9
Retries: 0

Creating an RSE on EOS

This next section doesn't touch on CTA directly, but it is needed: CTA requires xrootd, yet xrootd doesn't work for third-party copies. The one thing I can test is an xrootd copy between EOS and CTA.

I successfully made a replica on EOS from a dataset at Nebraska via the following protocol:

rucio-admin rse add-protocol --hostname eoscmsftp.cern.ch --scheme gsiftp --port 2811 --domain-json '{"wan": {"read": 1, "write": 1, "third_party_copy": 1, "delete": 1}, "lan": {"read": 1, "write": 1, "third_party_copy": 1, "delete": 1}}' --prefix '/eos/cms/store/katy' --impl rucio.rse.protocols.gfalv2.Default T3_CH_CERN_EOS_Test

I wanted to use my personal EOS space, but it's only 2TB, and the datasets already in Rucio are bigger than that. I checked EOS, and verified the file(s) were copied.
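To double-check the protocol definition and the replicas, the standard Rucio client commands can be used. A sketch: the DID is a placeholder, and everything is commented out because it needs a configured Rucio client.

```shell
# Show the RSE settings, attributes and the protocols that add-protocol created:
# rucio-admin rse info T3_CH_CERN_EOS_Test

# List the replicas of a file on that RSE (cms:<lfn> is a placeholder DID):
# rucio list-file-replicas --rses T3_CH_CERN_EOS_Test cms:<lfn>
```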

Progress as of 31/03/19

I am still using the EOS space noted in the previous section (/eos/cms/store/katy). Having some issues with replicas made previously, so I created a rule to put a new file already registered in Rucio on EOS disk. I then made a rule to replicate the same file on my CTA RSE.

[kellis@lxplus069]/eos/cms/store/katy/cms/store/mc% XrdSecPROTOCOL=gsi X509_USER_PROXY=/tmp/x509up_u31379 eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/cms/store/mc/RunIIAutumn18NanoAOD/DYJetsToLL_BGenFilter_Zpt-200toInf_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/102X_upgrade2018_realistic_v15-v1/80000
Output:
d1::t0 -rw------- 1 cmsrobot def-cg 23001323 Mar 31 21:39 CA5010F4-D3F0-B54F-8DF1-909B9F23838C.root

Now I want to try the reverse transfer. I deleted the rule keeping the file on EOS. However, Eric mentioned this doesn't really delete the file, and creating the rule again would not trigger a transfer. I then thought I could delete the file on EOS manually; however this is not possible as I do not have permissions to do this (in Rucio I am CMS robot, outside I have only user permissions). My plan would be to then delete the distance/ranking settings connecting EOS to sites other than CTA, if this is possible. This would try to force Rucio to replicate the file from CTA and not from another disk site holding a replica.

Hence my next steps are to copy a file into CTA the way I originally tried this - via fts-transfer-submit. Or somehow move a file to CTA via the Rucio file-registration script. Then I will hopefully have a file unique to Rucio on CTA, which I can try to replicate to EOS.

Using the cmsdatareplica script, I tried to register a file into CTA directly. This failed, but I am not sure whether the problem is with the dataset name or the pin.

[root@6b48abc2ef1e scripts]# ./cmsdatareplica.py --pnn T3_CH_CERN_CTA_Test3 --dataset /BuToKJpsi_Toee_MuFilter_SoftQCDnonD_TuneCP5_13TeV-pythia8-evtgen/AODSIM/PUPoissonAve20_BParking_Bparking_102X_upgrade2018_realistic_v15-v1 --pool 1 --account kellis -v VERBOSE

VERBOSE:root:Getting datasets list for: ['/BuToKJpsi_Toee_MuFilter_SoftQCDnonD_TuneCP5_13TeV-pythia8-evtgen/AODSIM/PUPoissonAve20_BParking_Bparking_102X_upgrade2018_realistic_v15-v1']
Traceback (most recent call last):
  File "./cmsdatareplica.py", line 481, in <module>
    _get_dset_list(PCLI, OPTIONS.dataset),
  File "/tmp/CMSRucio/docker/CMSRucioClient/scripts/instrument.py", line 24, in timer_wrapper
    ret = func(*args, **kwargs)
  File "./cmsdatareplica.py", line 394, in _get_dset_list
    item in pcli.list_data_items(pditem=dset, metadata=False, locality=False)
  File "/tmp/CMSRucio/docker/CMSRucioClient/scripts/phedex.py", line 192, in list_data_items
    pditems = [item[outtype][0]['name'] for item in pditems]
KeyError: 'name'

Then I managed to copy a file from RAL ECHO to CTA via fts-transfer-submit. The FTS webpage and fts-transfer-status both reported this as a failure with an incorrect file size; however, I can see the file with the correct size by looking directly on CTA:

[kellis@lxplus069]/eos/cms/store/katy/cms/store/mc% XrdSecPROTOCOL=gsi X509_USER_PROXY=/tmp/x509up_u31379 eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/
Output:
d1::t0 dr-xr-xr-+ 1 cmsrobot def-cg 23001323 Mar 31 21:39 cms
d1::t0 -rwxr-xr-x 1 cmsrobot def-cg 6348709302 Apr 1 00:07 fromECHO_ftsTransfer.root

Now I try registering the file in Rucio using the cmsdatareplica script. At first there seemed to be an indication this had worked, but it did not; the script found 0 datasets:

[root@6b48abc2ef1e scripts]# ./cmsdatareplica.py --pnn T3_CH_CERN_CTA_Test3 --dataset fromECHO_ftsTransfer.root --pool 1 --account kellis -v VERBOSE
Output:
VERBOSE:root:Getting datasets list for: ['fromECHO_ftsTransfer.root']
VERBOSE:root:Got 0 datasets
SUMMARY:root:Final Stats: n.pnns: 0, n.datasets: 0, poolsize: 1, timing: {'_get_dset_list': {'start': 1554071036.75139, 'end': 1554071036.890538, 'etime': 0.13914799690246582}, '_launch_workers': {'start': 1554071036.890596, 'end': 1554071036.89228, 'etime': 0.0016841888427734375}, '_get_workers': {'start': 1554071036.892288, 'end': 1554071036.892327, 'etime': 3.910064697265625e-05}}

30/04/2019 Working at CERN I successfully transferred a file into the French T1 tape buffer RSE T1_FR_CCIN2P3_Buffer_Test. Still struggling to transfer off the buffer site (whether it is on buffer or on tape). Set up an srm protocol on the EOS RSE, but getting errors.

Eric got the Reaper working on the EOS site, so I can now at least attempt file transfers back from CTA to EOS (since EOS is currently the only place you can transfer INTO CTA from, you need to be able either to properly delete files on EOS or to register files on CTA). However, this is currently not working.

1/05/2019 Still at CERN. I can now transfer the file at the French T1 to Nebraska and to the EOS site. Eric explained that the srm protocol did not need to match the one on the T1.

The reaper is working, and I can monitor it via the 'pod' logs.

17/05/2019 I have now made several transfers in both directions between EOS and CTA, even after Julien had to delete everything in our store to make some changes (10/05/2019).

The steps are:

1. Find a new dataset on e.g. the Italian disk T1 or French buffer T1 (these are fully synched).
2. Move the dataset to EOS.
3. Update the distance between EOS and CTA (this seems to remove itself between sessions, perhaps overnight).
4. Transfer the dataset to CTA.
5. Delete the rule keeping the dataset on EOS, and check this has worked - the Reaper must run.
6. Start a new session so that the link between EOS and the other sites is 'broken'.
7. Update the distance between CTA and EOS.
8. Create a rule to put the dataset on EOS, and check it has used CTA as the source.
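The cycle above maps onto the standard Rucio CLI roughly as follows. This is a hedged sketch: the distance/ranking values are illustrative, the DID and rule ID are placeholders, and everything is commented out because it needs a configured Rucio client.

```shell
# Move the dataset to EOS:
# rucio add-rule cms:/SomeDataset#<block-uuid> 1 T3_CH_CERN_EOS_Test

# Update the distance/ranking between two RSEs (direction matters):
# rucio-admin rse update-distance --distance 1 --ranking 1 T3_CH_CERN_EOS_Test T3_CH_CERN_CTA_Test3

# Transfer the dataset to CTA:
# rucio add-rule cms:/SomeDataset#<block-uuid> 1 T3_CH_CERN_CTA_Test3

# Delete the rule keeping it on EOS (the Reaper must then run):
# rucio delete-rule <rule-id>

# Replicate back to EOS, forcing CTA as the source:
# rucio add-rule cms:/SomeDataset#<block-uuid> 1 T3_CH_CERN_EOS_Test --source-replica-expression T3_CH_CERN_CTA_Test3
```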

I also tested the removal of the link between two sites by setting distance and ranking to -1, but this did not work and the file transferred to EOS via the original site again.

Katy Ellis, 06/08/19 Scale tests with EOS -> CTA

TB scale test

Today I moved a 66-file/220GB dataset to CTA via EOS. Then I moved a 500-file/1.2TB block from KIT Disk to EOS, then CTA:

DATASET: cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#045f6c94-937a-48e8-916a-0389d85fad97

This got stuck moving to EOS, with 400 files transferred and 100 'stuck'. It took a few hours, but eventually the remaining 100 transferred successfully, and I don't believe any changes were made.

Here is a list of blocks I will use to make a 10 TB test:

cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#045f6c94-937a-48e8-916a-0389d85fad97
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#4d850a2c-9504-4dc2-8e65-ab15b6429714
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#2b178282-57a3-4b4a-b81f-4cde045b570c
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#ecf1db3d-5b9b-4af9-8b6b-7ba092f925ae
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#0e4ef092-a61a-41ee-b7ed-86491be67ef3
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#d8e0c677-3e29-41ff-8348-b8a5734a2466
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#465d3e98-a600-4e7c-a4d6-a215b6cb1203
cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#bef41487-6cf3-4f6f-ac13-0f00d5c46ccf

Most blocks are approx 1.2 TB, so the 8 in this list should be 9.6 TB or slightly less. Update: all blocks in the list are transferred to CTA. I subsequently deleted them from EOS.

20/08/19 Now to transfer the large blocks from CTA back to EOS (specifying CTA as source, or the files would probably just transfer from KIT again):

Submitted the rule and received 5 timeouts like this:

bash-4.2$ rucio add-rule cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2024_realistic_v4-v2/PREMIX#045f6c94-937a-48e8-916a-0389d85fad97 1 T3_CH_CERN_EOS_Test --source-replica-expression T3_CH_CERN_CTA_Test3
2019-08-20 11:14:30,075 ERROR An unknown exception occurred.
Details: no error information passed (http status code: 504 ('gateway_timeout',))

I then did list-rules and saw that the rule was not present. I submitted the rule a 6th time and got the message that this rule already exists. I did list-rules again and saw that it was in state STUCK[0/300/200].

However, there seems to be a lot of transfers in the system lately, and it's taking some time for new requests to be dealt with. I will now attempt to submit rules for the other large blocks from the same dataset.

Katy Ellis, 28/11/19 More TB Scale tests with EOS -> CTA, took place during the Rucio Coding Camp 15-17 October 2019

I set up rules to transfer approximately 10TB of data into CTA tape. The multi-hop functionality is not yet in place, so I chose data that is already on T2_CH_CERN (i.e. on EOS). At first files were transferring, but then the rate started to plateau. I tried re-triggering the transfers, e.g. by setting them back to 'stuck', but this had little effect. Today, the results look like this:

10d4c94fc4d74cffab1f12bafdaabd11 root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#2760361f-2c1c-4d14-a4f6-633d85f782f9 REPLICATING[490/10/0] T3_CH_CERN_CTA_Test 1 2019-10-17 12:33:02
3f05283434ef41da8332837312bfd16d root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#23024b3e-4721-4c04-bbbe-1a05064d7b3a REPLICATING[0/500/0] T3_CH_CERN_CTA_Test 1 2019-10-17 07:59:26
bf997b83f6eb43299c0492d494ca3234 root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#13b7f736-62d4-4e89-a8ef-a4a6f28de325 REPLICATING[0/500/0] T3_CH_CERN_CTA_Test 1 2019-10-16 15:45:58
4c9417252b70471cab735a3994114664 root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#d1dd4582-faf1-4e75-a254-ffb2b36e6cf5 REPLICATING[184/316/0] T3_CH_CERN_CTA_Test 1 2019-10-17 13:45:12
10a9972d7f7348f5ac4b4be8db74232d root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#282993cf-39be-48a6-94f3-ed0262a2a88f REPLICATING[27/473/0] T3_CH_CERN_CTA_Test 1 2019-10-17 09:47:21
e82b44587d1542f5a2471f65c2467034 root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#1f6ad7af-53ee-4e1a-8244-93d28d68315d REPLICATING[0/500/0] T3_CH_CERN_CTA_Test 1 2019-10-16 20:45:58
41f75b4e04e84683b43afa8618592be3 root cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#db92055b-ddf3-40e6-b31f-64e1142e91f8 REPLICATING[210/290/0] T3_CH_CERN_CTA_Test 1 2019-10-17 14:43:21

As you can see, some of the datasets transferred zero files. I tried to find out at the time what the problem was, but I think there were also problems with k8s or with the FTS logs.

bash-4.2$ rucio list-dataset-replicas cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#2760361f-2c1c-4d14-a4f6-633d85f782f9

DATASET: cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#2760361f-2c1c-4d14-a4f6-633d85f782f9

+---------------------+---------+---------+
| RSE                 |   FOUND |   TOTAL |
|---------------------+---------+---------|
| T1_US_FNAL_Disk     |     500 |     500 |
| T2_CH_CERN          |     500 |     500 |
| T3_CH_CERN_CTA_Test |     490 |     500 |
+---------------------+---------+---------+

bash-4.2$ rucio list-dataset-replicas cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#1f6ad7af-53ee-4e1a-8244-93d28d68315d

DATASET: cms:/Neutrino_E-10_gun/RunIISummer19ULPrePremix-UL17_106X_mc2017_realistic_v6-v1/PREMIX#1f6ad7af-53ee-4e1a-8244-93d28d68315d

+---------------------+---------+---------+
| RSE                 |   FOUND |   TOTAL |
|---------------------+---------+---------|
| T2_CH_CERN          |     500 |     500 |
| T3_CH_CERN_CTA_Test |       0 |     500 |
+---------------------+---------+---------+
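For reference when reading the rule states above: to my understanding, the bracketed numbers in `rucio list-rules` output are the [OK/Replicating/Stuck] lock counts. A small helper to pull them out of a state string (sample strings copied from above):

```shell
#!/bin/sh
# Extract the bracketed lock counts (OK/Replicating/Stuck) from a rule state.
rule_counts() {
  echo "$1" | sed 's/.*\[\(.*\)\]/\1/'
}

rule_counts 'REPLICATING[490/10/0]'   # prints 490/10/0
rule_counts 'STUCK[0/300/200]'        # prints 0/300/200
```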

Friday 13th March 2020 - working at Fermilab with Eric: Attempting to get multihop to work e.g. from another site to CTA via EOS.

T2_CH_CERN_Test is used as the intermediate step. This RSE has the added attribute: available_for_multihop: True

The system as a whole should be in this state: rucio-admin config set --section transfers --option use_multihop --value True

I created a rule and 2 FTS requests were made simultaneously. The first one succeeded and copied a file from T2_US_Florida to T2_CH_CERN_Test via the gsiftp protocol. The second one failed, as the file was not yet on EOS. A couple of minutes later, the second 'hop' succeeded, as the file was now on EOS, and I can see it as both a successful rule and the file is on CTA buffer:

[kellis@lxplus735]~/CTAtest% XrdSecPROTOCOL=gsi X509_USER_PROXY=/tmp/x509up_u31379 eos root://eosctacmspps ls -y /eos/ctacmspps/archivetest/cms//store/mc/RunIIFall17NanoAODv5/BulkGravToZZToZhadZhad_narrow_M-1000_13TeV-madgraph/NANOAODSIM/PU2017_12Apr2018_Nano1June2019_102X_mc2017_realistic_v7-v1/40000/60BAD679-9C62-9446-BDFF-3424C437A89C.root
Output:
d1::t0 -rw------- 1 cmsrobot def-cg 595232779 Mar 13 16:11 60BAD679-9C62-9446-BDFF-3424C437A89C.root

15th April 2020 Multihop seen working properly for the first time, in both directions. Requires the following config:

  • Enable multihop.
  • On the 'middle hop' RSE, set the attribute available_for_multihop: True.
  • When creating the conveyor-submitter daemon, give it the --bulk-group option with a value greater than 1, e.g. --bulk-group 2.

Do not create a direct link, e.g. T1 -> CTA (I am not sure whether a direct link would stop it working). Make sure the links T1 -> EOS and EOS -> CTA are valid.

A submitted rule should generate one FTS job for the entire transfer.
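Collected in one place, the configuration described above looks roughly like this. A sketch: the conveyor-submitter line shows only the relevant flag, not a full daemon invocation, and all commands are commented out because they need a running Rucio server and client.

```shell
# Enable multihop globally:
# rucio-admin config set --section transfers --option use_multihop --value True

# Mark the middle-hop RSE:
# rucio-admin rse set-attribute --rse T2_CH_CERN_Test --key available_for_multihop --value True

# The conveyor-submitter daemon needs bulk grouping greater than 1, e.g.:
# conveyor-submitter ... --bulk-group 2
```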

22nd April 2020 Submitted a 500 file (2 TB) transfer to CTA. The dataset was on T2_CH_CERN, and although the hop is defined on T2_CH_CERN_Test, I specified the source to be the other RSE with the dataset, T1_US_FNAL_Disk.

Monitoring

https://monit-grafana.cern.ch/d/000000006/perf3?orgId=29&from=1587670200000&to=now&refresh=30s&var-tapeserver=eosctafst0106.cern.ch&var-resolution=10&var-instance=All

This is the monitoring of the SSD buffer server; look at the netdevice part of the plots. rx is what is received by the buffer server (what comes from EOSCMS), tx is what goes to the tape infrastructure, and rx-tx is what accumulates in the buffer, to be written to tape.

https://monit-grafana.cern.ch/d/000000786/infrastructure-castor-and-cta?from=now-1h&orgId=29&to=now&var-vo=CMS

23rd April 2020 Submitted 4 more 500-file / 2 TB blocks to CTA, mostly specified to transfer from FNAL. This gave a rate of 3 GB/s onto the buffer, with one server.

Julien said: "I just allocated 1 machine to the eosctacmspps instance out of the 32 I have. -> 90GB/s for run3 out of which I will just use ~60"

He is currently short of tape drives; they are delayed due to COVID-19.

30th June 2020 In the last week I have been preparing for a much larger test for CTA. I have chosen a dataset which is 187 TB. I have coordinated with Maria from the EOS team as well as Julien, to make sure there is space in EOS for the multihop, and provided her with the exact location the data will land on its route.

This is the dataset: cms:/Neutrino_E-10_gun/RunIISummer17PrePremix-PURun3_106X_mcRun3_2023_realistic_v3-v2/PREMIX

During this week we found a new multihop issue, related to the recent update to the functionality (of course this was not seen in tests with ATLAS!). Because I was only testing with CTA as a destination, I did not have links set with CTA as a source. This caused the code to consider the CTA RSE an 'island', so it was unable to transfer. As soon as I set the outward link, things started to look better.

At the same time, Julien was updating the CMS instance of CTA to version 3. In the process he cleared our pre-production space, where Rucio writes on CTA. When this was done, Rucio retransferred all the files, satisfying all the existing rules (except one, where I think the data was not present - deleted this). I figure this was a good pre-test for the big test tomorrow (Wed 1st July).

One other change for the test: multihop currently does not use the same FTS instance for both hops (it uses the FTS specified by T2_CH_CERN_Test for the first transfer, rather than fts3-pilot.cern.ch for both, as it should), so I am changing the fts attribute on T2_CH_CERN_Test to the pilot instance. Julien wants to continue using this instance for this test, as it is easier to debug and update and has less traffic. However, he confirmed it is now possible to use fts3.cern.ch for transfers to CTA - next time perhaps.

The test report is here: https://docs.google.com/document/d/1rkrOIVboyNdd9BKIhePd10Psyp0ib4v4An8kz1cd_dk/edit?usp=sharing