Meeting with supervisor (Dentica) #2

AChatzigoulas · 2020-09-01T14:55:35Z

Dear Harry,

We received an email asking us to contact you, as our supervisor, to discuss details of the implementation of the OpenAIRE project Phase 2.

Can we arrange a meeting for tomorrow? We are available at 10am or 1pm EEST.

Best,
Ingredio team

harry-di · 2020-09-01T16:11:56Z

Dear @AChatzigoulas,
Unfortunately 10am or 1pm EEST would not work for me, as I'm already booked with other telcos during that period, but I'm available on Thursday between 11am and 3:30pm Athens time. Would that work for you?

Kind regards,
Harry

zoecournia · 2020-09-01T19:03:57Z

Hi Harry,

Unfortunately we are not available Thursday and Friday. Would any time work for you tomorrow?
Best
Zoe

harry-di · 2020-09-01T20:01:36Z

Hi Zoe,

The only slot that I can cancel tomorrow is 3-4:30pm (attending the VLDB 2020 Round table discussion on "Intelligent Data Exploration") but that's ok if it suits you as I want to know how things are going on and what help you require from OpenAIRE.

Best,
Harry

zoecournia · 2020-09-01T22:26:34Z

Hi Harry,

No, that's OK, please don't cancel it. I will be traveling on Thursday but I guess I could participate in a call at 11am. Will you send us a Zoom link?

harry-di · 2020-09-02T00:42:07Z

OK, great, I'll send a link for Thu 11am.

harry-di · 2020-09-02T10:16:00Z

Dentica Meeting - OpenAIRE-Advance Open Innovation Call
Thu, Sep 3, 2020 11:00 AM - 12:00 PM (EEST)

Please join my meeting from your computer, tablet or smartphone.

https://global.gotomeeting.com/join/830960165

You can also dial in using your phone.
(For supported devices, tap a one-touch number below to join instantly.)

United States (Toll Free): 1 866 899 4679

One-touch: tel:+18668994679,,830960165#

United States: +1 (571) 317-3116

One-touch: tel:+15713173116,,830960165#

Access Code: 830-960-165

More phone numbers:
(For supported devices, tap a one-touch number below to join instantly.)

Australia: +61 2 9091 7603

One-touch: tel:+61290917603,,830960165#

Austria: +43 7 2081 5337

One-touch: tel:+43720815337,,830960165#

Belgium: +32 28 93 7002

One-touch: tel:+3228937002,,830960165#

Canada: +1 (647) 497-9373

One-touch: tel:+16474979373,,830960165#

Denmark: +45 32 72 03 69

One-touch: tel:+4532720369,,830960165#

Finland: +358 923 17 0556

One-touch: tel:+358923170556,,830960165#

France: +33 187 210 241

One-touch: tel:+33187210241,,830960165#

Germany: +49 721 6059 6510

One-touch: tel:+4972160596510,,830960165#

Ireland: +353 15 360 756

One-touch: tel:+35315360756,,830960165#

Italy: +39 0 230 57 81 80

One-touch: tel:+390230578180,,830960165#

Netherlands: +31 207 941 375

One-touch: tel:+31207941375,,830960165#

New Zealand: +64 9 913 2226

One-touch: tel:+6499132226,,830960165#

Norway: +47 21 93 37 37

One-touch: tel:+4721933737,,830960165#

Spain: +34 932 75 1230

One-touch: tel:+34932751230,,830960165#

Sweden: +46 853 527 818

One-touch: tel:+46853527818,,830960165#

Switzerland: +41 225 4599 60

One-touch: tel:+41225459960,,830960165#

United Kingdom: +44 20 3713 5011

One-touch: tel:+442037135011,,830960165#

New to GoToMeeting? Get the app now and be ready when your first meeting starts: https://global.gotomeeting.com/install/830960165

harry-di · 2020-09-03T08:42:31Z

Dear Zoe, all,

Regarding the issues you've been having with the OpenAIRE dump files, I've managed to get a reply from Claudio at CNR:
"Please ignore that dump files. We got a more recent one on OpenAIRE's Hadoop HDFS, represented in a simpler and better documented json-based data model".

harry-di · 2020-09-03T08:43:13Z

So, he will get back to me with more details soon (he's on parental leave this week, but he might give me more info by tomorrow).

zoecournia · 2020-09-03T09:24:00Z

Dear Harry,
This is excellent news! Looking forward to receiving the new files.

harry-di · 2020-09-04T11:59:15Z

Dear Zoe, all,

Claudio said that the new graph dump is represented according to the following json schema
https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop/src/branch/dump/dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump_whole/schema
(You might find some minor issues here and there as we're still finalising it, but it will be published soon as the official graph dump model v1.0)

I'm still waiting for instructions on how you can access this dump on HDFS.

harry-di · 2020-09-04T15:13:56Z

Claudio now informed me that HDFS is not normally accessed by external users, so he's trying to get an answer from project admin if we can consider you as "extended tech team" and grant you access for the duration of phase 2, etc.

zoecournia · 2020-09-06T10:58:30Z

Dear Harry,

Thank you very much for your prompt actions. It would be great to have access to the file in this format. Also, before adopting it, we'd certainly need to know whether this is a format that you plan to stick to.
If it's possible to send us the feedback on our application, that would be great.

harry-di · 2020-09-07T12:36:04Z

Dear Zoe,

I just had an update from Claudio:

" last week when we discussed the possibility to grant temporary access on HDFS to the OpenCall participants I didn’t keep in mind that the whole set of tools and web UIs we (tech team) use on a daily basis are reachable only through our VPN, so in my opinion this is a no-go for external users, I don’t think we can support them to configure the clients on their sides. So since Miriam won’t be back until next week we cannot expect the dump to be published on Zenodo until at least 10days/2weeks, therefore to cut the corner and save some time I’m trying to move the file containing the publications (plus the other result types) on some VM@CNR, where I’ll make it temporarily available for some time through an HTTP url. "

I think that is the best and easiest solution for you at the moment. Let's wait for Claudio to copy the data to a site you can access and download it.

Regarding the format, the official version will be published in about 10 days but it is very likely that it will be identical to this one.

All the best,
Harry

harry-di · 2020-09-07T12:37:53Z

Actually he's just done it:

_Harry the result.tar file is available from https://dev-openaire.d4science.org/dump/result.tar
I downloaded ~4Gb of it, un-tarred and noticed the content is there, so assuming the transfer didn’t corrupt the rest of the file it should be OK to pass it over to the participants.

Regarding the data model, we’re going to use that JSON schema as reference model for these json dump files, but as I mentioned in skype with Thanasis &CO it still needs some adjustments before it can be officially published (edited)_
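For anyone processing the archive, a minimal sketch of reading it without unpacking everything to disk first. It assumes the members of result.tar are plain-text files with one JSON record per line; that layout is an assumption, not something confirmed in this thread (if the parts turn out to be compressed, they would need decompressing first). The function name `iter_json_records` is hypothetical.

```python
import io
import json
import tarfile


def iter_json_records(tar_path):
    """Yield one parsed JSON object per non-empty line of every regular file in the archive.

    Assumes each member is UTF-8 text with one JSON record per line.
    """
    with tarfile.open(tar_path, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with io.TextIOWrapper(handle, encoding="utf-8") as text:
                for line in text:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Because it is a generator, records can be inspected one at a time, for example by taking the first five for a sample file, without holding the whole dump in memory.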

zoecournia · 2020-09-08T05:49:59Z

Hi Harry, thanks a lot! I am forwarding this info to the team and we'll let you know if we have any questions!

zoecournia · 2020-09-09T21:30:32Z

Dear Harry,

We have now downloaded and processed the json file. I would like to confirm that you would like us to provide our new dump to the OpenAire Research Graph in this format.
If yes, we have processed an example of five entries in the same json format, for you to check that this is indeed the desired format. How can I send you the files?

Best
Zoe

zoecournia · 2020-09-09T21:32:52Z

Archive.zip
OK I could upload the files here.

result_sample.json contains five entries from your file

ingredio_compounds_sample.json contains five entries from our data

The final json file that we will deliver to you can be in the format of ingredio_compounds_sample.json or are there any other changes needed?

Also, if you could send us the feedback we received from the reviewers, that would be great.

harry-di · 2020-09-10T12:49:38Z

Dear Zoe,
I'm glad that you managed to download and process the json graph file.
Thank you for providing a sample of your output.

So if I understand this correctly, in each line you provide a chemical compound (pubchem id) linked to a number of PMC publications, giving PMID, Pubchem_ID, Article (title), Journal, Abstract, and DOI for each.
This looks fine to me but I've shared your example with Claudio at CNR to be absolutely sure, so I'll let you know as soon as he replies.

Are you going to be processing only PubMed articles, or articles from other repositories too?
The graph contains only Abstracts, so let me know if you will later be also requiring full-texts of any subset of the publications. PDFs when available (depending on the license, etc.) are converted to plaintexts in OpenAIRE. So if you have a set of OpenAIRE IDs or DOIs or other publication IDs (like PMID), Marek from ICM could fetch the plaintexts for you.

I'll try to fetch the reviewers' comments from Phase 2 for you later today.

harry-di · 2020-09-10T13:22:02Z

The comment I got from CNR was:
so at least we need:
sitename (e.g. PubChem)
label (the title/label of the chemical)
url (url to the chemical in PubChem)
refidentifier (the id of the chemical)

CNR is waiting also for the opinion of ICM who deals with mining representation in OpenAIRE.

harry-di · 2020-09-10T13:26:29Z

Sorry actually I truncated the full comment from Alessia at CNR, here it is:

Alessia Bardi 4:01 PM
I think it would be better to have also the openaire identifier of the publication in their output
Probably the pubchem identifier is not enough for us. Assuming we will include them as we do for PDB. let me check what we need
4:07
ok, see here: https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/oaf/ExternalReference.java

(My comment... this is how we represent PDB entries in OpenAIRE, as external references, so Alessia is suggesting we do the same with chemicals).

Alessia Bardi 4:12 PM
so at least we need:
sitename (e.g. PubChem)
label (the title/label of the chemical)
url (url to the chemical in PubChem)
refidentifier (the id of the chemical)
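A minimal sketch of what a record with these four fields could look like for one compound. The field names come from Alessia's list; the function name and the PubChem URL pattern are assumptions made for illustration, and the remaining fields of ExternalReference.java are deliberately left out.

```python
def pubchem_external_reference(cid, label):
    """Build a dict with the four fields requested for one PubChem compound.

    cid   -- PubChem compound identifier (int or str)
    label -- title/label of the chemical
    """
    cid = str(cid)
    return {
        "sitename": "PubChem",
        "label": label,
        # Assumed URL pattern for a PubChem compound page.
        "url": f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}",
        "refidentifier": cid,
    }
```

For example, `pubchem_external_reference(2244, "Aspirin")` gives a record whose `refidentifier` is `"2244"`.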

harry-di · 2020-09-10T14:00:34Z

Marek from ICM added that he is fine with the output provided and Alessia's comments but wanted to know if you are going to provide the dumps periodically to make them "consumable" by OpenAIRE or is your codebase planned to be run as a part of IIS (Information Inference Service of OpenAIRE)?

As far as I remember from your proposal (paragraph on "Maintenance"), you will not be providing or integrating code with IIS but only providing updates. Am I correct?

harry-di · 2020-09-10T16:59:00Z

Here are the comments of the consensus report for Phase 2:

Comments
The whole concept of DENTICA is very interesting and useful for OpenAIRE, while the workflow of Ingredio and its app is attractive for commercialisation. This is clearly a win-win collaboration that will enrich the OpenAIRE Research Graph, even if it is mostly relevant to the domain of chemical ingredients in food and cosmetics, while also delivering a new product for their app and benefiting their company.
In evaluating the initial Phase 1 proposal, there was almost a complete lack of information on how the text mining algorithms would work; however, this concern has now been adequately addressed both in the Phase 1 deliverable and even further in this Phase 2 Prototype template with step-by-step descriptions. Some details might still not be fully clear, but the overall solution design now looks very well-structured and reasonable.
Minor revision of the approach for the integration of the results into the OpenAIRE Research Graph may be required but this could be discussed with the OpenAIRE Technical team during the Phase 2 implementation. There were no updates to their Business Canvas model or the Cost Plan.

zoecournia · 2020-09-11T09:38:32Z

Dear Harry,

Thank you for your feedback on all matters and for the proposal feedback! Here are some responses.

Regarding the format, we looked at https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/oaf/ExternalReference.java
and have two questions:
a) what is expected in "qualifier"?
b) what is expected in "query"?

We can fill in the rest of the entries with no problems.
Today we plan to finalize our training set for the machine learning algorithm that we will write to explore the OpenAire dumps. The training set will have the form/entries of the link above.

Regarding the updates, we had indeed planned to provide dumps periodically to update the content. If you think it is useful to integrate code with IIS, we can certainly do it towards the end of the project or after the project, since it is not one of our deliverables.

Best
Zoe

harry-di · 2020-09-11T13:14:00Z

Dear Zoe,

1a) "qualifier" is meant to indicate the typology of the external reference. Currently the values supported by the corresponding vocabulary (dnet:externalReference_typologies) are the following:

accessionNumber
dataset
software
url

So, depending on the type of the external reference you are going to provide us, you can pick one of those 4 values, or suggest new ones, so that we can extend our vocabulary definition. Here’s the json representation for two of them

{"classid":"url","classname":"url","schemeid":"dnet:externalReference_typologies","schemename":"dnet:externalReference_typologies"}
{"classid":"accessionNumber","classname":"accessionNumber","schemeid":"dnet:externalReference_typologies","schemename":"dnet:externalReference_typologies"}

1b) Regarding the field "query" at the moment it is not used. I suggest you keep it empty.
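A minimal sketch of how the two answers could be applied when generating records: the qualifier is built from one of the four supported values, in the same shape as the JSON above, and "query" is left empty. The helper names are hypothetical, and rejecting unknown values is a design choice of this sketch (a new typology would first have to be proposed for the vocabulary).

```python
# The four values currently listed for dnet:externalReference_typologies.
EXTERNAL_REFERENCE_TYPOLOGIES = ("accessionNumber", "dataset", "software", "url")
TYPOLOGY_SCHEME = "dnet:externalReference_typologies"


def make_qualifier(typology):
    """Return the qualifier dict for a supported typology, or raise ValueError."""
    if typology not in EXTERNAL_REFERENCE_TYPOLOGIES:
        raise ValueError(f"unsupported external reference typology: {typology!r}")
    return {
        "classid": typology,
        "classname": typology,
        "schemeid": TYPOLOGY_SCHEME,
        "schemename": TYPOLOGY_SCHEME,
    }


def with_qualifier(reference, typology):
    """Return a copy of reference with 'qualifier' set and 'query' left empty."""
    result = dict(reference)
    result["qualifier"] = make_qualifier(typology)
    result["query"] = ""  # not used at the moment, so kept empty
    return result
```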

OK, great. Regarding integration with IIS, we can discuss it after the end of the project.

Best regards,
Harry

zoecournia · 2020-09-14T13:10:24Z

Dear Harry,

Thanks very much. We have an updated json file. Could you and your colleagues take a look so that we can finalize the format?
newstruct.txt

Best
Zoe

GerasimosKou · 2020-09-14T13:27:14Z

Dear Harry,

Zoe uploaded an older version of the sample. I am attaching the updated sample here.
updated_newstruct.txt

Best regards,
Gerasimos

harry-di · 2020-09-14T13:30:03Z

Dear Gerasimos, thanks.
I've just shared it with my colleagues at CNR and ICM to verify the format.
Best regards,
Harry

harry-di · 2020-09-15T13:53:26Z

Let me know if the above makes sense to you, else I'll give you Claudio's email to speak to him directly.

GerasimosKou · 2020-09-16T13:37:00Z

Dear Harry,

I tried to replicate Claudio's JSON structure, here is a sample.
structSample.txt

I also added the Journal's title at the very end of the JSON which was missing from Claudio's example.
Could you please take a look and let us know if there is something wrong/missing?

Best regards,
Gerasimos

harry-di · 2020-09-16T19:43:39Z

Thanks, Gerasime,
I'm waiting for Claudio's OK.
Best,
Harry

harry-di · 2020-09-17T14:49:07Z

Dear Gerasime,

Concerning the addition of
"container": {
"name": "Journal of pharmaceutical sciences"
}
since the class "Publication" already declares "journal" it's best to use "journal" instead of "container" in order to align with our current model.

Interestingly, we’re planning to rename that field as container to align with the Guidelines v4 but this will be done in the future.

Otherwise, all else is fine.

Best regards,
Harry

zoecournia · 2020-09-24T10:27:53Z

Dear Harry,

We are ready to submit

D2) Project abstracts (summary of the actions to be completed during Phase 2) (deadline 15/9)
Output and results: pdf document

D2.1) Documentation (deadline 28/9)
Detailed presentation and documentation of code, API(s), licenses used, to build the prototype service(s)
Output and results: Online document

Should we send to you or OpenAire directly?

Also, you had mentioned that we may get a small extension for the end of the project (26/10/2020):
D2.2) Phase 2 report

Please do let us know if an extension will be granted, so that we plan accordingly.

Best
Zoe

harry-di · 2020-09-24T12:57:53Z

Dear Zoe,

Please send it both to me and Coralia, because they have been very slow to reply lately, so I'd like to have a copy (they might send it to me very delayed).

There has been complete silence from Coralia about the extension. Thanks for reminding me. I'll contact Nektaria there today to find out about it.

Best regards,
Harry

zoecournia · 2020-09-24T13:26:08Z

Great, thanks. Can you share your email here? Alternatively write me an email to zcournia at bioacademy.gr.

harry-di · 2020-09-24T13:50:06Z

Great, I've just sent you an email with my two email accounts (ΕΚΠΑ & ATHENA RC).

harry-di · 2020-09-25T12:12:50Z

Dear Zoe,

I got a reply from Coralia saying that "the deliverables must be sent to [email protected] and we will make sure that all evaluators and respective supervisors receive them as well."

In addition, this evening all SMEs will receive an email from Coralia with a slightly revised timeplan for the deadlines.

Best regards,
Harry

zoecournia · 2020-09-25T21:47:10Z

OK, thanks. I will be sending the deliverables to this address with you on cc.

harry-di · 2020-09-25T21:49:51Z

Thanks, Zoe.
Have a nice weekend.

GerasimosKou · 2020-10-08T10:40:50Z

Dear Harry,

Regarding the journal tag change, could you please verify that this sample is in the correct format?
public_record.txt

Best regards,
Gerasimos

harry-di · 2020-10-09T12:22:30Z

Dear Gerasimos,

Sorry for the delayed reply, Claudio just replied that he tested the file "public_record.txt" and parsed it correctly as a Publication. So it all looks good.

Best regards,
Harry

GerasimosKou · 2020-10-09T12:29:27Z

That is great, thanks a lot for letting me know.

Best regards,
Gerasimos

zoecournia · 2020-10-21T08:30:18Z

Dear Harry,

We are all set for our conference call on October 26.

Are there any guidelines/template that we should follow?

Also, what are we expected to present in the prototype demonstration?

Best
Zoe

harry-di · 2020-10-21T15:42:35Z

Dear Zoe,

I've sent a reply to the email thread with Nektaria.

Best,
Harry

GerasimosKou · 2020-10-22T12:21:20Z

Dear Harry,

I have created a file that contains the publications which have been classified by a machine learning algorithm, but I have a question regarding the structure of each entry. Some entries include a PubMed ID. For these entries I have added a second "qualifier" inside the "pid" tag. Should all entries have the second qualifier, with an empty value for those that do not contain a PubMed ID? I will also attach a text file with two entries, one that does not contain a PubMed ID and one that does.
publications_samples.txt

Best regards,
Gerasimos

GerasimosKou · 2020-10-22T17:52:36Z

Dear Harry,

I attach the sample file with the two entries here.
publications_samples.txt

Best regards,
Gerasimos

zoecournia · 2020-10-24T04:48:15Z

Dear Harry,

Did you or Claudio have a chance to look at the file? We'd like to present it on Monday. Otherwise, we'll present what we have, and you can give us your feedback on Monday so that we can work on the final version, which will be delivered to you at the beginning of November.

Best
Zoe

claudioatzori · 2020-10-24T10:33:29Z

Dear Zoe and Gerasimos,

my apologies for the delayed reply. I checked the JSON records in the publications_samples.txt file attached above and they are OK. The element pid is defined as repeatable as a single research product can be identified by multiple persistent IDs, with DOI and PMID being among the legit ones. Furthermore, the encoding you used for the classid/name perfectly reflects the term defined in the OpenAIRE vocabulary dnet:pid_types.

Kind regards,
Claudio
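A minimal sketch of the repeatable pid element as described above: one entry per identifier that actually exists, so a record without a PMID simply carries a single DOI entry instead of an empty placeholder. The exact shape of each entry (a "value" plus a "qualifier" with classid/classname/schemeid/schemename) and the class ids "doi" and "pmid" are assumptions based on this thread, not copied from the attached sample file; the function names are hypothetical.

```python
PID_SCHEME = "dnet:pid_types"


def make_pid(pid_type, value):
    """Build one pid entry; the value/qualifier layout is an assumed shape."""
    return {
        "value": str(value),
        "qualifier": {
            "classid": pid_type,
            "classname": pid_type,
            "schemeid": PID_SCHEME,
            "schemename": PID_SCHEME,
        },
    }


def build_pids(doi=None, pmid=None):
    """Return the list of pid entries, skipping identifiers that are missing."""
    pids = []
    if doi:
        pids.append(make_pid("doi", doi))
    if pmid:
        pids.append(make_pid("pmid", pmid))
    return pids
```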

harry-di · 2020-11-11T09:21:50Z

Dear Zoe and Gerasimos,

We are in the process of evaluating your Phase 2 work and we would also like to receive a dump of the output you've produced. So far you have sent us a sample that we know can be ingested in OpenAIRE, but the evaluation form also requires us to evaluate how the integration with OpenAIRE is going and how OpenAIRE benefits from your work.
Is it possible to send us either a sample or the entire dump?

Many thanks,
Kind regards,
Harry

zoecournia · 2020-11-11T12:31:09Z

Of course, we have split it in 7 JSON files for convenience, and we can send you all of them.

Best
Zoe

GerasimosKou · 2020-11-11T13:20:32Z

Dear Harry,

We have created 7 JSON files, each one ~1.5GB. I can send them to you over email with wetransfer or another service or upload here a sample. Let me know.

Best regards,
Gerasimos

harry-di · 2020-11-11T13:35:47Z

Dear Gerasimos,

For the evaluation (that we have to submit today) and just to tick the box that we have received the files, sending one of those today would suffice and we can discuss how to get access to the rest later.
I think GitHub has a strict file limit of 100MB, so maybe if you can upload it on a Dropbox/OneDrive/Box and give us a link, or put them in an ftp server. Wetransfer is also an option, whatever works for you.

Many thanks,
Harry
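If a size limit does get in the way, one option is to cut a large export into smaller pieces on record boundaries. A minimal sketch, assuming the export holds one JSON record per line (an assumption about the file layout); the function name and chunk file names are hypothetical.

```python
import os


def split_lines_by_size(src_path, out_dir, max_bytes):
    """Copy src_path into numbered chunk files without splitting any line.

    Each chunk stays at or below max_bytes, except when a single line is
    itself longer than max_bytes. Returns the list of chunk paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    written = 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                # Start a new chunk at the beginning, or when this line would overflow.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"chunk_{len(chunk_paths):04d}.json")
                    out = open(path, "wb")
                    chunk_paths.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Concatenating the chunks in order reproduces the original file byte for byte, so nothing needs to be re-parsed on the receiving side.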

harry-di · 2020-11-11T13:39:26Z

You may also use https://git-lfs.github.com

GerasimosKou · 2020-11-11T14:16:30Z

Dear Harry,

I have uploaded one of the files to Dropbox, this is the download link: https://www.dropbox.com/s/855pveyvavoooga/publications_export1.json?dl=0

Best regards,
Gerasimos

harry-di · 2020-11-11T14:30:52Z

Thank you, Gerasime,
I'm downloading the file now.
Best regards,
Harry

zoecournia · 2020-11-12T09:18:52Z

Hi Harry,

Was the file format ok? If there are any amendments we should do, please let us know.

Best
Zoe

harry-di · 2020-11-12T14:22:55Z

Dear Zoe,

We had no time to ingest the data but a quick look I had tells me that all is fine. At this stage, that is all we needed in order to "tick the box" in the assessment form, so all is OK. So no amendments required now.

We will share the assessment report when it is finalised. In the meantime, let me share with you two remarks that Corallia might contact you about, so that you can prepare a response:

● From the estimation of costs on page 32 of D2.2, the price of a graphic designer is listed although the plan does not include any revision/creation of dedicated mock-ups and GUI for the prototype and final solution for the tender.
● In Exhibit 9 of D2.2, the first table for Phase 1 shows a price per hour of 50 euros, while, for the same type of personnel, the price for phase 2 and 3 is 20 euros per hour. Please check and solve the inconsistency.

Best regards,
Harry

harry-di self-assigned this Oct 9, 2020
