Meeting with supervisor (Dentica) #2

Open
AChatzigoulas opened this issue Sep 1, 2020 · 64 comments

@AChatzigoulas

Dear Harry,

we received an email to contact you as our supervisor to discuss details about the implementation of OPENAIRE project Phase 2.

Can we arrange a meeting for tomorrow? We are available at 10am or 1pm EEST.

Best,
Ingredio team

@harry-di
Collaborator

harry-di commented Sep 1, 2020

Dear @AChatzigoulas,
Unfortunately 10am or 1pm EEST would not work for me, as I'm already booked with other telcos during that period, but I'm available on Thursday between 11am and 3:30pm Athens time. Would that work for you?

Kind regards,
Harry

@zoecournia

Hi Harry,

Unfortunately we are not available Thursday and Friday. Would any time work for you tomorrow?
Best
Zoe

@harry-di
Collaborator

harry-di commented Sep 1, 2020

Hi Zoe,

The only slot that I can cancel tomorrow is 3-4:30pm (attending the VLDB 2020 Round table discussion on "Intelligent Data Exploration") but that's ok if it suits you as I want to know how things are going on and what help you require from OpenAIRE.

Best,
Harry

@zoecournia

Hi Harry,

No, that's OK, please don't cancel it. I will be traveling on Thursday but I guess I could participate in a call at 11am. Will you send us a Zoom link?

@harry-di
Collaborator

harry-di commented Sep 2, 2020

OK, great, I'll send a link for Thu 11am.

@harry-di
Collaborator

harry-di commented Sep 2, 2020

Dentica Meeting - OpenAIRE-Advance Open Innovation Call
Thu, Sep 3, 2020 11:00 AM - 12:00 PM (EEST)

Please join my meeting from your computer, tablet or smartphone.

https://global.gotomeeting.com/join/830960165

You can also dial in using your phone.
(For supported devices, tap a one-touch number below to join instantly.)

United States (Toll Free): 1 866 899 4679

  • One-touch: tel:+18668994679,,830960165#

United States: +1 (571) 317-3116

  • One-touch: tel:+15713173116,,830960165#

Access Code: 830-960-165

More phone numbers:
(For supported devices, tap a one-touch number below to join instantly.)

Australia: +61 2 9091 7603

  • One-touch: tel:+61290917603,,830960165#

Austria: +43 7 2081 5337

  • One-touch: tel:+43720815337,,830960165#

Belgium: +32 28 93 7002

  • One-touch: tel:+3228937002,,830960165#

Canada: +1 (647) 497-9373

  • One-touch: tel:+16474979373,,830960165#

Denmark: +45 32 72 03 69

  • One-touch: tel:+4532720369,,830960165#

Finland: +358 923 17 0556

  • One-touch: tel:+358923170556,,830960165#

France: +33 187 210 241

  • One-touch: tel:+33187210241,,830960165#

Germany: +49 721 6059 6510

  • One-touch: tel:+4972160596510,,830960165#

Ireland: +353 15 360 756

  • One-touch: tel:+35315360756,,830960165#

Italy: +39 0 230 57 81 80

  • One-touch: tel:+390230578180,,830960165#

Netherlands: +31 207 941 375

  • One-touch: tel:+31207941375,,830960165#

New Zealand: +64 9 913 2226

  • One-touch: tel:+6499132226,,830960165#

Norway: +47 21 93 37 37

  • One-touch: tel:+4721933737,,830960165#

Spain: +34 932 75 1230

  • One-touch: tel:+34932751230,,830960165#

Sweden: +46 853 527 818

  • One-touch: tel:+46853527818,,830960165#

Switzerland: +41 225 4599 60

  • One-touch: tel:+41225459960,,830960165#

United Kingdom: +44 20 3713 5011

  • One-touch: tel:+442037135011,,830960165#

New to GoToMeeting? Get the app now and be ready when your first meeting starts: https://global.gotomeeting.com/install/830960165

@harry-di
Collaborator

harry-di commented Sep 3, 2020

Dear Zoe, all,

Regarding the issues you've been having with the OpenAIRE dump files, I've managed to get a reply from Claudio at CNR:
"Please ignore that dump files. We got a more recent one on OpenAIRE's Hadoop HDFS, represented in a simpler and better documented json-based data model".

@harry-di
Collaborator

harry-di commented Sep 3, 2020

So, he will get back to me with more details soon (he's on parental leave this week, but he might give me more info by tomorrow).

@zoecournia

Dear Harry,
This is excellent news! Looking forward to receiving the new files.

@harry-di
Collaborator

harry-di commented Sep 4, 2020

Dear Zoe, all,

Claudio said that the new graph dump is represented according to the following json schema
https://code-repo.d4science.org/miriam.baglioni/dnet-hadoop/src/branch/dump/dhp-workflows/dhp-graph-mapper/src/main/resources/eu/dnetlib/dhp/oa/graph/dump_whole/schema
(You might find some minor issues here and there as we're still finalising it, but it will be published soon as the official graph dump model v1.0)

I'm still waiting for instructions on how you can access this dump on HDFS.

@harry-di
Collaborator

harry-di commented Sep 4, 2020

Claudio now informed me that HDFS is not normally accessed by external users, so he's trying to get an answer from project admin if we can consider you as "extended tech team" and grant you access for the duration of phase 2, etc.

@zoecournia

Dear Harry,

Thank you very much for your prompt actions. It would be great to have access to the file in this format. Also, before adopting it, we'd certainly need to know whether this is a format that you plan to stick to.
If it's possible to send us the feedback on our application, that would be great.

@harry-di
Collaborator

harry-di commented Sep 7, 2020

Dear Zoe,

I just had an update from Claudio:

" last week when we discussed the possibility to grant temporary access on HDFS to the OpenCall participants I didn’t keep in mind that the whole set of tools and web UIs we (tech team) use on a daily basis are reachable only through our VPN, so in my opinion this is a no-go for external users, I don’t think we can support them to configure the clients on their sides. So since Miriam won’t be back until next week we cannot expect the dump to be published on Zenodo until at least 10days/2weeks, therefore to cut the corner and save some time I’m trying to move the file containing the publications (plus the other result types) on some VM@CNR, where I’ll make it temporarily available for some time through an HTTP url. "

I think that is the best and easiest solution for you at the moment. Let's wait for Claudio to copy the data to a site you can access and download it.

Regarding the format, the official version will be published in about 10 days but it is very likely that it will be identical to this one.

All the best,
Harry

@harry-di
Collaborator

harry-di commented Sep 7, 2020

Actually he's just done it:

"Harry, the result.tar file is available from https://dev-openaire.d4science.org/dump/result.tar
I downloaded ~4 GB of it, un-tarred it and noticed the content is there, so assuming the transfer didn’t corrupt the rest of the file it should be OK to pass it over to the participants.

Regarding the data model, we’re going to use that JSON schema as the reference model for these JSON dump files, but as I mentioned on Skype with Thanasis & Co., it still needs some adjustments before it can be officially published."
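
A minimal sketch of reading records out of the downloaded archive without unpacking it to disk first. It assumes that each member of result.tar is a text file with one JSON record per line, optionally gzip-compressed; the thread does not confirm either detail, and `iter_records` is a hypothetical helper name.

```python
import gzip
import io
import json
import tarfile


def iter_records(tar_path):
    """Yield one parsed JSON object per non-empty line of every file in the archive.

    Assumption: members are JSON-lines files, optionally gzip-compressed
    (detected by a ".gz" suffix). Adjust this if the real layout differs.
    """
    with tarfile.open(tar_path, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            raw = archive.extractfile(member)
            if member.name.endswith(".gz"):
                raw = gzip.GzipFile(fileobj=raw, mode="rb")
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:
                    yield json.loads(line)
```

Because the function is a generator, only one line is held in memory at a time, which matters for an archive of this size.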

@zoecournia

Hi Harry, thanks a lot! I am forwarding this info to the team and we'll let you know if we have any questions!

@zoecournia

Dear Harry,

We have now downloaded and processed the json file. I would like to confirm that you would like us to provide our new dump to the OpenAire Research Graph in this format.
If yes, we have processed an example of five entries in the same json format, for you to check that this is indeed the desired format. How can I send you the files?

Best
Zoe

@zoecournia

Archive.zip
OK I could upload the files here.

result_sample.json contains five entries from your file

ingredio_compounds_sample.json contains five entries from our data

The final json file that we will deliver to you can be in the format of ingredio_compounds_sample.json or are there any other changes needed?

Also, if you could send us the feedback we received from the reviewers, that would be great.

@harry-di
Collaborator

Dear Zoe,
I'm glad that you managed to download and process the json graph file.
Thank you for providing a sample of your output.

So if I understand this correctly, in each line you provide a chemical compound (pubchem id) linked to a number of PMC publications, giving PMID, Pubchem_ID, Article (title), Journal, Abstract, and DOI for each.
This looks fine to me but I've shared your example with Claudio at CNR to be absolutely sure, so I'll let you know as soon as he replies.

Are you going to be processing only PubMed articles, or articles from other repositories too?
The graph contains only Abstracts, so let me know if you will later be also requiring full-texts of any subset of the publications. PDFs when available (depending on the license, etc.) are converted to plaintexts in OpenAIRE. So if you have a set of OpenAIRE IDs or DOIs or other publication IDs (like PMID), Marek from ICM could fetch the plaintexts for you.

I'll try to fetch the reviewers' comments from Phase 2 for you later today.

@harry-di
Collaborator

The comment I got from CNR was:
so at least we need:
sitename (e.g. PubChem)
label (the title/label of the chemical)
url (url to the chemical in PubChem)
refidentifier (the id of the chemical)

CNR is waiting also for the opinion of ICM who deals with mining representation in OpenAIRE.

@harry-di
Collaborator

harry-di commented Sep 10, 2020

Sorry actually I truncated the full comment from Alessia at CNR, here it is:

Alessia Bardi 4:01 PM
I think it would be better to have also the openaire identifier of the publication in their output
Probably the pubchem identifier is not enough for us. Assuming we will include them as we do for PDB. let me check what we need
4:07
ok, see here: https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/oaf/ExternalReference.java

(My comment... this is how we represent PDB entries in OpenAIRE, as external references, so Alessia is suggesting we do the same with chemicals).

Alessia Bardi 4:12 PM
so at least we need:
sitename (e.g. PubChem)
label (the title/label of the chemical)
url (url to the chemical in PubChem)
refidentifier (the id of the chemical)
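
A minimal sketch of checking a record against the four fields Alessia lists as the minimum. The helper name `missing_fields` and the sample values (a PubChem entry for aspirin) are illustrative assumptions, not taken from the Ingredio data.

```python
# The four fields named in the comment above as the minimum for an external reference.
REQUIRED_FIELDS = ("sitename", "label", "url", "refidentifier")


def missing_fields(external_reference):
    """Return the required field names that are absent or blank in the record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(external_reference.get(name) or "").strip()
    ]


# Illustrative record only; the values are not from the project's output.
example = {
    "sitename": "PubChem",
    "label": "Aspirin",
    "url": "https://pubchem.ncbi.nlm.nih.gov/compound/2244",
    "refidentifier": "2244",
}
```

Calling `missing_fields(example)` returns an empty list, while a record lacking `url` would return `["url"]`.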

@harry-di
Collaborator

Marek from ICM added that he is fine with the output provided and Alessia's comments but wanted to know if you are going to provide the dumps periodically to make them "consumable" by OpenAIRE or is your codebase planned to be run as a part of IIS (Information Inference Service of OpenAIRE)?

As far as I remember from your proposal (paragraph on "Maintenance"), you will not be providing or integrating code with IIS but only providing updates. Am I correct?

@harry-di
Collaborator

Here are the comments of the consensus report for Phase 2:

Comments
The whole concept of DENTICA is very interesting and useful for OpenAIRE, while the workflow of Ingredio and its app is attractive for commercialisation. This is clearly a win-win collaboration that will enrich the OpenAIRE Research Graph, even if it is mostly relevant to the domain of chemical ingredients in food and cosmetics, while also delivering a new product for their app and benefiting their company.
In evaluating the initial Phase 1 proposal, there was almost a complete lack of information on how the text mining algorithms would work; however, this concern has now been adequately addressed both in the Phase 1 deliverable and even further in this Phase 2 Prototype template with step-by-step descriptions. Some details might still not be fully clear, but the overall solution design now looks very well-structured and reasonable.
Minor revision of the approach for the integration of the results into the OpenAIRE Research Graph may be required but this could be discussed with the OpenAIRE Technical team during the Phase 2 implementation. There were no updates to their Business Canvas model or the Cost Plan.

@zoecournia

Dear Harry,

Thank you for your feedback on all matters and for the proposal feedback! Here are some responses.

  1. Regarding the format, we looked at https://code-repo.d4science.org/D-Net/dnet-hadoop/src/branch/master/dhp-schemas/src/main/java/eu/dnetlib/dhp/schema/oaf/ExternalReference.java
    and have two questions:
    a) what is expected in "qualifier"?
    b) what is expected in "query"?

We can fill in the rest of the entries with no problems.
Today we plan to finalize our training set for the machine learning algorithm that we will write to explore the OpenAire dumps. The training set will have the form/entries of the link above.

  2. Regarding the updates, indeed we had planned to provide dumps periodically to update the content. If you think it is useful to integrate code with IIS, we can certainly do it towards the end of the project or after the project, since it is not one of our deliverables.

Best
Zoe

@harry-di
Collaborator

harry-di commented Sep 11, 2020

Dear Zoe,

1a) "qualifier" is meant to indicate the typology of the external reference. Currently the values supported by the relevant vocabulary (dnet:externalReference_typologies) are the following:

  • accessionNumber
  • dataset
  • software
  • url

So, depending on the type of the external reference you are going to provide us, you can pick one of those 4 values, or suggest new ones, so that we can extend our vocabulary definition. Here’s the JSON representation for two of them:

{"classid":"url","classname":"url","schemeid":"dnet:externalReference_typologies","schemename":"dnet:externalReference_typologies"}
{"classid":"accessionNumber","classname":"accessionNumber","schemeid":"dnet:externalReference_typologies","schemename":"dnet:externalReference_typologies"}

1b) Regarding the field "query" at the moment it is not used. I suggest you keep it empty.

  2. OK great. Regarding integration with IIS, we can discuss it after the end of the project.

Best regards,
Harry
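
A minimal sketch of building an external reference with the "qualifier" and "query" fields as described above. The helper names are hypothetical, the PubChem URL pattern is an assumption about how the link would be formed, and which of the four typologies suits a chemical compound is not settled in this thread, so the caller has to choose it.

```python
SCHEME = "dnet:externalReference_typologies"
# The four values listed above as currently supported by the vocabulary.
SUPPORTED_TYPOLOGIES = {"accessionNumber", "dataset", "software", "url"}


def make_qualifier(classid):
    """Build the qualifier object in the same shape as the two JSON examples above."""
    if classid not in SUPPORTED_TYPOLOGIES:
        raise ValueError(f"unsupported typology: {classid!r}")
    return {
        "classid": classid,
        "classname": classid,
        "schemeid": SCHEME,
        "schemename": SCHEME,
    }


def make_external_reference(cid, label, typology):
    """Assemble one external reference for a PubChem compound (hypothetical helper)."""
    return {
        "sitename": "PubChem",
        "label": label,
        # Assumed URL pattern for a PubChem compound page.
        "url": f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}",
        "refidentifier": str(cid),
        "qualifier": make_qualifier(typology),
        "query": "",  # not used at the moment, so it is kept empty
    }
```

Rejecting unknown typologies early keeps records consistent with the vocabulary; a new value would first need to be agreed with the OpenAIRE team and added to `SUPPORTED_TYPOLOGIES`.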

@zoecournia

Dear Harry,

Thanks very much. We have an updated json file. Could you and your colleagues take a look so that we can finalize the format?
newstruct.txt

Best
Zoe

@GerasimosKou

Dear Harry,

Zoe uploaded an older version of the sample. I am attaching the updated sample here.
updated_newstruct.txt

Best regards,
Gerasimos

@harry-di
Collaborator

Dear Gerasimos, thanks.
I've just shared it with my colleagues at CNR and ICM to verify the format.
Best regards,
Harry

@harry-di
Collaborator

Let me know if the above makes sense to you, else I'll give you Claudio's email to speak to him directly.

@GerasimosKou

Dear Harry,

I tried to replicate Claudio's JSON structure, here is a sample.
structSample.txt

I also added the Journal's title at the very end of the JSON which was missing from Claudio's example.
Could you please take a look and let us know if there is something wrong/missing?

Best regards,
Gerasimos

@harry-di
Collaborator

Thanks, Gerasime,
I'm waiting for Claudio's OK.
Best,
Harry

@harry-di
Collaborator

Dear Gerasime,

Concerning the addition of
"container": {
  "name": "Journal of pharmaceutical sciences"
}
since the class "Publication" already declares "journal" it's best to use "journal" instead of "container" in order to align with our current model.

Interestingly, we’re planning to rename that field as container to align with the Guidelines v4 but this will be done in the future.

Otherwise, all else is fine.

Best regards,
Harry
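
A minimal sketch of the rename requested above, assuming "container" sits at the top level of each record as in the quoted snippet. `use_journal_field` is a hypothetical helper name.

```python
def use_journal_field(record):
    """Return a copy of the record with a top-level "container" key renamed to "journal".

    If the record already has a "journal" key, that value is kept and the
    "container" entry is simply dropped.
    """
    updated = dict(record)
    if "container" in updated:
        container = updated.pop("container")
        updated.setdefault("journal", container)
    return updated
```

The input record is left unmodified, so the same function can be applied safely while streaming over an existing export.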

@zoecournia

Dear Harry,

We are ready to submit

D2) Project abstracts (summary of the actions to be completed during Phase 2) (deadline 15/9)
Output and results: pdf document

D2.1) Documentation (deadline 28/9)
Detailed presentation and documentation of code, API(s), licenses used, to build the prototype service(s)
Output and results: Online document

Should we send these to you or to OpenAIRE directly?

Also, you had mentioned that we may get a small extension for the end of the project (26/10/2020):
D2.2) Phase 2 report

Please do let us know if an extension will be granted, so that we plan accordingly.

Best
Zoe

@harry-di
Collaborator

Dear Zoe,

Please send it both to me and Coralia, because they have been very slow to reply lately, so I'd like to have a copy (they might send it to me very delayed).

There has been complete silence from Coralia about the extension. Thanks for reminding me. I'll contact Nektaria there today to find out about it.

Best regards,
Harry

@zoecournia

Great, thanks. Can you share your email here? Alternatively write me an email to zcournia at bioacademy.gr.

@harry-di
Collaborator

Great, I've just sent you an email with my two email accounts (ΕΚΠΑ & ATHENA RC).

@harry-di
Collaborator

Dear Zoe,

I got a reply from Coralia saying that "the deliverables must be sent to [email protected] and we will make sure that all evaluators and respective supervisors receive them as well."

In addition, this evening all SMEs will receive an email from Coralia with a slightly revised timeplan for the deadlines.

Best regards,
Harry

@zoecournia

OK, thanks. I will send the deliverables to this address with you on cc.

@harry-di
Collaborator

Thanks, Zoe.
Have a nice weekend.

@GerasimosKou

Dear Harry,

Regarding the journal tag change, could you please verify that this sample is in the correct format?
public_record.txt

Best regards,
Gerasimos

@harry-di
Collaborator

harry-di commented Oct 9, 2020

Dear Gerasimos,

Sorry for the delayed reply, Claudio just replied that he tested the file "public_record.txt" and parsed it correctly as a Publication. So it all looks good.

Best regards,
Harry

@GerasimosKou

That is great, thanks a lot for letting me know.

Best regards,
Gerasimos

@harry-di harry-di self-assigned this Oct 9, 2020
@zoecournia

Dear Harry,

We are all set for our conference call on October 26.

Are there any guidelines or a template that we should follow?

Also, what are we expected to present in the prototype demonstration?

Best
Zoe

@harry-di
Collaborator

Dear Zoe,

I've sent a reply to the email thread with Nektaria.

Best,
Harry

@GerasimosKou

GerasimosKou commented Oct 22, 2020

Dear Harry,

I have created a file that contains the publications which have been classified by a machine learning algorithm, but I have a question regarding the structure of each entry. Some entries include a PubMed ID. For these entries I have added a second "qualifier" inside the "pid" tag. Should all entries have the second qualifier, with an empty value for those that do not contain a PubMed ID? I will also attach a text file with two entries, one that does not contain a PubMed ID and one that does.
publications_samples.txt

Best regards,
Gerasimos

@GerasimosKou

Dear Harry,

I attach the sample file with the two entries here.
publications_samples.txt

Best regards,
Gerasimos

@zoecournia

Dear Harry,

Did you or Claudio have a chance to look at the file? We'd like to present it on Monday. Otherwise, we'll present what we have and you can give us your feedback on Monday, so that we can work on the final version, which will be delivered to you at the beginning of November.

Best
Zoe

@claudioatzori
Member

Dear Zoe and Gerasimos,

my apologies for the delayed reply. I checked the JSON records in the publications_samples.txt file attached above and they are OK. The element pid is defined as repeatable as a single research product can be identified by multiple persistent IDs, with DOI and PMID being among the legit ones. Furthermore, the encoding you used for the classid/name perfectly reflects the term defined in the OpenAIRE vocabulary dnet:pid_types.

Kind regards,
Claudio
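
A minimal sketch of the repeatable pid element described above: one entry per identifier that actually exists, so a publication without a PMID simply carries a single entry. The exact layout of each entry, the classid strings "doi" and "pmid", and the use of the classid as classname are assumptions modelled on the qualifier examples earlier in this thread; the accepted layout is the one in publications_samples.txt.

```python
PID_SCHEME = "dnet:pid_types"


def make_pid(classid, value):
    """Build one pid entry (assumed layout: a value plus a qualifier)."""
    return {
        "qualifier": {
            "classid": classid,
            "classname": classid,  # assumption: classname mirrors classid
            "schemeid": PID_SCHEME,
            "schemename": PID_SCHEME,
        },
        "value": value,
    }


def build_pids(doi=None, pmid=None):
    """Return the list of pid entries, skipping identifiers that are missing."""
    candidates = (("doi", doi), ("pmid", pmid))
    return [make_pid(classid, str(value)) for classid, value in candidates if value]
```

With this approach no placeholder entry with an empty value is emitted, which is consistent with pid being repeatable rather than fixed-length.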

@harry-di
Collaborator

Dear Zoe and Gerasimos,

We are in the process of evaluating your Phase 2 work and we would also like to receive a dump of the output you've produced. So far you have sent us a sample that we know can be ingested in OpenAIRE, but the evaluation form also requires us to evaluate how the integration with OpenAIRE is going and how OpenAIRE benefits from your work.
Is it possible to send us either a sample or the entire dump?

Many thanks,
Kind regards,
Harry

@zoecournia

Of course, we have split it in 7 JSON files for convenience, and we can send you all of them.

Best
Zoe

@GerasimosKou

Dear Harry,

We have created 7 JSON files, each one ~1.5GB. I can send them to you over email with wetransfer or another service or upload here a sample. Let me know.

Best regards,
Gerasimos

@harry-di
Collaborator

Dear Gerasimos,

For the evaluation (that we have to submit today) and just to tick the box that we have received the files, sending one of those today would suffice and we can discuss how to get access to the rest later.
I think GitHub has a strict file limit of 100MB, so maybe if you can upload it on a Dropbox/OneDrive/Box and give us a link, or put them in an ftp server. Wetransfer is also an option, whatever works for you.

Many thanks,
Harry
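
A minimal sketch of cutting a large export into parts that stay under a size limit, for cases where a transfer service caps file size. It assumes the export holds one JSON record per line, which the thread does not state, and it is not necessarily how the seven files mentioned above were produced. `split_lines_file` and the part naming are hypothetical.

```python
import os


def split_lines_file(src_path, out_dir, max_bytes):
    """Split a line-delimited file into numbered parts of at most max_bytes each.

    Lines are never cut in half; a single line longer than max_bytes
    ends up alone in a part of its own. Returns the part paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    out = None
    written = 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                # Start a new part at the beginning, or when this line would overflow.
                if out is None or (written and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    part_path = os.path.join(out_dir, f"part-{len(parts):04d}.json")
                    out = open(part_path, "wb")
                    parts.append(part_path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

For example, `split_lines_file("publications_export1.json", "parts", 95 * 1024 * 1024)` would keep every part below a 100 MB cap with some margin.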

@harry-di
Collaborator

You may also use https://git-lfs.github.com

@GerasimosKou

Dear Harry,

I have uploaded one of the files to Dropbox, this is the download link: https://www.dropbox.com/s/855pveyvavoooga/publications_export1.json?dl=0

Best regards,
Gerasimos

@harry-di
Collaborator

Thank you, Gerasime,
I'm downloading the file now.
Best regards,
Harry

@zoecournia

Hi Harry,

Was the file format ok? If there are any amendments we should do, please let us know.

Best
Zoe

@harry-di
Collaborator

Dear Zoe,

We had no time to ingest the data but a quick look I had tells me that all is fine. At this stage, that is all we needed in order to "tick the box" in the assessment form, so all is OK. So no amendments required now.

We will share the assessment report when finalised; however, let me share with you two remarks that Corallia might contact you about, so that you can prepare a response:

● From the estimation of costs on page 32 of D2.2, the price of a graphic designer is listed although the plan does not include any revision/creation of dedicated mock-ups and GUI for the prototype and final solution for the tender.
● In Exhibit 9 of D2.2, the first table for Phase 1 shows a price per hour of 50 euros, while, for the same type of personnel, the price for phase 2 and 3 is 20 euros per hour. Please check and solve the inconsistency.

Best regards,
Harry
