Design for d6tflow framework #123

lyriccoder · 2021-01-29T13:03:53Z

We can split the pipeline into the following d6tflow tasks:
Task1 -> open a Java file with the correct encoding
Task2 -> remove all spaces and comments from it and save the result to another file
Task3 -> open the file and find all methods that can be inlined. Save target, extracted, full_ast, text_file, filename, row_csv from Task2
Task4 -> get target and extracted from Task3 and filter them. Save target, extracted, full_ast, text_file, filename, row_csv from Task3
Task5 -> get the result from Task3 and filter the limited cases. Save target, extracted, full_ast, text_file, filename, row_csv from Task4
Task6 -> inline the method; save the file and row_csv
Task7 -> save row_csv to the global DataFrame
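The task list maps onto a linear chain in which every task receives the previous task's items and forwards the same fields. Below is a minimal stdlib sketch of that chain, assuming each candidate is one record carrying the fields named in Task3-Task6. It does not use d6tflow's API; `Item`, `decode_with_fallback`, `task1_read`, `run_stages` and `task7_rows` are hypothetical names. The encoding list for Task1 is also an assumption: trying encodings in order is only a heuristic, and since latin-1 accepts any bytes, a wrong guess can still "succeed".

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Sequence


@dataclass
class Item:
    # One candidate flowing through the chain. Field names follow the task list;
    # their types are assumptions.
    filename: str
    text_file: str = ""
    full_ast: Any = None
    target: Any = None
    extracted: Any = None
    row_csv: Dict[str, Any] = field(default_factory=dict)


# A stage (Task2-Task6) takes the previous stage's items and returns the ones it keeps.
Stage = Callable[[List[Item]], List[Item]]


def decode_with_fallback(raw: bytes,
                         encodings: Sequence[str] = ("utf-8-sig", "cp1252", "latin-1")) -> str:
    """Task1 helper: try each encoding in order (the list is an assumption).

    utf-8-sig also strips a UTF-8 BOM; latin-1 decodes any byte sequence,
    so with the default list this never raises.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the file")


def task1_read(paths: Iterable[Path]) -> List[Item]:
    """Task1: open each Java file and decode its bytes."""
    return [Item(filename=str(p), text_file=decode_with_fallback(Path(p).read_bytes()))
            for p in paths]


def run_stages(items: List[Item], stages: Sequence[Stage]) -> List[Item]:
    """Run the stages in order, feeding each one the previous stage's output."""
    for stage in stages:
        items = stage(items)
    return items


def task7_rows(items: Iterable[Item]) -> List[Dict[str, Any]]:
    """Task7: turn each surviving item's row_csv into one row of the global table."""
    return [{"filename": it.filename, **it.row_csv} for it in items]
```

Passing whole lists between stages lets a filtering task simply return fewer items, while the fields an item already carries travel with it to the next task.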

Possible problems:

We have to save our preprocessed files to external storage: there will be too many files to keep them all in memory (cache). We also have to keep them in external storage because they form our dataset, which will be validated.
It seems this cannot be done yet; see "Reconfigure save() option to point to external store" d6t/d6tflow#6
We need to save different types of objects: AST trees and text. This seems difficult; see
"Support for Tasks that outputs different types and other extensions..." d6t/d6tflow#26
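If d6tflow's save() cannot be pointed at external storage, one possible workaround (an assumption, not something d6tflow provides) is to write each task's outputs into its own directory on external storage and choose the serializer by type: text as UTF-8 `.txt`, everything else (e.g. AST objects, provided they are picklable) as `.pkl`. `save_outputs` and `load_outputs` are hypothetical helpers in this sketch.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def save_outputs(out_dir: Path, outputs: Dict[str, Any]) -> None:
    """Write one task's outputs to out_dir: str values as UTF-8 .txt, others pickled to .pkl."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, value in outputs.items():
        if isinstance(value, str):
            # Raw bytes keep line endings (e.g. CRLF in Java sources) unchanged.
            (out_dir / f"{name}.txt").write_bytes(value.encode("utf-8"))
        else:
            # Assumes the object (e.g. an AST) is picklable.
            with open(out_dir / f"{name}.pkl", "wb") as fh:
                pickle.dump(value, fh)


def load_outputs(out_dir: Path) -> Dict[str, Any]:
    """Read back everything save_outputs wrote to out_dir, keyed by output name."""
    result: Dict[str, Any] = {}
    for path in sorted(out_dir.iterdir()):
        if path.suffix == ".txt":
            result[path.stem] = path.read_bytes().decode("utf-8")
        elif path.suffix == ".pkl":
            with open(path, "rb") as fh:
                result[path.stem] = pickle.load(fh)
    return result
```

A layout such as one directory per task and input file keeps the saved dataset easy to inspect and validate later. Pickle files should only be loaded from storage you trust.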


KatGarmash · 2021-02-01T14:15:02Z

@lyriccoder, can you please rewrite this issue in terms of Problem and Proposed solution?

Also, please give the issue a more informative name.

lyriccoder · 2021-02-09T17:43:14Z

Bonobo can run tasks in parallel (that is what is stated; I can't verify it).
It can also run child tasks automatically.

I haven't solved the issue of aggregating the results into the global CSV.
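Whatever runner is used, one way to avoid contention on the global CSV is to let the parallel workers only return rows and have the caller write the file once. A sketch using the standard library's `concurrent.futures` (swapped in here for illustration; this is not Bonobo's API), where `worker` is a hypothetical function mapping one filename to its list of rows:

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def process_all(filenames: Iterable[str],
                worker: Callable[[str], List[Row]],
                max_workers: int = 4) -> List[Row]:
    """Run worker(filename) for every file in parallel and merge the returned rows."""
    rows: List[Row] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the merged list is deterministic.
        for file_rows in pool.map(worker, filenames):
            rows.extend(file_rows)
    return rows


def write_global_csv(rows: List[Row], csv_path: str) -> None:
    """Write all rows once, from the caller; columns are the union of keys in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        # restval fills columns that a given row does not have with empty cells.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

For CPU-bound parsing, `ProcessPoolExecutor` can replace `ThreadPoolExecutor`, provided `worker` is a module-level (picklable) function.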

KatGarmash · 2021-02-10T07:32:42Z

@lyriccoder

Some comments:
(1) Step find_EMS: do we need to extract the AST at this stage? It's an expensive operation (I think), and it would be nice to do it after filtering.
(2) In the abstract representation of the dataflow, I'd merge the filters into one "Filter", as we may change the sequence of filters in the future.
(3) In your representation of the dataflow, you write "prev filters": you don't actually pass filters, but filtered items. Can you specify them?

lyriccoder · 2021-02-10T07:40:33Z

(1) Of course we need it: how else can we compare a method declaration with a method invocation and iterate over all methods? We can only do that with the AST.
(2) I kept them separate because we may want to omit some filters; if we merge them, we won't be able to discard them one by one.
(3) You are right, it's the previous results: the previous results plus the result of one more filter.

E.g., if we filter by ncss, we add the ncss value to the accumulated data. If we filter invocations with the SINGLE_STATEMENT_IN_IF filter, we add that to the final result as well.
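Keeping the filters as separate steps that can be switched on and off, while each one adds the value it computed to the record, could look like the sketch below. The record fields `target_statements` and `is_sole_statement_of_if` are hypothetical, ncss is crudely approximated as a statement count, and whether SINGLE_STATEMENT_IN_IF cases are dropped or only tagged is an assumption.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
# A filter looks at one candidate and returns (keep it?, values to add to the record).
Filter = Callable[[Record], Tuple[bool, Dict[str, Any]]]


def make_ncss_filter(max_ncss: int) -> Filter:
    """Keep targets with at most max_ncss statements and record the ncss value."""
    def ncss_filter(rec: Record) -> Tuple[bool, Dict[str, Any]]:
        ncss = len(rec["target_statements"])  # hypothetical field; stand-in for real NCSS
        return ncss <= max_ncss, {"ncss": ncss}
    return ncss_filter


def single_statement_in_if(rec: Record) -> Tuple[bool, Dict[str, Any]]:
    """Tag invocations that are the only statement of an if branch (dropping them is assumed)."""
    flag = bool(rec.get("is_sole_statement_of_if", False))  # hypothetical field
    return not flag, {"SINGLE_STATEMENT_IN_IF": flag}


def apply_filters(records: List[Record], filters: Dict[str, Filter],
                  enabled: Sequence[str]) -> List[Record]:
    """Run the enabled filters in order; kept records carry every value the filters added."""
    kept: List[Record] = []
    for original in records:
        rec = dict(original)  # previous results plus one more value per filter
        for name in enabled:
            keep, extra = filters[name](rec)
            rec.update(extra)
            if not keep:
                break
        else:  # no filter rejected the record
            kept.append(rec)
    return kept
```

Because `enabled` is just a list of names, any single filter can be omitted without touching the others, and the order of filters can change without rewriting them.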

KatGarmash · 2021-02-10T07:47:38Z

(1) OK. Do we need to pass the AST to the next step, then?

(2) I meant just having a more abstract representation of the processing steps. Merging them is one way to abstract.

(3) A filter is not data; it's an operator. Or do you mean something else by "filter"? Edges must be labeled with data only; nodes are operators. Maybe you can write "filters(data)".
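Following the convention that nodes are operators and edges carry only data, the dataflow can be described by what each operator consumes and produces, with the edges derived from that. A sketch; apart from find_EMS and Filter, the operator names and all data names are hypothetical, and "Filter" stands for all filters merged into one abstract node.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Operator:
    """A node of the dataflow: an operation with named data inputs and outputs."""
    name: str
    consumes: Tuple[str, ...]
    produces: Tuple[str, ...]


def data_edges(ops: List[Operator]) -> List[Tuple[str, str, str]]:
    """Derive (producer, consumer, data) edges; every edge label is a data name."""
    producer: Dict[str, str] = {}
    for op in ops:
        for data in op.produces:
            producer[data] = op.name
    edges = []
    for op in ops:
        for data in op.consumes:
            if data not in producer:
                raise ValueError(f"{op.name} consumes {data!r}, which no operator produces")
            edges.append((producer[data], op.name, data))
    return edges


# Hypothetical pipeline description.
PIPELINE = [
    Operator("read_file", (), ("text",)),
    Operator("find_EMS", ("text",), ("full_ast", "candidates")),
    Operator("Filter", ("candidates",), ("filters(candidates)",)),
    Operator("inline", ("full_ast", "filters(candidates)"), ("inlined_file", "row_csv")),
    Operator("collect", ("row_csv",), ("dataset_csv",)),
]
```

Here the full_ast edge runs from find_EMS straight to the inlining step, since inlining needs the full AST, while only the candidates pass through Filter.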

lyriccoder · 2021-02-10T07:49:40Z

We need the full AST for inlining.
--
I need to replace the word "filter" with "data".
