-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] Type loss/erasure in compiled domain classes severely hurts interoperability #4796
Comments
That is a good observation: The new way the APIs are set up suck when used from java. To give a little context on why we set it up this way: It is very much intended that a lot of type info is erased. This is because all nodes are ultimately just GNode, i.e. contain a reference to the graph, and a couple of indices to access their fields; both the class pointer and the graph pointer are, ultimately, convenience and the proper encoding of nodes would be Now suppose you are in a situation where you need to access e.g. The way we set this up, there is no dynamic dispatch depending on the node type. Yay! ================================= Do I understand your situation right that you want to use joern as a library from java code? I think the appropriate thing would be for us to have a chat and figure out nice java bindings / a java API? We could generate them along with the scala domain classes. Is there any small sample code how you use joern from java that you could share? The way I imagine it, this would be near unusable, due to the prevalence of extension methods / implicit conversions. I.e. most of our APIs look like This "use dot-access syntax instead of free-standing functions" style is also not my favorite, but it is where we are, for better or worse. =================================
For some unfathomable reason, scala3 decides that Any ideas @mpollmeier ? |
Thanks for the explanation! We are actually using Kotlin, though I think Java bindings would be great/better to ensure broad compatibility. Generally we have been wrapping the scala implicits with kotlin extensions (with some intention to eventually codegen this), but there are cases where this is actually just impossible to have compiler checking, eg @Suppress(
"BOUNDS_NOT_ALLOWED_IF_BOUNDED_BY_TYPE_PARAMETER",
"TYPE_MISMATCH",
"INCONSISTENT_TYPE_PARAMETER_BOUNDS",
"UPPER_BOUND_VIOLATED",
"RETURN_TYPE_MISMATCH",
"TYPE_INFERENCE_EXPECTED_TYPE_MISMATCH",
"TYPE_INFERENCE_INCORPORATION_ERROR",
"ARGUMENT_TYPE_MISMATCH",
"NEW_INFERENCE_NO_INFORMATION_FOR_PARAMETER",
) And now we can call it on any fun <T : StoredNode> Iterator<T>.fullName(): Iterator<String> = `TraversalPropertyFullName$`.`MODULE$`.`fullName$extension`(this) There is also Here are some more samples for some context. Set of extension wrappers: // <editor-fold desc="AccessNeighborsForMethodTraversal extension implicits">
fun Iterator<Method>.cfgFirst(): Iterator<CfgNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`cfgFirst$extension`(this)
fun Iterator<Method>.methodReturn(): Iterator<MethodReturn> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`methodReturn$extension`(this)
fun Iterator<Method>.literal(): Iterator<Literal> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`literal$extension`(this)
fun Iterator<Method>.parameter(): Iterator<MethodParameterIn> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`parameter$extension`(this)
fun Iterator<Method>.block(): Iterator<Block> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`block$extension`(this)
fun Iterator<Method>.astIn(): Iterator<AstNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`astIn$extension`(this)
fun Iterator<Method>.astOut(): Iterator<AstNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`astOut$extension`(this)
fun Iterator<Method>.callIn(): Iterator<Call> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`callIn$extension`(this)
fun Iterator<Method>.cfgOut(): Iterator<AstNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`cfgOut$extension`(this)
fun Iterator<Method>.containsIn(): Iterator<AstNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`containsIn$extension`(this)
fun Iterator<Method>.containsOut(): Iterator<CfgNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`containsOut$extension`(this)
fun Iterator<Method>.dominateOut(): Iterator<CfgNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`dominateOut$extension`(this)
fun Iterator<Method>.postDominateIn(): Iterator<CfgNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`postDominateIn$extension`(this)
fun Iterator<Method>.reachingDefOut(): Iterator<CfgNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`reachingDefOut$extension`(this)
fun Iterator<Method>.refIn(): Iterator<StoredNode> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`refIn$extension`(this)
fun Iterator<Method>.sourceFileOut(): Iterator<File> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`sourceFileOut$extension`(this)
fun Iterator<Method>.taggedByOut(): Iterator<Tag> = `AccessNeighborsForMethodTraversal$`.`MODULE$`.`taggedByOut$extension`(this)
// </editor-fold> ...and here's a snippet of one of our "parsers" illustrating the usage of our extension wrappers which is attempting to build out a route tree from Spring annotations /**
* Build route tree
*
* Hierarchy is composed of:
* 1. (Optional) Global prefix via application.properties -> context path
* - uses first application.properties found, first looking for one with `src/main` in the path
* 2. (Optional) Controller prefixes (via RequestMapping on Controllers)
* 3. Endpoint routes (Via *Mapping Spring web annotations
*/
override fun buildRouteTree(cpg: Cpg): PathNode =
pathNode {
val firstAppProperties = resolveAppPropertiesFile(cpg)
path = firstAppProperties?.getContextPath().orEmpty()
// build routes for controllers
cpg.annotation()
.where(Iterator<Annotation>::typeDecl)
.fullNameExact(
"org.springframework.stereotype.Controller",
"org.springframework.web.bind.annotation.RestController",
).typeDecl()
.foreach { controllerRoutes(it) }
} |
Thanks for the examples! Some minor thoughts on the issue:
I think you can get away with pretty OK wrappers even without the StaticType. This is because in joern, for every property access that is valid in the type system, there is always one unique reason why the access is valid. Say We cannot do that on the commercial shiftleft / qwiet side: Our proprietary extensions to the domain classes have clashes. To give an example: We have an abstraction However, I don't think we are willing to commit to preserving that nice "everything for a single reason" property of the open-source joern / codepropertygraph domain schema in the future -- or what do you think @mpollmeier @fabsx00 ? How would you @tajobe feel about generating kotlin wrappers in the flatgraph codegen? I think you would be stuck writing that code, but if you write good unit tests we can probably maintain it. If you want to get fancy, we might also go for macro / annotation based wrappers for non-autogenerated code (i.e. traversal steps / extensions in e.g. semanticcpg)? |
Thank you for bringing up this very valid point! My current thoughts on this: generating additional wrappers for other languages or macros etc. sounds like additional complexity and maintenance burden I'd very much prefer to avoid. The StaticType trick with erased marker traits is very neat, but it only works for Scala and is only used in a few (albeit useful and showcasey-cool) cases. I haven't tried this (and also not properly thought this through), but why don't we move the marker traits for the schema-defined types back to the non-erased traits, i.e. instead of And then the code generator could define extension methods for both That way, we have these marker traits available in the bytecode, and still support the neat usecase for ad-hoc extensions. |
@mpollmeier That would certainly improve the situation for us. I do think there's probably some case for Java bindings for models/traversals for broad JVM interoperability (there are still issues around scala's type system, implicit parameters and functions, etc), but there is certainly added complexity to building and maintaining that. If the types situation can be improved, we'll likely go the route of code-genning our own wrapper extensions rather than forking the schema/domain classes generator entirely. I'm thinking about making it more generic, eg a |
Describe the bug
When invoking joern as a library (like in joernio/standalone-ext), the loss of type information across many domain classes breaks JVM-langauge interop, particularly with the switch to the EMT to provide property/traversal implicits.
For example, the generated
Method
node is compiled as (fromjavap
, removing the fully qualified names for readability):Because it doesn't have
StaticType[MethodEMT]
when compiled, it lacks all of those traits likeHasFilenameEMT
orHasFullNameEMT
. This means there's no way for aMethod
node to satisfy the type constraints on traversals likeThere are other losses of type information, not exclusive to the EMTs.
For example,
AstNode
becomes...lacking
StoredNode
, and thus lacking the.file
implicit/extension.Also, there are several cases where traversals' types get widened from
Iterator<Int>
orIterator<Boolean>
toIterator<Object>
:To Reproduce
Steps to reproduce the behavior:
codepropertygraph-domain-classes_3-1.7.1.jar
or generate classes using the flatgraph domain-classes-generator and compilejavap
or an IDE which decompiles the classes to check signatures/typesExpected behavior
Enough information is available in the compiled classes to be able to reasonably use the nodes/traversals/etc
Screenshots
N/A
Desktop (please complete the following information):
Should be irrelevant here but:
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: