Okay, more understanding. TypeWrappers currently encapsulate two kinds of information that can be split: reflection-only metadata and runtime metadata. They are used at runtime to generate dynamic types and to implement reflective capabilities, but they are also used by the static compiler and exporter to represent type information without execution. This deserves to be split. TypeWrappers can become two things: JavaTypeInfo and JavaType. The first is metadata-only, suitable for use by the compilers at any stage; runtime information lives in JavaType. This should give us the basis for separating the importer/exporter requirements into the first but not the second: a demarcation line to help split things.

ClassLoaderWrapper needs to go away at importer/exporter time. ClassLoaders are a runtime concept; the static compiler sort of fakes them, even though there's no backing java.lang.ClassLoader. So this part of the logic can move out into something very un-Java-like, maybe JavaTypeInfoContext. The idea here is to be a bit recursive: the JavaTypeInfoContext delivers JavaTypeInfo instances and can provide lookups by name, and it can be composed with various parent search paths, like a ClassLoader, but not really. For instance, AssemblyJavaTypeInfoResolver or FileJavaTypeInfoResolver. And at runtime we can provide a different resolver. At static compile time, we're interested in setting up a context that can load types from System.Reflection.Metadata and friends as well as a FileJavaTypeInfoResolver. At runtime, we're more interested in providing a ClassLoaderJavaTypeInfoResolver, which actually consults the runtime class loader hierarchy. This provides a nice interface split.
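A minimal sketch of how that resolver split could look. JavaTypeInfo's shape, the delegation order, and all member names here are assumptions drawn from the description above, not existing IKVM code:

```csharp
using System.Collections.Generic;

// Placeholder for the metadata-only type model described above.
sealed record JavaTypeInfo(string Name);

// Each resolver knows one source of JavaTypeInfo: an assembly, class files
// on disk, or (at runtime) the live class loader hierarchy.
interface IJavaTypeInfoResolver
{
    JavaTypeInfo? Resolve(string className);
}

// Like a ClassLoader's delegation chain, but with no java.lang.ClassLoader behind it.
sealed class JavaTypeInfoContext : IJavaTypeInfoResolver
{
    readonly IReadOnlyList<IJavaTypeInfoResolver> resolvers;

    public JavaTypeInfoContext(params IJavaTypeInfoResolver[] resolvers) =>
        this.resolvers = resolvers;

    public JavaTypeInfo? Resolve(string className)
    {
        foreach (var resolver in resolvers)
            if (resolver.Resolve(className) is JavaTypeInfo info)
                return info;
        return null;
    }
}

// Static compile time: assemblies read via System.Reflection.Metadata plus class files.
//   new JavaTypeInfoContext(assemblyResolver, fileResolver);
// Runtime: a resolver that consults the actual class loader hierarchy.
//   new JavaTypeInfoContext(classLoaderResolver);
```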
Okay, some new ideas. I've been playing around with some test designs. First off, I'm going to abstract away the managed type provider and the Java type provider. These are the sources of raw .NET types and raw Java byte code, so the rest of the code base can depend on them instead of on System.Type or System.Reflection.Metadata, etc. So: our own ManagedType, ManagedField, ManagedMethod classes. These don't need to be interfaces wrapping anything; we'll just do a full load of the source. We take a System.Type and copy everything into ManagedType. We should be careful here about allocations and such: make good use of structures, etc., much like System.Reflection.Metadata. So we populate one of these structures, optimizing for certain patterns, such as the average number of methods a type has (store on the type instead of on a shared list, etc.). The same applies on the Java side: ByteCodeClass, ByteCodeType, etc. This is also the layer at which the ConstantPool overrides take place. ConstantPool overrides won't be much more than intercepting the load and rewriting information in the fake class info.

This layer doesn't need to loop back to resolve types. ManagedType needs to be self-contained: any references to other types should be stored as some sort of reference. This layer isn't concerned with resolving anything, just with describing a type and how it relates to other types. This is in accordance with SRM, which uses ref structs to represent handles to other types. It's not in accordance with System.Reflection, where things contain links to actual System.Types. We need to poke through this layer a bit for System.Reflection: our handles can optionally carry some quicker way to access the referenced type, so resolution of a type reference can be assisted by a System.Type. Runtime dynamic compilation will need this for speed, and to maintain real runtime integrity with System.Type.

The next layer is something like a view of how things look from Java: JavaView, JavaType, not exactly sure. This layer is fed a bucket of ByteCodeTypes and ManagedTypes and lets you look up a JavaType from it. The lookup happens within some sort of unit (a compilation unit). The compilation unit isn't necessarily representative of a single assembly or a single class loader, but it could be. You could pile in ManagedTypes obtained from multiple assemblies, or ByteCodeTypes from multiple class files across JARs. You can query this unit for Java types by name. For instance, you could query it for cli.System.Object: if the unit contains the ManagedTypes derived from mscorlib.dll (or wherever Object is these days), you'd get back a JavaType that represents that class as seen from Java, with higher-level properties describing the shape of the type from the Java side.

The next layer is the actual compiler/builder. The goal here is to take the JavaTypes and push them out to some sort of assembly builder, as the .NET types that implement them. Basically, we convert back: we know what the Java type should look like, so we can ask a Java type to emit itself to some sort of builder, and what it emits is a .NET assembly that implements the Java type. This gets a bit funky, in that you could pass a .NET type through it to see what it looks like from Java, and then ask the resulting Java type to emit itself back as .NET. And it might work, up until the method bodies, where we have no byte code to convert. So in practice this won't be done.
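A rough sketch of the self-contained model, assuming the handle-with-optional-fast-path idea described above; all names and fields are illustrative:

```csharp
using System;

// A reference to another type is stored by name only; resolution belongs to a
// later layer. When the source was System.Reflection, the handle can optionally
// carry the live System.Type as a fast path for runtime dynamic compilation.
readonly struct ManagedTypeHandle
{
    public readonly string FullName;
    public readonly Type? RuntimeType;   // null when loaded from System.Reflection.Metadata

    public ManagedTypeHandle(string fullName, Type? runtimeType = null) =>
        (FullName, RuntimeType) = (fullName, runtimeType);
}

// Structures keep per-member allocation down, in the spirit of SRM.
readonly struct ManagedMethod
{
    public readonly string Name;
    public readonly ManagedTypeHandle ReturnType;

    public ManagedMethod(string name, ManagedTypeHandle returnType) =>
        (Name, ReturnType) = (name, returnType);
}

// A full copy of the source type: self-contained, no loop back to a resolver.
sealed class ManagedType
{
    public string FullName = "";
    public ManagedTypeHandle BaseType;
    public ManagedMethod[] Methods = Array.Empty<ManagedMethod>();  // sized once at load
}
```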
Users will only emit types that were originally Java, probably marked by a flag on the type of some kind. The static compiler can use this infrastructure to load .NET information from SRM and dump it into a compilation unit, load Java byte code and dump it into compilation units, then create a new MetadataWriter and loop over the types in the compilation unit, having them emit into the MetadataWriter. That's static compilation. The dynamic runtime compiler can do the same thing, but based on System.Reflection.Emit's AssemblyBuilder: linking a compilation unit to the associated ClassLoader and asking it to emit managed types into that AssemblyBuilder as needed.
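That flow, in outline. Every type below is invented for illustration except MetadataBuilder, which is the real System.Reflection.Metadata builder:

```csharp
using System.Collections.Generic;
using System.Reflection.Metadata.Ecma335;

// Illustrative compilation unit: a bag of types queryable by Java name. It need
// not correspond to a single assembly or class loader.
sealed class CompilationUnit
{
    readonly Dictionary<string, JavaType> types = new();

    public void Add(JavaType type) => types[type.Name] = type;

    public JavaType? Lookup(string name) =>
        types.TryGetValue(name, out var t) ? t : null;

    public IEnumerable<JavaType> JavaTypes => types.Values;
}

sealed class JavaType
{
    public string Name = "";
    public bool IsJavaOrigin;   // the "flag on the type" deciding what gets emitted

    // Emit the .NET implementation of this Java type into a metadata builder.
    public void EmitTo(MetadataBuilder builder) { /* conversion back to .NET */ }
}

// Static compilation: fill the unit, then emit every Java-origin type.
static class StaticCompilerSketch
{
    public static void Compile(CompilationUnit unit, MetadataBuilder builder)
    {
        foreach (var type in unit.JavaTypes)
            if (type.IsJavaOrigin)
                type.EmitTo(builder);
    }
}
```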
Some of this work has been completed as of now. IKVM.Runtime's RuntimeJavaType hierarchy no longer relies on typeof() in any path intended to convert byte code; it remains in paths intended for dynamic runtime generation (where the actual types are the runtime types). So it's a step closer to splitting out of IKVM.Runtime, but not there yet. Most of the static access in Runtime is gone. Instead, a RuntimeContext class is passed around, which is similar to a DI container in that it holds instances of the other types that were previously static. This gets us again closer to splitting it off.
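The pattern, in a purely illustrative form; the real RuntimeContext in IKVM.Runtime has a different and much larger surface, and the service types below are stand-ins:

```csharp
// Formerly static singletons become instance members on a context object that
// is threaded through calls. These member types are placeholders, not IKVM's
// real classes.
sealed class ClassLoaderService { }
sealed class TypeResolverService { }

sealed class RuntimeContext
{
    public ClassLoaderService ClassLoaders { get; }
    public TypeResolverService Types { get; }

    public RuntimeContext(ClassLoaderService classLoaders, TypeResolverService types) =>
        (ClassLoaders, Types) = (classLoaders, types);
}

// Call sites that once read a static now receive the context:
//   void Convert(RuntimeContext context, ...) => context.Types. ...
```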
So I've been thinking for a while about how to approach rebuilding the IKVM compiler. Right now it's very much tied into the Runtime: it relies on many static variables and instances, loads core types using typeof() in many places, and requires TypeWrappers and ClassLoaderWrappers to be loaded. This means it's very much tied to the classes of the current .NET runtime. You can't really run the compiler on Core 3 and generate .NET 5 assemblies. It's why the IKVM.Runtime source files are included by ikvmc, with IFDEFs, instead of just referenced as a library.
So my thinking is this needs to change. It needs to be remodeled into a set of classes which can produce assemblies without regard for the current execution environment. We can derive a lot of lessons from Microsoft.CodeAnalysis (Roslyn), and maybe even reuse some classes in it.
Another thing would be to dump the usage of System.Reflection.Emit. Right now this exists in two forms: actual System.Reflection.Emit, for generation of dynamic assemblies at runtime; and IKVM.Reflection.Emit, which is subbed out for ikvmc. I would remodel the new IL generation around System.Reflection.Metadata. No more hard references to runtime types.
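For a sense of what that looks like, here's a minimal, real-API example of producing an (empty) assembly with System.Reflection.Metadata, with no runtime types involved; the assembly name is made up:

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

// Build metadata rows directly; nothing here touches the executing runtime's types.
var metadata = new MetadataBuilder();
metadata.AddModule(
    0,
    metadata.GetOrAddString("HelloJava.dll"),
    metadata.GetOrAddGuid(Guid.NewGuid()),
    default, default);
metadata.AddAssembly(
    metadata.GetOrAddString("HelloJava"),
    new Version(1, 0, 0, 0),
    default, default,
    0,
    AssemblyHashAlgorithm.None);

// Serialize to a PE image; type and method rows would be added above in a real emit.
var ilStream = new BlobBuilder();
var peBuilder = new ManagedPEBuilder(
    PEHeaderBuilder.CreateLibraryHeader(),
    new MetadataRootBuilder(metadata),
    ilStream);
var peBlob = new BlobBuilder();
peBuilder.Serialize(peBlob);
File.WriteAllBytes("HelloJava.dll", peBlob.ToArray());
```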
This has a downside: the only way to generate dynamic assemblies is through System.Reflection.Emit, and dynamic assemblies are how we get unload support. Now, Core can unload assemblies differently, using AssemblyLoadContext, and that could be a good path forward for Core. ClassLoaders really are an AssemblyLoadContext. There's some way to put this together for Core where we simply no longer use System.Reflection.Emit, but instead load assemblies in an isolated load context. But this would be a very different architecture than Framework, where we'd have to build a translation layer from regular assemblies to dynamic ones.
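A minimal sketch of that Core path, using the real AssemblyLoadContext API; the context name and the stream variable are placeholders:

```csharp
using System.Reflection;
using System.Runtime.Loader;

// A collectible AssemblyLoadContext standing in for a Java ClassLoader.
// Unloading the context (and dropping all references into it) lets the GC
// reclaim the assembly, replacing what RunAndCollect dynamic assemblies
// provide on Framework.
sealed class ClassLoaderLoadContext : AssemblyLoadContext
{
    public ClassLoaderLoadContext(string name)
        : base(name, isCollectible: true) { }

    // Returning null defers unresolved loads to the default context.
    protected override Assembly? Load(AssemblyName assemblyName) => null;
}

// Usage, assuming 'emittedPe' is a Stream holding a PE image produced as above:
//   var alc = new ClassLoaderLoadContext("some-class-loader");
//   var assembly = alc.LoadFromStream(emittedPe);
//   ...
//   alc.Unload();  // collectible once nothing references its types
```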
So, a translation layer. What would that look like? The main compilation path would emit assemblies with System.Reflection.Metadata, and those assemblies would then be reparsed and re-emitted 1:1 to System.Reflection.Emit. There's some history of people working on this at https://github.com/Lokad/ILPack, but in the opposite direction: taking assemblies built by Emit and rewriting them toward MetadataBuilder. Our converter would be almost the exact same thing, but backwards. It's a lot of work, but straightforward: mapping opcodes to opcodes, etc.
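The core of that replay, in a deliberately tiny form; only a few operand-free single-byte opcodes are shown, and a real mapper would also cover the 0xFE-prefixed opcodes, inline operands, branch fixups, locals, and exception regions:

```csharp
using System.Collections.Generic;
using System.Reflection.Emit;

static class IlReplaySketch
{
    // A few entries of the byte-to-OpCode table; the full table covers ECMA-335.
    static readonly Dictionary<byte, OpCode> Simple = new()
    {
        [0x00] = OpCodes.Nop,
        [0x02] = OpCodes.Ldarg_0,
        [0x2A] = OpCodes.Ret,
    };

    // Valid only for bodies made of operand-free opcodes, per the caveat above.
    public static void Replay(byte[] il, ILGenerator generator)
    {
        foreach (var b in il)
            generator.Emit(Simple[b]);
    }
}
```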
Another option is to write all assemblies to some interface that looks like MetadataBuilder, but that calls either MetadataBuilder or Reflection.Emit dynamically. I'm not super fond of this. It seems like the ability to take a static assembly and re-emit it into AssemblyBuilder might have uses outside IKVM, and thus attract more interest.
So there's a speed sacrifice here. We'd be writing assemblies to a stream, then rereading them and emitting them back into AssemblyBuilder. Is this significant? Maybe. Does it matter? Maybe.
It does open the door to a stage two for Core, though: dropping dynamic assemblies completely and using static assemblies with AssemblyLoadContext. No translation required there.
One stance to take might be that we should simply always target the latest methodology, write adapters for the earlier down-level platforms, and take the hit. And optimize where we can.
Roslyn builds on top of MetadataBuilder with Compilations. These are more about the 'context' surrounding building a bunch of code: how assembly references are located, the relationships between assemblies, and the options that are available to customize the output. We can probably mirror much of this: IkvmCompilation instead of CSharpCompilation. Roslyn's stack (Microsoft.CodeAnalysis) provides many tools we can make use of: assembly identity comparison, signing, reference resolving, etc. Though we can't make use of any of the actual code generation pieces. There's a lot of this stuff in IKVM, especially assembly identity comparison, which would be nice to dump and replace with a more rigorous implementation.
So here's what I sort of imagine: a new IkvmCompilation class that operates standalone. It duplicates most of the API surface of CSharpCompilation but adds some other things. It has options to emit static assemblies or dynamic assemblies (they do differ in IL), or at least options that target the specifics of those two. It will take over most of the role of Universe, StaticCompiler, and the TypeWrappers as it relates to generating assemblies. All of the emit code will be pulled out of the TypeWrappers and moved into this new library.
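A speculative surface for that class, loosely mirroring CSharpCompilation.Create. Everything here is a design sketch, with only MetadataReference borrowed from Microsoft.CodeAnalysis:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.CodeAnalysis;   // for MetadataReference, one of the reusable pieces

// Static and dynamic output differ slightly in IL, so the distinction is an
// option; the compilation itself stays unaware of who is asking.
sealed class IkvmCompilationOptions
{
    public bool EmitDynamicAssembly { get; init; }
}

sealed class IkvmCompilation
{
    readonly string assemblyName;
    readonly IReadOnlyList<MetadataReference> references;
    readonly IkvmCompilationOptions options;

    IkvmCompilation(string name, IReadOnlyList<MetadataReference> refs, IkvmCompilationOptions opts) =>
        (assemblyName, references, options) = (name, refs, opts);

    public static IkvmCompilation Create(
        string assemblyName,
        IEnumerable<MetadataReference> references,
        IkvmCompilationOptions options) =>
        new(assemblyName, new List<MetadataReference>(references), options);

    // MetadataBuilder-based emission of the Java types would live here.
    public void Emit(Stream peStream) { }
}
```

The static and dynamic paths would then differ only in the options and resolvers they pass in, which matches the split described next.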
TypeWrappers can then invoke this new IkvmCompilation class to obtain references to the assemblies as they need them, with the dynamic path passing different options than the static path. Resolvers will differ between the two: the dynamic path passes resolvers that take class loaders into consideration, omit resources, etc., while the static path can generate slightly different assemblies. The IkvmCompilation code itself should be unaware of this distinction, just accepting different options for each.
I do think this should be a separate project with a separate assembly. It can be independently tested and used outside of IKVM.Runtime or the static compiler.
I wonder if it makes sense to call it IKVM.CodeAnalysis?