Language Integrated Query (LINQ) is an extension to Microsoft .NET languages that provides query expressions that can be used to extract and process data from collections (for example, arrays, lists, dictionaries).
Concrete syntax of query expressions resembles SQL statements:
IEnumerable<Person> query;
query = from p in Persons where p.age > 90 select p;
In this post, I will introduce query expressions for program introspection in object-oriented languages. For example, query
{ field, in class <T>, <T> extends MyClass, of type int }
yields all integer fields declared in subclasses of MyClass.
Reflections on introspection… (pun intended)
What is LIRQ?
Queries can yield the following entities:
identifiers,
local variables,
function parameters,
arrays,
collections,
fields of classes,
classes,
instances of classes,
interfaces,
enumerations, and
functions (methods).
A query consists of one or more conditions written within curly braces and separated by Boolean operations:
comma , for conjunction
vertical bar | for disjunction
exclamation mark ! for negation.
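These three operators correspond closely to ordinary predicate combinators. The following is a minimal Java sketch, assuming that conditions are modelled as java.util.function.Predicate values; the names ConditionDemo, NON_NEGATIVE and AT_MOST_TEN are hypothetical and not part of LIRQ.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class ConditionDemo {
    // Hypothetical conditions over integer values, standing in for query conditions.
    static final Predicate<Integer> NON_NEGATIVE = v -> v >= 0;
    static final Predicate<Integer> AT_MOST_TEN = v -> v <= 10;

    /** Keeps the values that satisfy the given condition. */
    static List<Integer> select(List<Integer> values, Predicate<Integer> condition) {
        return values.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(-5, 0, 7, 10, 42);
        // "," (conjunction) corresponds to Predicate.and
        System.out.println(select(values, NON_NEGATIVE.and(AT_MOST_TEN)));
        // "|" (disjunction) corresponds to Predicate.or
        System.out.println(select(values, NON_NEGATIVE.or(AT_MOST_TEN)));
        // "!" (negation) corresponds to Predicate.negate
        System.out.println(select(values, AT_MOST_TEN.negate()));
    }
}
```

With the sample values, the conjunction keeps 0, 7 and 10, the disjunction keeps every value, and the negation keeps only 42.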
In this post, I will assume that queries are embedded into Java/C#, though the concept itself does not depend on a particular language.
Value, name and type queries
For an identifier x:
query { &x } yields its value,
query { @x } yields string “x”, and
query { #x } yields the type of x, which can be used in declarations:
int x;
{ #x } y; // int y;
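For comparison, here is a rough sketch of how the three queries might be emulated with the standard Java reflection API. Java reflection cannot see local variables, so this sketch assumes that x is a field of a hypothetical Holder class; the class IdentifierQueries and its method names are likewise illustrative only.

```java
import java.lang.reflect.Field;

final class IdentifierQueries {
    // Hypothetical holder: a field stands in for the identifier x.
    static final class Holder {
        int x = 42;
    }

    /** Looks up a declared field by name and makes it accessible. */
    static Field field(Object owner, String name) {
        try {
            Field f = owner.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field named " + name, e);
        }
    }

    /** Rough analogue of { &x }: the current value. */
    static Object valueOf(Object owner, String name) {
        try {
            return field(owner, name).get(owner);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rough analogue of { @x }: the name as a string. */
    static String nameOf(Object owner, String name) {
        return field(owner, name).getName();
    }

    /** Rough analogue of { #x }: the declared type. */
    static Class<?> typeOf(Object owner, String name) {
        return field(owner, name).getType();
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        System.out.println(valueOf(h, "x")); // the boxed value of the field
        System.out.println(nameOf(h, "x"));  // the field name
        System.out.println(typeOf(h, "x"));  // the primitive type int
    }
}
```

Unlike { #x }, the Class object returned by typeOf cannot be used in a declaration, which is one reason the queries are proposed as a language extension rather than a library.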
Primitive queries
For each entity mentioned above, there is a corresponding query ({var}, {class}, {field}, and so on) that yields all such entities. For example, query {var} will yield all variables. To yield a non-empty result, a query should contain at least one primitive condition.
Regular expressions for names
Query {'v*'} yields all identifiers with names starting with symbol “v”. Queries can be used in qualified names, too:
person.{'a?e'}
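The examples suggest wildcard semantics rather than full regular expressions. The sketch below assumes that * stands for any sequence of characters and ? for exactly one character, and translates such a pattern into a java.util.regex.Pattern; the class NamePatterns and the sample names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class NamePatterns {
    /** Translates a wildcard pattern into a regex: '*' is any sequence, '?' is any single character. */
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Keeps the names that match the whole wildcard pattern. */
    static List<String> select(List<String> names, String wildcard) {
        Pattern p = compile(wildcard);
        return names.stream().filter(n -> p.matcher(n).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("value", "x", "v", "age", "ave", "name");
        System.out.println(select(names, "v*"));  // names starting with "v"
        System.out.println(select(names, "a?e")); // three-letter names of the form a_e
    }
}
```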
Type constraints
Query {var, of type int} yields all local integer variables. It can be used in an assignment statement:
{var, of type int} = 0;
Constraints
Query
{ var, of type int, (that >= 0, that <= 10) }
yields all local integer variables whose value is in range 0..10.
Keyword that refers to a yielded result of a query. Negation can also be used within that expressions. Queries &that, @that and #that are considered primitive queries. For example, {var, @that} yields a collection of names of all variables.
Query variables
Query
{ var <T>, of type int, ( <T> >= 0, <T> <= 10 ) }
is equivalent to query
{ var, of type int, (that >= 0, that <= 10) }
given above, but uses a query variable T that refers to the yielded result. Variable names are enclosed in angle brackets (remark: this syntax has nothing to do with generics).
Query variables can also be used for types and essentially all other entities, for example:
{var <X>, of type <Y>, <Y> is subtype of int}
Functions
Query {function <F>() returns <R>} yields all functions without arguments visible in the current scope. Desired parameters can be requested by using a syntax similar to regular expressions:
? denotes any parameter,
* denotes 0 or more parameters,
+ denotes 1 or more parameters,
int denotes an integer parameter, and so on.
Query {function <F>(?, int, *) returns string} yields all string functions whose second argument is of type integer.
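Matching a parameter pattern against a method signature can be sketched with a small recursive matcher over java.lang.reflect.Method. This is only one possible reading of the pattern syntax: the pattern is assumed to be a list of tokens ("?", "*", "+" or a simple type name), and the classes FunctionPatterns and Sample are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FunctionPatterns {
    // Hypothetical sample class whose methods are queried.
    static class Sample {
        String describe(Object o, int n) { return o + ":" + n; }
        String join(String a, int b, String c, String d) { return a + b + c + d; }
        String name() { return "sample"; }
        String label(int a, String b) { return a + b; }
        int count(Object o, int n) { return n; }
    }

    /** Returns true if pattern[i..] matches params[j..]. */
    static boolean matches(List<String> pattern, int i, Class<?>[] params, int j) {
        if (i == pattern.size()) {
            return j == params.length; // both must be exhausted together
        }
        String token = pattern.get(i);
        switch (token) {
            case "*": // zero parameters, or one parameter while staying on '*'
                return matches(pattern, i + 1, params, j)
                        || (j < params.length && matches(pattern, i, params, j + 1));
            case "+": // one parameter, then either move on or consume more
                return j < params.length
                        && (matches(pattern, i + 1, params, j + 1)
                            || matches(pattern, i, params, j + 1));
            case "?": // exactly one parameter of any type
                return j < params.length && matches(pattern, i + 1, params, j + 1);
            default:  // exactly one parameter with the given simple type name
                return j < params.length
                        && params[j].getSimpleName().equals(token)
                        && matches(pattern, i + 1, params, j + 1);
        }
    }

    /** Names of declared methods with the given return type whose parameters match the pattern. */
    static List<String> select(Class<?> owner, Class<?> returnType, List<String> pattern) {
        return Arrays.stream(owner.getDeclaredMethods())
                .filter(m -> !m.isSynthetic())
                .filter(m -> m.getReturnType() == returnType)
                .filter(m -> matches(pattern, 0, m.getParameterTypes(), 0))
                .map(Method::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Analogue of {function <F>(?, int, *) returns string} restricted to Sample.
        System.out.println(select(Sample.class, String.class, List.of("?", "int", "*"))); // prints [describe, join]
    }
}
```

The sketch searches a single class, whereas the query in the text ranges over all functions visible in the current scope.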
Qualifiers
Query {class <T>, <T> extends MyClass} yields all classes that extend MyClass. Part ... extends ... of this query is called a qualifier. Other qualifiers include:
is abstract
is static
... implements ...
... inherits ...
is subtype of ...
has ... (used to specify that a class has a certain field or method),
and so on.
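Some of these qualifiers have direct counterparts in Java reflection. The sketch below assumes that extends covers indirect subclasses too, and that the set of candidate classes is supplied explicitly, because plain Java reflection cannot enumerate all loaded classes. The class Qualifiers and the small hierarchy inside it are hypothetical.

```java
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.stream.Collectors;

final class Qualifiers {
    // Hypothetical class hierarchy used only for illustration.
    static class MyClass {}
    static abstract class Shape extends MyClass {}
    static class Circle extends Shape {}
    static class Other {}

    /** Analogue of {class <T>, <T> extends base}: proper subclasses of base among the candidates. */
    static List<Class<?>> subclassesOf(Class<?> base, List<Class<?>> candidates) {
        return candidates.stream()
                .filter(c -> c != base && base.isAssignableFrom(c))
                .collect(Collectors.toList());
    }

    /** Analogue of the "is abstract" qualifier. */
    static boolean isAbstract(Class<?> c) {
        return Modifier.isAbstract(c.getModifiers());
    }

    public static void main(String[] args) {
        List<Class<?>> candidates = List.of(MyClass.class, Shape.class, Circle.class, Other.class);
        List<Class<?>> subclasses = subclassesOf(MyClass.class, candidates);
        System.out.println(subclasses.size());       // prints 2 (Shape and Circle)
        System.out.println(isAbstract(Shape.class)); // prints true
    }
}
```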
Declared entities
Qualifier declared ... allows distinguishing between a yielded result and a condition in a query. For example, query
{class <B>, <B> extends <A>, class <A>}
is invalid because it has two primitive conditions ( class <B> and class <A>). However, query
{class <B>, <B> extends <A>, declared class <A>}
is valid and yields all subclasses of all classes.
Instances
Statement
{instance of Person}.age = 0;
assigns value 0 to field age of all instances of class Person. Depending on how semantics of queries is defined, instances may either refer to all declared instances of a class or to all instances existing during runtime.
Loops
Queries can be used in for-each loops:
for x in {var x, of type int} {
x = 0;
}
Scopes
Query {field, of type int, in declared class MyClass} yields all integer fields in class MyClass. Keyword declared can be omitted inside an in condition.
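In plain Java, the field query above corresponds roughly to filtering the declared fields of a class by type. A minimal sketch, in which the sample MyClass and the helper FieldQueries are hypothetical:

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FieldQueries {
    // Hypothetical class being queried.
    static class MyClass {
        int age;
        int height;
        String name;
        long id;
    }

    /** Analogue of {field, of type int, in class owner}: names of the int fields declared in owner. */
    static List<String> intFieldNames(Class<?> owner) {
        return Arrays.stream(owner.getDeclaredFields())
                .filter(f -> !f.isSynthetic())
                .filter(f -> f.getType() == int.class)
                .map(Field::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(intFieldNames(MyClass.class)); // prints [age, height]
    }
}
```

The names are sorted because getDeclaredFields does not guarantee any particular order.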
Query {in function(int, int) returns <R>, var} yields all local variables in all functions (from the current scope) with two integer arguments.
Nested queries
Query
{in {function(int, int) returns <R>, in class MyClass}, var}
differs from the previous one in that it only considers methods of class MyClass.
Visibility modifiers
Queries can be used to define custom visibility modifiers.
class A {
modifier children = {class <T>, <T> extends A};
[children] int x; // only visible in subclasses of A
...
Queries as first-class citizens
New primitive type query is introduced to represent reflection queries.
query a = {class, with constructor <X>()};
query b = {{a}, that extends MyClass};
<<b>> x = new <<b>>(); // parameterized statement;
// it creates instances of all subclasses of
// MyClass that have a constructor without parameters
In this example, query a yields all classes that have a constructor without parameters. Query b refines this query, additionally requiring that those classes extend MyClass. An instance is then created of each matching class.
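A run-time emulation of this example might look like the following Java sketch. It assumes an explicit list of candidate classes (Java reflection cannot enumerate all classes) and skips abstract classes, which cannot be instantiated; the class Instantiation and the sample hierarchy are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Instantiation {
    // Hypothetical hierarchy used only for illustration.
    static class MyClass {}
    static class WithDefault extends MyClass { WithDefault() {} }
    static class NeedsArg extends MyClass { NeedsArg(int n) {} }
    static class Unrelated { Unrelated() {} }

    /** Analogue of query a: does the class declare a constructor without parameters? */
    static boolean hasNoArgConstructor(Class<?> c) {
        return Arrays.stream(c.getDeclaredConstructors())
                .anyMatch(k -> k.getParameterCount() == 0);
    }

    /** Analogue of query b plus the parameterized statement: instantiate every matching subclass of base. */
    static List<Object> instantiateAll(Class<?> base, List<Class<?>> candidates) {
        List<Object> result = new ArrayList<>();
        for (Class<?> c : candidates) {
            boolean isSubclass = c != base && base.isAssignableFrom(c);
            boolean isConcrete = !Modifier.isAbstract(c.getModifiers());
            if (isSubclass && isConcrete && hasNoArgConstructor(c)) {
                try {
                    Constructor<?> k = c.getDeclaredConstructor();
                    k.setAccessible(true);
                    result.add(k.newInstance());
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot instantiate " + c.getName(), e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> candidates =
                List.of(MyClass.class, WithDefault.class, NeedsArg.class, Unrelated.class);
        // Only WithDefault extends MyClass and has a constructor without parameters.
        System.out.println(instantiateAll(MyClass.class, candidates).size()); // prints 1
    }
}
```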
To typecheck queries, primitive type query should be annotated with the “type” of entities that a query yields. In the example above, complete definitions of a and b will be:
query<class> a = ...
query<class> b = ...
Consider now another example:
query<type> t = {#that, var, @that.startsWith('a')};
<<t>> x;
Query t yields types of all variables whose name starts with symbol “a”. These types are then used in the parameterized declaration statement.
The following query increases all integer variables by 1.
query<var> q = {var, of type int};
<<q>> = <<q>> + 1;
Type annotations (<class> , <type> , <var>, etc.) of queries might not need to be specified explicitly as they can be inferred in most cases from queries themselves.
Kinds of queries
How could it be possible to represent the “machinery” of a query so that one could “compare” them? A possible answer might be to define kinds of queries, in a way somewhat similar to kinds of types.
For example, the kind of query {var, of type int} is VAR*, denoting that it yields some variable subject to some conditions, while the kind of query {var age} is VAR (without the star), because it yields a specific variable defined in the query itself.
For query
{instance of <T>, declared class <T>}
the kind is CLASS* -> INST* , whereas kind of query
{instance of MyClass}
is CLASS -> INST*.
Finally, for query
{
declared class <A>, declared class <B>,
<A> extends <B>,
field, <A> has that
}
its kind is CLASS* -> CLASS* -> FIELD*, denoting that this query has conditions on two classes and yields a field.
Semantics and implementation
Query {var, of type int} yields all integer variables from the current scope. This query can be used in an assignment statement:
int x, y, z;
{var, of type int} = 0; // x = 0; y = 0; z = 0;
In compile-time semantics, queries are essentially treated as macros, and the assignment above is transformed into a sequence of assignments
x = 0;
y = 0;
z = 0;
In run-time semantics, the assignment is transformed to (Java/C#/…) code that emulates the query using corresponding reflection API.
For queries with constraints, such as {var, of type int, that > 0}, only run-time semantics shall be defined.
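To illustrate the run-time semantics described above, here is a sketch of what generated Java code might look like. Java reflection has no access to local variables, so the sketch assumes that x, y and z are fields of a hypothetical State object; the helper assignToIntFields is illustrative and is not the output of the actual prototype.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class RuntimeAssignment {
    // Hypothetical state object: fields stand in for the variables x, y and z.
    static class State {
        int x = 1;
        int y = 2;
        int z = 3;
        String label = "keep";
    }

    /** Emulates {var, of type int} = value over the int fields of target; returns how many were assigned. */
    static int assignToIntFields(Object target, int value) {
        int count = 0;
        for (Field f : target.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            boolean assignable = !Modifier.isStatic(mods) && !Modifier.isFinal(mods);
            if (f.getType() == int.class && assignable) {
                try {
                    f.setAccessible(true);
                    f.setInt(target, value);
                    count++;
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot assign " + f.getName(), e);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        State s = new State();
        System.out.println(assignToIntFields(s, 0)); // prints 3
        System.out.println(s.x + s.y + s.z);         // prints 0
    }
}
```

A constraint such as that > 0 would become an additional check on f.getInt(target) inside the loop, which is why such queries need values that are only known at run time.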
Design of LIRQ is still very experimental. I am implementing an early prototype using language workbench JetBrains MPS that allows extending Java with new constructs.
Since MPS uses projectional editing, there are no issues with possible ambiguities in concrete syntax of reflection queries (for example, no problem at all with curly braces in queries vs. curly braces in compound statements in Java).
Semantics of reflection queries is enabled by model transformation and code generation mechanisms of MPS. In compile-time semantics, generated code can also be statically typechecked.
Some related work
Object Constraint Language allows specifying conditions of the form context Person inv: self.age >= 0 in a manner similar to reflective queries.
Constraints on generic type parameters in C# and wildcards in Java are similar to qualifiers in the terminology of this post.
ECMAScript has support for computed names of object members, which resembles name queries.
C# has a nameof expression that returns the name of an identifier, which is exactly what “@” queries do.
Further ideas
Queries like {int, that > 10} can be seen as dependent types.
Similarly, declarations such as type number = {int | float}; resemble type classes (for example, in Haskell) and “multitypes”.
Reflection queries may be extended to access the AST, for example, query {declared [for loop] <S>, [counter] of <S>, @that} would yield names of all counters in all for loops.
Source: Medium