[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]
In this article we’ll see how programming can be understood, practiced and automated like a game of building tree graphs. We’ll look at the basic programming actions involving the manipulations of tree graphs, populating the nodes, and identifying nodes that can be expanded to add functionality.Like we’ve seen in previous posts, we will attempt to look at programming the way Turing may have looked at it, considering what is going through the mind of the programmer during development and attempting to automate it using Cypher annotated microservices.
Why Coding Is Actually More Like Parse Tree Graph Building
We are all used to the image of the developer using some kind of text editor to write code, but we also know that working code must have a parse tree structure built according to the rules of the programming language used. In fact, if we’ll inspect the successions of commits to a git repository using a parse tree builder like Antlr, we’ll see parse trees being evolved with new branches added with any new valid line of code.
Developers feel a lot more like text code writers than parse tree builders mainly because the tree building and validation process is partially hidden by compilers, and the complexity of the parse tree gets quickly overwhelming even for a few lines of code. Nevertheless, the mind of the programmer has to elaborate consciously or not precisely on these tree kind of graphs if they hope to get anything useful from their code.
So how do these parse tree graphs look like? Even for a simple Java variable declaration like:
Integer dimX = 3;
Antlr4 will depict for us (thanks to @the_antlr_guy and other Antlr contributors) a graph like this:
We could have then depicted this parse tree in the Neo4j browser with a look similar to the graph below, with green nodes representing the leaves of the parse tree:
A for loop to create a number of nodes using Neo4j Java API, expressed in 3 lines of code:
for (int i = 0; i < dimX; i++) { p[i] = db.createNode(pixel); }
would look this cool with Anltr:
and colored in the Neo4j browser:
Now we could have a better appreciation for the background work happening in the mind of coders juggling tens of programming language rules even for a single statement of code.
Getting Inside a Programmer’s Mind to Build a Turing Machine
So what patterns could have Turing noted by observing programmers solving software problems?
Here is a good time to go back to the exercise suggested in the previous post for writing the Java code to draw a pyramid structure in Neo4j. Because even the code for creating the base of the pyramid has a dizzyingly complicated parse tree, for the sake of simplicity, we’ll look at the code required to build just one line of nodes as pixels storing RGB values.
Of course wearing Turing’s hat today and trying to figure out an algorithm for coding similar to the mechanism of the Turing Machine might seem very speculative. But I think the activities below are likely to be observed in one form or another in any software development exercise.
For an easier understanding of the processes which majority of the programmers run in their minds at speeds which resemble magic for outsiders, I’ve prepared a short presentation where you can follow in parallel the changes in the code editor as well as the manipulations and gluing of large tree graphs into working software.
Turing would have observed that a first step would be a clear definition of the problem. As discussed in previous articles, this can have a graph representation for an initial context from where programming will start and a graph representation of the desired destination.
In the case of programming problems, the destination graph will have to include one or more nodes representing code files which will have to be built as part of the solution. These file nodes will need to have properties describing the language, paths and eventually links to specific test scripts which have to be passed by the code.
For simplicity and to focus on the process itself we just noted a few words on the context graphs, indicating that we want to get to a state where we need to have a Java program file which will need to build in Neo4j a graph composed from nodes labeled
Pixel
, storing RGB values and being linked with relationships called X
.The problem wouldn’t be a problem without a difference between the Initial Context and Target Context, and Turing might have noted that the difference between the two context graphs can be computed by listing the nodes, properties and relations which are different or missing between Target Context and Initial Context. Finding a solution would mean to find a number of actions to apply to the Initial Context, to add nodes, properties and relations in such a way that we transform it in the graph of the Target Context including software structures that will pass all required tests.
Think about finding a software solution as using an Uber for software services, which should bring you from the Initial Context point in the wonderful universe of code to a Target Context point. There might be a multitude of paths representing a solution, each of them being composed of different actions or steps in different succession, but all of them covering all the differences or distance between the two points.
If this starts sounding like a software project text, you are on the right spot. A project charter will define the initial and final context, a business analyst will produce eventually a gap analysis representing the difference between contexts. Architects will draft a path based on available tools and actions, and the dev team will implement each step on the path in successive contexts until all tests defining the destination will be passed.
If the team will wait for all paths to be decided before starting implementation, the process will look like a waterfall approach. The algorithm described below could be considered an agile path search in the universe of possible solutions.
Mapping between the Initial and Target Context
Once enough details of the problem are defined and a list of differences between the two graphs emerges, Turing would have observed that the programmer will start searching for known activities which will address one or more differences between the two contexts. We’ll call these activities microservices and in our example let’s assume that the programmer is aware of a microservice offered by the Neo4j Community which creates a stub file and equivalently a graph representing the parse tree of a Java file class initializing a Neo4j database in a given directory.
In a few search iterations on various combinations of delta elements, such a stub file/graph microservice will be at the top of the search result. (Can you guess why? It’s related to dependencies.)
For the programmer, this would feel like a good first step towards a solution because majority of the differencing and memory search operations happen so fast in brains that they don’t reach the consciousness level to allow us to have memories and talk about them.
Once this microservice is selected, and its Cypher query annotations are retrieved from memory, the programmer would be able to run them on the problem contexts. Let’s call the first Cypher annotation the ”Microservice Input Query” which will retrieve from current context the required input arguments which will populate specific leaves of the microservice graph.
In our example, this query will find a string value indicating which path is available for the solution of the problem.
With the input provided and the path node updated, the programmer is ready to write several lines of code in a Java file which is equivalent to creating the microservice stub graph in the graph of the problem. In practical visible activities, we would observe the programmer writing or copying chunks of code lines or the IDE (or some plugins) generating part of the code and the programmer inserting text in some very specific places.
However, wearing the Turing hat, we would realize that the writing will happen only after at least part of the microservice graph is “created” in the mind of the programmer which is the precondition for him or her to find the proper place to write a database path string to be used.
The stub graph will be then connected to a node in the current context representing a Java file that is found using a “Microservice Insert Query” in Cypher, associated (annotated) as well with the microservice.
This way the delta between the Target graph and Initial Context graph is reduced and the programmer is searching now for other microservices to address the remaining delta.
Navigating Future Insert Cycles
Now that we have the framework for the microservices insert process, let’s walk through another insert cycle which will add another statement of code to the text file and a branch of Java parse tree to the graph of the solution. And for more clear “voyance” :-), same as Turing, who observed many end-to-end calculations done by human computers before drafting the algorithm for his machine, we will peek at steps ahead of the current insert cycles.
(These clarifications from the future are often used by compilers and the wonderful people building them. Here these look-ahead steps are just for clarity as the generalized Turing machine operating on a market of microservices will be able to find its way automatically through the multitude of possible logical steps.)
So, in order to have a piece of code which will build in Neo4j a graph like the one below, after having the boilerplate code in place, the coder will have to create the nodes. Then with the nodes in the database, he or she would create the required relationships.
Before creating the Neo4j nodes and relationships, one by one or through an iterator or “for loop,” the coder will have to create and initialize in-memory any variables and related type definitions. The activities known to the programmer from previous coding classes or experience and from Neo4j trainings, or the equivalent microservices available for use/consumption in an open market of microservices, would have dependencies which are checked automatically through the associated Cypher queries.
Thus, a microservice building a Neo4j relationship will have as annotation a Cypher query (Microservice Input Query, or MIQ) which will search in the current problem context pairs of nodes as the ends of the relationship. That microservice is definitely relevant for the problem but can only be used after the nodes are created in the current context – before that the MIQ will return no results and the algorithm will continue its search through the list of relevant microservices until another microservice will create the nodes in the graph and will enable the relationship creation.
Similarly, the node creation will be dependent on statements of variables declarations and initialization and some microservices can include the generation and insertion of such statements and associated parse tree branches.
Enough look ahead, let’s go back under Turing’s hat and in the mind of the programmer. With the Neo4j stub code in place and an actionable microservice found (as described above) to cover the creation of the nodes and any prerequisite declarations, our programmer would have added a line of code to declare and initialize a “Pixel” label:
Turing would have noted the related two mandatory steps in its algorithm as in the two slides below showing the equivalent declaration graph inserted in the space of the problem:
Then the identification of the node where the new branch should be attached to the tree graph of the solution:
Turing would have observed that such cycles are repeated until all differences are covered and all required statement parse graphs are created and linked in the current context, with leaf nodes populated appropriately with arguments from the problem context. The solution tree graph, built in this case based on Java 8 grammar rules, contains in its leafs all the elements (tokens) of the text code and can be either parsed into a source file or into a class file.
A funny thing to think about is that even if the programmer has no clue what a compiler is doing internally or what a Java grammar file looks like, if he is able to produce that source code, he would have in his mind some form (more or less summarized) of the parse tree on which his thinking operated.
See below the full code and the complete solution tree as produced by Antlr4:
And here the equivalent parse tree graph exported from the Neo4j browser:
You can watch in the short 60-second movie below a quick succession of the development of the code in parallel with the development of the parse tree graph in an Antlr4 GUI format and then the evolution of the same graph in a Neo4j format.
Don’t hesitate to contact me via the comments below if you’d like to follow the steps slide-by-slide in a PDF PowerPoint file.
In the next article in this series, we’ll look at what other microservice annotations are necessary to make a programmer’s mind work, and how we can achieve a general problem solver status for our Turing Machine by implementing the algorithm on top of an open market of microservices.
Before finishing this post, I have to recognize the fact that this work including code samples, parse trees and related graphs, would not have been feasible without the use of a number of tools like: Neo4j Community edition, Eclipse, IntelliJ, Antlr4, the Java 8 grammar and MS PowerPoint.
Thank you, guys!
Learn how to solve problems just like this one using Neo4j. Click below to get your free copy of the Learning Neo4j ebook and learn to master the world’s leading graph database.