Editing source code as XML versus editing source code in structured form

By Joakim Ziegler

Since we announced Conglomerate, we've had a lot of feedback, much of it from people who envision new areas of use for Conglomerate, areas we hadn't even thought about. However, one question has come up more often than the others. The essence of it usually goes something like this:

"Now that we can edit XML so well in a structured environment, why don't we create an XML DTD for the structure of a programming language like C? Then we could leave plain text code editors behind and have on-the-fly syntax checking as we code."

At which point most experienced coders are on the floor with spasms. And rightly so: this sort of approach would probably be cumbersome and used by few people. But it got us thinking. Isn't there a good idea in here somewhere? Probably there is. Conglomerate wasn't really made for XML specifically; it was made for visualizing and editing documents that have structure. And C code most definitely has structure. What's more, that structure, or grammar, can largely be described in a uniform way using EBNF (Extended Backus-Naur Form), which incidentally is also the notation the XML specification uses for its own grammar, and which the content models in XML and SGML DTDs closely resemble.

The visual representation of code would also differ considerably from the type of presentation Conglomerate uses for XML structure today. Specifically, a lot of the fancy borders and delimiters would have to go, and the display would probably be very close to what you see in a text editor with syntax highlighting.

So what, then, would be the advantage? Well, Conglomerate with source code editing capabilities would have a full C parser built in. This means that rather than using regular expressions to recognise different parts of the code (as text editors do today to decide how to highlight syntax, for instance), the code would be represented internally as a syntax tree.
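To make the difference concrete, here is a minimal sketch in Python. It is not Conglomerate code: it assumes a tiny, made-up C-like subset (identifiers, numbers, string literals, and nested calls), and the names `Node`, `tokenize`, `parse_expr`, and `calls` are invented for illustration. It shows a text-level regular expression being fooled by a string literal, while a walk over the syntax tree is not.

```python
import re
from dataclasses import dataclass, field

# Grammar of the assumed toy subset, in EBNF:
#   expr = ident "(" [ expr { "," expr } ] ")" | ident | number | string ;
TOKEN_RE = re.compile(
    r'\s*(?:(?P<string>"(?:\\.|[^"\\])*")'
    r'|(?P<ident>[A-Za-z_]\w*)'
    r'|(?P<number>\d+)'
    r'|(?P<punct>[(),]))'
)

@dataclass
class Node:
    kind: str                      # "call", "ident", "number" or "string"
    text: str
    children: list = field(default_factory=list)

def tokenize(source):
    """Split source into (kind, text) pairs."""
    source = source.rstrip()
    pos, tokens = 0, []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_expr(tokens, i=0):
    """Parse one expression starting at tokens[i]; return (node, next_index)."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, text = tokens[i]
    if kind == "punct":
        raise SyntaxError(f"unexpected {text!r}")
    i += 1
    if kind == "ident" and i < len(tokens) and tokens[i] == ("punct", "("):
        node = Node("call", text)
        i += 1
        if i < len(tokens) and tokens[i] == ("punct", ")"):
            return node, i + 1     # call with no arguments
        while True:
            arg, i = parse_expr(tokens, i)
            node.children.append(arg)
            if i >= len(tokens):
                raise SyntaxError("missing ')'")
            if tokens[i] == ("punct", ")"):
                return node, i + 1
            if tokens[i] != ("punct", ","):
                raise SyntaxError(f"expected ',' or ')', got {tokens[i][1]!r}")
            i += 1
    return Node(kind, text), i

def calls(node):
    """Yield the callee name of every call node in the tree."""
    if node.kind == "call":
        yield node.text
    for child in node.children:
        yield from calls(child)

source = 'log("draw(x)", draw(x, 2))'
tree, _ = parse_expr(tokenize(source))

regex_hits = re.findall(r'\bdraw\(', source)              # also matches inside the string
tree_hits = [name for name in calls(tree) if name == "draw"]

print(len(regex_hits), len(tree_hits))                    # prints "2 1"
```

The parser also reports an unbalanced parenthesis as soon as it runs out of tokens, which is the kind of check a pattern-based highlighter cannot make reliably.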

The advantages of this are many. First of all, syntax highlighting would be exact: fast, and always recognising the parts of the code for what they are, since the editor would work from the parse tree rather than guess from text patterns. In addition, it would be easy to keep the code in a consistent state. Need to change the name of a function globally? Swap the order of its parameters? Without breaking stuff? Not a problem. It would also be easy to highlight places where you forgot a brace, etc., without attempting a compile. Adding LXR-like cross-referencing capabilities for functions, defines, and variables would take a minimum of work.
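As a rough illustration of the rename-and-reorder case, the sketch below (Python, with an invented `Call` node type and hand-built tree, not anything from Conglomerate) renames a function and swaps two of its parameters at every call site. Because the edit is applied to call nodes rather than to text, the string literal that happens to contain the old name is left alone. A real tool would of course also have to update the declaration and the definition.

```python
from dataclasses import dataclass, field

@dataclass
class Call:
    name: str
    args: list = field(default_factory=list)   # each arg is a Call or a plain-text leaf

def render(node):
    """Turn a tree back into source text."""
    if isinstance(node, str):
        return node
    return f"{node.name}({', '.join(render(a) for a in node.args)})"

def rename_call(node, old, new):
    """Return a copy of the tree with every call to `old` renamed to `new`."""
    if isinstance(node, str):
        return node
    name = new if node.name == old else node.name
    return Call(name, [rename_call(a, old, new) for a in node.args])

def swap_args(node, name, i, j):
    """Return a copy of the tree with arguments i and j swapped in calls to `name`."""
    if isinstance(node, str):
        return node
    args = [swap_args(a, name, i, j) for a in node.args]
    if node.name == name and max(i, j) < len(args):
        args[i], args[j] = args[j], args[i]
    return Call(node.name, args)

# Hand-built tree for: log("resize(w, h)", resize(w, h))
tree = Call("log", ['"resize(w, h)"', Call("resize", ["w", "h"])])

edited = rename_call(swap_args(tree, "resize", 0, 1), "resize", "set_size")
print(render(edited))   # prints: log("resize(w, h)", set_size(h, w))
```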

Also, since the Conglomerate server will have CVS-like capabilities, a new set of features becomes obvious. The CVS manual specifically states that it doesn't take care of merging in changes on the syntactical level, only on the textual level. That is, there's nothing in CVS keeping you from changing the name or parameters of a function in one source file, committing, and breaking everything else. If the server knows about code structure, however, it's comparatively easy to add warnings for, and even semi-automatic fixing of, this sort of problem.
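What such a server-side warning might look like can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `CallSite` record, the `check_commit` function, and the idea that the server keeps a table of function declarations per revision are ours, not a description of the Conglomerate server or of CVS.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallSite:
    file: str
    line: int
    callee: str
    argc: int        # number of arguments passed at this call

def check_commit(old_decls, new_decls, call_sites):
    """Warn about call sites that a change of declarations would break.

    old_decls / new_decls map a function name to a tuple of parameter types,
    before and after the commit (a hypothetical representation).
    """
    warnings = []
    for site in call_sites:
        if site.callee in new_decls:
            expected = len(new_decls[site.callee])
            if site.argc != expected:
                warnings.append(
                    f"{site.file}:{site.line}: {site.callee}() called with "
                    f"{site.argc} argument(s), but now takes {expected}")
        elif site.callee in old_decls:
            warnings.append(
                f"{site.file}:{site.line}: {site.callee}() was removed or renamed")
    return warnings

old = {"draw": ("int", "int"), "init": ()}
new = {"draw": ("int", "int", "int"), "setup": ()}
sites = [
    CallSite("main.c", 10, "init", 0),
    CallSite("ui.c", 42, "draw", 2),
    CallSite("ui.c", 50, "setup", 0),
]

warnings = check_commit(old, new, sites)
for w in warnings:
    print(w)
# main.c:10: init() was removed or renamed
# ui.c:42: draw() called with 2 argument(s), but now takes 3
```

Calls to functions that appear in neither table (library functions, say) are ignored here; a fuller check would compare parameter types as well as counts.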

Obviously, there's only so much time. We'd like to know how much interest there is in this type of functionality, and to hear more ideas for what it could be used for. If there's sufficient interest (and it doesn't take all that much, since we want something good to edit in ourselves), we will implement this sooner or later, but with all the other things that are going into Conglomerate, it probably won't happen until late spring or early summer 2000 at the earliest. If you'd like to work on this, let us know.