Last week I wrote the first post in this series, where I introduced the project and wrote about generating Rust code for the parsed Postgres AST.
I also wrote about the need for wrapper enums in the generated code, but judging by the questions and discussions that followed when I shared that post in /r/rust, I don’t think I went into enough detail.
So this week I will go into more detail on exactly why I had to do this.
- Part 1: Introduction to the project and generating Rust with Perl
- Part 1.5: More about enum wrappers and serde’s externally tagged enum representation
- Part 2: How I’m testing the pretty printer and how I generate tests from the Postgres docs
A Tagged Enum Example
To keep things simple, I’ll use some very simple JSON instead of the rather complex JSON we get back from the Pg parser. However, I cannot change the JSON to make parsing easier, just like I cannot do that with the Pg parser’s output.
I’ll use JSONPath-style expressions to refer to parts of the document. You can see that every object in the JSON is “tagged” with its type. Those are the title-case keys, such as Root.
Let’s assume that the $.Root.second key is optional, so it could be entirely omitted in some documents.
The Naive Approach
Now let’s make some Rust structs that correspond to this JSON. This corresponds to the naive directory in my example repo.
This is all pretty straightforward. We have a Root struct that can contain a Foo, an optional Bar, and zero or more
And here’s our parsing code:
So what happens when we run this?
We get this error:
The important bit is "missing field `first`", line: 29, column: 1. What’s at line 29, column 1 of our JSON document? That’s the end of the document.
So basically we’re seeing that the serde JSON parser looked through the entire top-level object for a first key but could not find one. That makes sense, since the top-level object in the actual document only contains a key named Root.
Fortunately, serde has a solution to this, in the form of its “externally tagged” enum handling. For this type of JSON, each object is annotated with an extra “tag” indicating its type, just like we see in our example JSON.
But the key word here is “enum”. Serde does not offer a way to handle this style of JSON without using enums. So I need to make a bunch of enums, one for each possible tag.
The So Many Enums Approach
This corresponds to the with-enums directory in my example repo.
And here are our structs and enums:
Note that the type of output is now RootWrapper instead of Root. This runs without an error, giving us:
Yay, it works! But it has tons of pointless enums. Boo!
The enums generally clutter up the code with a lot of destructuring. For example, to get at a struct nested inside the document, I have to destructure every wrapper enum on the way down.
In my Pg formatting code, multiply that destructuring by a thousand.
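To make the destructuring concrete, here is a minimal sketch that compiles with the standard library alone. The type shapes follow the description above, but the size field and the first_foo helper are hypothetical, and the serde derives that the real types presumably carry are left out.

```rust
// Hypothetical types modeled on the post's description. The `size` field is
// an assumption, and the serde derives are omitted so this builds with std only.
#[derive(Debug)]
struct Foo {
    size: u32,
}

// Single-variant wrapper enum that exists only to absorb the "Foo" tag.
#[derive(Debug)]
enum FooWrapper {
    Foo(Foo),
}

#[derive(Debug)]
struct Root {
    first: FooWrapper,
}

// Single-variant wrapper enum that exists only to absorb the "Root" tag.
#[derive(Debug)]
enum RootWrapper {
    Root(Root),
}

// Every step down the tree has to peel off one wrapper before reaching data.
fn first_foo(wrapper: &RootWrapper) -> &Foo {
    // With only one variant, these patterns are irrefutable, so `let` is enough.
    let RootWrapper::Root(root) = wrapper;
    let FooWrapper::Foo(foo) = &root.first;
    foo
}

fn main() {
    let doc = RootWrapper::Root(Root {
        first: FooWrapper::Foo(Foo { size: 42 }),
    });
    println!("{:?}", first_foo(&doc));
}
```

Two wrappers already cost two extra lines for a single field access, and each additional level of nesting adds one more.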
There must be some way out of here
When I shared this in /r/rust last week, /u/nicoburns had some helpful suggestions for working around this. We went back and forth, and I was able to get something that partly worked, but only for simple cases. I couldn’t get it to work for cases like Vec<Action>. In the Pg parser AST, I also end up with Option<Vec<Something>>, as well as tuples inside containers like Vec<(Foo, Bar)>, and probably some other weird things too.
What I would love is a solution that changes the code generated by the serde macros to just “skip over” the tag instead of creating an enum for it when the enum only has one variant.
A solution that still requires the wrappers and even more generated code for them would be fine, though I suspect it’d make the AST code’s slow compilation even slower.
I started digging into serde a bit to try to understand how I might do this, but it’s pretty complex, and I’m still pretty new to Rust.
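As a rough illustration of the "keep the wrappers, generate more code" option, the sketch below uses a hypothetical impl_wrapper macro to stand in for extra generated code. It assumes each wrapper has exactly one variant named after the struct it wraps. This is not how the project works today, and implementing Deref on a plain wrapper is a debatable style choice. The wrappers still exist here, but call sites no longer need to destructure them.

```rust
use std::ops::Deref;

// Hypothetical struct and wrapper; the `size` field is an assumption.
#[derive(Debug)]
struct Foo {
    size: u32,
}

#[derive(Debug)]
enum FooWrapper {
    Foo(Foo),
}

// Hypothetical macro standing in for additional generated code. It assumes
// the wrapper's single variant has the same name as the wrapped struct.
macro_rules! impl_wrapper {
    ($wrapper:ident, $inner:ident) => {
        // Borrowing access: `wrapper.field` reaches the inner struct.
        impl Deref for $wrapper {
            type Target = $inner;
            fn deref(&self) -> &$inner {
                let $wrapper::$inner(inner) = self;
                inner
            }
        }

        // Owning access: `wrapper.into()` moves the inner struct out.
        impl From<$wrapper> for $inner {
            fn from(wrapper: $wrapper) -> $inner {
                let $wrapper::$inner(inner) = wrapper;
                inner
            }
        }
    };
}

impl_wrapper!(FooWrapper, Foo);

fn main() {
    let wrapped = FooWrapper::Foo(Foo { size: 7 });
    // Field access goes straight through the wrapper via Deref.
    println!("size via wrapper: {}", wrapped.size);
    // Ownership can be moved out without an explicit match.
    let foo: Foo = wrapped.into();
    println!("{:?}", foo);
}
```

The cost is two more impl blocks per wrapper type, which is the kind of extra generated code that could slow compilation further.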
For now, I have enough other things to work on with this project. For example, the way I generate formatted SQL is horrific and unscalable (lots of inline some_str.push_str("WHERE ") and format!). I’m starting on a refactor to generate some sort of intermediate representation of the AST that I can then turn into a string.
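To show the general idea of an intermediate representation, here is a deliberately tiny sketch. The Piece enum and render function are hypothetical names invented for this example, not the design the refactor will actually use. The point is only that formatting decisions move into one place instead of being scattered across push_str calls.

```rust
// Hypothetical intermediate representation: a flat list of pieces that a
// single renderer turns into text. Not the project's actual design.
enum Piece {
    Keyword(&'static str),
    Text(String),
    Newline,
}

// All spacing and line-break decisions live here, in one function.
fn render(pieces: &[Piece]) -> String {
    let mut out = String::new();
    for piece in pieces {
        match piece {
            // Keywords are always followed by a single space.
            Piece::Keyword(keyword) => {
                out.push_str(keyword);
                out.push(' ');
            }
            Piece::Text(text) => out.push_str(text),
            Piece::Newline => out.push('\n'),
        }
    }
    out
}

fn main() {
    let pieces = vec![
        Piece::Keyword("SELECT"),
        Piece::Text("id".to_string()),
        Piece::Newline,
        Piece::Keyword("FROM"),
        Piece::Text("users".to_string()),
        Piece::Newline,
        Piece::Keyword("WHERE"),
        Piece::Text("id = 1".to_string()),
    ];
    println!("{}", render(&pieces));
}
```

A real pretty printer would need indentation and line-width handling on top of this, but the same split between building pieces and rendering them would apply.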
Here’s a list of what I want to cover in future posts.
- Diving into the Postgres grammar to understand the AST.
- How I’m approaching tests for this project, and how I generate test cases from the Postgres documentation.
- The benefits of Rust pattern-matching for working with ASTs.
- How terrible my initial solution to generating SQL in the pretty printer is, and how I fixed it (once I actually fix it).
- How the proc macro in the
- Who knows what else?
Stay tuned for more posts in the future.