House Absolute(ly Pointless)

Eating Vegan in Taiwan

Thu, 28 Dec 2023 11:45:00 +0800

This is a post on finding vegan food in Taiwan, with some context on the history of vegetarian food here. It’s not a list of vegan restaurants. Instead, it’s a meta-post about navigating Taiwan as a vegan.

About Me

I’ve been vegan since late 1996, so over 27 years at the time of this post (December, 2023). I’m currently starting my eleventh visit to Taiwan since 2000, and I’ve spent about 9 months here total across those visits. While I’m not fluent in Mandarin, I can speak and read a little bit. My wife is Taiwanese, so when I need to ask more complex questions I have a translator to help me. But I also go out to eat by myself here without my wife, so I have some sense of what it’s like to eat as a vegan who’s not fluent in the local language.

About You

This post assumes you don’t speak Mandarin or read any Chinese characters. If that’s not true, then some of this advice won’t be necessary, because you can just ask food vendors questions in Mandarin.

Tourists in Taiwan

People in Taiwan are generally quite friendly with visitors. If you learn even a little Mandarin, people will make an effort to communicate with you. Sometimes they’ll even try English, but the average worker in a restaurant or store probably speaks little to no English (or if they do they’re not comfortable speaking it, though they may understand some of what you say).

Edit February, 2024: I’ve realized that there’s very much a generational divide here in regards to English. Many younger people (< 40 or so) I’ve encountered in stores are more willing to try speaking English than on my first trips to Taiwan. Older people still do not speak English. That said, you’re still going to have a better experience as a vegan if you don’t rely on English.

Do You Need to Speak Mandarin to Eat Vegan in Taiwan?

This is complicated, and I’ll get into this more below. The short answer is “no”, but depending on how strict you want to be as a vegan, you may end up with a fairly limited set of food options.

Learn Some Mandarin

Learning a few phrases in Mandarin will go a long way. You can also have these written down on paper or in an easy to access place on your phone to show people. Learn to read and pronounce Pinyin so you can say these phrases with the proper tones. Without the tones people may not understand you.

“I am vegan” - 我吃純素 - Wǒ chī chún sù (literally “I eat pure vegetarian”)
- Note that this doesn’t exactly mean vegan as we use the term. You should follow up with the phrases below to tell them you don’t eat eggs or dairy.
“I don’t eat eggs” - 我不吃蛋 - Wǒ bù chī dàn
“I don’t eat milk, butter, cheese” - 我不吃牛奶、奶油、起司 - Wǒ bù chī niúnǎi, nǎiyóu, qǐsī
“I eat garlic and onions”¹ - 我吃大蒜和洋蔥 - Wǒ chī dàsuàn hé yángcōng
“English menu” - 英文菜單 - Yīngwén càidān (you can just say this to ask if they have one)

You should also learn things like “thank you”, “excuse me”, and so on. But this isn’t a general “Mandarin for tourists” post, which I’m sure you can find elsewhere.

Learn to Read Two Characters

The two most useful characters you can learn to recognize look like this: 素食

This means “vegetarian food”. Many vegetarian restaurants and food carts use these characters in their signage.

The Swastika (卍)

You may also see this symbol - 卍. Note that it’s the mirror image of the version the Nazis used. Don’t freak out! This is a Buddhist symbol, and any food vendor with this symbol will probably be vegetarian.

Tools You Need

Install the Google Translate Chrome extension. If you’re using Firefox, there is an unofficial extension for Google Translate. Learn how to use this on your laptop browser! It’s incredibly useful for translating websites.

If you’re on Android, the Chrome browser already has this built in. If you’re on an iPhone, translation is built into Safari. I’ve never owned an iPhone so I don’t know how well this works, but I imagine it’s decent.

Install the Google Translate Android app or iPhone app. This app is fantastic! You can point your camera at written text to get a translation, and in my recent experience this works reasonably well on menus. I’m able to “read” a Chinese menu using the app, though see below for more detail on understanding menus.

The Google Translate app also does real time (ish) audio translation, so you can have (slow) conversations with people in Chinese. That said, if a restaurant is very busy the staff may not be willing to do this.

English Menus

More upscale restaurants or restaurants that are part of a chain will often have menus with both English and Chinese. Sometimes they’ll have a separate English menu. Smaller mom and pop shops and food carts often don’t, though in touristy areas (like the most popular night markets) they might. But don’t assume you’ll find one. Be prepared to use Google Translate on your phone.

If you don’t look East Asian they may just give you the English menu or point you to where it’s posted. If you look East Asian you’ll probably have to ask for it.

Vegetarian Food in Taiwan

Taiwan has always had a lot of vegetarian food, at least since I first visited in 2000. There are many restaurants, food carts, and options in grocery stores. Historically, this is because Taiwan has a lot of Buddhists (35% according to Wikipedia), and Taiwanese Buddhism emphasizes vegetarianism.

However, Taiwanese Buddhism does allow animal products, as long as the animals aren’t directly killed to make the product. This means dairy has always been allowed. In the past, many Buddhist food vendors would not use eggs, because they might have been fertilized, which means eating one would kill the fetus. But ironically, the advent of factory farming means that eggs are more acceptable to Buddhists, since they cannot be fertilized when the hens live in confinement separate from roosters².

Buddhist practice in Taiwan also forbids garlic, onions, and anything else in the Allium genus, like leeks and chives. If you go into a non-veg restaurant and tell them you’re vegetarian in Chinese, you may end up with food that doesn’t have any garlic or onions.

Dairy and Eggs in Practice

Vegetarian food vendors serving Asian food mostly don’t use dairy, except in some desserts. That’s simply because it’s not part of the cuisine. They do often use eggs.

Vendors selling Western food often use dairy, and it’s common to see things like pasta with cheese in vegetarian restaurants. It’s relatively uncommon to find a vegetarian (as opposed to vegan) place which offers the same items with dairy alternatives. However, there are some vegan restaurants that do offer things like burgers or pizza with vegan cheese.

Reading Menus

At many restaurants, the menu is very minimal, containing only a title for each item without any indication of ingredients, how it’s prepared, etc.

To make matters even more confusing, when you use Google Translate, you sometimes get an unhelpful translation. This is often because it’s a common dish with an idiomatic name that translates to something like “soup, winter pink pigment”. Taiwanese people know what this is, but you won’t.

There’s not much to be done about this. If you’re adventurous and not picky like me³ then go ahead and order it.

The one thing I’d note is that if the translation includes the word “egg” or “omelet” then you should assume that it contains eggs!

Some restaurants that use dairy and eggs mark items which contain them on the menu. Often, this is done with graphics that look like a bottle of milk or eggs, but it’s not uncommon to see text instead. The commonly used characters are:

Has dairy - 奶食 (more literally translated as “milk vegetarian”)
Has eggs - 蛋食 (more literally translated as “egg vegetarian”)
Vegan - 純素 (more literally translated as “pure vegetarian”)
Contains ingredients from the garlic/onion family - 植物五辛素
- A dish with this marker may also contain eggs or dairy.

You may also see the word “vegan” written in English. In my experience this is generally accurate on menus. However, on food packaging this may not be correct. See this blog post on Taiwanese food labeling for much more detail.

Some fancier non-veg restaurants label vegetarian items with a green leaf or some other green icon. But most non-veg restaurants have no such labels. The only way to find out if they have vegan food is to ask.

Non-Vegan Mock Meat

The mock meats used in many Taiwanese vegetarian places may not be vegan, as some mock meats are made with eggs (or whey, I believe). However, it’s quite possible that the vendor won’t know what’s in them. I typically don’t worry about this too much, but if you are stricter than I am, you should be safe by only eating tofu and mock meats made with soy or wheat gluten (aka mock duck in the US). Avoid things like mock ham, chicken nuggets, etc.

That said, the menu items often don’t make it clear what the mock meat is. It’s very common to see a dish like “rice with meat sauce” at a vegetarian restaurant. Typically, this sauce will be made with a soy product or wheat gluten and some chopped mushrooms. But it could be made with some other mock meat.

Also, the menu may not indicate that an item contains mock meat at all. It might just say “fried rice” (in Chinese), but that fried rice might contain chopped mock ham.

There’s really no good way to handle this except with a more detailed conversation in Chinese (using Google Translate) or by only eating at 100% vegan places.

Finding Places to Eat

In my experience, the best tool for finding places to eat is Google Maps. Simply searching for “vegan food near me” or “vegan food Taipei” finds a ton of places, including both vegan and vegetarian options. You can also search for “素食”.

However, I would note that Google’s categorization of places as either “vegan restaurant” or “vegetarian restaurant” is quite random. It’s wrong in both ways, marking vegan places as vegetarian and vegetarian places as vegan. I’m trying to fix this a bit as I see it, but just assume it’s wrong.

Google Maps is used widely by Taiwanese people, so there are a lot of reviews for most places. It’s also the most accurate resource I’ve found for determining whether a business is still in operation.

You may want to use Happy Cow. From what I can tell, Happy Cow is mostly updated by English-speaking people who don’t live here long, so it’s often out of date, with many listings for places that are closed, while missing many great options. I would recommend that you always look a place up on Google Maps to confirm that it still exists before going there. Happy Cow is also often wrong about vegetarian versus vegan.

If you Google for “vegan food in Taiwan” you’ll find a lot of blog posts. Again, these are often quite out of date. Confirm that the places mentioned still exist before going there!

There are also some directory/map sites in Chinese:

https://food.suiis.com/ - In my experience this one is sometimes out of date and it will show you places that are closed, but it can help you find places that don’t show up with an English language search.
https://vegemap.merit-times.com/restaurant_list - I have no idea how up to date this is. I haven’t used it much.
https://ifoodie.tw/ - You can search for “素食” in various cities. I have no idea how up to date or complete this site is. I haven’t used it much.

Again, I recommend you double check any listing you find against Google Maps, which is the most up to date resource I’ve found for whether a business is closed or not.

Note that Google Maps is often wrong about the hours of operation. If you can, it’s good to call first.

Convenience Stores

Unlike in the US, convenience stores here in Taiwan actually have good food! A lot of them have vegan items. That said, from what I’ve read online, even if an item has English “vegan” or “plant-based” on the label, it may not actually be vegan. You can use Google Translate on the ingredients to double check.

Advice About Specific Food Items

Stinky tofu - This may not be vegan. Animal products are often used to start the fermentation process, including animal blood or dairy, so even a “vegetarian stinky tofu” vendor may not be vegan. You’ll have to ask.
Scallion pancakes - Some of these are made with lard instead of vegetable shortening.
Anything deep fried or boiled - This may be cooked in the same oil or water as non-vegan food if the vendor serves non-vegan items.
Bubble tea - Even though oat milk is common at coffee shops it’s rarely found at bubble tea shops, sadly. But bubble tea shops in Taiwan have many milk-free options.

Other Resources

Check out Nick Kembel’s post on the same topic as this one. It covers some things I don’t, like how to order using paper menus as well as more information on vegetarian (but not vegan) options.

The Absolute Best Way to Find Vegan Food in Taiwan

The absolute best way to find vegan food in Taiwan is with a Taiwanese person! I went so far as to marry one. That’s pretty extreme but you may want to consider this option if you struggle with finding vegan food in Taiwan.

I will explain why this phrase is useful below. ↩︎
A great demonstration of how deontological ethics fails without some consequentialism to go with it. ↩︎
I hate mushrooms! ↩︎

Naming Your Binary Executable Releases

Sun, 16 Apr 2023 17:50:12 -0500

My universal binary installer tool, ubi, has to deal with a lot of “interesting” decisions when it comes to how people name their releases on GitHub.

So in the interests of making the world of binary executable releases more machine-readable and a little less weird¹, here are my recommendations on naming your release files. The TLDR is:

Either use an extension or don’t include periods in the filename.
Use well-known operating system and CPU architecture names as part of the filename.

This applies to releases which just contain a single compiled executable (and maybe a readme or license file, but the binary is the important bit). This doesn’t apply to anything which ships its own installer, nor does it apply to files that contain packages, like debs or RPMs. It also doesn’t apply to programs in interpreted languages like Python or Perl, unless you are actually shipping a single-file binary using something like py2exe.

Either Include an Extension or Don’t Include Any Periods

Some folks want to release the bare binary without any compression. That’s fine. But if you do this, please don’t put any periods in the filename unless it also has an actual file extension.

The reason for this is that it’s very hard to determine whether “the text after the period” is a file extension or just part of the main filename. For example, given the filename my-cool-program-v1.2.3-linux-x86-64, what’s the file extension? Well, you and I, as humans², can tell that there is no extension. But for a computer, it sure looks like this file has an extension of .3-linux-x86-64!

So either give the file a proper extension, like .exe on Windows, or avoid periods in the name entirely.
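To make the ambiguity concrete, here is a minimal Rust sketch of what a dot-based extension parser reports for these names. It uses the standard library's `Path::extension`, which returns whatever follows the final period. The helper name `naive_extension` is made up for this illustration and is not how ubi itself is implemented.

```rust
use std::path::Path;

/// Returns what a simple dot-based parser considers the file extension.
/// (Hypothetical helper for illustration only.)
fn naive_extension(filename: &str) -> Option<&str> {
    Path::new(filename).extension().and_then(|ext| ext.to_str())
}

fn main() {
    for name in [
        // Periods in the version number, no real extension.
        "my-cool-program-v1.2.3-linux-x86-64",
        // Same name, but compressed, so there is a real extension.
        "my-cool-program-v1.2.3-linux-x86-64.tar.gz",
        // No periods at all.
        "my-cool-program-linux-x86-64",
    ] {
        println!("{name} -> {:?}", naive_extension(name));
    }
}
```

The first name yields `3-linux-x86-64` as its "extension", the second yields `gz`, and the third yields nothing, which is why the last two forms are the ones a tool can handle safely.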

A simple way to make sure all your files have an extension is to just compress all of them. What I typically see is .gz for Unix-y systems and .zip for Windows, but any sort of relatively common compression scheme is fine.

Include the Operating System and CPU Architecture in the Filename

And don’t make up your own operating system and architecture naming scheme!

Don’t use names like “linux64” or “win32”. Please include both the OS name and the CPU architecture name as separate components instead of some shorthand for both.

There’s no standard for this but there are many reasonable sources for this information.

Running uname -p on systems that support this will give you a reasonable architecture name.
Running uname -s gives you a reasonable OS name.
With Rust, the “target triple” for the rustc command can be taken from rustc -vV. You can use this target triple directly as part of the filename.
With Go, you can run go version, which will print something like go version go1.18.5 linux/amd64. Split that last bit on the / to get the OS name and CPU architecture.
- If you’re cross-compiling Go then you have to set the GOOS and GOARCH environment variables. You can use the contents of those variables in the filename too!
When you run gcc -v it prints a line like --target=x86_64-linux-gnu, where the target includes both the CPU architecture and OS name (at least on my system with GCC 9.4.0).
Running clang --version prints a line like Target: x86_64-pc-linux-gnu with similar info.

Other languages often include this in their version output, like in perl -V.

There are probably lots of other ways to get this information. The point is that you don’t need to make this up and hard-code an arbitrary string for each platform you build on. You can get a name that’s useful one way or another.
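As one more illustration, a Rust program can get these names from the standard library constants `std::env::consts::OS` and `std::env::consts::ARCH` instead of hard-coding them. The sketch below assumes a hypothetical `project-vVERSION-OS-ARCH.EXT` layout and a made-up `release_filename` helper; neither comes from ubi.

```rust
use std::env::consts::{ARCH, OS};

/// Builds a release filename with separate OS and CPU architecture parts.
/// The layout used here is an assumption made for this sketch.
fn release_filename(project: &str, version: &str, os: &str, arch: &str, ext: &str) -> String {
    format!("{project}-v{version}-{os}-{arch}.{ext}")
}

fn main() {
    // OS and ARCH describe the platform this program was compiled for,
    // so they stay correct when cross-compiling.
    let ext = if OS == "windows" { "zip" } else { "tar.gz" };
    println!("{}", release_filename("my-cool-program", "1.2.3", OS, ARCH, ext));
}
```

Because the name always ends in a real extension, the periods in the version number are no longer ambiguous.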

Different languages and tools don’t agree on exactly what to call various things, so we end up with “Darwin” and “macOS”, “x86-64” and “amd64”, etc. Still, there’s a fairly limited set of variations to account for when using the output from other programs. If you just make stuff up on your own the variations are limitless, and that’s not a good thing!
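A tool that reads release filenames can cope with a limited set of well-known spellings using a small lookup, roughly like the Rust sketch below. The alias lists and the canonical names chosen here are assumptions for illustration, not the actual tables ubi uses.

```rust
/// Maps well-known spellings of an OS name to one canonical form.
/// The alias list is illustrative and deliberately short.
fn normalize_os(name: &str) -> Option<&'static str> {
    match name.to_lowercase().as_str() {
        "linux" => Some("linux"),
        "darwin" | "macos" => Some("macos"),
        "windows" => Some("windows"),
        "freebsd" => Some("freebsd"),
        _ => None,
    }
}

/// Maps well-known spellings of a CPU architecture to one canonical form.
fn normalize_arch(name: &str) -> Option<&'static str> {
    match name.to_lowercase().as_str() {
        "x86_64" | "x86-64" | "amd64" => Some("x86_64"),
        "aarch64" | "arm64" => Some("aarch64"),
        _ => None,
    }
}

fn main() {
    println!("{:?}", normalize_os("Darwin"));
    println!("{:?}", normalize_arch("amd64"));
    // A made-up name cannot be mapped to anything.
    println!("{:?}", normalize_os("linux64"));
}
```

Every invented name adds another entry someone has to discover and maintain, while names taken from common tools fall into a table that stays small.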

You might be tempted not to include this because your program only works on one OS/CPU target. But please don’t skip this. It will still be useful for tools like ubi to have this info in the filename. And who knows, you might end up expanding the set of covered platforms in the future.

I’m mostly against making the world less weird, but I’ll compromise for machine readability. ↩︎
I am definitely not an alien or a dog in a human suit and I can prove it. Woof. ↩︎

Come (Maybe) Be the Boss of Me

Tue, 21 Mar 2023 09:31:24 -0500

When I started at MongoDB in May of 2022 I was the fourth person on my team. Since then, we’ve hired six more engineers, bringing us to a total of 10 people. That’s a big team!

That’s why we are hiring for a new Team Lead, so that we can split into two teams. I’m not sure which team I’ll end up on, but this is your chance to maybe be my new boss! If telling me what to do appeals to you then you should apply right away.

Note that this job is available remotely in North America.

Also, Engineering Team Leads at MongoDB still write code in addition to their management duties. Teams are generally smaller (you’d have 5 direct reports) for that reason. So if you’re interested in management but don’t want to give up coding, this is a great opportunity.

I’ve been really happy working at MongoDB for the last ten months. If you have any questions about the company, team, or product, please reach out to me. I’m happy to chat via email, phone, or video about my experience at MongoDB and any other topic.

Sleep No More Is My New Favorite Videogame

Fri, 17 Mar 2023 13:58:59 -0400

Last Saturday my wife and I saw Sleep No More in New York City. It’s a mostly silent film noir style adaptation of Macbeth as a play/dance piece. There are no seats. Instead you follow the performers around the space, which is four floors of a converted hotel. You can walk through nearly all of the sets in full, and you can more or less go where you want. You don’t even have to follow any performers at all if you don’t want to.

I really loved it and I’m already planning to go back in the future. As I was thinking about why I loved it, I realized that it tickles the same part of my brain as some videogames. In particular, I think it’s a lot like the experience of playing Dishonored and similar stealth games.

When I play a game like Dishonored, there are two aspects of it I enjoy the most. The first is simply exploring the spaces that the game gives you. I find it incredibly rewarding to form a map of a new space in my head, and to look into all the nooks and crannies for fun visual details. Sleep No More lets you do exactly the same thing. You can wander around the whole space, which is fairly large and filled with small visual details. I honestly think I could spend many hours just looking through the rooms without the performers there.

The other fun part of Dishonored is observing the stories of the characters around you. If you’re playing the game in stealth mode¹ you will spend a fair bit of time following the NPCs around, listening to their conversations and learning their routines. Simply doing this and staying out of sight is really fun. Sleep No More gives you a similar experience, though you don’t need to stay out of sight.

So I’m definitely planning to attend Sleep No More again, and this time I’m aiming for 100% completion with all achievements!

This is the only proper way to play it in my opinion. ↩︎

Cross Compiling Rust Projects in GitHub Actions

Sun, 05 Mar 2023 15:28:32 -0600

I was recently working on the CI setup for my ubi project with a couple goals. First, I wanted to stop using unmaintained actions from the actions-rs organization. Second, I wanted to add many more release targets for different platforms and architectures¹.

Replacing some of what I used from actions-rs was pretty easy:

dtolnay/rust-toolchain replaces actions-rs/toolchain.
actions-rust-lang/audit replaces actions-rs/audit.

But what about actions-rs/cargo? You’d think that running cargo wouldn’t even need an action, and you’d be right. Except that this action doesn’t just run cargo. If you set its use-cross parameter to true it uses cross to do the build instead of cargo, making it trivial to cross-compile a Rust project.

I was already doing some cross-compilation for all my Rust projects, and I wanted to add more. So I needed to replace this action with something of my own. I couldn’t find any already written, probably because everyone who moved away from actions-rs kept saying things like “this is too trivial to need an action, it’s just running cargo build.”

So for my first pass, I simply embedded the build pieces directly in the GitHub workflow for UBI, like this:

jobs:
  release:
    name: Release - ${{ matrix.platform.os_name }}
    if: startsWith( github.ref, 'refs/tags/v' ) || github.ref == 'refs/tags/test-release'
    strategy:
      matrix:
        platform:
          - os_name: FreeBSD-x86_64
            os: ubuntu-20.04
            target: x86_64-unknown-freebsd
            bin: ubi
            name: ubi-FreeBSD-x86_64.tar.gz
            cross: true
            cargo_command: ./cross

          - os_name: Linux-x86_64
            os: ubuntu-20.04
            target: x86_64-unknown-linux-musl
            bin: ubi
            name: ubi-Linux-x86_64-musl.tar.gz
            cross: false
            cargo_command: cargo

          - os_name: Windows-aarch64
            os: windows-latest
            target: aarch64-pc-windows-msvc
            bin: ubi.exe
            name: ubi-Windows-aarch64.zip
            cross: false
            cargo_command: cargo

          - os_name: macOS-x86_64
            os: macOS-latest
            target: x86_64-apple-darwin
            bin: ubi
            name: ubi-Darwin-x86_64.tar.gz
            cross: false
            cargo_command: cargo

    runs-on: ${{ matrix.platform.os }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Install toolchain if not cross-compiling
        uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.platform.target }}
        if: ${{ !matrix.platform.cross }}
      - name: Install musl-tools on Linux
        run: sudo apt-get update --yes && sudo apt-get install --yes musl-tools
        if: contains(matrix.platform.os, 'ubuntu') && !matrix.platform.cross
      - name: Install cross if cross-compiling (*nix)
        id: cross-nix
        shell: bash
        run: |
          set -e
          export TARGET="$HOME/bin"
          mkdir -p "$TARGET"
          ./bootstrap/bootstrap-ubi.sh
          "$HOME/bin/ubi" --project cross-rs/cross --matching musl --in .
        if: matrix.platform.cross && !contains(matrix.platform.os, 'windows')
      - name: Install cross if cross-compiling (Windows)
        id: cross-windows
        shell: powershell
        run: |
          .\bootstrap\bootstrap-ubi.ps1
          .\ubi --project cross-rs/cross --in .
        if: matrix.platform.cross && contains(matrix.platform.os, 'windows')
      - name: Build binary (*nix)
        shell: bash
        run: |
          ${{ matrix.platform.cargo_command }} build --locked --release --target ${{ matrix.platform.target }}
        if: ${{ !contains(matrix.platform.os, 'windows') }}
      - name: Build binary (Windows)
        # We have to use the platform's native shell. If we use bash on
        # Windows then OpenSSL complains that the Perl it finds doesn't use
        # the platform's native paths and refuses to build.
        shell: powershell
        run: |
          & ${{ matrix.platform.cargo_command }} build --locked --release --target ${{ matrix.platform.target }}
        if: contains(matrix.platform.os, 'windows')
      - name: Strip binary
        shell: bash
        run: |
          strip target/${{ matrix.platform.target }}/release/${{ matrix.platform.bin }}
        # strip doesn't work with cross-arch binaries on Linux or Windows.
        if: ${{ !(matrix.platform.cross || matrix.platform.target == 'aarch64-pc-windows-msvc') }}
      - name: Package as archive
        shell: bash
        run: |
          cd target/${{ matrix.platform.target }}/release
          if [[ "${{ matrix.platform.os }}" == "windows-latest" ]]; then
            7z a ../../../${{ matrix.platform.name }} ${{ matrix.platform.bin }}
          else
            tar czvf ../../../${{ matrix.platform.name }} ${{ matrix.platform.bin }}
          fi
          cd -
      - name: Publish release artifacts
        uses: actions/upload-artifact@v3
        with:
          name: ubi-${{ matrix.platform.os_name }}
          path: "ubi*"
        if: github.ref == 'refs/tags/test-release'
      - name: Publish GitHub release
        uses: softprops/action-gh-release@v1
        with:
          draft: true
          files: "ubi*"
          body_path: Changes.md
        if: startsWith( github.ref, 'refs/tags/v' )

I’ve actually cut quite a bit of this out, notably the other 15 or so platforms in the matrix.

Here are the highlights:

If it needs cross it will install that from its latest GitHub release using ubi itself. This is much faster than compiling cross by running cargo install. Otherwise it uses dtolnay/rust-toolchain to install the Rust toolchain.
It will run the build command (cross or cargo) in the appropriate shell for the platform. Using the right shell matters for some corner cases. Notably, ubi now depends on the openssl crate with the vendored feature enabled. With that feature, the crate will actually compile OpenSSL and statically link it into your binary. But OpenSSL fails to compile in an msys shell on Windows!

And that’s really it. I’ve extracted the generic bits and turned it into a reusable action called Build Rust Projects with Cross.

You can see it in use in the release job for ubi. The YAML for precious is nearly identical (which suggests that maybe I need to write another action).

For the 0.0.20 release, there are published binaries for 20 different OS/CPU targets, including many for Linux, some for Windows and macOS, and one each for FreeBSD and NetBSD. ↩︎

All My Perl Modules Are in Maintenance Mode

Sat, 11 Feb 2023 12:06:14 -0600

This is probably obvious to anyone paying attention to my CPAN releases over the past few years, but in case it’s not, I wanted to state this clearly. All of my Perl modules are in maintenance mode.

Why? I no longer do any professional work with Perl, and I haven’t done any since 2017 or so. All of my most enjoyable personal projects are in Rust these days. Also, I’m a bit burned out on the Perl community. I think as it’s shrunk some of the most unpleasant aspects of it have been amplified for me, and my time on the Perl and Raku Foundation board exposed me to even more of this, further burning me out. Please note that I’m not talking about TPRF’s board or volunteers here!

So here’s what “maintenance mode” means to me …

I will not personally work on significant new features. The definition of “significant” here is nebulous, but basically, if it takes more than a few hours of effort, I’m probably not going to work on it.

I may review PRs for significant features, but honestly, I probably won’t have the motivation for this. If you want to propose a significant new feature that requires careful review, then the best way to get this merged would be to also find some reviewers to look at it. If someone submits a bigger PR and a few other people that I trust¹ review it, I’m a lot more likely to merge it.

I will not personally make releases that break backward compatibility for most of my modules. I qualify this with “most” because there are some things that aren’t widely used that I’m okay with breaking, for example, my personal Dist::Zilla bundle and other things where I’m the main (or only) user.

I will probably not hand off ownership of widely used modules. Some of my code is very widely used, including things like DateTime and Log-Dispatch. I’m very wary of giving others control of these since breakage in one of them can break a lot of CPAN or a lot of existing applications. I do welcome help with maintenance in the form of PRs and PR reviews!

This includes modules like Specio, where it’s in wide use entirely because it’s used in DateTime.

Given all this, if, for example, you have a suggestion for a DateTime 2.0 feature, I think the best option is to fork DateTime (into a namespace not starting with DateTime) and go ham on it.

I will try to keep my Perl modules working with newer Perl versions. So far this has generally not been too difficult. If that changes, I will reconsider my stance on handing over ownership.

I will try to fix bugs in code and land mines in APIs. I will do this as long as they can be fixed in a backward-compatible way, without major surgery on the code, in a reasonable amount of my time. Backwards compatibility trumps correctness here, especially if the buggy behavior is documented. This goes double for “bugs” of the form “I don’t like this documented behavior”. That’s not a bug, that’s a design decision you disagree with. See above about forking.

One safe way to fix land mines is to add new methods/functions/parameters, like I’ve done with Time::Local. PRs to do this sort of thing are welcome and encouraged, with the caveat about significant features from above.

How long will I continue to do this? I don’t know. I’m amazed that people are still using code I wrote over 20 years ago! Will they still be using it 20 years from now? Maybe. Will there be new versions of Perl to deal with then? Will I still want to touch any of this code? I have no clue.

I can’t list all the people I trust. But I’d say that a good reviewer is someone with a long history of using Perl who understands The River of CPAN metaphor. If you want to know if someone would be a good reviewer, you can email me to ask or find me on the TPRF Slack. ↩︎

Big Changes in Precious v0.4.0

Sat, 19 Nov 2022 16:24:35 -0600

I just released version 0.4.0 of precious, my code quality meta-tool for configuring a collection of linters and tidiers for a project.

The headline change for this release is that command invocation configuration has changed, with the old run_mode and chdir keys being deprecated. Don’t worry, you can safely upgrade to this release, as the old config keys still work and do not cause precious to emit any warning yet¹.

Also, precious still works the same way by default as it always did. It runs the command once per file from the project root, passing the command the file’s path relative to that root.

The problem with the old configuration is that it didn’t really capture the full scope of possible ways to invoke commands. Specifically, it had a few shortcomings:

You couldn’t pass absolute paths to the command no matter what you did.
You couldn’t run a command per directory and pass any path to it except a dot (.) or no path at all.
You couldn’t run a command once and pass file or directory paths to the command.
You couldn’t run a command once from any directory except the project root.

All of these cases are addressed with the new system, which offers three command invocation keys, invoke, working_dir, and path_args. The documentation in the project’s repo has been updated. There’s also documentation on upgrading to v0.4.0 as well as docs on every valid combination of invocation config options.

Please install the new release and take it for a spin. If you encounter any problems please file a GitHub issue.

I do plan to add warnings in a future release and eventually remove support for the old keys. ↩︎

My Team at MongoDB is Hiring

Tue, 11 Oct 2022 10:23:50 -0500

My team at MongoDB is hiring a senior engineer. For this position you can be 100% remote or you can choose to work from one of our offices.

I’ve been at MongoDB since May of this year and so far it’s been great. If you have questions about the position, the team, or working at MongoDB, please reach out.

Fixing Some Bugs in My GitHub Profile Generator

Sun, 14 Aug 2022 11:28:22 -0500

A while back I was looking at the output from my GitHub profile generator and it seemed off. In particular, the language stats looked wrong. The generator sums up how many bytes of code I’ve written for each language and then calculates what percentage of my total output that represents.

Here’s what it showed, more or less:

Past Two Years	All Time
Perl: 76%, 9.5 MB	Perl: 77%, 11.3 MB
Rust: 21%, 2.7 MB	Rust: 18%, 2.7 MB
Go: 2%, 214.8 KB	Go: 2%, 368 KB

This isn’t obviously wrong. I’ve written a lot of Perl and I’ve been doing a fair bit of Rust recently. But the Rust numbers seemed excessive. Had I written 2.7MB of Rust code in two years? That’s a lot of code!
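For reference, the calculation described above can be sketched roughly as follows. This is not the generator's actual code: the repo names, byte counts, and function names are all made up for illustration.

```javascript
// Hypothetical per-repo data: language name -> bytes of code.
const repos = [
  { name: "repo-a", languages: { Perl: 6000, Rust: 1000 } },
  { name: "repo-b", languages: { Rust: 2000, Go: 1000 } },
];

// Sum bytes per language across all repos.
function sumBytesByLanguage(repos) {
  const totals = new Map();
  for (const repo of repos) {
    for (const [language, bytes] of Object.entries(repo.languages)) {
      totals.set(language, (totals.get(language) ?? 0) + bytes);
    }
  }
  return totals;
}

// Turn the byte totals into rows sorted by size, each with a rounded
// percentage of the grand total.
function languageStats(repos) {
  const totals = sumBytesByLanguage(repos);
  const grandTotal = [...totals.values()].reduce((a, b) => a + b, 0);
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([language, bytes]) => ({
      language,
      bytes,
      percent: grandTotal === 0 ? 0 : Math.round((bytes / grandTotal) * 100),
    }));
}

for (const { language, bytes, percent } of languageStats(repos)) {
  console.log(`${language}: ${percent}%, ${bytes} bytes`);
}
```

Because every percentage is a share of the same grand total, one very large file in a single language shifts the share of every other language too.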

So I filed a bug to remind myself to look at this later. Today was later.

I added some debugging output to my code to print out various bits of info as it went, focusing on each repo’s language stats. Eventually, I had it just print out bytes of Rust in each repo that had any Rust. That did the trick.

I realized that my Rust repos have huge amounts of generated code. For example, my tailwindcss-to-rust project exists to generate Rust code from Tailwind CSS. The repo contains an example of that generated code¹. That generated file is 613KB all by itself.

The fix was simple. GitHub uses Linguist for its language detection and stats. You can set attributes in your .gitattributes file to control how Linguist generates stats. Any file with a linguist-generated attribute is excluded from Linguist’s stats collection. So I went through and added this to my Rust repos.
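As a rough illustration, a .gitattributes entry of this kind looks something like the following. The paths here are hypothetical placeholders, not the actual files in my repos.

```gitattributes
# Hypothetical paths: mark generated files so Linguist leaves them out of its stats.
src/generated.rs linguist-generated=true
lib/Some/Generated/*.pm linguist-generated=true
```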

My Rust stat went down to 2.1MB. I’d have expected it to go down more, but I think that maybe some of what I marked as generated was already being excluded somehow.

And then it occurred to me that I have the same issue with some Perl repos too. Notably, DateTime-Locale and DateTime-TimeZone both contain ridiculous amounts of generated code. Apparently, I knew about this Linguist thing before because DateTime-Locale already had a .gitattributes file. But there was none for DateTime-TimeZone. Adding that removed about 6MB of Perl code from my stats.

So here are the new stats:

Past Two Years	All Time
Perl: 60%, 3.6 MB	Perl: 66%, 5.4 MB
Rust: 34%, 2.1 MB	Rust: 26%, 2.1 MB
Go: 3%, 214.8 KB	Go: 4%, 368 KB
HTML: 1%, 62.6 KB

That seems a bit more sensible. I’ve written a lot of Perl, but I haven’t worked on many of my Perl projects for a while.

I also noticed some weirdness with the count of PRs written and merged. When I run the profile generator locally I get a higher number than when it runs in GitHub Actions. That’s presumably because when I run it locally I use a GitHub API token that has access to private repos, so it sees private MongoDB repos.

But if I change the query to exclude private repos and run it locally, it gets a much lower number than it should. I’m not sure what’s going on here. Doing the query manually on the GitHub website I get numbers that match what the code gets in GitHub Actions, so I’m pretty sure that’s the right one. Confusing!

Just for good measure, I excluded all of my work-related orgs from the queries too. The point of the profile is to highlight my FOSS work, not my work work.

But even with this refinement I still get different results from GitHub Actions versus running it locally. If anyone has any ideas on why, I’d love to hear them!

GitHub is pretty slow to render this file. Be patient. ↩︎

What's the Right Way to Merge a Pull Request?

Sat, 02 Jul 2022 15:58:02 -0500

Edit: In the discussion on /r/programming a comment from /u/nik9000 pointed me at what I think is the best solution.

GitHub has a feature where the PR submitter can allow me to push directly to their fork. This means I can effectively edit their PR directly by checking it out and force pushing back to their fork of the repo! Apparently this has existed for a while but I didn’t notice it.

Thanks again to /u/nik9000 for pointing this out.

So I made a new saved reply on GitHub that I will use for all future PRs I receive. Here’s the content:

Hi, thanks for your PR! I’m pretty finicky about my projects (see this blog post for details), so I rarely merge a PR as-is. I can move forward on your PR in one of two ways:

I check it out locally, fiddle with it as needed, merge it locally, and simply close this PR. This will preserve at least one commit with your name on it, but the PR will show up as closed in your GitHub stats.
If you enable me to push directly to your fork, I can do my fiddling, then force push to your fork and merge the resulting PR. Again, this will preserve at least one commit with your name on it, but you also get credit for the PR merge in your GitHub stats. The only downside is that I will be force pushing directly to your fork.

Please let me know which approach you’d prefer. If I don’t hear from you before I get around to working on this PR I’ll go with option #1.

Thanks again for your contribution!

I’ve received a lot of pull requests over the years. But recently, I’ve been thinking about whether I merge them the right way.

When I wrote my GitHub profile generator, one of the stats I had it generate was how many of my pull requests were merged. My profile currently says I’ve created 562 PRs, of which 420¹ have been merged.

But in fact more than 420 have been merged in some form. It’s just that for some of them, the maintainer fiddled a bit with the submission and merged it locally via the CLI, then closed the PR.

My precious stats!

The issue is that I always do this for PRs submitted to me. I’m incredibly picky about the code in my personal projects, so it’s nearly impossible to submit a PR that I will merge as-is. Things I typically edit in PRs include:

Names of everything.
Code nits like when to include optional parentheses, exactly what operators to use when there are multiple options, and every other possible thing you can think of.²
Adding documentation for API changes/additions (people mostly forget to do this).
Comments, including the word wrapping of comments (I like the way Emacs does it when I hit alt-q).
Commit messages themselves. I like a very specific format, more or less following Chris Beams’s recommendations, except I’m okay with longer subjects.
Making sure the commits are organized well. I hate commits like “fix typo in last commit”. Just edit the previous commit!
Making sure commits that make public-facing changes also update the Changes file.

I could instead put the PR submitter through the wringer to do all this, but I’d rather not. The only way to get someone else to submit something that’s exactly what I want would be to try to operate them as a puppet via PR comments. That would be exhausting for me and infuriating for them.

So instead what I typically do is check out the PR as a local branch with the GitHub CLI tool (gh pr checkout 42), fiddle with their commit(s), make sure CI passes, and then merge it locally. This preserves their name as a committer in the git history, so they get some credit. It’s a bit weird, however, since the code with their name on it may be fairly different from what they submitted.

And they won’t get Internet points for the PR being merged. Hell, GitHub gives you achievements for this stuff! So I’m sure some folks would really prefer to have a proper PR merged.

One option I could offer would be to take their original PR, edit it, push it back to my repo as a new branch, then have them submit that branch as a PR. I would be open to doing this if someone really cared about getting that “PR merged” stat up.

So what do you think? If enough people told me they wanted this I would start offering that when people submit a PR. Or maybe there’s another approach I haven’t thought of?³ You can email me or discuss this on /r/programming.

hurr durr ↩︎
Fortunately, I’m able to suppress this when reviewing work PRs, but I channel all the insanity back into my personal projects. ↩︎
Note that “don’t be so picky” isn’t an option. ↩︎

My Perl and Raku Conference 2022 Write-Up

Sat, 02 Jul 2022 11:20:36 -0500

I went to The Perl and Raku Conference 2022 in Houston from June 22-24. Here’s my write-up.

Again, I’d like to thank my employer, MongoDB, for paying for my flight and hotel during the conference. We’re hiring for a variety of engineering positions, with many remote options. Contact me if you have any questions about the company or positions, and I’ll see what I can do to find out more.

The Venue and Location

The conference took place at a hotel near the big Houston airport (IAH) in a neighborhood called Greenspoint. A local friend told me it’s often called “Gunspoint”. My wife and I did not get murdered, so that was good.

One of the things I like to do at a conference is try all the interesting vegan food in the city. I rented a car¹ because I knew the venue was a bit outside the core of the city, and Houston is very sprawly. I ended up driving more the week of the conference than I do in a month at home. Most of the restaurants we went to were a 25+ minute drive.

The high temperature each day was around 97-100 degrees Fahrenheit with a fair bit of humidity, so going outside was like getting punched in the face by the sun.

The hotel itself was fine but they had the air conditioning dialed up to 11. So in some conference rooms, I had to wear jeans and a long-sleeved shirt to feel comfortable. Meanwhile, the best clothing for outdoors was … there was no good clothing for going outdoors.

There was some sort of issue with the projection in many rooms that seemed to make slides harder to read than usual. I think maybe the bulbs in the projectors needed to be replaced? Or maybe we needed to dim the lighting in the rooms?

The Talks

Here are some write-ups of many (but not all) of the talks I attended.

People Still Use Perl? - Twenty Years of Making a Living with a Dead Language - Ruth Holloway

Watch it on YouTube.

This is a talk about Ruth’s history with Perl. It’s not super technical, but it’s pretty interesting. It’s also really personal and emotional, which isn’t something you see a lot at programming conferences. I’m impressed that Ruth was able to be so open about herself!

NewFangled: Bringing NewRelic to Perl with Alien and FFI Technology - Graham Ollis

Watch it on YouTube.

Graham has created several CPAN modules to make FFI in Perl easier than it is with raw XS, including FFI-Platypus, which lets you implement FFI entirely in Perl.

I got to this talk a little late but the parts I caught were interesting, and mostly covered the API of this module and some other related ones like FFI-C.

This talk is probably mostly interesting to people interested in Perl, unless you’re implementing FFI tools for another language, in which case there’s a lot to learn from here.

Taming the Unicode Beast - Felipe Gasper

Watch it on YouTube.

This is a good talk for anyone interested in handling Unicode properly. Felipe did a good job of clarifying the difference between Unicode and UTF-8, characters versus bytes, and all the usual confusing parts of Unicode.

He made a good argument that the tools built into the Perl core aren’t good enough for proper UTF-8 handling and that you should use his Sys::Binmode module to fix it. He also explained a variety of potential UTF-8 gotchas in both Perl and XS code.

A lot of these issues stem from the fact that Perl does not distinguish between a scalar containing characters and one containing bytes at a type level. This should sound familiar since this is the issue that led to Python 3. So I guess that’s what we’ll get in Perl 7?²

I think this will be of interest to anyone interested in Unicode issues. And if you’re designing your own language, you should learn about this stuff!

A Nailgun for Raku - Daniel Sockwell

Watch it on YouTube.

A lightning talk about improving the startup time for Raku scripts.

Fun fact, the solution that Daniel discusses was implemented for Perl back in the early 2000s. Check out Matt Sergeant’s PPerl.

Everything old is new again.

Open Source, Self Hosted Password Management with Bitwarden + Vaultwarden - Daniel Sockwell

Watch it on YouTube.

I hope you’re all using a password manager. If not, you should be.

But did you know you can host your own password manager sync server? I’ll stick with 1Password because I’m lazy, though.

Modern Approaches to Ancient Perls - Brian Kelly

Watch it on YouTube.

I think I went to this partly because of the schadenfreude. I haven’t done much Perl professionally since 2017 or so, and I was always able to use a modern Perl, so this hasn’t been a problem for me.

But it was interesting to learn how to make older Perls less painful to use.

Command-line Filters - Time to Shine - Bruce Gray

The recording for this one didn’t come out, so I can’t link to it. I’ve heard a re-recording may happen so I will update this post if that happens and I notice the update.

This talk covered how to use command-line filters. I already knew a fair bit about this but I still learned some new things.

Three Ways to Make Wrong Code Look Wrong (er) - Daniel Sockwell

Watch it on YouTube.

I really liked this talk. It’s about three different approaches to making developer mistakes obvious. I preferred the version using the type system, and this approach can be used by any language with a sufficiently not-terrible type system (I’m looking at you, C).

Why Do Programmers Love Rust? - Dave Rolsky

Watch it on YouTube.

A heartbreaking work of staggering genius. Nothing will ever be the same. Anonymous Attendee

I was floored! Then I was ceilinged and walled too! Anonymous Attendee

He is clearly making these quotes up. Anonymous Attendee

I think this talk went reasonably well. I started rushing a bit towards the end because I was afraid I would run out of time, and then I ended a little early. Doh! I did practice this at home to check the timing, but for some reason, in the moment I felt like I was behind schedule. Next time I give this I’ll slow down a little (assuming I have a 50 minute slot).

Also, a few slides were hard to read. For the code examples, I realize now that it’d be better to use a light theme and to make sure that comments aren’t very dim as they are in the theme I used. So I switched the slides to the highlight.js VS theme, which looks much better.

For the screenshots of compiler errors, I’m not sure how to fix these. I suspect the ideal approach is to not use screenshots but to instead recreate the error in the browser in a large font. But that’s a lot of work.

A little bit of this was due to the projector issues I mentioned above, but mostly it’s on me to make better slides in the future.

However, looking at the video it’s all very readable so that’s nice.

And wow, I move my hands a lot when I’m giving a talk. Okay, time to stop watching this video of myself.

Meet the TPF Board

Watch it on YouTube.

This was a brief presentation about The Perl Foundation/Raku Foundation, followed by Q&A. If you wonder what the foundation does, this may help answer your questions.

The Perl Navigator: Code Intelligence for any Editor - Brian Scannell

Watch it on YouTube.

Last time I looked for an LSP server for Perl I gave up. The ones I tried didn’t seem to work.

But then the day before the conference proper, some folks were talking about Perl Navigator in the conference Slack, so I gave it a shot. And to my surprise, it worked quite well out of the box! And Brian fixed the one notable issue I had (not including ./lib in the module search path) quickly.

So I highly recommend Perl Navigator if you want an LSP server. And if you’ve never used an LSP server, you want an LSP server. Using lsp-mode in Emacs has greatly improved my productivity. I don’t know why I took so long to start using it.

So this talk is both about LSP in general and the implementation for Perl specifically. I think it will be of interest to anyone interested in LSP.

Mastering English in Perl - Makoto Nozaki

Watch it on YouTube.

By far the funniest and cutest talk at the conference. Watch it now!

IPv4 subnetting for humans - Teddy Vandenberg

Watch it on YouTube.

He says the math is simple but my brain still hurt. This will be useful for people who are smarter than me.

CLI Tools I Use - Dave Rolsky

Watch it on YouTube.

It’s me again. Just a quick talk on some CLI tools I use that might be of interest. This talk has no Perl content and should be of interest to anyone who uses the CLI.

This was inspired by Bruce Gray’s talk earlier that day on command-line filters.

SQL::Abstract - Caveat Emptor - Dimitrios Kechagias

Watch it on YouTube.

I was never a fan of SQL::Abstract because I find its API arbitrary and confusing. But apparently it’s also really slow.

Dispatches from Raku - Daniel Sockwell

Watch it on YouTube.

A quick summary of what’s new in Raku over the past year or so.

Advice for Presenters

Hey, you, Presenter! Have you tried reading your slides on a projector from thirty feet away? No, you clearly haven’t, because I can’t read your slides and I’m closer than that!

Here are some things to consider …

More than 10 lines of code on a slide is probably too much. You can get away with more lines if you use highlighting to just focus on a few lines, but be very aware of the font size.

Grey on grey is not readable. Blue on grey is not readable. Grey on blue is not readable. Many other color combinations are not readable!

Use high-contrast colors. Use higher contrast than you’d use for a web page. Remember, your audience may not be sitting up close and your projector probably will not render things as brightly as your laptop screen.

Less is more. Less text, less code, fewer boxes, fewer graphics. The more stuff you put on a slide the smaller that stuff is and the harder it is to read. Instead of one busy slide, make four, five, or ten simple slides!

Non-Conference Stuff

We stayed for a few extra days on either side of the conference so my wife and I could do some tourist stuff. On the Monday before the conference, we went to the Johnson Space Center, which was fun. I enjoyed walking through the 747 that flew the space shuttle around.

The day after the conference, I met up with a former coworker from ActiveState and his wife. We had lunch with them, then went to the art museum to see an M.C. Escher exhibition. This was great! They had lots of original prints as well as the wood he would carve to make them. They also had drafts and work from his planning process. The amount of work involved in producing these is amazing. My wife and I greatly enjoyed hanging out with them and seeing the exhibition.

Of the food we tried, I think the best was Trendy Vegan, which serves vegan Chinese. The salt and pepper tofu was fantastic. I also liked Tainan Bistro, which serves Taiwanese food. But I’m not sure how easy it would be to get vegan food there if you don’t speak Mandarin.

The Hallway Track

This is the best part of the conference. I enjoyed seeing old friends and making new ones. I was bummed that some folks couldn’t make it for various reasons, including visa and passport issues. I’m hopeful that they’ll be able to come next year.

I was quite happy that for the first time, someone with prior climbing experience joined me for my Rock Climbing BOF! He knew how to belay so I could do more than climb the autobelays. But it would have been nice to have more folks join us. Hopefully next year I’ll get a few more takers.

Next Year’s Conference

It will be in Toronto! The 2005 conference was in Toronto and I loved it. The weather was great and the city is very walkable, which is a huge plus for me. I didn’t like driving so much in Houston.

I paid for this out of pocket, since it was purely for entertainment purposes. ↩︎
This is a joke. ↩︎

Job Search 2022 Update: Postscript

Fri, 17 Jun 2022 11:10:27 -0500

I’ve been at MongoDB for six weeks now and it’s been great. BTW, we’re hiring and we have a lot of remote engineering positions¹. Please email me if you have questions about working at MongoDB or about specific positions².

But that’s not why I’m writing this post. No, I’m writing because apparently my job search wasn’t quite done.

Today. Today! TODAY! I got an email from GitHub saying that the position I applied for had been filled:

The role that you originally applied for has now been filled unfortunately. However, GitHub is continuing to build our team and we are still hiring for a number of roles. Please feel free to check out our Careers Page and see if there are any positions that look appropriate for you and please apply if so!

Yes, I’m definitely going to apply for another role. After all, it took them just 99 days to respond to my first application. I should apply again. Maybe I’ll hear back before I retire.

Also, back on May 16, I heard from Oso, who look like a hare next to GitHub’s tortoise. It took them just 67 days.

I really don’t understand what’s going on at either of these places.

Of course, both of them have beaten Netflix, which still hasn’t responded at all.

Tech hiring. It’s ridiculous.

Click on the “All locations” dropdown and you’ll see various remote options. ↩︎
No promises on whether I can answer those questions, but I can try. ↩︎

Restoring Window Positions in GNOME After Switching Monitor Inputs

Sat, 14 May 2022 14:05:58 -0500

I suspect that this title makes no sense to most people, so here’s the background.

Like most normal people, I have four¹ computers in my office. I used to have three, but that was shameful, so I was very relieved to get a new laptop for my new job at MongoDB.

A while back, I bought a USB switching device with a remote. This eliminated the need to physically switch my USB hub’s cable from one computer to another.

I have two monitors connected to these computers, and I switch between inputs on the monitors when I switch computers. I used to do this manually by using the buttons on the monitors, but this was annoying. I’ve used KVM switches before but my experience has been that they’re all junk, so I didn’t want to go that route again.

Fortunately, I found an awesome project in Rust called display-switch created by Haim Gelfenbeyn. It runs on Linux, macOS, and Windows as a background service. It listens for USB connect/disconnect events and then uses DDC commands to switch the inputs on the monitor. With this configured on each computer, I can use the USB switch’s remote to switch all the USB devices and the monitors together. It’s great!

And for a while, everything worked fine. I’d switch to my Windows computer for gaming, then back to Linux for day-to-day work and computing. But for some reason when I added my work laptop to the mix, something went wrong on my personal Linux desktop.

Suddenly, when the monitors switched, mutter² would move all the windows on my left monitor onto the right monitor. This was very, very annoying.

Surely, I thought, there must be a way to fix this. The actual issue has been discussed in various forums for quite a few years. Here’s a bug report for mutter on the topic, which has links to more bugs for Red Hat, Ubuntu, and gnome-shell.

I don’t think this had anything to do with my work laptop, exactly. Instead, it’s probably because I shifted some cabling around when I added my work laptop to the mix, moving my personal Linux desktop from HDMI1 to DisplayPort2 on my left monitor. This in turn changes the timing of when the monitor sleeps and wakes when the input is switched, and mutter reacts by moving all my windows around.

The display-switch project lets you run arbitrary commands when the USB device disconnects and connects. I wanted to use this to keep my windows where I put them.

In reading about the issue, I found some workarounds people had come up with, including a very creative one using wmctrl. But wmctrl only works with X and X is going away in favor of Wayland.

But then I read some more and discovered that GNOME has a comprehensive JavaScript binding that you can invoke with some D-Bus magic:

$> gdbus call \
 --session \
 --dest org.gnome.Shell \
 --object-path /org/gnome/Shell \
 --method org.gnome.Shell.Eval \
 "some_js_stuff(); and_more();"

Could I use this to somehow save and restore my windows? Yes, I could! When you run this command, you will get some output to stdout like this:

$> gdbus call \
 --session \
 --dest org.gnome.Shell \
 --object-path /org/gnome/Shell \
 --method org.gnome.Shell.Eval \
 '42'
(true, '42')

$> gdbus call \
 --session \
 --dest org.gnome.Shell \
 --object-path /org/gnome/Shell \
 --method org.gnome.Shell.Eval \
 'throw "Foo"'
(false, 'Foo')

The output is a list where the first item is a boolean indicating whether the code ran without throwing an error (I think), and the second is the value of the last statement executed or, on failure, the error output.
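To make that output shape concrete, here is a minimal sketch of a parser for it in plain JavaScript. It is not what my script does, and the function name parseEvalOutput is made up. It assumes the simple two-item shape shown above and does not handle any escaping gdbus may apply inside the quoted string.

```javascript
// Parse the one-line reply printed by `gdbus call ... org.gnome.Shell.Eval`,
// for example "(true, '42')". Only this simple two-item shape is handled.
function parseEvalOutput(stdout) {
  const match = /^\((true|false),\s*(.*)\)$/s.exec(stdout.trim());
  if (match === null) {
    throw new Error(`Unexpected gdbus output: ${stdout}`);
  }
  let value = match[2];
  // Strip the quotes that surround the string item.
  if (/^'.*'$/s.test(value) || /^".*"$/s.test(value)) {
    value = value.slice(1, -1);
  }
  return { ok: match[1] === "true", value };
}

console.log(parseEvalOutput("(true, '42')\n"));
console.log(parseEvalOutput("(false, 'Foo')\n"));
```

The first field says whether the evaluation succeeded, so a caller can treat a false value as an error and report the second field as the message.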

So I wrote a little Perl script to execute the JS I needed and parse the output to check if it worked.

Here’s the code in full:

#!/usr/bin/env perl

use v5.32;
use strict;
use warnings;
use autodie qw( :all );
use Capture::Tiny qw( capture_stdout );
use JSON::MaybeXS qw( decode_json encode_json );
use Path::Tiny qw( path );

my $POSITION_FILE
    = path('/home/autarch/.config/display-switch/window-positions.json');

sub main {
    if ( @ARGV && $ARGV[0] eq 'restore' ) {
        restore();
    }
    elsif ( @ARGV && $ARGV[0] eq 'save' ) {
        save();
    }
    else {
        die q{You must specify 'save' or 'restore' as an argument};
    }
}

# JS run inside gnome-shell to record each window's monitor and geometry.
my $SAVE_JS = <<'EOF';
const { Gio, GLib } = imports.gi;
let windows = {};
global.get_window_actors().forEach(function (window) {
  let mw = window.meta_window;
  let rect = mw.get_frame_rect();
  let title = mw.get_title();
  if (title === null || title === "gnome-shell") {
    return;
  }
  let id = mw.get_id();
  let w = {
    title: title,
    monitor: mw.get_monitor(),
    x: rect.x,
    y: rect.y,
    w: rect.width,
    h: rect.height,
  };
  windows[id] = w;
});

const filepath = GLib.build_filenamev([
  GLib.get_home_dir(),
  ".config",
  "display-switch",
  "window-positions.json",
]);
const file = Gio.File.new_for_path(filepath);
const [ok] = file.replace_contents(
  JSON.stringify(windows),
  null,
  false,
  Gio.FileCreateFlags.REPLACE_DESTINATION,
  null
);
if (!ok) {
  log("Could not write to file at " + filepath);
}
EOF

sub save {
    run_js($SAVE_JS);
}

# JS run inside gnome-shell to move each known window back where it was.
my $RESTORE_JS = <<'EOF';
const { Gio, GLib } = imports.gi;
const filepath = GLib.build_filenamev([
  GLib.get_home_dir(),
  ".config",
  "display-switch",
  "window-positions.json",
]);
const file = Gio.File.new_for_path(filepath);
const [ok, contents] = file.load_contents(null);
if (!ok) {
  log("Could not read from file at " + filepath);
}
const windows = JSON.parse(contents.toString());

global.get_window_actors().forEach(function (window) {
  let mw = window.meta_window;
  let rect = mw.get_frame_rect();
  let id = mw.get_id();
  let w = windows[id];
  if (w === null || w === undefined) {
    return;
  }
  mw.move_to_monitor(w.monitor);
  mw.move_resize_frame(true, w.x, w.y, w.w, w.h);
});
EOF

sub restore {
    # Sleep while waiting for the monitor to be active again.
    sleep(5);
    run_js($RESTORE_JS);
}

sub run_js {
    my $js = shift;
    my @command = (
        qw( gdbus call), '--session', qw( --dest org.gnome.Shell ),
        qw( --object-path /org/gnome/Shell ),
        qw( --method org.gnome.Shell.Eval)
    );
    my $stdout = capture_stdout(
        sub {
            system( @command, $js );
        }
    );

    # gdbus prints something like (true, '42'). Strip the parens, then split
    # it into the success flag and the rest.
    $stdout =~ s/^\(|\)$//g;
    my ( $ok, $err ) = split /\s*,\s*/, $stdout, 2;
    die "Error running GJS: $err" unless $ok eq 'true';
}

main();

The Perl parts aren’t that interesting. It’s the JS that’s doing all the work. Here’s the code to save the window positions:

const { Gio, GLib } = imports.gi;
let windows = {};
global.get_window_actors().forEach(function (window) {
  let mw = window.meta_window;
  let rect = mw.get_frame_rect();
  let title = mw.get_title();
  if (title === null || title === "gnome-shell") {
    return;
  }
  let id = mw.get_id();
  let w = {
    title: title,
    monitor: mw.get_monitor(),
    x: rect.x,
    y: rect.y,
    w: rect.width,
    h: rect.height,
  };
  windows[id] = w;
});

const filepath = GLib.build_filenamev([
  GLib.get_home_dir(),
  ".config",
  "display-switch",
  "window-positions.json",
]);
const file = Gio.File.new_for_path(filepath);
const [ok] = file.replace_contents(
  JSON.stringify(windows),
  null,
  false,
  Gio.FileCreateFlags.REPLACE_DESTINATION,
  null
);
if (!ok) {
  log("Could not write to file at " + filepath);
}

This loops through all the windows and records information for each window. It saves the monitor the window is on, its unique ID, its X & Y position, and its height & width. This gets written as JSON to a file every time the USB device is disconnected.

One odd thing is that global.get_window_actors() includes one window with a null title and another window for the gnome-shell process. I’m not sure what that null title window is, but it’s best to just skip it and gnome-shell.

The restore code is even simpler:

const { Gio, GLib } = imports.gi;
const filepath = GLib.build_filenamev([
  GLib.get_home_dir(),
  ".config",
  "display-switch",
  "window-positions.json",
]);
const file = Gio.File.new_for_path(filepath);
const [ok, contents] = file.load_contents(null);
if (!ok) {
  log("Could not read from file at " + filepath);
}
const windows = JSON.parse(contents.toString());

global.get_window_actors().forEach(function (window) {
  let mw = window.meta_window;
  let rect = mw.get_frame_rect();
  let id = mw.get_id();
  let w = windows[id];
  if (w === null || w === undefined) {
    return;
  }
  mw.move_to_monitor(w.monitor);
  mw.move_resize_frame(true, w.x, w.y, w.w, w.h);
});

It loads the saved window position info, then matches the current windows against the IDs of the saved windows. When there’s a match, it restores the window to the correct monitor, then sets its position and size.

One other thing to note is the sleep(5) in the Perl code’s restore subroutine. The program needs to wait for the monitor’s input change to take effect, or else none of this works. It’d be nice if display-switch offered an on_monitor_input_change_execute config option, but I’m not sure if that’s even possible. The sleep is a hack, but it works fine, so it’s good enough for now.

I just got a docking station for my work laptop, so I’ll be able to connect it to both my monitors as well, and I can use this program on that computer too if I need to.

I’m quite pleased with this solution. I thought it might be anywhere from very hard to impossible, but this turned out to be fairly easy. Most of my time was spent simply reading about the problem before discovering the GNOME JS API. Once I knew that API existed, the actual implementation was straightforward.

I also want to credit this /r/gnome post by MortimerErnest, which links to a bash script they wrote. Reading that script made it quite obvious how I could use the Gnome JS API for my own problem.

Well, more than four, because I also have a NAS, a network router, a Nintendo Switch, a PS5 in the closet, an iPad mini in the same closet, and a Raspberry Pi I bought over a year ago with which I intended to build an LCD panel clock, though I’ve not done so yet. And my phone is also a computer. This is a very normal number of computers to have. ↩︎
The default window manager for GNOME since GNOME 3. ↩︎

Software Job Search 2022 Retrospective: Coding Challenges

Tue, 19 Apr 2022 10:57:22 -0500

I did a lot of coding and design challenges during my recent job search! A lot a lot. And I have some thoughts about them.

I mostly have thoughts about the coding challenges. The design challenges were pretty much what you’d expect. They started with “design a system that has to do X”. Once I had an initial design they’d ask some questions about how to handle various types of changes in scale, requirements, etc.

I’ve given design challenges as an interviewer before. I think they’re great because it’s a chance to have a conversation on a technical topic with someone that is a lot like what you’d do when working with them. More places should do these!

How many of these types of challenges was I given? So many! Here’s a list:

Array - 1 take-home coding.
ClickHouse - 1 take-home coding - but I didn’t do this because it came just before I decided to accept the MongoDB offer.
Google - 1 live coding - but I didn’t continue after this, so there’s probably more that they do.
LogDNA - 1 live system design.
MongoDB - 3 live coding, 1 live system design.
Oden - 1 live coding (which I can’t remember very well), 1 live system design.
OneSignal - 2 live coding - I withdrew my application before getting a final response from them, but I think I’d already done their full interview process.
Optic - 1 take-home coding.

But what about the coding challenges? Let’s start at the very beginning, and ask what the purpose of this type of challenge is. I think there are a few things companies would like to learn from these challenges:

Can the candidate understand requirements and build software that fulfills those requirements?
Can the candidate write code that isn’t a complete mess? For example, do they break their code up into reasonably sized functions/methods/classes/packages?
Does the candidate understand various technical concepts at the level you’d expect given their experience? This includes topics like concurrency, serialization, REST APIs, etc.

Live Coding

I very strongly question whether you can learn any of these things from a live coding challenge. There are several reasons for this.

These sorts of exercises are ridiculously artificial, with very little resemblance to real-world work. They take place in a high-pressure situation in a very limited time. Many of them also further hamper the candidate with additional constraints.

It’s very common to do these in CoderPad (or an equivalent). CoderPad is hot steaming garbage and can go fork itself. It’s like a half-assed version of an IDE from 20+ years ago. It has very little customization, no code completion, and is very different from most developers’ preferred environments. Even I, a dinosaur who will have Emacs pried out of my cold dead hands, have finally started using LSP mode to turn Emacs into a proper IDE.

Why do companies use this tool? I don’t know. Looking at the feature list, it doesn’t seem like it does anything all that amazing. Pretty much every company was using Zoom for interviews, which lets you share your screen. CoderPad does let the interviewer type as well, but there are solutions to that, like the VS Code Live Share extension or the Chrome Remote Desktop extension.

At the very least, I’d like to see more companies offer candidates a choice of environments. Of all the companies I interviewed with, OneSignal was the only one that didn’t exclusively use CoderPad (or Google’s internal thing). For one of the challenges we used CoderPad, and then for the other I shared my screen. This was slightly awkward. I was using Zoom’s in-browser version, whose screen sharing is broken if you try to share your whole screen¹, and I needed to share multiple apps (Emacs and a terminal). So I had to quickly install the Zoom Linux app and hope it didn’t hard-lock my computer, as it loves to do.

The challenges I did live included topics like data (de)serialization, concurrency, and algorithms and data structures. For the algorithms and data structures ones, I was told not to look things up online, making the experience even more divorced from normal day-to-day software development. For the others, I was able to look up things like library APIs, and they tended to be more interactive.

The worst of these was with Google, where the interviewer was mostly mute as I stumbled through the problem. The other algorithms interview I had was with MongoDB and the interviewer was more of a partner in the coding process.

If I had to evaluate my own performance, I’d say that on my two algorithms/data structures challenges, I did either very poorly (Google) or somewhat poorly (MongoDB). For the others, I did either very well (finishing the problem in half the allotted time) or fairly well (finishing easily in the allotted time, but with more looking things up online or getting some help from the interviewer).

But when I think about why I did well on some of these and poorly on others, I think it comes down almost entirely to prior experience. There were several different data (de)serialization problems across different companies. I have a lot of experience with this. I’ve helped design a somewhat complex binary on-disk data format for which I wrote the writer and multiple readers. I also wrote a pretty cool (IMHO) JSON tidier, and I’ve dug a bit into the guts of serde, easyjson, and lots of Perl code for handling config files and other formats.

Unsurprisingly, when presented with a similar problem I can solve it very quickly. But that’s not because I’m an awesome programmer², it’s just because I’m repeating a task I’ve already done many times.

Similarly, the concurrency-related tasks weren’t too hard, in part because for my music player frontend I’ve had to work with async APIs and tasks a lot recently. So solving similar problems feels easy.

But if my past work history had been different, would I have done nearly as well? Almost certainly not.

On the flip side, the two algorithms questions were toy problems that bore no resemblance to any work I’ve ever done³. If I had to do something similar for work, I’d google the answer, cut, paste, and tweak some code, and probably have something reasonable working soon enough. But without that “crutch”⁴ to lean on, I didn’t do very well.

Given all that, I’m very skeptical that the interviewers got good answers to the questions I think they should be asking. Instead, I think they mostly learned if I’d done a similar thing in the past or not. That is somewhat useful information. I guess. Maybe. But I think it’s a pretty poor indicator of my future job performance.

Take-Homes

It’s obvious to me that take-homes do a much better job of answering the questions I posed above. The candidate can do them in a reasonable time frame, with less pressure, and in a familiar coding environment.

However, take-homes have a few big disadvantages for both the candidate and employer.

The big one is that they take longer in several ways. They use more of the candidate’s time, which is annoying for the candidate. For the take-homes I was given, I was told they should take 2-4 hours, as opposed to all of the live coding exercises, which were scheduled for 45 minutes or an hour.

And because it’s more time-consuming for candidates, it means that they may just not bother. We saw non-trivial attrition during this step of our hiring process at both MaxMind and ActiveState. We’d give them the take-home challenge and they’d disappear. I certainly don’t blame them!

It also tends to slow down the hiring process. The employer has to give the candidate a reasonable amount of time to do this. Most places gave me 3-7 days as a default, with a provision saying “let us know if you need more time”. That’s a 3- to 7-day stall in the hiring process. During that time the candidate might finish a bunch of live coding interviews with other companies and get an offer!

The other issue I saw with take-homes is that they’re harder to scope well. Notably, I think Array’s exercise was a bit under-specified and could have used more clarification of what was in scope and out of scope. I think I spent more time on it than was needed because of this.

Optic’s Challenge and Why It Was the Best

As I mentioned in a previous post, I really liked how Optic structured their challenge. They’re a remote company trying to work largely asynchronously, and the challenge reflects this. It started with an invite to a fresh Git repo that included the instructions as well as some existing code. The work involved extending an existing service in the repo with some functionality, though you could do this by writing a new service for just the new bits of the API. They also invited me to a Slack channel just for this challenge.

They encouraged me to treat this like I would if I was actually working there. So rather than just taking the instructions and working in isolation, I started by asking for a quick Zoom call with one of their devs⁵ to clarify a few points. Then later in the process, I filed some GitHub issues to ask more questions about details of the project and scope. The same dev responded on GitHub and we discussed the pros and cons of each implementation.

Another thing I liked was that they not only asked for code, they also asked for some documentation around design choices, trade-offs I’d made, and future directions for improvements. This is exactly the type of thing you’d do at work, right? The main difference is that you’d probably do some of that documentation first as part of the feature scoping. Then the final deliverable could include some documentation/stories/tasks/tickets/whatever for next steps on the project.

And finally, they paid me $300 for doing this. FWIW, I understand why this can be a bit tricky at some places, so I think a good alternative would be to offer to make a donation to a 501(c)(3)⁶ charity on the candidate’s behalf.

What About Existing Projects?

None of the companies that gave me any of these challenges offered to let me submit an existing project as a replacement. This surprised me. I didn’t ask any places about this. In retrospect, I wish I had, just out of curiosity as to why.

My guess is that they would say that they want to give all candidates the same process in the interest of equity. But I’m extremely skeptical that in practice this improves diversity in hiring.

If you’re coming into tech from an underrepresented background, are you more or less likely to have the free time and energy to spend three hours per company on take-homes? Are you more or less likely to get nervous and freeze up during a live coding exercise?

If anyone has more data about this I’d be very curious to learn more.

Takeaways

Ironically, when I was given the choice by OneSignal⁷, I chose live coding over a take-home. I was gambling that it would go well and I wanted to save some time. For me, live coding was better because it’s quick and I was fairly confident I could quickly handle most types of problems that I would get in these sessions.

I think more companies should offer this choice. This lets the candidate pick the option that they think will best showcase their skills.

But I think it’s probably worse for the company doing the hiring.

Is there any better way to evaluate someone’s abilities besides a take-home or looking at existing public projects? I can’t think of one. In my time in software engineering, I’ve seen a lot of people hired as developers who were fundamentally incapable of good work for a variety of reasons. Getting answers to the questions I posed above is critical for hiring.

So we’re left in this state where it’s hard to hire software engineers, but we feel like we have to put them through the wringer in the interview process anyway. What a silly, silly field I’m in!

I’m going to blame this on Wayland. ↩︎
Though I am ;) ↩︎
I’d honestly be surprised if any software developer I met had done something similar outside of leetcode-type exercises, though I’m sure someone somewhere has. ↩︎
Of course, this isn’t a crutch. No sane employer will ever complain that you found an answer quickly online as opposed to spending longer solving it from first principles in isolation! ↩︎
Yes, I know I said they work asynchronously, but they also made it clear that a kickoff Zoom call was an option. ↩︎
Or your country’s equivalent. ↩︎
The only company that offered this choice, IIRC. ↩︎

Job Search 2022 Update: The Last One

Fri, 15 Apr 2022 16:39:33 -0500

Assuming nothing unexpected and unhappy happens, this will be my final Job Search 2022 update. I accepted the offer from MongoDB and I start in a few weeks! Thanks again to David Golden for reaching out to me about MongoDB way back in early March.

Ultimately, my thinking came down to a decision between startups and public companies. I realized that given my age and my early retirement goal, I can best achieve that goal by focusing on higher total compensation that’s more secure¹, rather than going for another startup lottery ticket.

I want to give a special shout out to Optic. Of all the places I interviewed with, I liked their hiring process the most². The way they structure their take home coding assignment is a bit different from others, and I’ll go into that in a later post. They are also paying me $300 for doing it, and are the only company that has done this during my search. And as I noted in my last update, their offer includes options expressed as a percentage of the company, which is another outlier. All of this makes me think that working there would be great.

They are still hiring and if you want to get in on an early stage startup that’s not looking to eat your entire life I highly recommend applying.

So that’s this week’s TLDR, but I’ll still do the regular full update of all happenings in the past week.

The list of companies that never responded is still the same:

GitHub - applied on 2022-03-10
Netflix - applied on 2022-03-11
Oso - applied on 2022-03-10

I honestly am questioning my sanity a little here. I do have an automated email from Oso thanking me for my application, but I don’t see one from GitHub or Netflix. Did I not apply somehow? Do they have a system that doesn’t send automated “thanks for applying” emails? I checked my spam folder. Nothing. Maybe I’ll never know. Maybe I’ll get a response three months from now and have a laugh.

So what else happened this past week?

Things had gone well with Array, but on reflection I realized I didn’t understand their business well enough. Specifically, I have ethical concerns about fintech in general and their credit tools specifically. I don’t think this sort of thing is automatically bad, but I can imagine it being used for evil. I realized there’s no way I can figure this out short of working there for a few years. So I withdrew my application.

ClickHouse asked me to do a take home assignment this week. But I got it when I was pretty sure I’d be accepting a different offer on Friday, so I didn’t start it. I withdrew my application today.

I had two interviews scheduled for Monday afternoon with Fastly, but when I went to join the first one I realized the interviewer had not accepted the calendar invite. I think it’s because he’s someone I’ve hung out a lot with at Perl conferences in the past and so it wasn’t appropriate for him to interview me. But the person doing scheduling hadn’t noticed this. So I had one of two interviews that day, and the other was rescheduled with someone else for Wednesday.

I emailed the recruiter saying I wanted to move things along quickly, and ultimately the final interviews were scheduled for Friday the 15th. But at the same time, I’d started negotiating on final offer details with MongoDB. MongoDB’s recruiter called me back Friday morning and they met my increased ask, so I accepted³.

So I cancelled the Fastly interviews, which they had scheduled for me pretty quickly to meet my timeline. I do feel bad about that, but I would note that my application was submitted as an internal referral on March 9, 37 days ago⁴. I didn’t have my first interview with the relevant hiring manager until April 6, just nine days ago.

A couple folks I know through Perl who now work at Google reached out on Monday. One works in developer relations and thought that this would be a good fit for me. I asked whether it was possible to move super fast through the hiring process, because I wasn’t going to ask MongoDB and others to wait for several more weeks on the hope of an offer from Google. They thought it might be.

But it wasn’t, so this ended up not going anywhere. I wish I’d reached out to them directly earlier. From talking to them, it sounds like I’d have a better chance of getting hired for a developer relations position than the software engineering position a recruiter submitted me for at the beginning of March. It would’ve been kind of poetic if the very first place I interviewed, where I had a terrible experience, turned out to be the last place I interviewed and I ended up working there.

During my very first call with the recruiter from LogDNA, I told him (and later the hiring manager) about my plans to spend six months in Taiwan in 2023-2024. I tried to make sure I brought this up early with every company in case it was a blocker. So … it turned out to be a blocker. Needless to say, this was a bit frustrating since I’d already spent several hours in interviews with them and they’d made an offer. On the plus side, at least I didn’t also spend several hours on a take home assignment too.

Oden decided not to move forward. I didn’t get any feedback on why.

I hadn’t heard anything from OneSignal this week, so there are no updates about them other than that I withdrew my application.

Parting Thoughts

One of the questions I like to ask companies, especially when talking to people at the director/VP/C-suite level, is what challenges the company is facing now and expects to face in the future. A number of places said that hiring was a challenge. This is no surprise in such a hot market. But then why are some companies still so slow in their hiring processes?

That’s my biggest complaint about my job search. Many companies are surprisingly slow, either to just respond to an application, or to move the process along as it goes. To be fair, I complained about this on the hiring side at past employers as well.

I think more companies need to figure out how to make hiring their top priority for the people involved. There’s no reason not to try to schedule all the interviews in the space of a week. If there’s a take home assignment involved, you should give candidates a reasonable amount of time (one week is a good baseline), but then schedule everything else quickly once that’s done. This is especially true with a candidate like me who is jobless. I had plenty of free time to interview, and would’ve loved to go faster everywhere.

Otherwise I don’t have too many complaints about the various processes I went through, except for a few live coding challenges. I’ll get into that in a future retrospective post.

I’m quite happy to be done with the job search. It was exhausting! And I’m excited to start at MongoDB soon. I think the work will be interesting, and everything I know about the company says it will be a great place to work.

RSUs aren’t guaranteed to have any value in the future, but they’re a more reliable value than equity in a private company. ↩︎
Though all of the companies that I got into the process with were pretty good, modulo some being too slow. ↩︎
I hate negotiating. It always leaves me wondering if I should’ve asked for more. This is why I prefer public well-defined salary levels. ↩︎
perl -MDateTime -E 'my $dur = DateTime->today->delta_days(DateTime->new(year => 2022, month => 3, day => 9)); say $dur->in_units(q{days})' ↩︎

Job Search 2022 Update: Week 5

Sun, 10 Apr 2022 19:16:28 -0500

It’s week 5 and my brain is tired. Fortunately, I think I’m near the end. My current plan is to decide by Friday, April 15. I suspect that some of the companies I’m talking to won’t be in a position to make an offer then, but I already have some good offers, I want to finish this process, and there’s a limit to how long I can ask people to wait for me to decide.

Once again, I’ll start with the list of places that are just sitting on my application:

GitHub - applied on 2022-03-10
Netflix - applied on 2022-03-11
Oso - applied on 2022-03-10

So it’s basically been a month for all of these now.

This past week also had quite a few interviews, as well as some interesting developments that started on Saturday, April 2. That day, someone shared my blog post about my GitHub profile generator on Hacker News¹.

A hiring manager at Meta (aka Facebook) saw this post, realized I was in the midst of my job search, and reached out to me to see if I was interested in talking to them. I had kind of ruled Meta out because I’m unsure whether Facebook is a net negative for the world², as opposed to being positive or neutral. But interest is always flattering, and they pay the megabucks, so I figured it couldn’t hurt to learn more. I spoke to this manager on Monday, which was helpful.

They ended up connecting me to someone in Production Engineering for another informational interview. PE is Facebook’s SRE, but a little different. Ultimately, I ended up deciding not to apply for any position there, for a few reasons.

One reason is that I’d almost certainly have had to put in some study time for either a software dev or PE job, and I don’t have time to do that on my current schedule. Second, at least the PE jobs involve a lot of C++, which I’d like to avoid.

But the third and most important reason is that after thinking about this a lot I’m still unsure about whether Facebook is a net negative for the world. I just can’t see myself working somewhere that I’m not sure is at least neutral.

On to the rest of the updates …

I had a few more interviews with folks at Array. I think these went fairly well.

I met with the recruiter at ClickHouse and then had an interview with the hiring manager of the position I applied for. I think it went well, although I’m not sure if the position is exactly what I thought it was. I’ll have to learn more about it if I move forward.

Cockroach Labs is no longer on the “hasn’t responded” list. They responded with a rejection. It took them three weeks, but that’s better than not responding at all. Yes, I’m looking at you, GitHub, Netflix, and Oso.

I had my first interview with a hiring manager at Fastly. This went well, and I have several more interviews scheduled for Monday, 2022-04-11. I did emphasize that I’d like to move fast, and I appreciate their responsiveness to that.

MongoDB made an offer! It’s a good offer. But like with LogDNA, I told them that I wanted to decide this coming week. I also have a call scheduled with the VP of the division (team? department? sector?) I’d be on at MongoDB. I greatly appreciate the chance to talk to VPs and CEOs post-offer. Every company should do this.

I haven’t heard back from Oden. I will poke them this week if I don’t hear something soon.

I had my virtual onsite with OneSignal. This was a mix of interviews and coding challenges. For the coding challenge, I was warned that maybe I shouldn’t do it in Rust, so I did it in Rust anyway. It involved concurrency, which I’ve had to deal with in one of my personal Rust projects recently, so this worked out okay. I ended up learning about some new aspects of tokio, which was fun. I probably could’ve done it in Go or Perl too, but I haven’t done anything concurrent in either recently, and I didn’t want to spend the entire time reading docs.

I also had a technical challenge with Optic. They structure this in a really interesting way to make it more of a work simulation. This challenge went well, and they made an offer! It’s a good offer. Of course, I told them that I wanted to decide this coming week.

I have to give a huge shout-out to Aidan, Optic’s CEO, for one small but very important detail in the offer. The equity portion of the offer was described as a percentage of the company. This is the only number for equity that matters in most cases³. Many companies just give you a number of shares and the strike price. But without knowing what percentage of the company those options represent, this is not very meaningful.
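To make the arithmetic concrete, here’s a rough sketch. Every number in it is made up, and ownershipPercent and spreadPerShare are hypothetical helper names, not anything from a real offer. It shows that the same share count can mean very different stakes, and how the gap between a $2 and a $20 strike price looks at a $250 share price versus a $25 one.

```javascript
// All numbers in this example are invented for illustration.

// Percentage of the company that a grant represents.
function ownershipPercent(optionCount, fullyDilutedShares) {
  return (optionCount * 100) / fullyDilutedShares;
}

// Two hypothetical offers with the same number of options.
const offerA = ownershipPercent(10000, 1000000); // 1% of the company
const offerB = ownershipPercent(10000, 50000000); // 0.02% of the company

// Per-share gain at exit: sale price minus strike price, never below zero.
function spreadPerShare(sharePrice, strikePrice) {
  return Math.max(sharePrice - strikePrice, 0);
}

// At $250/share, a $2 vs. $20 strike is 248 vs. 230 per share.
const bigExit = [spreadPerShare(250, 2), spreadPerShare(250, 20)];

// At $25/share, the same strikes give 23 vs. 5 per share.
const smallExit = [spreadPerShare(25, 2), spreadPerShare(25, 20)];

console.log(offerA, offerB, bigExit, smallExit);
```

Without the fully diluted share count, there’s no way to tell offer A from offer B, which is exactly why the percentage is the number to ask for.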

So things have been going well, and I’m very optimistic about finishing up this coming week. Stay tuned for what might be my last update in a week or so. I also plan to write a few retrospective posts on recruiters, coding challenges, and the ethical impact of my work after that.

Why didn’t I share this myself? I missed out on 83 points of karma! 83! ↩︎
I feel the same way about much of social media. ↩︎
Strike price can matter too, but only with a smaller exit (probably an acquisition) where the spread between strike price and share purchase price isn’t that big. But if the stock is trading at $250/share, the difference between a $2 and $20 strike price is minimal. ↩︎

Job Search 2022 Update: Week 4

Sat, 02 Apr 2022 10:07:37 -0500

Has it been four weeks? I can’t remember. I’ve been told there was a time before the interviews, before the coding challenges, before the system design questions. But I can’t remember. That life is gone, lost in the haze. All I have now is the interviews, the endless stream of questions and scheduling requests. My only hope now is that this infinity will end, but how can infinity end? That’s the paradox that consumes my waking thoughts.

If I add it all up, I had 11.5 hours of interviews this past week. That doesn’t seem like much when I write it down, but it sure felt like a lot while I was doing it.

Fun fact, a lot of the people I talk to have read these blog posts! That’s not surprising, since I link to this blog from my resume and online profiles. From what I can tell, nothing I’ve written has alarmed anyone too much.

Originally, in my week 2 post I had said I am “terminally blunt” and someone did ask about that. I realized that wasn’t the best wording, and I’ve since edited that blog post. What I was trying to express was that I don’t like the types of social lies where I’m lying and the person I’m speaking to knows I’m lying, but social convention says I should lie anyway. I’d rather just tell the truth. But the phrase “terminally blunt” makes me sound like I could be a huge asshole. I will leave it to those who know me to judge whether I am, but I am sure that none of my references will say that I am.

Again, I’ll start with the list of places that simply have not responded beyond the initial automated “thanks for your application”:

Cockroach Labs - applied on 2022-03-21
GitHub - applied on 2022-03-10
Netflix - applied on 2022-03-11
Oso - applied on 2022-03-10

I added Cockroach Labs since it’s been two weeks since my application now.

ClickHouse is no longer on the list! I have a first interview with someone there next Monday. Can I get through the process with them and other slow movers before I have enough offers that I should just pick one? Maybe they’ll move fast. Let’s see.

I didn’t submit any new applications this week, and at this point I’m responding to all¹ new incoming requests with “thanks but I’m too busy for more interviews”.

Here are this week’s updates …

I had an initial call with a recruiter at 1Password. I had applied for an engineering manager position because I thought from the job description the role involved a fair bit of IC work too. But it was much less than I wanted. The recruiter said they’d run my resume and interests by all hiring managers to see if maybe there was a management position that fit what I wanted, or if they wanted to consider me for an IC role. The next day I got a generic “we’ve decided not to move forward with the role you applied for” email. So I’m not sure if they actually considered me for anything else.

Maybe they thought I was trying to do some sort of bait and switch application. Is that a thing? How could that possibly work? I didn’t apply for any IC roles because at the time I didn’t see any that seemed like a good fit. Maybe I was right and that hasn’t changed. Oh well.

I still hadn’t heard back from Array by Tuesday, March 29, six days after submitting my homework, so I emailed the recruiter to check in. My homework was well received and they just needed to get the CTO’s feedback. I ended up meeting with the CTO on Friday morning and he was very positive about my application. They’re hiring for several teams so we talked about which might be the best fit for me, and I’ll move forward with one of those teams next week.

I spoke to a different recruiter at Fastly and I have an interview with a hiring manager scheduled for next Wednesday. I’d like to move this one along since at one interview per week the process will take way too long. If things go well next week I will politely tell the hiring manager that.

I got an offer from LogDNA! It’s a good offer, but I told them that I need to wait to see other offers before I make any decisions.

I had a metric forkton of interviews with MongoDB on Monday and Tuesday. This included a couple live coding exercises, one on concurrency and one on algorithms, as well as a live systems design challenge. I think I did really well on the concurrency exercise, fairly well on the systems design challenge, and not so well on the algorithms exercise. I have a lot more to say about these sorts of live exercises, as well as homework/take-homes, but I’ll save it for after I accept an offer².

I did a live coding exercise with OneSignal as well. This went quite well. I have a “virtual onsite” (aka circa three hours of interviews) scheduled for next week.

One thing about OneSignal that stands out is the detailed interview guide materials that they give you. These materials detail what types of interviews you’ll be having, what topics they’ll cover, the qualities they’re looking for, and they give some guidance on how to prepare. This is great! Every company should do this! If I end up somewhere besides OneSignal I might try to make this happen there.

I met with several people from Oden, including the CEO, another engineering manager, a software engineer, and a product person. This included a systems design challenge with an interesting structure. For the first hour, I worked with an engineer who presented the problem. I sketched out a design in Lucidchart and made a bunch of notes. Then after a short break, I met with a product person. I explained the design to them and we talked about various potential issues.

If I had to pick one company right now based purely on the product (ignoring offer details, tech stack, company size, etc.), it’d probably be Oden. As a refresher, they do data collection and analytics for factory production lines. This is quite interesting, and it’d be totally new for me.

But of course, all those other details besides the product matter a lot too. And fortunately, every company I’ve spoken to has an interesting product that will involve some sort of new challenge for me, whether that be its scale, learning a new field, or something else.

I had a couple more interviews with Optic, including one with an engineer and another chat with their CEO. They’re a very early stage startup (just 5 people right now, I believe), but it sounds like they have a reasonable work/life balance. I greatly enjoy having a chance to talk to company CEOs and learn more about the product vision, their growth and funding plans, the challenges they face beyond the technical ones, and other high-level details. This is a perk of interviewing with smaller companies, and it was an aspect of my interview process at ActiveState I liked a lot back in 2017.

And that’s this week’s update. The list is slowly shrinking, and I’m hoping that I will be able to make a decision by Friday, April 15 at the latest. After that, I suspect the gating factor on my start date will be laptop availability.

If someone reaches out about a job that pays $1 million a year with 20 weeks of PTO and a 4-day week, I’ll definitely respond. Again, my email is autarch@urth.org. ↩︎
Translation, I’m gonna complain about these big time but I don’t want to do it mid-process in case it makes any potential employer mad. ↩︎

Yet Another GitHub Profile Generator

Mon, 28 Mar 2022 10:09:46 -0500

A number of the companies I’ve been interviewing with are using Guide to create an “interview guide” that they send to candidates. I really like this. It includes information about the interview topic, interviewer profiles, and relevant links to pages on the company website and/or blog.

For one company, an interviewer profile linked to that person’s GitHub profile, so I took a look. Instead of the standard profile, they have a custom thing with lots of fun stats about their activity on GitHub. Neat!

But how do I do that? It took me an embarrassingly long time to figure out what the GitHub feature for this was named. The docs call it “Managing your profile README”. Doing it is simple. If your account name is “githubuser”, you make a new repo called “githubuser”. Then any README.md in that repo will be displayed on your GitHub profile page.

I think this feature has existed for a long time but I somehow managed to miss it. I took a look at how the profile I linked to, Eric’s, is generated. It’s using the lowlighter/metrics GitHub Action. Each plugin generates an infographic, and you control what is generated by which plugins you use in your GitHub Actions workflow. The actual generation uses GitHub’s API (or other services for a few plugins).

But while I like how it looks, I don’t like that it only generates images. This means that nothing is clickable, so you can’t link to recent PRs or repos with recent activity, for example.

Fortunately, there are many other options, as a search for “github profile generator” shows. One of the most popular appears to be Rahul Jain’s. You fill out a web form and it gives you some markdown to turn into your profile.

But I don’t like this one either. The content it generates is essentially static. If you want to update it you need to go to the form and change things (though you can save your config between uses).

I wanted something that would regenerate regularly, and I wanted it to include whatever I wanted. I also wanted the output to be Markdown (without lots of images) so it could have clickable links. There was nothing else to do but to build yet another profile generator!

Does the world need another profile generator? No, it doesn’t. Did I do it anyway? Yes, I did.

You can see its output in my profile right now.

How It Works

I wrote it in Rust. Is Rust the best language for this? Yes, because I want to learn more Rust.

The code lives in the autarch/autarch repo itself.

I started with octorust, which uses GitHub’s REST API (the V3 API). REST APIs are usually quite easy to understand, even though you end up with a lot of API calls in many cases. However, at one point I found a weird bug in octorust where the Repos::list_languages method inexplicably returns a Result<i64> instead of the data it should return.

The octorust crate is entirely generated, and the generated code lives alongside the generator in the oxidecomputer/third-party-api-clients repo. That repo has issues disabled, so I couldn’t report this (though it does allow PRs).

In the meantime, I’d been looking at how lowlighter/metrics was implemented and saw that it was using GitHub’s GraphQL API (the V4 API). I find GraphQL more challenging to use than REST since you need to formulate queries from scratch, rather than using predefined REST endpoints that correspond to specific resources. But the big advantage of GraphQL is that you can get exactly the data you want across multiple resources in a single query.

I decided that this was a good opportunity to learn something new, so I took some queries from the lowlighter/metrics repo and started massaging them to give me what I wanted.

At first, I tried a single query to give me all the repos to which I had access via the User repositories field. But for some reason, this seemed to skip some repos in my houseabsolute organization. There’s a good chance this was user error on my part, but I eventually ended up simply doing one query for the user and one for the org.
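As a rough sketch of what "one query for the user and one for the org" can look like, here are two nearly identical GraphQL queries and a small helper that wraps one in a JSON request body. The selected fields, the constant names, and the `graphql_body` and `json_escape` helpers are assumptions for illustration, not the generator's actual queries or code.

```rust
// Hypothetical queries for illustration only; the real generator's
// queries are not shown in this post.
const USER_REPOS_QUERY: &str = r#"
query UserRepos($login: String!) {
  user(login: $login) {
    repositories(first: 100, ownerAffiliations: OWNER) {
      nodes {
        name
        stargazerCount
        primaryLanguage { name }
      }
    }
  }
}"#;

const ORG_REPOS_QUERY: &str = r#"
query OrgRepos($login: String!) {
  organization(login: $login) {
    repositories(first: 100) {
      nodes {
        name
        stargazerCount
        primaryLanguage { name }
      }
    }
  }
}"#;

/// Escapes a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the JSON body for a GraphQL POST: the query text plus a single
/// `login` variable.
fn graphql_body(query: &str, login: &str) -> String {
    format!(
        r#"{{"query":"{}","variables":{{"login":"{}"}}}}"#,
        json_escape(query),
        json_escape(login)
    )
}
```

The two queries differ only in the root field (`user` versus `organization`), which is why the response shapes end up so similar.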

I used the graphql_client crate to make these requests. At first, I used its macros to generate code at compile time from my queries. But the huge downside of this is that it doesn’t play very nicely with code completion. The generated structs are quite complex, and the only way for me to find out what fields they contained was to use cargo-expand. But that didn’t help with code completion.

Fortunately, the crate also includes a graphql-client binary that will generate the same code that the macro does. This was much better. I could open the code in Emacs to view the generated structs and modules, and code completion just worked.

There was still one unfortunate issue, which is that even though my queries for user repos and org repos are nearly identical, the generator produces different Rust types for the user’s list of repos versus the org’s. This means that my stats collection code needs to be repeated for each type.

There are a couple of ways I could fix it. One would be to create a trait that each type can implement. But traits are defined in terms of methods. That means I’d have to write methods for each field in common between the types that I wanted to access. I did try this, but then I ran into lifetime issues when trying to take references to data in these structs. I’m sure those issues are solvable, but I just wanted to get a profile up, not spend all my time solving lifetime issues!
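To make the trait idea concrete, here is a minimal sketch. `UserRepo` and `OrgRepo` are hypothetical stand-ins for the two generated types (the real generated structs have different names and far more nesting), and `RepoLike`, `total_stars`, and `repos_using` are made-up names. It shows the cost described above: one accessor method per shared field, implemented once per type.

```rust
// Hypothetical stand-ins for the two generated types.
struct UserRepo {
    name: String,
    stargazer_count: i64,
    primary_language: Option<String>,
}

struct OrgRepo {
    name: String,
    stargazer_count: i64,
    primary_language: Option<String>,
}

/// One accessor per field that the stats code needs to read.
trait RepoLike {
    fn name(&self) -> &str;
    fn stargazer_count(&self) -> i64;
    fn primary_language(&self) -> Option<&str>;
}

impl RepoLike for UserRepo {
    fn name(&self) -> &str {
        &self.name
    }
    fn stargazer_count(&self) -> i64 {
        self.stargazer_count
    }
    fn primary_language(&self) -> Option<&str> {
        self.primary_language.as_deref()
    }
}

impl RepoLike for OrgRepo {
    fn name(&self) -> &str {
        &self.name
    }
    fn stargazer_count(&self) -> i64 {
        self.stargazer_count
    }
    fn primary_language(&self) -> Option<&str> {
        self.primary_language.as_deref()
    }
}

/// Stats code written once against the trait instead of once per type.
fn total_stars<R: RepoLike>(repos: &[R]) -> i64 {
    repos.iter().map(|r| r.stargazer_count()).sum()
}

/// Returns the names of repos whose primary language matches `language`.
/// The returned names borrow from `repos`, hence the explicit lifetime.
fn repos_using<'a, R: RepoLike>(repos: &'a [R], language: &str) -> Vec<&'a str> {
    repos
        .iter()
        .filter(|r| r.primary_language() == Some(language))
        .map(|r| r.name())
        .collect()
}
```

With flat structs like these the borrows are easy to satisfy. The deeply nested, `Option`-heavy types a code generator produces are where returning references from accessors gets harder.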

The other way to deal with this would be to write a macro of my own for the stats collection code. But that would have the same code completion problems, and that feels way over-engineered. I think what would be ideal would be a way to tell the graphql-client codegen that these two types are the same type. But that’s a PR for another day.
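For comparison, the macro approach could be sketched like this. Again, `UserRepo` and `OrgRepo` are hypothetical stand-ins for the generated types, and `impl_star_total` is a made-up name. The body is written once, but the compiler still sees one copy of the function per type.

```rust
// Hypothetical stand-ins for the two generated types.
struct UserRepo {
    stargazer_count: i64,
    is_fork: bool,
}

struct OrgRepo {
    stargazer_count: i64,
    is_fork: bool,
}

/// Expands to one stats function per concrete type. Field access works
/// because both types happen to use the same field names.
macro_rules! impl_star_total {
    ($fn_name:ident, $repo_type:ty) => {
        fn $fn_name(repos: &[$repo_type]) -> i64 {
            repos
                .iter()
                .filter(|r| !r.is_fork)
                .map(|r| r.stargazer_count)
                .sum()
        }
    };
}

impl_star_total!(user_star_total, UserRepo);
impl_star_total!(org_star_total, OrgRepo);
```

This avoids the accessor boilerplate, but editors generally offer less help inside a `macro_rules!` body, which is the code completion problem again.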

The final piece to make it all work is a simple GitHub Actions workflow that runs the generator. It runs whenever I push to the master branch of the generator repo as well as running nightly. If the generated README.md changes, it commits the changes and pushes the commit to the repo.

GitHub will stop running scheduled workflows for a repo after 60 days of inactivity, but I’m fairly sure that commits made by this workflow will count as activity, so it will never stop running. But I will find out in a couple of months.

If you want to use this for your own profile you’ll have to change some of the code. I have my username and org as constants in the code, and most people probably don’t have a single org for most of their code. But really, you should probably use one of the many more battle-tested alternatives.

Job Search 2022 Update: Week 3

Fri, 25 Mar 2022 19:28:14 -0500

Week 3 of my job search has come to a close, and boy is my brain tired! I’m looking forward to starting a new job just so I can do something less exhausting than interviewing all week.

I had a lot of interviews (and a homework assignment) this week, but first, here’s where I stand with various places …

First, the list of shame, which is those companies that have not responded after more than two weeks:

ClickHouse - applied on 2022-03-10
GitHub - applied on 2022-03-10
Netflix - applied on 2022-03-11
Oso - applied on 2022-03-10

We can give ClickHouse and Oso a (partial?) pass here. They’re both startups and I can understand if they don’t have a well-oiled hiring machine. But GitHub and Netflix really have no excuse. Is it possible that no one has looked at my application yet? Or are they committing the gravest of all hiring sins, ghosting people instead of sending rejection emails?

I also submitted one more application. Kevin Centeno, who I greatly enjoyed working with at MaxMind, reached out and said he had a friend working at Cockroach Labs who likes it. So I went ahead and applied on Monday, 2022-03-21. No word back yet.

And here’s where I stand with all the other places I applied …

I heard back from 1Password! Of all the places I applied, this is the product I use the most (many times every single day). So I’m quite excited about the idea of working on it. I have a first interview scheduled for next week.

For Array, I submitted my homework on 2022-03-23 and haven’t heard back yet. Either it was amazingly bad or amazingly good. But the most likely cause is that everyone who could review it is doing their best to avoid reviewing yet another homework submission. I can understand that.

I spent about 4 hours on this homework, which is more than I would have liked. The exercise was a little too open-ended, in my opinion. I probably could have done less, but I didn’t want to turn in work without any tests, even though they weren’t specifically asked for. I think the homework instructions I wrote at ActiveState are better at putting bounds on the amount of work required, but I wonder if we could have done even more there. Also, unlike ActiveState, there was no “point us at a GitHub project” alternative. This is a bummer for me since I have literally 10s of thousands of lines of code on GitHub anyone can look at. It might even be a hundred thousand, but I haven’t counted.

CircleCI said no, because of my desire to not get up early to talk to people in Europe. That’s very reasonable.

I talked to a recruiter at Fastly, and we agreed that it made the most sense to move me forward with one of the three positions I applied for first. That means I have to talk to a different recruiter first who handles that team. They reached out for my available times so hopefully that will happen next week.

LogDNA wins the Speedy Scheduling Award, as I got through all of their interviews this week. It sounds like a good place to work, it’s a product space I understand, and I think I’d find the work challenging and interesting. At the last interview, it sounded like an offer was likely and that I should hear back next week. The downside of them being the speediest is that I will have to ask them to wait a few weeks if they do make an offer, since I need to see what other offers come in. But still, good job, LogDNA!

I met with the Lead at MongoDB for the team I’d applied to and we did a live coding exercise. I know I’ve said these are the worst in my writing on hiring, but ironically this was fine for me. I didn’t find it too stressful because the task at hand was something very relevant to my actual experience, not an I-will-never-do-this-at-work CS problem. Also, it was a collaborative interaction, as opposed to someone just observing me. And finally, I was explicitly told to go ahead and Google things (unlike my Google interview, more irony). So it went fine. I’m scheduled for several more hours of interviews next Monday and Tuesday.

I heard back from OneSignal and had a quick interview with the hiring manager for the position. It sounds like they’re doing a lot of Rust, which is appealing. The interview went well, and they sent me one of those “pick an interview slot” links from Lever. So I went and picked one for 4:00 PM, not realizing that it was showing me US Pacific times! Oh, the irony. But seriously, Lever, maybe use another thing I worked on, MaxMind’s GeoIP APIs and databases. They make it easy to use the visitor’s IP to take an educated guess at the right time zone. And there’s no way to tell Lever I made a mistake and want to reschedule. Fortunately, the hiring manager was able to resend the link and I scheduled something at a better time. This will also be a live coding exercise. Hopefully, this will go as well as the one with MongoDB did.

A recruiter on LinkedIn reached out to me about a company called Oden and I had a first phone call with the hiring manager today. Their product is all about collecting information on factory production lines and providing insight for factory managers. This sounds fascinating. They said that new hires get sent to a customer factory to see it in action. I think the only thing better would be if this was software for giant killer robots. For the next step, they offer a choice of either homework or a live coding exercise. I chose the live coding because I don’t want to spend another 4 hours on homework in a week that’s already starting to get packed with interviews.

Onna asked me when I was available to meet with some folks from their team in Spain, asking if I was available at 7:00 or 8:00 AM my time. That’s when I realized I hadn’t asked about work hours in either of my first two conversations there. So I emailed the recruiter there and asked. They said that they’d want me to be available every day from 9:00-12:00. Again, just like CircleCI, this is totally reasonable, but I don’t want to do it. So I withdrew my application. I feel bad about this because I should have asked the recruiter before I interviewed with their VP of Engineering.

Finally, I spoke with the CEO of Optic. They have a pretty interesting product focused on making it easy to review REST API changes in much the same way that we now review code changes. I have another interview with an engineer there scheduled for next week.

That’s it for job search updates. I’m still getting a lot of recruiter contacts on LinkedIn and some emails too, but at this point I’m telling them I’m too busy for more interviews. I just checked and saw a message from someone looking for “a Software Engineer experienced in Cobol, JCL, and SQL Server Database to work on a contract basis”. Yes, my resume screams “this guy knows Cobol and JCL”. Good job, recruiter!

What will happen next week? Stay tuned.

Job Search 2022 Update: Week 2

Sat, 19 Mar 2022 14:42:29 -0500

Week 2 is over. My applications are moving forward in some places, and getting no response from some places. So it’s exactly what I’d expect.

Last Sunday, I was looking at my job search spreadsheet and realized I’d somehow deleted 1Password, so I quickly put in an application there. That brings my application total to 18, plus I’ve had some sort of informal talks with a few other places.

Of the places I’ve applied, the following companies simply haven’t responded yet (excluding the automated email they send to all applicants):

1Password
ClickHouse
Grafana
GitHub
Netflix
OneSignal
Oso

Now, GitHub and Netflix don’t surprise me. They’re big, desirable places to work that probably receive a huge number of applications. Much like Google, I imagine that their problem is dealing with the flood, as opposed to getting enough good applicants. I do have an inside person at Netflix who said they’d put in a good word for me, but still no response.

But are the others on this list in the same situation? Maybe, I don’t know. That said, it’s annoying to get no response for a week or more. When I have done hiring for past employers I’ve always tried to respond to applicants within a few days.

Fastly was also slow to reply, but my friend Emily who works there was able to poke someone. After that, they asked me to give my availability, so hopefully I’ll move forward with them next week.

Discord rejected my application. I don’t know why, of course. The application form does have a yes/no question asking “If applicable, would you be willing to relocate to Discord’s SF HQ? While Discord is embracing a hybrid remote approach going forward, some roles will remain HQ-based.” I answered “no”.

I had applied for an engineering manager role there, so maybe they would only accept someone who said “yes” here? If that’s the case, it would be nice if they said that in the job description. But of course, they may have rejected my application for some other reason too.

I also had a bunch of interviews:

MongoDB - first interview
Onna - first interview
Array - second interview
CircleCI - first interview

LogDNA wins the award for moving the quickest! Last week I talked to their recruiter, and this week I interviewed with the hiring manager and then had a technical interview and a design challenge interview.

Next week I have a few more interviews, including the next steps with MongoDB and Onna.

I also had a call with the VP of Engineering at APFusion. This was more of an exploratory chat, though of course every interaction is an interview, since either side can always decide to not move forward.

When I started my search, I updated my profile on Y Combinator’s Work at a Startup site. This has job postings for a lot of startups of all sizes, and companies can see your profile if you want. Two people reached out to me, and I’m scheduled to talk to the CEO of Optic next week. Another company also reached out, but it was a product I wasn’t interested in, so I politely declined.

Salary and Negotiation

I’ve so far resisted every attempt to get me to provide a salary I’d want. I usually ask if they have a range. If they tell me, I can say yes, let’s move forward. In one case I said I’d probably want a bit more than the upper end of their range, though they said the upper end had some flexibility.

Everything I’ve read on negotiation says “don’t give a minimum or a range that you’d accept.” One blog post I read suggested saying something like “I’d prefer to discuss that later in the process after we discover if this is a good fit.” I’ve tried saying that in some cases but it feels pretty inauthentic for me. I don’t like telling someone a lie that they know is a lie just because for some reason it’s considered crass to speak the thing that we both know is the truth. So I’ve moved to saying “I won’t give you a number no matter how much you say you need one because that would be poor negotiation on my part.” But no one I’ve said that to has seemed upset or offended.

For the record, here are some of the posts/articles I’ve read on negotiation:

My number one takeaway from all this was that by the time the company has made you an offer they’re already quite invested in hiring you. Another important takeaway is that the value of the money to you versus the value to the company is very different. Getting an extra $10,000 or $20,000 (or more) in salary, hiring bonus, or RSUs is a drop in the bucket for all but the tiniest companies, but it has a huge impact on you.

So don’t be afraid to ask for more. Unless you do something ridiculous like asking for 4x what was offered, it’s very unlikely that they will simply pull the offer. The worst case is that they say “sorry, we can’t offer any more.”

What’s Next?

I don’t plan to submit more applications at the moment. I think I have some solid prospects based on my interviews so far. I’ll see how things go this week and if I get a lot more rejections, I’ll probably end up applying to more places. But even managing the number of responses I’ve had so far is a lot to deal with. Interviews are exhausting!

Job Search 2022 Update: Week 1.1

Mon, 14 Mar 2022 16:00:45 -0500

So after my last update, I had one small change to write about.

For reference, I live in Minneapolis, which means I’m in America/Chicago (US Central). I’d been talking to an (external) recruiter about positions at Namely. The recruiter told me they asked, “Are you willing to work east coast hours?”.

I asked what that meant, since I’m very much not a morning person, and I’m not up to working 8-4 my time. I’ve had jobs where I had to get up early, and I was never able to shift my sleep times enough. I’d always stay up too late and I’d be tired most of the week, which was unpleasant. One of the best things about working for mostly remote companies has been flexibility in regards to working hours, with most places expecting that there would be some sort of 3-5 hour “core hours” block that mapped to late morning/early afternoon for most US time zones.

So then the recruiter said they wanted to know when I could start. I said 10 am. Most days I’m up before this, but occasionally I do sleep later than this, so I don’t love the idea of committing to 10 am every day. But it’s not untenable, just something I’d factor into my decision-making.

But this was too late for Namely so they didn’t want to move forward.

Maybe it’s just me but this seemed strange, especially for a company that lists all of its jobs as having remote options. If 11 am east coast time is too late, does that mean they won’t hire anyone on the west coast unless they can start before 8 am west coast time every day? Even people who normally get up early may not be able to make that work, especially if they have kids.

I do want to emphasize that I have no complaints about the recruiter who was acting as a relay in all this. He was quick to respond to me and was just passing on what the company was asking.

Overall, this is also a win. It’s always good to find out these types of issues as early as possible, before either side has invested much time in the process.

Job Search 2022 Update: Week 1

Fri, 11 Mar 2022 14:27:04 -0600

Week 1 of my job search is coming to a close. Here’s an update on what I’ve been up to and where I’m at.

First off, I’d like to thank the people who’ve reached out to me in various ways since my first post, as well as people I’ve contacted for the inside scoop at various places:

Adam Reinhardt
Andrea Gunn
Andrew Cholakian
David Golden
Dylan Martin
Emily Shea
Heidi Gider
Joelle Maslak
Jordan Adler
Karen Etheridge
Konstantin Narkhov
Mark Fowler
Miguel Mateus
Pete Sergeant
Ryan Gerry
Tatsuhiko Miyagawa
Neil Bowers
If I forgot to put your name here let me know, and I’m sorry!

Several of these folks have submitted me as an internal referral, which is always a huge boost for an applicant.

I started off the week by looking through a very long list of bookmarks I’d been collecting since I left my last job in October. This was mostly companies that used Rust or had 4-day weeks or for some other reason had caught my eye. This list came from “Who’s Hiring” posts on Hacker News and the Rust subreddit, as well as looking at various job boards and my own research.

I first made a spreadsheet to track the status of all my applications. I borrowed from the one described by Phil Calçado on his blog. Here’s an example version of the Google Sheet I’m using. Feel free to make a copy of it for your own search.

As of right now, I have 93 entries in my spreadsheet. That’s clearly too many to apply to.

I ended up filtering based on the “My Interest” column. If a company seemed promising but didn’t have any open roles that fit me, I marked them as a 0. They might be a fit in the future. I also marked a few companies as -1, which means I should probably not consider them in the future. The reasons for that include:

I applied and was rejected (just Google right now).
They cannot hire in Minnesota (Bolt).
Their published salaries are way below my desired range.
They are in a field I don’t want to be in, like cryptocurrency.
They expect > 40 hours per week of work (Wallaroo).
They exclude Colorado or NYC residents from applying for positions¹.
Someone works there that I don’t want to work with².

That left me with a range of 2-4 for the remaining companies. I started by giving a 2 or a 3 to any company that seemed particularly appealing because:

I use and like their product (Discord, GitHub, Netflix, and others).
The position seemed particularly interesting (for example, Grafana Labs has a position working on OpenTelemetry that I’m intrigued by).
They use Rust (Fastly, OneSignal, Oso, and others).
They have a 4-day week or might be willing to offer me one.
I know someone who works there and they tell me it’s a good place to work (Elastic, Fastly, MongoDB, and others).
They are known to pay obscene amounts of money (Netflix and others).

I’ve applied or had recruiters submit me for 17 companies so far. Those companies are:

Array
CircleCI
ClickHouse
Discord
Elastic
Fastly
GitHub
Google
Grafana Labs
LogDNA
MongoDB
Namely
Netflix
OneSignal
Onna
Oso
Wallaroo

I was really excited about the position at Elastic I applied for, but it was already filled³. I talked to a recruiter from a different part of the company, and they were very interested in me, which felt nice!

But the positions were on all-Java teams, involved very little coding, and I think they want someone with more cloud deployment and storage expertise than I have. There weren’t any other positions at my level that felt like a good fit for me, so I decided not to move forward with Elastic. This is too bad. My good friend Andrew works there and I’d love to work with him, plus he’s told me it’s a great place to work.

Of the remainder, I talked to one earlier today and I’m scheduled to talk to four next week, so that feels like good progress.

I’ll try to write weekly updates as I go forward, so stay tuned.

But I’m in Minnesota, right? However, the reason they exclude Colorado residents is that Colorado requires all job postings to include a salary range. NYC also has a similar law coming into effect on May 15. I also saw Home Depot excludes California, presumably because California has very strong anti-non-compete and other labor laws. I don’t want to work for a company that sees those laws as a bad thing. ↩︎
Without naming names I’ll note that some people in the FOSS world have a reputation for being “difficult”. ↩︎
They told me it was filled on Tuesday but it’s still listed on their careers page. I could go on at great length about how bad various companies’ career pages are and how bad every applicant tracking system is. ↩︎

Let the Job Search Begin; And My First Interviews

Mon, 07 Mar 2022 14:16:04 -0600

So it’s that time of my life again. The time when a not-so-young¹ man has to get a job because he can’t live off his savings forever. And that not-so-young man is me.

If you know of something that might be a good fit for me please let me know.

I wrote up a TLDR I’ve been sharing with recruiters that will help you figure out if something is a good fit for me.

Level: Senior+ (Lead, Staff, Principal), or Senior at FAANGMULA (or equivalent very large tech company like Stripe). Would also consider Senior for a Rust job.
Management: Yes, if there is also IC work.
Fields: Tech/software, but no crypto/blockchain, surveillance, defense industry, or gig economy services for non-professionals.
Technologies:
1. Rust
2. Something unusual, e.g. Elixir, Nim, etc.
3. Go
4. Anything else, but no Java or PHP
Location: 100% remote
Comp: Top tier TC or 4 day (32 hour) week - or both!
Hours: No more than 40 hours per week (no startup 45-50 hours stuff)
PTO: 5+ weeks

There are a lot more things I care about, but if I included all of them this would be the “too long”, not the TLDR.

I think I’ll also blog a bit about my interviews as they happen. Why not? I plan to be circumspect about some things, especially specific technical questions and coding exercises. It’s not appropriate for me to share those with the world.

So far I’ve had two interviews.

My first was last Wednesday (March 2) with Wallaroo. This went well … up until the interviewer told me that they expect people to work 45-50 hours per week. So I withdrew my application immediately.

I consider this interview a huge success. We found out that this wasn’t a fit in under 30 minutes, before people on either side of the process had invested much time. That’s a win in my book!

My second was earlier today with Google². It was a phone screening with a shared virtual doc (sort of like Google Docs but it defaulted to monospace at least). The question was comp sci-ish, and I think I did pretty terribly. I dove in and started implementing this in a not-very-optimal way. After a while, the interviewer asked me “could you implement this using X”, where X is a very basic programming concept that I know quite well. And my response was along the lines of “oh yes, X is definitely the right way to do this”. But at that point we were mostly out of time so I couldn’t redo my work using X.

I’ve already written a fair bit about interviewing and the issues with live coding exercises. If I’d been doing this on my own with less time pressure, I think I would’ve hit on X in a reasonable time frame. But in a live 45-minute window it felt like either I picked the right approach immediately or I failed. I don’t think that’s a great screening tool, since it has a lot of false negatives. But the top employers like Google do just fine with a lot of false negatives, so it makes sense that they’d stick with this approach.

Now, I don’t know that I failed (yet), but I don’t think I made a great impression.

Edit @ 14:45 my time: They chose not to move forward. No surprise. But it’s great that they’re so quick to follow up.

And that’s where I’m at. I have a long list of bookmarks for companies that I might want to apply at, so I’m going through those now and figuring out where to apply first.

But not so old either! ↩︎
The recruiter I talked to swears that Google now has fully remote roles. I would want to confirm this before doing any onsite interviews, because I have no interest in moving for a job. ↩︎

Checking Tailwind Class Names at Compile Time with Rust

Mon, 21 Feb 2022 10:07:18 -0600

At the end of my last post, “Frontend Rust Without Node”, I talked about my big issue with using Tailwind CSS. It has a huge number of classes and I can’t remember their names, so I often typed them incorrectly. This made it difficult to figure out why my styling wasn’t doing what I thought.

Here’s a recap of why this is the case …

Tailwind consists of class names and “modifiers”. A class name is something like text-lg or grid. Any class name can also have a variety of modifiers attached to it, separated by colons, so you can write something like this:

<div class="hidden lg:visible hover:bg-indigo-200">
    Content that will only be visible above a certain screen size.
</div>

You can also combine modifiers to create classes like lg:hover:bg-indigo-200. So while there are “only” a few hundred CSS class names, the number of names you can use is "base" names × modifiers! (that’s a factorial sign on modifiers!). It’s not really a factorial since you can’t combine every modifier with every other modifier (sm:md:lg:visible makes no sense), but it’s a lot more than a simple multiplication.
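To put rough numbers on "a lot more than a simple multiplication", here is a small sketch that counts every ordered stack of distinct modifiers up to a given depth and multiplies by the number of base classes. The function names and the inputs are made up for illustration, and the result is only an upper bound, since it also counts nonsensical stacks like sm:md.

```rust
/// Counts ordered sequences of up to `max_stack` distinct modifiers chosen
/// from `modifiers` options, including the empty sequence (no modifier).
fn modifier_prefixes(modifiers: u64, max_stack: u64) -> u64 {
    let mut total = 1; // the bare class with no modifier at all
    let mut sequences = 1; // number of sequences of the current length
    for k in 0..max_stack.min(modifiers) {
        // Extending a sequence of length k leaves (modifiers - k) choices.
        sequences *= modifiers - k;
        total += sequences;
    }
    total
}

/// Upper bound on distinct names: every base class paired with every prefix.
fn upper_bound_names(base_classes: u64, modifiers: u64, max_stack: u64) -> u64 {
    base_classes * modifier_prefixes(modifiers, max_stack)
}
```

With made-up round numbers of 300 base classes and 10 modifiers, allowing at most two stacked modifiers, `upper_bound_names(300, 10, 2)` gives 300 × (1 + 10 + 90) = 30,300 names, compared with 3,300 when only a single modifier is allowed.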

As such, it’s not practical to simply generate a CSS file with all possible classes. Well, I lied. It’s entirely practical because it’s trivially doable. Just add this to your tailwind.config.js …

module.exports = {
    content: {
        ...
        safelist: {
            pattern: /.*/,
            variants: [ "sm", "md", "lg", "xl", ... ],
        }
        ...
    }
};

… and then run the tailwindcss program to generate the CSS file¹. When I tried this, without even including all variants, I ended up with a 7MB CSS file and I’d only be using a tiny fraction of what it contained.

So it’s not a good idea, and it’s not how the Tailwind CSS authors intend Tailwind to be used. Instead, the tailwindcss program will scan your code with some broad regexes to find strings that could be Tailwind CSS class names. Then it generates a CSS file containing just those strings which match actual Tailwind names.

But this scanning process, because it matches so broadly, errs on the side of false positives, and the tailwindcss program will not emit any warnings when it finds a string that could be a match, but which isn’t. If it did that, you’d end up with hundreds or thousands of warnings quite quickly, as it could match nearly every variable and function name in your codebase, depending on your naming conventions.
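The behavior described above can be imitated in a few lines. This is a simplified sketch, not how tailwindcss is actually implemented: `candidates`, `classes_to_emit`, and the character set used for splitting are all assumptions, and the "known" set here holds complete names rather than handling modifiers separately. It shows why a typo vanishes silently: it is just one more candidate that fails to match.

```rust
use std::collections::BTreeSet;

/// Splits source text into candidate tokens: any run of characters that
/// could plausibly appear in a class name. Deliberately broad, so element
/// and attribute names end up as candidates too.
fn candidates(source: &str) -> BTreeSet<&str> {
    source
        .split(|c: char| {
            !(c.is_ascii_alphanumeric() || matches!(c, '-' | ':' | '/' | '.' | '_'))
        })
        .filter(|token| !token.is_empty())
        .collect()
}

/// Keeps only the candidates that are known class names. Everything else
/// is dropped without a warning, including typos.
fn classes_to_emit<'a>(source: &'a str, known: &BTreeSet<&str>) -> BTreeSet<&'a str> {
    candidates(source)
        .into_iter()
        .filter(|token| known.contains(*token))
        .collect()
}
```

Given the source `<div class="hidden lg:visible text-lgg">` and a known set containing hidden, lg:visible, and text-lg, the candidates include div, class, and text-lgg, but only hidden and lg:visible are emitted. Nothing distinguishes the typo text-lgg from the harmless non-match div.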

So in my first attempts to use Tailwind with Dioxus, my workflow ended up like this:

1. Add some class names in my React/Dioxus/Seed/Yew code.
2. Re-run tailwindcss against my code base to regenerate my “compiled” CSS file.²
3. Look at my app, which since I’m using Trunk will have hot-reloaded in my browser.
4. Scream into the void when my CSS changes did not do what I intended, usually doing absolutely nothing. Then try to debug what happened by asking:
- Did my CSS actually get regenerated or is my Trunk config not doing what I think it should do?
- Does the regenerated CSS contain the new class names?
  - If yes …
    - Did the browser properly load the new CSS file³?
    - Does the CSS do what I think it does when attached to the element I think I attached it?
  - If no …
    - Did the tailwindcss work as I expected it to or did I screw up its config so it didn’t see the class names I just added?
    - Or did I typo a class name for the thousandth time today?

I spent a lot of time asking these questions. And most of the time, I had typoed a Tailwind class name but nothing in my toolchain was telling me I had done so.

This was annoying.

It was doubly annoying because I’m using Rust. If Rust does one thing well, that thing is telling me at compile time all the many things I did wrong.

Enlisting the Rust Compiler to Check my CSS

Fortunately, I knew I could make the Rust compiler check this for me. When I experimented with Seed before Dioxus, the quickstart template I used included a plugin for PostCSS written by Martin Kavik, postcss-typed-css-classes, that hooked into PostCSS and generated Rust code for all of its classes⁴.

But I didn’t want to use that plugin for a couple of reasons:

1. I had so far managed to avoid needing to run node for my project, so I didn’t want to use PostCSS, which requires node.
2. The code generated by that PostCSS plugin puts all of the classes, tens of thousands of them, into a single struct in an 8MB file. It’s so large because it includes a huge number of class × modifier combinations, and it doesn’t even come close to including all possible modifier/class combinations.

This killed my editor⁵ when it came to auto-completion. Even loading the generated file in my editor is slow, probably because of syntax highlighting. And jumping around the file or searching in it is also quite slow.

Obviously, #2 is fixable, but to fix #1 I needed a new tool, ideally written in Rust, since that’s what everything else I’m using is in.

So I Wrote That New Tool

It’s called tailwindcss-to-rust. It generates Rust code with all of the available Tailwind CSS class names and modifiers as static strings. It doesn’t generate strings for modifier/class combinations, which means that the full file is only 624kb. That’s still pretty big, but an order of magnitude smaller than the one generated by the PostCSS plugin. My editor takes a slight pause when it loads, but it’s only a second or two. And jumping around the file and searching it is quick enough to feel instantaneous.

And to further speed up code completion, I split up the classes into a set of structs, where each struct represents a “group” of classes based on function (layout, typography, animation, etc.). These groups are taken from the Tailwind documentation headings.

Unfortunately, there’s nothing in the Tailwind codebase to make this easier. There’s no list of all the available class names, and there’s no reference to the documentation groups in the codebase at all. So all the information I needed, the group and class names, only exists in the documentation or in a generated CSS file. And to make it even worse, the documentation itself is entirely generated by code.

Fortunately, as an old school Perl hacker, I know how to whip up some horrible hacks, sanity be damned! I wrote a Perl script⁶ that crawls the Tailwind documentation site and generates a Rust data structure mapping individual class names to groups.

If you’re running in terror, don’t worry, you don’t need to use Perl to use the tailwindcss-to-rust tool. I wrote the Perl to help me write the Rust to generate the Rust. And you just need to run the Rust that generates the Rust, not the Perl that generates (some of) the Rust to generate the Rust. I hope that clears things up.

The actual generator, tailwindcss-to-rust (written in Rust), takes as its input your tailwind.config.js file and an input CSS file for the tailwindcss program. We’ll call that input file tailwind.css for this explanation. This input file is usually just a few lines; see step 3 of the Tailwind installation docs for details. Then the generator does the following:

1. Creates a temp directory.
2. Copies your tailwind.config.js and input CSS file to the temp dir.
3. Adds safelist: [ { pattern: /.*/ } ] to the tailwind.config.js in the temp dir.
4. If the directory containing the given tailwind.config.js file contains a node_modules directory, that directory is symlinked from the temp directory. This is so it can access any tailwind plugins in that directory. I’m honestly not sure if this achieves anything, but I haven’t experimented with any plugins that don’t ship as part of the tailwindcss binary.
5. Runs tailwindcss in the temp directory, using the modified config file. Because it added that safelist item to the config, the generated file will include every possible CSS class. The exact classes vary based on what Tailwind plugins you are using.
6. “Parses” the generated CSS file to find all the class names it contains.⁷
7. Generates Rust code with structs for all of those classes. If there are class names the generator doesn’t recognize then they are put in a struct named “Unknown”.
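
As a rough illustration of the “parse” step, here’s a sketch that pulls class names out of generated CSS. The real tool uses a regex; this hand-rolled scanner only uses the standard library, the class_names function is a hypothetical name, and it assumes special characters are escaped as a backslash followed by the literal character. It doesn’t try to handle hex escapes, and it would mistake something like a file extension inside url(...) for a class.

```rust
use std::collections::BTreeSet;

/// Collects class names by looking for `.name` selectors. A backslash
/// escapes the next character, so `.w-3\/6` yields `w-3/6`.
fn class_names(css: &str) -> BTreeSet<String> {
    let mut names = BTreeSet::new();
    let mut chars = css.chars().peekable();
    while let Some(c) = chars.next() {
        // Only treat '.' as the start of a class when a letter or '-'
        // follows it. This skips decimals such as `0.125rem`.
        let starts_name =
            matches!(chars.peek(), Some(n) if n.is_ascii_alphabetic() || *n == '-');
        if c != '.' || !starts_name {
            continue;
        }
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n == '\\' {
                chars.next(); // drop the backslash itself
                if let Some(escaped) = chars.next() {
                    name.push(escaped);
                }
            } else if n.is_ascii_alphanumeric() || n == '-' || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break; // end of the class name (space, '{', ':', ',', ...)
            }
        }
        names.insert(name);
    }
    names
}

fn main() {
    let css = r"
.flex { display: flex }
.w-3\/6 { width: 50% }
.h-0\.5 { height: 0.125rem }
.hover\:underline:hover { text-decoration-line: underline }
";
    for name in class_names(css) {
        println!("{name}");
    }
}
```

Note that the unescaped :hover pseudo-class ends the name, so the last rule yields hover:underline rather than anything with a trailing hover.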

In the future, I may add an option to provide a group mapping for class names. If this tool sees broader adoption I’m sure people will want this, because one of the most powerful features of Tailwind is that you can quite easily create custom classes and modifiers.

The generated Rust code looks like this⁸:

#[derive(Clone, Copy)]
pub(crate) struct Modifiers {
 pub(crate) active: &'static str,
 pub(crate) after: &'static str,
 ...
 pub(crate) lg: &'static str,
 pub(crate) ltr: &'static str,
 ...
 pub(crate) visited: &'static str,
 pub(crate) xl: &'static str,
}

pub(crate) const M: Modifiers = Modifiers {
 active: "active",
 after: "after",
 ...
 lg: "lg",
 ltr: "ltr",
 ...
 visited: "visited",
 xl: "xl",
};


#[derive(Clone, Copy)]
pub(crate) struct Accessibility {
 pub(crate) not_sr_only: &'static str,
 pub(crate) sr_only: &'static str,
}

pub(crate) const ACCESSIBILITY: Accessibility = Accessibility {
 not_sr_only: "not-sr-only",
 sr_only: "sr-only",
};

...

#[derive(Clone, Copy)]
pub(crate) struct Sizing {
 ...
}

...

pub(crate) const C: C = C {
 acc: ACCESSIBILITY,
 ...
 siz: SIZING,
 ...
};

Then you can use the generated code like this:

use gen::{C, M};
let class = [[M.lg, C.siz.w_6].join(":").as_str(), C.typ.text_lg].join(" ");

If you remember, back at the beginning of this post, I mentioned that the tailwindcss program scans your code to figure out which class names you are using, and then generates a CSS file with only those classes. But to turn class names like “w-3/6”, “h-0.5”, or “text-lg” into valid Rust identifiers, I had to transform them a bit. This means that tailwindcss will no longer recognize what classes you’re using!
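
For reference, here’s a minimal sketch of that transformation. It is inferred from the replacements that the extractor config in this post undoes (“/” to “_of_”, “.” to “_p_”, “-” to “_”, and a leading “2” to “two_”). The tw_to_rs name is hypothetical, and the real generator may well handle more cases than this.

```rust
/// Turns a Tailwind class name into a valid Rust identifier.
/// Assumed mapping, inferred from the extractor's reverse mapping.
fn tw_to_rs(class: &str) -> String {
    let ident = class
        .replace('/', "_of_")
        .replace('.', "_p_")
        .replace('-', "_");
    // Identifiers can't start with a digit, so "2xl" becomes "two_xl".
    if let Some(rest) = ident.strip_prefix('2') {
        return format!("two_{rest}");
    }
    ident
}

fn main() {
    for class in ["w-3/6", "h-0.5", "text-lg", "2xl"] {
        println!("{class} -> {}", tw_to_rs(class));
    }
}
```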

Fortunately, Tailwind allows you to provide a custom “extractor” to find class names, on a per-file extension basis. So you need to modify your tailwind.config.js file:

module.exports = {
  content: {
    files: ["index.html", "**/*.rs"],
    // You do need to copy this big blob of code in, unfortunately.
    extract: {
      rs: (content) => {
        const rs_to_tw = (rs) => {
          if (rs.startsWith("two_")) {
            rs = rs.replace("two_", "2");
          }
          return rs
            .replaceAll("_of_", "/")
            .replaceAll("_p_", ".")
            .replaceAll("_", "-");
        };

        let classes = [];
        let class_re = /C\.[^ ]+\.([^\. ]+)\b/g;
        let mod_re = /(?:M\.([^\. ]+)\s*,\s*)+C\.[^ ]+\.([^\. ]+)\b/g;
        let matches = [...content.matchAll(mod_re)];
        if (matches.length > 0) {
          classes.push(
            ...matches.map((m) => {
              let pieces = m.slice(1, m.length);
              return pieces.map((p) => rs_to_tw(p)).join(":");
            })
          );
        }
        classes.push(
          ...[...content.matchAll(class_re)].map((m) => {
            return rs_to_tw(m[1]);
          })
        );
        return classes;
      },
    },
  },
  ...
};

What the custom extractor does is find places in the Rust code that use modifiers or class names, then it transforms the names from Rust identifiers back to the Tailwind CSS names.

And with that in place, you now have compile-time checked Tailwind CSS class names, and a workflow that uses the tailwindcss tool without requiring node, npm, or yarn.

You might be tempted to add the tailwindcss-to-rust invocation to your Trunk.toml file (or other bundler tool). But in many cases, this won’t be necessary. For most projects, you will run the generator very rarely, possibly running it once only. The only things that require a re-run are:

You add/remove plugins from your tailwind.config.js.
You make changes to your tailwind.config.js that change the names of custom CSS classes you have configured.

So unless you have a config that generates custom names, you will rarely need to regenerate the Rust code. If you do have custom config, then it may make sense to have Trunk run tailwindcss-to-rust.

The Ergonomic Macros

The example I gave of using the generated structs earlier was this:

use gen::{C, M};
let class = [[M.lg, C.siz.w_6].join(":").as_str(), C.typ.text_lg].join(" ");

I said this was gross, and there are a couple reasons I think so. First, I hate having to manually join modifiers with a colon, and then the overall class list with the space. Second, because the first join with the modifier produces a String, you have to convert it to a &str to join it with the static &str in C.typ.text_lg. You could also write C.typ.text_lg.to_string() and drop the earlier .as_str(). But yuck either way.

You’ll be using these modifiers and classes a lot, so having to constantly repeat these join calls is horrible. To make using this generated code not horrible, I wrote a crate with helper macros called tailwindcss-to-rust-macros. Much of this crate’s content is a slightly tweaked version of code copied from the Seed framework codebase, adjusted to make it more generic.

Using the macros looks like this:

let class = C![M![M.lg, C.siz.w_6], C.typ.text_lg];

Yay, no join calls! The “arguments” to these macros can be any of these types:

&str
String
&String
Option<T> and &Option<T> where T is any of the above.
Vec<T>, &Vec<T>, and &[T] where T is any of the above.

There’s also a DC![...] macro for use with Dioxus inside its rsx! macro.
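
If you’re curious how a macro can accept such a mix of types, one approach is a small trait with an implementation per argument type. This is a sketch of the general technique, not the code in tailwindcss-to-rust-macros; the ToClasses trait and the classes! macro are hypothetical names.

```rust
/// Hypothetical trait: anything that can contribute class names to a list.
trait ToClasses {
    fn push_classes(&self, out: &mut Vec<String>);
}

impl ToClasses for str {
    fn push_classes(&self, out: &mut Vec<String>) {
        // Skip empty strings so they don't produce doubled spaces.
        if !self.is_empty() {
            out.push(self.to_string());
        }
    }
}

impl ToClasses for String {
    fn push_classes(&self, out: &mut Vec<String>) {
        self.as_str().push_classes(out);
    }
}

// Covers &str, &String, &Option<T>, &Vec<T>, and &[T] in one go.
impl<T: ToClasses + ?Sized> ToClasses for &T {
    fn push_classes(&self, out: &mut Vec<String>) {
        T::push_classes(*self, out);
    }
}

impl<T: ToClasses> ToClasses for Option<T> {
    fn push_classes(&self, out: &mut Vec<String>) {
        // None contributes nothing.
        if let Some(inner) = self {
            inner.push_classes(out);
        }
    }
}

impl<T: ToClasses> ToClasses for [T] {
    fn push_classes(&self, out: &mut Vec<String>) {
        for item in self {
            item.push_classes(out);
        }
    }
}

impl<T: ToClasses> ToClasses for Vec<T> {
    fn push_classes(&self, out: &mut Vec<String>) {
        self.as_slice().push_classes(out);
    }
}

/// Joins every argument's class names with a single space.
macro_rules! classes {
    ($($arg:expr),* $(,)?) => {{
        let mut out: Vec<String> = Vec::new();
        $( ToClasses::push_classes(&$arg, &mut out); )*
        out.join(" ")
    }};
}

fn main() {
    let maybe_hidden: Option<&str> = None;
    let extra = vec![String::from("underline")];
    let class = classes!["flex", String::from("px-4"), maybe_hidden, Some("text-lg"), extra];
    println!("{class}");
}
```

The blanket implementation for &T is what keeps the list of implementations short, since every borrowed form falls out of the owned ones.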

The big downside of using macros is that you won’t get any auto-completion help from your IDE inside the macros, at least for now⁹. This is a bit ironic since one of my main motivations for this tool was to make something that worked better with auto-completion. But there are some tricks. You can write this:

let class = [[M...., C.siz....], C.typ...

Where the ... is where your IDE will kick in and provide auto-completion. Then you can transform that into the equivalent macros. I bet you could even write an editor plugin to do this for you, but I haven’t done this yet.

If you hate macros, you could just write some helper functions:

fn m(names: &[&str]) -> String {
 names.join(":")
}

fn c(classes: &[&str]) -> String {
 classes.join(" ")
}

let class = c(&[&m(&[M.lg, C.siz.w_6]), C.typ.text_lg]);

This isn’t entirely terrible, but that sure is a lot of references to read. And they don’t handle all the Option and Vec/slice combinations that the macros handle.

A Future Feature?

One person who looked at this tool commented that they didn’t like the name transformations I used and would prefer to just use the original Tailwind names in code. I was thinking about how this might work and I think you could use these names with a procedural macro. So you could write this …

let class = C!["hidden", "lg:visible", "w-6", "text-lg"];

… and it would produce code something like this:

let _ = C.lay.hidden;
let _ = M.lg;
let _ = C.lay.visible;
let _ = C.siz.w_6;
let _ = C.typ.text_lg;
let class = ["hidden", "lg:visible", "w-6", "text-lg"];

But there are some wrinkles. First, it’s not clear how to go from a class name to its group at compile time. How does the macro know that “hidden” belongs to C.lay? This might require producing a single struct with all the classes so the generated code could just reference C.hidden. Or maybe it could generate a bunch of structs split up by the first letter of the class name if one big struct causes editor issues.

Second, I suspect the compilation errors from typos will be kind of horrible, since they’ll end up referring to things like C.lay.hiddden that simply don’t exist in the code you wrote.

But if someone wants this, please make an issue in the repo and we can discuss it.

Putting It All Together

You’ll probably want a module in your code that wraps up the generated code and macros together into a convenient set of exports. The macros documentation shows you how to do that.

There are a lot of moving parts here, so here’s the summary:

Follow the instructions for installing and running the tailwindcss-to-rust tool.
Create the module as described in the docs for the macros.

Import the module and use it:

use css::*;

fn some_func() {
    let class = C![
        C.spc.p_2,
        C.typ.text_white,
        M![M.hover, C.typ.text_blue],
    ];
    ...
}

And that’s how you can have compile-time checking for your Tailwind class names. Of course, in doing all of this I’ve probably learned more about the Tailwind class names than I ever knew before, so I’ll never typo a class name again. Hah!

1. See my “Frontend Rust Without Node” post for a lot more details on what the tailwindcss tool is and how it’s used.
2. Which you can automate with Trunk. See my “Frontend Rust Without Node” post for example code.
3. Trunk should make sure this happens by appending a content hash to the CSS file to ensure your browser doesn’t use an old cached version.
4. The template also uses PostCSS to generate the “compiled” Tailwind CSS file, which is a way to use Tailwind without needing to run tailwindcss.
5. I’m using Emacs (a great OS with excellent editing built-in) along with the fabulous LSP mode to give me the full IDE experience. As an aside, I only started using LSP mode a few years ago, and it’s been a huge game-changer when writing code in languages with a good LSP server, mostly Go and Rust in my case.
6. I could have written this in Rust, but for me this sort of thing is much, much quicker to whip up in Perl, especially using some great libraries off CPAN, notably LWP::Simple and Mojo::DOM.
7. I put “parses” in quotes because all it does is use a regex to match names like “.foo”. I tried using some CSS parsing crates but they were all enormously complex, and just getting a list of all the classes in a file was ridiculously hard. But then I remembered I’m an old-school Perl hacker and that regexes are always the best worst solution to any problem.
8. The default is pub(crate) but you can make it pub with the --visibility flag.
9. See the “IDEs and Macros” post on the rust-analyzer blog for why.

Frontend Rust Without Node

Mon, 14 Feb 2022 11:40:55 -0600

When I started my frontend Rust project, I used a Seed project starter template that included Tailwind, webpack, some more JS frontend dev stuff, and some TypeScript glue code that launches the app.

This was a lot of stuff. Stuff for me to install. Stuff for me to run. Stuff for me to (not) understand. So. Much. Stuff!

To be clear, I don’t want to dump on the author of this template. For one thing, it was created in 2019, and Rust frontend has advanced a lot since then¹. And the template did what it says. It was a quick start. I just didn’t understand how most of it worked. And it was a bit slow to run on code changes. And making changes to it was frustrating because of all the stuff I didn’t understand.

So when I went to try Dioxus, I wanted to see if I could avoid using any Node technologies, especially a bundler like webpack, and doubly especially avoiding any JS/TS glue code for the app.

Can I avoid that? Well, let’s figure that out.

What Does Webpack Do?

It’s called a “bundler”, which is pretty clear. It takes all your stuff and bundles it into a thing you can run from a local dev server or distribute to production. That stuff includes:

Your JS/TS² application code.
At least one HTML file, which you need to kick off the application in the browser.
Your CSS, which may need to be generated from SCSS or SASS.
Images.
Fonts.
Anything else.

The output of Webpack is an index.html file that has been processed to load your compiled CSS³, your compiled JS⁴, and maybe other stuff. It will also place the compiled CSS, images, fonts, etc. in a single directory tree, usually in a top-level folder named dist. Then you can start a dev server to serve this tree or ship it off to production (by making a tarball, a Docker image, etc.).

When working on a frontend application in Rust, you still need to do some of these tasks. You need to compile your application from Rust to WASM. You need an index.html to load that application. You’ll probably want that index.html to load a CSS file. You might have fonts or images you need to distribute. And you can do all of that with webpack.

But you don’t have to!

Just Use Trunk

Trunk is to Rust WASM web apps what Webpack is to JS web apps. It will compile your Rust code to WASM, process SASS or SCSS files, minify things, copy images, etc.

Trunk integrates with the wasm-bindgen tool, which is the CLI tool that turns Rust into WASM. This tool also generates some polyfill code to implement features not yet implemented in all browsers.

The wasm-bindgen crate also provides an API for communicating with JS code from Rust, which allows you to integrate with existing JS libraries⁵. There are a number of libraries that build on top of this integration, like wasm-logger, which makes the output from the log crate go to the browser’s console.

Here’s a very minimal index.html that will make Trunk compile your Rust app:

<html>
<head>
</head>
<body>
 <div id="main"></div>
</body>
</html>

Wait, what? There’s nothing in there! If you don’t point it at any code in your HTML, then Trunk will automatically compile the crate containing your index.html and turn it into a WASM application that your index.html loads.

If you want some CSS you can add this to the <head>:

<head>
 <link data-trunk rel="css" href="/css/my-compiled.css"/>
</head>

If you use rel="scss" or rel="sass" then Trunk will compile that file into a CSS file. Trunk also hashes the file and puts that hash in the path to ensure that browsers reload the CSS whenever the CSS source changes.
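
The cache-busting idea can be sketched in a few lines of Rust. This is not Trunk’s actual code: the hashed_name function and the name format are assumptions for illustration, and std’s DefaultHasher stands in for whatever hash Trunk really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds a file name that embeds a hash of the file's contents, so the
/// name changes whenever the contents do.
fn hashed_name(stem: &str, ext: &str, contents: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    let digest = hasher.finish();
    format!("{stem}-{digest:016x}.{ext}")
}

fn main() {
    let before = hashed_name("my-compiled", "css", b".flex { display: flex }");
    let after = hashed_name("my-compiled", "css", b".flex { display: grid }");
    // Different contents give different names, so the browser can't reuse
    // a cached copy of the old file.
    println!("{before}\n{after}");
}
```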

Other file types all use the same <link rel="$type" href="/path/to/file.$type"> pattern. Trunk supports icons (for your favicon), images, and it can copy files and directories wholesale for images, fonts, etc. At present, it doesn’t support any sort of fancy processing of those files natively, and it doesn’t do hashing of them (yet).

However, you can use your own hooks to make it do other stuff by running arbitrary programs. And this is how I’ve been able to avoid using webpack and I’m able to not run node at all for my web app.

Ok, I lied, I do run node.

Tailwindcss

I’m using the standalone tailwindcss executable, which effectively bundles node plus some JS code into a single executable. You can download this from the tailwindcss GitHub project’s releases page.

I have an “input CSS file” for tailwind that looks like this:

@tailwind base;
@tailwind components;
@tailwind utilities;

I also have a tailwind.config.js file that looks like this:

module.exports = {
  content: ["index.html", "**/*.rs"],
  theme: {
    extend: {},
  },
  plugins: [require("@tailwindcss/forms")],
};

Then I have this bit of configuration in my Trunk.toml file:

[[hooks]]
stage = "build"
# I'm not sure why we can't just invoke tailwindcss directly, but that doesn't
# seem to work for some reason.
command = "sh"
command_arguments = [
 "-c",
 "tailwindcss -i css/tailwind.css -o css/tailwind_compiled.css"
]

Why can’t I just run tailwindcss directly? I don’t have a damn clue.

What exactly does tailwindcss do? To answer that, it’s important to understand the basic design of Tailwind. Unlike most CSS frameworks, with Tailwind you don’t build your own SASS/SCSS/CSS file using the framework as a base. You don’t define new classes based on Tailwind classes.

Instead, Tailwind provides hundreds of small utility classes like mr-8 (right margin size 8), flex (use a flexbox layout for this element), and text-lg (make this text larger). You use those classes directly in your code which generates HTML. Here’s an example using JSX:

export default function App() {
  return (
    <h1 className="text-3xl font-bold underline">
      Hello world!
    </h1>
  )
}

This example uses three Tailwind classes, text-3xl, font-bold, and underline. Your first reaction may be shock and horror. Mine was! But when I read more about the reasoning behind it I realized that this actually works very nicely with modern frontend web app practices.

Nowadays, you don’t write your HTML in one set of files and then make it dynamic with separate JS code. Having lots of HTML was the original reason to use CSS classes. It meant that you could have many different HTML pages with the same styles easily. Anywhere you embedded a search box you’d slap a search class on the <div>.

In modern apps, your JS (or in my case, Rust) code generates the HTML directly. So your HTML generation can be factored into functions or methods. And that means that you never need to repeat the same sets of Tailwind classes across your application. If you need to reuse some particular piece of layout, you can turn that into a reusable component. Here’s one from my music player:

#[inline_props]
pub(crate) fn UserFacingError<'a>(
    cx: Scope, error: &'a crate::client::Status,
) -> Element {
    cx.render(rsx! {
        PageTitle {
            "Error"
        },
        div {
            class: "flex flex-row flex-wrap justify-center",
            "{error.message}",
        }
    })
}

If I need that set of classes, "flex flex-row flex-wrap justify-center", in other components then I can either make a new component for that <div> with those classes, or I can just have a function that returns those classes:

fn center_flex_classes() -> &'static str {
    "flex flex-row flex-wrap justify-center"
}

I’m using a real programming language to generate HTML, so I can take advantage of that fact to avoid repeating my CSS classes all over the place⁶.

Finally, to tie it all together, I load the CSS in my index.html:

<head>
 <link data-trunk rel="css" href="/css/tailwind_compiled.css"/>
</head>

What Does `tailwindcss` Do?

Remember that I’m running tailwindcss via Trunk as part of my build process? Why? If tailwind is just a bunch of already-defined classes what is that command doing?

Well, calling it a “bunch” of classes may be understating things. To see how many, I generated a file containing all of the available-by-default CSS classes, with various media-size modifiers and pseudo-classes for hover and so on. It came out to around 7MB. That’s a big CSS file. Too big.

So what tailwindcss does is figure out which classes you’re using by looking at your code. Then it generates a CSS file with just the ones that you need. For my app I currently end up with a file that’s just 19k. That’s much better than 7MB!

Tailwind Typos

Have I mentioned that Tailwind offers a lot of CSS classes? Take a look at just the padding classes. There are a lot, and the names aren’t all that memorable.

If you typo a name then tailwindcss simply ignores it and it’s missing from the generated CSS. As I worked on my app I did this a lot. And it was always confusing. Was my layout wrong because of my HTML? Was it the CSS classes I chose? Did the CSS generation process not regen the file, so the class wasn’t in the generated CSS? Or did I just typo a class name, so something had “px-13” as a class, which doesn’t exist?

It turns out I made a lot of typos. I kept wasting time trying to debug why my newly modified CSS wasn’t applying any styles, only to realize I’d made a typo.

Wouldn’t it be nice if I could get the Rust compiler to check my CSS class names? Yes, that would be nice.

In fact, the Seed quickstart template I’d been using provides exactly that through a PostCSS⁷ plugin that generates Rust code from your Tailwind config.

Wouldn’t it be nice to have something like that for Dioxus? And maybe it could be generic enough for any framework? And maybe it could be written in Rust?

Yes, that would be fantastic! And I’ll talk about that in my next blog post, covering my new tailwindcss-to-rust tool. I’ll also cover the Dioxus helper crate I wrote that makes it super easy⁸ to use SVG icons from HeroIcons in your Dioxus project.

1. The first commit was in March, 2019, so it’s been about three years.
2. TS = TypeScript. Your app could also be in another language like Elm or PureScript.
3. Compiled from SASS or generated by tailwindcss or, in the case of plain CSS files, just concatenated together and maybe minified.
4. Compiled from ES2015 or TS or JSX or Elm (probably using Babel) and processed to handle imports, then concatenated into one file and maybe minified.
5. Fortunately, I haven’t had to do that yet, because these sorts of cross-language integrations are often painful in my experience.
6. Of course, “can” is the operative word here. Am I doing this consistently? No, because I’m doing a lot of experimentation so I’m okay with some quick cut and paste for now. But if I was building something that I expected many other people to hack on and maintain, I would (I hope) exercise more discipline.
7. Oh yay, another tool to run and another config file in my project I don’t understand.
8. Barely an inconvenience!

What Do I Want from My Next Job? 2022 Edition

Fri, 11 Feb 2022 09:35:56 -0600

The last time I wrote about this was 8 days after I left my last position, at ActiveState. Since writing that post, I’ve continued to think about what I want, and my thinking has evolved.

TLDR:

Top tier TC¹ or a 4 day week, and both would be ideal (but unlikely).
5+ weeks of PTO.
Yes++ to Rust. No Java or PHP.
No companies in certain fields like crypto.
I want to do a significant amount of IC work, even if I’m in a management position.

Reflecting on how I felt last October, I was well and truly burned out. The past few years had been quite tough for me. In September of 2020, my mother died. Then a few months later, my father moved to Minneapolis from Florida. We purchased a duplex together. He moved in right away, and my wife and I moved in a few months later in April of 2021.

This was a lot to deal with. Soon after, some things had happened at my job that had made me unhappy. I think if this had been a different time in my life, I would have fought harder to change these things². But at this particular time, I mostly felt exhausted and discouraged. I didn’t have the energy to tackle the issues that had come up. So instead, I waited until we sold our previous house, which gave us enough cash in the bank for me to take some time off.

I’m really glad I did that. This break has been fantastic. I’ve ended up coding more during this break than I did at ActiveState. It’s amazing how easy it is to write code when you’re the only customer and you can work on whatever you want.

But I’ve also been thinking more about what I want from my next job, and more broadly, what I want out of life³.

Working Less vs Compensation

Last time I wrote about this, I said this:

~~My number one criteria for my next job is being able to work 4 days per week (or less)~~.

But I think this was mostly my burnout talking. While I do like the idea of working less, what I realized was that this should be a long-term goal too. That means that I should be willing to work a bit more now to work less later. Specifically, if I can retire earlier than average, that’s a lot less working.

So here’s my new number one goal:

I want to work less, but that doesn’t have to happen this year or the next.

All of which is to say that I think I’d be perfectly happy to take a higher compensation 5-day job right now so I can save a significant amount of money towards earlier retirement. I’m not aiming for the full FIRE. I’m already too old for that. But I think it’s realistic to aim to retire in my late 50’s or early 60’s.

But I’d still consider a 4-day week too, with the expectation that the highest TC jobs will probably require 5 day weeks.

PTO

I also said:

I also want at least 5 weeks of PTO per year.

This still seems pretty reasonable. I’m sticking with this one, especially if I’m working 5 days a week.

Languages and Technology

I’d still love to work with Rust. During the last 5 months I’ve mostly been writing Rust, and I’ve greatly enjoyed it. I haven’t touched Go since I left ActiveState, though I plan to refresh myself before I start interviewing in earnest.

Everything else I said in my last post about this topic still stands true.

Company Size and Stage

I’m still open to pretty much anything. If I want to focus more on total compensation, I think I will by necessity be looking at larger companies as well, and that’s fine.

IC⁴ or Management?

Same as last time. I’m open to either, but if I’m managing I’d like to be able to code as well.

Company’s Product

Again, same as last time.

No crypto/blockchain.
No social networking⁵.
No gig economy work for non-professionals.

I’d also add “no companies that use animals or animal products”, though this is uncommon in the tech field anyway. I guess I’d probably be avoiding many biotech companies because of animal testing, but given my skill set, I don’t think I’d be a good fit for that field anyway.

Narrowing it Down

My hard requirements are:

Top tier TC or a 4 day week, and both would be ideal.
5+ weeks of PTO.
Yes to Rust. No Java or PHP.
No companies in the unacceptable fields I’ve mentioned.
I want to do a significant amount of IC work, even if I’m in a management position.

1. TC = total compensation
2. Yes, I’m being vague. There’s no need to air dirty laundry.
3. I’m 48, which means my life is very likely more than half over. I think it’s good to keep that in mind as I make big decisions.
4. IC = individual contributor, aka not management
5. I also included “surveillance capitalism” in my last post, but it occurs to me I’m not 100% sure what that means. I need to think this one through a bit more clearly. I know I wouldn’t work for Facebook, but would I work for Google? My gut instinct is that they have many products that seem perfectly fine to work on, like GCP.

My Rust Frontend Experiences

Tue, 08 Feb 2022 17:11:40 -0600

As I mentioned in a previous post, I’ve been working on a music player for a while as a fun side project. Though since I’ve been jobless this has actually been my primary project, and I’ve probably spent more time coding than I did at work¹.

This almost looks like a real app but don't be fooled. Half the buttons don't work yet.

My goal was always to build a backend that could support multiple frontends, especially a web app and mobile apps. I decided to work on the web app frontend first. I wanted to build a frontend quickly so I could work on the backend and importer and have a way to exercise them.

Rust can compile down to WASM, which means you can run Rust in the browser, and there are quite a few Rust frameworks for frontend development. Check out this LogRocket blog post for a good list that’s current as of early 2022². There are a number of options including some that provide an experience very similar to React or Elm.

My initial plan was to use Rust, but after a little investigation I ended up trying Flutter instead. It promises to support both mobile apps and the web, and there was no Rust frontend framework that did that. But then some quick experiments with Flutter’s web output convinced me not to use it. Flutter’s web renderer works by creating a <canvas> tag and drawing your entire app inside it. This means that nothing in the app works the way you’d expect. You can’t even select text without explicitly re-implementing it. So it basically breaks everything the browser does. No. Just no.

So it was back to Rust. After looking at a few options, I decided to start with Seed. It was web only, unfortunately, but it seemed like a good design. I got something working fairly quickly and it was a good experience. The message-passing based API meant I had to create a lot of Msg enums, basically one per component with any interactive piece. And then I was constantly calling some_view(foo).map_msg(Msg::SomeViewMsg) because of the message types. If this makes no sense, just trust me that it’s a bit annoying, but not a dealbreaker.
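To make the Msg wrapping a bit more concrete, here is a minimal sketch in plain Rust. The `Node` type below is a toy stand-in I made up for illustration, not Seed's real `Node`, and `VolumeMsg`, `volume_view`, and `root_view` are hypothetical names. It only shows the shape of the pattern: each component has its own message enum, and the parent wraps the child's messages in one of its own variants.

```rust
// Toy stand-in for a framework view node. This is NOT Seed's real type;
// it just carries the message that would be sent when the node is clicked.
#[derive(Debug, PartialEq)]
struct Node<M> {
    on_click: Option<M>,
}

impl<M> Node<M> {
    // Convert a node that emits child messages into one that emits
    // parent messages, by wrapping each message with `f`.
    fn map_msg<N>(self, f: impl FnOnce(M) -> N) -> Node<N> {
        Node {
            on_click: self.on_click.map(f),
        }
    }
}

// Messages for a hypothetical volume component.
#[derive(Debug, PartialEq)]
enum VolumeMsg {
    Mute,
}

// Messages for the parent component, with one variant per child.
#[derive(Debug, PartialEq)]
enum Msg {
    Volume(VolumeMsg),
}

fn volume_view() -> Node<VolumeMsg> {
    Node {
        on_click: Some(VolumeMsg::Mute),
    }
}

// The parent cannot return a Node<VolumeMsg>, so every child view call
// ends up with a `.map_msg(...)` tacked on.
fn root_view() -> Node<Msg> {
    volume_view().map_msg(Msg::Volume)
}
```

With more than a handful of interactive components, that wrapper variant and `map_msg` call get repeated for each one, which is the annoyance described above.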

But then a month ago, Jonathan Kelley announced Dioxus, which supports the web as well as mobile³ and desktop apps. This was exactly what I’d wanted!

So I decided to give it a try. If it worked well for the web app, I’d be way ahead on desktop and mobile versions too. I’ve been working with it for a few weeks and I quite like it. It’s very React-like, though since my last use of React was in 2017, the whole “hooks” thing is new to me. The Dioxus Discord community has been great, and the author has been incredibly responsive to questions and bug reports. Often he fixes things within minutes or hours of my report, and he’s also been very helpful when I’m confused about how to do something (which happens a lot).

I even wrote a first draft of a (web-only for now) router, which was a fun learning exercise for me. The Dioxus author merged it pretty quickly⁴, so it’s there for anyone else to try out too.

I’ve also released a few more Rust crates related to frontend work, but I’ll go into those in a future blog post. I’ll also talk a bit about how I set up the development environment, and how I’ve so far managed to avoid needing any Node tools like webpack.

I’ll be looking for a new position soon⁵, but until then I’ll keep working on this app. It’s been a great learning experience, and a lot of fun too.

Hey, I was in management. I wasn’t slacking. ↩︎
Which means it will be out of date in a few months? Weeks? ↩︎
Just iOS for now though. ↩︎
So maybe he doesn’t have the best judgement all the time. ↩︎
My current plan is to start my search in March, but if you have a great Rust job for me please let me know. ↩︎

My PDF Resume Script

Sat, 08 Jan 2022 10:26:12 -0600

Edit 2022-01-19: I’ve since moved the script to a public repo. Also, I’ve reformatted my resume to make it a bit shorter, so here’s a more recent PDF version.

I only want to have one canonical resume, and I want to keep it on my personal website. That makes it trivial to update, and anyone with the link can see the latest version at all times.

But unfortunately very few job application systems will accept a link to a resume. Most want a document of some kind. I didn’t want to maintain a second copy as a Google Doc or something like that, so I wrote a script to transform the web version to a PDF. ~~Here’s a copy of the PDF version~~.

When I first looked into doing this I was afraid it would be a lot of work. Fortunately, it turned out to be super easy¹. It’s in Perl, of course. This script is true glue code, and Perl made this work trivial.

All it does is grab the raw HTML from the page, munge it a bit, and then pass it through Pandoc. Pandoc is great. It uses LaTeX as the intermediate format to generate the PDF, so it ends up looking very professional (IMO) with no work on my part!

I wrote the first version back in 2016 when I knew I was looking for a new position. I’ve made a few tweaks to it over the years as the web version has changed, but not many.

So in summary, yay Perl, yay Pandoc.

Barely an inconvenience. ↩︎

Working with MusicBrainz Name Data

Wed, 22 Dec 2021 23:18:15 -0600

Since leaving my job last October I’ve been working on various personal projects for fun and learning. One of those is a music player¹ written entirely in Rust, including a web-based frontend using Seed, which is an Elm-like frontend framework in Rust. I’d like to write some other posts about that as well, but today let’s talk about artist names, specifically how these are represented in the MusicBrainz data.

I have a lot of Japanese music², and one of the main reasons I don’t like any of the music players I’ve tried so far³ is their handling of non-Latin script names and titles. More specifically, most players only allow one name (plus a sortable name if you’re lucky). I find this annoying because what I’d really like to see in many cases is a Latin transcription or English translation instead of the original. For reference, a transcription is a phonetic representation of a name.

I prefer to see a transcribed name like “Ryokushaka” instead of “緑黄色社会”. I also prefer the transcription over the translation, “Green Yellow Society”. But it’s nice to be able to see the Japanese name as well, especially if I’m trying to search the web for information about the artist or search YouTube for videos.

So what I really want is a tool that can handle multiple canonical values for each name or title, specifically the name according to the artist as well as optional transcribed and translated names for non-Latin names. And each of those canonical names also needs a sortable version (at least for artists), so “The Beatles” sorts as “B”, not “T”.

In order to accomplish this, I need to get more data than my MP3 files have. The best data source I’ve found is the MusicBrainz project, which is a fantastic open source/open data project for collecting information about all recorded music, or in their words, “MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.” It’s a lot like Wikipedia in that anyone can make an account and contribute edits. It’s also a lot like Wikipedia in that the quality and correctness of entries varies wildly.

MusicBrainz (MB) allows for artists, releases (albums), and tracks to have many different names/titles. I will focus on artists for now, since artists have the most complex name data. An artist has a primary name, an optional sortable name, and zero or more aliases. Each alias has a name, an optional sortable name, an optional locale, a boolean indicating whether it’s the primary alias for a locale, and a type. Artists may also appear with different names in artist credits, either of their own albums or others, like compilation albums or as guest artists on someone else’s album.

The possible alias types are “Artist name”, “Legal name”, and “Search hint”. An “Artist name” is another version of the name they perform under. A “Legal name” alias is used to record the name for people who perform under an alias, like Madonna or Lady Gaga. I’m just filtering these out, since they’re not interesting for my purposes. A “Search hint” can be a common misspelling of their name, a fan nickname, or anything else that helps the search engine be smarter.

To make things a little more complex, MB has a policy of having the primary sortable name always be in Latin script, even if the primary name is not.

Let’s use Kenichi Asai⁴ as an example. For reference, his personal name (aka “first name”) is “Kenichi” and his family name (aka “last name”) is “Asai”. In English we’d typically write “Kenichi Asai”, but you might also see “Asai Kenichi”, because Japanese usage always puts the family name first. Sometimes you’ll also see “ASAI Kenichi” to denote that “Asai” is the family name.

Here are all of his names in the MB data:

Name	Sortable name	Locale	Alias type
Primary name
浅井健一	Asai, Kenichi
As I noted, MB has a policy of always using a Latin script sortable name for primary names.
Aliases
Kenichi Asai	Asai, Kenichi	en (primary)	Artist name
浅井健一	あさいけんいち	ja (primary)	Artist name
The sortable name here is the Kanji (Chinese characters) written entirely in Hiragana⁵ in order to make the pronunciation clear.
浅井 健一	あさいけんいち	ja	Artist name
This one differs from the previous one by having a space between the family and personal names, which is not how Japanese is typically written.
Asai Ken’ichi			Search hint
An alternate Latin transcription of his personal name. There are many different Romanization systems for Japanese.
Benzie	ベンジー		Artist name
A nickname of his, which as far as I can tell is not used for any music credits, so this should probably be a Search hint instead.
Artist credits
Asai Kenichi
Kenichi Asai
Santana feat. Kenichi Asai
浅井健一

Wow, that’s a lot of names! But we can see that there are a number of duplicates.

So the question is how to take that list and boil it down to the following elements:

The “real” name, by which I mean the name the artist typically uses in their home locale.
A sortable version of that real name if it differs from the real name.
A transcribed name if the real name is not in Latin script.
A translated name if the real name is not in Latin script and the real name is not a personal name. In other words, it doesn’t make sense to “translate” 浅井健一 (Kenichi Asai), which is a personal name. But it does make sense to translate a band name, for example translating 緑黄色社会 into “Green Yellow Society”.
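As a rough sketch, the four elements above could be modeled like this. The type and method names (`Name`, `ArtistNames`, `display`, `sort_key`) are made up for illustration and are not taken from the actual player code. The display order (transcription first, then translation, then the real name) follows the preference described earlier in this post.

```rust
/// One spelling of a name, plus a sort key when that differs from the name.
#[derive(Debug, Clone, PartialEq)]
struct Name {
    name: String,
    sortable: Option<String>,
}

/// The set of names we want to end up with for one artist.
/// Only `real` is required; the others only make sense when the real
/// name is not in Latin script.
#[derive(Debug, Clone, PartialEq)]
struct ArtistNames {
    real: Name,
    transcribed: Option<Name>,
    translated: Option<Name>,
}

impl ArtistNames {
    /// The name to show a reader of Latin script: the transcription if
    /// there is one, then the translation, then the real name.
    fn display(&self) -> &Name {
        self.transcribed
            .as_ref()
            .or(self.translated.as_ref())
            .unwrap_or(&self.real)
    }

    /// The key to sort by: the sortable form of the displayed name,
    /// falling back to the displayed name itself.
    fn sort_key(&self) -> &str {
        let shown = self.display();
        shown.sortable.as_deref().unwrap_or(shown.name.as_str())
    }
}
```

With this shape, “The Beatles” with a sortable name of “Beatles, The” sorts under “B”, and 緑黄色社会 displays as its transcription when one is available.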

The answer, of course, is to use a whole bunch of heuristics, because this is impossible to get 100% right, especially since the source data can be incorrect and incomplete. There’s not a lot of consistency in how people end up inputting non-Latin names, and I’m pretty sure the artist credit of “Asai Kenichi” is just wrong and the correct credit is either “浅井健一” or “Kenichi Asai”.

So here’s the algorithm I have so far …

First, we need a way to figure out if a name is in Latin script or not. A name is Latin (for my purposes) if it matches this regex: \A[\p{Latin}&\P{L}]+\z. The \p{Latin} matches any Unicode character marked as being part of the Latin script⁶. This includes ASCII letters, but also things like “ą”, “Ň”, etc. The \P{L} matches any character that is not a letter, including numbers, emoji, etc. These two matches are combined together into a single character set. This is a pretty good heuristic and will get it right for pretty much anything that an English reader can pronounce (or reasonably mangle), including names like “Björk” or “Sigur Rós”, while not matching things we can’t pronounce, like “緑黄色社会” or “Чайковский” (Tchaikovsky).

The only place this goes wrong is names that include non-Latin characters for emoticons like “(┛◉Д◉)┛彡┻━┻”. While it does include a Cyrillic character, “Д”, and a Chinese radical, “彡”, it is not in a foreign language and doesn’t need to be transcribed or translated (though it’s also completely unpronounceable). Looking at the MB data, this is a thing. There’s an artist named “（´・д・）ﾉ” in the database. I’m not sure how you’d pronounce this. Treating this as non-Latin is fine. If the artist has an alternate alias that’s speakable, then it’d be good to have that, and if they don’t, that’s okay.
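For readers who want to play with the idea without a regex engine, here is a rough standard-library-only approximation of that check. It is not the regex above. It hard-codes a few Unicode block ranges as a stand-in for `\p{Latin}`, and uses `char::is_alphabetic` as a stand-in for “is a letter”, so it will disagree with the real thing on some edge cases. The function names are hypothetical.

```rust
/// Very rough stand-in for `\p{Latin}`: a few hard-coded Unicode blocks.
/// The real Latin script property covers more ranges than this.
fn in_latin_block(c: char) -> bool {
    matches!(
        c,
        'A'..='Z'
            | 'a'..='z'
            | '\u{00C0}'..='\u{024F}' // Latin-1 Supplement letters, Latin Extended-A and -B
            | '\u{1E00}'..='\u{1EFF}' // Latin Extended Additional
    )
}

/// A name counts as Latin if it is non-empty and every character is
/// either not a letter at all (digits, spaces, punctuation, emoji) or
/// a letter from one of the Latin blocks above.
fn is_latin_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| !c.is_alphabetic() || in_latin_block(c))
}
```

Under these assumptions “Björk” and “Sigur Rós” pass, while “緑黄色社会”, “Чайковский”, and the emoticon name with the Cyrillic “д” do not.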

I start by pulling all the possible names from the MB data and categorizing them into three types:

The Primary name, which is the name attached to the artist record, as opposed to aliases.
Aliases which aren’t search hints. Names used in artist credits are also put in this category.
Search hints.
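A small sketch of that categorization, with made-up type names (`AliasType`, `Source`, `Category`) rather than anything from the real importer. It assumes that “Legal name” aliases are dropped entirely, as mentioned earlier, and that artist credits are treated like non-hint aliases.

```rust
/// The alias types MusicBrainz defines for artists.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AliasType {
    ArtistName,
    LegalName,
    SearchHint,
}

/// Where a name was found in the MusicBrainz data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    PrimaryName,
    Alias(AliasType),
    ArtistCredit,
}

/// The three buckets described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Category {
    Primary,
    Alias,
    SearchHint,
}

/// Returns `None` for names that are filtered out (legal names).
fn categorize(source: Source) -> Option<Category> {
    match source {
        Source::PrimaryName => Some(Category::Primary),
        // Names used in artist credits are treated like ordinary aliases.
        Source::ArtistCredit => Some(Category::Alias),
        Source::Alias(AliasType::ArtistName) => Some(Category::Alias),
        Source::Alias(AliasType::SearchHint) => Some(Category::SearchHint),
        // Assumption: legal names are simply discarded.
        Source::Alias(AliasType::LegalName) => None,
    }
}
```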

I have a special case for the fact that MB uses Latin script sortable names for non-Latin names. In some cases the only instance of a Latin name is as the primary sortable name. In that case, I add that sortable name to my list of names as an alias. To make things trickier, this sortable name may be something like “Asai, Kenichi”, so I transform it back to “Kenichi Asai” if it contains “, ” (a comma followed by a space)⁷.
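The comma flip is simple enough to sketch in a few lines. The function name `unsort_name` is hypothetical, and this version deliberately keeps the naive behavior of splitting on the first comma-plus-space it finds.

```rust
/// Turn a sortable name like "Asai, Kenichi" back into "Kenichi Asai".
/// Names without ", " are returned unchanged.
///
/// This splits on the first ", " only, so a name that is really a
/// comma-separated list would get mangled.
fn unsort_name(sortable: &str) -> String {
    match sortable.split_once(", ") {
        Some((before, after)) => format!("{} {}", after, before),
        None => sortable.to_string(),
    }
}
```

For example, `unsort_name("Asai, Kenichi")` gives “Kenichi Asai”, and `unsort_name("moools")` gives “moools” back untouched.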

Once I have all these names I have to figure out the best primary name and sortable name, transcribed name and sortable name, and translated name and sortable name. Of these, only the primary name is required. Everything else is optional.

I should only have one primary name. If this name is Latin, I use that along with the corresponding primary sortable name. If the primary name is not Latin but its sortable name is Latin, then I use the name but not the sortable name.

It’s very important to take the primary name as given by the MB data. Originally I tried preferring non-Latin names, but that doesn’t work at all. There are lots of Japanese bands whose canonical names are in English, like the pillows and The Back Horn. While these bands do have Japanese aliases, their correct name is in English. There are also even weirder cases like a band named moools, which is not exactly in English but is still always written in Latin script.

If the primary name is not Latin, then I look at the remaining names to try to figure out whether there are transcribed and/or translated names that can be used.

To do that I first filter out all the names that match the primary name and sortable name, as well as any names not in Latin script. Then I go through whatever’s left and split each name into words. For each word, I look up the word in a dictionary (right now a copy of /usr/share/dict/words) and count how many words in the name are in our dictionary⁸.

If there’s only one name, then if it contains any known words it’s a translation, otherwise it’s a transcription.

If we have multiple possible names to choose from, I use the known word counts to (attempt to) distinguish a transcription from a translation. A name with more known words from the dictionary is more likely to be a translation. So if we have many possibilities, the one with the most known words is a translation, and everything else is a transcription. This part of the algorithm works well for band names.

As an illustration of how this works, let’s take one of my favorite bands, 東京事変. Their name translates to “Tokyo Incidents”, and the transcription is “Tokyo Jihen”. Because “Tokyo” and “Incidents” are both in the word list, the algorithm picks “Tokyo Incidents” as the translation, with two known words, and “Tokyo Jihen” as the transcription, with just one known word.
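Here is a minimal sketch of the known-word heuristic. It uses a tiny in-memory word set in place of /usr/share/dict/words, and the names `known_word_count`, `classify`, and `Kind` are hypothetical. A few details are my own assumptions, not something stated above: words are compared in lowercase, a hyphenated word counts once if all of its parts are known, and when several names tie for the most known words the first one wins.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Translation,
    Transcription,
}

/// Count how many whitespace-separated words in `name` are in `dict`.
/// `dict` is expected to hold lowercase words.
fn known_word_count(name: &str, dict: &HashSet<&str>) -> usize {
    name.split_whitespace()
        .filter(|word| {
            let lower = word.to_lowercase();
            dict.contains(lower.as_str())
                // Assumption: "Green-Yellow" counts as one known word
                // when every hyphen-separated part is known.
                || (lower.contains('-') && lower.split('-').all(|part| dict.contains(part)))
        })
        .count()
}

/// Label each candidate name. The name with the most known words (if
/// that count is above zero) is the translation. Everything else is a
/// transcription. With a single name this reduces to "any known words
/// means translation".
fn classify<'a>(names: &[&'a str], dict: &HashSet<&str>) -> Vec<(&'a str, Kind)> {
    let counts: Vec<usize> = names.iter().map(|n| known_word_count(n, dict)).collect();
    let best = counts.iter().copied().max().filter(|&m| m > 0);
    let winner = best.and_then(|m| counts.iter().position(|&c| c == m));

    names
        .iter()
        .enumerate()
        .map(|(i, &name)| {
            let kind = if Some(i) == winner {
                Kind::Translation
            } else {
                Kind::Transcription
            };
            (name, kind)
        })
        .collect()
}
```

Given a word set containing “tokyo” and “incidents”, this labels “Tokyo Incidents” (two known words) as the translation and “Tokyo Jihen” (one known word) as the transcription.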

This works less well for person names, since in that case I really don’t want to pick a translated name at all. There will be cases where a person’s name transcription ends up containing an English word, such as a Chinese name that transcribes as “Hui Ping”. A future revision will probably want to look at whether it’s dealing with a person or entity.

This gives me a list of possible transcriptions and translations, each of which is a name plus an optional sortable name. To pick the best one from each, I sort each list. If the name came from an alias for the en locale, I prefer that. If multiple names came from that locale, I tiebreak by looking at whether one was marked as the primary alias for that locale. If that doesn’t work, I prefer names that also have sortable names. And if there are still ties, I sort by string length and then just sort the names, in order to ensure that the algorithm picks the same name every time given the same inputs.
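The tiebreaking rules translate fairly directly into a sort key. This is a sketch under a few assumptions: the `Candidate` struct, `cand`, and `pick_best` are made-up names, and “sort by string length” is taken to mean shorter names first, which is a guess on my part.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    sortable: Option<String>,
    locale: Option<String>,
    primary_for_locale: bool,
}

/// Small helper to build candidates without a lot of `to_string()` noise.
fn cand(name: &str, sortable: Option<&str>, locale: Option<&str>, primary: bool) -> Candidate {
    Candidate {
        name: name.to_string(),
        sortable: sortable.map(str::to_string),
        locale: locale.map(str::to_string),
        primary_for_locale: primary,
    }
}

/// Pick the best candidate, or `None` if there are none.
fn pick_best(mut candidates: Vec<Candidate>) -> Option<Candidate> {
    // `false` sorts before `true`, so each "is this bad?" flag pushes
    // worse candidates toward the end of the list.
    candidates.sort_by_key(|c| {
        let is_en = c.locale.as_deref() == Some("en");
        (
            !is_en,                           // prefer names from the en locale
            !(is_en && c.primary_for_locale), // then the primary en alias
            c.sortable.is_none(),             // then names that have a sortable form
            c.name.chars().count(),           // then shorter names (assumption)
            c.name.clone(),                   // finally the name itself, for determinism
        )
    });
    candidates.into_iter().next()
}
```

Because the last element of the key is the name itself, the result does not depend on the order the candidates arrive in.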

Everything that isn’t picked is stored as a search hint, so I don’t have to remember the name the system chose for an artist.

Phew, that’s a lot!

I really enjoy working on this sort of problem, where we have messy data and have to find a best path through to get the results we want.

But I originally started working on this (weeks ago) to do a quick version of the data importer so I could whip up a quick backend so I could get back to where I started, which was the frontend. I had put the frontend on hold because I felt like I needed more real data to work with in order to make more progress on the frontend. Now my importer is capable of importing my entire music collection, so it’s time to whip up that backend and get back to the UI work.

I will probably release the code for this at some point but for now it’s in a private repo. ↩︎
The topic of how and why there’s so much amazing Japanese music as compared to non-Japanese music is an interesting one that I’m only sort of qualified to write about. Maybe I’ll get into this in a future blog post. ↩︎
I currently use Rhythmbox on my computer and YouTube Music on my phone. Both could be better. ↩︎
He’s a fantastic guitarist and songwriter. Here’s a video for one of his songs. Also, as an aside, the drummer has a right handed drum kit but is playing the hi-hat with his left hand, which is really weird to see. ↩︎
Hiragana is one of two Japanese syllabaries (like an alphabet but the components are syllables, not letters). It is used for Japanese words and Katakana, the other syllabary, is used for foreign words. In Japanese, word(s) are often sorted by their pronunciation rather than some more abstract system based on the properties of the kanji used to write the word(s)⁹. ↩︎
Thank dog that Unicode characters have so much metadata! Without this I’m not sure how I’d test if a name is Latin or not. ↩︎
I should probably not do this if there are two commas to handle band names that are lists, like if “Earth, Wind & Fire” used an Oxford comma, which they don’t. But the perfect is the enemy of the good, and I want to get a good enough version going so I can work on other things. ↩︎
If a word contains a hyphen I will also split that up and check the subparts to see if they’re words. This would handle something like “Green-Yellow”, which isn’t a word, but is clearly made of words. ↩︎
Sorting based on abstract properties is exactly how Chinese is sorted, since Chinese doesn’t really have a syllabary (except it does, but people don’t use the syllabaries for day to day stuff very much, unlike in Japan). ↩︎

What Do I Want from My Next Job?

Fri, 15 Oct 2021 10:25:12 -0500

I’m currently enjoying being unemployed, and I won’t be looking for a new position until some time in 2022, but I’ve been thinking about what I want from my next position. There are many things I’d like, but what are my priorities?

Update 2022: The more I’ve thought about this the more I’ve realized that I can achieve my goal of working less by aiming for higher compensation now in order to be able to work less in the future. All of which is to say that I’m no longer committed to finding a job with a 4-day week.

This post is my attempt to organize my own thoughts about this so I can be more careful and systematic when I start looking for my next position.

Working Less

I’ll be 48 by the time I start my next job. Realistically, that means my life is more than half over. While I don’t hate working, I don’t love it so much that I want to maximize the amount of time I spend doing it! So spending less time at work is my top priority.

My number one criterion for my next job is being able to work 4 days per week (or less). I’m talking about 4 eight hour days, not 4 ten hour days¹. A 3 day week would be even more amazing, but at that point it’s really a part time job. I don’t think I can find that except as a consultant and I don’t want to go back to consulting.

Fortunately, the 4 day work week seems to be getting a bit of traction recently. More and more companies are embracing this, though the absolute numbers are still small. I’ve found a couple job boards that make finding these positions easier, 4dayweek.io and People-First Jobs.

And even if a company isn’t advertising a 4 day week, sometimes you can negotiate for this individually. At my previous job, I had a 9 day fortnight (every other Friday off). And the job before that I did have a 4 day week. In both cases, this was something I negotiated, not a company policy.

I also want at least 5 weeks of PTO per year. Fortunately, this one seems much easier to find. Given the incredibly hot labor market in tech right now, I’m seeing many positions that offer this.

Of course, some places also offer the nebulous “unlimited vacation”. In the best case, employers do this so they can let people who want more time off have it without accruing a huge liability in untaken PTO for people who don’t.

But the devil is in the details. Clearly it’s not actually unlimited, since otherwise I could simply take all my time off and never work. So what does “unlimited” really mean? My plan is to ask everyone I talk to a few questions about this:

How much time did you take off over the past 12 months?
How much time did your coworkers and/or reports take off over the past 12 months?
How much time did members of upper management take off over the past 12 months?
What do you think the real upper limit on PTO is? (Because of course there is one!)

Some companies that offer unlimited vacation also have a mandatory minimum amount of vacation, which I think is a very good sign, as long as it’s enforced.

Compensation

If working less is my top priority, I may have to compromise a bit on compensation. None of the very top-paying companies offer 4 day weeks yet (AFAIK). But again, given the incredibly hot labor market in tech, I suspect I can find something that pays as much or more than my last position.

I’d love to share what my actual compensation at past positions was, because I think people don’t talk about this nearly enough. But sharing this information with all potential future employers seems like a tactical mistake. I compromised by putting my salary information into levels.fyi. But given how small ActiveState is, it may be quite some time before they have enough data points to show their salaries.

Languages and Technology

I would really love to work with Rust. I’ve been using it for personal projects for a while and I find it quite satisfying. That said, Rust jobs are rare, and many of them are either in fields I want to avoid, like cryptocurrency (see below), or they require expertise I don’t have, like graphics or low-level programming experience.

More generally, I’d really like to do something where I can learn something new. Building yet another REST app in a dynamic language or Go does not feel very exciting at this stage in my career.

Also, I won’t do Java (other JVM languages are fine) or PHP.

Company Size and Stage

In my career so far, I’ve found that I’ve most enjoyed working for a small company that is already profitable. That said, I’ve never worked for a startup that actually took off, nor have I ever worked for a large tech-focused company (like a FAANG, Stripe, Elastic, etc.). I think either of those could be really good experiences as well.

Surprisingly, I have seen a few startups offering 4 day weeks. My question is whether they end up in some sort of indefinite crunch mode and throw this out the window because of the pressure that a startup experiences.

All of this is to say that I’d consider just about anything except for a large company that isn’t a tech company. I did that once, and it was by far the worst work experience I’ve had. Never again!

IC² or Management?

At ActiveState, my most recent employer, I was a Team Lead, which included actual people management responsibilities. I had never really done this before, so I decided to take the position in order to try something new and do something that scared me. Challenges are good.

I enjoyed this position. My team size varied from 2-5 people over time based on restructurings, people leaving, etc. As a rough estimate, I’d say I spent 40% of my time on management stuff and 60% doing IC work like writing design docs, coding, code reviews, etc. This was a good balance for me.

I think I did a reasonably good job at management, though it was definitely a big learning curve. The people who worked for me gave me very positive reviews. The people above me also gave me good reviews, though they had more constructive criticism than my reports.

So what should I do next?

I’m pretty sure I wouldn’t enjoy being any higher up in the management chain. At any level above Team Lead, you have even less time for IC work. My manager, the Director of Engineering, did find some time to code, but not a lot of it. The CTO had even less, to the point where he mostly seemed to write what code he did in the evening or on weekends.

I’d be happy with a similar level of management responsibilities in the future. But I would also enjoy a higher level IC position (Staff/Principal/Grand Poobah Engineer). Strategically, the IC position might be a better choice, because I don’t want to advance any further up the management track than where I’d start.

Company’s Product

I’m not super picky about the product. I think anything can be interesting and rewarding, especially if you have engaged customers who really like what you’re building. A developer-focused product is always fun, and I enjoyed that aspect of ActiveState, but it’s not a necessity for me.

But there are a few things I don’t want to work on:

Cryptocurrency

I don’t know enough about this field to distinguish scams from real products, and it’s really hard for me to see how the upsides of these products outweigh their downsides.

Just no.

Gig Economy Work for Non-Professionals

Again, just say no. Everything I know about Uber/Lyft/Doordash/GrubHub/etc. is bad. They are predatory companies taking brutal advantage of people who are cash-poor and not great at math. Plus they seem to have no viable way of ever being profitable, so they’re basically just a scam to funnel VC³ money to founders, or from early VCs to later VCs⁴.

In my personal life, I’ve sworn off all food delivery services, and I’m doing my best to avoid “rideshare”⁵ services too.

I specify “non-professionals” because there are also gig products aimed at professionals, like graphic designers, software developers, etc. I’m not sure whether these products are purely bad in the same way, and I can imagine a product that was actually a net win for everyone involved.

What Else?

I feel like I’m probably missing some other field. I haven’t seen any MLM⁶ companies advertising for developers, but that would be a hard no if it came up.

Narrowing it Down

My hard requirements are:

A 4 day week (or less).
5+ weeks of PTO.
No Java or PHP.
No companies in the unacceptable fields I’ve mentioned.
I want to do a significant amount of IC work, even if I’m in a management position.

Everything else is a negotiation or compromise.

This is a post that makes me wish I had a commenting solution for this blog. But I’ll share this on Hacker News and hopefully I’ll get some good feedback there.

As a software developer, you’re not going to get more done in a ten hour day anyway! ↩︎
IC = individual contributor, aka not management ↩︎
VC = venture capital ↩︎
Though given how much funding goes into these companies, the VCs must think there is a way to make money here. ↩︎
Why is this called “rideshare”? What is being shared? ↩︎
MLM = multi-level marketing ↩︎

Let the Funemployment Begin

Thu, 07 Oct 2021 15:52:23 -0500

Today was my last day at ActiveState. I enjoyed my nearly five years there (starting in February of 2017), but for a variety of reasons I decided to leave. I’m in the very fortunate position of being able to be jobless for a while, so I’m not planning to look for anything new until January of 2022 at the earliest.

I have some programming projects of my own that I plan to work on, and I’ll post about them here if they turn into anything interesting. I’m also maybe open to short-term consulting gigs (a few days or weeks, not months) if someone wants to throw sufficiently large amounts of money at me. Email me if you’re interested.

Writing a Postgres SQL Pretty Printer in Rust: Part 2

Sat, 24 Apr 2021 13:00:33 -0500

It’s been a few weeks since my last post on this project. I was distracted by Go reflection and security issues with Perl IP address modules. But now I can get back to my Postgres SQL pretty printer project¹.

One of the challenges for this project has been figuring out the best way to test it. I tried a standard unit test approach before giving up and settling on integration testing instead, so in this post I’ll talk about what I tried and what I ended up with.

Series Links

Part 1: Introduction to the project and generating Rust with Perl
Part 1.5: More about enum wrappers and Serde’s externally tagged enum representation
Part 2: How I’m testing the pretty printer and how I generate tests from the Postgres docs

Unit Tests

Whenever possible, I prefer to write unit tests for my code. Let’s define “unit test”, since people use that term all the time, but with slightly different definitions.

When I say “unit tests” I’m referring to tests that test the functionality of a single module², focusing on its public API. If that module integrates with other modules, unit tests may require mocking. Some will say it always requires mocking, but I think that is often a waste of effort, so I avoid mocking in many cases. And though I said “focusing on its public API”, sometimes I will test private functions directly, if they are complex enough to warrant this and doing so is not too painful.

Unit tests are great when you can write them. They let you focus on one small piece of a code base at a time to make sure that it does what you expect. Good unit tests cover the standard use cases, corner cases (zero values, extreme values, etc.), various permutations of argument combinations, and error handling.

If you unit test all of your modules, you know that each module probably does what you want it to. This doesn’t tell you whether they all work together properly, but it’s a good start.

The bulk of the (non-generated) code for pg-pretty lives in one library crate named formatter, and initially this was all in the crate root’s lib.rs file, though I’m working on rewriting this in a branch named new-formatter (no links because the branch will be deleted after I merge it to master).

The original formatter code³ has many functions, but the vast majority of them work the same way. They take a struct⁴ or slice of structs defined in the parser crate’s ast module and return a string representing the formatted SQL for that piece of the AST⁵.

Here’s a simple function from that code:

fn format_range_var(&self, r: &RangeVar) -> String {
    let mut names: Vec<String> = vec![];
    if let Some(c) = &r.catalogname {
        names.push(c.clone());
    }
    if let Some(s) = &r.schemaname {
        names.push(s.clone());
    }
    names.push(r.relname.clone());

    let mut e = names
        .iter()
        .map(Self::maybe_quote)
        .collect::<Vec<String>>()
        .join(".");

    if let Some(AliasWrapper::Alias(a)) = &r.alias {
        e.push_str(&Self::alias_name(&a.aliasname));
        // XXX - do something with colnames here?
    }

    e
}

A RangeVar is a struct representing a name in a FROM clause. This will always have a relname (a table, view, or subselect), with optional database and schema names, like some_db.some_schema.some_table. In addition, it may have an alias.

This is a relatively simple example, as this type of struct doesn’t contain any complex structs itself. Other functions, like for formatting a select statement, mostly consist of calls to other formatting functions:

fn format_select_stmt(&mut self, s: &SelectStmt) -> R {
    let t = match &s.target_list {
        Some(tl) => tl,
        None => return Err(Error::NoTargetListForSelect),
    };
    let mut select = self.format_select_clause(t)?;
    if let Some(f) = &s.from_clause {
        select.push_str(&self.format_from_clause(f)?);
    }
    if let Some(w) = &s.where_clause {
        select.push_str(&self.format_where_clause(w)?);
    }
    if let Some(g) = &s.group_clause {
        select.push_str(&self.format_group_by_clause(g)?);
    }
    if let Some(o) = &s.sort_clause {
        select.push_str(&self.format_order_by_clause(o)?);
    }

    Ok(select)
}

This is an early, incomplete version which doesn’t handle HAVING clauses, LIMIT clauses, window clauses, locking clauses, UNION queries, etc. The point I’m trying to make is that these functions get complex quickly. While breaking them down into lots of smaller functions helps, I’ve ended up with a huge number of small functions.

So how do we test this with unit tests? To do that, we need a way to produce structs from the ast module like RangeVar or SelectStmt. For reference, here’s RangeVar:

pub struct RangeVar {
    // the catalog (database) name, or NULL
    pub catalogname: Option<String>, // char*
    // the schema name, or NULL
    pub schemaname: Option<String>, // char*
    // the relation/sequence name
    pub relname: String, // char*
    // expand rel by inheritance? recursively act
    // on children?
    #[serde(default)]
    pub inh: bool, // bool
    // see RELPERSISTENCE_* in pg_class.h
    pub relpersistence: Option<char>, // char
    // table alias & optional column aliases
    pub alias: Option<AliasWrapper>, // Alias*
    // token location, or -1 if unknown
    pub location: Option<i64>, // int
}

The RangeVar struct only refers to one other ast enum or struct, AliasWrapper, which in turn contains an Alias. But the SelectStmt is much more complex⁶:

pub struct SelectStmt {
    // NULL, list of DISTINCT ON exprs, or
    // lcons(NIL,NIL) for all (SELECT DISTINCT)
    #[serde(rename = "distinctClause")]
    pub distinct_clause: Option<List>, // List*
    // target for SELECT INTO
    #[serde(rename = "intoClause")]
    pub into_clause: Option<IntoClauseWrapper>, // IntoClause*
    // the target list (of ResTarget)
    #[serde(rename = "targetList")]
    pub target_list: Option<List>, // List*
    // the FROM clause
    #[serde(rename = "fromClause")]
    pub from_clause: Option<List>, // List*
    // WHERE qualification
    #[serde(rename = "whereClause")]
    pub where_clause: Option<Box<Node>>, // Node*
    // GROUP BY clauses
    #[serde(rename = "groupClause")]
    pub group_clause: Option<List>, // List*
    // HAVING conditional-expression
    #[serde(rename = "havingClause")]
    pub having_clause: Option<Box<Node>>, // Node*
    // WINDOW window_name AS (...), ...
    #[serde(rename = "windowClause")]
    pub window_clause: Option<List>, // List*
    // untransformed list of expression lists
    #[serde(rename = "valuesLists")]
    pub values_lists: Option<Vec<List>>, // List*
    // sort clause (a list of SortBy's)
    #[serde(rename = "sortClause")]
    pub sort_clause: Option<Vec<SortByWrapper>>, // List*
    // # of result tuples to skip
    #[serde(rename = "limitOffset")]
    pub limit_offset: Option<Box<Node>>, // Node*
    // # of result tuples to return
    #[serde(rename = "limitCount")]
    pub limit_count: Option<Box<Node>>, // Node*
    // FOR UPDATE (list of LockingClause's)
    #[serde(rename = "lockingClause")]
    pub locking_clause: Option<Vec<LockingClauseWrapper>>, // List*
    // WITH clause
    #[serde(rename = "withClause")]
    pub with_clause: Option<WithClauseWrapper>, // WithClause*
    // type of set op
    pub op: Option<SetOperation>, // SetOperation
    // ALL specified?
    #[serde(default)]
    pub all: bool, // bool
    // left child
    pub larg: Option<Box<SelectStmtWrapper>>, // SelectStmt*
    // right child
    pub rarg: Option<Box<SelectStmtWrapper>>, // SelectStmt*
}

Besides having many more fields than RangeVar, most of those fields are other types of ast nodes. In fact, many of these are a boxed Node or a List, which is a Vec<Node>. One field, values_lists, is a Vec<List>! A Node is an enum which can be any AST node⁷.
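To make the nesting concrete, here is a minimal sketch of a recursive node enum. The variants (ColumnRef, Not, And) and the depth helper are made up for illustration and are far simpler than the generated ast module. The only point is that a Node can hold a boxed Node or a List of Nodes, so values can nest to any depth.

```rust
// Hypothetical, heavily reduced stand-ins for the generated AST types.
type List = Vec<Node>;

#[derive(Debug)]
enum Node {
    ColumnRef(String),
    Not(Box<Node>),
    And(List),
}

/// Counts how many levels of nodes are nested inside `node`, including itself.
fn depth(node: &Node) -> usize {
    match node {
        Node::ColumnRef(_) => 1,
        Node::Not(inner) => 1 + depth(inner),
        Node::And(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
    }
}

fn main() {
    // Roughly "a AND NOT b", built by hand.
    let expr = Node::And(vec![
        Node::ColumnRef("a".to_string()),
        Node::Not(Box::new(Node::ColumnRef("b".to_string()))),
    ]);
    println!("{:?} has depth {}", expr, depth(&expr));
}
```

Even with three variants, a test would have to pick some finite set of shapes out of an unbounded space, which is the problem the real AST has at a much larger scale.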

So how exactly do we produce the various SelectStmt structs that we’d want to feed into format_select_stmt for testing? The structs themselves have no constructors. The parser only generates them based on the results of parsing, which is done in C code for which there is no public API. That C code produces a JSON representation of the AST, which we deserialize into structs.

So the only way left to do this is to construct them “by hand” with code like this:

fn make_range_var(
    c: Option<&str>, s: Option<&str>, r: &str, a: Option<&str>,
) -> Node {
    let alias = match a {
        Some(a) => Some(AliasWrapper::Alias(Alias {
            aliasname: a.to_string(),
            colnames: None,
        })),
        None => None,
    };
    let catalogname = match c {
        Some(c) => Some(c.to_string()),
        None => None,
    };
    let schemaname = match s {
        Some(s) => Some(s.to_string()),
        None => None,
    };

    Node::RangeVar(RangeVar {
        catalogname,
        schemaname,
        relname: r.to_string(),
        inh: false,
        relpersistence: None,
        alias,
        location: None,
    })
}

This isn’t entirely terrible, but as I noted before, the RangeVar struct is one of the simpler structs in the AST. The equivalent for a SelectStmt would be absolutely enormous. Even for a RangeVar, constantly having to pass mostly None for the arguments gets old very fast. To make this simpler, I made a macro:

macro_rules! range_var {
    ( $relname:literal $(,)? ) => {
        make_range_var(None, None, $relname, None)
    };
    ( $relname:literal AS $alias:literal ) => {
        make_range_var(None, None, $relname, Some($alias))
    };
    // The metavariable names follow make_range_var's argument order:
    // catalog first, then schema, then relation.
    ( $schemaname:literal, $relname:literal $(,)? ) => {
        make_range_var(None, Some($schemaname), $relname, None)
    };
    ( $schemaname:literal, $relname:literal AS $alias:literal ) => {
        make_range_var(None, Some($schemaname), $relname, Some($alias))
    };
    ( $catalogname:literal, $schemaname:literal, $relname:literal $(,)? ) => {
        make_range_var(Some($catalogname), Some($schemaname), $relname, None)
    };
    ( $catalogname:literal, $schemaname:literal, $relname:literal AS $alias:literal ) => {
        make_range_var(
            Some($catalogname),
            Some($schemaname),
            $relname,
            Some($alias),
        )
    };
}

This covers every possible combination of optional arguments, so I could write range_var!("people") or range_var!("people" AS "persons") or range_var!("some_schema", "people") and so on.

This made the tests more concise, but the equivalent macro implementation for a SelectStmt might be hundreds or even thousands of lines long.

Giving Up on Unit Tests

I quickly realized that this approach simply wouldn’t scale. The structs that the formatter deals with are complex, there are a lot of them, and each struct has many possible variations of optional fields, types of contained nodes, etc.

With my macro plus function approach, I’d have to write tens of thousands of lines of code in support of these unit tests. And that code would itself be so complex that it would really demand its own test suite!

But that’s not even the biggest problem. The biggest problem is that because these AST structs are produced by the Postgres parser C code, I really have no idea what all the possibilities for a given struct are. Given that many structs simply contain Node or List structs, the possibilities are literally limitless.

So in summary:

Writing struct generation code would be much more work than writing the formatter.
The struct generation code would be so complex that it’d require its own test suite.
There’s no good way for me to know what structs to generate for tests without lots of real-world examples.

That last bullet point mentions “lots of real-world examples”. That’s what I needed. So how to get them? I could scour my own projects for examples, though in many cases I use an ORM, so it’s not a trivial matter of just copying some SQL. I could look for projects on GitHub that use Postgres. Maybe I could look on Stack Overflow.

But finally, I had a good idea⁸. The Postgres documentation is quite extensive, and it includes many SQL examples! If I could extract those examples then those examples could form the basis of a test suite.

Integration Tests to the Rescue

Taking a step back, I realized that what I really wanted to test was the formatter as a whole. While unit tests are great, I could greatly simplify my test code by simply comparing SQL statement input to SQL statement output. Any time my output diverged from what I expected, that’s a bug in the formatter (or in my expectations). And if the formatter panics because it can’t handle a particular node⁹, that’s a missing piece of the implementation.

This was yet another case where Perl came in handy. I wrote a quick script to parse the entire documentation tree¹⁰ and find programlisting elements which contained SQL. These are then put into files named after the doc file that contained them, giving me files with content like this:

++++
1
----
CREATE TABLE base_table (id int, ts timestamptz)
----
???
----

The first section, 1, is a test description, to be filled in later by me. The second section is the input, and the third, ???, is the expected output. These generated files are useful seeds for test cases, though I have to fill in the test name and expected output. An example from an actual test looks like this:

++++
SELECT FOR UPDATE in subselect
----
SELECT * FROM (SELECT * FROM mytable FOR UPDATE) ss ORDER BY column1
----
SELECT *
FROM (
 SELECT *
 FROM mytable
 FOR UPDATE
 ) AS ss
ORDER BY column1
----

(I will be improving the formatting, because I don’t like the output the original version gives!)

These files are easy to edit, and adding new tests by hand is easy as well.

The test harness simply reads each file, splits it into individual cases, runs each case’s input through the formatter, and compares the result to the expected output.
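As a rough illustration of that flow (this is not pg-pretty’s actual harness), here is a minimal sketch that splits a file’s contents on the ++++ and ---- marker lines and reports which cases fail. The TestCase struct, both function names, and the uppercase stand-in formatter are assumptions made for the example. It also assumes Unix line endings.

```rust
#[derive(Debug)]
struct TestCase {
    name: String,
    input: String,
    expected: String,
}

/// Splits file content into cases. Each case starts after a "++++" marker and
/// holds three "----"-separated sections: name, input, expected output.
fn parse_cases(content: &str) -> Vec<TestCase> {
    let mut cases = Vec::new();
    // Anything before the first "++++" marker is ignored.
    for chunk in content.split("++++\n").skip(1) {
        let sections: Vec<&str> = chunk.split("----\n").collect();
        if sections.len() < 3 {
            continue; // malformed case, skipped in this sketch
        }
        cases.push(TestCase {
            name: sections[0].trim().to_string(),
            input: sections[1].trim().to_string(),
            expected: sections[2].trim_end().to_string(),
        });
    }
    cases
}

/// Runs every case through `formatter` and returns the names of the failures.
fn failing_cases<'a>(
    cases: &'a [TestCase],
    formatter: impl Fn(&str) -> String,
) -> Vec<&'a str> {
    cases
        .iter()
        .filter(|c| formatter(c.input.as_str()) != c.expected)
        .map(|c| c.name.as_str())
        .collect()
}

fn main() {
    let content = "++++\nuppercases keywords\n----\nselect 1\n----\nSELECT 1\n----\n";
    let cases = parse_cases(content);
    // Stand-in formatter: the real one would parse and pretty-print the SQL.
    let failures = failing_cases(&cases, |sql| sql.to_uppercase());
    println!("{} case(s), failures: {:?}", cases.len(), failures);
}
```

Keeping the formatter as a parameter means the same parsing code can drive any function from SQL text to SQL text.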

In order to make failures more understandable, I use prettydiff’s diff_lines function to compare the expected output to what I actually got. This is quite helpful, but in cases where they differ by whitespace, especially trailing newlines, it’s not as helpful as I’d like.

So I also added some optional debugging output (based on an env var) that shows me each escaped character in the expected versus actual output. This lets me easily see whitespace differences, and newlines are printed as \n, which makes extra newlines obvious as well.
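A minimal sketch of that kind of debugging output, using only the standard library, might look like the following. The SHOW_ESCAPED variable name and both helper functions are hypothetical, and the real output may well be laid out differently (for example, one character at a time).

```rust
use std::env;

/// Renders a string with control characters escaped and wrapped in brackets,
/// so trailing spaces and newlines are visible.
fn escaped(s: &str) -> String {
    format!("[{}]", s.escape_default())
}

/// Builds the extra debugging lines for one mismatch.
fn debug_lines(expected: &str, actual: &str) -> String {
    format!(
        "expected: {}\nactual:   {}",
        escaped(expected),
        escaped(actual)
    )
}

fn main() {
    let expected = "SELECT 1\n";
    let actual = "SELECT 1 \n\n";
    // SHOW_ESCAPED is a made-up name for the opt-in environment variable.
    if expected != actual && env::var_os("SHOW_ESCAPED").is_some() {
        eprintln!("{}", debug_lines(expected, actual));
    }
}
```

Note that escape_default also escapes quotes and non-ASCII characters, which is harmless here but does make the output differ from the raw SQL in more than whitespace.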

In Summary

Sometimes integration tests are better than unit tests. In this case, focusing on integration tests freed me from a testing morass I was stuck in, letting me focus on the formatting code. The fact that I can generate a huge number of integration tests gives me some confidence that this approach will work.

I definitely won’t get 100% test coverage (which is basically impossible given that an AST Node can contain other Nodes, recursively). But I think I can use this approach to produce a decent first release. Once people start using it, I will quickly get bug reports with more test cases.

Next up

Here’s a list of what I want to cover in future posts.

Diving into the Postgres grammar to understand the AST.
The benefits of Rust pattern-matching for working with ASTs.
How terrible my initial solution to generating SQL in the pretty printer is, and how I fixed it (once I actually fix it).
Who knows what else?

Try saying that three times fast. ↩︎
I’m referring to Rust modules here. Substitute package, library, etc. for your language of choice. ↩︎
The examples are from commit 71b6e24. ↩︎
These are the structs that I wrote about generating in Part 1. ↩︎
Having each function return a string directly was the wrong approach. In my new branch I’m having these functions return something else that is then formatted later. But more on that in a future blog post once I’ve solidified this new approach. ↩︎
You don’t need to read the entire struct definition. Just take note that this struct has a lot of fields and move on. ↩︎
In practice, these fields cannot really contain any node. For example, the where_clause field, which is a Box<Node>, is not going to contain an InsertStmt or CreateDomainStmt or an IntoClause. But this is what the underlying Postgres data structures use, and I have no easy way of knowing which subset of nodes a Node-typed field will contain. In some cases, I’ve actually figured this out through reading the Postgres parser code and my generator code overrides the field definition to use a simpler type. I plan to write about how I do this in a future blog post. ↩︎
This happens every once in a while. It’s always very exciting. ↩︎
There are so many node types that I didn’t even try to list them all in the initial format_node implementation’s match. Instead, I had a default match (_) that just returned an error saying that formatting for the given node type wasn’t implemented. ↩︎
The documentation files all have an sgml extension so I used an SGML parser, but just now I looked more closely and I think it’s mostly XML, but without a DTD in most files. Regardless, I ended up using an SGML parser in my Perl code. ↩︎

API Design and the Recent IP Address Module Issues

Sat, 03 Apr 2021 11:01:20 -0500

Earlier this week, I wrote about security issues in Perl IP address distros. I started thinking about why these issues showed up in so many distros, which got me thinking about how an API can make these types of problems harder or easier.

Specifically, I’d like to talk about Data::Validate::IP.

Let’s look at two functions exported by this module, is_ipv4 and is_private_ipv4.

On the surface, these sure look like they’re the same general thing. They take a string and return a boolean. But thinking about their semantics, these are very much not the same general thing.

The is_ipv4 function is validating that a string is an IPv4 address.

is_ipv4('1.2.3.4'); # true
is_ipv4('feed::3e'); # false
is_ipv4('not even an IP address'); # false

As an aside, it actually returns the string you give it for a true value, but that will always be treated as a true value by Perl.

And it also returns false when given 010.0.0.1. This is (maybe) technically incorrect, but as we saw from this week’s security issue, it’s probably better than returning true. If an attacker can somehow supply this IP address to an application, or if someone just makes a typo in a config file, this address can be treated as either 10.0.0.1 or 8.0.0.1, depending on the code in question.

This all seems great so far. So what’s the problem?

Well, let’s think about is_private_ipv4. Here’s what it returns for some inputs:

is_private_ipv4('10.0.0.1'); # true
is_private_ipv4('010.0.0.1'); # false
is_private_ipv4('1.2.3.4'); # false
is_private_ipv4('feed::3e'); # false
is_private_ipv4('not even an IP address'); # false

So what is this function doing? Well, it’s obviously doing IPv4 address validation, since it returns false for things like feed::3e or not even an IP address. But it’s also doing categorization, because it returns false for a valid IP address like 1.2.3.4, while 10.0.0.1, also a valid IPv4 address, returns true.

So this function does two things, validation and categorization, but the return value lumps these things together. You cannot tell by its return value whether the address was invalid or if it was valid but not private.

The is_public_ipv4 function has the same problem. It does both validation and categorization in one call.

This is a very subtle point, and it’s easy to miss when you’re using this module. It would be very easy to introduce a security issue with this code¹:

if ( !is_public_ipv4($some_addr) ) {
    send_private_data($some_addr);
} else {
    send_public_data($some_addr);
}

If is_public_ipv4 is given 010.0.0.1 it returns false, which means we send private data. So how should this be written? We need to validate first:

die "Invalid IPv4: $some_addr"
    unless is_ipv4($some_addr);

if ( !is_public_ipv4($some_addr) ) {
    send_private_data($some_addr);
} else {
    send_public_data($some_addr);
}

Perhaps elsewhere in this code we might want to call is_linklocal_ipv4($ip) or is_loopback_ipv4($ip). But we need to remember to add an is_ipv4 check before every is_*_ipv4 call. Will we remember? Probably not.

While I’ve used this module for years, and I’ve even been its primary maintainer for some time, I didn’t think about the implications of its API until earlier this week!

So if the maintainer didn’t think about it, we can probably assume that most of its other users didn’t either.

What would a better API look like? We need to separate validation and categorization, and we need to force users to go through validation before doing categorization.

There are various ways to do this, but an OO interface makes this trivial:

if ( !IPv4->new($some_addr)->is_public ) {
    send_private_data($some_addr);
} else {
    send_public_data($some_addr);
}

If the IPv4->new call throws an exception on invalid data, then this code is perfectly safe². There is no way to use this API to categorize invalid data. So even the person who wrote this terrible logic (“if not public send private?” WTF?!) will be prevented from doing more damage.

Another approach would be to have is_private_ipv4 throw an exception if given invalid data. That way it has three “return values”, true (valid and private), false (valid but not private), and exception (invalid).
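Rust is not the language of this module, but its Result type makes those three outcomes explicit, so here is a small sketch of the idea using only the standard library. The function name is borrowed from Data::Validate::IP for comparison, and everything else is an assumption made for illustration. Recent Rust releases refuse to parse octets with leading zeros, so 010.0.0.1 lands in the error case.

```rust
use std::net::{AddrParseError, Ipv4Addr};

/// Ok(true): valid and private. Ok(false): valid but not private.
/// Err(_): not a valid IPv4 address at all.
fn is_private_ipv4(s: &str) -> Result<bool, AddrParseError> {
    // Validation happens first; categorization only sees a parsed address.
    let addr: Ipv4Addr = s.parse()?;
    Ok(addr.is_private())
}

fn main() {
    for input in ["10.0.0.1", "1.2.3.4", "010.0.0.1", "feed::3e"] {
        match is_private_ipv4(input) {
            Ok(true) => println!("{}: valid and private", input),
            Ok(false) => println!("{}: valid but not private", input),
            Err(e) => println!("{}: invalid ({})", input, e),
        }
    }
}
```

Because the caller has to unwrap the Result before it can branch on the boolean, the "invalid" case cannot silently be treated as "not private".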

Data validation is important for correctness, and correctness is important for security. Don’t design APIs that put the validation burden on the user. Make it as hard as you can to do the wrong thing with your API³.

Yes, this code is bad, but that’s kind of the point. Is all the code you’ve ever worked with well thought out and clearly structured? Did it always handle all the corner cases properly? Was it always free from obvious logic errors? I can wait for you to stop laughing before we continue. ↩︎
At least it’s safe if every address that’s not public is private. This isn’t true for IPv4 (or IPv6), but sending private data to a link-local or loopback address is probably(?) okay. ↩︎
Though nothing can stop the truly clueless developer. Someone could still write this:

if ( !eval { IPv4->new($some_addr)->is_public } ) {
    send_private_data($some_addr);
} else {
    send_public_data($some_addr);
}

But if you write code that intentionally ignores exceptions that you should not ignore I give up. ↩︎

Security Issues in Perl IP Address distros

Mon, 29 Mar 2021 13:33:39 -0500

Edit on 2021-03-29 21:40(ish) UTC: Added Net-Subnet (appears unaffected) and reordered the details to match the list at the top of the post.

Edit on 2021-03-30 14:50(ish) UTC: Added Net-Works (appears unaffected).

Edit on 2021-03-30 15:40(ish) UTC: Added Net-CIDR (some functions are affected).

Edit on 2021-03-31 01:05(ish) UTC: Added Net-IPv4Addr (affected).

Edit on 2021-04-05 01:21(ish) UTC: Net-CIDR-Lite 0.22 contains a remediation.

Edit on 2021-04-05 19:30(ish) UTC: Net-IPAddress-Util 5.000 contains a remediation.

warning

TLDR: Some Perl modules for working with IP addresses and netmasks have bugs with potential security implications. See below for more details on the bug and which modules are affected.

Net-IPv4Addr: Affected.
Net-CIDR-Lite: Vulnerable before the 0.22 release. Upgrade now.
Net-Netmask: Vulnerable before the 2.0000 release. Upgrade now.
Net-IPAddress-Util: Vulnerable before the 5.000 release. Upgrade now.
Data-Validate-IP: Depends on exactly how it’s used. See below for details.
Net-CIDR: Depends on exactly how it’s used. See below for details.
Socket: Appears unaffected.
Net-DNS: Appears unaffected.
NetAddr-IP: Appears unaffected.
Net-Works: Appears unaffected.
Net-Subnet: Appears unaffected.
Net-Patricia: Appears unaffected.

Yesterday, a security issue with the NPM package netmask was published¹.

The issue itself is pretty straightforward. Your OS will allow an IP address like 010.0.0.1 or a netmask like 010.0.0.0/8. That 010 is treated as an octal number, not a base-10 number with a leading zero! That means that 010.0.0.1 is actually 8.0.0.1. We can confirm this with ping:

ping 010.0.0.1
PING 010.0.0.1 (8.0.0.1) 56(84) bytes of data.

But the NPM netmask package would treat this as 10.0.0.1. This confusion means that an application could be tricked into thinking a public IP - 8.0.0.1 - was part of a private subnet - 10.0.0.0/8. And conversely, you could trick it into thinking a private IP - 10.0.0.1 written as 012.0.0.1 - was part of a public subnet - 12.0.0.0/8.
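The two readings are easy to reproduce. The sketch below is in Rust rather than Perl or JavaScript, and every function name in it is made up for the example. It reads each octet either as plain base 10 or the C way, where a leading zero switches to base 8. It deliberately ignores the hexadecimal and shortened forms that inet_aton-style parsers also accept.

```rust
/// Accepts only plain ASCII digits, so signs and hex prefixes are rejected.
fn digits_only(octet: &str) -> bool {
    !octet.is_empty() && octet.bytes().all(|b| b.is_ascii_digit())
}

/// Reads an octet as base 10, ignoring any leading zero.
fn decimal_octet(octet: &str) -> Option<u8> {
    if digits_only(octet) {
        octet.parse().ok()
    } else {
        None
    }
}

/// Reads an octet the C way: a leading zero switches to base 8.
fn c_style_octet(octet: &str) -> Option<u8> {
    if !digits_only(octet) {
        return None;
    }
    if octet.len() > 1 && octet.starts_with('0') {
        u8::from_str_radix(octet, 8).ok()
    } else {
        octet.parse().ok()
    }
}

/// Parses a dotted quad using the given per-octet reader.
fn read_address(addr: &str, read: fn(&str) -> Option<u8>) -> Option<[u8; 4]> {
    let parts: Vec<&str> = addr.split('.').collect();
    if parts.len() != 4 {
        return None;
    }
    let mut out = [0u8; 4];
    for (slot, part) in out.iter_mut().zip(parts) {
        *slot = read(part)?;
    }
    Some(out)
}

fn main() {
    for addr in ["010.0.0.1", "012.0.0.1"] {
        println!(
            "{}: decimal reading {:?}, C-style reading {:?}",
            addr,
            read_address(addr, decimal_octet),
            read_address(addr, c_style_octet)
        );
    }
}
```

Any code that checks an address with one reading and then connects with the other is open to the confusion described above.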

This has security implications for any application that is trying to distinguish between public and private IP addresses or networks for access control, firewalling, etc.

As I was reading about this I checked out the Git repo for the netmask package. Its README says “This module is highly inspired by Perl Net::Netmask module.”

And at that point I realized that it was quite possible that this affected Perl code as well! So I started digging into this by looking at various CPAN modules for working with IP addresses, networks, and netmasks.

Here’s the current state of CPAN modules, ordered roughly by their position in The River of CPAN (which basically means how many modules depend on them).

`Net-IPv4Addr`

warning

This distribution is affected by this issue. In addition, this module is almost certainly no longer being maintained. Emails to the author bounce.

This distribution has 4 direct dependents and 12 total dependents.

perl -MNet::IPv4Addr=:all -E 'say $_ for ipv4_network("010.0.0.1")'
10.0.0.0
8

`Net-CIDR-Lite`

info

This distribution was vulnerable prior to its 0.22 release made on 2021-04-04. Thanks to Stig Palmquist for taking this distro over and releasing a fix!

This distribution has 24 direct dependents and 36 total dependents.

perl -MNet::CIDR::Lite -E 'my $c = Net::CIDR::Lite->new; $c->add("010.0.0.0/8"); say $_ for $c->list_range'
Can't determine ip format at /home/autarch/.perlbrew/libs/perl-5.30.1@dev/lib/perl5/Net/CIDR/Lite.pm line 38.
 Net::CIDR::Lite::add(Net::CIDR::Lite=HASH(0x55fe55ade740), "010.0.0.0/8") called at -e line 1

`Net-Netmask`

info

This distribution was vulnerable prior to its 2.0000 release earlier today. Great job on the quick response, Joelle Maslak!

This distribution has 22 direct dependents and 30 total dependents.

So with version 2.0000 or later we see this:

perl -MNet::Netmask -E 'say defined Net::Netmask->new2(q{010.0.0.0/8}) ? 1 : 0'
0

Note the use of the new2 constructor. The old new constructor cannot be changed to return undef for backwards compatibility reasons. Fortunately, it’s probably not vulnerable in any exploitable way, as it returns a 0-length subnet:

perl -MNet::Netmask -E 'say Net::Netmask->new(q{010.0.0.0/8})'
0.0.0.0/0

`Net-IPAddress-Util`

info

This distribution was vulnerable prior to its 5.000 release made on 2021-04-04. Thanks to Paul W Bennett for the fix!

This distribution has no dependents.

perl -MNet::IPAddress::Util=IP -E 'say IP(q{010.0.0.1})'
8.0.0.1

`Data-Validate-IP`

info

This distribution doesn’t misparse octal numbers, but you could be affected depending on exactly how your code uses this distro. See below for details.

This distribution has 21 direct dependents and 60 total dependents.

This distribution returns false from any is_*_ipv4 call when the address contains an octet with a leading zero (that is, an octal number). So both is_private_ipv4('010.0.0.1') and is_public_ipv4('010.0.0.1') return false. Depending on how you’re using this module, it’s possible that this could lead to bugs, including bugs with security implications.

I updated the documentation to explicitly recommend that you always call is_ipv4() in addition to calling a method like is_private_ipv4(). The is_ipv4() method will always return false for IP addresses with octal numbers.

While this isn’t strictly POSIX-correct, this seems like the safest behavior for a module like this. It’s better to be too strict if this eliminates a potential footgun.

If you are using this distribution, I highly encourage you to audit your use of it in a security context!

`Net-CIDR`

info

This distribution is affected, but it has a function to validate CIDR strings that you should use before calling any other functions.

This distribution has 17 direct dependents and 25 total dependents.

The distribution provides a number of functions for working with networks and IP addresses. Most of these are not affected. However, two are:

perl -MNet::CIDR -E 'say for Net::CIDR::addr2cidr("010.0.0.1")'
010.0.0.1/32
010.0.0.0/31
...
010.0.0.0/8
10.0.0.0/7
8.0.0.0/6
...

perl -MNet::CIDR -E 'say Net::CIDR::cidrlookup("10.0.0.1", "010.0.0.0/8")'
1

However, this distribution also contains a cidrvalidate function that will return false for any CIDR string with a leading 0 in an octet. The documentation explicitly tells you to use this before passing the data to other functions.

If you are using this distribution, I highly encourage you to audit your use of it in a security context!

`Socket`

note

This distribution appears to be unaffected by this issue.

This distribution has 275 direct dependents and 9,936 total dependents.

perl -MSocket -E 'say inet_ntoa(inet_aton(q{010.0.0.1}))'
8.0.0.1

perl -MSocket=inet_pton,inet_ntop,AF_INET -E 'say inet_ntop(AF_INET, inet_pton(AF_INET, q{010.0.0.1}))'
Bad address length for Socket::inet_ntop on AF_INET; got 0, should be 4 at -e line 1.

The inet_pton() function is just returning undef for this octal-formatted address.

`Net-DNS`

note

This distribution appears to be unaffected by this issue.

This distribution has 104 direct dependents and 561 total dependents.

If you try to resolve an IP address, it turns this into a reverse lookup, but it treats the IP as text:

perl ./demo/perldig 010.0.0.1
;; Response received from 127.0.0.53 (40 octets)
;; HEADER SECTION
;; id = 6342
;; qr = 1 aa = 0 tc = 0 rd = 1 opcode = QUERY
;; ra = 1 z = 0 ad = 0 cd = 0 rcode = NXDOMAIN
;; qdcount = 1 ancount = 0 nscount = 0 arcount = 0
;; do = 0

;; QUESTION SECTION (1 record)
;; 1.0.0.010.in-addr.arpa. IN A

So it’s not a useful answer, but it’s not looking up the wrong address.

`NetAddr-IP`

note

This distribution appears to be unaffected by this issue.

This distribution has 36 direct dependents and 110 total dependents.

perl -MNetAddr::IP -E 'say NetAddr::IP->new(q{010.0.0.024})'
8.0.0.20/32

`Net-Works`

note

This distribution appears to be unaffected by this issue.

This distribution has 3 direct dependents and 7 total dependents.

perl -MNet::Works::Network -E 'say Net::Works::Network->new_from_string(string => q{010.0.0.1/8})'
010.0.0.1/8 is not a valid IP network at /home/autarch/.perlbrew/libs/perl-5.30.1@dev/lib/perl5/Net/Works/Network.pm line 120.
 Net::Works::Network::new_from_string("Net::Works::Network", "string", "010.0.0.1/8") called at -e line 1

perl -MNet::Works::Address -E 'say Net::Works::Address->new_from_string(string => q{010.0.0.1})'
010.0.0.1 is not a valid IPv6 address at /home/autarch/.perlbrew/libs/perl-5.30.1@dev/lib/perl5/Net/Works/Util.pm line 70.
 Net::Works::Util::_validate_ip_string("010.0.0.1", 6) called at /home/autarch/.perlbrew/libs/perl-5.30.1@dev/lib/perl5/Net/Works/Address.pm line 74
 Net::Works::Address::new_from_string("Net::Works::Address", "string", "010.0.0.1") called at -e line 1

Thanks to Stig Palmquist for checking this one and letting me know.

`Net-Subnet`

note

This distribution appears to be unaffected by this issue.

This distribution has 3 direct dependents and 7 total dependents.

perl -MNet::Subnet -E 'my $m = subnet_matcher(q{10.0.0.0/8}); say $m->(q{012.0.0.1}) ? 1 : 0'
1

perl -MNet::Subnet -E 'my $m = subnet_matcher(q{012.0.0.0/8}); say $m->(q{10.0.0.1}) ? 1 : 0'
1

`Net-Patricia`

note

This distribution appears to be unaffected by this issue.

This distribution has 1 direct dependent and 1 total dependent.

perl -MNet::Patricia -E 'my $p = Net::Patricia->new; $p->add_string("010.0.0.0/8"); say $p->match_string("8.0.0.1") ? 1 : 0'
1

perl -MNet::Patricia -E 'my $p = Net::Patricia->new; $p->add_string("8.0.0.0/8"); say $p->match_string("010.0.0.1") ? 1 : 0'
1

One of the most ridiculous URL paths I’ve ever seen. ↩︎

Down the Golang nil Rabbit Hole

Sat, 27 Mar 2021 14:36:28 -0500

Edit 2021-03-30: Jeremy Mikkola wrote about some closely related topics back in 2017.

Edit 2021-03-31: Chris Siebenmann wrote a response to this post that explains exactly how interface values that are nil are typed. It’s more complicated than I thought!

I’m not sure I have another Rust & Postgres blog post in me right now, so let’s learn something about Go instead.

Recently I decided I wanted to add a --unique flag to omegasort. Wait, what’s omegasort?

It’s a text file sorting tool that supports lots of different sorting methods. For example, in addition to a standard text sort, it can sort numbered lines, date-prefixed lines, paths (including Windows paths with and without drive letters), IP addresses, and IP networks. It also supports Unicode locales, reverse sorting, and locale-aware case insensitive sorting.

I use it together with precious to sort things like .gitignore files, spellchecker allowlists, and things of that nature.

I realized that I really wanted a --unique flag for all of this. While I could just pipe its output to uniq on a *nix system, this doesn’t work so well on Windows. Plus with tools like precious it’s easier if I can use one binary for a given task. If I want to pipe things I have to put that in a shell script that precious calls.

But my rabbit hole experience didn’t happen with omegasort directly. Instead, it happened when I tried to add some integration tests.

While writing those integration tests, I was using github.com/houseabsolute/detest. This is a Golang package I created that offers a test assertion interface inspired by Test2-Suite in Perl.

For reference, here’s a Test2-Suite example:

use Test2::Suite;

object Subtest => sub {
 call name => 'TestsFor::Basic';
 call pass => T();
 call subevents => array {
 object Plan => sub {
 call max => 4;
 call trace => object {
 call package => 'Test::Class::Moose::Role::Executor';
 call subname => 'Test::Class::Moose::Util::context_do';
 };
 };
 ...
}

I think this is pretty self-explanatory, except for T(), which means “true”.

And here’s something like that in Go with detest:

import (
    "testing"
    "github.com/houseabsolute/detest/pkg/detest"
)

func TestSomething(t *testing.T) {
    d := detest.New(t)
    d.Is(
        someStruct,
        d.Struct(func(st *detest.StructTester) {
            st.Field("size", 43)
            st.Field("Name", "Douglas")
            d.Map(func(mt *detest.MapTester) {
                mt.Key("foo", d.Slice(func(st *detest.SliceTester) {
                    st.Idx(0, d.Map(func(mt *detest.MapTester) {
                        mt.Key("bar", d.Slice(func(st *detest.SliceTester) {
                            st.Idx(1, "buz")
                            st.Idx(2, "not quux")
                        }))
                    }))
                    st.Idx(1, d.Map(func(mt *detest.MapTester) {
                        mt.Key("nosuchkey", d.Slice(func(st *detest.SliceTester) {
                            st.Idx(1, "buz")
                            st.Idx(2, "not quux")
                        }))
                    }))
                }))
            })
        }),
    )
}

It’s not as nice as the Perl version because it gets quite verbose, but this was the closest I could come. Go’s type system, combined with a lack of syntactic flexibility, means a whole lot of func calls, braces, and parens.

Under the hood, this is implemented with a metric fork ton of runtime reflection using the stdlib’s reflect package. I don’t love this, but absent generics, there’s no other way to implement this sort of API except with code generation. And that codegen would have to be fed by a sort-of-Go language that was translated to real Go, which seems like a terrible idea.

Getting to the Darn Point

So while I was writing those omegasort integration tests using detest, I managed to find a whole lot of bugs in detest.

But the title says nil and I haven’t mentioned those yet.

So here’s a fun fact: Go has multiple “types” of nil. Specifically, there are both typed and untyped nil values. This surprised me at first, but it makes sense when you think about it.

Let’s take this code¹:

package main

import (
 "fmt"
 "reflect"
)

func main() {
 v1 := reflect.ValueOf(nil)
 var uninit []int
 v2 := reflect.ValueOf(uninit)
 logValue("nil", v1)
 logValue("[]int", v2)
 fmt.Printf("[]int == nil? %v\n", uninit == nil)
}

func logValue(what string, v reflect.Value) {
 fmt.Printf("%s is valid? %v\n", what, v.IsValid())
 if v.IsValid() {
 fmt.Printf("%s is nil? %v\n", what, v.IsNil())
 fmt.Printf("%s type = %v\n", what, v.Type())
 }
}

This prints out the following:

nil is valid? false
[]int is valid? true
[]int is nil? true
[]int type = []int
[]int == nil? true

So a bare nil and a variable that has a type but no value are equal, but if you try to get a reflect.Value for nil, it’s not valid. If you try to call other methods like v.IsNil() or v.Type() on an invalid² reflect.Value, you will get a panic.

I encountered this when trying to test that an error returned by a func call was nil.

This led to a flurry of detest releases as I realized how many parts of the detest code this impacted. In most places where it uses reflect, I have to guard against a bare nil being passed in.

But wait, it gets even more confusing. Sometimes the Go compiler will turn an untyped nil into a typed nil. Here’s an example³:

package main

import (
 "fmt"
 "reflect"
)

func main() {
 takesSlice("nil", nil)
 var uninit []int
 takesSlice("[]int", uninit)
}

func takesSlice(what string, s []int) {
 logValue(what, reflect.ValueOf(s))
}

func logValue(what string, v reflect.Value) {
 fmt.Printf("%s is valid? %v\n", what, v.IsValid())
 if v.IsValid() {
 fmt.Printf("%s is nil? %v\n", what, v.IsNil())
 fmt.Printf("%s type = %v\n", what, v.Type())
 }
}

And when we run it we get this:

nil is valid? true
nil is nil? true
nil type = []int
[]int is valid? true
[]int is nil? true
[]int type = []int

So when I pass a bare nil to takesSlice, it gets typed as whatever type the function’s signature says it should be.

But wait, it gets even more confusing yet again! Sometimes the Go compiler won’t turn an untyped nil into a typed nil. Here’s an example⁴:

package main

import (
 "fmt"
 "reflect"
)

func main() {
 takesError("nil", nil)
 var uninit error
 takesError("error", uninit)
}

func takesError(what string, e error) {
 logValue(what, reflect.ValueOf(e))
}

func logValue(what string, v reflect.Value) {
 fmt.Printf("%s is valid? %v\n", what, v.IsValid())
 if v.IsValid() {
 fmt.Printf("%s is nil? %v\n", what, v.IsNil())
 fmt.Printf("%s type = %v\n", what, v.Type())
 }
}

Running this prints only two lines, nil is valid? false and error is valid? false. If the type of the argument in the function signature is any type of interface, including interface{}, then the underlying value is still untyped and not valid. This … sort of makes sense? I think the way this works is that anything typed as an interface also has a real underlying type. So an error can be an *errors.errorString or an *exec.ExitError or a mypackage.DogError. But if we pass a bare nil or an uninitialized variable, there’s no underlying type.

This came up with detest when I wanted to test that I didn’t get an error from a call.

err := doThing()
d.Is(err, nil, "no error from doing a thing")

Under the hood, the signature for d.Is() uses interface{} for the two arguments being compared. So bare nil as the second argument will never be valid. And the first argument might be valid or it might not be. If doThing()’s return type is just error and it returns a nil, then the value in err has no type.

All of this led to a fair bit more code in the detest guts to handle this. For example, just because two variables don’t have the same type doesn’t mean they’re not equal (from Go’s perspective). A bare nil and an uninitialized slice are equal when compared with ==, which is what d.Is() emulates using reflect.
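As a rough sketch of what a nil-aware comparison can look like (this is not detest's actual implementation, and the names isNilish and looseEqual are hypothetical), the idea is to treat an invalid value as equal to anything that is itself nil, and to fall back to a normal comparison otherwise. I use reflect.DeepEqual as a stand-in for the normal comparison.

```go
package main

import (
	"fmt"
	"reflect"
)

// isNilish reports whether v is either invalid (a bare nil) or a nil value
// of a kind that is allowed to be nil.
func isNilish(v reflect.Value) bool {
	if !v.IsValid() {
		return true
	}
	switch v.Kind() {
	case reflect.Chan, reflect.Func, reflect.Interface, reflect.Map,
		reflect.Ptr, reflect.Slice, reflect.UnsafePointer:
		return v.IsNil()
	}
	// Ints, strings, structs, and so on can never be nil.
	return false
}

// looseEqual is a hypothetical comparison that tolerates a bare nil on
// either side.
func looseEqual(a, b interface{}) bool {
	va, vb := reflect.ValueOf(a), reflect.ValueOf(b)
	if !va.IsValid() || !vb.IsValid() {
		// At least one side is a bare nil, so the two sides are equal
		// only if the other side is nil as well.
		return isNilish(va) && isNilish(vb)
	}
	return reflect.DeepEqual(a, b)
}

func main() {
	var uninit []int
	fmt.Println(looseEqual(nil, uninit))
	fmt.Println(looseEqual(nil, nil))
	fmt.Println(looseEqual(nil, 42))
	fmt.Println(looseEqual([]int{1, 2}, []int{1, 2}))
}
```

This prints true, true, false, and true. The important part is that neither IsNil() nor Type() is ever called on an invalid value.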

So there are quite a few cases around one or both arguments being invalid that need handling. And there are MANY other methods with the same issues to consider, including things like d.Map() and d.Struct(), all of which should handle an invalid value properly.

What Does This Look Like in Other Languages?

Well, I don’t know that many other languages. In Perl this isn’t really a thing, because it has a pretty minimal type system. Perl’s undef can be coerced to lots of things, although under strict, trying to use an undef in certain ways is an error, like writing this:

my $x;
say @{$x};

This will blow up with Can't use an undefined value as an ARRAY reference ... at line 2.

Rust (at least safe Rust⁵) doesn’t have any notion of nil or undefined values. Instead, you have the Option<T> type, which always has a type. For example⁶:

pub fn main() {
 let a: Option<String> = None;
 let b: Option<i32> = None;
 println!("a == b? {}", a == b);
}

This just won’t compile. While both a and b are None, they’re not the same type of None so you can’t just compare them with ==. The compiler says:

error[E0308]: mismatched types
 --> src/main.rs:4:33
  |
4 |     println!("a == b? {}", a == b);
  |                                 ^ expected struct `String`, found `i32`
  |
  = note: expected enum `Option<String>`
             found enum `Option<i32>`

By the way, aren’t these Rust compiler errors nice? The only other language I’ve seen with this kind of extremely detailed compiler error is Raku.

In Summary

It’s tempting to pick on Go and complain about it. I certainly do that a lot at work. But to be fair, this really isn’t an issue for most Go code. It’s only because I’m trying to do weird stuff with reflect that I’m learning about this internal weirdness. In day-to-day Go code, the compiler’s handling of various types of nil “just works” the way you’d expect it to. And being able to use a bare nil is quite handy.

But I still prefer how Rust does it, using a parameterized Option<T> type. That way I can easily check if something is None without any special cases. Everything is using the same type system, though that type system is much more complex than Go’s.

¹ https://play.golang.org/p/Xo5hXUIw01U
² Note that an “invalid” value in the context of reflect is not invalid in the context of a Go program. You can use an invalid value everywhere you can use the corresponding valid but uninitialized nil value.
³ https://play.golang.org/p/HKQBiFCNINk
⁴ https://play.golang.org/p/NMsi05CH8r3
⁵ I know very little about unsafe Rust, which is why I’m hedging.
⁶ https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=677599a2ff660f57b51a31219f428312

Writing a Postgres SQL Pretty Printer in Rust: Part 1.5

Sun, 21 Mar 2021 13:19:05 -0500

Last week I wrote the first post in this series, where I introduced the project and wrote about generating Rust code for the parsed Postgres AST.

I also wrote about the need for wrapper enums in the generated code, but I don’t think I went into enough detail, based on questions and discussions I had after I shared that post in /r/rust.

So this week I will go into more detail on exactly why I had to do this.

Series Links

Part 1: Introduction to the project and generating Rust with Perl
Part 1.5: More about enum wrappers and serde’s externally tagged enum representation
Part 2: How I’m testing the pretty printer and how I generate tests from the Postgres docs

A Tagged Enum Example

I’ve made an example crate with all of the code I walk through below at https://github.com/autarch/tagged-enum-example.

In order to make this simpler, I’ll use some very simple JSON, as opposed to the rather complex JSON we get back from the Pg parser. However, I cannot change the JSON to make parsing easier, just like I cannot do that with the Pg parser’s output¹.

{
 "Root":{
 "first":{
 "Foo":{
 "size":42,
 "color":"blue"
 }
 },
 "second":{
 "Bar":{
 "mood":"indigo",
 "car":"Super"
 }
 },
 "actions":[
 {
 "Run":{
 "speed":84
 }
 },
 {
 "Sleep":{
 "hours":8
 }
 }
 ]
 }
}

I’ll use JSONPath to refer to parts of the document. You can see that every object in the JSON is “tagged” with its type. Those are the title case keys: $.Root, $.Root.first.Foo, $.Root.second.Bar, $.Root.actions[0].Run, and $.Root.actions[1].Sleep.

Let’s assume that the $.Root.second key is optional, so it could be entirely omitted in some documents.

The Naive Approach

Now let’s make some Rust structs that correspond to this JSON. This corresponds to the naive directory in my example repo.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Root {
 first: Foo,
 second: Option<Bar>,
 actions: Vec<Action>,
}

#[derive(Debug, Deserialize)]
struct Foo {
 size: i8,
 color: String,
}

#[derive(Debug, Deserialize)]
struct Bar {
 mood: String,
 car: String,
}

#[derive(Debug, Deserialize)]
enum Action {
 Run { speed: i64 },
 Sleep { hours: i8 },
}

This is all pretty straightforward. We have a Root struct that can contain a Foo, an optional Bar, and zero or more Action values.

And here’s our parsing code:

fn main() {
 let output: Root = serde_json::from_str(DOC).expect("parsed");
 println!("{:#?}", output);
}

So what happens when we run this?

We get this error:

... 'parsed: Error("missing field `first`", line: 29, column: 1)', ...

The important bit is "missing field `first`", line: 29, column: 1. What’s at line 29, column 1 of our JSON document? That’s the end of the document, actually.

So basically we’re seeing that the serde JSON parser looked through the entire top-level object for a first key but could not find one. That makes sense, since the top-level object in the actual document only contains a key named Root.

Fortunately, serde has a solution to this, in the form of its “externally tagged enum representation” handling. For this type of JSON, each object is annotated with an extra “tag” indicating its type, just like we see with $.Root and $.Root.first.Foo and so on.

But the key word here is “enum”. Serde does not offer a way to handle this style of JSON without using enums. So I need to make a bunch of enums, one for each possible tag.

The So Many Enums Approach

This corresponds to the with-enums directory in my example repo.

And here are our structs and enums:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
enum RootWrapper {
 Root(Root),
}

#[derive(Debug, Deserialize)]
struct Root {
 first: FooWrapper,
 second: Option<BarWrapper>,
 actions: Vec<Action>,
}

#[derive(Debug, Deserialize)]
enum FooWrapper {
 Foo(Foo),
}

#[derive(Debug, Deserialize)]
struct Foo {
 size: i8,
 color: String,
}

#[derive(Debug, Deserialize)]
enum BarWrapper {
 Bar(Bar),
}

#[derive(Debug, Deserialize)]
struct Bar {
 mood: String,
 car: String,
}

#[derive(Debug, Deserialize)]
enum Action {
 Run { speed: i64 },
 Sleep { hours: i8 },
}

And our main() is:

fn main() {
 let output: RootWrapper = serde_json::from_str(DOC).expect("parsed");
 println!("{:#?}", output);
}

Note that the type of output is now RootWrapper instead of Root. This runs without an error, giving us:

Root(
 Root {
 first: Foo(
 Foo {
 size: 42,
 color: "blue",
 },
 ),
 second: Some(
 Bar(
 Bar {
 mood: "indigo",
 car: "Super",
 },
 ),
 ),
 actions: [
 Run {
 speed: 84,
 },
 Sleep {
 hours: 8,
 },
 ],
 },
)

Yay, it works! But it has tons of pointless enums. Boo!

The enums generally clutter up the code with a lot of destructuring. For example, if I want to get the struct corresponding to $.Root.first.Foo, I have to write this:

 let RootWrapper::Root(root) = output;
 let FooWrapper::Foo(foo) = root.first;

In my Pg formatting code, multiply that destructuring by a thousand.

There must be some way out of here

When I shared this in /r/rust last week, /u/nicoburns had some helpful suggestions for working around this. We went back and forth a bit and I was able to get something that worked a little bit. But it only worked for simple cases. I couldn’t get it to work for cases like Option<Bar> or Vec<Action>. And in the Pg parser AST, I also end up with Option<Vec<Something>>, as well as cases with tuples like Vec<(Foo, Bar)> and probably some other weird things.

What I would love is a solution that changes the code generated by the serde macros to just “skip over” the tag instead of creating an enum for it when the enum only has one variant.

A solution that still requires the wrappers and even more generated code for them would be fine, though I suspect it’d make the AST code’s slow compilation even slower.

I started digging into serde a bit to try to understand how I might do this, but it’s pretty complex, and I’m still pretty new to Rust.

For now, I have enough other things to work on with this project. For example, the way I generate formatted SQL is horrific and unscalable (lots of inline some_str.push_str("WHERE ") and format!). I’m starting on a refactor to generate some sort of intermediate representation of the AST that I can then turn into a string.

Next up

Here’s a list of what I want to cover in future posts.

Diving into the Postgres grammar to understand the AST.
How I’m approaching tests for this project, and how I generate test cases from the Postgres documentation.
The benefits of Rust pattern-matching for working with ASTs.
How terrible my initial solution to generating SQL in the pretty printer is, and how I fixed it (once I actually fix it).
How the proc macro in the bitflags_serde_int crate works².
Who knows what else?

Stay tuned for more posts in the future.

¹ Ok, technically I could do that, but that would involve parsing the JSON and rewriting it in order to … make it easier to parse?
² Edit 2021-04-24: Nope, not gonna write about this. It turns out I was reimplementing the already existing #[serde(transparent)] feature.

Writing a Postgres SQL Pretty Printer in Rust: Part 1

Sun, 14 Mar 2021 12:48:04 -0500

This is the first of a planned series of blog posts about my pg-pretty project. I’ll cover some things I’ve learned about Rust and Postgres SQL, as well as some things I still don’t know.

Series Links

Part 1: Introduction to the project and generating Rust with Perl
Part 1.5: More about enum wrappers and Serde’s externally tagged enum representation
Part 2: How I’m testing the pretty printer and how I generate tests from the Postgres docs

Why?

I really, really, really, really cannot stand unformatted code, or a mishmash of code styles throughout a codebase. But at the same time, rejecting PRs from other developers at $WORK just because of code formatting is not okay. Making them manually fiddle with formatting is not a good use of their time (or mine).

This is why we have linters, tidiers, and meta code quality tools like my precious.

Combine these with a commit hook and CI checks for code cleanliness, and I never have to reject a PR for formatting. Instead, it gets auto-“rejected” by git commit or CI, and I’m off the hook.

And besides the value of not annoying me, there is also value to enforcing code formatting rules throughout a large codebase. Consistency eliminates a potential distraction, because every Go, Perl, or Python file in the codebase will look like every other Go, Perl, or Python file.

SQL is code, so it sure would be nice to do the same thing there, but I can’t. There are a few SQL pretty printing tools that I’ve found, but none of them handle Postgres-specific idioms.

So of course I should write one!

And I should write one in Rust! Of course?¹

Where to Start?

Writing a Postgres SQL parser from scratch would be quite painful². Fortunately, a lot of the hard lower level work has already been done.

At the very lowest level we have libpg_query, created by Lukas Fittl. This is a project to rip the parser out of the Postgres source tree and turn it into a C library. It’s a shame that the Postgres source is not already organized this way. But I imagine that the parser started off as an integral part of the Postgres codebase, and by the time anyone thought of extracting it, it was more work than anyone wanted to take on.

The next step is to create a Rust wrapper around this C library. Luckily that was already done too. I’m using libpg_query-sys, which is a bare bones wrapper around the C library. It exposes the same types and functions as the C library, but in Rust.

From C to Rust

These underlying tools work by parsing a string containing Postgres SQL and returning a string containing JSON. That JSON represents the AST (Abstract Syntax Tree) of the parsed SQL.

But to actually do anything with that AST, you want native Rust structs, not a giant JSON blob.

And that’s where my work started.

The libpg_query source has a handy directory containing JSON files describing various parts of the AST. For example, the nodetypes.json file defines all of the possible nodes. Many parts of the AST reference the Node type, which is basically “any valid bit of SQL”.

But the most important file is struct_defs.json. This file defines all the data structures we might care about, providing the name, fields, and field types for each struct.

Rust is a statically typed language, so we can’t just parse this stuff at runtime and generate structs in memory. Instead, we need codegen. And since these struct definitions reference C types, we need to translate this all into Rust!

Generating Rust

Enter my totally not-a-hacked-up-mess json-to-parser.pl script.

For each C struct that we care about we generate a corresponding Rust struct³. This mostly means translating from C types to Rust types. To make things extra fun, I try to make the types more specific wherever I can. There are a number of places where the C struct just uses Node*, but in reality only a limited subset of nodes are valid.

I’ve figured this out a couple of ways. Sometimes, the comment for the field (which is in the struct_defs.json file) actually tells me. For example, many comments include the text “list of Value strings”, which means it’s a list of strings. For whatever reason, the Postgres C code just uses List* (an array of Node*) here instead of String*⁴.

As an aside, I turn all the comments in the struct_defs.json file into Rust documentation comments in the generated code, which has been quite helpful. This lets me read the generated AST code and get a pretty good understanding of what each struct and field contains.

But in Rust, we really want to know what our possible types are. That’s because I’m using Rust’s enum-based pattern matching. The Node enum has over 100 variants. That’s a lot of matching!

I also need to generate enum wrappers around many structs. Any time a struct references another struct, I need the wrapper indirection. So for example, here’s a little bit of the DeleteStmt struct:

#[skip_serializing_none]
#[derive(Debug, Deserialize, PartialEq)]
pub struct DeleteStmt {
 // relation to delete from
 pub relation: RangeVarWrapper, // RangeVar*
 // ... more fields ...
}

The relation field is going to contain a RangeVarWrapper, which is a one-variant enum that looks like this:

#[derive(Debug, Deserialize, PartialEq)]
pub enum RangeVarWrapper {
 RangeVar(RangeVar),
}

Why the Wrapper?

The wrappers are annoying, and I’d like to get rid of them, but I can’t figure out how!

Let’s take a very simple DELETE statement and parse it:

DELETE FROM films

The parser gives us this (with some outer bits removed for simplicity):

{
 "DeleteStmt": {
 "relation": {
 "RangeVar": {
 "inh": true,
 "location": 12,
 "relname": "films",
 "relpersistence": "p"
 }
 }
 }
}

There’s a lot to look at there, so let’s zoom in on one part:

{
 "DeleteStmt": {
 "relation": {
 "RangeVar": {...}
 }
 }
}

We need to deserialize this into a Rust struct. For deserialization in Rust I’m using serde, which is a powerful Rust framework for deserialization that supports many data formats, including JSON.

The particular structure of the JSON above corresponds to what the serde docs call “the externally tagged enum representation”. In this format, the “tags” such as DeleteStmt and RangeVar are used to indicate which enum variant to deserialize to. A variant of what? Well, that’s the problem.

As far as I can tell, the only way to make this work is to make an enum wrapper for every single struct which might be contained in any other struct. So for the RangeVar struct I need this wrapper:

#[derive(Debug, Deserialize, PartialEq)]
pub enum RangeVarWrapper {
 RangeVar(RangeVar),
}

And then when I’m working with the delete statement, I need to pattern match the RangeVar struct out of the DeleteStmt:

fn format_delete_stmt(&mut self, d: &DeleteStmt) -> R {
 let RangeVarWrapper::RangeVar(r) = &d.relation;
 // .. do something with the RangeVar in r
}

I really don’t like this pattern, but from my reading so far I haven’t seen a simple way to eliminate it. I think the only way to do this would be to provide custom serde deserialization logic for every struct which contains another struct.

This is absolutely possible, but I’ve avoided this so far in order to focus on other aspects of the project. But I want to come back to this in the future, because these wrappers require a lot of extra pattern matching in the formatter code.

So …

So that’s why I need a Perl script to generate Rust code, though I can think of at least a couple other approaches.

One would be to rewrite the Perl in Rust. That would work, but the Perl script is already fast. The naive Rust approach would probably be slower, since I would have to re-compile the Rust generator code every time I changed it, though I could ameliorate that by moving some data to config files. But Perl is a great language for reading JSON and generating code.

Another, almost certainly terrible option, would be to write one or more macros that could read the JSON source data and generate the Rust code directly. I’m fairly sure this is possible with procedural macros. A procedural macro looks like a function call or an attribute when you use it. The implementation is just regular Rust code that takes either its “function” arguments as input, or the thing that they are an attribute of (a type, struct field, etc.). Either way, the macro implementation returns a new AST of Rust code that is effectively inlined in place of the macro.

Procedural macros are incredibly powerful, and I ~~wrote one to change how bitflags are serialized so that serde expects these flags to be integers during deserialization, rather than expecting a JSON object like { "bits": 42 }~~ later realized that serde already did what I needed, so I didn’t need to write that wrapper. The bitflags crate itself is a proc macro, ~~so it’s macros all the way down~~.

But a procedural macro that parses arbitrary JSON files to generate Rust code seems a bit gross⁵. And right now I find myself constantly referring to the generated SQL AST structs. Having those available as regular Rust code that I can examine in my editor is very helpful.

Can You Try it Out?

Err, sort of. If you want to give it a whirl you can clone the repo, then edit the contents of cli/src/main.rs, which has some SQL to be formatted in it. But I haven’t actually built a proper CLI for it yet. I’ve just been focused on the core formatting implementation, which I exercise through its test suite.

Coming Soon

This post is already quite long, but there are many other things I’ve learned while working on this project that I plan to write about, including:

Diving into the Postgres grammar to understand the AST.
How I’m approaching tests for this project, and how I generate test cases from the Postgres documentation.
The benefits of Rust pattern-matching for working with ASTs.
How terrible my solution to generating SQL in the pretty printer is, and how I wonder if there’s a better way to do this.
How the proc macro in the bitflags_serde_int crate works⁶.
Who knows what else?

¹ Rust turned out to be a great fit for this project. More on that in a future post.
² That’s an understatement. It would be a mammoth project of its own.
³ I figured out what to care about by a combination of trial and error and experimentation. The struct_defs.json file organizes structs based on what files they’re defined in. I was able to determine that (so far) I only care about types from a small subset of these files.
⁴ Probably because if you’re writing the parser and the thing that consumes it at the same time, you can write code that knows that it’s only a String. Also, C doesn’t have Rust’s exhaustive pattern matching, so you’re not forced to deal with all possible Node types.
⁵ More than a bit. Really, really gross.
⁶ Edit 2021-04-24: Nope, not gonna write about this. It turns out I was reimplementing the already existing #[serde(transparent)] feature.

2020 Predictions Reviewed

Fri, 29 Jan 2021 16:48:56 -0600

Last year in May I made some predictions. Now it’s time to find out how I did!

Summary

The summary is I was wrong. A lot. This should be no surprise.

First, let’s take a look at my overall accuracy:

The source data for this chart is a spreadsheet I made for my 2020 predictions. Overall, this should be fairly understandable but there’s one nuance that needs some explaining. In order to make the chart simpler, I converted any prediction for less than 50% to its inverse and graphed that.

Here’s an example. I estimated a 10% likelihood that “a vaccine is generally available by end of 2020”. But another way to think of it is that I estimated a 90% likelihood that a vaccine is not generally available by the end of 2020.

If my predictions were perfect, then 60% of my predictions of 60% likelihood would have happened, 70% of 70%, and so on. But I wasn’t even close! Only 33% of my 60% predictions happened. I did even worse at 70% (33% happened), improved a little at 80% (50%), and redeemed myself a little at 90% (80%) and 95% (100%).

The accuracy trendline goes in the right direction, but it’s way too steep. I clearly have work to do on becoming an expert prognosticator.
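For the curious, the arithmetic behind the chart can be sketched roughly like this. This is not the spreadsheet's actual formula, and the type and function names are made up for illustration. It folds any prediction below 50% into its inverse and then computes what share of each bucket came true, using just four of the predictions from this post as sample input.

```go
package main

import (
	"fmt"
	"sort"
)

// prediction is a hypothetical record of one forecast.
type prediction struct {
	percent  int  // stated likelihood, from 0 to 100
	happened bool // whether the predicted event occurred
}

// fold converts a prediction below 50% into its inverse, so "10% that X
// happens" becomes "90% that X does not happen".
func fold(p prediction) prediction {
	if p.percent < 50 {
		return prediction{percent: 100 - p.percent, happened: !p.happened}
	}
	return p
}

// hitRates groups folded predictions by stated likelihood and returns the
// percentage of each group that came true.
func hitRates(ps []prediction) map[int]float64 {
	hits := map[int]int{}
	total := map[int]int{}
	for _, p := range ps {
		f := fold(p)
		total[f.percent]++
		if f.happened {
			hits[f.percent]++
		}
	}
	rates := map[int]float64{}
	for pct, n := range total {
		rates[pct] = 100 * float64(hits[pct]) / float64(n)
	}
	return rates
}

func main() {
	ps := []prediction{
		{60, false}, // Trump wins the November election
		{40, true},  // over 250,000 in the US dead from coronavirus
		{20, false}, // five months or more of stay at home
		{10, false}, // a vaccine is generally available
	}
	rates := hitRates(ps)

	buckets := make([]int, 0, len(rates))
	for pct := range rates {
		buckets = append(buckets, pct)
	}
	sort.Ints(buckets)
	for _, pct := range buckets {
		fmt.Printf("%d%% bucket: %.0f%% happened\n", pct, rates[pct])
	}
}
```

With only these four inputs, the 60% bucket comes out at 0% and the 80% and 90% buckets at 100%. Perfect calibration would put each bucket's hit rate at the bucket's own percentage.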

Deep Dive

Let’s look at each of my predictions in detail. Only the text in bold is new. The rest is copied from my original predictions post.

Politics and Economy

Joe Biden is the Democratic nominee come November voting: 95%
- Yup.
Trump is the Republican nominee: 95%
- Yup.
November election proceeds normally: 95%
- By “normally” I simply mean that the election occurs on the scheduled day in all 50 states. I think states may also expand voting by mail, but no state will cancel or postpone the election. I do expect some states to use the pandemic as an excuse for even more voter suppression, but sadly that fits the definition of “normal” in the US.
- Yup. While shit went crazy after the election, the election itself was close enough to normal that I will count myself correct on this one.
Trump wins the November election: 60%
- Nope. I should have counted on Trump’s incredible incompetence at managing both coronavirus and the economic downturn that it caused. I think this is what cost him the election.
Democrats maintain control of the House: 80%
- Yup.
Republicans maintain control of the Senate: 70%
- This is mostly based on which states are having senatorial elections this year. Nearly all of them are Republican strongholds.
- Nope. But I stand by my 70%. This was incredibly close. I didn’t predict that Trump would completely lose his mind and engage in a series of escalatingly insane attempts to overturn the election, which I think ended up tipping the Georgia Senate votes to the Democrats.
US GDP for 2020 is down by at least 30% year over year compared to 2019: 80%
- Nope. Looking at one source, it shows the US economy as down 3.5% for the year. Thinking back to this, I didn’t actually do any research on past annual GDP changes. If I had, I would have realized that 30% in any direction is utterly ridiculous. On the site I linked before, the biggest change was 18.9% in 1942, in the middle of WW2. Was coronavirus a bigger event than WW2? No, it definitely was not.

Coronavirus

Minnesota lifts stay at home order but then reinstates it at least once: 90%
- Nope. I was tempted to give myself a “Yup” here, but we didn’t get a second stay-at-home order in December. We got something that was fairly similar, but not quite the same.
Three months or more of cumulative stay at home in Minnesota: 80%
- Nope. I didn’t realize the level of insanity that would arise around restrictions, which made it politically impossible for any governor to be this strict, regardless of whether the circumstances warranted it. But if I had thought about this a bit more deeply, I think I would’ve realized how unlikely this was.
Five months or more of cumulative stay at home in Minnesota: 20%
- Yup.
Over 100,000 in the US dead from coronavirus: 90%
- Yup.
Over 250,000 in the US dead from coronavirus: 40%
- Nope, more than 250,000 died. Sigh.
A vaccine is generally available by end of 2020: 10%
- By “generally available” I mean that it’s not in clinical trials, that it’s available in sufficient quantities for use as needed, and that it is safe to be given to at least 90% of the population.
- Yup, no vaccine was generally available by the end of 2020. But good job vaccine makers for nearly making me wrong here!
A generally available therapy exists that reduces mortality by 30% or greater: 60%
- This can either be preventative or something that reduces severity of the symptoms.
- WTF? How did I think I could possibly figure this out? Well, I just tried and I can’t. Looking at Our World in Data, it seems like the case fatality rate went from a peak of 6.2% to 1.7%, but I have no idea why. I didn’t factor this prediction into my accuracy chart because it’s not really assessable for accuracy.
I have had coronavirus: 40%
- This is based on either a test or my best guess based on symptoms and contacts.
- Nope, I haven’t had it? I have had both an antibody test (November 18, 2020) and a test for the virus (in mid-January). Both were negative. However, I, my wife, and two close friends all got sick with what felt like very bad cold symptoms at the end of February, 2020. Maybe we had COVID, maybe we didn’t. We’ll never know. But I counted this as a miss in my accuracy chart.
I am hospitalized for coronavirus: 5%
- Yup, I was not hospitalized.
At least one person I know dies of coronavirus: 60%
- This hinges a lot on the definition of “know”. Let’s say it’s people I’ve met in person more than once and I remembered how I know them when I learned about their death. Public figures and celebrities don’t count (not that I’ve met many).
- Nope. I know more than a few people who’ve had it, but no one I know personally has died. Glad to be wrong.
Fatality rate in the US is estimated at less than 1% in retrospect, excluding any newly developed treatments from #13: 70%
- Note that even a “low” fatality rate like this is still very dangerous when combined with rapid spread and no vaccine/immunity.
- Nope with a side of WTF. This is phrased so unclearly that I’m not sure how I would’ve handled the whole “excluding any newly developed treatments” thing. But that’s moot since the fatality rate is clearly above 1%.

Personal

I am still working for ActiveState: 95%
- Yup. And I still like working there.
I have attended at least one non-Perl conference (in person or online): 60%
- What “attending” means for an online conference will be on the honor system.
- Yup. I attended RustConf 2020 back in August of 2020. It was great.
I have released at least one CPAN module every month for 20 years: 80%
- Yup. I did this one.
My weight is 210 pounds or below and has been since November 1: 70%
- See I Weigh Way Less for more than you ever wanted to know about this topic. For reference I was at 216 last time I weighed myself.
- Yup. My weight has been around 201-204 most times that I’ve checked over the past few months. There were a couple of days where it dipped below 200, but that was very temporary. I’m pretty happy with where I’ve gotten to. I don’t know how I’d lose significantly more weight at this point without adopting a much more drastic diet change than I’m up for.
My weight is 200 pounds or below and has been since November 1: 10%
- Yup. My weight was not below 200 consistently.
I have climbed at least one climbing route (top rope or lead) rated 5.11(-/a) or higher: 60%
- This mostly reflects my prediction about access to climbing gyms and outdoor climbing this year. Absent coronavirus I’d have put this at 95%. Note I consider this to have happened even if I fall or rest during the climb and then continue. This is just about whether I can top out at all.
- Nope. I haven’t been back to the climbing gym since February of 2020. I did take a top rope anchors class, and I did some outdoor top rope and lead climbing, but not nearly enough to improve, and I haven’t been climbing in several months now. I will not be surprised if I’m not even able to do 5.10+ next time I actually go to the gym.
I have a bouldering wall in my garage: 80%
- Nope. My mother threw a monkey wrench into this one by dying last September. That set in motion a plan where my father moved here and we purchased a new duplex with him. My father is currently living in his part of it, and we’re having major renovations done on our part with a plan to move in April or May. Once that started it was obvious there was no point in having a bouldering wall built in our current garage.
I maintain my three hours (ish) per week weight training schedule (modulo illness or injury that prevents me from doing so): 95%
- There’s a little leeway here. For example, yesterday I did 45 minutes instead of 60 because of some neck & shoulder issues, but I’d still count this week as a success.
- Yup. I’ve had some days/weeks off because of illness and injury, but overall I’ve stuck with my exercise plan. I’m probably doing about 150 to 165 minutes per week, which I will count as “three hours (ish)”.
I’m still vegan: 95%
- If I let myself score 100% this would definitely be 100%.
- Yup. In retrospect I’m not sure if this sort of prediction is worth putting on the list. I could also put things like “I won’t be hit by a meteor” or “I will go shopping for food”. While these are likely to be accurate predictions, they don’t really do anything to evaluate my prognostication skills. I think I just juked the stats with this prediction.
I spend time in Taiwan this year (whether on vacation or as a temporary move): 10%
- Yup. I did not go to Taiwan in 2020. We’re hoping to go later this year, though.

Parting Thoughts

I looked at the stats for my three categories and I didn’t see any huge difference in accuracy between them. The only obvious highlight is that things I predicted at 95% were the most accurate. But those were basically all sure things.
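
One way to make this kind of per-confidence comparison is to bucket predictions by their stated confidence and compute a hit rate for each bucket. The following is a minimal Python sketch, not the method actually used for the accuracy chart. The function name and the sample data are made up for illustration, and it assumes that a prediction below 50% counts as a prediction that the thing will not happen.

```python
from collections import defaultdict


def accuracy_by_confidence(predictions):
    """Group predictions by stated confidence and return the hit rate per group.

    Each prediction is a (probability, happened) pair. A probability below
    0.5 is treated as a prediction that the event will NOT happen, made with
    confidence 1 - probability (an assumption of this sketch).
    """
    buckets = defaultdict(lambda: [0, 0])  # confidence percent -> [hits, total]
    for probability, happened in predictions:
        if probability >= 0.5:
            confidence, hit = probability, happened
        else:
            confidence, hit = 1 - probability, not happened
        key = round(confidence * 100)
        buckets[key][1] += 1
        if hit:
            buckets[key][0] += 1
    return {key: hits / total for key, (hits, total) in sorted(buckets.items())}


# Hypothetical predictions, not the real list from this post.
sample = [(0.95, True), (0.95, True), (0.05, False), (0.60, False), (0.60, True)]
for confidence, rate in accuracy_by_confidence(sample).items():
    print(f"{confidence}% confident: {rate:.0%} correct")
```

With enough predictions in each bucket, a well-calibrated forecaster’s hit rate should land close to the bucket’s confidence level.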

My main takeaways are as follows:

If my initial certainty on a prediction is lower than 90%, I probably need to do some research or deeper thought to improve my accuracy.
I should be careful to consider how I will actually figure out whether the predicted thing happened. There were a couple cases where I couldn’t really figure out how to do that this year.
I’m not a superforecaster. But I would’ve predicted that to be the case, so maybe I am, at least when it comes to predicting my forecasting abilities.

Stay tuned for 2021 predictions. Maybe. Right now I don’t have as many ideas for things I might predict this year. Politics and coronavirus are (hopefully) going to be less interesting this year, and I’m not sure there are other really interesting fields I can even attempt to make predictions in.

What I did on my winter vacation

Sat, 09 Jan 2021 11:12:57 -0600

TLDR: Helped my father move. Then I shaved all the yaks. Fur everywhere. Very messy.

Because of the way holidays at ActiveState work, it’s very economical in terms of vacation days to take the last two weeks of the year off (Christmas and New Year’s weeks). I had a fair bit of vacation left, so I decided to take the first week of the new year off as well, for a total of three weeks of vacation.

So what did I do with all that time?

Answer: Helped my father move, then wrote a lot of Rust and some Perl.

Helping My Father Move

My mother died at the beginning of September. She and my father had been living in Florida, but it didn’t make sense for my father to continue living there alone. My wife and I had been discussing moving to a new house in Minneapolis for a while, so my father suggested we find a place where we could all live. We were looking for either a duplex or a property with an ADU¹. We found a perfect odd duplex² very quickly. The sale closed on December 16. My father’s stuff from Florida arrived a few days later, and he moved into his part of the duplex.

Unfortunately, because of COVID, we didn’t go down to Florida to help him prepare for the move. While he did sort through some of his stuff, his sorting could’ve been more aggressive, as he was moving from a large house with two people to a much smaller space with one person. The upshot is that he moved a ridiculous amount of stuff here, far more than could ever fit in his space. So we spent many hours going through it.

We still have a ridiculous number of boxes of stuff to give away in the main part of the house, along with an endless pile of cardboard and packing paper, but his unit in the duplex is looking great.

This made me even more enthusiastic about getting rid of tons of stuff before my wife and I move, which will happen later this year, after some renovations to our part of the duplex.

My Local COVID Tracker

I’ve posted about this a couple times already. I made an update to track more counties, as well as to add a graph showing the seven-day new case average per 10,000 people.

The new graph by population made it clear that the counties near me were doing worse than the state as a whole. The original graph, showing just the raw count of new cases, made it look like the state overall was worse than the nearby counties, when in fact I think the opposite is true.
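
To illustrate why normalizing by population changes the picture, here is a small Python sketch of a trailing seven-day average expressed per 10,000 people. This is not the tracker’s actual code (which is Perl plus d3.js). The function name is made up for this example, and the case counts and populations below are hypothetical.

```python
def seven_day_average_per_10k(daily_new_cases, population):
    """Return the trailing seven-day average of new cases per 10,000 people.

    The result has one entry per day starting with the seventh day, because
    earlier days do not have a full week of history behind them.
    """
    rates = []
    for end in range(7, len(daily_new_cases) + 1):
        window = daily_new_cases[end - 7:end]
        rates.append(sum(window) / 7 / population * 10_000)
    return rates


# Hypothetical numbers: the "state" has more raw cases per day than the
# "county", but fewer cases relative to its population.
state = seven_day_average_per_10k([5000] * 7, 5_600_000)
county = seven_day_average_per_10k([1500] * 7, 1_200_000)
print(f"state: {state[-1]:.1f} per 10k, county: {county[-1]:.1f} per 10k")
```

A raw-count chart of the same hypothetical data would put the state line far above the county line, which is the misleading impression described above.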

Precious and My Yak Shaving Expedition

Precious is a project I started to create a meta-linter/tidier in Rust. The goal is to replace TidyAll. I’ve written about TidyAll’s issues in the past.

Switching DateTime to use Precious

I’ve used precious for a few projects, including itself, but I wanted to try moving a Perl project to it. I picked DateTime for no particular reason, other than that it’s something I’ve worked on for many years.

This is what led me down the yak shaving rabbit hole, to mix some metaphors.

First, I found some bugs with precious and made a v0.0.7 release of it.

Then when I started working on converting DateTime from TidyAll to precious, I found several bugs in omegasort³, leading to a new release of that as well.

As I worked on switching DateTime to use precious, I realized there was a bootstrapping problem. With TidyAll, I can just specify TidyAll and any needed plugins as develop phase prereqs for the distribution, making it easy for others to install all the needed tools, like perltidy, perlcritic, and TidyAll itself.

But precious isn’t on CPAN, nor is omegasort. I did briefly consider going down the whole Alien route, but quickly discarded that idea. While depending on a notional Alien::precious would work fine for my Perl projects, it wouldn’t help for anything that’s not Perl.

Note that precious, being a Rust project, produces a single statically linked⁴ binary. Go programs like omegasort are totally static, not even linking libc. This will become more important later in my yak shaving journey.

Just One Little Installer

So my next thought was to build a simple installer for precious that could be run using the very safe “pipe curl output into an interpreter” strategy. I quickly wrote an installer in Perl that would live in the precious repo. I was using fatpack, which is a great tool for turning Perl programs into single-file executables, as long as all of its dependencies are pure Perl.

But then I started thinking about omegasort. Did I want to write a nearly identical program to live in the omegasort repo? And then do it again for my next Rust or Go project? Plus there are lots of other useful tools that fall in this “released as a single binary on GitHub” bucket, like ripgrep.

Just One(!) Installer

Then I had an idea. What if I wrote one installer for all these things? So I made a little Perl distribution called App-ugri, where “ugri” stood for Universal GitHub Release Installer. I actually finished that before I realized the critical flaw in my plan. Even though I could fatpack ugri into a single file so you could pipe curl into perl, what about Windows?

Then I remembered why I was creating this installer in the first place. It’s because of languages like Rust and Go that produce a single statically linked executable. The solution was pretty obvious. Write this installer in Rust.

ubi

“UBI” stands for Universal Binary Installer. I liked the pun here more than the original “ugri” name. Of course, by “universal binary installer” I mean it just installs single-file executables from GitHub project releases, so it’s not very universal (yet?).
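
The core problem for any installer like this is choosing which release asset to download for the current machine. The Python sketch below shows one plausible approach: match the OS and CPU architecture against the asset file names. This is an assumption about how such a tool could work, not a description of ubi’s actual Rust code. The `pick_asset` function, the alias tables, and the asset names in the usage example are all hypothetical.

```python
import platform

# Hypothetical alias tables: names that release assets commonly use for
# each OS and CPU architecture. Real projects vary, so these are guesses.
OS_ALIASES = {
    "linux": ["linux"],
    "darwin": ["darwin", "macos", "apple"],
    "windows": ["windows"],
}
ARCH_ALIASES = {
    "x86_64": ["x86_64", "amd64"],
    "amd64": ["x86_64", "amd64"],
    "arm64": ["aarch64", "arm64"],
    "aarch64": ["aarch64", "arm64"],
}


def pick_asset(asset_names, system=None, machine=None):
    """Return the first asset name that mentions this OS and architecture.

    Defaults to the machine running the script. Returns None if nothing
    matches.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    os_names = OS_ALIASES.get(system, [system])
    arch_names = ARCH_ALIASES.get(machine, [machine])
    for name in asset_names:
        lowered = name.lower()
        if any(o in lowered for o in os_names) and any(a in lowered for a in arch_names):
            return name
    return None


# Hypothetical asset names for an imaginary project.
assets = ["tool-Linux-x86_64.tar.gz", "tool-Darwin-x86_64.tar.gz", "tool-Windows-x86_64.zip"]
print(pick_asset(assets, system="Linux", machine="x86_64"))
```

Simple substring matching like this is fragile (for example, a short alias such as “win” would also match “darwin”), which is one reason the alias lists here are kept conservative.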

Since I’d already written this once in Perl, writing a Rust version was mostly pretty easy. The only wrinkle was that I had to learn a little bit about async programming in Rust, because the GitHub API client I was using, octocrab, is async. This was something I’d been wanting to learn about anyway, so I welcomed the challenge.

There is a usable 0.0.2 release on the GitHub project’s releases page.

Of course, ubi has a bootstrapping problem. What do you install a universal binary installer with? You curl a script into sh, of course⁵!

That looks like this:

$> curl --silent --location \
 https://raw.githubusercontent.com/houseabsolute/ubi/master/bootstrap/bootstrap-ubi.sh |
 sh

That should work on Linux and macOS. But I would love some help with creating the equivalent PowerShell script for Windows. I also need to improve the release tooling to provide binaries for more systems.

Installing `precious` and `omegasort`

Once you have ubi installed, installing these tools is trivial:

$> ubi --project houseabsolute/precious --in ~/bin
$> ubi --project houseabsolute/omegasort --in ~/bin
# and for good measure
$> ubi --project BurntSushi/ripgrep --exe rg --in ~/bin

But wait, there’s more yak!

Along the way, I also ended up working on my dzil bundle to make switching to precious easier. So now my bundle will:

Generate a precious.toml config file for any project which doesn’t yet have one, as long as there’s no tidyall.ini file either.
Generate an extended test that runs precious lint --all and makes sure there are no files that fail the linting checks.
Generate a simple dev-bin/install-xt-tools.sh script that installs ubi, then uses that to install precious and omegasort.
Generate a git hook script that uses precious, along with a git/setup.pl script to install that hook for the given repo.
Update the dist.ini to always include an authordep on the version of the bundle that is being run.

And since I was messing with all this, I added podchecker and podtidy to my standard precious config for Perl projects.

That last bullet point, about updating the dist.ini file, came out of some issues I found in trying to get precious tests passing in CI using my ci-perl-helpers tooling for Azure Pipelines. I’ve written about those in the past as well. I ended up making some improvements to the helpers, so now they’ll automatically run the dev-bin/install-xt-tools.sh script before running extended tests.

And when they’re building a tarball for any Perl distro where the name starts with “Dist-Zilla”, they make sure to include the git checkout’s lib directory in @INC when running dzil build. I assume that if you’re testing a dzil plugin or plugin bundle in CI, then you want to use said plugin or bundle when generating the tarball for said plugin or bundle⁶.

The results of all of this can be seen in my PR to switch DateTime to precious.

Other Random Bits

I tweaked a PR for the CLDR project (Common Locale Data Repository) to fix a typo in one language’s datetime info.
I submitted a PR to octocrab to update its dependencies so I could update the same deps in ubi.
I made a new release of DateTime-Locale to add some more documentation.
I almost started rewriting omegasort in Rust (just because I like it better than Go), but I managed to restrain myself. I might get back to this at some point, but I’m glad I worked on these other projects for now.
I wrote this blog post.

Putting the Yaks to Bed

I wrapped this all up yesterday, more or less. I return to work on Monday, so my timing was pretty good.

¹ Accessory Dwelling Unit - think of an apartment built over a garage.
² It’s not a normal duplex. Instead of a house split into two pieces, it’s an older three story house, built in 1915, with a newer, much smaller two story “apartment” attached to the back.
³ Because TidyAll is in Perl, sorting plugins for it are just simple Perl classes. But precious only invokes other executables, so I realized I needed a sorting tool soon after starting on precious. I wanted to write the sorting tool in Rust but at the time I started, Rust had no support for Unicode collation, so I wrote it in Go instead.
⁴ Except for libc.
⁵ But it can install itself if you already have it installed.
⁶ Does that sentence make any sense? I’ve lost track. There’s too much yak fur in here!

My New Rube Goldberg Machine

Sat, 19 Dec 2020 22:30:10 -0600

My last post was about my local COVID tracker tool. While it worked well, I found having to re-run the report.pl script every time I wanted an update annoying. Plus, I wanted to share this on Facebook, but I have non-technical friends who would not be able to run it for themselves.

So I decided to put up a hosted version, but I challenged myself. I wanted it to run entirely on someone else’s machines. And I didn’t want to pay for it.

So how to do it?

Well, the hosting is simple. I’ve been using Render for this blog, as well as my professional site and some other static sites¹. And while my COVID tracker does require updated data to stay relevant, the data is just a simple JSON file and the chart is generated entirely in the browser.

So the trick was to make the data file available in a way that let me deploy it with Render every time there are updates.

Enter my Rube Goldberg machine.

The data source I’m using, covid-api.com, only updates their data daily, so I only need to run this once a day to stay relevant. This sounds like a job for cron. But not on my desktop machine (even though that would be way simpler).

Instead, I used GitHub Actions. It supports scheduled jobs as well as running on every push to the repo. But the trick is to then make the data available after each run. And then the trickier trick is to get that data as part of running the deploy job on Render. Oh, and every time the GitHub Action runs, I want to have the Render site deploy again.

This turned out to be not that hard.

My GitHub Actions workflow runs the report.pl script, which generates a summary.json file. Then the workflow uploads that file as a build artifact. This is all incredibly trivial, and by using caching for both my Perl prereqs and the intermediate data files, I can make it quite fast. When the cache is warm, a run takes less than a minute. When it finishes, it hits a webhook provided by Render to trigger a deploy.

Of course, GitHub has an API for artifacts like the summary.json file. So all I need to do in the Render deploy script is use the API to find the latest artifact, then download that and deploy it along with my index.html and chart.js files. With a little experimentation, I was able to create a Bash script to do exactly that. I could have written this in Perl, but the combination of curl, jq, and zcat (artifact files are always zipped) actually made this much simpler to do in Bash than Perl². I had to use sed, which always seems weird when I know Perl, but doing this in Perl requires at least a few more characters.
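
For readers who would rather not chain curl, jq, and zcat, the same two steps (pick the newest artifact, pull a file out of the zip) can be sketched in Python. This is not the Bash script described above. It assumes the artifact listing is the JSON returned by GitHub’s “list artifacts for a repository” endpoint, with `name`, `expired`, `created_at`, and `archive_download_url` fields, and it leaves the HTTP requests out so that it runs without network access. The function names are made up for this example.

```python
import io
import zipfile


def latest_artifact(listing, name):
    """Return the newest unexpired artifact with the given name, or None.

    `listing` is the decoded JSON from the artifacts endpoint. The
    `created_at` values are ISO 8601 UTC timestamps in a fixed format, so
    comparing them as strings orders them by time.
    """
    candidates = [
        artifact
        for artifact in listing["artifacts"]
        if artifact["name"] == name and not artifact["expired"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda artifact: artifact["created_at"])


def extract_file(zip_bytes, filename):
    """Return the contents of one file from a downloaded artifact zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(filename)
```

In a real deploy script, the bytes passed to `extract_file` would come from an authenticated request to the chosen artifact’s `archive_download_url`.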

The hardest part was figuring out how to securely store the Render webhook URL in GitHub and then access it in my workflow. I had to store a GitHub token in Render³ as well.

And so I present to you covid.urth.org.

Also, you might note that the chart has changed a little since last time. I made the past 7-day average line thicker and the daily numbers line thinner. The average is much more indicative of trends than the actual daily numbers, which jump around quite a bit.

¹ See my previous writeup on moving all my sites to Render.
² This happens every once in a while.
³ I guess I didn’t have to, but the GitHub limit on unauthenticated requests is so low that I figured it was best to use authentication instead.

My Local COVID Stats Tracker

Sat, 12 Dec 2020 14:31:59 -0600

For many months now, I’ve been following the COVID stats in the Star Tribune, the local Minneapolis newspaper. There’s a lot of interesting info there, but it’s not really useful for reaching conclusions about the safety of various activities. The problem is that the data is either for the wrong-sized area or I can’t group together the bits I care about.

Most of the stats are state-wide. But I don’t care about the whole state. I live in Minneapolis, part of the Twin Cities metro area. We have more than half of the state’s population here. It’s the infection rate in that metro area that really matters to me, as opposed to the whole state. If COVID is under control in the Iron Range six hours north of me, that has very little impact on how risky going shopping at the local co-op is.

They also provide some county-level stats, but there’s no way to view a group of counties together in a historical graph. They also provide per-postal code stats. This is really useless. For example, one of the postal codes that’s most out of control is 56525 on the west side of the state. Their current case count is a very high 170 per 1,000 people. Except that postal code only contains 100 people in total. Insert facepalm here.

The zip code stats for my local area are arguably less useless, as they contain more than 100 people each, but it’s much too granular. I can’t click on dozens of dots and form a mental picture of local COVID prevalence and trends.

So I wrote my own hacky tool to do just that, using Perl and d3.js. I start by pulling data from covid-api.com, which allows me to get the daily infection stats by county. I separate out the four counties I care most about (mine plus a few neighbors). I also have a bucket for the entire state so I can compare these counties to the state as a whole. There is one line for each of these, lightly smoothed. Then it adds a thinner past 7-day average line for each of these as well, so I can get a better sense of the trends. I wish that this API provided infection rates per 1,000 people, but I can still see trends just by looking at daily new infections.

I save all this in a JSON file, and then use d3.js to make a reasonably pretty chart. The API calls are cached so I don’t beat up the server. Whenever I run the script it’ll download any missing days of data.
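
The “download any missing days” step boils down to comparing the dates already in the cache against the range being reported on. Here is a minimal Python sketch of that idea. The real script is Perl and its cache layout isn’t described here, so the function name and the use of a set of dates as the cache are assumptions.

```python
from datetime import date, timedelta


def missing_days(cached_days, start, end):
    """Return the dates from start to end (inclusive) that are not cached yet."""
    days = []
    current = start
    while current <= end:
        if current not in cached_days:
            days.append(current)
        current += timedelta(days=1)
    return days


# Hypothetical cache contents: December 3 and 5 still need to be fetched.
cached = {date(2020, 12, 1), date(2020, 12, 2), date(2020, 12, 4)}
for day in missing_days(cached, date(2020, 12, 1), date(2020, 12, 5)):
    print("would fetch", day.isoformat())
```

Only the days in the returned list need an API call, so a daily run after the first one makes a single request per county.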

And here’s the end result:

The d3 code is super cargo culted, so I’m sure it’s terrible.

The code is on GitHub, of course. I think it’s usable for other people, though PRs to make it better are welcome. At some point I may try to add a mouseover handler to show the values for each date, like in this example.

The ActiveState Platform and Perl 5.32

Thu, 10 Dec 2020 15:16:00 -0600

Note: Technically, this post qualifies as paid promotion, because I work for ActiveState. But I volunteered to write about our new Platform and put it on my personal blog because I think what we’re doing is really cool and might be of interest to the Perl community at large.

TLDR

We have an entirely new system that supports Windows and Linux (macOS coming soon), providing you binary builds of the Perl core, Perl distros, and supporting C/C++ libraries¹.
When you use our State Tool², you can create any number of entirely self-contained virtual environments, one per project. This makes switching between projects trivial and these virtual environments are easily shared across a team or organization.
No more ActiveState Community License³! The only licenses that apply are the original licenses for each open source package we build for you.
You don’t need a Platform account to try this out. But you can play with our system and sign up at any time and keep all the work you’ve done so far.
It’s usually quite fast. If we’ve already built a particular distro/language core for the given platform, we use a cached version, so many builds take a few seconds. Entirely new builds are slower, but still faster than doing it by hand locally in many cases, because we distribute work throughout a build farm.
The core features are all free. Most features are free for public projects. We also have paid features including private projects, build engineering support, support for older platforms, indemnification, and more.
The Platform has lots of other cool features like revisioned projects, advanced dependency resolution, and more.

What Is It?

So what is “The ActiveState Platform”? We describe it as “multi-project, cross-platform package management for Perl 5.32”. But here’s my description for Perl people. It’s like perlbrew plus Carton on steroids, except not because it gives you binaries.

It’s cross-platform and easy for organizations to use across teams.

Besides Perl, we offer Python and Tcl (with our old licensing, for now), with other languages coming in the future.

But that’s still a mouthful, so instead, let’s dig into each of the features in detail.

Note that some of what I’m describing only applies to Perl 5.32 right now. We’re in the process of moving from using a legacy build tool⁴ to new tooling that’s much better. This new tooling lets us do faster parallel builds, as well as providing a better base for future features.

Package Management (with Versioning)

Because we give you binaries, the Platform is really more like Apt, RPM, Chocolatey, or Homebrew, not CPAN tooling like cpanminus or Carton.

You don’t need to figure out how to build that pain in the rear distro. Instead, we do all of the compilation and building on our side and give you the bits. This includes not just the Perl core and Perl module code (XS or pure Perl), but also C/C++ library dependencies, which are statically linked as needed⁵.

All of this is managed from the command line using our State Tool. The State Tool takes care of downloading your build and installing it locally. In addition, you can use it to add, remove, or change the Perl modules associated with your project, though we have a very usable web UI too.

One of the coolest features of our package management system is that it’s all versioned. Every change to your requirements creates a new “commit” in our system. This works a lot like any VCS. You can see your commit history, revert to an earlier state, etc. And we have work in progress to support branches, to be released in the future.

The set of packages associated with each commit is frozen in time, down to the binary level. I have more details about that later in this post.

Cross Platform (OS)

We support builds on Windows and Linux, with macOS in the works. The State Tool is entirely cross-platform as well.

One thing that we’re still working on is making it possible to have a multi-OS project with per-OS package additions/removals. Right now, if you have a project that builds on both Windows and Linux, you can only add Perl distros that work on both platforms. So for example if you tried to add Win32, you’d get an error saying this can’t be built on Linux.

I think the solution to this will be via the in-progress branch support I mentioned previously. This would allow you to have a shared set of base packages, with additional Windows and Linux branches. Or you could have Linux as your main branch and Windows as a branch off that with any necessary distro additions and removals.

Multiple Projects with Shared Virtual Environments

Because all of your project’s configuration is stored in our system, it’s trivial for an entire team to share that configuration. All a new team member needs to do to get started is to state activate the project, and they’ll get the same virtual environment as everyone else, with the same Perl core, Perl modules, and any C/C++ libraries those modules need.

This makes onboarding new team members or contributors trivial. And it makes it trivial to have many projects with different sets of requirements. And these environments can be packaged up into a file tree that you can distribute in production with the state deploy command.

But Wait, There’s More!

The State Tool has a lot of other features including shared secret management, support for shared scripts, and the ability to execute those scripts in response to events. See its documentation for more details.

Of course, we have more features in the works including CVE reporting and mitigation, license reporting, and support for other languages like Ruby, JavaScript, and Java.

Fun Technical Bits

The team I lead here at ActiveState has worked on some of the core components of the Platform, so I want to talk about the nitty gritty a bit.

Solver and Ingredients Database

The two big things we created are the (Dependency) Solver and the Ingredients Database (and its API).

Let’s start with the database.

Our entire package database is based on timestamped revisions. When we go to resolve dependencies for a project configuration, that request is always timestamped based on the project’s most recent commit. That means the project does not see any data changes that were made after its commit.

So let’s say that a project requires DateTime. You will always get the same version of DateTime no matter how many times we solve for your dependencies. But you also get the same version of each of DateTime’s dependencies, like DateTime-Locale and DateTime-TimeZone. And that applies through the entire dependency graph.

So if we add a new version of DateTime-Locale that breaks your DateTime version⁶, your project is unaffected. You can opt into newer versions explicitly, however.
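
The idea of resolving against a point in time can be sketched in a few lines of Python. This is a toy model, not the Platform’s actual schema or API. The record layout, the `latest_visible_version` function, and the version numbers and dates in the example are all hypothetical.

```python
from datetime import datetime


def latest_visible_version(releases, package, as_of):
    """Return the newest version of a package that existed at time `as_of`.

    Releases added to the database after `as_of` are invisible, so a
    project pinned to an older commit timestamp keeps resolving to the
    same version. Versions are tuples such as (1, 28) so they compare
    numerically.
    """
    visible = [
        release
        for release in releases
        if release["package"] == package and release["added_at"] <= as_of
    ]
    if not visible:
        return None
    return max(visible, key=lambda release: release["version"])["version"]


# Hypothetical database contents.
releases = [
    {"package": "DateTime-Locale", "version": (1, 0), "added_at": datetime(2020, 1, 1)},
    {"package": "DateTime-Locale", "version": (2, 0), "added_at": datetime(2020, 6, 1)},
]
commit_time = datetime(2020, 3, 1)
print(latest_visible_version(releases, "DateTime-Locale", commit_time))
```

A project whose latest commit predates the second release keeps getting the first one until it makes a new commit, which is the “opt in explicitly” behavior described above.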

This is actually even more granular than at the version level. We revision every version of every Perl core and distro that we know about⁷. So all of a distro’s dependency data is revisioned. This means that we can freely change that data without ever breaking your build, allowing you to opt into changes on your own schedule.

The system supports a lot of static metadata that cannot be expressed in the Perl ecosystem. We can declare conflicts between distros or conflicts between a distro and a platform. But our platforms are defined very granularly, so we are really defining dependencies or conflicts in terms of platform components such as the kernel version, libc version, CPU architecture, etc.

And because we can add a new revision to an existing release, we can update this metadata as things change. Take my DateTime-Locale example from up above. With CPAN, the only way to fix this is for me to upload a new DateTime version that works with the new DateTime-Locale. I have no way to tell the CPAN toolchain that every earlier version of DateTime would work as long as it doesn’t use DateTime-Locale past a certain version. But our system supports all sorts of version requirements, including defining minimum and maximum versions for dependencies, and even excluding arbitrary versions in a range.
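
As a rough illustration of what a requirement with a minimum, a maximum, and excluded versions means, here is a small Python check. It is a sketch of the concept only. The `satisfies` function and its parameters are invented for this example and say nothing about how the Solver stores or evaluates requirements.

```python
def satisfies(version, minimum=None, maximum=None, excluded=()):
    """Return True if `version` is allowed by the requirement.

    Versions are tuples such as (1, 52) so they compare numerically.
    `minimum` and `maximum` are inclusive bounds, and `excluded` lists
    individual versions inside the range that are known to be broken.
    """
    if minimum is not None and version < minimum:
        return False
    if maximum is not None and version > maximum:
        return False
    return version not in excluded


# Hypothetical requirement: at least 1.0, at most 2.0, but never 1.3.
for candidate in [(0, 9), (1, 2), (1, 3), (2, 1)]:
    allowed = satisfies(candidate, minimum=(1, 0), maximum=(2, 0), excluded={(1, 3)})
    print(candidate, "allowed" if allowed else "rejected")
```

With an upper bound like this recorded on an existing release, older releases can keep working after a dependency ships a breaking change, without anyone uploading a new version.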

Another cool feature we built is support for dependency conditions, so we can easily say that a dependency is only needed on a certain platform or with a certain version of the Perl core. All of this data is stored statically, so our Solver can give you useful errors when these constraints cannot be satisfied.
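
A toy version of conditional dependencies might look like the Python sketch below. The data layout, the `active_dependencies` function, and the package names are all hypothetical. The point is only that conditions stored as plain data can be evaluated for a given platform and Perl version without running any of the package’s code.

```python
def active_dependencies(dependencies, platform_os, perl_version):
    """Return the names of dependencies whose conditions hold.

    Each dependency is a dict with a "name" and an optional "condition"
    dict that may contain "os" (a required OS name) and "min_perl" (a
    minimum Perl version as a tuple).
    """
    active = []
    for dependency in dependencies:
        condition = dependency.get("condition", {})
        if "os" in condition and condition["os"] != platform_os:
            continue
        if "min_perl" in condition and perl_version < condition["min_perl"]:
            continue
        active.append(dependency["name"])
    return active


# Hypothetical dependency list for an imaginary distro.
dependencies = [
    {"name": "Always-Needed"},
    {"name": "Windows-Only", "condition": {"os": "windows"}},
    {"name": "Newer-Perl-Only", "condition": {"min_perl": (5, 30)}},
]
print(active_dependencies(dependencies, "linux", (5, 32)))
```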

I’m really proud of the design my team came up with for this. It’s as simple as it can be, and we’ve done a good job of ensuring that we apply this revisioning in a consistent way across all relevant data, because we need to revision everything that factors into dependency resolution. That includes things like platform/OS data, global and per-package options like enabling debugging or threading (coming soonish), the VM/Docker images in which we do builds, and anything else that could affect the build output.

This extreme revisioning has made it much easier for us to change and update our data without the risk of breaking existing projects⁸.

The Solver is based on an algorithm created by Natalie Weizenbaum while working on the library tooling for the Dart language. She developed a SAT Solver called PubGrub. Natalie has written a great introductory article on PubGrub, which I highly recommend. There is also a detailed technical specification in the pub repository.

I wrote a post about this for the ActiveState blog that goes into more detail on the specifics of our implementation and some of the ways it differs from Natalie’s design.

Try It Out

Remember, you don’t need an account to try it out. If you like it, you can always make an account and associate it with your anonymous project.

If you have questions you can email me at autarch@urth.org, though depending on your question I may ask you to post it on our Community Forums (using Discourse).

¹ If you’re wondering how this is entirely new since I just described what ActivePerl has always been, keep reading.
² We named it the State Tool so you can run state activate to start using it for a project. See what we did there?
³ If you’re about to point out that you can’t necessarily relicense this software anyway, we know. Our license applied to the bundle, not the individual components. It is legally possible to license a software collection with a different license than applies to the individual components.
⁴ Which is called “camel”. You’ll never guess what language it’s written in!
⁵ XS modules are still compiled to .so files, but those .so files statically link to any needed C libraries.
⁶ Surely the author of those packages would never be so sloppy as to allow this to happen, but let’s just imagine it could happen.
⁷ Of course we do the same for all supported languages.
⁸ For some reason users complain when their builds stop working.

Twenty Years of Monthly CPAN Releases

Sat, 05 Dec 2020 10:21:22 -0600

I did it!

For the last twenty years I’ve uploaded at least one new release to CPAN every month. How do I know? Neil Bowers has been keeping track on his CPAN Regular Releasers page for quite some time. I’ve held the longest monthly release streak there for quite a long time.

The second place for monthly release streaks is Chris Williams (BINGOS), at 177 months, which is 14 years and 9 months. Also of note is Karen Etheridge (ETHER), who has maintained a weekly streak for 457 weeks (8+ years)!

So how did I do it? I cheated, of course.

One of the distributions I created and maintain is DateTime-TimeZone. This distribution contains the entire IANA time zone database as Perl data. So every time there’s a new database release I have to upload a new DateTime-TimeZone. This process is nearly entirely automated, consisting of just 3 commands if tests pass. I don’t have to code anything, I just update the Changes file.

I also had some help. Karen Etheridge contacted me to let me know I was about to miss a month this past June. That prompted me to do a release just in time. I also added a repeating to-do item at the end of each month to check if I had done a release that month.

My First CPAN Releases

My first releases were actually under a different CPAN ID, “PGRIMES”, not “DROLSKY”. I got my start with the online world by dialing in to BBS’s way back in the 80s, using a 300 baud modem with my Commodore 64. At that time, no one used their real names on a BBS. Instead, they used pseudonyms (mine was embarrassingly childish and I’m not going to tell you what it was). I was so used to using pseudonyms that I continued to do so early on with the Internet (my first email address was grimes @ waste.org). And I still do, to some degree. It’s why my email is “autarch@urth.org”, though “dave@urth.org” works as well, and I use the latter in any professional context, like my resume.

So my first release was a logging module called Log::Handler. It’s the predecessor to Log-Dispatch, with a significantly worse design. I uploaded it to CPAN on December 31, 1998, according to BackPAN. That’s me partying in the New Year like usual.

But I quickly realized that using a pseudonym for this was a terrible idea. I would want to refer to my CPAN upload on my resume, and I wanted my name to be Googleable. So I switched over to my DROLSKY ID in 1999. My first upload under that account was a new Thesaurus release in September of 1999, followed soon after by my first release of Log-Dispatch in December of 1999.

My twenty year streak started with Alzabo 0.20 in January of 2001.

Twenty More Years?

Probably not.

The IANA time zone database has been getting less active over time. It’s only had four releases so far this year, whereas some past years have had over 10.

At the same time, I’m doing a lot less Perl nowadays. I still use it at work for scripting and quick tools, but the bulk of my team’s code is written in Go (cue rant about how annoying Go is).

And my personal projects lately have mostly been in Rust. It’s not because Rust is the absolute best fit for what I’ve been doing (though it’s good enough). Instead it’s because I wanted to challenge myself and learn something new. Rust is a ton of fun, and the community is great. I highly recommend checking it out.

But Perl is still my first love, and I still have many friends in both the Perl and Raku communities. That’s why I joined the board of The Perl Foundation. I want to do what I can to help the languages and communities stay healthy.

So I’ll be seeing you at the 2021 Perl and Raku Conference (almost certainly virtually) and I hope to see you in person at the 2022 conference.

I Contributed a PR to the Rust Core

Sat, 07 Nov 2020 16:26:52 -0600

It’s a tiny doc patch, but this brings my language core contribution count to two languages. I feel unreasonably proud of this.

A Sqitch “Declare Bankruptcy” Prototype

Sat, 31 Oct 2020 10:00:00 -0500

I wrote a prototype of a tool to “declare bankruptcy” on your Sqitch migrations and start over. If you have a project with 50, 100, or more migrations, this might be useful for you. The details are in the blog post I wrote on the ActiveState blog.

Perltidy Versus Black

Sat, 03 Oct 2020 15:31:00 -0500

I’ve recently been puttering about attempting to write a Postgres SQL & PL/pgSQL tidier in Rust called pg-pretty. If you’ve always wanted such a thing, don’t get too excited. It isn’t even close to usable yet.

But this post is about Perltidy and Black. These are both source code tidiers (aka formatters aka pretty printers). Perltidy is for Perl and Black is for Python.

These two tidiers reflect their respective languages. Perltidy is all about TIMTOWTDI¹ and Black is very much a TOOWTDI² project.

I’ve been thinking about these two approaches as I work on pg-pretty. It sure would be cool to offer a tool that let you format SQL your way. But there are a lot of ways to format SQL! The number of options I could imagine is pretty huge. And every option increases the complexity of the source code quite a bit. Since many options can interact with each other, the complexity is even more than just the sum of the options. That’s certainly the case with Perltidy, where certain options behave differently depending on how other options are set.
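To put rough numbers on that, here is a small sketch. It assumes every option is a simple on/off switch and that any two options might interact, which is a simplification and not how Perltidy's options are actually structured. The function name `configuration_counts` is made up for this illustration.

```python
from math import comb


def configuration_counts(num_options: int) -> tuple[int, int]:
    """Return (distinct configurations, option pairs) for on/off options."""
    # Each boolean option doubles the number of possible configurations.
    configurations = 2 ** num_options
    # Every pair of options is a potential interaction to handle and test.
    pairs = comb(num_options, 2)
    return configurations, pairs


for n in (5, 10, 20):
    configs, pairs = configuration_counts(n)
    print(f"{n} options: {configs} configurations, {pairs} possible pairwise interactions")
```

Even under these generous assumptions, the number of pairs grows quadratically and the number of configurations grows exponentially, while the option count itself only grows linearly.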

Black, OTOH, has no options for formatting. This certainly makes it simpler!

And we can see this by looking at the size of the code bases. I did rough counts using wc -l, which includes docs and comments. I only counted application code, not tests. Perltidy, including its CLI program perltidy, comes out to 44,700 lines or so. Black, using the same wc -l approach, is just under 6,800 lines.
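For anyone who wants to repeat this kind of rough count, here is a minimal sketch of the same idea. It is not the exact command I ran. The function name `count_lines`, the directory, and the file suffixes in the usage note are placeholders.

```python
from pathlib import Path


def count_lines(root: str, suffixes: tuple[str, ...]) -> int:
    """Count newline characters in matching files under root, like `wc -l`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Like wc -l, this counts every newline, so blank lines,
            # docs, and comments are all included.
            total += path.read_bytes().count(b"\n")
    return total
```

Calling something like `count_lines("lib", (".pm",))` on a Perl checkout, or `count_lines("src", (".py",))` on a Python one, gives a comparable number, as long as you point it at the application code and not the tests.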

I honestly cannot imagine writing a Pg formatter in the Perltidy style! I don’t care that much about the details of how it looks. I just want to think as little as possible when reading code that others write!

There’s more than one way to do it. ↩︎
There’s only one way to do it. ↩︎

My Move to Render is Complete

Fri, 25 Sep 2020 12:57:00 -0500

I’ve now moved all of my sites from my Linode box to Render. This includes:

This blog - Uses Hugo. Repo
houseabsolute.com - Contains my resume and links to conference slides. Also uses Hugo. Repo
presentations.houseabsolute.com - A new hostname for my slides. See below for more details. Repo
masonbook.houseabsolute.com - Just a static site. Repo
vegguide.org - Just another static site. Repo

I really like Render! It’s incredibly easy to use. All of the sites are static sites either using Hugo or are just raw HTML and CSS pages in a directory tree. The one exception is the presentations site, which took a bit of fiddling. Static sites are completely free on Render, so this lets me replace my $20/month Linode server with $0/month hosting. Of course, Render is a startup and might disappear or start charging, but they also have competitors like Netlify offering the same deal. And I can always move to AWS Amplify for a very low cost if needed.

Previously, I had served the houseabsolute.com domain using WordPress and Apache. Most of the site was WP, but the /presentations path was just a directory on disk. That directory contained a clone of my presentations repo that I would pull into as needed.

But with Render I needed to make the presentations repo part of the houseabsolute.com repo. My first thought was to use git submodules, but this didn’t work. The problem is that the presentations dir has a ton of symlinks. Most of my presentations use reveal.js (v3 or v4). Instead of copying the JS and CSS from reveal to each presentation directory, I just have one copy of reveal.js in the repo (well, one v3 and one v4) and symlink into it from the directory containing each presentation.

I tried getting Hugo to treat this all as static content, but it mostly (entirely?) skips symlinks. So that was a no go.

Next I tried getting Render to just treat the entire repo as a static site. But Render also doesn’t support symlinks. I’m guessing that it serves static content from S3 or something similar. So when it serves content there’s no filesystem and therefore no symlinks.

Fortunately, with Render, you can specify a command to prepare a static site for publishing, and then a single directory that contains that site. So I wrote a fairly simple render.sh script to do that for my presentations repo. It first gets rid of any node_modules directories. These are only needed for the reveal.js local server mode, which I use when I give a presentation to open a separate speaker notes window. Then I use rsync to copy all the presentations to a subdirectory. Rsync has a very handy --copy-links option that essentially “delinkifies” the tree as it copies. So every symlink turns into a copy of the thing it links to. I found out about this through a Stack Overflow answer that I cannot find again right now.
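The real render.sh is a shell script built around rsync. As an illustration of the same “delinkify” idea, here is a rough Python sketch. It differs from the script in one way: it skips node_modules directories during the copy instead of deleting them first. The function name `prepare_site` and the paths in the usage note are made up for this example.

```python
import shutil


def prepare_site(src: str, dest: str) -> None:
    """Copy src to dest, turning each symlink into a copy of its target."""
    shutil.copytree(
        src,
        dest,
        # symlinks=False makes copytree follow links and copy what they
        # point to, roughly what `rsync --copy-links` does.
        symlinks=False,
        # Leave out node_modules, which is only needed for local serving.
        ignore=shutil.ignore_patterns("node_modules"),
    )
```

Something like `prepare_site("presentations", "public")` would produce a link-free `public` directory to publish. Note that `dest` must not already exist, and a dangling symlink in `src` will cause `copytree` to raise an error.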

This works nicely, though it ends up being bigger than is really needed because of all the copies of the reveal.js code. All told, it ends up at 107MB. And this is why my presentations have moved to their own hostname. Fortunately, Render lets you add redirects to static sites so any stray links to the old paths in the houseabsolute.com hostname will continue to work.

Render has a lot of other really nice features. When you set it up you specify the domain you are using for the site. Once it detects that you’ve updated DNS to point the domain at their servers, it automatically sets up SSL for that domain using Let’s Encrypt. When you first configure a new site, it sets itself up to rebuild your site on every push, which is a great default. If you want something more complex you can turn that off and hit a deploy hook URL that they provide for your site. You can also set it up to let you preview PRs of that repo.

Of course, all of this is just a loss leader for their real services, which let you deploy actual applications along with databases and persistent disks as needed. If I had an app I needed this for I would definitely try this out.

Overall, I’m really happy with this move. I much prefer editing Markdown files in Emacs. The WordPress editor is pretty good, but Markdown is just better for me. And the attack surface for this setup is much smaller than running an actual server on my own.

One impact from all of this is that I will no longer be on IRC. I was hosting The Lounge on my server, but that server is going away soon. There are other web IRC options, but I realized that I barely use IRC these days. If you want to get in touch with me you can find me in the TPF Slack (I can invite you to the #perl channel there if you want) or just email me. TPF has discussed setting up a permanent community chat forum, and if that does happen I’ll be there too.

New Blog Software

Sun, 20 Sep 2020 15:32:12 +0000

Many, many years ago, in the flower of my youth (June, 1999), I registered the urth.org domain. I can’t remember when I started hosting my own email, but it was around that time. For many years after, I had a server in my home that hosted my email, various websites, and some web applications. Eventually VPS’s became cheap and powerful enough that I ditched the home server and moved everything to the cloud (Linode, specifically).

Linode has been great, and has only gotten cheaper and better over time. But I’ve been doing less and less with my VPS over the years. I moved my mail to Gmail years ago because maintaining deliverability was too much work. I stopped hosting things like wikis and other webapps as cheap or free SaaS offerings became available.

Now the server is doing very little. It runs WordPress for my blog and my resume/portfolio site, hosts a few static sites, and it runs The Lounge for IRC. But do I really need WordPress for a blog? No, not really. Static site generators like Hugo are extremely powerful and require much less maintenance. They also have a much smaller security footprint. I don’t spend much time on IRC, and there are hosted solutions available for free or very low cost.

So I concluded that I really don’t need a server any more. This blog is now running with Hugo, as the footer states. I’m using Render for hosting, which is a really nice service. It’s free(!) for static sites like this one, and configuring it for a Hugo site is incredibly easy. My one concern about Render is that they’re a startup, and I’m always nervous about startup longevity. On the other hand, there are a number of competitors, including Netlify and AWS Amplify. But I hope Render sticks around, and I encourage you to check it out.

The one big change is that this new site no longer supports comments. I looked at a few different options for this, but nothing seemed great. There’s Disqus, but no. Just no.

There are also a number of FOSS options, but most require running a persistent service somewhere. There are also some clever serverless options like utterances. But that uses the GitHub API in anonymous mode, which has extremely low rate limits. In testing I found it was easy to exceed those limits, which causes the comments to disappear from the blog until the limit resets. Each person/IP/browser (not sure how GH counts this) has its own limit, but just browsing through the archives is enough to trigger the per-hour limit. I really didn’t like the idea of a heisencomments system.

There are also a few paid options that aren’t creepy, but even the cheapest would run about $2 per month. Given that I write less than 20 posts a year, and most posts don’t get any comments, that seems like a poor use of money.

Instead, I encourage people to submit a blog post to Hacker News or an appropriate subreddit and start a discussion there. You can also email me. And if you find a typo in a post, you can just submit a PR on GitHub directly!

About This Site

Thu, 17 Sep 2020 00:00:00 +0000

This is Dave Rolsky’s blog. It contains blog posts. These posts contain ideas, mostly in the form of words. The words are made of letters, and each letter is made of pixels. The pixels are made of turtles.

Feel free to email me.

My GitHub Profile
My houseabsolute GitHub organization has most of my FOSS code.
My CPAN modules
My resume

I Attended RustConf 2020

Thu, 27 Aug 2020 19:20:54 +0000

Like so many conferences this year, RustConf 2020 was a purely virtual event. I’d already helped organize and attended The Perl Conference in the Cloud 2020 as a virtual conference earlier this year, so I knew it could work.

RustConf was very different from The Perl Conference. It was just one day and one track, lasting about five and a half hours with a break in the middle. The conference schedule was incredibly detailed. For example the opening keynote was from 9:35-10:27. And it really was exactly that long. I was totally amazed that the speakers could stick to such strict times until I realized that everything being presented was pre-recorded.

The talk quality was quite high. I believe that the conference organizers worked with each presenter to help them polish their presentation, which is really nice. This may be related to the format, since this wouldn’t be feasible with a much larger set of presentations. And obviously pre-recording helps here since the speakers could do multiple takes (and even do edits if they wanted).

The conference itself was streamed on YouTube, with Discord as a chat server. The organizers set up channels for each talk that were only usable during the talk and were made read-only before and after. This worked quite well. There were also some general channels like #hallway and #jobs and such.

Finally, there was a companion “app”, Meeting Pulse, with the detailed schedule, a place for attendee pictures, and some other features. I say “app” in quotes because it was just a webapp, though it worked fine on a phone. I ran it on my desktop in a second monitor, which was fine, except that it used insane amounts of CPU and GPU and occasionally lagged quite a bit. But this wasn’t a big problem.

I didn’t like every aspect of the scheduling. The first part ran from 11:30-13:00 with no breaks. For anyone organizing an online conference I’d suggest you institute a rule of a minimum of 10 minutes break after 60 minutes of content. This gives people a chance to go to the bathroom and get some food.

There was a two hour “lunch” break, which made sense in Pacific time as it was from 12:00-14:00 but it was from 14:00-16:00 for me, so it wasn’t so useful. I’m all for a mid-content break, but this was really long, and I would’ve just preferred it to be shorter so we could finish earlier. A 30-60 minute break would’ve been better. But neither of these scheduling issues was a big deal, just small things I’d like to see changed if next year’s conference is virtual as well.

And now some brief writeups on each of the presentations …

Rust for Non-Systems Programmers - Rebecca Turner

This wasn’t the first talk but I’m covering it first because it should have been earlier on the schedule. This is a great intro to Rust for those new to the language, covering many language constructs, error handling, writing simple CLI apps, using REST APIs, and more. If you’re totally new to Rust and want to get a quick overview of it, then watch this talk!

Opening Keynote - Rust People

I’m not sure if all the presenters were part of the Core Team, or exactly how Rust is organized. Suffice it to say that all of the presenters are involved in Rust development and/or the community. The keynote mostly talked about Rust as a project and community, and was more on Rust’s values, especially community values. I think it was a good opening since Rust has really positive community values. I hope that made all the attendees feel welcome.

Error Handling Isn’t All About Errors - Jane Lusby

This was my favorite talk overall, with great technical info, excellent slides, and a very polished presenter. Jane talked about the difference between errors, error context, and error reporters, which was really interesting. Rust makes this a bit more explicit than many other languages, especially if you use some of the really nice crates she covered. I highly recommend watching this even if you have no interest in Rust! I’ve been using anyhow and thiserror myself, but she introduced eyre, a fork of anyhow that looks really nice. Plus it’s a funny joke on multiple levels.

How to Start a Solo Project that You’ll Stick With - Harry Bachrach

This was more of a psychology talk than a technical one, but it was pretty interesting. That said, my way of dealing with my side projects is quite different from theirs. If you struggle with your own side projects, or are just generally interested in psychology, this is worth checking out.

Under a Microscope: Exploring Fast and Safe Rust for Biology - Samuel Lim

I have less to say about this talk than the others. I just can’t get very excited about scientific programming topics for some reason, but that’s no dig against the talk or the speaker. The most interesting part for me was when he talked about using Rust as part of the toolchain for some very big data analysis.

Bending the Curve: A Personal Tutor at Your Fingertips - Esteban Kuber

This was a really great talk about how the Rust compiler is designed to teach you how to code in Rust. The compiler’s error messages are one of my favorite things about Rust coding. The only other language I’ve seen that does this as well is Raku, and they got there first. I did plug Raku a bit in chat. I suspect the two communities could get some good error message ideas from each other.

My First Rust Project: Creating a Roguelike with Amethyst - Micah Tigley

Now we’re talking! I’ve actually considered trying to create a roguelike inspired by Cataclysm: Dark Days Ahead in Rust. I enjoyed learning about Micah’s use of Amethyst. She talked about various aspects of it, including its animation system and how that plugs into its overall event loop. If I do decide to work on a roguelike I’ll be checking out Amethyst.

Controlling Telescope Hardware with Rust - Ashley Hauck

This was more about the hardware and image processing than the science. There were some interesting points about working with low-level interfaces (serial ports), image processing (math I didn’t understand), and using threads to separate controlling hardware from the UI to ensure responsiveness. If you’re a hardware hacker I recommend this talk, but there’s enough general content for anyone interested in Rust.

Macros for a More Productive Rust - jam1garner

This was a deep technical dive into how macros in Rust work. The short summary is that they’re extremely powerful. I couldn’t help but think of Raku and its goal of making slangs possible. I learned a lot about the Rust macro system, and the talk ended with some programming madness that I think even Damian Conway might envy. I think I learned the most new technical stuff in this talk, with Jane’s as a close second.

Closing Keynote - Siân Griffin

I don’t want to say too much about this talk, because that would ruin the fun of it. Suffice it to say that it was both incredibly entertaining and informative, while making a strong case for the benefits of a language that prioritizes safety and correctness along with performance. If you only watch one talk from the conference it should be this one.

Thank you to the RustConf organizers and presenters for your hard work! This was clearly a labor of love, much like the Perl Conference. It’s amazing how much work people will put in for a language and community they believe in.
