Cute Mage's Tower

AI: Jumping in the Wyrmhole

Astute readers will note that I have analyzed the other three AI rounds in descending order of how much I liked them. Conjuri’s Quest is my favorite, ABCDE was quite fun, and Ascent was good despite the metapuzzle and the overall hunt holding it back. Following this pattern, you probably have some guesses about how I feel about the Wyrm. It’s still worth looking into anyway.

This is part of a multi-part series analyzing the various AI rounds from the 2023 MIT Mystery Hunt and seeing what they can teach us about writing puzzle hunts in general.
Note: After this article was posted, Alex Irpan, a member of teammate who was involved with the Wyrmhole, posted some notes about this article. They are good notes and people should read them. Most of them will make sense after you have read this article, but one of his points exposes a flaw in my analysis. I will address that when we get to it.

Let me clarify before jumping into this - I’m going to have a bunch of criticisms of the Wyrm round in this article, but that doesn’t mean that I think that Wyrm was broken or shouldn’t have been in the Hunt. The structure did its job, and folks were able to solve it. However, if I were in charge of editing this Hunt, this round would never have made it past me. It is mechanically fine, but in my opinion it is missing a lot of what it would need to be good.

But that’s okay. Different teams have different editing styles. Different people have different things that they enjoy from a round of puzzles. My opinion will not invalidate the Wyrm, will not invalidate the work that people did to make it happen, and will not invalidate the story overall. But I am going to explain why this round did not land for me. I mention this for any teammate folks reading - I’m going to have some stronger criticisms of this round than I did for the other AI rounds, so if that’s not something you’re in the appropriate headspace for, please don’t read this.

Lastly, I am of course biased in how I feel. This round was the last round for my team, The MIT Mystery Hunt ✅, and of course the frustration about being stuck on this round is going to affect how I think about it. That being said, one of the reasons I waited this long to post this is that I wanted to spend time thinking about it after my emotions calmed down. I want to be as fair as I can to the authors of this round, and that involves actually analyzing it, not just repeating my unfiltered feelings.

Positives

Metameta

Let’s start with my favorite part of this round - the metameta. This is the cleanest metameta of any of the AI rounds. The real/imaginary pairs are a cool idea to base the meta around, and getting the period of the Mandelbrot set is a cool mechanic. While there are technically three different ahas in the solving path, in reality only the real/imaginary aha is a hard aha - the other two are much more straightforward given the cluing of the puzzle.

Is this metapuzzle a little ridiculous? Absolutely. However, it is the metameta for a world, as opposed to the single meta for an individual round, so it should do something ridiculous. Using the Mandelbrot set is definitely in the right scale for the MIT Mystery Hunt, and it feels like a good way to wrap up the round from an artistic standpoint.

An important note is that the metameta has some built-in flexibility. While we’re using the real/imaginary answers to generate complex numbers, we’re then transforming those numbers with arbitrary transformations to get what we need for extraction. We have complete freedom as to what the transformations are, which means that as long as we can get a pair of real/imaginary things that works, we can use them in the meta. This is great for metametas of worlds, as this allows us freedom to construct and find interesting metapuzzles for the round1.
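For readers who have not met the idea of a "period" in the Mandelbrot set: for many points c inside the set, repeatedly applying z -> z*z + c from z = 0 settles into a cycle, and the length of that cycle is the period. Below is a minimal numerical sketch of that idea. It is not the metameta's own procedure, and the function name, warmup count, tolerance, and sample points are all my assumptions for illustration.

```python
def attracting_period(c, warmup=2000, max_period=64, tol=1e-9, bailout=2.0):
    """Estimate the cycle length that the orbit of 0 settles into under
    z -> z*z + c. Returns None if the orbit escapes or no cycle is found.
    All parameter defaults are illustrative assumptions."""
    z = 0j
    # Let the orbit settle toward its attracting cycle (if it has one).
    for _ in range(warmup):
        z = z * z + c
        if abs(z) > bailout:
            return None
    start = z
    # Count how many more steps it takes to come back to where we were.
    for period in range(1, max_period + 1):
        z = z * z + c
        if abs(z) > bailout:
            return None
        if abs(z - start) < tol:
            return period
    return None


# Sample points (assumed for illustration): the main cardioid, the big
# bulb to its left, and a point near the middle of the top bulb.
for c in (0, -1, -0.12 + 0.74j, 1):
    print(c, attracting_period(c))
```

Points very close to the boundary of the set converge slowly, so a sketch like this is only trustworthy for points comfortably inside a bulb.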

The Loop Aha

teammate is very good at environmental puzzles. This shows through in the loading puzzle, which did an absolutely wonderful job of communicating that it was a puzzle despite not being allowed to say that it was. The aha on closing the loop is a round-wide environmental puzzle, and it is well clued. Let’s list the clues:

I wish I could’ve been there when we made that aha, because it’s just so beautiful.

Negatives

Stepping Through the Loop

Round 1

When solvers first interact with this round, they get access to six puzzles that they hopefully have already solved, along with a metapuzzle that directs them to a set of physical objects that has been dropped off outside their door.

The metapuzzle itself is not too hard. The only reason it really took us time was that we were 1) trying to figure out how to represent it for people who weren’t in person and 2) convincing ourselves that this actually worked. A few things crossed my mind as we were solving this:

  1. It’s really weird that the answer list contains TRIFORCE when the meta is building a mega-triforce. That feels a little too on the nose.
  2. These answers could have been anything. The only restriction on them is how many letters they have in total and that they need to not be too similar so that the logic puzzle is unique.
  3. It seems like the Wyrm’s bit is reusing puzzles from elsewhere in the Hunt, which is interesting.

Those three items ring some alarm bells in my head, but not super loudly.

Round 2

When solvers unlock the next round, it’s revealed that The Legend is a feeder puzzle to a new round. This time the puzzles are new for this round - not just repeats from earlier in the hunt, which gets rid of that as a possible running theme for the round. These puzzles were another step harder than the previous puzzles we’d been given, but that made sense for where we were in the hunt.

Once we had solved3 enough puzzles, the second metapuzzle opened - The Scheme. Click on that link to see the puzzle, but keep in mind that when we opened it, the errata hadn’t happened yet, so the first line of the flavor text wasn’t there. Let’s take inventory of all the information in the puzzle:

The numbers at the bottom look like indices. Presumably we’re going to combine these answers in some way to get something that is at least 45 letters long, and then use these numbers as indices for that thing. That something is probably going to involve going clockwise around the outside of a triangle in some way. The nice thing is that the lengths of all the answers add up to 45.

You know what else there’s 45 of? Letters around the perimeter of the triangle from The Legend.

This makes sense. Putting things one-to-one is a very natural thing to do in hunt puzzles, and once we figured out how the correspondence worked, that would give us the ordering we needed to use the indices. This would also explain the weirdness with the triforce and how unconstrained it was.

Then the errata came out.

An interesting story is told by the stats page for this puzzle. The errata came out at 5:22 PM. In the next 7 minutes, 7 teams solved the meta. What was that errata?

Well, it was actually a hint. The hint added one sentence to the flavor text: “Stack the words from 1 to 9.” That was enough to get teams to notice that the answer words, if multi-word answers are split into their component words, contain every length from 1-9. In order to get access to every letter, the arrow needs to spiral in the triangle, which means that teams are pushed away from interpreting the arrow as only residing on the perimeter of the triangle.
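The word-length observation is easy to check mechanically. Here is a small sketch, assuming the feeders for The Scheme are the six answers tagged "has a unique word length" in the restriction list later in this article; the variable and function names are mine.

```python
# Assumption: these six answers are The Scheme's feeders, taken from the
# "has a unique word length" entries in the restriction list further down.
SCHEME_FEEDERS = [
    "AVENUE Q", "BRITAIN", "INCEPTION", "NINTENDO", "PICO", "SEA OF DECAY",
]


def split_into_words(answers):
    """Break multi-word answers into their component words."""
    return [word for answer in answers for word in answer.split()]


words = split_into_words(SCHEME_FEEDERS)
lengths = sorted(len(word) for word in words)
stacked = sorted(words, key=len)  # shortest word on top, longest on the bottom

print(lengths)       # every length from 1 to 9 appears exactly once
print(sum(lengths))  # 45, the number of cells in a 9-row triangle
for word in stacked:
    print(word)
```

Note that the check only works after splitting on spaces, which is exactly the step the original flavor text did not clue.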

That errata was absolutely necessary. The idea that you have to split the answers up into individual words has to be clued if you’re going to use it and the split only ends up happening for some answers. Let’s be clear - I’m not saying that the mechanic can never be used. Puzzled Pint uses it a bunch to get around the fact that they only have 4 answers. However, whenever they use it they state that you need to use the words separately. It is not a normal thing to do without being told.4

I also wonder what would have happened if they had changed the triangle picture so that the arrow kept going until it had to spiral inward for the first time. That arrow on its own doesn’t clue spiral - the only reason we inferred that it spiraled came from taking all of the other information together and reasoning that it had to.

Round 3

The third round unlocks, once again with a new round of puzzles, including the meta from the previous round as a feeder puzzle. This cements the telescoping structure of the round as the “bit” for this AI.

The puzzles are nothing unusual for this hunt5, and the meta doesn’t look like anything too unusual either. But let’s take a look at that meta a bit closer. Here are the steps needed to solve that metapuzzle.

  1. Figure out that each of the answers is/clues a US Navy Ship.
  2. Notice that the year corresponds to the ship in some way, which lets you assign the ships to the appropriate row.
  3. Look at the hull number for each ship, and notice that there is always the same number of rectangles as there are letters in the hull number for the ship.
  4. Recognize that the squares can be filled in with the nautical flags for the letters, and the dots therefore extract a series of colors.
  5. Recognize that the series of colors can be turned into a series of directions using the compass rose below.
  6. Use the directions to assign the ships to ships on the map, making a chain of ships.
  7. Index into the answer of the next ship by the digits from the hull number of the previous ship. Read the letters in the chain.
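The last step can be sketched in code. This is a toy version under stated assumptions: each digit of a hull number is treated as its own 1-based index, spaces in answers are ignored, the chain does not wrap around, and the answers and digits below are made up rather than taken from the puzzle.

```python
def index_letter(answer, position):
    """Return the letter at a 1-based position, ignoring spaces."""
    letters = answer.replace(" ", "")
    if not 1 <= position <= len(letters):
        raise ValueError(f"index {position} does not fit {answer!r}")
    return letters[position - 1]


def extract_chain(chain):
    """chain is a list of (answer, hull_digits) pairs in chain order.
    Each ship's digits index into the NEXT ship's answer (assumed reading)."""
    extracted = []
    for (_, prev_digits), (next_answer, _) in zip(chain, chain[1:]):
        for digit in prev_digits:
            extracted.append(index_letter(next_answer, int(digit)))
    return "".join(extracted)


# Hypothetical data, not real ships or real answers.
toy_chain = [("ALPHA", "2"), ("BRAVO", "31"), ("CHARLIE", "5")]
print(extract_chain(toy_chain))  # RAC
```

Even in this toy there are several other plausible ways to pair digits with answers, which is the problem discussed under step 7 below.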

Some of these steps are perfectly fine, but there are three main issues with this meta.

Step 1 is way too loose

Let’s take a look at how the answers connect to a ship:

The answers are split evenly between the three different methods. This is unusual, and for a good reason. When solvers are trying to break into a puzzle, especially a meta, they are looking for a clear connection to show them that they’re on the right track. Experienced puzzle solvers will know that you can match any set of data to another set of data if you just get creative with how you allow the connections - therefore doing this exact thing is an indication that you are not on the right track. The years are not as big of a help as one might imagine, as you don’t know which answer goes with which year, so you’re trying to match something from the pool of answers with the pool of ships. There are later steps that can confirm what you’re doing is correct, but there really should be something earlier to do exactly that.

Basically, when solvers are trying to break into a metapuzzle, they are looking for clear signs that they are headed in the right direction. This meta uses a muddled connection that obscures the trail far more than normal.

Steps 4-6 are unnecessary

Let’s work through the logic of why we’re doing these steps. In step 4, we use the flags to get us a series of colors. In step 5, we use the colors to get us directions. In step 6, we use the directions to figure out which ship in the map goes with the answer. This gives us a chain of ships… that is exactly the same as the ordering in the puzzle in the first place. You even have to figure out that ordering in order to get the directions you need! In fact, the puzzle could’ve looked like this and nothing would need to change:

  1. Figure out that each of the answers is/clues a US Navy Ship.
  2. Notice that the year corresponds to the ship in some way, which lets you assign the ships to the appropriate row.
  3. Look at the hull number for each ship, and notice that there is always the same number of rectangles as there are letters in the hull number for the ship.
  4. Index into the answer of the next ship by the digits from the hull number of the previous ship. Read the letters in the chain.

Done.

The only real use this has in the puzzle is as a confirmer. It confirms you have the right ships, and it confirms that you have the right order. This is good. Shell metas need confirmers that you are heading in the right direction. However, this is a lot of ahas and steps to do that confirmation, and it’s too late in the process.

Okay, so this is the part that is actually incorrect. The given ordering of the ships is different from the map ordering of the ships, and both are necessary for different reasons in the final extraction. This was... a decision. I still think it's not great design, but I will admit that my analysis is wrong. I'm leaving this section up for posterity's sake.

Step 7 is index hell

I am adding “index hell” onto my list of phrases I need to explain fully in a blog post at some point6, but basically it means that you get to the last step and you have a whole table of data and any pair of things could be a reasonable method of indexing and you don’t know which it is until you try all of them. Sometimes, index hell is the fault of the team who is solving if they miss something that the puzzle is clearly trying to push them towards. However, some puzzles are just more likely to end in index hell.

And look, this isn’t necessarily a bad thing. It’s not something you want all your puzzles to go to, but in the end it can be fun looking in the weeds for the right combination. This is also a great place where skill and intuition can be tested and therefore is not bad in later rounds. However, this combined with the ease of getting lost in the weeds at the beginning leads to a puzzle where the solvers are just lost7.

That being said, I will give the authors credit on this one. Someone clearly noticed this and added the flavor text at the top that spells out how to index. Were there more interesting, subtle ways of doing this? Probably. Does this work? Absolutely and I don’t fault them at all.

Fixing Lost at Sea

I don’t want to imply that Lost at Sea would need to be completely scrapped. I think that it’s very fixable with a few small changes. Here they are:

Lost at Sea is a textbook8 example of how you can have a good structure for a puzzle but the small details make it much harder or easier than intended. The details matter a ton for puzzles, especially metapuzzles because teams will be attempting to solve those without all the answers. It’s really annoying because some of it is out of your control. You can spend a ton of time perfecting a puzzle, but at some point the time you spend isn’t worth it, and there will still be people who miss those details anyway.

Round 4

This says it’s a round, but it’s really part of the loop aha. It’s already been talked about in the Positives section.

Interconnectedness

Mapping Interconnectedness

One of the things that really struck me at the end of this Hunt was how interconnected it was. Answers get used in one meta, and then reused in other ones. To try to get an understanding of what was going on, I made a map of the whole Hunt. (link to the pdf) There are a lot more arrows in that graph than there would be in previous years’ graphs.

Interconnectedness is a really important tool for puzzle hunt structure designers. It allows for more complicated structures while keeping the total number of puzzles down, which helps make the whole hunt more tractable. The problem with answer reuse is that it puts more and more restrictions on the answers, reducing the set of possible answers and perhaps forcing constructors to settle for nonthematic or just ugly answers. Now, this isn’t a problem unique to answer reuse - every hunt has to ask itself how thematic to the puzzle the answers are going to be and how much they’re willing to rework the answers to make them as nice as possible. However, answer reuse puts strain in this specific area, so it’s worth pointing out.

One trick to solve this is to make one metapuzzle that puts little to no requirements on its answers. This allows the constructor to have some flexibility with the answers they pick, as they can focus on fixing the other answers with more restrictions first. However, the less “close” a metapuzzle is, the more it needs to excel in other areas to feel like a good metapuzzle.

Wyrm vs. Indiana Jones

A good example to look at is the Indiana Jones round from the 2013 MIT Mystery Hunt. That round also had an involved metameta where answers needed to be paired up, and metapuzzles that did not put many restrictions on their answers. Let’s summarize how that round works:

The round is split up into three adventures, aka subrounds. Each subround consists of 8 feeder puzzles. These eight puzzles feed into a rather shelly metapuzzle that is themed around snakes. Every time a team solves the round meta, they unlock a “tablet” which is a piece of the metameta. Answers in the round can be paired up so that they form historical events, such as ALI/FOREMAN or MASH/FINALE. Each of these events has a specific day in history that they occurred on, which gives us a list of 12 different dates. The symbols on the tablets are a system for writing dates, and the 12 sets of symbols that make the outside circle are the 12 dates indicated from the feeder answers. Once solvers figure out how the date system works, they can translate the dates in the middle of the tablet, but they are missing one number each. Each of the dates is an occurrence of MIThenge, and they each have a unique number that can be put in the blank to make the date correct. These years are all between 2001 and 2026, meaning that the last two digits of the years, read as letters (A=1 through Z=26) in order, spell PARTNERSHIP WITH BOA.
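The final extraction in that summary is a plain number-to-letter conversion, sketched below. The function names are mine, and the sample years are chosen only so that they spell the last word of the answer; they are not the actual years paired with the tablet dates.

```python
def letter_from_year(year):
    """Map the last two digits of a year to a letter, with 01 -> A and 26 -> Z."""
    n = year % 100
    if not 1 <= n <= 26:
        raise ValueError(f"{year} does not map to a letter")
    return chr(ord("A") + n - 1)


def decode_years(years):
    """Read a sequence of years as a string of letters."""
    return "".join(letter_from_year(year) for year in years)


# Illustrative input only, not the puzzle's real date list.
print(decode_years([2002, 2015, 2001]))  # BOA
```

Restricting the years to 2001 through 2026 is what makes this clean: every possible year maps to exactly one letter.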

This comparison makes both the strengths and the weaknesses of Wyrm stand out clearly. Wyrm’s metameta is cleaner than Indiana Jones’, and fits better with what the whole round is doing. However, Wyrm’s submetas are less fun to solve and just less interesting overall. In addition, Indiana Jones has a clear theme of snakes in all of the submetas, which makes sense given Indy’s famous fear of snakes, and the final pun has something to do with snakes. Wyrm’s submeta theme is… triangles?

The Legend’s author’s note explains where the triangle theming came from.

The triangle theming within all the Wyrmhole metas originates from this puzzle. We wanted a fractal-y starting meta as a teaser for the rest of the round, and after a proof of concept for this puzzle tested successfully, we backported the triangle theming to all other meta drafts and chose TRIFORCE as the answer to close the loop in Wyrmhole. This did arguably break The Error That Can’t Be Named, but we wanted to make the triangle motif as strong as possible for that answer.

First of all, while I can see the argument that TRIFORCE didn’t break TETCBN, I wouldn’t want to be the person to defend that side. More importantly, this is an unusual justification for triangles which doesn’t make sense unless you have read that Author’s Note. If someone saw only the beginning of the round and not the end, then the triangles would seem really random.9

I’ve gone on a bit of a rant about triangles10, but now back to interconnectedness. Let’s list out all the answers in both rounds next to each other.

Indiana Jones Answers
ALI
CARL SAGAN
DISCOVERY
FAT
FINALE
FOREMAN
GUNPOWDER
LAST
LOOKOUT
MAN
MASH
MOLLY
MOONWALK
MOUNTAIN
OATH
PITCHER
PLOT
PLUTO
PULITZER
RIOT
STRAVINSKY
SVALBARD
TENNIS COURT
TOTAL ECLIPSE

Wyrm Answers
APPLE
ARABIAN NIGHTS
AVENUE Q
BEIJING TIGERS
BRITAIN
CARBON SINK
CERBERUS
DISCIPLE
DRAGON
EYE OF PROVIDENCE
FAVORITE PIN
FELLOWSHIP
GEOMETRIC SNOW
GOLD TOUCH
HAMILTON
HOGWARTS
INCEPTION
MONOPOLY
NINTENDO
OLD HICKORY
PICO
SEA OF DECAY
TANGRAM
TRANSAMERICA PYRAMID
TRIFORCE
UNDERWOOD

Okay, but this isn’t the full story of what’s happening. Let’s take a look at the restrictions on each answer.

Indiana Jones Answers
One half of the name of a famous event with a specific date and…
ALI - works in the word ladder
CARL SAGAN - long enough to make an interesting snake
DISCOVERY - has a letter in ATLANTIS
FAT - works in the word ladder
FINALE - long enough to make an interesting snake
FOREMAN - has a letter in ATLANTIS
GUNPOWDER - long enough to make an interesting snake
LAST - works in the word ladder
LOOKOUT - has a letter in ATLANTIS
MAN - works in the word ladder
MASH - works in the word ladder
MOLLY - works in the word ladder
MOONWALK - has a letter in ATLANTIS
MOUNTAIN - has a letter in ATLANTIS
OATH - works in the word ladder
PITCHER - long enough to make an interesting snake
PLOT - works in the word ladder
PLUTO - has a letter in ATLANTIS
PULITZER - has a letter in ATLANTIS
RIOT - long enough to make an interesting snake
STRAVINSKY - long enough to make an interesting snake
SVALBARD - long enough to make an interesting snake
TENNIS COURT - has a letter in ATLANTIS
TOTAL ECLIPSE - long enough to make an interesting snake

Wyrm Answers
One half of a Real/Imaginary pair that can clue a number and…
APPLE - Can be in a word web
ARABIAN NIGHTS - Can be in a word web
AVENUE Q - Has a unique word length, contains some letters of EYE OF PROVIDENCE
BEIJING TIGERS - Put in the Triforce, one word is a unique word in one company’s menu, extracts a needed bigram in mate’s meta
BRITAIN - Has a unique word length, contains some letters of EYE OF PROVIDENCE
CARBON SINK - Put in the Triforce, one word is a unique word in one company’s menu, extracts a needed bigram in mate’s meta
CERBERUS - Put in the Triforce, is the name of a conspiracy, extracts a needed bigram in mate’s meta
DISCIPLE - Can be in a word web
DRAGON - Can be in a word web
EYE OF PROVIDENCE - Can clue a Navy Ship, can be produced by a metapuzzle
FAVORITE PIN - Can clue a Navy Ship
FELLOWSHIP - Can be in a word web, can be produced by a metapuzzle
GEOMETRIC SNOW - Put in the Triforce, works in Nuclear Words, extracts a needed bigram in mate’s meta
GOLD TOUCH - Can clue a Navy Ship
HAMILTON - Can clue a Navy Ship
HOGWARTS - Can be in a word web
INCEPTION - Has a unique word length, contains some letters of EYE OF PROVIDENCE, can be produced by a metapuzzle
MONOPOLY - Can be in a word web
NINTENDO - Has a unique word length, contains some letters of EYE OF PROVIDENCE
OLD HICKORY - Can clue a Navy Ship
PICO - Has a unique word length, contains some letters of EYE OF PROVIDENCE
SEA OF DECAY - Has a unique word length, contains some letters of EYE OF PROVIDENCE
TANGRAM - Put in the Triforce, works in Nuclear Words, extracts a needed bigram in mate’s meta
TRANSAMERICA PYRAMID - Can clue a Navy Ship
TRIFORCE - Put in the Triforce, clues a set of colors, extracts a needed bigram in mate’s meta
UNDERWOOD - Can clue a Navy Ship

These restrictions have been colored above based on how strict they are. Black restrictions are ones that put no restrictions on the answer, green restrictions are ones that put minimal if any restrictions on the answer, and orange restrictions are ones that put nontrivial but still not hard restrictions on the answer. (There are higher levels than orange, but I don’t need them for this chart9.)

A quick glance at the chart shows the difference between Indiana Jones answers and Wyrm answers. Indiana Jones makes its round work with two restrictions on every answer, whereas Wyrm has multiple answers with four restrictions on them.11 No wonder The Legend doesn’t put any restrictions on those answers - they’re already going through enough!

At this point we’ve established that the Wyrm answers have more restrictions on them than usual, but how does that affect the actual constructions? To determine that, we need to do a different comparison.

Wyrm vs. The Ministry

These rounds are very different - I’m not here to do a point by point comparison, because that’s worthless given what the rounds were intended to do. Wyrm was meant to be one of four ending rounds, and The Ministry was meant to be a mid-size team’s goal. First, let’s sum up the Ministry for those who aren’t familiar.

Let’s do a breakdown of the restrictions for the Ministry answers. 13

Baker Answers - Can be placed in the grid to make the indicated letters a full alphabet
Dewey Answers - Contains an anagram of a world currency and some of the letters of COLORFUL HEAD
Hayden Answers - Contains 2 sets of doubled letters and some of the letters of GOES ON LONGER THAN WAR AND PEACE
Lewis Answers - Is the name of an album and contains 4 of the letters in MULTIPART COMPOSITION
Rotch Answers - Is the name of a street that crosses a street with the same name as a street in Cambridge

(Click to reveal - the table looked really ugly, so we’re using the <details> tag instead.)

In the last example we showed how some of the Wyrm answers had four restrictions compared to Indiana Jones’ two, making the Wyrm much more restricted. Surely the fact that every answer in The Ministry had six restrictions means that it must be even worse than that, right?

Well, no. To discover why, we need to take a look at how you would construct these rounds and what actual problems these restrictions cause for construction. Fortunately, I can speak for the construction of one of them. Let’s dive into how the Ministry was constructed.

Early in meta construction, we knew that the Ministry was going to be a 25 feeders -> 5 submetas -> metameta structure, but we hadn’t figured out exactly how it would work. While most people were focused on Pen Station14 metas, Kevin Wald came up with the initial idea for the Ministry metameta. We refined it in Team Aardvark15, and then proposed it in much the same form that it was published in. The only two changes were that GOES ON LONGER THAN WAR AND PEACE was originally GOES ON LONGER THAN AYN RAND and the feeder answers were dummy answers that demonstrated the mechanic but weren’t intended to be final answers.

This metameta tested well, so while people were still working on Pen Station metas, I typed up a set of guidelines for how Ministry submetas would work. This included some advice for what was expected of a meta in this round (only 5 answers so probably involves a shell, solvers need to be able to pick the answers out of a group of 25 answers), but also indicated that the meta must be able to take a variety of answers, and mentioned that the answers that were submitted for that meta would not necessarily be the ones that it used when we put them all together.16 Members of Palindrome submitted metas, and the meta editors picked a first draft of metapuzzles we liked - the pangram Baker, the currency Dewey, the doubled up Hayden17, the [redacted] Lewis, and the street Rotch. Huh, I wonder why the Lewis meta is [redacted].9

We started trying to piece all the metas together. This involved getting a big spreadsheet with all of the information in one spot, and brainstorming answers for the hard to determine letters. A couple of things became clear when we tried assigning metameta letter constraints to the submetas. The first was that it was really hard to get the letters R and V for the metameta. These answers needed to contain a bug, but they had to be seven letters or less - and we needed 3 Rs. While it was not hard to create those answers in my mockup, it was hard to create answers that could then be used in actual submetas. To release the pressure on these words, we changed AYN RAND to WAR AND PEACE, raising the limit from 7 to 11. I was worried that it would then be too hard to find answers that had to be 12 letters or longer, but Eric Berlin said it was necessary. Turns out, Eric was right.

At that point, we started doing a lot of different word searches on our various programs to see what answers would work with this one change, and it turned out that we could come up with a plethora of answers for each letter and that all of the metas could take the answers they needed.18 Well, all of them except for the Lewis meta. Both the Lewis meta and the Rotch meta had stronger constraints on answers, they weren’t playing well together, and the Lewis meta was having trouble finding answers that worked with it in the first place. While theoretically any answer could fit somewhere in the metameta framework, there were some letters (like D) that weren’t used in the metameta answer, therefore there were some answers that just wouldn’t work. In the end, we got rid of the Lewis meta, and after apologizing to the meta author we then got to work coming up with a new Lewis meta. Mark Halpin noticed the potential for something built around album names, and then after some discussion, I ended up creating the current iteration of Lewis. This also meant that Lewis was the last meta to have its answers finalized, as I was trying to find the right combination of album names to extract in that submeta, give the right letters in the metameta, and also sound like plausible answers that didn’t involve referencing that album.

Once we had a workable set of answers, it was time to test. Testing this was the anchor to a large testsolving session, and we gave each team a giant spreadsheet. There was one tab for each of the metas with that meta’s shell, and then there was one tab for the answers where we started off with 18 answers for them, and then slowly added more as time went on. This gave us great feedback not just about the metas, but about how solvers approached assigning answers to metapuzzles.19 It also told us that the metapuzzles were much more backsolvable than we thought they were, but honestly, we were fine with that. We made some minor adjustments based on the testsolving, then released the feeder answers for puzzles.

I can’t imagine trying to approach writing the Ministry in a different way. This top-down approach was important because we knew that restrictions were going to interact in weird ways, and because everything was going to be interacting with each other, we wanted to make sure what we were doing could work. I can’t imagine setting Wyrm like this.

There is a huge issue that comes with the top-down approach for the Wyrm: The sheer scale of what you need to set at the same time. Because the Wyrm is grabbing six puzzles from the Museum rounds, this means that you would need to set the Wyrm at the same time as the Museum rounds. In addition, because the submeta answers are feeders to the next submeta, you also have to set the Wyrm submeta answers at the same time. However, because Wyrm feeders are paired for the metameta, every time you set a Wyrm feeder, you have to set another one at the same time. Is this possible? Yes. Is it a lot? Yes. Is it too much? I would say yes. I certainly wouldn’t want to do that.

However, you run into a problem if you don’t do the top-down approach - at some point you’re going to have a group of answers that you need to write a metapuzzle for, instead of the other way around. This is… not great. There is a game among some folk in the puzzling community called “Spaghetti”, where one person comes up with five random words and tells people to find the answer to the puzzle even though they are literally just five random words.20 If you scroll through past Spaghetti games, there are some incredibly clever finds there that make you wonder if they were really random, but you are allowed to add a sixth answer of your choice and you can make your answer to the meta anything. Avoiding the top-down approach here means that you are writing a metapuzzle given a (partial?) set of feeders and a set answer, which means you have to Spaghetti without any of the flexibility that makes Spaghetti doable. You’re not going to close metapuzzles this way.

However, even with the top-down approach to the structure, you are still going to run into some issues with writing a meta for the puzzles instead of puzzles for the meta. While this isn’t great, not every metapuzzle needs to be a home run, especially when it’s supporting something else. However, because this steals puzzles from the Museum, that means that there’s also the chicken & egg problem between the Museum and Wyrm, which is going to hurt one of them. All of this adds up to a lot of metas that just aren’t close because of how they’re forced to be constructed.

Collage

I have already made my disdain for word webs quite clear. Do I think Collage did everything it needed to? Yes. Do I think it did it well? No. It suffers from a serious case of Fridge Logic21.

First, let’s deal with the word web. I think one of the reasons why the word web genre is well regarded is because of fondness or nostalgia for Funny Farm. Look, that may have been a fun game, but it’s not a great mechanic for a puzzle hunt-style puzzle. “Guess words vaguely relating to a theme” is not a great mechanic, as you’re not really deducing anything. It’s not bad as the start to a puzzle, but ideally there should be something deeper going on. I’m still not a fan of Major Monster Mash (from Puzzle University), but at least it has the mechanic that all of the meta answers and feeder answers from the main section of the hunt are in the web, so you have a goal, even if it’s still a little vague how you get there. The Cracked Crystal subpuzzle of Endless Practice restricts the web to be compound words or common two-word phrases, which makes filling in the grid less guesswork.

I dunno - this may be my “old woman yells at cloud” moment, but I feel like without something additional, solving a word web in a puzzle hunt is like solving a themeless crossword in a puzzle hunt. Without something extra, it’s just out of place.

However, the fridge logic hits even worse. This puzzle is supposed to be solved earlier in the hunt as a feeder puzzle without having any additional information, but also as a metapuzzle that you can backsolve to get the rest of the feeder answers, while pretending to be a metapuzzle that could be forward solved if the Wyrm Round 4 puzzles actually existed. However, if Collage can be solved as a feeder puzzle, then it’s a really crappy metapuzzle. Also, are you really backsolving the Round 4 answers? You’re not doing any sort of meaningful backsolving. How did you get TRIFORCE? You found a differently-colored word and put it in the answer checker. How did you get MONOPOLY? You found a differently-colored word and put it in the answer checker. Collage is just a feeder puzzle that you get 7 answers out of. There’s no meaningful backsolving here.

break;

The MIT Mystery Hunt is a really weird puzzle hunt compared to other hunts. It is a puzzle hunt where most people do not expect to finish. Tons of puzzle hunters throw themselves at the Hunt every MLK weekend not because they expect to finish, but because they expect to solve a bunch of challenging puzzles that they can’t get anywhere else with a bunch of their friends. I will admit that this is very wild to me. I want to be on a team that finishes Mystery Hunt every year - the ends of Mystery Hunts are cool and I want to see them22. Also, there are all sorts of discussions about the difficulty of Hunt and how long it should take to find the coin and how long HQ should be open for and how many teams should be finishing. But the fact is, the majority of teams probably won’t be completing the Hunt, and writers have to account for that.

This means that you have to not just account for the experience of teams who solve the whole Hunt. You have to also account for the experience of people who get partway and then stop. What groups you target Hunt for and how to give them the best experiences possible is a whole different blog post. However, what I do want to talk about is the teams who only solve part of Wyrm. What is their experience?

Let’s imagine someone who starts solving Wyrm, gets to Round 3, and then the Mystery Hunt ends without them having solved Lost at Sea. What has their experience been?

First of all, the round naturally bottlenecks itself. One round doesn’t unlock until you solve the previous metapuzzle. This means that if you are stuck on a metapuzzle, that’s the only puzzle in the round for a while. This can cause frustrations, especially depending on the team size and the current puzzle radius.

Second of all, this round stands out from the other AI rounds as the round that doesn’t stand out. The other rounds all forced you to interact with their gimmick in some way. The two gimmicks that the Wyrm has are the telescoping nature and the fact that their first round puzzles are stolen from somewhere else in the Hunt. The stolen puzzles matter for like two minutes and then fade into irrelevance23. The telescoping nature of the rounds doesn’t actually affect solving in any way. Because of the bottlenecking, you can only solve in the direction you are told to, so new rounds that open are just regular rounds where you happen to have one of the answers already.

In short, this is a worse ⊥IW.nano.

On first glance, the two rounds are fairly similar. Both use a telescoping structure where the answer to a meta is a feeder to another round, and both have bottlenecks because of the sequential nature of the rounds. The difference is that ⊥IW.nano worked backwards. You weren’t just solving puzzles as normal, you had to figure out how the meta worked, then use backsolving logic to get the answer you needed to progress. Even if you didn’t get through the whole round, the very presentation of the first part forces the team to grapple with the fact that the metapuzzle is already solved. Whether this was a good idea or whether it was implemented correctly is a different question24, but the backwards solving is the heart of the round and why the structure worked. For everything that this person sees of the Wyrm, they only see forwards solving, which means that they see ⊥IW.nano but without the heart of that round.

Granted, you don’t want to just repeat the gimmick from a previous hunt. That’s totally fair. The point here isn’t that the Wyrm should have followed the gimmick of ⊥IW.nano, but that it has all the same issues without the interesting bits, and Wyrm’s interesting bits all come after when this person would have stopped. If this had happened in ⊥IW.nano, that person still sees the thing that makes that round cool. If this had happened in any other AI round, that person still sees the thing that makes that round cool. In Wyrm, it just sucks.

Of course, one might take a look at the stats and see that there aren’t a lot of teams where this situation applies. The Scheme was solved by 16 teams and Lost at Sea was solved by 13 teams, so this should only apply to 3 teams, right? No - many people on the 13 teams that solved Lost at Sea never got to see the last round. They might have chosen or been forced to stop hunting by then, or they were just solving something else while the rest of their team finished the Wyrmhole. They didn’t get to experience the looping structure for themselves. Sure, not everyone on a team gets to participate in solving every metapuzzle, but the entire team feels the restrictions. Give them something interesting for the barriers that are being thrown up there.

Fixing the Wyrmhole

Obviously, I have a lot of opinions about the Wyrmhole, especially places where it falls short. Of course, it’s easy to tear something down; it’s harder to fix things. How would I fix this?

Let’s keep a couple things in mind as I do this:

  1. I am making these suggestions after 11 months of thinking about it. teammate didn’t have that luxury.
  2. I am making these suggestions after perhaps the biggest testsolving session of all - the Hunt itself. teammate didn’t have that luxury.
  3. I don’t have to convince anyone else that these suggestions are necessary. teammate had to work as a team, and ideas had to go through editors, hunt admin, tech team, etc.
  4. This is first draft quality, and I don’t have to go any further if I don’t want to. teammate had to produce something that was ready for solvers.25

I still think this is useful as an exercise - otherwise I wouldn’t be putting it here - but keep in mind that I have it easier than teammate and let that be the lens through which you judge this.

Here are my goals for this edit:

That’s a lot of goals. I’d explain them all, but I feel like I already did that in the entire rest of this blog post26, so let’s jump straight in.

Theming

Two of the goals pull against each other here. I want to keep the loop aha, which means that the loop won’t be revealed until the end of the round. However, I want to make the round feel special as you’re solving it, not just at the end. This means that I need something else to make the round feel special. The answer comes from leaning into something else that teammate did. I like teammate’s shenanigans with the beginning where they stole puzzles from other rounds. There is a whole debate about the ethics of AI, their tendency to hallucinate, and the fact that they don’t create anything of their own, they just create things based on the text they’ve read before.27 Let’s use this here. The answer to our problems is plagiarism.28

Let’s talk about how our new theme plays out in the round.

  1. Round 1, Wyrm just steals puzzles from the Museum rounds, but makes small changes to them to make them extract a different answer. These puzzles should be really straightforward if you solved the previous ones. Possible candidates for this include:
  2. Round 2, Wyrm continues stealing puzzles from the Museum rounds, but this time, they are the “evolved” versions of puzzles, much like the Pokémon round from 2018. These puzzles should use similar mechanics as the previous ones, but include a twist to make them harder. Possible candidates for this include:
    • Apples Plus Bananas - Honestly, I think that this could’ve been a later round, especially since it involved coding to get the answer. This could’ve been the evolved version, and an easier version that has similar ideas would work in the Museum.
    • G|R|E|A|T W|H|A|L|E S|O|N|G - There’s something interesting here that could be expanded on. Something in my brain tells me “make this three-dimensional.” Do I know how that would work? No. But it feels like there’s something there.
    • Interpretive Art - The leveled up version would involve putting entire sentences in each box.
    • Weaver - I feel like this is another puzzle that could split up into two puzzles. Put the PEACEKEEPERS answer earlier in the process, people think that they’re done, then surprise! You need to use them again.
  3. Round 3, Wyrm discovers the MIT Mystery Hunt Hunts by Year page and steals puzzles from other years. This is the chance to put in a bunch of sequel puzzles! Sequel puzzles are fun when done correctly. Possible candidates for this include, well, a lot of beloved puzzles from previous years.
  4. Round 4, Wyrm tries to write a bunch of puzzles stealing from all over the internet, but all that comes out is “As a Learned Language Model, I cannot…” with a different excuse for each puzzle. In the bottom right corner, Wyrm says “Sorry! I thought I could write a puzzle about __, but it turns out I can’t. This isn’t solvable.”

This theme has the advantage that solvers are forced to interact with it - you can’t solve a puzzle without dealing with the fact that they’re “plagiarized”. It gives something memorable to think about, and I’m sure the art department could come up with something interesting to make those puzzles distinct.29

It’s also worth saying that I’ve provided some examples of puzzles from the Museum rounds that would work for Rounds 1 & 2, but obviously if you tell people about this ahead of time, they can make puzzles that are designed to do this.

None of the individual rounds are unprecedented. Changing the extraction of a previously solved puzzle was the central mechanic of Dory, from 2015. Evolving a puzzle is from the Pokémon round (as mentioned earlier), and sequel puzzles are done all the time.30

Changing Collage

Obviously, I’m not a fan of Collage, but with this set-up, we no longer have the issue that the answer to a feeder puzzle has to be the same as the answer to the metapuzzle, so we have more flexibility, and we’re going to use that to make things better.

However, first we need to change which puzzle is the hidden metapuzzle. Instead of Collage, we’re going to use Natural Transformation. This will necessitate some changing of the puzzle and possibly adding some more transformation types, but we’re going to keep the same idea for the puzzle. When it comes time to do the loop aha, a couple things need to be true:

  1. The new version of Natural Transformation can’t be forward solved, but can be backsolved from the meta.
  2. Once you know the answer to Natural Transformation, you can start trying to backsolve the words that must have been fed into the final diagram, which is hard, but doable.
  3. This is made easier when you unlock the final round. Each of the puzzles has the text “I cannot write a puzzle about…”, and the thing after the word “about” is a clue to one of the answer words. This should help both clue what answer goes with what puzzle and can help clue what the answer words are if it turns out ambiguous.

This obviously needs testing - but it’s enough that we could actually start writing the puzzle.

The New Sub Metapuzzles

I’ve described the replacement for Collage already, but there are three other metapuzzles that have to be written. I want these metapuzzles to either be more interesting, give light constraints on the answers so that they don’t feel so far31, or ideally, both.

I’m going to start by focusing on the replacements for The Scheme and Lost at Sea right now. We’ll make the Legend our dumping ground, and honestly if that’s our dumping ground, the triangle meta works. It’s not amazing, but it fits everything we wanted.

Round 2’s theme is “evolved” forms of previous puzzles, so my first instinct is to look at the Museum metapuzzles and see what goes well with being evolved. Here are my instincts about each of the metapuzzles:

In Round 3, we’re stealing puzzles from all throughout the MIT Mystery Hunt’s history. I’ll be honest here - I came up with an amazing idea for a metapuzzle. However, it’s good enough (and fully formed enough) that I don’t want to share it here. But let’s take a look at some candidates for a sequel metapuzzle looking at the history of the Hunt.

Am I actually writing these submetapuzzles? Honestly this blog post is 26 pages long in Google Docs already and I need to write for actual puzzle projects I’m getting paid for or have already committed to, so I don’t have the time.33 But I have enough that I could start diving into the weeds if I wanted to.

Am I promising that this is perfect? Absolutely not. But this is definitely first draft quality and is on its way towards being better than the current iteration of Wyrm.

Looping it All Together

It’s been 11 months since the 2023 MIT Mystery Hunt. It’s been 8 months since I started this series. It’s been 4 months since the last installment of this series. I think it’s safe to say that Wyrm has been living rent-free in my head. I’ve been planning this article since I started the series, and the outline has shifted a lot from where it started back in March. Why have I spent this long and this much headspace on this?34

The 2023 Hunt was frustrating because of the difficulty, but I was having fun because of the difficulty. As I mentioned in my Hunt recap, I kinda like those overwhelming Hunts when they happen occasionally. I loved the AI rounds concept, I loved the story, I loved the art, I loved so much about this Hunt. However, much like a surprise tomato in a Caesar Salad Wrap, the metapuzzles were leaving a bad taste in my mouth.35 I had… stronger words during the Hunt, but I couldn’t come up with more specific words. They were not to my liking, but in a way that caused me to look deeper into myself, determine what parts of how I felt were my taste, and what parts were actually bad design.

The original outline of this article (written when I was angrier) ended with the line “Maybe Wyrm was the one AI we shouldn’t have plugged in.” However, after months of contemplation, that’s not really how I feel. First of all, it’s too mean, and if there’s one thing I hope we all agree on, it’s that teammate doesn’t need more anger directed at them. Second, I’m glad that teammate wrote and published this round. It wasn’t broken - everything worked modulo the general editing issues. It was bad, but it was bad in a new and interesting way that caused self-reflection. As an educator, I am happy when my students make mistakes and learn from them, and I encourage people to share their mistakes so that others can learn from them. What kind of person would I be if I didn’t extend that to others outside of the classroom?

If there was one thing that I learned while writing the 2022 Hunt, it’s that writing the Hunt is consuming work.36 Writing the Hunt is not just about the puzzles - it’s pouring your heart and soul into a huge creative project that has meaning far and above the puzzles themselves. Who am I to say that Wyrm shouldn’t have existed? The Wyrm is a major part of the reason why I started this Blog in the first place. The Wyrm spurred me to think deeply about puzzle hunt concepts that I hadn’t heard other people mention before. In a small way, the Wyrm kinda changed my life.

So thank you teammate for writing the Wyrm. Perhaps it wasn’t for the reason you intended, but I enjoyed the experience.

– Cute Mage


  1. I’m having flashbacks to the giant spreadsheet with lots of different possible answers for The Ministry. 

  2. I mean, this could be a JFK SHAGS A SAD SLIM LASS situation, but that’s unlikely to happen, especially with multiple puzzles. More experienced puzzlers are going to compare this to the last puzzles from Pokémon island, where the puzzles are solved by understanding the context of the entire round. 

  3. Yeah yeah yeah. Obviously we didn’t actually solve a bunch of the puzzles. We all know the deal blah blah blah. I’m not here to harp on that problem, especially since of all the AI rounds, this is the round whose structure was affected the least by the buying of puzzles. 

  4. Honestly, when I find myself doing this, it’s often a sign that I’m going off on the wrong path and that I should try something else. 

  5. I mean, We Made a Quiz Bowl Packet but Somewhere Things Went Horribly Wrong37 is a pretty unusual puzzle for not good reasons, and Folded Cards was an unusual puzzle but very enjoyable, but for the purpose of looking at the structure, the puzzles were nothing unusual. 

  6. I believe it joins squad-hours and puzzle radius now. Closeness is off the list thanks to ABCDE. 

  7. At sea38 

  8. I wanna write this textbook. Maybe that’s what this blog is? 

  9. Foreshadowing is a literary device where… 

  10. I’m a geometry teacher. This is nothing new. 

  11. One thing to note - each of the reused answers from the Museum has the restrictions from that round listed, but technically the restriction is “can fit into any one of the Museum meta rounds”. 

  12. And a Fruit Around, but that’s irrelevant to this discussion. 

  13. Note that this has the same caveat as above - any of the set of 5 binary restrictions could apply to any of the twenty-five feeder answers. This is just how we got it to work. 

  14. My spell checker REALLY does not like “Pen Station”. Somehow it hasn’t learned that I really like puns. 

  15. I’ve mentioned this before, but meta construction was done in three groups on Palindrome. Team Aardvark39 was the one I headed up. 

  16. Also in those guidelines was one hard no - the Dewey meta could not be about the Dewey Decimal System. The reason for this was two-fold. First, the Dewey for whom the Dewey Library was named is not the same Dewey for whom the Dewey Decimal System was named.40 Second, it looked like libraries were slowly starting to move away from the Dewey Decimal System for various reasons including the fact that the Decimal System Dewey was not the greatest person to ever exist. 

  17. I don’t remember exactly if we went into the creation with the ideas for the Dewey/Hayden or whether they were created while we were piecing things together. I guess I could look through the Discord to find out, but I don’t want to. 

  18. Fun story, I was searching for things that fit U and it turns out that MOTHERF***** works for U. I put it in the spreadsheet as a joke, but I wouldn’t actually make that as an answer to a puzzle. Turns out that it would have been an even better answer for Go The F*** to Sleep. I don’t know if I would’ve found a Motherf***** st in America though. 

  19. Now you might be thinking “Hey Cute Mage, why didn’t you test the metameta at the same time?” Well, uh, because we didn’t think of that, and when people mentioned it at the end of the testsolve, we already had a bunch of other stuff scheduled and we didn’t want to run over. Also, the metameta had two prior clean tests, so we weren’t worried about it. Whoops. I think this is my biggest mistake when it comes to editing for the whole process. 

  20. You can see the game played here, and a sweet actual puzzle based on the concept here. 

  21. Content Warning: I am about to send you to TV Tropes. If you are not careful, you could waste hours and hours on this site. Stay laser focused, read what you need to, and get out. Or not. I’m not your mom. But don’t say I didn’t warn you. TV Tropes: Fridge Logic 

  22. I am working on my deep feelings of FOMO when it relates to the MIT Mystery Hunt overall, but I don’t believe that this particular feeling of FOMO is necessarily a bad thing. 

  23. Maybe they matter for a little bit longer if you are able to use The Legend to backsolve an answer and get credit for it somewhere else in the Hunt. 

  24. For the record, my general answer to this is yes, but I think it’s totally reasonable to not like a round where backsolving is hard required. 

  25. Man, Google Docs really did not like me starting sentences with lowercase letters. 

  26. Sometimes I notice when I’m repeating myself. Sometimes. 

  27. I’m explaining this badly. If you want a better explanation of this - go to Google. 

  28. The AI plagiarism problem is something that is not addressed in the 2023 Hunt at all. This may be because a) it wasn’t as big of a deal in 2022 as I remember or b) they intentionally didn’t want to address it because they wanted people to be on the side of the AIs. For a), fair enough. I don’t remember how much I was thinking about AI plagiarism back in early 2022, and I’m not going back into my head in early 2022. It was not in the best place.41 For b), I believe that there’s an angle that you can take to make Wyrm seem sympathetic by going for the puppy who tears up the newspaper but is so cute that you can’t stay mad at it. I dunno. I’m just saying that there are other reasons why teammate may have decided not to touch the “AI Plagiarism” angle. 

  29. Pro Tip: If you’re going to do something like this - LET THE ART TEAM KNOW EARLY. They need to plan and it’s a lot of effort to do something weird. 

  30. I wrote two of them in 2022! Crow Facts 300042 and First You GO To… Given that I tend to be a Mystery Hunt historian, this should not surprise anyone. 

  31. Far, as in the opposite of close. 

  32. Sorry, did I say fun? I meant frustrating. I honestly don’t know what’s come over me that I’m suggesting a Rubik’s Cube metapuzzle. I have a very love/hate relationship with the cube. I still have one cube in the final position from testsolving Curious and Determined because it’s pretty and I will never get it back in that setup again. And yes, I testsolved that puzzle with a physical cube over Discord. It worked surprisingly well. 

  33. Apparently I have the time to write this blog post though. I hate my brain sometimes. 

  34. Okay, BESIDES the fact that I am always thinking about creating puzzles/games/lessons/experiences. 

  35. This may seem very specific, but if you don’t like the taste of tomatoes, it only takes one time to have surprise tomatoes in your Chicken Caesar Wrap to remember it forever. 

  36. Okay, if I learned one thing, it’s that if you write a round that causes teams to submit a creative task with every puzzle, thereby increasing the amount of work that your team has to do during Hunt weekend, they will not let you hear the end of it during Hunt weekend. (Love you all.) If there is a second thing, it’s that writing the Hunt is consuming work. 

  37. While I don’t like the puzzle, I absolutely adore comically long puzzle names, so it will always have a space in my heart for that reason. Also the fact that it references TIMEMIT, which was something that I really pushed for in 2022. I still use TIMEMIT. 

  38. I know that’s a bad joke but I just had to take that opportunity. 

  39. I still cannot write this word without going “A-A-R-D-VARK”. Thanks Arthur. 

  40. Look, we did some research into Dewey Library. Still didn’t realize that it was in the same building as Hayden therefore creating a small plot hole, but we knew some things! 

  41. Still isn’t, but for a different reason. 

  42. Okay, I hadn’t read Crow Facts 3000 in a while, but I am very proud of some of these tweets. They are great. Shame about Twitter itself though.