Astute readers will note that I have analyzed the other three AI rounds in descending order of how much I liked them. Conjuri’s Quest is my favorite, ABCDE was quite fun, and Ascent was good despite the metapuzzle and the overall hunt holding it back. Following this pattern, you probably have some guesses about how I feel about the Wyrm. It’s still worth looking into anyway.
This is part of a multi-part series analyzing the various AI rounds from the 2023 MIT Mystery Hunt and seeing what they can teach us about writing puzzle hunts in general.
Note: After this article was posted, Alex Irpan, a member of teammate who was involved with the Wyrmhole, posted some notes about this article. They are good notes and people should read them. Most of them will make sense after you have read this article, but one point he makes exposes a flaw in my analysis. I will address that when we get to it.
Let me clarify before jumping into this - I’m going to have a bunch of criticisms of the Wyrm round in this article, but that doesn’t mean that I think that Wyrm was broken or shouldn’t have been in the Hunt. The structure did its job, and folks were able to solve it. However, if I were in charge of editing this Hunt, this round would never have made it past me. It is mechanically fine, but in my opinion, it is missing a lot of what would make it good.
But that’s okay. Different teams have different editing styles. Different people have different things that they enjoy from a round of puzzles. My opinion will not invalidate the Wyrm, will not invalidate the work that people did to make it happen, and will not invalidate the story overall. But I am going to explain why this round did not land for me. I mention this for any teammate folks reading - I’m going to have some stronger criticisms of this round than I did for the other AI rounds, so if that’s not something you’re in the appropriate headspace for, please don’t read this.
Lastly, I am of course biased in how I feel. This round was the last round for my team, The MIT Mystery Hunt ✅, and of course the frustration about being stuck on this round is going to affect how I think about it. That being said, one of the reasons why I waited this long to post this is because I wanted to spend time thinking about it after my emotions calmed down. I want to be as fair as I can to the authors of this round, and that involves actually analyzing it, not just repeating my unfiltered feelings.
Positives
Let’s start with my favorite part of this round - the metameta. This is the cleanest metameta of any of the AI rounds. The real/imaginary pairs are a cool idea to base the meta around, and extracting via the period of a point in the Mandelbrot set is a cool mechanic. While there are technically three different ahas in the solving path, in reality only the real/imaginary aha is a hard aha - the other two are much more straightforward given the cluing of the puzzle.
Is this metapuzzle a little ridiculous? Absolutely. However, it is the metameta for a world, as opposed to the single meta for an individual round, so it should do something ridiculous. Using the Mandelbrot set is definitely in the right scale for the MIT Mystery Hunt, and it feels like a good way to wrap up the round from an artistic standpoint.
An important note is that the metameta has some built-in flexibility. While we’re using the real/imaginary answers to generate complex numbers, we’re then transforming those numbers with arbitrary transformations to get what we need for extraction. We have complete freedom as to what the transformations are, which means that as long as we can get a pair of real/imaginary things that works, we can use them in the meta. This is great for metametas of worlds, as it gives us the freedom to find and construct interesting metapuzzles for the round.
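For anyone who hasn’t played with the Mandelbrot set before, here’s what “the period of a point in the Mandelbrot set” actually means. This is just my own illustration of the underlying math - solvers presumably read the period off a picture rather than writing code, and nothing here comes from the round’s solution materials.

```python
# Iterate z -> z^2 + c from z = 0, let the orbit settle onto its attracting
# cycle, then measure the cycle's length. My own sketch, not the round's.

def mandelbrot_period(c, warmup=100_000, max_period=64, tol=1e-9):
    z = 0j
    for _ in range(warmup):          # let the orbit settle onto its cycle
        z = z * z + c
        if abs(z) > 2:               # orbit escaped: c is outside the set
            return None
    reference = z
    for period in range(1, max_period + 1):
        z = z * z + c
        if abs(z - reference) < tol: # back to (nearly) the same value
            return period
    return None                      # boundary point, or period > max_period

print(mandelbrot_period(0j))                 # 1 (main cardioid)
print(mandelbrot_period(-1 + 0j))            # 2 (the big disk to the left)
print(mandelbrot_period(-0.1226 + 0.7449j))  # 3 (the bulb atop the cardioid)
```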
The Loop Aha
teammate is very good at environmental puzzles. This shows through in the loading puzzle, which did an absolutely wonderful job of communicating that it was a puzzle despite not being allowed to say that it was. The aha on closing the loop is a round-wide environmental puzzle, and it is well clued. Let’s list the clues:
- First, all of the puzzles (besides the previous metapuzzle) are blank, indicating that there are some shenanigans afoot.
- The art for the fourth round is the same as the round art for Collage in the first round.
- Collage was in the middle of the art for the first round, and the previous metapuzzle for each round is in the middle of the next round’s art. (This is the zooming-out effect)
- All of the round metas involved triangles in some way, and all of the nodes in Collage are triangles.
I wish I could’ve been there when we made that aha, because it’s just so beautiful.
Negatives
Stepping Through the Loop
Round 1
When solvers first interact with this round, they get access to six puzzles that they hopefully have already solved, along with a metapuzzle that directs them to a set of physical objects that has been dropped off outside their door.
The metapuzzle itself is not too hard. The only reason it really took us time was that we were 1) trying to figure out how to represent it for people who weren’t in person and 2) convincing ourselves that this actually worked. A few things crossed my mind as we were solving this:
- It’s really weird that the answer list contains TRIFORCE when the meta is building a mega-triforce. That feels a little too on the nose.
- These answers could have been anything. The only restrictions on them are their total letter count and that they can’t be too similar, so that the logic puzzle has a unique solution.
- It seems like the Wyrm’s bit is reusing puzzles from elsewhere in the Hunt, which is an interesting choice.
Those three items ring some alarm bells in my head, but not super loudly.
Round 2
When solvers unlock the next round, it’s revealed that The Legend is a feeder puzzle to a new round. This time the puzzles are new for this round - not just repeats from earlier in the hunt, which gets rid of that as a possible running theme for the round. These puzzles were another step harder than the previous puzzles we’d been given, but that made sense for where we were in the hunt.
Once we had solved enough puzzles, the second metapuzzle opened - The Scheme. Click on that link to see the puzzle, but keep in mind that when we opened it, the errata hadn’t happened yet, so the first line of the flavor text wasn’t there. Let’s take inventory of all the information in the puzzle:
- The flavor text “What words can help you find your way?”
- A diagram of a triangle, with an arrow partially going around the outside of the triangle.
- A series of numbers, with the largest one being 45.
- The context that the previous round’s metapuzzle answer is one of the feeder answers to this round.
- The context of the previous round, whose metapuzzle seemed weird, like we didn’t understand everything that was going on.
- The context that in the previous round, one of the puzzle answers was a clue for what to do.
The numbers at the bottom look like indices. Presumably we’re going to combine these answers in some way to get something that is at least 45 letters long, and then use these numbers as indices for that thing. That something is probably going to involve going clockwise around the outside of a triangle in some way. The nice thing is that the lengths of all the answers add up to 45.
You know what else there’s 45 of? Letters around the perimeter of the triangle from The Legend.
This makes sense. Putting things one-to-one is a very natural thing to do in hunt puzzles, and once we figured out how the correspondence worked, that would give us the ordering we needed to use the indices. This would also explain the weirdness with the triforce and how unconstrained it was.
Then the errata came out.
An interesting story is told by the stats page for this puzzle. The errata came out at 5:22 PM. In the next 7 minutes, 7 teams solved the meta. What was that errata?
Well, it was actually a hint. The hint added one sentence to the flavor text: “Stack the words from 1 to 9.” That was enough to get teams to notice that the answer words, if multi-word answers are split into their component words, contain every length from 1 to 9 (and 1 + 2 + … + 9 = 45, matching the indices). In order to get access to every letter, the arrow needs to spiral inward within the triangle, which means that teams are pushed away from interpreting the arrow as only residing on the perimeter of the triangle.
That errata was absolutely necessary. The idea that you have to split the answers up into individual words has to be clued if you’re going to use it, especially when the split only ends up applying to some of the answers. Let’s be clear - I’m not saying that the mechanic can never be used. Puzzled Pint uses it a bunch to get around the fact that they only have 4 answers. However, whenever they use it they state that you need to use the words separately. It is not a normal thing to do without being told.
I also wonder what would have happened if they had changed the triangle picture so that the arrow kept going until it had to spiral inward for the first time. That arrow on its own doesn’t clue a spiral - the only reason we inferred that it spiraled came from taking all of the other information together and reasoning that it had to.
Round 3
The third round unlocks, once again with a new round of puzzles, including the meta from the previous round as a feeder puzzle. This cements the telescoping structure of the round as the “bit” for this AI.
The puzzles are nothing unusual for this hunt, and the meta doesn’t look like anything too unusual either. But let’s take a look at that meta a bit closer. Here are the steps needed to solve that metapuzzle.
- Figure out that each of the answers is/clues a US Navy Ship.
- Notice that the year corresponds to the ship in some way, which lets you assign the ships to the appropriate row.
- Look at the hull number for each ship, notice that there is always the same number of rectangles as there are letters in the hull number for the ship.
- Recognize that the squares can be filled in with the nautical flags for the letters, and the dots therefore extract a series of colors.
- Recognize that the series of colors can be turned into a series of directions using the compass rose below.
- Use the directions to assign the ships to ships on the map, making a chain of ships.
- Index into the answer of the next ship by the digits from the hull number of the previous ship. Read the letters in the chain.
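Just to make that final step concrete before digging into the problems, here’s a toy sketch of the chain indexing. The ships, hull numbers, and pairings are entirely made up, and the 1-based indexing and space handling are guesses on my part; the point is only the shape of the extraction.

```python
# Toy version of the chain-indexing step, with made-up ships and hull numbers
# (not the puzzle's actual data).

chain = [
    ("HAMILTON", "DD-141"),      # (answer, hull number of the ship it clues)
    ("UNDERWOOD", "FFG-36"),
    ("FAVORITE PIN", "AM-52"),
]

extraction = []
for (_, prev_hull), (next_answer, _) in zip(chain, chain[1:]):
    digits = [int(ch) for ch in prev_hull if ch.isdigit()]
    letters = next_answer.replace(" ", "")
    # Index into the next ship's answer with each digit of the previous hull number.
    extraction.extend(letters[d - 1] for d in digits)

# With the real data this concatenation would spell something readable;
# with fake data it is, of course, nonsense.
print("".join(extraction))
```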
Some of these steps are perfectly fine, but there are three main issues with this meta.
Step 1 is way too loose
Let’s take a look at how the answers connect to a ship:
- The name of the ship is just the answer: HAMILTON -> USS Hamilton, UNDERWOOD -> USS Underwood
- The name of the ship is a substring of the answer: FAVORITE PIN -> USS Favorite, EYE OF PROVIDENCE -> USS Providence
- The name of the ship is clued by the answer: GOLD TOUCH -> USS Midas, TRANSAMERICA PYRAMID -> USS San Francisco, OLD HICKORY -> USS Jackson
The answers are split evenly between the three different methods. This is unusual, and puzzles usually avoid it for a good reason. When solvers are trying to break into a puzzle, especially a meta, they are looking for a clear connection to show them that they’re on the right track. Experienced puzzle solvers will know that you can match any set of data to another set of data if you just get creative with how you allow the connections - therefore doing exactly that is an indication that you are not on the right track. The years are not as big of a help as one might imagine, as you don’t know which answer goes with which year, so you’re trying to match something from the pool of answers with something from the pool of ships. There are later steps that can confirm what you’re doing is correct, but there really should be something earlier that does exactly that.
Basically, when solvers are trying to break into a metapuzzle, they are looking for clear signals that they are heading in the right direction. This meta uses a muddled connection that obscures the trail far more than normal.
Steps 4-6 are unnecessary
Let’s work through the logic of why we’re doing these steps. In step 4, we use the flags to get us a series of colors. In step 5, we use the colors to get us directions. In step 6, we use the directions to figure out which ship on the map goes with the answer. This gives us a chain of ships… that is exactly the same as the ordering in the puzzle in the first place. Indeed, you have to figure out that ordering in order to get the directions you need! The puzzle could’ve looked like this and nothing would need to change:
- Figure out that each of the answers is/clues a US Navy Ship.
- Notice that the year corresponds to the ship in some way, which lets you assign the ships to the appropriate row.
- Look at the hull number for each ship, notice that there is always the same number of rectangles as there are letters in the hull number for the ship.
- Index into the answer of the next ship by the digits from the hull number of the previous ship. Read the letters in the chain.
Done.
The only real use this has in the puzzle is as a confirmer. It confirms you have the right ships, and it confirms that you have the right order. This is good. Shell metas need confirmers that you are heading in the right direction. However, this is a lot of ahas and steps to do that confirmation, and it comes too late in the process.
Okay, so this is the part that is actually incorrect. The given ordering of the ships is different from the map ordering of the ships, and both are necessary for different reasons in the final extraction. This was... a decision. I still think it's not great design, but I will admit that my analysis is wrong. I'm leaving this section up for posterity's sake.
Step 7 is index hell
I am adding “index hell” onto my list of phrases I need to explain fully in a blog post at some point, but basically it means that you get to the last step and you have a whole table of data and any pair of things could be a reasonable method of indexing and you don’t know which it is until you try all of them. Sometimes, index hell is the fault of the team who is solving if they miss something that the puzzle is clearly trying to push them towards. However, some puzzles are just more likely to end in index hell.
And look, this isn’t necessarily a bad thing. It’s not something you want all your puzzles to end in, but it can be fun looking in the weeds for the right combination. This is also a great place where skill and intuition can be tested, and therefore it is not bad in later rounds. However, this combined with the ease of getting lost in the weeds at the beginning leads to a puzzle where the solvers are just lost.
That being said, I will give the authors credit on this one. Someone clearly noticed this and added the flavor text at the top that spells out how to index. Were there more interesting, subtle ways of doing this? Probably. Does this work? Absolutely and I don’t fault them at all.
Fixing Lost at Sea
I don’t want to imply that Lost at Sea would need to be completely scrapped. I think that it’s very fixable with a few small changes. Here they are:
- Put the rows above the picture in a different order - perhaps alphabetical order by ship.
- Have some indicator for which of the three cluing mechanisms each ship type is using.
- Optionally, maybe have something that connects one pair of (answer, ship in table, ship on map).
Lost at Sea is a textbook example of how you can have a good structure for a puzzle but the small details make it much harder or easier than intended. The details matter a ton for puzzles, especially metapuzzles because teams will be attempting to solve those without all the answers. It’s really annoying because some of it is out of your control. You can spend a ton of time perfecting a puzzle, but at some point the time you spend isn’t worth it, and there will still be people who miss those details anyway.
Round 4
This says it’s a round, but it’s really part of the loop aha. It’s already been talked about in the Positives section.
Interconnectedness
Mapping Interconnectedness
One of the things that really struck me at the end of this Hunt was how interconnected it was. Answers get used in one meta, and then reused in other ones. To try to get an understanding of what was going on, I made a map of the whole Hunt. (link to the pdf) There are a lot more arrows in that graph than there would be in previous years’ graphs.
Interconnectedness is a really important tool for puzzle hunt structure designers. It allows for more complicated structures while keeping the total number of puzzles down, which helps make the whole hunt more tractable. The problem with answer reuse is that it puts more and more restrictions on the answers, reducing the set of possible answers and perhaps forcing constructors to settle for nonthematic or just ugly answers. Now, this isn’t a problem unique to answer reuse - every hunt has to ask itself how thematic to the puzzle the answers are going to be and how much they’re willing to rework the answers to make them as nice as possible. However, answer reuse puts extra strain on this specific area, so it’s worth pointing out.
One trick to solve this is to make one metapuzzle that puts little to no restrictions on its answers. This allows the constructor to have some flexibility with the answers they pick, as they can focus on fixing the other answers with more restrictions first. However, the less “close” a metapuzzle is, the more it needs to excel in other areas to feel like a good metapuzzle.
Wyrm vs. Indiana Jones
A good example to look at is the Indiana Jones round from the 2013 MIT Mystery Hunt. That round also had an involved metameta where answers needed to be paired up, and metapuzzles that did not put many restrictions on their answers. Let’s summarize how that round works:
The round is split up into three adventures, aka subrounds. Each subround consists of 8 feeder puzzles. These eight puzzles feed into a rather shelly metapuzzle that is themed around snakes. Every time a team solves one of these metas, they unlock a “tablet” which is a piece of the metameta. Answers in the round can be paired up so that they form historical events, such as ALI/FOREMAN or MASH/FINALE. Each of these events has a specific date in history on which it occurred, which gives us a list of 12 different dates. The symbols on the tablets are a system for writing dates, and the 12 sets of symbols that make up the outside circle are the 12 dates indicated by the feeder answers. Once solvers figure out how the date system works, they can translate the dates in the middle of the tablet, but each of those is missing one number. Each of the dates is an occurrence of MIThenge, and each has a unique year that can be put in the blank to make the date correct. These years are all between 2001 and 2026, meaning that the last two digits of the years in order spell PARTNERSHIP WITH BOA (2016 -> P, 2001 -> A, 2018 -> R, and so on).
This comparison makes both the strengths and the weaknesses of Wyrm stand out clearly. Wyrm’s metameta is cleaner than Indiana Jones’, and fits better with what the whole round is doing. However, Wyrm’s submetas are less fun to solve and just less interesting overall. In addition, Indiana Jones has a clear theme of snakes in all of the submetas, which makes sense given Indy’s famous fear of snakes, and the final pun has something to do with snakes. Wyrm’s submeta theme is… triangles?
The Legend’s author notes explain where the triangle theming came from.
The triangle theming within all the Wyrmhole metas originates from this puzzle. We wanted a fractal-y starting meta as a teaser for the rest of the round, and after a proof of concept for this puzzle tested successfully, we backported the triangle theming to all other meta drafts and chose TRIFORCE as the answer to close the loop in Wyrmhole. This did arguably break The Error That Can’t Be Named, but we wanted to make the triangle motif as strong as possible for that answer.
First of all, while I can see the argument that TRIFORCE didn’t break TETCBN, I wouldn’t want to be the person to defend that side. More importantly, this is an unusual justification for triangles, one which doesn’t make sense unless you have read that Author’s Note. If someone saw only the beginning of the round and not the end, then the triangles would seem really random.
I’ve gone on a bit of a rant about triangles, but now back to interconnectedness. Let’s list out all the answers in both rounds next to each other.
Indiana Jones Answers: ALI, CARL SAGAN, DISCOVERY, FAT, FINALE, FOREMAN, GUNPOWDER, LAST, LOOKOUT, MAN, MASH, MOLLY, MOONWALK, MOUNTAIN, OATH, PITCHER, PLOT, PLUTO, PULITZER, RIOT, STRAVINSKY, SVALBARD, TENNIS COURT, TOTAL ECLIPSE
Wyrm Answers: APPLE, ARABIAN NIGHTS, AVENUE Q, BEIJING TIGERS, BRITAIN, CARBON SINK, CERBERUS, DISCIPLE, DRAGON, EYE OF PROVIDENCE, FAVORITE PIN, FELLOWSHIP, GEOMETRIC SNOW, GOLD TOUCH, HAMILTON, HOGWARTS, INCEPTION, MONOPOLY, NINTENDO, OLD HICKORY, PICO, SEA OF DECAY, TANGRAM, TRANSAMERICA PYRAMID, TRIFORCE, UNDERWOOD
Okay, but this isn’t the full story of what’s happening. Let’s take a look at the restrictions on each answer.
Indiana Jones Answers - each is one half of the name of a famous event with a specific date, and…
- ALI - works in the word ladder
- CARL SAGAN - long enough to make an interesting snake
- DISCOVERY - has a letter in ATLANTIS
- FAT - works in the word ladder
- FINALE - long enough to make an interesting snake
- FOREMAN - has a letter in ATLANTIS
- GUNPOWDER - long enough to make an interesting snake
- LAST - works in the word ladder
- LOOKOUT - has a letter in ATLANTIS
- MAN - works in the word ladder
- MASH - works in the word ladder
- MOLLY - works in the word ladder
- MOONWALK - has a letter in ATLANTIS
- MOUNTAIN - has a letter in ATLANTIS
- OATH - works in the word ladder
- PITCHER - long enough to make an interesting snake
- PLOT - works in the word ladder
- PLUTO - has a letter in ATLANTIS
- PULITZER - has a letter in ATLANTIS
- RIOT - long enough to make an interesting snake
- STRAVINSKY - long enough to make an interesting snake
- SVALBARD - long enough to make an interesting snake
- TENNIS COURT - has a letter in ATLANTIS
- TOTAL ECLIPSE - long enough to make an interesting snake

Wyrm Answers - each is one half of a real/imaginary pair that can clue a number, and…
- APPLE - Can be in a word web
- ARABIAN NIGHTS - Can be in a word web
- AVENUE Q - Has a unique word length, contains some letters of EYE OF PROVIDENCE
- BEIJING TIGERS - Put in the Triforce, one word is a unique word in one company’s menu, extracts a needed bigram in mate’s meta
- BRITAIN - Has a unique word length, contains some letters of EYE OF PROVIDENCE
- CARBON SINK - Put in the Triforce, one word is a unique word in one company’s menu, extracts a needed bigram in mate’s meta
- CERBERUS - Put in the Triforce, is the name of a conspiracy, extracts a needed bigram in mate’s meta
- DISCIPLE - Can be in a word web
- DRAGON - Can be in a word web
- EYE OF PROVIDENCE - Can clue a Navy Ship, can be produced by a metapuzzle
- FAVORITE PIN - Can clue a Navy Ship
- FELLOWSHIP - Can be in a word web, can be produced by a metapuzzle
- GEOMETRIC SNOW - Put in the Triforce, works in Nuclear Words, extracts a needed bigram in mate’s meta
- GOLD TOUCH - Can clue a Navy Ship
- HAMILTON - Can clue a Navy Ship
- HOGWARTS - Can be in a word web
- INCEPTION - Has a unique word length, contains some letters of EYE OF PROVIDENCE, can be produced by a metapuzzle
- MONOPOLY - Can be in a word web
- NINTENDO - Has a unique word length, contains some letters of EYE OF PROVIDENCE
- OLD HICKORY - Can clue a Navy Ship
- PICO - Has a unique word length, contains some letters of EYE OF PROVIDENCE
- SEA OF DECAY - Has a unique word length, contains some letters of EYE OF PROVIDENCE
- TANGRAM - Put in the Triforce, works in Nuclear Words, extracts a needed bigram in mate’s meta
- TRANSAMERICA PYRAMID - Can clue a Navy Ship
- TRIFORCE - Put in the Triforce, clues a set of colors, extracts a needed bigram in mate’s meta
- UNDERWOOD - Can clue a Navy Ship
These restrictions have been colored above based on how strict they are. Black restrictions put no restrictions on the answer, green restrictions put minimal if any restrictions on the answer, and orange restrictions put nontrivial but still not hard restrictions on the answer. (There are higher levels than orange, but I don’t need them for this chart.)
One quick glance at the chart shows the difference between Indiana Jones answers and Wyrm answers. Indiana Jones makes its round work with two restrictions on every answer, whereas Wyrm has multiple answers with four restrictions on them. No wonder The Legend doesn’t put any restrictions on those answers - they’re already going through enough!
At this point we’ve established that the Wyrm answers have more restrictions on them than usual, but how does that affect the actual constructions? To determine that, we need to do a different comparison.
Wyrm vs. The Ministry
These rounds are very different - I’m not here to do a point by point comparison, because that’s worthless given what the rounds were intended to do. Wyrm was meant to be one of four ending rounds, and The Ministry was meant to be a mid-size team’s goal. First, let’s sum up the Ministry for those who aren’t familiar.
- The round consists of 25 puzzles, 5 submetas, and one metameta.
- The 25 puzzles need to be split up into the 5 submetas, each of which gives a different answer.
- The 5 submetas indicate qualities that can apply to the answer phrases, and the metameta gives them an ordering.
- Solvers can use 5 qualities to produce a 5-digit binary number for each puzzle answer in the order given by the round page.
- Solvers can convert the 5-digit binary numbers to letters to spell the answer.
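To make that extraction mechanic concrete, here’s a minimal sketch using the five qualities enumerated in the restriction breakdown below. The bit order, which outcome counts as a 1, and the tiny bug list are all my own guesses for illustration - not necessarily the conventions the actual metas used.

```python
# Sketch of the Ministry-style extraction: five yes/no qualities per answer
# become a 5-bit number, which becomes a letter (1 = A ... 26 = Z).
# Bit order, 0/1 polarity, and the bug list are assumptions for illustration.

BUGS = ["ANT", "BEE", "FLEA", "GNAT", "ROACH", "TICK"]  # hypothetical, incomplete
ROYGBIV = set("ROYGBIV")

def qualities(answer: str) -> list[int]:
    letters = "".join(ch for ch in answer if ch.isalpha())
    return [
        int(any(bug in letters for bug in BUGS)),  # contains the name of a bug
        int(" " in answer),                        # contains a space
        int(len(letters) % 2 == 0),                # even number of letters
        int(answer[0] in ROYGBIV),                 # starting letter is in ROYGBIV
        int(len(letters) >= 12),                   # 12 or more letters
    ]

def extract_letter(answer: str) -> str:
    value = int("".join(map(str, qualities(answer))), 2)  # 5 bits -> 1..26, ideally
    return chr(ord("A") + value - 1) if 1 <= value <= 26 else "?"

# Reading one letter per feeder, in the order given by the round page, spells
# the metameta answer. Which letter a given answer produces obviously depends
# on the conventions assumed above.
print(extract_letter("BEEBLEBROX"))
```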
Let’s do a breakdown of the restrictions for the Ministry answers.
Barker Answers - Can be placed in the grid to make the indicated letters a full alphabet
BEEBLEBROX
- Contains the name of a bug
- Doesn’t contain a space
- Even number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
FILMOGRAPHIES
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
JAVELIN THROW
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
QUANTIZATION
- Contains the name of a bug
- Doesn’t contain a space
- Even number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
YARDSTICK
- Contains the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
Dewey Answers - Contains an anagram of a world currency and some of the letters of COLORFUL HEAD
BROKENHEARTED
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
CARILLONNEURS
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
FOOLHARDINESS
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
GOSPEL OF ST LUKE
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
RECORDANT
- Contains the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
Hayden Answers - Contains 2 sets of doubled letters and some of the letters of GOES ON LONGER THAN WAR AND PEACE
BEECHWOOD
- Contains the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
GET BOGGED DOWN WITH
- Doesn’t contain the name of a bug
- Contains a space
- Odd number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
MAMMOTH HOT SPRINGS
- Contains the name of a bug
- Contains a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
REPROACHLESSNESS
- Contains the name of a bug
- Doesn’t contain a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
VALLICELLIANA
- Contains the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
Lewis Answers - Is the name of an album and contains 4 of the letters in MULTIPART COMPOSITION
AUTOBIOGRAPHY
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
BEDTIME STORIES
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
ONE IN A MILLION
- Doesn’t contain the name of a bug
- Contains a space
- Odd number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
ONE-TRICK PONY
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
PARALLEL LINES
- Doesn’t contain the name of a bug
- Contains a space
- Odd number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
Rotch Answers - Is the name of a street that crosses a street with the same name as a street in Cambridge
BACON
- Doesn’t contain the name of a bug
- Doesn’t contain a space
- Odd number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
GOLDEN BAMBOO
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
MARTIN LUTHER KING JR
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter isn’t ROYGBIV
- Longer than 11 letters long
OAK GROVE
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Shorter than 12 letters long
RIP VAN WINKLE
- Doesn’t contain the name of a bug
- Contains a space
- Even number of letters
- Starting letter is ROYGBIV
- Longer than 11 letters long
(Click to reveal - the table looked really ugly, so we’re using the <details> tag instead.)
In the last example we showed how some of the Wyrm answers had four restrictions compared to Indiana Jones’ two, making the Wyrm much more restricted. Surely the fact that every answer in The Ministry had six restrictions means that it must be even worse than that, right?
Well, no. To discover why, we need to take a look at how you would construct these rounds and what actual problems these restrictions cause for construction. Fortunately, I can speak for the construction of one of them. Let’s dive into how the Ministry was constructed.
Early in meta construction, we knew that the Ministry was going to be a 25 feeders -> 5 submetas -> metameta structure, but we hadn’t figured out exactly how it would work. While most people were focused on Pen Station metas, Kevin Wald came up with the initial idea for the Ministry metameta. We refined it in Team Aardvark, and then proposed it in much the same form that it was published in. The only two changes were that GOES ON LONGER THAN WAR AND PEACE was originally GOES ON LONGER THAN AYN RAND, and the feeder answers were dummy answers that demonstrated the mechanic but weren’t intended to be final answers.
This metameta tested well, so while people were still working on Pen Station metas, I typed up a set of guidelines for how Ministry submetas would work. This included some advice for what was expected of a meta this round (only 5 answers so probably involves a shell, solvers need to be able to pick the answers out of a group of 25 answers), but also indicated that the meta must be able to take a variety of answers, and mentioned that the answers that were submitted for that meta would not necessarily be the ones that it used when we put them all together. Members of Palindrome submitted metas, and the meta editors picked a first draft of metapuzzles we liked - the pangram Barker, the currency Dewey, the doubled-up Hayden, the [redacted] Lewis, and the street Rotch. Huh, I wonder why the Lewis meta is [redacted].
We started trying to piece all the metas together. This involved getting a big spreadsheet with all of the information in one spot, and brainstorming answers for the hard-to-determine letters. A couple things became clear when we tried assigning metameta letter constraints to the submetas. The first was that it was really hard to get the letters R and V for the metameta. These answers needed to contain a bug, but they had to be seven letters or less - and we needed 3 Rs. While it was not hard to create those answers in my mockup, it was hard to create answers that could then be used in actual submetas. To release the pressure on these words, we changed AYN RAND to WAR AND PEACE, raising the length cutoff from 7 to 11. I was worried that answers that had to be 12 letters or greater would be too hard to find, but Eric Berlin said it was necessary. Turns out, Eric was right.
At that point, we started doing a lot of different word searches with our various programs to see what answers would work with this one change, and it turned out that we could come up with a plethora of answers for each letter, and all of the metas could take the answers they needed. Well, all of them except for the Lewis meta. Both the Lewis meta and the Rotch meta had stronger constraints on answers, they weren’t playing well together, and the Lewis meta was having trouble finding answers that worked with it in the first place. While theoretically any answer could fit somewhere in the metameta framework, there were some letters (like D) that weren’t used in the metameta answer, so there were some answers that just wouldn’t work. In the end, we got rid of the Lewis meta, and after apologizing to the meta author we got to work coming up with a new Lewis meta. Mark Halpin noticed the potential for something built around album names, and after some discussion, I ended up creating the current iteration of Lewis. This also meant that Lewis was the last meta to have its answers finalized, as I was trying to find the right combination of album names that would extract correctly in that submeta, give the right letters in the metameta, and also sound like plausible answers that didn’t obviously reference the album.
Once we had a workable set of answers, it was time to test. Testing this was the anchor of a large testsolving session, and we gave each team a giant spreadsheet. There was one tab for each of the metas with that meta’s shell, and then there was one tab for the answers, where we started off with 18 answers and slowly added more as time went on. This gave us great feedback not just about the metas, but also about how solvers approached assigning answers to metapuzzles. It also told us that the metapuzzles were much more backsolvable than we thought they were, but honestly, we were fine with that. We made some minor adjustments based on the testsolving, then released the feeder answers for puzzles.
I can’t imagine trying to approach writing the Ministry in a different way. This top-down approach was important because we knew that restrictions were going to interact in weird ways, and because everything was going to be interacting with each other, we wanted to make sure what we were doing could work. I can’t imagine setting Wyrm like this.
There is a huge issue that comes with the top-down approach for the Wyrm: the sheer scale of what you need to set at the same time. Because the Wyrm is grabbing six puzzles from the Museum rounds, this means that you would need to set the Wyrm at the same time as the Museum rounds. In addition, because the submeta answers are feeders to the next submeta, you also have to set the Wyrm submeta answers at the same time. However, because Wyrm feeders are paired for the metameta, every time you set a Wyrm feeder, you have to set another one at the same time. Is this possible? Yes. Is it a lot? Yes. Is it too much? I would say yes. I certainly wouldn’t want to do that.
However, you run into a problem if you don’t do the top-down approach - at some point you’re going to have a group of answers that you need to write a metapuzzle for, instead of the other way around. This is… not great. There is a game among some folks in the puzzling community called “Spaghetti”, where one person comes up with five random words and challenges people to find the answer to the puzzle, even though they are literally just five random words. If you scroll through past Spaghetti games, there are some incredibly clever finds there that make you wonder if the words were really random. Crucially, though, you are allowed to add a sixth answer of your choice, and you can make your answer to the meta anything. Avoiding the top-down approach here means that you are writing a metapuzzle given a (partial?) set of feeders and a set answer, which means you have to Spaghetti without any of the flexibility that makes Spaghetti doable. You’re not going to close metapuzzles this way.
However, even with the top-down approach to the structure, you are still going to run into some issues with writing a meta for the puzzles instead of puzzles for the meta. While this isn’t great, not every metapuzzle needs to be a home run, especially when it’s supporting something else. However, because this steals puzzles from the Museum, that means that there’s also the chicken & egg problem between the Museum and Wyrm, which is going to hurt one of them. All of this adds up to a lot of metas that just aren’t close because of how they’re forced to be constructed.
Collage
I have already made my disdain for word webs quite clear. Do I think Collage did everything it needed to? Yes. Do I think it did it well? No. It suffers from a serious case of Fridge Logic.
First, let’s deal with the word web. I think one of the reasons why the word web genre is well regarded is fondness for, or nostalgia for, Funny Farm. Look, that may have been a fun game, but it’s not a great mechanic for a puzzle hunt-style puzzle. “Guess words vaguely relating to a theme” doesn’t give you much to work with, as you’re not really deducing anything. It’s not bad as the start to a puzzle, but ideally there should be something deeper going on. I’m still not a fan of Major Monster Mash (from Puzzle University), but at least it has the mechanic that all of the meta answers and feeder answers from the main section of the hunt are in the web, so you have a goal, even if it’s still a little vague how you get there. The Cracked Crystal subpuzzle of Endless Practice restricts the web to compound words or common two-word phrases, which makes filling in the grid less of a guessing game.
I dunno - this may be my “old woman yells at cloud” moment, but I feel like without something additional, solving a word web in a puzzle hunt is like solving a themeless crossword in a puzzle hunt. Without something extra, it’s just out of place.
However, the fridge logic hits even worse. This puzzle is supposed to be solved earlier in the hunt as a feeder puzzle without having any additional information, but also as a metapuzzle that you can backsolve to get the rest of the feeder answers, while pretending to be a metapuzzle that could be forward solved if the Wyrm Round 4 puzzles actually existed. However, if Collage can be solved as a feeder puzzle, then it’s a really crappy metapuzzle. Also, are you really backsolving the Round 4 answers? You’re not doing any sort of meaningful backsolving. How did you get TRIFORCE? You found a differently-colored word and put it in the answer checker. How did you get MONOPOLY? You found a differently-colored word and put it in the answer checker. Collage is just a feeder puzzle that you get 7 answers out of. There’s no meaningful backsolving here.
break;
The MIT Mystery Hunt is a really weird puzzle hunt compared to other hunts. It is a puzzle hunt where most people do not expect to finish. Tons of puzzle hunters throw themselves at the Hunt every MLK weekend not because they expect to finish, but because they expect to solve a bunch of challenging puzzles that they can’t get anywhere else with a bunch of their friends. I will admit that this is very wild to me. I want to be on a team that finishes Mystery Hunt every year - the ends of Mystery Hunts are cool and I want to see them. Also, there are all sorts of discussions about the difficulty of Hunt and how long it should take to find the coin and how long HQ should be open for and how many teams should be finishing. But the fact is, the majority of teams probably won’t be completing the Hunt, and writers have to account for that.
This means that you can’t just account for the experience of teams who solve the whole Hunt. You also have to account for the experience of people who get partway and then stop. Which groups you target the Hunt at and how to give them the best experience possible is a whole different blog post. However, what I do want to talk about is the teams who only solve part of Wyrm. What is their experience?
Let’s imagine someone who starts solving Wyrm, gets to Round 3, and then the Mystery Hunt ends without them having solved Lost at Sea. What has their experience been?
First of all, the round naturally bottlenecks itself. Each round doesn’t unlock until you solve the previous round’s metapuzzle. This means that if you are stuck on a metapuzzle, that’s the only puzzle in the round for a while. This can cause frustrations, especially depending on the team size and the current puzzle radius.
Second of all, this round stands out from the other AI rounds as the round that doesn’t stand out. The other rounds all forced you to interact with their gimmick in some way. The two gimmicks that the Wyrm has are the telescoping nature and the fact that its first-round puzzles are stolen from somewhere else in the Hunt. The stolen puzzles matter for like two minutes and then fade into irrelevance. The telescoping nature of the rounds doesn’t actually affect solving in any way. Because of the bottlenecking, you can only solve in the direction you are told to, so new rounds that open are just regular rounds where you happen to have one of the answers already.
In short, this is a worse ⊥IW.nano.
At first glance, the two rounds are fairly similar. Both use a telescoping structure where the answer to a meta is a feeder to another round, and both have bottlenecks because of the sequential nature of the rounds. The difference is that ⊥IW.nano worked backwards. You weren’t just solving puzzles as normal; you had to figure out how the meta worked, then use backsolving logic to get the answer you needed to progress. Even if you didn’t get through the whole round, the very presentation of the first part forces the team to grapple with the fact that the metapuzzle is already solved. Whether this was a good idea or whether it was implemented correctly is a different question, but the backwards solving is the heart of the round and why the structure worked. For everything that this person sees of the Wyrm, they only see forward solving, which means that they see ⊥IW.nano but without the heart of that round.
Granted, you don’t want to just repeat the gimmick from a previous hunt. That’s totally fair. The point here isn’t that the Wyrm should have followed the gimmick of ⊥IW.nano, but that it has all the same issues without the interesting bits, and Wyrm’s interesting bits all come after the point where this person would have stopped. If this had happened in ⊥IW.nano, that person still sees the thing that makes that round cool. If this had happened in any other AI round, that person still sees the thing that makes that round cool. In Wyrm, it just sucks.
Of course, one might take a look at the stats and see that there aren’t a lot of teams where this situation applies. The Scheme was solved by 16 teams and Lost at Sea was solved by 13 teams, so this should only apply to 3 teams, right? No - many people on the 13 teams that solved Lost at Sea never got to see the last round. They might have chosen to stop hunting by then, been forced to stop, or simply been solving something else while the rest of their team finished the Wyrmhole. They didn’t get to experience the looping structure for themselves. Sure, not everyone on a team gets to participate in solving every metapuzzle, but the entire team feels the restrictions. Give them something interesting in exchange for the barriers that are being thrown up.
Fixing the Wyrmhole
Obviously, I have a lot of opinions about the Wyrmhole, especially places where it falls short. Of course, it’s easy to tear something down; it’s harder to fix things. How would I fix this?
Let’s keep a couple things in mind as I do this:
- I am making these suggestions after 11 months of thinking about it. teammate didn’t have that luxury.
- I am making these suggestions after perhaps the biggest testsolving session of all - the Hunt itself. teammate didn’t have that luxury.
- I don’t have to convince anyone else that these suggestions are necessary. teammate had to work as a team, and ideas had to go through editors, hunt admin, tech team, etc.
- This is first draft quality, and I don’t have to go any further if I don’t want to. teammate had to produce something that was ready for solvers.
I still think this is useful as an exercise - otherwise I wouldn’t be putting it here - but keep in mind that I have it easier than teammate and let that be the lens through which you judge this.
Here are my goals for this edit:
- I want to keep the metameta - I may have to change the answers, but the idea is honestly pretty cool. The real/imaginary aha is awesome, and I like the overall mechanic allowing for flexibility in a round that is going to constrain its answers.
- I want to keep the loop aha as much as I can. It’s clear that this was what the round was built around, and I want to respect that. I want to make the best version of that idea.
- Try to remove the fridge logic from Collage (or the puzzle that replaces it).
- Reduce the amount of interconnectivity to release the pressure on some puzzles.
- See if we can make the intermediate metapuzzles at least a bit more interesting.
- Make the round feel special as you’re solving it, not just at the end.
That’s a lot of goals. I’d explain them all, but I feel like I already did that in the entire rest of this blog post, so let’s jump straight in.
Theming
Two of the goals pull against each other here. I want to keep the loop aha, which means that the loop won’t be revealed until the end of the round. However, I want to make the round feel special as you’re solving it, not just at the end. This means that I need something else to make the round feel special. The answer comes from leaning into something else that teammate did. I like teammate’s shenanigans with the beginning where they stole puzzles from other rounds. There is a whole debate about the ethics of AI, their tendency to hallucinate, and the fact that they don’t create anything of their own, they just create things based on the text they’ve read before. Let’s use this here. The answer to our problems is plagiarism.
Let’s talk about how our new theme plays out in the round.
- Round 1, Wyrm just steals puzzles from the Museum rounds, but makes small changes to them to make them extract a different answer. These puzzles should be really straightforward if you solved the previous ones. Possible candidates for this include:
- Round 2, Wyrm continues stealing puzzles from the Museum rounds, but this time, they are the “evolved” versions of puzzles, much like the Pokémon round from 2018. These puzzles should use similar mechanics as the previous ones, but include a twist to make them harder. Possible candidates for this include:
- Apples Plus Bananas - Honestly, I think that this could’ve been in a later round, especially since it involved coding to get the answer. This could’ve been the evolved version, and an easier version that has similar ideas would work in the Museum.
- G|R|E|A|T W|H|A|L|E S|O|N|G - There’s something interesting here that could be expanded on. Something in my brain tells me “make this three-dimensional.” Do I know how that would work? No. But it feels like there’s something there.
- Interpretive Art - The leveled up version would involve putting entire sentences in each box.
- Weaver - I feel like this is another puzzle that could be split up into two puzzles. Put the PEACEKEEPERS answer earlier in the process, people think that they’re done, then surprise! You need to use them again.
- Round 3, Wyrm discovers the MIT Mystery Hunt Hunts by Year page and steals puzzles from other years. This is the chance to put in a bunch of sequel puzzles! Sequel puzzles are fun when done correctly. Possible candidates for this include, well, a lot of beloved puzzles from previous years.
- Round 4, Wyrm tries to write a bunch of puzzles stealing from all over the internet, but all that comes out is “As a Learned Language Model, I cannot…” with a different excuse for each puzzle. In the bottom right corner, Wyrm says “Sorry! I thought I could write a puzzle about __, but it turns out I can’t. This isn’t solvable.”
This theme has the advantage that solvers are forced to interact with it - you can’t solve a puzzle without dealing with the fact that they’re “plagiarized”. It gives something memorable to think about, and I’m sure the art department could come up with something interesting to make those puzzles distinct.
It’s also worth saying that I’ve provided some examples of puzzles from the Museum rounds that would work for Rounds 1 & 2, but obviously if you tell people about this ahead of time, they can make puzzles that are designed to do this.
None of the individual rounds are unprecedented. Changing the extraction of a previously solved puzzle was the central mechanic of Dory, from 2015. Evolving a puzzle is from the Pokémon round (as mentioned earlier), and sequel puzzles are done all the time.
Changing Collage
Obviously, I’m not a fan of Collage, but with this set-up, we no longer have the issue that the answer to a feeder puzzle has to be the same as the answer to the metapuzzle, so we have more flexibility, and we’re going to use that to make things better.
However, first we need to change which puzzle is the hidden metapuzzle. Instead of Collage, we’re going to use Natural Transformation. This will necessitate some changing of the puzzle and possibly adding some more transformation types, but we’re going to keep the same idea for the puzzle. When it comes time to do the loop aha, a couple things need to be true:
- The new version of Natural Transformation can’t be forward solved, but can be backsolved from the meta.
- Once you know the answer to Natural Transformation, you can start trying to backsolve the words that must have been fed into the final diagram, which is hard, but doable.
- This is made easier when you unlock the final round. Each of the puzzles has the text “I cannot write a puzzle about…”, and the thing after the word “about” is a clue to one of the answer words. This should both help clue which answer goes with which puzzle and help pin down the answer words if they turn out to be ambiguous.
This obviously needs testing - but it’s enough that we could actually start writing the puzzle.
I’ve described the replacement for Collage already, but there are three other metapuzzles that have to be written. I want these metapuzzles to either be more interesting, give light constraints on the answers so that they don’t feel so far, or ideally, both.
I’m going to start by focusing on the replacements for The Scheme and Lost at Sea right now. We’ll make the Legend our dumping ground, and honestly if that’s our dumping ground, the triangle meta works. It’s not amazing, but it fits everything we wanted.
Round 2’s theme is “evolved” forms of previous puzzles, so my first instinct is to look at the Museum metapuzzles and see what goes well with being evolved. Here are my thoughts on each of those metapuzzles:
- Artistic Vision is slightly off of Rubik’s Cube colors. Mapping Rubik’s cubelets to the alphabet is always fun. Even if the Rubik’s Cube doesn’t work out, I feel like things that are known for exactly three colors might be interesting. It also occurs to me that if yellow/green is swapped, these are the Magic: The Gathering colors.
- Nuclear Words feels like there could be other colors of atoms that define different properties. For example, you could also have yellow/green for vowel/consonant, and then B could either be blue or green any time it shows up. I would make sure I am physically far away from the meta’s authors before suggesting this, though.
- A Conspiracy Network potentially has something with the different hash methods, but I’m not excited by it.
- I’m going to avoid The Cafe and The King. I can think of interesting evolved versions of both, but all of those involve putting a bunch of restrictions on the answers.
In Round 3, we’re stealing puzzles from all throughout the MIT Mystery Hunt’s history. I’ll be honest here - I came up with an amazing idea for a metapuzzle. However, it’s good enough (and fully formed enough) that I don’t want to share it here. But let’s take a look at some candidates for a sequel metapuzzle looking at the history of the Hunt.
- From 2022, any of the Ministry metas are candidates here. Lewis if you’re daring, Barker if you’re not.
- From 2021, Next House seems like it could have a spin on it, and Next House just sounds appropriate for a sequel puzzle.
- From 2018, Fear seems like it would be good fodder for a sequel.
- From 2013, Highlight Reel feels like another one that could be updated.
Am I actually writing these submetapuzzles? Honestly this blog post is 26 pages long in Google Docs already and I need to write for actual puzzle projects I’m getting paid for or have already committed to, so I don’t have the time. But I have enough that I could start diving into the weeds if I wanted to.
Am I promising that this is perfect? Absolutely not. But this is definitely first draft quality and is on its way towards being better than the current iteration of Wyrm.
Looping it All Together
It’s been 11 months since the 2023 MIT Mystery Hunt. It’s been 8 months since I started this series. It’s been 4 months since the last installment of this series. I think it’s safe to say that Wyrm has been living rent-free in my head. I’ve been planning this article since I started the series, and the outline has shifted a lot from where it started back in March. Why have I spent this long and this much headspace on this?
The 2023 Hunt was frustrating because of the difficulty, but I was having fun because of the difficulty. As I mentioned in my Hunt recap, I kinda like those overwhelming Hunts when they happen occasionally. I loved the AI rounds concept, I loved the story, I loved the art, I loved so much about this Hunt. However, much like a surprise tomato in a Caesar Salad Wrap, the metapuzzles were leaving a bad taste in my mouth. I had… stronger words during the Hunt, but I couldn’t come up with more specific words. They were not to my liking, but in a way that caused me to look deeper into myself, determine what parts of how I felt were my taste, and what parts were actually bad design.
The original outline of this article (written when I was angrier) ended with the line “Maybe Wyrm was the one AI we shouldn’t have plugged in.” However, after months of contemplation, that’s not really how I feel. First of all, it’s too mean, and if there’s one thing I hope we all agree on, it’s that teammate doesn’t need more anger directed at them. Second, I’m glad that teammate wrote and published this round. It wasn’t broken - everything worked modulo the general editing issues. It was bad, but it was bad in a new and interesting way that caused self-reflection. As an educator, I am happy when my students make mistakes and learn from them, and I encourage people to share their mistakes so that others can learn from them. What kind of person would I be if I didn’t extend that to others outside of the classroom?
If there was one thing that I learned while writing the 2022 Hunt, it’s that writing the Hunt is consuming work. Writing the Hunt is not just about the puzzles - it’s pouring your heart and soul into a huge creative project that has meaning far and above the puzzles themselves. Who am I to say that Wyrm shouldn’t have existed? The Wyrm is a major part of the reason why I started this Blog in the first place. The Wyrm spurred me to think deeply about puzzle hunt concepts that I hadn’t heard other people mention before. In a small way, the Wyrm kinda changed my life.
So thank you teammate for writing the Wyrm. Perhaps it wasn’t for the reason you intended, but I enjoyed the experience.
–
Cute Mage