saylornotes

The Blog of Chris Saylor

Building a Chess bot for Slack

August 23, 2018 engineering Chris Saylor

With Atlassian’s announcement suspending development of Stride and dropping support for Hipchat in favor of Slack, I decided that the time was right to learn and experiment with Slack integrations. Any time I take on learning some new technology, I always make a project that uses many of the features of that technology. The project needs to be small enough that I can complete it relatively quickly, but deep enough to flex the technology I am learning. Recently, I’ve been getting back into Chess and I wondered if I could make an interactive Chess game inside of Slack’s platform.

Learn the game

The key to any project is that you need to know the domain of what you are trying to build. How well did I really know the rules of Chess? I knew the piece movements, king checking, castling, and checkmate. What I didn’t know:

  • You can’t castle while the king is in check, if castling would put the king in check, or even if the king would pass through a square that could be attacked.

The highlighted squares are where the king would go if castling on either side. The king-side castle (O-O) is not valid because the queen would check, and the queen-side castle (O-O-O) is not valid because the king would have to “move through” the endangered square D1.

  • Various automatic draws (stalemate, threefold repetition, fifty-move rule, etc.)
  • There are a couple of notations for representing a move.
  • There are a couple of ways to serialize a game (FEN/PGN).
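The castling restriction from the first bullet can be sketched in a few lines of Go. This is only an illustration of the rule, not code from the bot: `CanCastle` and its parameters are hypothetical names, and the attacked squares are my reading of the example position (g1 and d1 under attack). A real implementation would also check that the king and rook have not moved and that the squares between them are empty.

```go
package main

import "fmt"

// CanCastle reports whether the king may travel along kingPath, which lists
// the king's starting square, the square it crosses, and its destination.
// attacked holds the squares the opponent currently attacks.
// Only the attack rule is checked here (hypothetical helper).
func CanCastle(kingPath []string, attacked map[string]bool) bool {
	for _, square := range kingPath {
		if attacked[square] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed attacked squares for the example position.
	attacked := map[string]bool{"g1": true, "d1": true}

	// King-side (O-O): the destination g1 is attacked.
	fmt.Println(CanCastle([]string{"e1", "f1", "g1"}, attacked))
	// Queen-side (O-O-O): the king would pass through d1.
	fmt.Println(CanCastle([]string{"e1", "d1", "c1"}, attacked))
}
```

Listing the starting square in the path covers the "can’t castle while in check" case with the same loop.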

Before I even looked at Slack, I learned all the rules in order to know what I needed to implement.

Design Phase

What does a Chess game inside a team-collaboration chat tool look like? How do I render the board? How do players communicate moves? The game needed to do the following things in order to be successfully playable:

  • Allow players to challenge each other and accept or reject those challenges.
  • Display the game board.
  • Notify the player that it is their turn (since this is asynchronous communication).
  • Contain games to message threads (so subsequent board renders and player moves don’t clutter up the channel) and also encourage some friendly banter during the game.
  • Communicate moves to the bot.
  • Display the winner.

To start the game off, I needed a way for a player to challenge another player. In the channel where the chessbot is available, a player would mention the chessbot and “challenge” another player by mention.

@ChessBot challenge @cjsaylor

The bot would issue a challenge via IM and send an interactive message.

The bot would respond to the interactive message in order to remove the buttons.

Do I draw the board with ASCII symbols or render the board as an image? ASCII is the easiest, but also the dullest to look at. It also lacks certain features that images could support: highlighting the previous move, highlighting a king that is in check, etc. The image was the better option, and Slack supports displaying images by way of message “attachments”.

The only viable way of communicating moves is by sending text to the bot. There are a couple of notations that could be used, but I chose long algebraic notation (often called coordinate notation) for its ease of use, since a move is purely a pair of grid positions (d2d4, for example). With standard algebraic notation (SAN), players would have to know how to express checks, captures, and castling (O-O, O-O-O) with no way to know if it would work until after sending the message.
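To make the grid-position format concrete, here is a minimal sketch of turning text like d2d4 into origin and destination squares. The `Move` type and `ParseMove` function are hypothetical names for illustration, and the optional trailing promotion letter is my assumption rather than something the bot is known to accept. The sketch only checks syntax; whether the move is legal is a question for the chess library.

```go
package main

import (
	"fmt"
	"regexp"
)

// Move is a hypothetical type holding the origin and destination squares
// plus an optional promotion piece letter (for example "q").
type Move struct {
	From, To, Promotion string
}

// movePattern matches two squares (file a-h, rank 1-8) followed by an
// optional promotion letter, e.g. "d2d4" or "e7e8q".
var movePattern = regexp.MustCompile(`^([a-h][1-8])([a-h][1-8])([qrbn])?$`)

// ParseMove converts text such as "d2d4" into a Move, or returns an error
// when the text is not a well-formed grid move.
func ParseMove(text string) (Move, error) {
	m := movePattern.FindStringSubmatch(text)
	if m == nil {
		return Move{}, fmt.Errorf("invalid move %q", text)
	}
	return Move{From: m[1], To: m[2], Promotion: m[3]}, nil
}

func main() {
	mv, err := ParseMove("d2d4")
	fmt.Println(mv.From, mv.To, err)
}
```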

Implementation

The first part I worked on was rendering the board. I built a stand-alone Go library called chessimage that could be used to render board states as images. Since Slack displays the board via image_url, I exposed the renderer through a webserver endpoint. I waffled back and forth on whether the endpoint should take a FEN serialization that explicitly tells the webserver what to render, or a game ID used to look up the serialized game and render that. Once I implemented the game ID lookup, it turned out to have one fatal flaw: it could only render the current state, and no previous states of the game. If the player refreshed the browser page or looked at it on another device, then all previous renderings of the board would look like the current state of the game. Because of this, I went back to rendering from a FEN string, along with a signature so the endpoint can’t be used generically outside of the chessbot service.
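One common way to sign a FEN string for such an endpoint is an HMAC, sketched below. I am assuming HMAC-SHA256 here for illustration; the scheme the bot actually uses may differ. The `/board` path, the `fen` and `signature` query parameter names, and the example host are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// Sign returns a hex-encoded HMAC-SHA256 of the FEN string under key.
func Sign(fen string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(fen))
	return hex.EncodeToString(mac.Sum(nil))
}

// Verify reports whether signature is valid for fen under key.
// hmac.Equal compares in constant time.
func Verify(fen, signature string, key []byte) bool {
	return hmac.Equal([]byte(Sign(fen, key)), []byte(signature))
}

// BoardURL builds an image URL carrying the FEN and its signature.
// The path and parameter names are assumptions for this sketch.
func BoardURL(base, fen string, key []byte) string {
	q := url.Values{}
	q.Set("fen", fen)
	q.Set("signature", Sign(fen, key))
	return base + "/board?" + q.Encode()
}

func main() {
	key := []byte("example-secret")
	fen := "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq d3 0 1"
	fmt.Println(BoardURL("https://chessbot.example.com", fen, key))
	fmt.Println(Verify(fen, Sign(fen, key), key))
}
```

Because every URL carries its own board state, an old message keeps showing the position it was sent with, and the server can refuse to render anything it did not sign itself.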

On the subject of game ID, it turns out that choosing to contain all the messaging in a thread gave us an unintentional but very useful benefit: the thread ID could also serve as a game ID. This makes retrieving the game state when a player makes a move trivial and also allows a player to have more than one game going at a time in the same channel. At first, I implemented a very simple memory store for the game state. However, if the server needed to be restarted, everyone’s game would be lost, so after I stabilized the storage interface, I implemented a SQLite data store to persist games between server restarts.
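The shape of such a storage interface might look like the sketch below. The `GameStore` interface, its method names, and the `MemoryStore` type are hypothetical and not taken from the chessbot source. I am also assuming the game is stored as a serialized string keyed by the Slack thread ID, and the thread ID value in `main` is made up.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned when no game exists for a thread.
var ErrNotFound = errors.New("game not found")

// GameStore is a hypothetical storage interface. Games are keyed by the
// Slack thread ID and stored in serialized form.
type GameStore interface {
	Save(threadID, serialized string) error
	Load(threadID string) (string, error)
}

// MemoryStore keeps games in a map; everything is lost on restart.
type MemoryStore struct {
	mu    sync.RWMutex
	games map[string]string
}

// NewMemoryStore returns an empty in-memory store.
func NewMemoryStore() *MemoryStore {
	return &MemoryStore{games: make(map[string]string)}
}

// Save stores or overwrites the game for threadID.
func (s *MemoryStore) Save(threadID, serialized string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.games[threadID] = serialized
	return nil
}

// Load returns the game for threadID, or ErrNotFound.
func (s *MemoryStore) Load(threadID string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	game, ok := s.games[threadID]
	if !ok {
		return "", ErrNotFound
	}
	return game, nil
}

func main() {
	var store GameStore = NewMemoryStore()
	store.Save("1534987654.000100", "1. d4 d5")
	game, err := store.Load("1534987654.000100")
	fmt.Println(game, err)
}
```

A persistent store only has to satisfy the same two methods, so the rest of the bot never needs to know which backend is in use.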

For the game logic, I used an established library for validation and serialization. To store the game in SQLite, I serialized the game state into portable game notation (PGN). This gives us a complete history of all moves of a game. I would then feed in player moves from the app_mention events and then send a message to the user with a re-rendering of the board.

All the communication is implemented through Slack’s event callbacks. The only event the bot listens to is the app_mention event. To do different commands like help, challenge and regular moves, I parse the messages with regular expressions.
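As a rough sketch of the regular-expression approach, the code below classifies a mention into a help, challenge, or move command. The patterns and the `Command` type are hypothetical, not the bot’s actual expressions. I am assuming the event text arrives with user mentions encoded as `<@USERID>`, with the bot’s own mention first.

```go
package main

import (
	"fmt"
	"regexp"
)

// Command is a hypothetical parsed command: its name plus one argument
// (the challenged user's ID, or the move text).
type Command struct {
	Name, Arg string
}

// Each pattern expects the bot mention first, then the command body.
var (
	challengePattern = regexp.MustCompile(`^\s*<@[A-Z0-9]+>\s+challenge\s+<@([A-Z0-9]+)>\s*$`)
	helpPattern      = regexp.MustCompile(`^\s*<@[A-Z0-9]+>\s+help\s*$`)
	movePattern      = regexp.MustCompile(`^\s*<@[A-Z0-9]+>\s+([a-h][1-8][a-h][1-8][qrbn]?)\s*$`)
)

// ParseCommand classifies the text of a mention. The boolean is false
// when the text matches none of the known commands.
func ParseCommand(text string) (Command, bool) {
	if m := challengePattern.FindStringSubmatch(text); m != nil {
		return Command{Name: "challenge", Arg: m[1]}, true
	}
	if helpPattern.MatchString(text) {
		return Command{Name: "help"}, true
	}
	if m := movePattern.FindStringSubmatch(text); m != nil {
		return Command{Name: "move", Arg: m[1]}, true
	}
	return Command{}, false
}

func main() {
	for _, text := range []string{
		"<@UBOT123> challenge <@U456ABC>",
		"<@UBOT123> d2d4",
		"<@UBOT123> help",
	} {
		cmd, ok := ParseCommand(text)
		fmt.Println(cmd.Name, cmd.Arg, ok)
	}
}
```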

Testing complications

One of the first things I did while implementing the game logic was to create a simple REPL that took the same input that Slack would, but without the long and complicated feedback loop of Slack webhooks. This was tremendously helpful as I could test the game loops, inputs, player setup, and outcomes.
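A REPL like this can be very small. The sketch below is an assumption about its general shape, not the bot’s actual REPL: the `Handler` type stands in for the real game logic, `echoHandler` is a placeholder, and `Transcript` is a hypothetical helper that runs canned input so the loop can be exercised without a terminal.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Handler is a hypothetical hook: it receives one line of input, exactly
// as a chat message would deliver it, and returns the bot's reply.
type Handler func(line string) string

// RunREPL reads commands line by line from in and writes each reply to
// out. It stops at end of input or when it reads "quit".
func RunREPL(in io.Reader, out io.Writer, handle Handler) {
	scanner := bufio.NewScanner(in)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "quit" {
			return
		}
		if line == "" {
			continue // ignore blank lines
		}
		fmt.Fprintln(out, handle(line))
	}
}

// Transcript runs the REPL over canned input and returns everything it
// printed, which is handy for automated tests.
func Transcript(input string, handle Handler) string {
	var out strings.Builder
	RunREPL(strings.NewReader(input), &out, handle)
	return out.String()
}

// echoHandler is a placeholder for the real game logic.
func echoHandler(line string) string {
	return "received: " + line
}

func main() {
	// For interactive use, call RunREPL with os.Stdin and os.Stdout instead.
	fmt.Print(Transcript("d2d4\nd7d5\nquit\n", echoHandler))
}
```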

Of course, the Slack integration would eventually need to be tested. I used ngrok to expose my local Go webserver to Slack in order to receive and react to all the events and message interactions. This was a tedious process, as the hostname would change, which required updating the Slack configuration for the app. For ease of development, I would always challenge myself; however, this left two bugs hidden during development:

  1. Displaying the current player’s turn was backwards. (fixed in e3ec7a14)
  2. Other players could issue move commands when it was not their turn. Indeed, anyone mentioning the bot in the channel could move on anyone’s behalf. (fixed in 283d132c)
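Both bugs come down to checking who is allowed to move, and the sketch below shows one way to express that check. The `Game` type and its fields are hypothetical and much simpler than the bot’s real game state. It also shows why challenging myself hid the problem: when both players share one ID, the check can never fail.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrNotYourTurn = errors.New("it is not your turn")
	ErrNotAPlayer  = errors.New("you are not a player in this game")
)

// Game is a hypothetical, minimal view of a game: the two players'
// Slack user IDs and whose turn it is.
type Game struct {
	WhiteID, BlackID string
	WhiteToMove      bool
}

// CurrentPlayer returns the user ID of the player whose turn it is.
func (g Game) CurrentPlayer() string {
	if g.WhiteToMove {
		return g.WhiteID
	}
	return g.BlackID
}

// Authorize returns nil only when userID belongs to the player to move.
func (g Game) Authorize(userID string) error {
	switch userID {
	case g.CurrentPlayer():
		return nil
	case g.WhiteID, g.BlackID:
		return ErrNotYourTurn
	default:
		return ErrNotAPlayer
	}
}

func main() {
	g := Game{WhiteID: "U111", BlackID: "U222", WhiteToMove: true}
	fmt.Println(g.Authorize("U111"))
	fmt.Println(g.Authorize("U222"))
	fmt.Println(g.Authorize("U333"))

	// A self-challenge always passes, so it cannot reveal either bug.
	self := Game{WhiteID: "U111", BlackID: "U111", WhiteToMove: false}
	fmt.Println(self.Authorize("U111"))
}
```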

Coincidentally enough, Slack’s app distribution checklist specifically asks for a clean workspace with two users, which I would guess would catch these sorts of quirks. Speaking of app distribution:

Deployment

Up until this point, we’ve been using a manually installed app token in our Slack development workspace. In order to be installed on other workspaces, however, the manual bot token needed to be replaced with an OAuth workflow.

The only thing needed was an endpoint on our webserver that would accept an authorization code and exchange it for an authentication token (specifically the bot authentication token). Our app does not request any special scope permissions, since everything is driven through Slack’s bot interface, so we discard the authentication token from the user and only keep the bot token. The response includes a team ID, so the bot token can be stored per team in a key/value store. Since we have a very simple case, we simply store this in SQLite along with all the other game data.

Following the same pattern as our challenge and game data, we wrote both an in-memory and a SQLite implementation of a common interface.
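The sketch below covers only the last step: pulling the team ID and bot token out of the exchange response and discarding the user token. The JSON field names follow my understanding of Slack’s `oauth.access` response at the time (`team_id` and a nested `bot.bot_access_token`); treat them as assumptions and check the current API documentation. `ParseBotToken` and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// oauthResponse mirrors the assumed response fields we care about.
// The user's access_token is deliberately not decoded, so it is dropped.
type oauthResponse struct {
	OK     bool   `json:"ok"`
	Error  string `json:"error"`
	TeamID string `json:"team_id"`
	Bot    struct {
		AccessToken string `json:"bot_access_token"`
	} `json:"bot"`
}

// ParseBotToken extracts the team ID and bot token from a response body.
func ParseBotToken(body []byte) (string, string, error) {
	var resp oauthResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", "", fmt.Errorf("decoding oauth response: %w", err)
	}
	if !resp.OK {
		return "", "", fmt.Errorf("slack returned an error: %s", resp.Error)
	}
	if resp.TeamID == "" || resp.Bot.AccessToken == "" {
		return "", "", errors.New("response is missing team ID or bot token")
	}
	return resp.TeamID, resp.Bot.AccessToken, nil
}

func main() {
	// Sample body with made-up values.
	body := []byte(`{"ok":true,"access_token":"xoxp-user","team_id":"T0001",
		"bot":{"bot_user_id":"UBOT","bot_access_token":"xoxb-bot"}}`)

	tokens := map[string]string{} // stand-in for the per-team token store
	teamID, botToken, err := ParseBotToken(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	tokens[teamID] = botToken
	fmt.Println(len(tokens), teamID)
}
```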

A note on security

As with any side project, this is hosted alongside other applications on a VPS. Even though Go applications are very simple to run in other environments (just change the compile target), all the other applications on the server run in Docker containers since they are a mixed bag of technologies. Since we’re storing some sensitive access tokens on our server, it’s important to be mindful of access to it. I wrapped the web server in a Docker image and created an encrypted data volume to house the SQLite file, so that only the Docker runtime user has read and write privileges to that data file on the host machine. An Nginx web server container is exposed through the firewall, and a private Docker network is used to proxy requests from Nginx to the Go webserver.

To improve security (or at least the ease of encrypting the data), I could add HashiCorp’s Vault as a container that is only networked to the Go webserver container and have it store the access tokens in a key-value secrets engine.

Retrospective

While developing this bot, I found that Slack provides a great platform with a lot of features that enabled this game to function relatively well. However, it is not without its warts.

The API, while pretty well documented, is under constant change. Finding information on particular implementation details, or answers to general questions about the Slack platform, is challenging because almost all of it is out of date. What makes matters worse is that the deprecated APIs have features that aren’t yet fully supported on whatever new API Slack is promoting.

Slack’s RTM mechanism would be awesome to use for this bot to decrease lag and avoid so much serializing and deserializing of the game; however, it doesn’t yet have all the messaging abilities of the regular Events API. It is also a bit clunky to know when a user directly mentions the bot, since you have to parse it out of the text of the incoming messages yourself. Finally, we had to implement a web server for the authentication and image rendering anyway, so it would be a lot of effort just to receive messages more quickly.

Future development

Implementing a DB backend like Redis for the challenge, game, and OAuth token storage would enable the bot to scale horizontally, although since it doesn’t earn money, I’d be loath to scale it up.

Alternatively, since the app is almost entirely webhook based, this would be a perfect opportunity to implement it on a function-as-a-service platform like AWS Lambda. A simple API gateway coupled with some Lambda functions would be a fantastic fit for Slack bot communication. Additionally, being on AWS would allow the use of S3 or DynamoDB for state storage. Given the low volume, this option may be next to free to host, and most of the web handling functions could be reused almost verbatim in the Lambda functions. You’d just toss the webserver part.

Another nice-to-have would be a concurrency-safe LRU cache that would allow us to store a fixed number of games in memory in order to avoid having to deserialize the game state from SQLite (or other DBs) on every move command. We could then serialize only when the LRU cache was about to dump the least used games and deserialize only when we have to go to the DB to look up the game. This would be a great mechanism to control the amount of memory pulled in for game storage, and the players that are playing the most wouldn’t burn up requests to the DB. hashicorp/golang-lru looks like a good candidate for this use case, as it also provides a callback when an item is evicted, which could be used to serialize the game and store it. The os/signal package would need to be used to listen for SIGTERM and serialize all the games in the LRU cache.
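To show the idea without pulling in a dependency, here is a small hand-rolled LRU built on the standard library’s container/list. It is a stand-in for illustration and does not reproduce the hashicorp/golang-lru API. `GameCache`, `onEvict`, and `Flush` are hypothetical names, and games are plain strings to keep the sketch short.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type entry struct{ key, value string }

// GameCache is a concurrency-safe LRU cache. onEvict runs whenever a game
// leaves the cache and is where the game would be written to the DB.
// The callback runs while the lock is held, so it must not call the cache.
type GameCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front is the most recently used entry
	items    map[string]*list.Element
	onEvict  func(key, value string)
}

// NewGameCache creates a cache holding at most capacity games (at least 1).
func NewGameCache(capacity int, onEvict func(key, value string)) *GameCache {
	return &GameCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		onEvict:  onEvict,
	}
}

// Get returns a cached game and marks it as recently used.
func (c *GameCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a game, evicting the least recently used one when full.
func (c *GameCache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		c.evict(c.order.Back())
	}
}

// Flush evicts every game, least recently used first. A SIGTERM handler
// could call this before the process exits.
func (c *GameCache) Flush() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for el := c.order.Back(); el != nil; el = c.order.Back() {
		c.evict(el)
	}
}

// evict removes one element and notifies the callback. Caller holds mu.
func (c *GameCache) evict(el *list.Element) {
	e := el.Value.(*entry)
	c.order.Remove(el)
	delete(c.items, e.key)
	if c.onEvict != nil {
		c.onEvict(e.key, e.value)
	}
}

func main() {
	cache := NewGameCache(2, func(key, value string) {
		fmt.Println("persisting", key)
	})
	cache.Put("thread-a", "1. d4")
	cache.Put("thread-b", "1. e4")
	cache.Get("thread-a")          // thread-a is now the most recently used
	cache.Put("thread-c", "1. c4") // evicts thread-b
	cache.Flush()                  // evicts the remaining games
}
```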

I also think it would be neat to integrate Stockfish as a way of playing against an AI instead of requiring another player. I initially wanted to do this, but there are so many alternative ways to play chess against a bot that don’t involve sending messages that I abandoned the idea, as it would probably never be used by anyone. It would, however, serve as a good way to test the system, since an engine could behave like a player.

Conclusion

After all this research into Chess, I’m still not very good at actually playing Chess yet. I had a lot of fun developing this bot. I’ve played a few games in Slack with friends and it works surprisingly well. Does this supplant something like Lichess? Absolutely not, but it can be a good way to keep your remote team socially engaged and connected.

All of the code for the chess bot is open source: https://github.com/cjsaylor/chessbot.

You can also add the hosted version of the bot to your workspace directly:

Add to Slack

Credits

  • fogleman/gg, which made drawing the board graphics a snap.
  • notnil/chess for providing an excellent library for validating and serializing chess games.
  • nlopes/slack, which offered a pretty comprehensive library of all Slack events, methods, and communication channels.
  • David Lapetina for the featured image

All chessboard images were generated by the chessimage library.