“An error occurred, please try again.” Hmm – PC Load Letter? This perfectly nondescript error message came up only in Production, and only in a specific scenario. What do you do when faced with limited information and a bug you can’t reproduce locally?

This was my first significant challenge as a co-op student. Fixing it involved equal parts debugging tools, intuition, and continuous delivery. Today, I’m sharing how I reached the root of the problem and resolved it.

Default Template Error
Generic errors are the best


The Problem

The user’s Hootsuite account had over 80 Twitter accounts connected, and creating a Twitter Aggregate Report with 82 or more of them failed every time. What gives?

Kibana to the Rescue

The first step was to find out what the error actually was. We use Kibana to track errors, debug messages, and other data in production, which made this easy: I looked up the user in our Kibana installation and found an exception being logged every time the report failed to create.

This exception gets triggered at a relatively high level in our system, when a token is missing from a form submission. However, this error occurred only when attempting to create the report with 82 or more Twitter accounts.

Magic Numbers

Two key points here:

  • The error was thrown long before the user’s request reached any Analytics-specific logic
  • 82 seemed like a “magic number” for the report creation to fail with, but there was no sign of that number in the code

Magic numbers in code are always suspicious, and I was quite confident that the error could only be thrown if the token was actually missing or invalid.

Before embarking on the arduous exercise of creating 80+ test accounts on Twitter to reproduce the issue locally (where I could use Xdebug to examine the program’s runtime state), I decided to check the obvious first: does the token reach the server?

The token is there!

A peek with my browser’s developer tools quickly assured me that it was. Interestingly, the token was the very last input field in the request, after all the fields representing the selected Twitter accounts and other report parameters.

Eureka!

This is where intuition came in: the token was always the last field in the HTTP request, and after adding a “magic” number of Twitter accounts to the request, the token check failed. Though I was unable to run a debugger in production to confirm this, I figured that the input was being truncated – especially after counting the sheer number of form fields the request sent with that many Twitter accounts.

A little Google-fu – “how much input can php handle” – led me to this StackOverflow question, which revealed the max_input_vars setting in PHP. Checking our production php.ini, I found this limit set lower than the number of fields the failing request sent; each selected Twitter account added 12 fields to the request. The pieces of the puzzle all lined up!
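For illustration, here is the relevant php.ini directive. The exact value in our production config isn’t shown in this post, so the sketch below assumes PHP’s default of 1000, which happens to be consistent with the numbers above:

```ini
; php.ini — PHP silently discards any request inputs beyond this count.
; The default since PHP 5.3.9 is 1000. Assuming that value:
; 82 accounts × 12 fields = 984 inputs, so the remaining report
; parameters push the total past the limit, and the token — the
; very last field in the request — gets dropped.
max_input_vars = 1000
```

Crucially, exceeding the limit doesn’t fail the request outright; PHP just truncates the input, which is why the only symptom was a missing token.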

The Solution

While we could have raised this PHP setting, this would only have pushed the “magic” limit of Twitter accounts in the report higher. Instead, my solution to the issue was to serialize the report parameters into a string – that way, we’d always stay well clear of the max_input_vars limit no matter how many Twitter accounts a user added to a report.
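The client-side half of that change can be sketched as follows. This is a minimal illustration, not our actual code, and the function and field names (`buildReportPayload`, `params`, `token`) are hypothetical:

```javascript
// Sketch of the client-side change (names are hypothetical).
// Instead of submitting a dozen inputs per selected account, collapse
// all report parameters into a single JSON string, so the request always
// contains a handful of fields no matter how many accounts are selected.
function buildReportPayload(selectedAccountIds, reportOptions, csrfToken) {
  return {
    // One field regardless of how many accounts were selected:
    params: JSON.stringify({
      accounts: selectedAccountIds,
      options: reportOptions,
    }),
    // The token stays its own field, and can no longer be truncated away.
    token: csrfToken,
  };
}

const payload = buildReportPayload([101, 102, 103], { range: '30d' }, 'abc123');
// payload always has exactly two fields, instead of 12 per account.
```

Because the field count is now constant, the request stays far below any max_input_vars setting regardless of account count.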

Client-side, serializing to JSON is trivial. Server-side, it makes all the difference.

My first attempt at a solution parsed the serialized string with parse_str, but I quickly learned that this, too, was subject to the tyranny of max_input_vars. Fortunately, json_decode and our JSON library were not, so serializing the data as JSON worked perfectly.

Key Takeaways

  • Be suspicious of magic numbers
  • Resist the temptation to assume an error is wrong. Assume it was correctly thrown, and check whether the condition it describes is actually happening before tearing up the codebase
  • Logging errors in production is incredibly useful – it’s impossible to reproduce some things locally in a reasonable amount of time
  • Knowing exactly how your production environment is set up is key to understanding configuration-related issues
  • Continuous deployment makes it trivial to try a second solution when the first one doesn’t work

The combination of Hootsuite’s deployment pipeline, our available tools, and a critical eye for things that stand out was instrumental in quickly fixing the problem.

Special Thanks

Shoutouts to Shane Da Silva, Chris Richardson, Jake King, Bethany Foote, Noel Pullen, and Kimli Welsh for helping me put this post together, including advice, feedback, and revisions; and to Catherine Tan for showing me the ropes with Kibana. You’re all awesome.

About the Author

Peter Deltchev is a co-op student on our CoreUX team by day and a cartoon enthusiast by night. His interests include building fan sites, running conventions, and automating all the things. Follow him on Twitter @Feld0.