When you face a difficult problem while developing a product, a clear view of your application’s structure can help you understand and solve it faster. Conversely, having to read and understand poorly written code makes it harder to come up with a solution.

Using static code analysis, we integrated one tool and developed another. Both improve our developers’ day-to-day experience and, in turn, code quality and productivity. Read on to find out how we implemented these tools and how they work in our environment.

ASTs

Static code analysis means analyzing code without executing it. This concept is most commonly used to check if the source code has any errors before runtime, and it’s most often seen in compilers which can find lexical, syntactic and even some semantic mistakes.

First, we need to understand how static analysis actually works. At its core is the AST, which stands for Abstract Syntax Tree. An AST is a tree representation of the structure of the source code in which leaves represent constants or variables and inner nodes represent operators or statements.

AST Tree Representation
Image from http://vinaytech.wordpress.com
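To make the leaves-and-inner-nodes idea concrete, here is a minimal sketch of a hand-built tree for the expression `6 * 7 + 1`. The node names (`Literal`, `BinaryExpression`) borrow ESTree naming; the `toSource` and `collectLeaves` helpers are hypothetical and exist only for this illustration.

```javascript
// Hand-built AST for the expression: 6 * 7 + 1
// Inner nodes are operators, leaves are constants.
const expressionAst = {
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Literal", value: 6 },
    right: { type: "Literal", value: 7 },
  },
  right: { type: "Literal", value: 1 },
};

// Hypothetical helper: rebuilds a fully parenthesized source string
// from the tree, without ever evaluating the expression.
function toSource(node) {
  if (node.type === "Literal") {
    return String(node.value);
  }
  return (
    "(" + toSource(node.left) + " " + node.operator + " " +
    toSource(node.right) + ")"
  );
}

// Hypothetical helper: gathers the values stored in the leaves,
// left to right.
function collectLeaves(node, leaves = []) {
  if (node.type === "Literal") {
    leaves.push(node.value);
  } else {
    collectLeaves(node.left, leaves);
    collectLeaves(node.right, leaves);
  }
  return leaves;
}

console.log(toSource(expressionAst));      // ((6 * 7) + 1)
console.log(collectLeaves(expressionAst)); // [ 6, 7, 1 ]
```

Note that operator precedence is encoded by the tree’s shape: the multiplication sits deeper than the addition, so no parentheses need to be stored.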

ASTs are sometimes loosely called “parse trees” because they are often the output of a parser (usually the parsing stage of a compiler). Strictly speaking, a parse tree keeps every syntactic detail of the source, while an AST keeps only what matters for its structure. Several JavaScript parsers are actively developed, and one of the most widely used is Esprima. Esprima serves as an important building block for JavaScript language tools, from code instrumentation to editor autocompletion. The following example shows how Esprima works:

Code Example before Esprima

Esprima takes the code and returns a JSON-formatted, ESTree-compatible object that describes the program as a tree structure. Each generated node has its own properties, but the “type” property is common to all nodes.
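As a rough illustration of that shape, the object below is written out by hand to mirror the ESTree structure for the one-line program `var answer = 6 * 7;` (location data omitted). No parser is invoked here, and the `collectTypes` helper is hypothetical; it relies only on every node carrying a `type` property.

```javascript
// Hand-written ESTree-style object for: var answer = 6 * 7;
// This mirrors the documented output shape; it is not produced by a parser.
const program = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "answer" },
          init: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Literal", value: 6, raw: "6" },
            right: { type: "Literal", value: 7, raw: "7" },
          },
        },
      ],
    },
  ],
  sourceType: "script",
};

// Hypothetical helper: visits every object that has a string "type"
// property and records the types in depth-first order.
function collectTypes(node, types = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectTypes(child, types));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") {
      types.push(node.type);
    }
    Object.keys(node).forEach((key) => collectTypes(node[key], types));
  }
  return types;
}

console.log(collectTypes(program));
// [ 'Program', 'VariableDeclaration', 'VariableDeclarator',
//   'Identifier', 'BinaryExpression', 'Literal', 'Literal' ]
```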

With this information we can extract all the details of the code structure. Esprima can also provide an extra array containing all the tokens it found, which we can use to check for errors like wrong indentation, extra spaces, trailing commas, etc.

Code Example after Esprima
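A token-based check can be sketched without a parser at all. The token list below is written by hand in the `{ type, value }` shape that Esprima uses for tokens, for the source `var list = [1, 2, ];`. The `findTrailingCommas` function is a hypothetical check, not part of Esprima.

```javascript
// Hand-written token list for the source: var list = [1, 2, ];
const tokens = [
  { type: "Keyword", value: "var" },
  { type: "Identifier", value: "list" },
  { type: "Punctuator", value: "=" },
  { type: "Punctuator", value: "[" },
  { type: "Numeric", value: "1" },
  { type: "Punctuator", value: "," },
  { type: "Numeric", value: "2" },
  { type: "Punctuator", value: "," },
  { type: "Punctuator", value: "]" },
  { type: "Punctuator", value: ";" },
];

// Hypothetical check: returns the indexes of commas that are directly
// followed by a closing bracket or brace.
function findTrailingCommas(tokenList) {
  const hits = [];
  for (let i = 0; i < tokenList.length - 1; i += 1) {
    const current = tokenList[i];
    const next = tokenList[i + 1];
    const isComma = current.type === "Punctuator" && current.value === ",";
    const closesList =
      next.type === "Punctuator" && (next.value === "]" || next.value === "}");
    if (isComma && closesList) {
      hits.push(i);
    }
  }
  return hits;
}

console.log(findTrailingCommas(tokens)); // [ 7 ]
```

Checks such as indentation or extra spaces would additionally need each token’s position in the source, which this sketch leaves out.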

Project 1 – Code Linting

Now that we know what ASTs and static analysis are, we can move on to the first use case.

Our Problem

We want our code to be consistently styled according to our coding conventions as we write it, so that we can produce quality code faster.

Our Solution

The solution was to integrate ESLint into our workflow. ESLint is an open source project that aims to provide a pluggable linting utility for JavaScript. We set ESLint up to run, through a git pre-commit hook, whenever a developer tries to commit changes to their local repository. ESLint then scans all the JS files and checks them against the configured rules, such as:

  • indent – specify tab or space width for your code,
  • max-len – specify the maximum length of a line in your program,
  • quotes – specify whether backticks, double or single quotes should be used and
  • camelcase – require camel case names.

If at least one of the rules fails, the commit fails as well and an error message is displayed:

ESLint example error message

But what does ESLint have to do with ASTs?

ESLint comes with a built-in parser called Espree (an Esprima fork), which it uses to convert the code to an AST. Once the AST has been generated, it is traversed with a handy library called estraverse, and each configured rule checks the nodes for the specific behaviour it enforces and reports any errors.

Code Linting with Espree and Estraverse
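The traverse-and-check idea can be sketched in a few lines. The `traverse` function below is a minimal stand-in loosely modeled on the enter-callback style of estraverse; it is not the real library. The `checkCamelCase` rule is likewise a simplified, hypothetical version of what a camelcase rule does, and the AST is built by hand instead of being parsed.

```javascript
// Minimal stand-in for an AST walker: calls visitor.enter(node, parent)
// for every node, depth-first. NOT the real estraverse library.
function traverse(node, visitor, parent = null) {
  if (!node || typeof node.type !== "string") {
    return;
  }
  visitor.enter(node, parent);
  for (const key of Object.keys(node)) {
    const value = node[key];
    if (Array.isArray(value)) {
      value.forEach((child) => traverse(child, visitor, node));
    } else if (value && typeof value === "object") {
      traverse(value, visitor, node);
    }
  }
}

// Hypothetical camelcase-style rule: flags identifiers that contain an
// underscore between two other characters (for example user_name).
function checkCamelCase(ast) {
  const errors = [];
  traverse(ast, {
    enter(node) {
      if (node.type === "Identifier" && /[^_]_[^_]/.test(node.name)) {
        errors.push(`Identifier '${node.name}' is not in camel case.`);
      }
    },
  });
  return errors;
}

// Hand-built AST for: var user_name = firstName;
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "user_name" },
          init: { type: "Identifier", name: "firstName" },
        },
      ],
    },
  ],
};

console.log(checkCamelCase(ast));
// [ "Identifier 'user_name' is not in camel case." ]
```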

The Results

After integrating ESLint into our codebase, we ended up with the following numbers:

  • 75 configured rules,
  • 550+ code errors fixed,
  • Rule with most impact (170+ errors): max-len (limit code to 80 chars per line),
  • Lines changed: +1275 -1171.

Project 2 – Code diagram

For my second project I had to solve a problem that developers who use React often come across. As the application grows, you are less likely to remember how it’s structured and how components change from one pull request to another. We therefore wanted a way to get an overview of our app structure.

The solution we came up with was to generate a tree diagram that displays the structure of the application. A very simple example can be seen below:

Code Diagram

The diagram-generating algorithm follows some of the same steps as ESLint. We start by parsing the source code of each .jsx file into an AST. Then we traverse each AST with estraverse and search for the React components that the parent component injects (renders). Once we find the children, we traverse them recursively to build one big virtual tree with all the information we need. The virtual tree holds each component’s node name and its children. On top of that, we can add extra information such as the props a component expects and the props it passes to its children. In the end we translate this tree into DOT, a graph description language, from which the diagram is rendered as an .svg file.
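The child-discovery step might look roughly like the sketch below. The AST is a simplified, hand-built fragment that uses the JSX node names (`JSXElement`, `JSXOpeningElement`, `JSXAttribute`) with attribute values left out. The `findChildComponents` helper is hypothetical, and treating capitalized tags as components follows the usual React convention.

```javascript
// Simplified, hand-built JSX AST for:
//   <div><Header title={title} /><Footer /></div>
const jsxAst = {
  type: "JSXElement",
  openingElement: {
    type: "JSXOpeningElement",
    name: { type: "JSXIdentifier", name: "div" },
    attributes: [],
  },
  children: [
    {
      type: "JSXElement",
      openingElement: {
        type: "JSXOpeningElement",
        name: { type: "JSXIdentifier", name: "Header" },
        attributes: [
          {
            type: "JSXAttribute",
            name: { type: "JSXIdentifier", name: "title" },
          },
        ],
      },
      children: [],
    },
    {
      type: "JSXElement",
      openingElement: {
        type: "JSXOpeningElement",
        name: { type: "JSXIdentifier", name: "Footer" },
        attributes: [],
      },
      children: [],
    },
  ],
};

// Hypothetical helper: collects the components rendered in a JSX tree,
// together with the names of the props passed to each of them.
function findChildComponents(node, found = []) {
  if (!node || node.type !== "JSXElement") {
    return found;
  }
  const tagName = node.openingElement.name.name;
  // React convention: capitalized tags are components,
  // lowercase tags are plain DOM elements.
  if (/^[A-Z]/.test(tagName)) {
    const props = node.openingElement.attributes
      .filter((attr) => attr.type === "JSXAttribute")
      .map((attr) => attr.name.name);
    found.push({ name: tagName, props });
  }
  node.children.forEach((child) => findChildComponents(child, found));
  return found;
}

console.log(findChildComponents(jsxAst));
// [ { name: 'Header', props: [ 'title' ] }, { name: 'Footer', props: [] } ]
```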

Workflow to get Code Diagram with Espree Estraverse and DOT
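The last step of that workflow, turning the virtual tree into DOT, can be sketched as follows. The `{ name, children }` tree shape, the component names, and the `toDot` function are assumptions made for this example; the real script carries more information per node.

```javascript
// Hypothetical virtual tree: each node has a name and a list of children.
const appTree = {
  name: "App",
  children: [
    { name: "Header", children: [] },
    { name: "Content", children: [{ name: "Article", children: [] }] },
  ],
};

// Emits a DOT digraph with one node per component and one edge per
// parent-child relation. Component names double as node ids, so two
// components with the same name would collapse into a single node.
function toDot(root) {
  const lines = ["digraph components {"];
  (function visit(node) {
    lines.push(`  "${node.name}";`);
    node.children.forEach((child) => {
      lines.push(`  "${node.name}" -> "${child.name}";`);
      visit(child);
    });
  })(root);
  lines.push("}");
  return lines.join("\n");
}

console.log(toDot(appTree));
```

Saving the output to a file such as `diagram.dot` and running Graphviz with `dot -Tsvg diagram.dot -o diagram.svg` renders it as an .svg image.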

Once we had a clear diagram, we still wanted to see the changes that a pull request introduces. We therefore also implemented a diffing algorithm that colors the added, deleted or modified nodes and edges accordingly. An example of this behaviour can be seen below:

Code Diagram using DOT
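A much-simplified version of such a diff can be sketched on edge lists alone. The "Parent->Child" string representation, the color choices, and both function names are assumptions for illustration. Detecting modified nodes would need extra information (for example the list of changed files), which this sketch leaves out.

```javascript
// Compares two edge lists (strings of the form "Parent->Child") and
// labels every edge as unchanged, added, or deleted.
function diffEdges(baseEdges, headEdges) {
  const base = new Set(baseEdges);
  const head = new Set(headEdges);
  const result = [];
  for (const edge of head) {
    result.push({ edge, status: base.has(edge) ? "unchanged" : "added" });
  }
  for (const edge of base) {
    if (!head.has(edge)) {
      result.push({ edge, status: "deleted" });
    }
  }
  return result;
}

// Assumed color scheme for the rendered diagram.
const COLORS = { added: "green", deleted: "red", unchanged: "black" };

// Emits DOT in which every edge carries the color of its diff status.
function toColoredDot(diff) {
  const lines = ["digraph components {"];
  diff.forEach(({ edge, status }) => {
    const [from, to] = edge.split("->");
    lines.push(`  "${from}" -> "${to}" [color=${COLORS[status]}];`);
  });
  lines.push("}");
  return lines.join("\n");
}

const diff = diffEdges(
  ["App->Header", "App->Sidebar"], // base (target branch)
  ["App->Header", "App->Footer"]   // head (pull request branch)
);
console.log(toColoredDot(diff));
```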

Finally, to make the diagram easier for developers to access, we integrated the diagram-generating script into our GitHub workflow. We use a special trigger phrase that, when typed into a GitHub comment, generates the diagram and posts it on the pull request.

The script checks out the target branch (usually master) and generates the base diagram to be used later for the diff. It then checks out the source branch (the head of the pull request) again, generates a head diagram, and determines which files were modified between the two branches. Finally, it diffs the two diagrams, generates an .svg file, hashes it, uploads it to an AWS bucket, and posts it as a comment on GitHub through the GitHub comments API. The result can be seen below:

Final result of code-diagram in GitHub

Conclusions

To sum up, by integrating these tools into our workflow we obtained both a cleanly styled codebase and a way of getting an overview of what has changed in a given pull request. Together they reduce the time spent on development and on fixing styling errors after code review.

Flavius Tirnacop

About the Author

Flavius Tirnacop is a Co-op on the Analytics UI team. He loves web development but also enjoys tinkering with robots and gadgets. Follow him on Twitter @flaviusone.