Ben Joffe

JavaScript Syntax Highlighter

This is a JavaScript syntax highlighter experiment. Enter JavaScript code into the textarea below to view it with syntax highlighting.

It first builds a token list, then passes that list to a function that outputs the highlighted code. The tokeniser is designed to be as accurate as possible. For example, the code ++ is read as one operator token (increment), while +- is read as two operator tokens (plus, then minus). This is not strictly required for a syntax highlighter, but I have implemented it so that I can later extend the code to allow crushing/beautifying and/or actual parsing and execution. This level of accuracy does benefit syntax highlighting too, since tokens such as { and } can be coloured differently depending on whether they delimit a block or an object literal.
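The ++ versus +- behaviour described above is what a longest-match ("maximal munch") rule produces. The following is only a minimal sketch of that rule, not the highlighter's actual code: the operator table is deliberately incomplete, and the names OPERATORS and tokeniseOperators are hypothetical.

```javascript
// Hypothetical, incomplete operator table. Sorting by length (longest first)
// means the first entry that matches is always the longest possible operator.
const OPERATORS = [
  ">>>=", "===", "!==", ">>>", "<<=", ">>=",
  "++", "--", "==", "!=", "<=", ">=", "&&", "||", "<<", ">>",
  "+=", "-=", "*=", "/=",
  "+", "-", "*", "/", "=", "<", ">", "!"
].sort((a, b) => b.length - a.length);

// Splits a string that contains only operators and whitespace into tokens.
function tokeniseOperators(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    // Skip whitespace between operators.
    if (/\s/.test(source[i])) {
      i++;
      continue;
    }
    // Take the longest operator that starts at position i.
    const op = OPERATORS.find(candidate => source.startsWith(candidate, i));
    if (op === undefined) {
      throw new SyntaxError("Unexpected character at index " + i);
    }
    tokens.push(op);
    i += op.length;
  }
  return tokens;
}
```

With this sketch, tokeniseOperators("++") yields a single "++" token, while tokeniseOperators("+-") yields "+" followed by "-".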

Every open-source syntax highlighter that I have tried fails on at least some valid JS. Common pitfalls include:

  • Failure to recognise numbers that start with a period, eg: .01
  • Failure to recognise that the second period in the following is an operator, and not part of the number: 0.1.method(); // yes this is valid JS
  • Failure to handle multiline strings
  • Interpreting the following as containing a regular expression: 1/2/3;
  • Ending regular expression tokens prematurely, eg: /reg[/]exp/; /[/*regexp*/]/;
  • Treating some edge-case division operators as the start of a regular expression, eg: /regexp/
    /notRegexp/g;
    In this example the first line does not end with a semicolon, so the first / on the next line is a division operator, with notRegexp and g as variables.
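Several of the pitfalls above come down to three small decisions: where a number ends, whether a / starts a regular expression, and where a regular expression ends. The sketch below illustrates one way to make each decision. It is not the highlighter's actual code. The function names and the { type, value } token shape are assumptions, readNumber only covers decimal literals, and the previous-token rule is a heuristic that does not settle genuinely ambiguous cases such as a / following ) or }.

```javascript
// Reads a decimal literal starting at `start`, or returns null if there is none.
// Handles a leading period (.01) and stops before a second period (0.1.method()).
function readNumber(source, start) {
  const match = /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/.exec(source.slice(start));
  return match ? match[0] : null;
}

// Heuristic: decides whether a "/" begins a regular expression, based on the
// previous significant token (assumed shape: { type, value }, or null at the
// start of the input). After a value, "/" is treated as division.
function slashStartsRegex(prev) {
  if (!prev) return true;
  const valueTypes = ["number", "identifier", "string", "regex"];
  if (valueTypes.includes(prev.type)) return false;
  // Closing brackets usually end a value; this is a simplification.
  if ([")", "]", "}"].includes(prev.value)) return false;
  return true; // after an operator, "(", ",", "return", and so on
}

// Reads a regular expression literal whose opening "/" is at `start`.
// A "/" inside a character class [...] does not end the literal.
function readRegex(source, start) {
  let i = start + 1;
  let inClass = false;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\n") break;            // regex literals cannot span lines
    if (ch === "\\") { i += 2; continue; } // skip the escaped character
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) {
      i++;
      while (i < source.length && /[a-z]/i.test(source[i])) i++; // flags
      return source.slice(start, i);
    }
    i++;
  }
  throw new SyntaxError("Unterminated regular expression");
}
```

Under these assumptions, the / in 1/2/3 follows a number token and is treated as division, and in the two-line example above the / that opens the second line follows a regex token, so it is treated as division as well.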

I know of only one bug in my implementation: if an object literal is placed in the false branch of a ternary expression, its braces are highlighted as block-level tokens (though most other libraries don't distinguish between the two at all). Fixed!

Christian Krebbs writes in to report another bug: a regular expression that follows a variable declaration (without an assignment or semicolon) is interpreted as a division. This one is going to be particularly difficult to fix. If you find another code sequence that fails to highlight correctly, please contact me.


Note: This is not necessarily designed to fail gracefully on invalid input.