Syntax highlighting source code for the web with SilverCity

  • Published 2005-02-27

Syntax highlighting of code snippets is something I've always wanted to implement on fauskes.net. When I finally gave it a try, it turned out to be quite easy. A few hours of research, some Python magic and voilà, no more boring-looking source code listings on fauskes.net.

I wanted to generate colorized HTML from XML/HTML, CSS, PHP, LaTeX and Python source code. Writing a full-fledged, multi-language syntax highlighter is not a trivial task, and I felt no need to reinvent the wheel. Therefore I went looking for an existing Python package to do all the hard work for me. Fortunately I found SilverCity.

SilverCity

SilverCity is a Python interface to the Scintilla lexers. It provides lexical analysis for over 20 programming and markup languages. The package offers a low-level interface to the lexer, as well as higher-level classes for generating syntax-styled HTML from selected languages.

I could have used SilverCity right out of the box. However, there are some issues with the HTML markup generated by SilverCity:

  • The markup is a bit bloated
  • Whitespace and line breaks are converted to non-breaking spaces and <br> tags

SilverCity maps directly from the lexer to HTML, which is overkill for my needs. I also mark up all of my source listings with the <pre> tag, which already preserves whitespace and line breaks, so inserting extra markup for them is unnecessary.
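
The difference between the two whitespace strategies can be sketched without SilverCity at all. The snippet below is only an illustration under stated assumptions: the function names format_for_pre and format_inline are hypothetical, it uses the Python 3 standard library rather than SilverCity's own escaping, and it does not claim to reproduce SilverCity's exact output.

```python
from html import escape


def format_for_pre(text):
    """Escape HTML special characters only; whitespace is left untouched.

    Suitable when the result is placed inside <pre>...</pre>.
    """
    return escape(text.expandtabs(), quote=False)


def format_inline(text):
    """Escape, then encode spaces and newlines as explicit markup.

    This is the style of output the post wants to avoid inside <pre>.
    """
    escaped = escape(text.expandtabs(), quote=False)
    # Replace spaces first: the <br /> tag inserted next contains a space.
    return escaped.replace(" ", "&nbsp;").replace("\n", "<br />\n")


snippet = "if a < b:\n    pass\n"
pre_output = format_for_pre(snippet)
inline_output = format_inline(snippet)
```

For the same input, format_inline produces a noticeably longer string than format_for_pre, which is the bloat mentioned in the list above.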

Modifying the SilverCity HTML generators

Fortunately SilverCity comes with full source code. There is not much documentation, but it is not too difficult to understand the code. I discovered that with some subclassing and simple modifications, I could customize SilverCity to suit my needs.

Below is the implementation of my MySilverCity module. You can also download mysilvercity.py directly. The code is hopefully straightforward and well commented.

"""\
MySilverCity - A customized version of the SilverCity HTML generators

Features:
  - whitespace and linebreaks are optionally preserved
  - generated css classes can be modified

Usage:
To modify the generated css classes: 
1. take a look at the ScintillaConstants.py file in your SilverCity 
distribution (typically in the site-packages directory). Locate the names of 
the lexical states you want to keep and/or modify.

2. Create a dictionary with the selected names as keys. The keys must be
in lowercase and without the SCE_ prefix. Set the value to True to keep the
css class name unmodified, or to a string to rename the class.

Example:
    mypython_css = {
        'p_commentline': True,     # keep unaltered
        'p_number': True,          # "
        'p_string': True,          # " 
        'p_character':'p_string',  # rename to 'p_string'
        'p_triple':True,
        'p_tripledouble' : 'p_triple', # combine p_triple and p_tripledouble
        'p_word': True,
    }
    All other classes like p_operator and p_default are ignored

3. Use the customization dictionary as a parameter to the modified HTML 
generator classes 

Author: Kjell Magne Fauske
"""

from SilverCity import Python, XML, CSS

def cleanCssClasses(s1, s2):
    """Return only the css classes we are interested in
    
    Input:
        s1 - original dictionary with token id -> css name mappings
        s2 - a dictionary where the keys are the css names we are interested
             in and the values are either True or a new css name
    Output:
        Returns a dictionary with new token id -> css name mappings
    """
    stmp = {}
    for (key, value) in s1.items():
        s2val = s2.get(value,False)
        if s2val:
            if isinstance(s2val, str):
                stmp[key] = s2val
            else:
                stmp[key] = value
    return stmp

class HTMLGeneratorMixIn:
    """Adds extra functionality to the SilverCity.HTMLGenerator class
    
    The mixin class modifies the behaviour of the SilverCity.HTMLGenerator
    class in three ways:
        - It is possible to specify which css classes are generated
        - A css class name can be renamed and multiple css classes can share
          the same name
        - No markup is inserted for whitespaces and linefeeds
    """
    def __init__(self, htmlgenerator, mycss_classes = None, pre = True):
        """Initialize HTML generator
        
        Input:
            htmlgenerator - The class of the original html generator; its
                preformat method is called when pre is False
            mycss_classes - A dictionary used to modify the css classes
                generated by the original html generators. Default to None.
            pre - If set to True, whitespace and linebreaks are unaltered. 
                Useful for usage in the <pre>..</pre> HTML environment. If 
                set to false, whitespace is replaced by &nbsp; and linefeeds
                with <br /> tags. Default set to True.
        """
        if mycss_classes:
            self.css_classes=cleanCssClasses(self.css_classes,mycss_classes)
        self.pre = pre
        self.htmlgenerator = htmlgenerator
    def preformat(self, text):
        """Override the SilverCity.HTMLGenerator.preformat method"""
        if self.pre:
            text = self.escape(text.expandtabs())
            return text
        else: 
            return self.htmlgenerator.preformat(self,text)

# Modification of the default python styles
mypython_css = {
    'p_commentline': True,
    'p_number': True,
    'p_string': True,
    'p_character':'p_string',
    'p_triple':True,
    'p_tripledouble' : 'p_triple',
    'p_word': True,
}

# Modification of the default xml/xhtml styles        
myhtml_css = {
    'h_comment': True,
    'h_tag': True,
    'h_attribute': True,
    'h_doublestring': True,
    'h_singlestring': 'h_doublestring',
}

class MyPythonHTMLGenerator(HTMLGeneratorMixIn, Python.PythonHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        Python.PythonHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, Python.PythonHTMLGenerator, 
                                    mycss_classes, pre)


class MyXMLHTMLGenerator(HTMLGeneratorMixIn, XML.XMLHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        XML.XMLHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, XML.XMLHTMLGenerator, 
                                    mycss_classes, pre)

# Modification of the default CSS styles                                    
mycss_css = {
    'css_tag': True,
    'css_class': 'css_tag',
    'css_id': 'css_tag',
    'css_comment' : True,
    'css_identifier' : True,
}

class MyCSSHTMLGenerator(HTMLGeneratorMixIn, CSS.CSSHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        CSS.CSSHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, CSS.CSSHTMLGenerator, 
                                    mycss_classes, pre)
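
To make the filtering step concrete, here is a small stand-alone sketch of the same idea as cleanCssClasses, written for Python 3 so it can be run without SilverCity installed. The function name filter_css_classes and the token ids in the sample dictionary are made up for illustration; the real ids come from SilverCity's ScintillaConstants.py.

```python
def filter_css_classes(original, wanted):
    """Keep only the entries of `original` whose css name is a key in `wanted`.

    original - dict mapping token id -> css class name
    wanted   - dict mapping css class name -> True (keep) or a new name (rename)
    """
    result = {}
    for token_id, css_name in original.items():
        choice = wanted.get(css_name, False)
        if not choice:
            continue  # class not listed: no css class is emitted for it
        # A string value renames the class; True keeps the original name.
        result[token_id] = choice if isinstance(choice, str) else css_name
    return result


# Illustrative token ids only; look up the real ones in ScintillaConstants.py.
original = {
    0: "p_default",
    1: "p_commentline",
    3: "p_string",
    4: "p_character",
    10: "p_operator",
}
wanted = {
    "p_commentline": True,      # keep unaltered
    "p_string": True,           # keep unaltered
    "p_character": "p_string",  # share the p_string class
}

filtered = filter_css_classes(original, wanted)
```

Here p_default and p_operator are dropped, and both string-like tokens end up with the class p_string, so a single CSS rule can style them.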

Usage

The modified HTML classes have the same interface as the original ones. Below is a simple example:

"""MySilverCity test"""

import mysilvercity as msc
import StringIO

code = '''
# comment
def pest(hello):
    pass

if __name__ == '__main__':
    #test_file = StringIO.StringIO()
    test_file = open('d:/python/fnet/test/code.html','w')
    PythonXHTMLGenerator().generate_html(test_file, code)
    test_file.close()
'''

code2 = '''
<html id = "test">
<!-- comment -->
<h1>Title</h1>
<p class='sdf'>sdlkf jsldkf lsdkf </p>
</html>
'''

f = StringIO.StringIO() 
mhtml  = msc.MyPythonHTMLGenerator(msc.mypython_css)
mhtml.generate_html(f, code)
print f.getvalue()

f2 = StringIO.StringIO()   
msc.MyXMLHTMLGenerator(msc.myhtml_css,pre=False).generate_html(f2, code2)
print f2.getvalue()

Concluding remarks

SilverCity is a great tool for syntax highlighting. With my modified classes it is easy to integrate syntax highlighting into a web framework. Unfortunately, support for a few languages is missing, LaTeX for instance. Scintilla has a LaTeX lexer hidden somewhere, but it is probably necessary to modify and rebuild the SilverCity library in order to access it.

Comments

  • #1 Ronny Pfannschmidt, September 20, 2006 at 11:18 a.m.

    it would be nice if you used SilverCity.LanguageInfo to find all html-generators, then used a class-generator function to create the clean-html classes for all of them and saved them to a dict

  • #2 Kjell Magne Fauske, September 20, 2006 at 2:37 p.m.

    It's been a while since I wrote the MySilverCity module. Maybe it is time to take a look at it again to see if I can improve it.

  • #3 Kjell Magne Fauske, November 28, 2006 at 2:53 p.m.

    As mentioned in my previous comment, it's been a while since I wrote the MySilverCity module. A new pure Python module for source highlighting is now available. It's called Pygments. I find it a bit easier to work with than SilverCity, and I'm already using it on my PGF/TikZ gallery. I will probably soon switch to Pygments for the rest of the web site as well.

  • #4 Kjell Magne Fauske, March 1, 2007 at 12:35 p.m.

    I have now completely switched to Pygments.
