Syntax highlighting source code for the web with SilverCity
- Published 2005-02-27
Syntax highlighting of code snippets is something I've always wanted to implement on fauskes.net. When I finally gave it a try, it turned out to be quite easy. A few hours of research, some Python magic and voilà, no more boring-looking source code listings on fauskes.net.
I wanted to generate colorized HTML from XML/HTML, CSS, PHP, LaTeX and Python source code. Writing a full-fledged, multi-language syntax highlighter is not a trivial task, and I felt no need to reinvent the wheel. Therefore I went looking for an existing Python package to do all the hard work for me. Fortunately I found SilverCity.
SilverCity
SilverCity is a Python interface to the Scintilla lexers. It provides lexical analysis for over 20 programming and markup languages. The package offers a low-level interface to the lexers, as well as higher-level classes for generating syntax-styled HTML from selected languages.
I could have used SilverCity right out of the box. However, there are some issues with the HTML markup generated by SilverCity:
- The markup is a bit bloated
- Whitespace and line breaks are converted to non-breaking spaces and <br> tags
SilverCity maps directly from the lexer to HTML, which is more than I need. I also mark up all of my source listings with the <pre> tag, which preserves whitespace and line breaks, so inserting extra markup for them is unnecessary.
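To make the difference concrete, here is a minimal sketch of the two ways of handling whitespace. It is not SilverCity's actual code; the function names are made up for illustration and the escaping is written by hand so the snippet has no dependencies.

```python
def escape(text):
    """Escape the characters that are special in HTML."""
    return (text.replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;'))


def format_for_pre(text):
    """Escape only; inside <pre> spaces and line breaks are kept as they are."""
    return escape(text.expandtabs())


def format_without_pre(text):
    """Escape, then spell out every space and line break as markup."""
    text = escape(text.expandtabs())
    return text.replace(' ', '&nbsp;').replace('\n', '<br />\n')


snippet = 'if a < b:\n    pass\n'
print(format_for_pre(snippet))
print(format_without_pre(snippet))
```

The second variant grows with every space and line break in the listing, which is the bloat I wanted to avoid.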
Modifying the SilverCity HTML generators
Fortunately SilverCity comes with full source code. There is not much documentation, but it is not too difficult to understand the code. I discovered that with some subclassing and simple modifications, I could customize SilverCity to suit my needs.
Below is the implementation of my MySilverCity module. You can also download mysilvercity.py directly. The code is hopefully straightforward and well commented.
"""\
MySilverCity - A customized version of the SilverCity HTML generators

Features:
- whitespace and linebreaks are optionally preserved
- generated css classes can be modified

Usage:
To modify the generated css classes:
1. Take a look at the ScintillaConstants.py file in your SilverCity
   distribution (typically in the site-packages directory). Locate the names of
   the lexical states you want to keep and/or modify.
2. Create a dictionary with the selected names as keys. The keys must be
   in lowercase and without the SCE_ prefix. As value, set to True if you only
   want to keep the css class unmodified, or set it to a new name.
   Example:
   mypython_css = {
       'p_commentline': True,          # keep unaltered
       'p_number': True,               # "
       'p_string': True,               # "
       'p_character': 'p_string',      # rename to 'p_string'
       'p_triple': True,
       'p_tripledouble': 'p_triple',   # combine p_triple and p_tripledouble
       'p_word': True,
   }
   All other classes like p_operator and p_default are ignored
3. Use the customization dictionary as a parameter to the modified HTML
   generator classes

Author: Kjell Magne Fauske
"""
from SilverCity import Python, XML, CSS


def cleanCssClasses(s1, s2):
    """Return only the css classes we are interested in

    Input:
    s1 - original dictionary with token id -> css name mappings
    s2 - a dictionary where the keys are the css names we are interested
         in and the values are either True or a new css name
    Output:
    Returns a dictionary with new token id -> css name mappings
    """
    stmp = {}
    for (key, value) in s1.items():
        s2val = s2.get(value, False)
        if s2val:
            if isinstance(s2val, str):
                stmp[key] = s2val
            else:
                stmp[key] = value
    return stmp


class HTMLGeneratorMixIn:
    """Adds extra functionality to the SilverCity.HTMLGenerator class

    The mixin class modifies the behaviour of the SilverCity.HTMLGenerator
    class in three ways:
    - It is possible to specify which css classes are generated
    - A css class name can be renamed and multiple css classes can share
      the same name
    - No markup is inserted for whitespaces and linefeeds
    """

    def __init__(self, htmlgenerator, mycss_classes = None, pre = True):
        """Initialize HTML generator

        Input:
        htmlgenerator - The original html generator class
        mycss_classes - A dictionary used to modify the css classes
                        generated by the original html generators.
                        Defaults to None.
        pre - If set to True, whitespace and linebreaks are unaltered.
              Useful for usage in the <pre>..</pre> HTML environment. If
              set to False, whitespace is replaced by &nbsp; and linefeeds
              by <br /> tags. Defaults to True.
        """
        if mycss_classes:
            self.css_classes = cleanCssClasses(self.css_classes, mycss_classes)
        self.pre = pre
        self.htmlgenerator = htmlgenerator

    def preformat(self, text):
        """Override the SilverCity.HTMLGenerator.preformat method"""
        if self.pre:
            # Only escape HTML special characters; leave whitespace alone
            text = self.escape(text.expandtabs())
            return text
        else:
            # Fall back to the original generator's whitespace markup
            return self.htmlgenerator.preformat(self, text)


# Modification of the default python styles
mypython_css = {
    'p_commentline': True,
    'p_number': True,
    'p_string': True,
    'p_character': 'p_string',
    'p_triple': True,
    'p_tripledouble': 'p_triple',
    'p_word': True,
}

# Modification of the default xml/xhtml styles
myhtml_css = {
    'h_comment': True,
    'h_tag': True,
    'h_attribute': True,
    'h_doublestring': True,
    'h_singlestring': 'h_doublestring',
}


class MyPythonHTMLGenerator(HTMLGeneratorMixIn, Python.PythonHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        Python.PythonHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, Python.PythonHTMLGenerator,
                                    mycss_classes, pre)


class MyXMLHTMLGenerator(HTMLGeneratorMixIn, XML.XMLHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        XML.XMLHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, XML.XMLHTMLGenerator,
                                    mycss_classes, pre)


# Modification of the default CSS styles
mycss_css = {
    'css_tag': True,
    'css_class': 'css_tag',
    'css_id': 'css_tag',
    'css_comment': True,
    'css_identifier': True,
}


class MyCSSHTMLGenerator(HTMLGeneratorMixIn, CSS.CSSHTMLGenerator):
    def __init__(self, mycss_classes = None, pre = True):
        CSS.CSSHTMLGenerator.__init__(self)
        HTMLGeneratorMixIn.__init__(self, CSS.CSSHTMLGenerator,
                                    mycss_classes, pre)
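
The three generator classes in the listing differ only in their base class, so they could in principle be produced by a small class factory. The sketch below is one possible approach, not part of mysilvercity.py. To keep it runnable without SilverCity installed it uses a simplified mixin (filtering only, no renaming) and a hypothetical FakePythonHTMLGenerator in place of Python.PythonHTMLGenerator; make_clean_generator is also a made-up name.

```python
class HTMLGeneratorMixIn(object):
    """Simplified stand-in for the mixin: it only filters css classes."""

    def __init__(self, htmlgenerator, mycss_classes=None, pre=True):
        if mycss_classes:
            self.css_classes = dict(
                (token_id, name)
                for token_id, name in self.css_classes.items()
                if name in mycss_classes)
        self.pre = pre
        self.htmlgenerator = htmlgenerator


class FakePythonHTMLGenerator(object):
    """Hypothetical stand-in for Python.PythonHTMLGenerator."""

    def __init__(self):
        # Made-up token id -> css name table
        self.css_classes = {1: 'p_commentline', 10: 'p_operator'}


def make_clean_generator(base):
    """Return a subclass of `base` that runs both initializers."""

    def __init__(self, mycss_classes=None, pre=True):
        base.__init__(self)
        HTMLGeneratorMixIn.__init__(self, base, mycss_classes, pre)

    return type('My' + base.__name__, (HTMLGeneratorMixIn, base),
                {'__init__': __init__})


MyPythonHTMLGenerator = make_clean_generator(FakePythonHTMLGenerator)
generator = MyPythonHTMLGenerator({'p_commentline': True})
print(generator.css_classes)
```

With the real SilverCity classes, the same factory would presumably be called once per language and the results stored in a dictionary keyed by language name.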
Usage
The modified HTML classes have the same interface as the original ones. Below is a simple example:
"""MySilverCity test"""
import mysilvercity as msc
import StringIO

# Python sample to highlight
code = '''
# comment
def pest(hello):
    pass
if __name__ == '__main__':
    #test_file = StringIO.StringIO()
    test_file = open('d:/python/fnet/test/code.html','w')
    PythonXHTMLGenerator().generate_html(test_file, code)
    test_file.close()
'''

# XHTML sample to highlight
code2 = '''
<html id = "test">
<!-- comment -->
<h1>Title</h1>
<p class='sdf'>sdlkf jsldkf lsdkf </p>
</html>
'''

# Python source, whitespace left untouched (for use inside <pre>)
f = StringIO.StringIO()
mhtml = msc.MyPythonHTMLGenerator(msc.mypython_css)
mhtml.generate_html(f, code)
print f.getvalue()

# XHTML source, whitespace converted to markup (pre=False)
f2 = StringIO.StringIO()
msc.MyXMLHTMLGenerator(msc.myhtml_css, pre=False).generate_html(f2, code2)
print f2.getvalue()
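
What the customization dictionary does to the generator's token table can be tried without SilverCity installed. The sketch below repeats the filtering logic of cleanCssClasses under the hypothetical name clean_css_classes. The token ids in the sample table are only illustrative; the real ones come from ScintillaConstants.py.

```python
def clean_css_classes(original, wanted):
    """Keep only the css names listed in `wanted`, renaming where asked."""
    cleaned = {}
    for token_id, css_name in original.items():
        choice = wanted.get(css_name, False)
        if not choice:
            continue  # css name not listed: drop it
        # A string means "rename"; True means "keep the original name"
        cleaned[token_id] = choice if isinstance(choice, str) else css_name
    return cleaned


# Illustrative token id -> css name table
original = {
    0: 'p_default',
    1: 'p_commentline',
    3: 'p_string',
    4: 'p_character',
    10: 'p_operator',
}
wanted = {
    'p_commentline': True,
    'p_string': True,
    'p_character': 'p_string',
}

cleaned = clean_css_classes(original, wanted)
print(sorted(cleaned.items()))
# prints [(1, 'p_commentline'), (3, 'p_string'), (4, 'p_string')]
```

Tokens whose css name is not listed, such as p_default and p_operator here, are dropped from the table, while p_character ends up sharing the p_string class.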
Concluding remarks
SilverCity is a great tool for syntax highlighting. With my modified classes it is easy to integrate syntax highlighting into a web framework. Unfortunately, support for a few languages is missing, LaTeX for instance. Scintilla has a LaTeX lexer hidden somewhere, but it is probably necessary to modify and rebuild the SilverCity library in order to access it.

Comments
- #1 Ronny Pfannschmidt, September 20, 2006 at 11:18 a.m.
  It would be nice if you used SilverCity.LanguageInfo to find all html-generators, then used a class-generator function to create the clean-html classes for all of them and saved them to a dict.
- #2 Kjell Magne Fauske, September 20, 2006 at 2:37 p.m.
  It's been a while since I wrote the MySilverCity module. Maybe it is time to take a look at it again to see if I can improve it.
- #3 Kjell Magne Fauske, November 28, 2006 at 2:53 p.m.
  As mentioned in my previous comment, it's been a while since I wrote the MySilverCity module. A new pure Python module for source highlighting is now available. It's called Pygments. I find it a bit easier to work with than SilverCity, and I'm already using it on my PGF/TikZ gallery. I will probably soon switch to Pygments for the rest of the web site as well.
- #4 Kjell Magne Fauske, March 1, 2007 at 12:35 p.m.
  I have now completely switched to Pygments.