Parsing Simulink mdl files with Pyparsing
- Published 2007-05-14
- Updated 2007-08-27
I am currently working on a small side project where I need to extract information from Simulink mdl files. Thanks to the Pyparsing module, it is surprisingly easy to create a full-fledged parser. I am not aware of any publicly available Simulink parsers, so I am posting the full source code here. Maybe someone will find it useful.
The Simulink file format
The file format used to store Simulink models is simple: the data is stored in a structured ASCII file. Below are excerpts from an mdl file.
System {
Name "feedbackloop"
Location [480, 85, 1016, 386]
PaperType "A4"
TiledPaperMargins [0.500000, 0.500000, 0.500000, 0.500000]
...
Block {
BlockType Fcn
Name "Fcn"
Position [120, 50, 180, 80]
Expr "sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u"
"(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3"
"*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))"
}
...
Line {
SrcBlock "Integrator"
SrcPort 1
Points [35, 0]
Branch {
Points [0, -45; 40, 0]
DstBlock "Out1"
DstPort 1
}
Branch {
Points [0, 70; -25, 0]
Branch {
Points [-95, 0]
DstBlock "Sum"
DstPort 2
}
Branch {
Points [0, 40; -135, 0]
}
...
}
}
}
A few observations:
- The data is hierarchical.
- Parameters and values are separated by whitespace.
- Long strings are split over multiple lines.
- Object parameter types are strings, integers, floats, lists and matrices.
Although the Simulink file format is simple, parsing the data is not trivial because objects can be nested. Writing a parser from scratch is time consuming. Fortunately, there is a Python library that provides the necessary boilerplate code and does most of the hard work.
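To give a feel for what a hand-written parser involves, here is a minimal sketch that uses only the standard library. It handles just nested objects whose values are bare words, integers, or single-line quoted strings. Lists, matrices, floats, and strings split over several lines are left out. The names `tokenize`, `convert`, and `parse_object` are made up for this illustration and are not part of the parser presented further down.

```python
import re

# Tokens: a brace, a double-quoted string (escapes not handled), or a bare word.
TOKEN_RE = re.compile(r'[{}]|"[^"]*"|[^\s{}"]+')


def tokenize(text):
    """Split mdl-like text into a flat list of tokens."""
    return TOKEN_RE.findall(text)


def convert(token):
    """Turn a raw token into a str (quotes removed) or an int."""
    if token.startswith('"'):
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return token


def parse_object(tokens, pos=0):
    """Parse 'Name { members }' starting at tokens[pos].

    Returns (object_as_nested_list, position_after_closing_brace).
    """
    name = tokens[pos]
    if tokens[pos + 1] != '{':
        raise ValueError('expected "{" after %r' % name)
    pos += 2
    members = [name]
    while tokens[pos] != '}':
        if tokens[pos + 1] == '{':
            # Nested object: recurse, then continue after its closing brace.
            child, pos = parse_object(tokens, pos)
            members.append(child)
        else:
            # Plain 'parameter value' pair.
            members.append([tokens[pos], convert(tokens[pos + 1])])
            pos += 2
    return members, pos + 1


sample = '''
Line {
  SrcBlock "Integrator"
  SrcPort 1
  Branch {
    DstBlock "Out1"
    DstPort 1
  }
}
'''
tree, _ = parse_object(tokenize(sample))
print(tree)
```

Even this reduced version needs explicit position bookkeeping and has no error reporting for truncated input, which is the kind of work a parsing library takes over.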
Pyparsing to the rescue
Trying Pyparsing has been on my todo-list for a long time, and the need for a Simulink parser was a perfect opportunity to get acquainted with Pyparsing. The need for parsing data structures arises from time to time, and Pyparsing seems to be the right tool for the job. I also have a very good impression of the author, Paul McGuire. He is a frequent contributor to comp.lang.python and always has helpful and insightful comments.
My standard approach when learning a new tool is to try to find an example that is tangential to a problem I want to solve. I then tweak and adapt the example to get a hands-on feeling of how things work. In this case I found an example that is almost a perfect fit: jsonParser.py. The example shows how to parse the JSON format, which has many similarities with the Simulink file format. After some experimentation I was able to adapt the example to parse Simulink data.
Code
This is my first attempt to write a parser using Pyparsing, so the quality of the code may not be the best. You can also download the code directly: simulinkparser.py
Update: Fixed problem with strings split over multiple lines. Latest version of Pyparsing broke the code.
"""
A simple Simulink mdl file parser.
Credits:
Most of the code is based on the json parser example distributed with
pyparsing. The code in jsonParser.py was written by Paul McGuire
"""
__author__ = 'Kjell Magne Fauske'
__license__ = 'MIT'
# A high level grammar of the Simulink mdl file format
SIMULINK_BNF = """
object {
members
}
members
variablename value
object {
members
}
variablename
array
[ elements ]
matrix
[elements ; elements]
elements
value
elements , value
value
string
doublequotedstring
float
integer
object
array
matrix
"""
from pyparsing import *
# parse actions
def convertNumbers(s,l,toks):
    """Convert tokens to int or float"""
    # Taken from jsonParser.py
    n = toks[0]
    try:
        return int(n)
    except ValueError:
        # Not an integer literal, so fall back to float
        return float(n)

def joinStrings(s,l,toks):
    """Join string split over multiple lines"""
    return ["".join(toks)]
# Define grammar
dblString = dblQuotedString.setParseAction( removeQuotes )
mdlNumber = Combine( Optional('-') + ( '0' | Word('123456789',nums) ) +
Optional( '.' + Word(nums) ) +
Optional( Word('eE',exact=1) + Word(nums+'+-',nums) ) )
mdlObject = Forward()
mdlName = Word('$'+'.'+alphas+nums)
mdlValue = Forward()
# Strings can be split over multiple lines
mdlString = (dblString + Optional(OneOrMore(Suppress(LineEnd()) + LineStart()
+ dblString)))
mdlElements = delimitedList( mdlValue )
mdlArray = Group(Suppress('[') + Optional(mdlElements) + Suppress(']') )
mdlMatrix =Group(Suppress('[') + (delimitedList(Group(mdlElements),';')) \
+ Suppress(']') )
mdlValue << ( mdlNumber | mdlName| mdlString | mdlArray | mdlMatrix )
memberDef = Group( mdlName + mdlValue ) | Group(mdlObject)
mdlMembers = OneOrMore( memberDef)
mdlObject << ( mdlName+Suppress('{') + Optional(mdlMembers) + Suppress('}') )
mdlNumber.setParseAction( convertNumbers )
mdlString.setParseAction(joinStrings)
mdlparser = mdlObject
if __name__ == '__main__':
    import pprint
    testdata = """
System {
Name "feedbackloop"
Location [480, 85, 1016, 386]
Open on
TiledPaperMargins [0.500000, 0.500000, 0.500000, 0.500000]
Block {
BlockType Fcn
Name "Fcn"
Position [120, 50, 180, 80]
Expr "sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u"
"(2))))+sin(u(1))"
}
Block {
BlockType Outport
Name "Out1"
Position [345, 73, 375, 87]
IconDisplay "Port number"
BusOutputAsStruct off
}
Line {
SrcBlock "Integrator"
SrcPort 1
Points [35, 0]
Branch {
Points [0, -45; 40, 0]
DstBlock "Out1"
DstPort 1
}
}
}
"""
    result = mdlparser.parseString(testdata)
    pprint.pprint(result.asList())
Example output:
['System',
['Name', 'feedbackloop'],
['Location', [480, 85, 1016, 386]],
['Open', 'on'],
['TiledPaperMargins', [0.5, 0.5, 0.5, 0.5]],
['Block',
['BlockType', 'Fcn'],
['Name', 'Fcn'],
['Position', [120, 50, 180, 80]],
['Expr', 'sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1))']],
['Block',
['BlockType', 'Outport'],
['Name', 'Out1'],
['Position', [345, 73, 375, 87]],
['IconDisplay', 'Port number'],
['BusOutputAsStruct', 'off']],
['Line',
['SrcBlock', 'Integrator'],
['SrcPort', 1],
['Points', [35, 0]],
['Branch',
['Points', [[0, -45], [40, 0]]],
['DstBlock', 'Out1'],
['DstPort', 1]]]]
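Each object in this output is a list whose first item is the object type. The remaining items are either `[parameter, value]` pairs or nested objects of the same shape. A minimal sketch of two lookup helpers for that layout follows. The helper names `get_param` and `children` are hypothetical, and the sketch assumes the caller already knows which names are parameters and which are object types, since the list structure alone does not always distinguish them.

```python
def get_param(obj, name, default=None):
    """Return the value of the first [name, value] member of obj."""
    for member in obj[1:]:
        if len(member) == 2 and member[0] == name:
            return member[1]
    return default


def children(obj, type_name):
    """Return all members of obj whose first item is type_name."""
    return [m for m in obj[1:] if m and m[0] == type_name]


# A shortened copy of the example output above.
parsed = ['System',
          ['Name', 'feedbackloop'],
          ['Block',
           ['BlockType', 'Fcn'],
           ['Name', 'Fcn'],
           ['Position', [120, 50, 180, 80]]],
          ['Block',
           ['BlockType', 'Outport'],
           ['Name', 'Out1'],
           ['Position', [345, 73, 375, 87]]]]

print(get_param(parsed, 'Name'))
for block in children(parsed, 'Block'):
    print(get_param(block, 'Name'), get_param(block, 'BlockType'))
```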
Final words
Writing a full-fledged parser with Pyparsing is so easy that it almost feels like cheating. However, writing the parser is only the first step. The next thing I have to do is to find a practical and efficient way of using the parsed data.
The output from the parser is a nested list structure. This structure is a bit impractical if, for instance, I quickly want to find out whether two blocks are connected. Pyparsing offers a few hooks for manipulating data during parsing, such as parse actions. The easiest solution is probably to iterate a few times over the output and store the data I need in an appropriate data structure.
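One possible shape for such a data structure is a dictionary that maps each source port to the destination ports it feeds. The sketch below is only an illustration of that idea. It assumes the Line and Branch layout seen in the excerpts (SrcBlock and SrcPort on the Line, DstBlock and DstPort on the Line itself or on nested Branch objects). The function names are hypothetical, and the sample list is written by hand to mirror the parser's output rather than produced by running the parser.

```python
def destinations(obj):
    """Collect (DstBlock, DstPort) pairs from a Line or Branch, recursively."""
    own = {}
    nested = []
    for member in obj[1:]:
        if member[0] == 'Branch':
            # A Branch can itself contain further Branch objects.
            nested.extend(destinations(member))
        elif member[0] in ('DstBlock', 'DstPort'):
            own[member[0]] = member[1]
    result = []
    if 'DstBlock' in own:
        result.append((own['DstBlock'], own.get('DstPort')))
    return result + nested


def build_connections(system):
    """Map (SrcBlock, SrcPort) to a list of (DstBlock, DstPort) pairs."""
    connections = {}
    for member in system[1:]:
        if member[0] != 'Line':
            continue
        params = dict((m[0], m[1]) for m in member[1:]
                      if m[0] in ('SrcBlock', 'SrcPort'))
        key = (params.get('SrcBlock'), params.get('SrcPort'))
        connections.setdefault(key, []).extend(destinations(member))
    return connections


def are_connected(connections, src_block, dst_block):
    """True if any output port of src_block feeds any input port of dst_block."""
    return any(src == src_block and dst == dst_block
               for (src, _), dsts in connections.items()
               for (dst, _) in dsts)


# Hand-written sample in the same nested-list shape as the parser output.
system = ['System',
          ['Line',
           ['SrcBlock', 'Integrator'],
           ['SrcPort', 1],
           ['Points', [35, 0]],
           ['Branch',
            ['Points', [[0, -45], [40, 0]]],
            ['DstBlock', 'Out1'],
            ['DstPort', 1]],
           ['Branch',
            ['Points', [[0, 70], [-25, 0]]],
            ['Branch',
             ['Points', [-95, 0]],
             ['DstBlock', 'Sum'],
             ['DstPort', 2]]]]]

connections = build_connections(system)
print(connections)
print(are_connected(connections, 'Integrator', 'Sum'))
```

A second pass over the same output could collect the Points entries if the routing of each connection is needed as well.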

Comments
Very cool! I hope you don't mind, I added a link back to your blog from the pyparsing "Who's Using Pyparsing" wiki page. I especially liked your "feels like cheating" line!
-- Paul
@Paul
I don't mind at all Paul. Thanks for your comment, and thanks for the JSON parser example!
I recently discovered that the Simulink parser code didn't work properly with Pyparsing 1.4.7. I have now fixed the code to make it compatible.
Very interesting work: I am looking for free simulation possibilities, and your Simulink import could be one of the basic building blocks... Are you thinking of reformatting your parsed data for PySim? I am also looking at Ptolemy (classic)...
PS: funny, I have also been using your BibConverter tool... Thanks for having taken a few hours to automate this boring conversion in an elegant manner!
Thanks Loul for your comment.
I'm not familiar with PySim. A Google search gives several candidates, so I'm not sure which project you refer to. The output from the Simulink parser is very basic and clearly needs to be reformatted. My motivation for writing the parser was to use it in my mdl2tex project. Progress has been slow, so I have not yet found a good way of representing the data in an mdl file.
You're welcome. It has saved me from many hours of boring typing as well :)
How to convert an mdl file to obj format?
Very useful script! I'm willing to use it for one of my upcoming books: I do not like how Simulink projects look. In the book I explain some features of Simulink but also discuss some general ideas on system analysis, not necessarily tied to this application.
I'll keep an eye on this. Cheers, Agostino
Very interesting indeed. Looking forward to next chapter. Any thoughts on what tools you will use in making use of the data (besides Pyparsing)?
On a side note: Lots of references to LaTeX on the site. In your opinion, what are the pros/cons of writing LaTeX as opposed to, say, XML docbook style?
Python is of course an obvious tool. My original plan with the simulink parser was to create a tool for converting simulink block diagrams to a format suitable for use with LaTeX. (Similar to my dot2tex tool). In order to do that I have to interpret how the blocks are connected and how the connections are routed. Finding the right data structure for querying the model is an interesting problem. I will post the results when I have something that is working. Currently I don't need to draw complex block diagrams, so progress has been slow.
A long time ago I experimented with using docbook for writing articles for my web site. I gave it up because writing docbook XML and customizing the docbook XSL was too cumbersome for me. Recently I have started to use reStructuredText and docutils instead. Its markup is very simple, and since it is written in Python it is easy to write extensions and customize the output. I still think docbook is great, but now I'd rather write a docbook translator for docutils than write docbook markup directly.
So, back to LaTeX. LaTeX is primarily for the printed medium and I use it for writing scientific papers and other documents with math-heavy content.
Pros:
Cons:
To summarize: For math-heavy and scientific documents go for LaTeX. For primarily online documents, user guides, API documentations etc, tools like docbook and docutils are a good choice.