Parsing Simulink mdl files with Pyparsing
- Published 2007-05-14
- Updated 2008-11-25
I am currently working on a small side project where I need to extract information from Simulink mdl files. Thanks to the Pyparsing module, it is surprisingly easy to create a full-fledged parser. I am not aware of any publicly available Simulink parsers, so I am posting the full source code here. Maybe someone will find it useful.
The Simulink file format
The file format used to store Simulink models is really simple. The data is stored in a structured ASCII file. Below are excerpts from an mdl file.
System {
  Name "feedbackloop"
  Location [480, 85, 1016, 386]
  PaperType "A4"
  TiledPaperMargins [0.500000, 0.500000, 0.500000, 0.500000]
  ...
  Block {
    BlockType Fcn
    Name "Fcn"
    Position [120, 50, 180, 80]
    Expr "sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u"
    "(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3"
    "*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))"
  }
  ...
  Line {
    SrcBlock "Integrator"
    SrcPort 1
    Points [35, 0]
    Branch {
      Points [0, -45; 40, 0]
      DstBlock "Out1"
      DstPort 1
    }
    Branch {
      Points [0, 70; -25, 0]
      Branch {
        Points [-95, 0]
        DstBlock "Sum"
        DstPort 2
      }
      Branch {
        Points [0, 40; -135, 0]
      }
      ...
    }
  }
}
A few observations:
- The data is hierarchical.
- Parameters and values are separated by whitespace
- Long strings are split over multiple lines
- Object parameter types are strings, integers, floats, lists and matrices
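To make the value types in the list above concrete, here is a small standard-library sketch that turns the text of a single parameter value into a Python object. It is not the Pyparsing grammar presented later in this article. The helper names parse_scalar, parse_value and join_split_string are made up for illustration, and the sketch assumes that a value fits on one line and that quoted strings inside lists contain no commas or semicolons.

```python
def parse_scalar(text):
    """Convert one token to a string, an int or a float."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]              # quoted string: drop the quotes
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass                       # note: float() also accepts inf and nan
    return text                        # bare word such as Fcn, on or off


def parse_value(text):
    """Parse a single-line value: scalar, list or matrix."""
    text = text.strip()
    if text.startswith('[') and text.endswith(']'):
        body = text[1:-1]
        # Rows are separated by ';', elements within a row by ','.
        rows = [[parse_scalar(tok) for tok in row.split(',') if tok.strip()]
                for row in body.split(';')]
        # A single row is a plain list, several rows form a matrix.
        return rows[0] if len(rows) == 1 else rows
    return parse_scalar(text)


def join_split_string(lines):
    """Join a long string that was split over several quoted lines."""
    return "".join(parse_scalar(line) for line in lines)


print(parse_value('[480, 85, 1016, 386]'))   # list of integers
print(parse_value('[0, -45; 40, 0]'))        # matrix as a list of rows
print(parse_value('0.500000'))               # float
print(join_split_string(['"sin(u(1)*exp(2.3*(-u"', '"(2))))"']))
```

Nested objects are deliberately left out here; handling them is what makes a proper grammar worthwhile.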
Although the Simulink file format is simple, parsing the data is not trivial due to the nested objects. Writing a parser from scratch is time consuming. Fortunately, there is a Python library that provides the necessary boilerplate code and does most of the hard work.
Pyparsing to the rescue
Trying Pyparsing has been on my todo-list for a long time, and the need for a Simulink parser was a perfect opportunity to get acquainted with Pyparsing. The need for parsing data structures arises from time to time, and Pyparsing seems to be the right tool for the job. I also have a very good impression of the author, Paul McGuire. He is a frequent contributor to comp.lang.python and always has helpful and insightful comments.
My standard approach when learning a new tool is to try to find an example that is tangential to a problem I want to solve. I then tweak and adapt the example to get a hands-on feeling of how things work. In this case I found an example that is almost a perfect fit: jsonParser.py. The example shows how to parse the JSON format, which has many similarities with the Simulink file format. After some experimentation I was able to adapt the example to parse Simulink data.
Code
This is my first attempt to write a parser using Pyparsing, so the quality of the code may not be the best. You can also download the code directly: simulinkparser.py
Update 2008-11-25: Added support for multiline strings. Thanks Jack Guo for pointing out this bug! I have also added support for comments. Lines that start with a # are ignored. The parser now parses all of the mdl files bundled with Matlab 2008a.
Update: Added support for names with underscores. Thanks Alex Meyer for the fix!
"""
A simple Simulink mdl file parser.
Credits:
Most of the code is based on the json parser example distributed with
pyparsing. The code in jsonParser.py was written by Paul McGuire
"""
__author__ = 'Kjell Magne Fauske'
__license__ = 'MIT'
__version__ = '1.3'
# A high level grammar of the Simulink mdl file format
SIMULINK_BNF = """
object {
    members
}
members
    variablename value
    object {
        members
    }
variablename
array
    [ elements ]
matrix
    [elements ; elements]
elements
    value
    elements , value
value
    string
    doublequotedstring
    float
    integer
    object
    array
    matrix
"""
import re  # needed for re.MULTILINE in the dblString regex below
from pyparsing import *
# parse actions
def convertNumbers(s,l,toks):
    """Convert tokens to int or float"""
    # Taken from jsonParser.py
    n = toks[0]
    try:
        return int(n)
    except ValueError:
        # Not an integer literal, so fall back to float
        return float(n)
def joinStrings(s,l,toks):
    """Join string split over multiple lines"""
    return ["".join(toks)]
# Define grammar
# Parse double quoted strings. Ideally we should have used the simple statement:
# dblString = dblQuotedString.setParseAction( removeQuotes )
# Unfortunately dblQuotedString does not handle special chars like \n \t,
# so we have to use a custom regex instead.
# See http://pyparsing.wikispaces.com/message/view/home/3778969 for details.
dblString = Regex(r'\"(?:\\\"|\\\\|[^"])*\"', re.MULTILINE)
dblString.setParseAction( removeQuotes )
mdlNumber = Combine( Optional('-') + ( '0' | Word('123456789',nums) ) +
Optional( '.' + Word(nums) ) +
Optional( Word('eE',exact=1) + Word(nums+'+-',nums) ) )
mdlObject = Forward()
mdlName = Word('$'+'.'+'_'+alphas+nums)
mdlValue = Forward()
# Strings can be split over multiple lines
mdlString = (dblString + Optional(OneOrMore(Suppress(LineEnd()) + LineStart()
+ dblString)))
mdlElements = delimitedList( mdlValue )
mdlArray = Group(Suppress('[') + Optional(mdlElements) + Suppress(']') )
mdlMatrix =Group(Suppress('[') + (delimitedList(Group(mdlElements),';')) \
+ Suppress(']') )
mdlValue << ( mdlNumber | mdlName| mdlString | mdlArray | mdlMatrix )
memberDef = Group( mdlName + mdlValue ) | Group(mdlObject)
mdlMembers = OneOrMore( memberDef)
mdlObject << ( mdlName+Suppress('{') + Optional(mdlMembers) + Suppress('}') )
mdlNumber.setParseAction( convertNumbers )
mdlString.setParseAction(joinStrings)
# Some mdl files from Mathworks start with a comment. Ignore all
# lines that start with a #
singleLineComment = Group("#" + restOfLine)
mdlObject.ignore(singleLineComment)
mdlparser = mdlObject
if __name__ == '__main__':
    import pprint
    testdata = """
# $Revision: 1.1.6.3 $
System {
Name "feedbackloop"
Location [480, 85, 1016, 386]
Open on
TiledPaperMargins [0.500000, 0.500000, 0.500000, 0.500000]
Block {
BlockType Fcn
Name "Fcn"
Position [120, 50, 180, 80]
Expr "sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u"
"(2))))+sin(u(1))"
}
Block {
BlockType Outport
Name "Out1"
Position [345, 73, 375, 87]
IconDisplay "Port number"
BusOutputAsStruct off
}
Line {
SrcBlock "Integrator"
SrcPort 1
Points [35, 0]
Branch {
Points [0, -45; 40, 0]
DstBlock "Out1"
DstPort 1
}
}
T_e_st {
Dummy "A\nmultiline\nstring"
}
}
"""
    result = mdlparser.parseString(testdata)
    pprint.pprint(result.asList())
Example output:
['System',
 ['Name', 'feedbackloop'],
 ['Location', [480, 85, 1016, 386]],
 ['Open', 'on'],
 ['TiledPaperMargins', [0.5, 0.5, 0.5, 0.5]],
 ['Block',
  ['BlockType', 'Fcn'],
  ['Name', 'Fcn'],
  ['Position', [120, 50, 180, 80]],
  ['Expr', 'sin(u(1)*exp(2.3*(-u(2))))+sin(u(1)*exp(2.3*(-u(2))))+sin(u(1))']],
 ['Block',
  ['BlockType', 'Outport'],
  ['Name', 'Out1'],
  ['Position', [345, 73, 375, 87]],
  ['IconDisplay', 'Port number'],
  ['BusOutputAsStruct', 'off']],
 ['Line',
  ['SrcBlock', 'Integrator'],
  ['SrcPort', 1],
  ['Points', [35, 0]],
  ['Branch',
   ['Points', [[0, -45], [40, 0]]],
   ['DstBlock', 'Out1'],
   ['DstPort', 1]]],
 ['T_e_st', ['Dummy', 'A\nmultiline\nstring']]]
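In the output, an object is a list that starts with its name and is followed by one list per member. A parameter is a two-element list of name and value. The sketch below shows one possible way to navigate that structure with plain Python. The helper names is_object, params and children are hypothetical and not part of the parser. The object test is only a heuristic: a parameter whose value is a list of bare words would be mistaken for an object.

```python
def is_object(node):
    """Heuristic: a name followed only by non-empty [name, ...] members."""
    if not isinstance(node, list) or not node or not isinstance(node[0], str):
        return False
    return all(isinstance(m, list) and m and isinstance(m[0], str)
               for m in node[1:])


def params(obj):
    """Return the plain parameters of an object as a dict (nested objects skipped)."""
    return {m[0]: m[1] for m in obj[1:] if len(m) == 2 and not is_object(m)}


def children(obj, kind):
    """Return the direct child objects named `kind`, in file order."""
    return [m for m in obj[1:] if is_object(m) and m[0] == kind]


# Hand-copied subset of the example output above.
system = ['System',
          ['Name', 'feedbackloop'],
          ['Location', [480, 85, 1016, 386]],
          ['Block', ['BlockType', 'Fcn'], ['Name', 'Fcn']],
          ['Block', ['BlockType', 'Outport'], ['Name', 'Out1']]]

block_names = [params(block)['Name'] for block in children(system, 'Block')]
print(block_names)                    # ['Fcn', 'Out1']
print(params(system)['Location'])     # [480, 85, 1016, 386]
```

Keeping child objects in a list rather than a dict preserves both the file order and blocks that share the same object name.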
Final words
Writing a full-fledged parser with Pyparsing is so easy that it almost feels like cheating. However, writing the parser is only the first step. The next thing I have to do is to find a practical and efficient way of using the parsed data.
The output from the parser is a nested list structure. This structure is a bit impractical if, for instance, I want to quickly find out whether two blocks are connected. Pyparsing offers a few hooks for manipulating data during parsing, such as parse actions. The easiest solution is probably to iterate a few times over the output and store the data I need in an appropriate data structure.
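As a rough illustration of that last idea, the sketch below walks the nested list once and builds a mapping from each source block to the blocks it feeds, following nested Branch objects. It is only a sketch under a few assumptions: the function names are invented for this example, only the Line objects directly inside one System are considered, and ports are ignored. The sample data is typed in by hand from the excerpts shown earlier.

```python
def source_of(line):
    """Return the SrcBlock name of a Line object, or None if it has none."""
    for member in line[1:]:
        if member[0] == 'SrcBlock':
            return member[1]
    return None


def destinations(node):
    """Collect every DstBlock below a Line or Branch, recursing into branches."""
    found = []
    for member in node[1:]:
        if member[0] == 'DstBlock':
            found.append(member[1])
        elif member[0] == 'Branch':
            found.extend(destinations(member))
    return found


def connections(system):
    """Map each source block name to the set of block names it feeds."""
    graph = {}
    for member in system[1:]:
        if member[0] == 'Line':
            src = source_of(member)
            if src is not None:
                graph.setdefault(src, set()).update(destinations(member))
    return graph


def is_connected(graph, src, dst):
    """True if a line runs directly from block src to block dst."""
    return dst in graph.get(src, set())


system = ['System',
          ['Name', 'feedbackloop'],
          ['Line',
           ['SrcBlock', 'Integrator'],
           ['SrcPort', 1],
           ['Points', [35, 0]],
           ['Branch',
            ['Points', [[0, -45], [40, 0]]],
            ['DstBlock', 'Out1'],
            ['DstPort', 1]],
           ['Branch',
            ['Points', [[0, 70], [-25, 0]]],
            ['Branch',
             ['Points', [-95, 0]],
             ['DstBlock', 'Sum'],
             ['DstPort', 2]]]]]

graph = connections(system)
print(is_connected(graph, 'Integrator', 'Sum'))   # True
print(is_connected(graph, 'Sum', 'Integrator'))   # False
```

The mapping is directed and covers direct connections only. Answering questions about indirect paths would need a graph search on top of it.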

Comments
- #1 Paul McGuire, May 23, 2007 at 6:18 p.m.
Very cool! I hope you don't mind, I added a link back to your blog from the pyparsing "Who's Using Pyparsing" wiki page. I especially liked your "feels like cheating" line!
-- Paul
- #2 Kjell Magne Fauske, May 23, 2007 at 6:33 p.m.
@Paul
I don't mind at all Paul. Thanks for your comment, and thanks for the JSON parser example!
- #3 Kjell Magne Fauske, August 27, 2007 at 9:05 p.m.
I recently discovered that the Simulink parser code didn't work properly with Pyparsing 1.4.7. I have now fixed the code to make it compatible.
- #4 Loul, December 3, 2007 at 10:49 p.m.
Very interesting work: I am looking for free simulation possibilities, your Simulink import could be one of the basic blocks... Are you thinking of reformatting your parsed data for PySim? I am also looking at Ptolemy (classic)...
PS: funny, I have also been using your BibConverter tool... Thanks for having taken a few hours to automate this boring conversion in an elegant manner!
- #5 Kjell Magne Fauske, December 4, 2007 at 8:09 a.m.
Thanks Loul for your comment.
I'm not familiar with PySim. A Google search gives several candidates, so I'm not sure which project you refer to. The output from the Simulink parser is very basic and clearly needs to be reformatted. My motivation for writing the parser was to use it in my mdl2tex project. Progress has been slow, so I have not yet found a good way of representing the data in a mdl file.
You're welcome. It has saved me from many hours of boring typing as well :)
- #6 Raj Dilip, February 22, 2008 at 7:20 p.m.
how to convert a mdl file to obj format
- #7 Agostino, March 28, 2008 at 7:36 p.m.
Very useful script! I'm willing to use it for one of my upcoming books: I do not like how Simulink projects look. In the book I explain some features of Simulink but also discuss some general ideas on system analysis, not necessarily tied to this application.
I'll keep an eye on this. Cheers, Agostino
- #8 Bruce, May 6, 2008 at 11:05 p.m.
Very interesting indeed. Looking forward to next chapter. Any thoughts on what tools you will use in making use of the data (besides Pyparsing)?
On a side note: Lots of references to LaTeX on the site. In your opinion, what are the pros/cons to writing LaTeX as opposed to say xml docbook style?
- #9 Kjell Magne Fauske, May 7, 2008 at 9:25 a.m.
Python is of course an obvious tool. My original plan with the simulink parser was to create a tool for converting simulink block diagrams to a format suitable for use with LaTeX. (Similar to my dot2tex tool). In order to do that I have to interpret how the blocks are connected and how the connections are routed. Finding the right data structure for querying the model is an interesting problem. I will post the results when I have something that is working. Currently I don't need to draw complex block diagrams, so progress has been slow.
A long time ago I experimented with using docbook for writing articles for my web site. I gave it up because writing docbook xml and customizing the docbook XSL was too cumbersome for me. Recently I have started to use reStructuredText and docutils instead. Its markup is very simple, and since it is written in Python it is easy to write extensions and customize the output. I still think docbook is great, but now I'd rather write a docbook translator for docutils than write docbook markup directly.
So, back to LaTeX. LaTeX is primarily for the printed medium and I use it for writing scientific papers and other documents with math-heavy content.
Pros:
- Beautiful typesetting and layout
- Easy to write equations
- A wealth of existing packages for creating tables, illustrations etc.
- Relatively easy to customize
- Extensive bibliography and cross-referencing support
- Very flexible
Cons:
- Not that easy to publish online unless you want to put a PDF online
- Steep learning curve if you want to dive into the internals.
To summarize: For math-heavy and scientific documents go for LaTeX. For primarily online documents, user guides, API documentations etc, tools like docbook and docutils are a good choice.
- #10 Romain, May 22, 2008 at 5:51 p.m.
Have you been able to exploit the produced list?
- #11 Kjell Magne Fauske, May 22, 2008 at 9:55 p.m.
I have not worked much on it yet. I have put my Simulink project on hold until I need to draw some complex block diagrams for my thesis.
- #12 Kjell Magne Fauske, October 16, 2008 at 8:09 a.m.
Updated the parser to support names with underscores. Thanks Alex Meyer for the patch.
- #13 Jack Guo, November 13, 2008 at 3:14 p.m.
Well, this looks interesting. I'll be borrowing this simulink parser for some documentation I'll need to do for work.
- #14 James Ismail, November 18, 2008 at 8:39 a.m.
I used Matlab and XSLT for parsing Simulink mdl files and converting them to an XML-based block diagram representation. First, I used Matlab to convert the hierarchical lists into a generic XML format. Then I wrote XSLT transforms to convert the generic XML into another proprietary XML format for describing control systems. Then the proprietary XML is parsed by a "code generator" written in C#/C++ to generate the real-time executable C++ code representing the controller.
This method provides us with a seamless method of implementing (in real-time C++) exactly the controller we design and simulate (in Simulink/Matlab). The entire process is completely automated and extensible so that any blocks supported in our real-time C++ libraries can have an equivalent in Simulink. Because it is completely automated, there's no chance of a skew between what's simulated and what's implemented, thus "closing the loop" on the control design process . . . if you'll excuse the pun ;)
- #15 Kjell Magne Fauske, November 18, 2008 at 9:04 a.m.
Thanks James for sharing. Never tried to query and manipulate Simulink models directly from Matlab. Does Matlab have a convenient API for accessing the structure of a mdl file? If this is the case it is probably easier and more robust than reverse engineering the mdl file format like I have tried to do.
I am interested in the data structure you used to represent the control system, but if the format is proprietary you probably can't give any details ;)
Is the automatic code generator you mention a commercially available product?
- #16 James Ismail, November 18, 2008 at 6:02 p.m.
Matlab has no API that I'm aware of for manipulating Simulink models. I reverse engineered it just like you, which is what motivated my original posting (which has nothing to do with Python or TeX). It was not a pleasant process either, and your code, I guarantee, is much more elegant and simpler. I leaned heavily on regular expressions.
Both the XML data structure and code generator are tools we developed in-house, so I can't give details. However, I can say that there's not much behind the theory. The data structure represents blocks and their connections. That's it :)
The interesting part was the XSLT, which is made to transform XML to (other) XML. The work was in figuring out the connection between blocks, as the grammar from mdl files to our format was different (e.g. we don't have a notion of subsystems the way Simulink does). I ended up implementing some pretty hairy recursive algorithms that completely baffled the debugger in the XML IDE I use (Stylus Studio).
When I was writing my dissertation (back in 96-97), I ran into exactly the same dilemma you are solving now. Fortunately, my block diagrams were not too difficult to draw in LaTeX manually since they all looked pretty much the same with some simple tweaks here and there.
I wish I had the time to collaborate with you on this. What brought me to your site was the fact that I need to document some block diagrams in TeX again, but this time for a patent I am writing up. I can see some small mods to your python parser and a re-thinking (i.e. non-proprietary version) of my XSLT transforms and we could have something pretty useful :)
- #17 Kjell Magne Fauske, November 18, 2008 at 6:45 p.m.
Browsing briefly through the Matlab documentation I came to the same conclusion. Doing it all using Matlab and XSLT is very impressive.
I understand. Representing the structure is probably not the hardest part. The parser already does that (sort of). Finding an elegant and convenient way to query block properties and connections is more difficult.
Personally I have no problems with drawing block diagrams directly using TikZ, but I see that a Simulink-based block diagram drawing tool could be useful for many LaTeX users. It is also an interesting and fun programming challenge.
Thank you. Very inspiring. If you have ideas for improvements let me know. I don't have time to work on this at the moment either, but judging from the comments on this article there is a need for tools that can parse and manipulate simulink mdl files. There is a good chance that I will do some more work on this project in the (hopefully) not too distant future.
- #18 Wolfgang Ulmer, November 19, 2008 at 1:39 p.m.
Well, there is an API for manipulating Simulink models:
http://www.mathworks.com/access/helpdesk/help/toolbox/simulink/slref/bq3cxmi.html#bq3ftt5
It provides methods like
- add_block
- add_line
- add_param
- ...
You can also examine the structure of the model by accessing the block's parameters like LineHandles or PortConnectivity.
I still like the possibility to access a MDL file directly from Python without starting Matlab, executing the command, exiting Matlab and resuming my Python script.
Running a large project build involves doing automated checks of Simulink models for which I use the structure generated by mdlParser.
- #19 Kjell Magne Fauske, November 19, 2008 at 2:23 p.m.
Thank you Wolfgang for the API link. Looking briefly at the API it seems that Matlab offers all the necessary building blocks for extracting model information. Good to know.
I'm glad to hear that you find mdlParser useful.
I looked up again the section of the manual that describes the mdl file format. They have now added this warning:
Not very encouraging ;)
- #20 Jack Guo, November 25, 2008 at 2:12 p.m.
Hi,
Any ideas how to also include string commands like \n into the parser's grammar?
- #21 Kjell Magne Fauske, November 25, 2008 at 2:47 p.m.
Thanks Jack for spotting this. A quick test reveals that the parser can't parse data like
I'm surprised I haven't noticed this before. A quick fix is to replace the line
with
It will now parse the above statement as:
I'll do some more tests and update the code.
This is actually an old bug in Pyparsing. I have not tested if this has been fixed in the latest Pyparsing version.
- #22 Kjell Magne Fauske, November 25, 2008 at 4:51 p.m.
I have now updated the parser to correctly parse multiline string values. It now also skips lines that start with #. Some demo files from Mathworks contain such lines.
To verify the parser I have parsed most of the mdl demo files bundled with Matlab 2008b. With the latest updates to the parser all of the mdl files I have tried have been parsed without problems.
- #23 Jack Guo, November 25, 2008 at 5:53 p.m.
Another Question.
Is there any way to implement the mdlparser as a dictionary?
I noticed in jsonParser that he's able to use the class Dict out of pyparsing on his jsonObject:
But trying it out on the simulinkparser returns me an error.
here's what i changed in the simulinkparser:
- #24 Kjell Magne Fauske, November 25, 2008 at 6:27 p.m.
Try instead:
This allows you to write:
The problem is that the keys are not unique. Many blocks share the same name. Only the last one will be stored in the dictionary so you will only be able to access the last one through the dictionary interface.
- #25 Wolfgang, November 26, 2008 at 10:15 a.m.
Using a dict as parse output structure may be good for simply reading/accessing the data, but if you want to edit the data and write it back to an MDL file, you should keep the order of the fields (which can only be done by storing the data in a nested list). Otherwise, Matlab will not be able to read the MDL file.
- #26 Kjell Magne Fauske, November 26, 2008 at 10:35 a.m.
You are right Wolfgang. When using the method suggested by Jack pyparsing actually gives you both an ordered list and dictionary access. It does not, however, handle duplicated keys.
I have been thinking some about data structures for representing a mdl file, and I think a dictionary-like interface is the way to go. Writing mdl files is an interesting challenge and will require some sort of ordered dictionary. Not that difficult to implement and the extra memory required should not be a problem.
For blocks and connections a graph-like data structure is probably a good idea. This will allow to easily access outgoing and incoming lines/edges for each block/node. On top of that we can add various iterators to easily iterate over blocks, lines, annotations etc.
- #27 nguyengn, March 26, 2009 at 1 p.m.
Hi all,
thanks for the simulinkparser. There is a question. How can I smartly extend it to a simulink model which has stateflow data?
- #28 Magnus Persson, July 2, 2009 at 6:30 p.m.
Stateflow (and SimEvents) breaks the assumption that there's a "top" element called "Model", and adds "sibling" elements to that. So this implies adding new stuff to the parser to handle the Stateflow parts...
- #29 Nate Parsons, July 9, 2009 at 7:44 p.m.
Very interesting. I've been meaning to write a .mdl graphical diff tool for a while, and learning python since before that. Although it seems like you haven't come back to this project, I might use it as a starting point for my project.
- #30 Wolfgang, July 10, 2009 at 10:20 a.m.
Sounds interesting. This would be a big help for all Simulink developers. Would you like to share your code and efforts and start a project for this?
- #31 Kjell Magne Fauske, July 10, 2009 at 11:50 a.m.
Thank you everyone for the interest in the Simulink parser. I will probably not get back to this project in the near future, but the parser should be a decent starting point for other projects. For more inspiration you could take a look at the Java-based Simulink Library. It provides not only a parser, but also data structures for accessing the simulink models, including Stateflow data.