Sunday, September 13, 2009

Matlab: Tips from the Trenches

Every once in a while, you'll stumble upon something you can't do in Matlab, and spend way too many hours figuring it out. Every once in a while, you will spend hours doing things manually because you didn't even suspect Matlab could do it for you. Here are a few tips from the trenches that could save you a lot of time and help you build better software.

Often, you will be writing a LaTeX report to communicate your Matlab simulation results. This gets tiresome if, every time you modify your data or simulation parameters, you have to regenerate and save every plot and array by hand and replace it in your LaTeX document. We'll see how all of this can be automated by learning to use a few commands. Remember to type 'help' followed by the name of a command at the Matlab command prompt to find out more.

Perfect plots all by script, and how to get them in LaTeX

The first command you really should know about is 'print'. It lets you save the current figure to an image file from script or the command prompt. I have two favourite modes:

print('-dpng', '-r100', 'filename.png');

This will save the current figure in a PNG image file named 'filename.png', with a resolution of 100 dots per inch. This is useful for quick communication or backup of the plot, since PNGs are widely supported. My other favourite mode is:

print('-depsc', 'filename.eps');

This will save the current figure to a colour EPS file. This is awesome because EPS images are vector based: whatever resolution you print them at, they will look perfect. And LaTeX can exploit that feature! Basically, an EPS file contains all the data Matlab used to draw the plot in the first place, and enough instructions to redo the job for any display size or paper print. Unfortunately, EPS files are less convenient for communicating your results as such, since few people know how to display them and have the proper software handy.

Usually, you will want to use both commands, to keep a quickly accessible version of your plot, and a neat version for your LaTeX report. So your code would look something like this:

plot( ...... );
print('-dpng', '-r200', 'myplot1.png');
print('-depsc', 'myplot1.eps');
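
When there are several plots, the same two calls can go in a loop. Here is a minimal sketch, assuming some made-up data (three sine waves) and a hypothetical 'myplotN' naming scheme built with sprintf:

```matlab
% Hypothetical data: three sine waves of different frequencies
t = linspace(0, 1, 500);
freqs = [1 2 5];

for k = 1:numel(freqs)
    figure;                                   % new window for each plot
    plot(t, sin(2*pi*freqs(k)*t));

    basename = sprintf('myplot%d', k);        % myplot1, myplot2, myplot3
    print('-dpng', '-r200', [basename '.png']);   % quick-look version
    print('-depsc', [basename '.eps']);           % vector version for LaTeX

    close(gcf);                               % free the figure once it is saved
end
```

Rerunning the script after changing the data overwrites the image files, so the LaTeX document picks up the new plots at its next build.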

Let's find out how to include the EPS graphics in a LaTeX document that we will export to PDF (this works on Windows for sure, and should be just the same on Linux). First, you need to include the graphicx package:

\usepackage{graphicx}

Then just include your figure like you would always do, only make sure to *not* specify the extension of the image file name. So for an image called filename.eps, write something like this:

\begin{figure}[h]
\centering
\includegraphics[width=.75\textwidth]{filename}
\caption{Description of my perfectly drawn plot.}
\label{myfigurelabelforreferences}
\end{figure}

The trick now is to *not* call pdfLaTeX, which cannot include EPS files directly. You need to first build your LaTeX source using the normal LaTeX compiler, which will output a DVI file. Then call DviPDF on that file. An example of a fairly good free LaTeX editor running on Windows that lets you do that easily is LEd (LaTeX Editor). That's it!
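
If you would rather trigger the build from Matlab itself, here is a rough sketch using the 'system' function. It assumes that the 'latex' and 'dvipdf' programs are on the system path (on some installations the converter is called 'dvipdfm' instead) and that the report is a hypothetical file named 'report.tex' in the current folder:

```matlab
texfile = 'report.tex';                 % hypothetical report file name
[folder, base] = fileparts(texfile);    % base is 'report'

% Commands assumed to be available on the system path
latexCmd = sprintf('latex -interaction=nonstopmode %s', texfile);
pdfCmd   = sprintf('dvipdf %s.dvi', base);

if exist(texfile, 'file')
    status = system(latexCmd);          % produces report.dvi
    if status == 0
        status = system(pdfCmd);        % produces report.pdf
    end
    if status ~= 0
        warning('Building %s failed, check the LaTeX log.', texfile);
    end
else
    fprintf('%s not found, nothing to build.\n', texfile);
end
```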

Now, when you save plots automatically, you want to control every single detail of how they will appear from script, since you won't be able to do it manually using the GUI. It's actually something you want to do anyway, since the GUI can get very slow. Here are a few tips on how to do just that (call Matlab help on them):

figure : will open a new window for plotting purposes

title, xlabel, ylabel, zlabel: work as usual to set the title and axis labels. You might want to set the font size, example:

xlabel('\fontsize{12} Frequency \xi [Hz]');

Indeed, you can use Greek letters with the usual LaTeX commands.

xlim, ylim, zlim : choose the limits of the axis box

Here is how you can choose where the axis ticks are.

set(gca, 'XTick', 0:10:100);

Here, 'gca' means "get current axes": it returns a handle to the axes of the current figure, so the call acts on the plot you are working on. 'XTick' means you want to act on the X axis, but you could use 'YTick' or 'ZTick'. The last parameter is a vector indicating where you want the axis ticks to be.

grid on or grid off displays or hides a grid over the plot, according to the axis ticks.

This lets you set the axis ticks font size too:

set(gca, 'fontsize', 12);

Here is a very handy function, that lets you impose the aspect ratio of the different axes:

pbaspect([1.3 1 1]);

This means that the X axis will be 30% longer than the Y axis (and the Z axis is not important for a 2D plot). You can also use:

axis equal : means that a unit on the X axis will span the same size (in centimeters) as a unit on the Y axis. That comes in handy if the X and Y axes have the same units.

axis tight : prevents the plot box from having a white unused border region: the box will be defined by the extreme values of your plotted data.

One more thing: if you happen to work in a display-less environment (we'll talk about this some more later on), you can still use all of these functions, and that's a pretty convenient way of still being able to produce plots. You can also force Matlab to not display a figure, so that generating bunches of plots in a loop is faster:

set(gcf, 'visible', 'off');

Here, 'gcf' means "get current figure", and you should call this function after you created the figure by calling the 'figure' command.
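
Putting these pieces together, a fully scripted plot could look like the sketch below. The data (a made-up bell-shaped spectrum), the labels and the file name 'spectrum.eps' are only placeholders for illustration:

```matlab
f = 0:100;                          % hypothetical frequency axis [Hz]
y = exp(-((f - 40) / 15).^2);       % hypothetical spectrum

fig = figure;                       % create the figure first...
set(fig, 'visible', 'off');         % ...then hide it

plot(f, y);
title('\fontsize{12} Example spectrum');
xlabel('\fontsize{12} Frequency \xi [Hz]');
ylabel('\fontsize{12} Magnitude');

xlim([0 100]);                      % limits of the axis box
ylim([0 1.1]);
set(gca, 'XTick', 0:10:100);        % one tick every 10 Hz
set(gca, 'fontsize', 12);           % tick label font size
grid on;
pbaspect([1.3 1 1]);                % X axis 30% longer than Y axis

print('-depsc', 'spectrum.eps');    % vector output for the LaTeX report
close(fig);
```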

Porting Matlab code to C++

From time to time, you'll need to do this. It's a pain. Really. But less so if you use the right tools. I gave the GSL a try: do NOT use that library. Its support for complex numbers and matrices (or lack thereof) is awful. These are the tools I use and enjoy:

Armadillo, for vector and matrix support: very high quality code with delayed evaluation, incredibly efficient, amazing syntax, motivated developer. I highly recommend it. It integrates with LAPACK and BLAS (and ATLAS and Boost). If you're a Windows user, you might as well not lose too much time trying to compile these libraries (it's horrible, take my word for it), and get these binaries from Victor Liu. You will need a recent GCC compiler for Armadillo to work. The neat thing is: Matlab is built on BLAS and LAPACK too, so you should get the same numerical results, up to small floating-point differences.

FFTW is the FFT computation library of choice, and is also the one that Matlab uses. It integrates just fine with Armadillo, with a little care. This library is pretty sophisticated, so you might want to spend some time reading the documentation.

For the rest, you're on your own. I'm not going to lie to you: it will be painful.

Matlab via SSH and the screen command

If you get access to a computation server via SSH (that's pretty common), you might want to run long simulations on the server, without the need to keep the SSH connection alive. Here is how you can do that:
  1. 'ssh' into your server
  2. execute the 'screen' command (just type 'screen' and hit return)
  3. launch matlab in console mode: 'matlab -nodisplay'
  4. hit 'ctrl+a' then 'd' (this will detach you from the screen)
You're now back to your previous terminal, and Matlab has disappeared. How can you get back to it? Just type 'screen -r' from *any* terminal (on *any* computer!) that is connected to your server via ssh. This will get you back to visualizing what your Matlab program is doing.

This way, you can launch a computation via ssh from your laptop, then shut your laptop down, go back home, get some sleep, come back the next day and just get your results back. If you want to monitor the activity of your process, the 'top' command might be useful (inside or outside the screen program, it doesn't change anything). Screen is a very powerful command, and there is much more to it than just this.
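
To actually get your results back the next day, the script you launch inside screen should write them to disk. Here is a minimal sketch of such a script; the file names, the variable names and the "computation" itself (averaging random numbers) are all placeholders:

```matlab
% Hypothetical long-running job, started from the 'matlab -nodisplay' prompt
nTrials = 1000;
results = zeros(nTrials, 1);

for k = 1:nTrials
    results(k) = mean(rand(100, 1));          % placeholder for the real work

    if mod(k, 100) == 0
        % Progress message, visible when you reattach with 'screen -r'
        fprintf('trial %d / %d done\n', k, nTrials);
        % Checkpoint, so a crash does not lose everything
        save('results_partial.mat', 'results', 'k');
    end
end

save('results.mat', 'results');               % final results, ready to fetch
```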

A word about transpose and hermitian

After four years of intense Matlab use, I was shocked to discover that the " ' " (quote) operator is *not* transpose! It's conjugate-transpose (hermitian). The proper operator for the transpose is " .' " (dot-quote). For real-valued data both give the same result, which is probably why the difference goes unnoticed for so long. I didn't know about that important difference, and found out that many of my friends didn't either. I dearly hope it was just us.
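
A small example with made-up complex values shows the difference:

```matlab
z = [1+2i, 3-4i];        % complex row vector

zh = z';                 % conjugate transpose: [1-2i; 3+4i]
zt = z.';                % plain transpose:     [1+2i; 3-4i]

sameUpToConj = isequal(zh, conj(zt));   % true: the two differ only by conj

x = [1 2 3];             % real data
sameForReal = isequal(x', x.');         % true: no difference for real data
```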

Profiling your Matlab code and memory management

Sure, Matlab is only for prototyping. But most of the time, it's just not worth your time to port Matlab code to C or C++, and you still need to run pretty big computations. So how can you speed things up? Well, my advice would be to use the profiler to spot the slow parts of the code, and write some C-MEX code for these bits. The profiler basically records how many times each line of your code is executed and how much time is spent on it, which lets you find out which bits of code account for most of the execution time. That way you know where to focus your attention to get things going faster.

Here is basically how to use it:

profile clear;
profile on;

% execute your code here

profile off;
profile viewer;

A GUI will pop up and give you plenty of information about your code.
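
If no GUI is available (on a display-less server, for instance), the same data can be read from script with profile('info'). The sketch below profiles a deliberately wasteful placeholder loop and prints the five most expensive functions; which functions show up depends on your Matlab version and on the code being profiled:

```matlab
profile clear;
profile on;

% Placeholder workload: replace with your own code
A = rand(200);
for k = 1:20
    B = inv(A) * A;
end

profile off;

p = profile('info');                          % struct holding the results
[sortedTimes, order] = sort([p.FunctionTable.TotalTime], 'descend');

% Print at most the five functions with the largest total time
for k = order(1:min(5, numel(order)))
    entry = p.FunctionTable(k);
    fprintf('%-40s %8.3f s  (%d calls)\n', ...
            entry.FunctionName, entry.TotalTime, entry.NumCalls);
end
```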

Remember though: optimization is the last step in software development. Early optimization is pretty much an excuse for code obfuscation. Readable, documented, maintainable, straightforward, simple code is what everyone wants. When you have that, profile. The profiler will tell you where it's worth it to do some dirty weird coding.

Also, please note that the 'reshape' command is actually free: no data is copied or moved around in memory. That is because all matrices and arrays are stored as 1D vectors, column after column. Reshaping a matrix really just changes the way you access elements with multiple indices.
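
A tiny example makes the storage order visible: the elements keep their linear order (column after column), only the shape changes.

```matlab
A = 1:6;                   % 1-by-6 row vector

B = reshape(A, 2, 3);      % [1 3 5; 2 4 6]
C = reshape(B, 3, 2);      % [1 4; 2 5; 3 6]

% The underlying 1D sequence is the same in all three arrays
sameOrder = isequal(A(:), B(:), C(:));   % true
```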

An important rule to remember about matlab memory management is this:

When you call a function, every parameter is, by default, passed without copying its data, as if by reference. Only if the parameter is modified inside the function will Matlab make a copy of it, at the moment of the modification (copy-on-write).

This means that it *is* okay to pass a 1 GB matrix as a parameter to a function that you call a billion times. As long as the data in the matrix is just read, and not modified, it will not be duplicated in memory: no overhead.

Example:

function s = getsize(A)
s = size(A);
end

A is only read, so it will not be copied: free.

function s = getsize(A)
A = A + 1;
s = size(A);
end

Here A is modified inside the function, so it will be copied: not free.

If you have doubts about whether your data is being passed by reference or by copy, here is a nice way to find out. Type this at the command prompt:

format debug;

Now, every time you display a variable, Matlab will also display the memory pointer to the variable's data. It's called the 'pr' field. Here is an example:

>> A = rand(3);
>> B = A;
>> A

A =


Structure address = 8fb42e0
m = 3
n = 3
pr = c2d8a90
pi = 0
0.7060 0.2181 0.3709
0.6451 0.7724 0.8909
0.5523 0.2280 0.8564

>> B

B =


Structure address = 8fb7c60
m = 3
n = 3
pr = c2d8a90
pi = 0
0.7060 0.2181 0.3709
0.6451 0.7724 0.8909
0.5523 0.2280 0.8564

As you can see, A and B share the same 'pr' field: A and B point to the *same* memory. Now if we actually modify B, Matlab will copy A's data and save it in a separate location:

>> B = B + 1

B =


Structure address = 8fb7c60
m = 3
n = 3
pr = adbb2e0
pi = 0
1.7060 1.2181 1.3709
1.6451 1.7724 1.8909
1.5523 1.2280 1.8564


Use the same principle with function inputs / outputs to make sure that your data is not being stupidly copied around. But Matlab does a pretty good job, so not to worry. To revert to normal display again, type 'format short'.


That's it for today,

Have fun optimizing the world!

Kirua
