Rules of Hot Code Swapping

Posted by on September 02, 2008

According to page 355 in Joe Armstrong’s book, there have been Erlang systems running out there for years with 99.9999999% reliability (that’s nine nines). He also says that correctly written Erlang applications can run forever. Although this sounds a bit exaggerated, I completely believe it to be true, because a well-written Erlang application never needs to be brought down, ever. This is all thanks to a powerful feature known as hot code swapping.

Every programmer has “code swapped” before. The steps are simple: You find a bug in the system, you fix it locally, you take down the system, upload the new code, restart the system. Or at least something along those lines.

Now imagine this routine to fix a bug: You find a bug in the system, you fix it locally, you upload the new code. The system is fixed, important processes are never interrupted, there is no downtime. This is hot code swapping.

But remember, the first paragraph says well written applications never have to be taken down. I’ll now show you how to write hot-swappable code in Erlang.

Basic Hot Code Swapping

Let’s assume we have a basic module that implements an echo process. Every message it receives, it echoes to the console. The code is as follows:

-module(hot_swap1).
-compile(export_all).
-define(VERSION, 1).

start() ->
  Pid = spawn_link(?MODULE, runner, []),
  register(?MODULE, Pid),
  Pid.

runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Some dude said: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  end,
  runner().

 

Pretty basic stuff, right? Well great! We made this awesome echo process thing (Well, I’m sure someone might think it’s awesome). But I have some bad news: After deploying your work of art, while 1000 people are out there using it, you’ve been receiving complaints that the use of “dude” in the echo is insulting to women. Some genius in the business center of your building decides that a better word choice would be “dude/dudette” (genius). So you follow the aforementioned steps of fixing the bug, you recompile the code, upload it, and Erlang should just start using it, right? Unfortunately, since the code wasn’t “well written” to begin with, the running process doesn’t use the new code.

The problem lies in the last line of the app, the one which says runner(). If you don’t prefix a function call with the module the function resides in, it’s actually saying “use the version of this function that I’m already running” and not the newest loaded version. The fix? Easy: Just change the last line to read ?MODULE:runner(). Sometimes there are reasons you would not want to do this, such as when your process stores state whose structure changes in the new code. In this case, it would be better to have an “update” code message or something similar, where the process can do some cleaning up, reopen connections and such before moving to the new code. But I’ve found that most often, it has been fine to update code without such a step.
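For the stateful case, one possible shape for such an “update” message is sketched below. The module name hot_swap_state, the upgrade message, and the upgrade/1 function are all made up for illustration; the only real mechanism being used is that a local call stays on the current version while a fully qualified call jumps to the newest one.

```erlang
-module(hot_swap_state).
-export([start/0, loop/1, upgrade/1]).

%% Start a counter process; the count is the state we want to keep.
start() ->
  spawn(?MODULE, loop, [0]).

loop(Count) ->
  receive
    {echo, Msg} ->
      io:format("#~p: ~p~n", [Count + 1, Msg]),
      loop(Count + 1);            % local call: stay on this version
    {count, From} ->
      From ! {count, Count},
      loop(Count);
    upgrade ->
      %% Fully qualified call: switch to the newest loaded version and
      %% let it convert the old state before resuming.
      ?MODULE:upgrade(Count)
  end.

%% Hypothetical hook: convert state from the previous version's format.
%% Here the format is unchanged, so the state passes straight through.
upgrade(OldCount) ->
  loop(OldCount).
```

With this layout the process only changes versions when it is told to, by sending it the upgrade message after the new code is loaded.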

IMPORTANT: Since you’re now calling the function with its fully qualified name, you need to make sure the module exports the function you’re calling. So “hot_swap1” needs to export “runner/0.” (The example above only gets away with it because of -compile(export_all).) Normally, modules hide their internal receive loops from outside access, but in this case, it’s necessary to export it.
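Putting the fully qualified call and an explicit export together, a corrected module might look like the sketch below. The name hot_swap2 is only a placeholder for illustration, and the echo text is the “dude/dudette” wording from the story above.

```erlang
-module(hot_swap2).
%% runner/0 is exported because both spawn_link/3 and the fully
%% qualified ?MODULE:runner() call require an exported function.
-export([start/0, runner/0]).
-define(VERSION, 1).

%% Spawn the echo process and register it under the module name.
start() ->
  Pid = spawn_link(?MODULE, runner, []),
  register(?MODULE, Pid),
  Pid.

%% Echo loop. The fully qualified tail call picks up the newest loaded
%% version of this module on every iteration.
runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Some dude/dudette said: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  end,
  ?MODULE:runner().
```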

A couple of notes: First, ?MODULE refers to the name of the current module, if you didn’t know. Second, since we changed runner(). to ?MODULE:runner(). after the code had already been deployed, you’ll have to *gasp* restart the processes in order to take advantage of the hot code swapping capabilities.

The Two Version Rule

An important piece of information to know when dealing with code swapping in Erlang is that it only allows two versions of a module to be stored in memory at any given time. The consequences of this are shown in the following example.

In the code above, the spawn_link call was done on purpose. Assuming you have copied and pasted this code and are running it now, the process will be linked to your Erlang shell (or whatever process started it). Now change the file; let’s say you change the VERSION define to “2.” When you compile it, it will compile fine. Now change the VERSION to “3” and compile again. It should still have compiled fine, but you probably got a message like this:

11> c("/Users/mitchellh/Repository/erlang/blog_posts/hot_swap1", [{outdir, "/Users/mitchellh/Repository/erlang/blog_posts/"}]).
** exception exit: killed

 

This is your linked process telling you that it died. But why did it die? Since you compiled and loaded the module twice, the first version, which the running process was still sitting on, was bumped out of memory, and any processes still running that version were killed.
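If you would rather have a load refuse to go ahead than kill lingering processes, a minimal sketch using code:soft_purge/1 is shown below. The module name swap_check, the function safe_load/1, and the old_code_in_use atom are invented for this example; code:soft_purge/1 and code:load_file/1 are the standard code server calls.

```erlang
-module(swap_check).
-export([safe_load/1]).

%% Load a new version of Module only if no process is still running
%% the old version. soft_purge/1 removes old code and returns true
%% when nobody uses it, and returns false (killing nothing) otherwise.
safe_load(Module) ->
  case code:soft_purge(Module) of
    true  -> code:load_file(Module);      % {module, Module} | {error, Reason}
    false -> {error, old_code_in_use}     % hypothetical error atom
  end.
```

By contrast, code:purge/1 removes the old code unconditionally and kills any process still running it, which is what produced the exit message above.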

Getting Around The Two Version Rule

A common problem is that a process can sometimes sit in a receive loop for a very long time, long enough for two versions to pass by, causing the running process to be killed. Normally this isn’t a problem because a well-written Erlang application should have supervisors watching its processes. But if a process is storing important state information and you don’t want the process to die during code updates, then you should always put a timeout on your receive loop. Example:

runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Echo: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  after 60000 ->
    reload
  end,
  ?MODULE:runner().

 

The value the timeout clause returns is not important. The important thing is that after, in this case, 60 seconds, the receive gives up and the loop runs the next iteration of the function, which lands in the newest loaded version. Of course, if you still compile two new versions of the module within 60 seconds, the process will die, but hopefully that isn’t happening! You could always lower the timeout, too. :)

Swapping Code in Detached Running Processes

99.9999999% of the time (see, I can use nine nines too, although I have no proof to back this one) you will be running the Erlang VM detached. If you type “erl” on the command line again, this will just create a new virtual machine, and compiling your code there will not hot swap it in. You need to actually shell into the running Erlang VM. Luckily doing this is easy:

Chip ~: erl -sname node2@localhost -remsh node1@localhost

 

The important part is the “remsh” flag which tells the Erlang VM to connect to a remote shell. In this case the shell is at node1@localhost. You should replace this with the node your running code is on. Once you’re in there, you need to tell the code server to load the new BEAM file into memory by running the command code:load_file(hot_swap1). You should change the argument to the name of the module which you wish to load.
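As a rough sketch, a session in the remote shell could look like the lines below. It assumes the echo process was started from a version that loops with ?MODULE:runner() and no timeout, and that a recompiled hot_swap1.beam with VERSION bumped to 2 is already in the node’s code path.

```erlang
%% Typed into the remote shell attached to node1@localhost.
code:load_file(hot_swap1).      % returns {module,hot_swap1} on success
hot_swap1 ! {echo, "first"}.    % still handled by the old code (prints v1)
hot_swap1 ! {echo, "second"}.   % handled by the new code (prints v2)
```

The first message is still answered by the old version because the process was already waiting inside the old receive; it only reaches the new code through the fully qualified call at the end of that iteration. If code:load_file/1 returns {error,not_purged}, an older version is still loaded; the shell shortcut l(hot_swap1) purges it first and then loads, killing any process still running that older version.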

Now your already running echo process will use the newly loaded code! :D

Erlang/OTP Code Upgrades

Erlang/OTP applications provide a way to hot swap code, but that’s beyond the scope of this article. I’ll be sure to cover it, or find a link to an article which does cover it, in the future, I promise.

Hopefully this clears up some of the mystery surrounding Erlang hot code swapping. Once you learn all its little quirks, it’s a great thing to have, since you really never have to take down the system. As always, leave any questions/comments below and I’ll promptly reply. Also, if you find any problems with this article (I am still learning too!), then please comment and I’ll fix it right away. Thanks!

Comments

  1. Damir Sep 02, 2008 11:02

    Nice writeup. I must say, Erlang is closer and closer on my horizon…

  2. Tim Sep 02, 2008 18:08

    This nicely describes code hot-swap, but perhaps it would be a good idea to also include a short screencast showing it in action.

    I think non-stop operation is going to become increasingly important once applications go massively parallel on massively multicore CPUs. In the development environments at work I see enterprise systems constantly having to be rebooted to install changes. It is disruptive and leads to measurable downtime, and this is on relatively non-parallel systems. In a few years where applications might have many thousands of threads (sorry, processes) rebooting an entire system may become a highly disruptive event.

    Keep up the good work. I’m also learning and I’m finding this blog makes a nice complement to reading Joe’s book and the online documentation.

  3. Mitchell Sep 02, 2008 21:47

    Tim, I definitely plan on creating screencasts for Erlang. I just want to take things one step at a time and see if people are interested in seeing such things. :) But judging by the number of hits this site is getting I don’t see why not.

    A code swapping screencast definitely sounds good. Maybe… we’ll see ;)

    I agree with you that non-stop operations are becoming increasingly important. “Downtime” can and will become a thing of the past!

  4. Bernard Sep 03, 2008 01:18

    I have been interested in Erlang for some years now, and I don’t want to knock the beauty of hot code swapping. However, I just want to point out that there are some environments where one’s been able to do this for years, and these are environments that many people would mock as being toys as far as true development comes to mind. The first of these that comes to mind is Zope - almost every part of a Zope application can be altered without needing to restart the Zope server.

    And here’s the one where you would not credit this … Lotus Notes. Notes applications (served up either through the Notes client or through a browser) consist mostly of pages (or pages made up of page-parts). The widgets on these pages are programmable modules (programmed in a functional language). Changes can be made and there is no re-compilation stage to the application, no need to re-start the server, and no need to visit every server and client to deliver the changes (the changes replicate with the data) - the whole purpose of Notes applications is that they should be running in distributed environments.

    Notes has been around for almost 25 years, the version 8.5 released this year can still run the applications developed on versions back as far as 3 (no-one even tests the earlier apps that were designed to run on DOS, but they would probably still run too). It’s deployed on hundreds of millions of desktops, and the functional language at its core was designed to be used by non-programmers. It would be no surprise to me to learn that in some companies there are still distributed core-business applications running on multiprocessor servers, applications that were written 15 years ago by some secretaries who got curious and started writing their own functions to make life less tedious. When I first started working with Notes over a decade ago, I did meet users who were interested enough to open the ‘Designer Help’ and start coding their own applications.

    Of course, I’m not claiming that a Notes application could scale anything like an Erlang application, but I just wanted to point out that what seems so amazing with Erlang has been around for a long time in other environments too.

  5. Martin Sep 03, 2008 07:01

    Well, interesting, but when I mentioned this hot swap capability to my brother, a pro software developer, he just shrugged and said: you can do it in C# or other languages as well.
    He said that basically he has been using a master process to invoke DLLs or processes as needed; when he had to swap a DLL for a new one, he would stop the process and relaunch using the new DLL through the master process.

    You can do hot swapping in other languages as well. It just needs clever planning in advance.

    Not being a pro myself and seeing that you can achieve the same results in other languages - what is the whole fuss about?

  6. Mitchell Sep 03, 2008 07:46

    You’re both correct, this feature is not “new” or innovative really and I don’t credit Erlang for it. Any programming language which can have processes can usually do some sort of hot code swapping. Since Erlang is oriented around a great number of processes, I was merely pointing out how to do this with Erlang, since even beginners should know how to do this task. :)

    I’m also not too familiar with C# (not on a professional level), but I am very familiar with C, and it’s just the same, but it’s not the same. Sure, I can tell the processes to restart with new code, but it usually won’t happen automatically as a feature of the language, and so I’d have to program the logic in from the get-go.

    And although in Erlang I do stress that you have to write the application “well” before hot code swapping can work, it’s not difficult and you should be writing applications like this anyway, regardless of whether you plan on hot swapping the system or not. So there are really no extra steps to getting this hot swapping capability, as there are with C#.

    I’m not bashing any other language, and I don’t want to glorify Erlang. This is an Erlang blog so I’m supporting it, and I’m sorry if my article evoked the thought that this feature is brand new and Erlang is amazing because of it, because that was not my intention.

    And specifically to Bernard: That is very interesting! I’ve never used Lotus Notes, though I’m quite aware of it, and I didn’t know these details about it. That is awesome! :D

  7. Martin Sep 03, 2008 08:09

    Hi Mitchell

    I am an IT enthusiast and I have seen a lot of articles praising Erlang. As I am not too technical, I first thought: wow, that hot swap thing is really cool! (supported natively by a language)
    But then when I spoke to someone who has designed some rather cool software, I got the above response - it is nothing new.

    No dissing of your article intended!

    I should read more on Erlang, as I’d really like to know more about what would give this language a competitive winning edge. (as some of its concepts can be recreated within other languages)

  8. Mitchell Sep 03, 2008 08:13

    Martin,

    I would say most of its concepts can be recreated in other languages but the whole point is that they were built in to Erlang from the start as a foundation to the language. Erlang was built to be concurrent, fault tolerant, etc. etc. (you’ve probably heard it before ;))

    So although the features can be “recreated,” Erlang makes it very easy to make these otherwise complex systems. And with this ease of use, you don’t have to sacrifice the ability to upgrade your systems seamlessly. It’s all part of the language.

    Erlang is by no means a panacea for all computing problems; it has plenty of its own. And it will not replace other languages. It’s just its own language, in its own little niche of the programming market :) (Though others are closing in!)

  9. Ulf Wiger Sep 04, 2008 04:15

    To mention another program that has been able to do hot code swapping for ages: emacs! It’s true that dynamically replacing code is not new. If you read Armstrong’s History of Erlang (http://portal.acm.org/citation.cfm?id=1238844.1238850), you will find that hot code swapping was one of the input requirements to what later became Erlang already in 1986, since the Ericsson AXE phone switches that came out in 1974 had this capability. At Ericsson, we also perform hot code swapping on C++ and Java applications.

    Now to the “so what?” In Erlang, the requirement to change code in a running system was a primary input requirement, and influenced other factors of the language design. So it was with Java too, but while Java also had the requirement to appeal to C++ programmers, it stuck with the shared memory model, whereas Erlang departed from it because it makes fault tolerance and in-service code change more difficult (as well as breaking symmetry between local and distributed processing).

    Erlang was also influenced by ML. ML relies on strong static typing, which is problematic from a hot code loading perspective (not unsolveable, but more difficult).

    The combination of share-nothing concurrency, immutable data, dynamic typing and hot code loading is what makes it interesting. Add to this the OTP “behaviours” that provide hot swapping support “for free”, and you have an environment where designers can, and habitually do, change code and hot-load it into a running system as an integral part of the edit-compile-test cycle. I once realised that while prototyping a server application, I’d been doing fairly massive changes back and forth, but hadn’t restarted the server once in three months. It’s not a world record by any means, but it’s still pretty cool.

    One place where we can witness the difference is in “bakeoffs” or “interop events”, where different manufacturers meet to try to get their systems working together. It’s quite common that the guys with Erlang-based systems end up making their own code bug compatible with the C++ and Java-based systems, because debugging, modifying and updating the code base is that much easier. In interops, all tricks are allowed, because it’s a learning experience, but what experience has taught us, is that while hot code loading can be done in these other systems as well, the practical difference in the field is quite significant.

    Personally, I’ve experienced even slicker environments. When programming in Omnis 7 (a database 4GL), editor and debugger were fully integrated, and one could edit code while debugging it, stepping through the code, inspecting variables, and if something went wrong, back up the “go point”, edit the code or variables, and then step through the code again, until you got it right. That was 15 years ago…

  10. Tim Sep 04, 2008 08:57

    Above, Martin says “… he would stop the process and relaunch using a new dll through the master process.”

    That’s where there is one difference : as soon as you stop the process you are vulnerable to failure. Even if you restart it as quickly as possible you still have a window of opportunity for a request to attempt to hit your service, at which point you have a failure. In systems that mandate fault tolerance that could be problematic. There might be work-arounds to this, it will depend on exactly what your program is doing.

    In Erlang, you never actually stop it, so you have 0% downtime. Sure, you can do this in other languages, it’s just it’s a lot harder to do if the language doesn’t have hot-swap as a core design principle : you end up doing a lot of coding to recreate something that is built-in to Erlang.

    Aside from a lack of downtime, another important distinction is that when you redeploy in Erlang you automatically maintain state. If you’re doing it in a language that doesn’t have native hot-swap then I guess you have to explicitly code for it, serializing your variables etc., and deserializing them when you start the new process… or are there any libraries for Java, C++ etc. that can make hot-swap work automagically?

  11. [...] Mitchell Hashimoto on the basics of hot code swapping: http://spawnlink.com/articles/rules-of-hot-code-swapping/ [...]

  12. Christopher Oliver Sep 12, 2008 05:58

    Ulf Wiger wrote:
    When programming in Omnis 7 (a database 4GL), editor and debugger were fully
    integrated, and one could edit code while debugging it… That was 15 years ago…

    One could go back even further with Smalltalk-80 and the lisp machines to see the
    same sort of thing as well as hot swap. I think Erlang’s claim to the stage rests more
    on a lightweight CSP model with a syntax that supports CSP cleanly than on hot
    swapping.

  13. Zamous Sep 14, 2008 22:01

    Any chance on covering this for an OTP application?

    Thanks!

  14. Mitchell Sep 14, 2008 22:16

    Zamous,

    The final article of my OTP introduction series covers this in detail. Should be published at the end of this week, or around there. ;)

  15. John Bender Dec 15, 2008 07:43

    The amount of awesome material you have on your site here is pretty amazing. I look forward to seeing the screencasts, as that’s one way to help get the Erlang ball rolling for a greater audience.

  16. Mike Mar 16, 2009 09:05
  17. [...] distribute and run on all your nodes. Now you can finally take advantage of release handling with hot code swapping. In an upcoming article, I’ll cover how to deploy release upgrades using reltools and fab. [...]
