
Performing Real-time Upgrades to an OTP System

Posted by on September 24, 2008

This is the seventh and final article in the Erlang/OTP introduction series. If you haven’t already, I recommend reading the first article, which lays the foundation for the application we’ll be upgrading and teaches you the basics of Erlang/OTP, before jumping into this one. If you’re a quick learner or want to skip straight to this article, you can click here to download a ZIP of all the files up to this point.

The Scenario: ErlyBank has been running strong for a few months now, and based on customer feedback, the bank wants some additional features. First, they want us to implement a credit-based account. This is similar to a normal account, except that withdrawals may take the balance negative, meaning money is owed on the credit account. They also want us to change the ATM so that people can only use it to pay bills with a credit account. And to top it all off, they want these upgrades done without significant downtime.

The Result: We’ll create a credit server to add the credit account system, and then we’ll change the ATM. Luckily for us, once these changes are made, there is a straightforward way to upgrade the system in real time, so ErlyBank won’t experience much, if any, downtime.


Rules of Hot Code Swapping

Posted by on September 02, 2008

According to page 355 in Joe Armstrong’s book, there have been Erlang systems running out there for years with 99.9999999% reliability (that’s nine nines). He also says that correctly written Erlang applications can run forever. Although this sounds a bit exaggerated, I completely believe it to be true, because a well-written Erlang application never needs to be brought down, ever. This is all thanks to a powerful feature known as hot code swapping.

Every programmer has “code swapped” before. The steps are simple: you find a bug in the system, fix it locally, take down the system, upload the new code, and restart the system. Or at least something along those lines.

Now imagine this routine instead: you find a bug in the system, fix it locally, and upload the new code. The system is fixed, important processes are never interrupted, and there is no downtime. This is hot code swapping.

But remember, the first paragraph says well-written applications never have to be taken down. I’ll now show you how to write hot-swappable code in Erlang.

Basic Hot Code Swapping

Let’s assume we have a basic module that runs an echo process: every {echo, Msg} message it receives, it prints to the console, and anything else it ignores. The code is as follows:

-module(hot_swap1).
-compile(export_all).
-define(VERSION, 1).

start() ->
  Pid = spawn_link(?MODULE, runner, []),
  register(?MODULE, Pid),
  Pid.

runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Some dude said: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  end,
  runner().

 

Pretty basic stuff, right? Well, great! We made this awesome echo process (well, I’m sure someone might think it’s awesome). But I have some bad news: after deploying your work of art, with 1,000 people out there using it, you’ve been receiving complaints that the use of “dude” in the echo is insulting to women. Some genius in the business center of your building decides that a better word choice would be “dude/dudette” (genius). So you follow the aforementioned steps: you fix the bug, recompile the code, and upload it, and Erlang should just start using it, right? Unfortunately, since the code wasn’t “well written” to begin with, the running echo process never picks up the new code.

The problem lies in the last line of runner(), the recursive call runner(). A call that isn’t prefixed with the module name is a local call, and a local call always stays in the version of the module the process is already running, not the newest loaded version. A fully qualified call such as hot_swap1:runner() always jumps to the newest loaded version. The fix? Easy: just change the last line to read ?MODULE:runner(). Sometimes there are reasons you would not want to do this, such as when your process stores state whose structure changes in the new code. In that case, it is better to have an “update” message or something similar, where the process can do some cleaning up, convert its state, reopen connections and such before switching to the new code. But I’ve found that most often, it has been fine to update code without such a step.
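The “update” message idea can be sketched roughly like this. It is a hypothetical example, not part of the echo module: the module name hot_swap_state, the update message, the {count, N} state and convert_state/1 are all made up for illustration, and we pretend an earlier version stored a bare integer as its state. Normal messages use local calls, so the process stays on whatever version it is running; only the update message makes fully qualified calls, first converting the old state with the newest convert_state/1 and then continuing in the newest loop/1. This assumes the version already running handles update when you load the next one.

```erlang
-module(hot_swap_state).
-export([start/0, loop/1, convert_state/1]).

%% Hypothetical echo process that also counts echoes. Its state is
%% {count, N}; we pretend an earlier version stored a bare integer N.

start() ->
  Pid = spawn(?MODULE, loop, [{count, 0}]),
  register(?MODULE, Pid),
  Pid.

loop({count, N} = State) ->
  receive
    {echo, Msg} ->
      io:format("Echo #~p: ~p~n", [N + 1, Msg]),
      %% Local call: stay on the version we are running right now.
      loop({count, N + 1});
    {get_count, From} ->
      From ! {count, N},
      loop(State);
    update ->
      %% Fully qualified calls: convert the state with the NEWEST
      %% convert_state/1, then continue in the NEWEST loop/1.
      NewState = ?MODULE:convert_state(State),
      ?MODULE:loop(NewState);
    _ ->
      loop(State)
  end.

%% Converts state left behind by an older version into the current format.
convert_state(N) when is_integer(N) -> {count, N};
convert_state({count, _} = State) -> State.
```

With this design, loading a new .beam changes nothing until you send hot_swap_state ! update, so the switch happens exactly once, at a moment you choose.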

IMPORTANT: Since you’re now calling the function by its fully qualified name, you need to make sure the module exports the function you’re calling. So hot_swap1 needs to export runner/0. (The example above gets this for free from -compile(export_all); if you switch to an explicit export list, remember to include runner/0.) Normally, modules hide their internal receive loops from outside access, but in this case it’s necessary to export it.
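Putting the fix and the export together, a hot-swappable version of the echo module might look like the sketch below. Replacing -compile(export_all) with an explicit export list is my own choice, not something the fix requires, and VERSION is bumped to 2 along with the new “dude/dudette” wording. Note that runner/0 has to be exported regardless, because spawn_link/3 starts the process by calling hot_swap1:runner() as an exported function.

```erlang
-module(hot_swap1).
%% Explicit export list: runner/0 is needed both by spawn_link/3 and by
%% the fully qualified ?MODULE:runner() call at the end of the loop.
-export([start/0, runner/0]).
-define(VERSION, 2).

start() ->
  Pid = spawn_link(?MODULE, runner, []),
  register(?MODULE, Pid),
  Pid.

runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Some dude/dudette said: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  end,
  %% Fully qualified call: always jumps to the newest loaded version.
  ?MODULE:runner().
```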

A couple of notes: First, ?MODULE is a predefined macro that expands to the name of the current module, if you didn’t know. Second, since we changed runner(). to ?MODULE:runner(). after the code had already been deployed, you’ll have to *gasp* restart the processes in order to take advantage of hot code swapping, because processes already running the old version only ever make the local runner() call and so never reach the new code.

The Two Version Rule

An important piece of information when dealing with code swapping in Erlang is that the runtime keeps at most two versions of a module in memory at any given time: the current version and the old one. The consequences of this are shown in the following example.

In the code above, the spawn_link call was done on purpose. Assuming you have copied and pasted this code and are running it now, the process will be linked to your Erlang shell (or whichever process started it). Now change the file: let’s say you change the VERSION define to 2. When you compile it, it will compile fine. Now change VERSION to 3 and compile again. It should still have compiled fine, but you probably got a message like this:

11> c("/Users/mitchellh/Repository/erlang/blog_posts/hot_swap1", [{outdir, "/Users/mitchellh/Repository/erlang/blog_posts/"}]).
** exception exit: killed

 

This is your linked process telling you that it died. But why did it die? When you loaded version 2, version 1 became the old version, but the running process was still executing it. When you loaded version 3, version 2 became the old version, so version 1 had to be purged from memory, and any process still running purged code is killed.
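If you would rather have a reload refuse to run than kill a lingering process, you can check first with code:soft_purge/1, which removes the old version only if no process is still executing it and returns false otherwise. The sketch below is a hypothetical helper (the safe_reload module and reload/1 are names made up for this example); it only calls code:load_file/1 after the soft purge succeeds, and it assumes the new .beam file is already in the code path.

```erlang
-module(safe_reload).
-export([reload/1]).

%% Hypothetical helper: load the newest .beam for Module, but only if
%% doing so cannot kill a process that is still running old code.
reload(Module) ->
  case code:soft_purge(Module) of
    true ->
      %% No process is running the old version (or there is none), so
      %% the current version can safely become the old one.
      code:load_file(Module);
    false ->
      %% Some process still executes the old version; a hard purge
      %% would kill it, so give up instead.
      {error, old_code_in_use}
  end.
```

With the original echo process running, calling safe_reload:reload(hot_swap1) twice should load the new code the first time and return {error, old_code_in_use} the second time, instead of killing the process.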

Getting Around The Two Version Rule

A common problem is that a process can sit in a receive for a very long time, long enough for two new versions to be loaded, which gets the running process killed. Normally this isn’t a problem, because a well-written Erlang application should have supervisors watching its processes. But if a process stores important state and you don’t want it to die during code updates, you should always put a timeout on your receive loop. Example:

runner() ->
  receive
    {echo, Msg} ->
      io:format("v~p Echo: ~p~n", [?VERSION, Msg]);
    _ ->
      ignore
  after 60000 ->
    reload
  end,
  ?MODULE:runner().

 

The value the after clause returns is not important. The important thing is that after, in this case, 60 seconds without a message, the receive falls through and the loop makes its next fully qualified call, moving the process onto the newest version of the code. Of course, if you still load two new versions of the module within 60 seconds, the process will die, but hopefully that isn’t happening! You could always lower the timeout, too. :)
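To check whether the timeout is doing its job, you can ask the runtime which processes are still executing the old version of a module. erlang:check_process_code/2 returns true for a process that is running old code for the given module; the old_code module and lingering/1 function below are hypothetical names for this sketch.

```erlang
-module(old_code).
-export([lingering/1]).

%% Returns the pids of all local processes still executing the OLD
%% version of Module. These are the processes that would be killed if
%% another version were loaded and the old one purged.
lingering(Module) ->
  [Pid || Pid <- erlang:processes(),
          erlang:check_process_code(Pid, Module)].
```

Running old_code:lingering(hot_swap1) right after loading a new version should list an echo process that is still blocked in its receive; with the 60-second timeout in place, it should drop off the list within about a minute.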

Swapping Code in Detached Running Processes

99.9999999% of the time (see, I can use nine nines too, although I have no proof to back this one up), you will be running the Erlang VM detached. Typing “erl” on the command line again just starts a new virtual machine, and compiling your code there will not hot swap it into the running one. You need to actually shell into the running Erlang VM. Luckily, doing this is easy:

Chip ~: erl -sname node2@localhost -remsh node1@localhost

 

The important part is the -remsh flag, which tells the Erlang VM to open a shell on a remote node, in this case node1@localhost. Replace this with the node your code is running on (both nodes need the same cookie). Once you’re in, tell the code server to load the new BEAM file into memory by running code:load_file(hot_swap1), replacing the argument with the name of the module you wish to load. Note that code:load_file/1 refuses to load, returning {error, not_purged}, while an old version of the module is still in memory; in that case purge it first with code:purge(hot_swap1), or use the shell shortcut l(hot_swap1), which does both.
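If you changed several modules at once, you don’t have to load them one by one from the remote shell. The sketch below reloads every loaded module whose .beam file on disk differs from the version in memory; reload_modified is a hypothetical module name, and code:modified_modules/0 requires OTP 20 or later. It uses c:l/1, the function behind the shell’s l/1 shortcut, so like l/1 it purges each module’s old version first, killing any process still running that old version.

```erlang
-module(reload_modified).
-export([all/0]).

%% Reloads every loaded module whose object code on disk has changed
%% since it was loaded. c:l/1 purges the old version of each module
%% (killing any process still executing it) and then loads the new one.
%% Returns a list of {Module, LoadResult} pairs.
all() ->
  [{Module, c:l(Module)} || Module <- code:modified_modules()].
```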

Now your already-running echo process will use the newly loaded code the next time it loops through ?MODULE:runner()! :D

Erlang/OTP Code Upgrades

Erlang/OTP applications provide their own way to hot swap code, but that’s beyond the scope of this article. I’ll be sure to cover it, or find a link to an article that does, in the future, I promise.

Hopefully this clears up some of the mystery surrounding Erlang hot code swapping. Once you learn all its little quirks, it’s a great thing to have, since you really never have to take down the system. As always, leave any questions or comments below and I’ll promptly reply. Also, if you find any problems with this article (I am still learning too!), please comment and I’ll fix them right away. Thanks!