Sunday, March 8, 2015

Foundation and (JavaScript) Empire
“‘To succeed, planning alone is insufficient. One must improvise as well.’ I’ll improvise.”
Isaac Asimov,
Foundation

Foundation
Joyent will join forces with IBM, PayPal, Microsoft, Fidelity and The Linux Foundation to establish the Node.js Foundation. This was announced on February 10th.
In the open source world foundations are aplenty, from Apache to The Document Foundation, including The Linux Foundation, one of the founders here. This one, formed around Node.js, gathered some respectable (and, hmmm, wealthy) companies. So, it is hardly likely that these companies pooled their meager means to scrape together some money for dinner across the street. These guys already have money. Lots (loads?) of it. I'm guessing that they are betting on Node as the tool for future development of the web, and are ready to put their money where their mouth is.

And now a few words from sponsors:

IBM

"Open technologies and Enterprise class development are ingrained into IBM's DNA. Node.js is an important technology for our clients looking to leverage Cloud, Data and Mobile applications," said Angel Diaz, Vice President, IBM Cloud Architecture and Technology. "Establishing transparent and open governance within the new Node.js Foundation is significant because it demonstrates the industry's focus on accelerating innovation in JavaScript and ensuring open architectures for the years to come."

PayPal

"Since Node.js' inception over five years ago, I've watched the community grow to become an incredibly rich, active ecosystem of contributors," said Bill Scott, VP Business Engineering, Product Development at PayPal. "Openness is what Node.js is built on and the formation of the Foundation cements the importance of open governance, transparency and synergy."

Microsoft

"We know firsthand why Node.js is a popular choice for thousands of organizations worldwide," said Gianugo Rabellino, Senior Director of Open Source Communities at Microsoft Open Technologies, Inc. "Forming an independent foundation of such passionate contributors and users to guide Node.js as its growth continues validates the project's maturity and sets an open stage for more success to come."

Fidelity

"Establishing an independent foundation for the Node.js platform is an important step for software that is mission critical and growing in adoption across the enterprise," said Travell Perkins, CTO, Fidelity. "To support the engaged and ever-growing Node.js community, Fidelity is taking an active role in the foundation. The new governance model allows our business stakeholders and technology partners to adopt the Node.js platform with increased confidence, it prioritizes thoughtful evolution and long term sustainability."

Second Foundation
The good thing with open source is that it is, well, open source. Anyone who feels that his/her favorite project is in jeopardy of some kind is free to fork it and create a new one. That's exactly what happened to Node.js. A group of developers, dissatisfied with the project's pace, created io.js. This is the short version of the story. You don't want to hear the long version. However, as bad as it is that Node.js development has dwindled (or lost momentum, to say the least) lately, it's good that the community is alive and kicking, even at the price of forking the original project. So, there you are, a possibility for the Second Foundation. If it worked for Hari Seldon, it might work for Node.js.

Conclusion
Joyent has run Node.js development ever since they acquired it from its original creator, Ryan Dahl. In the meantime things went well and then not so well, to the point where a number of people asked if it's still alive. Then Ryan Dahl left the project and then it was forked. This Foundation could do a lot of things right for Node. There is speculation that this might even get io.js “back home”.

Those who like Node.js certainly hope that too many cooks won't spoil the broth.

P.S.
Joyent is offering public cloud, packed with Node.js tools and support, free of charge for new customers for the period of one year. Worth trying.

Sunday, May 18, 2014

Island(s) in the (Non-blocking I/O) Stream
As a great aficionado of Ernest Hemingway and, you guessed it, Dolly Parton, I thought that the title would fit the post about non-blocking I/O servers based on Java. While poking around on the Internet doing research for this article, I realized that Apache MINA is not the only one in the category of non-blocking I/O servers written in Java. There are also Netty and Grizzly. So, only in this installment of this blog, you get triple the number of servers for the price of one. No coupons needed.

Java
Besides being an exotic island, it's also the name of a programming language. In good old times there were hardly any pictures in text books, so children had no problem remembering names like Fortran, COBOL, or even better, C. C++ is also an easy one: “C” doubleplusgood. Post 1984 generations, however, started naming programming languages after snakes, islands and gems, to mention a few.
Java is an interpreted language, therefore it is slow. This is common knowledge; still, regardless of the fact, it is widely and wildly used. Even more, it enjoys an almost cult status, which it attained early in its life, of a posh “modern, object-oriented programming language”. Programmers are only human, at least most of them are, and the word “modern” followed by “object-oriented” is enough to create a huge following. To be fair, Java is the first decent programming language that was truly portable, in development as well as in deployment, with a single VM (run-time interpreter) from a single company. All this makes me want to use it more, but then I hit the obstacle that it is slow, which takes me back to square one.
There's more. If you decide to start a Java project you have a number of choices to make. For GUI you can use AWT, Swing or SWT. If you are planning a web application you can use JavaFX, JSP or JavaServer Faces. If you are using JSP you can opt for Java Servlets, in which case your choice of web servers narrows to just four: GlassFish, IBM WebSphere, Jetty and Apache Tomcat. And if you want to reuse some of your code, there are always JavaBeans, “reusable software components for Java”, which are by no means to be confused with Enterprise JavaBeans, “a managed, server-side component architecture for modular construction of enterprise applications” (whatever this means). Well, um, this is kinda complicated, don't you think? Me, I like to keep things simple. This approach does wonders for my blood pressure level. I'm not trying to say that abundance of choice is a bad thing per se, but there are a number of theories which claim that more choices make it harder to reach the right decision. Sometimes the Java community looks to me very much like the Rebel Alliance, which tries to keep things as complicated as possible in order to prevent the evil Galactic Empire from finding out how Java really works. The downside is that only Jedi programmers can understand it. The rest of us are left clueless most of the time.
To add insult to injury, the JDK has forked recently and, in addition to the official Oracle (Sun) version, there is also OpenJDK. This is all very fine, but the problem is that some IDEs like Eclipse and NetBeans would not work with OpenJDK. No explanation given. Just like that.

Apache MINA
The Apache Foundation maintains a museum of unfinished and/or abandoned projects. In addition, they apparently suffer from NIH (Not Invented Here) psychology. When you peek at their site this proud statement stands out:

The ASF is made up of over 100 top level projects that cover a wide range of technologies. Chances are if you are looking for a rewarding experience in Open Source, you are going to find it here.

I mean, not even Microsoft has “more than 100 top level projects”. Just imagine what it would look like if they had to develop Windows, Office and another 99 projects. Things are bad enough even as they are. I am no fan of Microsoft, but if they can't do it - no one can. Unless you are prepared to treat some of your 100+ children as stepchildren. And no one wants to run into one of Apache's stepchildren in the course of planning his/her new project.
Back to MINA. When you first visit http://mina.apache.org/ you notice that you are on the Apache MINA Project page, and that MINA is a sub-project of the Apache MINA Project (?!) along with a few others (one of which is “currently dormant”). Only when you click on the menu option MINA are you transferred to the MINA homepage. Anyone else confused, besides me?
My next step was to see the documentation. And I found it rather strange. Some parts are well written. On the other hand, chapters are missing, links are missing, and there are a lot of presentations whose quality is very uneven. The documentation as a whole is a hotchpotch and I found it very confusing. It's one of those sites where you have to bookmark an interesting page or risk never finding it again.

Netty
Actually, Netty is the result of trying to rebuild MINA from scratch and address its known problems, resulting in a cleaner and better documented API. The documentation, although not as extensive as MINA's, is easier to navigate and easier to understand. The site is intuitive and usable.
To be fair I have to say that MINA has more out-of-the-box features. Unfortunately this makes it more complex and a bit slower than Netty. Some sources on the Internet say that Netty can be up to 10% faster than MINA. This, of course, is not a rule and depends a lot on what you do and how you do it. Having ready made features, on the other hand, can speed up your development.

Grizzly
MINA and Netty are community driven open source projects designed and written by the same author, Trustin (Heuiseung) Lee. Therefore, they share a lot of ideas and functionality. Grizzly is an entirely different beast, as the name suggests. It was created in 2004 under the GlassFish project. Initially it was built as an HTTP web server, replacing Tomcat's Coyote Connector and Sun WebServer 6. This became known as Grizzly 1.0 and it used to be shipped as a replacement for Sun WebServer. This version became very popular in 2006. That same year Sun started developing Grizzly 1.5. It was open sourced in February 2007 and officially released during the 2007 JavaOne conference. Having all this in mind, I find it rather curious that MINA and Netty are more popular than Grizzly.

Installation
Each of these frameworks is actually a collection of jar files. The only thing you have to do is to download them and place them in the folder of your choice. No installation whatsoever. And, if you happen to have a Java VM already installed, you have everything you need to venture into Java technology based NIO development. Just plug and play.
Popular graphical tools such as Eclipse IDE, IntelliJ IDEA and NetBeans can be used for development. And it's no hassle. All you have to do is copy the jar files you downloaded to whatever place your favorite IDE tells you to, and you're ready to go.

(Not So) Simple Samples
Now, if you expected to see samples in 10 or fewer lines of code, like in the case of Python or Node.js, well, you can forget about it. This is Java we are talking about. Imports and declarations can be longer than that. The simplest example I could find was, surprise surprise, for Apache MINA:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.Charset;

import org.apache.mina.core.service.IoAcceptor;
import org.apache.mina.core.session.IdleStatus;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.filter.logging.LoggingFilter;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class MinaTimeServer
{
    private static final int PORT = 9123;
    public static void main( String[] args ) throws IOException
    {
        IoAcceptor acceptor = new NioSocketAcceptor();
        acceptor.getFilterChain().addLast( "logger", new LoggingFilter() );
        acceptor.getFilterChain().addLast( "codec", new ProtocolCodecFilter(
            new TextLineCodecFactory( Charset.forName( "UTF-8" ))));
        acceptor.setHandler( new TimeServerHandler() );
        acceptor.getSessionConfig().setReadBufferSize( 2048 );
        acceptor.getSessionConfig().setIdleTime( IdleStatus.BOTH_IDLE, 10 );
        acceptor.bind( new InetSocketAddress(PORT) );
    }
}
That's not so much, so what was the whining for? Well, you also need the code for the TimeServerHandler class:
import java.util.Date;

import org.apache.mina.core.session.IdleStatus;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;

public class TimeServerHandler extends IoHandlerAdapter
{
    @Override
    public void exceptionCaught( IoSession session, Throwable cause ) throws Exception
    {
        cause.printStackTrace();
    }
    @Override
    public void messageReceived( IoSession session, Object message ) throws Exception
    {
        String str = message.toString();
        if( str.trim().equalsIgnoreCase("quit") ) {
            session.close();
            return;
        }
        Date date = new Date();
        session.write( date.toString() );
        System.out.println("Message written...");
    }
    @Override
    public void sessionIdle( IoSession session, IdleStatus status ) throws Exception
    {
        System.out.println( "IDLE " + session.getIdleCount( status ));
    }
}
If you really, really feel the urge to test this example then type the following at command prompt:
telnet 127.0.0.1 9123
then type any line, and the server will answer with the current date and time (typing quit closes the session). A lot of functionality for so little code. Right?
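For comparison, here is roughly the same line-based time server sketched with nothing but Python's standard library. Mind you, this is not an apples-to-apples match: socketserver is a plain blocking, thread-per-connection affair and not NIO, and the names TimeHandler, start_time_server and ask_time are made up for this illustration. The little client at the bottom plays the role of telnet.

```python
import socket
import socketserver
import threading
from datetime import datetime


class TimeHandler(socketserver.StreamRequestHandler):
    """Answers every received line with the current date and time."""

    def handle(self):
        for raw in self.rfile:                  # one iteration per received line
            if raw.strip().lower() == b"quit":  # same exit word as the MINA sample
                break
            reply = datetime.now().ctime() + "\n"
            self.wfile.write(reply.encode("utf-8"))


def start_time_server(port=0):
    """Start the server on a background thread; port 0 lets the OS pick one."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), TimeHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_time(port):
    """Play the telnet user: send one line, read one line, then say quit."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        with conn.makefile("rwb") as stream:
            stream.write(b"what time is it?\n")
            stream.flush()
            answer = stream.readline().decode("utf-8").strip()
            stream.write(b"quit\n")
            stream.flush()
            return answer


server = start_time_server()
print(ask_time(server.server_address[1]))
server.shutdown()
server.server_close()
```

To poke at it with telnet instead, call start_time_server(9123) and keep the script running.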


The smallest more or less self-contained example is this one from the Netty documentation (it still needs a DiscardServerHandler class, which is not shown here):
package io.netty.example.discard;

import io.netty.bootstrap.ServerBootstrap;

import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

/**
 * Discards any incoming data.
 */
public class DiscardServer {

    private int port;

    public DiscardServer(int port) {
        this.port = port;
    }

    public void run() throws Exception {
        EventLoopGroup bossGroup = new NioEventLoopGroup(); // (1)
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap(); // (2)
            b.group(bossGroup, workerGroup)
             .channel(NioServerSocketChannel.class) // (3)
             .childHandler(new ChannelInitializer<SocketChannel>() { // (4)
                 @Override
                 public void initChannel(SocketChannel ch) throws Exception {
                     ch.pipeline().addLast(new DiscardServerHandler());
                 }
             })
             .option(ChannelOption.SO_BACKLOG, 128)          // (5)
             .childOption(ChannelOption.SO_KEEPALIVE, true); // (6)

            // Bind and start to accept incoming connections.
            ChannelFuture f = b.bind(port).sync(); // (7)

            // Wait until the server socket is closed.
            // In this example, this does not happen, but you can do that to
            // gracefully shut down your server.
            f.channel().closeFuture().sync();
        } finally {
            workerGroup.shutdownGracefully();
            bossGroup.shutdownGracefully();
        }
    }

    public static void main(String[] args) throws Exception {
        int port;
        if (args.length > 0) {
            port = Integer.parseInt(args[0]);
        } else {
            port = 8080;
        }
        new DiscardServer(port).run();
    }
}
One more example like this and this post would grow to 20 pages, so I'll stop right here. The moral is obvious. If the most basic servers take so many lines of code, how many lines of code are required for an application that really does something? It all translates to man-hours (um, person-hours to be politically correct) and that translates to cost.

Conclusion
One can but acknowledge the impact Java made and is still making on the computer industry. Especially if you take Android into consideration. But (however), from the narrow perspective of a wretched programmer, whose job is to create a web application that is fast and easy to maintain, it's hard to see Java and frameworks based on it as God-given.
On the other hand, if anything, Java is platform independent. It'll soon be on a refrigerator near you and it will talk to your smart phone so you'll know what to buy on your way home. And, if you allow it, your telly is going to suggest to your fridge which brands to buy. Still, I see Java as more appropriate for client-server than for web applications. And all that is far from a thin client that can be written in JavaScript. Raise your hands all of you in favor of JavaScript. Thank you.
Here we reviewed an Apache project, a spin-off by the same author and an Oracle project. All very serious contestants. In essence, they are all pretty much alike. Some of them pack more ready-made features, and others make you sweat a little bit. On the Net you will find different opinions on which is better, faster or easier to work with. You'll favor one or the other based on what kind of programmer (person) you are and/or the type of the project you are planning. As usual.
If we were to judge the respective web sites of the contestants, the points would go to Netty and Grizzly. They both have fairly similar front pages featuring block diagrams of their architecture. Both sites are clean and intuitive. The similarity stops here. Grizzly has a much more elaborate presentation. The documentation is larger, more detailed, better organized and with more examples. Albeit a bit more complicated. Netty's site is simpler, the documentation is more basic, but much easier to understand for a newbie. Apache MINA's site? No comment.
Forced to choose, I would most probably go with Grizzly. I am not usually in favor of corporate America's darlings, but in this case, I'd go the safe way. Mostly because of the team behind this project and the possibility of getting meaningful support.

P.S.
I almost forgot. There is yet another Apache Java NIO framework project. It is called Apache Deft. It's still in the incubator. What? Where?

Sunday, November 17, 2013

A Snake in the Whirlwind
Non-blocking I/O web servers are kind of a newish thing around, so there aren't many of them at the moment. Web servers are aplenty. A few of them, called lightweight web servers, are supposed to be very fast. But these are all general purpose web servers and none of them is programmable by web application developers. This relatively new kind of non-blocking I/O web server, based on some server-side scripting language, is in another category altogether. These are libraries/frameworks used for writing network applications, and not really web servers. Actually, the application that you write becomes (or contains) your very own web server. Tornado is the first of the kind to be reviewed here.
Before we proceed to it, just a brief discussion on sockets, which was perhaps due in the post about non-blocking I/O, but better late than never.

Protocols, Ports and Sockets
On Unix, including its derivatives like Linux, everything is a file. Including I/O devices. In fact, especially I/O devices. If you type
# ls /dev
you shall see something like this:

agpgart fd0u1760 mem ram5 stdout tty29 tty50 usbmon1
block fd0u1840 net ram6 tty tty3 tty51 vcs
bsg fd0u1920 network_latency ram7 tty0 tty30 tty52 vcs1
bus fd0u360 network_throughput ram8 tty1 tty31 tty53 vcs2
cdrom fd0u720 null ram9 tty10 tty32 tty54 vcs3
cdrw fd0u800 oldmem ramzswap0 tty11 tty33 tty55 vcs4
char fd0u820 parport0 random tty12 tty34 tty56 vcs5
console fd0u830 pktcdvd rfkill tty13 tty35 tty57 vcs6
core full port root tty14 tty36 tty58 vcs7
cpu_dma_latency fuse ppp rtc tty15 tty37 tty59 vcs8
disk hpet psaux rtc0 tty16 tty38 tty6 vcsa
dvd input ptmx scd0 tty17 tty39 tty60 vcsa1
dvdrw kmsg pts sda tty18 tty4 tty61 vcsa2
ecryptfs log ram0 sda1 tty19 tty40 tty62 vcsa3
fb0 loop0 ram1 sda2 tty2 tty41 tty63 vcsa4
fd loop1 ram10 sda5 tty20 tty42 tty7 vcsa5
fd0 loop2 ram11 sg0 tty21 tty43 tty8 vcsa6
fd0u1040 loop3 ram12 sg1 tty22 tty44 tty9 vcsa7
fd0u1120 loop4 ram13 shm tty23 tty45 ttyS0 vcsa8
fd0u1440 loop5 ram14 snapshot tty24 tty46 ttyS1 vga_arbiter
fd0u1600 loop6 ram15 sndstat tty25 tty47 ttyS2 zero
fd0u1680 loop7 ram2 sr0 tty26 tty48 ttyS3
fd0u1722 mapper ram3 stderr tty27 tty49 urandom
fd0u1743 mcelog ram4 stdin tty28 tty5 usbmon0

Even more interesting for us here is what you get when you type:
# ls /proc/net

anycast6 dev_snmp6 ip_mr_cache netfilter psched rt6_stats sockstat tcp6 udplite6
arp if_inet6 ip_mr_vif netlink ptype rt_acct sockstat6 tr_rif unix
connector igmp ipv6_route netstat raw rt_cache softnet_stat udp wireless
dev igmp6 mcfilter packet raw6 snmp stat udp6
dev_mcast ip6_flowlabel mcfilter6 protocols route snmp6 tcp udplite

Files tcp and udp are where the kernel shows, as plain text, what its TCP and UDP sockets are up to (much like all the others in this folder, btw). Plain text does not stop there: a good deal of inter-computer (a.k.a. Internet) communication, HTTP included, is actually based on plain ASCII text. If you don't believe me (which hurts a bit), and you are around some *nix machine, type:
# socket -wslqvp "echo Hello from socket 2013!" 2013
This opens a socket, binds it to port 2013, and then the socket waits for an unsuspecting victim. When you navigate your browser to localhost:2013
Hello from socket 2013!
appears in it. Easy, isn't it? So, it's all plain text and the only thing left is to agree on the dictionary we all use. That is the protocol.
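If the socket utility is not at hand, the same point can be sketched in Python using only the standard library. The handler name Hello and the helper raw_get are my own inventions for this illustration; the request is typed out as plain text, much as a browser would send it, and the answer comes back as plain text too.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Tiny HTTP handler that answers every GET with the same greeting."""

    def do_GET(self):
        body = b"Hello from socket 2013!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def raw_get(port):
    """Type an HTTP request 'by hand' and return the server's raw answer."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection: we have it all
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")


server = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: the OS picks a free one
threading.Thread(target=server.serve_forever, daemon=True).start()
answer = raw_get(server.server_address[1])
print(answer)  # status line, headers, an empty line, then the greeting
server.shutdown()
server.server_close()
```

The "dictionary" here is HTTP: a request line, a few headers, an empty line, and then the body.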
The problem arises when my computer tries to socialize with his long lost brother (because, the odds are, they were both made in the same factory in China) on the other side of the world. On both sides, there could be a number of programs trying to “reach out and touch” the same program on the other computer(s). Besides the IP address, each of them needs to know another number on which the other side is (hopefully) listening, and where some program speaks the same lingo. That is the port number. If you really, I mean really, need to see all port numbers in use, see this huge list.
So, you decided to visit your favorite social network site. With another 9,999,999 suckers, er, users at the same time. Whoops. It's quite clear that their servers cannot handle all your requests as one big pile on a single port. Everything would mix up and you would probably get someone else's profile on your screen. A lot like with good old party lines. Dream come true for every hacker and a nightmare for everyone else. Therefore, every connection is told apart by a random port number picked on your side when you show up, which, together with your IP address and the server's address and well known port number, is used in further communication with you (until the wee hours, needless to say). That is the socket.
The bottom line is that if you need fast web server software that will work for only one application, your application, then write it yourself. And if you can do it in a high-level programming language using a library/framework that takes worries about low-level stuff like protocols, ports and sockets off your shoulders, even better.
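How do the visitors stay apart if they all knock on the same port? A small sketch with Python's standard socket module shows it: the server keeps listening on one port, gets a fresh socket for every visitor, and the number that differs is the port picked at random on the visitor's side. Three local connections stand in for the ten million, and the variable names are just for illustration.

```python
import socket

# One listening socket on one port (0 = let the OS pick a free one).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
server_port = listener.getsockname()[1]

# Three "visitors" connect to that very same port.
visitors = [socket.create_connection(("127.0.0.1", server_port)) for _ in range(3)]

# accept() hands the server a brand new socket for every visitor.
pairs = []
for _ in visitors:
    conn, (visitor_ip, visitor_port) = listener.accept()
    local_port = conn.getsockname()[1]
    pairs.append((local_port, visitor_port))
    print(f"server port {local_port} <-> visitor port {visitor_port}")
    conn.close()

for visitor in visitors:
    visitor.close()
listener.close()
```

Every printed line shows the same server port and a different visitor port, which is what keeps your profile from landing on someone else's screen.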

Python
Python is a general-purpose, high-level programming language. It runs on Windows, Linux/Unix and Mac OS X.
There was a comment on one of the previous posts about the “true nature of Python”, or whether it is an interpreted or a compiled language. Let's settle this first. Python is an interpreted language, but it can also be compiled into bytecode (p-code). A little more light could be shed by the list of possible file extensions:
.py - script source code,
.pyc - compiled script (Bytecode or P-code, whichever you prefer to call it),
.pyo - optimized .pyc file,
.pyw - Python script for Windows, executed with pythonw.exe without invoking the console,
.pyd - Python extension module made as a Windows DLL,
.pyx - Cython script source code that can be converted to C/C++.
You can invoke the interpreter and have the syntax checked at each run, or you can compile the script to .pyc and then run the compiled code a tad faster. So, is Python interpreted or compiled? The answer is yes.
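Both faces of Python can be seen in a few lines. This is a sketch assuming CPython; the file name hello.py and the temporary folder are arbitrary choices for the example.

```python
import dis
import os
import py_compile
import tempfile

source = "print('Hello from bytecode')\n"

# 1. "Interpreted": compile() turns the source text into a code object
#    (bytecode) in memory, and exec() runs it straight away.
code = compile(source, "<hello>", "exec")
exec(code)
dis.dis(code)  # list the bytecode instructions of the script

# 2. "Compiled": py_compile writes the same kind of bytecode to a .pyc file.
workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "hello.py")
with open(script, "w") as handle:
    handle.write(source)
pyc_path = py_compile.compile(script, cfile=os.path.join(workdir, "hello.pyc"))
print("bytecode saved to", pyc_path)
```

Either way the interpreter ends up executing bytecode; the .pyc file merely saves it the trouble of compiling the source again.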
Python has many implementations and dialects. There's:
  • CPython - default, most widely used Python, written in C,
  • PyPy (written in RPython) - interpreter and just-in-time compiler, focused on speed and compatibility with CPython,
  • Jython (formerly JPython) - implementation of Python written in Java,
  • IronPython - Python implementation targeting .NET and Mono, and
  • Cython - superset of Python with a foreign function interface for invoking C/C++ routines (very important for us).
There are also py2exe, py2app and pyinstaller, tools that enable you to package your Python script and the runtime interpreter into an executable. Obviously, not having enough choices is not a problem here.
Where Python really excels is scientific and especially mathematical software. As if it doesn't excel elsewhere, says you! Okay, what I meant was that Python is practically the standard for free open source mathematical software. You have, for example, SciPy and SimPy. Have a look at these two, they are both really cool. But, the über cool system is Sage.
Sage is mathematical software that packages a number of FOSS alternatives to Matlab and Mathematica, and I dare say, a bit more than that. All that is available from the browser and programmable in Python. Open the Sage notebook, log in with your, e.g., Google account, have fun. Sage is based on Apache and not on Tornado, but it is still a perfect example of what I am trying to prove in this blog.
A mini conclusion on Python would be that it is very powerful, and definitely not just another server-side scripting language. It runs on all major operating systems, on both client and server side. There are a number of libraries/frameworks written for it. Last but not least, Python has a very strong community that supports it.

Tornado
On Wikipedia Tornado is described as:
a scalable, non-blocking web server and web application framework written in Python. It was developed for use by FriendFeed; the company was acquired by Facebook in 2009 and Tornado was open-sourced soon after.
On their official page it is seen as:
a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
Be it as it may, the thingy is among the fastest thingies of its kind. Tornado is one of the web servers that are trying to solve the C10K problem. On the mentioned Wikipedia page, you can see how it outperforms some other web servers (or frameworks), also written in Python, which operate on top of general purpose web servers. Precisely our point here.
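What does “non-blocking” buy you? The sketch below is not Tornado's code, just the bare idea written with Python's standard selectors module: one thread, one loop, and nobody waits for anybody. The function names accept, echo and run_once are my own, and five visitors stand in for the ten thousand.

```python
import selectors
import socket

selector = selectors.DefaultSelector()


def accept(listener):
    """A new visitor: register the connection, do not wait for its data."""
    conn, _ = listener.accept()
    conn.setblocking(False)
    selector.register(conn, selectors.EVENT_READ, echo)


def echo(conn):
    """Data is ready on one connection: answer it and return to the loop."""
    data = conn.recv(1024)
    if data:
        conn.sendall(b"You said: " + data)  # fine for tiny replies like these
    else:  # an empty read means the visitor hung up
        selector.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """One turn of the event loop: serve whoever is ready right now."""
    for key, _ in selector.select(timeout):
        key.data(key.fileobj)  # call accept() or echo(), stored at registration


listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)
selector.register(listener, selectors.EVENT_READ, accept)

# Demo: five visitors keep their connections open, one thread serves them all.
port = listener.getsockname()[1]
visitors = [socket.create_connection(("127.0.0.1", port)) for _ in range(5)]
for number, visitor in enumerate(visitors):
    visitor.sendall(b"hello %d" % number)
for _ in range(10):  # a real server would loop forever
    run_once(timeout=0.2)
replies = [visitor.recv(1024) for visitor in visitors]
print(replies)

# Tidy up: close the visitors, then everything still registered with the loop.
for visitor in visitors:
    visitor.close()
for key in list(selector.get_map().values()):
    key.fileobj.close()
selector.close()
```

A thread-per-connection server would need five threads here, most of them asleep most of the time; the loop needs one, and that is the whole trick behind keeping thousands of connections open.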

Installation
Tornado can be installed automatically by typing:
pip install tornado
Manual installation requires a bit of an effort on your side. First, you have to download tar.gz of current version from https://pypi.python.org. Then type:
tar xvzf tornado-x.y.z.tar.gz
cd tornado-x.y.z
python setup.py build
sudo python setup.py install
And that's it. Tornado runs on Python 2.6, 2.7, 3.2, and 3.3. The bad news, for some, is that it requires a *nix machine.
Here's the list of Tornado modules:
Core web framework
  • tornado.web — RequestHandler and Application classes
  • tornado.httpserver — Non-blocking HTTP server
  • tornado.template — Flexible output generation
  • tornado.escape — Escaping and string manipulation
  • tornado.locale — Internationalization support
Asynchronous networking
  • tornado.gen — Simplify asynchronous code
  • tornado.ioloop — Main event loop
  • tornado.iostream — Convenient wrappers for non-blocking sockets
  • tornado.httpclient — Asynchronous HTTP client
  • tornado.netutil — Miscellaneous network utilities
  • tornado.tcpserver — Basic IOStream-based TCP server
Integration with other services
  • tornado.auth — Third-party login with OpenID and OAuth
  • tornado.platform.caresresolver — Asynchronous DNS Resolver using C-Ares
  • tornado.platform.twisted — Bridges between Twisted and Tornado
  • tornado.websocket — Bidirectional communication to the browser
  • tornado.wsgi — Interoperability with other Python frameworks and servers
Utilities
  • tornado.autoreload — Automatically detect code changes in development
  • tornado.concurrent — Work with threads and futures
  • tornado.httputil — Manipulate HTTP headers and URLs
  • tornado.log — Logging support
  • tornado.options — Command-line parsing
  • tornado.process — Utilities for multiple processes
  • tornado.stack_context — Exception handling across asynchronous callbacks
  • tornado.testing — Unit testing support for asynchronous code
  • tornado.util — General-purpose utilities
As you have noticed, there aren't too many of them. That again translates to ease of use and speed.

Simple Samples
We'll begin with the usual Hello World example. In Tornado it looks like this:
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
Actually, not much different from the equivalent in node.js. Set the port, initialize the application, start listening. The only real difference is that Python is working with classes explicitly. A huge advantage to my eye.
The second example has a somewhat surprising twist. We'll use the tornado.websocket module to write an example WebSocket handler that echoes back all received messages to the client.
from tornado import websocket

class EchoWebSocket(websocket.WebSocketHandler):
    def open(self):
        print("WebSocket opened")

    def on_message(self, message):
        self.write_message(u"You said: " + message)

    def on_close(self):
        print("WebSocket closed")
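Now comes the surprise. Once the handler is mapped to the /websocket route in your Application (the same way MainHandler was mapped to / above), you can invoke the above class from your JavaScript: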
var ws = new WebSocket("ws://localhost:8888/websocket");
ws.onopen = function() {
   ws.send("Hello, world");
};
ws.onmessage = function (evt) {
   alert(evt.data);
};
Cute!

Conclusion
Python is a serious object-oriented programming language. But, you already know that. I like it for being more C/C++ than Lisp, as opposed to JavaScript which is a bit more functional than procedural. Python has a strong foothold in scientific software, including a number of mathematical packages and libraries. Bindings have been developed for popular cross-platform GUI APIs, like PyQt, PyGTK and wxPython. Last but not least, Python is the language of choice for some serious web application frameworks.
Tornado is a nice piece of software. Small and fast, and I hope its developers are going to keep it that way. It is a bit more elaborate than node.js. For example, the tornado.web module handles the application object. Node.js needs an additional module for that. If we start adding up, Python + Tornado add up to more readable and manageable code than JavaScript + Node.js. At least to me.
The only problem, from the point of view of this blog, is that Python cannot be executed in a browser. And the main point here is network applications that run in the browser as the only client-side GUI. Browsers are already cross-platform and ubiquitous. Browser developers have to take care of different issues like operating systems, hardware configurations and security, to name a few. I've been in these shoes. If your application is a bit slow on your client's computer, they complain, but when their favorite browser hogs their computer, they buy a faster one. Computer, not browser. So, forget about writing a GUI client, network communication and a lot more, and concentrate on your application logic instead.

Sunday, November 3, 2013

Test site
You might have noticed the beautiful blue button to the right with the inscription “Test site”. If you venture to click on it, the web page http://web-appz.hp.af.cm/ will open in a new tab. In it you'll see this:
                           Node.js testing site for web-appz.blogspot.com
I agree, it's not much. To be honest it is less than “not much”, but it is a beginning. Behind the page is this illustrious code:

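var http = require('http');

// Answer every request with the same one-line HTML page.
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end('Node.js testing site for <a href="http://web-appz.blogspot.com">web-appz.blogspot.com</a>');
// Listen on the port supplied in VMC_APP_PORT, or on 1337 when it is not set.
}).listen(process.env.VMC_APP_PORT || 1337, null);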


The experience with AppFog was so satisfactory that I think it is only fair to share it.

AppFog
https://www.appfog.com/ is where the test site is hosted. I went there, clicked “Sign Up” and was redirected to their sign-up page. Since they are presently enhancing their automated sign-up process, I sent an e-mail to their support team and, presto, in less than 15 minutes my account was created and ready to use. Needless to say, I opted for a free account, and this is what I bought for that kind of money:


Detailed pricing can be found at https://www.appfog.com/pricing. Later I shall opt for some other plan, but for a beginning it is more than I hoped for.
Once my account was open, I started creating my first application. There is a point-and-click interface to jump-start the process. First, you choose an app or framework from the 14 available at the moment (including Node.js, Drupal and Ruby on Rails, just to name a few). Second, you choose the infrastructure. I chose HP. And third, you choose the sub-domain name. My obvious choice was web-appz. Then the system created an application skeleton for me. And, just between you and me, that is the code listed above. I did tweak it a little to replace “Hello World” with a link to this blog, but that was all. Also, there is a command-line tool that works nicely, and you need to install it in order to interact with your application.

So, my start was not as difficult as I thought it would be. On the other hand, the application is far from the serious example I want it to be, but I hope to expand it soon.

Monday, October 28, 2013

Web Servers and the Zen of Non-blocking I/O

This post started as a short introduction to the first in a mini-series of head-to-head comparisons of three non-blocking I/O web servers. And then things got out of control. Be that as it may, I convinced myself that a somewhat more elaborate introduction is in order. Now comes the hard part: convincing everyone else.

Web servers
Web servers are complicated machines, both hardware- and software-wise. The illustration below shows a block diagram of a typical web server based on the LAMP (Linux, Apache, MySQL/MariaDB, Perl/PHP/Python) bundle (or stack, as some prefer to say).

LAMP bundle web server

Told ya it was complicated, didn't I? And this is just a block diagram. There are variants of this stack, such as WAMP (Windows, Apache, MySQL/MariaDB, Perl/PHP/Python). By the way, these bundles are really easy to install. Companies like TurnKey Linux offer a number of Linux-based software appliances in a number of formats, including virtual-machine-ready ones. All you have to do is download and start the VM. That is exactly what hosting companies do. Except that they, presumably, package their own bundles.
Luckily for us, we are going to concentrate only on the little box on the left with the title “Web server”. There are two types of web server software. The one in the picture, Apache, is a so-called user-mode web server, meaning that it runs in user space like any other application. The other type is known as an in-kernel (or kernel-mode) web server, and its representative is Microsoft IIS, whose HTTP listener (HTTP.sys) runs in kernel mode. In-kernel web server software is on a first-name basis with the computer hardware, and that supposedly makes it faster than its user-space counterparts. The downside is that it is tied to a single operating system and cannot work on any other. User-space web server software, on the other hand, has to compete for hardware resources with all the other running applications (processes), and that is why it can sometimes be slower. The good thing is that it is not tied to one operating system, as LAMP and WAMP show.
Still, not all web servers are made equal. Some are built for speed and others for features. But, what interests us here is how they handle I/O.

Blocking and non-Blocking I/O
There is a lot of theory behind handling I/O operations. It involves queueing theory, which has a lot to do with stochastic processes, for which you need to study Markov chains. And that is just the beginning. Not to mention that the whole shebang is in most cases just a model (read: approximation), and since this is a blog and not an online university, we'll resort to an analogy.
OK, you are very hungry and looking for a place to eat. In this unhappy hour you are facing two possibilities: a fast-food restaurant to your left and a full-service restaurant to your right. If you choose to enter the fast-food restaurant (it doesn't really matter if it's the Windows, Apple or Linux chain), you will see a number of cash registers and a queue in front of each. If you are disrespectful of Murphy's laws, you'll even stop and ponder which queue to join, which would be a great mistake (both theoretically and practically). At the head of each queue there is a customer who was being served at the moment you entered the restaurant, and the next customer in each line cannot order until the previous one is finished. If someone in the queue has a large family, you will wait forever for your bite (or Mega Byte). This is blocking I/O.
It is obvious that the waiting time here is very sensitive to the number of requests (customers). Also, the system cannot prioritize between someone who wants to order just one item and someone else who is ordering a full tray. Adding resources (cash registers in our example) is an obvious remedy. The problem is that it's costly, and a balance has to be struck between the rush (peak) hour and the off-hours. No one likes to see expensive assets sitting idle most of the day and working at their normal capacity only during peak hours. The preferred situation for the owner is that the system runs at nominal capacity most of the day and in overdrive at rush hour, even at the risk of losing a few customers. Technology and economy hand in hand, as always. On the other hand, the Windows and Apple chains of restaurants opted for maximizing customer experience by providing abundant resources. We all know at whose expense.
Waiting in the queue made you even more hungry, so you switched to the restaurant across the street. In there, you click, sorry, point your finger at a menu entry, especially in a French restaurant (hoping you did not order something inedible). The waiter takes your order to the kitchen, where it gets prepared. In the kitchen, orders are processed in parallel (more or less), so that orders, no matter how big or small, do not wait for previous ones to be finished. This is non-blocking I/O.
The concept of non-blocking I/O allows you to introduce various scheduling policies for performance fine-tuning and enhancement (this applies to restaurants as well as web servers). The article on queueing theory lists some of them:
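For the code-minded, here is a minimal Python sketch of the two restaurants. It is only an illustration of the analogy, not a web server: the customer names and preparation times are made up, and time.sleep and asyncio.sleep merely stand in for real I/O.

```python
import asyncio
import time

# Hypothetical orders: (customer, preparation time in seconds).
ORDERS = [("large family", 0.15), ("single coffee", 0.01), ("burger", 0.08)]


def serve_blocking(orders):
    """One cash register: every order waits for the previous one to finish."""
    finished = []
    for customer, seconds in orders:
        time.sleep(seconds)  # the whole queue is stuck behind this order
        finished.append(customer)
    return finished


async def prepare(customer, seconds, finished):
    """Prepare one order in the kitchen without holding up the others."""
    await asyncio.sleep(seconds)  # hands control back while the order "cooks"
    finished.append(customer)


async def serve_non_blocking(orders):
    """One waiter, one kitchen: all orders are in progress at the same time."""
    finished = []
    await asyncio.gather(*(prepare(c, s, finished) for c, s in orders))
    return finished


blocking_result = serve_blocking(ORDERS)
non_blocking_result = asyncio.run(serve_non_blocking(ORDERS))
print("blocking:    ", blocking_result)
print("non-blocking:", non_blocking_result)
```

In the blocking version the coffee is handed over only after the large family is done. In the non-blocking version it comes out first, because no order waits behind another one.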
Processor sharing
Service capacity is shared equally between customers.

Priority
Customers with high priority are served first. Priority queues can be of two types, non-preemptive (where a job in service cannot be interrupted) and preemptive (where a job in service can be interrupted by a higher priority job). No work is lost in either model.

Shortest job first
The next job to be served is the one with the smallest size

Preemptive shortest job first
The next job to be served is the one with the original smallest size

Shortest remaining processing time
The next job to serve is the one with the smallest remaining processing requirement.
Of course, adding resources also helps here. In addition, the operating system could be optimized (up to a point) for better cooperation with the queueing mechanism for further improvements.
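To get a feeling for what a scheduling policy buys you, here is a small Python sketch that compares first-come-first-served with shortest job first, and adds a non-preemptive priority queue for good measure. The job sizes and names are hypothetical, and the model (a single server, with all jobs already waiting at time zero) is a deliberate simplification.

```python
import heapq


def mean_wait(job_sizes):
    """Mean time a job waits before its service starts, in the given order."""
    waited, clock = 0, 0
    for size in job_sizes:
        waited += clock  # this job waited for everything served before it
        clock += size
    return waited / len(job_sizes)


def priority_order(jobs):
    """Non-preemptive priority: lower number is served first, ties by arrival."""
    heap = [(priority, arrival, name)
            for arrival, (priority, name) in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


job_sizes = [8, 1, 3]                    # hypothetical service times
fifo_wait = mean_wait(job_sizes)         # served as they arrived
sjf_wait = mean_wait(sorted(job_sizes))  # shortest job first
served = priority_order([(2, "report"), (1, "login"), (2, "upload")])

print("FIFO mean wait:", fifo_wait)
print("SJF mean wait: ", sjf_wait)
print("priority order:", served)
```

With these numbers the waits under first-come-first-served are 0, 8 and 9, while shortest job first gives 0, 1 and 4. The total work is the same, but the average customer waits much less.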

C10K Problem
Some think that C10K is so last decade, and are already targeting the C10M problem. C10K refers to the problem of optimizing network sockets to handle ten thousand connections at the same time. Relatively few web servers address this problem and, according to this article on Wikipedia, Node.js is one of them. Socket.io, a library for Node.js, is written to help address this specific problem. But since this topic is not the focus of this post, I shall return to it in a future one.
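Just to make the idea tangible, here is a hedged Python sketch in which a single thread serves many connections at once using the standard asyncio streams API. It is a toy and nowhere near C10K: the number of clients (100), the echo protocol and the function names are my own assumptions for illustration, and everything runs on the loopback interface.

```python
import asyncio


async def handle(reader, writer):
    """Echo one line back to the client, then close the connection."""
    line = await reader.readline()
    writer.write(line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, number):
    """Open a connection, send one line and return the echoed reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"hello {number}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def run_demo(n_clients=100):
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0, backlog=128)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients connect concurrently; one event loop serves them all.
        return await asyncio.gather(*(client(port, i) for i in range(n_clients)))


replies = asyncio.run(run_demo())
print(len(replies), "connections served by a single thread")
```

No thread or process is created per connection here. The event loop simply switches between connections whenever one of them would otherwise have to wait.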

Satori
From the illustration above, we can see that there are three layers that affect the performance of a web server from the perspective of handling I/O operations. These are: hardware, operating system and application software.
Hardware is not a bottleneck anymore. 64-bit CPUs in the gigahertz range, very fast RAM in gigabytes and disks in terabytes or petabytes. No, we have to look elsewhere for performance improvements.
Operating systems, or more precisely server operating systems, are something of a problem when it comes to handling I/O. The ones in use today are general-purpose server operating systems. Even worse, they are general-purpose operating systems slightly modified to operate as servers. Unix, for instance, came out of a telephone company's research lab as a time-sharing system for programmers, not as a dedicated data server. Another thing we tend to forget is that, in the beginning, server meant file server. Novell NetWare was a file server. Actually, NetWare was the file server. It was not built as a software layer on top of a general-purpose OS, but as a special-purpose OS without time sharing. NetWare had its own file system, NWFS. What it did not have was preemption, virtual memory and, heaven forbid, a graphical user interface.
While we are at it, one of the biggest mysteries of the trade is why a certain server OS from a certain very large software company has a graphical user interface. An even bigger mystery is why the same large software company decided, in the 2012 version of said server, to introduce the option to switch it off! Performance issues?
Welcome back to NetWare. A somewhat simplified version goes like this: Novell NetWare had one process which used round-robin scheduling to serve files to its users. If you wanted a file from the server, the scheduler placed you in a list along with the others. Then NetWare would deal file chunks to users, as a dealer does to players at a poker table. Once you had received the whole file, you were out of the list. Using this technique and a few others, like scheduling policies similar to the ones mentioned before, NetWare outperformed the competition's file servers of the time by 5:1 or even 10:1. In the meantime, general-purpose operating systems have improved, and hardware has accelerated to the point where OS inefficiency is largely compensated for by hardware speed. Still, the NetWare concept is hard to beat.
What is so wrong with a general-purpose OS doubling as a file and/or web server? Preemptive multitasking gives every user the feeling that his program is the only one running on the computer. This is done by forcing (preempting) processes to share hardware resources. For their own good, of course. The outcome is that below the surface of this cyber-democracy rages a battle for every CPU cycle and every block of data. As a result, we can have resource starvation, which can leave our web server software with the short end of the stick. Users of such servers, on the other hand, have the completely unfounded impression that processes that handle networking and I/O are given priority. Web servers resemble a patient whose psychiatrist told him: “You have a very strong ego, but it has no foundation in reality”.
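The poker-dealer scheme is easy to sketch in Python. Mind you, this is a toy model of the simplified story told here, not a reconstruction of how NetWare was actually implemented: the user names, file contents and chunk size are all made up.

```python
from collections import deque


def chunks(data, size):
    """Yield consecutive pieces of data, each at most size bytes long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def round_robin_serve(requests, chunk_size=4):
    """Deal one chunk per user per round until every file is delivered.

    requests maps a user name to the content of the requested file.
    Returns the (user, chunk) pairs in the order they were dealt.
    """
    table = deque((user, chunks(data, chunk_size))
                  for user, data in requests.items())
    dealt = []
    while table:
        user, remaining = table.popleft()
        chunk = next(remaining, None)
        if chunk is None:
            continue  # whole file delivered: the user is out of the list
        dealt.append((user, chunk))
        table.append((user, remaining))  # back to the end of the line
    return dealt


dealt = round_robin_serve({"ann": b"AAAAAAAAAA", "bob": b"BBB"})
for user, chunk in dealt:
    print(user, chunk)
```

Bob, who asked for a tiny file, is done after the first round, even though Ann requested her much larger file before him. Nobody waits for somebody else's whole file.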

And how exactly are we enlightened by this development? Obviously, general-purpose preemptive-multitasking operating systems with a thin layer of web server software can have serious problems handling a large number of I/O requests. But it is hard to believe that some new cooperative-multitasking operating system will appear as a reincarnation of the beloved Novell NetWare and put our hearts and minds at ease. So, what can we do about it? As always, if you want a thing done well, do it yourself. Write your own web server that takes the I/O handling load off the operating system. In the next few posts we'll take a look at software platforms which enable you to do just that.