Sunday, November 17, 2013

A Snake in the Whirlwind
Non-blocking I/O web servers are a fairly new thing, so there aren't many of them at the moment. Web servers in general are aplenty. A few of them, called lightweight web servers, are supposed to be very fast. But these are all general-purpose web servers, and none of them is programmable by web application developers. This relatively new kind of non-blocking I/O web server, based on a server-side scripting language, is in another category altogether. These are libraries/frameworks used for writing network applications, and not really web servers. Actually, the application that you write becomes (or contains) your very own web server. Tornado is the first of the kind to be reviewed here.
Before we proceed to it, just a brief discussion of sockets, which was perhaps due in the post about non-blocking I/O, but better late than never.

Protocols, Ports and Sockets
On Unix, including its derivatives like Linux, everything is a file. Including I/O devices. In fact, especially I/O devices. If you type
# ls /dev
you shall see something like this:

agpgart fd0u1760 mem ram5 stdout tty29 tty50 usbmon1
block fd0u1840 net ram6 tty tty3 tty51 vcs
bsg fd0u1920 network_latency ram7 tty0 tty30 tty52 vcs1
bus fd0u360 network_throughput ram8 tty1 tty31 tty53 vcs2
cdrom fd0u720 null ram9 tty10 tty32 tty54 vcs3
cdrw fd0u800 oldmem ramzswap0 tty11 tty33 tty55 vcs4
char fd0u820 parport0 random tty12 tty34 tty56 vcs5
console fd0u830 pktcdvd rfkill tty13 tty35 tty57 vcs6
core full port root tty14 tty36 tty58 vcs7
cpu_dma_latency fuse ppp rtc tty15 tty37 tty59 vcs8
disk hpet psaux rtc0 tty16 tty38 tty6 vcsa
dvd input ptmx scd0 tty17 tty39 tty60 vcsa1
dvdrw kmsg pts sda tty18 tty4 tty61 vcsa2
ecryptfs log ram0 sda1 tty19 tty40 tty62 vcsa3
fb0 loop0 ram1 sda2 tty2 tty41 tty63 vcsa4
fd loop1 ram10 sda5 tty20 tty42 tty7 vcsa5
fd0 loop2 ram11 sg0 tty21 tty43 tty8 vcsa6
fd0u1040 loop3 ram12 sg1 tty22 tty44 tty9 vcsa7
fd0u1120 loop4 ram13 shm tty23 tty45 ttyS0 vcsa8
fd0u1440 loop5 ram14 snapshot tty24 tty46 ttyS1 vga_arbiter
fd0u1600 loop6 ram15 sndstat tty25 tty47 ttyS2 zero
fd0u1680 loop7 ram2 sr0 tty26 tty48 ttyS3
fd0u1722 mapper ram3 stderr tty27 tty49 urandom
fd0u1743 mcelog ram4 stdin tty28 tty5 usbmon0

Even more interesting for us here is what you get when you type:
# ls /proc/net

anycast6 dev_snmp6 ip_mr_cache netfilter psched rt6_stats sockstat tcp6 udplite6
arp if_inet6 ip_mr_vif netlink ptype rt_acct sockstat6 tr_rif unix
connector igmp ipv6_route netstat raw rt_cache softnet_stat udp wireless
dev igmp6 mcfilter packet raw6 snmp stat udp6
dev_mcast ip6_flowlabel mcfilter6 protocols route snmp6 tcp udplite

The files tcp and udp are obviously about protocol communication (like all the others in this folder, btw): each one is a plain-text table of the sockets the kernel currently has open for that protocol. Much of inter-computer (a.k.a. Internet) communication is just as plain: many application protocols, HTTP among them, are based on plain ASCII text. If you don't believe me (which hurts a bit), and you are around some *nix machine, type:
# socket -wslqvp "echo Hello from socket 2013!" 2013
This opens a socket, binds it to port 2013, and then the socket waits for an unsuspecting victim. When you navigate your browser to localhost:2013
Hello from socket 2013!
appears in it. Easy, isn't it? So, it's all plain text and the only thing left is to agree on the dictionary we all use. That is the protocol.
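If the socket utility is not at hand, roughly the same experiment can be sketched in Python with the standard socket module. This is only an illustration: the function names start_greeter and fetch are made up for this post, and the sketch asks the operating system for any free port instead of insisting on 2013.

```python
import socket
import threading


def start_greeter(message, host="127.0.0.1", port=0):
    """Listen on a port and send `message` to the first client that connects.

    port=0 asks the OS for any free port (an assumption made for this
    sketch; pass 2013 to mimic the command above).
    Returns the port actually bound and the background thread.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    bound_port = server.getsockname()[1]

    def serve_once():
        # Wait for one client, send the plain-text greeting, hang up.
        with server:
            conn, _addr = server.accept()
            with conn:
                conn.sendall(message.encode("ascii"))

    thread = threading.Thread(target=serve_once, daemon=True)
    thread.start()
    return bound_port, thread


def fetch(port, host="127.0.0.1"):
    """Connect like a client would and read until the server hangs up."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")


if __name__ == "__main__":
    port, thread = start_greeter("Hello from socket 2013!\n")
    print(fetch(port), end="")
    thread.join()
```

The bytes that travel over the wire are exactly the ASCII text of the greeting, nothing more.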
The problem arises when my computer tries to socialize with his long lost brother (because, the odds are, they were both made in the same factory in China) on the other side of the world. On both sides, there could be a number of programs trying to “reach out and touch” the same program on the other computer(s). Besides the IP address, each of them needs to know another number on which the other side is (hopefully) listening, and where some program speaks the same lingo. That is the port number. If you really, I mean really, need to see all port numbers in use, see this huge list.
So, you decided to visit your favorite social network site. With another 9,999,999 suckers, er, users at the same time. Whoops. It's quite clear that their servers cannot handle all your requests over a single port if the port number is all they have to go by. Everything would mix up and you would probably get someone else's profile on your screen. A lot like with good old party lines. Dream come true for every hacker and a nightmare for everyone else. Therefore, your computer picks a random (ephemeral) port number for its own end of the connection, and the server, which keeps listening on its well-known port, hands your connection over to a fresh socket of its own that is used in further communication with you (until the wee hours, needless to say). The combination of both IP addresses and both port numbers is unique to your connection. Each end of it is a socket.
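How several simultaneous connections to one port are told apart can be checked with a few lines of Python. The sketch below is an illustration under assumptions (the function name connection_endpoints is made up, and everything runs on the loopback interface): it opens a few connections to a single listening port and reports the port number seen on each end.

```python
import socket


def connection_endpoints(n_clients=3, host="127.0.0.1"):
    """Open n_clients connections to one listening port.

    Returns the listening port and, for each accepted connection,
    a pair (port on the server side, port on the client side).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(n_clients)
    server_port = server.getsockname()[1]
    open_sockets = [server]
    pairs = []
    try:
        for _ in range(n_clients):
            client = socket.create_connection((host, server_port), timeout=5)
            open_sockets.append(client)
            # accept() returns a brand new socket for this one connection.
            conn, peer = server.accept()
            open_sockets.append(conn)
            pairs.append((conn.getsockname()[1], peer[1]))
    finally:
        for sock in open_sockets:
            sock.close()
    return server_port, pairs


if __name__ == "__main__":
    port, pairs = connection_endpoints()
    print("listening port:", port)
    for local, remote in pairs:
        print("server side:", local, "client side:", remote)
```

On the server side the port number stays the same for every connection; what differs is the port each client was given on its own end.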
The bottom line is that if you need fast web server software that will work for only one application, your application, then write it yourself. And if you can do it in a high-level programming language, using a library/framework that takes the worries about low-level stuff like protocols, ports and sockets off your shoulders, even better.

Python
Python is a general-purpose, high-level programming language. It runs on Windows, Linux/Unix and Mac OS X.
There was a comment on one of the previous posts about the “true nature of Python”, or whether it is an interpreted or a compiled language. Let's settle this first. Python is an interpreted language, but it can also be compiled into bytecode (p-code). A little more light could be shed by the list of possible file extensions:
.py - script source code,
.pyc - compiled script (bytecode or p-code, whichever you prefer to call it),
.pyo - optimized .pyc file,
.pyw - Python script for Windows, executed with pythonw.exe without invoking the console,
.pyd - Python extension module built as a Windows DLL,
.pyx - Cython script source code that can be converted to C/C++.
You can invoke the interpreter and have the script parsed and checked at each run, or you can compile the script to .pyc and then have the compiled code start a tad faster (the bytecode loads quicker; it does not run any faster once loaded). So, is Python interpreted or compiled? The answer is yes.
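To see the "yes" in action, here is a small sketch, assuming CPython 3 and using only the standard library. It writes a throwaway script, compiles it to a .pyc file with py_compile, and then runs the compiled file directly. The helper name compile_and_run and the file name hello.py are made up for the illustration.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def compile_and_run(source):
    """Compile `source` to bytecode, run the .pyc file, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "hello.py")
        with open(src_path, "w") as src_file:
            src_file.write(source)
        # Compile explicitly; doraise=True turns syntax errors into exceptions.
        pyc_path = py_compile.compile(
            src_path, cfile=os.path.join(tmp, "hello.pyc"), doraise=True
        )
        # The interpreter accepts the bytecode file in place of the source.
        output = subprocess.check_output([sys.executable, pyc_path])
    return output.decode().strip()


if __name__ == "__main__":
    print(compile_and_run('print("compiled hello")'))
```

The same interpreter executes both forms; the .pyc only saves it the parsing and compiling step.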
Python has many implementations and dialects. There's:
  • CPython - default, most widely used Python written in C,
  • PyPy (written in RPython) - interpreter and just-in-time compiler, focused on speed and compatibility with CPython,
  • Jython (formerly JPython) - implementation of Python written in Java,
  • IronPython - Python implementation targeting .NET and Mono, and
  • Cython - superset of Python with a foreign function interface for invoking C/C++ routines (very important for us).
There are also py2exe, py2app and PyInstaller, tools that enable you to package your Python script and the runtime interpreter into an executable. Obviously, not having enough choices is not a problem here.
Where Python really excels is scientific and especially mathematical software. As if it doesn't excel elsewhere, says you! Okay, what I meant was that Python is practically the standard for free open source mathematical software. You have, for example, SciPy and SimPy. Have a look at these two, they are both really cool. But the über cool system is Sage.
Sage is mathematical software that packages a number of FOSS alternatives to Matlab and Mathematica, and I dare say, a bit more than that. All of that is available from the browser and programmable in Python. Open the Sage notebook, log in with your, e.g., Google account, and have fun. Sage is based on Apache and not on Tornado, but it is still a perfect example of what I am trying to prove in this blog.
A mini conclusion on Python would be that it is very powerful, and definitely not just another server-side scripting language. It runs on all major operating systems, on both the client and the server side. There are a number of libraries/frameworks written for it. Last but not least, Python has a very strong community that supports it.

Tornado
On Wikipedia Tornado is described as:
a scalable, non-blocking web server and web application framework written in Python. It was developed for use by FriendFeed; the company was acquired by Facebook in 2009 and Tornado was open-sourced soon after.
On their official page it is seen as:
a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
Be that as it may, the thingy is among the fastest thingies of its kind. Tornado is one of the web servers that are trying to solve the C10K problem. On the mentioned Wikipedia page, you can see how it outperforms some other web servers (or frameworks), also written in Python, which operate on top of general-purpose web servers. Precisely our point here.
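What "non-blocking network I/O" means in practice can be sketched without Tornado at all. The code below is not Tornado's implementation, just an illustration of the same idea using the standard library selectors module: a single thread keeps many sockets open and only touches the ones that are ready. All function names here (make_listener, run_echo_loop, demo) are made up for the example.

```python
import selectors
import socket
import threading


def make_listener(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 = any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    sock.setblocking(False)
    return sock


def run_echo_loop(listener, should_stop, poll_timeout=0.05):
    """One thread, many sockets: wait for readiness, then serve the ready ones."""
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ, data="accept")
    try:
        while not should_stop():
            for key, _events in sel.select(timeout=poll_timeout):
                if key.data == "accept":
                    # A new client is waiting: start watching its socket too.
                    conn, _addr = key.fileobj.accept()
                    conn.setblocking(False)
                    sel.register(conn, selectors.EVENT_READ, data="echo")
                else:
                    conn = key.fileobj
                    data = conn.recv(4096)
                    if data:
                        # Good enough for tiny messages; a real loop would
                        # buffer output and wait for write readiness.
                        conn.sendall(data)
                    else:
                        # Empty read: the client has closed the connection.
                        sel.unregister(conn)
                        conn.close()
    finally:
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            key.fileobj.close()
        sel.close()


def demo():
    """Run the loop in a background thread and talk to it from two clients."""
    stop = threading.Event()
    listener = make_listener()
    port = listener.getsockname()[1]
    worker = threading.Thread(target=run_echo_loop, args=(listener, stop.is_set))
    worker.start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as a, \
                socket.create_connection(("127.0.0.1", port), timeout=5) as b:
            a.sendall(b"ping")
            b.sendall(b"pong")
            return a.recv(4096), b.recv(4096)
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    print(demo())
```

Both clients stay connected at the same time, yet the server never dedicates a thread or a process to either of them. That is the trick that lets this kind of server keep a very large number of connections open.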

Installation
Tornado can be installed automatically by typing:
pip install tornado
Manual installation requires a bit of effort on your side. First, you have to download the tar.gz of the current version from https://pypi.python.org. Then type:
tar xvzf tornado-x.y.z.tar.gz
cd tornado-x.y.z
python setup.py build
sudo python setup.py install
And that's it. Tornado runs on Python 2.6, 2.7, 3.2, and 3.3. The bad news, for some, is that it requires a *nix machine.
Here's the list of Tornado modules:
Core web framework
  • tornado.web — RequestHandler and Application classes
  • tornado.httpserver — Non-blocking HTTP server
  • tornado.template — Flexible output generation
  • tornado.escape — Escaping and string manipulation
  • tornado.locale — Internationalization support
Asynchronous networking
  • tornado.gen — Simplify asynchronous code
  • tornado.ioloop — Main event loop
  • tornado.iostream — Convenient wrappers for non-blocking sockets
  • tornado.httpclient — Asynchronous HTTP client
  • tornado.netutil — Miscellaneous network utilities
  • tornado.tcpserver — Basic IOStream-based TCP server
Integration with other services
  • tornado.auth — Third-party login with OpenID and OAuth
  • tornado.platform.caresresolver — Asynchronous DNS Resolver using C-Ares
  • tornado.platform.twisted — Bridges between Twisted and Tornado
  • tornado.websocket — Bidirectional communication to the browser
  • tornado.wsgi — Interoperability with other Python frameworks and servers
Utilities
  • tornado.autoreload — Automatically detect code changes in development
  • tornado.concurrent — Work with threads and futures
  • tornado.httputil — Manipulate HTTP headers and URLs
  • tornado.log — Logging support
  • tornado.options — Command-line parsing
  • tornado.process — Utilities for multiple processes
  • tornado.stack_context — Exception handling across asynchronous callbacks
  • tornado.testing — Unit testing support for asynchronous code
  • tornado.util — General-purpose utilities
As you have noticed, there aren't too many of them. That again translates to ease of use and speed.

Simple Samples
We'll begin with the usual Hello World example. In Tornado it looks like this:
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
Actually, not much different from the equivalent in node.js. Initialize the application, set the port, start listening. The only real difference is that Python works with classes explicitly. A huge advantage in my eyes.
The second example has a somewhat surprising twist. We'll use the tornado.websocket module to write an example WebSocket handler that echoes all received messages back to the client.
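from tornado import websocket

# Echoes every message received over the WebSocket back to the client.
class EchoWebSocket(websocket.WebSocketHandler):
    def open(self):
        # Called once the WebSocket handshake completes.
        print("WebSocket opened")

    def on_message(self, message):
        # Called for each incoming message; reply on the same connection.
        self.write_message(u"You said: " + message)

    def on_close(self):
        # Called when the connection is closed.
        print("WebSocket closed")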
Now comes the surprise. Once the handler is registered for the /websocket route, you can talk to the above class from your JavaScript:
var ws = new WebSocket("ws://localhost:8888/websocket");
ws.onopen = function() {
   ws.send("Hello, world");
};
ws.onmessage = function (evt) {
   alert(evt.data);
};
Cute!
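One piece is not shown above: the handler has to be registered for a URL before the JavaScript client can reach it at ws://localhost:8888/websocket. A minimal sketch of that wiring follows, mirroring the Hello World example. The make_app function is just a name picked for this illustration, and the port and path are simply the ones the JavaScript snippet expects.

```python
import tornado.ioloop
import tornado.web
import tornado.websocket


class EchoWebSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # Send every incoming message straight back to the sender.
        self.write_message(u"You said: " + message)


def make_app():
    # Map the /websocket path used by the JavaScript client to the handler.
    return tornado.web.Application([
        (r"/websocket", EchoWebSocket),
    ])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.instance().start()
```

With this running, the JavaScript above should pop up an alert saying "You said: Hello, world".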

Conclusion
Python is a serious object-oriented programming language. But you already know that. I like it for being more C/C++ than Lisp, as opposed to JavaScript, which is a bit more functional than procedural. Python has a strong foothold in scientific software, including a number of mathematical packages and libraries. Bindings have been developed for popular cross-platform GUI toolkits, like PyQt, PyGTK and wxPython. Last but not least, Python is the language of choice for some serious web application frameworks.
Tornado is a nice piece of software. Small and fast, and I hope its developers are going to keep it that way. It is a bit more elaborate than node.js. For example, the tornado.web module handles the application object. Node.js needs an additional module for that. If we start adding things up, Python + Tornado add up to more readable and manageable code than JavaScript + Node.js. At least to me.
The only problem, from the point of view of this blog, is that Python cannot be executed in a browser. And the main point here is network applications that use the browser as their only client-side GUI. Browsers are already cross-platform and ubiquitous. Browser developers have to take care of different issues like operating systems, hardware configurations and security, to name a few. I've been in these shoes. If your application is a bit slow on your client's computer, they complain, but when their favorite browser hogs their computer, they buy a faster one. Computer, not browser. So, forget about writing a GUI client, network communication and a lot more, and concentrate on your application logic instead.