End-to-end JavaScript Web Applications

A Snake in the Whirlwind

Non-blocking I/O web servers are a newish thing, so there aren't many of them at the moment. Web servers are aplenty. A few of them, called lightweight web servers, are supposed to be very fast. But these are all general-purpose web servers, and none of them is programmable by web application developers. The relatively new kind of non-blocking I/O web server, based on a server-side scripting language, is in another category altogether. These are libraries/frameworks used for writing network applications, and not really web servers. Actually, the application that you write becomes (or contains) your very own web server. Tornado is the first of the kind to be reviewed here.

Before we proceed to it, just a brief discussion on sockets, which was perhaps due in the post about non-blocking I/O, but better late than never.

Protocols, Ports and Sockets

On Unix, including its derivatives like Linux, everything is a file. Including I/O devices. In fact, especially I/O devices. If you type

# ls /dev

you shall see something like this:

agpgart	fd0u1760	mem	ram5	stdout	tty29	tty50	usbmon1
block	fd0u1840	net	ram6	tty	tty3	tty51	vcs
bsg	fd0u1920	network_latency	ram7	tty0	tty30	tty52	vcs1
bus	fd0u360	network_throughput	ram8	tty1	tty31	tty53	vcs2
cdrom	fd0u720	null	ram9	tty10	tty32	tty54	vcs3
cdrw	fd0u800	oldmem	ramzswap0	tty11	tty33	tty55	vcs4
char	fd0u820	parport0	random	tty12	tty34	tty56	vcs5
console	fd0u830	pktcdvd	rfkill	tty13	tty35	tty57	vcs6
core	full	port	root	tty14	tty36	tty58	vcs7
cpu_dma_latency	fuse	ppp	rtc	tty15	tty37	tty59	vcs8
disk	hpet	psaux	rtc0	tty16	tty38	tty6	vcsa
dvd	input	ptmx	scd0	tty17	tty39	tty60	vcsa1
dvdrw	kmsg	pts	sda	tty18	tty4	tty61	vcsa2
ecryptfs	log	ram0	sda1	tty19	tty40	tty62	vcsa3
fb0	loop0	ram1	sda2	tty2	tty41	tty63	vcsa4
fd	loop1	ram10	sda5	tty20	tty42	tty7	vcsa5
fd0	loop2	ram11	sg0	tty21	tty43	tty8	vcsa6
fd0u1040	loop3	ram12	sg1	tty22	tty44	tty9	vcsa7
fd0u1120	loop4	ram13	shm	tty23	tty45	ttyS0	vcsa8
fd0u1440	loop5	ram14	snapshot	tty24	tty46	ttyS1	vga_arbiter
fd0u1600	loop6	ram15	sndstat	tty25	tty47	ttyS2	zero
fd0u1680	loop7	ram2	sr0	tty26	tty48	ttyS3
fd0u1722	mapper	ram3	stderr	tty27	tty49	urandom
fd0u1743	mcelog	ram4	stdin	tty28	tty5	usbmon0

Even more interesting for us here is what you get when you type:

# ls /proc/net

anycast6	dev_snmp6	ip_mr_cache	netfilter	psched	rt6_stats	sockstat	tcp6	udplite6
arp	if_inet6	ip_mr_vif	netlink	ptype	rt_acct	sockstat6	tr_rif	unix
connector	igmp	ipv6_route	netstat	raw	rt_cache	softnet_stat	udp	wireless
dev	igmp6	mcfilter	packet	raw6	snmp	stat	udp6
dev_mcast	ip6_flowlabel	mcfilter6	protocols	route	snmp6	tcp	udplite
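The files in this folder are plain text tables that any program can read. As a rough sketch of what sits inside /proc/net/tcp, the snippet below decodes the address and state columns. The helper names and the SAMPLE text are made up for illustration, the byte order assumes a little-endian machine (x86, for instance), and only two of the kernel's state codes are mapped.

```python
import socket
import struct

# Only two of the kernel's TCP state codes, enough for the illustration.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}

def decode_endpoint(field):
    """Turn '0100007F:1F90' into ('127.0.0.1', 8080); little-endian host assumed."""
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)

def parse_proc_net_tcp(text):
    """Parse the text of /proc/net/tcp into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        rows.append({
            "local": decode_endpoint(fields[1]),
            "remote": decode_endpoint(fields[2]),
            "state": TCP_STATES.get(fields[3], fields[3]),
        })
    return rows

# Hypothetical sample in the layout of /proc/net/tcp (one listening socket).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
"""
```

On a Linux box you could feed it the real thing with parse_proc_net_tcp(open("/proc/net/tcp").read()).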

The files tcp and udp show the kernel's view of TCP and UDP traffic, one line of text per open socket (all the others in this folder describe the network stack in the same way, btw). Text goes further than that: a good deal of inter-computer (a.k.a. Internet) communication, HTTP included, is actually based on plain ASCII text. If you don't believe me (which hurts a bit), and you are around some *nix machine, type:

# socket -wslqvp "echo Hello from socket 2013!" 2013

This opens a socket, binds it to port 2013, and then waits for an unsuspecting victim. When you navigate your browser to localhost:2013

Hello from socket 2013!

appears in it. Easy, isn't it? So, it's all plain text and the only thing left is to agree on the dictionary we all use. That is the protocol.
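The same experiment can be sketched in Python with nothing but the standard socket module. This is only an illustration, not a replacement for the command above: the function names are mine, the server answers a single connection, and a second socket plays the role of the browser.

```python
import socket
import threading

GREETING = b"Hello from socket 2013!\n"

def serve_once(server_sock):
    """Accept a single connection, send the greeting, and hang up."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(GREETING)

def demo(port=0):
    # port=0 lets the OS pick a free port; pass 2013 to mimic the command above.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    actual_port = server.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Play the browser: connect and read until the server hangs up.
    chunks = []
    with socket.create_connection(("127.0.0.1", actual_port)) as client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)

    worker.join()
    server.close()
    return b"".join(chunks).decode("ascii")
```

Calling demo() returns the greeting as ordinary text, which is the whole point: the bytes on the wire are just characters.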

The problem arises when my computer tries to socialize with his long lost brother (because, the odds are, they were both made in the same factory in China) on the other side of the world. On both sides, there could be a number of programs trying to “reach out and touch” the same program on the other computer(s). Besides the IP address, each of them needs to know another number on which the other side is (hopefully) listening, and where some program speaks the same lingo. That is the port number. If you really, I mean really, need to see all port numbers in use, see this huge list.

So, you decided to visit your favorite social network site. With another 9,999,999 ~~suckers~~ users at the same time. Whoops. It's quite clear that their servers cannot handle all your requests if the port number is the only thing that tells them apart. Everything would mix up and you would probably get someone else's profile on your screen. A lot like with good old party lines. Dream come true for every hacker and a nightmare for everyone else. Therefore, every connection is told apart from all the others by the full combination of both IP addresses and both port numbers: the server keeps its well known port, while your side gets a random port number that is used in all further communication with you (until the wee hours, needless to say). Each end of such a connection is a socket.
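To see which numbers are actually involved, here is a small sketch that opens one TCP connection over the loopback interface and reports the (address, port) pair on each end. The function name and the dictionary keys are my own. The listening port is whatever the OS hands out, and so is the port on the connecting side.

```python
import socket

def connection_endpoints():
    """Open one loopback TCP connection and report both (address, port) pairs."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(1)
    server_port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", server_port))
    accepted, _addr = listener.accept()

    result = {
        "listening_port": server_port,
        "client_local": client.getsockname(),     # port chosen by the OS
        "client_remote": client.getpeername(),
        "server_local": accepted.getsockname(),   # still the listening port
        "server_remote": accepted.getpeername(),
    }
    for sock in (client, accepted, listener):
        sock.close()
    return result
```

The accepted socket on the server side keeps the listening port, and what makes the connection unique is the pair of endpoints taken together.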

The bottom line is that if you need fast web server software that will work for only one application, your application, then write it yourself. And if you can do it in a high-level programming language using a library/framework that takes worries about low-level stuff like protocols, ports and sockets off your shoulders, even better.

Python

Python is a general-purpose, high-level programming language. It runs on Windows, Linux/Unix and Mac OS X.

There was a comment on one of the previous posts about the “true nature of Python”, or whether it is an interpreted or a compiled language. Let's settle this first. Python is an interpreted language, but its source is first compiled into bytecode (p-code), and that bytecode can be saved. A little more light could be shed by the list of possible file extensions:

.py - script source code,
.pyc - compiled script (bytecode or p-code, whichever you prefer to call it),
.pyo - optimized .pyc file,
.pyw - Python script for Windows, executed with pythonw.exe without opening a console,
.pyd - Python extension module built as a Windows DLL,
.pyx - Cython source code that can be converted to C/C++.

You can invoke the interpreter and have the source compiled and syntax-checked at each run, or you can compile the script to .pyc once and then start the compiled code a tad faster. So, is Python interpreted or compiled? The answer is yes.
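Compiling by hand takes one call to the standard py_compile module. The sketch below writes a throwaway script to a temporary folder and compiles it. The function name and the hello.py file name are only for illustration.

```python
import os
import py_compile
import tempfile

def compile_to_bytecode(source_text):
    """Write source_text to a temporary .py file and compile it to bytecode.

    Returns the path of the compiled .pyc file.
    """
    workdir = tempfile.mkdtemp()
    source_path = os.path.join(workdir, "hello.py")
    with open(source_path, "w") as handle:
        handle.write(source_text)
    compiled_path = os.path.join(workdir, "hello.pyc")
    # cfile names the output file explicitly;
    # doraise=True turns syntax errors into a PyCompileError exception.
    py_compile.compile(source_path, cfile=compiled_path, doraise=True)
    return compiled_path
```

A script with a syntax error never gets as far as a .pyc file: with doraise=True the call raises py_compile.PyCompileError instead.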

Python has many implementations and dialects. There's:

CPython - default, most widely used implementation of Python, written in C,
PyPy - interpreter and just-in-time compiler written in RPython, focused on speed and compatibility with CPython,
Jython (formerly JPython) - implementation of Python written in Java,
IronPython - Python implementation targeting .NET and Mono, and
Cython - superset of Python with a foreign function interface for invoking C/C++ routines (very important for us).

There are also py2exe, py2app and pyinstaller, tools that enable you to package your Python script and the runtime interpreter into an executable. Obviously, not having enough choices is not a problem here.

Where Python really excels is scientific and especially mathematical software. As if it doesn't excel elsewhere, says you! Okay, what I meant was that Python is practically the standard for free open source mathematical software. You have, for example, SciPy and SimPy. Have a look at these two, they are both really cool. But, the über cool system is Sage.

Sage is mathematical software that packages a number of FOSS alternatives to Matlab and Mathematica, and I dare say, a bit more than that. All of that is available from a browser and programmable in Python. Open the Sage notebook, log in with your Google account (for example), and have fun. Sage is based on Apache and not on Tornado, but it is still a perfect example of what I am trying to prove in this blog.

A mini conclusion on Python would be that it is very powerful, and definitely not just another server-side scripting language. It runs on all major operating systems, on both the client and the server side. There are a number of libraries/frameworks written for it. Last but not least, Python has a very strong community that supports it.

Tornado

On Wikipedia Tornado is described as:

a scalable, non-blocking web server and web application framework written in Python. It was developed for use by FriendFeed; the company was acquired by Facebook in 2009 and Tornado was open-sourced soon after.

On their official page it is seen as:

a Python web framework and asynchronous networking library, originally developed at FriendFeed. By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.

Be that as it may, the thingy is among the fastest thingies of its kind. Tornado is one of the web servers that are trying to solve the C10K problem. On the mentioned Wikipedia page, you can see how it outperforms some other web servers (or frameworks), also written in Python, which operate on top of general-purpose web servers. Precisely our point here.
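What "non-blocking" boils down to can be sketched with the standard selectors module: one thread, one loop, and the operating system telling us which sockets are ready. To be clear, this is not Tornado's code, just a toy echo server under my own names (run_echo_loop, demo) that keeps two connections open at once on a single thread.

```python
import selectors
import socket
import threading

def run_echo_loop(listener, max_clients):
    """Serve echo clients on one thread until max_clients have disconnected."""
    selector = selectors.DefaultSelector()
    listener.setblocking(False)
    selector.register(listener, selectors.EVENT_READ, data="accept")
    finished = 0
    while finished < max_clients:
        # select() sleeps until at least one registered socket is ready.
        for key, _events in selector.select():
            sock = key.fileobj
            if key.data == "accept":
                conn, _addr = sock.accept()
                conn.setblocking(False)
                selector.register(conn, selectors.EVENT_READ, data="echo")
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)  # fine for tiny messages; real code buffers writes
                else:  # empty read: the client has closed the connection
                    selector.unregister(sock)
                    sock.close()
                    finished += 1
    selector.unregister(listener)
    selector.close()

def recv_exact(sock, count):
    """Read exactly count bytes (or less if the peer hangs up early)."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    loop = threading.Thread(target=run_echo_loop, args=(listener, 2))
    loop.start()

    # Both connections are open at the same time, served by the one loop thread.
    first = socket.create_connection(("127.0.0.1", port))
    second = socket.create_connection(("127.0.0.1", port))
    first.sendall(b"ping")
    second.sendall(b"pong")
    replies = (recv_exact(first, 4), recv_exact(second, 4))
    first.close()
    second.close()
    loop.join()
    listener.close()
    return replies
```

A library like Tornado wraps this kind of loop, plus write buffering, timeouts and HTTP parsing, so that you only write the handlers.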

Installation

Tornado can be installed automatically by typing:

pip install tornado

Manual installation requires a bit of an effort on your side. First, you have to download the tar.gz of the current version from https://pypi.python.org. Then type:

tar xvzf tornado-x.y.z.tar.gz
cd tornado-x.y.z
python setup.py build
sudo python setup.py install

And that's it. Tornado runs on Python 2.6, 2.7, 3.2, and 3.3. The bad news, for some, is that it requires a *nix machine.

Here's the list of Tornado modules:

Core web framework

tornado.web — RequestHandler and Application classes
tornado.httpserver — Non-blocking HTTP server
tornado.template — Flexible output generation
tornado.escape — Escaping and string manipulation
tornado.locale — Internationalization support

Asynchronous networking

tornado.gen — Simplify asynchronous code
tornado.ioloop — Main event loop
tornado.iostream — Convenient wrappers for non-blocking sockets
tornado.httpclient — Asynchronous HTTP client
tornado.netutil — Miscellaneous network utilities
tornado.tcpserver — Basic IOStream-based TCP server

Integration with other services

tornado.auth — Third-party login with OpenID and OAuth
tornado.platform.caresresolver — Asynchronous DNS Resolver using C-Ares
tornado.platform.twisted — Bridges between Twisted and Tornado
tornado.websocket — Bidirectional communication to the browser
tornado.wsgi — Interoperability with other Python frameworks and servers

Utilities

tornado.autoreload — Automatically detect code changes in development
tornado.concurrent — Work with threads and futures
tornado.httputil — Manipulate HTTP headers and URLs
tornado.log — Logging support
tornado.options — Command-line parsing
tornado.process — Utilities for multiple processes
tornado.stack_context — Exception handling across asynchronous callbacks
tornado.testing — Unit testing support for asynchronous code
tornado.util — General-purpose utilities

As you have noticed, there aren't too many of them. That again translates to ease of use and speed.

Simple Samples

We'll begin with the usual Hello World example. In Tornado it looks like this:

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

Actually, not much different from the equivalent in node.js. Initialize the application, set the port, start the event loop. The only real difference is that Python is working with classes explicitly. A huge advantage in my eyes.

The second example has a slightly surprising twist. We'll use the tornado.websocket module to write an example WebSocket handler that echoes all received messages back to the client.

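from tornado import websocket

class EchoWebSocket(websocket.WebSocketHandler):
    def open(self):
        # Called once the WebSocket handshake has completed.
        print("WebSocket opened")

    def on_message(self, message):
        # Echo every incoming message back to the client.
        self.write_message(u"You said: " + message)

    def on_close(self):
        print("WebSocket closed")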

Now comes the surprise. You can talk to the above class from your JavaScript in the browser:

var ws = new WebSocket("ws://localhost:8888/websocket");
ws.onopen = function() {
   ws.send("Hello, world");
};
ws.onmessage = function (evt) {
   alert(evt.data);
};

Cute!
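For the JavaScript above to find anything at ws://localhost:8888/websocket, the handler has to be mapped to that path. A minimal sketch of the wiring, assuming Tornado is installed. The make_app function is my own name, and the handler is repeated in trimmed form only to keep the block self-contained.

```python
import tornado.ioloop
import tornado.web
import tornado.websocket

class EchoWebSocket(tornado.websocket.WebSocketHandler):
    def on_message(self, message):
        # Echo every incoming message back to the client.
        self.write_message(u"You said: " + message)

def make_app():
    # Map the path used by the JavaScript client to the handler.
    return tornado.web.Application([
        (r"/websocket", EchoWebSocket),
    ])

# To serve it (this blocks until interrupted):
#     make_app().listen(8888)
#     tornado.ioloop.IOLoop.current().start()
```

The port and the path are the ones from the JavaScript snippet. Change either and the client has to follow.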

Conclusion

Python is a serious object-oriented programming language. But, you already know that. I like it for being more C/C++ than Lisp, as opposed to JavaScript which is a bit more functional than procedural. Python has a strong foothold in scientific software, including a number of mathematical packages and libraries. Bindings exist for popular cross-platform GUI toolkits, such as PyQt, PyGTK and wxPython. Last but not least, Python is the language of choice for some serious web application frameworks.

Tornado is a nice piece of software. Small and fast, and I hope its developers are going to keep it that way. It is a bit more elaborate than node.js. For example, the tornado.web module provides the Application object, whereas Node.js needs an additional module for that. If we start adding things up, Python + Tornado add up to more readable and manageable code than JavaScript + Node.js. At least to me.

The only problem, from the point of view of this blog, is that Python cannot be executed in a browser. And the main point here is network applications that run in the browser as the only client-side GUI. They are already cross-platform and ubiquitous. Browser developers have to take care of different issues like operating systems, hardware configurations and security, to name a few. I've been in these shoes. If your application is a bit slow on your clients' computers, they complain, but when their favorite browser hogs their computer, they buy a faster one. Computer, not browser. So, forget about writing a GUI client, network communication and a lot more, and concentrate on your application logic instead.
