A Snake in the Whirlwind
Non-blocking I/O web servers are still fairly new, so there aren't many of them at the moment.
Web servers in general are aplenty. A few of them, called lightweight
web servers, are supposed to be very fast. But these are all
general-purpose web servers, and none of them is programmable by web
application developers. The relatively new kind of non-blocking I/O
web server built on a server-side scripting language is in
another category altogether. These are libraries/frameworks for
writing network applications, not really web servers. In fact, the
application you write becomes (or contains) your very own web
server. Tornado is the first of the kind to be reviewed here.
Before we proceed to it, a brief discussion of sockets is in order.
It was perhaps due in the post about non-blocking I/O, but better
late than never.
Protocols, Ports and Sockets
On Unix, including its derivatives like Linux, everything is a file.
Including I/O devices. In fact, especially I/O devices. If you type
# ls /dev
you will see something like this:
agpgart | fd0u1760 | mem | ram5 | stdout | tty29 | tty50 | usbmon1 |
block | fd0u1840 | net | ram6 | tty | tty3 | tty51 | vcs |
bsg | fd0u1920 | network_latency | ram7 | tty0 | tty30 | tty52 | vcs1 |
bus | fd0u360 | network_throughput | ram8 | tty1 | tty31 | tty53 | vcs2 |
cdrom | fd0u720 | null | ram9 | tty10 | tty32 | tty54 | vcs3 |
cdrw | fd0u800 | oldmem | ramzswap0 | tty11 | tty33 | tty55 | vcs4 |
char | fd0u820 | parport0 | random | tty12 | tty34 | tty56 | vcs5 |
console | fd0u830 | pktcdvd | rfkill | tty13 | tty35 | tty57 | vcs6 |
core | full | port | root | tty14 | tty36 | tty58 | vcs7 |
cpu_dma_latency | fuse | ppp | rtc | tty15 | tty37 | tty59 | vcs8 |
disk | hpet | psaux | rtc0 | tty16 | tty38 | tty6 | vcsa |
dvd | input | ptmx | scd0 | tty17 | tty39 | tty60 | vcsa1 |
dvdrw | kmsg | pts | sda | tty18 | tty4 | tty61 | vcsa2 |
ecryptfs | log | ram0 | sda1 | tty19 | tty40 | tty62 | vcsa3 |
fb0 | loop0 | ram1 | sda2 | tty2 | tty41 | tty63 | vcsa4 |
fd | loop1 | ram10 | sda5 | tty20 | tty42 | tty7 | vcsa5 |
fd0 | loop2 | ram11 | sg0 | tty21 | tty43 | tty8 | vcsa6 |
fd0u1040 | loop3 | ram12 | sg1 | tty22 | tty44 | tty9 | vcsa7 |
fd0u1120 | loop4 | ram13 | shm | tty23 | tty45 | ttyS0 | vcsa8 |
fd0u1440 | loop5 | ram14 | snapshot | tty24 | tty46 | ttyS1 | vga_arbiter |
fd0u1600 | loop6 | ram15 | sndstat | tty25 | tty47 | ttyS2 | zero |
fd0u1680 | loop7 | ram2 | sr0 | tty26 | tty48 | ttyS3 | |
fd0u1722 | mapper | ram3 | stderr | tty27 | tty49 | urandom | |
fd0u1743 | mcelog | ram4 | stdin | tty28 | tty5 | usbmon0 |
Even more interesting for us here is what you get when you type:
# ls /proc/net
anycast6 | dev_snmp6 | ip_mr_cache | netfilter | psched | rt6_stats | sockstat | tcp6 | udplite6 |
arp | if_inet6 | ip_mr_vif | netlink | ptype | rt_acct | sockstat6 | tr_rif | unix |
connector | igmp | ipv6_route | netstat | raw | rt_cache | softnet_stat | udp | wireless |
dev | igmp6 | mcfilter | packet | raw6 | snmp | stat | udp6 | |
dev_mcast | ip6_flowlabel | mcfilter6 | protocols | route | snmp6 | tcp | udplite |
The files tcp and udp obviously have to do with protocol communication
(like all the others in this folder, btw): they list the kernel's open
TCP and UDP sockets as text. Much of inter-computer (a.k.a. Internet)
communication is text-based as well. HTTP, for one, travels as plain
ASCII text. If you don't believe me (which hurts a bit), and you are
around some *nix machine, type:
# socket -wslqvp "echo Hello from socket 2013!" 2013
This opens a socket, binds it to port 2013, and then waits for an
unsuspecting victim. When you navigate your browser to localhost:2013,
Hello from socket 2013!
appears in it. Easy, isn't it? So, it's all plain text, and the only
thing left is to agree on the dictionary we all use. That is a protocol.
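For those without the socket utility at hand, here is a rough Python equivalent of the experiment above. It is only a sketch: the function names are made up for this post, and it lets the operating system pick a free port instead of insisting on 2013. One thread plays the server and sends a line of text, the main thread plays the client and reads it back.

```python
import socket
import threading


def serve_once(server_sock, message):
    """Accept a single connection, send the text message, then close."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(message.encode("ascii"))


def fetch_greeting(message="Hello from socket 2013!"):
    """Start a one-shot text server on a free local port and read from it."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server_sock, message))
    worker.start()

    # The client side: connect, then read until the server closes.
    chunks = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)

    worker.join()
    server_sock.close()
    return b"".join(chunks).decode("ascii")


if __name__ == "__main__":
    print(fetch_greeting())
```

What goes over the wire is just the bytes of the text. Everything beyond that, such as what the text means, is the protocol's business.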
The problem arises when my computer tries to socialize with its
long-lost brother (because, the odds are,
they were both made in the same factory in China) on the other side
of the world. On both sides, there could be a number of programs
trying to “reach out and touch” a program on the other
computer. Besides the IP address, each of them needs to know another
number on which the other side is (hopefully) listening, and where
some program speaks the same lingo. That is the port
number. If you really, I mean really,
need to see all port numbers in use, see this
huge list.
So, you decided to
visit your favorite social network site. With another 9,999,999
users at the same time. Whoops. It's quite
clear that their servers cannot handle all your requests over a single
shared channel. Everything would mix up and you would probably get someone
else's profile on your screen. A lot like with good old party lines.
A dream come true for every hacker and a nightmare for everyone else.
Therefore, when you appear on their server, your connection immediately
gets an endpoint of its own. The server keeps listening on its well
known port, while your side of the connection uses a (more or less)
random port number, and that combination of addresses and ports
identifies you in all further communication (until the
wee hours, needless to say). That is a socket.
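Here is a small sketch of what keeps the visitors apart at the socket level, using only Python's standard library. The function name is invented for this post. Several clients connect to one listening port on the local machine, and each accepted connection comes back as a separate socket object whose client-side port differs from the others.

```python
import socket


def connection_tuples(n_clients=3):
    """Open several connections to one listening port and describe each."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_clients)
    server_port = listener.getsockname()[1]

    clients, accepted, tuples = [], [], []
    for _ in range(n_clients):
        client = socket.create_connection(("127.0.0.1", server_port))
        conn, peer = listener.accept()  # a brand new socket per connection
        clients.append(client)
        accepted.append(conn)
        # (server-side port, client-side port) of this connection
        tuples.append((conn.getsockname()[1], peer[1]))

    for sock in clients + accepted + [listener]:
        sock.close()
    return server_port, tuples


if __name__ == "__main__":
    port, pairs = connection_tuples()
    print("listening port:", port)
    for server_side, client_side in pairs:
        print("server port", server_side, "<-> client port", client_side)
```

The server-side port is the same for every connection. Only the client-side port changes, and that is enough for the operating system to tell the conversations apart.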
The bottom line is
that if you need fast web server software that will work for only
one application, your application, then write it yourself. And if you
can do it in a high-level programming language using a library/framework
that takes low-level worries like protocols, ports and
sockets off your shoulders, even better.
Python
Python is a general-purpose, high-level programming language. It runs on
Windows, Linux/Unix and Mac OS X.
There was a comment
on one of the previous posts about the “true nature of Python”, or
whether it is an interpreted or a compiled language. Let's settle this
first. Python is an interpreted language, but it can also be compiled
into bytecode (p-code). A little more light could be shed by the list
of possible file extensions:
.py - script source code,
.pyc - compiled script (Bytecode or P-code, whichever you prefer to call it),
.pyo - optimized .pyc file,
.pyw - Python script for Windows, executed with pythonw.exe without invoking the console,
.pyd - Python extension module built as a Windows DLL,
.pyx - Cython source code that can be converted to C/C++.
You can invoke the
interpreter and have the source parsed and syntax-checked at each run,
or you can compile the script to .pyc ahead of time and then have the
compiled code load a tad faster. So, is
Python interpreted or compiled? The answer is yes.
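To see both sides of that answer at once, here is a minimal sketch using the standard py_compile module. The helper name and the file names are made up for the example. It writes a tiny script, compiles it to a .pyc file, deletes the source, and then runs the bytecode file with the same interpreter.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def compile_and_run(source_text):
    """Compile source text to a .pyc file and run only the compiled file."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "hello.py")
        pyc = os.path.join(workdir, "hello.pyc")
        with open(src, "w") as f:
            f.write(source_text)

        # Compile to bytecode; doraise=True turns syntax errors into exceptions.
        py_compile.compile(src, cfile=pyc, doraise=True)
        os.remove(src)  # the source is gone, only bytecode is left

        # The interpreter still does the running, this time from bytecode.
        output = subprocess.check_output([sys.executable, pyc])
        return output.decode().strip()


if __name__ == "__main__":
    print(compile_and_run('print("compiled, then interpreted")'))
```

The script is compiled first and interpreted afterwards, which is why both camps in the argument are right.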
Python has many
implementations and dialects. There's:
- CPython - default, most widely used Python written in C,
- Jython (formerly JPython) - implementation of Python written in Java,
- IronPython - Python implementation targeting .NET and Mono, and
- Cython - superset of Python with a foreign function interface for invoking C/C++ routines (very important for us).
There are also py2exe, py2app and pyinstaller,
tools that enable you to package your Python script and the runtime
interpreter into an executable. Obviously, not having enough choices is
not a problem here.
Where Python really
excels is scientific and especially mathematical software. As if it
doesn't excel elsewhere, says you! Okay, what I meant was that Python
is practically the standard for free open source mathematical software.
You have, for example, SciPy
and SimPy.
Have a look at these two, they are both really cool. But the über
cool system is Sage.
Sage is
mathematical software that packages a number of FOSS alternatives to
Matlab and Mathematica, and I dare say, a bit more than that. All of
that is available from the browser and programmable in Python. Open a Sage
notebook, log in with your, e.g., Google account, and have fun. Sage is
based on Apache and not on Tornado, but it is still a perfect example of
what I am trying to prove in this blog.
A mini conclusion on
Python would be that it is a very powerful language, and definitely not just
another server-side scripting language. It runs on all major
operating systems, on both the client and the server side. There are a number
of libraries/frameworks written for it. Last but not least, Python
has a very strong community that supports it.
Tornado
On Wikipedia Tornado
is described as:
a scalable, non-blocking web server and web application framework
written in Python. It was developed for use by FriendFeed; the
company was acquired by Facebook in 2009 and Tornado was open-sourced
soon after.
On its official page it is described as:
a Python web framework and asynchronous networking library,
originally developed at FriendFeed. By using non-blocking network
I/O, Tornado can scale to tens of thousands of open connections,
making it ideal for long polling, WebSockets, and other applications
that require a long-lived connection to each user.
Be that as it may, the
thingy is among the fastest thingies of its kind. Tornado is one of
the web servers that are trying to solve the C10K problem, that is,
handling ten thousand concurrent connections on one machine. On the
mentioned Wikipedia page, you can see how it outperforms some other
web servers (or frameworks), also written in Python, which operate on
top of general-purpose web servers. Precisely our point here.
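To make “non-blocking” a bit more concrete, here is a minimal sketch of the idea behind such servers, written with Python's standard selectors module. This is not Tornado's code, only an illustration, and the function name is invented for this post. A single thread asks the operating system which sockets are ready and touches only those, so no connection can hold up the others.

```python
import selectors
import socket


def read_until_closed(sock):
    """Read from a blocking socket until the other side closes it."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def run_echo_loop(n_clients=3):
    """Serve several clients from one thread, driven by readiness events."""
    sel = selectors.DefaultSelector()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_clients)
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="accept")
    port = listener.getsockname()[1]

    # Clients connect and send up front; the kernel queues it all for us.
    clients = []
    for i in range(n_clients):
        client = socket.create_connection(("127.0.0.1", port))
        client.sendall("message {}".format(i).encode("ascii"))
        clients.append(client)

    echoed = 0
    while echoed < n_clients:
        # Ask which sockets are ready instead of waiting on any single one.
        for key, _events in sel.select(timeout=1.0):
            if key.data == "accept":
                conn, _addr = key.fileobj.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data="echo")
            else:
                data = key.fileobj.recv(1024)
                if data:
                    key.fileobj.sendall(b"You said: " + data)
                    echoed += 1
                sel.unregister(key.fileobj)
                key.fileobj.close()

    replies = [read_until_closed(c).decode("ascii") for c in clients]
    for client in clients:
        client.close()
    sel.unregister(listener)
    listener.close()
    sel.close()
    return replies


if __name__ == "__main__":
    for line in run_echo_loop():
        print(line)
```

A real framework adds buffering, timeouts, HTTP parsing and much more on top, but the loop in the middle is the heart of the matter.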
Installation
Tornado can be
installed automatically by typing:
pip install tornado
Manual installation requires a bit of effort on your side. First,
you have to download the tar.gz of the current version from
https://pypi.python.org.
Then type:
tar xvzf tornado-x.y.z.tar.gz
cd tornado-x.y.z
python setup.py build
sudo python setup.py install
And that's it. Tornado runs on Python 2.6,
2.7, 3.2, and 3.3. The bad news, for some, is that it is meant for *nix
machines: Windows is not officially supported and is recommended only
for development.
Here's the list of Tornado modules:
Core web framework
- tornado.web — RequestHandler and Application classes
- tornado.httpserver — Non-blocking HTTP server
- tornado.template — Flexible output generation
- tornado.escape — Escaping and string manipulation
- tornado.locale — Internationalization support
Asynchronous networking
- tornado.gen — Simplify asynchronous code
- tornado.ioloop — Main event loop
- tornado.iostream — Convenient wrappers for non-blocking sockets
- tornado.httpclient — Asynchronous HTTP client
- tornado.netutil — Miscellaneous network utilities
- tornado.tcpserver — Basic IOStream-based TCP server
Integration with other services
- tornado.auth — Third-party login with OpenID and OAuth
- tornado.platform.caresresolver — Asynchronous DNS Resolver using C-Ares
- tornado.platform.twisted — Bridges between Twisted and Tornado
- tornado.websocket — Bidirectional communication to the browser
- tornado.wsgi — Interoperability with other Python frameworks and servers
Utilities
- tornado.autoreload — Automatically detect code changes in development
- tornado.concurrent — Work with threads and futures
- tornado.httputil — Manipulate HTTP headers and URLs
- tornado.log — Logging support
- tornado.options — Command-line parsing
- tornado.process — Utilities for multiple processes
- tornado.stack_context — Exception handling across asynchronous callbacks
- tornado.testing — Unit testing support for asynchronous code
- tornado.util — General-purpose utilities
As you have noticed, there aren't too many of them. That again
translates to ease of use and speed.
Simple Samples
We'll begin with the
usual Hello World example. In Tornado it looks like this:
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    # Called for every HTTP GET request routed to this handler
    def get(self):
        self.write("Hello, world")

# Map URL patterns to handler classes
application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)  # bind to port 8888
    tornado.ioloop.IOLoop.instance().start()  # run the event loop
Actually, not much different from the equivalent in node.js. Initialize
the application, set the port, start listening. The only real
difference is that Python is working with classes explicitly. A huge
advantage, to my eye.
The second example
has a slightly surprising twist. We'll use the tornado.websocket
module to write an
example WebSocket handler that echoes all received messages back to
the client.
from tornado import websocket

class EchoWebSocket(websocket.WebSocketHandler):
    def open(self):
        # print() with a single argument works on Python 2 and 3 alike
        print("WebSocket opened")

    def on_message(self, message):
        self.write_message(u"You said: " + message)

    def on_close(self):
        print("WebSocket closed")
Now comes the surprise. Once the handler is mapped to the /websocket
URL in your Application, you can talk to
the above class from JavaScript in the browser:
var ws = new WebSocket("ws://localhost:8888/websocket");
ws.onopen = function() {
    ws.send("Hello, world");
};
ws.onmessage = function (evt) {
    alert(evt.data);
};
Cute!
Conclusion
Python is a serious
object-oriented programming language. But you already know that. I
like it for being more C/C++ than Lisp, as opposed to JavaScript,
which is a bit more functional than procedural. Python has a strong
foothold in scientific software, including a number of mathematical
packages and libraries. Bindings exist for popular
cross-platform GUI toolkits: PyQt,
PyGTK
and wxPython.
Last but not least, Python is the language of choice for some serious
web application frameworks.
Tornado is a nice piece
of software. Small and fast, and I hope its developers are going to
keep it that way. It is a bit more elaborate than node.js. For
example, the tornado.web
module handles the application object, while Node.js needs an additional
module for that. If we start adding things up, Python + Tornado add up
to more readable and
manageable code than JavaScript + Node.js. At least to me.
The only problem,
from the point of view of this blog, is that Python cannot be
executed in a browser. And the main point here is network
applications that run in a browser as the only client-side GUI. They
are already cross-platform and ubiquitous. Browser developers have to
take care of issues like different operating systems, hardware
configurations and security, to name a few. I've been in those shoes.
If your application is a bit slow on your client's computer, they
complain, but when their favorite browser hogs their computer, they
buy a faster one. Computer, not browser. So, forget about writing a GUI
client, network communication and a lot more, and concentrate on your
application logic instead.