Ethan Ram’s geeky blog on the seam of technology and product management.
I rarely get to see a new technology that sparks my mind and keeps me up late at night trying to do something with it. It happened a couple of months ago when I first played the PC version of Angry Birds (two sleepless nights…!) and lately again with Node.JS. But this one is no game! To explain, I have to take you back in time to the year 1999…
It all started when I wrote my first server for Exent’s Games on Demand platform. It was a large-volume file data server designed to respond to requesting clients very fast and serve thousands of concurrent clients. We wrote the server as a kernel module, and accordingly it was written in a fully asynchronous fashion. This project, led by my first team leader, Amnon Romm, was certainly the most beautiful piece of code I have seen to date. [We developers can see beauty and ugliness in simple code. It’s a special gift we have that a non-developer will never understand… :) Code is actually mostly old and ugly. If you write something and a colleague comes to you and tells you your code is beautiful – that would be the BEST compliment you can ever get. Really.]
Since then I’ve seen many other servers. Some of them, like Check Point’s firewall, definitely have a good asynchronous architecture (even if the code is somewhat ugly…). But one thing I could never figure out is how come the whole WWW (the browser-facing part of the internet) runs on top of badly designed synchronous servers. Maybe it’s the basic design of the [wonderful] HTTP protocol, which is request-response based. Maybe it’s us developers, who find it harder to design and code asynchronously. Maybe it’s because in the early ‘90s, when the internet standards were written, running a CGI process on a UNIX box was the main way to handle HTTP server requests, and we just never wanted to stop supporting those standards…

Anyway, I figured out that all the most common servers – Apache, IIS, Java-based web services, the standard .NET stack, Django/Python, PHP, Ruby and almost every other piece of HTTP server software out there – are written to run in a synchronous environment. Every request is served either by a new thread or by a thread from a big thread pool. Such a thread executes the request and response stacks, waiting for resources from the DB, from the disk drive, from a memcached service it calls, etc. Each time it waits for a response from a device, it goes to sleep and is context-switched out to give another thread some CPU time. This means that heavy-load servers spend MUCH of their CPU and memory-bus time switching threads. The simplest server written with (the newly designed) .NET/WCF can have up to 100 threads running on a dual-core processor. The result is that a strong server can serve only a few thousand clients/browsers concurrently. So much CPU time and money is wasted.
Another issue is the use of high-level interpreted languages to write the internet – from JSP to PHP to Python. Most of the internet is written in scripting languages because they are easy to write and easy to deploy. But everyone knows they run slow. It’s a trade-off development managers make – code fast and get it out the door. They say “we’ll have other ways to speed up the beast after it’s already out” – clustering, stronger hardware, another caching layer, etc. Anyway – once the version is out, it’s now the problem of the IT guys to meet the SLA. WTF??? Some real efforts were recently made by Facebook to speed up PHP with their release of HipHop – a server add-on that transforms PHP scripts into highly optimized C++ code and then uses g++ to compile it to machine code before it’s run. They say that on average it reduced CPU usage at Facebook by about 50% (!!!) and that WordPress 3.0 runs 2.7x faster under HipHop. Wow! Impressive!
But what if we could solve the synchronous design problem in a similar way? The problem is we cannot – we would have to throw away all the code that was ever written and start fresh. That is because asynchronous code cannot call code that blocks, and all that code out there is blocking.
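To make the point concrete, here is a minimal sketch (jumping ahead to the Node.JS-style JavaScript I describe next; the file path is just an illustration). The blocking call holds the whole process hostage until the disk answers, while the non-blocking call returns immediately and hands the result to a callback later – an asynchronous platform can only be built from calls of the second kind:

```javascript
// A minimal sketch using Node.JS's built-in 'fs' module.
var fs = require('fs');

// Blocking: the single thread stops here until the disk read completes,
// so nothing else in the process gets served in the meantime.
var hosts = fs.readFileSync('/etc/hosts', 'utf8');
console.log('blocking read finished: ' + hosts.length + ' chars');

// Non-blocking: the call returns at once and the callback fires later,
// when the I/O completes; the process stays free to handle other events.
fs.readFile('/etc/hosts', 'utf8', function (err, data) {
  if (err) throw err;
  console.log('non-blocking read finished: ' + data.length + ' chars');
});
console.log('non-blocking read requested, moving on...');
```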
So what is it? I see it as the next-gen platform for almost any software. You write the code in JavaScript, which then runs inside Google’s blazing-fast V8 JavaScript engine. This engine first compiles the code to native binary and then executes it. The more important thing is that the core libraries of Node.JS require you to write everything asynchronously. Whether you’re accessing a config file, requesting an access token from a Facebook API or running a SQL query on your DB – all the APIs are asynchronous. They also took the event-driven I/O approach and implemented the CommonJS specifications, so it’s extremely simple to write servers using Node.JS. There are also a few other strong features: a way to package code into libraries, or Modules; a way to have native binary Modules (written mostly in C/C++); a package manager called NPM; and an excellent open-source-spirited community of devs around it. It was started by Ryan Dahl in 2009, and its growth is sponsored by cloud provider Joyent, which employs Dahl.
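How simple? The canonical Node.JS “hello world” server (adapted from the project’s own examples) is just a handful of lines – a single process, no threads, one callback invoked by the event loop per request:

```javascript
var http = require('http');

http.createServer(function (req, res) {
  // This callback runs for every incoming request.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}).listen(8124);

console.log('Server running at http://127.0.0.1:8124/');
```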
So what can you do with it? Well, basically everything you can do with Python, Perl or Java – client code, server code. But the goal is definitely server code: blazing-fast web servers that can handle 10x more traffic and do it much faster. This can serve not only “regular” browser-based traffic, but can also be used to stream music and video, power sharing applications, act as a reverse proxy, etc.
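The streaming case falls out almost for free. Here is a rough sketch (the file name is hypothetical) of serving a media file by piping a read stream straight into the response, so nothing gets buffered whole in memory:

```javascript
var http = require('http');
var fs = require('fs');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  // Data flows from disk to socket as fast as the client can consume it;
  // the stream machinery handles the back-pressure for us.
  fs.createReadStream('./movie.mp4').pipe(res);
}).listen(8125);
```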
There are a couple of things you need if you want a team of developers to start working on your next big thing. First you want a proper development environment (use Eclipse with Google’s V8 plugin); then you need a unit-testing framework (use Expresso); an application server with MVC and templating support (use Express); ORM/Hibernate-like tools to ease coding against the DB (see MongooseJS for MongoDB and SequelizeJS for MySQL); a library of utility Modules to copy-paste from for almost every basic need (see the NPM Registry with almost 3,000 entries << try searching “facebook”); and a cloud-based app engine to deploy your application on, preferably for free (I found 11 such services, but Nodejitsu and Cloud Foundry seem to be the most advanced). Let’s not forget that a strong development community is also very important (Node.JS main newsgroup: ~50 threads/day; ExpressJS group: ~10 threads/day; the LinkedIn Node.JS group has 832 members; StackOverflow Node.JS tag: ~13 questions/day). I think we are good to go!
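As a taste of the Express part of that stack, here is a rough sketch of an app with one JSON route (assumes Express installed via NPM; the exact bootstrap call has changed slightly across Express versions, and the route and port here are just examples):

```javascript
var express = require('express');
var app = express();

// One route that returns JSON – the same JSON the browser-side
// JavaScript will consume, with no XML translation layer in between.
app.get('/hello/:name', function (req, res) {
  res.json({ greeting: 'hello ' + req.params.name });
});

app.listen(3000);
console.log('Express app listening on port 3000');
```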
The Node.JS thing is catching fire these days – this Google Trends view clearly shows how fast Node.JS is soaring, and that it’s now almost as big as Ruby on Rails. Many new startups see this as a great opportunity and are developing on Node.JS. It fits so well with the lean-startup concept. One coding language for both front-end and back-end >> one developer can write the whole feature, end to end. And no more translations between XML and JSON – now everything is JSON in all application layers.
Some giants have already decided they are joining the party. Microsoft has recently announced it’s going to support Node.JS on Azure (and in Visual Studio, for sure). VMware is already supporting Node.JS deployments on its cloud service – Cloud Foundry.
Interesting blog posts if you want to read further –
Did I say a game-changing technology? I think this is taking on the HTML5 hype. I’m predicting here that in 5 years Node.JS will be the most prominent coding platform in the world, and servers based on the Node.JS platform will replace most of the ailing JBoss, IIS and Apache servers out there. The LAMP stack is dead, long live JavaScript.
P.S. I guess some of the readers of this post are saying “this guy is crazy! He’s taking an immature technology and arguing it should be used in production today. The risk is too high, yada yada yada…” I agree – the risk is high. If you don’t have strong devs that can master a new technology and face some difficulties, then you should stick with the usual Django/Rails/GWT. If you do have strong devs, the upside of this technology is great, and I think it’s mature enough for most tasks – especially if you are starting something new.
Hi eram,
thanks for your insight! I’m still not sure about the numbers: you stated “The result is that a strong server can serve only a few thousands of clients/browsers concurrently”. From my personal experience that sounds about right for a multi-threaded server.
If you look at Google Search – I guess the biggest web site out there – they have something like 34 thousand requests per second (number from 2/2010, see http://searchengineland.com/by-the-numbers-twitter-vs-facebook-vs-google-buzz-36709). So with multi-threaded servers they could run their front-end on less than fifty servers.
The websites I have been working on had more like 200 thousand requests per day, which comes out to a measly 2.3 requests per second. I would argue that most public websites in the Internet today have less than 100 requests per second, something that could easily be handled by any multi-threaded server in any language.
I therefore think there are certain applications that need extremely high concurrency, and those would benefit from the asynchronous programming model of Node.js, but most applications don’t need that. What is your opinion on that?
Keep up the good work!
Marc
Hi Ethan
That was quite a compliment there, thank you. I stumbled across this article by chance. In today’s fast-paced, shareholders-over-the-shoulder, zero-design, periodic-bug-distribution-subscription age, you are unlikely to find beautiful code; most code is obfuscated by style. But we, the secret craftsmen of abstract worlds, should keep the flame going for generations to come, and one day our descendants may be able to say “we told you so”.
Have fun
Amnon