The Null Terminator
Ethan Ram’s geeky blog on the seam of technology and product management.
Category Archives: Lean startup
On how to go Agile without missing your yearly quota/deadlines
2011 Oct 10
Going Agile in a B2B Company – Part 2
This is the second post about going Agile in a B2B company. Read the 1st part – that’s where you’ll find the why. This post is about the how.
Well… So much has been written on how to go Agile and what an Agility project looks like. I’m not going to repeat it. Instead I’ll give some general key guidelines, so that when you go on reading about Agile methodologies you can look at them with the right perspective; when you go to your managers to ask for funding and time for an Agility project you’ll come with a good plan; and when you speak with an Agile Coach you can judge how well he fits into your Agility project, rather than having him dictate to you what it should look like.
First thing to know is that Going Agile is not a one-time project thing. It’s a management philosophy. But – if you haven’t been practicing Agility in your company you should start with an Agile project with set goals – ppl like to have clear goals. Still, you should always remember that being Agile should be a general long-term goal. A goal to move fast, to be more productive, to beat competition.
A working company cannot and should not stop everything (development, sales) to have an Agile project in place. No manager would approve that, even in a startup. Most development groups already invest about 25% of their time in building infrastructure and refactoring existing code. The initial Agility project would certainly require more resources, but it should not stop all other functional development.
Agile projects are all different because organizations are different and their Agility focus is different. Setting “standard” ultimate goals to achieve in a 6-month project is not realistic. Your managers will not understand where you are heading and what the urgency is (and you’ll miss your quota/deadlines). It’s better to educate your staff in the Agility concepts you want to embed in your process and set short-term goals that people understand. Then continue evolving in the Agile direction.
An Agility project should ultimately reduce development time, shorten release cycles, improve product quality and make your development team happier in general. But these are hard to quantify, and it’s hard to “sell” an Agility project to higher management if you talk about these goals. So here is an idea: start by setting project goals where your product is lacking the most in ways that hurt the SALES cycle and OPS. If you manage to improve there, the benefit is measurable, immediate and noticeable. You’ll also make some other department managers happy about your Agility project, and it’ll be easy to set new Agility goals.
So if it takes 5 days of work to configure each new customer environment, focus on resolving that with the Agility thinking and tools in mind – maybe by automating the installation and configuration process. If your QA cycle takes 2 weeks from code freeze to release, that’s an excellent place to start your test automation. If you’re having trouble upgrading customers’ databases every time a new version is released, make fixing this one of the first goals of your Agility project – for instance, by moving your database schema into the main code branch and building it as part of every build.
Introduce your managers (group leaders, team leaders) to the Agility philosophy and tools first. Set clear Agility goals for 2 months ahead and get your managers involved early in planning how to reach those goals in time. Prepare for some resistance – ppl don’t like changes and in many cases they will see it as extra workload, even if they agree with your long-term goals. When you have a plan, gather everyone, explain the Agility-Why to them and show them the plan.
Automation… Automation… Automation!
Many of the Agility tools and methodologies involve automating manual work – build servers, automatic unit testing, integration testing and QA automation – and all of them require scripting and batch files. You’ll quickly find that your developers prefer coding in Java/C# to scripting in Ant/csh and that only a few QA engineers can script. This means ppl will need to adjust, evolve and learn new stuff as part of an Agility project. I found that I had to replace some of my QA staff with others who had automation and scripting in their background. I had to give my developers some time to learn Ant scripting so that they could contribute to the Agility project in the short term and continue developing in an Agile way later on.
You’ll surely be creating some new development infrastructure to enable the Agile process to happen. A Continuous Integration Server, updated workspaces/projects for your developers that include built-in emulators for unit-testing, a new product packaging/installer etc. It’s wise to have a task-force in place that would build this infrastructure that would then be used by everyone else. Create a task-force built from engineers from all teams. This will help make sure that planning takes into consideration all relevant aspects and that when the initial infrastructure is ready there are already engineers in all teams that know how to use it and can teach the rest.
The new infrastructure soon affects every engineer’s daily work, so it had better be stable and solid enough that it doesn’t break everyone’s work and halt development completely. The task force should build the infrastructure and emulators, then the first few unit-tests and scripts on top of them, both to see that everything actually functions and to give everyone else an example to follow. You can also start using the new infrastructure and test automation in parallel with the older process, and throw warnings on unit-tests that fail rather than block the process.
In GameGround, after the initial infra was developed, we took 80% of the engineers (everyone but a few left to fix blocker bugs) for a week of writing unit tests, so that everyone had a chance to experience writing unit tests for their code, use the new infrastructure and understand how development was going to change. This also meant that by the end of that week a significant amount of our code, 10%-20% of functionality, was tested regularly by our continuous integration server. A good start.
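The "warn rather than block" idea can be sketched roughly as follows. This is only an illustration in Python's standard `unittest` module (the post itself talks about Ant and Java/C# builds); the function name `run_suite` and the `warn_only` switch are made up for this example.

```python
import sys
import unittest


def run_suite(suite, warn_only=True, stream=None):
    """Run a test suite and return the exit code the build step should use.

    In warn-only mode failures are reported but the build is not blocked,
    which lets the new tests run alongside the old process for a while.
    """
    runner = unittest.TextTestRunner(stream=stream or sys.stderr, verbosity=0)
    result = runner.run(suite)
    problems = len(result.failures) + len(result.errors)
    if problems and warn_only:
        print(f"WARNING: {problems} unit test(s) failed; build not blocked")
        return 0
    return 0 if problems == 0 else 1
```

Once the tests are trusted, flipping `warn_only` to `False` turns the same step into a blocking gate.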
Well, I must say, in reality, it was (is) much harder to achieve… I’ll try to write about the hurdles I’ve had in one of my future posts.
p.s. the post really had little to do with “yearly quota” – but it goes to show that an Agile project can be done without derailing the yearly planning of the company…
OPs / Production services review: Nagios, Kiwi Syslog and Limelight CDN
2011 Sep 30
A Review of Services I’ve used in GameGround – Part IV
This is the 4th part in a series of blog posts reviewing several 3rd pty products and services I’ve used in GameGround and my take on them. The basic approach I’m taking here is the applicability of the product for a lean-startup that wants to move fast. In the last post I wrote about Analytics and BI Reporting tools for the marketing team. This post is about Monitoring the health of the system server – for the OPS team. Next in the series – development infrastructure.
Nagios is probably “The Industry Standard in IT Infrastructure Monitoring”, as their slogan says. It’s very popular among IT staff and can be configured to monitor and alert on up to 40-50 servers, so even a medium-size company can use it. It’s free server software – basically a scheduler that executes service checks against installed agents and tests against network devices, reports back the results and raises alerts above predefined thresholds. There’s also a comprehensive list of extensions and plugins written by the community that can be used to monitor just about anything you’ll ever want.
It’s easy to set up Nagios to watch server disk-space, CPU and the existence of certain services. The difficult part is creating checks that alert you if internal parts of the software behave irrationally and users are not seeing what they should. E.g. certain transactions do not end in time, server response time for certain requests is going up, users suddenly cannot see their friends list etc. These are much harder to watch. To monitor them you’ll need to write code on your back-end servers: special functions (REST/WSDL) that do some internal testing and return true/false accordingly. Nagios is able to call such functions periodically and alert if they fail. It’s an evolving process: you’ll see your system fail without Nagios alerting about it, and then add more of those checks until it functions well.
So it’s wiser to add some testing functionality at design time: plan your server modules to have Nagios testing APIs. You’ll also need to watch that your 3rd pty providers are working right: if the A/B testing API you are using is down then your site is probably down too. If your Content Delivery Network is down, ppl are not getting to see your website, although everything is functioning on your side.
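A check of this kind might be sketched as below. Nagios plugins report their result through the process exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of text. Everything else here is an assumption for illustration: the JSON shape (`healthy`, `detail`), the 2-second warning threshold, and the idea that your server exposes such a self-test URL at all.

```python
import json
import sys
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(payload, elapsed_s, warn_after_s=2.0):
    """Map a self-test response (hypothetical JSON shape) to (exit_code, message)."""
    if not payload.get("healthy", False):
        detail = payload.get("detail", "no detail")
        return CRITICAL, f"CRITICAL - self-test failed: {detail}"
    if elapsed_s > warn_after_s:
        return WARNING, f"WARNING - self-test passed but took {elapsed_s:.2f}s"
    return OK, f"OK - self-test passed in {elapsed_s:.2f}s"


def check(url, timeout_s=10.0):
    """Call the self-test URL and translate the outcome for Nagios."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except Exception as exc:  # unreachable, timed out, or not valid JSON
        return CRITICAL, f"CRITICAL - self-test call failed: {exc}"
    return evaluate(payload, time.monotonic() - start)


if __name__ == "__main__" and len(sys.argv) > 1:
    code, message = check(sys.argv[1])
    print(message)
    sys.exit(code)
```

The same script can point at a 3rd pty provider's status URL, which covers the "is my A/B testing API or CDN up" case with no extra code.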
Nagios – the CONS:
- Nagios was started in 1999. Its core is written in C, but many of the community plugins and scripts around it are written in Perl. Although new versions have been released it feels like an old product and things seem harder to achieve than what we’re used to these days. Most of the checks have to be configured manually in config files, including thresholds for alerts, the number of times a failure does not raise an alert etc.
- To achieve the functional testing mentioned above and to integrate with all the plugins for the different OS types and the monitoring you’d want to do, you’ll need to code some scripts on the Nagios side (or at least edit existing scripts). That’s Perl coding. Although Perl knowledge is still quite common it’s fast disappearing from the planet. It’s already hard to find IT managers who can code in it and younger developers haven’t even heard of it… We ended up getting outside help to create the basic setup. So much for free software…
- The learning curve is long. Expect the system to text-message you false alarms at 2am, telling you the system is down, for a few months, until you get the thresholds right. Expect your CEO to call you at 2am to tell you the system is down but there was no alert… Lots of things went wrong in a live environment of (only) 4 servers we had in production in GameGround – it took us about 3-4 months to get to a relatively solid Nagios setup that actually alerted us on most of the real problems.
- Some of the things to watch for ended up being certain errors written into the different servers’ log files. These may be critical bugs and exceptions thrown from bad things happening down in your code stacks. You can set up Nagios to grep the log files for those strings, but this is very heavy on your servers and on the traffic. Better to have a proper central log server with alerts (see below). But then this actually means that you’re going to have 2 monitoring systems: one is Nagios and the other is the central logging server.
- Nagios is good at giving you a green or red sign next to your servers/services. But in reality managers want to know ahead of time that things are going in the wrong direction: queues are not emptying fast enough, response times on some requests are mounting. Nagios is no good for those tasks. You cannot use it to create graphs and its dashboard is not flexible.
- You have to manually define each server and each service you want to monitor. This does not work for cloud-based environments where adding a server instance is done in a click, or even automatically.
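The early-warning idea from the list above (noticing that response times are mounting before anything turns red) can be approximated with a very small trend check. This is a sketch under assumptions, not something Nagios provides: the window size, the 1.5x ratio and the function name are all invented for the example.

```python
from statistics import mean


def trend_status(samples_ms, window=5, warn_ratio=1.5):
    """Compare the latest window of response times with the window before it.

    Returns "WARNING" when the recent average is at least warn_ratio times
    the previous average, "OK" otherwise, and "UNKNOWN" with too little data.
    """
    if len(samples_ms) < 2 * window:
        return "UNKNOWN"
    recent = mean(samples_ms[-window:])
    baseline = mean(samples_ms[-2 * window:-window])
    if baseline > 0 and recent / baseline >= warn_ratio:
        return "WARNING"
    return "OK"
```

The same comparison works for queue lengths: feed it queue depth samples instead of milliseconds.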
I don’t know of a good alternative. But I would like to see something that combines system health alerts with Syslog analysis and a real-time configurable dashboard. Any ideas?
If you want good insight into what’s actually happening in your servers you must check the different servers’ logs. Getting all the logs from all the servers into one place and automating the search for errors, exceptions and irregularities is key to having a healthy working production environment. The first product we checked, following warm recommendations from friends, was Splunk. It has an excellent, easy-to-use web interface and the setup is very easy (assuming that your servers are written and configured to upload syslog/log4 to a central server…). But Splunk is VERY expensive: even for a small server setup like ours they asked for something like $6000/year. The free version is only good for internal testing and running on top of QA systems. For production you’ll need the enterprise version. It does not make sense to pay that much in a startup… So we checked Kiwi Syslog.
Kiwi Syslog is a relatively small piece of software made by a NZ company. Their main interface is a client installed on Windows, but they now also have a web-based dashboard that gives you the most important features. It’s easy to set up and work with. It’s cool. And it costs like 2% of Splunk’s cost. Go Kiwi Syslog! Go!
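Whatever product collects the logs, the "automated search for errors, exceptions and irregularities" boils down to something like the sketch below. The patterns and thresholds are hypothetical placeholders, and this is not how Splunk or Kiwi Syslog are implemented; it only illustrates the kind of rule you end up configuring in them.

```python
import re
from collections import Counter

# Hypothetical patterns; replace with the strings your own code logs.
ALERT_PATTERNS = {
    "exception": re.compile(r"Exception\b"),
    "error": re.compile(r"\bERROR\b"),
    "timeout": re.compile(r"timed out", re.IGNORECASE),
}


def scan(lines):
    """Count how many log lines match each alert pattern."""
    hits = Counter()
    for line in lines:
        for name, pattern in ALERT_PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


def alerts(hits, thresholds):
    """Return the pattern names whose count reached its threshold."""
    return sorted(name for name, limit in thresholds.items() if hits[name] >= limit)
```

Running this on the central log server rather than on each production box avoids the load problem mentioned in the Nagios CONS.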
Working with a Content Delivery Network is an important factor in speeding up your page load time. When we tested before-and-after we saw a dramatic decrease in first-time page load, from 3-4 seconds to 2-2.5 seconds for US-based users. With later widgets and pages the load time was about 30% faster. This is a lot! The other reason you’d like to have a CDN is that it’s going to take a large percentage of the traffic off your servers, so you’ll end up having fewer servers and paying less for traffic.
The basic service a CDN offers is the speeding up of static content (Imgs, CSS, JS files) delivery. The advanced services CDNs offer are media streaming and something called Whole Site Delivery – out of scope for this blog post. For the small site/service you’re going to pay $1000-$2000/month for the basic CDN – it may not be too bad considering the reduced costs on servers and traffic.
If you know you’re going to use a CDN you can write your code and delivery procedures in a way that starting to use a CDN would just be a flip of a config file entry. If you already have a website/service functioning without a CDN you’ll probably need to do some work to separate and version the static files correctly and add proper configuration everywhere. So, with the right design you should be able to integrate with a CDN, change CDN or stop working with a CDN in a matter of minutes.
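One way to get the "flip of a config file entry" behaviour is to route every static file reference through a single helper. The sketch below is an assumption about how that could look, not the code we ran at GameGround; `StaticConfig`, its field names and the `cdn.example.com` host are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticConfig:
    cdn_base_url: str = ""          # empty string means "serve from our own servers"
    local_prefix: str = "/static"
    asset_version: str = "1"        # bump on every release so caches fetch new files


def static_url(path, config):
    """Build the URL for a static file (image, CSS, JS) from one config entry."""
    path = path.lstrip("/")
    base = config.cdn_base_url or config.local_prefix
    return f"{base.rstrip('/')}/v{config.asset_version}/{path}"
```

With `cdn_base_url` empty, `static_url("css/site.css", StaticConfig())` yields a local path; setting it to the CDN host switches every page at once, and clearing it again switches back. The version segment in the path is what lets you change files without waiting for CDN caches to expire.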
So the story goes like this: We decided we had to have a CDN because every millisecond of page load time is critical. This was before launching our initial service. We went shopping and were surprised – it seems that most of the bigger CDNs were not willing to work with us at this stage at all. Even the local rep of Cotendo (a startup sharing a VC with GameGround) never returned a phone call… Luckily the local rep of Limelight was willing to take the deal, and after a couple of weeks of negotiations we switched ON the config and it was working well (we did have a couple of config issues – minor faults on our side).
Q: Should a small lean-startup deploy a CDN as part of their initial release?
A: NO NO NO. It’s expensive, and signing up with the local representative of a CDN will consume too much of your time.
Q: Should a lean-startup write their code with a CDN in mind?
A: Yes! Sure! This will allow you to speed up your site and offload traffic if and when your site/service is showing some signs of success. Coding with a CDN in mind won’t make it slower anyway.
Q: Can you give some hints on how to design it right to work with a CDN?
A: I promise to have a post about it later on… But if you have a specific Q, ask it in a comment below.
Q: Are there no free/cheap alternatives?
A: There are! Check out this post about using Google App Engine as a free static data CDN. Also – this post about using DropBox as a free CDN solution. Note that if the delivery of the resources from those unofficial CDNs is not faster than delivering them from your own site then adding a CDN configuration might actually slow down your site. Beware!
Web analytics and BI reporting services review: Google Analytics and SiSense Prism
2011 Aug 30
A Review of Services I’ve used in GameGround – Part III – Marketing tools
This is the 3rd part in a series of blog posts reviewing several 3rd pty products and services I’ve used in GameGround and my take on them. The basic approach I’m taking here is the applicability of the product for a lean-startup that wants to move fast. In the last post I wrote about Community engagement tools for the marketing team: sending emails and engaging customers in a conversation. This post is about Analytics and BI Reporting. Next up – OPS tools and of course, development infrastructure.
This extremely popular free SAAS service by Google has become the de-facto standard in website traffic analysis. 10 years ago I used to download my http server logs and run a simple analysis tool that gave me most of the basic analysis features I needed, but this SAAS has some excellent analytics features like measuring page view time, campaign origin tracking, goal tracking, integration with AdSense etc. There are a few BUTs here, which make me think twice before I choose this option again:
- You want analytics not only about page views. You want to know about downloads ppl have made, about clicks on the cancel button on the registration page etc. All that is not well supported by the Google Analytics APIs. The Events API does not work with the goals feature, and calling the APIs on page events (e.g. clicks) to report fake URLs skews everything so the numbers just don’t add up.
- The tool is mostly good for the product managers to see how the crowds are using the product/website. A few other needs are not well served: your sales manager would like to get information on sales progress from analytics on individual potential customers – integrate the analytics goals with SalesForce and alert sales staff to contact potential customers. Engineering would like to know of faulty pages (500s), broken links (404s) etc. Security officers would like to know of traffic spikes and login errors to track potential invaders and hackers. You’ll need other tools for those tasks. p.s. check out the friends at Totango for an excellent analytics tool specific to SAAS sales managers.
- Much of the analytics data is delayed. Many of your statistical views are updated daily. So you upload a change close to the end of your working day. You come in the morning – still no significant results. You have to wait another day before you get some results. And that is IF you’ve managed to write the fake URLs thing correctly. In many cases you’ll need to fix it a bit and repeat the test. This is way too slow for a startup…
- GA is a basic website traffic analysis tool. Traffic alone is good for SEO tasks, understanding traffic sources, goal achievement etc. For Business Intelligence you’ll need something much stronger with access to the business data stored in your database (see below).
In short – for modern websites and apps GA is almost useless: it will only give you the big picture. Forget about the details… Or check out a better service that was designed for it.
Two insights on the development management side of things: Plan the analytics of every feature as part of the design of the feature itself. Having a feature that one cannot analyze and understand user interaction with is usually worthless. Plan to spend more time than you initially thought to support Google Analytics efforts (probably true with any analytics.)
SiSense is a startup developing a very interesting reporting product that is based on a unique Columnar Data Storage technology (as opposed to the “regular” OLAP-cubes or other in-memory solutions) that enables analysis of large-scale data-sets. The product has an easy-to-use interface that allows creating beautiful web-based reports for business intelligence, website analysis, or anywhere managers need a dashboard with stats. It can connect to multiple data sources including most common DBs and even cloud services like Google AdWords, Google Analytics and Amazon S3 logs. This means that the cost of creating and operating excellent reports is much lower than with some other popular products by IBM, Omniture, Microsoft, Oracle and so many others.
I liked using their product a lot. In GameGround the product was mostly operated by one of our QA guys (in addition to his QA roles), who had some basic knowledge of databases and SQL and was occasionally assisted by our DBA.
A few notes for everyone thinking of building a BI suite using SiSense and the like:
- Lean start-ups – abort here! The establishment of a BI product is lengthy, expensive and has a steep learning curve. In most cases it would involve bringing in an expensive contractor just to help you boot-start the thing. If your data includes a few thousand records you’re much better off with Excel. Excel can connect to most databases and you can create filters and graphs and send them by email to marketing/sales daily. It may sound “ugly”, but the time/cost it would take is a fraction of what it takes to build a proper BI suite. BI reporting suites are not meant for lean start-ups! Start thinking about a BI suite when you have some real customers and Excel’s data-crunching abilities are too low (over a couple of hundreds of data rows Excel starts slowing down to a crawl!)
- In many (read: most) cases the information you want to investigate does not exist in the database. To create a report that shows you how many ppl clicked on ‘like’ and how many ppl uploaded a picture every day you’ll need to add code to collect the data. In many cases this involves code on the website and in the back-end services, plus changes to the database. No magic here – BI needs are met with development costs even if the BI person is part of the marketing team.
- The most important thing with BI is knowing the right questions to ask. In most cases the basic question of “if I give you the data you requested, what would you do with it?” is never asked. Ask questions that get you actionable data. The harsh reality is that in too many cases reports were requested and were never utilized. Still, producing those reports took a lot of effort.
- As a manager, if you ask for very detailed reports you’ll find that you drown in details and cannot get the whole picture. The whole point of having BI dashboards is that you can get the interesting point in 5 seconds. So, start by asking for basic stats that can be visualized over time. Then ask to get details on specific actual things you see.
- Building a good BI dashboard is, thus, an evolving process, not a project. The project would be to get to the point where you have the first 3 reports in place. Then you’ll want to continue developing more reports and improving the existing ones.
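The "add code to collect the data" point from the notes above is easy to underestimate, so here is a minimal sketch of what it means for the ‘like’ and picture-upload example. The table name, column names and the use of SQLite are assumptions chosen to keep the example self-contained; a real service would write to its production database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the event store with one row per user action."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_events ("
        " user_id INTEGER NOT NULL,"
        " action TEXT NOT NULL,"
        " day TEXT NOT NULL)"  # ISO date string, e.g. 2011-08-30
    )
    return conn


def record_event(conn, user_id, action, day):
    """Called from the website/back-end whenever a user does something we track."""
    conn.execute(
        "INSERT INTO user_events (user_id, action, day) VALUES (?, ?, ?)",
        (user_id, action, day),
    )
    conn.commit()


def daily_counts(conn, action):
    """How many distinct users performed the action on each day."""
    rows = conn.execute(
        "SELECT day, COUNT(DISTINCT user_id) FROM user_events"
        " WHERE action = ? GROUP BY day ORDER BY day",
        (action,),
    )
    return rows.fetchall()
```

Only once rows like these exist can Excel or a BI suite chart "likes per day"; the report itself is the cheap part.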
A few notes specific to SiSense Prism:
- They are still a startup themselves. Investing in a BI suite that may not be there in a couple of years is risky. Still, I give the team at SiSense a strong plus. The work I’ve seen them do with Wix is pretty impressive.
- They prefer you to take a monthly paid subscription to use the product instead of paying a one-time fee. This is cool and allows you to pay as you go and pay as you consume. It also reduces the cost of boot-starting a BI solution, which mostly involves buying a strong server, paying for a contractor to help you get the thing to work etc.
- Their pricing plans are a bit problematic. Their most basic feature – viewing the reports in a browser – is only available in the priciest plan.
- Their customer support had some serious issues at times. We got no response for a bug we had with their reports viewer software (we did not use the web-based version), and resorted to installing 500MB of server software on each of the managers’ laptops…
Community Engagement services review: SendGrid, GetSatisfaction
2011 Aug 3
A Review of Services I’ve used in GameGround – Part II – Marketing tools
This is the 2nd part in a series of blog posts reviewing several 3rd pty products and services I’ve used in GameGround and my take on them. The basic approach I’m taking here is the applicability of the product for a lean-startup that wants to move fast. In the last post I wrote about A/B/Split testing tools for the marketing team. This post is about Community Mgmt. Next up – Web analytics and BI reporting, OPS tools and of course, development infrastructure.
One of the first features every service has is sending email to customers. There are 2 basic types of emails to send: transactional and mass-mailing. Transactional emails are those produced as a result of a user action, like registration, friend invites etc. Mass-mailings are those where you invite your registered users to an event, a sale etc. So why not use your own corporate SMTP server for those emails? Because you are likely to find yourself on one of the many blacklists of spam servers at some point. If spam filters on several servers worldwide find your emails to be spam, or if 2-3% of your users mark your email as spam, you’ll be blacklisted and will not be able to send emails from your company at all… bad idea. Other things you’ll have to manage yourself if you don’t use a SAAS for this are unsubscribe lists (<1% of users on social networks unsubscribe on average) and email bounce lists (~12% of the email addresses users give on social networks are mistyped or bogus). Managing those lists is mandatory if you don’t want to get blacklisted.
We started with MailChimp, probably the largest of several competing services, but quickly found that they would not send our mass-mails as they were afraid their servers would get blacklisted. We then had the same issue with Constant Contact and Campaign Monitor. It seems that most EMS vendors send all email from a set of about a dozen shared IP addresses. Thus, they have to minimize complaints across their entire portfolio. Most EMS vendors require that you give your users either opt-in (“I’m willing to get marketing materials” checkbox on registration) or double opt-in (+email verification). And if the complaint rate resulting from your service is above a very low threshold they kick you out. On our first campaign to just 1200 registered users we had a complaint rate of 1.1% and their acceptable limit was 0.2%… For a young company with little history that is running its first campaigns, the demanded ratios were not acceptable. And we wanted to have an opt-out on sign-up, not an opt-in. We got stuck for a few days till we managed to resolve the mess.
Then entered SendGrid! SendGrid is a cloud-based SAAS with a technology that seems to be far more resilient to blacklisting. Their white-label feature allows you to bind your domain’s MX records to one of their servers with an IP address in the cloud. This means you do not share IPs with others and do not need to comply with such low complaint rates. If you get blacklisted you can change the IP address and/or domain name and get back in business in a matter of minutes. So we set up 2 accounts. The first was for transactional emails, which are less likely to cause blacklisting, and we bound it to the company’s domain name. Then we bought another domain, ‘mailer1-mycompany.com’, and bound it to the second account. The SendGrid system appends an ‘unsubscribe’ link to your emails if you don’t do it yourself and they manage the lists for you – they won’t send an email to someone who unsubscribed, even if your service did try to send it. You get a dashboard where you can see stats of your sent mails, bounces, spam reports etc. and fix your email templates as needed.
The integration with SendGrid’s basic SMTP service took us 15 minutes. They also give you APIs to sync user lists, send using predefined templates etc. but we haven’t got to use those. Pricing is low for what you get. It’s highly recommended to work with them and utilize their APIs to save you the need to write email templates and change them every other day according to the product’s needs. Let the product guys edit the email templates on the SendGrid control panel. No code changes are involved unless a radical change is made and different parameters are needed to fill in the template. That makes this feature so much simpler to operate, too. Our email system is working fine with a delivery rate of ~95% on the transactional emails, which is excellent.
Now, how about some tips on how to avoid getting your emails marked as spam? This is a bit out of scope here – maybe I’ll do another post on the quests I’ve had working around the spam filters’ minefields. Meanwhile, you may want to read here.
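To put the complaint numbers above in perspective, here is the arithmetic as a tiny sketch. The figures 1200, 1.1% and 0.2% come from the post; the count of 13 complaints is derived from them (1.1% of 1200 is about 13), and the function names are made up for the example.

```python
def complaint_rate(complaints, delivered):
    """Fraction of delivered emails that were reported as spam."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return complaints / delivered


def max_allowed_complaints(delivered, limit_per_thousand):
    """Largest whole number of complaints that stays within the vendor's limit."""
    return delivered * limit_per_thousand // 1000


# 0.2% is 2 complaints per thousand emails.
allowed = max_allowed_complaints(1200, 2)
rate_percent = round(complaint_rate(13, 1200) * 100, 1)
```

In other words, on a 1200-user campaign the vendor's limit leaves room for only a couple of "this is spam" clicks, which is why a young service with an untested list trips it so easily.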
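For a sense of why a basic SMTP integration is a 15-minute job, here is a minimal sketch using Python's standard library (our actual service was not written in Python). The host, port and credentials are parameters you would take from your provider's SMTP settings; nothing here is SendGrid-specific, and the addresses in the usage note are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text transactional email, e.g. a registration confirmation."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port, username, password):
    """Relay the message through the provider's SMTP server over STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

Usage would be along the lines of `send(build_message("noreply@example.com", "user@example.com", "Welcome", "Thanks for joining"), host, port, user, password)`, with the connection details kept in config so that switching provider or account is not a code change.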
- The product is a hybrid between an online feedback survey and a forums product. It has the disadvantages of both: you mostly get to hear only the ppl complaining about your product; irrelevant questions and remarks bloat the service with historical and irrelevant data.
- The loading of their widget is slow and causes long delays in the page load time. On some pages the feedback widget was the first thing to show up on the page (WTF???). We ended up writing a script that delay-loads their script to bypass their mentality of “We’re THE product and our users are the websites using it”…
- The suggestion engine meant to prevent users from entering the same feedback over and over again is weak.
- You already have a Facebook page, a blog and the product pages with comments. Why do you need another place where ppl would talk about your product?! I would give this service a pass next time I’m around. Instead just open a feedback page, place a Facebook comments widget in it and you’re done.
Marketing tools review: Google Web optimizer, Visual Web optimizer and Unbounce
2011 Aug 1
A Review of Services I’ve used in GameGround – Part I – A/B/Split testing and landing pages services
GameGround.com is a service I built during 2010 that was alive till mid-2011. I managed this startup’s dev teams, developing a consumer-facing social meta-game. This is a short review of several 3rd pty products and services I’ve used and my take on them. The basic approach I’m taking here is the applicability of the product to a lean startup that wants to move fast. I started writing it and quickly found out that it’s actually too long for one post. So I’m going to make it a series of posts covering Marketing tools, Community Mgmt. tools, OPS tools and of course, development infrastructure.
Google Website Optimizer
Visual Web Optimizer
Unbounce is a landing pages SAAS. “… a self-serve hosted service that provides marketers doing paid search, banner ads, email or social media marketing, the easiest way to create, publish & test promotion specific landing pages without the need for IT or developers.” Yes! Landing pages for specific audiences and campaigns are an excellent way to drive traffic to your site. And Unbounce’s platform, with its WYSIWYG HTML editor, simplifies the process even further, allowing the marketing team to create those pages and manage them as part of the campaigns they are running without needing development involvement. They even give you multiple pages per landing page (i.e. a small website), a lead generation module, A/B/Split testing tools and other goodies. So far so good.
BUT! There’s a major but here: the SEO marks for those pages on Unbounce are extremely low. Search engines don’t like websites and landing pages that have only static content. They also don’t like it that the landing page is not under your own domain, but rather on Unbounce’s, and so they incorrectly see the landing page as a spam blog. This (among other reasons, I’m sure) led us to get very few displays of our ads on Google AdWords and very few clicks coming from this major traffic source.
We ended up using some other desktop HTML editor to create a single-page site for each landing page. It was then uploaded to our live production servers, under the ‘/play’ folder, using an FTP account we opened for it. This way the marketing team could create their landing pages according to the running campaigns and upload them to production with little or no dev/OPS involvement. This is lean-thinking at its best: have as few ppl as possible involved in each task. Ppl should mostly be able to complete their tasks end-to-end without needing to interface with others.
Node.JS – I Love this Technology!
2011 Aug 1
Beware! A game-changing technology has entered the arena. The internet as we (developers) know it is about to change.
I rarely get to see a new technology that sparks my mind and keeps me up late at night, trying to use it and do something with it. It happened to me a couple of months ago when I first played the PC version of Angry Birds (2 sleepless nights…!) and lately again with Node.JS. But this one is no game! To explain the thing I have to take you back in time to the year 1999…
It all started when I wrote my first server for Exent’s Games on Demand platform. It was a large-volume file data server designed to respond very fast to requesting clients and serve thousands of them concurrently. We wrote the server as a kernel module, and accordingly it was written in a fully asynchronous fashion. This project, led by my first team leader, Amnon Romm, was certainly the most beautiful piece of code I have seen to date. [We developers can see beauty and ugliness in simple code. It’s a special gift we have that a non-developer will never understand… :) Code is actually mostly old and ugly. If you write something and a colleague comes to you and tells you your code is beautiful, this would be the BEST compliment you can ever get. Really]
Since then I’ve seen so many other servers. Some of them, like Check Point’s Firewall, definitely have a good asynchronous architecture (even if the code is somewhat ugly…). But one thing I could never figure out is how come the whole WWW (the browser part of the internet) is running on top of badly designed synchronous servers. Maybe it’s the basic design of the [wonderful] HTTP protocol, which is request-response based. Maybe it’s us, developers, who find it harder to design and code asynchronously. Maybe it’s because in the early ‘90s, when internet standards were written, running a CGI process on a UNIX machine was the main way to handle HTTP server requests, and we just never wanted to stop supporting those standards…
Anyway, I figured out that all the most common servers – Apache, IIS, Java-based web services, the standard .NET stack, Django/Python, PHP, Ruby and almost every other piece of HTTP server out there – are written to run in a synchronous environment. Every request is served either by a new thread or by a thread from a big thread pool. Such a thread executes the request and response stacks, waiting for resources from the DB, from the disk drive, from a memcached service it calls etc. Each time, the thread goes to sleep waiting for the device’s response to arrive and is context-switched out to give another thread some CPU time. This means that heavy-load servers spend MUCH of the CPU and memory-bus time switching threads. The simplest server written with (the newly designed) .NET/WCF can have up to 100 threads running on a dual-core processor. The result is that a strong server can serve only a few thousand clients/browsers concurrently. So much CPU time and money is wasted.
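To make the contrast with one-thread-per-request concrete, here is a minimal sketch of the asynchronous alternative: a single thread that starts many waits at once and is woken only when a result arrives. It is plain JavaScript for Node.JS. The names `fakeIo`, `handleRequest` and `serveAll` are invented for this illustration, and `setTimeout` merely stands in for a real DB, disk or memcached call.

```javascript
// Stand-in for a slow device (DB, disk, memcached): answers after `ms`.
// Hypothetical helper; no thread sleeps while the timer is pending.
function fakeIo(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// One "request": wait for the device, then return the request id.
async function handleRequest(id, ioMs) {
  await fakeIo(ioMs);
  return id;
}

// Start `count` requests back to back on one thread, then wait for all.
async function serveAll(count, ioMs) {
  const started = Date.now();
  const pending = [];
  for (let id = 0; id < count; id++) {
    pending.push(handleRequest(id, ioMs)); // starts the wait, does not block
  }
  const finished = await Promise.all(pending);
  return { completed: finished.length, elapsedMs: Date.now() - started };
}

serveAll(1000, 50).then(({ completed, elapsedMs }) => {
  console.log(`${completed} requests finished in ${elapsedMs} ms`);
});
```

Because the 1000 waits overlap, the total should land close to the 50 ms of a single wait rather than 1000 x 50 ms, with no thread pool and no context switches between request threads.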
Another issue is the use of high-level interpreted languages to write the internet. From JSP to PHP to Python, most of the internet is written in scripting languages because they are easy to write and easy to deploy. But everyone knows they run slowly. It’s a trade-off development managers make – code fast and get it out the door. They say “We’ll have other ways to speed up the beast after it’s already out” – clustering, stronger hardware, another caching layer etc. Anyway – once the version is out it’s now the problem of the IT guys to meet the SLA. WTF??? Some real efforts were recently made by Facebook to speed up PHP with their release of HipHop – a server add-on that transforms PHP scripts into highly optimized C++ code and then uses g++ to compile it to machine code before it’s run. They say that on average it reduced CPU usage at Facebook by about 50% (!!!) and that WordPress 3.0 runs 2.7x faster under HipHop. Wow! Impressive!
But what if we could solve the synchronous design in a similar way? The problem is we cannot – we’ll have to throw away all the code that was ever written and start fresh. That is because asynchronous code cannot call code that blocks (one blocking call stalls every other request sharing that thread), and all the code out there is blocking.
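Why a single blocking call is enough to spoil an asynchronous server can be shown with a small sketch in JavaScript for Node.JS. `blockingWork` and `demo` are hypothetical names, and the busy-wait loop only imitates a synchronous library call. A timer that is due immediately stands in for another client's request.

```javascript
// Imitates a synchronous (blocking) library call by spinning for `ms`.
function blockingWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait: the single thread can do nothing else meanwhile
  }
}

// Schedule a callback that is due right away, then block the thread.
function demo(blockMs) {
  return new Promise((resolve) => {
    const events = [];
    const scheduledAt = Date.now();
    setTimeout(() => {
      events.push("timer fired");
      resolve({ events, delayMs: Date.now() - scheduledAt });
    }, 0);
    events.push("blocking start");
    blockingWork(blockMs);
    events.push("blocking end");
  });
}

demo(100).then(({ events, delayMs }) => {
  console.log(events.join(" -> "), `(timer delayed ${delayMs} ms)`);
});
```

The timer was due at once, yet it can only fire after the blocking call returns. In a server, every waiting client would be held up in the same way.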
Enter Node.JS (Start fresh!)
So what can you do with it? Well, basically everything you can do with Python, Perl or Java – client code, server code. But the goal is definitely server code: blazing-fast web servers that can handle 10x more traffic and do it much faster. This can serve not only “regular” browser-based traffic, but can also be used to stream music and video, for sharing applications, reverse proxies etc.
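As a rough idea of what such server code looks like, here is a minimal sketch using only Node.JS's built-in `http` module. The handler name `handleRequest`, the 10 ms simulated lookup and the JSON reply are assumptions made for this example, not something taken from a real project.

```javascript
const http = require("http");

// Request handler: starts a (simulated) asynchronous lookup and returns
// immediately, so the single thread is free to accept other requests.
function handleRequest(req, res) {
  setTimeout(() => {
    // Runs later, when the simulated lookup "completes".
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ path: req.url, ok: true }));
  }, 10);
}

// Build the server; it accepts no connections until listen() is called.
const server = http.createServer(handleRequest);
```

Calling `server.listen(8000)` (the port is arbitrary) starts serving. Every request then goes through the same handler on the same thread.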
There are a couple of things you need if you want a team of developers to start working on your next big thing. First you want a proper development environment (use Eclipse with Google’s V8 plugin); then you need a unit testing framework (use Expresso); an application server with MVC and templating support (use Express); ORM/Hibernate-like tools to ease coding against the DB (see MongooseJS for MongoDB and SequelizeJS for MySQL); a library of utility modules to copy-paste from for almost every basic need (see the NPM Registry with almost 3000 entries << try searching “facebook”); and a cloud-based app engine to deploy your application on, preferably for free (I found 11 such services, but Nodejitsu and Cloud Foundry seem to be the most advanced). Let’s not forget that a strong development community is also very important (NodeJS main newsgroup: ~50 threads/day; ExpressJS group: ~10 threads/day; LinkedIn NodeJS group has 832 members; StackOverflow NodeJS tag: ~13 questions/day). I think we are good to go!
The Node.JS thing is catching fire these days – this Google Trends view clearly shows how fast Node.JS is soaring and that it’s now almost as big as Ruby on Rails. Many new startups see this as a great opportunity and are developing on Node.JS. It fits so well with the lean-startup concept. One coding language for both front-end and backend >> one developer can write the whole feature, end to end. And no more translations between XML and JSON. Now everything is JSON in all application layers. No more
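The "one language, JSON everywhere" point can be sketched as follows. `validateSignup` and `handleSignup` are hypothetical functions written for this illustration. The validation rule is plain JavaScript that could be loaded unchanged in the browser and on the server, and data crosses every layer as JSON with no XML translation step.

```javascript
// Shared rule: the same function could run in the browser form and on the server.
function validateSignup(user) {
  const errors = [];
  if (typeof user.name !== "string" || user.name.trim() === "") {
    errors.push("name is required");
  }
  if (typeof user.email !== "string" || !user.email.includes("@")) {
    errors.push("email looks invalid");
  }
  return errors;
}

// Server side: the request body arrives as JSON text, the reply leaves as JSON text.
function handleSignup(requestBody) {
  const user = JSON.parse(requestBody);
  const errors = validateSignup(user);
  return JSON.stringify({ ok: errors.length === 0, errors });
}

// The browser would send exactly this string, e.g. in an AJAX call.
const body = JSON.stringify({ name: "Dana", email: "dana@example.com" });
console.log(handleSignup(body)); // {"ok":true,"errors":[]}
```

One developer can own the form check and the server check here, because they are literally the same function.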
Some giants have already decided they are joining the party. Microsoft has recently announced it’s going to support Node.JS on Azure (and Visual Studio, for sure). VMware already supports Node.JS deployments on its cloud service, Cloud Foundry.
Interesting blog posts if you want to read further –
- 6 months with node (a thank you note)
- Why did NodeJS become popular faster than its peers?
- What it’s like building a real website in Node.js?
- What are the benefits of developing in Node.js versus Python/Django?
p.s. I guess some of the readers of this post are saying “this guy is crazy! He’s taking an immature technology and arguing it should be used in production today. The risk is too high yada yada yada…” I agree. The risk is high. If you don’t have strong devs who can master a new technology and face some difficulties, then you should stick with the usual Django/Rails/GWT. If you have strong devs, the upside of this technology is great and I think it’s mature enough for most tasks. Especially if you start something new.