Recent Posts
Blog Archive
:-)
%7Cutmcmd%3Dreferral%7Cutmcct%3D%252Fissue%3B%2B)
Ethan Ram’s geeky blog on the seam of technology and product management.
This is the 4th part in a series of blog posts reviewing several 3rd pty products and services I’ve used in GameGround and my take on them. The basic approach I’m taking here is the applicability of the product for a lean-startup that wants to move fast. In the last post I wrote about Analytics and BI Reporting tools for the marketing team. This post is about Monitoring the health of the system server – for the OPS team. Next in the series – development infrastructure.
Nagios probably is “The Industry Standard in IT Infrastructure Monitoring” as their slogan says. It’s very popular among IT stuff and can be configured to monitor and alerts about up to 40-50 servers. So even a medium size company can use it. It’s free server software – basically a scheduler that executes service checks against installed agents and tests against network devices, reports back the results and raises alerts above predefined thresholds. There’s also a comprehensive list of extensions or plugins written by the community that can be utilized to monitor about anything you’ll ever want.
It’s easy to setup Nagios to watch for server disk-space, CPU and the existence of certain services. The difficult part is to create checks that would alert you if internal parts of the software behave irrational and users are not seeing what they should. E.g. certain transactions do not end in time, server response time for certain requests is going up, users suddenly cannot see their friend’s list etc. These are much harder to watch. To monitor these you’ll need to write code both on your back-end servers – special functions (REST/WSDL) that would do some internal testing and return true/false accordingly. Nagios is able to call such functions periodically and alert if they failed. It’s an evolving process: You’ll see your systm fail without Nagios alerting about it and then add more of those checks till it functions well.
So- It’s wiser to add some testing functionality on design time: plan your server modules to have Nagios testing APIs. You’ll also need to watch that some of your 3rd pty providers are working right: If the A/B testing API you are using is down then your site is probably down too. If your Content Delivery Network is down ppl are not getting to see your website, although everything is functioning on your side.
Nagios – the CONS:
I don’t know of a good alternative. But I would like to see something that combines system health alerts with Syslog analysis and a real-time configurable dashboard. Any ideas?
If you want to have a good insight into what’s actually happening in your servers you must check the different servers’ logs. Getting all the logs from all the servers into one place and automating the search for errors, exceptions and irregularities is key to having a healthy working production environment. First product we checked following warm recommendations from friends was Splunk. It has excellent easy-to-use web-interface and the setup is very easy (assuming that your servers are written and configured to upload syslog/log4 to a central server…). But Splunk is VERY expensive, even for a small server setup like ours they asked for something like $6000/year. The free version is only good for internal testing and running on-top of QA systems. For production you’ll need the enterprise version. It does not make sense to pay that much in a startup… So we checked Kiwi Syslog.
Kiwi Syslog is a relatively small piece of software made by a NZ company. Their main interface is based on a Windows installed client. But they now also have a web-based dashboard that gives you the most important features. It’s easy to setup and work with. It’s cool. And it costs like 2% of Splunk’s cost. Go Kiwi Syslog! Go!
Working with a Content Delivery Network is an important factor in speeding your pages loading time. When we tested before-and-after we saw a dramatic decrease of first-time page load from 3-4 seconds to 2-2.5 seconds for US-based users. With later widgets and pages the load time was about 30% faster. This is a lot! The other reason you’d like to have a CDN is that it’s going to take a large percentage of the traffic from your servers – so you’ll end up having less servers and pay less on traffic.
The basic service a CDN offers is the speeding up of static content (Imgs, CSS, JS files) delivery. The advanced services CDNs offer are media streaming and something called Whole Site Delivery – out of scope for this blog post. For the small site/service you’re going to pay $1000-$2000/month for the basic CDN – it may not be too bad considering the reduced costs on servers and traffic.
If you know you’re going to use a CDN you can write your code and delivery procedures in a way that starting to use a CDN would just be a flip of a config file entry. If you already have a website/service functioning without a CDN you’ll probably need to do some work to separate and version the static files correctly and add proper configuration everywhere. So, with the right design you should be able to integrate with a CDN, change CDN or stop working with a CDN in a matter of minutes.
So the story goes like this: We decided we had to have a CDN because every millisecond of page load time is critical. This was before launching our initial service. We went shopping and were surprised – it seems that most of the bigger CDNs were not willing to work with us at this stage at all. Even the local rep of the local Cotendo (a startup sharing a VC with GameGround) never returned a phone call… Luckily the local rep of Limelight was willing to take the deal and after a couple of weeks on negotiations we switched NO the config and it was working well (we did have a couple of config issues – minor faults on our side)
Q: Should a small lean-startup deploy a CDN as part of their initial release?
A: NO NO NO. It’s expensive and the signing up with the local representative of a CDN will consume too much of your time.
Q: Should a lean-startup write their code with a CDN in mind?
A: Yes! Sure! This will allow you to speed up your site and offload traffic if and when your site/service is showing some signs of success. Coding with a CDN in mind won’t make it slower anyway.
Q: Can you give some hints on how to design it right to work with a CDN?
A: I promise to have a post about it later on… << but if you have a specific Q – ask it in a comment below
Q: Are there no free/cheap alternatives?
A: There are! Check out this post about using Google Apps Engine as a free static data CDN. Also – this post about using DropBox as a free CDN solution. Note that if the delivery of the resources from those unofficial-CDNs is not faster than delivering them from your own site then adding a CDN configuration might actually slow down your site. Be ware!
Of the 14 years I’ve been developing software 10 years were with companies doing B2B software (intended to be sold to another business, as opposed to B2C – software that is directed at customers online etc.) In recent years the Agile development methodology is growing strong and a recent Forrester study shows that now over 40% of development teams in the US are using some sort of Agile development methodology. I’ve heard of Agile project in some of the larger companies and had a chance of “upgrading” my own development department to work in an Agile environment (we took Kanban as our preferred Agile approach). Now this blog post is not going to be about my experience with Agile. Instead I’m going to tell you about a talk I’ve had with a friend who told me Agile was not for his (awesome) B2B software company and my response to that. So these are the reasons why he thought Agile was not for him:
At first this all seemed logical to me as I knew that the real power of Agile development lies in the quick release cycles (“give something small to your customers often”) and in cases that the software quality socks. Anyway, with another thought, these are the questions I’ve asked–
Well – you’re expecting this – go Agile. 🙂
I’m not saying its magic. You’ll have to invest time to make it happen. You’ll have to give it some chance and believe it can greatly improve your performance. Why “Believe???” – – We are engineers (or sales guys) and we have targets and methodologies to work. Why do we need to believe? Because you’ll have to change the way you work. People don’t like to change. People like to stick with processes they know. They mostly don’t see the flaws. They find it hard to believe that it can be so much better.
This is why I think that the goals for an Agile project must be set by a high ranking manager. The R&D manager is going to be involved for sure, but also marketing/product manager, and sales exec for sure as one of the main goals for an Agility project would ultimately be to improve the sales cycle and time to market. So, it seems the division head or CEO is probably the person that should set the goals and allocate the resources for an Agility project.
OK. So you are saying “the above is exactly the problems I see in my company but I’m not the CEO. I’m merely a team leader in the R&D…” – Now what? Well, send this post to your CEO – ask for a meeting to discuss the topic. Come prepared. Bring along an Agile couch/consultant. They are used to talking high mgmt. into investing in Agile.
Next up – some guidelines on how to go Agile without missing your yearly quota / deadlines.
Qs and comments are welcome as always.