The Null Terminator

Ethan Ram’s geeky blog on the seam of technology and product management.


Soluto Frustrations

On why I was deeply unimpressed with Soluto’s product

The last week hasn’t been an easy one… I’ve got a brand new Lenovo X220 – and it’s giving me a hard time. For over 10 years now I’ve always had a ThinkPad laptop (except for a couple of years with a MacBook – but that story is for another post) and I was always very happy with it. But this time…

Boot is stuck for over a minute between login and getting to see the desktop!

My laptop gets stuck forever 2-3 seconds after I go wireless on my workplace Wi-Fi network!!

Changing writing language too often doesn’t work – I’m getting stuck in Hebrew forever!!!

Fingerprint software is dead…

A PCI port is missing a driver and Windows keeps complaining about it…!

The Lenovo System Updater gets stuck forever when I run it to check whether I’m missing some updates… errr.

In general – I get to the point that I have to reboot the computer 2-3 times a day… err err errrrr!

…And I’m a software guy, an expert with Windows Internals – right? So I must be able to find the cause of these. But hell – it’s a new computer running the latest software – I don’t feel like spending a day resolving these. Or – maybe return the thing to the IT department and let them break their heads on this (still, I would have to take a day off or work on a temp PC that does not have my configuration… bad idea!)

WOW – after a couple of days I hit some major frustration, losing over ½ an hour of work. ((Remember this was actually common in the ’90s… but things surely have changed since… or maybe this is only me??))

Then I remembered this slogan from a young and apparently very successful startup from Tel Aviv called Soluto – “Soluto is bringing an end to PC user frustration …killer technology”. They have millions of installations and got so many positive reviews and prizes. It seems that Yishay Green, Roy Karthy and the other guys there are really managing to create a buzz and get some heavy funding. Actually, I remember I even tried one of their early betas after hearing one of their founders talk about his former startup successes… I have to give it a try. Maybe 15 minutes with this and I can save the hours of digging into finding what’s wrong with my PC.

I’m not going to write a full review about my experience. Just a few bits. They have a smooth installer and the UI looks great, although it is a bit slow to come up with answers. (I even took some time to help them define what some of the more obscure apps I’m running on my laptop are – after all, it’s very much a community project.)

But then I hit this…

So they found that Google Chrome is running at boot, taking 29 seconds of my boot time. But they cannot do anything about it. WTF?? This is simply wrong. I actually ran a boot profiler myself (SysInternals ProcMon) and Chrome does not run on boot at all… And why do they say they cannot do anything about it? Can’t they help me uninstall it?…

They offer to remove some 10 pieces of software from the boot process, most of them with no clear explanation of what they do, and each taking a full 0.1 seconds of my life… Who cares?! But check out this suggestion – “pause it unless you connect to a network on the internet using an Intel wireless network adapter” – Even a pro like me got confused for a minute. This is cryptic Chinese for 99% of the world population. I’m sure that those poor 10% of people who chose to follow their advice and disabled their PC wireless have really got no frustrations now…

It also offers to remove 8 of the 18 plugins I have in my Chrome browser. They say I have 2 Chrome toolbars and six more plugins that can safely be removed… WHAT??? Toolbars in Chrome?! In Chrome there are no toolbars, like they have in (poor) Internet Explorer; and those 6 plugins I use are very useful and consume somewhere between ZERO and NOTHING processing time. Why do they suggest I remove them? Why do they tell me 26% of users actually disabled their Multimedia Plugin (and forgot about having media playback capabilities in the process)!?

I have a few more examples but I think the point I’m making here is well understood. So I’ll shortly conclude with this strange behavior: it takes Soluto 6 seconds to open its own About dialog… At least the first time after every boot.

As you can understand it did not find my issues, or help me fix them, so we parted like friends (somewhat of a frustrated PC user friend, though).

OK – OK. I know I’m probably not the average guy and probably I’m not their target customer and/or my new PC is not their target PC because it’s new. I can even think of some cases when I was called-in to help fix a dysfunctional computer where this utility application could actually do some good. Still I felt I’ve wasted time playing with it.

So here is advice for what could be some excellent features for the next Soluto version that may actually make a difference. And if they don’t add them to Soluto, still, my dear readers, you can follow and fix your frustrating computer yourself.

  1. Enable Windows update to automatically update your PC. 99.9% of ppl should absolutely automatically update their system. Always.
  2. Install a basic anti-virus, anti-malware and update their signatures. Also enable Windows Firewall. In 90% of the PCs I was called to help with, these were not right. Microsoft Anti-Virus/Malware and Firewall are excellent free alternatives here that require close to zero user interaction/intervention and get their updates from Microsoft as part of the Windows Update service. So simple.
  3. Uninstall most stuff that comes bundled with the original PC and with hardware drivers: the HP suite of tools and games that came bundled with the printer drivers, the Canon suite that came bundled with your latest digital camera etc.
  4. Uninstall those Shell/Explorer extensions, Internet Explorer extensions and the rest of the known-evil-doers. If you don’t know which of those is evil just delete them all – you won’t miss them.
  5. Prevent about 15 Windows services from loading at boot. Some can be delay-loaded and some can be changed to load ‘manual’. Here’s the list I use.
  6. Uninstall redundant drivers that haven’t been used for a while. Same for applications that haven’t been used. Things like that old Printer you once had in the office, that Nokia Suite that used to sync your late mobile etc.
  7. Run a registry consistency and temporary files cleanup tool like the CCleaner that I regularly use. (CCleaner actually fixed some of the issues I’ve had – see above)
  8. Analyze the system event log and saved mini-dumps to find problematic hardware and drivers causing issues with the PC. Just recently I’ve had to replace a display adapter that was blue-screening my home PC every other day.
  9. Assist in updating the display adapter drivers to the latest – a very common problem.
  10. Rebuild the TCP/IP stack and remove any Winsock/network providers.
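To make item 5 a bit more concrete, here is a minimal sketch of how the startup type of a few services could be changed with the built-in `sc config` command. The service names are placeholders and not my actual list, and the script only prints the commands unless you explicitly turn off the dry run. Treat it as an illustration under those assumptions rather than a ready-made tool.

```python
import subprocess

# Hypothetical example entries; substitute the services you actually want to change.
STARTUP_CHANGES = {
    "ExampleUpdaterService": "delayed-auto",  # start shortly after boot instead of during it
    "ExampleHelperService": "demand",         # "Manual": start only when something asks for it
}

ALLOWED_MODES = {"auto", "delayed-auto", "demand", "disabled"}


def build_sc_command(service, mode):
    """Return the argument list for `sc config <service> start= <mode>`."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported start mode: {mode}")
    # sc.exe expects "start=" and the value as separate tokens.
    return ["sc", "config", service, "start=", mode]


def apply_changes(changes, dry_run=True):
    """Print each command; run it only when dry_run is False (needs an elevated prompt)."""
    commands = [build_sc_command(name, mode) for name, mode in sorted(changes.items())]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    apply_changes(STARTUP_CHANGES)
```

Run it once in dry-run mode, read the printed commands, and only then call `apply_changes(STARTUP_CHANGES, dry_run=False)` from an administrator prompt.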

Enough said. I still have a couple of issues on my new laptop to resolve today, although most issues I’ve already found and fixed 🙂

One twist I really liked in Soluto: They have this little tray-icon menu where one can click “My PC Just Frustrated Me”. I’ve clicked it a couple of times not knowing what would happen. It seems to be doing nothing – no dialog opened, no thank-you message. Nothing. Maybe they are just collecting data for their next version or something. Maybe a bug? Strangely, after clicking it I felt somewhat relieved. Like a small steam release.


OPs / Production services review: Nagios, Kiwi Syslog and Limelight CDN

A Review of Services I’ve used in GameGround – Part IV 

This is the 4th part in a series of blog posts reviewing several 3rd pty products and services I’ve used in GameGround and my take on them. The basic approach I’m taking here is the applicability of the product for a lean-startup that wants to move fast. In the last post I wrote about Analytics and BI Reporting tools for the marketing team. This post is about Monitoring the health of the system server – for the OPS team. Next in the series – development infrastructure.

Nagios

Nagios probably is “The Industry Standard in IT Infrastructure Monitoring” as their slogan says. It’s very popular among IT staff and can be configured to monitor and alert about up to 40-50 servers. So even a medium size company can use it. It’s free server software – basically a scheduler that executes service checks against installed agents and tests against network devices, reports back the results and raises alerts above predefined thresholds. There’s also a comprehensive list of extensions or plugins written by the community that can be utilized to monitor about anything you’ll ever want.

It’s easy to set up Nagios to watch for server disk-space, CPU and the existence of certain services. The difficult part is to create checks that would alert you if internal parts of the software behave irrationally and users are not seeing what they should. E.g. certain transactions do not end in time, server response time for certain requests is going up, users suddenly cannot see their friends list etc. These are much harder to watch. To monitor these you’ll need to write code both on your back-end servers – special functions (REST/WSDL) that would do some internal testing and return true/false accordingly – and on the Nagios side, to call them. Nagios is able to call such functions periodically and alert if they failed. It’s an evolving process: You’ll see your system fail without Nagios alerting about it and then add more of those checks till it functions well.
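For illustration, a check like that can be a small script following the standard Nagios plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is in Python rather than the Perl we ended up with, and it assumes a hypothetical health URL that answers with the text "true" or "false"; the latency thresholds are made-up defaults.

```python
import sys
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(body, elapsed, warn_seconds, crit_seconds):
    """Map a health response and its latency to an (exit_code, message) pair."""
    if body.strip().lower() != "true":
        return CRITICAL, f"CRITICAL - health check returned {body.strip()!r}"
    if elapsed >= crit_seconds:
        return CRITICAL, f"CRITICAL - response took {elapsed:.2f}s"
    if elapsed >= warn_seconds:
        return WARNING, f"WARNING - response took {elapsed:.2f}s"
    return OK, f"OK - healthy in {elapsed:.2f}s"


def check(url, warn_seconds=1.0, crit_seconds=3.0, timeout=10.0):
    """Call the (assumed) health URL and return (exit_code, message)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except Exception as exc:  # network errors, HTTP errors, timeouts
        return CRITICAL, f"CRITICAL - request failed: {exc}"
    return evaluate(body, time.monotonic() - start, warn_seconds, crit_seconds)


def main(argv):
    """Plugin entry point: print one status line and return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_health.py <url>")
        return UNKNOWN
    code, message = check(argv[1])
    print(message)
    return code

# When installed as a plugin script, finish with: sys.exit(main(sys.argv))
```

Because the check also watches latency, it covers the "response time is going up" case and not only the plain true/false answer.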

So it’s wiser to add some testing functionality at design time: plan your server modules to have Nagios testing APIs. You’ll also need to watch that some of your 3rd pty providers are working right: If the A/B testing API you are using is down then your site is probably down too. If your Content Delivery Network is down ppl are not getting to see your website, although everything is functioning on your side.
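On the server side, such a testing API can be as simple as a function that runs a handful of named self-tests and answers true or false. Here is a rough, framework-free sketch; the check names in the usage note are invented for the example and your web framework of choice would expose `health_response` on some URL.

```python
def run_health_checks(checks):
    """Run each named check; a check passes only if it returns a truthy value without raising."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A crashing self-test counts as a failure, not as a crash of the endpoint.
            results[name] = False
    return all(results.values()), results


def health_response(checks):
    """Return (http_status, body, details) for a monitoring system to poll."""
    healthy, results = run_health_checks(checks)
    status = 200 if healthy else 503
    body = "true" if healthy else "false"
    return status, body, results
```

A module would register its own checks, for example `{"friends_list_loads": check_friends_list, "ab_testing_api_reachable": check_ab_api}` (both hypothetical functions), so that a failing 3rd pty provider shows up in the same place as an internal failure.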

Nagios – the CONS:

  1. Nagios was started in 1999; its core is written in C, but much of the plugin ecosystem is Perl. Although new versions have been released it feels like an old product and things seem harder to achieve than what we’re used to these days. Most of the checks have to be configured manually in config files, including thresholds for alerts, the number of times a failure does not raise an alert etc.
  2. To achieve the functional testing mentioned above and to integrate with all the plugins for the different OS types and monitoring you’d want to do you’ll need to code some scripts on the Nagios side (or at least edit existing scripts). That’s Perl coding. Although Perl knowledge is still quite common it’s fast diminishing from the planet. It’s already hard to find IT managers who can code it and younger developers haven’t even heard of it… We ended up getting outside help to create the basic setup. So much for free software…
  3. The learning curve is long. Expect the system to text-message you false alarms at 2am, telling you the system is down, for a few months, until you get the thresholds right. Expect your CEO to call you at 2am to tell you the system is down but there was no alert… Lots of things went wrong in a live environment of (only) 4 servers we had in production in GameGround – it took us about 3-4 months to get to a relatively solid Nagios setup that actually alerted us on most of the real problems.
  4. Some of the things to watch for ended up being certain errors written into the different servers’ log files. These may be critical bugs and exceptions thrown from bad things happening down in your code stacks. So you can set up Nagios to grep the log files for those strings. This is very heavy on your servers and on the traffic. Better have a proper central log server with alerts (see below). But then this actually means that you’re going to have 2 monitoring systems – one is Nagios and one is the central logging server.
  5. Nagios is good at giving you a green or red sign next to your servers/services. But in reality managers want to know ahead of time that things are going in the wrong direction: queues are not emptying fast enough, response times on some requests are mounting. Nagios is no good for those tasks. You cannot use it to create graphs and its dashboard is not flexible.
  6. You have to manually define each server and each service you want to monitor. This does not work for a cloud-based environment where adding a server instance is done in a click, or even automatically.
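A note on item 4: part of what makes log grepping heavy is re-reading whole files on every check. One way to soften that, sketched below, is to remember the byte offset reached last time and scan only what was appended since. This is not what we ran in GameGround; the error markers and the idea of storing the offset between runs are assumptions for the example.

```python
import os
import re

# Assumed error markers; adjust to whatever your servers actually log.
ERROR_PATTERN = re.compile(r"ERROR|CRITICAL|Exception")


def scan_new_lines(log_path, offset, pattern=ERROR_PATTERN):
    """Return (matching_lines, new_offset) for content appended after `offset`."""
    size = os.path.getsize(log_path)
    if size < offset:
        offset = 0  # file was rotated or truncated; start over
    matches = []
    with open(log_path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if pattern.search(line):
                matches.append(line)
        new_offset = handle.tell()
    return matches, new_offset
```

The caller keeps `new_offset` somewhere (a small state file, for instance) and passes it back on the next run, so each check costs only as much as the log has grown.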

I don’t know of a good alternative. But I would like to see something that combines system health alerts with Syslog analysis and a real-time configurable dashboard. Any ideas?

Kiwi Syslog

If you want to have a good insight into what’s actually happening in your servers you must check the different servers’ logs. Getting all the logs from all the servers into one place and automating the search for errors, exceptions and irregularities is key to having a healthy working production environment. The first product we checked, following warm recommendations from friends, was Splunk. It has an excellent easy-to-use web-interface and the setup is very easy (assuming that your servers are written and configured to upload syslog/log4 to a central server…). But Splunk is VERY expensive: even for a small server setup like ours they asked for something like $6000/year. The free version is only good for internal testing and running on-top of QA systems. For production you’ll need the enterprise version. It does not make sense to pay that much in a startup… So we checked Kiwi Syslog.
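As a small illustration of what "configured to upload syslog to a central server" can look like from the application side, here is a sketch using Python’s standard `logging.handlers.SysLogHandler`. Our servers were not written in Python, and the host name in the usage note is a placeholder, so take it only as an example of the idea.

```python
import logging
import logging.handlers


def build_central_logger(name, host, port=514):
    """Return a logger that forwards records to a syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler defaults to UDP when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would be something like `log = build_central_logger("web-frontend", "logs.example.internal")` followed by `log.error("payment queue not draining")`; the central server then sees the errors from every machine in one place.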

Kiwi Syslog is a relatively small piece of software made by a NZ company. Their main interface is based on a Windows installed client. But they now also have a web-based dashboard that gives you the most important features. It’s easy to setup and work with. It’s cool. And it costs like 2% of Splunk’s cost. Go Kiwi Syslog! Go!

Limelight CDN

Working with a Content Delivery Network is an important factor in speeding up your page load time. When we tested before-and-after we saw a dramatic decrease of first-time page load from 3-4 seconds to 2-2.5 seconds for US-based users. With later widgets and pages the load time was about 30% faster. This is a lot! The other reason you’d like to have a CDN is that it’s going to take a large percentage of the traffic from your servers – so you’ll end up having fewer servers and paying less for traffic.

The basic service a CDN offers is the speeding up of static content (Imgs, CSS, JS files) delivery. The advanced services CDNs offer are media streaming and something called Whole Site Delivery – out of scope for this blog post. For the small site/service you’re going to pay $1000-$2000/month for the basic CDN – it may not be too bad considering the reduced costs on servers and traffic.

If you know you’re going to use a CDN you can write your code and delivery procedures in a way that starting to use a CDN would just be a flip of a config file entry. If you already have a website/service functioning without a CDN you’ll probably need to do some work to separate and version the static files correctly and add proper configuration everywhere. So, with the right design you should be able to integrate with a CDN, change CDN or stop working with a CDN in a matter of minutes.
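To show what "a flip of a config file entry" can mean in practice, here is a minimal sketch of a helper that every template would call to build static file URLs. The config keys, the CDN host name and the `/static/<version>/` layout are all assumptions made up for the example.

```python
from urllib.parse import quote

# Hypothetical settings; in a real deployment these would come from a config file.
CONFIG = {
    "use_cdn": False,
    "cdn_base_url": "https://cdn.example.com",
    "static_version": "v42",
}


def static_url(path, config=CONFIG):
    """Build the URL for a static asset, versioned so new releases bypass stale caches."""
    versioned = "/static/{}/{}".format(config["static_version"], quote(path.lstrip("/")))
    if config["use_cdn"]:
        return config["cdn_base_url"].rstrip("/") + versioned
    return versioned
```

With `use_cdn` off, `static_url("css/site.css")` gives `/static/v42/css/site.css` served from your own servers; turn it on and the same call points at the CDN host. Bumping `static_version` on each release is one simple way to version the static files.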

So the story goes like this: We decided we had to have a CDN because every millisecond of page load time is critical. This was before launching our initial service. We went shopping and were surprised – it seems that most of the bigger CDNs were not willing to work with us at this stage at all. Even the local rep of the local Cotendo (a startup sharing a VC with GameGround) never returned a phone call… Luckily the local rep of Limelight was willing to take the deal and after a couple of weeks of negotiations we switched ON the config and it was working well (we did have a couple of config issues – minor faults on our side).

Q: Should a small lean-startup deploy a CDN as part of their initial release?

A: NO NO NO. It’s expensive, and signing up with the local representative of a CDN will consume too much of your time.

Q: Should a lean-startup write their code with a CDN in mind?

A: Yes! Sure! This will allow you to speed up your site and offload traffic if and when your site/service is showing some signs of success. Coding with a CDN in mind won’t make it slower anyway.

Q: Can you give some hints on how to design it right to work with a CDN?

A: I promise to have a post about it later on… but if you have a specific Q – ask it in a comment below.

Q: Are there no free/cheap alternatives?

A: There are! Check out this post about using Google App Engine as a free static data CDN. Also – this post about using DropBox as a free CDN solution. Note that if the delivery of the resources from those unofficial-CDNs is not faster than delivering them from your own site then adding a CDN configuration might actually slow down your site. Beware!
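A quick way to sanity-check that warning is to time the same file from both places before flipping anything. The sketch below compares median download times; the URLs in the usage note are placeholders, and measuring from a single machine only tells you about that machine’s location, so treat the result as a hint rather than proof.

```python
import statistics
import time
import urllib.request


def time_fetch(url, timeout=10.0):
    """Download the resource once and return the elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def median_fetch_time(url, samples=5, fetch=time_fetch):
    """Median of several timings, to smooth out one-off network hiccups."""
    return statistics.median(fetch(url) for _ in range(samples))


def cdn_is_faster(origin_url, cdn_url, samples=5, fetch=time_fetch):
    """True if the CDN copy downloads faster (by median) than the origin copy."""
    return median_fetch_time(cdn_url, samples, fetch) < median_fetch_time(origin_url, samples, fetch)
```

For example, `cdn_is_faster("https://www.example.com/static/app.js", "https://cdn.example.com/static/app.js")`, ideally run from a few places where your users actually are.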
