Posts for the month of November 2012

How to avoid crap.

The rule of thumb when dealing with Open Source solutions is this:

It must be an evolved solution, built in a recursive bottom-up process, for the author's own needs, to deal with a real-world problem.

This is how masterpieces such as Scheme, Erlang, memcached, nginx, redis or riak were made.

The authors start from the foundations - design decisions, an appropriate level of abstraction and a clean, close-to-hardware implementation of the basic building blocks, such as buffers, hashes and RPC. Then they grow, evolve and adapt the code in parallel with their own growing understanding and experience.

What we have instead are tons of crap, piled up without any serious design work or proper study of the underlying ideas, by jumping right into an IDE to monkey-patch it all together and sell it to a bigger fool.

This is the standard approach in the Java world, which we are not even considering here. This is how we got crap like PHP or MongoDB.

MongoDB is supposed to be a fast and very popular document-oriented database, the default choice for ignorant amateurs. They even tried to exploit the idea of redefining the LAMP stack - "M now stands for MongoDB instead of MySQL". In other words, it is professionally marketed and pushed.

When you take a look at its website, it is all about the success of developers and how easily and quickly they can start coding, without any thinking. This is the basic selling strategy - it is all easy, and no thinking or even understanding is required.

Here is what they say:

MongoDB allows very fast writes and updates by default. The tradeoff is that you are not explicitly notified of failures. By default most drivers do asynchronous, ‘unsafe’ writes - this means that the driver does not return an error directly, similar to INSERT DELAYED with MySQL. If you want to know if something succeeded, you have to manually check for errors using getLastError.

Notice the language - fast writes and updates, and especially not explicitly notified of failures. What it actually means is that they are cheating you by providing unsafe storage, with possible data loss in case of a failure (a segfault, a kernel trap, or a hardware or FS failure).

Notice also that getLastError - it is definitely from PHP. :)
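To see why this default is a problem, here is a minimal sketch of the two write modes. It is a toy in-memory model, not MongoDB's driver: the names ToyStore, insert, safe and get_last_error are made up for illustration, and a duplicate key stands in for any failure the server might hit.

```python
class DuplicateKey(Exception):
    """Raised only in safe mode (hypothetical error type for this sketch)."""


class ToyStore:
    """Toy in-memory store; an assumption-laden model, not a real driver API."""

    def __init__(self):
        self.docs = {}
        self.last_error = None

    def insert(self, key, doc, safe=False):
        self.last_error = None
        if key in self.docs:
            self.last_error = f"duplicate key: {key}"
            if safe:
                # Safe mode: the caller is told about the failure right away.
                raise DuplicateKey(self.last_error)
            # Unsafe mode: the failure is recorded but swallowed.
            return
        self.docs[key] = doc

    def get_last_error(self):
        # The caller has to remember to ask, after every single write.
        return self.last_error


store = ToyStore()
store.insert("u1", {"name": "alice"})
store.insert("u1", {"name": "bob"})      # fails, but nothing is raised
silent_error = store.get_last_error()    # only now does the failure show up

try:
    store.insert("u1", {"name": "carol"}, safe=True)
    raised = False
except DuplicateKey:
    raised = True
```

In the unsafe path the second insert looks exactly like a success unless the caller polls for the error afterwards, which is the behaviour the quote describes.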

Would a sane developer create such a solution for himself?

There is more.

Older versions of MongoDB - pre 2.0 - had a global write lock. Meaning only one write could happen at once throughout the entire server. Wait, what?!

This needs to be re-read. It means that before 2.0 it had the same logic as a plain file and a lock - acquire the lock, append to the end of the file (no, the append-only journal was not enabled by default until 2.0), release the lock. Do they call it a database? Of course, you cannot find such details on their website. There is nothing but marketing-speak there.
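The "plain file and a lock" logic can be sketched in a few lines. This is a toy model under my own assumptions, not MongoDB's implementation: GlobalLockStore and its fields are hypothetical names, and a short sleep stands in for the actual write.

```python
import threading
import time


class GlobalLockStore:
    """Toy store where every write, to any collection, takes the same lock."""

    def __init__(self):
        self._lock = threading.Lock()    # one lock for the whole "server"
        self.collections = {}
        self._active = 0
        self.max_concurrent_writers = 0

    def write(self, collection, doc):
        with self._lock:
            self._active += 1
            self.max_concurrent_writers = max(
                self.max_concurrent_writers, self._active
            )
            time.sleep(0.001)            # pretend the write takes some time
            self.collections.setdefault(collection, []).append(doc)
            self._active -= 1


store = GlobalLockStore()
threads = [
    threading.Thread(target=store.write, args=(f"coll{i % 4}", {"n": i}))
    for i in range(16)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_docs = sum(len(docs) for docs in store.collections.values())
```

Sixteen writers hit four unrelated collections, yet max_concurrent_writers never goes above one: the writes are fully serialized, no matter how many cores or collections you have.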

Would a sane developer create such a solution for himself?

There is one more thing:

MongoDB uses memory mapped files and flushes to disk are done every 60 seconds, which means you can lose a maximum of 60 seconds + the flush time worth of data.

I think comments aren't necessary.
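For those who want the arithmetic spelled out anyway, here is a rough sketch of that loss window. It is a simplified model under stated assumptions: flushes start every flush_interval seconds, each takes flush_time seconds, and a flush only counts if it finished before the crash. The function name and the 2-second flush time are made up for illustration.

```python
def lost_writes(write_times, crash_time, flush_interval=60.0, flush_time=2.0):
    """Return the write timestamps that never reached disk (toy model)."""
    # Flush k starts at k * flush_interval and completes flush_time later.
    completed = int((crash_time - flush_time) // flush_interval)
    # Everything written before the start of the last completed flush is safe.
    last_durable = max(completed, 0) * flush_interval
    return [t for t in write_times if last_durable <= t <= crash_time]


writes = [10.0, 59.0, 61.0, 100.0, 119.0, 121.0]

# The second flush started at 120 s but had not finished at 121.5 s,
# so only the flush at 60 s counts.
lost_late = lost_writes(writes, crash_time=121.5)

# A crash before the first flush loses everything written so far.
lost_early = lost_writes(writes, crash_time=30.0)
```

In the first case the crash comes 61.5 seconds after the last completed flush began, so almost the full 60 seconds plus the flush time is gone.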

So, there are two basic strategies - grow your own solution, or pile up some code for sale. The second one is the dominant one, and products built this way should be avoided. PHP, MySQL (with MyISAM), MongoDB, NodeJS and Clojure are among the most popular examples.

As in any other market, detecting and avoiding scams is a crucial skill. If something was created for sale, like Java, it is probably a well-financed scam.