MongoDB: 5 Syntactic Weirdnesses to Keep in Mind

People like to complain about MongoDB. For instance, maybe they feel that it ruined their social network, or they have any number of other, less recent complaints. The debate gets so heated, though, that sometimes valid criticisms - and nothing is above criticism - are dismissed as bandwagon hatred. It's a problem that Slava Kim seems very aware of in this recent blog post on some of the syntactic weirdnesses of MongoDB. It's not bashing, Kim stresses. For developers to effectively use any technology, they need to understand the "sharp edges."

Kim goes into detail for each warning, covering five general areas:

Microsoft Aims to Take Over IoT with Windows 10

Everybody's trying to get in on the big-money future of IoT, and now "everybody" includes Microsoft with Windows 10. Larry Dignan at ZDNet put together a look at Windows 10's role in IoT - as it's been described by Microsoft CEO Satya Nadella at the Gartner Symposium ITXpo, at least - which suggests that the new OS will be a central platform for IoT systems of all types. Specifically:

Windows will be able to run on everything from sensors to wearables to whatever computing shift emerges.

MaxScale for the Rest of Us, Part 3: Install and Configure MaxScale

This third post in this series of blogs about MaxScale finally gets to where you want to go: installing and configuring MaxScale. The first post in this series was an overview of what MaxScale is, and the second covered how to set up a cluster of MariaDB servers, using MariaDB Replication, for MaxScale to access. But now it's time to introduce MaxScale.

If you skipped the second post because you already know how to set up MariaDB with Replication and all that, be reminded that I will use the same Linux server setup as outlined there for the MaxScale server and for a client to do some testing, and I recommend you stick with that for now. For MariaDB itself you can use any relevant setup you want, MaxScale doesn't really care, but MaxScale is pretty new and has still not been tested on that many platforms, so try to stick to the CentOS 6.5 setup I propose.

MapReduce Algorithms: Understanding Data Joins, Part II

It’s been a while since I last posted, and like the last time I took a big break, I was taking some classes on Coursera. This time it was Functional Programming Principles in Scala and Principles of Reactive Programming. I found both of them to be great courses and would recommend taking either one if you have the time. In this post we resume our series on implementing the algorithms found in Data-Intensive Text Processing with MapReduce, this time covering map-side joins. As we can guess from the name, map-side joins join data exclusively during the mapping phase and completely skip the reducing phase. In the last post on data joins we covered reduce-side joins. Reduce-side joins are easy to implement, but have the drawback that all data is sent across the network to the reducers. Map-side joins offer substantial gains in performance since we avoid the cost of sending data across the network. However, unlike reduce-side joins, map-side joins require that very specific criteria be met. Today we will discuss the requirements for map-side joins and how we can implement them.
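To make the idea of joining inside the mapper concrete, here is a minimal sketch in plain Python rather than the Hadoop API. It assumes one side of the join is small enough to hold in memory on every mapper, which is only one situation where a map-side join is possible and not necessarily the criteria the post goes on to list. The function names and the sample data are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def load_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build the small, in-memory side of the join, keyed by the join key."""
    return dict(rows)


def map_side_join(
    records: Iterable[Tuple[str, str]], lookup: Dict[str, str]
) -> Iterator[Tuple[str, str, str]]:
    """Act as the mapper: join each record as it streams past.

    No shuffle and no reducer are involved; records whose key has no
    match in the lookup table are dropped (inner-join semantics).
    """
    for key, value in records:
        match = lookup.get(key)
        if match is not None:
            yield key, value, match


# Hypothetical sample data: a small "users" table and a larger "orders" stream.
users = load_lookup([("u1", "Alice"), ("u2", "Bob")])
orders = [("u1", "book"), ("u3", "pen"), ("u2", "lamp"), ("u1", "mug")]

joined = list(map_side_join(orders, users))
for row in joined:
    print(row)
```

Because every output row is produced while the input record is being read, nothing has to be sent across the network to a reducer, which is where the performance gain described above comes from.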

Map-Side Join Conditions

To take advantage of map-side joins, our data must meet one of the following criteria:

The Magic Testing Challenge: Part 2

My last article raised an interesting discussion about whether you should see tests more as documentation or more as specification. I agree that they can contribute to both, but I still think tests are just - tests...

There were also complaints about my statement that testing often becomes tedious work which nobody likes. Here, too, I agree that techniques like TDD can help you structure your code and make sure you code exactly what is needed by writing the tests first, but the result of the process will still be a class which needs to be tested somehow.

JXSE and Equinox Tutorial, Part 3: Introducing the JP2P Container


It has been a while since the first and second posts of this series, but a lot has happened in the past few months, most notably the fact that the code from eclipselabs is going to be ported to Project Chaupal, which will (eventually) become the OSGi implementation of the JXTA specs. As a result, I decided to rename the packages and make the architecture as clean as possible prior to the change.

This tutorial will cover some of the features which are already available and which can help to make the development of JXTA applications in Eclipse/Equinox a bit easier.

High Availability with MySQL Fabric: Part II


This is the third post in our MySQL Fabric series. If you missed the previous two, we started with an overall introduction, and then a discussion of MySQL Fabric’s high-availability (HA) features. MySQL Fabric was RC when we started this series, but it went GA recently. You can read the press release here, and see this blog post from Oracle’s Mats Kindahl for more details. In our previous post, we showed a simple HA setup managed with MySQL Fabric, including some basic failure scenarios. Today, we’ll present a similar scenario from an application developer’s point of view, using the Python Connector for the examples. If you’re following the examples in these posts, you’ll notice that the UUIDs of the servers keep changing. That’s because we rebuild the environment between runs. Symbolic names stay the same though. That said, here’s our usual three-node setup:

Here’s how Bell was Hacked: SQL Injection Blow-by-Blow

OWASP’s number one risk in the Top 10 has featured prominently in a high-profile attack, this time resulting in the leak of over 40,000 records from Bell in Canada. It was pretty self-evident from the original info leaked by the attackers that SQL injection had played a prominent role in the breach, but now we have some pretty conclusive evidence of it as well:

The usual fanfare quickly followed – announcements by the attackers, silence by the impacted company (at least for the first day), outrage by affected customers and the new normal for public breaches: I got the data loaded into Have I been pwned? and searchable as soon as I’d verified it.
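For readers who have not seen this class of flaw up close, the sketch below shows the general mechanism of SQL injection and the usual fix, a parameterized query. It is not the query used against Bell; the `accounts` table, the function names, and the payload are hypothetical, and it uses Python's built-in `sqlite3` module with an in-memory database so it can be run anywhere.

```python
import sqlite3

# Hypothetical table and rows, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],
)


def find_user_unsafe(db: sqlite3.Connection, username: str) -> list:
    """Vulnerable: user input is concatenated straight into the SQL text."""
    query = "SELECT username FROM accounts WHERE username = '" + username + "'"
    return db.execute(query).fetchall()


def find_user_safe(db: sqlite3.Connection, username: str) -> list:
    """Safer: the input is bound as a parameter and never parsed as SQL."""
    return db.execute(
        "SELECT username FROM accounts WHERE username = ?", (username,)
    ).fetchall()


# A classic payload that turns the WHERE clause into a condition that is always true.
payload = "nobody' OR '1'='1"

leaked = find_user_unsafe(conn, payload)   # every row in the table comes back
blocked = find_user_safe(conn, payload)    # no user has that literal name

print("unsafe:", leaked)
print("safe:", blocked)
```

The only difference between the two functions is whether the input becomes part of the SQL text or is passed alongside it, which is why parameterized queries are the standard defence.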

Geek Reading for the Weekend

I have talked about human filters and my plan for digital curation. These items are the fruits of those ideas, the items I deemed worthy from my Google Reader feeds. These items are a combination of tech business news, development news and programming tools and techniques.

I hope you enjoy today’s items, and please participate in the discussions on those sites.

Easily Find & Kill MongoDB Operations from MongoLab’s UI

A few months ago, we wrote a blog post on finding and terminating long-running operations in MongoDB. To help make it even easier for MongoLab users* to quickly identify the cause behind database unresponsiveness, we’ve integrated the currentOp() and killOp() methods into our management portal.

* currentOp and killOp functionality is not available on our free Sandbox databases because they run on multi-tenanted mongod processes.
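As a rough illustration of the "find" half of that workflow, the sketch below filters a document shaped like the output of `currentOp()` for operations that have been running longer than a threshold. It works on a plain Python dictionary, so no database is needed; the sample document and the helper name `long_running_opids` are made up, and the sketch says nothing about how MongoLab's portal is implemented. Passing the resulting ids to `killOp()` is deliberately left out.

```python
from typing import Any, Dict, List


def long_running_opids(current_op: Dict[str, Any], min_seconds: int) -> List[Any]:
    """Return the opid of every active operation running for at least min_seconds.

    Expects a document with an "inprog" array whose entries carry "opid",
    "active" and "secs_running" fields, as currentOp() output does.
    """
    opids = []
    for op in current_op.get("inprog", []):
        if op.get("active") and op.get("secs_running", 0) >= min_seconds:
            opids.append(op["opid"])
    return opids


# Hypothetical sample, trimmed to the fields the helper reads.
sample = {
    "inprog": [
        {"opid": 101, "active": True, "secs_running": 3, "op": "query"},
        {"opid": 102, "active": True, "secs_running": 240, "op": "query"},
        {"opid": 103, "active": False, "op": "none"},
    ]
}

to_kill = long_running_opids(sample, 60)
print(to_kill)
```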

The Difference Between TokuMX Partitioning and Sharding

In my last post, I described a new feature in TokuMX 1.5—partitioned collections—that’s aimed at making it easier and faster to work with time series data. Feedback from that post made me realize that some users may not immediately understand the differences between partitioning a collection and sharding a collection. In this post, I hope to clear that up.

On the surface, partitioning a collection and sharding a collection seem similar. Both actions take a collection and break it into smaller pieces for some performance benefit. Also, the terms are sometimes used interchangeably when discussing other technologies. But for TokuMX, the two features are very different in purpose and implementation. In describing each feature’s purpose and implementation, I hope to clarify the differences between the two features.

Continuous Delivery != DevOps

Continuous Delivery and DevOps are interdependent, not equivalent

Since the publication of Dave Farley and Jez Humble’s seminal book on Continuous Delivery in 2010, its rise within the IT industry has been paralleled by the growth of the DevOps movement. While Continuous Delivery has an explicit goal of optimising for cycle time and an established set of principles and practices, DevOps is a more organic philosophy that is defined as “aligning development and operations roles and processes in the context of shared business objectives”, and is only gradually being codified into principles and practices. Continuous Delivery and DevOps possess a shared background in agile methods and Lean Thinking, and a shared desire to eliminate Waterscrumfall silos - but what is the nature of their relationship?

Building a Data Warehouse, Part 5: Application Development Options


In Part I we looked at the advantages of building a data warehouse independent of cubes/a BI system, and in Part II we looked at how to architect a data warehouse’s table schema. In Part III, we looked at where to put the data warehouse tables. In Part IV, we looked at how to populate those tables and keep them in sync with your OLTP system. Today, in the last part of this series, we will take a quick look at the benefits of building the data warehouse before we need it for cubes and BI by exploring our reporting and other options.

Build Continuous Delivery In

Building Continuous Delivery into an organisation requires radical change

While Continuous Delivery has a well-defined value proposition and a seminal book on how to implement a deployment pipeline, there is a dearth of information on how to transform an organisation for Continuous Delivery. Despite its culture-focussed principles and an adoption process described by Jez Humble as “organisational-architecture-process not tools-code-infrastructure”, many Continuous Delivery initiatives fail to emphasise an organisational model in which software is always releasable. This contravenes Lean Thinking and the Deming 95/5 Rule, which holds that 95% of problems are attributable to system faults, while only 5% are due to special causes of variation. Building an automated deployment pipeline can eliminate the 5% of special causes of variation in our value stream (e.g. release failures), but it cannot address the remaining 95% of problems caused by our organisation structure (e.g. wait times between silos). From this we can infer that:

Appsec and Technical Debt

Technical debt is a fact of life for anyone working in software development: work that needs to be done to make the system cleaner and simpler and cheaper to run over the long term, but that the business doesn't know about or doesn't see as a priority. This is because technical debt is mostly hidden from the people that use the system: the system works ok, even if there are shortcuts in design that make the system harder for developers to understand and change than it should be; or code that’s hard to read or that has been copied too many times; maybe some bugs that the customers don’t know about and that the development team is betting they won’t have to fix; and the platform has fallen behind on patches.

It’s the same for most application security vulnerabilities. The system runs fine, customers can’t see anything wrong, but there’s something missing or not-quite-right under the hood, and bad things might happen if these problems aren't taken care of in time.