Will the tidal wave of real-time data require high-performance systems?
The real-time web has never been hotter, with Bing and Google presenting Twitter and Facebook updates in real time, and Google upping the ante with news headlines and MySpace updates, as well. This type of data is just the tip of the real-time iceberg, though, and displaying information in search results is only the first step in leveraging it. The question now is what types of systems companies will need to put in place in order to make the most of these mountains of data.
A recent article in the MIT Technology Review highlights software solutions to this problem, including products like Truviso and StreamBase. These products analyze streaming data in real time so that businesses — from banks to ecommerce sites to, really, anyone with a constant stream of information — can make instantaneous decisions based on that data. As we speak, researchers are working to free the MapReduce-Hadoop combo from its batch-processing prison and optimize it for real-time work. Beyond algorithmic trading, real-time analysis could be used to deliver advertising or other relevant content on the fly, or to detect critical network problems. The results can be connected to other data stores, such as the geographic information available via GeoAPI, to further hone the analysis and drive better decisions.
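The core primitive behind these streaming engines is continuous aggregation over a rolling time window, rather than a batch job over data at rest. As a rough illustration only — the class name, thresholds and the simulated click stream below are all hypothetical, not drawn from Truviso or StreamBase — the pattern looks something like this:

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events in a rolling time window -- a minimal sketch of
    the continuous-aggregation pattern used by stream-analysis engines."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, value, now):
        self.events.append((now, value))
        self._evict(now)

    def total(self, now):
        self._evict(now)
        return sum(v for _, v in self.events)

    def _evict(self, now):
        # Drop events older than the window, so any decision made
        # right now reflects only the most recent data.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

# Hypothetical usage: flag a traffic spike if more than 100 requests
# arrive within a 10-second window (numbers are illustrative).
counter = SlidingWindowCounter(window_seconds=10)
for t in range(150):
    counter.add(1, now=t * 0.05)  # simulated request timestamps
spike = counter.total(now=7.5) > 100
```

The decision ("spike or not") is made the moment the data arrives, which is the essential difference from a nightly MapReduce pass over the same logs.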
As real-time analysis catches on and becomes a must-have competitive capability, it stands to reason that big changes on the infrastructure front will follow. Will run-of-the-mill web companies build high-performance, low-latency systems like those in investment banks to ensure they can act on shared data faster than their competitors can? If this is the case, systems vendors might have to worry less about a tectonic shift to cloud computing than some suggest. For real-time processing, at least, the performance (and reliability) limitations of virtual machines will keep systems in-house. If these high-speed processing systems do become commonplace, vendors pushing InfiniBand solutions could reap the rewards of a vastly larger customer base.
And what about the data tier, once the information has been processed? Truviso works with a relational database, but the prevalence of unstructured web data might necessitate NoSQL solutions, as well. Some might argue, too, that disks have no place in a real-time environment: persistent in-memory data grids, flash memory and solid-state disks can serve this streaming data far faster once it has been processed and stored. Many database and data-warehouse operations also might move to the cloud, where high-performance data solutions are becoming quite prevalent and where capital expenditures can be kept to a minimum — a big deal when you're talking about storing huge volumes of data.
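The "persistent in-memory grid" idea boils down to keeping processed results in RAM for low-latency reads while writing through to durable storage in the background. A toy sketch — the class, key names and write-through hook here are invented for illustration, not any vendor's API — might look like:

```python
import threading

class InMemoryResultStore:
    """Toy stand-in for a persistent in-memory data grid: processed
    results are kept in RAM for fast reads, with an optional
    write-through hook where a real grid would persist each update
    to flash/SSD or a NoSQL back end."""

    def __init__(self, persist=None):
        self._data = {}
        self._lock = threading.Lock()
        self._persist = persist  # optional callable(key, value)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
        if self._persist:
            self._persist(key, value)  # write-through to durable storage

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

# Hypothetical usage: cache an aggregate produced by the stream tier,
# recording each write-through in a stand-in "durable" log.
durable = []
store = InMemoryResultStore(persist=lambda k, v: durable.append((k, v)))
store.put("clicks:last_minute", 4821)
```

Reads never touch the disk path at all, which is exactly why this tier pairs naturally with the stream processors discussed above.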
We’re just at the cusp of the real-time movement, but we’re moving fast. When companies begin truly harnessing the endless stream of data constantly being generated, it could bring once-exclusive systems and technologies into data centers everywhere. The relative ubiquity of these systems could kick next-generation development into high gear, but that’s a whole other subject.