Every year, Gartner publishes a hype cycle chart for emerging technologies. Get this: in the 2019 hype cycle, Gartner lists 29 emerging technologies, and at least 16 of them are related to data science, machine learning (ML), and artificial intelligence (AI)! Over half! That’s quite a dose of hype for one chart. The only topic with more industry hysteria might be 5G. But probably not.
ML/AI is definitely here to stay, so we need to understand it. Networking complexity keeps growing while businesses build deeper network integrations, so ML/AI becomes necessary to make networks more autonomous. Still, not every problem needs “data-driven AI,” since many networking challenges are well solved by rule-driven, model-driven, or expert-driven approaches (more on this in a future post). But it’s hard to get to that truth through the hype, or to tease apart the realities of general AI vs. narrow AI. That’s why we’re embarking on this article series: we want to sift through the noise and take an honest look at ML and AI, including the good, the bad, and the confusing.
To do that, let’s start with a little setup. ML/AI is emerging as the topic du jour because of three primary trends, which we will tackle next: cloud, data architecture, and software tooling.
First, cloud. I know, we beat this topic to death. Give me a chance to explain, though, because this cloud thread runs across all industries, products, and solutions. The right kind of cloud makes data science easier. It’s generally true that machine learning workloads in the cloud could be run on-premises (as they say, “cloud is just someone else’s computer”), but doing so is far more complex and costly, more difficult to evolve, less modular, and harder to manage programmatically, especially for a thin IT team focused on network-driven business objectives.
Consider some of the data and performance advantages with cloud solutions:
Clearly it’s self-promoting to focus on the benefits of cloud, but they’re real. And cloud also helps us solve the next issue, which is related to data.
To get data science right, you need good data. Shocker, right?! Really though, you need data that is accurate, granular, accessible, and representative of all the diverse interests of your data algorithms (again, more on this in a future post). You need enough of the right data, and the ability to adapt to new data needs quickly. At the same time, too much data creates transport, storage, and processing overhead, which drives up costs and compromises ROI. Thankfully, the “data-driven” mantra has created a technology culture that is increasingly data literate, and in some cases, well equipped with the right balance of accessible data.
You might be surprised, but despite the seeming ubiquity of “big data,” most scalable data systems that provide the kind of performance we need for ML pipelines are cumbersome to manage and maintain because they were built for DevOps teams with very large data centers. Data redundancy and clustering can be difficult to wrangle, corruption happens, and indexing processes hang. Also, increasing data granularity or storage duration to solve data science problems can multiply storage requirements in a hurry, and you can’t just “throw more disk at it” every time you need to scale. These problems are largely solved when a DevOps team operates a distributed data platform, which gives data science a stable foundation to build on. And perhaps it’s another success story of public and private cloud, where small operators get the same benefits of scale and stability without the overhead.
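To see how quickly granularity and retention multiply storage, here is a back-of-the-envelope sketch. Every number in it (device count, metrics per device, bytes per sample, polling intervals) is hypothetical and chosen only for illustration, and the estimate deliberately ignores compression, indexing, and replication.

```python
def raw_storage_bytes(devices, metrics_per_device, interval_s,
                      retention_days, bytes_per_sample=16):
    """Rough raw size of a time-series store.

    Ignores compression, indexes, and replicas; all inputs are
    illustrative assumptions, not measurements from a real system.
    """
    samples_per_series = (retention_days * 86_400) // interval_s
    return devices * metrics_per_device * samples_per_series * bytes_per_sample


GIB = 1024 ** 3

# Hypothetical baseline: poll every 5 minutes, keep 30 days.
baseline = raw_storage_bytes(devices=1_000, metrics_per_device=50,
                             interval_s=300, retention_days=30)

# Hypothetical "data science friendly" setup: poll every 10 seconds, keep 90 days.
finer = raw_storage_bytes(devices=1_000, metrics_per_device=50,
                          interval_s=10, retention_days=90)

print(f"baseline: {baseline / GIB:.1f} GiB")
print(f"finer:    {finer / GIB:.1f} GiB")
print(f"growth:   {finer // baseline}x")
```

Under these assumptions, polling 30 times more often and keeping data 3 times longer multiplies raw storage by 90, which is the kind of jump that makes “throw more disk at it” a short-lived strategy.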
Architectures with data-centric approaches have several benefits:
So, cloud and data architectures are ready, but the lynchpin for ML/AI progress is the software.
If you take a quick survey of the data science industry, you’ll find a seemingly endless number of tools and software solutions: data streaming and warehousing options, compute and ETL choices, data engineering platforms, complex event processing applications, transport (e.g., bus/queue) mechanisms, visualization libraries, and on and on. The sea of choice is vast, and growing rapidly as data becomes the language of the next decade.
This tooling ecosystem has completely revolutionized data science in just 5 years. If you rewound the clock just a bit and tried solving complex data problems “back then,” you’d hit new roadblocks (each requiring custom development work) around every corner. The delay and cost incurred by that one-off development were just too much for many data projects to succeed. But the open-source community and cloud computing giants have made effective data science possible. Several of the best tools today were jump-started as internal projects at the Big Five tech companies (Amazon, Apple, Facebook, Google, and Microsoft) and at startups, and were either open-sourced or made available at approachable cost as part of a cloud computing framework.
The accessibility (“democratization,” as they say) of this end-to-end ML/AI ecosystem makes it possible for everyone to build solutions with minimal ML/AI expertise. With even modest investment in ML/AI expertise alongside domain experts, you can build very sophisticated systems with impressive efficiency. And think about the value chain: if companies are willing to open-source their algorithms, the real value is not in the algorithm itself but in the customer data; we just need to extract that value and apply it to solving business problems.
It’s becoming common knowledge that ML/AI is not really a new technology. The theory and basic algorithms have been around for decades. But for ML to really gain traction, we needed all of these layers of the ecosystem to develop around it. We needed sufficient progress in end-to-end handling of data volume and cost; ubiquitous computing (i.e., mobile) to generate relevant data for business applications; cloud services to lower the barriers to entry and enable rapid prototyping; and flexible software tooling for diverse market applications. And we are finally here.
Back to Gartner… Their so-called “trough of disillusionment” is the post-hype stage when we face the disappointment of reality. If you remember the promises of SDN, the industry was all hyped up for automated everything; the reality of SDN was an incremental process improvement for some use cases, another tool in your toolbelt. ML/AI for networking will also follow the hype cycle, though the trough should be much more exciting. Like most emerging technologies, the hype happens because of misinformation: AI is treated as a magic wand. In goes the data, out come perfect insights.
That is the “why” behind this series. Again, ML/AI is here to stay, so we need to understand it. Everyone wants to sell its magical powers, but if we’re honest, there will be phases of evolution: stairsteps of progress with some disillusionment mixed in, as well as some really helpful use cases and operational enhancements, some of which we’ve already seen. Artificial general intelligence (i.e., systems behaving like data-aware humans) is still distant. In the meantime, as networks become more complex, intelligent data-driven systems are closing the gap.
So here’s the bottom line. Follow best practices in design. Continue building domain skills. Learn to use data-driven tools. AI is no magic cure for crappy deployments. But ML, AI, and data-driven systems are already raising awareness and facilitating problem-solving, which is why Extreme believes in, and is investing in, all the varieties of data science at our fingertips.
***This is the first in a series of blogs exploring data science, ML, and AI. In upcoming blogs and videos, we’ll get into some of the technology of data science in depth, and explore some ways Extreme is leveraging data to solve problems.
For more information about Extreme’s intelligent public, private, and local cloud options, here’s a link to our wares: https://extremeengldev.wpengine.com/kr/products/.
This blog was originally authored by Marcus Burton, Architect, Cloud Technology