AI — simplifying through a functional lens
Here’s how the AI industry is typically depicted. It’s no wonder it’s so impenetrable for most people.
7 clusters, 78 sub-clusters, and thousands of solutions. Good luck with your AI journey!
Our approach:
When we explain the AI stack, we like to use the analogy of the old-school slide projector. Specifically, the tray that you used to load up the slides.
The picture shown at the start of this post is what happens if you try to look from the end on, through all of the slides in the tray, all at once. Impossibly complex. Whereas if you look at it from the top, you see lots of separate slides. And you can select each slide separately to lift it out and look at it. You get a much clearer picture and each slide returns very different information.
Using this analogy, we’re going to attempt to demystify the AI stack, starting with one of the most fundamental and consequential slides: the functional lens.
How does this whole complex environment work?
When you look at it this way, it’s pretty simple. When you think about the vast number of AI applications that you can conceive, they all boil down to a small number of functional components: extract, predict, recommend, and convert.
And that’s what we’re going to talk about.
The first is Extraction — in order to make a decision on data, you need to have the data to make the decision on.
In the world of relational databases, where data was both structured and related to other pieces of data, this was relatively simple.
In the world we now operate in, the data is much more far-flung and nowhere near as structured. So we have to have a new process to extract the data from its current environment.
A good example of this is a video frame.
A video frame would have some structured data attached to it. For example, it would have the frame number and the name, and a bunch of dimensions.
What it wouldn’t have is data on what’s in that particular video frame.
So if we wanted to extract that, we would probably want to use some sort of computer vision module to do the heavy lifting of identifying what is happening in that given video frame.
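To make that concrete, here is a minimal sketch of the idea in Python. Everything in it is an assumption for illustration: the VideoFrame fields, the extract_labels helper, and especially fake_detector, which is a hypothetical stand-in for a real computer vision model. It only shows the shape of the step: structured metadata goes in, and the same frame comes out enriched with a description of its content.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VideoFrame:
    # Structured metadata that already travels with the frame.
    frame_number: int
    name: str
    width: int
    height: int
    # What is *in* the frame is unknown until an extraction step fills it in.
    labels: List[str] = field(default_factory=list)


def extract_labels(frame: VideoFrame,
                   detector: Callable[[VideoFrame], List[str]]) -> VideoFrame:
    """Return a copy of the frame enriched with what the detector sees in it."""
    return VideoFrame(frame.frame_number, frame.name, frame.width, frame.height,
                      labels=list(detector(frame)))


def fake_detector(frame: VideoFrame) -> List[str]:
    # Hypothetical stand-in for a real computer vision model.
    if frame.frame_number % 2 == 0:
        return ["person", "looking_at_camera"]
    return ["landscape"]


frame = VideoFrame(frame_number=42, name="episode_01", width=1920, height=1080)
enriched = extract_labels(frame, fake_detector)
print(enriched.labels)
```

In a real pipeline the detector would be the expensive part; the point of the sketch is simply that its output becomes data you can make decisions on.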
I talk in another post about the Netflix thumbnail personalisation app. This is an excellent example of Extraction. They process every single video frame in every show to parse which characters are showing, which emotions they are demonstrating, whether they’re looking directly at the camera or not — so that they can make a personalised thumbnail of that show. Just for you.
As it relates to documents — and data in documents — we’re all pretty aware of that subject because we’ve known about Optical Character Recognition for 20 or 30 years. So, for instance, it’s now trivial to extract data from a PDF. Pour petrol on that and advance the technology 20 or 30 years — that’s where we’re going. It’s mind-blowing what you can do in this space now.
So while it is just Extraction, it’s really important — you need to figure out how you’re going to get the data you want to make decisions on. And nowadays, you are not limited to only using structured data.
The second is Prediction — the idea of seeing unseen patterns and extrapolating those out into the future to predict something.
Predicting something that you think is going to happen is probably one of the most potent benefits of artificial intelligence and machine learning.
We’ve all come from the world of “business intelligence”, which was primarily driven by looking backwards. The opportunity to say “right, how might we predict something?” is really profound.
But how do you understand what you need to predict?
So we’ve extracted the data. What are the patterns in the data that you could look at that will begin to paint a forward picture? How would you know? The idea of pattern recognition and being able to extrapolate that into future circumstances is the core of what this component does.
What’s important is you don’t have to tell it what to look for — that’s really the big opportunity here. Historically, anything that was done for Prediction had people modelling it. They basically limited the outcomes based on their level of comprehension. Now, machines do the modelling — they can deal with vast amounts of data and see previously unseen patterns to predict what will happen in the future.
So, Prediction equals taking extracted data, extrapolating patterns from it, looking to the future, and making decisions on the implications. That’s the core of the Prediction business case.
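Here is a deliberately tiny sketch of that extract, fit, extrapolate loop. It uses a straight-line least-squares fit, which is a hand-picked model and far simpler than the machine-driven modelling described above; it stands in only to show the shape of the step. The numbers and the names fit_line and predict are made up for illustration.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x: float, slope: float, intercept: float) -> float:
    """Extrapolate the fitted pattern to a point we have not observed yet."""
    return slope * x + intercept


# Made-up extracted data: week number against some measured reading.
weeks = [1.0, 2.0, 3.0, 4.0]
readings = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(weeks, readings)
print(predict(6.0, slope, intercept))
```

A production system would swap the straight line for a learned model, but the contract is the same: past observations in, a forward-looking estimate out.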
A good example of the Prediction function is the work we’ve done with Ubco, which I’ve written about in another post. By capturing the behaviour of the bike rider (speeds, distance, etc.), they can now extrapolate and predict when a bike might have a technical issue. That’s a very powerful and common use case of Prediction.
The third is Recommendation — understanding a given user in the context of all other users to recommend a unique experience.
The most famous and easily understood example of this is the Netflix recommendation engine, which I’ve talked about in more detail in another post.
At the heart of this is user personalisation. And it’s really important because it drives better experiences.
The core of this function is its ability to say, because you like X, you might like Y. Or equally, because you look like this other group of customers, and they like Z, then you might also like it.
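The "you look like this other group of customers" idea can be sketched in a few lines. This is an illustrative toy, not how any real recommendation engine is built: it scores users by how much their liked items overlap (Jaccard similarity, our choice for the example) and suggests what similar users like. The users, items, and function names are all made up.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user: str, likes: Dict[str, Set[str]]) -> List[str]:
    """Rank unseen items by the similarity of the users who like them."""
    scores: Dict[str, float] = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], items)
        if similarity == 0.0:
            continue  # nothing in common, so this user tells us nothing
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


likes = {
    "ana": {"X", "Y"},
    "ben": {"X", "Y", "Z"},
    "cat": {"W"},
}
print(recommend("ana", likes))
```

Here ana shares X and Y with ben, so ben's extra item Z is suggested; cat has nothing in common with ana, so W is not.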
This data is being increasingly augmented by external third party data. For instance, let’s say it’s raining where you live right now (we know this from weather data) — combining all of the above with the fact that it’s raining, we recommend a particular experience.
Whether you call it Recommendation or user personalisation, it’s really just an ability to extrapolate from very large sets of data and do cohort analyses between groups, and basically say: this is what we think is right for you.
The fourth is Conversion — the function of converting data and decision context from one situation to another.
This one’s a little bit harder to get your head around. There’s one aspect of Conversion that’s really obvious — language conversion. You’ve probably used Google Translate — they’re using AI to complete a language conversion.
A different level of Conversion is taking certain types of inputs and converting them for other users.
As an example, let’s use Tesla and their push for autonomous driving.
Inside their autonomous driving strategy, there are a huge number of AI use cases. For the sake of this post, we’ll simplify the instantaneous decision that needs to be made if the machine sees something crossing the road in front of the vehicle.
In order to make a decision to stop or not, the vehicle takes information extracted from a ranging module such as LIDAR (which measures distance with laser light, much as radar does with radio waves) and a computer vision module. The ranging module says there is something in front of you, and the computer vision tells you what it thinks it is.
At this point, the two outputs need to be Converted so the decision engine can make a split-second decision. If it’s a person, stop; if it’s a paper bag, keep going.
For some readers, like those familiar with software, this sounds like an integration challenge. Which it is, on one level. But there is a big difference between the science and the software world here.
Maintaining the integrity of the data science requires Conversion — so that the important context and inputs are not lost as things are passed to the next module in the chain. If you don’t get this right, it impacts the whole system’s ability to learn.
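A rough sketch of what "not losing context" can look like in code. This is purely illustrative and not a description of any real vehicle system: the record types, the should_stop rule, the 0.5 threshold, and the HARMLESS set are all assumptions made up for the example. The point is that Conversion carries each module's confidence through to the decision engine instead of flattening everything to a bare label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeReading:
    distance_m: float   # how far away the obstacle is
    confidence: float   # 0.0 to 1.0


@dataclass(frozen=True)
class VisionReading:
    label: str          # what the vision module thinks it is
    confidence: float   # 0.0 to 1.0


@dataclass(frozen=True)
class Observation:
    # Common format for the decision engine. Both confidences are kept,
    # so downstream modules still know how sure each source was.
    label: str
    distance_m: float
    range_confidence: float
    vision_confidence: float


def convert(r: RangeReading, v: VisionReading) -> Observation:
    """Merge the two module outputs without discarding their context."""
    return Observation(v.label, r.distance_m, r.confidence, v.confidence)


HARMLESS = {"paper bag"}  # hypothetical list for the example


def should_stop(obs: Observation, min_confidence: float = 0.5) -> bool:
    # Illustrative rule: keep going only if vision is confident
    # that the object is harmless; otherwise stop.
    confident_harmless = (obs.label in HARMLESS
                          and obs.vision_confidence >= min_confidence)
    return not confident_harmless


person = convert(RangeReading(12.0, 0.9), VisionReading("person", 0.9))
bag = convert(RangeReading(12.0, 0.9), VisionReading("paper bag", 0.9))
unsure = convert(RangeReading(12.0, 0.9), VisionReading("paper bag", 0.2))
print(should_stop(person), should_stop(bag), should_stop(unsure))
```

The third case is the interesting one: if the conversion had dropped the vision confidence, a shaky "paper bag" guess would look identical to a confident one.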
As I said, it’s a little more abstract than the other functions — but a big part of the AI mix.
So there you go: those are the four functional aspects of the AI stack.
We’ve taken the initial pic of 7 clusters, 78 sub-clusters and thousands of solutions and whittled it down to just four functions that you need to think about when you are on this journey.
In the next post, we’ll begin to look at examples (I’ve already used a couple here) of how these work in the real world. We’re going to start by looking at Netflix through this functional classification lens, simplifying it and hopefully making the whole subject more accessible.
Thanks for reading.