The technology responsible for AI’s recent breakthrough is data-driven machine learning. In this post we lift the veil on what ‘data-driven’ means, and why it is important to understand even if you are only a user of AI. The terminology used in the AI world is borrowed from everyday language, but be aware that the meaning of a term can differ subtly from its everyday use.
Humans can learn from experience, for example through on-the-job training. The idea of learning without an explicit transfer of knowledge from human to machine inspired the data-driven learning approach. Consider the simple act of catching a thrown ball. Predicting where the ball will go takes a mix of physics formulas. We give medals to the few high school students who can work out those equations during their 3-hour final exams. Yet we expect kids in kindergarten to catch the ball in a split second. These kids probably don’t even understand the concept of gravity, other than “falling is ouch”.
Youngsters don’t hit the books; they learn by example. So does supervised machine learning. A health AI may receive an example of a patient aged 30, weighing 60kg, who recovered from surgery in 3 days. Another patient, aged 40 and weighing 80kg, recovered in 4 days, and so on. With enough examples, a machine learning algorithm trains a model that takes the characteristics of a new patient (age 32, weight 72kg) and predicts a recovery time of 4.32 days. A prediction (inference) for a particular patient could be wrong, just as a kid won’t catch every ball. But a successful algorithm will be approximately right most of the time.
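To make learning by example concrete, here is a minimal sketch in Python. It is not the method behind any particular health AI: it uses one of the simplest possible approaches, averaging the recovery times of the most similar known patients (a nearest-neighbour predictor). The first two examples come from the text; the other three rows and the function name `predict_recovery_days` are invented for illustration, so the number it prints will not match the 4.32 days quoted above.

```python
import math

# Training examples: (age in years, weight in kg) -> recovery time in days.
# The first two rows come from the text; the rest are invented for illustration.
EXAMPLES = [
    ((30, 60), 3.0),
    ((40, 80), 4.0),
    ((25, 55), 2.5),  # invented
    ((50, 90), 5.5),  # invented
    ((35, 70), 3.5),  # invented
]

def predict_recovery_days(patient, examples=EXAMPLES, k=2):
    """Average the recovery times of the k examples most similar to the patient."""
    # Similarity here is plain straight-line distance in (age, weight) space.
    by_distance = sorted(examples, key=lambda ex: math.dist(ex[0], patient))
    nearest = by_distance[:k]
    return sum(days for _, days in nearest) / len(nearest)

new_patient = (32, 72)
print(predict_recovery_days(new_patient))
```

Note that nobody told the program how age or weight affect recovery. Whatever it "knows" comes entirely from the examples it was given.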
Data-driven machine learning is powerful, but there is a catch: it’s only as good as the data it was fed. A model trained on children’s hospital data may not work well in the geriatric ward. If blood pressure is a key factor but is not recorded in the data set, the model has no choice but to ignore it. If the data set contains many characteristics that are unrelated or barely related to recovery time, the model may latch onto spurious correlations that arise by coincidence.
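The risk of spurious correlations can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not a real study: it generates recovery times and patient characteristics that are pure random noise, so any correlation it finds is coincidence by construction. The function names and the choice of 10 patients are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(n_patients, n_characteristics, seed=0):
    """Largest |correlation| between the target and purely random characteristics."""
    rng = random.Random(seed)
    recovery_days = [rng.gauss(4, 1) for _ in range(n_patients)]
    strongest = 0.0
    for _ in range(n_characteristics):
        # Random noise: unrelated to recovery time by construction.
        noise = [rng.gauss(0, 1) for _ in range(n_patients)]
        strongest = max(strongest, abs(pearson(noise, recovery_days)))
    return strongest

for n_characteristics in (1, 10, 100, 1000):
    print(n_characteristics, round(strongest_chance_correlation(10, n_characteristics), 2))
```

Because the seed is fixed, each run reuses the same noise columns and adds more, so the strongest coincidental correlation can only stay the same or grow as characteristics are added. A model has no way to tell such a coincidence from a real relationship unless it sees more data.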
Machine learning methods are designed to cater for the possibility that a new case is not exactly the same as one of the cases in the training data set. However, the better the distribution of characteristics in the training data reflects the real world, the more accurately the AI will perform. That’s why “big data” has been one of the foundations for the rise of AI. Just remember that “big” is not only about size, but also about coverage and variety.
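One crude way to think about coverage is to ask whether a new case even falls within the range of values seen during training. The sketch below does just that. The training data is a hypothetical children's hospital data set invented for illustration, and the helper names are assumptions. A range check like this is far weaker than comparing full distributions, but it shows the idea.

```python
def coverage_ranges(samples):
    """Return (min, max) per characteristic over the training samples."""
    columns = list(zip(*samples))
    return [(min(col), max(col)) for col in columns]

def outside_coverage(sample, ranges):
    """List the indices of characteristics whose value falls outside the training range."""
    return [
        i
        for i, (value, (low, high)) in enumerate(zip(sample, ranges))
        if not low <= value <= high
    ]

# Hypothetical children's hospital training data: (age in years, weight in kg).
training = [(4, 16), (7, 23), (10, 32), (15, 55)]
ranges = coverage_ranges(training)

print(outside_coverage((9, 30), ranges))   # a child within the training ranges
print(outside_coverage((82, 68), ranges))  # a geriatric patient
```

An empty list means every characteristic lies within what the model has seen. A non-empty list is a warning that the model is being asked about a kind of patient it has no examples of.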
In the above example, the input values (age, weight) are available both at training time and at inference time, when a new patient is encountered. The set of input values for one patient is called a sample or instance. The output variable (recovery time) is available at training time only, and is known as the target for the sample. At inference time, the challenge is to predict that value.
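The same terminology can be written down as data structures. This is only a sketch: the class and field names are hypothetical, and the values are the ones used in this post.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    """Input values, known at both training time and inference time."""
    age_years: float
    weight_kg: float

class TrainingExample(NamedTuple):
    """A sample paired with its target, which is known at training time only."""
    sample: Sample
    target_recovery_days: float

training_set = [
    TrainingExample(Sample(30, 60), 3.0),
    TrainingExample(Sample(40, 80), 4.0),
]

# A training algorithm sees the inputs and the targets side by side.
inputs = [example.sample for example in training_set]
targets = [example.target_recovery_days for example in training_set]

# At inference time there is only a sample; the target is what must be predicted.
new_patient = Sample(32, 72)
print(inputs, targets, new_patient)
```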
Data is the key ingredient for machine learning, but how does an AI know what outcomes we expect it to extract from the data? In the next post, we’ll discuss how problem definitions can go horribly wrong.
In this post’s example a surgery recovery time is predicted. What type of AI task is that? (See answer in blog post 1.)
The first in a series of blogs on AI by Dr Jeroen Vendrig, from Canon Information Systems Research Australia (CISRA)