FAQ: Commonly Used Terms and Concepts to Help You Understand AI



Annotations are the lifeblood of AI. Without annotations, it is impossible to train or test a model. Annotations enable us to show algorithms examples of both an input (the data you generate) and our expected output (the label we want the algorithm to generate). In the absence of annotations there is nothing to learn. Annotations are the basic building block of AI and their development is critical to the success of every AI implementation.


Structured vs Unstructured

Structured and unstructured data are the two forms of data. Structured data is data that is easily machine readable and available in a fixed field in a table or database that has a predictable and constrained format. Structured data are things like temperature readings contained in a spreadsheet, the cost of a share on the New York Stock Exchange at a particular time (or over a specific time period), or the colour of a specific pixel in RGB values in an image.

Unstructured data is essentially anything else. This micro post is one example. Free-flowing text in documents, video clips from YouTube and telephone calls are others. All of these examples have structured components (typically the metadata); however, the actual semantic content of the information is inherently unstructured.
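
To make the contrast concrete, here is a minimal Python sketch. The readings, the post and all field names are made up for illustration; they are assumptions, not data from any real system.

```python
# Structured: every record has the same fixed, predictable fields.
# (Hypothetical temperature readings, like rows in a spreadsheet.)
temperature_readings = [
    {"timestamp": "2024-01-01T09:00", "celsius": 3.5},
    {"timestamp": "2024-01-01T10:00", "celsius": 4.1},
]

# Because the format is constrained, a question becomes a one-liner.
average_celsius = sum(r["celsius"] for r in temperature_readings) / len(
    temperature_readings
)

# Unstructured: a free-text post. The metadata around it is structured,
# but the meaning lives in the text itself.
post = {
    "author": "example_user",         # structured metadata (hypothetical)
    "posted_at": "2024-01-01T09:30",  # structured metadata (hypothetical)
    "text": "Chilly this morning, but it warmed up a little by ten.",
}

# No field holds the temperature; it has to be interpreted from the text.
has_temperature_field = "celsius" in post
```

The post clearly says something about the temperature, but no fixed field carries it, which is what makes the content unstructured.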


AI vs IA

AI is artificial intelligence. It’s why you’re here. But maybe you should be here for IA. IA is intelligent automation. It doesn’t require fancy algorithms; it only requires creativity and an understanding of your workflow. Often you are better served by focusing on intelligent automation first, with a view to building up the annotations you will need for your AI systems.

Limitations of AI

With all the hype surrounding AI, one might think that there is nothing it can’t do. But this couldn’t be further from the truth. AI has a lot of limitations, and understanding them can help you or your business make the most of what AI has to offer. Three of the most common limitations are the data, the types of problems it can solve and explainability.

When it comes to machine learning, your model is only as good as your data. At the moment, AI is essentially pattern recognition on steroids. But it starts off really dumb. In order to get it to a point where it’s useful, you need to show your model TONS of examples. Don’t have tons of examples? Your model will likely perform very badly. If you do? Well, that comes with another set of problems. In most cases, data is messy. In order to train a model, your data needs to be organized in a structured way. For example, if you want to train your model to tell the difference between cats and dogs, you usually can’t just show it pictures and expect it to figure it out. You need to label each picture as either a cat or a dog. This kind of labelling to structure data is incredibly time consuming. On top of that, even if you have tons of examples that are labelled and ready to go, if your data is biased (for example, if you have far more cat pictures than dog pictures) your model will be too. But just having your data is not enough.
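
As a rough illustration of the labelling and bias points above, the sketch below stores annotations as (input, label) pairs and measures how lopsided the labels are. The file names and the label_shares helper are hypothetical, and checking label counts is only one simple way to spot one kind of bias.

```python
from collections import Counter

# Hypothetical annotations: each one pairs an input with its expected label.
annotations = [
    ("img_001.jpg", "cat"),
    ("img_002.jpg", "cat"),
    ("img_003.jpg", "cat"),
    ("img_004.jpg", "dog"),
]


def label_shares(pairs):
    """Return the fraction of examples carrying each label."""
    counts = Counter(label for _, label in pairs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = label_shares(annotations)

# A lopsided split like this is a warning sign before any training starts.
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

A model trained on this set would see three cats for every dog, so a check like this is worth running before any training begins.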

AI is also limited in the types of problems it can solve. With AI, machines can learn to recognize patterns in your data. However, they can only do this in a very narrow way, and what they learn doesn’t generalize well. Essentially, if you train your model to detect cats, it won’t be able to detect dogs, and if you train your model to differentiate dogs from cats, it likely won’t be able to differentiate wolves from jaguars. Furthermore, abstract concepts like creativity and emotion are huge challenges for machine learning. Have a problem which requires judging the emotional impact of firing an employee? That’s still something you should be asking a human for feedback on. But even for problems that AI can solve, there is still the issue of how the model is making decisions.

Explainability, often referred to as the “black box” problem, comes from the fact that we often don’t know how our models are making decisions. When a model decides whether a picture shows a cat, is it looking at its ears? Its nose? Its tail? Its size in general? All we see are the weights of pathways in a mathematical model changing. If you’ve made a model that decides whether or not someone should be approved for a mortgage, applicants are likely going to want to know what factors impacted that decision. This becomes difficult to answer.

AI is an incredibly useful tool, and we haven’t even begun to reach the limit of its applications. Even so, we’re a long way from running into the Terminator.


Capture, Curate, Consume

Capture, Curate and Consume are the three stages of an operational AI pipeline. Capture refers to the act of acquiring the information, whether through a third-party API or by tracking user interactions. Curate refers to removing unnecessary information and cleaning noise out of what remains. For example, removing duplicate records from a database and checking that data is stored in the appropriate records are both acts of curation. It is a critical step, since ineffective curation can lead to huge data management challenges later on. Finally, Consume is the act of taking the captured and curated information and transforming it into usable insights by processing it through the AI model of your choice.
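
The three stages can be sketched as three small Python functions. Everything here is an assumption made for illustration: the records are invented, capture stands in for an API call or interaction log, and toy_model is a keyword check standing in for a real model.

```python
def capture():
    """Capture: stand-in for an API call or an interaction log."""
    return [
        {"id": 1, "text": "Great product"},
        {"id": 1, "text": "Great product"},  # duplicate record
        {"id": 2, "text": "   "},            # noise: blank text
        {"id": 3, "text": "Arrived late"},
    ]


def curate(records):
    """Curate: drop blank entries and duplicate ids, trim whitespace."""
    seen = set()
    cleaned = []
    for record in records:
        text = record["text"].strip()
        if not text or record["id"] in seen:
            continue
        seen.add(record["id"])
        cleaned.append({"id": record["id"], "text": text})
    return cleaned


def toy_model(text):
    """Placeholder for a real model: a single keyword check."""
    return "positive" if "great" in text.lower() else "negative"


def consume(records, model):
    """Consume: run each curated record through the model."""
    return [(record["id"], model(record["text"])) for record in records]


insights = consume(curate(capture()), toy_model)
```

Keeping the stages as separate functions means the curation rules can be tightened later without touching how data is captured or which model consumes it.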


Human in the loop

Just as the name suggests, human-in-the-loop machine learning leverages both human and machine intelligence to build the best possible model. By keeping humans directly involved in the training, tuning and testing process, you can better optimize your model. While there are some problems which machines are uniquely positioned to solve, there are still many instances where they fall short. By having a human intervene and correct inaccuracies in the predictions made by a model, you can increase accuracy and deliver higher quality results.
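
One common way to put a human in the loop is to send only the predictions the model is unsure about to a person for review. The sketch below assumes the model reports a confidence score with each prediction; the 0.8 threshold, the file names and the reviewer function are all hypothetical.

```python
def review_predictions(predictions, ask_human, threshold=0.8):
    """Keep confident predictions; route uncertain ones to a human.

    predictions: list of (item, predicted_label, confidence) tuples.
    ask_human: callable taking (item, suggested_label), returning a label.
    """
    final = []
    for item, label, confidence in predictions:
        if confidence < threshold:
            # The model is unsure, so a person makes the final call.
            label = ask_human(item, label)
        final.append((item, label))
    return final


# Hypothetical model output: the second prediction is low confidence.
predictions = [
    ("img_001.jpg", "cat", 0.97),
    ("img_002.jpg", "cat", 0.55),
]


def reviewer(item, suggested_label):
    """Stand-in for a person; here they decide the picture shows a dog."""
    return "dog"


corrected = review_predictions(predictions, reviewer)
```

The corrected labels can then be saved as new annotations, so each human intervention also adds to the training data.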