Ivory Tower Notes: The Problem | Towards Data Science

Have you ever spent months on a Machine Learning project, only to discover you never defined the “correct” problem at the start? If so, or even if not and you are only starting out in the data science or AI field, welcome to my first Ivory Tower Note, where I’ll address this topic.

The term “Ivory Tower” is a metaphor for a situation in which someone is isolated from the practical realities of everyday life. In academia, the term often refers to researchers who engage deeply in theoretical pursuits and remain distant from the realities that practitioners face outside academia.

As a former researcher, I wrote a short series of posts from my old Ivory Tower notes: the notes from before the LLM era.

Scary, I know. I’m writing this to manage expectations and to answer the question, “Why ever did you do things this way?” The answer: “Because no LLM told me how to do otherwise 10+ years ago.”

That’s why my notes contain “legacy” topics such as data mining, machine learning, multi-criteria decision-making, and (sometimes) human interactions, airplanes ✈️ and art.

Still, whenever there is an opportunity, I’ll map my “old” knowledge to generative AI advances and explain how I applied it to datasets beyond the Ivory Tower.

Welcome to post #1…

How every Machine Learning and AI journey starts

It starts with a problem.

For you, this is usually “the” problem because you need to live with it for months or, in the case of research, years.

With “the” problem, I’m addressing the business problem you don’t fully understand or know how to solve at first.

An even worse scenario is when you think you fully understand it and know how to solve it quickly. This then creates only more problems that are again only yours to solve. But more about this in the upcoming sections.

So, what’s “the” problem about?

Causa: It’s mostly about not managing or leveraging resources properly: workforce, equipment, money, or time.

Ratio: It’s usually about generating business value, which can span from improved accuracy, increased productivity, cost savings, and revenue gains to faster response, decision, planning, delivery, or turnaround times.

Veritas: It’s always about finding a solution that relies on, and is hidden somewhere in, the existing dataset.

Or, more than one dataset that someone labelled as “the one”, and that’s waiting for you to solve the problem. Because datasets follow and are created from technical or business process logs, “there has to be a solution lying somewhere within them.”

Ah, if only it were so easy.

Avoiding yet another train of thought, the point is you will need to:

1 - Understand the problem fully,
2 - If not given, find the dataset “behind” it, and
3 - Create a methodology to get to the solution that will generate business value from it.

On this path, you will be tracked and measured, and time will not be on your side to deliver the solution that will solve “the universe equation.”

That’s why you will need to approach the problem methodologically, drill down to smaller problems first, and focus solely on them because they are the root cause of the overall problem.

That’s why it’s good to learn how to…

Think like a Data Scientist.

Returning to the problem itself, let’s imagine that you are a tourist lost somewhere in a big museum, and you want to figure out where you are. What you do next is walk to the nearest info map on the floor, which will show your current location.

At this moment, in front of you, you see something like this:

Process. Image by Author, inspired by Microsoft Learn

The next thing you might tell yourself is, “I want to get to Frida Kahlo’s painting.” (Note: These are the insights you want to get.)

Because your goal is to see this one painting that brought you miles away from your home and now sits two floors below, you head straight to the second floor. Beforehand, you memorized the shortest path to reach your goal. (Note: This is the initial data collection and discovery phase.)

However, along the way, you stumble upon some obstacles: the elevator is shut down for renovation, so you have to use the stairs. The museum paintings were reordered just two days ago, and the info plans didn’t reflect the changes, so the path you had in mind to get to the painting is no longer accurate.

Then you find yourself wandering around the third floor already, asking quietly again, “How do I get out of this labyrinth and get to my painting faster?”

Since you don’t know the answer, you ask the museum staff on the third floor to help you out, and you start collecting new data to get the correct route to your painting. (Note: This is a new data collection and discovery phase.)

Still, once you get to the second floor, you get lost again, but what you do next is start noticing a pattern in how the paintings have been ordered chronologically and thematically to group the artists whose styles overlap, which gives you an indication of where to go to find your painting. (Note: This is a modelling phase overlapped with the enrichment phase from the dataset you collected during school days: your art knowledge.)

Finally, after adapting the pattern analysis and recalling the collected inputs on the museum route, you arrive in front of the painting you had been planning to see since booking your flight a few months ago.

What I have just described is how you approach data science and, nowadays, generative AI problems. You always start with the end goal in mind and ask yourself:

“What is the expected outcome I want or need to get from this?”

Then you start planning from this question backwards. The example above started with requesting holidays, booking flights, arranging accommodation, traveling to a destination, buying museum tickets, wandering around in a museum, and then seeing the painting you’ve been reading about for ages.

Of course, there is more to it, and this process should be approached differently if you need to solve someone else’s problem, which is a bit more complex than locating the painting in the museum.

In this case, you have to…

Ask the “good” questions.

To do that, let’s define what a good question means [1]:

A good data science question must be concrete, tractable, and answerable. Your question works well if it naturally points to a feasible approach for your project. If your question is too vague to suggest what data you need, it won’t effectively guide your work.

Formulating good questions keeps you on track, so you neither get lost in the data needed to reach the specific problem solution nor end up solving the wrong problem.

Going into more detail, good questions will help identify gaps in reasoning, avoid faulty premises, and create alternative scenarios in case things do go south (which almost always happens)👇🏼.

**Image created by Author after analyzing “Chapter 2. Setting goals by asking good questions” from “Think Like a Data Scientist” book [2]**

From the diagram presented above, you understand how good questions, first of all, need to support concrete assumptions. This means they need to be formulated so that your premises are clear and can be tested without mixing up facts with opinions.

Good questions produce answers that move you closer to your goal, whether by confirming hypotheses, providing new insights, or eliminating wrong paths. They are measurable, and with this, they connect to project goals because they are formulated with consideration of what’s possible, valuable, and efficient [2].

Good questions are answerable with available data, considering current data relevance and limitations.

Last but not least, good questions anticipate obstacles. If anything is certain in data science, it is uncertainty, so having backup plans when things don’t work as expected is important for delivering results on your project.

Let’s exemplify this with one use case of an airline company that struggles to increase its fleet availability due to unplanned technical groundings (UTG).

These unexpected maintenance events disrupt flights and cost the company significant money. Because of this, executives decided to react to the problem and call in a data scientist (you) to help them improve aircraft availability.

Now, if this were the first data science task you ever received, you would maybe start an investigation by asking:

“How can we eliminate all unplanned maintenance events?”

You understand how this question is an example of the wrong or “poor” one because:

It’s not realistic: It lumps every possible defect, both small and big, into one impossible goal of “zero operational interruptions”.
It doesn’t hold a measure of success: There’s no concrete metric to show progress, and if you’re not at zero, you’re at “failure.”
It’s not data-driven: The question doesn’t cover which data is recorded before delays occur, or how aircraft unavailability is measured and reported from it.

So, instead of this vague question, you would probably ask a set of targeted questions:

Which aircraft (sub)system is most critical to flight disruptions?
(Concrete, specific, answerable) This question narrows down your scope, focusing on only the one or two specific (sub)systems affecting most delays.
What constitutes “critical downtime” from an operational perspective?
(Valuable, ties to business goals) If the airline (or regulatory body) doesn’t define how many minutes of unscheduled downtime matter for schedule disruptions, you might waste effort solving less urgent issues.
Which data sources capture the root causes, and how can we fuse them?
(Manageable, narrows the scope of the project further) This clarifies which data sources one would need to find the problem solution.
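
To make the first targeted question a bit more tangible, here is a minimal Python sketch that ranks subsystems by the total delay they caused. Everything in it is an assumption for illustration: the log records, the subsystem names, and the `rank_subsystems` helper are invented, and a real grounding log would come from the airline’s own maintenance and operations systems.

```python
from collections import defaultdict

# Hypothetical grounding log: (aircraft subsystem, delay minutes caused).
# These records are invented for illustration only.
GROUNDING_LOG = [
    ("hydraulics", 180),
    ("avionics", 45),
    ("landing gear", 240),
    ("hydraulics", 95),
    ("cabin", 20),
    ("avionics", 30),
]


def rank_subsystems(log):
    """Return (subsystem, total delay minutes) pairs, largest total first."""
    totals = defaultdict(int)
    for subsystem, minutes in log:
        totals[subsystem] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_subsystems(GROUNDING_LOG)
for subsystem, minutes in ranking:
    print(f"{subsystem}: {minutes} min")
```

Even a toy ranking like this shows why the question is answerable: it names the data you need (a subsystem label and a delay measure per event) before any modelling starts.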

With these sharper questions, you will drill down to the real problem:

Not all delays weigh the same in cost or impact. The “correct” data science problem is to predict critical subsystem failures that lead to operationally costly interruptions so maintenance crews can prioritize them.
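
One way to picture “not all delays weigh the same” is to weight each event by an estimated cost and keep only the events above a critical-downtime threshold. The sketch below is not the prediction model itself; it only illustrates how the prioritization target could be defined. The threshold `CRITICAL_DOWNTIME_MIN`, the cost-per-minute figures, and the `costly_interruptions` helper are hypothetical values and names chosen for this example.

```python
# Hypothetical threshold: the airline (or regulator) would have to define it.
CRITICAL_DOWNTIME_MIN = 60

# Hypothetical events: (subsystem, downtime minutes, estimated cost per minute).
EVENTS = [
    ("hydraulics", 180, 100.0),
    ("avionics", 45, 150.0),
    ("landing gear", 240, 120.0),
    ("cabin", 20, 40.0),
    ("hydraulics", 95, 100.0),
]


def costly_interruptions(events, threshold=CRITICAL_DOWNTIME_MIN):
    """Sum estimated cost per subsystem, counting only events at or above the threshold."""
    cost = {}
    for subsystem, minutes, cost_per_min in events:
        if minutes >= threshold:
            cost[subsystem] = cost.get(subsystem, 0.0) + minutes * cost_per_min
    return sorted(cost.items(), key=lambda item: item[1], reverse=True)


priorities = costly_interruptions(EVENTS)
for subsystem, total_cost in priorities:
    print(f"{subsystem}: {total_cost:.0f}")
```

With these made-up numbers, the ordering by cost differs from an ordering by delay minutes alone, which is exactly why the definition of “critical” has to be agreed on with the business before modelling.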

That’s why…

Defining the problem determines every step after.

It’s the foundation upon which your data, modelling, and evaluation phases are built 👇🏼.

**Image created by Author after analyzing and overlapping different images from “Chapter 2. Setting goals by asking good questions, Think Like a Data Scientist” book [2]**

It means you are clarifying the project’s objectives, constraints, and scope; you need to articulate the ultimate goal first and, apart from asking “What is the expected outcome I want or need to get from this?”, also ask:

What would success look like and how can we measure it?

From there, drill down to (possible) next-level questions that you (I) have learned from the Ivory Tower days:
- History questions: “Has anyone tried to solve this before? What happened? What is still missing?”
- Context questions: “Who is affected by this problem and how? How are they partially resolving it now? Which sources, methods, and tools are they using now, and can they still be reused in the new models?”
- Impact questions: “What happens if we don’t solve this? What changes if we do? Is there a value we can create by default? How much will this approach cost?”
- Assumption questions: “What are we taking for granted that might not be true (especially when it comes to data and stakeholders’ ideas)?”
- ….

Then, do this in a loop and always “ask, ask again, and don’t stop asking” questions so you can drill down and understand which data and analysis are needed and what the underlying problem is.

This is evergreen knowledge you can apply nowadays, too, when deciding if your problem is of a predictive or generative nature.

(More about this in another note, where I’ll explain how problematic it is to try to solve a problem with models that have never seen, or have never been trained on, similar problems before.)

Now, going back down memory lane…

I want to add one important note: I’ve learned from late nights in the Ivory Tower that no amount of data or data science knowledge can save you if you’re solving the wrong problem and trying to get the solution (answer) from a question that was simply wrong and vague.

When you have a problem at hand, don’t rush into assumptions or building the models without understanding what you need to do (Festina lente).

In addition, prepare yourself for unexpected situations and do a proper investigation with your stakeholders and domain experts because their patience will be limited, too.

With this, I want to say that the “real art” of being successful in data projects is knowing precisely what the problem is, figuring out if it can be solved in the first place, and then coming up with the “how” part.

You get there by learning to ask good questions.

If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.

Thank you for reading, and stay tuned for the next Ivory Tower note.

If you found this post useful, feel free to share it with your network. 👏

Connect for more stories on Medium ✍️ and LinkedIn 🖇️.

References:

[1] DS4Humans, Backwards Design, accessed: April 5th 2025, https://ds4humans.com/40_in_practice/05_backwards_design.html#defining-a-good-question

[2] Godsey, B. (2017), Think Like a Data Scientist: Tackle the data science process step-by-step, Manning Publications.
