Apply René Descartes’ principles of knowledge to your data science projects

Cogito, ergo sum

Descartes in his “Discourse on the Method”

Portrait of René Descartes (1596–1650)

Most people have probably heard this classic philosophical statement by René Descartes, the great French intellectual whom many consider the founding father of modern philosophy. But have you heard of his four principles of knowledge? Philosophy in general, and the theory of knowledge in particular, is sometimes unfairly dismissed as “irrelevant” or “too abstract”. However, if you are a data scientist, stay tuned: you may find these Cartesian principles helpful in your data science projects.

By the end of this article you will:

1. Know how and why René Descartes built his epistemology from scratch (if the term ‘epistemology’ is new to you, don’t worry, we’ll explain it in a second)

2. Be able to explain his four cardinal principles of the theory of knowledge

3. Be able to outline the key properties of a data science project and see how to apply Descartes’ principles to those projects

Descartes wanted to avoid falling into the trap of ignorance

Descartes grew up within the scholastic tradition and was well educated in logic, mathematics and philosophy. At the time, the Aristotelian worldview dominated philosophy, and I suspect our philosopher may have been a bit wary of contemporary academics’ self-confidence regarding “the truth”. Seeking certainty, Descartes turned his attention to logic and mathematics, but concluded in his Discourse on the Method that these two subjects would not yield him any new knowledge about the world. This is because logic is concerned with the relations between ideas: it helps us check the coherence of complex statements composed from elementary notions. You set out from premises and arrive at conclusions, and logic helps you make sure you arrived safely. But what if the premises are false?

Descartes realized that the principles of logic would be beneficial, but there was no escaping the fact that premises he deemed true could have been false. This means the principles of logic alone wouldn’t make him immune to ignorance. So he came up with four alternative principles that he believed would lead him to certainty.

Principles of Cartesian epistemology

Epistemology is a fancy term for the theory of knowledge. Put very simply, such a philosophical theory is supposed to answer three key questions: what is it possible to know, why is that so, and how can we know it? Descartes put forward principles that “will get you there” as far as epistemology is concerned.

The first methodological principle of Cartesian epistemology could be called “doubt and search for what is obvious”. In practice, it means that you systematically go through each and every “piece of information” that has managed to enter your mind and ask whether it is clear and distinct. Notice that undertaking such a procedure entails doubting all your notions. Once you conclude that a given idea is clear and distinct, you allow it into your corpus of “true statements”.

But how can we determine whether something is clear and distinct? There doesn’t seem to be an obvious answer to this, but the key point is to keep asking questions about the assumptions and composition of our beliefs until it is no longer reasonable to doubt the idea or hypothesis concerned.

The second principle is connected to the first and can be called “break things down and analyse”. Sometimes we face issues, concepts or problems that are too complex for us to understand and therefore to make use of. Descartes says that we ought to divide them into their constituent parts. If we have done this accurately, we should end up with a list of simple ideas we can actually work with. Notice that we can rely on this “divide and conquer” principle to follow through on the first principle. For example, when considering the statement “unhappy customers cancel their subscription”, upon reflection we can say that it deals with two related yet different events: client dissatisfaction and the churn event. We can then move on and figure out the factors that lead to dissatisfaction, and the process of cancellation.
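To make the churn example a little more tangible, here is a minimal sketch of treating dissatisfaction and churn as two separate things to measure. Everything in it is an assumption made for illustration: the `Customer` record, the 1 to 5 satisfaction score, the cut-off of 3 and the six made-up customers are hypothetical and do not come from a real dataset.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    satisfaction: int  # hypothetical survey score from 1 (unhappy) to 5 (happy)
    churned: bool      # True if the customer cancelled the subscription


def churn_rate(customers):
    """Share of customers in the group who cancelled (0.0 for an empty group)."""
    if not customers:
        return 0.0
    return sum(c.churned for c in customers) / len(customers)


def split_by_satisfaction(customers, threshold=3):
    """Separate the 'dissatisfaction' event from the 'churn' event."""
    unhappy = [c for c in customers if c.satisfaction < threshold]
    happy = [c for c in customers if c.satisfaction >= threshold]
    return unhappy, happy


# Made-up records, purely for illustration.
customers = [
    Customer(1, True), Customer(2, True), Customer(2, False),
    Customer(4, False), Customer(5, False), Customer(4, True),
]

unhappy, happy = split_by_satisfaction(customers)
unhappy_churn = churn_rate(unhappy)  # churn among dissatisfied customers
happy_churn = churn_rate(happy)      # churn among satisfied customers

print(f"unhappy: {unhappy_churn:.2f}, happy: {happy_churn:.2f}")
```

Once the two events are separated like this, the original statement stops being a premise we take for granted and becomes something we can check: do dissatisfied customers actually churn more often than satisfied ones, and by how much?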

The third principle is “synthesis and systematisation”, and it is essentially the reverse of what we have considered so far. Once we have undertaken the analysis and arrived at the simplified ideas, we can begin building from the ground up. Lego is a helpful analogy here: your simple ideas or simplified hypotheses act as the building blocks for your grand building, which is either a theory or a working system.

Finally, Descartes advises us to bear in mind the re-evaluation principle. By periodically applying the above three principles, especially the first two, we ensure that our knowledge is re-evaluated and kept fresh. This is important in empirical domains: since change is the only constant, what was true in the past may no longer be the case.

No man ever steps in the same river twice, for it’s not the same river and he’s not the same man

Heraclitus

Hopefully, by applying Cartesian epistemology we will be able to avoid falsehood and remain within the confines of what is reliable. This outcome is what makes the principles laid down by Descartes relevant to pretty much any endeavour we may consider. I also think these principles are attractive in data science; to understand why, we need to review the structure and development of data science projects.

You can achieve better outcomes in data science by applying the Cartesian method

From a business perspective, there are different approaches to extracting value in data science. One resembles being in a dark room and trying to find your way by touching the surrounding objects: gradually you build a picture, and based on that you decide what products and solutions to offer, design and implement. Alternatively, you can start with a fairly well-developed problem or objective. In the latter case, developing your data science project can be split into the following steps:

1. Problem definition

2. Hypothesis generation

3. Exploratory data analysis

4. Model training and validation

5. Model production

6. Model re-training or review

With such a pipeline in place, we can make use of Descartes’ epistemological method at pretty much every stage. We have to remember that data science is not just about importing a Python library and running a fit command. As much as possible, we need to provide a coherent, tangible, working end-to-end solution.

To achieve that, we first have to engage with stakeholders to gather information about the problem to be solved. Here we can benefit greatly from the “doubt and search for what is obvious” and “break things down and analyse” principles. Putting the solution aside, it is sometimes the case that clients of data science services have only a vague understanding of the problem, and a vague understanding may not be enough to come up with a useful answer. So we have to doubt the assumptions, ask many questions (especially why-type questions) and look for what is obvious: the things we deem clear and distinct, as Descartes would say.

Moreover, although problems are occasionally simple, we often have to deal with complex challenges. In these situations we can reduce the complexity and deal with the individual elements that make up the problem. Here we may have to ask for the right data, clean the data or create a new type of data. Any preliminary hypothesis is formed, tested or refined through data exploration.

We can then go on to apply the “synthesis and systematisation” principle by combining the pieces to form a story and to train and validate a model. If the model passes the accuracy benchmarks, we have to build a system in which it can be deployed and used. Last but not least, as new data is gathered, the model may have to be re-evaluated and re-trained.
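The last two steps, checking a model against an accuracy benchmark and re-evaluating it when new data arrives, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the one-feature threshold “model”, the `BENCHMARK` value of 0.8 and all the data points are hypothetical stand-ins for whatever model, metric and data a real project would use.

```python
BENCHMARK = 0.8  # hypothetical accuracy the stakeholders agreed on


def accuracy(predict, rows):
    """Fraction of (feature, label) rows that the predict function gets right."""
    correct = sum(predict(x) == y for x, y in rows)
    return correct / len(rows)


def fit_threshold(rows):
    """Toy 'training': pick the cut-off with the best accuracy on the rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(lambda x: x >= t, rows))


# Made-up training and hold-out data, purely for illustration.
train = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
holdout = [(2, False), (4, False), (7, True), (9, True)]

threshold = fit_threshold(train)


def model(x):
    """Predict True when the feature reaches the learned threshold."""
    return x >= threshold


# Validation: only a model that passes the benchmark moves to production.
holdout_accuracy = accuracy(model, holdout)
ready_for_production = holdout_accuracy >= BENCHMARK

# Re-evaluation: later data in which the relationship has shifted.
new_data = [(4, True), (5, True), (6, True), (2, False)]
new_accuracy = accuracy(model, new_data)
needs_retraining = new_accuracy < BENCHMARK

print(ready_for_production, needs_retraining)
```

The point of the sketch is the shape of the loop rather than the model itself: the same benchmark that lets a model into production is applied again to fresh data, so a model that was once “clear and distinct” is doubted again when the world changes.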

And after all that you move on to do your next data science project!
