
Data modeling culture versus algorithmic modeling culture

A critical choice for any data scientist

Javier Marin
Oct 14, 2022

Leo Breiman wrote an interesting article about the two cultures in the use of statistical modeling to reach conclusions from data (Breiman, 2001): the data modeling culture and the algorithmic modeling culture. The differences between these two cultures have recently come to the fore again in a discussion between Noam Chomsky (Katz, 2017) and Google’s director of AI research, Peter Norvig (Norvig, 2022). The two cultures can be summarized as follows:

  • The data modeling culture treats nature as a black box whose inner workings follow a relatively simple model, one that can be assumed in advance, that maps input variables to output variables.
  • The algorithmic modeling culture treats the inside of the box as unknown. Its approach is only to identify a function able to predict the output from a given input.

The key difference between these two approaches is that the conclusions drawn from data modeling are about the model, not about the nature of the phenomenon.

Usually, simple parametric models (from data modeling culture) imposed on data generated by complex systems result in a loss of accuracy and information as compared to algorithmic models.

Leo Breiman (2001)
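To make this claim concrete, here is a minimal sketch in Python, not taken from Breiman's paper. It assumes a hypothetical nonlinear process (`true_process`), a straight-line fit standing in for the data modeling culture, and a hand-written k-nearest-neighbors average standing in for the algorithmic culture. The function names, the sample sizes, the noise level, and the choice of k are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_process(x):
    # Hypothetical "nature": a nonlinear mechanism unknown to both modelers.
    return np.sin(3.0 * x)

# Simulated observations (an assumption for illustration, not real data).
x_train = rng.uniform(-2.0, 2.0, size=400)
y_train = true_process(x_train) + rng.normal(0.0, 0.1, size=400)
x_test = rng.uniform(-2.0, 2.0, size=200)
y_test = true_process(x_test) + rng.normal(0.0, 0.1, size=200)

# Data modeling culture: assume y = slope * x + intercept + noise,
# estimate the two parameters, and draw conclusions from them.
slope, intercept = np.polyfit(x_train, y_train, deg=1)
linear_pred = slope * x_test + intercept

# Algorithmic modeling culture: make no assumption about the mechanism,
# just predict by averaging the k nearest training outputs.
def knn_predict(x_query, x_ref, y_ref, k=10):
    distances = np.abs(x_query[:, None] - x_ref[None, :])
    nearest = np.argsort(distances, axis=1)[:, :k]
    return y_ref[nearest].mean(axis=1)

knn_pred = knn_predict(x_test, x_train, y_train)

def mse(y_true, y_pred):
    # Mean squared error on held-out data.
    return float(np.mean((y_true - y_pred) ** 2))

linear_mse = mse(y_test, linear_pred)
knn_mse = mse(y_test, knn_pred)
print(f"linear model test MSE: {linear_mse:.3f}")
print(f"k-NN test MSE:         {knn_mse:.3f}")
```

On data like this, the straight line yields a readable slope but a much larger test error, while the nearest-neighbor predictor tracks the data closely without saying anything about the mechanism that produced it. That is the trade-off the two cultures disagree about.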

Breiman argues that the data modeling culture has some limitations, such as its (sometimes) low accuracy, its inability to present a clear picture of nature’s mechanism when the data are complex, and the reasonable doubt about whether the chosen statistical model is the one that best reflects the nature of the phenomenon. Chomsky objects to the algorithmic approach because the function it produces is difficult to understand, and a model that cannot be understood makes no sense to him. He would rather assume that the model used to explain the data must be relatively simple. Norvig replies that reality is messier and “we shouldn’t accept a theoretical framework that places a priority on making the model simple over making it accurately reflect reality.”

