<p>When talking with people about BI, the question of what exactly Big Data is often comes up. It seems to be generally accepted that big data is a vague term, while it is actually quite clear what big data is about. What we call <em>Big Data</em> is about this:</p>
<ol>
<li>Companies that started to handle massive amounts of data, such as Google, Amazon, Yahoo, Facebook and Twitter, realized that I/O (input/output, the data flow), not the available processing power, is the bottleneck in analytical data processing. They realized that the processing power has to be brought to the data instead of the data to the processing power, and they started to use clusters of so-called commodity hardware instead of ever bigger and more powerful servers. Massive racks of standard “pizza boxes” are used, and the hardware involved is still getting leaner with ARM- and Celeron-based green computers.</li>
<li>These companies, whose business model is creating value from data rather than from writing and selling software, started to create and publish very capable open source data-storage and analytical software. Unlike traditional software publishers, they do not make money from the use of their software by others. It is in their interest that as many people as possible contribute to the software, and thus that it is used as widely as possible. It is also in their interest to adhere to open standards, or create them, and to publish their application programming interfaces. This aspect should not be overlooked when comparing with classical commercial software: it is the reason that the “big data ecosystem” is growing very fast. Meanwhile, these new kids on the block <em>(Amazon, Google, etc.)</em> apply that fast-growing software themselves to make money.</li>
<li>This hardware architecture and software allow organizations to store and analyze the complete relevant history at the deepest relevant level of detail on all relevant subjects, with no sampling involved. In a certain sense, this complete model of activities and concerns marks the “<em>end of why</em>” for most business purposes.</li>
</ol>
Big Data: What's in a name, by Henk Scholten
19-08-2015 13:31
