Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data, and everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within the existing systems (data warehousing systems). Taking a look at the larger picture into which Big Data fits gives the data scientist the necessary context for how the pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until the data gathered can be put into an existing framework or architecture, it can't be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.
Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, W.H. Inmon and Daniel Linstedt define the importance of data architecture and how it can be used effectively to harness Big Data within existing systems. You'll be able to:
- Turn textual information into a form that can be analyzed by standard tools.
- Make the connection between analytics and Big Data
- Understand how Big Data fits within an existing systems environment
- Conduct analytics on repetitive and non-repetitive data
The book also:
- Discusses the often-overlooked value in Big Data, non-repetitive data, and why there is significant business value in using it
- Shows how to turn textual information into a form that can be analyzed by standard tools
- Explains how Big Data fits within an existing systems environment
- Presents new opportunities that are afforded by the advent of Big Data
- Demystifies the murky waters of repetitive and non-repetitive data in Big Data
If you want a copy of the book, you can order it at Amazon.com.
The good thing about this book is that it covers data from a broad perspective: structured/unstructured, repetitive/non-repetitive, Big Data, Corporate Data. The authors describe the different types of data from their own point of view, which suits the book's purpose as an introduction to data and data architecture today. They do this in such a way that, even if you are not familiar with data and data architecture, the material becomes clear and you will have a broad knowledge of the field by the time you reach the end.
What is also nice is that each chapter has a very strict and clear structure. It starts with an Abstract, followed by a page with a list of Keywords and some initial context, all of which are explained throughout the chapter. This is a very modular approach, which made it a pleasant read for me. You do not have to flip back pages or chapters to catch up.
The downside of the modular approach, combined with the fact that there are many different ways to look at data, is that some subjects are repeated across chapters, luckily in a consistent way.
Personally, I love the way the book distinguishes between the different types of data. There is Corporate Data, which essentially is all data created by an organization or of interest to an organization. This is then divided into structured data vs. Big Data/unstructured data. And finally there is the Great Divide between repetitive unstructured data and non-repetitive unstructured data. The distinction is based on how the data is managed and handled and on its business relevance. All aspects of data handling are touched on in the book, with examples to help the reader understand what the authors mean.
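As a rough sketch of how I read this taxonomy, the classification can be written down in a few lines of Python. The names `DataKind` and `classify` are my own illustration and do not come from the book, and the two yes/no questions are a simplification of the criteria the authors discuss.

```python
from enum import Enum


class DataKind(Enum):
    """Branches of Corporate Data as summarized in this review (names are hypothetical)."""
    STRUCTURED = "structured"
    REPETITIVE_UNSTRUCTURED = "repetitive unstructured"
    NON_REPETITIVE_UNSTRUCTURED = "non-repetitive unstructured"


def classify(is_structured: bool, is_repetitive: bool) -> DataKind:
    """Place a data source in the taxonomy.

    Assumption: two yes/no answers are enough to pick a branch. The
    repetitive flag only matters once the data is known to be unstructured.
    """
    if is_structured:
        return DataKind.STRUCTURED
    if is_repetitive:
        # The "Great Divide" splits unstructured data on this question.
        return DataKind.REPETITIVE_UNSTRUCTURED
    return DataKind.NON_REPETITIVE_UNSTRUCTURED


if __name__ == "__main__":
    # Illustrative inputs only; real sources need judgment, not two flags.
    print(classify(is_structured=True, is_repetitive=False).value)
    print(classify(is_structured=False, is_repetitive=True).value)
    print(classify(is_structured=False, is_repetitive=False).value)
```

The point of the sketch is only that the split is hierarchical: the repetitive/non-repetitive question is asked only on the unstructured side.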
The Data Vault / DV2.0 chapter was clearly written by another author. After reading this part, the reader has a good understanding of Data Vault. There are, however, some parts that seem a bit far-fetched for inclusion in this book.
The pictures in this book are very simple and are in fact standard Microsoft PowerPoint pictures. They are, however, sufficient. I was reading a digital edition, and the pictures do not scale well on e-ink.
The only thing I could not find support for is the claim that this book is written for the Data Scientist. It is really a book for someone who wants to know about data and all the aspects around data and data architecture. It does not have the depth to justify that claim. It is, however, broad enough!
Overall, I can recommend this book to anyone who is new to the area of data and wants to know more. I will also note that if you are more experienced in the DWBI area, this is still a nice book to read.