Book Share: "Data Asset Management Core Technologies and Applications"

"Data Asset Management Core Technologies and Applications" is a book published by Tsinghua University Press. Its main features are as follows:

1. Relying on big data technology, it offers an exclusive look at the underlying technical implementation of data lineage.

2. It presents a detailed knowledge system and the core technologies of data asset management.

3. It applies metadata management and data modeling techniques to fully realize the potential and value of data assets.

4. The whole book analyzes the core technologies and applications of big data asset management along several dimensions: metadata, data lineage, data quality, data services, data monitoring, data modeling, data architecture, and others.

5. The book comes with supporting PPT courseware and a Q&A service from the author via WeChat.

The core chapters are introduced as follows. Relying on big data technology, the book offers an exclusive look at the underlying technical implementation of data lineage.

Typically, data lineage can come from three sources: the data source itself, the data processing tasks, and the orchestration system that schedules those tasks.

The data source itself: Hive, for example, supports data processing through HQL, so lineage can be obtained by analyzing the processing statements that Hive itself executes.
Data processing tasks: whether a task is real-time or offline, it involves data processing logic. At the implementation level, whether it is a Hadoop MapReduce task, a Spark task, or a Flink task, its essence is to transform data, and wherever data is transformed, the data lineage may change.
Data task orchestration system: as shown in Figure 3-1-1, when tasks are orchestrated, many different task nodes may be connected in series according to their dependencies. The data output of one task node becomes the data input of the next, so data is transformed along the chain and lineage exists there as well.
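
To make the orchestration case concrete, here is a minimal Python sketch, not the book's implementation. It assumes each task in a hypothetical schedule declares the tables it reads and writes; the task names, table names, and the helper functions `build_lineage` and `trace_upstream` are all invented for illustration. Under that assumption, table-level lineage follows directly from chaining each task's outputs to its inputs.

```python
from collections import defaultdict

# Hypothetical task definitions (illustrative only): each task declares
# the tables it reads (inputs) and the tables it writes (outputs).
TASKS = [
    {"name": "ingest_orders", "inputs": ["mysql.orders"], "outputs": ["ods.orders"]},
    {"name": "clean_orders", "inputs": ["ods.orders"], "outputs": ["dwd.orders_clean"]},
    {"name": "daily_report", "inputs": ["dwd.orders_clean", "dim.users"], "outputs": ["ads.daily_sales"]},
]


def build_lineage(tasks):
    """Map each output table to the set of tables it is directly derived from."""
    upstream = defaultdict(set)
    for task in tasks:
        for out in task["outputs"]:
            upstream[out].update(task["inputs"])
    return upstream


def trace_upstream(table, upstream):
    """Return every table that `table` depends on, directly or indirectly."""
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        # .get() avoids creating empty entries for source tables
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


lineage = build_lineage(TASKS)
print(sorted(trace_upstream("ads.daily_sales", lineage)))
```

A real scheduler rarely exposes inputs and outputs this cleanly. In practice they usually have to be recovered from the task's SQL or execution plan, which is why the per-engine techniques listed further on matter.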

The book highlights the underlying technical implementation of data lineage, including:

Getting data lineage from Hive
Getting data lineage from Spark execution plans
Getting data lineage from Spark SQL statements
Getting data lineage from Flink
Getting data lineage from the scheduling system of data tasks