Core Technologies and Applications for Data Asset Management is a book published by Tsinghua University Press. It is divided into 10 chapters. Chapter 1 introduces data assets: the basic concepts related to them and how they have developed. Chapters 2 to 8 cover the core technologies of data asset management in the era of big data, including metadata collection and storage, data lineage, data quality, data monitoring and alerting, data services, data permissions and security, and data asset management architecture. Chapters 9 and 10 turn to application practice, including how to manage metadata to realize more of the potential of data assets, and how to model data to mine more of the value in the data.
Book description: Core Technologies and Applications for Data Asset Management
Today, I'm mainly going to share the first chapter with you:
Chapter 1 is titled Recognizing Data Assets and is divided into a total of eight subsections.
1.1 Basic introduction to data assets
Data assets are typically data sets that can be analyzed to reveal value, support business decision making, optimize business processes, predict future trends in an industry, or generate greater economic value. The data may be generated by the enterprise itself or obtained externally (for example, from social media, third-party data providers, or web crawlers), and it may be structured, semi-structured, or unstructured.
Key characteristics of data assets: availability, accessibility, security, reliability and completeness
Availability: A data asset must be usable. If it cannot be used, it cannot deliver its core value. Availability depends on data quality, data monitoring and many other key elements.
Accessibility: A data asset must be accessible to the users of the data. Data that cannot be accessed has no realizable value, because only accessible data can go on to unlock more value.
Security: A data asset must be kept secure, so that it is not leaked, lost, or tampered with by attackers.
Reliability: A data asset must be reliable, otherwise it cannot support the decisions and judgments of the enterprise. Decisions based on unreliable data are themselves unreliable and can bring great loss to the enterprise.
Completeness: The data in a data asset must be complete. Incomplete data yields incomplete information and cannot properly support data analysis, data-driven decision making, and so on.
There are several common types of data assets:
Unstructured Data: Unstructured data does not have a predefined format or organization, making it more difficult to process and analyze. This type of data includes text documents, PDF files, emails, videos, images and audio files. Although more complex to process, unstructured data often provides richer information and insights, and is particularly valuable for areas such as machine learning and natural language processing.
Structured data: This type of data usually exists in a predefined data model, so it is clearly formatted and easy to search and organize. Structured data is usually stored in relational databases, such as SQL databases, which support complex queries, reporting, and analysis. For example, customer information, sales records, inventory data, and financial transactions can be stored in a structured form. Usually they exist in tabular form, with each column representing a data field and each row representing a data record.
Semi-structured data: Semi-structured data falls between structured and unstructured data. It may not conform to a strict database model, but it contains tags or other markup that separate semantic elements and make the hierarchy of elements recognizable. XML and JSON are typical examples of semi-structured data, and are widely used for data exchange on the web.
Real-time data: Real-time data is data that needs to be processed immediately for quick response or decision making. This type of data is very common in financial transactions, web analytics, Internet of Things (IoT) device monitoring, and online advertising. Real-time data processing typically requires a high level of technical competence to quickly capture, analyze, and respond to data streams.
Time Series Data: Time series data is data collected in chronological order. It is often used to analyze trends and cyclical or seasonal variations. Stock prices, weather records and monitoring data are examples of time series data.
Geospatial data: Geospatial data contains information about geographic location, which is critical in planning, logistics and location analysis. For example, map data, satellite imagery and GPS tracking data fall into this category.
Metadata: Metadata is data that describes other data, which can include file sizes, storage paths, creation dates, author information, and more. Metadata helps to organize, manage, and retrieve data, and is often an integral core component of data management, data warehouses, and data lakes.
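As a minimal sketch of the metadata idea above, the following Python snippet builds a small metadata record (name, storage path, size, last-modified time, format) for a single file. The function name collect_file_metadata and the record's field names are hypothetical choices for illustration, not something taken from the book; a real metadata system would typically collect far more than this.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def collect_file_metadata(path):
    """Return a small metadata record describing one file (illustrative only)."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "storage_path": str(p.resolve()),
        "size_bytes": info.st_size,
        # st_mtime is the last-modified time; a true creation time is
        # platform-dependent, so it is left out of this sketch.
        "modified_at": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "format": p.suffix.lstrip(".") or "unknown",
    }


# Demo on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "orders.csv"
    sample.write_bytes(b"order_id,amount\n1,9.50\n")
    record = collect_file_metadata(sample)
    print(record["name"], record["size_bytes"], record["format"])
```

Records like this describe the data without containing it, which is what lets a catalog answer "what data do we have and where is it" without opening every file.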
1.2 Classification of data assets
The ways in which data assets are categorized typically include the following:
Classification based on data sensitivity: Data can be divided into levels according to its sensitivity, such as public data, internal-use data, sensitive data, private data, and top secret data. This classification helps businesses or organizations apply different protection measures to different levels of data to ensure data security and privacy.
Classification based on data source: Data can be divided into categories according to its source, such as membership data, merchandise data, business data, transactional data, third-party data, and so on. This classification helps a business or organization understand where its data comes from and what it is used for, so that it can make better use of the data.
Classification according to data usage: Data can be divided into categories according to how it is used, such as analytical data, decision-making data, sales data, and risk control data. This classification helps companies or organizations understand the application scenarios and purposes of their data, so that they can get more value from using it.
Classification based on data format: Data can be divided into categories according to its format and type, such as structured data, semi-structured data, unstructured data, text data, image data, audio data, etc. This classification helps the business or organization understand the structure and characteristics of the data so that it can be better processed and preserved.
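The sensitivity-based classification above can be sketched in a few lines of Python. The five levels come from the text; their numeric ordering, the PROTECTION policy table, and the helper names protection_for and can_read are assumptions made purely for illustration, and an actual organization would define its own levels and measures.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Sensitivity levels named in the text; the numeric order is an assumption."""
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3
    PRIVATE = 4
    TOP_SECRET = 5


# Hypothetical protection policy: which measures apply at each level.
PROTECTION = {
    Sensitivity.PUBLIC: {"encrypt_at_rest": False, "mask_in_reports": False},
    Sensitivity.INTERNAL: {"encrypt_at_rest": False, "mask_in_reports": False},
    Sensitivity.SENSITIVE: {"encrypt_at_rest": True, "mask_in_reports": True},
    Sensitivity.PRIVATE: {"encrypt_at_rest": True, "mask_in_reports": True},
    Sensitivity.TOP_SECRET: {"encrypt_at_rest": True, "mask_in_reports": True},
}


def protection_for(level):
    """Look up the protection measures for a sensitivity level."""
    return PROTECTION[level]


def can_read(user_clearance, data_level):
    """A user may read data at or below their own clearance level."""
    return user_clearance >= data_level


print(protection_for(Sensitivity.SENSITIVE))
print(can_read(Sensitivity.INTERNAL, Sensitivity.TOP_SECRET))
```

Attaching a level to each data set is what makes it possible to apply different protection measures per level rather than one rule for everything.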
1.3 Valuation of data assets
1. Valuing data assets through costs
Valuing data assets by cost is an approach often used in the data domain. The main idea is to estimate the value of a data asset from the costs of acquiring, processing, and storing the data, and of maintaining and upgrading it later.
2. Valuing data assets through earnings
Valuing a data asset by earnings means estimating how much value the asset can generate, based on how it has been applied and used in the past and on its expected future application scenarios.
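For the cost-based approach, a deliberately simplified sketch is to add up the cost categories named above. The class name DataAssetCosts and all figures below are hypothetical, and real valuations usually apply further adjustments that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class DataAssetCosts:
    """Cost categories from the text; all amounts are in one currency unit."""
    acquisition: float
    processing: float
    storage: float
    maintenance: float  # later maintenance and upgrades

    def total(self) -> float:
        """Simplified cost-based value: the plain sum of all cost categories."""
        return self.acquisition + self.processing + self.storage + self.maintenance


# Hypothetical figures, for illustration only.
costs = DataAssetCosts(
    acquisition=12000.0,
    processing=8000.0,
    storage=3000.0,
    maintenance=2000.0,
)
print(costs.total())  # 25000.0
```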
1.4 Quality of data assets
Ensuring high data quality is one of the core aspects of data asset management. An enterprise or organization manages its data because it needs to use the data or tap into its greater value, and to ensure that the data meets those needs it must manage data quality well. Poor data quality is a costly drain on any business or organization. Low-quality data usually incurs unnecessary costs such as:
Data often requires constant rework and correction
Low-quality data leads to wrong decisions in the business or organization, which can result in huge economic losses
Low-quality data is used less, which prevents the data from delivering its application value
Corresponding benefits of high quality data include:
Improve the customer experience
Boost productivity
Reduce the uncontrollable risks created by low-quality data
Gain greater access and opportunity through high-quality data
Greater competitive advantage from insights into customers, products, processes and opportunities
1.5 Storage of data assets
From the traditional IT era to today's Internet and big data era, technology has developed continuously and rapidly, and the storage of data assets has changed radically along with it. The history of data asset storage can be divided into the following main stages:
The age of file storage:
The age of common database storage:
The age of data warehouse storage:
The age of data lake storage:
The age of the lakehouse (integrated data lake and data warehouse):
1.6 Management of data assets
Data asset management is a complex process involving the identification, classification, storage, protection and use of data. It requires knowing where the data comes from, where it is stored, its quality, the compliance requirements that apply, and how to maximize its use. Data asset management also has to take the data lifecycle into account. The data lifecycle typically includes the creation, storage, use, sharing, archiving and destruction of data. At each stage, appropriate management measures are needed to protect the value of the data and ensure its quality and compliance.
Typically, the management of data assets includes:
Metadata management: As mentioned earlier, metadata is the data that describes other data, and it is the core of data asset management. Without metadata management, users do not know what the data is, what information it contains, or where the data they need is located. Only with good metadata management can data be retrieved easily and users find the data they need quickly.
Data Acquisition Management: This usually refers to managing how data is acquired from data sources. For example, when there are many data sources, each source from which data is collected needs to be managed.
Data Processing Management: After data is acquired from its source, it usually needs some processing, such as data formatting, data compression, and handling of outliers.
Master Data Management: This is the management of key data about the entities of the core business, which may differ between enterprises or environments. Master data management can further increase the value of data and improve how quickly data responds to business needs.
Data lineage management: This is the management of the dependencies between data. Through data lineage management, data users can learn where the data comes from, what processing it has undergone, and so on.
Data Quality Management: This is the management of data quality by defining data quality rules that measure the quality of data. Data quality is directly related to the value of data.
Data monitoring management: This means monitoring data pipelines, data tasks, data services, data processing resources and other components, and raising alerts promptly when problems are found, so that data operations staff or managers can deal with them in time.
Data service management: Data services are one of the most important ways in which data assets are used and accessed externally, and data can only show its value when it is made accessible. Data service management is the management of the services that expose data for external use.
Data Permissions and Security Management: This is the management that protects data from unauthorized use and misuse throughout its entire life cycle, so that the security and privacy of the data are not violated.
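The data quality management item in the list above describes measuring quality through defined rules. A minimal sketch of that idea follows, assuming each rule is a (name, metric function, threshold) triple. The rule names, thresholds, and sample records are all hypothetical.

```python
def completeness(records, field):
    """Share of records whose field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def share_valid(records, predicate):
    """Share of records for which the predicate holds."""
    if not records:
        return 0.0
    return sum(1 for r in records if predicate(r)) / len(records)


# Hypothetical rules: (rule name, metric over all records, minimum score).
RULES = [
    ("email_completeness", lambda rs: completeness(rs, "email"), 0.9),
    ("amount_non_negative", lambda rs: share_valid(rs, lambda r: r["amount"] >= 0), 0.7),
]


def evaluate(records, rules):
    """Score every rule and record whether it met its threshold."""
    results = {}
    for name, metric, threshold in rules:
        score = metric(records)
        results[name] = {"score": score, "passed": score >= threshold}
    return results


# Made-up sample data: one empty email, one negative amount.
records = [
    {"id": 1, "email": "a@example.com", "amount": 10.0},
    {"id": 2, "email": "", "amount": 5.0},
    {"id": 3, "email": "c@example.com", "amount": -1.0},
    {"id": 4, "email": "d@example.com", "amount": 7.5},
]
report = evaluate(records, RULES)
for rule_name, outcome in report.items():
    print(rule_name, outcome)
```

Here the email rule scores 0.75 and fails its 0.9 threshold, while the amount rule scores 0.75 and passes its 0.7 threshold. In practice a failed rule would feed the monitoring and alerting described in the same list.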
Typically, data assets are managed in the following ways:
Strengthen data governance: Data governance is generally the core of good data asset management. It means continuously establishing and improving data governance processes and standards, clarifying data management responsibilities and the division of labor, and classifying and labeling data so that it is easier to find.
Establish a sound data quality system: Data quality directly determines whether data can play its intended role. A sound data quality system continuously improves the quality of the data, so that the data can support the decisions of the enterprise or organization more accurately.
Establish a sound data permissions and security management system: Data security is the foundation of data asset management as a whole. A system that covers data backup and recovery, data encryption and decryption, data permission control, and so on can better guarantee the safety and reliability of data.
Uncover more value of data through data analysis: Data analysis is the core application of data assets.
1.7 Informatization of data asset management
Informatization of data asset management usually refers to managing and maintaining the data assets of an enterprise or organization with information technologies such as big data. It can bring the following benefits:
Early detection of data problems: Informatization strengthens data quality management as well as monitoring and alerting, so that data problems are detected early when they occur.
Improve the efficiency of data management: Big data and other information technologies make automated and intelligent data management possible, which reduces manual operations and human error and lowers both labor costs and the risk of data errors.
Let data produce value faster: Big data and other information technologies speed up data analysis and data mining, so that the enterprise or organization can make decisions faster and more accurately.
Make data traceable and trackable: Building a data asset management platform that manages data processing and lineage lets users of the data trace it back to its source.
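Tracing data back to its source, as in the last benefit above, can be sketched as a walk over a lineage graph. The sketch assumes lineage is stored as a mapping from each data set to the data sets it is derived from. The data set names and the function trace_sources are hypothetical.

```python
# Hypothetical lineage: each data set maps to the data sets it is derived from.
LINEAGE = {
    "sales_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}


def trace_sources(dataset, lineage):
    """Return the original sources (data sets with no upstream) of a data set."""
    seen = set()
    sources = set()
    stack = [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        upstream = lineage.get(node, [])
        if not upstream:
            sources.add(node)
        stack.extend(upstream)
    return sorted(sources)


print(trace_sources("sales_report", LINEAGE))  # ['customers_raw', 'orders_raw']
```

The same graph, walked in the opposite direction, would show which downstream data sets are affected when a source has a problem.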
The core elements of data asset management informatization are as follows:
Data Acquisition: Realize automatic, real-time and accurate data acquisition from various business systems or hardware and software equipment through information technology.
Data Processing: Use Spark, Flink and other big data technologies to clean and transform the collected data in real time, so that more value can be mined from it.
Data Storage: Store massive amounts of data in different data types and formats by means of distributed storage technologies such as data warehouses or data lakes.
Data Services: Build a unified data services platform so that data can be easily accessed as the business requires.
Data security: Establish an information-based security mechanism to automatically identify possible security access risks in the data, and automatically back up the data so that it can be automatically recovered in the event of data loss.
1.8 Data assets and artificial intelligence
With the new round of technological change, artificial intelligence has become one of the hottest topics in technology, and data is the key to the development of artificial intelligence.
Artificial intelligence needs large amounts of data for algorithm learning and model training, and the predictions of AI algorithms must be checked against data feedback to verify their accuracy. Data is therefore the key to AI development, and improving the accuracy of AI requires large amounts of data to train its models continuously.
The combination of data assets and artificial intelligence will:
Make artificial intelligence smarter and free up more labor.
Drive many traditional enterprises or organizations to carry out digital transformation better and faster.
Accelerate the process of scientific and technological development and discover more unknown laws and phenomena.
In short, the combination of data assets and AI can bring greater business opportunities to many businesses or organizations, allowing them to better understand their data and explore more of its potential value.