Abstract: This article walks through building and deploying Datavines from source code and running a data quality check job. It is divided into the following sections:
- Platform Introduction
- Rapid deployment
- Running a data quality check job
The goal of Datavines is to become a better open source project in the field of data observability, solving problems in metadata management and data quality management for more users. We sincerely welcome more contributors to participate in the community building, grow with us, and work together to build a better community.
/datavane/datavines
/datavane/datavines/issues
/datavane/datavines/pulls
Platform Introduction
Datavines is a one-stop open source data observability platform. It provides metadata management, data overview reports, data quality management, data distribution queries, data trend insights and other core capabilities, and is committed to helping users fully understand and control their data so that they always have a clear picture of it.
Rapid deployment
Environment preparation
Before installing Datavines, please make sure the following software is installed on your server:
- Git, so that git clone runs successfully
- JDK, version 8 or later
- Maven, to ensure the project can be packaged smoothly (you can also package it locally and upload it to the server later)
- MySQL, version 5.7 or later
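The prerequisites above can be checked with a small script before building. This is a minimal sketch, assuming a POSIX shell and GNU sort (for the -V version ordering); the helper names version_ge and have_cmd are made up for illustration and are not part of Datavines.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Datavines.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V flag orders version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds when the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the required tools are present.
for tool in git java mvn mysql; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

For example, version_ge 8.0.33 5.7 succeeds while version_ge 5.6 5.7 fails. Note that JDK 8 reports its version as 1.8.x, so a Java check would need to compare against 1.8 rather than 8.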
Download Code
git clone /datavane/
cd datavines
Database preparation
Datavines stores its metadata in a relational database; MySQL is currently supported. The following uses MySQL as an example to illustrate the installation procedure:
- Create a database named datavines
- Execute the script in script/sql/ to initialize the database
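The two database steps can be sketched from the command line as follows. This is only an illustration: the helper create_db_sql is hypothetical, the utf8mb4 character set is an assumption rather than a documented requirement, and INIT_SQL is a placeholder for the initialization script shipped under script/sql/.

```shell
#!/bin/sh
# Sketch only: prints the SQL statement that creates the metadata database.
# The utf8mb4 character set is an assumption, not a documented requirement.
create_db_sql() {
    printf 'CREATE DATABASE IF NOT EXISTS `%s` DEFAULT CHARACTER SET utf8mb4;\n' "$1"
}

create_db_sql datavines

# To apply it, pipe the statement into the mysql client, then load the
# initialization script (INIT_SQL is a placeholder you set yourself):
#   create_db_sql datavines | mysql -u root -p
#   mysql -u root -p datavines < "$INIT_SQL"
```

Keeping the statement in a function makes it easy to review what will be run before sending it to the server.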
Building the project
Package and unpack
mvn clean package -Prelease
cd datavines-dist/target
tar -zxvf datavines-1.0.
After unpacking, enter the directory
cd datavines-1.0.0-SNAPSHOT-bin
Editing configuration information
cd conf
vi
Modify database information
spring:
  datasource:
    driver-class-name:
    url: jdbc:mysql://127.0.0.1:3306/datavines?useUnicode=true&characterEncoding=UTF-8
    username: root
    password: 123456
If you use Spark as the execution engine and submit jobs to YARN for execution, you also need to configure the YARN-related information in the configuration:
- standalone mode
=standalone
=http://%s:%s/ws/v1/cluster/apps/%s # the first %s needs to be replaced with the ip address of yarn
=8088
- ha mode
=ha
=http://%s:%s/ws/v1/cluster/apps/%s
=8088
=192.168.0.1,192.168.0.2
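To make the URL template concrete, the sketch below fills in the three %s placeholders (host, port, application id) in the way the template suggests. The helper yarn_app_url and the application id are made up for illustration; how Datavines itself chooses between the HA addresses is not shown here.

```shell
#!/bin/sh
# Fill the application URL template: host, port, then application id.
# yarn_app_url is a hypothetical helper, not part of Datavines.
yarn_app_url() {
    printf 'http://%s:%s/ws/v1/cluster/apps/%s\n' "$1" "$2" "$3"
}

# In HA mode each ResourceManager address is a candidate host.
# The application id below is a made-up example.
for host in 192.168.0.1 192.168.0.2; do
    yarn_app_url "$host" 8088 application_1649564945447_0001
done
```

The resulting URL points at the YARN ResourceManager REST endpoint for a single application, so it can be queried with a tool such as curl to inspect the job state.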
Starting services
cd bin
sh start mysql
Check the log. If it contains no error messages and you can see a line like the following, the service has started successfully:
[INFO] 2022-04-10 12:29:05.447 :[61] - Started DatavinesServer in 3.97 seconds (JVM running for 4.69)
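The log check can also be scripted. The following is a minimal sketch that searches a log file for the startup line; the default log path is an assumption and should be adjusted to wherever your installation writes its logs, and server_started is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds when the given log file contains the startup marker.
server_started() {
    grep -q 'Started DatavinesServer' "$1"
}

# Hypothetical log location; adjust to your installation.
LOG_FILE="${LOG_FILE:-logs/datavines-server.log}"

if [ -f "$LOG_FILE" ] && server_started "$LOG_FILE"; then
    echo "Datavines server is up"
else
    echo "startup line not found in $LOG_FILE"
fi
```

Run it with LOG_FILE set to the actual log file, for example LOG_FILE=/path/to/server.log sh check-start.sh, where the script name is likewise only an example.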
Accessing the front-end page
Enter the following address in your browser: Server IP:5600
You will be redirected to the login screen. Enter the account and password: admin/123456
Running a data quality check job
Creating a Data Source
After entering the home page, click the Create Data Source button in the upper right corner, enter the name of the data source, and then select the data source type. Taking MySQL as an example, enter the MySQL connection information and click the Test Connection button. If the test succeeds, click Save.
Entering the data source
Click the data source to enter it, then find the job management page.
Creating a check job
- Click the Create Rule Job button and select Data Quality Job
- Go to the configuration page of the rule
- Configure the rule
  - Select the Enumeration value [not present] check rule
  - Select the database, table and column in turn
  - Enter the enumeration array [0,1]
- Configure the expected value
  - If there is no expected value, select None
- Configure the check
  - Select the Actual Value check formula and the > comparator, and enter the threshold 10
  - This forms the formula [Actual value > 10]. When the formula holds, the check result is a success; otherwise it is a failure.
- Configure the error data
  - Choose to save it in the source data source and fill in a database you have already created
- After completing the configuration, click Save and Run to execute the check job.
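The check formula configured above, [Actual value > 10], can be illustrated with a few lines of shell. This is only a sketch of the comparison logic, not Datavines code: evaluate_check is a hypothetical helper, and it assumes integer values.

```shell
#!/bin/sh
# Illustrative only: mimics the [actual value <comparator> threshold] formula.
# Usage: evaluate_check ACTUAL COMPARATOR THRESHOLD (integers assumed).
evaluate_check() {
    result=failure
    case "$2" in
        '>')  [ "$1" -gt "$3" ] && result=success ;;
        '>=') [ "$1" -ge "$3" ] && result=success ;;
        '<')  [ "$1" -lt "$3" ] && result=success ;;
        '<=') [ "$1" -le "$3" ] && result=success ;;
        '=')  [ "$1" -eq "$3" ] && result=success ;;
        *)    echo "unknown comparator: $2" >&2; return 2 ;;
    esac
    echo "$result"
}

evaluate_check 12 '>' 10   # prints "success": the formula holds
evaluate_check 3 '>' 10    # prints "failure": the formula does not hold
```

The point is that the outcome depends only on whether the formula holds for the measured actual value, so changing the comparator or threshold changes what counts as a successful check.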
Viewing information about a rule job
In the job list, locate the check job that was just created and executed.
Click through to the Execution Records page to see the execution history list.
Click the Log button to see the log information of the rule execution.
Click the Result button to see the check result of the rule execution.
Click the Error Data button to see the error data of the rule execution.
Concluding remarks
This article has described the whole process for the Datavines platform, from deployment and installation to running a check job, covering each step along the way. I believe many of you are eager to give it a try, so go ahead: there is much more waiting for you to explore.
About Datavane
Datavane is an open source organization (community) focusing on the field of big data. It was co-founded by a group of outstanding open source project authors in that field and aims to help open source project authors build better projects and provide high-quality open source software for the public, with one purpose: simply to make good software. It currently brings together a number of high-quality open source projects related to data integration, big data component management, data quality and so on.
In the Datavane community, all projects are open source and open to potential projects with quality code and architectural design. The community maintains open neutrality, collaborative creativity, and adherence to excellence, and encourages all developers, users, and contributors to actively participate in our community, work together, innovate, and create to build a stronger open source community.
Official website./
Github : /datavane