The complete code and attachments for this article have been uploaded to my GitHub repository CNFeffery/DataScienceStudyNotes
1 Introduction
Hello everyone, I'm Mr. Fei. Today I'd like to introduce a Python framework called ibis (yes, the same name as the famous hotel chain Ibis). Its author is Wes McKinney, who also created pandas, Arrow, and other well-known frameworks.
The core idea of ibis is to provide a single dataframe API that operates uniformly on a variety of mainstream data computing frameworks. Users can then build their data analysis and processing workflows more easily and consistently, instead of getting bogged down in the many differences between frameworks. Let's learn the basics together 😉
2 Introduction to basic ibis usage
2.1 ibis installation
For demonstration purposes, we'll use conda or mamba to create a new virtual environment for trying out ibis. Taking mamba as an example, run the following commands in a terminal to create and activate the demo virtual environment, then install the basic ibis modules plus the sample dataset module:
mamba create -n ibis-demo python=3.9 -y
mamba activate ibis-demo
pip install "ibis-framework[duckdb,examples]"
Verify that the installation was successful:
python -c "import ibis;print(ibis.__version__)"
The printed version number confirms that ibis is installed; the version at the time of writing is 9.0.0.
2.2 Main functions of ibis
2.2.1 Constructing an analytical logic with portability
ibis itself does not perform analytical calculations directly. We write the calculation logic for the target data with ibis, and when that code is actually executed, ibis automatically translates the logic into operations for the current computational backend. ibis currently supports more than 20 compute backends, covering the dominant frameworks in both single-machine and distributed analysis.
As a result, data analysis workflows built on ibis can easily be moved to any other supported computational backend without modifying the analysis code.
For example, ibis uses DuckDB by default as its backend for single-machine analytic computing. You can use part of the data locally to build and validate the business analysis logic, then apply the same code with the computational backend switched to something like ClickHouse to run distributed operations. This is what ibis's portability means in practice.
As a simple example, we first use pandas to generate sample data with 10 million rows and export it in parquet format:
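import numpy as np
import pandas as pd

# Generate test data and export to parquet format
(
    pd
    .DataFrame(
        {
            # np.random.choice / np.random.uniform are assumed here; the
            # original generator calls are missing from the text
            'categories': np.random.choice(list('abcdef'), 10000000),
            'value': np.random.uniform(0, 100, 10000000)
        }
    )
    .assign(value=lambda df: df['value'].round(3))
    # 'demo.parquet' is a placeholder; the original file name is not given
    .to_parquet('demo.parquet')
)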
Next, we use the default DuckDB backend to run some sample operations in ibis.
Next, we switch the computational backend to pandas and execute exactly the same code. The computation time rises steeply; after all, pandas is far less capable than DuckDB at processing tens of millions of rows. This example demonstrates ibis's strong code portability.
2.2.2 Pairing Python and SQL well together
Another very powerful feature of ibis is that it lets Python code and SQL code work together in one analysis. For example, you can convert ibis analysis code directly into SQL statements.
You can also execute SQL statements directly as part of the analysis.
This lets ibis connect to all kinds of state-of-the-art computing backends and helps analysts maintain their business analysis logic with ease.
Whether you previously did your data analysis with pandas, SQL, or R, the official ibis documentation provides user-friendly guides to help you migrate to ibis painlessly.
ibis is in a phase of rapid iterative development, with new commits landing in its code repository almost daily, so a ⭐ is welcome: /ibis-project/ibis
For more information, visit its official website: /
That's all for this article. Feel free to join the discussion in the comment section~