
(Data Science Learning Handbook 163) ibis: A Highly Promising Python Data Analytics Framework

2024-08-07 16:14:57

The complete code and attachments for this article have been uploaded to my GitHub repository: CNFeffery/DataScienceStudyNotes

1 Introduction

Hello everyone, I'm Mr. Fei. Today I'd like to introduce a Python framework called ibis (yes, the same name as the well-known Ibis hotel chain). Its author is Wes McKinney, who also created well-known frameworks such as pandas and Arrow.

The core idea of ibis is to provide a single data frame API that works uniformly across a variety of mainstream data computing frameworks. This lets users build their data analysis and processing workflows more easily and consistently, instead of getting bogged down in the countless differences between frameworks. Let's go through its basic usage together 😉

2 Introduction to basic ibis usage

2.1 ibis installation

For demonstration purposes, we'll use conda or mamba to create a new virtual environment in which to try out ibis. Taking mamba as an example, run the following commands in a terminal to create and activate the demo environment, then install ibis with its most basic modules plus the sample dataset module:

mamba create -n ibis-demo python=3.9 -y

mamba activate ibis-demo

pip install "ibis-framework[duckdb,examples]"

Verify that the installation was successful:

python -c "import ibis;print(ibis.__version__)"

With that, the ibis installation is complete. The version used in this article is 9.0.0.

2.2 Main functions of ibis

2.2.1 Constructing an analytical logic with portability

ibis itself does not perform any analytical computation directly. After we write the computation logic for the target data, ibis automatically translates that logic for the current compute backend when the code is actually executed. ibis currently supports more than 20 compute backends, covering the mainstream frameworks in both single-machine analysis and distributed analysis.

As a result, a data analysis workflow built on ibis can easily be extended to any supported compute backend without modifying the code.

For example, ibis uses DuckDB by default as its backend for single-machine analytical computing. You can build and validate your business analysis logic locally on a subset of the data, then keep the same code and switch the compute backend to something like ClickHouse to run it in a distributed fashion. This is what the portability of ibis means in practice.

As a simple example, we first use pandas to generate sample data with 10 million rows and export it in parquet format:

import numpy as np
import pandas as pd

# Generate test data and export to parquet format
(
    pd
    .DataFrame(
        {
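            # random category labels drawn from a-f
            'categories': np.random.choice(list('abcdef'), 10000000),
            # random floats in [0, 100)
            'values': np.random.uniform(0, 100, 10000000)
        }
    )
    .assign(values = lambda df: df['values'].round(3))
    .to_parquet('demo.parquet')  # placeholder file name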
)

Next, we start with the default DuckDB backend and perform some sample operations in ibis.

Next, we switch the compute backend to pandas and run exactly the same code. The computation time rises steeply, since pandas is far less capable than DuckDB at processing tens of millions of rows. This example demonstrates the strong code portability of ibis.

2.2.2 Combining Python and SQL seamlessly

Another very powerful feature of ibis is that it lets you combine Python code and SQL code freely in your analysis. For example, you can convert ibis analysis code directly into a SQL statement.

It is also possible to run SQL statements directly as part of the analysis.

This allows ibis to connect fully with all kinds of state-of-the-art compute backends and helps analysts maintain their business analysis logic with ease.

Whether you previously did your data analysis with pandas, SQL, or R, the documentation on the ibis official website provides very user-friendly guides to help you migrate to ibis painlessly.

ibis is in a phase of rapid, iterative development, and its code repository receives new commits almost daily, so a ⭐ in support is welcome: /ibis-project/ibis

For more information, please visit its official website.


That's all for this article. Feel free to join the discussion in the comment section~