parquet-dotnet is a .NET library for reading and writing Apache Parquet files. It is open source under the MIT license; GitHub repository: aloneguid/parquet-dotnet. Apache Parquet is a columnar storage format for big data. The library targets .NET 4.5 and above and .NET Standard 1.4 and above, which means that it also implicitly supports all versions of .NET Core. It runs on all versions of Windows, Linux, and macOS, and, via MAUI, on other platforms that support .NET Standard, such as mobile devices (iOS, Android) and game consoles.
Support for Apache Parquet files is one of the things that makes the .NET platform more complete for big data applications. Parquet libraries are primarily available for Java, C++, and Python, which has limited the .NET/C# platform in this area. This library gives .NET developers a powerful tool for working with Parquet files and fills a gap in the .NET ecosystem, helping developers process and store data efficiently.
The library provides both low-level and high-level APIs, so users can choose the level of flexibility they need. It also provides a row-based API, which makes complex data structures more intuitive and convenient to work with. It supports dynamic schemas and can automatically serialize C# classes into Parquet files without tedious hand-written code. It is used by many small and large organizations around the world: official public NuGet statistics show that Azure Machine Learning, among others, is using it, and there are plenty of other users out there.
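To make the idea of automatic class serialization more concrete, here is a minimal, language-neutral sketch in Python. It is not the library's API: the `Record` class and the `derive_schema` and `to_columns` helpers are hypothetical names introduced only for illustration. The sketch assumes that a serializer of this kind derives the schema from the fields of a class and then pivots the objects into one list of values per field.

```python
from dataclasses import dataclass, fields


@dataclass
class Record:
    # Hypothetical sample class; each field becomes one column.
    id: int
    city: str


def derive_schema(cls):
    """Build a list of (column name, type name) pairs from the class fields."""
    return [(f.name, getattr(f.type, "__name__", str(f.type))) for f in fields(cls)]


def to_columns(cls, records):
    """Pivot a list of objects into one list of values per field."""
    return {f.name: [getattr(r, f.name) for r in records] for f in fields(cls)}


# Made-up sample data.
records = [Record(1, "London"), Record(2, "Derby")]
schema = derive_schema(Record)
columns = to_columns(Record, records)
```

The caller never writes the schema by hand; it follows from the class definition, which is the convenience the paragraph above describes.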
Parquet is a columnar storage format designed for efficient storage and retrieval, and it is widely used in big data processing frameworks such as Apache Spark. Parquet supports advanced compression and encoding schemes that reduce storage space and increase read speed. As of 2024, the library is billed as the world's fastest Parquet library, not only on the .NET runtime but in comparison to all platforms.
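As a rough illustration of why encoding schemes work well on columns, the following Python sketch combines two ideas that Parquet's encodings build on: dictionary encoding and run-length encoding. It is a simplified model under stated assumptions, not Parquet's actual on-disk format, and all function names and sample values are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary, index_of, indices = [], {}, []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    return [(index, len(list(group))) for index, group in groupby(indices)]


def decode(dictionary, runs):
    """Expand the runs and look every index up in the dictionary."""
    return [dictionary[index] for index, length in runs for _ in range(length)]


# Made-up sample column with many repeated values.
city_column = ["London", "London", "London", "Derby", "Derby", "London"]
dictionary, indices = dictionary_encode(city_column)
runs = run_length_encode(indices)
```

Because all values of one column are stored together, repeats tend to sit next to each other, so six strings shrink here to a two-entry dictionary and three short runs.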
The main features include the following:
Columnar storage: Parquet is a columnar storage format, which means that data is stored by column rather than by row. This layout can significantly improve the efficiency of big data processing and analysis.
Efficient data reading: Thanks to its columnar structure, Parquet data can be read efficiently, especially with large data sets, because a query only needs to load the columns it actually uses.
Low-level API: There is also a low-level API, which maps most closely onto the Parquet data structure and offers the highest performance. This approach is less intuitive than the high-level APIs: it requires some knowledge of the Parquet data structure, and the schema must be defined before use.
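The difference between row and column layouts described in the list above can be sketched in a few lines of Python. This is only a conceptual model with made-up sample data and hypothetical helper names; it does not reflect how the library or the Parquet format lays out bytes.

```python
# Made-up sample data: three records with three fields each.
column_names = ("id", "city", "visits")
rows = [(1, "London", 10), (2, "Derby", 20), (3, "Leeds", 30)]

# Row-oriented layout: the values of one record sit next to each other.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: the values of one column sit next to each other.
column_layout = {
    name: [row[i] for row in rows] for i, name in enumerate(column_names)
}


def sum_from_row_layout(flat, width, column_index):
    """Walk the whole flat buffer, picking out one field from every record."""
    return sum(flat[i] for i in range(column_index, len(flat), width))


def sum_from_column_layout(columns, name):
    """Read a single contiguous column and ignore all the others."""
    return sum(columns[name])


total_from_rows = sum_from_row_layout(row_layout, len(column_names), 2)
total_from_columns = sum_from_column_layout(column_layout, "visits")
```

Both functions return the same total, but the columnar version touches only the `visits` values, which is why analytical queries over a few columns of a wide data set benefit from this layout.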
Currently, the latest version is 4.25.0, which can be installed in Visual Studio via the NuGet package manager.