
Local deployment of ComfyUI on the m4 mac mini: testing the Flux-dev-GGUF workflow with 10-step image generation to gauge AI image-generation performance, based on MPS (fp16), with the advantages of low power consumption and silent operation


The m4 mac mini has been out for a while now, and most of the discussion about it revolves around value for money. Setting the various subsidies aside, its price is actually not that different from the previously released mini, and if we're really talking value for money, the many Windows-based mini PCs beat Apple's pricing hands down.

This time we test the AI performance of the m4 mac mini using the most widely used AI workflow software, the ComfyUI framework, running in MPS (fp16) mode.

macOS Local Deployment of ComfyUI

First of all, make sure the machine has an ARM-based build of Python 3.11 installed. We use Python 3.11 because this version comes with certain performance optimizations, and it won't hit the problem the latest 3.13 has, where dependencies fail to load because they haven't caught up with the new release.

The macOS installer for Python 3.11 can be downloaded from the official Python website.
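After installing, a quick way to confirm that the interpreter really is an arm64 build of Python 3.11 (a minimal check, not part of the deployment itself):

import platform
import sys

# Expect a 3.11.x version string and 'arm64' on Apple Silicon
print(sys.version)
print(platform.machine())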


Next, clone the official repository:

git clone https://github.com/comfyanonymous/ComfyUI.git

Next, install the MPS (nightly) build of torch:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
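To confirm that this torch build can actually see the Apple GPU, a minimal sanity check (assuming the install above succeeded) looks like this:

import torch

# Both should print True on Apple Silicon with a nightly MPS build
print(torch.__version__)
print(torch.backends.mps.is_built())
print(torch.backends.mps.is_available())
# Small smoke test on the Metal device
x = torch.ones(3, device="mps")
print(x * 2)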

Then install the dependencies:

pip3 install -r requirements.txt

After the dependencies are installed, you need to update your SSL certificates:

bash /Applications/Python*/Install\ Certificates.command

Next, install ComfyUI's Manager project, which is used to install various nodes:

cd custom_nodes  
git clone https://github.com/ltdrdata/ComfyUI-Manager.git

At this point the ComfyUI project is deployed.

Flux-dev-GGUF Model Download

Download the required flux-dev model. Since the official model is too large (23 GB), we download the quantized GGUF version instead:

/s/2907b57697fe

The model files are flux1-dev-Q4_1.gguf and t5-v1_1-xxl-encoder-Q5_K_M.gguf; place them in the models/unet and models/clip directories, respectively.
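These GGUF checkpoints are loaded through the ComfyUI-GGUF custom node, which shows up in the startup log below. As a minimal sketch, the snippet below just checks that the files landed in the right folders; the COMFY_ROOT path is taken from the log as an example and should be adjusted to your own clone:

from pathlib import Path

# Example root taken from the startup log; change to your own ComfyUI directory
COMFY_ROOT = Path("/Volumes/ssd/work/ComfyUI")
expected = [
    COMFY_ROOT / "models" / "unet" / "flux1-dev-Q4_1.gguf",
    COMFY_ROOT / "models" / "clip" / "t5-v1_1-xxl-encoder-Q5_K_M.gguf",
]
for f in expected:
    print(f, "OK" if f.exists() else "MISSING")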

Afterwards, go back to the root directory of the project and enter the command to start the ComfyUI service:

python3 main.py --force-fp16

Here, fp16 precision is forced in order to improve performance.
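A rough illustration of why fp16 helps on a memory-constrained machine: half-precision tensors take half the bytes of fp32, so the same model leaves more unified memory free. This is just a toy sketch, not part of ComfyUI itself:

import torch

# A 1024x1024 tensor: 4 bytes per element in fp32 vs 2 bytes in fp16
a32 = torch.ones(1024, 1024, dtype=torch.float32, device="mps")
a16 = torch.ones(1024, 1024, dtype=torch.float16, device="mps")
print(a32.element_size() * a32.nelement())   # 4194304 bytes (~4 MiB)
print(a16.element_size() * a16.nelement())   # 2097152 bytes (~2 MiB)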

The program outputs:

liuyue@mini ComfyUI % python3 main.py --force-fp16  
[START] Security scan  
[DONE] Security scan  
## ComfyUI-Manager: installing dependencies done.  
** ComfyUI startup time: 2024-12-08 23:04:08.464703  
** Platform: Darwin  
** Python version: 3.11.9 (v3.11.9:de54cf5be3, Apr  2 2024, 07:12:50) [Clang 13.0.0 (clang-1300.0.29.30)]  
** Python executable: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3  
** ComfyUI Path: /Volumes/ssd/work/ComfyUI  
** Log path: /Volumes/ssd/work/ComfyUI/  
  
Prestartup times for custom nodes:  
   0.7 seconds: /Volumes/ssd/work/ComfyUI/custom_nodes/ComfyUI-Manager  
  
Total VRAM 24576 MB, total RAM 24576 MB  
pytorch version: 2.5.1  
Forcing FP16.  
Set vram state to: SHARED  
Device: mps  
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention  
[Prompt Server] web root: /Volumes/ssd/work/ComfyUI/web  
### Loading: ComfyUI-Manager (V2.51.9)  
### ComfyUI Revision: 2859 [b4526d3f] | Released on '2024-11-24'  
[ComfyUI-Manager] default cache updated: /ltdrdata/ComfyUI-Manager/main/  
[ComfyUI-Manager] default cache updated: /ltdrdata/ComfyUI-Manager/main/  
[ComfyUI-Manager] default cache updated: /ltdrdata/ComfyUI-Manager/main/  
[ComfyUI-Manager] default cache updated: /ltdrdata/ComfyUI-Manager/main/  
Torch version 2.5.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.4.0 is the most recent version that has been tested.  
[ComfyUI-Manager] default cache updated: /ltdrdata/ComfyUI-Manager/main/  
  
Import times for custom nodes:  
   0.0 seconds: /Volumes/ssd/work/ComfyUI/custom_nodes/websocket_image_save.py  
   0.0 seconds: /Volumes/ssd/work/ComfyUI/custom_nodes/ComfyUI-GGUF  
   0.1 seconds: /Volumes/ssd/work/ComfyUI/custom_nodes/ComfyUI-Manager  
   2.2 seconds: /Volumes/ssd/work/ComfyUI/custom_nodes/ComfyUI-MLX  
  
Starting server  
  
To see the GUI go to: http://127.0.0.1:8188

This indicates a successful deployment. Open http://127.0.0.1:8188 in a browser to access the interface.

Testing the Flux-dev-GGUF Workflow

Download the GGUF-based workflow:

/flux-gguf/

After importing the workflow, enter the prompt:

a super sexy gal holding a sign that says "ComfyUI Mac"

That is, a woman holding up a sign that says "ComfyUI Mac".
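The prompt can also be queued programmatically: ComfyUI exposes an HTTP API on the same port, and a workflow exported from the UI with "Save (API Format)" can be submitted to it. A minimal sketch, assuming the export was saved as flux-gguf-api.json (a hypothetical filename):

import json
import urllib.request

# Load a workflow previously exported from the UI in API format (hypothetical filename)
with open("flux-gguf-api.json", "r") as f:
    workflow = json.load(f)

# Queue it against the local ComfyUI server started above
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())   # contains the prompt_id of the queued job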

At this point, the workflow can be executed directly, and the program outputs:

ggml_sd_loader:  
 13                            144  
 0                              50  
 14                             25  
Requested to load FluxClipModel_  
Loading 1 new model  
loaded completely 0.0 323.94775390625 True  
Requested to load FluxClipModel_  
Loading 1 new model  
  
ggml_sd_loader:  
 1                             476  
 3                             304  
model weight dtype torch.bfloat16, manual cast: None  
model_type FLUX  
Requested to load Flux  
Loading 1 new model  
loaded completely 0.0 7181.8848876953125 True  
 20%|██████████████████▌                                                                          | 2/10 [01:04<04:18, 32.27s/it]

Iterations stabilize at around 30 seconds each, so a single image takes roughly 3-5 minutes.

The author's m4 mac mini is the base model upgraded to 24 GB of RAM, and during image generation, Activity Monitor shows that the memory usage is anything but light:

As you can see, it is already using about 21 GB of memory. A user who tested with the pure base-model 16 GB mini found that, once the system's own footprint is subtracted, only about 10 GB of the 16 GB is actually free; anything beyond that has to go through SSD-backed virtual memory, which keeps the GPU from running at full tilt. So the base 16 GB configuration is likely to slow down image generation.

Finally, the image comes out after the 10-step iteration:

As you can see, the precision loss isn't too noticeable; the main problem is still the slow generation speed.

Concluding Remarks

The AI ecosystem around the m4 mac mini still has plenty of room to improve, so AI practitioners should think carefully before buying. If you must buy one, avoid the 16 GB version: once a model gets large, only about 10 GB of that 16 GB is really usable, which can drag down inference efficiency. That said, we can't ignore the m4 mac mini's advantages for model inference: it consumes very little power, and it runs almost silently, unlike NVIDIA-card machines that roar into action at the slightest load.