Tinkering with ChatGLM-MNN

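ChatGLM runs quite slowly on my machine. Its README links to a C++ implementation, so I gave that a try.

Environment: Windows 11, Python 3.10

After all the tinkering, it turned out to be even slower than the Python version.

Building MNN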

git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build
cmake ..
cmake --build . -j4

cp -r ../include /path/to/ChatGLM-MNN/
cp libMNN.so /path/to/ChatGLM-MNN/libs
cp express/libMNN_Express.so /path/to/ChatGLM-MNN/libs
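
Before building ChatGLM-MNN, it can help to confirm that both libraries actually landed in the libs directory. The following is a minimal sketch, not part of either project: `check_mnn_libs` is a hypothetical helper, and the two file names are simply the ones used in the cp commands above.

```shell
#!/usr/bin/env bash
# Report which of the expected MNN shared libraries are missing from a libs directory.
# check_mnn_libs is a hypothetical helper; the file names come from the cp commands above.
check_mnn_libs() {
  local libs_dir="$1"
  local missing=0
  local name
  for name in libMNN.so libMNN_Express.so; do
    if [ ! -f "$libs_dir/$name" ]; then
      echo "missing: $libs_dir/$name"
      missing=1
    fi
  done
  # Exit status 0 when both files exist, 1 otherwise.
  return "$missing"
}

# Example:
#   check_mnn_libs /path/to/ChatGLM-MNN/libs && echo "libs look complete"
```

The function only checks that the files exist; it does not verify that they were built for the right platform or compiler.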

Downloading the models

cd ChatGLM-MNN/resource/models

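The last two files (lm.mnn and slim_word_embeddings.bin) are the same for every variant, so there is no need to download them again.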

int4

mkdir int4 && cd int4

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.3/glm_block_0.mnn
... 
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.3/glm_block_27.mnn 

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/lm.mnn 
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/slim_word_embeddings.bin

int8

mkdir int8 && cd int8

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.2/glm_block_0.mnn
...
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.2/glm_block_27.mnn

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/lm.mnn
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/slim_word_embeddings.bin

fp16

mkdir fp16 && cd fp16

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/glm_block_0.mnn
... 
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/glm_block_27.mnn 

wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/lm.mnn
wget https://github.com/wangzhaode/ChatGLM-MNN/releases/download/v0.1/slim_word_embeddings.bin
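
Typing out every wget line by hand gets tedious, so the URL list can be generated instead. This is a rough sketch under one assumption: the block files are numbered consecutively from 0 to 27, which the first and last URLs above suggest but which I have not checked against the release pages. `model_urls` is a hypothetical helper name; the release tags (v0.3 for int4, v0.2 for int8, v0.1 for fp16 and for the two shared files) are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Print the download URLs for one model variant, one per line.
# Assumption: glm_block files are numbered consecutively from 0 to 27.
BASE_URL="https://github.com/wangzhaode/ChatGLM-MNN/releases/download"

model_urls() {
  local block_tag="$1"   # v0.3 for int4, v0.2 for int8, v0.1 for fp16
  local i
  for i in $(seq 0 27); do
    echo "$BASE_URL/$block_tag/glm_block_$i.mnn"
  done
  # The two shared files are published under v0.1 for every variant.
  echo "$BASE_URL/v0.1/lm.mnn"
  echo "$BASE_URL/v0.1/slim_word_embeddings.bin"
}

# Example (run inside the directory for the int4 variant):
#   model_urls v0.3 | wget -i -
```

Printing the URLs rather than downloading directly makes it easy to inspect the list first; `wget -i -` then reads the URLs from standard input.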

Building and running

Building locally with MSVC

cd ChatGLM-MNN
mkdir build && cd build
cmake ..
cmake --build . -j4
./cli_demo # cli demo
./web_demo # web ui demo

Building with MSVC requires adding the following to CMakeLists.txt:

add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/source-charset:utf-8>")
add_compile_options("$<$<C_COMPILER_ID:MSVC>:/source-charset:utf-8>")

web_demo depends on pthread, so it has to be built with MinGW, and the MNN library it depends on has to be rebuilt with MinGW as well.

cmake .. -G "MinGW Makefiles"
cmake --build . -j4

The Express library could not be built.

The program would not run, so I tried disabling the GPU:

./cli_demo -g0

Still no luck.

Building in WSL

Just build and run it inside WSL instead:

mkdir build && cd build
cmake ..
cmake --build . -j4

Run:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:../libs
./web_demo -g0 -m "../resource/models/int4"

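The VM ran out of memory and the process was killed. This thing needs a lot of memory, roughly 25 GB.

Add a setting to increase WSL's swap space. Contents of %UserProfile%/.wslconfig: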

[wsl2]
swap=20480MB

The swap file takes up space on the C: drive, so remove the setting once you are done.
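
After restarting WSL (for example with `wsl --shutdown` from Windows), a quick check from inside the VM can tell whether RAM plus swap now reaches the rough 25 GB mentioned above. This is a minimal sketch: `total_mem_gib` and `enough_memory` are hypothetical helpers, and 25 is only the ballpark figure from this post, not a measured threshold.

```shell
#!/usr/bin/env bash
# Sum MemTotal and SwapTotal (both reported in kB) from a meminfo-style file
# and print the result in whole GiB, rounded down.
total_mem_gib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^(MemTotal|SwapTotal):/ { kb += $2 } END { printf "%d\n", kb / 1048576 }' "$meminfo"
}

# Succeeds when RAM plus swap is at least 25 GiB.
# 25 is the rough requirement quoted in this post (an assumption, not a hard limit).
enough_memory() {
  [ "$(total_mem_gib "$1")" -ge 25 ]
}

# Example:
#   enough_memory || echo "RAM plus swap is below ~25 GB; the demo may get killed"
```

Both functions accept an optional path so they can be pointed at a saved copy of /proc/meminfo; with no argument they read the live file.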

It was unbearably slow.

Building with MinGW

Comment out the dependency on MNN Express, then build with MinGW again.

Error: undefined reference to `__imp_setsockopt'
The ws2_32 library is missing from the link step.

        target_link_libraries(web_demo chat pthread)

Change it to:

if(WIN32)
        target_link_libraries(web_demo chat ws2_32)
else()
        target_link_libraries(web_demo chat pthread)
endif()

Copy libMNN.dll over and run:

./web_demo -g0 -m "../resource/models/int4"

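Still too slow, slower than Python. On top of that, text entered through the web UI shows up garbled in the console and the program cannot recognize it, and the program's own output is garbled too. That is to be expected, since the code uses UTF-8 internally.

Forget it, no more tinkering. It is of no use to me, so I deleted it. I will just stick with what already works online.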

