LLM: Hands-On Local Debugging Notes for the SuperSonic Project
I. Related Project
https://github.com/tencentmusic/supersonic.git
II. Environment Setup
Run Ollama locally and configure the chat model:
Model Name: qwen:0.5b
Base URL: http://127.0.0.1:11434/
Configure the embedding model served by Ollama:
Model Name: mofanke/dmeta-embedding-zh:latest
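Before wiring the models into SuperSonic, it can help to call Ollama directly and confirm that the embedding model responds. The following is a minimal Java sketch that uses the JDK HttpClient against Ollama's /api/embeddings endpoint, whose JSON body carries "model" and "prompt". The class and method names are hypothetical, and only embed() needs a running Ollama instance.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: send one text to Ollama's embeddings endpoint.
class OllamaEmbeddingCheck {

    // Minimal JSON string escaping (backslash first, then quotes and newlines).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Request body for POST /api/embeddings: {"model": ..., "prompt": ...}
    static String buildBody(String model, String prompt) {
        return "{\"model\":" + jsonString(model) + ",\"prompt\":" + jsonString(prompt) + "}";
    }

    // Tolerates a trailing slash in the base URL, e.g. "http://127.0.0.1:11434/".
    static HttpRequest buildRequest(String baseUrl, String model, String prompt) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/api/embeddings"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(model, prompt)))
                .build();
    }

    // Sends the request; the response JSON holds an "embedding" array.
    static String embed(String baseUrl, String model, String prompt) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl, model, prompt), HttpResponse.BodyHandlers.ofString());
        return resp.body();
    }
}
```

For example, `embed("http://127.0.0.1:11434/", "mofanke/dmeta-embedding-zh:latest", "访问次数")` should return a JSON object with a non-empty embedding array if the model has been pulled.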
III. Debugging Walkthrough
1. Scheduled loading of the HanLP dictionary
com.tencent.supersonic.headless.server.service.impl.DictTaskServiceImpl#dailyDictTask
com.tencent.supersonic.headless.server.task.DictionaryReloadTask#reloadKnowledge
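The actual reload lives in DictionaryReloadTask#reloadKnowledge, and these notes do not show how SuperSonic schedules it. The sketch below only illustrates the general pattern, under that assumption: a periodic job rebuilds the dictionary off to the side and swaps it in atomically, so lookups never see a half-loaded state. DictionaryReloader and its loader are hypothetical, and a plain Set stands in for HanLP's dictionary structures.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical periodic reload: build a new word set, then swap it in atomically.
class DictionaryReloader {
    private final AtomicReference<Set<String>> current = new AtomicReference<>(Set.of());
    private final Supplier<Set<String>> loader; // e.g. reads metric/dimension names from the DB
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    DictionaryReloader(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    // One reload cycle; on failure the previous dictionary stays in place.
    void reload() {
        try {
            current.set(Set.copyOf(loader.get()));
        } catch (RuntimeException e) {
            System.err.println("dictionary reload failed, keeping old one: " + e);
        }
    }

    boolean contains(String word) {
        return current.get().contains(word);
    }

    // Run reload() immediately and then every `period` units.
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::reload, 0, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```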
2. Scheduled loading of the embedding vector store
com.tencent.supersonic.headless.server.task.MetaEmbeddingTask#reloadMetaEmbedding
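Conceptually, reloading the embedding store means re-embedding every metadata name and rebuilding the index that EmbeddingMapper later searches. The sketch assumes an in-memory store, similar in spirit to the IN_MEMORY store type, and uses cosine similarity for retrieval. It is not the project's implementation: the embedder function is hypothetical and would call the embedding model (for example through Ollama) in practice.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal in-memory vector store: name -> embedding, searched by cosine similarity.
class InMemoryVectorStore {
    record Hit(String text, double score) {}

    private final Function<String, double[]> embedder; // hypothetical, e.g. wraps an Ollama call
    private volatile Map<String, double[]> index = Map.of();

    InMemoryVectorStore(Function<String, double[]> embedder) {
        this.embedder = embedder;
    }

    // Re-embed all names and replace the whole index in one assignment.
    void reload(List<String> names) {
        Map<String, double[]> fresh = new HashMap<>();
        for (String name : names) fresh.put(name, embedder.apply(name));
        index = Map.copyOf(fresh);
    }

    // Cosine similarity of two vectors of the same dimension (0 if either is all zeros).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Top-k indexed names most similar to the query text, best first.
    List<Hit> search(String query, int topK) {
        double[] q = embedder.apply(query);
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> e : index.entrySet()) {
            hits.add(new Hit(e.getKey(), cosine(q, e.getValue())));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return hits.subList(0, Math.min(topK, hits.size()));
    }
}
```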
3. Auto-completion of Chinese metric and dimension names in the chat box
com.tencent.supersonic.chat.server.rest.ChatQueryController#search
- Segment the natural-language input into words
- Suggest keywords by prefix/suffix matching against the HanLP dictionary
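The prefix/suffix suggestion step can be illustrated with two sorted sets. One holds the names and the other holds the reversed names, so a suffix query becomes a prefix query. This is a plain-JDK stand-in for the HanLP dictionary lookup rather than SuperSonic's code, and KeywordSuggester is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical prefix/suffix suggester over metric and dimension names.
class KeywordSuggester {
    private final TreeSet<String> forward = new TreeSet<>();
    private final TreeSet<String> reversed = new TreeSet<>();

    KeywordSuggester(List<String> words) {
        for (String w : words) {
            forward.add(w);
            reversed.add(reverse(w));
        }
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Words starting with `prefix` sort into [prefix, prefix + '\uffff').
    private static NavigableSet<String> range(TreeSet<String> set, String prefix) {
        return set.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    List<String> byPrefix(String prefix) {
        return new ArrayList<>(range(forward, prefix));
    }

    // A suffix match is a prefix match on the reversed strings.
    List<String> bySuffix(String suffix) {
        List<String> out = new ArrayList<>();
        for (String r : range(reversed, reverse(suffix))) out.add(reverse(r));
        return out;
    }
}
```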

4. Natural-language parsing: NL2DSL2SQL
com.tencent.supersonic.chat.server.rest.ChatQueryController#parse

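Keyword matching (Mapper)
- EmbeddingMapper: semantic matching against the embedding knowledge base
- HanlpDictMapper: prefix/suffix matching against the HanLP dictionary
- FuzzyNameMapper: fuzzy matching based on Hamming distance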
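Here is a generic illustration of Hamming-distance fuzzy matching; it is not the actual FuzzyNameMapper code. The sketch counts differing positions between equal-length strings and keeps candidates whose normalized similarity 1 - d/len reaches a threshold. Both the threshold and the rule of skipping candidates with a different length are assumptions, the latter because Hamming distance is only defined for strings of equal length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fuzzy matcher based on Hamming distance.
class HammingFuzzyMatcher {
    // Number of positions at which the strings differ, or -1 if their lengths differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) return -1;
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    // Names whose similarity 1 - d/len is at least `threshold`.
    static List<String> match(String term, List<String> names, double threshold) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            int d = hamming(term, name);
            if (d < 0 || name.isEmpty()) continue; // not comparable
            double similarity = 1.0 - (double) d / name.length();
            if (similarity >= threshold) out.add(name);
        }
        return out;
    }
}
```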
DSL generation (Parser)
- LLMSqlParser: prompts the LLM with "examples + rules + metadata keywords"
- RuleSqlParser: rule-based matching of query patterns plus metadata keywords
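The LLMSqlParser prompt combines few-shot examples, rules and metadata keywords. The layout below is a hypothetical approximation that shows how those pieces could fit together; the section markers and PromptBuilder are made up. The fewShot argument corresponds to what the s2.parser.few-shot.number setting controls in the system config.

```java
import java.util.List;

// Hypothetical prompt layout: rules, schema keywords, few-shot examples, then the question.
class PromptBuilder {
    record Example(String question, String sql) {}

    static String build(List<String> rules, List<String> schemaTerms,
                        List<Example> examples, String question, int fewShot) {
        StringBuilder sb = new StringBuilder();
        sb.append("#Rules\n");
        for (String rule : rules) sb.append("- ").append(rule).append('\n');
        sb.append("#Schema: ").append(String.join(", ", schemaTerms)).append('\n');
        // Use at most `fewShot` examples.
        int n = Math.min(fewShot, examples.size());
        for (int i = 0; i < n; i++) {
            Example e = examples.get(i);
            sb.append("#Question: ").append(e.question()).append('\n')
              .append("#SQL: ").append(e.sql()).append('\n');
        }
        // The LLM is expected to complete the SQL after the final marker.
        sb.append("#Question: ").append(question).append('\n').append("#SQL: ");
        return sb.toString();
    }
}
```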
SQL correction (Corrector)
- SchemaCorrector: repairs schema/metadata completeness
- TimeCorrector: repairs time filter conditions
- GrammarCorrector: repairs the syntax of SELECT, WHERE, GROUP BY and HAVING clauses
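The correctors run one after another on the parser's output. The sketch models this as a chain of UnaryOperator steps over a hypothetical structured query. Its three steps loosely mimic the roles of SchemaCorrector, TimeCorrector and GrammarCorrector but rely on naive string checks, whereas a real implementation would operate on a parsed SQL syntax tree. ParsedQuery, Correctors, the imp_date column and the default date are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical structured form of the parser output that correctors rewrite.
record ParsedQuery(List<String> select, String from, List<String> where, List<String> groupBy) {
    // Render the query back to SQL text.
    String toSql() {
        StringBuilder sb = new StringBuilder("SELECT ").append(String.join(", ", select))
                .append(" FROM ").append(from);
        if (!where.isEmpty()) sb.append(" WHERE ").append(String.join(" AND ", where));
        if (!groupBy.isEmpty()) sb.append(" GROUP BY ").append(String.join(", ", groupBy));
        return sb.toString();
    }
}

class Correctors {
    // Schema-style step: map display names to column names (naive substring replace).
    static UnaryOperator<ParsedQuery> schema(Map<String, String> nameToColumn) {
        return q -> new ParsedQuery(rename(q.select(), nameToColumn), q.from(),
                rename(q.where(), nameToColumn), rename(q.groupBy(), nameToColumn));
    }

    private static List<String> rename(List<String> exprs, Map<String, String> m) {
        List<String> out = new ArrayList<>();
        for (String expr : exprs) {
            String fixed = expr;
            for (Map.Entry<String, String> e : m.entrySet()) fixed = fixed.replace(e.getKey(), e.getValue());
            out.add(fixed);
        }
        return out;
    }

    // Time-style step: add a default date filter when no condition mentions the date column.
    static UnaryOperator<ParsedQuery> time(String dateColumn, String since) {
        return q -> {
            if (q.where().stream().anyMatch(c -> c.contains(dateColumn))) return q;
            List<String> where = new ArrayList<>(q.where());
            where.add(dateColumn + " >= '" + since + "'");
            return new ParsedQuery(q.select(), q.from(), where, q.groupBy());
        };
    }

    // Grammar-style step: if aggregates are selected, group by every plain selected column.
    static UnaryOperator<ParsedQuery> grammar() {
        return q -> {
            boolean hasAggregate = q.select().stream().anyMatch(s -> s.contains("("));
            if (!hasAggregate) return q;
            List<String> groupBy = new ArrayList<>(q.groupBy());
            for (String s : q.select()) {
                if (!s.contains("(") && !groupBy.contains(s)) groupBy.add(s);
            }
            return new ParsedQuery(q.select(), q.from(), q.where(), groupBy);
        };
    }

    // Run the steps in order, feeding each one the previous result.
    static ParsedQuery apply(ParsedQuery q, List<UnaryOperator<ParsedQuery>> chain) {
        ParsedQuery current = q;
        for (UnaryOperator<ParsedQuery> step : chain) current = step.apply(current);
        return current;
    }
}
```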

Example of the rule-generated prompt sent to the LLM:
Prompt { text = "
5. Execute the SQL and fetch the query results
com.tencent.supersonic.chat.server.rest.ChatQueryController#execute
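IV. Customization and Adaptation
Use a local MySQL
Replace the in-memory H2 database with MySQL so that the configuration is not reset on every restart. Create the supersonic_v1 database first, then import the schema:
mysql -u root -p supersonic_v1 < /Users/jiazhengyang3/Desktop/schema-mysql.sql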
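One way to confirm the import worked is to list the tables over JDBC. The s2_system_config table used in the next step should appear among them. This sketch assumes MySQL Connector/J is on the classpath at runtime and that MySQL listens on the default port 3306. MysqlSchemaCheck is a hypothetical helper, and configuring SuperSonic's own datasource to point at MySQL is outside its scope.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: list the tables of the imported database over JDBC.
class MysqlSchemaCheck {
    // JDBC URL for MySQL Connector/J.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Without a matching driver on the classpath, DriverManager throws SQLException.
    static List<String> listTables(String url, String user, String password) throws SQLException {
        List<String> tables = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet rs = meta.getTables(conn.getCatalog(), null, "%", new String[] {"TABLE"})) {
                while (rs.next()) {
                    tables.add(rs.getString("TABLE_NAME"));
                }
            }
        }
        return tables;
    }
}
```

For example, `listTables(jdbcUrl("localhost", 3306, "supersonic_v1"), "root", password).contains("s2_system_config")` should be true after a successful import.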
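Switch the embedding model
The embedding model can be edited and saved in the web UI, or set directly in the parameters field of the s2_system_config table in MySQL. Then call the debug endpoint /api/semantic/knowledge/meta/embedding/reload to verify that reloading the vector store actually uses the newly configured model mofanke/dmeta-embedding-zh:latest. The resulting parameters value: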
[{"candidateValues":["OPEN_AI","AZURE","OLLAMA","QIANFAN","ZHIPU","LOCAL_AI","DASHSCOPE"],"comment":"接口协议","dataType":"list","defaultValue":"OPEN_AI","description":"","module":"对话模型配置","name":"s2.chat.model.provider","value":"OLLAMA"},{"comment":"BaseUrl","dataType":"string","defaultValue":"https://api.openai.com/v1","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"OPEN_AI":"https://api.openai.com/v1","AZURE":"https://your-resource-name.openai.azure.com/","OLLAMA":"http://localhost:11434","QIANFAN":"https://aip.baidubce.com","ZHIPU":"https://open.bigmodel.cn/","LOCAL_AI":"http://localhost:8080","DASHSCOPE":"https://dashscope.aliyuncs.com/api/v1"},"show":{"includesValue":["OPEN_AI","AZURE","OLLAMA","QIANFAN","ZHIPU","LOCAL_AI","DASHSCOPE"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.base.url","value":"http://localhost:11434"},{"comment":"Endpoint","dataType":"string","defaultValue":"llama_2_70b","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"QIANFAN":"llama_2_70b"},"show":{"includesValue":["QIANFAN"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.endpoint","value":"llama_2_70b"},{"comment":"ApiKey","dataType":"password","defaultValue":"demo","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"OPEN_AI":"demo","QIANFAN":"demo","ZHIPU":"demo","LOCAL_AI":"demo","AZURE":"demo","DASHSCOPE":"demo"},"show":{"includesValue":["OPEN_AI","QIANFAN","ZHIPU","LOCAL_AI","AZURE","DASHSCOPE"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.api.key","value":"demo"},{"comment":"SecretKey","dataType":"password","defaultValue":"demo","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"QIANFAN":"demo"},"show":{"includesValue":["QIANFAN"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.secretKey","value":"demo"},{"comment":"ModelName","dataType":"string","defaultValue":"gpt-3.5-turbo","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"OPEN_AI":"gpt-3.5-turbo","OLLAMA":"qwen:0.5b","QIANFAN":"Llama-2-70b-chat","ZHIPU":"glm-4","LOCAL_AI":"ggml-gpt4all-j","AZURE":"gpt-35-turbo","DASHSCOPE":"qwen-plus"},"show":{"includesValue":["OPEN_AI","AZURE","OLLAMA","QIANFAN","ZHIPU","LOCAL_AI","DASHSCOPE"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.name","value":"qwen:0.5b"},{"comment":"是否启用搜索增强功能,设为false表示不启用","dataType":"bool","defaultValue":"false","dependencies":[{"name":"s2.chat.model.provider","setDefaultValue":{"DASHSCOPE":"false"},"show":{"includesValue":["DASHSCOPE"]}}],"description":"","module":"对话模型配置","name":"s2.chat.model.enableSearch","value":"false"},{"comment":"Temperature","dataType":"slider","defaultValue":"0.0","description":"","module":"对话模型配置","name":"s2.chat.model.temperature","value":"0.0"},{"comment":"超时时间(秒)","dataType":"number","defaultValue":"60","description":"","module":"对话模型配置","name":"s2.chat.model.timeout","value":"60"},{"candidateValues":["IN_MEMORY","OPEN_AI","OLLAMA","AZURE","DASHSCOPE","QIANFAN","ZHIPU"],"comment":"接口协议","dataType":"list","defaultValue":"IN_MEMORY","description":"","module":"向量模型配置","name":"s2.embedding.model.provider","value":"OLLAMA"},{"comment":"BaseUrl","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"OPEN_AI":"https://api.openai.com/v1","OLLAMA":"http://localhost:11434","AZURE":"https://your-resource-name.openai.azure.com/","DASHSCOPE":"https://dashscope.aliyuncs.com/api/v1","QIANFAN":"https://aip.baidubce.com","ZHIPU":"https://open.bigmodel.cn/"},"show":{"includesValue":["OPEN_AI","OLLAMA","AZURE","DASHSCOPE","QIANFAN","ZHIPU"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.base.url","value":"http://127.0.0.1:11434"},{"comment":"ApiKey","dataType":"password","defaultValue":"","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"OPEN_AI":"demo","AZURE":"demo","DASHSCOPE":"demo","QIANFAN":"demo","ZHIPU":"demo"},"show":{"includesValue":["OPEN_AI","AZURE","DASHSCOPE","QIANFAN","ZHIPU"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.api.key","value":""},{"comment":"SecretKey","dataType":"password","defaultValue":"demo","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"QIANFAN":"demo"},"show":{"includesValue":["QIANFAN"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.secretKey","value":"demo"},{"comment":"ModelName","dataType":"string","defaultValue":"bge-small-zh","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"IN_MEMORY":"bge-small-zh","OPEN_AI":"text-embedding-ada-002","OLLAMA":"all-minilm","AZURE":"text-embedding-ada-002","DASHSCOPE":"text-embedding-v2","QIANFAN":"Embedding-V1","ZHIPU":"embedding-2"},"show":{"includesValue":["IN_MEMORY","OPEN_AI","OLLAMA","AZURE","DASHSCOPE","QIANFAN","ZHIPU"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.name","value":"mofanke/dmeta-embedding-zh:latest"},{"comment":"模型路径","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"IN_MEMORY":""},"show":{"includesValue":["IN_MEMORY"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.path","value":""},{"comment":"词汇表路径","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.model.provider","setDefaultValue":{"IN_MEMORY":""},"show":{"includesValue":["IN_MEMORY"]}}],"description":"","module":"向量模型配置","name":"s2.embedding.model.vocabulary.path","value":""},{"candidateValues":["IN_MEMORY","MILVUS","CHROMA"],"comment":"向量库类型","dataType":"list","defaultValue":"IN_MEMORY","description":"目前支持三种类型:IN_MEMORY、MILVUS、CHROMA","module":"向量库配置","name":"s2.embedding.store.provider","value":"IN_MEMORY"},{"comment":"BaseUrl","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.store.provider","setDefaultValue":{"MILVUS":"http://localhost:19530","CHROMA":"http://localhost:8000"},"show":{"includesValue":["MILVUS","CHROMA"]}}],"description":"","module":"向量库配置","name":"s2.embedding.store.base.url","value":""},{"comment":"ApiKey","dataType":"password","defaultValue":"","dependencies":[{"name":"s2.embedding.store.provider","setDefaultValue":{"MILVUS":"demo"},"show":{"includesValue":["MILVUS"]}}],"description":"","module":"向量库配置","name":"s2.embedding.store.api.key","value":""},{"comment":"DatabaseName","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.store.provider","setDefaultValue":{"MILVUS":""},"show":{"includesValue":["MILVUS"]}}],"description":"","module":"向量库配置","name":"s2.embedding.store.databaseName","value":""},{"comment":"持久化路径","dataType":"string","defaultValue":"","dependencies":[{"name":"s2.embedding.store.provider","setDefaultValue":{"IN_MEMORY":""},"show":{"includesValue":["IN_MEMORY"]}}],"description":"默认不持久化,如需持久化请填写持久化路径。注意:如果变更了向量模型需删除该路径下已保存的文件或修改持久化路径","module":"向量库配置","name":"s2.embedding.store.persist.path","value":""},{"comment":"超时时间(秒)","dataType":"number","defaultValue":"60","description":"","module":"向量库配置","name":"s2.embedding.store.timeout","value":"60"},{"comment":"纬度","dataType":"number","defaultValue":"","dependencies":[{"name":"s2.embedding.store.provider","setDefaultValue":{"MILVUS":"384"},"show":{"includesValue":["MILVUS"]}}],"description":"","module":"向量库配置","name":"s2.embedding.store.dimension","value":""},{"comment":"是否将Mapper探测识别到的维度值提供给大模型","dataType":"bool","defaultValue":"true","description":"为了数据安全考虑, 这里可进行开关选择","module":"Parser相关配置","name":"s2.parser.linking.value.enable","value":"true"},{"comment":"few-shot样例个数","dataType":"number","defaultValue":"3","description":"样例越多效果可能越好,但token消耗越大","module":"Parser相关配置","name":"s2.parser.few-shot.number","value":"5"},{"comment":"self-consistency执行个数","dataType":"number","defaultValue":"1","description":"执行越多效果可能越好,但token消耗越大","module":"Parser相关配置","name":"s2.parser.self-consistency.number","value":"1"},{"comment":"解析结果展示个数","dataType":"number","defaultValue":"3","description":"前端展示的解析个数","module":"Parser相关配置","name":"s2.parser.show.count","value":"3"}]
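The reload endpoint can also be triggered from code. Below is a minimal sketch that uses the JDK HttpClient. The server address (http://localhost:9080), the GET method and the lack of authentication are all assumptions, so adjust them to match the controller's actual mapping and your deployment. EmbeddingReloadTrigger is a hypothetical name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical trigger for the embedding reload debug endpoint (method and port assumed).
class EmbeddingReloadTrigger {
    static final String PATH = "/api/semantic/knowledge/meta/embedding/reload";

    // Strips trailing slashes from the server base before appending the path.
    static HttpRequest request(String serverBase) {
        String base = serverBase.replaceAll("/+$", "");
        return HttpRequest.newBuilder(URI.create(base + PATH))
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
    }

    // Returns the HTTP status code; requires the SuperSonic server to be running.
    static int trigger(String serverBase) throws Exception {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(request(serverBase), HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }
}
```

While the reload runs, the server log or a debugger breakpoint in MetaEmbeddingTask#reloadMetaEmbedding can confirm which embedding model is being called.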
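Switch the vector store
I submitted a PR that adds pgvector as a selectable vector store: https://github.com/tencentmusic/supersonic/pull/1800. pgvector can then be chosen in the web UI or, as with the embedding model, set by editing the parameters field of the s2_system_config table in MySQL.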
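For reference, a pgvector nearest-neighbour lookup orders rows by a distance operator. Here `<=>` is pgvector's cosine-distance operator, and vectors use the text form [x1,x2,...]. The table and column names (s2_embedding, id, content, embedding) are hypothetical and not taken from the PR. The query is meant to run as a PreparedStatement, with the vector literal bound to the placeholder.

```java
import java.util.StringJoiner;

// Hypothetical helpers for querying a pgvector-backed table.
class PgVectorQuery {
    // pgvector text format: [x1,x2,...]
    static String toVectorLiteral(float[] v) {
        StringJoiner j = new StringJoiner(",", "[", "]");
        for (float x : v) j.add(Float.toString(x));
        return j.toString();
    }

    // '<=>' is cosine distance: smaller means more similar. `table` must be a trusted identifier.
    static String nearestSql(String table, int k) {
        return "SELECT id, content, embedding <=> CAST(? AS vector) AS distance FROM " + table
                + " ORDER BY distance LIMIT " + k;
    }
}
```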

Switch to a big-data database
Deploy a single-node Doris on the server with Docker:
docker run -d -it --name=doris -p 9030:9030 -p 8030:8030 apache/doris:build-env-ldb-toolchain-latest /bin/bash
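In Doris, 9030 is the FE's MySQL-protocol query port and 8030 is its HTTP port, which is why the command maps both. Below is a small reachability check for those mapped ports. PortCheck is a hypothetical helper, and a successful TCP connect only shows that the port is open, not that Doris is fully started.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: is a TCP port accepting connections?
class PortCheck {
    static boolean reachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }
}
```

Typical use would be `PortCheck.reachable("127.0.0.1", 9030, 1000)` and the same call for 8030 after the container starts.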