Best Practices for Building Full-Pipeline LLM Services with PAI Lingjun Intelligent Computing.pdf

Resource description

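PAI LLM

Contents: 01, 02, 03, 04, 05

01
[Slide figure; recoverable labels: XLAB, XPS, PAI-Tensorflow, PAI-PyTorch, PAI-Studio, DLC, DSW, EAS, NLP/CV, 100-billion-parameter models, ODL, M6, OFA, Swin-Transformer, PAI, AI, 9, SLA]
Data, training, inference, stability
PAI: a one-stop intelligent computing platform for the full LLM pipeline

02
- Data Deduplication from Google (2022/03)
- Text Deduplication from BigCode (2023/05)
- The RefinedWeb for Falcon LLM (2023/06)
Higher-quality text input yields better large language models.
Tools: jieba, MinHash, MinHashLSH
[Slide figure; recoverable labels: graph nodes G, A, B; "Power law"; 10]
Distributed union find:
1. join
2. example of a graph connected-components algorithm

Implementation          Samples       Duplicate rate   Time        Precision   Recall   F1
PAI                     500 million   50%              1h 34min    87          99       93
Other implementation    500 million   50%              4h 10min    85          92       90
PAI                     1 billion     50%              3h 0min     82          99       90
Other implementation    1 billion     50%              6h 54min    80          90       85

03
TorchAccelerator
- A general framework that dispatches operators to new backends (AICompiler) while providing a new Tensor expression that can be swapped in under eager mode.
- An AI compiler that uses advanced optimization techniques to support high-performance code generation.
- Supports FSDP, TP, and other distributed strategies.
Scheduling is built on the Kube Scheduler Framework and is aware of the ASW/DSW/PSW network layers. Choosing a placement that fits the network architecture releases more of the potential of the high-performance network.

04
LLM inference on EAS: OPT/GPT/Bloom/GLM *
Model compression: weight quantization, activation quantization, KV Cache quantization
System optimization: compiler optimization, high-performance operator library
Distributed execution: tensor parallelism, pipeline parallelism
Hardware: Nvidia GPU, AMD GPU
Modeling: high-performance implementations of mainstream models, full compatibility with open-source models
[Slide figure: OPT-66B, number of GPUs (axis 0 to 4) on A100 (80GB), V100 (32GB), and A10 (24GB) for fp16, int8, and int4]
[Slide figure: OPT-66B perplexity (axis 0 to 12) on wikitext2, ptb, and c4 for fp16, int8, and int4]
Serving throughput improves by 1.7 to 3.8x.
First-token latency drops by 8.7 to 13.8x.
BladeLLM workflow: Model weights/config, Compression, Compiling, Serving (labels: User, Platform)

05
High-performance Lingjun clusters pose very challenging stability problems: ECC Error, NCCL Timeout, NCCL Hang, PCIe slowdown, NVLink Error.
AIMaster and EasyCKPT: AIMaster covers hangs, EasyCKPT covers checkpoints.
EasyCKPT: multi-level storage and asynchronous parallel saving. At its fastest it saves a checkpoint within seconds, which greatly reduces wasted compute.
Serverless PAI
PAI: a one-stop intelligent computing platform for the full LLM pipeline

THANKS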
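The deduplication flow named in section 02 (tokenize, MinHash, MinHashLSH, then union find over the candidate pairs to obtain connected components) can be illustrated with a minimal single-process sketch. This is not PAI's implementation: character 3-grams stand in for jieba tokens, an in-memory union-find stands in for the distributed one, and `NUM_PERM`, `BANDS`, and every function name below are hypothetical choices made for illustration only.

```python
import hashlib
from collections import defaultdict

NUM_PERM = 64   # MinHash signature length (assumed value)
BANDS = 16      # LSH bands; each band covers NUM_PERM // BANDS rows (assumed)


def shingles(text, n=3):
    """Character n-grams with whitespace removed; a stand-in for jieba tokens."""
    text = "".join(text.split())
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}


def minhash(tokens):
    """Signature = the minimum hash of the token set under each seeded hash."""
    sig = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(t.encode("utf-8"), digest_size=8,
                                salt=salt).digest(), "little")
            for t in tokens))
    return sig


class UnionFind:
    """In-memory union-find; the slides describe a distributed variant."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smallest index is root


def dedup_clusters(docs):
    """Return groups of document indices that LSH marks as near-duplicates."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(list)
    for idx, doc in enumerate(docs):
        sig = minhash(shingles(doc))
        for b in range(BANDS):
            # Documents sharing one whole band become candidate duplicates.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)
    uf = UnionFind(len(docs))
    for members in buckets.values():
        for other in members[1:]:
            uf.union(members[0], other)
    clusters = defaultdict(list)
    for idx in range(len(docs)):
        clusters[uf.find(idx)].append(idx)  # connected components
    return sorted(clusters.values())
```

Keeping one document per returned cluster gives the deduplicated set. Because LSH is probabilistic, borderline near-duplicates may or may not be grouped; the band and row counts trade precision against recall.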
