About RLHF need · InternLM/xtuner

3 comments · 1 reaction · 1 assignee · Python · 424 forks

Labels: feature request, good first issue

Repository metrics

Stars: 5,148
PR merge metrics: average merge time 2d 22h; 65 PRs merged in the last 30 days

Description

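Several alignment algorithms need to be implemented:

1. PPO. The obvious choice: established and general-purpose, but with a higher training cost.
2. RAFT. The LMFlow community has an implementation: https://optimalscale.github.io/LMFlow/examples/raft.html
3. RRTF (Rank Responses to align Test&Teacher Feedback) from PanGu-Coder2. In short, they run unit tests on the generated code and merge the test results, as labels, into the loss used to fine-tune the LLM: https://arxiv.org/abs/2307.14936

Huawei has not open-sourced the RRTF part, whereas RAFT is open source. If possible, we could discuss RRTF and implement it together.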
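To make the RAFT option concrete, here is a minimal sketch of its data-selection step as I understand it: sample several responses per prompt, score them with a reward function, and keep only the best ones for ordinary supervised fine-tuning. The names `raft_select`, `toy_generate`, and `toy_reward` are hypothetical and not part of xtuner or LMFlow; a real setup would plug in model sampling and a trained reward model instead of the toy stand-ins.

```python
from itertools import cycle
from typing import Callable, List, Sequence, Tuple


def raft_select(
    prompts: Sequence[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    k: int = 4,
    keep: int = 1,
) -> List[Tuple[str, str]]:
    """Sample k responses per prompt and keep the `keep` highest-reward ones."""
    selected: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        # Rank candidates from highest to lowest reward.
        ranked = sorted(candidates, key=lambda r: reward(prompt, r), reverse=True)
        selected.extend((prompt, r) for r in ranked[:keep])
    return selected


# Toy stand-ins (assumptions for illustration only).
_canned = cycle(["ok", "a longer answer", "mid one", ""])


def toy_generate(prompt: str) -> str:
    # Pretend to sample from a model by cycling through canned strings.
    return next(_canned)


def toy_reward(prompt: str, response: str) -> float:
    # Pretend reward: longer responses score higher.
    return float(len(response))


batch = raft_select(["q1", "q2"], toy_generate, toy_reward, k=4, keep=1)
```

The resulting `batch` of (prompt, response) pairs would then be fed to a standard supervised fine-tuning loop, and the sample/select/fine-tune cycle repeated.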

Contributor guide

Research direction: Study the referenced algorithms (PPO, RAFT, RRTF) and find open source implementations. For RRTF, analyze the paper and discuss with the community. Consider starting with RAFT as it has an open source reference.
Tech stack: python
Domain: machine learning
Issue type: Feature
Difficulty: 4
Estimated time: Over 1 week
Activity status: Active
Clarity: Clear
Prerequisites: Python
Newbie friendliness: 20
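
For the RRTF direction mentioned in the research note above, a starting point for discussion could be a pairwise ranking loss driven by unit-test scores. The sketch below is an assumption about one plausible form, not the exact loss from the PanGu-Coder2 paper: each candidate response gets a score (for example, the fraction of unit tests passed) and a sequence log-probability under the model, and the loss penalizes pairs where a lower-scored response is more likely than a higher-scored one. The function name `pairwise_rank_loss` is hypothetical.

```python
from typing import Sequence


def pairwise_rank_loss(scores: Sequence[float], logprobs: Sequence[float]) -> float:
    """Penalize pairs where the worse-scored response has the higher log-probability.

    scores:   test-based score per response (higher is better), an assumed input.
    logprobs: model log-probability per response, in the same order.
    """
    if len(scores) != len(logprobs):
        raise ValueError("scores and logprobs must have the same length")
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if scores[i] > scores[j]:
                # Response i should be more likely than response j;
                # weight the violation by the score gap.
                loss += (scores[i] - scores[j]) * max(0.0, logprobs[j] - logprobs[i])
    return loss


# Response 0 passes all tests but is less likely than response 1, which fails.
misordered = pairwise_rank_loss([1.0, 0.0], [-2.0, -1.0])
# Same scores, but the model already prefers the passing response.
ordered = pairwise_rank_loss([1.0, 0.0], [-1.0, -2.0])
```

In a real training loop this term would be computed on tensors with autograd and, presumably, combined with a standard fine-tuning loss; the exact combination should be checked against the paper.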
