Skywork-Reward-V2, a newly open-sourced series of reward models, blazes a trail with an innovative human–machine approach to building preference data for the open-source community.
In a groundbreaking development, a new collaborative data pipeline has significantly improved the quality and performance of open-source reward models. This pipeline, described as "human–machine collaboration, two-stage iteration," was used to create Skywork-SynPref-40M, the largest hybrid preference dataset to date, containing 40 million preference pairs.
The pipeline begins with Stage 1: Human-Guided Small-Scale High-Quality Preference Construction. This stage starts from an unverified initial preference pool, with Large Language Models (LLMs) generating auxiliary preference-related attributes such as task type, objectivity, and controversy. Human annotators then apply a strict verification protocol, assisted by external tools and advanced LLMs, to review a subset of the data. The result is a small but high-quality "gold standard" dataset that serves as a reliable foundation for subsequent data generation and model evaluation.
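As a rough illustration, Stage 1 can be thought of as attaching LLM-generated attributes to each preference pair and routing a subset through human verification. The sketch below is only a schematic of that flow under stated assumptions: the field names and the `annotate_with_llm`, `llm.classify`, and `review_queue.verify` helpers are hypothetical, not part of the released pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class PreferencePair:
    """One unverified preference sample: a prompt plus a chosen/rejected response pair."""
    prompt: str
    chosen: str
    rejected: str
    # Auxiliary attributes filled in by an LLM (e.g., task type, objectivity, controversy).
    attributes: dict = field(default_factory=dict)
    human_verified: bool = False

def annotate_with_llm(pair: PreferencePair, llm) -> PreferencePair:
    """Ask an LLM to tag the pair with preference-related attributes (hypothetical interface)."""
    pair.attributes = llm.classify(
        prompt=pair.prompt,
        responses=(pair.chosen, pair.rejected),
        labels=["task_type", "objectivity", "controversy"],
    )
    return pair

def build_gold_standard(pool, llm, review_queue):
    """Stage 1 sketch: LLM-assisted attribution, then strict human verification of a subset."""
    gold = []
    for pair in pool:
        pair = annotate_with_llm(pair, llm)
        # Only pairs that pass the human verification protocol enter the gold-standard set.
        if review_queue.verify(pair):  # human annotators, aided by tools and stronger LLMs
            pair.human_verified = True
            gold.append(pair)
    return gold
```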
Stage 2: Large-Scale Data Expansion and Iterative Optimization follows. Guided by the gold-standard preference labels, LLMs generate a large volume of high-quality "silver standard" data, substantially expanding the scale of the dataset. Multiple rounds of iteration follow: in each round, a reward model is trained and its weaknesses are identified from its performance on the gold-standard data; similar samples are then retrieved and automatically annotated via multi-model consensus, further expanding and enriching the silver-standard dataset. This iterative human–machine closed loop continuously enhances the reward model's ability to understand and discriminate nuanced human preferences.
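A minimal sketch of the Stage 2 loop is shown below, assuming hypothetical helpers for training (`train_rm`), error analysis (`rm.ranks_correctly`), retrieval (`retriever.similar`), and multi-model voting (`m.prefers_chosen`); it mirrors only the structure described above, not the released implementation.

```python
def consensus_label(pair, judge_models, threshold=0.8):
    """Keep a candidate pair only if enough judge models agree on which response is preferred."""
    votes = [m.prefers_chosen(pair) for m in judge_models]
    agreement = sum(votes) / len(votes)
    if agreement >= threshold or agreement <= 1 - threshold:
        pair.label = "chosen" if agreement >= threshold else "rejected"
        return pair
    return None  # no consensus: discard rather than add noisy silver data

def expand_silver(gold, unlabeled_pool, judge_models, retriever, train_rm, n_rounds=3):
    """Stage 2 sketch: iteratively train the reward model and grow the silver-standard set."""
    silver = []
    for _ in range(n_rounds):
        rm = train_rm(gold + silver)
        # Find gold-standard samples the current reward model still ranks incorrectly.
        hard_cases = [p for p in gold if not rm.ranks_correctly(p)]
        # Retrieve similar unlabeled samples and label them by multi-model consensus.
        for case in hard_cases:
            for candidate in retriever.similar(case, unlabeled_pool):
                labeled = consensus_label(candidate, judge_models)
                if labeled is not None:
                    silver.append(labeled)
    return silver
```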
The impact of this pipeline is profound. It addresses the fragility of reward models caused by limitations in existing datasets, such as limited coverage, mechanical labeling, and insufficient quality control. By combining strict human verification with scalable machine augmentation and iterative refinement, the pipeline ensures large-scale data that is both diverse and of high quality. Consequently, reward models trained on Skywork-SynPref-40M show improved robustness and effectiveness in modeling complex, nuanced human preferences, especially in open-ended and subjective tasks.
The Skywork-Reward-V2 series, consisting of eight reward models built on the Qwen3 and LLaMA3 series with parameter counts ranging from 600 million to 8 billion, has benefited greatly from this pipeline. The smallest model, Skywork-Reward-V2-Qwen3-0.6B, nearly matches the average overall performance of the previous generation's strongest model. Remarkably, an 8B-scale model trained on only 1.8% of the high-quality data in Skywork-SynPref-40M already exceeds the performance of current 70B-level SOTA reward models.
The Skywork-Reward-V2 series has achieved top rankings across seven major mainstream reward model evaluation benchmarks. The technical report for Skywork-Reward-V2 is available at arxiv.org/abs/2507.01352.
Download links for Skywork-Reward-V2 are available on HuggingFace and GitHub.
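For readers who want to try the models, the snippet below shows one plausible way to score a chat response with a Skywork-Reward-V2 checkpoint through Hugging Face transformers. The repository id, chat-template usage, and sequence-classification head are assumptions rather than confirmed details; consult the official model card on HuggingFace for the supported usage.

```python
# Hedged example: scoring a response with a sequence-classification reward head.
# The repo id and chat-template usage are assumptions; follow the official model card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Skywork/Skywork-Reward-V2-Qwen3-0.6B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)

conversation = [
    {"role": "user", "content": "Explain why the sky is blue."},
    {"role": "assistant", "content": "Sunlight scatters off air molecules; shorter blue wavelengths scatter the most."},
]
# Assumes the tokenizer ships a chat template, as is common for chat-tuned reward models.
inputs = tokenizer.apply_chat_template(conversation, tokenize=True, return_tensors="pt")
with torch.no_grad():
    score = model(inputs).logits[0].item()  # higher score = more preferred response
print(f"reward score: {score:.3f}")
```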
Cloud-based data infrastructure was used to store and process the large-scale preference data generated by the pipeline, keeping the two-stage process efficient at the scale of tens of millions of samples. It is this combination of AI-driven annotation and human verification that ultimately yields reward models robust and effective enough to capture complex human preferences.