A Unified Framework for NLP Tasks by ReLabel Method
Abstract
In industrial deep learning applications, datasets contain a certain amount of noisy data. The initial datasets come from human labeling, LLM (large language model) generation, or user behavior logs. To address this problem and push the dev-set score above 90, we present a framework that finds the noisy data and relabels it, providing the model predictions as references during relabeling. The relabeling can be done manually or by an LLM annotator. In this paper, we illustrate our idea on a broad set of deep learning tasks, including classification, sequence tagging, object detection, sequence generation, and click-through rate prediction. The dev-set evaluation results and human evaluation results verify our idea.
Keywords
NLP, LLM
1. Introduction
In recent years, deep learning \cite{ref1} and LLMs \cite{ref2} have brought significant improvements to natural language processing (NLP), computer vision, and speech processing. However, model performance is limited by dataset quality, chiefly because the dataset contains a certain amount of noisy data. In this paper, we present a framework to find and relabel the noisy data, and we then illustrate the idea on sequence tagging, object detection, sequence generation, and click-through rate (CTR) prediction.
2. Method

2.1 Initial Datasets
Our initial datasets can be sourced from the following three methods:
1) Manual Annotation: In a manually annotated dataset, taking a classification task as an example, noise arises when annotators disagree. For instance, for three very similar examples to be labeled, two annotators may assign label-A while one assigns label-B.
2) LLM Generation: For datasets generated by an LLM, data noise in a classification task often stems from overlapping or redundant label definitions within the prompts.
3) User Behavior Logs: Datasets based on user behavior logs are constructed from user actions. For example, in an e-commerce scenario, a dataset can be built from whether a user clicks on an item or places an order, as sketched after this list.
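As a concrete illustration of the third source, the following minimal sketch turns raw e-commerce behavior logs into binary labels. The log schema (user_id, item_id, clicked, ordered) is an assumption for illustration, not a schema prescribed by our framework.

from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class LogEvent:
    # Assumed log schema, for illustration only.
    user_id: str
    item_id: str
    clicked: bool
    ordered: bool

@dataclass
class Example:
    user_id: str
    item_id: str
    label: int  # 1 = clicked or ordered, 0 = impression only

def build_ctr_dataset(events: Iterable[LogEvent]) -> List[Example]:
    # Turn raw impression logs into binary CTR training labels.
    return [Example(e.user_id, e.item_id, int(e.clicked or e.ordered))
            for e in events]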
2.2 Find Noisy Data
We first train a model on the initial dataset. Then we use this model to generate predictions for the entire training set. Examples where the model's prediction differs from the original ground-truth label, or where the prediction error is large, are flagged as potential noise; a minimal sketch of this step follows. This method allows us to flag approximately 5-15% of the data for re-annotation. The approach not only reduces manual annotation costs, but its effectiveness in identifying noisy data has also been validated by our experimental results.
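The filtering step can be implemented as below. This is only a sketch for classification, assuming a scikit-learn-style classifier that exposes predict_proba and an illustrative confidence threshold of 0.5; neither choice is prescribed by the framework.

import numpy as np

def find_noisy_indices(model, X, y, threshold=0.5):
    # Flag examples whose prediction disagrees with the stored label,
    # or whose predicted probability for that label is low.
    # y is assumed to be an integer array of class indices.
    proba = model.predict_proba(X)            # shape (n_samples, n_classes)
    pred = proba.argmax(axis=1)
    label_conf = proba[np.arange(len(y)), y]  # confidence in the stored label
    suspicious = (pred != y) | (label_conf < threshold)
    return np.flatnonzero(suspicious)         # indices to send for relabeling

The returned indices form the re-annotation queue used in Section 2.3.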
2.3 Relabel Step
We perform manual re-annotation of the noisy data, providing the human annotators with both the original label and the model's prediction as reference information. In the era of LLMs, this manual re-annotation can be replaced by an automated LLM process that receives the same inputs: the original label and the model's prediction. A sketch of this step follows.
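The LLM re-annotation step might look like the sketch below. Here call_llm stands in for whatever completion API is available, and the prompt wording is illustrative rather than an exact prompt; both are assumptions, not part of the framework itself.

PROMPT_TEMPLATE = """You are a data annotator for a text classification task.
Text: {text}
Original label: {original_label}
Model prediction: {model_prediction}
Choose the correct label from: {label_set}. Answer with the label only."""

def relabel_with_llm(call_llm, text, original_label, model_prediction, label_set):
    # Ask an LLM to re-annotate one flagged example, giving it both the
    # original label and the model's prediction as references.
    prompt = PROMPT_TEMPLATE.format(
        text=text,
        original_label=original_label,
        model_prediction=model_prediction,
        label_set=", ".join(label_set),
    )
    return call_llm(prompt).strip()

Keeping both the original label and the model's prediction in the prompt lets the LLM arbitrate between the two candidates instead of labeling from scratch.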
3. Experimental Results
4. Discussion
We find noisy data by contrasting original labels with model predictions.
To correct noisy labels, an LLM can be employed to relabel the data, thereby reducing the scope of manual annotation. Nevertheless, human annotation remains indispensable, since relying purely on an LLM to relabel would impose a performance ceiling set by the LLM itself.
5. Conclusion
In the era of LLMs, our goal is to train models for NLP tasks. To correct the noise in our initial dataset, we propose a framework that supports both a human-in-the-loop (HITL) and an LLM-in-the-loop (LITL) approach. Experimental results validate the effectiveness of our method.
References
\bibitem{ref1}
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
\bibitem{ref2}
Achiam J, Adler S, Agarwal S, et al. GPT-4 technical report[J]. arXiv preprint arXiv:2303.08774, 2023.