{"id":1499,"date":"2022-07-28T09:01:32","date_gmt":"2022-07-28T09:01:32","guid":{"rendered":"https:\/\/tagon.ai\/data-labeling-of-images-for-supervised-learning\/"},"modified":"2022-07-28T09:01:32","modified_gmt":"2022-07-28T09:01:32","slug":"data-labeling-of-images-for-supervised-learning","status":"publish","type":"post","link":"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/","title":{"rendered":"Data Labeling of Images for Supervised Learning"},"content":{"rendered":"<p>Successful machine learning applications depend on high-quality tags. In the field of Automated Visual Inspection (AVI), labels are human-provided cues to teach models:<\/p>\n<ul>\n<li>How to identify a specific class of defects of interest<\/li>\n<li>How to highlight the defect area<\/li>\n<\/ul>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_42 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" area-label=\"ez-toc-toggle-icon-1\"><label for=\"item-69f8ee7a30ac8\" aria-label=\"Table of Content\"><span style=\"display: flex;align-items: center;width: 35px;height: 30px;justify-content: center;direction:ltr;\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 
.5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/label><input  type=\"checkbox\" id=\"item-69f8ee7a30ac8\"><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#What_is_data_labeling\" title=\"What is data labeling? \">What is data labeling? <\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Challenges_to_Data_Labeling_in_Computer_Vision\" title=\"Challenges to Data Labeling in Computer Vision\">Challenges to Data Labeling in Computer Vision<\/a><ul class='ez-toc-list-level-3'><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Few_Defective_Sample_Images\" title=\"Few Defective Sample Images\">Few Defective Sample Images<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Inconsistent_Labeling_of_Image_Data\" title=\"Inconsistent Labeling of Image Data\">Inconsistent Labeling of Image Data<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Obtaining_Consistency_in_Data_Labeling\" title=\"Obtaining Consistency in Data Labeling\">Obtaining Consistency in Data Labeling<\/a><ul class='ez-toc-list-level-3'><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" 
href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#1_Create_a_Defect_book\" title=\"1. Create a Defect book\">1. Create a Defect book<\/a><ul class='ez-toc-list-level-4'><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Document_the_Projects_Context_and_Terminologies\" title=\"Document the Project\u2019s Context and Terminologies\">Document the Project\u2019s Context and Terminologies<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Specify_Each_Class_of_Defects\" title=\"Specify Each Class of Defects\">Specify Each Class of Defects<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Provide_Clear_Instruction_on_How_to_Label_Defects\" title=\"Provide Clear Instruction on How to Label Defects\">Provide Clear Instruction on How to Label Defects<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#2_Establish_Defect_Labeling_Consensus\" title=\"2. Establish Defect Labeling Consensus\">2. Establish Defect Labeling Consensus<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#3_Review_Data_Labeling_for_Quality_Assurance\" title=\"3. Review Data Labeling for Quality Assurance\">3. 
Review Data Labeling for Quality Assurance<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/tagon.vn\/vi\/data-labeling-of-images-for-supervised-learning\/#Successful_ML_Projects_Formalize_Data_Labeling\" title=\"Successful ML Projects Formalize Data Labeling\">Successful ML Projects Formalize Data Labeling<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"What_is_data_labeling\"><\/span><strong>What is data labeling? <\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Data labeling is the process of manually annotating content with labels or tags. We refer to the people who add these tags as annotators. In computer vision, the labels identify elements within an image. The annotated data is then used for supervised learning, where the labeled dataset teaches the model through examples. Data labeling is critical to the success of machine learning models, and label errors can lower model performance.<\/p>\n<p>Through labeling, we aim to distill the knowledge of subject matter experts (SMEs) with decades of experience into machine learning models. These models can be replicated across tens or hundreds of production lines to support large-scale visual inspection. The better the signals from human SMEs, the more accurate the resulting model will be. Depending on the application, there are different types of data labeling. In an object detection task, we are interested not only in the class of the target objects but also in their locations, so we draw bounding boxes around the target objects in the images. There are also image classification, semantic segmentation, and instance segmentation tasks. 
We label with classes, segmentation maps, and instance segmentation maps, as shown in the image below.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed-7.png\" \/><\/p>\n<p>Machine learning engineers (MLEs) team up with annotators to create labels for their datasets. To help labelers perform the labeling tasks accurately, MLEs set up a labeling book that gives a precise description of the target classes and detailed guidance on how to draw labels on images. In the automated visual inspection (AVI) domain, the labeling book is also called the defect book.<\/p>\n<div class=\"et_pb_module et_pb_text et_pb_text_3 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<h2><span class=\"ez-toc-section\" id=\"Challenges_to_Data_Labeling_in_Computer_Vision\"><\/span><strong>Challenges to Data Labeling in Computer Vision<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>From our past experience, we observed two key challenges in labeling for AVI:<\/p>\n<ol>\n<li>The number of defective samples is relatively small compared to research datasets like\u00a0<a href=\"http:\/\/image-net.org\/about-overview\" target=\"_blank\" rel=\"noopener noreferrer\">ImageNet<\/a>\u00a0and\u00a0<a href=\"https:\/\/cocodataset.org\/#home\" target=\"_blank\" rel=\"noopener noreferrer\" data-et-has-event-already=\"true\">COCO<\/a>.<\/li>\n<li>SMEs\u2019 judgments on defective samples are not consistent.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<div class=\"et_pb_module et_pb_text et_pb_text_4 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<h3><span class=\"ez-toc-section\" id=\"Few_Defective_Sample_Images\"><\/span><strong>Few Defective Sample Images<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Modern quality control methods have reduced the defect rate on inspection lines to less than 1%. 
For some rare types of defects, a defective pattern may occur only once in a million. As a result, only a small number of unique samples per defect class can be collected for model iterations.<\/p>\n<p>Defect classes in manufacturing are not commonly seen in daily life. A defect may be defined as a \u201c3 cm long gap\u201d or a \u201chair-like scratch at the top left corner reflecting light\u201d. Such defects are much more difficult to label than a cat, a dog, or a motorcycle. Typically, SMEs take months or years to develop the heuristics they use to detect such failures on production lines.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Inconsistent_Labeling_of_Image_Data\"><\/span>Inconsistent Labeling of Image Data<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Given the same defective pattern, different SMEs may have different opinions about the type of defect present in the image. In addition, the same SME may judge differently depending on the day or time.<\/p>\n<p>Typically, deep learning research teams handle such inconsistencies by collecting large numbers of samples from a large labeling team, so that misalignments are averaged out over the large dataset.<\/p>\n<p>However, as mentioned in the first challenge, our sample dataset is very limited, and training a large team of SMEs is too time-consuming and expensive. So we need another way to remove the inconsistencies.<\/p>\n<p>Below is a sample image showing clear vs. 
ambiguous visual inspection defects.<\/p>\n<div style=\"width: 463px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed-9.png\" alt=\"\" width=\"453\" height=\"255\" \/><p class=\"wp-caption-text\">Credit: image is from the Batch newsletter https:\/\/www.deeplearning.ai\/the-batch\/issue-65\/<\/p><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Obtaining_Consistency_in_Data_Labeling\"><\/span><strong>Obtaining Consistency in Data Labeling<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We developed a process to solve the two challenges above. It includes three key steps, which we\u2019ll cover in detail.<\/p>\n<ol>\n<li>Create a defect book<\/li>\n<li>Establish Defect Labeling Consensus<\/li>\n<li>Review Data Labeling for Quality Assurance<\/li>\n<\/ol>\n<h3><span class=\"ez-toc-section\" id=\"1_Create_a_Defect_book\"><\/span>1. Create a Defect book<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The defect book contains a list of the most important defects, their clear definitions, and some example images. It serves as a reliable and trusted source of ground truth, giving the reader a precise and unambiguous description of each defect for answering questions like &#8220;Should this area of the image be considered defective?&#8221; or \u201cIs this model prediction correct?\u201d<\/p>\n<p>Based on our past experience, creating an accurate and complete defect book is one of the most important prerequisites for a successful AVI project. Capturing the formal definitions of defects in a defect book lets you quickly train new annotators to label defects correctly.<\/p>\n<p>Creating a defect book means extracting all the heuristics from the minds of experienced SMEs and putting them in writing. Deviations between SME judgment and the defect book lead to labeling errors. 
If the defect book is complete enough, you can train new annotators to quickly reach the SME&#8217;s level of knowledge about those defects.<\/p>\n<p>Below we describe the key elements that go into creating such a book.<\/p>\n<h4><span class=\"ez-toc-section\" id=\"Document_the_Projects_Context_and_Terminologies\"><\/span>Document the Project\u2019s Context and Terminologies<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>First, give an overview of the project background and terminology. In our experience, many first-time practitioners skip describing the background and go straight to listing the defects. However, we have found that a detailed description of the project context and purpose improves communication with the annotators, makes them more aware of the regions of interest, and helps them distinguish critical defects from noise.<\/p>\n<p>Most AVI projects have special foreground and background compositions or domain-specific terminology. Help readers by introducing the key terms and explaining the layout of the image at the beginning of the defect book.<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 50%;\">\n<p><div style=\"width: 357px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed-8.png\" alt=\"\" width=\"347\" height=\"220\" \/><p class=\"wp-caption-text\">Example battery inspection: explain the composition of the image.<\/p><\/div><\/td>\n<td style=\"width: 50%;\">\n<p><div style=\"width: 357px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed-5.png\" alt=\"\" width=\"347\" height=\"274\" \/><p class=\"wp-caption-text\">Example steel surface inspection: explain which area in the image the annotators need to inspect for 
defects.<\/p><\/div><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h4><span class=\"ez-toc-section\" id=\"Specify_Each_Class_of_Defects\"><\/span>Specify Each Class of Defects<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<div class=\"et_pb_module et_pb_text et_pb_text_13 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<p>Each section of the defect book should accurately describe a specific type of defect, including its major visual patterns and where it may appear in an image. We find that providing sample images that represent most variations of the defect, both the common ones and some edge cases, is extremely effective for understanding it.<\/p>\n<p>It is also useful to include some counter-examples: images with similar patterns that are not valid defects. These help labelers correctly tell one class of defects apart from others.<\/p>\n<p>If a defect has a few distinctive appearances, create a separate subsection for each one to avoid confusion.<\/p>\n<\/div>\n<\/div>\n<div class=\"et_pb_module et_pb_text et_pb_text_14 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<h4><span class=\"ez-toc-section\" id=\"Provide_Clear_Instruction_on_How_to_Label_Defects\"><\/span>Provide Clear Instruction on How to Label Defects<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p>We have seen a few customers take this step for granted. They start labeling without defining a clear set of labeling instructions. As a result, the labeling quality is very poor, with large inconsistencies among different annotators. 
This problem can be avoided by defining a clear labeling book at the beginning.<\/p>\n<p>If you are drawing bounding box or segmentation labels, here are some recommended best practices:<\/p>\n<ul>\n<li>Draw labels tightly around the target objects<\/li>\n<\/ul>\n<p>Models are penalized or rewarded based on how well their predictions match the labels at the pixel level. If you leave unnecessary margins between the labels and the objects, you will misguide the model.<\/p>\n<div style=\"width: 823px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unname.png\" alt=\"\" width=\"813\" height=\"378\" \/><p class=\"wp-caption-text\">Example: draw bounding boxes tightly around the objects.<\/p><\/div>\n<ul>\n<li>Label each target object individually<\/li>\n<\/ul>\n<p>You may encounter scenarios in which a cluster of small target defects sits close together. Labeling each object with its own bounding box takes time and can make it difficult for your model to fit each ground-truth label precisely, so in such cases you may instead draw one large bounding box that covers the whole cluster. Create heuristics for when to draw a single bounding box and when to draw separate ones, and keep them consistent among annotators.<\/p>\n<div style=\"width: 1307px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed2.png\" alt=\"\" width=\"1297\" height=\"369\" \/><p class=\"wp-caption-text\">Example: draw bounding boxes for each of the defects separately.<\/p><\/div>\n<ul>\n<li>Update the defect book frequently<\/li>\n<\/ul>\n<p>Keep your defect book up to date so that all of your labelers have the latest knowledge about the defects. 
When you encounter a new defect type or edge-case sample, it\u2019s time to update the defect book.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"2_Establish_Defect_Labeling_Consensus\"><\/span><strong>2. Establish Defect Labeling Consensus<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>After creating a defect book, quickly test its accuracy and coverage before labeling all the data. If the defect book contains incorrect definitions or does not cover edge cases sufficiently, you want to catch these issues early. Use a defect consensus task to evaluate whether people are aligned on defect definitions and labeling.<\/p>\n<p>Typically, three people participate in a defect consensus task. We ask both the SME and new annotators to label the same set of defect samples by referring to the defect book, which helps surface any misalignments. The recommended composition is one SME, one labeler, and one machine learning engineer (MLE) or an additional labeler. The SME labels based on their own knowledge as well as the defect book, whereas the other participants rely entirely on the defect book\u2019s instructions, since they don\u2019t have much domain knowledge.<\/p>\n<p>We highly recommend having the MLE participate in this process. By taking part in the defect consensus task, the MLE gains more context on the labeling rules and a better understanding of the defect definitions. Later, when analyzing model errors, the MLE can quickly tell whether an error is due to ambiguities in the defect book, which is the most common type of error we\u2019ve seen.<\/p>\n<p>We recommend randomly picking 10 samples per defect class from the entire dataset. This lets you examine all the defect classes and cover the major pattern types within each class. Then ask each participant to label these samples independently.<\/p>\n<p>Once the participants are finished, an agreement score is calculated for each image. 
We developed an internal scoring system that covers all labeling types and offer it as a tool to all of our users. For classification labels, the agreement is calculated from the classes assigned by the participants. For\u00a0<a href=\"https:\/\/en.m.wikipedia.org\/wiki\/Object_detection\" target=\"_blank\" rel=\"noopener noreferrer\">object detection<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.jeremyjordan.me\/semantic-segmentation\/\" target=\"_blank\" rel=\"noopener noreferrer\">semantic segmentation<\/a>\u00a0labels, the agreement score is calculated from both the class and the region labeled by all participants.<\/p>\n<p>An overall consensus score is calculated by aggregating the agreement scores of all images. It tells you how well your participants are aligned with each other, which reflects how accurate and complete the defect book is for the sample dataset. For images with very low agreement scores, discuss the root cause of the misalignment with SMEs. Once you identify the source, update the corresponding section of the defect book, and add the image as an example if needed.<\/p>\n<div style=\"width: 679px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/landing.ai\/wp-content\/uploads\/2021\/03\/unnamed-1.png\" alt=\"\" width=\"669\" height=\"424\" \/><p class=\"wp-caption-text\">Illustration of how labelers drew their bounding boxes differently.<\/p><\/div>\n<p>Establishing defect consensus is not a one-time task. Run a defect consensus every time the defect book is updated or a new labeler is added to the project. This ensures that your labelers reach sufficiently high alignment in their understanding of the defect book.<\/p>\n<div class=\"et_pb_module et_pb_text et_pb_text_24 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<h3><span class=\"ez-toc-section\" id=\"3_Review_Data_Labeling_for_Quality_Assurance\"><\/span><strong>3. 
Review Data Labeling for Quality Assurance<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Finally, you are ready to start labeling all of your data. To ensure the quality of your labels, there is usually a review process, so that only approved images are released to the next step, where model training and evaluation occur.<\/p>\n<p>An accurate and complete defect book helps you train labelers in the SME\u2019s knowledge without decades of practice. As a result, you can now afford to have multiple labelers work on your dataset. We recommend having two or more labelers label the same dataset independently: assign multiple labelers to each image and accept only labels with high agreement among all labelers.<\/p>\n<p>After they finish, an agreement score is calculated for each image, similar to what we did in the defect consensus. It is based on the class as well as the region (if available) labeled by all participants. You can set a minimum threshold to automatically reject images with inconsistent labels, and then review the remaining images whose agreement scores are above the bar. 
This way, you can quickly review your labeled datasets and prevent inconsistent labels from leaking to the next step.<\/p>\n<\/div>\n<\/div>\n<div class=\"et_pb_module et_pb_text et_pb_text_25 et_pb_text_align_left et_pb_bg_layout_light\">\n<div class=\"et_pb_text_inner\">\n<h2><span class=\"ez-toc-section\" id=\"Successful_ML_Projects_Formalize_Data_Labeling\"><\/span><strong>Successful ML Projects Formalize Data Labeling<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>At TagOn, we have seen many projects take an unnecessarily long and painful path to completion because of ambiguous defect definitions or poor labeling quality. In contrast, a dataset with high-quality labels makes machine learning engineers\u2019 lives much easier and the whole project lifespan much shorter. Therefore, it is very important to invest time in the project\u2019s early stage to clarify defect definitions and formalize labeling.<\/p>\n<p>We iterated on the labeling process described above across many of our projects. We formalized the defect definitions and captured the SMEs\u2019 heuristics for recognizing defects in the defect book. It is an important source of truth for training labelers as well as for evaluating model predictions during the model iteration stage.<\/p>\n<p>With defect consensus, we can examine the accuracy and completeness of the defect book and identify possible misalignments between labelers and SMEs in their knowledge of the defect definitions. In the final labeling step, we have multiple labelers label the same dataset and then approve only images with consistent and unambiguous labels. Once this whole process is complete, the data is ready to be used for model training and evaluation.<\/p>\n<p>Source: LandingAI<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Successful machine learning applications depend on high-quality tags. 
In the field of Automated Visual Inspection (AVI), labels are human-provided cues to teach models: How to identify a specific class of defects of interest How to highlight the defect area What is data labeling? Data labeling is the process of manually annotating content with labels or [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1500,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[27],"tags":[28,29,30],"class_list":["post-1499","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-solution-vi","tag-data-labeling-vi","tag-images-labeling-vi","tag-supervised-learning-vi","type-post-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/posts\/1499","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/comments?post=1499"}],"version-history":[{"count":0,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/posts\/1499\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/media\/1500"}],"wp:attachment":[{"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/media?parent=1499"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/categories?post=1499"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tagon.vn\/vi\/wp-json\/wp\/v2\/tags?post=1499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}