{"id":35459,"date":"2025-02-22T02:27:24","date_gmt":"2025-02-22T01:27:24","guid":{"rendered":"https:\/\/kinit.sk\/overshoot-new-momentum-based-method-for-model-training\/"},"modified":"2025-02-22T02:29:16","modified_gmt":"2025-02-22T01:29:16","slug":"overshoot-new-momentum-based-method-for-model-training","status":"publish","type":"post","link":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/","title":{"rendered":"Overshoot: Speeding up model training with a new momentum-based method"},"content":{"rendered":"<div id=\"\" class=\"element core-paragraph\">\n<p>In machine learning, models are trained in steps. During a step, the model is shown a bunch of examples (training data) and then an update to the model is made &#8211; so the model performs better on such data in the future. An update step can be imagined as a shift in a multidimensional space (one model parameter = one dimension). The hope is that, over many steps, the model arrives at a parameter configuration with which it performs the given task well.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>This basic principle works just fine, but since training models consumes resources, it has been enhanced with the idea of momentum. In momentum-based training, models are updated with a weighted sum of several recent steps, not just the last step. This allows the model to move faster through those parts of the space where consecutive steps point in roughly the same direction, saving some training steps. It also helps the model overcome bumps along the way (local optima).<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>Our new <strong>Overshoot<\/strong> method tries to improve on the idea of momentum (see figure below). In classical momentum, some of the past steps (dark grey) may lie far from the current position of the model (blue). 
Overshoot mitigates this problem by computing updates (gradients) against models shifted gamma times further in the direction of the last update (red). This leads to better placement of the past steps relative to the current position, and hence to a better-informed next step. Overshoot can be used in combination with existing momentum methods like ADAM.<\/p>\n<\/div>\n\n<div class=\"wp-block-image\"><div id=\"\" class=\"element core-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"665\" data-src=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/02\/overshoot-momentum-figure-1024x665.png\" alt=\"\" class=\"wp-image-35450 lazyload\" data-srcset=\"https:\/\/kinit.sk\/wp-content\/uploads\/2025\/02\/overshoot-momentum-figure-1024x665.png 1024w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/02\/overshoot-momentum-figure-300x195.png 300w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/02\/overshoot-momentum-figure-768x499.png 768w, https:\/\/kinit.sk\/wp-content\/uploads\/2025\/02\/overshoot-momentum-figure.png 1141w\" data-sizes=\"(max-width: 1024px) 100vw, 1024px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1024px; --smush-placeholder-aspect-ratio: 1024\/665;\" \/><\/figure>\n<\/div><\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>We tested the idea on multiple neural architectures and tasks. We observed that the Overshoot enhancement, combined with either ADAM or SGD CM (classical momentum), regularly speeds up convergence, saving between 15% and 25% of the training steps required to reach a 95% loss reduction threshold. 
With only minimal computational and no memory overhead, this makes Overshoot potentially beneficial for any large-scale training efforts.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>For more details on Overshoot and its evaluation, see <a href=\"https:\/\/arxiv.org\/pdf\/2501.09556\"><strong>our paper<\/strong><\/a> or <a href=\"https:\/\/github.com\/kinit-sk\/overshoot\"><strong>try using Overshoot<\/strong><\/a> yourself.<\/p>\n<\/div>\n\n<div id=\"\" class=\"element core-paragraph\">\n<p>As of January 2025, this work is far from over. Further variants of the method will be tested in the future. A broader evaluation is ahead of us as well.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>In machine learning, models are trained in steps. During a step, the model is shown a bunch of examples (training data) and then an update to the model is made&#8230;<\/p>\n","protected":false},"author":34,"featured_media":25091,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[67,83,88,520],"tags":[],"class_list":["post-35459","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized-sk","category-news-sk","category-pop-science-sk","category-2025-sk"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Overshoot: Speeding up model training with a new momentum-based method - KInIT<\/title>\n<meta name=\"description\" content=\"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving between...\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/\" \/>\n<meta 
property=\"og:locale\" content=\"sk_SK\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Overshoot: Speeding up model training with a new momentum-based method - KInIT\" \/>\n<meta property=\"og:description\" content=\"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving between...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/\" \/>\n<meta property=\"og:site_name\" content=\"KInIT\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-22T01:27:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-22T01:29:16+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1800\" \/>\n\t<meta property=\"og:image:height\" content=\"942\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Wanda Pribylincova\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@kinit\" \/>\n<meta name=\"twitter:site\" content=\"@kinit\" \/>\n<meta name=\"twitter:label1\" content=\"Autor\" \/>\n\t<meta name=\"twitter:data1\" content=\"Wanda Pribylincova\" \/>\n\t<meta name=\"twitter:label2\" content=\"Predpokladan\u00fd \u010das \u010d\u00edtania\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 min\u00faty\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/\"},\"author\":{\"name\":\"Wanda 
Pribylincova\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#\\\/schema\\\/person\\\/64db52a830dcb6d4df386e78e7eb748b\"},\"headline\":\"Overshoot: Speeding up model training with a new momentum-based method\",\"datePublished\":\"2025-02-22T01:27:24+00:00\",\"dateModified\":\"2025-02-22T01:29:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/\"},\"wordCount\":365,\"image\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/Web_news_feature_general_5.png\",\"articleSection\":[\"Uncategorized @sk\",\"News\",\"Pop science\",\"2025\"],\"inLanguage\":\"sk-SK\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/\",\"url\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/\",\"name\":\"Overshoot: Speeding up model training with a new momentum-based method - KInIT\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/Web_news_feature_general_5.png\",\"datePublished\":\"2025-02-22T01:27:24+00:00\",\"dateModified\":\"2025-02-22T01:29:16+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/#\\\/schema\\\/person\\\/64db52a830dcb6d4df386e78e7eb748b\"},\"description\":\"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving 
between...\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#breadcrumb\"},\"inLanguage\":\"sk-SK\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"sk-SK\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#primaryimage\",\"url\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/Web_news_feature_general_5.png\",\"contentUrl\":\"https:\\\/\\\/kinit.sk\\\/wp-content\\\/uploads\\\/2023\\\/02\\\/Web_news_feature_general_5.png\",\"width\":1800,\"height\":942},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/overshoot-new-momentum-based-method-for-model-training\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kinit.sk\\\/sk\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"News\",\"item\":\"https:\\\/\\\/kinit.sk\\\/category\\\/news\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Overshoot: Speeding up model training with a new momentum-based method\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#website\",\"url\":\"https:\\\/\\\/kinit.sk\\\/\",\"name\":\"KInIT\",\"description\":\"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kinit.sk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"sk-SK\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/kinit.sk\\\/#\\\/schema\\\/person\\\/64db52a830dcb6d4df386e78e7eb748b\",\"name\":\"Wanda Pribylincova\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Overshoot: Speeding up model training with a new momentum-based method - KInIT","description":"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving between...","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/","og_locale":"sk_SK","og_type":"article","og_title":"Overshoot: Speeding up model training with a new momentum-based method - KInIT","og_description":"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving between...","og_url":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/","og_site_name":"KInIT","article_published_time":"2025-02-22T01:27:24+00:00","article_modified_time":"2025-02-22T01:29:16+00:00","og_image":[{"width":1800,"height":942,"url":"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png","type":"image\/png"}],"author":"Wanda Pribylincova","twitter_card":"summary_large_image","twitter_creator":"@kinit","twitter_site":"@kinit","twitter_misc":{"Autor":"Wanda Pribylincova","Predpokladan\u00fd \u010das \u010d\u00edtania":"2 min\u00faty"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#article","isPartOf":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/"},"author":{"name":"Wanda Pribylincova","@id":"https:\/\/kinit.sk\/#\/schema\/person\/64db52a830dcb6d4df386e78e7eb748b"},"headline":"Overshoot: Speeding up model training with a new momentum-based 
method","datePublished":"2025-02-22T01:27:24+00:00","dateModified":"2025-02-22T01:29:16+00:00","mainEntityOfPage":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/"},"wordCount":365,"image":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#primaryimage"},"thumbnailUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png","articleSection":["Uncategorized @sk","News","Pop science","2025"],"inLanguage":"sk-SK"},{"@type":"WebPage","@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/","url":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/","name":"Overshoot: Speeding up model training with a new momentum-based method - KInIT","isPartOf":{"@id":"https:\/\/kinit.sk\/#website"},"primaryImageOfPage":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#primaryimage"},"image":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#primaryimage"},"thumbnailUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png","datePublished":"2025-02-22T01:27:24+00:00","dateModified":"2025-02-22T01:29:16+00:00","author":{"@id":"https:\/\/kinit.sk\/#\/schema\/person\/64db52a830dcb6d4df386e78e7eb748b"},"description":"Overshoot enhancement, either combined with ADAM or SGD CM, regularly speeds up the convergence, saving 
between...","breadcrumb":{"@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#breadcrumb"},"inLanguage":"sk-SK","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/"]}]},{"@type":"ImageObject","inLanguage":"sk-SK","@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#primaryimage","url":"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png","contentUrl":"https:\/\/kinit.sk\/wp-content\/uploads\/2023\/02\/Web_news_feature_general_5.png","width":1800,"height":942},{"@type":"BreadcrumbList","@id":"https:\/\/kinit.sk\/sk\/overshoot-new-momentum-based-method-for-model-training\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kinit.sk\/sk\/"},{"@type":"ListItem","position":2,"name":"News","item":"https:\/\/kinit.sk\/category\/news\/"},{"@type":"ListItem","position":3,"name":"Overshoot: Speeding up model training with a new momentum-based method"}]},{"@type":"WebSite","@id":"https:\/\/kinit.sk\/#website","url":"https:\/\/kinit.sk\/","name":"KInIT","description":"Vyu\u017e\u00edvame v\u00fdskum pre \u013eud\u00ed a priemysel","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kinit.sk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"sk-SK"},{"@type":"Person","@id":"https:\/\/kinit.sk\/#\/schema\/person\/64db52a830dcb6d4df386e78e7eb748b","name":"Wanda 
Pribylincova"}]}},"_links":{"self":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/posts\/35459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/users\/34"}],"replies":[{"embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/comments?post=35459"}],"version-history":[{"count":1,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/posts\/35459\/revisions"}],"predecessor-version":[{"id":35460,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/posts\/35459\/revisions\/35460"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media\/25091"}],"wp:attachment":[{"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/media?parent=35459"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/categories?post=35459"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kinit.sk\/sk\/wp-json\/wp\/v2\/tags?post=35459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}