paperId,title,pub_date,total_citations,a,b,R2,lead_author_id,last_author_id,citation_dates,citation_counts,lead_author_h_index,last_author_h_index,rank
dd4cfde3e135f799a9a71b4f57e13a29de89f7e3,DAPO: An Open-Source LLM Reinforcement Learning System at Scale,2025-03-18,770,93.11428571428571,63.43296703296704,0.8675937566539489,23716915,2285763682,"[""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01""]","[21, 54, 181, 265, 329, 422, 569, 741, 749, 749, 749, 749, 749, 749]",9.0,5.0,2
668075792a7ab40457d92e09da28d35c879271c3,Kimi k1.5: Scaling Reinforcement Learning with LLMs,2025-01-22,601,40.828571428571415,50.91648351648353,0.914092496127803,2341539673,2341579297,"[""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01""]","[10, 36, 82, 139, 274, 361, 414, 457, 513, 583, 584, 584, 584, 584]",1.0,2.0,3
c4ff8bc44d88cd267baf18ac5d3a3a1fe86d08eb,Training Language Models to Self-Correct via Reinforcement Learning,2024-09-19,267,-12.199999999999982,19.283516483516486,0.96257722316695,2317038858,2258552654,"[""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01""]","[12, 28, 28, 33, 41, 63, 86, 108, 148, 182, 188, 201, 222, 244]",4.0,5.0,7
182c7b40ff7560a5545764814338f55a2098e441,Reinforced Self-Training (ReST) for Language Modeling,2023-08-17,350,-4.199999999999998,13.404395604395607,0.97934244588122,146372255,1737568,"[""2023-09-01"", ""2023-10-01"", ""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01""]","[4, 14, 23, 30, 37, 50, 74, 92, 111, 122, 143, 147, 155, 159]",15.0,76.0,9
9a75e23639bfcc3a51da57a3b682a984d1d8ac0b,Language Models can Solve Computer Tasks,2023-03-30,435,-5.685714285714273,11.292307692307695,0.9837593339410006,2176466425,143953836,"[""2023-04-01"", ""2023-05-01"", ""2023-06-01"", ""2023-07-01"", ""2023-08-01"", ""2023-09-01"", ""2023-10-01"", ""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01""]","[2, 6, 21, 29, 37, 48, 56, 71, 83, 89, 100, 114, 140, 152]",3.0,19.0,10
1122b654f8b47c1aa9c04ff6bbe7561c798e2ad0,Reinforcement Learning for Reasoning in Large Language Models with One Training Example,2025-04-29,138,40.45714285714286,9.501098901098903,0.7829334981173772,2334890311,2324671790,"[""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01""]","[4, 33, 55, 74, 85, 102, 133, 135, 135, 135, 135, 135, 135, 135]",8.0,5.0,11
900cd128482bbab4d2752d01ce80c55498b78dd2,SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution,2025-02-25,102,14.40000000000001,7.839560439560441,0.9009383971730187,2237736409,2324830843,"[""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01""]","[5, 13, 21, 42, 50, 54, 66, 83, 96, 97, 97, 97, 97, 97]",8.0,2.0,12
bde841b0dbbf7a15ee69966a828c7fe2cf532ad9,WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning,2024-11-04,85,-10.813186813186809,7.648351648351649,0.9515219133522951,2286747770,2243402027,"[""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01""]","[1, 3, 4, 8, 11, 14, 33, 44, 49, 57, 67, 82, 83]",7.0,33.0,13
69535d3c6ff3238f8e7b2b29c1d40ea9d9d7914f,Reasoning with Exploration: An Entropy Perspective,2025-06-17,95,36.228571428571435,6.173626373626374,0.6082587092673627,2068324576,2253471545,"[""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01"", ""2026-08-01""]","[5, 15, 34, 65, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95]",8.0,15.0,18
10feab31bb9e71a0f1094fb00c0554abfb992c4d,ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models,2025-05-30,77,20.057142857142853,5.507692307692308,0.7172462470419474,2369203659,2359888593,"[""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01""]","[5, 12, 19, 28, 52, 74, 74, 74, 74, 74, 74, 74, 74, 74]",2.0,4.0,22
8e3b1f5d8b6c165f64137cc1f7dea89cf6f622bd,d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning,2025-04-16,57,7.399999999999999,4.795604395604396,0.806981111608293,2260172378,2267723293,"[""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01""]","[1, 6, 11, 16, 21, 37, 56, 56, 56, 56, 56, 56, 56, 56]",6.0,3.0,26
4aef44e4aeaf28868ae2f1fff2c4eb19ff4df1f6,The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning,2025-05-21,67,20.857142857142865,4.527472527472528,0.7096938195294573,1923351,2334095077,"[""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01""]","[6, 13, 21, 30, 49, 65, 65, 65, 65, 65, 65, 65, 65, 65]",13.0,5.0,29
c78350e81298ca87bc1d59b466fa40081232caaa,Teaching Large Language Models to Reason with Reinforcement Learning,2024-03-07,134,3.8857142857142875,4.512087912087912,0.9849652208560354,2279337437,48647153,"[""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01""]","[6, 10, 13, 18, 21, 23, 27, 39, 40, 44, 46, 53, 59, 66]",6.0,27.0,30
8d6411e337502f7fe0bfa59d486803a73d2c1192,Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling,2025-01-20,56,7.714285714285719,4.384615384615385,0.9257669803322274,2329057292,2328936204,"[""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01""]","[2, 8, 13, 19, 30, 36, 37, 41, 46, 55, 55, 55, 55, 55]",7.0,3.0,31
529ff7d6441d244212cf2becafd12a7e67ac56d9,FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning,2023-09-01,177,-0.8857142857142818,3.9604395604395606,0.9771728467135712,2162042348,2237499232,"[""2023-09-01"", ""2023-10-01"", ""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01""]","[4, 4, 7, 10, 14, 18, 21, 24, 27, 32, 41, 45, 47, 54]",9.0,10.0,32
04c05c6acc970f2ca89af8e436b8dd8189396146,General-Reasoner: Advancing LLM Reasoning Across All Domains,2025-05-20,56,17.828571428571433,3.685714285714286,0.7291672986500835,2461713,2249847177,"[""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01""]","[7, 12, 19, 27, 34, 54, 54, 54, 54, 54, 54, 54, 54, 54]",23.0,20.0,33
2d906cda427cb2c4a71069423312e57ba4cd5445,Reinforcement Learning Enhanced LLMs: A Survey,2024-12-05,40,-2.3516483516483535,3.6098901098901104,0.9653726416360269,2109514219,2285877067,"[""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01""]","[1, 1, 4, 8, 9, 12, 18, 25, 28, 34, 37, 37, 37]",11.0,7.0,35
668858489bbec3ce45f7a84a6a557b329f9ec91a,ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL,2024-02-29,111,-1.799999999999994,3.30989010989011,0.9538781175085166,2288585404,1488785534,"[""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01""]","[3, 4, 4, 7, 11, 14, 16, 17, 23, 26, 30, 32, 42, 47]",10.0,41.0,37
6c00f661c46391642208a5292f38b5a9e0e09cae,AutoWebGLM: A Large Language Model-based Web Navigating Agent,2024-04-04,105,-1.4285714285714246,3.10989010989011,0.9043779712501553,2051311700,2260595820,"[""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01""]","[1, 3, 7, 8, 10, 13, 16, 18, 21, 22, 26, 32, 47]",10.0,19.0,39
53106a642a12b05753ebe9ffca62d8efb0670281,Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models,2024-12-18,39,1.5999999999999985,3.1054945054945056,0.9697990766219226,1819830,2258552654,"[""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01""]","[2, 2, 6, 10, 15, 18, 22, 25, 27, 30, 37, 37, 37, 37]",28.0,5.0,40
0a42291d9543eabe33f6c14278333484071a707c,Offline Reinforcement Learning for LLM Multi-Step Reasoning,2024-12-20,36,2.714285714285718,2.791208791208792,0.9571765990528528,2216204941,2336830631,"[""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01""]","[1, 2, 7, 12, 15, 18, 23, 24, 25, 29, 34, 34, 34, 34]",3.0,2.0,44
95d638e7705ec561382268405bc488df4c26c7f7,Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning,2025-03-20,39,8.600000000000003,2.7208791208791214,0.8156613553873827,2232782972,2351085693,"[""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01""]","[2, 4, 12, 18, 23, 25, 32, 36, 36, 36, 36, 36, 36, 36]",3.0,3.0,45
4f0e4a313a3f777b4b6aab4f364b9bc51a6aacc9,Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging,2025-03-26,34,10.171428571428576,2.347252747252748,0.833716340671871,2346255376,2347282055,"[""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01""]","[5, 7, 15, 16, 20, 26, 30, 33, 34, 34, 34, 34, 34, 34]",4.0,4.0,53
aa89c6bf86486e180833037333555e3492b15c8e,MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning,2025-02-25,26,-0.3142857142857136,2.3230769230769233,0.9242590823168335,2232975456,2347076258,"[""2025-03-01"", ""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01""]","[0, 1, 4, 4, 8, 9, 14, 19, 23, 25, 25, 25, 25, 25]",3.0,1.0,54
24d9d00b91f99dde3c9a0ea5c79e63f2ed26151c,Reinforcing General Reasoning without Verifiers,2025-05-27,29,6.200000000000001,2.3208791208791215,0.7023063067602122,2257107372,2356268297,"[""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01""]","[1, 4, 5, 6, 21, 29, 29, 29, 29, 29, 29, 29, 29, 29]",6.0,5.0,55
1f45c9d4d92c51a94c984eb4c2c6e027bbf78038,SRPO: A Cross-Domain Implementation of Large-Scale Reinforcement Learning on LLM,2025-04-19,35,12.742857142857153,1.9516483516483518,0.7551946061906223,2356633495,2356584590,"[""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01""]","[4, 11, 15, 21, 23, 26, 32, 32, 32, 32, 32, 32, 32, 32]",2.0,2.0,62
82f59319e581cdfe16a97fe29bec6215ad818a81,Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions,2025-06-09,32,13.228571428571435,1.8659340659340664,0.6163089064550746,2300236632,2309265357,"[""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01"", ""2026-08-01""]","[3, 7, 14, 22, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31]",2.0,8.0,64
765051eee0fb1394b62555edb86ba1f7a00892fb,Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking,2025-03-25,26,6.285714285714291,1.8571428571428574,0.8163037824180723,2352756200,2352034310,"[""2025-04-01"", ""2025-05-01"", ""2025-06-01"", ""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01""]","[1, 4, 8, 13, 17, 18, 21, 25, 25, 25, 25, 25, 25, 25]",6.0,6.0,66
59084df7203c6be33838ba3e3854eb9bda053ed2,Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint,2024-01-11,39,1.7714285714285725,1.7824175824175827,0.9830743821417095,2256558402,2274218622,"[""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01""]","[2, 5, 6, 6, 7, 11, 12, 14, 15, 19, 21, 21, 23, 25]",11.0,16.0,68
d3f8d5a9c48ee9f338f25f1be5f3adc7fd5dd855,"ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models",2023-10-16,114,-1.828571428571428,1.7648351648351652,0.9382520061986628,25841722,2114939672,"[""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01""]","[1, 1, 3, 3, 5, 5, 6, 7, 12, 14, 15, 18, 22, 23]",11.0,14.0,69
3646acd0dc49a00d38517abccfc3a54cb78bbadc,SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning,2025-06-02,25,9.615384615384606,1.7307692307692315,0.660039113428944,2347120431,2272708319,"[""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01""]","[2, 6, 11, 17, 24, 25, 25, 25, 25, 25, 25, 25, 25]",2.0,5.0,71
2bdccbeffa0bae760f416a72ebe7a3951e230659,Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective,2025-06-17,25,8.771428571428576,1.6945054945054947,0.627891943278143,2365396554,2295863002,"[""2025-07-01"", ""2025-08-01"", ""2025-09-01"", ""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01"", ""2026-08-01""]","[1, 5, 7, 14, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25]",3.0,6.0,72
c226a4acb42912054d498bcf771023b0ba2da001,Language Model Self-improvement by Reinforcement Learning Contemplation,2023-05-23,70,-2.942857142857143,1.364835164835165,0.832693117408907,1432234123,2152850415,"[""2023-06-01"", ""2023-07-01"", ""2023-08-01"", ""2023-09-01"", ""2023-10-01"", ""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01""]","[0, 0, 0, 2, 3, 3, 3, 3, 4, 8, 9, 12, 16, 20]",8.0,14.0,77
7d6168fbd3ed72f9098573007f4b8c2ec9e576b9,Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning,2024-02-08,48,-1.6857142857142848,1.3362637362637366,0.9491219061312519,2190751523,2257129989,"[""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01"", ""2025-04-01""]","[0, 1, 1, 2, 4, 4, 4, 6, 8, 11, 11, 13, 15, 18]",13.0,24.0,78
8db1d30ac06dde4bd19d0a86241137ac2be21552,LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward,2024-01-07,29,1.1428571428571443,0.8461538461538461,0.9741070548417836,2215168083,71756373,"[""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01"", ""2024-12-01"", ""2025-01-01"", ""2025-02-01"", ""2025-03-01""]","[2, 2, 2, 3, 5, 5, 6, 7, 8, 10, 10, 10, 11, 12]",5.0,15.0,92
3714ed902e79dad5dcc93c5d033c8222d044f3c8,Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation,2023-10-03,28,1.362637362637362,0.7087912087912089,0.8701631457854004,2108813860,2061625488,"[""2023-11-01"", ""2023-12-01"", ""2024-01-01"", ""2024-02-01"", ""2024-03-01"", ""2024-04-01"", ""2024-05-01"", ""2024-06-01"", ""2024-07-01"", ""2024-08-01"", ""2024-09-01"", ""2024-10-01"", ""2024-11-01""]","[3, 3, 3, 3, 3, 4, 4, 5, 7, 8, 10, 10, 10]",7.0,13.0,94
0fe8f2f55046ad0b8d6337f57a78466790923264,Outcome-based Exploration for LLM Reasoning,2025-09-08,26,22.14285714285715,0.4285714285714285,0.19999999999999996,2379687376,2237802765,"[""2025-10-01"", ""2025-11-01"", ""2025-12-01"", ""2026-01-01"", ""2026-02-01"", ""2026-03-01"", ""2026-04-01"", ""2026-05-01"", ""2026-06-01"", ""2026-07-01"", ""2026-08-01"", ""2026-09-01"", ""2026-10-01"", ""2026-11-01""]","[11, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26]",1.0,10.0,97