Unsupervised Learning For Refactoring Pattern Detection

doi:10.60692/dahey-81682

Published June 28, 2021 | Version v1

Publication Open

1. Universidade Federal do Paraná

Software refactoring changes the structure of a program without modifying its external behavior, generally with the aim of improving software quality attributes. However, refactoring is a complex activity, and a composition of several refactorings is often necessary. Moreover, some code elements are refactored similarly in terms of the kind and frequency of the refactorings applied. Studies in the refactoring literature usually investigate the impact and understanding of an individual refactoring, neglecting that developers have to apply more than one refactoring operation to reach their goals. Studies that identify and characterize refactoring patterns are lacking. To fill this gap, this work explores the use of unsupervised learning, particularly cluster analysis, to group elements (Java classes) that are refactored similarly in software repositories. We used a total of 1435 projects and applied the K-Means algorithm to group classes that received the same refactoring with the same frequency. We obtained a set of seven clusters. The main refactoring compositions associated with each cluster are then analyzed to identify the corresponding pattern. Each pattern is described and characterized using a set of metrics. The great majority of refactoring compositions include only one kind of refactoring, applied with low frequency. Among compositions that include more than one type of refactoring, combinations of Extract Superclass and Pull Up Method are the most frequent.
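The clustering idea in the description can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature layout (one count per refactoring type for each class), the three refactoring types chosen, the toy data, and k = 2 (the study reports seven clusters) are all assumptions made for illustration, and the K-Means here is a plain Lloyd's iteration written with the Python standard library only.

```python
import random

# Hypothetical feature order: how often each refactoring type hit a class.
REFACTORING_TYPES = ("Extract Superclass", "Pull Up Method", "Rename Method")

# Toy, made-up data: one frequency vector per Java class.
CLASS_VECTORS = [
    (0, 0, 1), (0, 0, 2), (0, 0, 3),   # only one kind of refactoring
    (1, 3, 0), (1, 4, 0), (2, 5, 0),   # Extract Superclass + Pull Up Method
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def k_means(vectors, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # k data points picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


def composition(vector):
    """Refactoring types applied at least once to a class."""
    return tuple(t for t, count in zip(REFACTORING_TYPES, vector) if count > 0)


if __name__ == "__main__":
    labels, centroids = k_means(CLASS_VECTORS, k=2)
    for vector, label in zip(CLASS_VECTORS, labels):
        print(label, vector, composition(vector))
```

On this toy data the two groups are well separated, so classes touched by a single kind of refactoring end up in one cluster and classes with the Extract Superclass and Pull Up Method composition in the other. Real repository data would likely need feature scaling and a principled choice of k.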

Files

Name	Size
qualitative-tex.pdf.pdf (md5:ca3ee84a5bf25d1b2b2802a449ca8024)	43.1 kB

Additional details

Other: https://openalex.org/W3193142627
DOI: 10.1109/cec45853.2021.9504804

Is Global South Knowledge: Yes
Country: Brazil
