Practical dynamic reconstruction of control flow graphs

doi:10.60692/k9w5a-72e54

Published October 11, 2020 | Version v1

Publication Open

Practical dynamic reconstruction of control flow graphs

1. Federal Center for Technological Education of Minas Gerais
2. University of Alberta
3. Universidade Federal de Minas Gerais

Abstract The automatic recovery of a program's high‐level representation from its binary version is a well‐studied problem in programming languages. However, most of the solutions to this problem are based on purely static approaches: techniques such as dataflow analyses or type inference are used to convert the bytes that constitute the executable code back into a control flow graph (CFG). This article departs from such a modus operandi to show that a dynamic analysis can be effective and useful, both as a standalone technique, and as a way to enhance the precision of static approaches. The experimental results provide evidence that completeness, that is, the ability to conclude that the entire CFG has been discovered, is achievable on many functions that are part of industry‐strong benchmarks. Experiments also indicate that dynamic information greatly enhances the ability of DynInst , a state‐of‐the‐art binary reconstructor, to deal with code stripped of debugging information. These results were obtained with CFGgrind , a new implementation of a dynamic code reconstructor, built on top of Valgrind . When applied to cBench , CFGgrind is 9% faster than callgrind , Valgrind 's tool used to track targets of function calls; and 7% faster in Spec Cpu2017 . CFGgrind recovers the complete CFG of 40% of all the procedures invoked during the standard execution of programs in Spec Cpu2017 , and 37% in cBench . When combined with CFGgrind , DynInst finds 15% more CFGs for cBench , and 7% more CFGs for Spec Cpu2017 . Finally, CFGgrind is more than 7 times faster than DCFG, a CFG reconstructor from Intel, and 1.30 times faster than bfTrace , a CFG reconstructor used in research. CFGgrind is also more precise than these two tools, handling operating system signals, shared code in functions, and unaligned instructions; besides supporting multithreaded programs, exact profiling and incremental refinements.

Translated Descriptions

This is an automatic machine translation with an accuracy of 90-95%

Translated Description (Arabic)

يعد الاسترداد التلقائي للتمثيل عاليالمستوى للبرنامج من نسخته الثنائية مشكلة مدروسة جيدًا في لغات البرمجة. ومع ذلك، فإن معظم الحلول لهذه المشكلة تستند إلى مناهج ثابتة بحتة: يتم استخدام تقنيات مثل تحليلات تدفق البيانات أو نوع الاستدلال لتحويل وحدات البايت التي تشكل التعليمات البرمجية القابلة للتنفيذ مرة أخرى إلى رسم بياني لتدفق التحكم (CFG). تنحرف هذه المقالة عن طريقة العمل هذه لإظهار أن التحليل الديناميكي يمكن أن يكون فعالًا ومفيدًا، كأسلوب مستقل، وكطريقة لتعزيز دقة الأساليب الثابتة. تقدم النتائج التجريبية دليلاً على أن الاكتمال، أي القدرة على استنتاج أن مجموعة CFG بأكملها قد تم اكتشافها، يمكن تحقيقه في العديد من الوظائف التي تشكل جزءًا من المعايير القوية للصناعة. تشير التجارب أيضًا إلى أن المعلومات الديناميكية تعزز بشكل كبير قدرة DynInst، وهو معيد بناء ثنائيحديث، على التعامل مع التعليمات البرمجية المجردة من معلومات التصحيح. تم الحصول على هذه النتائج مع CFGgrind، وهو تنفيذ جديد لمعيد بناء كود ديناميكي، تم بناؤه فوق Valgrind . عند تطبيقه على cBench ، يكون CFGgrind أسرع بنسبة 9 ٪ من callgrind، وهي أداة Valgrind المستخدمة لتتبع أهداف استدعاءات الوظائف ؛ وأسرع بنسبة 7 ٪ في Spec Cpu2017 . تسترد CFGgrind كامل CFG بنسبة 40 ٪ من جميع الإجراءات التي تم الاحتجاج بها أثناء التنفيذ القياسي للبرامج في المواصفات CPU 2017، و 37 ٪ في cBench . عند دمجها مع CFGgrind، تجد DynInst 15 ٪ أكثر من CFG لـ cBench ، و 7 ٪ أكثر من CFG لـ Spec Cpu2017 . أخيرًا، فإن CFGgrind أسرع بأكثر من 7 مرات من DCFG، وهو معيد بناء CFG من Intel، و 1.30 مرة أسرع من bfTrace ، وهو معيد بناء CFG يستخدم في البحث. كما أن CFGgrind أكثر دقة من هاتين الأداتين، والتعامل مع إشارات نظام التشغيل، والتعليمات البرمجية المشتركة في الوظائف، والتعليمات غير المتوافقة ؛ إلى جانب دعم البرامج متعددة الخيوط، والتنميط الدقيق والتحسينات التدريجية.

Translated Description (French)

Résumé La récupération automatique de la représentation de hautniveau d'un programme à partir de sa version binaire est un problème bienétudié dans les langages de programmation. Cependant, la plupart des solutions à ce problème sont basées sur des approches purement statiques : des techniques telles que les analyses de flux de données ou l'inférence de type sont utilisées pour convertir les octets qui constituent le code exécutable en un graphe de flux de contrôle (CFG). Cet article s'écarte d'un tel modus operandi pour montrer qu'une analyse dynamique peut être efficace et utile, à la fois comme technique autonome, et comme moyen d'améliorer la précision des approches statiques. Les résultats expérimentaux fournissent la preuve que l'exhaustivité, c'est-à-dire la capacité de conclure que l'ensemble du CFG a été découvert, est réalisable sur de nombreuses fonctions qui font partie de repères solides de l'industrie. Les expériences indiquent également que les informations dynamiques améliorent considérablement la capacité de DynInst , un reconstructeur binaire depointe, à traiter le code dépourvu d'informations de débogage. Ces résultats ont été obtenus avec CFGgrind, une nouvelle implémentation d'un reconstructeur de code dynamique, construit sur Valgrind . Lorsqu'il est appliqué à cBench, CFGgrind est 9 % plus rapide que callgrind , l'outil de Valgrind utilisé pour suivre les cibles des appels de fonction ; et 7 % plus rapide dans Spec Cpu2017 . CFGgrind récupère le CFG complet de 40% de toutes les procédures invoquées lors de l'exécution standard des programmes dans Spec Cpu2017 , et 37% dans cBench . Lorsqu'il est combiné avec CFGgrind , DynInst trouve 15 % de CFG en plus pour cBench et 7 % de CFG en plus pour Spec Cpu2017 . Enfin, CFGgrind est plus de 7 fois plus rapide que DCFG, un reconstructeur CFG d'Intel, et 1,30 fois plus rapide que bfTrace , un reconstructeur CFG utilisé dans la recherche. CFGgrind est également plus précis que ces deux outils, gérant les signaux du système d'exploitation, le code partagé dans les fonctions et les instructions non alignées ; en plus de prendre en charge les programmes multithread, le profilage exact et les améliorations incrémentielles.

Translated Description (Spanish)

Resumen La recuperación automática de la representación de altonivel de un programa desde su versión binaria es un problema bien estudiado en los lenguajes de programación. Sin embargo, la mayoría de las soluciones a este problema se basan en enfoques puramente estáticos: técnicas como el análisis de flujo de datos o la inferencia de tipos se utilizan para convertir los bytes que constituyen el código ejecutable en un gráfico de flujo de control (CFG). Este artículo se aparta de tal modus operandi para mostrar que un análisis dinámico puede ser efectivo y útil, tanto como una técnica independiente como una forma de mejorar la precisión de los enfoques estáticos. Los resultados experimentales proporcionan evidencia de que la integridad, es decir, la capacidad de concluir que se ha descubierto todo el CFG, se puede lograr en muchas funciones que forman parte de puntos de referencia sólidos de la industria. Los experimentos también indican que la información dinámica mejora en gran medida la capacidad de DynInst , un reconstructor binario deúltima generación, para lidiar con el código despojado de información de depuración. Estos resultados se obtuvieron con CFGgrind , una nueva implementación de un reconstructor de código dinámico, construido sobre Valgrind . Cuando se aplica a cBench , CFGgrind es un 9% más rápido que callgrind , la herramienta de Valgrind utilizada para rastrear objetivos de llamadas de funciones; y un 7% más rápido en Spec Cpu2017 . CFGgrind recupera el CFG completo del 40% de todos los procedimientos invocados durante la ejecución estándar de programas en Spec Cpu2017, y el 37% en cBench . Cuando se combina con CFGgrind , DynInst encuentra un 15% más de CFG para cBench y un 7% más de CFG para Spec Cpu2017 . Finalmente, CFGgrind es más de 7 veces más rápido que DCFG, un reconstructor CFG de Intel, y 1.30 veces más rápido que bfTrace , un reconstructor CFG utilizado en investigación. CFGgrind también es más preciso que estas dos herramientas, manejando señales del sistema operativo, código compartido en funciones e instrucciones no alineadas; además de admitir programas multiproceso, perfiles exactos y refinamientos incrementales.

Files

thesis.pdf.pdf

Files (4.5 MB)

Please wait a few minutes before your translated files are ready Note: Some files might be protected thus translations might not work.

Name	Size	Download all
thesis.pdf.pdf md5:8201bd8adf1736c85e15009d0b30e370	4.5 MB	Preview Download

Additional details

Translated title (Arabic): إعادة بناء ديناميكية عملية لمخططات تدفق التحكم
Translated title (French): Reconstruction dynamique pratique des organigrammes de contrôle
Translated title (Spanish): Reconstrucción dinámica práctica de gráficos de flujo de control

Other: https://openalex.org/W3092300773
DOI: 10.1002/spe.2907

Is Global South Knowledge: Yes
Country: Brazil

https://openalex.org/W1491178396
https://openalex.org/W1544225867
https://openalex.org/W1582456956
https://openalex.org/W1594784410
https://openalex.org/W1690253345
https://openalex.org/W17195072
https://openalex.org/W1971650562
https://openalex.org/W1996931407
https://openalex.org/W2031913690
https://openalex.org/W2040926019
https://openalex.org/W2047764386
https://openalex.org/W2102039064
https://openalex.org/W2110066339
https://openalex.org/W2117030266
https://openalex.org/W2119251836
https://openalex.org/W2127637733
https://openalex.org/W2134312016
https://openalex.org/W2150270530
https://openalex.org/W2158974202
https://openalex.org/W2161086571
https://openalex.org/W2162800072
https://openalex.org/W2296286487
https://openalex.org/W2296467253
https://openalex.org/W2339669378
https://openalex.org/W2466114212
https://openalex.org/W2888556190
https://openalex.org/W2914687823
https://openalex.org/W2952416601
https://openalex.org/W2962688040
https://openalex.org/W2998574808
https://openalex.org/W4231040899
https://openalex.org/W4244275790

	All versions	This version
Views	1	1
Downloads	1	1
Data volume	4.5 MB	4.5 MB

Practical dynamic reconstruction of control flow graphs

Translated Descriptions

Translated Description (Arabic)

Translated Description (French)

Translated Description (Spanish)

Files

thesis.pdf.pdf

Files (4.5 MB)

Additional details

Additional titles

Identifiers

Related works

GreSIS Basics Section

References

Practical dynamic reconstruction of control flow graphs

Creators

Description

Translated Descriptions

Translated Description (Arabic)

Translated Description (French)

Translated Description (Spanish)

Files

thesis.pdf.pdf

Files (4.5 MB)

Additional details

Additional titles

Identifiers

Related works

GreSIS Basics Section

References