|=----------------------------------------------------------------------=|
|=---------------=[ Taint Me Like One of Your fetch cURLs ]=------------=|
|=----------------------------------------------------------------------=|
|=-----------------=[ Discovering IoT CVEs at scale with ]=-------------=|
|=---------=[ binary pointer alias analysis and compiler theory ]=------=|
|=----------------------------------------------------------------------=|
|=-------------------------=[ 4ttil4sz1a ]=-----------------------------=|
|=---------------------=[ attilaszia@proton.me ]=-----------------------=|
|=----------------------------------------------------------------------=|

---[ Index

  0 - Introduction
  1 - Taint-style vulnerabilities
  2 - Pointer alias analysis
  3 - TMTDR: Too Much Theory Didn't Read
  4 - Real-world taint-style vulnerabilities
  5 - Access paths and on-demand inter-procedural analysis
  6 - Multi-architecture binary program analysis
  7 - Contributions
  8 - Prototype implementation
  9 - Closing remarks
  10 - References
  11 - Release artifacts

---[ 0 - Introduction

It is 2025. Everything is being rewritten in Rust. Complex mitigations make exploiting even strong primitives difficult, almost elusive on recent platforms. Big tech threat intelligence groups give talks on recovering 0-days from the small traces left in dmesg logs. A lot of resources are spent on each side. Even defenders with formidable skills have a hard time deciphering the complexity of some exploits that target high-value systems like mobile operating systems or browsers.

Does this cat-and-mouse game converge to a fixed point, or are we chasing each other like parallels in Bolyai's geometry, only to meet at some boundary point at infinity? What is the endgame, the singularity of offensive security, the unavoidable heat death of the hacker universe? The thermal catastrophe of cyberspace's cosmology, where not even Bekenstein entropy increases anymore?
Are we left with a puddle of conformal quantum field excitations on the boundary of something that is likely a de Sitter hologram written in Haskell by a random Gen-Z kid who thinks GitHub is as obsolete as day jobs?

None of these questions will be answered in this article. Instead, we will look at something more pragmatic that I have repeatedly asked myself on sleepless nights for the past 10 years, staring into the full moon: But what about IoT devices?!

The history, albeit not objective, is well known. Commenting on recent tariff policies, Howard Marks called the last 80 years the best economic period for mankind, with trade growth being a major reason. The financial benefits of globalization included keeping a lid on inflation by buying products from abroad. Little does Mr. Marks know what that inflation cap cost in cybersecurity risk when it comes to commercial off-the-shelf IoT devices and the attack surfaces of the networks where they are installed.

Still, real mayhem never came, apart from the Mirai botnet here and there bringing down parts of the internet, Phineas Fisher pwning Hacking Team via an IoT exploit with style, and APTs making a sport of mass-compromising surveillance devices for strategic objectives and overtaking ICS systems for whatever twisted reason.

People with business affinity, now celebrated as demigods on the dangerous streets of San Francisco throwing big side-parties during the RSA conference, attacked the problem from the networking side. Unlike modern application development with its secure-by-design approach, IoT devices were never assumed to run secure code, so the network around them had to be secured instead. A bunch of startups achieved "centaur" status with this practice, crossing $100M in annual recurring revenue by selling their answers to the IoT security problem. They call them agentless. People love it. People buy it.
Only recently has the idea of maybe not having vulnerabilities in the firmware also enjoyed some renewed interest. It was hard to wrap my head around, but while the oligopoly of big tech is a disadvantage from allocative, productive, and dynamic market efficiency standpoints, it is an advantage when securing modern digital infrastructure at scale: there are simply fewer complex supply chains and resource allocation problems to deal with.

Running hardware companies, arguably, sucks. Unit costs must be kept to a minimum, already constraining engineering heavily. Net profit margins are dwarfed by those of software companies, even if you pull everything off. To stand out from the noise of competing companies, the pressure is high to ship more features, even within this constrained financial and technological environment. The shortage of skilled embedded systems developers with secure coding knowledge does not help either.

For the C-level executives of some manufacturing corporations, as long as there is no measurable cost for the lack of security measures they are held accountable for at a board meeting, there is zero incentive to do anything about it. This is worsened by the non-existent consumer awareness that could theoretically affect sales by deterring customers from the feature-driven marketing hype cycle.

Allow me to paraphrase Joel Bakan: public corporations are simply obligated by law to prioritize shareholder interests, to a psychopathic level, over their customers'. Excuse my French, but these entities don't give a flying fuck about you or your privacy. All of this resulted, to put it bluntly, in shitty code that's easy to hack.
Pre-auth command injections in web controllers meant to configure the locale setting; special DHCP options giving you a root shell as a feature rather than a bug; 10-year-old image parsers processing the embedded JPEG in your Despacito.flac file that you just want to blast on your car's infotainment system via the USB media player.

Still, if IoT vulnerabilities are so low-hanging and secure coding quality is so low, how come IoT security never had that "pentest" moment? That inflection point where vulnerabilities are commoditized by automatic tools to the degree that you just train operators to sit in front of them and farm findings, without ever looking at the details and intricacies of the implementation or understanding the big picture?

Well. This is where we are really taking off after this long introduction. Instead of answering with more, possibly ill-founded thoughts on the tech stack being overly heterogeneous, or market forces not fostering an ecosystem of pentesting entities, or manufacturers operating outside of verticals with extended adversarial activities or in unregulated consumer markets with zero compliance requirements... instead of all that, I will invite you on a more difficult route. We will discuss one of the primary technical issues preventing the commoditization of IoT vulnerability scanners. I am also inviting you to my "grand programme" of laying the groundwork for a tool that could become one of the open-source options people use in the future to test these systems much more efficiently than ever before.

---[ 1 - Taint-style vulnerabilities

Luckily, manufacturers are realizing that relying on open-source software is not such a bad idea after all. The Yocto Project is getting more traction in the IoT space for generic Linux-based embedded systems. Zephyr is a popular solution for RTOS designs, and those building routers often fork a version of the OpenWRT project rather than relying on SDK demo code.
That code often comes from the SoC vendor's admittedly poor software team with no guarantee it works. Similarly, certain devices that went through several cycles of Pwn2Own are expected to be in slightly better shape than a sibling that received less attention from the community. Still, looking at the CVEs and threat intel from the past five years, one invariant seems ever-present. Most practical exploits don't have to rely on complex multi-bug chains like they do in the browser and mobile space. Often, a single bug can compromise the whole system. We are talking about problems where one or a few shots of attacker-controlled input hit a networking service on the IoT device and reach a security-sensitive "sink," such as an out-of-bounds write or a command injection site. Trivial examples of these are easy to spot, but discovering similar issues in production code is considerably more difficult. It is not a simple task to identify parts of the code where user-controlled data enters the system (so-called ‘sources’ in taint analysis nomenclature). It is especially hard to find all feasible paths from these sources to the security-sensitive ‘sinks’. The execution flow can span various functions; variables get assigned, reassigned, and copied around; and indirect calls can happen via function pointers or virtual dispatch. There is no magic red button saying "exploit this firmware" in Ghidra or IDA. As an additional point, Product Security teams expect a lot from static analyzers and are willing to pay a lot for them. This is true even if these tools can’t reason about the code's runtime semantics due to known theoretical results, such as Rice's theorem. One explanation for their success is that they are still better at identifying vulnerable code patterns than the average developer. Results of SAST tools are also known to be noisy, meaning they have many false positives. 
They generally overestimate the feasible execution paths in programs and will raise alerts in many cases where the flagged vulnerability could never happen in practice.

Circling back, one might ask: why are these taint-style vulnerabilities in IoT so hard to catch automatically? Again, this has to do with the practical and theoretical limitations, as well as the trade-off choices, of existing security testing tools. In particular, the problem boils down to what compiler theory has been calling the pointer alias analysis or (modulo some uncertainties in interpretation) the points-to analysis problem for the past 50-70 years. Consider the following example.

void vuln(char* input, char* output)
{
    strcpy(output, input); // Stack-based buffer overflow
}

int main(int argc, char** argv)
{
    char taintedbuf[128];
    char localbuf[32];

    read(STDIN_FILENO, taintedbuf, 128);
    char* p = taintedbuf;
    char* q = p;
    vuln(q, localbuf);
    return 0;
}

It is not hard to catch the vulnerability here with a static analyzer. I'm obviously not advertising any brand, but CodeQL seems somewhat popular in the research community, even if its engine is not open source. With CodeQL, one can configure a source for read() and a sink for strcpy() via a module, say ReadToStrcpyFlowConfig. Something like the following query will catch the problem:

module ReadToStrcpyFlow = TaintTracking::Global<ReadToStrcpyFlowConfig>;

from ReadToStrcpyFlow::PathNode source, ReadToStrcpyFlow::PathNode sink
where ReadToStrcpyFlow::flowPath(source, sink)
select source, sink

Now let's consider a slightly more complex example that better resembles a real-world context. Here, there are two structure definitions where additional data is stored alongside the pointer. Moreover, the two structures are nested. Our target pointer is stored in struct data_inner, which in turn is wrapped in data_outer. Our vulnerable function now receives the outer structure and copies the data via data->ptr_inner->data.
struct data_inner {
    int key;
    unsigned int hash;
    char* data;
};

struct data_outer {
    int key;
    unsigned int hash;
    struct data_inner* ptr_inner;
};

int vulnerable(struct data_outer* data);

int main(int argc, char** argv)
{
    char taintedbuf[128];
    struct data_inner d1;
    struct data_outer d2;

    d1.key = 0xdead;
    d2.key = 0xbeef;
    d1.hash = 0xcafe;
    d2.hash = 0xbabe;
    d2.ptr_inner = &d1;
    d2.ptr_inner->data = (char *)&taintedbuf;

    read(STDIN_FILENO, taintedbuf, 128);
    vulnerable(&d2);
    return 0;
}

int vulnerable(struct data_outer* data)
{
    char localbuf[64];
    strcpy(localbuf, data->ptr_inner->data);
    return 0;
}

Believe it or not, CodeQL is unable to identify an issue here. Why? CodeQL chooses not to implement detailed pointer alias analysis as a design decision to keep things relatively fast. In many ways, static analyzers must balance computational complexity, precision, and scalability. They also need to maintain a nice user experience with an intuitive API or rule syntax. Still, aiming low on the pointer alias analysis problem means that CodeQL and most other SAST tools won't find many juicy, practical vulnerabilities. These are often super simple for human analysts to find and for attackers to exploit. So simple, in fact, that if you suspect that almost all LLMs would find the vulnerability in the code above, you're not wrong.

If you're into Dijkstra and program correctness, or just mathematical structuralism for that matter, you might already sense why I'm not a fan of the LLM approach here. Otherwise, I'm not going to dive into the philosophical and aesthetic problems of why it is not a good idea. Why tokenize code, send it through a long sequence of matrix multiplications on overpriced Nvidia GPUs, with some non-linearities sprinkled in? Just to get a probability distribution that you can sample from with some temperature factor to find out, with high probability, that your code is vulnerable... well, most of the time.
It is harder and harder to fight the distortions of reality caused by the large amounts of venture capital thrown at these approaches. But for the sake of this article, we are resorting to well-established measures of computational complexity and cost from the CS classes that some of you have taken at university. I'd like to convince you that the algorithm we are going to discuss here is a cheaper way to do these things. Also, in some admittedly ill-defined metric, I'd like to argue that it is the right way to do this as well.

---[ 2 - Pointer alias analysis

So what is this problem and why is it difficult? Basically, you are trying to identify all pointer definitions p and q, where q may alias p. Two pointers alias each other when they point to the same object. Consider this example.

void func(int num, int** p, int** q, int** r, int** s, int** t)
{
    if (num < 10) {
        *r = *q;
        *q = *p;
    } else {
        *r = *q;
    }
    *t = *p;
}

int main(int argc, char** argv)
{
    int object1 = 0xdead;
    int object2 = 0xbeef;
    int object3 = 0xcafe;
    int *p, *q, *r, *s, *t;

    p = &object1;
    q = &object2;
    s = &object3;

    if (argc > 1) {
        func(atoi(argv[1]), &p, &q, &r, &s, &t);
    } else {
        printf("Taking one parameter %s \n", argv[0]);
        exit(1);
    }

    /* p, q, r, s, t are still alive here */
    printf("%x %x %x %x %x", *p, *q, *r, *s, *t);
    return 0;
}

At the end of the main function, the following are true:

p and q may alias
p and r never alias
p and s never alias
p and t must alias
q and r may alias
q and s never alias
q and t may alias
r and s never alias
r and t never alias
s and t never alias

If you compile this for whatever architecture, fire up Ghidra, and load it, there will be a basic block implementing the final printf and return statement. All of these relationships hold for the values of p, q, r, s, and t at that basic block. Crucially, we can also be interested in aliasing relationships at a different basic block.
For instance, consider the block before the conditional is taken, where the locals p, q, and s get their values. Can we establish aliasing relationships between the object addresses there and the pointers at the final return block? Since the func() operations do not change p or s, and we also see that r always receives the value of the original q, whereas t always receives the original p, we can deduce that:

p in the final block must alias &object1
q in the final block may alias &object1 or &object2
r in the final block must alias &object2
s in the final block must alias &object3
t in the final block must alias &object1

Sometimes authors differentiate between alias analysis and points-to analysis. Alias analysis computes a set S holding pairs of variables (p,q) where p and q may (or must) point to the same location. On the other hand, points-to analysis computes a relation points-to(p,x), where p may (or must) point to the location of the variable x. It is not that hard to see that accurately establishing these relationships is not only NP-hard, but in general, it is uncomputable for arbitrary programs. For formal proofs of these, see Landi et al. [1]. Practical algorithms therefore must resort to approximations, and it is important to mention that may-analysis is generally easier than must-analysis, as one should expect.

Modern compilers like clang/LLVM and gcc run all sorts of fancy optimization and program analysis algorithms. However, for large codebases, not only are superpolynomial algorithms a big no-no, but even more expensive polynomial algorithms, say, cubic ones, are going to have scalability issues. That is, for compilation, more accurate alias analysis solutions won't be acceptable. However, when it comes to security, the trade-off is slightly different: spending a few minutes or even hours of compute to prevent a critical CVE might actually be a bloody good idea under most reasonable threat models and organizational economic constraints.
For our model example, if we are able to establish with pointer alias analysis that data->ptr_inner->data aliases d2.ptr_inner->data, we can deduce that it aliases &taintedbuf. This enables us to raise an alert about an issue, given the sizes of the buffers used during strcpy. Cool, huh?

---[ 3 - TMTDR: Too Much Theory Didn't Read

Before investigating what we can come up with in terms of pointer alias techniques capable of discovering interesting taint-style vulnerabilities, let's see what compilers do when dealing with pointers. The two main algorithms are Andersen's and Steensgaard's, invented by fine Scandinavian gentlemen in the late 20th century. Andersen's algorithm is pretty much what you would come up with naturally after obtaining a foundational body of knowledge in program analysis techniques such as dataflow analysis, constraint-based analysis, and Abstract Interpretation.

So, Andersen's algorithm kind of goes like this. Let's define set constraints based on program statements using the following rules.

address-of)  [[ p = &x ]] -> l_x <= p
copy)        [[ p = q ]]  -> p => q
assign)      [[ *p = q ]] -> *p => q
dereference) [[ p = *q ]] -> p => *q

That is, in plain English:

The set of locations pointed to by p contains the label x.
The set of locations pointed to by p must contain those pointed to by q.
The set of locations pointed to by *p must contain those pointed to by q.
The set of locations pointed to by p must contain those pointed to by *q.

If you think of the semantics, or how to actually run this on a simple program, consider what happens to the labels l_x, where x is the object identifier. It is easy to come up with these inference rules:

copy)

p => q && l_x <= q
-------------------------
l_x <= p

meaning that if q is possibly pointing to some object x, and now p = q is encountered, then p also has the possibility to point to x.
assign)

*p => q && l_r <= p && l_x <= q
-----------------------------------------------
l_x <= r

meaning that if q is possibly pointing to some object x, and p is possibly pointing to object r, and now *p = q is encountered, then r has the possibility to point to x.

dereference)

p => *q && l_r <= q && l_x <= r
-----------------------------------------------
l_x <= p

meaning that if r is possibly pointing to object x, and q is possibly pointing to object r, and now p = *q is encountered, then p might point to x.

The attentive reader must have noticed that these rules make no reference to the order of statements whatsoever, and this is correct. The analysis does not consider the flow of the program, which is another way of saying that Andersen's algorithm is a flow-insensitive alias analysis. Let's run it manually for our toy example above and see whether it results in any inaccuracies.

Collecting address-of statements:

p = &object1;
q = &object2;
s = &object3;

Collecting assignments (by folding the function call into main and ignoring the dereferences for simplicity):

r = q;
q = p;
t = p;

(The above cannot be done in general. The function func() could be called from different contexts, which opens up another dimension of analysis: context-sensitivity. Here, we use a simple example and a context-insensitive analysis for demonstration purposes. If you are Rolf Rolles, the chief scientist at Hex-Rays, this should be trivial and boring. If you are not him, just stay with me and it will all make sense.)
So, first we have:

p = {object1}
q = {object2}
r = {}
s = {object3}
t = {}

Then, by the [copy] rule, we add a couple more labels:

p = {object1}
q = {object2, object1}
r = {object2}
s = {object3}
t = {object1}

Then, another run of [copy] reaches a fixpoint:

p = {object1}
q = {object2, object1}
r = {object2, object1}
s = {object3}
t = {object1}

The fact that you always reach a fixpoint if you define a bunch of inequalities, where the underlying values are discrete sets of object labels and where the rules (the fancy lingo is 'transfer functions') can only add, but never remove, 'new knowledge', should make sense intuitively. Generalizations of this observation have pretty important consequences in maths and CS, especially in game theory and logic programming. This generalization is celebrated as the Knaster-Tarski theorem (well, it's not like people throw parties dedicated to it, but you know what I mean). It's a mathematical guarantee that recursive definitions in logic and programming make sense and settle down -- so we can build formal meanings for languages and logical systems with confidence. But anyway.

Comparing our final state to the accurate solution, we see that Andersen's algorithm loses precision because it asserts that r may alias object1. This is due to flow-insensitivity: even though q can become an alias of p, the assignment r = q always happens before that statement in the actual execution flow. This may not be obvious from the example, but the lack of context-, field-, and flow-sensitivity means that Andersen's solution will be pretty inaccurate for most real-world programs.

Steensgaard's is even worse in terms of accuracy, but at least it is roughly linear, so it scales well with program size (Andersen's has cubic worst-case time complexity). Instead of operating over the O(n^2) space of pointer pairs, Steensgaard's uses a constant number of abstract locations to represent objects.
Points-to relations are therefore constrained to these abstract locations. In real programs, a variable can have multiple aliases; in this case, Steensgaard's algorithm unifies the corresponding abstract locations. And that's pretty much the algorithm. If a pointer can point to two or more different things, those things are merged. Then we check if that merged location now points to multiple things and merge those as well. This process repeats, eventually creating equivalence classes among the abstract locations. Formally, it processes the same types of statements as Andersen's:

address-of)  [[ p = &x ]] -> join(*p, x)
copy)        [[ p = q ]]  -> join(*p, *q)
assign)      [[ *p = q ]] -> join(**p, *q)
dereference) [[ p = *q ]] -> join(*p, **q)

But crucially, we join abstract locations (denoted by *p) like this (think of this as pseudocode, not Python or C, and try not to be bothered by its recursion):

join(e1, e2):
    if (e1 == e2)
        return
    e1next = *e1
    e2next = *e2
    unify(e1, e2)
    join(e1next, e2next)

To get an intuition for Steensgaard's, think of it this way. The analysis treats pointer assignments as equalities between abstract locations, not as subsets. Whenever two pointers are related by an assignment, their target locations are merged into a single equivalence class. This merging happens recursively via the join function, which ensures that not only the pointers themselves but also what they point to (and so on) are unified. The result is a fast, scalable analysis that sacrifices precision by conflating many distinct memory locations, making it ideal when speed matters more than exact aliasing details.

So let's see a Steensgaard run for the same program as above. The initial state is the same.
p = {object1}
q = {object2}
r = {}
s = {object3}
t = {}

Then, we create the equivalence classes according to the assignments:

{r} merges with {q, object2}                -> {r, q, object2}
{q, object2, r} merges with {p, object1}    -> {p, q, r, object1, object2}
{t} merges with {p, q, r, object1, object2} -> {t, p, q, r, object1, object2}
{s} remains independent, as it is not involved in any pointer assignments.

After all merges, we have two final equivalence classes:

{p, q, r, t, object1, object2}
{s, object3}

So, Steensgaard's algorithm manages to extract some useful information, but considerably less than what you'd want for security analysis. In exchange, each statement is processed once, and the additional operations are nearly constant. This results in a near-linear overall time complexity of O(n*alpha(n)). The alpha(n) here is the inverse Ackermann function, which can be treated as a constant for practical purposes.

Now, although we looked at a simple example, things can get really tricky when you examine a few instances of this problem from the PTABEN [2] benchmark. Here's one checking for context-sensitivity.

void bar(int** k, int** s) {
    *k = *s;
}

void foo(int** x, int** y, int** z) {
    int t;
    *y = &t;
    *z = *x;
}

int main() {
    int *p1, *q1, *r1, *a1, *b1, *c1, q2, a2;
    int **p = &p1;
    int **q = &q1;
    q1 = &q2;
    int **r = &r1;
    int **a = &a1;
    a1 = &a2;
    int **b = &b1;
    int **c = &c1;

    bar(&p, &q);
    MUSTALIAS(p, q);
    NOALIAS(p, &p1);

    foo(p, q, r);
    MUSTALIAS(q1, r1);

    foo(a, b, c);
    MUSTALIAS(a1, c1);
    NOALIAS(q1, c1);
    NOALIAS(a1, r1);
}

This looks much more like an NP-hard problem relative to program size, huh? Kind of like a Sudoku puzzle. Incidentally, it is possible to add context-sensitivity to Andersen's algorithm, but it will naturally increase computational complexity. The state of the art for this problem using source code (well, at least as of 2020) seems to be SUPA [3] improving on SFS [4], although neither of them targets security use cases per se.
Recent research has obviously looked at ML-augmented tricks, but as I mentioned briefly, I'm not taking those things seriously for the moment. For our toy example that CodeQL failed on, field- and flow-sensitivity would be nice. For programs such as Emacs, whole-program analysis times with flow-sensitivity are still in the range of thousands of seconds, with superpolynomial scaling.

---[ 4 - Real-world taint-style vulnerabilities

In order to discover these types of vulnerabilities in real-world code, we need field-sensitivity and flow-sensitivity. A bit of context-sensitivity would also help. In addition, we might encounter indirect calls via function pointers (which could theoretically alias each other, becoming a crazy mess, but let's forget that for now). Moreover, we would love to do this using only binary code, since most real-life attackers won't have access to source code either, and they still manage to hack our applications and IoT systems. There has to be a way to extract 0-day semantics from the object code itself via program analysis, up to the limits imposed by Rice's theorem. At the end of the day, this is engineering. We don't care if the problem is theoretically impossible; we are just trying to generate actionable, high-quality alerts and the value that comes with them.

Understandably, many developers roll their eyes at the notion of binary program analysis. I think this is because they are unhappy with the false positives that commercial tools output, even when run on the full source code. This, and the fact that much information is lost during compilation, reinforces the wrong assumption that binary analysis means lower-quality findings. As we'll see, that's not the case. As far as vulnerabilities go, working with binaries can actually result in more accurate results.
I've been in the rare and lucky position of managing to convince people to work with me on some crazy but serious-sounding ideas in IoT security, land some funding, etc., while staying out of the academic funding system. My life advice would be not to attempt repeating this. Nevertheless, these adventures brought me the fortune of briefly working with Balint Toth. He was a university student at the time who helped collect all CVEs for Linux-based IoT devices from 2018 to 2024. He downloaded the vulnerable firmware whenever he could, identified the vulnerable binary, and ran the program analysis algorithm we had at the time. Here's a snippet of the dataset:

CVE-2023-38921  Netgear WG302v2 v5.2.9                 /usr/sbin/configura..
CVE-2023-3260   Dataprobe iBoot-PDU v1.43.03312023     /usr/bin/usermanag..
CVE-2023-27988  Zyxel NAS326 v5.21(AAZF.13)C0          /usr/sbin/ntpdate_..
CVE-2021-4045   TP-Link Tapo C200                      /usr/sbin/uhttpd
CVE-2019-12103  TP-Link M7350 v160330                  /archive/data/QCMA..
CVE-2021-43711  Totolink EX200 v4.0.3c.7646_B20201211  /web_cste/cgi-bin/..
CVE-2023-34832  TP-Link AX10v1.2 v1.3.4_20230220       /usr/sbin/v6plus
CVE-2023-29856  D-Link DIR-868L v1.12                  /htdocs/cgibin
CVE-2019-5185   WAGO PFC200 v03.02.02(14)              /usr/bin/iocheckd
CVE-2023-39780  Asus RT-AX55 v3.0.0.4.386.51598        /sbin/rc
CVE-2023-38924  Netgear DGN3500 v1.1.00.37             /usr/sbin/rc
CVE-2023-37791  D-Link DIR-619L B1 v2.04               /bin/boa
CVE-2023-37722  Tenda F1202 v1.2.0.20(408)             /bin/httpd
CVE-2023-34563  Netgear R6250 v1.0.4.48_10.1.30        /usr/sbin/httpd
CVE-2023-34214  Moxa TN-5900 v3.3                      /magicP/agent/agen..
CVE-2023-31742  Linksys WRT54GL v4.30.18_b6            /usr/sbin/httpd
CVE-2023-31740  Linksys E2000 v1.0.06_b1               /usr/sbin/httpd
CVE-2023-31700  TP-Link TL-WPA4530 KIT v170406         /usr/sbin/httpd
CVE-2023-26612  D-Link DIR-823G v1.0.2B05              /bin/goahead
CVE-2022-30114  Fastweb FASTGate v18.3.n.0482_FW_264   /bin/cmproxy
CVE-2019-6258   D-Link DIR822B1 v202KRb06              /usr/sbin/udhcpd
CVE-2019-20082  Asus RT-N63 v3.0.0.4.376.3754          /usr/sbin/httpd
CVE-2019-11400  TRENDNet TEW-651BR v2.04B1             /sbin/ncc
CVE-2018-20114  D-Link DIR-860L_REVB v2.03.B03         /htdocs/cgibin
CVE-2023-39238  Asus RT-AX55 v3.0.0.4.386.50460        /usr/lib/libshared..
CVE-2023-31983  Edimax N300 BR-6428NS v4_1.10          /bin/webs
CVE-2023-33626  D-Link DIR-600 B5 v2.18                /htdocs/cgibin
CVE-2023-34356  Peplink Surf SOHO v6.3.5               fs1/web/cgi-bin/MA..
CVE-2023-33533  Netgear D8500 v1.0.3.60                /usr/sbin/httpd
CVE-2023-46422  Totolink X6000R v9.4.0cu.652_B20230116 /usr/sbin/shttpd
CVE-2023-4410   Totolink EX1200L v9.3.5u.6146_B20201023 /www/cgi-bin/cst..
CVE-2023-34800  D-Link RT-AC750 revA v101b03           /htpdocs/cgibin
CVE-2023-33556  Totolink A7100RU v7.4cu.2313_B20191024 /usr/lib/lighttpd..
CVE-2023-37700  Tenda FH1203 v2.0.1.6                  /bin/httpd
CVE-2022-38841  Linksys AX3200 v1.1.01.272918          /www/cgi-bin/porta..
CVE-2023-38928  Netgear R7100 v1.0.0.78                /usr/sbin/httpd
CVE-2023-31856  Totolink CP300+ v5.2cu.7594_B20200910  /web_cste/cgi-bin/..
CVE-2023-30135  Tenda AC18 v15.03.05.19(6318_)_cn      /bin/httpd
CVE-2023-2378   Ubiquiti EdgeRouter v2.0.9-hotfix.6    /usr/sbin/ubnt-util
CVE-2023-29803  Totolink X18 v9.1.0cu.2024_B20220329   /web/cgi-bin/cstec..
CVE-2023-26801  LB-Link BL-WR9000 v2.4.9               /bin/goahead
CVE-2022-28496  Totolink CP900 v6.3c.566_B20171026     /lib/cste_modules/..
CVE-2023-27240  Tenda AX3 v16.03.12.11                 /bin/httpd
CVE-2023-0640   TRENDnet TEW-652BRP v3.04b01           /sbin/ncc
CVE-2023-24154  Totolink T8 v4.1.5cu                   /web/cste/cgi-bin/..
CVE-2022-44832  D-Link DIR-3040 FW120B03               /bin/prog.cgi
CVE-2021-3707   D-Link DSL-2750U FW1.03                /cgi-bin/webupg
CVE-2022-31874  Asus RT-N53 v3.0.0.4.376.3754          /usr/sbin/httpd
CVE-2022-26999  Arris TR3300 v1.0.13                   /usr/sbin/rc_app/..
CVE-2022-24171  Tenda G1 v15.11.0.17(9502)_CN          /bin/httpd
CVE-2021-41383  Netgear R6020 v1.0.0.48                /usr/sbin/rc_app/r..
CVE-2020-14109  Xiaomi AX3600 v1.1.12                  /usr/sbin/meshd
CVE-2023-45463  Netis N3 v1.0.1.751                    /bin/script/cscrip..
CVE-2023-42320  Tenda AC10 v16.03.10.09                /bin/httpd
CVE-2022-27646  Netgear R6700v3 v1.0.4.120             /bin/circled
CVE-2020-19320  D-Link DIR-619L v2.0.5                 /bin/boa
CVE-2019-17621  D-Link DIR-859 v1.0.5                  /htdocs/cgibin
CVE-2023-38922  Netgear JWNR2000v2 v1.0.0.11           /usr/sbin/uhttpd
CVE-2023-38925  Netgear EX6200 v1.0.0.64               /usr/sbin/httpd
CVE-2023-37758  D-Link DIR-815 v1.04                   /htdocs/cgibin

Guess what? All of these are taint-style vulnerabilities hiding in the code of the network-facing binaries listed in the rightmost column! If only we had a pointer analysis algorithm that operated on multi-architecture binary code with field-, context-, and flow-sensitivity... Well, it turns out that a brilliant paper by Kai Cheng et al. attempts to solve this [5]. Incidentally, I'm releasing this dataset with this article, but more on that later.

---[ 5 - Access paths and on-demand inter-procedural analysis

The authors of [5] draw their main inspiration from a much older paper written by Wen-Mei W. Hwu et al. [6]. Let's insert a flashback here. The year is 2000. Santana's "Smooth" is playing on the radio, the ILOVEYOU virus is spreading through the Internet, people are buying their first Nokia 3310s, and yours truly is about to start 2nd grade at age 8. Computers are objectively pretty slow and crappy, even though they are routinely beating Kasparov in chess. IRC is huge. Most importantly for this discussion, Intel Itanium is about to become a thing.
As some of you know, Intel Itanium was a high-performance 64-bit processor family with an innovative EPIC (Explicitly Parallel Instruction Computing) architecture. It aimed to revolutionize enterprise computing by delivering superior parallelism and scalability. It failed due to poor backward compatibility with x86 software, competition from x86-64 processors like AMD's Opteron, and limited adoption by software developers.

Itanium's experimental compilers were designed to perform advanced static scheduling and optimization at compile time, leveraging the chip's EPIC architecture to extract maximum parallelism. However, these compilers struggled to predict runtime behavior accurately, often producing suboptimal performance, and they never justified the high cost of software migration. Simply put: the idea was to be clever with compilers and crazy with parallelization, but it just didn't work out.

Still, while searching for the holy grail of Itanium compilation, the IMPACT Research Group at the University of Illinois, led by Wen-Mei W. Hwu, produced some spectacular program analysis algorithms, including then-state-of-the-art pointer alias analysis.

Pointer analysis algorithms can be categorized based on whether they work with the underlying object/variable or with access paths:

* Object/variable-based analysis focuses on identifying the target object or variable a pointer refers to, abstracting away the specific access paths. This simplifies the analysis but may lose precision for complex memory structures.

* Access path-based analysis considers the specific paths (e.g., x->y->z) used to reach a memory location, enabling finer-grained insights into pointer targets at the price of increased complexity and computational cost.
Access paths were used naturally in earlier algorithms, but the work in [6] formally defined them and explained how to do context- and field-sensitive interprocedural analysis more or less efficiently:

Definition 2.1 (Construction of Access Paths). AP denotes the function that recursively determines postfix access paths for C expressions that involve memory accesses, as defined by the following rules:

 1. AP(v) = v, where v is a variable.
 2. AP(exp()) = AP(exp)n(), where exp is a call-site, either direct
    or indirect, and n is a unique id.
 3. AP(*exp) = AP(exp)*, if exp is not a function name.
 4. AP(*exp) = AP(exp), if exp is a function name.
 5. AP(exp[index]) = AP(exp), where exp is of array type and exp is
    not a formal parameter.
 6. AP(exp[index]) = AP(exp)*, where exp is an arbitrary pointer or
    a formal parameter of array type.
 7. AP(exp op exp1) = AP(exp), where op is an arbitrary binary
    operator and exp is a pointer type variable.
 8. AP(exp->field) = AP(exp)*.(so,eo)
 9. AP(exp.field) = AP(exp).(so,eo)
10. AP(exp.field1.field2) = AP(exp).(so3,eo3), where so3 = so1 + so2
    and eo3 = eo1 + eo2.

Here, so and eo denote the start and end offsets of the accessed field within its enclosing object.

I won't go super deep into transfer functions and monotone frameworks, but the real meat of this paper is showing how to propagate points-to information across procedure calls until a fixpoint is reached. It also shows how access paths are crucial for analyzing complex, potentially recursive data structures. (I obtained the source code of the original IMPACT compiler, in case any of you would like to take a look.)

How efficient is the resulting algorithm? Well, it's exponential in several parameters. Still, with a conservative k-limiting factor of 1, it could analyze a program as complex as gcc in 500 seconds on a 450 MHz Pentium II, so it would be pretty damn fast and practical today. This also shows how great research in computer science eventually finds its application. The original objective failed, yet the same algorithm can be reused for security research in the 2020s!
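To make rules like 1, 3, 7, 8, and 9 concrete, here is a tiny Python sketch of access-path construction over a toy expression AST. The tuple encoding and the function name are mine for illustration (not from the IMPACT code), and field components are rendered with their start offsets only:

```python
def ap(expr):
    """Recursively build a postfix access path for a toy expression AST."""
    kind = expr[0]
    if kind == 'var':                    # rule 1: AP(v) = v
        return expr[1]
    if kind == 'deref':                  # rule 3: AP(*e) = AP(e)*
        return ap(expr[1]) + '*'
    if kind == 'arrow':                  # rule 8: AP(e->f) = AP(e)*.so
        _, base, so = expr
        return ap(base) + '*' + f'.{so}'
    if kind == 'dot':                    # rule 9: AP(e.f) = AP(e).so
        _, base, so = expr
        return ap(base) + f'.{so}'
    if kind == 'binop':                  # rule 7: AP(e op e1) = AP(e)
        return ap(expr[1])
    raise ValueError(f'unhandled expression kind: {kind}')

# x->y.z, with y at offset 4 in *x and z at offset 8 inside y:
print(ap(('dot', ('arrow', ('var', 'x'), 4), 8)))   # x*.4.8
```

Note how the postfix form makes the path through memory explicit: one dereference, then two field offsets.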
How cool is that?

Still, the above algorithm operates on source code, and we need something better for our purposes.

---[ 6 - Multi-architecture binary program analysis

So, we're back to the research in [5]. Their first insight is that symbolic/concolic execution approaches, apart from being slow, rely heavily on concretization strategies. These might resolve two aliased symbolic addresses to different concrete addresses, preventing taint from propagating properly. There is also an algorithm called VSA (a form of abstract interpretation) that does fairly well on binaries for numeric and pointer analysis. However, like symbolic execution, it represents memory on an object/variable basis. For example, when a read happens, both techniques conservatively model the result. This could be a 'top' element in VSA's lattice or an unconstrained symbolic value for concolic execution. In either case, useful information is lost if the value is later used as an address. See where this is going? This is exactly the problem that access paths solve. Let's see how to make them work for binary code.

The idea is to operate on an Intermediate Representation. Instead of the grammar above for APs, we define SSEs, structured symbolic expressions (not to be confused with static symbolic execution), recursively like this:

    expr    ::= expr op_b expr | op_u expr | var
    op_b    ::= + | - | * | / | << | >> | ...
    op_u    ::= ~ | ! | ...
    var     ::= tau_reg | tau_mem | tau_val
    tau_reg ::= r_i
    tau_val ::= {Integer}
    tau_mem ::= load(expr) | store(expr)

The implementation uses VEX IR, originally designed for Valgrind and later ported to Python by the angr team, and this definition closely follows the main objects and operations defined there. VEX is also powerful because it supports lifting x86, arm, mips, riscv, mips64, arm64, and ppc64 binary code into IR, and it is easily extensible. To motivate the analysis, I'll reproduce the example given in the paper in my own words.
Consider the following assembly listing:

    LDR R1, [R3, 0x8]
    MOV R0, R1          # R1 is a pointer
    STR R3, [R6, 0x4]

We'd like to find all pointer aliases of R1. Looking at line 1, we see that R1 is defined there, so we add load(R3+0x8) to the alias set. Looking forward from there, we find a usage of R3: it gets stored to R6+0x4. Therefore, we replace R3 with store(R6+0x4) and add load(store(R6+0x4)+0x8) to the alias set at line 3.

Similarly, for this code, if we were to find the aliases of r1:

    mov r5, r7
    ldr r11, [r5]
    ldr r1, [r4]        # r1 is a pointer
    mov r2, r1
    str r2, [r7]

we would end up with these (non-trivial) expressions:

    * r2
    * load(r4)
    * r5
    * store(r7)

Nice, huh? The original paper gives the following set of update rules for the intra-procedural analysis:

    SSE update with define-use chain
    (1)  r_i = r_j                ---replace---> expr.replace(r_i, r_j)
    (2)  r_i = Binop(r_n, r_m)    ---replace---> expr.replace(r_n op r_m, r_i)
    (3)  r_i = ITE(r_j, r_n, r_m) ---replace---> expr.replace(r_n V r_m, r_i)
    (4)  r_i = ITE(r_j, r_n, r_m) ---replace---> expr.replace(r_n, r_i)
    (5)  r_i = Load(r_j)          ---replace---> expr.replace(load(r_j), r_i)
    (6)  Store(r_i) = r_j         ---replace---> expr.replace(r_j, store(r_i))
    ---------------------------------------------------------------------
    SSE update following use-define chain
    (8)  r_i = r_j                ---replace---> expr.replace(r_i, r_j)
    (9)  r_i = Binop(r_n, r_m)    ---replace---> expr.replace(r_i, r_n op r_m)
    (10) r_i = ITE(r_j, r_n, r_m) ---replace---> expr.replace(r_i, r_n V r_m)
    (11) r_i = ITE(r_j, r_n, r_m) ---replace---> expr.replace(r_i, r_m)
    (12) r_i = Load(r_j)          ---replace---> expr.replace(r_i, load(r_j))
    (13) Store(r_i) = r_j         ---replace---> expr.replace(store(r_i), r_j)
    ---------------------------------------------------------------------
    SSE kills
    (14) r_i = r_j                ---kill---> expr.kill()
    (15) Store(r_i) = r_j         ---kill---> expr.kill()
    ---------------------------------------------------------------------
        Table: Update rules for structured symbolic expressions

ITE is just
If-Then-Else. Believe it or not, this is still not enough to discover the vulnerability in the example that CodeQL also fails on, due to the nested structures and aliasing. (Note that our problem might be slightly harder than CodeQL's, as we want to discover the bug without the source code, having only binary code at our disposal.)

My humble personal contribution to the set of update rules that solves this is the following. The analysis often encounters statements of the form (2), where Binop is addition:

    (2) r_i = Binop(r_n, r_m) ---replace---> expr.replace(r_n op r_m, r_i)

I relax this update rule for cases when the expression is also of the form:

    expr <-> Add(r_n, o_2)
    r_i = Add(r_n, o_1) ---replace---> expr.replace(r_n + o_1, r_n + o_d)
    where o_d = o_1 - o_2

This allows the replacement of expressions even if the offset does not completely match. Why? Structures with multiple fields can be passed on the stack between functions, and this transformation allows the analysis to recover the aliasing relationships nonetheless.

---[ 7 - Contributions

To reiterate, the main credit goes to Kai Cheng et al. for their work. I had the pleasure of exchanging thoughts with Dr. Cheng and discussing some of the improvement ideas I had in mind regarding accuracy and their computational cost.

The automatic discovery of IoT vulnerabilities is a field of ongoing research. Moreover, there are commercial tools that are not bad at all, some targeting scripting languages and building on solid open-source code-analysis tools like Joern. This list is not exhaustive, but one other tool is worth mentioning. MANGODFA [8], from the original angr developers at ASU and UCSB, is a very promising tool. Unfortunately, they did not evaluate their solution against that of [5]. Reading the literature, I can confirm that while many solutions claim better results than [5] on some metric, its time-efficiency seems unbeatable.
This is due to its practical time complexity and its ability to resolve indirect calls by applying access path algebra in a clever way. It can thereby handle potentially vulnerable codebases that use large function pointer tables for dispatching, such as for URL routing in embedded HTTP servers. The astonishing part is that the reported timing results are single-core: the unmodified implementation on a moderately beefy workstation can scan an average Linux-based IoT device executable in 179 seconds and raise alerts with an 86% true positive rate for the taint-style vulnerabilities it can detect.

My contributions, which I am releasing with this article, are the following:

* In collaboration with Balint Toth, I collected all the CVEs I could from 2018-2024 and gathered vulnerable firmware samples for them, identifying the vulnerable binary in each. [REL1]

* I implemented a prototype version of the algorithm for educational purposes. It beats CodeQL on our toy example's taint analysis problem by running on multi-architecture object code instead of source code. Furthermore, I added rules for more accurate pointer alias analysis and showed that SSE-based parallelism is possible. [REL2]

* I ported the original implementation of [5] from its IDA-based frontend to Ghidra, so the full implementation can be used as a free and open-source solution for security analysis. [REL3]

This lends itself to a natural program toward automating low-complexity IoT vulnerability discovery. Note that if SSE-level parallelism were implemented for the full version of [5], an average target could be scanned in 2 to 40 seconds of wall-clock time, potentially scaling further with horizontal computing resources. Additional speedups from reimplementing the solution in native code (as opposed to Python), or porting it to a JIT-ed variant, could bring this down to the few-seconds level without major algorithmic changes.
Moreover, the on-demand nature of the pointer alias analysis makes the algorithm easily embeddable into reverse engineering and vulnerability research workflows, especially given these running times. The current implementation consults library function summaries; other than that, it has no hard dependencies preventing its use on RTOS systems, once the necessary transfer functions for inter-procedural taint propagation are provided. Therefore, maintaining and improving these implementations could very well push low-complexity IoT vulnerability research into the "pentest era," where such a tool could rediscover most past CVEs.

I'm inviting the research community to:

* Contribute to the code artifacts released with this article.

* Help evaluate the solution on the released real-world dataset, and analyze the root cause of potential false negatives (and, less frequently, false positives).

---[ 8 - Prototype implementation

The rest of this article is dedicated to a very bare-bones Python implementation [REL2] that does just enough analysis to discover the taint vulnerability in binaries compiled from our example C file. Think of it as an appetizer for the full code in [REL3], which was largely developed by the original authors. I wrote most of this during two long train rides, so excuse the spaghetti code. Don't do this at home, kids!

First, we need a way to represent SSEs. It turns out that angr has a module called claripy that already implements a nice AST interface for dealing with registers, values, and operations on them, defined in claripy/ast/bv.py. We just need to extend that with loads and stores to represent tau_mem expressions, and we are done:

    SSEload = operations.op('load', BV, BV,
                            calc_length=operations.basic_length_calc,
                            bound=False)
    SSEstore = operations.op('store', BV, BV,
                             calc_length=operations.basic_length_calc,
                             bound=False)

That was easy.
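If you'd like to play with the replace mechanics without pulling claripy in, they can be mimicked with plain tuples. The following is a toy stand-in of my own (not the real [REL2] code) that replays the first ARM example from section 6, substituting store(R6+0x4) for R3 inside R1's alias expression:

```python
# Toy stand-in for SSE expressions: nested tuples instead of claripy
# ASTs. Function names here are illustrative, not from the prototype.

def sse_replace(expr, old, new):
    """Structurally replace every occurrence of `old` inside `expr`."""
    if expr == old:
        return new
    if isinstance(expr, tuple):
        return tuple(sse_replace(e, old, new) for e in expr)
    return expr

def render(expr):
    """Pretty-print a tuple-encoded SSE as load(...)/store(...) text."""
    if isinstance(expr, tuple):
        op, *args = expr
        if op == '+':
            return f"{render(args[0])}+{render(args[1])}"
        return f"{op}({','.join(render(a) for a in args)})"
    return str(expr)

# LDR R1, [R3, 0x8]: r1's alias set gains load(r3 + 0x8)
alias = ('load', ('+', 'r3', 0x8))
# STR R3, [R6, 0x4]: r3 is now reachable as store(r6 + 0x4),
# so substitute it inside the alias expression
alias = sse_replace(alias, 'r3', ('store', ('+', 'r6', 0x4)))
print(render(alias))   # load(store(r6+4)+8)
```

The real implementation does exactly this kind of structural substitution, just on claripy bitvector ASTs with liveness and direction bookkeeping on top.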
We'll also add a wrapper class around it to keep track of additional metadata:

    class SSE():
        def __init__(self, sse, tracking, valid_notbefore,
                     valid_notafter, parent):
            # The actual claripy SSE symbolic bitvector
            self.sse = sse
            # The direction in which the SSE is valid
            self.direction = tracking
            # The range where the SSE variable is live
            self.valid_notbefore = valid_notbefore
            self.valid_notafter = valid_notafter
            self.store_statement = None
            self.parent = parent

We already know more or less enough to implement the intraprocedural part of the algorithm. We'll have to fetch a block of code, select a target register, and scan up and down over all the statements within that block, again and again, until a fixpoint is reached, repeatedly invoking the update rules defined above. Let's pretend we are doing TDD and start with this test case instead of high-level requirements. We know what alias expressions we should get for this piece of ARM assembly:

    AS = "arm-linux-gnueabihf-as"
    CC = "arm-linux-gnueabihf-gcc -fno-stack-protector -O0 --machine=thumb"

    class TestPrisCoreSSE(unittest.TestCase):
        def test_core_simple(self):
            code = [
                'mov r2, r6',
                'str r3, [r2]',
                'ldr r1, [r3, #0x8]',
                'ldr r0, [r6]',
                'ldr r0, [r0, #0x8]',
            ]
            code_str = '\n'.join(code)
            with open("test.s", "w") as f:
                f.write(code_str)
            os.system(f"{AS} test.s -o test.out")
            proj = angr.Project('./test.out', auto_load_libs=False)
            cfg = proj.analyses.CFGFast()
            entry_node = cfg.get_any_node(proj.entry)
            irsb = proj.factory.block(proj.entry).vex
            res = pris_sse.traceblock(irsb, 11, "r1")[0]
            expected_str = (
                "[('r3', ), "
                "('r2', ), "
                "('r6', ), "
                "('r1', ), ('r0', )]"
            )
            self.assertEqual(str(res), expected_str)

As you can see, we take that snippet and turn it into an ELF file with arm-linux-gnueabihf-as. Then, using angr.Project, the file is loaded into memory. We perform a quick control flow graph recovery to get the entry block. Then we fetch that block, an object that is now a proper VEX IR superblock.
If you would like to familiarize yourself with VEX IR, the best starting point is probably [7]. What we have to implement, of course, is the definition of traceblock() as called in the test case above. It takes an IRSB, a statement ID corresponding to our entry point into the analysis, and the name of the register we want aliases for, in this case r1. Conceptually, we are doing the following:

* Initialize an SSE for the register in the target statement.
* Forward pass: propagate this SSE from the target statement to the end.
* Backward pass: propagate this SSE from the target statement to the beginning.
* Once the forward and backward passes are complete, we have a set of structured symbolic expressions for all the registers.
* For each register, forward pass to propagate it to the end, and backward pass to propagate it to the beginning.
* If a new structured symbolic expression is found at this stage, add it to the set.
* Repeat the previous step until no new structured symbolic expressions are found.
* At this stage, we have the full set of structured symbolic expressions for all the registers.

One of my other humble additions to the original algorithm is to parallelize the SSE transformations as much as possible (see trace_task and Pool):

    while found_new_sses:
        # ...omitting some code...
        if multithreaded:
            # Prepare arguments for each parallel task
            task_args = zip(
                irsb_per_tuple,
                global_sses_per_tuple,
                global_sse_keys_per_tuple,
                global_sse_map_per_tuple,
                sse_tuples
            )
            with Pool(16) as p:
                all_results = p.starmap(trace_task, task_args)
            # Merge results from all parallel tasks
            for result in all_results:
                global_sses, global_sse_keys, global_sse_map = merge(
                    result[0], global_sses, global_sse_keys,
                    global_sse_map)
                global_sses, global_sse_keys, global_sse_map = merge(
                    result[1], global_sses, global_sse_keys,
                    global_sse_map)

The forward and backward passes are similar, and that is where the magic you have been waiting for happens. The heart of the code simply walks through the statements and replaces SSEs according to their update rules:

    for idx, curr_stmt in enumerate(forward_statements):
        # ...
        if check_presence(irsb, curr_stmt, curr_var):
            print(f"Replacing {curr_sse} at {idx}")
            try:
                if curr_sse in local_sse_map:
                    print(local_sse_map[curr_sse])
                new_sse, tracking_info = do_sse_replace(
                    curr_stmt, curr_sse, curr_var, irsb, True
                )

In more detail, this also involves tracking the direction and liveness of aliases (for more information on those, refer to the paper). The update rules themselves are implemented along the lines of this function:

    def do_sse_replace(stmt, sse, var, irsb, forward):
        """
        This function replaces a variable with a new symbolic variable
        in the expression tree.

        :param stmt:    The current instruction
        :param sse:     The structured symbolic expression
        :param var:     The variable to be replaced
        :param irsb:    The VEX IRSB (block)
        :param forward: The direction of the analysis pass
        :return: A new structured symbolic expression and tracking info
        """
        # ...omitting code...
        if stmt.tag == "Ist_Put":
            exprs = list(stmt.expressions)
            print(exprs)
            if exprs[0].tag == "Iex_RdTmp":
                reg_started_stats(1)
                print(" | Simple variable assignment - paper case 1")
                print(" | ri = rj ---replace---> expr.replace(rj, ri)")
                reg_name = irsb.arch.translate_register_name(
                    stmt.offset,
                    stmt.data.result_size(irsb.tyenv) // 8
                )
                replacement_sse = claripy.BVS(
                    str(reg_name), 32, explicit_name=str(reg_name)
                )
                new_sse = ast_replace(sse, replacement_sse, var)
                tracking_info = map_rule_to_sse_direction(1, forward)
                return (new_sse, tracking_info)
        # ...and so on for the other cases...

This more or less completes the intra-procedural part of the algorithm, or at least gives you an idea of it. For the inter-procedural part, spanning multiple functions, the following has to be implemented:

    Algorithm 2: Intra- and inter-procedural alias analysis
    ------------------------------------------------------------
     1: procedure ANALYZEFUNCTION(CFG)
     2:     WorkList <- Postorder(CFG)
     3:     do
     4:         FINDALIAS(WorkList.reverse())   // forward
     5:         FINDALIAS(WorkList)             // backward
     6:     while new aliases can be generated
     7:     MERGE(Caller.CallSite, EntryBB.OUT_b)
     8:     MERGE(Caller.ReturnSite, ExitBB.OUT_f)
     9:
    10: procedure FINDALIAS(WorkList)
    11:     for BB in WorkList do
    12:         if BB is a callsite then
    13:             PTRs <- GETROOTPTR(PRE_f, SUC_b)
    14:             MOD, REF <- TRANSFERFUNC(PTRs, callee)
    15:             UPDATECONTEXT(PRE_f, SUC_b, MOD, REF)
    16:         else
    17:             OUT_f, OUT_b <- TRACEBLOCK(PRE_f, SUC_b)
    18:             MERGE(BB.successors, BB.OUT_f)
    19:             MERGE(BB.predecessors, BB.OUT_b)

To make this barely work for our C code (or rather, its compiled binaries), we'll define a minimal set of sources and sinks. We will also disregard the "magic" transfer function parts of the full algorithm for now.
Spelled out in code:

    def interprocedural(proj, cfg, addr, out_sses_incoming=None):
        main_func = cfg.kb.functions[addr]
        print("Recursing interprocedurally")
        print(main_func.transition_graph.nodes)
        postorder = networkx.dfs_postorder_nodes(main_func.transition_graph)
        index = 0
        out_sses = out_sses_incoming
        for node in postorder:
            print(node)
            if isinstance(node, angr.codenode.BlockNode):
                irsb = proj.factory.block(node.addr).vex
                irsb.pp()
                # For both cases, we need to trace the block.
                # We prepare the arguments for the traceblock call here.
                trace_kwargs = {
                    'multithreaded': True,
                    'first': False,
                    'sse_tuples_start': out_sses[0] if out_sses else None,
                    'sse_map_start': out_sses[2] if out_sses else None
                }
                if not out_sses and index > 0:
                    raise Exception("No outbound SSEs from previous trace")
                sse_tuples, keys, sse_map = pris_sse.traceblock(
                    irsb, None, None, **trace_kwargs
                )
                out_sses = rebase_sses(sse_tuples, keys, sse_map)
            elif isinstance(node,
                    angr.knowledge_plugins.functions.function.Function):
                if node.is_plt:
                    print("Checking sinks for:")
                    print(node)
                    check_sinks(proj, node, out_sses)
                    continue
                else:
                    interprocedural(proj, cfg, node.addr,
                                    out_sses_incoming=out_sses)
            index += 1

And finally, using our simple prototype, we are able to discover the issue and beat CodeQL to it! Again, it makes sense to reiterate that this solution does not require source code at all. Whether you compile the example to x86, ARM, MIPS, or RISC-V, it will work just as well.
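For orientation, the sink check invoked above boils down to something like the following simplified sketch. This is hypothetical: the prototype's check_sinks(proj, node, out_sses) works on angr objects, and the SINKS/SOURCES sets and this signature are illustrative assumptions of mine:

```python
# Hypothetical, simplified sink check: flag a call when any alias
# expression of a sink argument mentions a taint source.
SINKS = {'strcpy', 'sprintf', 'system', 'memcpy'}
SOURCES = {'getenv', 'recv', 'websGetVar', 'nvram_get'}

def check_sinks(func_name, alias_exprs):
    """Alert if a tainted alias expression reaches a dangerous sink."""
    if func_name not in SINKS:
        return False
    for expr in alias_exprs:
        # An alias expression mentioning a taint source means the
        # sink operand is attacker-influenced.
        if any(src in expr for src in SOURCES):
            print(f"[ ! ] ALERT : tainted operand to {func_name}")
            return True
    return False

check_sinks('strcpy', ['load(store(r7)+0x8)', 'recv1()*'])  # alerts
```

The real check additionally walks the parent chain of each SSE so it can print the full alias chain seen in the output below.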
    Recursing interprocedurally
    [..., ..., ...]
    Function strcpy [4195344]
      Syscall: False
      SP difference: 0
      Has return: False
      Returning: True
      Alignment: False
      Arguments: reg: [], stack: []
      Blocks: [0x400410]
      Calling convention: None
    Checking sinks for:
    Function strcpy [4195344]
      ...
    False
    Out SSEs (
      [ ('r3', ...), ('r3', ...), ('r7', ...), ('r7', ...),
        ('r7', ...), ('r1', ...), ('r0', ...) ],
      [ ... ],
      { ... } )
    [ ! ] ALERT : Potential Buffer Overflow at 0x400410
    [ ! ] Tainted operand to strcpy
    [ ! ] Alias chain :
      | SSE: ... | parent: ...
      | SSE: ... | parent: ...
      ...

QED.

In terms of benchmarks, the original implementation by the authors is already up to 200x faster than other state-of-the-art analyzers. This speedup is largely because earlier approaches often used angr's symbolic execution in an underconstrained way, which leads to long waits for Z3 queries, state explosion, and non-trivial memory concretization strategies. In contrast, this solution is single-threaded, and most of its computation is spent on simple Python AST manipulation.

This approach has proven effective for discovering real-world 0-day vulnerabilities in IoT device firmware. We successfully applied it during our time in the research team at BugProve, a former startup focused on automated security analysis of embedded systems. Although the team has since disbanded, the methodology remains highly effective and can also be used to rediscover known (N-day) vulnerabilities. In terms of accuracy, it outperforms VSA-based systems, achieving 90-100% precision on the PTABEN basic, flow, and context pointer alias datasets.
---[ 9 - Closing remarks

This article has tried to connect the dots, showing how foundational research from the compiler world can be repurposed to tackle modern security challenges. What began as techniques to optimize code for ambitious hardware like Itanium has found a new, critical purpose: securing the vast and often opaque world of IoT devices.

My central argument has been that for real-world security, especially in the embedded space, we must be comfortable working at the binary level. Source code is often a luxury we don't have, and even when we do, the compiled binary is the ultimate ground truth. The goal isn't to replace source-based tools, but to have powerful, precise analysis for the many situations where they fall short.

The prototype I've described, built on the solid foundation of access paths and structured symbolic expressions, shows this is a practical goal. It demonstrates that even without source, we can achieve a high degree of precision in our analysis, outperforming other tools on specific, tricky problems involving pointer aliasing.

This article is also an open invitation. The CVE dataset [REL1], the educational prototype [REL2], and the Ghidra port of the full implementation [REL3] are being released to provide a common ground for further research. There is immense potential here to improve these tools, find the root causes of their current limitations, and build upon them. Pushing this line of research forward could finally bring a new level of automation and clarity to IoT vulnerability discovery. I look forward to seeing what the community builds next.

---[ 10 - References

[1] William Landi and Barbara G. Ryder. 1991. Pointer-induced aliasing:
    a problem classification. In Proceedings of the 18th ACM
    SIGPLAN-SIGACT Symposium on Principles of Programming Languages
    (POPL '91). ACM, New York, NY, USA, 93-103.
    https://doi.org/10.1145/99583.99599

[2] https://github.com/SVF-tools/Test-Suite/tree/master/src

[3] Yulei Sui and Jingling Xue. 2020. Value-Flow-Based Demand-Driven
    Pointer Analysis for C and C++. IEEE Transactions on Software
    Engineering 46, 8 (Aug. 2020), 812-835.
    https://doi.org/10.1109/TSE.2018.2869336

[4] Ben Hardekopf and Calvin Lin. 2011. Flow-sensitive pointer analysis
    for millions of lines of code. In Proceedings of the 9th Annual
    IEEE/ACM International Symposium on Code Generation and
    Optimization (CGO '11). IEEE Computer Society, USA, 289-298.

[5] Kai Cheng et al. 2023. Detecting Vulnerabilities in Linux-Based
    Embedded Firmware with SSE-Based On-Demand Alias Analysis. In
    Proceedings of the 32nd ACM SIGSOFT International Symposium on
    Software Testing and Analysis (ISSTA 2023). ACM, New York, NY,
    USA, 360-372. https://doi.org/10.1145/3597926.3598062

[6] Ben-Chung Cheng and Wen-Mei W. Hwu. 2000. Modular Interprocedural
    Pointer Analysis Using Access Paths: Design, Implementation, and
    Evaluation. SIGPLAN Not. 35, 5 (May 2000), 57-69.
    https://doi.org/10.1145/358438.349311

[7] https://github.com/angr/vex/blob/master/pub/libvex_ir.h#L42

[8] Wil Gibbs et al. 2024. Operation Mango: Scalable Discovery of
    Taint-Style Vulnerabilities in Binary Firmware Services. In 33rd
    USENIX Security Symposium (USENIX Security 24). USENIX
    Association, 7123-7139.
    https://www.usenix.org/conference/usenixsecurity24/presentation/gibbs

---[ 11 - Release artifacts

[REL1] https://github.com/attilaszia/linux-iot-cves
[REL2] https://github.com/attilaszia/sagrada
[REL3] https://github.com/attilaszia/emtaint-sagrada

|=[ EOF ]=-------------------------------------------------------------=|