Trusted Computing Base

A secure and trustworthy computer is a computer that only does what you (think you) tell it to.

Computers are insecure and not trustworthy, and there is nothing we can do about that. Computers are "universal machines" (see Von Neumann, Turing): they can be instructed to do anything that is possible according to the laws of physics. It is important to understand that a computer cannot discern between "good" and "bad" instructions; bits are bits. There are two general problems:

(1) There are a lot of people who can instruct a given machine to do something (anything!). We need to trust them all or verify everything, which is impossible.

Unless you read or write all instructions (kernel, software, firmware) yourself, study the hardware design (so you understand what an instruction actually does), compare that with the actual hardware, and then make sure you are the only one who can access the machine and give it new instructions, you need to trust that other people only give the machine instructions to do things that you want it to do. You can tell it to accept other, external untrusted instructions in a restricted, controlled fashion (like "render that HTML but otherwise don't touch my system"), but this is next to impossible to do reliably because of point (2). Without accepting such input, you could never connect that machine to a network or transfer any input from another system to it ("input" and "instructions" have to be understood in a very broad sense: not just executable code, but also media files or hardware devices that are being attached).

The "Trusted Computing Base" (TCB) refers to all instructions and hardware used to restrict a "universal machine" from accepting and acting on (other) arbitrary instructions. It is used to set a certain security policy to enforce that a computer only does what these trusted instructions allow it to. "Trusted" simply means that we have to trust it, not that it is wise to trust it. The TCB tries to restrict what instructions can do but it consists of instructions itself. It can't restrict or police itself. Any flaw whatsoever in the TCB directly results in a compromise of the security of the system.

(2) The machines themselves and the instructions are incredibly complex.

Even in an ideal world with mathematically verified hardware and software, we may think we are giving instructions to do A while in reality the instructions tell the machine to do B, or A and also B. Machines cannot make "mistakes", but they certainly can do unexpected things.

We can't give machines instructions in our language (a programming language); they need to be in machine code. The translation is so complex that we need another machine to do it for us, and to create that machine another one was used, and so on. This expands our chain of trust to basically all machines and humans involved, back to the first compiler created by an assembler, which in turn was created from hand-written machine code or, even earlier, punched cards.

Hardware is fallible, and hardware failure can produce entirely unpredictable results, whether from cosmic rays "flipping a bit", bad memory or extreme temperatures. Look up "bitsquatting" for an example. Software depends on the hardware functioning perfectly; what happens if the most basic assumptions are betrayed?
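To make the effect of a single flipped bit concrete, here is a minimal Python sketch that lists every hostname reachable from a given name by flipping exactly one bit, which is the idea behind bitsquatting. The function name `single_bit_flips`, the example hostname and the set of characters treated as acceptable are illustrative assumptions, not part of any existing tool.

```python
import string

# Characters accepted in a hostname label for this illustration (an assumption).
LABEL_CHARS = set(string.ascii_lowercase + string.digits + "-")

def single_bit_flips(name):
    """Return every name that differs from `name` by exactly one flipped bit
    and still consists of plausible hostname characters. Dots are left
    untouched so the label structure stays the same."""
    variants = set()
    for index, char in enumerate(name):
        if char == ".":
            continue
        for bit in range(8):
            # XOR with a single set bit models one bit flipping in memory.
            flipped = chr(ord(char) ^ (1 << bit))
            if flipped in LABEL_CHARS:
                variants.add(name[:index] + flipped + name[index + 1:])
    return sorted(variants)

if __name__ == "__main__":
    for variant in single_bit_flips("example.com"):
        print(variant)
```

Each printed name is one hardware error away from the intended one, without any software bug being involved.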

Another example of complexity is side channel attacks: they are very difficult to protect against and poorly understood by most software developers. For example, through power or CPU load analysis, otherwise isolated parts outside the TCB can influence or eavesdrop on privileged encryption. In other words, determining what is part of the TCB and what is definitely not is a very hard problem. See for example Spectre and Meltdown.
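Power analysis is hard to show in a few lines, so the following Python sketch uses a different and simpler side channel, timing, as a stand-in. It counts how many byte comparisons an early-exit comparison performs, which is what an attacker would infer from run time, and contrasts it with `hmac.compare_digest` from the standard library. The function names `leaky_compare` and `constant_time_equal` are hypothetical.

```python
import hmac

def leaky_compare(secret, guess):
    """Early-exit comparison. Returns (equal, steps), where `steps` is the
    number of byte comparisons performed. In real code the step count is
    not returned, but it leaks through the run time."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for x, y in zip(secret, guess):
        steps += 1
        if x != y:
            return False, steps
    return True, steps

def constant_time_equal(secret, guess):
    """Comparison designed so that run time does not depend on where
    the first mismatch occurs."""
    return hmac.compare_digest(secret, guess)

if __name__ == "__main__":
    # A guess with a correct first byte takes one step longer to reject.
    print(leaky_compare(b"secret", b"XXXXXX"))
    print(leaky_compare(b"secret", b"sXXXXX"))
    print(constant_time_equal(b"secret", b"sXXXXX"))
```

Both functions give the same answer, yet only the first one tells an observer how much of the guess was right. The code is functionally correct and still leaks.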

If that wasn't enough, we also have the problem that the circuit layouts, microcode and firmware of most vendors are proprietary (nonfree), partly because of competition concerns, partly because vendors are afraid of patents. In any case, this makes verification even more difficult than it already is. Further, "hardware" (which includes firmware, i.e. software) is becoming more and more capable (that is, complex and dangerous). Features like Trusted Computing when used to prevent ring 0 access by the user and owner of the computer, EFI that has Internet connectivity and can update itself, or Intel RMM / Intel AMT / Intel ME, which can grant remote access that is invisible to the user and operating system (OS), combined with a less than perfect track record (e.g. see Intel related research by Invisible Things Lab), don't exactly spell trustworthy and 100% dependable.

Conclusion: The TCB can never really be trustworthy. The source code of every currently usable OS kernel alone is too complex and large to completely audit and make error free, not just for a single human but even for large groups like the "Linux community". But let's assume we solve that (e.g. through using microkernels and formal verification): How do you make sure compiled binaries are actually doing what the source intended? Or, how can you verify that complex hardware and integrated circuits are actually built according to their intended design? For all the verification and auditing processes we are dependent on other complex computer systems that would have to be trusted unconditionally. Bootstrapping trust is a chicken and egg problem. We would have to be able to verify systems by just looking at them with our bare eyes and hands, or build/verify all systems necessary to bootstrap a modern compiler and hardware development platform. This may have been possible for the first "computers" in the first half of the 20th century, but not anymore.

Since there's nothing we can do about that, what else can we do then?

We need to design our systems in a way that makes it no longer necessary to trust them 100%. We only need to trust that each system is good enough and that it is astronomically unlikely that multiple diverse systems are untrustworthy in the same way.
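The arithmetic behind "astronomically unlikely" can be sketched as follows. This is a simplified model that assumes the systems fail independently, which real systems only approximate through diversity. The function name and the example probability are made up for illustration.

```python
from fractions import Fraction

def all_compromised_probability(p_single, n_systems):
    """Probability that all n systems are compromised at once,
    ASSUMING each is compromised independently with probability p_single."""
    if n_systems < 1:
        raise ValueError("need at least one system")
    return p_single ** n_systems

if __name__ == "__main__":
    p = Fraction(1, 100)  # hypothetical 1% chance per system
    for n in (1, 2, 3):
        print(n, all_compromised_probability(p, n))
```

Under that assumption, three systems that each have a 1 in 100 chance of being compromised are all compromised with a chance of 1 in 1,000,000. The estimate collapses if the systems share a flaw, for example the same CPU vendor or the same library, which is why the systems must be diverse.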

If you don't want a computer to be able to tell anyone your location or identity better make sure the computer doesn't know either... The Whonix ™ security design in some ways mimics the security design of the Tor network itself: Don't trust any single entity to be trustworthy. We only rely on the fact that it is unlikely that all entities (nodes, computers) are compromised and colluding. To be precise, that's a goal that is not achieved yet with Whonix ™ alone, see Whonix². One could also say that the actual TCB of such a "system" (actually multiple systems) becomes the design, arrangement and usage policy which is very well possible for every user to comprehend and verify.

This could be compared to the "Air Gap" used on most high security networks. They assume that the TCBs are not trustworthy and work around that using a simple and easily verifiable policy that basically eliminates the complete attack surface of hardware and software bugs and even protects against most backdoors. Not all, though: for example, a subverted PRNG could still result in weak crypto being exported from the trusted network, where it can be recorded and "cracked" by an adversary. A strong physical isolation based system could then encrypt data twice on different systems using different PRNG implementations to protect against such attacks.
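The idea of encrypting twice with keys from different random number generators can be illustrated with a toy sketch. It uses a one-time-pad style XOR as a stand-in for a real cipher. The two key sources are passed in as functions, and `subverted_rng` is a hypothetical worst-case generator that only outputs zeros. None of this is production cryptography; it only shows why the outer layer still protects the data when the inner key source is compromised.

```python
import secrets

def xor_bytes(data, key):
    """XOR two byte strings of equal length (toy stand-in for a cipher)."""
    if len(data) != len(key):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

def encrypt_twice(plaintext, rng_a, rng_b):
    """Encrypt with a key from rng_a, then again with a key from rng_b.
    In the scenario described above, each step would run on a different system."""
    key_a = rng_a(len(plaintext))
    key_b = rng_b(len(plaintext))
    inner = xor_bytes(plaintext, key_a)
    outer = xor_bytes(inner, key_b)
    return outer, key_a, key_b

def decrypt_twice(ciphertext, key_a, key_b):
    """Remove the outer layer first, then the inner one."""
    return xor_bytes(xor_bytes(ciphertext, key_b), key_a)

def subverted_rng(n):
    """Hypothetical backdoored generator: fully predictable output."""
    return bytes(n)

if __name__ == "__main__":
    message = b"attack at dawn"
    ciphertext, key_a, key_b = encrypt_twice(message, subverted_rng, secrets.token_bytes)
    # The inner layer alone would leak the message, because key_a is all zeros.
    print(xor_bytes(message, key_a) == message)
    # With both layers, recovering the message still requires key_b.
    print(decrypt_twice(ciphertext, key_a, key_b) == message)
```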

Physical Isolation in the sense of Whonix ™ is not a new idea, see Verifiable Computer Security and Hardware: Issues by William D. Young, September 1991 (PDF!), page 18 for a summary. It seems the idea was rediscovered by Whonix ™ (independently, and we came up with the same term). To our knowledge, Whonix ™ is currently the only project following this approach in a defined way.

Security Mindset of Open Source Software Ecosystem

Upstream distributions (such as Debian) can mostly only package available upstream software, not rewrite it. However, a lot of upstream software was created with other priorities and not necessarily the highest security in mind. A lot of required functionality is only available through "legacy" software written in memory-unsafe programming languages. The whole Open Source software ecosystem was never primarily focused on the highest possible security to begin with.

https://www.youtube.com/watch?v=31xA9p3pYE4 does not sound promising regarding general trends in computer security. Quoting freely: "More and more companies (such as Google Chrome, to pick one example of a trend) move to an approach where security bug reports are disregarded unless a proof of concept exploit is provided. Mitigation methods for bad, potentially exploitable code are preferred over actual security bug fixes because there is too much source code, complexity is too high and the total number of bugs is unmanageable."

See also Linux User Experience versus Commercial Operating Systems to learn about organizational issues in the Open Source ecosystem.

See also Dev/Operating System to learn about which operating systems were considered for Whonix ™.

Bad Usability of Programming Languages

Too much software which is part of the TCB is written in very old programming languages such as assembler, C and C++. These programming languages are unsafe and difficult to master, and there are many ways to make mistakes. Review of software written in such languages is difficult even if the author had good intentions, and it gets worse if the author had malicious intentions. Auditing existing source code can be extremely difficult, see Obfuscated C Code Contest. Most of the time it is easier to write source code than to understand source code written by a third party. Geeks will disagree; however, the flood of security issues should be abundant proof that source code free of security bugs is rare. It is therefore hard to deny that there are safety and usability issues in programming languages and the overall tool chain.

Even easier programming languages such as Ruby are still too difficult to learn, so the number of hobbyists motivated and able to use them is limited. Programming languages that focus on usability, run usability studies, iterate according to usability research, and can be understood by as many people as possible are yet to be invented.

Too much Software Written in Unsafe Languages

This is partially because at the time the software was written there were no easier or more suitable programming languages available. Performance considerations also played a part. Nowadays too few people are sufficiently aware, funded and/or motivated to replace this software with rewrites in languages with higher safety and usability. In the very strong opinion of the author, "perfect security" would require rewriting a lot of old source code written in unsafe languages in safe and easy (yet to be invented) programming languages, so that the amount of TCB code written in unsafe languages is kept to a minimum and becomes manageable to audit and to keep free of security issues. This, however, is very unlikely to happen.

Purposeful Vulnerabilities in Software

In 2021, researchers at the University of Minnesota released a paper revealing that they had purposefully tested the feasibility of introducing vulnerabilities (use-after-free bugs) into open source software (OSS) -- in this case the Linux kernel -- via "hypocrite commits." They defined these commits as "...seemingly beneficial commits that in fact introduce other critical issues." ^[1] While this research is ethically questionable, it casts doubt on the OSS claim that this development approach produces more reliable, secure and higher-quality software, particularly since the "many eyeballs" approach is supposed to increase the rate at which bugs are identified and fixed. The paper is significant because the Linux kernel powers billions of devices world-wide and the hypocrite commits were not identified until the researchers' paper was published.

The researchers identified three primary reasons why hypocrite commits to the kernel were possible: ^[1]

(1) OSS is open by nature, so anyone from anywhere, including malicious ones, can submit patches.
(2) Due to the overwhelming patches and performance issues, it is impractical for maintainers to accept preventive patches for “immature vulnerabilities”.
(3) OSS like the Linux kernel is extremely complex, so the patch-review process often misses introduced vulnerabilities that involve complicated semantics and contexts.

Suggested mitigations included: ^[1]

increasing committer liability and accountability
employment of advanced static-analysis tools
high-coverage or directed dynamic testing (such as fuzzers)
accepting patches for high-risk immature vulnerabilities
raising awareness of the risk of hypocrite commits
auditing of public patches by a larger number of people (not just maintainers)

This example illustrates that malicious committers to OSS may have had multiple opportunities to purposefully introduce stealthy vulnerabilities in countless software tools and projects. It is feasible that high-profile OSS focused on privacy and anonymity might have become a target for similar attempts. Successful attempts have the potential to be significant, because the resulting vulnerabilities could persist for a long period and affect countless users. As outlined by the researchers, it is possible to mitigate these underhanded methods by updating the code of conduct for OSS and developing/improving tools for patch testing and verification. ^[1]

Backdoors

Table: Finding Backdoors in Freedom Software vs Non-Freedom Software

	Non-Freedom Software (precompiled binaries)	Freedom Software (source-available)
Original source code is reviewable	No	Yes
Compiled binary file can be decompiled into disassembly	Yes	Yes
Regular pre-compiled binaries	Depends ^[2]	Yes
Obfuscation (anti-disassembly, anti-debugging, anti-VM) ^[3] is usually not used	Depends ^[4]	Yes ^[5]
Price for security audit searching for backdoors	Very high ^[6]	Lower
Difference between precompiled version and self-compiled version	Unavailable ^[7]	Small or none ^[8]
Reverse-engineering is not required	No	Yes
Assembler language skills required	Much more	Less
Always legal to decompile / reverse-engineer	No ^[9] ^[10]	Yes ^[11]
Possibility of catching backdoors via observing incoming/outgoing Internet connections	Very difficult ^[12]	Very difficult ^[12]
Convenience of spotting backdoors	Lowest convenience ^[13]	Very high convenience ^[14]
Difficulty of spotting "direct" backdoors ^[15] ^[16] ^[17]	Much higher difficulty ^[18]	Much lower difficulty ^[19]
Difficulty of spotting a "bugdoor" ^[20]	Much higher difficulty ^[21]	Lower difficulty
Third parties can legally release a software fork, a patched version without the backdoor	No ^[22]	Yes ^[23]
Third parties can potentially make (possibly illegal) modifications like disabling serial key checks ^[24]	Yes	Yes
Software is always modifiable	No ^[25]	Yes
Third parties can use static code analysis tools	No	Yes
Third parties can judge source code quality	No	Yes
Third parties can find logic bugs in the source code	No	Yes
Third parties can find logic bugs in the disassembly	Yes	Yes
Benefits from population-scale scrutiny	No	Yes
Third parties can benefit from debug symbols during analysis	Depends ^[26]	Yes
Display source code intermixed with disassembly	No	Yes ^[27]
Effort to audit subsequent releases	Almost same ^[28]	Usually lower ^[29]
Forum discussion: Finding Backdoors in Freedom Software vs Non-Freedom Software

Spotting backdoors is already very difficult in Freedom Software where the full source code is available to the general public. Spotting backdoors in non-freedom software composed of obfuscated binaries is exponentially more difficult. ^[30] ^[31] ^[32] ^[33] ^[34] ^[35] ^[36] ^[37]
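As a small illustration of why available source code helps, the following Python sketch uses the standard library `ast` module to flag comparisons against string literals, which is how a hardcoded username or password (a "direct" backdoor) typically appears in source. The scanner function, the sample code and the credentials in it are invented for this example. Real static analysis tools are far more thorough.

```python
import ast

def find_string_comparisons(source):
    """Return (line number, literal) for every comparison in `source`
    that involves a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            for operand in [node.left, *node.comparators]:
                if isinstance(operand, ast.Constant) and isinstance(operand.value, str):
                    findings.append((node.lineno, operand.value))
    return findings

# Hypothetical code under review, containing a hardcoded credential check.
SAMPLE = '''
def check_login(user, password, stored_hash, hash_fn):
    if user == "maintenance" and password == "letmein":
        return True
    return hash_fn(password) == stored_hash
'''

if __name__ == "__main__":
    for line, literal in find_string_comparisons(SAMPLE):
        print(f"line {line}: comparison against literal {literal!r}")
```

A check like this takes seconds on source code. Finding the same hardcoded comparison in a stripped, possibly obfuscated binary requires disassembly skills and far more time.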

To further improve the situation in the future, the Freedom Software community is working on the Reproducible Builds project. Quote:

Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code.

Whilst anyone may inspect the source code of free and open source software for malicious flaws, most software is distributed pre-compiled with no method to confirm whether they correspond.
This incentivises attacks on developers who release software, not only via traditional exploitation, but also in the forms of political influence, blackmail or even threats of violence.
This is particularly a concern for developers collaborating on privacy or security software: attacking these typically result in compromising particularly politically-sensitive targets such as dissidents, journalists and whistleblowers, as well as anyone wishing to communicate securely under a repressive regime.
Whilst individual developers are a natural target, it additionally encourages attacks on build infrastructure as an successful attack would provide access to a large number of downstream computer systems. By modifying the generated binaries here instead of modifying the upstream source code, illicit changes are essentially invisible to its original authors and users alike.
The motivation behind the Reproducible Builds project is therefore to allow verification that no vulnerabilities or backdoors have been introduced during this compilation process. By promising identical results are always generated from a given source, this allows multiple third parties to come to a consensus on a “correct” result, highlighting any deviations as suspect and worthy of scrutiny.
This ability to notice if a developer has been compromised then deters such threats or attacks occurring in the first place as any compromise would be quickly detected. This offers comfort to front-liners that they not only can be threatened, but they would not be coerced into exploiting or exposing their colleagues or end-users.

Several free software projects already provide reproducible builds, or will soon.
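The "consensus on a correct result" described in the quote can be sketched in a few lines of Python. Several parties build the same source independently and publish the SHA-256 digest of their binary; any builder whose digest differs from the majority is flagged. The builder names and artifact contents below are placeholders, and the majority rule is an assumption made for this illustration rather than the project's actual tooling.

```python
import hashlib
from collections import Counter

def sha256_hex(artifact):
    """Hex SHA-256 digest of a build artifact given as bytes."""
    return hashlib.sha256(artifact).hexdigest()

def compare_builds(builds):
    """`builds` maps a builder name to the bytes of its binary.
    Returns (consensus digest, sorted list of builders that deviate)."""
    if not builds:
        raise ValueError("no builds to compare")
    digests = {name: sha256_hex(data) for name, data in builds.items()}
    # The most common digest is treated as the consensus (an assumption).
    consensus, _count = Counter(digests.values()).most_common(1)[0]
    deviating = sorted(name for name, digest in digests.items() if digest != consensus)
    return consensus, deviating

if __name__ == "__main__":
    builds = {
        "alice": b"binary-v1",
        "bob": b"binary-v1",
        "vendor": b"binary-v1-with-backdoor",  # placeholder for a tampered build
    }
    consensus, deviating = compare_builds(builds)
    print("consensus:", consensus)
    print("needs scrutiny:", deviating)
```

This only works if the build is reproducible. Without bit-for-bit identical output, every honest builder would produce a different digest and the comparison would say nothing.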

Footnotes

↑ ^1.0 ^1.1 ^1.2 ^1.3 https://raw.githubusercontent.com/QiushiWu/qiushiwu.github.io/main/papers/OpenSourceInsecurity.pdf
↑ Some use binary obfuscators.
↑ https://resources.infosecinstitute.com/topic/anti-disassembly-anti-debugging-and-anti-vm/
↑ Some use obfuscation.
↑ An Open Source application binary could be obfuscated in theory. However, depending on the application and the context -- like not being an Open Source obfuscator -- that would be highly suspicious. An Open Source application using obfuscators would probably be criticized in public, get scrutinized, and lose user trust.
↑
This is because non-freedom software is usually only available as a pre-compiled binary, possibly obfuscated with an anti-decompiler:
- Auditors can only look at the disassembly and cannot compare a pre-compiled version from the software vendor with a self-compiled version from source code.
- There is no source code that is well-written, well-commented, and easily readable by design.
↑ Since there is no source code, one cannot self-build one's own binary.
↑
- small: for non-reproducible builds (or reproducible builds with bugs)
- none: for reproducible builds
↑ Decompilation is often expressly forbidden by license agreements of proprietary software.
↑ Skype used the DMCA (Digital Millennium Copyright Act) to shut down reverse engineering of Skype
↑ Decompilation is always legal and permitted in the license agreements of Freedom Software.
↑ ^12.0 ^12.1 This is very difficult because most outgoing connections are encrypted by default. At some point the content must be available to the computer in an unencrypted (plain text) format, but accessing that is not trivial. When running a suspected malicious application, local traffic analyzers like Wireshark cannot be trusted. The reason is that the malicious application might have compromised the host operating system and be hiding that information from the traffic analyzer or through a backdoor. One possible option might be running the application inside a virtual machine, but many malicious applications actively attempt to detect this configuration. If a virtual machine is identified, they refrain from malicious activities to avoid being detected. Ultimately this might be possible, but it is still very difficult.
↑ It is necessary to decompile the binary and read "gibberish", or try to catch malicious traffic originating from the software under review. As an example, consider how few people would have decompiled Microsoft Office and kept doing that for every upgrade.
↑
It is possible to:
1. Audit the source code and confirm it is free of backdoors.
2. Compare the precompiled binary with a self-built binary and audit the difference. Ideally, and in future, there will be no difference (thanks to the Reproducible Builds project) or only a small difference (due to non-determinism introduced during compilation, such as timestamps).
↑ An example of a "direct" backdoor is a hardcoded username and password or login key only known by the software vendor. In this circumstance there is no plausible deniability for the software vendor.
↑ List of “direct” backdoors in Wikipedia.
↑
One interesting “direct” backdoor was this bitcoin copay wallet backdoor:
- If more than 100 BTC, steal it. Otherwise, don’t bother.
- https://www.synopsys.com/blogs/software-security/malicious-dependency-supply-chain/
- https://github.com/dominictarr/event-stream/issues/116
- https://github.com/dominictarr/event-stream/issues/116#issuecomment-441759047
↑ Requires strong disassembly auditing skills.
↑ If for example hardcoded login credentials were in the published source code, that would be easy to spot. If the published source code is different from the actual source code used by the developer to compile the binary, that difference would stand out when comparing pre-compiled binaries from the software vendor with self-compiled binaries by an auditor.
↑ A "bugdoor" is a vulnerability that can be abused to gain unauthorized access. It also provides plausible deniability for the software vendor. See also: Obfuscated C Code Contest.
↑ Such issues are hard to spot in the source code, but even harder to spot in the disassembly.
↑ This is forbidden in the license agreement. Due to lack of source code, no serious development is possible.
↑ Since source code is already available under a license that permits software forks and redistribution.
↑ This entry is to differentiate from the concept immediately above. Pre-compiled proprietary software is often modified by third parties for the purposes of privacy, game modifications, and exploitation.
↑ For example, Intel ME cannot yet be disabled in Intel CPUs. At the time of writing, a Freedom Software re-implementation of Intel microcode is unavailable.
↑ Some may publish debug symbols.
↑
- objdump with parameter -S / --source
- How does objdump manage to display source code with the -S option?
↑ It is possible to review the disassembly, but that effort is duplicated for subsequent releases. The disassembly is not optimized to change as little as possible or to be easily understood by humans. If the compiled version added new optimizations or compilation flags changed, that creates a much bigger diff of the disassembly.
↑ After the initial audit of a source-available binary, it is possible to follow changes in the source code. To audit any newer releases, an auditor can compare the source code of the initially audited version with the new version. Unless there was a huge code refactoring or complete rewrite, the audit effort for subsequent versions is lower.
↑ The consensus is that the low-level assembler programming language is more difficult than programming languages at a higher level of abstraction. Example web search terms: assembler easy, assembler easier, assembler difficult.
↑
Source code written in higher level abstraction programming languages such as C and C++ is compiled to object code using a compiler. See this article for an introduction and this image. Source code written in the lower level abstraction programming language assembler is converted to object code using an assembler. See the same article above and this image. Reverse engineering is very difficult for a reasonably complex program that is written in C or C++ where the source code is unavailable; that can be deduced from the high price for it. It is possible to decompile (meaning re-convert) the object code back to C with a decompiler like Boomerang. To put a price tag on it, consider this quote -- Boomerang: Help! I've lost my source code:
How much will it cost? You should expect to pay a significant amount of money for source recovery. The process is a long and intensive one. Depending on individual circumstances, the quality, quantity and size of artifacts, you can expect to pay upwards of US$15,000 per man-month.
- convert executable back to C source code
↑
The following resources try to solve the question of how to disassemble a binary (byte code) into assembly source code and re-assemble (convert) to binary.
1. Take a hello world assembler source code.
2. Assemble.
nasm -felf64 hello.asm
3. Link.
ld hello.o -o hello
4. objdump (optional).
objdump -d hello
5. Exercise for the reader: disassemble hello and re-assemble.
↑
The GNU Hello program source file hello.c at the time of writing contains 170 lines. The output of objdump -d /usr/bin/hello on Debian buster has 2757 lines.
Install hello.
1. Update the package lists.
sudo apt update
2. Upgrade the system.
sudo apt full-upgrade
3. Install the hello package.
Using apt command line parameter --no-install-recommends is in most cases optional.
sudo apt install --no-install-recommends hello
4. Done.
The procedure of installing hello is complete.
Count the lines of the disassembly:
```
objdump -d /usr/bin/hello | wc -l
```
```
2757
```

↑ For example, consider how difficult it was to reverse engineer Skype: Skype Reverse Engineering : The (long) journey ;)..

↑

Consider all the Debian package maintainer scripts. Clearly these are easier to review as is, since most of them are written in sh or bash. Review would be difficult if these were converted to a program written in C, and were closed source and precompiled.
Similarly, it is far preferable for OnionShare to stay Open Source and written in Python, rather than the project being turned into a precompiled binary.

↑ Salary comparison ($K):