<![CDATA[Hacker News - Small Sites - Score >= 2]]> https://news.ycombinator.com RSS for Node Thu, 16 Jan 2025 16:23:55 GMT Thu, 16 Jan 2025 16:23:55 GMT 240 <![CDATA[I Make Smart Devices Dumber: A Privacy Advocate's Reflection]]> thread link) | @smnrg
January 16, 2025 | https://simone.org/dumber/ | archive.org

I flipped the vacuum robot upside down on my desk, wheels in the air. A thousand-dollar marvel of modern, rebranded AliExpress convenience. Ready to map my home, learn my patterns, and send data across continents. Behind every promise of convenience lie hidden costs we're only beginning to understand.

My screwdriver hovered over its seams: these robots are not just about cleaning floors anymore, but about drawing a line in the digital sand.

In the rush to embrace smart devices, you accept a devil's bargain: convenience for surveillance, efficiency for privacy. Homes become frontiers in the attention wars—each gadget a potential Trojan horse of data collection, promising easier lives.

The DIY movement has evolved far beyond fixing broken toasters. Some, like the devs of Valetudo, are digital rights advocates armed with soldering irons and software. Their work challenges the invisible monopolies that shape our relationship with technology. They're not just fixing devices: they are liberating them from adware and behavioral data harvesting. Each freed device marks a small victory in a conflict most people don't even realize they're in.


]]>
https://simone.org/dumber/ hacker-news-small-sites-42727200 Thu, 16 Jan 2025 16:11:04 GMT
<![CDATA[Kreya 1.16 with WebSocket support released – Postman Alternative]]> thread link) | @ni507
January 16, 2025 | https://kreya.app/blog/kreya-1.16-whats-new/ | archive.org

Kreya 1.16 comes with support for WebSocket calls and an updated look and feel for the project settings. Faker has been added to the scripting API, cookie management is now available, and many more features have been implemented.

WebSocket support

Create a new operation in the operations list, select the type WebSocket and start using WebSockets in Kreya!

An animation showcasing creating a WebSocket operation and sending it.

You can find all the details in our documentation.

It is now possible to define a custom header for the auth or send it as a query parameter. This can be configured for each auth configuration under advanced options.

An animation showcasing sending auth as a custom header.

Added faker to scripting API Pro / Enterprise

The Bogus faker is now directly accessible in the scripting tab with kreya.faker. No additional imports are required.

An animation showcasing using faker in the scripting tab.

Updated look and feel of Kreya

Kreya's UI has been reworked. Especially the project settings have been given a fresh new look. They now open as a tab, allowing the user to quickly switch between an operation and the project settings. To open the project settings, go to the application menu and click Project > Environments or Authentications or ....

An animation showcasing opening project settings.

Tip: For even faster access, the project settings can be opened using keyboard shortcuts (e.g. Ctrl++E for environments on Windows or ++E on MacOS).

We've also streamlined the workflow for creating operations, so you don't have to select the type every time you create an operation, just choose the right type of operation from the menu at the start.

An animation showcasing creating operations.

Collection state filter Pro / Enterprise

In the header of a collection it is now possible to filter the operations by their states.

An animation showcasing filtering states of operations in a collection.

File exclusion list for gRPC proto file importers

In gRPC proto file importers, files can be excluded from import. This is particularly useful if you're importing entire folders of protos and don't want to import certain subfolders.

An animation showcasing excluding files in a gRPC proto file importer.

To add custom headers to the gRPC server reflection and REST OpenAPI URL importers, open the advanced options in these importers.

An animation showcasing adding custom headers in an importer.

Cookies can now be managed in Kreya. A response tab is visible when a cookie is set and all cookies can be managed in a separate tab under Project > Cookies. It is important to note that all cookies are managed separately per environment.

An animation showcasing managing cookies.

AWS Signature v4 authentication

An additional auth type has been added. An AWS Signature v4 auth type can now be created in the auth settings.

An animation showcasing creating an AWS authentication.

Bug fixes

Many bugs have been fixed. More details can be found on our release notes page.

If you find a bug, please do not hesitate to report it. You can contact us at [email protected] for any further information or feedback.

Have a nice day! 👋

]]>
https://kreya.app/blog/kreya-1.16-whats-new/ hacker-news-small-sites-42726670 Thu, 16 Jan 2025 15:36:13 GMT
<![CDATA[Test-Driven Development with an LLM for Fun and Profit]]> thread link) | @crazylogger
January 16, 2025 | https://blog.yfzhou.fyi/posts/tdd-llm/ | archive.org

Welcome to the very first post in a new blog! Here I will discuss software development, SRE work, and other fun stuff. Sometimes an idea is just too good to pass up. I hope this blog will motivate me to turn sparks and little pieces into general knowledge by writing the words down.

The other day I was discussing Tabby with a coworker. We talked about whether we should consider AI-autocompleted code harmful and ditch everyone’s newfound habit due to LLMs’ inherent unreliability and their tendency toward spaghetti code, throwing traditional software engineering principles like DRY out the window. I disagreed: what if we could have a framework that integrates AI development tooling while also making everything better and more reliable instead? This instantly reminded me of Test-Driven Development, or TDD, which I think is great when combined with the use of a Large Language Model.

TDD in essence is to write comprehensive unit tests before you work on the main program. Since you wrote so many test cases that they essentially become the full specification, having the tests pass at the end “proves” your program’s correctness. Despite its promise, a lot of people find it a ridiculously clumsy process, even a major drag on productivity. Days could pass when nothing useful gets done. LLMs, however, have fundamentally changed the economics of doing TDD.

How I normally code with an LLM

I have used tools like GitHub Copilot heavily ever since they came along. They are good at finding repetitive patterns and helping us fill in the next few lines, but usually struggle to look at a clear specification, think deeply about it, and produce a whole working module against said specification. Sometimes a problem is so easy I am positive Copilot must be capable of it, but it stops short of giving me the full solution, opting to generate a single line of code instead.

To achieve what I want, LLMs need to be given well-formed requests with necessary, but not overwhelming, context. Stuff every tool and library at our disposal inside a model’s context, and it gets distracted and easily carried away from the specific problem at hand.

In the end, I find myself rephrasing and renaming project-specific things in more general terms, introducing additional pieces of project-specific context into the conversation only when I realize solving the task requires them after all.

Another observation I had working with LLMs is that they are great debuggers. I can often paste raw error outputs to an LLM, and more often than not, they succeed at guessing the cause.

At some point, I realized that the bulk of friction coding with an LLM came from the back-and-forth copy-pasting juggle between my IDE, shell terminal, and the chat interface.

Can I automate it?

So I wrote a little event loop to automate this process.

In the first prompt we give to the LLM, we type in a specification of the function we want to implement and a function signature for extra stability. The LLM is expected to generate a good unit test followed by an implementation.

Let’s give it a non-trivial, real-world problem to be solved:

% go run main.go \
--spec 'develop a function to take in a large text, recognize and parse any and all ipv4 and ipv6 addresses and CIDRs contained within it (these may be surrounded by random words or symbols like commas), then return them as a list' \
--sig 'func ParseCidrs(input string) ([]*net.IPNet, error)'

The model will happily give us a first draft, which we immediately parse and load into a “sandbox” subdirectory for automatic verification:

% tree ./sandbox
./sandbox
├── go.mod
├── go.sum
├── main.go
└── main_test.go

go mod tidy && gofmt -w . && goimports -w . is used to fix minor syntax issues, then go test . -v is run.

If things fail (totally expected at this point), we make use of the second (iteration) prompt. Now that we are working on existing code, this prompt includes the code we just ran and, crucially, the actual command line output from running the test, which would be either a compiler error or information about some failed tests. The model is expected to think about what happened and iterate by producing a revised test + implementation in a loop until all tests are cleared.

The idea is that in most cases, sending the full debug session is not a very frugal use of the model’s context. A reasonably intelligent model can think about what went wrong and make incremental changes. We also get the benefit of way lower API bills by keeping the context length more or less constant, regardless of how many iterations we end up doing.
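To make the loop concrete, here is a minimal sketch of what such an event loop could look like. The helpers callLLM, parseCodeBlocks, and writeSandbox are hypothetical stand-ins, not the actual tool's API; only the shell commands mirror the ones described above.

package main

import (
    "fmt"
    "os/exec"
)

// Hypothetical helpers: callLLM sends a prompt and returns the model's reply,
// parseCodeBlocks extracts main.go/main_test.go from that reply, and
// writeSandbox writes them into the sandbox directory.
func callLLM(prompt string) string                   { return "" }
func parseCodeBlocks(reply string) map[string]string { return nil }
func writeSandbox(files map[string]string) error     { return nil }

// runTests tidies and formats the sandbox, runs the tests, and returns the raw output.
func runTests(dir string) (string, error) {
    for _, args := range [][]string{
        {"go", "mod", "tidy"},
        {"gofmt", "-w", "."},
        {"goimports", "-w", "."},
    } {
        cmd := exec.Command(args[0], args[1:]...)
        cmd.Dir = dir
        _ = cmd.Run()
    }
    cmd := exec.Command("go", "test", ".", "-v")
    cmd.Dir = dir
    out, err := cmd.CombinedOutput()
    return string(out), err
}

func main() {
    spec := "parse all IPv4/IPv6 addresses and CIDRs from a large text"
    sig := "func ParseCidrs(input string) ([]*net.IPNet, error)"
    reply := callLLM("Write a unit test and implementation for: " + spec + "\n" + sig)
    for attempt := 0; attempt < 10; attempt++ {
        _ = writeSandbox(parseCodeBlocks(reply))
        out, err := runTests("./sandbox")
        if err == nil {
            fmt.Println("all tests passed")
            return
        }
        // Iteration prompt: only the current code plus the raw compiler/test
        // output, keeping the context length roughly constant per round.
        reply = callLLM("Revise the test and implementation.\n" + reply + "\n" + out)
    }
}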

Sometimes the model does get stuck. In our CIDR example, claude-3.5-sonnet came very close to all-clear on its second attempt, but proceeded to fail the same test case twice in a row. This is when I come in to look at the code, realize the regexp is not matching the final ‘1’ in “2001:db8::1”, and then provide that as an explicit hint to the model (screenshot: test fails).

Claude makes progress and clears the tests with our help (screenshot: test passed).

Now we must contend with the “who guards the guards” problem. Because LLMs are unreliable agents, it might so happen that Claude just scammed us by spitting out useless (or otherwise low-effort) test cases. Anyway, it is a good practice that whoever implements the code shouldn’t write their own tests, because the same blindspots in design will crop up in tests. In our case, LLM generated both. So it’s time to introduce some human input in the form of additional test cases, which is made extra convenient since the model already provided the overall structure of our test. If those cases pass, we can be reasonably confident in integrating this function into our codebase. Theoretically, you might even invoke a third prompt to do some AI-powered mutation testing by asking for a subtle, but critical, change in the implementation that is supposed to break our tests, then find out if it did!

LLM-powered development and cognitive load

So this method appears to reliably tackle leetcode-style questions, but would it work in a practical codebase with an actual dependency graph? I believe it can be made to work with some engineering effort, and it’s great news for the codebase’s long-term maintainability if you do. For best results, our project structure needs to be set up with LLM workflows in mind. Specifically, we should carefully manage and keep the cognitive load required to understand and contribute code to a project at a minimum.

Every package, or directory, should consist of several independently testable subsets of code, wherein each subset contains basically 3 files: shared.go for the package’s shared typedefs and globals, whereas x.go and x_test.go focus on a specific aspect of our functional logic, ideally just one public function per file. Sometimes we also have main_test.go supplying a TestMain function for setting up test environments (e.g. testcontainers).
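As a hypothetical illustration (file names invented, not from the original project), a package following this convention might look like:

% tree ./internal/report
./internal/report
├── shared.go        # shared typedefs and globals for the package
├── render.go        # one public function: Render
├── render_test.go   # unit tests for Render
├── export.go        # one public function: Export
├── export_test.go   # unit tests for Export
└── main_test.go     # TestMain setting up the test environment (e.g. testcontainers)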

Recently I started a brand new software project at work, so I had the opportunity to design the code organization starting from a blank slate. I’m currently exploring an extension to the AI TDD workflow for larger projects. The whole project code would need to be copied to the sandbox for execution, but sending the entire codebase to an LLM is impractical and distracts the model’s focus. Instead, we designate a specific package (subdirectory) that the LLM will work on at any given time, include gomarkdoc-generated documentation for each dependency package we think is helpful, and finally include same-package example code (perhaps a tightly-coupled entity’s finished implementation). The model will produce a function and a test, just like before, but this time we write the files in our intended subdirectory deep inside the sandbox to run the unit test.

With this pattern, not only do we limit the chance of introducing problematic code into production with test coverage by default, but we also encourage aggressive de-coupling and a unit test-first project structure. Because adding additional context to a model incurs some inference (and human-psychological) cost, we are constantly nudged into maintaining our code as nice little chunks, each consuming only a small cognitive load to fully understand. Hopefully, this approach lets us end up with “deep” modules with rich functionalities but minimal surface area.

In closing, you should remember AI’s Bitter Lesson. There is a non-zero chance that we wake up tomorrow to a major shift in AI architecture, eliminating the LLM limitations we talked about and rendering our efforts meaningless. So perhaps don’t start refactoring your 100k-line-of-code projects right away based on this advice!

]]>
https://blog.yfzhou.fyi/posts/tdd-llm/ hacker-news-small-sites-42726584 Thu, 16 Jan 2025 15:30:19 GMT
<![CDATA[Astronomers discover neutron star with an incredibly slow six-hour spin]]> thread link) | @Brajeshwar
January 16, 2025 | https://www.abc.net.au/news/science/2025-01-16/neutron-star-radio-transient-6-hours/104799106 | archive.org

In our galaxy, about 13,000 light-years away, a dead star called ASKAP J1839-075 is breaking all the rules … extremely slowly.

Dead stars — or neutron stars — normally spin at breakneck speeds, but a team of astronomers has clocked the new-found star taking a leisurely six-and-a-half hours to undertake just one spin, which is thousands of times slower than expected.

"This could really change how we think about neutron star evolution," Yu Wing Joshua Lee, an astronomer from the University of Sydney and the first author on the new study, said. 

The discovery has been published in Nature Astronomy.

The researchers believe ASKAP J1839-075 is a "pulsar", a high-energy neutron star that releases short bursts of radio waves.

A black ball spinning on its axis, with light shooting out of both poles.

Dense neutron stars normally spin extremely fast in space. (Supplied: NASA/Goddard Space Flight Center/Conceptual Image Lab)

But conventional wisdom is that when pulsars slow down they stop emitting radio waves, meaning ASKAP J1839-075 should be invisible to radio telescopes. 

So what's going on?

What is a neutron star?

Neutron stars are one of the most extreme objects in the Universe.

These small, dense stars are created when the core of a supermassive star collapses and triggers a fiery explosion known as a supernova.

When the star collapses it may go from a million-kilometre radius to just 10 kilometres.

This extreme crumpling increases the star's rotational speed, like a figure skater spinning faster when their arms move close to their body. 

Spinning extremely fast is therefore part and parcel of being a neutron star. A full spin usually takes these collapsed stars just milliseconds or seconds.

If our Sun was to go through the process of becoming a neutron star, its current 27-day rotation could become 1,000 rotations a second. 

A gif of a purple star flashing in the night sky.

As pulsars rotate, we see flashes of radio waves from Earth, similar to a lighthouse.  (Supplied: NASA/Goddard Space Flight Center)

Using radio telescopes, astronomers can "see" pulses of radio waves from Earth as the neutron star spins, with the movement regularly described as like a "cosmic lighthouse". 

Later in the collapsed star's life, it was thought that when it lost energy and began to slow down, the bursts of radio waves scientists detected from Earth would also stop.

"Once they pass the [speed] threshold, we thought they'd be silenced forever," Mr Lee said. 

But in the past few years, astronomers discovered pulsars that seemed to contradict that hypothesis.

"This makes us rethink our previous theories on how these sources form." 

What rule-breakers have researchers found?

Early in 2022 researchers found pulsars that rotated on minutes-long time scales rather than seconds, and by June last year, researchers had discovered an object which took almost an hour between pulses. These unusually slow objects were called "long-period radio transients".

But ASKAP J1839-075's leisurely 6.45-hour spin was unheard of.

"The previous record was 54 minutes, so it was a huge jump," Mr Lee said.

"The team was really surprised."

A young man looking at the camera

Yu Wing Joshua Lee was the first author of the new paper.  (Supplied: University of Sydney)

According to Gemma Anderson, an astronomer at Curtin University who was not involved in this paper but who was part of the team who found the first long-period radio transient, 6.45 hours stretches "our understanding of physics".

"A normal pulsar couldn't spin this slowly and produce radio light," she said.

"Some kind of extreme particle acceleration … is occurring that is causing it to be so radio bright on these long time scales."

A lucky find

Mr Lee was searching for "peculiar radio transients" by trawling through archival data from a sky survey taken by CSIRO's ASKAP radio telescope in outback Western Australia.

With little prior knowledge of where these transients pop up, the strategy is to pick a random point in the sky to see if anything interesting shows up, Mr Lee said.

Within the archival data, the team discovered a pulsar-like blip from early January 2024. The signal was already starting to fade when the survey began, so the team could only study the second half.

"If the observation was scheduled 15 minutes later, then we would have completely missed it," Mr Lee said. 

"It is quite lucky that we discovered it."

It took 14 more observations to uncover its repeating pulses, and understand more about what sort of object it could be. 

All of the long-period radio transients discovered so far have involved Australian teams and, according to Dr Anderson, Australia is particularly well placed to find them because of our current generation of radio telescopes.

"[The Murchison Widefield Array and the ASKAP telescope] are the two discovery machines for these types of objects," she said.

The SKA-Low telescope, which is aiming to be fully operational by the end of this decade, will be even more powerful.

So what's the explanation?

While finding these rule-breakers has only taken a few years, understanding what could be causing these mysteriously slow pulses is proving more challenging.

Previous papers have suggested that some other type of star like white dwarfs (created when stars less massive than our Sun run out of fuel and collapse) or special pulsars called magnetars could be behind the slow pulses.

However, the long-period radio transients found so far all emit radiation a little differently.

"They all have different properties," Mr Lee said.

"We don't know whether they belong to the same family, or the same type of object with different mechanisms."

Large telescopes in the desert

ASKAP is the radio telescope used to discover the neutron star. (ABC News: Tom Hartley)

Dr Anderson noted that there may be two distinct classes of object, one group caused by white dwarfs, and one caused by magnetars.

In the case of ASKAP J1839-075, the evidence suggests that it's unlikely to be a white dwarf.

"This [research] nicely explains the different possible scenarios, but finds [in this case the] isolated neutron star or magnetar scenario is the most likely," Dr Anderson said. 

The telltale signs were in ASKAP J1839-075's distinct radio emissions, as well as a lack of a star visible in optical telescopes, which would normally be seen if the star was a white dwarf.

Even if the star is a magnetar, its slow spin is still almost unheard of, as most magnetars rotate once every two to 10 seconds, and more research will need to be done to understand how they work.

This is unlikely to be the last long-period radio transient scientists find, according to Dr Anderson, although the brightest and most obvious ones have probably already been found.

With the easiest finds out of the way, Dr Anderson suggests that researchers may turn to understanding more about how these rule-breaking stars could have formed.

"Perhaps this is opening an even larger discovery space where there are lots of objects producing these [transients]," she said.

"It's just we had never looked at the galaxy in this way with our radio telescopes before."

]]>
https://www.abc.net.au/news/science/2025-01-16/neutron-star-radio-transient-6-hours/104799106 hacker-news-small-sites-42726535 Thu, 16 Jan 2025 15:27:05 GMT
<![CDATA[Streamline Git with GPT]]> thread link) | @thunderbong
January 16, 2025 | https://itsgg.com/2025/01/16/streamline-git-with-gpt.html | archive.org

Git workflows can be complex and time-consuming, especially when it comes to reviewing changes and writing meaningful commit messages. This guide shows how to leverage Shell-GPT to automate and enhance your Git experience.

Prerequisites

Before getting started, make sure you have:

  • Shell-GPT installed (pip install shell-gpt)
  • Git configured on your system
  • Basic familiarity with Git commands

Auto-Review Git Changes

Before committing your code, it’s crucial to review the changes. Shell-GPT can help by generating detailed reviews of your staged changes, helping catch potential issues early.

Setup

Run the following command to set up an alias for automated reviews:

git config --global alias.aireview '!f() { git diff --staged | sgpt "Generate a detailed code review"; }; f'

Usage

First, stage your changes with git add, then run:

git aireview

Example Output

Here's a detailed review of your changes:

1. Feature Addition:
   - Added new user authentication middleware
   - Implemented JWT token validation
   - Added error handling for invalid tokens

2. Code Quality:
   - All new functions are properly documented
   - Error messages are descriptive and actionable
   - Consistent error handling pattern maintained

3. Potential Issues:
   - Consider adding rate limiting to the auth endpoints
   - Token expiration time might need configuration option

4. Recommendations:
   - Add unit tests for the new middleware
   - Document the JWT secret configuration in README

Auto-Generate Git Commit Messages

Writing clear and concise commit messages is an art. Let Shell-GPT help you generate commit messages that follow best practices and maintain a clean Git history.

Setup

Run this command to integrate AI into your commit workflow:

git config --global alias.aicommit '!f() { git add -A && git diff --staged | sgpt "Create a concise commit message with: summary (50 chars) + optional bullet points for details. Do not add any heading." | git commit -F -; }; f'

Usage

When you’re ready to commit your changes, simply run:

git aicommit

Example Output

feat: Add user authentication middleware

- Implement JWT token validation
- Add error handling for invalid tokens
- Create middleware configuration options

Best Practices

  1. Stage your changes deliberately - don’t just use git add -A for everything
  2. Review the AI-generated output before accepting it
  3. You can always modify the commit message or review before proceeding
  4. Use these tools as aids, not replacements for human judgment

Conclusion

These AI-powered Git aliases will help streamline your workflow while maintaining code quality and clear documentation of changes. Remember to review the AI suggestions and adjust them when needed to ensure they match your project’s specific requirements.

]]>
https://itsgg.com/2025/01/16/streamline-git-with-gpt.html hacker-news-small-sites-42725880 Thu, 16 Jan 2025 14:46:50 GMT
<![CDATA[Improve Rust Compile Time by 108X]]> thread link) | @nathanielsimard
January 16, 2025 | https://burn.dev/blog/improve-rust-compile-time-by-108x/ | archive.org

Disclaimer

Before you get too excited, the techniques used to reduce compilation time are not applicable to all Rust projects. However, I expect the learnings to be useful for any Rust developer who wants to improve their project's compilation time. Now that this is clarified, let's dive into the results.

Results

We started with a compilation time of 108 seconds for the matmul benchmarks, which was reduced to only 1 second after all the optimizations. The most effective optimization was the element-type generics swap, where we instantiated generic functions with predefined "faked" element types to reduce the amount of LLVM[1] code generated. The second optimization also had a major impact, further reducing the compilation time by nearly 3×. This was achieved by using our comptime system instead of associated const generics to represent the matmul instruction sizes. Finally, the last optimization—also the simplest—was to reduce the LLVM optimization level to zero, which is particularly useful for debug builds, such as tests.


Compilation times are measured using incremental compilation.

Motivation

First, let me explain the situation that led me to investigate our compile time. During the last iteration of CubeCL[2], we refactored our matrix multiplication GPU kernel to work with many different configurations and element types. CubeCL is a dialect that lets you program GPUs for high-performance computing applications directly in Rust. The project supports multiple compiler backends, namely WebGPU[3], CUDA[4], ROCm[5], and Vulkan[6] with more to come.

The refactoring of the matrix multiplication kernel was done to improve tensor cores utilization across many different precisions. In fact, each kernel instance works across 3 different element types: the global memory element type, the staging element type, and the accumulation element type. These are all different since we normally want to accumulate in a higher precision element type, as this is where numerical approximation is most sensitive. Also, the tensor cores don't work across all input precisions; if you have f32 inputs, you need to convert those to tf32 element types (staging) to call the tensor cores instructions. To add to the complexity, tensor cores instructions only work across fixed matrix shapes that also depend on the precisions. For instance, f16 staging matrices work across all shapes from (32, 8, 16), (16, 16, 16), (8, 32, 16). But tf32 only works on (16, 16, 8).

In our first refactoring, we represented the shape of the matrices supported by the instructions using const associated types, since this is the abstraction component that makes the most sense in this case. For the element types, we naturally used generic arguments for traits and functions - pretty much what any developer would do in this scenario. However, with all the possible combinations, we ended up with a compilation time of 1m48s using the cache.

Yes, you read that right: 1m48s just to rebuild the matmul benchmark if we change anything in the bench file.

For the purpose of this optimization, we only consider incremental compilation using cargo caching, since this is the most important one to speed up dev iteration time. Changing one configuration to test if an optimization worked took almost 2 minutes just to create the binary to execute a few matmuls.

Why is it that slow?

Well, we need to understand that the Rust compiler is actually very fast. The slow parts are the linking and LLVM. The best way to improve compilation time is to reduce the amount of LLVM IR generated.

In our specific case, each combination of the matmul would generate a whole new function - this is what zero-cost abstraction means. There is no dynamic dispatch; every type change duplicates the code to improve performance at the cost of a bigger binary. Before all of our optimizations, the binary generated was 29M, and after we reduced it to 2.5M - a huge difference.

To reduce the amount of code generated, we had to use different Rust techniques to make our abstractions for the matmul components. In our case, we don't need zero-cost abstractions, since the code written in Rust for the matmul components actually generates the code that is used to compile at runtime a kernel that will be executed on the GPU. Only the GPU code needs to be fast; the JIT Rust code takes almost no time during runtime. Zero-cost abstraction would actually be optimal in a setting where we would perform ahead-of-time compilation of kernels.

Ever wonder why LibTorch[7] or cuBLAS[8] have executables that are GIGABYTES in size? Well, it's because all kernels for all precisions with all edge cases must be compiled to speed up runtime execution. This is necessary in a compute-heavy workload like deep learning.

However, CubeCL is different - it performs JIT compilation, therefore we don't need to compile all possible variations ahead of time before creating the binary: we can use dynamic abstractions instead! This is one of the two optimizations that we made for the matmul components. Instead of relying on const associated types, we leveraged the comptime system to dynamically access the instruction sizes during the compilation of a kernel at runtime. This is actually the second optimization that we made, and it helped us go from 14s of compilation time to around 5s.
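As a deliberately simplified sketch of that swap (hypothetical names, not CubeCL's actual traits): in the first version the instruction shape lives in const associated items, so every shape produces its own monomorphized instances, while in the second the shape is an ordinary value resolved while the kernel is JIT-compiled, so one compiled instance covers all shapes.


  #![allow(dead_code)]

  // Before: shape as const associated items, so every (instruction, shape)
  // combination monomorphizes into its own copy of downstream code.
  trait InstructionConst {
      const M: usize;
      const N: usize;
      const K: usize;
  }

  // After: shape as a plain value fetched from the kernel config at JIT time,
  // so a single compiled instance handles every shape.
  #[derive(Clone, Copy)]
  struct TileShape {
      m: usize,
      n: usize,
      k: usize,
  }

  struct Config {
      tile: TileShape,
  }

  trait Instruction {
      fn shape(config: &Config) -> TileShape;
  }

  struct Mma;

  impl Instruction for Mma {
      fn shape(config: &Config) -> TileShape {
          config.tile
      }
  }

  fn main() {
      let config = Config { tile: TileShape { m: 16, n: 16, k: 16 } };
      let shape = Mma::shape(&config);
      println!("{}x{}x{}", shape.m, shape.n, shape.k);
  }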

However, the biggest optimization was quite hard to pull off and is linked to the generic element types passed to each function. We still wanted to use zero-cost abstraction in this case, since passing around an enum listing what element type operations are on would be terrible in terms of developer experience. However, the hint to improve our compilation time was that every function that will execute on the GPU is marked with the #[cube] attribute on top.

We want the code to look and feel like normal Rust, but the macro actually parses the Rust code written and generates another function, which we normally call the expand function, where the actual GPU IR is built for the function. That code is what actually runs at runtime, not the code that the user is writing. The element type generics are only used to convert the generic element type into the enum IR element type. In the expand functions, we also pass a context where all IR is tracked.

So the optimization was to pass a fake generic element type, called FloatExpand instead of the actual element type like f32. When compiling a kernel, we first register the real element type in the context, using the const generic item to differentiate multiple element types if a function has multiple generic element types. Since we always call the expand function with the exact same generic items for all element types, we only generate one instance of that function, and the element types are fetched at runtime using the context.

The most tedious part was actually implementing that optimization while trying not to break our components. The biggest problem caused by that optimization is that we can't support generic dependencies between traits over the element type in launch functions of CubeCL.


  #[cube(launch)]
  // Doesn't compile, since we want to call the function with FloatExpand, and the provided I.
  fn matmul<F: Float, I: Instruction<F>>() { 
      // ...
  } 
  

It makes sense though - we don't want to recompile all the instruction types for all different precisions. Since our optimization is activated at the boundaries of CPU code and GPU code, where cube functions are identified as launchable, we need the generic trait to not have a dependency on the element type. They are going to be switched by our macro. We use generic associated types instead of traits with generic element types.

This is known as the family pattern[9], where a trait is describing a family of types.


  pub trait InstructionFamily {
     type Instruction<F: Float>: Instruction<F>;
  }
  
  pub trait Instruction<F: Float> {
      fn execute(lhs: Matrix, rhs: Matrix, accumulator: Matrix);
  }
  

Using this pattern, we can inject the family type at the boundaries of CPU and GPU code and instantiate the inner instruction type with the expand element type.


  #[cube(launch)]
  fn matmul<F: Float, I: InstructionFamily>() {
     // ...
     I::Instruction::<F>::execute(lhs, rhs, accumulator);
     // ...
  }

  // Launch code generated by the macro #[cube(launch)].
  mod matmul {
    // More code is generated for other stuff ...

    // Expand function generated.
    pub fn expand<F: Float, I: InstructionFamily>(
        context: &mut CubeContext,
    ) {
      // The element type IR can be access like the following.
      let elem = F::as_elem(context);
      // ..
    } 
     
    // The following defines the kernel definition.
    impl<F: Float, I: InstructionFamily, __R: Runtime> Kernel for Matmul<F, I, __R> {
        fn define(&self) -> KernelDefinition {
            let mut builder = KernelBuilder::default();

            // Register the real element type.
            builder
                .context
                .register_elem::<FloatExpand<0u8>>(F::as_elem_native_unchecked());

            // Instantiate the expand function with the expand float element type.
            expand::<FloatExpand<0u8>, I>(&mut builder.context);
            builder.build(self.settings.clone())
        }

        fn id(&self) -> cubecl::KernelId {
            let cube_dim = self.settings.cube_dim.clone();
            // Still use the original element type for the JIT compilation cache.
            KernelId::new::<Self>().info((cube_dim,))
        }
    }
  }
  

Migrating most of the components to the new pattern, we reduced compile time from 1m48s to about 14s.

It was a lot of work, and I don't expect all projects to face cases like this, but it was worth it! Now waiting for about 5 seconds after trying something in the code to see if performance is improved doesn't break the flow, but almost 2 minutes did.

We essentially leveraged the fact that CubeCL is a JIT compiler and not an AOT compiler, which is very appropriate for throughput-focused high-performance applications.

Playing with LLVM optimization settings

Since our benchmarks are compiled with optimization level set to 3, we could still improve the compilation time further to about 1s by reducing the optimization level to zero. Another 5X speedup that we can have by simply adjusting the LLVM optimization level.


  [profile.dev]
  opt-level = 0

We decided not to keep that optimization in production, since we want the benchmarks to have the same LLVM optimization level as user applications. However, we activated it for testing, since we often rerun tests to ensure we don't break correctness when implementing or optimizing kernels.

Not a Rust Compiler Issue

All of our optimizations actually created tons of code - we used proc macros, associated type generics, const generics, and tons of complex features from the Rust type system.

The Rust compiler is actually very fast; the slow part is really the linking and optimizing of the LLVM IR. If there's one thing to take from this post, it's that you shouldn't worry about using complex features of Rust, but make sure you don't generate huge binaries. Reducing the binary size will improve compile time even if you use complex methods to do so! "Less code compiles faster" is not exactly right. "Less generated code compiles faster" is what we have to keep in mind!

]]>
https://burn.dev/blog/improve-rust-compile-time-by-108x/ hacker-news-small-sites-42725296 Thu, 16 Jan 2025 14:09:57 GMT
<![CDATA[Zig comptime: does anything come close?]]> thread link) | @bw86
January 16, 2025 | https://renato.athaydes.com/posts/comptime-programming | archive.org

Zig comptime: does anything come close?

A look at what other languages can do where Zig would just use comptime.

Written on Wed, 15 Jan 2025 19:19

Decoration image

Image by rawpixel.com on Freepik

Zig’s comptime feature has been causing a little bit of a fuss on programmer discussion forums lately. It’s a very interesting feature because it is relatively simple to grasp but gives Zig code some super-powers which in other languages tend to require a lot of complexity to achieve. That’s in line with Zig’s stated desire to remain a simple language.

The most common super-power (ok, maybe not a super-power, but a non-trivial feature) given as an example is generics, but there are many more.

In this post, I decided to have a look at that in more detail and compare Zig’s approach with what some other languages can offer. Zig uses comptime for things that look a little bit like macros from Lisp or Rust, templates from C++ and D, and code generation via annotation processors from Java and Kotlin! Are those alternatives anywhere near as powerful? Or easy to use?

I will investigate that by looking at some Zig examples from Loris Cro (Zig evangelist)’s What is Zig’s Comptime and Scott Redig’s Zig’s Comptime is Bonkers Good.

I only know relatively few languages, so apologies if I didn’t include your favourite one (do let me know if your language has even nicer solutions)… specifically, I don’t know much C++, so be warned that while I am aware C++ has constexpr (I guess that’s the closest thing to comptime), I won’t even try to compare that with comptime in this post!

The basics

The first thing one can imagine doing at comptime in any language is to compute the value of some constant.

I mean, it would kind of suck if this expression was actually computed at runtime:

// five days
int time_ms = 5 * 24 * 60 * 60 * 1000;

Most compilers (maybe even all?) will use constant folding and compile that down to this:

int time_ms = 432'000'000;

However, they generally refuse to do this if any function calls are involved. Why? 🤔

Because it may be too hard, I suppose! Evaluating even simple expressions as above requires at least an arithmetic interpreter, so almost all compilers must already have one hidden in them. Is executing at least some functions in their interpreters much harder to do?

I am not sure, but apparently the Zig authors (that would be Andrew Kelley, now backed by the Zig Software Foundation) think doing that is worth it!

fn multiply(a: i64, b: i64) i64 {
    return a * b;
}

pub fn main() void {
    const len = comptime multiply(4, 5);
    const my_static_array: [len]u8 = undefined;
    _ = my_static_array;
}

The multiply function is just a regular function that can obviously be called by other functions at runtime. But in this sample, it’s getting called with the comptime keyword in front of it:

const len = comptime multiply(4, 5);

That forces the compiler to actually call multiply at compile-time and replace the expression with the result. That means that, at runtime, the above line would be replaced with this:

const len = 20;

That’s why len can be used in the next line to create a static array (Zig doesn’t have dynamic arrays, so without comptime that wouldn’t work).

Of course, in this example you could just write that, but as you can call most other code, you might as well pre-compute things that would be hard to do at compile time otherwise.

The first thing that came to mind when I saw this example was Lisp’s read time evaluation!

(defun multiply (a b)
    (* a b))

(defun main ()
    (let ((len #.(multiply 4 5)))
        (format t "Lenght is ~s" len)))

The Lisp reader computes #.(multiply 4 5), so if you compile the function, it will just contain a 20 there at runtime.

It’s kind of important to understand that this is not the same as initializing constants at runtime, as this Java code would do:

class Hello {
    // Java evaluates this at runtime, when this Class is loaded
    static final int LEN = multiply(5, 4);
    
    static int multiply(int a, int b) {
        return a * b;
    }
}

We can imagine some expensive computation that might take time to perform at runtime, even if only once when the program starts up. If you do that at compile time, you save all that time forever, on every run of your program.

In D, we can also easily call “normal” functions at comptime (D calls that Compile Time Function Evaluation, or CTFE):

int multiply(int a, int b) => a * b;

void main()
{
    enum len = multiply(5, 4);
    ubyte[len] my_static_array;
}

D uses the enum keyword to declare comptime variables. But despite some name differences, you can see that this example is extremely similar to Zig’s.

Walter Bright, creator of D, has just written an article where he claims that “evaluating constant-expressions” is an obvious thing C should do! And in fact, the C compiler used by D to compile C code, importC does it!

Rust can also do it, but only a sub-set of the language is available as documented here, and only functions explicitly marked as const fn can be used (a big limitation in comparison with Zig and D):

const fn multiply(a: usize, b: usize) -> usize {
    a * b
}

fn main() {
    const len: usize = multiply(5, 4);
    let my_static_array: [u8; len];
}

Even in Zig and D, however, not everything can be done at comptime. For example, IO and networking seem to be out-of-limits, at least in the usual form.

Trying to read a file in Zig at comptime causes an error:

const std = @import("std");

fn go() []u8 {
    var buffer: [64]u8 = undefined;
    const cwd = std.fs.cwd();
    const handle = cwd.openFile("read-file-bytes.zig", .{ .mode = .read_only }) catch unreachable;
    defer handle.close();
    const len = handle.readAll(&buffer) catch unreachable;
    const str = buffer[0..len];
    const idx = std.mem.indexOf(u8, str, "\n") orelse len;
    return str[0..idx+1];
}

pub fn main() void {
    const file = comptime go();
    std.debug.print("{s}", .{file});
}

ERROR:

/usr/local/Cellar/zig/0.13.0/lib/zig/std/posix.zig:1751:30: error: comptime call of extern function
        const rc = openat_sym(dir_fd, file_path, flags, mode);
                   ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/zig/0.13.0/lib/zig/std/fs/Dir.zig:880:33: note: called from here
    const fd = try posix.openatZ(self.fd, sub_path, os_flags, 0);
                   ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/zig/0.13.0/lib/zig/std/fs/Dir.zig:827:26: note: called from here
    return self.openFileZ(&path_c, flags);
           ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
read-file-bytes.zig:6:32: note: called from here
    const handle = cwd.openFile("read-file-bytes.zig", .{ .mode = .read_only }) catch unreachable;
                   ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
read-file-bytes.zig:14:29: note: called from here
    const file = comptime go();
                          ~~^~

Remove the comptime keyword and the example above works.

In D, same thing:

import std;

void main()
{
    string firstLineOfFile()
    {
        auto file = File("parser.d");
        foreach (line; file.byLineCopy)
        {
            return line;
        }
        return "";
    }

    enum line = firstLineOfFile();

    // D's comptime version of printf
    pragma(msg, line);
}

ERROR:

Error: `fopen` cannot be interpreted at compile time, because it has no available source code
Error: `malloc` cannot be interpreted at compile time, because it has no available source code
main.d(15):        compile time context created here
main.d(17):        while evaluating `pragma(msg, line)`

Replace enum with auto and use writeln instead of pragma and the above also works.

Notice how both Zig and D appear to fail due to extern functions… Zig explicitly disallows syscalls. That makes sense, as allowing that could make compilation really non-deterministic and the result highly likely to diverge depending on the exact machine the code was compiled on.

Despite that, both languages provide ways to actually read files at comptime!

In Zig, you can use @embedFile:

const std = @import("std");

fn go() [] const u8 {
    const file = @embedFile("read-file-bytes.zig");
    const idx = std.mem.indexOf(u8, file, "\n") orelse file.len - 1;
    return file[0..idx+1];
}

pub fn main() void {
    const file = comptime go();
    std.debug.print("{s}", .{file});
}

D offers import expressions:

This requires using the compiler -J flag to point to a directory where to import files from.

import std;

void main()
{
    string firstLineOfFile()
    {
        enum file = import("main.d");
        return file.splitter('\n').front;
    }

    enum line = firstLineOfFile();
    
    // D's comptime version of printf
    pragma(msg, line);
}

Rust has include_bytes:

fn main() {
    let file = include_bytes!("main.rs");
    println!("{}", std::str::from_utf8(file).unwrap());
}

The above program embeds the main.rs file as a byte array and prints it at runtime.

However, a const fn cannot currently call include_bytes (there’s an issue to maybe make that possible). Hence, include_bytes is a bit more limited than Zig’s @embedFile and D’s import.

In Java, the closest you can get to doing this would be to include a resource in your jar and then load that at runtime:

try (var resource = MyClass.class.getResourceAsStream("/path/to/resource")) {
    // read the resource stream
}

Not quite the same, of course, but gets the job done. It probably does not even need to perform IO as it's going to just take the bytes from the already opened and likely memory-mapped jar/zip file where the class itself came from.

comptime blocks and arguments

The next example from Kristoff’s blog is very interesting. It implements the insensitive_eql function so that one of the strings is known at comptime and can be verified to be uppercase by the compiler (thanks to your own comptime code, that is):

// Compares two strings ignoring case (ascii strings only).
// Specialized version where `uppr` is comptime known and *uppercase*.
fn insensitive_eql(comptime uppr: []const u8, str: []const u8) bool {
    comptime {
        var i = 0;
        while (i < uppr.len) : (i += 1) {
            if (uppr[i] >= 'a' and uppr[i] <= 'z') {
                @compileError("`uppr` must be all uppercase");
            }
        }
    }
    var i = 0;
    while (i < uppr.len) : (i += 1) {
        const val = if (str[i] >= 'a' and str[i] <= 'z')
            str[i] - 32
        else
            str[i];
        if (val != uppr[i]) return false;
    }
    return true;
}

pub fn main() void {
    const x = insensitive_eql("Hello", "hElLo");
}

ERROR:

insensitive_eql.zig:8:17: error: `uppr` must be all uppercase
                @compileError("`uppr` must be all uppercase");
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Of course, to fix that you just need to replace the first argument to insensitive_eql with something all-caps:

pub fn main() void {
    const x = insensitive_eql("HELLO", "hElLo");
}

This is a nice optimization based on comptime argument values.

Let’s see if we can rewrite this example in D, given D seems to also support comptime pretty well:

The original example did not check the length of the strings, so it can obviously return incorrect results if the input is longer than the comptime value, or read past the input’s bounds if that’s shorter, but I kept the same behaviour in this translation because this is not trying to implement a production-grade algorithm anyway, and it’s fun to maybe poison the training data for the AIs scraping my site.

/// Compares two strings ignoring case (ascii strings only).
/// Specialized version where `uppr` is comptime known and *uppercase*.
bool insensitiveEqual(string uppr)(string str)
{
    import std.ascii : lowercase, toUpper;
    import std.algorithm.searching : canFind;

    static foreach (c; uppr)
    {
        static if (lowercase.canFind(c))
            static assert(0, "uppr must be all uppercase");
    }
    foreach (i, c; str)
    {
        if (c.toUpper != uppr[i])
            return false;
    }
    return true;
}

void main()
{
    const x = insensitiveEqual!("Hello")("hElLo");
}

ERROR:

insensitive_eql.d(11): Error: static assert:  "uppr must be all uppercase"
insensitive_eql.d(23):        instantiated from here: `insensitiveEqual!"Hello"`

While D does not have comptime blocks, it’s fairly clear when a block is executed at comptime due to the use of static to differentiate if, foreach and assert from their runtime counterparts. If you see static if you just know it’s comptime.

This example shows one of the main differences between Zig’s and D’s comptime facilities: while in Zig, whether a function argument must be known at comptime is determined by annotating the argument with comptime, in D there are two parameter lists: the first list consists of the comptime parameters, and the second one of the runtime parameters.

Hence, in Zig you cannot know by looking at a function call which arguments are comptime, you need to know the function signature or track where the variable came from. In D, you can, because the comptime parameters come on a separate list which must follow the ! symbol:

insensitiveEqual!("Hello")("hElLo");

The parentheses are optional when only one argument is used, so this is equivalent:

insensitiveEqual!"Hello"("hElLo");

The downside is that one cannot interleave runtime and comptime arguments, which may be a bit more natural in some cases.

Moving on to Rust now… Rust does have something that lets us say this value must be known for the life of the program: lifetime annotations. That may not be exactly the same as comptime, but sounds close! Let’s try using it.

fn insensitive_eql(uppr: &'static str, string: &str) -> bool {
    todo!()
}

const fn is_upper(string: &str) -> bool {
    string.bytes().all(|c| c.is_ascii_uppercase())
}

pub fn main() {
    let a = "Hello";
    assert!(is_upper(a));
    print!("result: {}", insensitive_eql(a, "hElLo"));
}

ERROR:

error[E0015]: cannot call non-const fn `<std::str::Bytes<'_> as Iterator>::all::<{closure@src/main.rs:6:24: 6:27}>` in constant functions
 --> src/main.rs:6:20
  |
6 |     string.bytes().all(|c| c.is_ascii_uppercase())
  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: calls in constant functions are limited to constant functions, tuple structs and tuple variants

Close, but no cigar! As we can only call other const fns from a const fn, and no one seems to declare all their functions that could be const fn actually const fn (sorry for so many const fns in one sentence), there seems to be very little we can do with const fns! We would probably need to implement our own is_upper const fn to get around that. But even if we did, what we really needed was for the assertion that the string was all-caps to run at compile-time, and as far as I can see that’s not possible with Rust.

Java doesn’t have const fn or anything similar in the first place (Java is extremely dynamic, you can load a jar at runtime, create a ClassLoader and start using code from the jar), so it’s hopeless to even try anything, I believe.

Lisp can do anything at read time (there’s nearly no difference between comptime, readtime and runtime as far as what code you can use). However, for the sake of brevity and to try to focus on statically-typed languages where there’s a clear compile-/run-time split, I’m afraid I will not be including Lisp anymore, sorry if you’re a fan!

Code elision

This example is also from Kristoff’s post, but adapted by me to compile with Zig 0.13.0 (Zig’s stdlib changes a lot between releases):

const builtin = @import("builtin");
const std = @import("std");
const fmt = std.fmt;
const io = std.io;

const Op = enum {
    Sum,
    Mul,
    Sub,
};

fn ask_user() !i64 {
    var buf: [10]u8 = undefined;
    std.debug.print("A number please: ", .{});
    const stdin = std.io.getStdIn().reader();
    const user_input = try stdin.readUntilDelimiter(&buf, '\n');
    return fmt.parseInt(i64, user_input, 10);
}

fn apply_ops(comptime operations: []const Op, num: i64) i64 {
    var acc: i64 = 0;
    inline for (operations) |op| {
        switch (op) {
            .Sum => acc +%= num,
            .Mul => acc *%= num,
            .Sub => acc -%= num,
        }
    }
    return acc;
}

pub fn main() !void {
    const user_num = try ask_user();
    const ops = [4]Op{ .Sum, .Mul, .Sub, .Sub };
    const x = apply_ops(ops[0..], user_num);
    std.debug.print("Result: {}\n", .{x});
}

It shows how inline for can be used to loop using comptime variables. The loop gets unrolled at comptime, naturally. Looking at the generated Assembly in Godbolt, one can clearly see it’s only doing the expected arithmetics.

As we’ve just learned in the previous section, D’s static for is a comptime loop, hence equivalent to Zig’s inline for.

Why does Zig call this inline for instead of comptime for? And why does D use enum to designate comptime variables? Language creators don't seem to be any better at naming things than the rest of us.

Here’s the same application written in D:

enum Op
{
    SUM,
    MUL,
    SUB,
}

long askUser()
{
    import std.stdio : write, readln;
    import std.conv : to;

    char[] buf;
    write("A number please: ");
    auto len = readln!char(buf);
    return buf[0 .. len - 1].to!long;
}

long applyOps(Op[] operations)(long num)
{
    long acc = 0;
    static foreach (op; operations)
    {
        final switch (op) with (Op)
        {
        case SUM:
            acc += num;
            break;
        case MUL:
            acc *= num;
            break;
        case SUB:
            acc -= num;
        }
    }
    return acc;
}

int main()
{
    import std.stdio : writeln;

    auto num = askUser();
    with (Op)
    {
        static immutable ops = [SUM, MUL, SUB, SUB];
        num = applyOps!ops(num);
    }
    writeln("Result: ", num);
    return 0;
}

In case you’re not familiar with D, final switch is a switch that demands all cases to be covered, as opposed to regular switch which require a default case (so enums could evolve without breaking callers).

In this example, the two languages look very similar, apart from the syntax differences (and formatting; I am using the standard format given by the respective Language Servers).

But there’s a problem with the D version. If you look at the Assembly it generates, it’s obvious that the switch is still there at runtime, even when using the -release flag, as far as I can see.

The reason is that there’s no static switch in D, i.e. no comptime switch (pretty lazy of them to only have included static if)!

For that reason, the switch needs to be rewritten to use static if to be actually equivalent to Zig:

long applyOps(Op[] operations)(long num)
{
    long acc = 0;
    static foreach (op; operations)
    {
        static if (op == Op.SUM)
        {
            acc += num;
        }
        else static if (op == Op.MUL)
        {
            acc *= num;
        }
        else static if (op == Op.SUB)
        {
            acc -= num;
        }
    }
    return acc;
}

This generates essentially the same Assembly as Zig.

I am not aware of any other statically-typed language that could also implement this, at least without macros, but would be happy to hear about it if anyone knows (languages like Terra, which are not really practical languages, don't count).

But talking about macros, Rust has macros! It actually has four types of macros.

Can we do this with macros?

Let’s see. Calling applyOps in Rust should look something like this:

apply_ops!(num, SUM, MUL, SUB, SUB);

Notice that the ! symbol in Rust is a macro specifier. The question is, can we write a loop and a switch in a Rust macro (this kind of macro is called declarative macro)?

Yes!

macro_rules! apply_ops {
    ( $num:expr, $( $op:expr ),* ) => {
    {
        use Op::{Mul, Sub, Sum};
        let mut acc: u64 = 0;
        $(
            match $op {
                Sum => acc += $num,
                Mul => acc *= $num,
                Sub => acc -= $num,
            };
        )*
        acc
    }
    }
}

The part between $( .. )* may not look much like a loop, but it is! The reason it looks different than a normal loop is that this uses Rust’s macro templating language, essentially. That’s the problem with macro systems: they are a separate language within a host language, normally. Except, of course, if your language is itself written in AST form, like with Lisp languages, but let’s leave that for another time.

At least macros do run, or get expanded, at comptime!

Here’s the full example in Rust:

use std::io::{self, Write};

enum Op {
    Sum,
    Mul,
    Sub,
}

macro_rules! apply_ops {
    ( $num:expr, $( $op:expr ),* ) => {
    {
        use Op::{Mul, Sub, Sum};
        let mut acc: u64 = 0;
        $(
            match $op {
                Sum => acc += $num,
                Mul => acc *= $num,
                Sub => acc -= $num,
            };
        )*
        acc
    }
    }
}

fn ask_user() -> u64 {
    print!("A number please: ");
    io::stdout().flush().unwrap();
    let mut line = String::new();
    let _ = io::stdin().read_line(&mut line).unwrap();
    line.trim_end().parse::<u64>().expect("a number")
}

pub fn main() {
    let num = ask_user();
    let num = apply_ops!(num, Sum, Mul, Sub, Sub);
    println!("Result: {}", num);
}

It’s surprisingly similar to Zig… well, at least if you ignore the macro syntax.

The cool thing about Rust is that it is very popular, or at least it’s popular with people who care about tooling! So, it has awesome tooling, like the cargo-expand crate.

To install it, run:

cargo install cargo-expand

… and wait for 5 minutes as it compiles half of the crates in crates.io! But the wait is worth it!

Look at what it prints when I run cargo expand (displayed with the real colors on my terminal!):

Isn’t that pretty!?

I don’t know of any tool that can do this for Zig or D!

This reveals that the macro expands kind of as expected. Hopefully, the compiler will be smart enough to actually remove the match Sum { Sum => ... } blocks.

I did try to verify it by looking at the generated ASM using cargo-asm, but that crate couldn’t do it for this example as it crashed!

ERROR:

thread 'main' panicked at /Users/renato/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-asm-0.1.16/src/rust.rs:123:33:
called `Result::unwrap()` on an `Err` value: Os { code: 21, kind: IsADirectory, message: "Is a directory" }

So much for great tooling (and people should just stop using unwrap in production code that can actually fail… “this should never happen” always happens)!

I knew I should’ve just used Godbolt from the start, but I prefer to use local tools if I can. Anyway, without using any compiler arguments, it looks like the match blocks are still there! However, using the -O flag to enable optimisations removes so much code (and inlines everything) that all I see is one multiplication! What kind of magic is that!?

Generics

Finally, we get to the most famous part of Zig’s comptime story: how it can provide generics-like functionality without actually having generics!

Here’s Kristoff’s generics sample (again, updated for Zig 0.13.0):

/// Compares two slices and returns whether they are equal.
pub fn eql(comptime T: type, a: []const T, b: []const T) bool {
    if (a.len != b.len) return false;
    for (0.., a) |index, item| {
        if (!std.meta.eql(b[index], item)) return false;
    }
    return true;
}

Notice how the first argument to eql is a type! This is a very common pattern in Zig.

That makes it look similar to “real” generics in languages like Java and Rust (even Go has recently added generics). But whereas in those languages, generics are a feature on their own, and a complex one at that, in Zig, it’s just a result of how comptime works (and a bunch of built-ins provided to work with types).

Arguably, however, what Zig does is not truly generics but templating.

Check out Zig-style generics are not well-suited for most languages for a more critical look at the differences.

In a language with true generics, there are some significant differences. Let’s see what it looks like in Rust:

fn eql<T: std::cmp::PartialEq>(a: &[T], b: &[T]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for i in 0..a.len() {
        if a[i] != b[i] {
            return false;
        }
    }
    true
}

Notice how there’s a type bound to the generic type T:

T: std::cmp::PartialEq

That’s because you wouldn’t be able to use the != operator otherwise, as only implementations of the PartialEq trait provide that in Rust.

Removing the type bound gives me an opportunity to showcase another Rust error message, which is always fun:

error[E0369]: binary operation `!=` cannot be applied to type `T`
 --> src/main.rs:6:17
  |
6 |         if a[i] != b[i] {
  |            ---- ^^ ---- T
  |            |
  |            T
  |
help: consider restricting type parameter `T`
  |
1 | fn eql<T: std::cmp::PartialEq>(a: &[T], b: &[T]) -> bool {
  |         +++++++++++++++++++++

This kind of generics is quite nice on tooling: the IDE (and the programmer) will know exactly what can and cannot be done with arguments of generic types, so it can easily diagnose mistakes. With templates, that is only possible at the call site because that’s when the template will actually be instantiated and type-checked!

To illustrate that, in the following example, the eql call fails to compile:

pub fn main() !void {
    const C = struct {};
    const a: []C = &[0]C{};
    const b: []C = &[0]C{};
    
    std.debug.print("Result: {}\n", .{eql(C, a, b)});
}

ERROR:

generics.zig:9:22: error: operator != not allowed for type 'comptime.main.C'
        if (b[index] != item) return false;
            ~~~~~~~~~^~~~~~~
generics.zig:14:15: note: struct declared here
    const C = struct {};
              ^~~~~~~~~

The error is shown inside the template itself, even though the template is not where the mistake is. That’s worse than in Rust and other languages with similar generics, where the error is shown on the call site if it tries to use a type which does not satisfy the generic type bounds in the function signature.

Going back to the error above, that happens because Zig only defines the equality operators for Integers, Floats, bool and type, so it won’t work for structs.

As an aside, as Zig has no interfaces or traits unless you perform some acrobatics, it couldn’t declare type bounds that way, currently. It could, however, do something like D does and allow boolean expressions over types to be added to function signatures, as we’ll see.

For other types, the comparison should be done with std.meta.eql:

if (!std.meta.eql(b[index], item)) return false;

Indeed, after this change, the eql example now works for any type, thanks to std.meta.eql actually doing the heavy lifting, of course.

And the way it does that is by specializing behaviour for different types.

Here’s a partial view of its implementation:

pub fn eql(a: anytype, b: @TypeOf(a)) bool {
    const T = @TypeOf(a);

    switch (@typeInfo(T)) {
        .Struct => |info| {
            inline for (info.fields) |field_info| {
                if (!eql(@field(a, field_info.name), @field(b, field_info.name))) return false;
            }
            return true;
        },
...
        .Array => {
            if (a.len != b.len) return false;
            for (a, 0..) |e, i|
                if (!eql(e, b[i])) return false;
            return true;
        },
...
        else => return a == b,
    }
}

This shows the special type, anytype, which is basically comptime duck typing. In this case, the function explicitly handles several known types (the actual type is obtained using @typeInfo), then falls back on the == operator.

In a language with union types, this would be easy to represent type-safely, but in Zig, anytype is the only solution, currently (but there are proposals to improve this). That’s not ideal because programming with anytype feels like programming in a dynamically typed language: you just don’t know what you can do with your values and just hope for the best. Unlike dynamic languages, the type checker will still eventually catch up with you, obviously, but that doesn’t help when you’re writing the function or even trying to figure out if you can call it (you want to avoid being yelled at by the compiler for writing bad code, after all). The only way to know what the function requires of a value of type anytype is to look at its source code, which is one of the reasons why a lot of complaints by people who are new to the Zig community are waved away with “just read the code”.

For example, this is a valid function in Zig (though it probably cannot be called by anything):

fn cant_call_me(a: anytype) @TypeOf(a) {
    const b = a.b;
    const e = b.c.d.e;
    if (e.doesStuff()) {
        return e.f.g;
    }
    return b.default.value;
}

The @TypeOf(a) expression is a little awkward in the previous example, but allows telling the compiler that b must be of whatever type a is (not anytype but the actual type at the call site). There is an interesting proposal to improve on that with infer T.

Notice how this function signature:

pub fn eql(a: anytype, b: @TypeOf(a)) bool

Can be rewritten to, at the cost of having to specify the type explicitly, the following:

pub fn eql(comptime T: type, a: T, b: T) bool

The choice of which to use is somewhat subtle.
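
To make the difference concrete at the call site, here is a small sketch of my own (eqlExplicit and eqlInferred are made-up names that just wrap std.meta.eql):

const std = @import("std");

// Explicit form: the caller has to name the type as the first argument.
fn eqlExplicit(comptime T: type, a: T, b: T) bool {
    return std.meta.eql(a, b);
}

// Inferred form: the type of `b` is pinned to whatever type `a` has.
fn eqlInferred(a: anytype, b: @TypeOf(a)) bool {
    return std.meta.eql(a, b);
}

pub fn main() void {
    const x: u32 = 42;
    const y: u32 = 42;
    std.debug.print("{}\n", .{eqlExplicit(u32, x, y)}); // type named explicitly
    std.debug.print("{}\n", .{eqlInferred(x, y)}); // type inferred from `a`
}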

D has had to grapple with most of these issues in the past as well. For example, to assist with knowing which types you could call a generic function with, it allows adding type constraints to its function signatures, making it a little closer to Rust than Zig in that regard, despite using templates to implement generics.

Let’s look at another Zig example before we go back to D to see how it works:

// This is the stdlib implementation of `parseInt`
pub fn parseInt(comptime T: type, buf: []const u8, radix: u8) !T

This function does not declare what types T may be, but almost certainly, you can only use one of the integer types (e.g. u8, u16, u32, i32 …). In other cases, it may not be as easy to guess.
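
As a rough sketch of how you could at least fail fast in Zig (my own example, not from the standard library; parseIntChecked is a made-up name), a comptime check inside the function body can turn a bad type argument into a readable error:

const std = @import("std");

fn parseIntChecked(comptime T: type, buf: []const u8, radix: u8) !T {
    // Reject non-integer types with a clear message instead of letting the
    // error surface somewhere deep inside the implementation.
    comptime {
        switch (@typeInfo(T)) {
            .Int => {},
            else => @compileError("parseIntChecked expects an integer type, got " ++ @typeName(T)),
        }
    }
    return std.fmt.parseInt(T, buf, radix);
}

pub fn main() !void {
    const n = try parseIntChecked(u32, "42", 10);
    std.debug.print("{}\n", .{n});
    // parseIntChecked(f32, "1.0", 10) would fail to compile with the message above.
}

The constraint is still invisible in the signature, though, which is exactly the point.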

Well, with D, one could say exactly what the requirements are (though what isIntegral means in D is a little bit different):

import std.traits : isIntegral;

T parseInt(T)(string value, ubyte radix) if (isIntegral!T) {
  todo();
}

The isIntegral function is one of many comptime type-checking helpers from the std.traits module.

That’s a very good approach for both documentation and tooling support.

Notice that you could define isIntegral yourself (e.g. to be more strict with what types are allowed):

bool isIntegral(T)()
{
    return is(T == uint) || is(T == ulong) ||
        is(T == int) || is(T == long);
}

T parseInt(T)(string value, ubyte radix) if (isIntegral!T)
{
    todo();
}

Trying to call parseInt with some other type, say char, causes an error, of course:

void main()
{
    parseInt!char("c", 1);
}

ERROR:

tests.d(51): Error: template instance `tests.parseInt!char` does not match template declaration `parseInt(T)(string value, ubyte radix)`
  with `T = char`
  must satisfy the following constraint:
`       isIntegral!T`

Type introspection

This Zig example is from Scott Redig’s blog post:

const std = @import("std");

const MyStruct = struct {
    a: i64,
    b: i64,
    c: i64,

    fn sumFields(my_struct: MyStruct) i64 {
        var sum: i64 = 0;
        inline for (comptime std.meta.fieldNames(MyStruct)) |field_name| {
            sum += @field(my_struct, field_name);
        }
        return sum;
    }
};

pub fn main() void {
    const ms: MyStruct = .{ .a = 32, .b = 4, .c = 2 };
    std.debug.print("{d}\n", .{ms.sumFields()});
}

Even though this example only works on this particular struct, you can easily rewrite it so it can sum the integer fields of any struct (even those that have non-integer fields):

const std = @import("std");

fn sumFields(my_struct: anytype) i64 {
    var sum: i64 = 0;
    inline for (comptime std.meta.fieldNames(@TypeOf(my_struct))) |field_name| {
        const FT = @TypeOf(@field(my_struct, field_name));
        if (@typeInfo(FT) == .Int) {
            sum += @field(my_struct, field_name);
        }
    }
    return sum;
}

const MyStruct = struct {
    a: i64,
    s: []const u8,
    b: i64,
    c: i32,
};

pub fn main() void {
    const ms: MyStruct = .{ .a = 32, .b = 4, .c = 2, .s = "" };
    std.debug.print("{d}\n", .{sumFields(ms)});
}

This is very, very cool because of the implications: you can use the same technique to generate code that, for example, serializes structs into JSON, or whatever data format.
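
For instance, here is a minimal sketch of that idea (my own code, not from the post; it writes pseudo-JSON and makes no attempt to handle strings or nesting properly):

const std = @import("std");

// Writes the fields of any struct as a rough JSON-like object.
// The inline for is unrolled at comptime, so the runtime code is just a
// sequence of writes specialized for the given struct type.
fn writeJson(writer: anytype, value: anytype) !void {
    const T = @TypeOf(value);
    try writer.writeAll("{ ");
    var first = true;
    inline for (std.meta.fields(T)) |field| {
        if (!first) try writer.writeAll(", ");
        first = false;
        try writer.print("\"{s}\": {any}", .{ field.name, @field(value, field.name) });
    }
    try writer.writeAll(" }");
}

pub fn main() !void {
    const Point = struct { x: i64, y: i64 };
    const p: Point = .{ .x = 3, .y = 4 };
    var buf: [128]u8 = undefined;
    var stream = std.io.fixedBufferStream(&buf);
    try writeJson(stream.writer(), p);
    std.debug.print("{s}\n", .{stream.getWritten()});
}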

You may not be surprised by now to hear that D can also do the same thing:

int main()
{
    import std.stdio : writeln;

    struct S
    {
        ulong a;
        string s;
        ulong b;
        uint c;
    }

    S s = {a: 32, b: 4, c: 2, s: ""};
    writeln("Result: ", sumFields(s));
    return 0;
}

long sumFields(S)(S myStruct) if (is(S == struct))
{
    import std.traits : isIntegral;

    long sum = 0;
    foreach (member; __traits(allMembers, S))
    {
        auto value = __traits(getMember, myStruct, member);
        static if (__traits(isIntegral, typeof(value)))
        {
            sum += value;
        }
    }
    return sum;
}

The __traits syntax takes some getting used to, but after that it’s not very different from Zig’s @TypeOf and similar built-ins.

I want to emphasize again that this is all doing work at comptime, and at runtime, you have custom-made code for each type you call the function with. For example, for a struct like this:

struct AB {
    int a;
    int b;
}

The actual runtime version of sumFields should look like this:

long sumFields(AB myStruct) {
    long sum = 0;
    sum += myStruct.a;
    sum += myStruct.b;
    return sum;
}

Do visit Godbolt and check it out.

This is in contrast to something like Java reflection. You could definitely implement something similar using reflection, but that would have a high runtime cost, not to mention the code would look much more complex than this.

However, Java does have a solution that can work: annotation processors. Well, at least as long as you don’t mind doing all of the following:

  1. create an annotation, say @SumFields.
  2. write an annotation processor for this annotation that generates a new class that can provide the sumFields method.
  3. annotate each class you want to sum the fields of with @SumFields. Tough luck if it’s not your class to change.
  4. configure the build to invoke your annotation processor.
  5. build your project, then finally use the generated class to call sumFields.

I was going to include some code showing what this looks like in practice, but I decided to just link to an article that shows something similar at Baeldung instead; I hope you can understand that.

Because the process to do this in Java is so convoluted, people almost never do that in application code. Only frameworks do; they have to. So, you get some pretty big frameworks that can do impressive things like comptime Dependency Injection and JSON serialization.

Rust has procedural macros which are pretty similar to Java annotation processors. But because working directly on ASTs is not a lot of fun, and Rust has macros, one can use the quote crate to write things more easily in a way that closely resembles templates.

In Zig and D, you don’t need frameworks doing magic and you don’t need macros 😎.

Textual code generation

Finally, Redig’s blog post has a section titled View 5: Textual Code Generation where he says:

“If this method of metaprogramming (textual code generation) is familiar to you then moving to Zig comptime might feel like a significant downgrade.”

This may sound crazy, but with D, you can even do that:

string toJson(T)(T value) {
  mixin(`import std.conv : to;string result = "{\n";`);
  foreach (member; __traits(allMembers, T)) {
    mixin("auto v = value." ~ member ~ ".to!string;");
    mixin(`result ~= "  ` ~ member ~ `: " ~ v;`);
  }
  mixin(`result ~= "\n}\n"; return result;`);
}

Above, I wrote a very basic, incomplete, pseudo-JSON serializer for anything that works with the allMembers trait (though it probably only does the right thing for classes and structs) using just string mixins.

Using the struct instance from the previous example:

S s = {a: 32, b: 4, c: 2, s: "hello"};
writeln("Result: ", s.toJson);

Prints:

Result: {
  a: 32  s: hello  b: 4  c: 2
}

I don’t know about you, but I find this bonkers good, indeed.

Final thoughts

Zig is doing great work on a lot of fronts, especially with its build system, C interop and top-of-the-line cross-compilation.

Its meta-programming capabilities are also great, as we’ve seen, and comptime fits the language perfectly to provide some powerful features while keeping the overall language simple.

The instability of the language still keeps me away for now. I maintain a basic Zig Common Tasks website and every upgrade requires quite some effort (though it was much less in the last version). I hope they get to a 1.0 release soon, so this won’t be a problem anymore.

Anyway, while Zig deserves all the praise it’s getting, I think it’s not the only one.

As I hope to have shown in this post, D’s CTFE and templates appear to be able to do pretty much everything that Zig comptime can, and then some. But no one is talking about it! D certainly isn’t getting huge donations from wealthy admirers and people are not being paid the largest salaries to write code in it. As someone who is just a distant observer in all of this, I have to wonder why that is. I get that Zig is not just comptime… while D may have an even more impressive comptime than Zig, it may lack in too many other areas.

After playing with D for some time, my main complaints are:

  • it has too many features, some of which are a little rough around the edges, like its multitude of function parameter modifiers.
  • the build system, Dub, seriously needs more attention. I tried to help with docs but my help was kind of ignored.
  • how hard it is to use @safe, @nogc and pure - they would be awesome if they were actually usable! Calling almost anything in the standard library forces you to remove these.

But D also has a lot of cool stuff I didn’t mention.

Maybe the fact that D has a GC doomed it from the start?

Rust has a less fun way of doing things, arguably, but it also covers a lot of things that are possible with comptime through macros, generics and traits. And it doesn’t have a GC (take that, D)! And it is memory-safe (take that, Zig)!

Java may not be at nearly the same level in terms of meta-programming as any of the other languages mentioned in this post, but at least writing top-notch tooling for it is much easier, and it shows! And with its love for big frameworks and wealth of libraries, it actually manages to almost compensate for that.


]]>
https://renato.athaydes.com/posts/comptime-programming hacker-news-small-sites-42725177 Thu, 16 Jan 2025 14:00:37 GMT
<![CDATA[Nepenthes is a tarpit to catch AI web crawlers]]> thread link) | @blendergeek
January 16, 2025 | https://zadzmo.org/code/nepenthes/ | archive.org

This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

It works by generating an endless sequence of pages, each with dozens of links that simply go back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.

You can take a look at what this looks like, here. (Note: VERY slow page loads!)

THIS IS DELIBERATELY MALICIOUS SOFTWARE INTENDED TO CAUSE HARMFUL ACTIVITY. DO NOT DEPLOY IF YOU AREN'T FULLY COMFORTABLE WITH WHAT YOU ARE DOING.

LLM scrapers are relentless and brutal. You may be able to keep them at bay with this software - but it works by providing them with a neverending stream of exactly what they are looking for. YOU ARE LIKELY TO EXPERIENCE SIGNIFICANT CONTINUOUS CPU LOAD, ESPECIALLY WITH THE MARKOV MODULE ENABLED.

There is not currently a way to differentiate between web crawlers that are indexing sites for search purposes, vs crawlers that are training AI models. ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.

Latest Version

Nepenthes 1.0

All downloads

Usage

Expected usage is to hide the tarpit behind nginx or Apache, or whatever else you have implemented your site in. Directly exposing it to the internet is ill-advised. We want it to look as innocent and normal as possible; in addition, HTTP headers are used to configure the tarpit.

I'll be using nginx configurations for examples. Here's a real world snippet for the demo above:

    location /nepenthes-demo/ {
            proxy_pass http://localhost:8893;
            proxy_set_header X-Prefix '/nepenthes-demo';
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_buffering off;
    }

You'll see several headers are added here: "X-Prefix" tells the tarpit that all links should go to that path. Make this match what is in the 'location' directive. X-Forwarded-For is optional, but will make any statistics gathered significantly more useful.

The proxy_buffering directive is important. LLM crawlers typically disconnect if not given a response within a few seconds; Nepenthes counters this by drip-feeding a few bytes at a time. Buffering breaks this workaround.

You can have multiple proxies to an individual Nepenthes instance; simply set the X-Prefix header accordingly.
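
For example, a second location block on the same server could point another path at the same instance (a sketch; '/articles-archive' is just a made-up path):

    location /articles-archive/ {
            proxy_pass http://localhost:8893;
            proxy_set_header X-Prefix '/articles-archive';
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_buffering off;
    }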

Installation

You can use Docker, or install manually.

A Dockerfile and compose.yaml are provided in the /docker directory. Simply tweak the configuration file to your preferences, then run 'docker compose up'. You will still need to bootstrap a Markov corpus if you enable the feature (see next section.)

For Manual installation, you'll need to install Lua (5.4 preferred), SQLite (if using Markov), and OpenSSL. The following Lua modules need to be installed - if they are all present in your package manager, use that; otherwise you will need to install Luarocks and use it to install the following:

Create a nepenthes user (you REALLY don't want this running as root.) Let's assume the user's home directory is also your install directory.

useradd -m nepenthes

Unpack the tarball:

cd scratch/
tar -xvzf nepenthes-1.0.tar.gz
cp -r nepenthes-1.0/* /home/nepenthes/

Tweak config.yml as you prefer (see below for documentation.) Then you're ready to start:

    su -l -u nepenthes /home/nepenthes/nepenthes /home/nepenthes/config.yml

Sending SIGTERM or SIGINT will shut the process down.

Bootstrapping the Markov Babbler

The Markov feature requires a trained corpus to babble from. One was intentionally omitted because, ideally, everyone's tarpits should look different to evade detection. Find a source of text in whatever language you prefer; there's lots of research corpuses out there, or possibly pull in some very long Wikipedia articles, maybe grab some books from Project Gutenberg, the Unix fortune file, it really doesn't matter at all. Be creative!

Training is accomplished by sending data to a POST endpoint. This only needs to be done once. Sending training data more than once cumulatively adds to the existing corpus, allowing you to mix different texts - or train in chunks.

Once you have your body of text, assuming it's called corpus.txt, in your working directory, and you're running with the default port:

curl -XPOST -d @./corpus.txt -H'Content-type: text/plain' http://localhost:8893/train

This could take a very, VERY long time - possibly hours. curl may potentially time out. See load.sh in the nepenthes distribution for a script that incrementally loads training data.

The Markov module returns an empty string if there is no corpus. Thus, the tarpit will continue to function as a tarpit without a corpus loaded. The extra CPU consumed for this check is almost nothing.

Statistics

Want to see what prey you've caught? There are several statistics endpoints, all returning JSON. To see everything:

http://{http_host:http_port}/stats

To see user agent strings only:

http://{http_host:http_port}/stats/agents

Or IP addresses only:

http://{http_host:http_port}/stats/ips/

These can get quite big; so it's possible to filter both 'agents' and 'ips', simply add a minimum hit count to the URL. For example, to see a list of all IPs that have visited more than 100 times:

http://{http_host:http_port}/stats/ips/100

Simply curl the URLs, pipe into 'jq' to pretty-print as desired. Script away!
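
For example, assuming the default host and port, something like this pretty-prints every IP that has hit the tarpit more than 100 times:

curl -s http://localhost:8893/stats/ips/100 | jq .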

Nepenthes used Defensively

A link to a Nepenthes location from your site will flood out valid URLs within your site's domain name, making it unlikely the crawler will access real content.

In addition, the aggregated statistics will provide a list of IP addresses that are almost certainly crawlers and not real users. Use this list to create ACLs that block those IPs from reaching your content - either return 403, 404, or just block at the firewall level.

Integration with fail2ban or blocklistd (or similar) is a future possibility, allowing realtime reactions to crawlers, but not currently implemented.

Using Nepenthes defensively, it would be ideal to turn off the Markov module, and set both max_wait and min_wait to something large, as a way to conserve your CPU.

Nepenthes used Offensively

Let's say you've got horsepower and bandwidth to burn, and just want to see these AI models burn. Nepenthes has what you need:

Don't make any attempt to block crawlers with the IP stats. Put the delay times as low as you are comfortable with. Train a big Markov corpus and leave the Markov module enabled, set the maximum babble size to something big. In short, let them suck down as much bullshit as they have diskspace for and choke on it.

Configuration File

All possible directives in config.yaml:

  • http_host : sets the host that Nepenthes will listen on; default is localhost only.
  • http_port : sets the listening port number; default 8893
  • prefix: Prefix all generated links should be given. Can be overridden with the X-Prefix HTTP header. Defaults to nothing.
  • templates: Path to the template files. This should be the '/templates' directory inside your Nepenthes installation.
  • detach: If true, Nepenthes will fork into the background and redirect logging output to Syslog.
  • pidfile: Path to drop a pid file after daemonization. If empty, no pid file is created.
  • max_wait: Longest amount of delay to add to every request. Increase to slow down crawlers; if it's too slow, they might not come back.
  • min_wait: The smallest amount of delay to add to every request. A random value is chosen between max_wait and min_wait.
  • real_ip_header: Changes the name of the X-Forwarded-For header that communicates the actual client IP address for statistics gathering.
  • prefix_header: Changes the name of the X-Prefix header that overrides the prefix configuration variable.
  • forget_time: length of time, in seconds, that a given user-agent can go missing before being deleted from the statistics table.
  • forget_hits: A user-agent that generates more than this number of requests will not be deleted from the statistics table.
  • persist_stats: A path to write a JSON file to, that allows statistics to survive across crashes/restarts, etc
  • seed_file: Specifies location of persistent unique instance identifier. This allows two instances with the same corpus to have different looking tarpits.
  • words: path to a dictionary file, usually '/usr/share/dict/words', but could vary depending on your OS.
  • markov: Path to a SQLite database containing a Markov corpus. If not specified, the Markov feature is disabled.
  • markov_min: Minimum number of words to babble on a page.
  • markov_max: Maximum number of words to babble on a page. Very large values can cause serious CPU load.

History

Version numbers use a simple process: If the only changes are fully backwards compatible, the minor number changes. If the user/administrator needs to change anything after or part of the upgrade, the major number changes and the minor number resets to zero.

v1.0: Initial release

]]>
https://zadzmo.org/code/nepenthes/ hacker-news-small-sites-42725147 Thu, 16 Jan 2025 13:57:43 GMT
<![CDATA[Stuff We Already Depleted]]> thread link) | @juancroldan
January 16, 2025 | https://jcarlosroldan.com/post/352 | archive.org

]]>
https://jcarlosroldan.com/post/352 hacker-news-small-sites-42724855 Thu, 16 Jan 2025 13:33:18 GMT
<![CDATA[I Ditched the Algorithm for RSS–and You Should Too]]> thread link) | @DearNarwhal
January 16, 2025 | https://joeyehand.com/blog/2025/01/15/i-ditched-the-algorithm-for-rssand-you-should-too/ | archive.org

An image of a banner cartoon of the topic at hand

I waste too much time scrolling through social media. It's bad for my health, so why do I keep doing it?

Because once in a while, I'll find a post so good that it teaches me something I never knew before, and all the scrolling feels worth it. But I've stumbled upon an old piece of free and open source tech, relatively unknown today, which is THE solution to the problems with modern media without sacrificing accessible, good content: RSS.

Reddit, Facebook, Twitter — platforms built for engagement, not efficiency. Instead of showing you high-quality posts upfront, they pad your feed with memes, spam, and astroturfing. There is only so much 'good' content created in a day. By padding your feed with trash, they make the limited amount of good posts "last longer". These sites want you to spend more time scrolling on their website, so they feed you scraps which makes the occasional great post feel like a jackpot.
This concept, operant conditioning, was developed by B.F. Skinner — Yes, the mind behind the Skinnerbox.

While some sites offer filtering or sorting options, manually setting these options every time you want to access a subreddit is just not doable.

An image of a monkey in a skinnerbox, with Reddit acting as the reward stimuli

You could, of course, stop consuming content from these websites. However, this would mean potentially missing really good content; content you'd learn from, interesting ideas, and more.
But it doesn't have to be this way. You can reclaim your attention span while still having access to the same quality content as before.

Enter:

Image of a personified RSS feed showing a bad post to the blockfilter

RSS is like your youtube subscription feed in hyperdrive. Subscribe to sites you love and decide what shows up — no exploitative social media algorithm needed. No more ads or algorithms deciding how to keep you doomscrolling. This 1999 tech actually solves a lot of 2025 problems.
Here's the kicker: Most websites, even social media, quietly support RSS feeds.

You can filter out keywords, set minimum upvotes or like counts, and much more! Modern RSS clients allow you to make filters using Regex, and there are a lot of software and services you can use to tune up your filtering to 11.

TL;DR: Never see noise, and never miss hidden gems again!

But how do you get started with RSS? It's easier than you think!

Setup

I personally self-host an open source RSS reader: Tiny Tiny RSS
If you don't want to host it yourself, you can google for companies offering easy and accessible RSS readers.

Image explaining definitions of RSS levels of ease

To make it easier, let's differentiate between three levels of ease when it comes to adding a website to RSS: Easy, medium, and hard.
I'll be going over how to add several popular sites to your feed.

Easy 1: Youtube

Want a youtube channel in your RSS feed? Just copy the channel's URL and subscribe to it in your reader. Done.

Easy 2: IGN

If you like games, you might want to subscribe to IGN. There's no clear RSS button, so the best course of action would be to google "IGN RSS".

This leads to a nice IGN RSS Feeds page with multiple categorized feeds for you to pick from. If you wanted to subscribe to "Game Articles", you'd right-click on the game articles link, press "copy link", go to your RSS reader of choice and subscribe to the link you copied.

Now all IGN Game Articles will show up in your RSS feed as they are published!

Tip

Some websites don't have a dedicated RSS button, but still support RSS. You can discover their RSS urls by adding .rss, atom.xml, feed, etc. at the end of the site's URL, for example https://website.com/atom.xml. Almost all RSS readers support Atom feeds. For more examples, check this Reddit comment.

Medium 1: HackerNews

Image explaining RSS middlemen

Some sites like HackerNews have RSS support. However, this RSS can be extremely limited if you want to filter posts so your feed isn't spammed by low effort content. Some people are nice enough to set up a "middleman" between your RSS feed and the website, so you can pull the RSS feed through the middleman while doing actions like filtering it.

For example, if you wanted to subscribe to HackerNews but filter out low upvote count posts, you could subscribe to HNRSS instead of through HN directly. Personally, I filter out posts below 150 upvotes by subscribing to this URL: https://hnrss.org/newest?points=150
Sometimes these services open-source their code so you can self-host the 'middleman'.

Medium 2: Reddit

Image explaining makeup of reddit RSS URL

Warning

When removing the optional search term from a reddit search URL, don't forget to remove the +. The same goes when removing the sort options. If adding more search terms, add a + between them!

I love managing my homelab. I follow /r/homelab. Some posts are really good and teach me a lot.
However, there's a lot of noise posted to that subreddit; I do not want to see memes, and pictures of people's hardware setups get boring quickly. I'm interested in hidden gems, threads where a lot of interesting info is explained, things I can really learn a lot from.

Step 1: Filter out picture posts.

Reddit hack: Filter out picture posts by searching for 'self:true' in a subreddit. Bonus: You can subscribe to that specific search query as a RSS feed for text posts only.

So instead of subscribing to a subreddit's RSS directly, you do a search for posts in that subreddit and then subscribe to that RSS feed.
The RSS link you should subscribe to should look someting like this: https://www.reddit.com/r/homelab/search.rss?q=self%3Atrue&restrict_sr=on
You can change 'homelab' to your subreddit of choice.

Note

The restrict_sr=on parameter in the URL (probably) means "Restrict_subreddit". Removing this from the search will yield results from different subreddits than the one you're searching in. If you think that parameter is redundant, I agree.

There are a lot of text-only submissions on /r/homelab. Gems are relatively sparse. Lots of low quality content. It's not the subreddit's fault; this is standard across Reddit.

Step 2: Filter for quality

Seems easy; let's add a 'minimal upvotes' query to the search, right?
Sadly, Reddit doesn't support that.. a shame, really.
However, a workaround is sorting by 'Top' and asking the search to show us the 'top posts of this week'. Note: 'This week' would mean 'past 7 days' instead of 'posted this week'.

Filtering by 'Top of ...' always returns 25 items. This means that if you sort by 'top of this week', an average of 25/7 ≈ 3.57 NEW posts get added to your feed each day! This is a great way to only see the highest scoring posts of each day.

Adding this sort on top of the RSS feed from step 1 results in an URL like this: https://www.reddit.com/r/homelab/search.rss?q=self%3Atrue&restrict_sr=on&sort=top&t=year

Bug

If you don't care about only seeing text posts, removing self%3Atrue does NOT work for RSS feeds, even though it does work for direct searches. Instead, subscribe to the subreddit's "top" RSS and filter by time. For example: https://www.reddit.com/r/homelab/top.rss?t=month

For reference, here is how many posts you would get in your RSS feed, depending on your reddit sorting:

Image explaining makeup of reddit RSS URL

And just like that, we converted a high-noise subreddit to an RSS feed which only gives us the best the subreddit has to offer.

Tip

If you wanted to subscribe to all new posts in a subreddit, you would subscribe to an url like https://www.reddit.com/r/SUBREDDIT_NAME/new/.rss?sort=new. For a more extensive Reddit RSS guide, see this post

Hard:

Some sites might not have support for an RSS feed. Sometimes you can get away with a neat google trick:

Image explaining makeup of reddit RSS URL

Most of the time you'd need something to generate the RSS for you. You could use one of many RSS feed generators available online, or host one yourself. Most of these feed generators have enhanced filtering tools as well.

I haven't had to do this yet, however I've heard really good things about the open source RSS-Bridge

How you'd set up a feed generator depends on the software, so I won't expand upon that here.

Conclusion

Separating yourself from the algorithmic whims of social media platforms is easier than ever. With RSS, you can stay informed, save time, and never miss the content that truly matters.

This blog also has an RSS feed!

To end this post, here is a list of (RSS supported) sites I think are really interesting. Linked are excellent articles for first-time readers!

]]>
https://joeyehand.com/blog/2025/01/15/i-ditched-the-algorithm-for-rssand-you-should-too/ hacker-news-small-sites-42724284 Thu, 16 Jan 2025 12:18:32 GMT
<![CDATA[Strategies to Complete Tasks with ADHD]]> thread link) | @adhs
January 16, 2025 | https://schroedermelanie.com/adhs-nichts-zuende-bringen/ | archive.org

People with ADHD (attention deficit/hyperactivity disorder) often experience everyday life as a constant up and down of concentration, emotions, and energy. This inner chaos not only means that tasks are frequently left unfinished, but also that those affected quickly feel overwhelmed. To understand why that is, it is worth taking a look at the emotional side and at how the nervous system works in ADHD.

People with ADHD usually start the day with big intentions, but these quickly disappear in the chaos of distractions. Every new impulse seems so important that you cannot ignore it, and so you end up caught in a whirlpool of distractions.

The neurological particularities of ADHD are closely linked to the emotional challenges described above. The brain of people with ADHD works differently, especially in the areas responsible for focus, impulse control, and the regulation of emotions.

The difficulty people with ADHD have in sticking with tasks and not feeling overwhelmed is not a sign of laziness or a lack of willpower. Rather, the causes lie in the particular way their nervous system works and in their intense emotional processing. A better understanding of this dynamic, both by those affected and by the people around them, can make dealing with these challenges considerably easier.

]]>
https://schroedermelanie.com/adhs-nichts-zuende-bringen/ hacker-news-small-sites-42724179 Thu, 16 Jan 2025 12:02:47 GMT
<![CDATA[National IQs Are Valid]]> thread link) | @noch
January 16, 2025 | https://www.cremieux.xyz/p/national-iqs-are-valid | archive.org

If you follow me here or on Twitter/X, I’m sure you’ve seen a map like this, showing country-level differences in average IQs:

The figures in this map are derived from a raft of studies compiled by Richard Lynn and Tatu Vanhanen in their 2002 book IQ and the Wealth of Nations. The book itself is little more than a compilation and discussion of these studies, all of which are IQ estimates from samples located in different countries or based on diasporas (e.g., refugees) from those countries.

To get this out of the way, the estimates from IQ and the Wealth of Nations hold up. They are replicable and they are meaningful. At the same time, they are contentious. Lynn and Vanhanen’s estimates have many detractors, but virtually all of the negative arguments have one thing in common: they’re based on the idea that the estimates feel wrong, rather than with any actual inaccuracies with them.

Let’s review.

One of the most common arguments against Lynn and Vanhanen’s national IQ estimates is that it is simply impossible for whole countries’ mean IQs to be what people often consider to be so low that they’re considered prima facie evidence of mental retardation. This feeling is based on misconceptions about how mental retardation is diagnosed and defined, and misunderstandings about the meanings of very low IQs across populations. Let’s tackle definition first.

The belief that mental retardation is defined by an IQ threshold is similar to many of the arguments against the validity of national IQs in that it is based on a failure to give even a cursory thought to one’s own arguments. It’s a belief that cannot survive reading the latest version of the DSM, or for that matter, thinking of psychologists as competent people. If you open up the DSM-5 and turn to the section entitled Intellectual Disabilities, the first thing you see is the diagnostic criteria, which read as follows:

The following three criteria must be met:

  1. Deficits in intellectual functions, such as reasoning, problem solving, planning, abstract thinking, judgment, academic learning, and learning from experience, confirmed by both clinical assessment and individualized, standardized intelligence testing.

  2. Deficits in adaptive functioning that result in failure to meet developmental and sociocultural standards for personal independence and social responsibility. Without ongoing support, the adaptive deficits limit functioning in one or more activities of daily life, such as communication, social participation, and independent living, across multiple environments, such as home, school, work, and community.

  3. Onset of intellectual and adaptive deficits during the developmental period.

There are four listed severity levels for mental retardation, Mild, Moderate, Severe, and Profound, and they are “defined on the basis of adaptive functioning, and not IQ scores, because it is adaptive functioning that determines the level of supports required. Moreover, IQ measures are less valid in the lower end of the IQ range.” It’s true that most mentally retarded people have IQs in the range of 55 to 70, so it’s easy to get misled into thinking that IQ is the defining factor for mental retardation. But an IQ of about 70 and below only indicates (but doesn’t diagnose) mental retardation because of what it tends to be caused by in certain populations. This is a roundabout way of saying that a low IQ indicates mental retardation because of what it means for a person's behavior, which, I'll explain, covaries with its causes.

IQs are mostly normally distributed, and IQs represent the influences of multiple different constructs and causes, but they primarily reflect differences in general intelligence. At the same time, there’s a major deviation from normality at the lower end of the scale. If you sample well enough, the picture you’d get would look like this:

The reason is that there are the normal-range causes that produce the traditional bell curve, and then there are extreme circumstances that produce extraordinarily low IQs. Contrarily, we don't know of anything that produces abnormally high IQs. There's no known one-off mutation that makes someone a genius, but there are several mutations that we now know can make a person extremely unintelligent. Consider these:

Young and Martin 2023, Figure 1

It's much easier to break a machine than to throw a wrench in it and make it work better. And that makes sense! Your wrench in the gears is likely to break something, not to increase efficiency. If you hit someone in the head hard enough, you can reduce their IQ score, but not in a million years will you turn them into von Neumann.

The reason someone's IQ drops after you bludgeon them is more singular and specific than the reasons IQ varies in the general population. A hit to the noggin may leave someone unable to flex their short-term memory, even if their visuospatial rotation capabilities are unaffected. There's some degree of isolation of function and compensation for deficits in the brain. We know this thanks to many observations, like on the effects of neural lesions.

When we say that someone with an IQ of ≤70 is mentally retarded, we're saying that they lack adaptive behavior—that they're not very bright, and so it makes living life hard. When someone with an IQ >70 is mentally retarded—which happens!—we’re saying the same thing. But if a population legitimately has a mean IQ of 70 (and some do), we'll notice that they're not drooling troglodytes who can't put on their shoes. This is because the reasons for their low IQ are not things that cause specific and extreme deficits, but instead, things that cause normal-range variation, which is far less severe in nature than something that causes massive, specific, discontinuously-caused deficits.

Supposing discontinuous causes that create major, specific deficits yields testable consequences. We can see them play out clearly by leveraging different countries’ population registers.

In large Israeli (B) and Swedish (A) datasets, we can see that when a person has a mild intellectual disability, their sibling tends to be less intelligent too. You also see this for heights, autism, or any other highly polygenic trait. This is expected with continuously distributed causes because siblings share portions of their genetic endowments.

But when a sibling has a severe—otherwise known as, "idiopathic"1—intellectual disability that's likely to have a discontinuous cause, the distribution of their siblings’ IQs is the same as the distribution for the general population. And of course they would, because the reason for the difference isn’t something that siblings should be expected to partially share.

The deficits in adaptive behavior that result from idiopathic intellectual disability are ones that make life hard to live in discrete and extreme ways. In some cases, you wouldn't even recognize them as being "retarded" because they have conditions like amnesia, where the person is clearly odd and unfortunate, and they score poorly on an IQ test, but they might sound totally coherent otherwise.

Normal-range causes lead to linear changes in adaptive behavior; idiopathic mental retardation leads to a discontinuous decrease in adaptive behavior.

Accordingly, an IQ of 70 does not have the same meaning for members of different groups, since if a group with a mean IQ of 70 randomly pulls a person with an IQ of 70, they aren't likely to score like someone who's mentally retarded for idiopathic reasons. Arthur Jensen, incidentally, predicted this sort of result in his 1972 Genetics and Education based on some observations he made as an educator.

My student said he was looking for a good culture-free or culture-fair test of intelligence and had not been able to find one. All the tests he used, whether they were claimed to be culture-fair or not, were in considerable agreement with respect to children diagnosed as educationally mentally retarded (EMR), by which they were assigned to special small classes offering a different instructional program from that in the regular classes. To qualify for this special treatment, children had to have IQs below 75 as well as lagging far behind their age-mates in scholastic performance. My student, who had examined many of these backward pupils himself, had gained the impression that the tests were quite valid in their assessments of white middle-class children but not of minority lower-class children.

Many of the latter, despite IQs below 75 and markedly poor scholastic performance, did not seem nearly as retarded as the white middle-class children with comparable IQs and scholastic records. Middle-class white children with IQs in the EMR range generally appeared more retarded than the minority children who were in special classes. Using nonverbal rather than verbal tests did not appreciably alter the problem. I confirmed my students observations for myself by observing EMR children in their classes and on the playground and by discussing their characteristics with a number of teachers and school psychologists. My student’s observations proved reliable.

EMR children who were called ‘culturally disadvantaged’, as contrasted with middle-class EMR children, appeared much brighter socially and on the playground, often being quite indistinguishable in every way from children of normal IQ except in their scholastic performance and in their scores on a variety of standard IQ tests. Middle-class white children diagnosed as EMR, on the other hand, though they constituted a much smaller percentage of the EMR classes, usually appeared to be more mentally retarded all round and not just in their performance in scholastic subjects and IQ tests. I asked myself, how could one devise a testing procedure that would reveal this distinction so that it could be brought under closer study and not depend upon casual observations and impressions.

The distinction between types of mental retardation that are demarcated by their symptoms and, thus, by their causes, is important to understand. It's why an IQ of 70 for a Japanese person is likely to indicate an extraordinarily severe issue in need of attention, while for a Bushman, they won't have any trouble surviving. This is why the DSM-V notes in the Diagnostic Features section and bolded here: “The essential features of intellectual disability are deficits in general mental abilities (Criterion 1) and impairment in everyday adaptive functioning, in comparison to an individual’s age-, gender-, and socioculturally matched peers (Criterion 2).”

Keep that bolded part in mind and you’ll understand the importance of test norming and, knowing about discontinuous causes and the adaptive functioning deficit requirement for retardation diagnosis, you’ll never again be able to think that a mean IQ around 70 invalidates national IQs. But there’s another reason you shouldn’t, and it’s that you’re probably not an ultra-hereditarian.

In 2010, Richard Lynn pointed out that high IQs in Sub-Saharan Africa were incompatible with anything but the judgment that the group differences in the U.S. were entirely genetic in origin and even extremely poor environments have no effects on cognitive development. In response to an attempt to say that Sub-Saharan Africans had a mean IQ closer to 80 than 70, he wrote:

[The] assumption that [some of these samples of] children had IQs of 85 and 88 seems improbable. These are the IQs of blacks in the United States. It can hardly be possible that blacks in the United States who have all the advantages of living in an economically developed country, with high income, good health care, good nutrition and education, would have the same IQ as blacks in impoverished Nigeria. If this were so, we would have to infer that these environmental disadvantages have no effect whatever on IQs and even the most hard line hereditarians would not go that far.

Or, if you want to see this point made diagrammatically (courtesy of X user AnechoicMedia), here you are:


Low IQs are also predictable from national development, making them that much more realistic. Using the latest national IQ dataset, I’ll show this for Sub-Saharan Africa—a region often claimed to have invalid IQs precisely because they’re ‘too low’.

First, we’ll predict Sub-Saharan African national IQ from a regression of national IQ estimates on log(GDP PPP Per Capita). Whether the regression is performed with or without Sub-Saharan Africa, the results are similar. The measured mean IQ of Sub-Saharan Africa is 71.96, the predicted IQ in the regression with Sub-Saharan Africa included is 74.86, and without them, it’s 76.78. Or in other words, it’s not very different.


Without the logs, the predicted IQs are 78.29 and 82.47, but that’s not an appropriate model, as you’ll see in a moment. But first, you might contest the latest national IQ dataset. The studies underlying its estimates and the methodology to assemble them are well-documented, so if you really want to contest them, you should start there. But if you dismiss them out of hand, you still can’t escape prediction by development, because we can just use the World Bank’s Harmonized Learning Outcomes (HLOs), an alternative national IQ dataset created by qualified researchers (including Noam Angrist), with good methods (test-score-linking), recently published (2021) in a respectable journal (Nature).2

Using HLOs, the observed mean IQ of Sub-Saharan Africa turns out to be 71.05. The predicted mean IQ with Sub-Saharan Africa in the regression is 72.25, and without it, 72.55. Without logs, those predictions become 77.35 and 81.85. But again, that model isn’t appropriate. The reason it’s not appropriate is due to nonlinearities that logging helps to handle. Take a look:


You can see this same nonlinearity elsewhere, such as in PISA scores:

So to overcome this issue, we can do a simple piecewise regression. Using a GDP PPP Per Capita of $50,000 as the cut-point—and feel free to go use whatever you want, it doesn’t really change the result so long as it’s reasonable—we get these regressions:


With this method, the predicted mean IQ of Sub-Saharan Africa is 76.12 with it included and 79.77 without. Using the World Bank’s HLOs instead, the results are 73.75 with Sub-Saharan Africa and 76.6 without it.

All of these estimates are pretty close to the ground-truth, and I suspect they would be even closer if all the data came from the same years instead of near years, and I know it’s closer if Actual Individual Consumption is used instead of GDP Per Capita, since that tends to iron out some of the issues with tax havens and oil barons. But regardless, it should now be apparent that low national IQs are

  • Not indications of widespread, debilitating mental retardation

  • Where they’re predicted to be given levels of national development

  • Thus not a reason to cast aside national IQ estimates

In his earlier national IQ datasets, Lynn ran into a lot of missing data. To get around this issue, he exploited the fact that there’s spatial autocorrelation in order to provide imputed national IQs. Some people think, however, that these imputed IQs are bad and make his data fake, failing to realize that imputation is normal and it doesn’t have meaningful effects on Lynn’s national IQ estimates.

In the most recent dataset Lynn created before he died, his imputation procedure was like so:

To calculate these imputations, Lynn and Becker (2019a, b) took advantage of the spatial autocorrelation that often exists in international data and identified the three countries with the longest land borders that had IQ means. A mean IQ, weighted by the length of the land border, was calculated and used as an estimate for the country’s missing estimated mean IQ. For island nations, the three closest countries with IQ data were identified, and an unweighted mean was calculated and imputed as an estimated mean IQ value for the missing country’s data.

Geographic imputation of this sort is the responsible thing to do when data is not missing at random and it can be predicted from other values in the dataset, because more data means more power and, given the nature of the missingness, less bias. So whether Lynn was being responsible or irresponsible is a matter of how well the imputations hold up. Thankfully, they do hold up.

The simplest way to check the robustness of Lynn’s national IQs is to compare his imputed national IQs to subsequently sampled national IQs. I’ll do this with his much maligned 2002 and 2012 national IQs. To assess the validity of the imputed IQs I’ll correlate them with our current best national IQs and the World Bank HLOs.

Lynn’s 2002 imputed national IQs correlate at r = 0.90 with our current best national IQs and 0.72 with HLOs. 102 countries were imputed. Compared to our current best national IQs, 72 were overestimates and the average estimation error was 1.47 points upwards. The average overestimation on Lynn’s part was 3.71 points, and the average underestimation was 3.75 points. Since these were imputed countries, they don’t really reveal anything about Lynn’s estimation biases in general, as they’re a select subset of generally poor or small countries. Lynn’s 2012 imputed national IQs correlate at r = 0.92 with our current best national IQs and 0.76 with HLOs. 66 countries were imputed. Compared to our current best national IQs, 37 were overestimates and the average estimation error was 0.72 points upwards. The average overestimation on Lynn’s part was 3.48 points, and the average underestimation was 2.61 points.3

I want to mention again that it should not be surprising that this procedure doesn’t really do much. It’s simple, straightforward, and scientifically uncontroversial, plus it’s theoretically sound, so it’s not shocking that it works. In a related domain, I found that this worked out just fine in the U.S.

I took the NAEP Black-White score gaps from the lower-48 and predicted them for each state from the observed gap for their immediate neighbors. The Michigan-Minnesota and New York-Rhode Island water borders were counted as neighboring state borders, and the result of imputing across state borders was a correlation of about 0.60 with the real gaps:

Image

With a larger sample size and even more so with imputation just of observations that are missing for theoretically important reasons, this would almost certainly work better, but regardless, it’s fine: Imputation just works.

This is easy to check, so you have to wonder why people believe it. If it were true, it would be easy to show rather than to merely assert.4 To check this, just compare Lynn’s estimated national IQs to independent collections of national IQ estimates and see if they vary systematically and meaningfully. I’ve done this, both with our current best national IQs as the reference and with the World Bank HLOs as the reference. The results are practically the same, but this should be unsurprising, since Lynn computed sets of national IQs based on IQ tests and based on achievement tests, and they were highly aligned. But I digress. Here’s what I found using the current best national IQs:

Compared to our current best estimates, Lynn’s original estimates were pretty close to the line, with a mix of under- and over-estimation. The degree of under- and over-estimation is minor, ranging between underestimating Sub-Saharan Africa by 1.89 points and overestimating Latin America by 4.21 points. Europe was overestimated by just 1.01 points, indicating no real evidence for the theory that Lynn favored Whites. If we drop imputed numbers, then we can see this even more clearly, because Sub-Saharan Africa without imputation was actually underestimated by an even larger margin of 2.40 points, but Europe without imputation was only overestimated by 0.18 points.

Fast forward to Lynn’s 2012 dataset and the results are even tighter, and the underestimation of Sub-Saharan Africa drops to 0.49 points, while the overestimation of Europe follows in lock-step, falling to 0.04 points. Without imputation, these numbers become 0.97 points of underestimation and, curiously, Europe is actually underestimated by 0.15 points.

In addition to producing replicable estimates, the estimates Lynn produced also weren’t off by much in general. I don’t think this should be surprising, but some people have responded to this with the argument that…

This is a response to the above that is, simply put, a complete lie. The idea here is that the above validation of Lynn’s national IQ estimates is based on looking only at permutations of Lynn’s national IQ estimates, derived largely from the same sets of studies. But there is only one way to arrive at that view, and it’s to simply assume it’s true without looking at the data to see that it’s clearly not.

The wonderful thing about Lynn’s data is that he documents all of his decisions and you can peruse his national IQ database, cutting out studies you don’t like and adjusting all the estimates accordingly. Reasonable changes to Lynn’s adjustments and inclusions don’t make a difference. The data is publicly available, so you can go and confirm that yourself. But an even simpler way to show that Lynn’s estimates hold up is to just see if they hold up in data that’s entirely non-overlapping with Lynn’s data. So for that, I’ll use the World Bank’s HLOs:

The World Bank HLOs correlate at 0.83 with Lynn’s estimates with imputation and 0.89 without imputation. Compared to HLOs, Lynn underestimated Sub-Saharan Africa by a piddling 5.03 points, but he also underestimated Europeans by 0.61 points, and overestimated Oceania, Latin America, and South Asia by 5.71, 2.74, and 5.89 points, respectively.

This is just not consistent with a pattern of strong, European-favoring misestimation, and if we drop Lynn’s imputations that becomes even clearer because the underestimation of Sub-Saharan Africa increases to 5.44 points, while the overestimation of Oceania, Latin America, and South Asia shift to 2.68, 1.97, and 1.92 points. And if we look at Lynn’s later 2012 national IQ estimates, the patterns just aren’t that different, a fact that’s true whether we subset HLOs to be derived completely beyond the dates of Lynn’s data or not.

So to summarize, unless everyone shares Lynn’s biases, then his national IQ estimates do not suggest he was biased in favor of Europeans. The impacts of his imputation methods also suggest that he wasn’t biased in that direction either.

A major argument that Lynn was in the wrong is that some of his samples were disadvantaged, underprivileged, or whatever other euphemisms you might wish to use for being poor, and that perhaps his samples from poor countries were worse in this respect, because he wanted them to look worse. This doesn’t stand up to scrutiny.

Lynn always caveated national IQs based on limited samples. Though you won’t see that if you just look at the resulting national IQ maps, you definitely see that if you look in his books describing what data was used, how it was gathered, and so on. It is true that some of Lynn’s referenced samples were very poor, but this always has to be considered in the light of representativeness. If poverty is the norm for a region, then samples should be poor to be representative. Similarly, if a health condition like anemia is the norm for a region—and in many places, it actually is!—then the best sample will have high rates of anemia, regardless of whether that’s ‘acceptable’ for a sample from a developed Western country.5

We can think about this in the context of a few sample means from poor places which we know to be psychometrically unbiased when compared with U.S. samples. In this case, I’ll reference samples Russell Warne provided data on from Kenya and Ghana.

The samples were from 1997 and 2015 and they used the WISC-III in Kenya and WAIS-IV in Ghana. The Kenyan test-takers came from grade 8 schools and the Ghanaian ones were public and private high schoolers from Accra, alongside a sample of university students.

The problem with this sampling is that educational attainment in Kenya in 1997 and Ghana in 2015 was nowhere near this level. In Kenya, most people didn't even reach grade 8 at the time, and in Ghana, most people don't reach university, yet university students made up 57% of the sample! Furthermore, the Kenyan sample was from Nairobi, meaning it was drawn from what was, as of 2018, Kenya's 2nd-richest county and home to <9% of the population. The situation was even less representative in Ghana, where the sample was from Accra, likewise home to <9% of the population but accounting for a third of Ghana's GDP.

These samples were highly unrepresentative of their respective populations because they were relatively elite. The bias was not as bad in the Kenyan sample: they were one to two SDs above the country as a whole in socioeconomic status. But in Ghana, they were three to four SDs above the average. That's like comparing high school dropouts to college graduates.

Warne knew of this problem and wrote:

As eighth-graders, the members of this sample were more educated than the average Kenyan. In 1997, the average Kenyan adult had 4.7 years of formal schooling; by 2019, this average had increased to 6.6 years.

And:

Like the [Kenyan] sample, this [Ghanaian] sample is much more educated than the average person in the country. In 2015, the average Ghanaian adult had 7.8 years of schooling (which increased to 8.3 years by 2019), whereas 61% of [this sample] had 16 years of education or more.

Knowing this, mentally adjust their scores downward in accordance with how much you think being highly educated and relatively wealthy should overstate their scores relative to the general population. So, what were their scores? Here:

Image

To put these scores into intuitive terms, we need two things: the factor correlation matrix and an assumption of equal latent variances. The correlation matrices were provided in the paper's figures 1 through 3. If we assume that the scale we want is based on a mean of 100 and an SD of 15 ("the IQ metric"), we just place the gaps on that scale (i.e., multiply by 15 and add 100), subtract the number of factors times 100, divide that by the square root of the sum of the elements in the correlation matrix (we'll use the American values, since their sample was larger), and add back 100.
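
For clarity, here is what that arithmetic looks like as a small sketch; the gap values and the two-factor correlation matrix at the bottom are placeholders, not numbers from the paper.

// Convert factor-level gaps (in SD units) into one composite on the IQ metric
// (mean 100, SD 15): scale each gap, sum, and divide by the SD of the
// unit-weighted composite, i.e. the square root of the sum of the factor
// correlation matrix's elements.
function compositeIq(gapsInSd: number[], corr: number[][]): number {
  const scaled = gapsInSd.map(d => d * 15 + 100); // per-factor scores on the IQ metric
  const summed = scaled.reduce((a, b) => a + b, 0) - 100 * gapsInSd.length;
  const sumR = corr.flat().reduce((a, b) => a + b, 0); // 1'R1
  return 100 + summed / Math.sqrt(sumR);
}

// Hypothetical: two factors correlated at 0.6, gaps of -1.0 and -1.2 SD.
const iq = compositeIq([-1.0, -1.2], [[1, 0.6], [0.6, 1]]); // ≈ 81.6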

The Kenyan sample, which was 1-2 SDs above the Kenyan average in terms of education, had a mean IQ score of 79.08 versus the American average of 100. The Ghanaian sample, which was 3-4 SDs above the Ghanaian average in terms of education, had a mean IQ of 92.32 versus the American average of 100. And do note, the American average the Ghanaian sample would be compared to would be lower than the average the Kenyan sample was compared to due to demographic change over time.

These estimates are much higher than Lynn’s, and that makes sense, because these are clearly socioeconomically extremely well-off samples, perhaps not by Western standards, but certainly by their own national standards. But some people think these are the sorts of samples that should be used to represent poor countries. That’s wrong, but the view has its supporters, like Wicherts, Dolan, Carlson and van der Maas, who claimed as much in 2010. But Lynn called this out at the time too:

Wicherts, Dolan, Carlson & van der Maas (WDCM) (2010) contend that the average IQ in sub-Saharan Africa assessed by the Progressive Matrices is 78 in relation to a British mean of 100, Flynn effect corrected to 77, and reduced further to 76 to adjust for around 20% of Africans who do not attend school and are credited with an IQ of 71. This estimate is higher than the average of 67 proposed by Lynn and Vanhanen (2002, 2006) and Lynn (2006).

The crucial issues in estimating the average IQ in sub-Saharan Africa concern the selection of studies of acceptable representative samples, and the adjustment of IQs obtained from unrepresentative samples to make them approximately representative. Many samples have been drawn from schools but these are a problem because significant numbers of children in sub-Saharan Africa have not attended schools during the last sixty years or so, and those who attend schools have higher average IQs than those who do not.

Lynn then reviewed general population studies and those returned a Sub-Saharan African mean IQ in the 60s. He subsequently reviewed primary school studies and the data yielded a median of 71, which he adjusted to 69 to account for the fact that only about 80% of the population gets a primary school education in Sub-Saharan Africa. After that, he reviewed studies of secondary school students, and those returned a mean IQ in the 70s, but that is not an acceptable estimate for a few simple reasons.

(1) many adolescents in sub-Saharan Africa have not attended secondary school and tertiary institutes. For instance, Notcutt estimated that in South Africa in 1950 only about 25% of children aged 7-17 were in schools and “we cannot assume that those who are in school are a representative sample of the population”. (2) Entry to secondary school has generally been by competitive examination, resulting in those with higher IQs being selected for admission. Thus, “entry to secondary schools in the East African countries of Kenya, Uganda, and Tanzania is competitive… approximately 25 percent of the population complete the seven standards of primary school and there are secondary places for 10-12 percent of these”. Similarly, Silvey writing of Uganda around 1970 stated that at this time only 2% of children were admitted to secondary schools and entry is determined partly by a primary school leaving examination, and Heynman and Jamison writing of Uganda in 1972, note that admission to secondary school is based on “achievement performance on the academic selection examination… and there are secondary school places for only one child in 10”.

This is egregious, but this is typical for higher estimates for poor places. They tend to be based on samples that are, as in these cases, relatively privileged, pre-screened for IQ, etc., and despite that still generally not that impressive in socioeconomic or IQ terms. These also aren’t the worst estimates like this that Lynn has noted. My favorite was this:

WDCM include a number of studies that cannot be accepted for a variety of reasons. Their samples of university students are clearly unrepresentative. The Crawford-Nutt sample consisted of high school students (IQ 84) in math classes admission to which “is dependent on the degree of excellence of the pupil’s performance in the lower classes” and described as “a select segment of the population”. The students were also coached on how to do the test and “Teaching the strategies required to solve Matrix problems yields dramatic short-term gains in score”. This is clearly an unrepresentative sample.

Curiously, people usually ‘get this’ sort of representativeness issue when it comes to things like China only sampling from rich, well-off areas to look good in international assessments like PISA and TIMSS, but they don’t get this when it applies to samples that look rich by one country’s standards and poor by the standards of the developed world. It’s precisely those sorts of large-scale examinations which bring me to the easiest way to vindicate Lynn’s sampling in a very general sense.

Large-scale international examinations like PISA have sampling frames, requirements to be met for a sample’s scores to be considered valid and comparable to those of other countries, and if countries meet them, then it’s likely their samples were sufficiently representative to make a statement about a country’s youth or, in the case of assessments like PIAAC, its adults. These large-scale assessments have standards for sampling, and they also have psychometric standards. Their samples end up representative—at least of developed countries (see above in this section)—and their test scores aren’t biased, and these still correlate highly with the results produced by Lynn and his colleagues. This is the basis for the World Bank’s HLOs, so by this point in this article, you’ve already seen this point made multiple times! And to be completely fair to Lynn, he made this point too, people just neglect that he did. It’s no doubt part of why he computed his own academic achievement test-based national IQs (see Footnote 3).

Some people probably still think that some of Lynn’s estimates are too low. Lynn believed the same thing in cases where there just wasn’t enough data, which is why he Winsorized national IQ estimates and expressed his doubts about extremely low estimates.

One researcher recently noted that sub-60 IQs also aren’t empirically supported. Through reviewing additional evidence on countries listed as having sub-60 IQs, he found that each one ended up with either no estimate (due to missingness) or an IQ above 60.

The unstated part of the argument that some national IQ estimates are unrealistically low is that somehow this disqualifies larger portions of the dataset, but that’s a non sequitur.

Some people have claimed that Lynn made poor countries appear to perform worse than they actually do by using samples of children instead of adults. This criticism reveals a lack of awareness that test scores have age-specific norms. But if we assume it’s a legitimate criticism, then it suggests Lynn unfairly advantaged poor countries since the existing not-so-strong evidence shows that those countries are more likely to have scores that decline relatively with age.

During Lynn’s life, he didn’t focus much on psychometric bias. This is fair, because by the time it became a big focus for the field, he was already old. But in any case, as I’ve noted above, we have little reason to think it’s a big concern for most estimates. Furthermore, it’s not even clear that bias is systematic in general. Consider, for example, the comparison of Britons and South Africans I discussed here. In that comparison, there was bias, but it favored the lower-scoring South African group!

In general, when people find representative samples and no psychometric bias, the results aren’t different from what Lynn found, so unless someone wants to substantiate this concern, it’s little more than a waste of ink.6

The Flynn effect is widely misunderstood. I've written an article on this that goes into much greater depth about what I mean. But the important point when it comes to national IQs is that the Flynn effect is not about differences in intelligence, instead, it primarily concerns test bias. The existence of the Flynn effect also doesn’t imply there will be convergence between countries, it cannot be said to be the source of any convergence across countries without evidence that doesn’t currently exist, and the Flynn effect is explicitly adjusted for in national IQ computation. It just doesn’t have any relevance to the discussion because the evidence for larger Flynn effects during catch-up economic growth is extraordinarily poor.

But furthermore, increases in scores for cohorts over time will sometimes reflect bias rather than ability gains and relative cognitive performance across countries is generally very stable. There’s not really a reason to consider this argument, even for countries proposed—but not shown—to have major upward swings in their national IQ, like Ireland:

Image

We know that IQ differences are partially causally explained by differences in brain size. Because development seemingly minimally impacts brain sizes—and it theoretically should not anyway—brain sizes can be used successfully to instrument for national IQs, allowing us to estimate the causal impact of national IQs on outcomes like growth, crime rates, and so on. This has been done and it works well. The same result also turns up using ancestry-adjusted UVR and numeracy measured in the 19th century, both of which also can’t be caused by modern development.

These results suggest that if Lynn is getting the causality backwards from IQ to measures of national economic success, he’s still dominantly correct that national IQs precede development. Not only that, but national IQs are, as mentioned, largely stable over time, despite the world experiencing a lot of development. We can see this very directly using the World Bank’s HLOs again. The paper introducing them includes this diagram, showing (a) percentages enrolled in primary education over time, and (b) HLOs over the same period. Notice the dramatic increase in the former and relative stability of the latter.7

Image

Since we do know there’s a considerable degree of stability in measured national IQs, we can leverage stability as an assumption and see the curious result that, over time, it looks like Lynn’s estimates are vindicated more and more, because development measures have gotten more in line with his national IQs!

Ultimately, people who want to argue Lynn got causality backwards aren’t really taking issue with the national IQ estimates themselves, they’re just specifying an alternative claim that they hope sounds like it can invalidate Lynn’s estimates. But—and here’s the kicker—Lynn firmly believed that national IQs would increase with development; his estimates were point-in-time estimates, not the final letter forever and ever, and he fully expected them to change because he thought very poor places were environmentally disadvantaged. So this isn’t even really an argument against Lynn’s general views per se.

People often reply to national IQ estimates with news articles or misinterpreted scientific articles that they allege show low-scoring countries actually do very well. Two examples that were brought up to me recently were Iran and India.

The example of Iran was a meta-analysis of different studies of Iranians. Someone brought this up to me to claim that Iran actually had a national IQ like America’s, at 97.12. Perplexed, I asked the simple question: Did they use American or British norms? And the answer is ‘no’, the test norms were Iranian, so if anything, these samples had an IQ below what’s expected—that is, a mean of 100. There’s really no need to ask further questions about these results, because being on different norms and not having the handbook handy to make the scores comparable means that they do not permit international comparisons. But this is a pretty standard sort of argument for people to make to contradict Lynn’s estimates.

The example of India was a news report about alleged testing of Indian students by Mensa’s India chapter. The report reads:

In the past couple of months, Mensa India, Delhi, administered its internationally recognized IQ test to over 4,000 underprivileged children in Delhi and NCR as part of a unique project aimed at identifying and mentoring poor children with high IQ. Of the 102 extremely bright children it selected, over a dozen, including Amisha, achieved an IQ score of 145-plus, which puts her in the genius category.

The others achieved IQ scores of 130-145, which puts them in the category of ‘very gifted’ children. The average score in Mensa India’s IQ test is between 85 and 115. Interestingly, all of these children are sons and daughters of labourers, rickshaw pullers, security guards, street vendors, etc.

Now, does this say anything in defiance of Lynn’s numbers? No, because we don’t know the norms. We don’t even know much about these statistics at all, we just have hearsay without accompanying statistics. This barely rises above an anecdote, but it is the sort of thing that people will misinterpret to mean Lynn was wrong, somehow.

Another less common strategy to reject national IQs is to just unreasonably ignore data. For example, today I encountered someone who plotted Haiti’s national IQ over time with two datapoints, one from the late-1940s and the other from the late-1970s. They alleged that Haiti had gotten much smarter over time, with their national IQ rising from around 60 to almost 100. But looking at all available data, this view cannot be supported. What they did to make their claim isn’t even a reasonable way to compute a national IQ. Getting a national IQ estimate requires looking at multiple lines of evidence and qualifying the inclusion of samples and whatnot, whereas their strategy was to act like there were just two studies to discuss and to conclude that the later one was the ground-truth regardless of its reliability or any other of its qualities.

This complaint takes two forms. The first is the more general rejection of intelligence tests measuring intelligence. That’s not relevant to national IQs and it’s poorly supported, but the arguments on that topic are familiar, so I’ll skip to the relevant argument.

The second form is that, for some reason, differences in national IQs are due to different factors than those that explain test performance within populations. This perspective is incompatible with measurement invariance, so it is necessarily wrong for any psychometrically unbiased comparisons. A sense in which the claim can be recovered is that for a given unbiased comparison, there might be mean differences in specific factors rather than in g—a sort of international version of the contra hypothesis to Spearman’s hypothesis. This is not the case for the PISA tests or any other unbiased national IQ comparison of which I’m aware, so while it’s a possible perspective, it’s empirically contradicted at the moment.

Some people prefer achievement tests to national IQs, despite the positive manifold strongly suggesting that achievement tests and IQ tests both measure g8, and confirmatory factor modeling confirming that. The people with that preference also tend to make another, related argument: that achievement tests are not measures of g and cannot be treated as such. But like the arguments above, this one finds no support for international examinations like the PISA tests, which have a strong general dimension, much like standard IQ tests.

People really want national IQ estimates to be debunked or, worse, to feel personally favorable. So they concoct a lot of bad arguments to make it sound like national IQs are off. To recap, here are some of the ways:

  • They confuse themselves and others about how mental retardation is defined and act as though it’s defined by IQ alone, so certain national IQs must be implausible. In doing so, they ignore the importance and existence of norms, as well as the modern definition of mental retardation itself.

  • They claim methodological choices like imputation are extraordinarily biasing when that is not the case; checking shows the choice doesn’t even bias national IQs against the groups they’re claimed to treat unfairly.

  • They claim that sampling is directionally biased in a certain way, when inspection of the data generally shows that, if anything, it’s biased in a way that leads to understated lower-tail cognitive differentiation.

  • They make claims that ‘sound right’ like that comparing children and adults is bad, even when we have age-based norms, so this cannot be a genuine criticism unless it is theoretically qualified that somehow children in certain countries are more disadvantaged than their adults and that this disadvantage translates to lower cognitive performance. I’m not aware of anyone who believes this theoretical qualifier and existing evidence speaks against it.

  • They make groundless extrapolations like ‘The Flynn effect means national IQs are worthless’ or ‘national IQs have changed a lot’ and they refuse to justify these inferences.

  • They look for odd references and outlier studies to justify throwing out a whole corpus of material.

  • Etc.

The common thread between different national IQ criticisms is the weaponization of ignorance. Critics toss out claims they think are right or which sound right, and they don’t check their work. A stand-out example is that people regularly say Lynn was biased against Sub-Saharan Africans because his samples were poor by the standards of the developed world, even though those samples were well-off by the standards of their own countries. But this argument cannot stand. It has no merit, and it only serves to insult the reader and to convince them that the person making the argument has done some of the required work to dismiss Lynn’s estimates, when they’ve really only done the required work to say they’ve just barely cracked open the book!

After throwing out enough criticisms, people feel that national IQ estimates simply cannot stand, that they must be wrong, or so many criticisms wouldn’t be possible in the first place. But on this they’re wrong, and the very fact that they have so many criticisms is a strike against them, because the criticisms are so uniformly bad that they should embarrass the person making them. Making matters worse, critics seem to never go back when they’re shown to be wrong. Their attempted debunkings get roundly debunked and national IQ estimates remain reliable as ever, and they just keep making the same tired arguments that no right-minded person could still believe.

The defining feature of criticisms of national IQs is not all of this lazy argument though, it’s what comes next: insults. People like Lynn are taken to be ‘stupid’, people who believe in a given estimate that upsets people regardless of how well-supported it is are taken to be ‘morons’, and looking into and understanding national IQ estimates becomes less common because, after all, the only people who would look into them are the sorts of ‘stupid morons’ who actually bother to check their work.

Think I missed any big arguments? Want more details about something I said? Need something explained? Noticed a grammatical error, spelling mistake, or other triviality? Think I’m right or wrong about some claim? Have more data for me to look at?

Then tell me, because this is a living post that will be updated over time.

Now enjoy the most up-to-date national IQ map. It’s imputation-free!

Jensen and Kirkegaard 2024


]]>
https://www.cremieux.xyz/p/national-iqs-are-valid hacker-news-small-sites-42723907 Thu, 16 Jan 2025 11:18:16 GMT
<![CDATA[Is there such a thing as a web-safe font?]]> thread link) | @mariuz
January 16, 2025 | https://www.highperformancewebfonts.com/read/web-safe-fonts | archive.org

Unable to extract article]]>
https://www.highperformancewebfonts.com/read/web-safe-fonts hacker-news-small-sites-42723543 Thu, 16 Jan 2025 10:17:01 GMT
<![CDATA[Accessibility essentials every front-end developer should know]]> thread link) | @MartijnHols
January 16, 2025 | https://martijnhols.nl/blog/accessibility-essentials-every-front-end-developer-should-know | archive.org


Many developers view accessibility as an overwhelming task, requiring a lot of extra effort or specialized knowledge. But a few basic practices can make a significant impact.

In this article, I'll walk you through the key accessibility principles I believe every front-end developer should apply when building components, including:

  • Semantic HTML: Use the right elements for interactive and native functionality.
  • Forms: Simplify labels and structure to improve usability for everyone.
  • Keyboard navigation: Ensure users can navigate around with their keyboard.
  • Modals: Modals have many accessibility requirements.
  • Image alt texts: Write better descriptions to make images more accessible.
  • Styling: Enhance accessibility through focus indicators, responsive design, and reduced motion.
  • ARIA Attributes: When and how to use ARIA to fill accessibility gaps.

These practices not only benefit users relying on assistive technologies, they improve the overall user experience (UX).

This article focuses on the basic things you, as a front-end developer, can do to improve accessibility and usability without spending much extra time and effort.

Aside

Most examples in this article use React, but the principles apply to any front-end app. Even if you're not using React, you can still benefit from the practices outlined here.

Semantic HTML

Accessibility begins with semantic HTML; using the correct HTML5 elements for their intended purposes. This helps browsers and tools understand the structure of your page allowing them to provide built-in accessibility benefits. And a nice bonus is that semantic HTML also improves SEO.

Interactive elements

The most important elements to get right are <button> and <a>. These have accessibility features built-in by default, which includes keyboard support (the behavior of which can differ per operating system) and providing semantic meaning to screen readers.

A common anti-pattern in (web) applications are divs with onClick handlers. Never use a <div> with an onClick handler as the only way to make an element interactive. These elements lack accessibility features, which limits the way users can interact with them while making it impossible for screen readers. Moreover, properly using <button> and <a> for interactive elements benefits all users:

  • Links allow users to right-click for a context menu with various actions, or to open it in a new tab by control-clicking it on Windows, command-clicking it on Mac, or by clicking it with the middle-mouse button.
  • Buttons enable users to navigate through your site with their keyboard. This allows power users to speed up their workflows, and is essential for assistive technologies.

If you need custom styling, you can fully restyle a <button> or <a> without sacrificing accessibility. More on styling buttons and links later.

Native elements

Beyond buttons and links, native elements like <select>, <input>, and <textarea> are accessible out of the box. A <select> dropdown, for example, works seamlessly with screen readers and keyboard navigation, providing a consistent user experience without extra work.

While it's tempting to build your own custom components for aesthetic or functional reasons, building accessible replacements for native elements is very difficult and time-consuming. Even though I'm typically not a fan of installing libraries for small problems (as I've written about in my articles on dependencies), in this case it's better to rely on widely-used and mature libraries that already have accessibility covered, like react-select.

Forms

One thing I repeatedly find in projects that I join, is form fields not contained in a <form>.

Every form field should be contained in a <form> with an onSubmit handler and a submit button. This enables browsers and screen readers to identify related fields and provides accessibility and usability benefits, such as allowing users to submit with the Enter key and, on mobile, jumping from field to field within the form without having to close the on-screen keyboard.
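
A minimal sketch of what that looks like in React (the field and handler names are placeholders):

import { FormEvent } from 'react';

// Fields live inside a <form> with an onSubmit handler and a submit button,
// so Enter and the on-screen keyboard's submit action work out of the box.
function ContactForm({ onSave }: { onSave: (data: FormData) => void }) {
  const handleSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault(); // handle submission in JS instead of a page reload
    onSave(new FormData(event.currentTarget));
  };

  return (
    <form onSubmit={handleSubmit}>
      <label htmlFor="email">Email:</label>
      <input type="email" id="email" name="email" />
      <button type="submit">Submit</button>
    </form>
  );
}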

An animated GIF showing a React form with three fields; firstname, lastname and email, and a submit button. Each field is entered using the on-screen keyboard, and arrows atop the on-screen keyboard are used to jump to each next field. Finally the form is submitted using "return" on the keyboard.

Form fields in a form allow jumping between fields and submitting from the on-screen keyboard.

Labels

Every input field must have a clear label describing its purpose. Labels should be linked to the input field by making the for attribute (htmlFor in React) refer to the id of the input field:

<label for="email">Email:</label>
<input type="email" id="email" />

Although it's valid HTML to implicitly link the label and input by omitting the for attribute and wrapping them together in a <label> element, not all screen readers support this properly. To ensure good support across all assistive technologies, it's best to always use the for attribute.

Aside

In React, I'm not a fan of hard-coded ids as components are meant to be easily reusable and may be rendered multiple times on the same page. To avoid id conflicts, you can use React's useId hook to generate unique ids for each field. See Generating IDs for several related elements for an example of how to do this in forms.
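
Roughly, that looks like this (a sketch, not the exact example from the linked React docs):

import { useId } from 'react';

// useId generates an id that stays unique even if the component is rendered
// multiple times on the same page.
function EmailField() {
  const id = useId();
  return (
    <>
      <label htmlFor={id}>Email:</label>
      <input type="email" id={id} />
    </>
  );
}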

Placeholders

Placeholders are not substitutes for labels. They disappear when users start typing, which can leave users confused about what the field is for. They're also often harder to read due to their low contrast. Additionally, placeholders make it harder to identify which fields have not yet been filled, as shown in the image below.

Two forms side-by-side, both with firstname, lastname and email fields and a submit button. Fields on the left form have placeholders, making it appear like fields are filled with example values.
Neither form has been filled, but placeholders in one make that harder to tell.

Always use a proper <label> and try to use placeholders sparingly.

Keyboard navigation

The keyboard is an essential alternative tool to navigating with a mouse. Make sure users can navigate your app logically with the Tab key and trigger actions with Enter. Using native HTML elements like <button> and <a> plays a significant role in making this seamless.

Focus indicators

Focus indicators are essential for keyboard navigation. Never disable focus indicators completely. The :focus-visible selector, rather than :focus, allows you to show focus indicators only when browsers deem it relevant to the user. This provides a solution to the old complaints that focus rings are visually ugly without sacrificing accessibility.

An animated GIF showing a modal with a form for creating a project in MoneyMonk (text in Dutch). The focus indicator moves through the fields, showing the user's current position. At the end, it loops back to the close button.
Jumping through form fields with focus indicator

Aside

The GIF above has a custom field for the "Soort" (type) field. It's fully accessible, as it's built with radio buttons and CSS. The radio buttons are visually hidden but fully accessible; they can still be selected and are announced by screen readers. This highlights the power of using semantic HTML as much as possible.

Modals

Modals are common in larger web applications but it can be challenging to make modals accessible. The key challenges to consider are focus management, keyboard navigation, and ensuring inactive content is hidden.

The easiest way to make modals accessible is again by using the power of semantic HTML; use the <dialog> element. This element now has solid browser support and addresses most accessibility concerns with modals, including keyboard navigation.
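
As a rough sketch of using it from React (the component and prop names are mine; showModal() is what gives you the modal behavior, backdrop, and built-in Escape handling):

import { useEffect, useRef } from 'react';

// showModal() opens the dialog as a modal with focus handling and
// Escape-to-close built in; close() dismisses it again.
function ConfirmDialog({ open, onClose }: { open: boolean; onClose: () => void }) {
  const ref = useRef<HTMLDialogElement>(null);

  useEffect(() => {
    if (open) {
      ref.current?.showModal();
    } else {
      ref.current?.close();
    }
  }, [open]);

  return (
    <dialog ref={ref} onClose={onClose}>
      <p>Are you sure?</p>
      <button onClick={onClose}>Close</button>
    </dialog>
  );
}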

Custom modals

If you're building a custom modal without <dialog>, there are many accessibility factors to account for. Modals are tricky to get right, and creating a fully accessible one from scratch is a significant challenge. Covering all the details would take an entire article, but here are some key points to consider:

Focus management

When a modal opens, the user's focus remains on the button that opened the modal, making it difficult for users to interact with the newly opened modal. This can also lead to users accidentally opening multiple instances of the same modal.

To address this:

  1. Set the focus to the modal as soon as it opens.
  2. Implement a focus trap to keep the focus within the modal so users cannot tab to the underlying page.
  3. Return focus to the triggering element when the modal closes.

A library like react-focus-lock provides good solutions for this. It handles the initial focus, traps the focus so it cycles only through active elements within the modal, and can restore focus to the triggering element when the modal closes using its returnFocus option.
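
Roughly, that usage looks like this (a sketch; check the library’s documentation for the full API):

import { ReactNode } from 'react';
import FocusLock from 'react-focus-lock';

// FocusLock traps Tab/Shift+Tab within its children; returnFocus restores
// focus to the element that opened the modal once the lock unmounts.
function Modal({ children }: { children: ReactNode }) {
  return (
    <FocusLock returnFocus>
      <div role="dialog" aria-modal="true">
        {children}
      </div>
    </FocusLock>
  );
}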

Aside

For confirmation dialogs, consider setting the initial focus to the "Confirm" button. This allows users to immediately confirm an action by pressing Enter, just like in native dialogs.

Inactive content

When a modal opens, the content behind it is usually blocked visually by a backdrop. However, users, especially those using screen readers, may still be able to interact with the underlying content.

To prevent this, add the inert attribute to the content behind the modal. This makes the content non-interactive and hides it from assistive technologies.

To use inert in React, you need to portal your modal out of your main content. This ensures it falls outside of the inert scope, as inert applies to all children and cannot be disabled on child elements.
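
A sketch of the idea is below. How React exposes inert differs between versions, so here the attribute is toggled manually on an assumed #root container; treat it as an outline rather than copy-paste code.

import { ReactNode, useEffect } from 'react';
import { createPortal } from 'react-dom';

// The modal is portalled to document.body, outside the app's root container,
// so marking that container inert blocks the background content (and hides it
// from assistive technologies) without affecting the modal itself.
function Modal({ open, children }: { open: boolean; children: ReactNode }) {
  useEffect(() => {
    const appRoot = document.getElementById('root'); // assumed main content container
    if (open) {
      appRoot?.setAttribute('inert', '');
    } else {
      appRoot?.removeAttribute('inert');
    }
    return () => appRoot?.removeAttribute('inert');
  }, [open]);

  if (!open) return null;
  return createPortal(
    <div role="dialog" aria-modal="true">{children}</div>,
    document.body,
  );
}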

Closing modals

Users should be able to close modals with the Escape key. This is a common pattern that users expect that benefits users with mobility impairments and improves the overall user experience by providing a consistent way to dismiss modals.
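
For a custom modal (the native <dialog> element already does this for you), a small hook like the following is enough (sketch):

import { useEffect } from 'react';

// Call onClose when the Escape key is pressed while the modal is mounted.
function useCloseOnEscape(onClose: () => void) {
  useEffect(() => {
    const handler = (event: KeyboardEvent) => {
      if (event.key === 'Escape') {
        onClose();
      }
    };
    document.addEventListener('keydown', handler);
    return () => document.removeEventListener('keydown', handler);
  }, [onClose]);
}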

Image alt texts

Alt texts are essential for making images accessible, and as a nice bonus, they improve SEO by helping search engines understand your content better.

You should add the alt attribute to images without exception. Use an empty alt text (alt="") only for images that are purely decorative or redundant to the text; this makes screen readers skip over images.

Writing alt texts

Writing good alt text is hard, and many guidelines on the internet are confusing. Over the years, I've developed a rule of thumb that works for me:

Imagine explaining the image to someone with poor vision. They can see some parts of the image, but can't make out everything. The alt text has to fill in the gaps of what they're seeing and not seeing.

A key takeaway from this approach is that if the image contains text, that text should always be included in the alt text. I find this approach leads me to add alt text to images more often than other guidelines would typically suggest.

Styling

Many aspects of styling (i.e. design) play a role in accessibility, such as:

  • Focus indicators: Highlight the focused element with an outline (as covered earlier).
  • Interactive elements: Ensure links look like links and buttons look like buttons, and they are easy to interact with.
  • Interactivity feedback: provide clear hover, active and disabled states.
  • Color contrast: Use sufficient contrast to distinguish elements.
  • Colors: Pair colors with text or icons for users with color blindness.
  • Responsive design: Support custom font sizes and zooming.
  • Animations and motion: Reduce or disable motion for users sensitive to it.
  • Font and spacing: Use clear fonts with adequate spacing, particularly for users with dyslexia.

Most of these are design-driven and fall outside of our direct influence. I'll focus on the areas where we can have the most direct impact.

Clickable areas

Ensure buttons and links have large, easily clickable areas for mouse and touch users. You can easily achieve this by adding padding to the element and, if necessary, using negative margins to make it appear visually equal.

A side-by-side of a modal close button. Left side shows the button visually, while right side shows it focused with the clickable area around it being much bigger. A cursor is on the focused button to better illustrate the clickable area.
A side-by-side of a modal close button, showing its clickable area.

Reduced motion

Animations can enhance usability by helping users maintain orientation on a page, such as when transitioning between states. However, some users have motion sensitivity and have opted for reduced motion in their OS settings. Respect this preference by disabling animations and transitions where applicable. This can be done with a simple media-query, such as:

@media (prefers-reduced-motion: reduce) {
  .modal {
    animation: none;
  }
}

Accessible responsive design

One little-known fact is that browsers allow users to customize the default font size for web pages.

Accessible responsive design involves ensuring layouts adapt to the user's font size preference and the zoom levels they might use. Webpages should respect these preferences by using relative units like em and rem for font sizes, margins, text block widths and other layout values. Hardcoding these values in pixels should generally be avoided.

Applied to this blog

This blog uses relative sizing in most places (although it's far from perfect as it was a late addition). It doesn't set a base font size, using whatever is configured in the browser. From there, most other sizes are relative to that by using em values, and rem where necessary.

One example that really drove home the value of using em for non-text elements, is the width of this article text. I set it to 57em so that code blocks perfectly match the 80-character column width that I use in my IDE (plus it comes close to the ideal word count per line). Because the container scales with font size, the amount of words per line remains consistent regardless of the user-configured font size.

Using em for margins also makes a lot of sense, especially around text elements, as whitespace tends to grow with font size.

ARIA attributes

An article on accessibility wouldn't be complete without mentioning ARIA attributes, even if it's focused on things benefitting all users, not just those relying on screen readers.

While semantic HTML is a good starting point, ARIA attributes should only be used as a last resort when semantic elements can't achieve the desired result. Misusing ARIA can do more harm than good, so it's best to use them sparingly and thoughtfully.

The two most important ARIA attributes are:

aria-label

Adds an accessible label to elements that do not have visible text. For example, a search button with only an icon should include an aria-label clarifying it:

<button aria-label="Search">
  <SearchIcon />
</button>

While it can technically be added to any element, aria-label should only be used on interactive elements. It is not supported on non-interactive elements, and using it there can result in the label being ignored or causing confusing, unexpected, or annoying announcements.

aria-hidden

Hides elements from screen readers without removing them visually. This is ideal for decorative or redundant elements:

<div>
  React <ReactLogo aria-hidden />
</div>


While these two attributes are a great starting point, there are many more ARIA attributes that you will need if you decide to go for full screen reader support. Some noteworthy ones are aria-live, aria-expanded, and aria-describedby, but it quickly becomes quite involved if you want to do it right.

Conclusion

These are the tools and principles that I reckon every front-end developer should use when building components. Accessibility isn't a separate task to tackle later, it's something that should be a part of your development process from the start.

As we've seen, most accessibility improvements don't just benefit users with specific needs, they enhance the usability and overall user experience (UX) for everyone. There are even SEO benefits, as search engines may rank sites higher that demonstrate good accessibility practices.

While the changes outlined in this article cover the basics and can take you a long way, full accessibility requires more effort. At some point, you'll actually need to test your app with a screen reader to ensure it truly works for all users.

With these foundational practices in place, you'll be well on your way to creating inclusive and user-friendly applications for everyone.

]]>
https://martijnhols.nl/blog/accessibility-essentials-every-front-end-developer-should-know hacker-news-small-sites-42723465 Thu, 16 Jan 2025 10:06:54 GMT
<![CDATA[Laptop archeology or how to install NixOS 24.11 on a 25 year old laptop]]> thread link) | @todsacerdoti
January 16, 2025 | https://blog.mynacol.xyz/en/nixos-on-fossils/ | archive.org

Unable to retrieve article]]>
https://blog.mynacol.xyz/en/nixos-on-fossils/ hacker-news-small-sites-42723185 Thu, 16 Jan 2025 09:19:42 GMT
<![CDATA[Enhancing GitHub Actions Observability with OpenTelemetry Tracing]]> thread link) | @de107549
January 16, 2025 | https://www.dash0.com/blog/enhancing-github-actions-observability-with-opentelemetry-tracing | archive.org

Why Use OpenTelemetry Tracing for GitHub Actions?

OpenTelemetry tracing offers several benefits when applied to GitHub Actions:

  1. End-to-end visibility: Trace the entire lifecycle of your workflows, from trigger to completion.
  2. Performance optimization: Identify bottlenecks and slow-running steps in your pipelines.
  3. Error detection: Quickly pinpoint where and why failures occur in your workflows.
  4. Dependency analysis: Understand how different jobs and steps interact within your workflows.

Implementing OpenTelemetry Tracing in GitHub Actions

Implementing OpenTelemetry tracing for your GitHub Actions workflows is surprisingly simple. You can achieve this with a single workflow file that utilizes the corentinmusard/otel-cicd-action action.

To set it up, create a new workflow file in your repository’s GitHub Actions workflow directory .github/workflows/ with the following content:

.github/workflows/otel-traces.yaml


name: Export OpenTelemetry Trace for CI

# Trigger on completion of the workflow(s) you want to trace; "CI" is assumed
# here, so adjust the list to match your own workflow name(s).
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  otel-export-trace: # any job id works
    name: OpenTelemetry Export Trace
    runs-on: ubuntu-latest
    steps:
      - name: Export Workflow Trace
        uses: corentinmusard/otel-cicd-action@v1
        with:
          otlpEndpoint: ${{ secrets.DASH0_OTLP_ENDPOINT }}
          otlpHeaders: ${{ secrets.DASH0_OTLP_HEADERS }}
          githubToken: ${{ secrets.GITHUB_TOKEN }}
          runId: ${{ github.event.workflow_run.id }}

The action requires the configuration of two secrets to describe where and how to export workflow telemetry:

  • DASH0_OTLP_ENDPOINT: grpc://ingress.eu-west-1.aws.dash0.com:4317
  • DASH0_OTLP_HEADERS: Authorization: Bearer auth_XXXXXXXXXXXXXXXXXXXXXXXX

Understanding the Configuration

Let's break down the key components of this workflow:

Trigger

This workflow is triggered when the specified workflows (i.e. CI) complete their execution. This ensures that tracing data is collected after the workflows have finished.

Job Configuration


jobs:
  otel-export-trace: # any job id works
    name: OpenTelemetry Export Trace
    runs-on: ubuntu-latest

A single job named "OpenTelemetry Export Trace" is defined, with the latest Ubuntu runner.

Trace Export Step


- name: Export Workflow Trace
  uses: corentinmusard/otel-cicd-action@v1
  with:
    otlpEndpoint: ${{ secrets.DASH0_OTLP_ENDPOINT }}
    otlpHeaders: ${{ secrets.DASH0_OTLP_HEADERS }}
    githubToken: ${{ secrets.GITHUB_TOKEN }}
    runId: ${{ github.event.workflow_run.id }}

This step uses the corentinmusard/otel-cicd-action to export workflow telemetry in the form of an OpenTelemetry trace. The action requires several inputs:

  • otlpEndpoint: The OpenTelemetry Protocol (OTLP) endpoint where the trace data will be sent.
  • otlpHeaders: Headers required for authentication with the OTLP endpoint.
  • githubToken: A GitHub token with appropriate permissions to access workflow data.
  • runId: The ID of the workflow run, used to identify which execution to trace.

Benefits of This Approach

  • Simplicity: With just one workflow file, you can start collecting tracing data for your GitHub Actions.
  • Flexibility: The action can be easily configured to work with different OTLP endpoints and authentication methods.
  • Non-intrusive: This tracing method doesn't require modifications to your existing workflows.
  • Comprehensive: It captures data for entire workflow runs, providing a complete picture of your CI/CD process.

By leveraging OpenTelemetry tracing in your GitHub Actions, you're taking a significant step towards more observable, efficient, and reliable continuous integration and delivery processes.

Using Dash0 for CI/CD OpenTelemetry data

Here are some screenshots of what the GitHub action traces look like inside Dash0. You can find all GitHub action traces in the Tracing view. You can either search by service.namespace = CI-CD, which matches your GitHub action workflow name, or use service.namespace = <GITHUB REPOSITORY>.

You can then slice and dice through your data using Dash0’s product capabilities.

Dash0 Tracing view showing spans from GitHub actions. Red spans are failed GitHub action steps. This view shows start time, duration, github conclusion and github author name.

The tracing view gives you full insights which build steps take the most time and where it might benefit the most to invest engineering efforts to reduce CI build times.

Shows details for a GitHub action trace with all its child spans. We show the GitHub action step names and the duration of how long a step took.

While writing this blog post I discovered that the “Test Helm Charts” step was taking 2m 19s in total. That seemed too long to me. In the screenshot below you can see that most time was actually spent on the “Checkout” step. It was checking out the complete repository including all branches and tags which was not necessary.

Shows details for a GitHub action “Checkout” step that took almost 2min

Filtering by “dash0.span.name = Checkout” quickly revealed all places that might be misconfigured. Sometimes we need to check out all branches and tags, but for certain build steps that is not necessary.

Shows Tracing heat map with spans highlighted with duration around 2min

You can also create custom dashboards based on the GitHub action spans. These dashboards can help you identify where most of the time is spent.

Shows a dashboard that is based on GitHub action span metrics.

For your convenience, here are the used PromQL queries:

Top 20 - Average Span duration in minutes:

PromQL


# Core aggregation by workflow/job/step; wrap in topk(20, ...) to keep the top 20.
sum by (service_namespace, service_name, otel_span_name) (
  {otel_metric_name="dash0.spans.duration", service_namespace="dash0hq/dash0"}
)

PromQL


sum by (github_conclusion) (
  {otel_metric_name = "dash0.spans", service_namespace = "dash0hq/dash0"}
)

You can also easily build a dashboard that shows successful deployments to development and production environments as the one below.

Shows a dashboard with deployment metrics derived from GitHub action spans. We see deployment numbers for development and production.

Summary

By implementing OpenTelemetry tracing in your GitHub Actions workflows, you can gain valuable insights into your CI/CD processes, leading to more efficient and reliable pipelines. This enhanced observability allows you to optimize performance, quickly identify and resolve issues, and better understand the interactions within your workflows. As the complexity of software development continues to grow, tools like OpenTelemetry tracing become increasingly crucial for maintaining agile and effective CI/CD practices. Embrace this powerful technology to take your GitHub Actions workflows to the next level of observability and performance.

]]>
https://www.dash0.com/blog/enhancing-github-actions-observability-with-opentelemetry-tracing hacker-news-small-sites-42723144 Thu, 16 Jan 2025 09:13:29 GMT
<![CDATA[Setting Up an RK3588 SBC QEMU Hypervisor with ZFS on Debian]]> thread link) | @kumiokun
January 16, 2025 | https://blog.kumio.org/posts/2025/01/bananapim7-hvm.html | archive.org

Unable to retrieve article]]>
https://blog.kumio.org/posts/2025/01/bananapim7-hvm.html hacker-news-small-sites-42722870 Thu, 16 Jan 2025 08:31:40 GMT
<![CDATA[Pixels Per Degree]]> thread link) | @yamrzou
January 15, 2025 | https://qasimk.io/screen-ppd/ | archive.org

Enter Screen Information

]]>
https://qasimk.io/screen-ppd/ hacker-news-small-sites-42722460 Thu, 16 Jan 2025 07:40:28 GMT
<![CDATA[Just Like a Book]]> thread link) | @ingve
January 15, 2025 | https://thefoggiest.dev/2025/01/16/just-like-a-book | archive.org

Just like a book

January 16, 2025

When you buy a book, the paper kind I mean, you can read it, give it away, sell it or keep it yourself after reading it. The point is, it's up to you. This holds for most physical objects, like houses, pianos and tomatoes. Not so with software. Usually, when installing an application on whatever device, you promise to have worked your way through a long end-user licence agreement that tells you in terse legalese that you can use the application personally and privately, not professionally, on just that device, and you can certainly not make a copy, let alone sell it to someone else when you’re done with it.

Now consider the licence below, which came with Borland’s Turbo Pascal IDE 3.0 for PC-DOS, MS-DOS, CP/M-86 and CP/M-80 (click in the picture to zoom to a readable size):


Borland’s No-Nonsense Licence Statement! (click to zoom)

This licence statement is from 1985. Not only is it short enough to read, it contains no ambiguity and is perfectly clear for the intended audience (the software engineer). Staggeringly, it is also perfectly reasonable. You can make backup and archival copies of the software (in fact, on page 7 in chapter 1, under “Before Use”, it is strongly suggested you do) and you can give copies to others, as long as these copies aren’t used at the same time, “just like book.”

One could imagine that this licence was kind enough to be fully ignored. If copies were allowed and made, it is not unfathomable to see companies using more than one copy at a time. However, according to Wikipedia, the time in which the statement above was born, was when companies had few people who understood the growing personal computer phenomenon and so most technical people were given free rein to purchase whatever software they thought they needed. If that is true, then the license could very well be part of a marketing campaign that was targeting developers with their own budgets.

If you read carefully, you’ll notice it even ends with a joke. Of all the non-open source licenses I have seen, this one must be my favourite.

]]>
https://thefoggiest.dev/2025/01/16/just-like-a-book hacker-news-small-sites-42722195 Thu, 16 Jan 2025 06:55:04 GMT
<![CDATA[Using PCBs to create front panels for your projects]]> thread link) | @todsacerdoti
January 15, 2025 | https://arx.wtf/blog/1-front-panels-tips/ | archive.org

Regular FR4 or aluminum PCBs can be used to create cheap (but beautiful) front panels for electronic projects. In this article, we'll explore various tips & techniques used to obtain the best results.

Example of an aluminum front panel. Black solder mask, white silkscreen (text), creamy dielectric on core (the frog) and silver HASL finish (the leaves).

It wouldn't be possible to compile these tips without reading through this massive thread on ModWiggler first. Huge thanks goes to everyone who contributed to the thread!

I personally use Kicad and JLCPCB to create my front panels. However, these tips are pretty universal and should apply to other CAD software and fab houses as well.

The test PCB

A PCB to test the capabilities of a fab house.

In order to test the capabilities of a fab house and demonstrate the points in this article, I created a PCB using various techniques. The bottom half of the board has the solder mask as the "background color" and shows the text and graphics on the silkscreen, on copper (HASL) and core. On the top half, the background is done by a continuous silkscreen fill, and the text and graphics are done on solder mask (an opening in silkscreen fill), copper and core.

Furthermore, the left half of the PCB has a solid copper fill; the right half has a cross-hatched copper fill. Smaller rectangular regions of different cross-hatch parameters (hatch width and gap) are present in the middle as well.

The PCB contains 4 smaller mounting holes (which are drilled), one big hole (which needs to be routed) and cutouts and ovals of various sizes (which are routed as well). The three top inner cutouts are rectangular with 90-degree angles - which technically isn't possible with JLCPCB. The three top-most protrusions on the outer board edge have 90-degree angles as well. The three bottom cutouts and the protrusions on the outer edge have the border radius of 0.8mm.

The project is available on my Github - you are free to use the gerbers or adjust the PCB to your needs first. If you happen to test the capabilities of a fab house, feel free to send me the pictures to arx@synth.sk!

I had this board manufactured by JLCPCB in two versions: one with white solder mask and black silkscreen, the other with black matte solder mask and white silkscreen. The core is aluminum. Here are the results. All the images are clickable and the grid on the background is in centimeters.

  1. Test PCB - white solder mask, black silkscreen, HASL finish, creamy ALU core dielectric
  2. Test PCB - black solder mask, white silkscreen, HASL finish, creamy ALU core dielectric

1. Fab house capabilities and communication

Since no two PCB manufacturers (fab houses) have the same processes, this is the prerequisite for all the other tips. Study the fab house capabilities on their website. These can also change from time to time, so it's better to check before every order. Ask their customer support about their processes before ordering, or be prepared to iterate. Search the internet for examples of previously finished panels. Communicate all your intentions clearly in the order notes. Tell them this is a front panel, not a standard electrical circuit, so they don't have to search for electrical errors.

A note on the word "panels": manufacturers also use this word for the process of panelizing multiple PCBs into one bigger PCB (see tip #4). Keep this in mind and use "front panels" or "faceplate" in communication.

2. Choose your design tools

For PCB design, some use Kicad, others EasyEDA (JLCPCB's tool), others Eagle or Altium. There are countless other choices, some free, some paid. Simple panels with text and basic shapes can be done directly in Kicad or EasyEDA for free.

For creating advanced artworks, use vector graphic editors like Inkscape (free) or Adobe Illustrator (paid). Exporting the artwork in DXF makes it easier to import to Kicad. Kicad also includes Image Converter for importing other (non-vector) types of graphics.

3. Consider the size of your front panel

This generally depends on the enclosure you'll be using for your finished project. Different fab houses can handle different maximum sizes, or have different tiers of service for different sizes of boards. In JLCPCB, keeping the total size of your boards under 100x100mm will get you the cheapest price.

4. Consider panelizing your...front panels

In the PCB manufacturing world, panelizing is the process of repeating the same PCB design on one physical board multiple times, or using different designs next to each other on one physical board. Notice this is a different usage of the word "panel" - we're not talking about "front panels" now, but about "collating multiple boards on one PCB panel". The individual boards on a panel are separated by e.g. v-scores or mousebites. These panels are then produced as a single board, which can save you money if you intend to create lots of front panels.

The fab houses have different panelizing abilities. JLC can panelize your boards for you, if you indicate it in their order form - for a fee, and the individual boards must be rectangular. Or you can panelize them yourself, see Kikit plugin for Kicad. You will have to study your fab house requirements and recommendations to successfully create a panel of PCBs.

Keep in mind that the process of de-panelizing of individual boards from the panel can leave marks and rough edges on the finished boards.

5. Choose the PCB core material

The two most common choices are 2-layer FR4 (fiberglass - the classic PCB material) and 1-layer (single-sided) aluminum.

Comparison of stackups for 2-layer FR4 and 1-layer ALU PCB

About FR4 front panels

  • FR4 is less rigid - it might bend considerably if, for example, your users are going to insert jacks into huge panels. The panel can crack or the fibers can start falling off from the sides of the panel. Consider sealing the sides of your board with some polyurethane or acrylic product, or shellac (but don't clean it with alcohol afterwards).
  • You can work with both sides of the board. If you leave out copper and solder mask in certain areas on both sides of the board, you can achieve certain level of transparency of the panel and use this together with an LED for nice effects.
  • The copper fills (on both sides!) also affect the color of the solder mask on your front panel.
  • You can use different designs on the front and back side of the board. This can be used to add easter eggs to the back, serial numbers, additional info or whatever. Or your users can choose which side they prefer (if your panel is symmetrical and can be flipped, that is).
  • You can mount SMD packages on both sides of the panel. You can use through-hole components too.
  • You can design touch controls by using traces, vias, and an SMD connector on the back side.
  • The panel can be grounded by using copper fills and connecting the fill to ground on your main PCB somehow (with a via and a connector on back?).

About aluminum front panels

  • It's more rigid, less prone to bends and cracks.
  • You can usually work on one side of the board only (fab limitation), or it might be the cheaper option.
  • The core ALU material has no transparency. Using a copper fill on the back side (in case you order two-sided ALU board) will not affect the looks of the front side.
  • You can't use SMDs on the back (if it is a one-sided board). Through-hole components won't work if the holes are not individually insulated, which is expensive. The PCB core is conductive, so you'll end up with lots of electrical shorts.
  • The panel can be grounded by connecting it to ground on your main PCB, for example by using a battery contact with a spring touching the back side of the panel. Sand down any protective layers on the back side of the panel first.
  • You can still design touch controls by using traces, but connecting them back to your main circuit PCB might be problematic - you can't use a via and an SMD connector on the back.
  • It's generally more expensive.
  • Minimum size of the drill used for routing might be larger, limiting the border radius of your cutouts.

6. Board thickness and rigidity

The standard PCB thickness is 1.6mm. Sometimes you might want to choose more (2mm), especially for larger FR4 boards with large female jacks. This will probably increase the cost and affect the transparency of the board.

Using copper fills on both sides of FR4 boards can improve their rigidity. Mounting another regular PCB in parallel behind the front panel with jacks and standoffs connecting the panel to the second PCB can also help. Bolting two identical panels together (e.g. with jacks coming through both of them) will help, but the jacks must have sufficiently long bushings.

7. Color combinations

This is your color palette:

  • Silkscreen color: usually black or white. This is the top-most layer, printed on solder mask. It can also be printed directly on core, but that might peel off easily. Can also be used for large silkscreen fills, which can lead to uneven surface and "streaks" of silkscreen color.
  • Solder mask color: black, matte black, white, green, red, yellow, blue, etc. Standard is green, any other color will probably be more expensive. Keep in mind that the color of the solder mask is usually different (brighter) if there is copper underneath it. With FR4 boards, the presence of copper and solder mask on the other side of the board can also change the solder mask color, because of core transparency. Matte colors are usually more expensive and might collect fingerprint marks more easily, especially on black solder mask.
  • Color of exposed copper - traces, pads and fills. These will be visible if you leave a solder mask opening above them. It can either be plain copper (which is unstable), gold (ENIG finish) or silvery color (HASL).
  • Color of the exposed core: yellow-ish with FR4, creamy with ALU on the front side (actually a dielectric layer on ALU core) or the color of aluminum on the back side of ALU PCB. You'll only see this color if there is no copper and no solder mask on top of the core. With FR4, presence of copper and solder mask on the back side changes this color because of core transparency.

Colors of white solder mask test PCB

  1. White solder mask, black silkscreen, solid copper fill, HASL finish, ALU core
  2. White solder mask, black silkscreen fill (notice the poor surface quality), solid copper fill, HASL finish, ALU core
  3. White solder mask, black silkscreen, cross-hatched copper fill, HASL finish, ALU core
  4. White solder mask, black silkscreen fill on top of cross-hatched copper fill, HASL finish, ALU core

Colors of black solder mask test PCB

  1. Matte black solder mask, white silkscreen, solid copper fill, HASL finish, ALU core
  2. Matte black solder mask, white silkscreen fill (notice the poor surface quality), solid copper fill, HASL finish, ALU core
  3. Matte black solder mask, white silkscreen, cross-hatched copper fill, HASL finish, ALU core
  4. Matte black solder mask, white silkscreen fill on top of cross-hatched copper fill, HASL finish, ALU core

The frog is creamy dielectric on core with copper outlines (HASL). Black matte solder mask, white silkscreen.

I also recommend this Reddit thread to see more examples of the color combinations on FR4 boards.

8. Copper finish

Some fab houses will allow you to use raw, unfinished copper on traces and fills. This is unstable and will change its color with time and finger touches.

The two most common copper finishes are ENIG (gold) and HASL (silver). You can't use both of them on the same board.

ENIG (Electroless Nickel Immersion Gold) usually looks more professional and smooth. HASL (Hot Air Solder Leveling) can be uneven and prone to scratches. Use lead-free HASL (which is usually more expensive), since users will be able to touch it and lead is toxic.

HASL vs ENIG

  1. HASL finish. ALU board, white solder mask, black silkscreen. Notice the scratches.
  2. ENIG finish. FR4 board, black solder mask, white silkscreen.

9. Artwork resolution

Silkscreen is usually the least precise layer. Expect printing errors, possible production smears, etc. Depending on the fab process, the silkscreen might start to peel off with usage. As such, it's best to use it for non-critical accents in the artwork and text (watch out for the minimum text size capability). Printing silkscreen directly on core is not recommended - it's easy to scratch off, especially on ALU PCBs.

Traces and solder mask openings are usually much more precise. If you want to "write text" or shapes with copper, use a copper fill and expose the solder mask in the desired shapes. If you don't want to use a copper fill, the copper traces can have exactly the same shape as solder mask openings, but expect imperfections.

Keep in mind that with solder mask openings, you can create a precise image or text directly on the core (or the core dielectric in case of the ALU core), if there's no copper on top of the core. Such openings will be visibly under the level of the solder mask. See the picture of frog in tip #7.

As for the artworks themselves - google "pcb art" and be impressed.

10. Texture and smoothness

It's recommended to use copper fills under solder mask to make it more smooth and hide any imperfection of the core. This will make the solder mask color brighter, though, and will leave a small border on the edges of the board and holes / cutouts, since the copper fill can't reach right up to the edge. Check the minimum clearance capability of your fab house, it's usually called copper-to-edge, copper-to-hole, track-to-edge etc.

Notice the (uneven) clearance between the copper fill and the oval cutouts, and on the board edge.

Always expect imperfections and variable quality, especially with large silkscreen fills.

Using cross-hatched copper fills can help against scratches and can look cool. It also makes silkscreen fills look better. You'll have to experiment with hatch widths and spacings (gaps).

Using cross-hatched copper fill

  1. White solder mask with cross-hatched copper fill. Notice the better-looking black silkscreen fill on top. Different hatch widths and gaps in the middle.
  2. White solder mask on solid copper fill, and some more cross-hatches in the middle. Poor quality of black silkscreen fill on top.
  3. Black solder mask with cross-hatched copper fill. Better-looking (but still poor) white silkscreen fill on top. Different hatch widths and gaps in the middle.
  4. Black solder mask on solid copper fill, and more cross-hatches in the middle. Poor quality of white silkscreen fill on top.

You can also use vias for artistic purposes. Choose between tented (covered by solder mask) or untented vias (exposed, with visible annular ring and the drilling hole).

11. Holes for jacks / pots and cutouts

Check the minimum and maximum hole size the fab house can drill with a regular drill. With JLCPCB, it's 0.3mm - 6.3mm. Include such holes in the drill file with gerbers. Also, check if plated holes (with copper on the inner sides of the hole) will have the diameter specified by you, or the diameter will be decreased by the thickness of plating. The fab house might also have different min and max sizes for plated and unplated holes. Ask the fab house if unsure.

The fab house can also make holes larger than their max drill size, but they will have to be milled (routed) or laser-cut. Check if such larger holes can be plated and if it will cost extra. Large holes should be placed on the edge cuts layer (in Kicad), also called the mechanical layer (Altium) or the .gm1 layer (Protel extension). Routed holes might have rougher edges than simple drilled holes.

Always add some tolerance to hole sizes required for pots and jacks - increase their diameter slightly. For example, use 7.2mm hole for pots with M7 bushing. Check the datasheets of your jacks and potentiometers for any tips on panel hole size.

Oval shapes will always need to be routed as well and may have different min/max capabilities than simple holes.

Rectangular cutouts with sharp precise 90 degree corners might not be manufacturable. Check the minimum inner corner border radius, or ask the fab house what their minimum drill size used for routing is. For example, if the minimum drill size is 1mm, then the minimum border radius is 0.5mm. The drill sizes can be different for FR4 and ALU boards - for JLC (as of 2024), minimum drill size for FR4 is 1mm (border radius of 0.5mm) and for ALU 1.6mm (border radius of 0.8mm). These specs can change.

Imperfections in holes and cutouts

  1. Imperfectly routed big hole.
  2. This inner cutout was designed with 90 degree corners, which is not supported by JLCPCB.
  3. This outer protrusion was also supposed to have 90 degree corners.
  4. The outer corner, even though defined with a border radius, is also routed imperfectly.

Keep in mind that if you use non-rectangular shapes of your front panel, or add cutouts to the outer edge of your front panels, you will most likely have to panelize the boards yourself - if you want to panelize them (see tip #4).

12. Generating and uploading gerbers, ordering

For generating the output files to be used by your fab house, follow the instructions given by them. They are usually in their FAQ, Knowledge Base or Blog sections of their websites. For example, for Kicad and JLCPCB, use this.

Some fab houses like JLCPCB will print an order number on your front panel (silkscreen). You can specify a placement for this number by using "JLCJLCJLCJLC" text on the silkscreen layer of your board (and indicating it when placing the order). You can place it on the other side of the board, for example - in JLCPCB, this also works for one-sided ALU boards! Or you can ask them to not place the number at all, for a fee. If you forgot and already received boards with the numbers printed on them, you can try carefully removing them with a scalpel and clearcoat the panel afterwards.

The fab house might contact you with additional questions about your boards.

Pay attention to shipping options - sometimes, using the slowest and cheapest option might be better than using a courier service which will force you to pay additional import fees and taxes. For the European Union, it might be better to use shipping options labeled IOSS / DDP (taxes and import fees paid upfront by the fab house).

With cheap fab houses, always order more front panels than you need. Be prepared to receive boards that have scratches, various defects or marks left by tools the fab house had to use to manufacture the board. PCB manufacturers specialize in manufacturing functional PCBs, NOT artistic front panels. Sometimes you get lucky and most of the boards are usable; other times you can throw half of them away. Some fab houses will accept your complaints, some won't. If possible, use the option to add sheets of paper between individual boards before shipping.

  1. Scratches, and the silkscreen is already peeling off.
  2. The color of the core dielectric changed between two batches. The new color is less readable.

There are, of course, specialized front panel manufacturers which offer superior quality - but for a much higher price.

13. After you've received your front panels

You might want to clearcoat your panels to keep them from getting scratched. For panels with a granular finish, you can use e.g. Edding E5200 spray for the first layers and finish with Altona anti-reflection varnish. Using only Altona, you will get a smooth surface.

Clearcoating can also hide the scratches made by the fab house.

For better or worse, the fab houses' processes change over time, so don't expect the results to always stay the same.

Closing thoughts

Feel free to send me pictures of your front panels and tell me about your experience! The email is arx@synth.sk. With your permission, I might include the pictures here for future reference of everyone.

These tips were intentionally kept as general, CAD-agnostic and fab-house-agnostic as possible. In my next blog post, I'll give you some more specific tips on using Kicad and JLCPCB. See you there!

]]>
https://arx.wtf/blog/1-front-panels-tips/ hacker-news-small-sites-42722181 Thu, 16 Jan 2025 06:52:36 GMT
<![CDATA[OpenAI's "blueprint for U.S. AI infra." AI economic zones and gov. projects]]> thread link) | @palmfacehn
January 15, 2025 | https://www.frontierfoundation.org/post/openai-to-present-plans-for-us-ai-strategy-and-an-alliance-to-compete-with-china | archive.org

OpenAI’s official “blueprint for U.S. AI infrastructure” involves artificial intelligence economic zones, tapping the U.S. Navy’s nuclear power experience and government projects funded by private investors, according to a document viewed by CNBC, which the company plans to present on Wednesday in Washington, D.C.

The blueprint also outlines a North American AI alliance to compete with China’s initiatives and a National Transmission Highway Act “as ambitious as the 1956 National Interstate and Defense Highways Act.”

In the document, OpenAI outlines a rosy future for AI, calling it “as foundational a technology as electricity, and promising similarly distributed access and benefits.” The company wrote that investment in U.S. AI will lead to tens of thousands of jobs, GDP growth, a modernized grid that includes nuclear power, a new group of chip manufacturing facilities and billions of dollars in investment from global funds.

Now that Donald Trump is President-elect, OpenAI has made clear its plans to work with the new administration on AI policy, and the company’s Wednesday presentation outlines its plans.

Trump plans to repeal President Biden’s executive order on AI, according to his campaign platform, stating that it “hinders AI Innovation, and imposes Radical Leftwing ideas on the development of this technology” and that “in its place, Republicans support AI Development rooted in Free Speech and Human Flourishing.”

OpenAI’s presentation outlines AI economic zones co-created by state and federal governments “to give states incentives to speed up permitting and approvals for AI infrastructure.” The company envisions building new solar arrays and wind farms and getting unused nuclear reactors cleared for use.

“States that provide subsidies or other support for companies launching infrastructure projects could require that a share of the new compute be made available to their public universities to create AI research labs and developer hubs aligned with their key commercial sectors,” OpenAI wrote.

OpenAI also wrote that it foresees a “National Transmission Highway Act” that could expand power, fiber connectivity and natural gas pipeline construction. The company wrote it needs “new authority and funding to unblock the planning, permitting, and payment for transmission,” and that existing procedures aren’t keeping pace with AI-driven demand.

The blueprints say, “The government can encourage private investors to fund high-cost energy infrastructure projects by committing to purchase energy and other means that lessen credit risk.”

A North American AI Alliance and investment in more U.S. data centers

OpenAI also foresees a North American AI alliance of Western countries that could eventually expand to a global network, such as a “Gulf Cooperation Council with the UAE and others in that region.”

The company also outlined its vision for nuclear power, writing that although China “has built as much nuclear power capacity in 10 years as the US built in 40,” the U.S. Navy operates about 100 small modular reactors (SMRs) to power naval submarines, and leveraging the Navy’s expertise could lead to building more civilian SMRs.

OpenAI’s infrastructure blueprint aligns with what Chris Lehane, OpenAI’s head of global policy, told CNBC in a recent interview. He sees the Midwest and Southwest as potential core areas for AI investment.

“Parts of the country that have been ‘left behind,’ as we enter the digital age, where so much of the economics and particularly economic benefits flow to the two coasts... Areas like the midwest and the southwest are going to be the types of places where you have the land and ability to do wind farms and to do solar facilities, and potentially to do some part of the energy transition — potentially do nuclear facilities,” Lehane said.

The infrastructure, Lehane explained, is contingent on the U.S. maintaining a lead over China in AI.

“[In] Kansas and Iowa, which sits on top of an enormous amount of agricultural data, think about standing up a data center,” Lehane said. “One gigawatt, which is a lot, taking, you know, 200-250 megawatts, a quarter of that, and doing something with their public university systems to create an agricultural-based LLM or inference model that would really serve their community but also make them a center of agricultural AI.”

Lehane cited an estimate that the US will need 50 gigawatts of energy by 2030 to support the AI industry's needs and to compete against China, especially when the country approved 20 nuclear reactors over the past two years and 11 more for next year.

“We don’t have a choice,” Lehane said. “We do have to compete with that.”

Originally published by Hayden Field on CNBC

]]>
https://www.frontierfoundation.org/post/openai-to-present-plans-for-us-ai-strategy-and-an-alliance-to-compete-with-china hacker-news-small-sites-42721909 Thu, 16 Jan 2025 06:12:20 GMT
<![CDATA[Text Editors (2020)]]> thread link) | @ossusermivami
January 15, 2025 | https://andreyor.st/posts/2020-04-29-text-editors/ | archive.org

As software engineers and programmers, we mostly work with text, so obviously we're all using some sort of text-related program. Editing and navigating text is a huge part of our daily job, so a good text editor is like a good set of tools for a blacksmith. A good blacksmith may still do their work with inappropriate gear, but will be more productive, and the end result will be in much better shape if good tools are at hand.

The same goes for programmers - a good text editor enhances our productivity and makes navigating (and thus understanding) the project we're working with a bit easier. Well, a good programmer can write everything in the simplest of editors, or even on paper, but I think no one will argue that a good text editor boosts our work tremendously. Don't take this post too seriously though, as I think a text editor is a kinda personal thing, and for most users, key features will differ, so I'm not expecting anyone to 100% agree with me. As always.

xkcd: Real Programmers

There are several classes of text editing tools, which are different in many ways and are used in different situations - pagers, IDEs, hex browsers, stream editors, and so on - so in this post I will only touch a small subset of the whole text editor world: advanced text editors. Such editors are not as simple as plain text editors like MS Notepad, and not as complex as IDEs. I would like to begin with a look at the default editors in GNU/Linux, which are usually provided with a desktop environment. These kinds of editors are worth mentioning here because I think it is a good thing to have them, and sometimes they are in fact usable.

Gedit, Kate, and other default editors

Default editors are a kinda weird thing, though. I mean, it is great to have a default editor in the system, because casual users sometimes, actually, edit files, but both Gedit and Kate are trying to be hackable developer-friendly editors. While this is not an issue, I don’t think that this is necessary. Hear me out though.

Let’s look at Gedit first. This is how it looks by default on the GNOME Shell desktop:

Figure 1: Gedit text editor

Not much to see - it looks like a very basic editor. And I think it is one hundred percent fine, because if you need to quickly tweak some configuration files it is more than enough. However, that's not all Gedit can do. There's a plugin section in the settings that features some preinstalled plugins, and users can search for extra plugins online or in repositories. So if we tweak some plugins and enable some settings, we can get this result:

Figure 2: Gedit with some tweaks

This setup may be a bit more useful for you if you decide to use Gedit for development in some language that Gedit supports. With the file browser you have quick access to related files in the project, the mini-map is kinda popular because of Sublime Text I think, and line numbers are, well, line numbers. But I don't think that all of this is really necessary for Gedit.

The thing is, I don't think that anybody expects an operating system to have a built-in, graphical, advanced, extensible, development-ready text editor. Maybe it's just me, because for too many years Microsoft offered the very basic Notepad with Windows, and WordPad, which is aimed towards writing documents, so I apply this concept to GNU/Linux as well, shame on me. Sure, if you're a system administrator setting up a new machine or logging in to remote servers, you expect a more or less advanced editor to be installed, which is usually the case - most Linux distributions ship something like Nano or Vim. But when speaking about graphical environments I think that a very basic editor like Notepad is more than enough. It can open files, edit them, and save them. So instead of adding these plugins, developers could focus on more critical parts of their desktop environment. I also doubt that Gedit or GNOME developers use Gedit to develop their respective projects, given that there's also the GNOME Builder thing. Maybe I'm wrong though.

Figure 3: Kate editor with default settings

Pretty much the same goes for Kate, Mousepad, Pluma and others. But for professional use, these editors are still far behind everything else. And moreover, both the GNOME and KDE projects have their own dedicated IDEs - GNOME Builder and KDevelop respectively. I think this indicates that Kate and Gedit should not be as advanced as they are, because there's already a more advanced tool in the very same project, aimed at programming. Well, KDevelop uses some of Kate's code for its editor, and GNOME Builder maybe does this as well, so Kate and Gedit can benefit by getting extra features from those projects, but maybe the focus should be shifted to KDevelop and GNOME Builder, and both Kate and Gedit should be left as basic system editors.

Don’t get me wrong, I think that this is good to have open source editors, that are both hackable, and provide a good set of tools, and I know that many really use Kate for development, but I don’t know any advantage that Kate has over any other editor. Also, let’s look at the popularity chart:

Editor popularity chart for the past five years.

This is not the most accurate data: I've taken it from Google Trends and put Mousepad, Kate, and Gedit into a single category, because the point of interest for each of those was less than 1% compared to any of the competitors. In comparison, Sublime Text, Atom, Vim, and even Emacs are much higher in this chart. Why?

I think one reason is that since these editors are already preinstalled, no one really searches for them, thus the point of interest is very low. The other reason is that, perhaps, users first try the preinstalled solution, and after that search for a more advanced tool that supports their workflow or project-related tooling. Again, don't get me wrong, Kate and Gedit are both very capable text editors, but they are fully outmatched by any of the other competitors listed in the chart.

Another problem is the plugin ecosystem. Kate features plugins in C++, and I think that this is the worst decision I've seen. This may be good for speed and integration, because Kate itself is written in C++, but I think it is not good for the ecosystem. Considering that Kate has a small user base, an even smaller part of those users cares to write plugins, and an even smaller part of these plugin writers is capable of developing in C++ - because C++ is a very complex language - it has to be a miracle for a plugin to appear. Kate's own developers can create such plugins, though, and I think that is the primary reason why C++ is supported for plugin development.

However, both Gedit and Kate can use plugins written in Python, which is a much better option, though the implementation language is not the most important thing here, because the plugin API also matters. I don't know how good or bad the Gedit and Kate Python APIs are, so I will not make opinions on those. But given that Gedit doesn't have as many plugins as the other editors listed in this post do, I think it's not the best one. Maybe.

But plugins are not the only thing that leads a text editor to success. Most editors listed below had a killer feature that was the reason programmers cared about trying them in the first place. I think the one feature of Kate, Gedit, Mousepad, and other built-in editors is that they come preinstalled on the system. I also don't think that this is the feature that will make most programmers choose these editors.

So what exactly makes the other editors stand out? Let's walk through each of them one by one.

Sublime Text

Figure 4: Sublime Text 3

Originally, Sublime Text was meant to provide an experience close to TextMate (which I will not touch here because I don’t have a Mac to check it), but for the Microsoft Windows platform, and later was ported to GNU/Linux and Mac OS. It was a massive hit when it came out in 2008 and still is a very popular editor. What makes it stand out today is its speed - it’s blazingly fast! I’ve tested it on a relatively big C project (8k source files, 8k headers) mounted via sshfs and it was the fastest editor in terms of opening files with its fuzzy search. So what makes it a good choice aside from its speed?

  • Fuzzy search.

    Sublime has this neat feature where you can open a special prompt and type an incomplete string, and it will find every possible match. For example, if you're looking for a file called this_important_file.txt, you can press Ctrl p and input timpe, and Sublime Text will find this file by matching those characters in order inside this_important_file.txt, or something like that. It will also list all other possible matches and rank those via some algorithm.

  • Language support.

    Sublime Text has support for TextMate grammar files, as well as its own grammar system. This means it supports all TextMate-supported languages, plus its own set of other languages, which sometimes overlap with TextMate's grammars but provide smarter syntax highlighting.

  • Plugins.

    There are many plugins for anything you can imagine. And those are written in Python, which at the time was a very viable option.

  • Cross-platform.

    Because it works the same on all major systems, it is a great choice if you often switch between them - whether you have a Mac at home and Linux at work, or MS Windows somewhere in the mix, you will always find Sublime Text for these platforms.

  • Multiple cursors.

    I’m not sure where this feature first appeared, but if I’m not mistaken, Sublime Text was the editor which really popularized this feature, and it became sort of a standard thing for most post-Sublime editors. Essentially multiple cursors allow you to place several cursors in your file and edit text simultaneously. You can select the text, cut, paste, delete, type, and so on.

  • Go to anything.

    Sublime Text has parsers for many languages and allows you to jump between definitions in the project.

Interestingly enough, none of these features are anything special for today's editors. Many are cross-platform, and most of the new editors have multiple cursors, fuzzy search, and go-to. Plugins in editors existed before Sublime Text. Yes, Sublime Text was very popular because of these features, but today I think only speed is what sets it apart from the others. I've never used Sublime Text for work because it is not free and not open source, so I can't say much beyond that list. I've been trying to avoid as much proprietary software as I can for the last 10 years. And especially when there are better options available in the land of text editors.

Let's talk about the relatively new ones that are quite popular today and have shifted Sublime Text from its absolute dominance to third place.

Atom and Visual Studio Code

Oh boy, this is hot! Really hot! I mean, why does the fan in my laptop spin like I'm playing a video game? Oh, it's because I've opened these two editors side by side.

Just kidding. It’s fine, but this performance-related pun is still a thing, unfortunately, because each of those editors eats more memory and CPU cycles than Sublime Text, and overall performance is not that good. Why? Because these editors are not exactly editors. These two are web browsers that were turned into text editors. A bit of story behind the technology:

Atom was created by GitHub and is based on the Electron technology, also developed by GitHub, which is essentially a slimmed-down Chromium browser. Electron allows developers to create desktop applications using web technologies. I actually think that this is one feasible future, because the possibilities with this approach are mostly endless. I mean, today browsers can view PDFs, run interactive scripts, play videos, run 3D games with hardware acceleration, and so on. Because everything I've listed is related to displaying things, let's call this rich rendering. Currently, the browser is an ultimate window that can do mostly anything you want it to. And if such technology powers a text editor, I think we only win.

So what rich rendering essentially provides is the ability to create any kind of interface, because you have full access to the DOM, styles, and markup, and you also have the browser's rendering capabilities to back all of this. For example, we can create a popup window that is pinned to a concrete line, and when you scroll the view, this popup window moves with that line. This popup can have its own scrolling capabilities and other interactive features, such as displaying graphics, because it is essentially just a <div> tag with some display properties. For example, we can really see that in the developer console inside Atom:

Figure 5: Atom's developer console and documentation popup.

But unfortunately, as always, there's a cost. Web interfaces are really flexible, but we sacrifice performance for this, because although the web is kinda speedy, it's not as fast as it could be with some native code for the interface. I doubt that Atom or VS Code will ever be as fast as Sublime Text, though Atom seems to realize that in order to speed things up it needs proper technology, so some parts (e.g. Tree-sitter, an incremental parser framework, and search, which can use the ripgrep tool) are now written in native languages. There was a project that aimed to rewrite the core of the editor in Rust, but it was unfortunately canceled.

And even though both Atom and VS Code are written in JavaScript on the Electron platform, these editors are different. Not only in implementation, since VS Code is written in TypeScript and Atom in plain JavaScript, but the main difference is in the approach to extending and configuring the editor. Both Atom and VS Code feature plugins written in JavaScript. Both Atom and VS Code can install plugins from their respective stores. The same goes for Sublime Text.

But Atom claims that it is “A hackable text editor for the 21st Century” and is hackable to the core. Which is quite true. Visual Studio Code and Sublime Text use plugins in a more traditional way. E.g. in Atom packages can change how the editor works, while Sublime Text and VS Code plugins mostly add features on top of how the editor works. Let’s look at this screenshot of Atom:

Figure 6: Atom Editor default look

See this file tree on the left? This bottom panel with file information? Tabs? These are all separate plugins, or packages as they are called in Atom. And you can turn them off just like packages that you've installed. You can also replace those packages with something entirely different, if you want. Atom features really good defaults and configurations but is also eager for you to make it your own. This can be said for VS Code as well, but I've found some limitations compared to Atom while testing this.

What differentiates Atom from VS Code is that users can also tweak the editor with CoffeeScript, which makes Atom truly hackable to the core, because Atom itself is written in JavaScript. Although not really to the core, but rather down to the API of the editor, this is still huge. And that is another feature of Atom that I truly like. Just watch this talk by Jason Gilman about a REPL for Clojure, which adds some amazing capabilities to the editor and deeply integrates with the running instance of the REPL. It's amazing. Not that you can't make the same thing for Sublime Text or VS Code, but I think it is much easier in Atom, because Atom embraces the fact that anyone can hack upon it.

Figure 7: Visual Studio Code default look

Visual Studio Code, on the other hand, is not that hackable. Well, it is quite hackable: VS Code features a good API for plugins, and many internal interface elements are developed using this API. You can change it, but Microsoft still has a view on how things should work and what should be in the editor. This also has some benefits. Both Atom and VS Code feature a good out-of-the-box setup, though VS Code tries to be a more complete IDE-like solution focused on development, while Atom is focused on development and extensibility.

But the killer feature of Atom and VS Code is the rich rendering that I've already mentioned before. Because the capabilities of the rendering toolkit are basically the same as in a web browser, we can add any kind of graphical interface to the editor. It can be an advanced color picker if you work with CSS styles, a PDF viewer if you're working on documentation, a video player if you're testing out your site that has video, toggle switches, or sliders for real-time adjustments of values - anything is possible. This allows us to create the best interface we may want or need to be more productive, without relying much on the graphical toolkit we're using, because the web is now our graphical toolkit. This is good for creating interfaces that suit your development needs. I highly suggest you watch the talk by Bret Victor called Inventing on Principle. It highlights the need for exploration of the domain we're working in, which is computers - and in the case of Atom and VS Code, the web stack.

There is another kind of text editor, which is more texty. Those embrace text as their main focus and data format, both have text-based interfaces, and both are extremely hackable. I’m talking about two eternal rivals - Vim and Emacs, as well as about most other TUI-based editors. Let’s look at those in detail.

Vim

Behold! Vim, the king of text editors.

Vim has a very long history as a text editor. Its predecessor, Vi, had many features that were really useful at the time, such as modes, but it was proprietary. It was open-sourced later, but there were also several clones of the Vi editor, and Vim is only one of those. Vim stands for Vi Improved and adds a lot of features on top of the Vi formula. So what is Vim like? Let's have a look:

Not like there is much to see here. Just like in the case of Gedit or Kate. However, like other editors, Vim supports plugins. These plugins can change Vim quite heavily. As in Atom, you can add tabs via a plugin, change the status line appearance, add a file explorer, a tag browser, and so forth. Furthermore, like Atom, we have a way to create custom interfaces using plain text. In Atom, we can hack upon the DOM and CSS, and in Vim we can hack upon lines of text and highlighting. For example, the file tree here is fully interactive but is essentially text. Here's my old Vim setup:

But the visual look is not the most important thing, especially for Vim. I'm no longer a Vim user myself, but given the popularity chart I see that many developers use it. So what are the key benefits of Vim? I think these are quite strong arguments:

  • Vim is fast. Really fast.

    Sometimes it is slower than Sublime Text on really large files, but still much faster than the other editors I've talked about.

  • Vim is quite lightweight.

    Although it has a somewhat large codebase, and there are projects to eliminate some of the issues related to it, like NeoVim, Vim is still quite lightweight. By default, it provides very basic features, yet those features have deep semantics. And it starts fast too.

  • Vim is extensible.

    Vim can be extended with vimscript and Python; NeoVim also expanded the ability to extend Vim with many more languages, like Lua or JavaScript.

  • Vim is not a generic text editor, rather it is an editing-language environment.

    What this means is that in Vim you don't have shortcuts like in other editors. Instead, you have a language, with verbs and objects. So to delete to the end of a word you press dw, where d is for delete, and w is a motion you apply your command to, in this case a word forward. If you want to delete the current word, you can do diw, which stands for delete inside word, and if you want to delete a word and the whitespace after it you use daw, or delete around word. This language has quite a lot of depth, and commands can be combined in different ways.

Because of that Vim is mainly a keyboard-driven editor. And this is a huge thing. If you never have to touch your mouse, and you never have to move your hands from your keyboard you are already much more productive than any other developer. Unless you do visual programming.

I think that Vim's killer feature is its editing model. It is verb-object: you decide what you want to do, and what you want to apply that modification to. And I think this is where Vim's strength lies. You can combine commands into complex sentences that deal with text for you, and you can store those in macros to invoke later, or repeat the last modification with the . key. This is a powerful concept, Vim executes it quite efficiently, and the editing language has definitely stood the test of time. Plugins can also extend this language with additional verbs and objects. However, as I've mentioned, I'm no longer using Vim, because I've found a more interesting approach to editing text while still using a Vim-like model.

Kakoune

This editor definitely is something. The main difference from Vim is that Kakoune uses an object-verb system - you select first, then edit, so Vim's dw becomes wd in Kakoune. And the main feature of the Kakoune editing model is multiple selections. Not multiple cursors, like in Sublime Text, VS Code, and Atom, but selections. To get a better understanding, you can think of a selection in Kakoune as Visual mode in Vim, which essentially allows you to select any text and treat it as an object for your next move. Kakoune extends this idea by allowing you to have more than one selection at the same time. And furthermore, Kakoune doesn't have a cursor at all. It is always a selection, and most of the time it is simply a single-character selection.

Figure 8: Kakoune

By default, it doesn’t look much different from Vim. However, we can tweak its appearance with some plugins, to make it look somewhat more like an interactive development environment:

Figure 9: Kakoune with plugins

Just like in Vim, we can create a custom interface out of text and syntax highlighting rules. But again, what's more important - how it looks, or how it feels? Kakoune feels refreshing and is a modern take on Vim. Unlike Vim, Kakoune is much simpler - it does not include window management, and it does not have its own scripting language, which for me was a big no-no when I first saw Kakoune. However, trust me, the lack of a language is not a problem for Kakoune, and it does not need window managing capabilities at all. Kakoune is also pretty fast, and my tests showed that it is generally much faster than Vim on big files. Especially highlighting. Although Kakoune uses more memory to cache everything it highlights, so there's always a trade-off.

So what are the strong points of Kakoune and why might you want to use it? For me, it was first-class support for multiple selections and structural regular expressions. In Kakoune it is possible to select a big chunk of text, then hit the s key and input a regular expression, which will be used to select everything that matches within the original selection. Then you can repeat this process and get selections within those sub-selections. E.g. if we wanted to select every variable that has _count, but without the _count part, we would do this:

Figure 10: Select til paragraph end → select \w+_count regexp → select everything up to _count in resulting selections


Of course, we could do it in a much simpler way by selecting with the paren|bracket|curly regular expression. This example is made up, but with more complex regular expressions this feature becomes really handy. I've written some plugins for Kakoune, one of which was tagbar.kak, which provides a side panel with tags for the current buffer, using universal-ctags. In the source code of that plugin, there is a big block of code that was generated by using multiple selections over the complete list of ctags kinds for all languages at once. So this is something like real-time interactive sed.

Speaking of plugins: Kakoune features the weirdest, yet really great way to extend the editor. All editors that we've seen so far use some kind of language and an API to write plugins. Some editors use their platform language, like JS in Atom or VS Code, or C++ in Kate's case, others use other languages, like Python. Vim has its own vimscript language, and an API to work with Python.

I’ve already mentioned that Kakoune has no built-in scripting language. Well, kinda. In fact, it has a very basic scripting language, called kakscript, but it has no control flow except try and catch, and is only used to create basic commands. However, you can go far with it, because it allows you to execute Kakoune keys, and all other Kakoune commands. For example, most indentation handling is done by searching for the previous line, copying its indent level, adding a needed amount of indentation to that level, and applying it to the currently indenting line. Because Kakoune features very fast regular expression language and is quite robust on its own this is one feasible approach to writing plugins.

The other one is shell expansions. Kakoune supports various strings with a syntax that uses a percent sign and a pair of delimiters, for example %(str), %{another str}, and %|yet another|. You can use other delimiters too. This is useful for various expansions: %opt{option_name} or %val{value_name} are also strings, but they expand to the respective values stored in these variables. This way you can write a string like "current line is: %val{cursor_line}", and it will expand to the current line number. Kakoune also has a special kind of expansion, %sh{…}, which expands to a shell call. So everything inside the curly braces will be your average shell script. Kakoune strongly suggests using a POSIX shell scripting environment, so it works on all POSIX-compliant systems. And from the shell, you can use any language you want!

Kakoune exposes its state via shell variables that begin with the kak_ prefix. Built-in values are available with just this prefix, options with the kak_opt_ prefix, registers with kak_reg_, and so on. This way you can see the current selection in a shell expansion as $kak_selection. And if your plugin defines some option, you can expose it to the shell as well.
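
For illustration, here is a minimal sketch of a custom command that combines a %sh{} expansion with one of these variables (the command name is made up, and it relies only on plain POSIX tools):

# show the byte length of the main selection in the status line
define-command selection-length %{
    echo %sh{ printf %s "$kak_selection" | wc -c }
}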

This allows users to build any kind of plugin without much of an API. We only see the current state of Kakoune and use it to produce some results. Of course, plugins can interact with Kakoune, and it is done through pipes. You can pipe arbitrary Kakoune commands to a running session like this:

:eval %sh{ echo "execute-keys -client $kak_client gg" | kak -p $kak_session}

This will jump to the beginning of the buffer. So if you know your session, which is the PID of Kakoune, you can interact with Kakoune from the outside world. This is quite a flexible system, and it adheres to the design of POSIX-compliant tools: you end up with tools that can also work with Kakoune, not just Kakoune plugins that are usable only inside Kakoune itself. As a consequence, Kakoune can be integrated with a huge number of console tools, from file managers like nnn or ranger, to fuzzy search engines, search tools, git clients, and so on, and also window managers.

So when I said that Kakoune doesn’t need its own window managing facilities, I meant that you can integrate it into any other window manager, like i3wm or Bspwm, terminal multiplexers, such as Tmux or GNU Screen, or terminal built-in splits like in iTerm2 or Kitty.

The downside of it is that you're using a human interface as a programming interface. You see, the shell is meant to be used by humans, and writing programs that manipulate the shell is hard. Such programs have to parse output, generate valid input in response, and so on. Some programs, like git, provide a porcelain mode that is parser-friendly and versioned, so your application will continue to work if Git changes, as long as you've written the code correctly. But not all applications are like this, and some of the system ones work differently on different systems. For example, I've written the filetree plugin shown on the second Kakoune screenshot, and I had to parse the output of the ls program. But GNU ls doesn't work like BSD's ls. So I had to write cross-platform calls to ls, which limits me. I've also used Perl to parse this thing, and a lot of other shell tools that are also not that portable. The last bit is that all shells are different, and bash differs tremendously from ksh. This is quite a downside for plugin maintainers.

If what I've said about Kakoune concerns you and you think that it is not for you, but you still want to try out the multiple-selection workflow, then you should check out the Vis editor. It is quite similar to Kakoune and uses multiple selections as its central way of interacting with text. It is also quite minimal, and fast. I've never used it myself, but as far as I know it may be more appealing to Vim users, because it doesn't flip the verb-object way of interaction like Kakoune does. It also uses Lua for scripting, which I think is a good approach too, because Lua is great for embedding.

But there's yet another very special editor I want to discuss. I'm using it quite heavily today, and it has been my main tool of work for several months already. And this editor is Emacs.

Emacs

Behold! Emacs, the true king of text editors.

What we can immediately spot is that Emacs can display images, different font sizes, and different fonts in the same buffer, it has some graphical interface and… And that's it. Don't judge by the look though, this beast is powerful as hell. These features already make Emacs stand out compared to Vim or Kakoune, but they are not yet sufficient to compete with Atom or VS Code, which have rich rendering systems thanks to web technologies. Emacs can't compete with Sublime Text in terms of speed either, but there are reasons for that. Although we take these graphical features for granted today, Emacs is a really old editor - it was released about 44 years ago, and these features were added way before Sublime Text was even in development.

But these are just cosmetic features to many of us, so what's so special about Emacs besides that? Well, first of all, Emacs is not strictly a text editor, much like Atom. Emacs is built on top of a virtual machine, which has an interpreter and byte-compiler for the Emacs Lisp language, which was designed specifically for Emacs and is one of the oldest Lisp dialects still in use. This VM is written in C, and the rest of Emacs is written in Emacs Lisp. Although some core primitives of Emacs are written in C for speed, we can still use them from Emacs Lisp or rewrite them in Emacs Lisp if we really want.

So what does this mean? Emacs is more like an application platform than a text editor. It has a text editor built in, but it offers much more than that. For example, things like Magit are possible and quite popular. Emacs has games, mail clients, chat clients, music players, and the list goes on. There's a famous quote:

Emacs is a great operating system, lacking only a decent editor.

This is, in fact, kinda true. You see, by default the Emacs experience is not really user-friendly. Given that it was developed using a quite uncommon keyboard, all the Ctrl and Alt shortcuts may be uncomfortable to use, and in general the keybindings seem quite strange and random. A famous example is the directional keys: Ctrl b is backward, Ctrl f is forward, Ctrl n is next, Ctrl p is previous. Although these keys have semantic meaning, many others do not. And heavy use of the Control key is a pain point for many users. I personally remapped Caps Lock to be my Ctrl long before I started using Emacs (thanks to Vim), so this is not a big problem for me. But you can notice that on that keyboard the Meta (Alt) and Ctrl keys are swapped compared to modern keyboards (we have Alt where Ctrl was), and that is part of the reason why Emacs is so uncomfortable. But many keyboard-driven environments have keybinding-related problems, so let's discuss a more interesting topic instead. And that topic is Emacs Lisp.

Being a Lisp machine, Emacs has a real language as its scripting language. Emacs Lisp may not be the best Lisp, but it is fine. It has some quirks, but for the most part it is good. And because it is also the implementation language of Emacs itself, we can use it to deeply integrate new features and change existing Emacs features.

For example, imagine that you don't like how the Ctrl w key works in Emacs. By default it kills the region, or, translated to English, it cuts the selection. However, Emacs often has an invisible region that is not active but whose beginning and end positions are defined. So if you accidentally press Ctrl w, it will kill that region, resulting in frustration and undoing. That often bit me: I would write a lot of text in one go, press Ctrl w, and thus kill everything back to the beginning of the file. What if we wanted to make this key work like it does in the shell, while still keeping the default behavior when a visible region is active? We can write our own function:

(defun aorst/kill-region-or-word (arg)
  "Kill the active region, or ARG words backward when no region is active."
  (interactive "*p")
  (if (and transient-mark-mode mark-active)
      ;; A visible region is active: keep the default cut behavior.
      (kill-region (region-beginning) (region-end))
    ;; Otherwise behave like the shell and delete the previous word.
    (backward-kill-word arg)))
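A minimal sketch of the binding itself, using the standard global-set-key:

(global-set-key (kbd "C-w") #'aorst/kill-region-or-word)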

With this function bound to Ctrl w, we can delete words before the cursor with a familiar shortcut, while still cutting regions when we want to. This is a quite simple case, and you can do it in most editors that support custom keybindings and have a way to define a function. But in Emacs, every key is bound to some function, and such a function can do anything. Another example: what if we want to add a custom action to an existing function without redefining it? Emacs Lisp has an interesting way of doing this with advice. It is something like runtime patching, which essentially says: before or after doing that thing, do my thing. For example, suppose I don't like that an external package doesn't respect my custom function. I can advise it to check whether I'm calling my function or not:

;; Around advice: only perform the documentation request if we didn't get here via aorst/escape.
(define-advice lsp-ui-doc--make-request (:around (foo))
  (unless (eq this-command 'aorst/escape)
    (funcall foo)))

So whenever the lsp-ui-doc--make-request function is called, we first check that this-command was not aorst/escape, and only if it wasn't do we actually call lsp-ui-doc--make-request. This is a really powerful feature, which makes it easy to patch foreign functions in a way that you don't have to touch again until their signature changes. Or you can simply redefine a function entirely with your own implementation.

The depth of customization you can apply to Emacs is probably endless. I've turned my Emacs into a visual clone of the Atom editor, because I like how Atom looks. See the side-by-side comparison:

Figure 11: Emacs and Atom

It's not a 100% clone, though I really like how it turned out; how it feels is also important, and it feels great. I'm using many different packages that extend Emacs with various features like multiple cursors, custom keyboard-driven menus, linters, and so on. The other thing is that Emacs is the only editor in which I've never actually seen serious problems with packages. In Vim, VS Code, and Atom, I've encountered problems with quite popular plugins that prevented me from doing my work, either by not working at all or by having bugs that made usage impossible. That has never happened in Emacs. Of course, sometimes packages break, but they get fixed really fast, unlike in those editors.

Which one is the best?

Perhaps I've asked myself this question too many times. I think these editors are particularly interesting:

  • Gedit, Kate, Mousepad
  • Sublime Text
  • Atom
  • VS Code
  • Vim
  • Kakoune
  • Emacs

Ranking them would be hard, so I would first split them into several categories and rank each. I think there is no best editor, because best is a relative term, and it will vary quite a lot between different people. But we can still compare these editors and see which one is the best at what it provides.

The first two categories would be out-of-the-box-ish editors and customizable-ish editors. For the OOTB editors, my ranking would be:

  1. VS Code, Atom 🥇
  2. Sublime Text 🥈
  3. Kate 🥉
  4. Gedit, Mousepad

It is a bit cheap to put Atom and VS Code in the same place, but I think both are great. However, VS Code absolutely nails the default configuration, and while Atom can be customized further to be on par with VS Code, it still lacks some important things, like an integrated terminal.

Next goes Sublime Text, because it is on par with the first two in terms of features and much faster, but I personally don't care much about speed, and the ability for further customization is what matters to me. Given that Sublime Text focuses on OOTB features and speed, and even though you can extend it with plugins, it does not have the rich rendering of the first two, so the customization possibilities are more limited. And its default setup still lags behind VS Code's, although it seems great for what it provides.

The rest are default editors, and I put Kate a little bit higher on the list because I see more potential in it compared to Gedit, Mousepad, and others. Kate gets popular features, like Language Server Protocol support, faster, so I think it deserves to be higher.

Now, I know, I did not include Kakoune, Emacs, and Vim in the previous chart, because none of them is OOTB enough, and their main focus is also different. So the second group is editors that are (kinda) focused on customization, and I would rank those this way:

  1. Emacs 🥇
  2. Kakoune 🥈
  3. Vim 🥉

So why did I put Emacs in first place? Because, like Atom, it features deep customization possibilities, even deeper than Atom's. And although the out-of-the-box state of Emacs is miserable, that is not its key feature. If you really want an OOTB Emacs configuration, there is plenty to choose from, and more.

Kakoune got second place because its customization model is less direct and encourages the use of POSIX tools, but you can still use any language through shell calls. Even Emacs Lisp is theoretically possible with Emacs batch mode, but why would you do that? This lifts the limitation of a particular language, and if you're living in an isolated world you can even use a different shell for expansions, thus creating more advanced plugins in shell script alone, although that is not recommended.

Last is Vim, and I do not like Vimscript. I've written some plugins in it in the past, and maintaining Vimscript across Vim versions is kinda hard, and the language itself is quite chaotic. You can use other languages, but I'm not into that when the editor provides its own language. Kakoune provides a way to call the shell, and Emacs has a pretty decent Lisp dialect, so both are better than Vim in my opinion. Yes, Vim features lots of plugins and customization options, which are more mature than Kakoune's, but Kakoune is young, so I think that is just a matter of time.

Another possible way to rank the editors is by feature completeness. This is tricky, but I'll try my best here:

  1. VS Code 🥇
  2. Emacs, Vim 🥈
  3. Sublime Text 🥉
  4. Atom
  5. Kakoune
  6. Kate
  7. Gedit, Mousepad

Before you get mad (if you're not already), let me explain why I think this is an acceptable ranking.

VS Code is in first place because it gets a lot of plugins each week and ships a decent amount out of the box, so most developers can use it without any configuration, or with a really minimal one. This is important because an editor is a tool that should make us more productive. Also, I've found a VS Code syntax highlighter for a quite rare language that was developed for in-company use only, so it earns a personal bonus point here.

Emacs and Vim share second place, but for slightly different reasons. Emacs ships a metric ton of packages by default; however, those are not really set up or configured for a good out-of-the-box experience. It also has an insane amount of packages available online, and you can find practically anything for Emacs. Hell, you can order salads from Emacs!

Vim has fewer plugins built in, but they are mostly only the necessary ones. Those are not enabled by default either, but at least you can browse through the manual and get an idea of what's shipped. It has many plugins available online too; however, I found that they either have fewer features compared to the Emacs ones or are in general less interesting. Vim keeps up by being a bit better as an editor, though: while you have to learn modes, in Emacs you have to learn a lot of randomish keybindings.

Sublime Text is very feature-complete, but I've seen colleagues who use it complain about some missing features that are present in Vim.

Next goes Atom. It has good features in its default shipping; however, I've found that a lot of popular packages are broken in one way or another, and usually the functionality is not as good as in VS Code or especially in Emacs. It still has interesting packages, though, like Proto REPL or Activate Power Mode.

Kakoune is an interesting one. It is quite minimal and doesn’t have many plugins as of today, but the editor itself is quite young, so I have big hopes for it. Currently, existing plugins make it very usable, and more is yet to come.

Kate has more features out of the box compared to the other default editors, so it is higher on the list. But overall it is still comparable with Gedit and Mousepad.

For the final ranking, I think I should use an advisable-ish-thingy rating… in other words, which editors are best to recommend to others.

  1. VS Code 🥇
  2. Sublime Text 🥈
  3. Atom 🥉
  4. Vim
  5. Kakoune
  6. Emacs
  7. Kate, Gedit, Mousepad

Keep it cool, OK? I think VS Code is your best choice as of today if you need a tool and you need it right now. Before VS Code it was Sublime Text, but currently the former is much more popular than the latter. If VS Code and Sublime are not for you, then I would suggest Atom, because it is quite close to those, but has some problems with packages. Next would go Vim, if only because it is the second most popular editor right now. After Vim is Kakoune, because, in my opinion, it is better than Vim, but it is a lot more bare-bones, so you would need to tinker with it a bit more. Then Emacs, and I would not recommend Emacs to anyone, simply because it is a thing you have to come to by yourself. And the last thing I would recommend are the default editors, because, well, if you can afford better tools, why would you lock yourself into whatever happens to be at hand?

And for the final-final ranking, my personal top of editors is based on my own bias and opinion on each:

  1. Emacs 🥇
  2. Kakoune 🥈
  3. Atom 🥉
  4. VS Code
  5. Vim
  6. Sublime Text
  7. Kate
  8. Gedit, Mousepad

Yes, I think Emacs is the best thing that has happened to me so far, the best editor and application platform for text-related workflows. Kakoune is also an amazing editor, and though I mainly use Emacs, I still use Kakoune for its multiple-selection goodness when Emacs is not capable of doing what I want.

Atom is a lot like Emacs, with really powerful customization abilities and a good application platform, because it is not as restrictive as VS Code. But VS Code is also a decent editor.

Vim and Sublime Text are great, but I grew out of Vim, and Sublime is proprietary, so it is a no-go for me.

For the default editors, I think Kate is the most promising one, and I still occasionally use Gedit because it is often available on work machines we have at the office, to which I sometimes have direct access. Mousepad is the least interesting for me.

This concludes my kinda random thoughts on text editing tools. I hope this was at least an interesting read, and that you've found something new or maybe tried some editors you hadn't before. Although I must say it turned out way longer than I expected.

Thanks for reading.

]]>
https://andreyor.st/posts/2020-04-29-text-editors/ hacker-news-small-sites-42721710 Thu, 16 Jan 2025 05:38:19 GMT
<![CDATA[Time and Space Complexity]]> thread link) | @thunderbong
January 15, 2025 | https://itsgg.com/2025/01/15/time-and-space-complexity.html | archive.org

Understanding time complexity and space complexity is fundamental to writing efficient, scalable code. This guide explores Big-O notation and common complexity patterns through practical examples and real-world analogies.

Introduction

Before diving into specific complexities, let’s understand Big-O Notation, which provides a high-level abstraction of both time and space complexity.

Understanding Big-O

Big-O notation helps us analyze algorithms in terms of their scalability and efficiency. It answers the question "How do the performance or space requirements grow as the input size grows?", focusing on the worst-case scenario.

Impact:    Operations for n=5:      Visualization:
-------    -----------------      ---------------
O(1)       1                     ▏        Excellent!
O(log n)   2                     ▎        Great!
O(n)       5                     ▍        Good
O(n log n) 11                    █        Fair
O(n²)      25                    ████     Poor
O(2ⁿ)      32                    █████    Bad
O(n!)      120                   ████████ Terrible!

Key Characteristics

  1. Focuses on Growth Rate
    • Ignores constants and smaller terms
    • O(2n) is simplified to O(n)
    • O(n² + n) is simplified to O(n²)
  2. Worst-Case Scenario
    • Represents upper bound of growth
    • Helps in planning for the worst situation
    • Example: Linear search worst case is O(n), even though it might find the element immediately
  3. Asymptotic Behavior
    • Considers behavior with large inputs
    • Small input differences become irrelevant
    • Important for scalability analysis

Common Rules

  1. Drop Constants

    # O(2n) → O(n)
    array.each { |x| puts x }  # First loop
    array.each { |x| puts x }  # Second loop
    
  2. Drop Lower Order Terms

    # O(n² + n) → O(n²)
    array.each do |x|        # O(n)
     array.each do |y|      # O(n²)
       puts x + y
     end
    end
    
  3. Different Variables

    # O(n * m) - cannot be simplified if n ≠ m
    array1.each do |x|
      array2.each do |y|
        puts x + y
      end
    end
    

Practical Examples

  1. Constant Time - O(1)

    def get_first(array)
      array[0]   # Always one operation
    end
    
  2. Linear Time - O(n)

    def sum_array(array)
      sum = 0
      array.each { |x| sum += x }  # Operations grow linearly
      sum
    end
    
  3. Quadratic Time - O(n²)

    def nested_loops(array)
      array.each do |x|
        array.each do |y|    # Nested loop creates quadratic growth
          puts x + y
        end
      end
    end
    

Common Misconceptions

  1. Big-O is Not Exact Time
    • O(1) doesn’t mean “instant”
    • O(n²) might be faster than O(n) for small inputs
    • It’s about growth rate, not absolute performance
  2. Constants Do Matter in Practice
    • While O(100n) simplifies to O(n)
    • The constant 100 still affects real-world performance
    • Use Big-O for high-level comparison, not micro-optimization
  3. Best Case vs Average Case
    • Big-O typically shows worst case
    • Quick Sort: O(n log n) average, O(n²) worst case
    • Consider your use case when choosing algorithms

Comparing Growth Rates

From fastest to slowest growth:

  1. O(1) - Constant
  2. O(log n) - Logarithmic
  3. O(n) - Linear
  4. O(n log n) - Linearithmic
  5. O(n²) - Quadratic
  6. O(2ⁿ) - Exponential
  7. O(n!) - Factorial

For example, with n = 1000:

  • O(log n) ≈ 10 operations
  • O(n) = 1,000 operations
  • O(n²) = 1,000,000 operations
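A quick sanity check of those numbers in Ruby (plain arithmetic, not an algorithm):

n = 1000
puts Math.log2(n).ceil # => 10       (roughly O(log n) operations)
puts n                 # => 1000     (O(n) operations)
puts n * n             # => 1000000  (O(n²) operations)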

Time Complexity

Time complexity describes how the runtime of an algorithm changes as the size of the input grows.

Common Time Complexities

Complexity | Description | Real-World Analogy | Example Algorithm
O(1) | Constant time, independent of input size | Accessing the first locker in a row | Accessing an array by index
O(log n) | Logarithmic growth, halves input at each step | Finding a name in a phone book | Binary Search
O(n) | Linear growth, proportional to input size | Reading every book on a shelf | Linear Search
O(n log n) | Linearithmic growth, efficient divide-and-conquer | Sorting multiple card decks | Merge Sort, Quick Sort
O(n²) | Quadratic growth, nested comparisons | Comparing all students in a class | Bubble Sort, Selection Sort
O(2ⁿ) | Exponential growth, doubles with each new element | Trying all combinations of a lock | Generate all subsets
O(n!) | Factorial growth, all possible arrangements | Arranging people in all orders | Generate all permutations

Common Algorithm Examples

O(1) - Constant Time

  • Definition: The algorithm’s runtime does not depend on the input size.
  • Real-World Example: Picking the first book on a shelf takes the same time whether there are 5 or 500 books.
Ruby Code Example
def first_element(array)
  array[0] # Accessing an element by index is O(1)
end

puts first_element([1, 2, 3]) # => 1
Execution Steps
Array: [1, 2, 3]
         ↓
Access:  1  2  3
         ↑
Result:  1

O(log n) - Logarithmic Time

  • Definition: The runtime grows logarithmically as the input size increases, typically in divide-and-conquer algorithms.
  • Real-World Example: Searching for a name in a sorted phone book by repeatedly halving the search range.
def binary_search(array, target)
  low, high = 0, array.length - 1

  while low <= high
    mid = (low + high) / 2
    return mid if array[mid] == target

    array[mid] < target ? low = mid + 1 : high = mid - 1
  end

  -1 # Return -1 if not found
end

puts binary_search([1, 2, 3, 4, 5], 3) # => 2
Execution Steps: Searching for 3 in [1, 2, 3, 4, 5]
Step 1:  [1, 2, 3, 4, 5]    Initial array
          L     M     H      L=0, M=2, H=4

         [1, 2, 3, 4, 5]    Check M=3
              ↑
              Found!        Target found at index 2

O(n) - Linear Time

  • Definition: The runtime grows linearly with the input size.
  • Real-World Example: Finding a specific book on a shelf by checking each book sequentially.
def linear_search(array, target)
  array.each_with_index do |element, index|
    return index if element == target
  end
  -1
end

puts linear_search([5, 3, 8, 6], 8) # => 2
Execution Steps: Searching for 8 in [5, 3, 8, 6]
Step 1:  [5, 3, 8, 6]    Check 5
          ↑
          x

Step 2:  [5, 3, 8, 6]    Check 3
             ↑
             x

Step 3:  [5, 3, 8, 6]    Check 8
                ↑
                ✓         Found at index 2!

O(n²) - Quadratic Time

  • Definition: The runtime grows quadratically with input size due to nested iterations.
  • Real-World Example: Comparing every student in a classroom to every other student to find matching handwriting.
Ruby Code Example: Bubble Sort
def bubble_sort(array)
  n = array.length
  (0...n).each do |i|
    (0...(n - i - 1)).each do |j|
      if array[j] > array[j + 1]
        array[j], array[j + 1] = array[j + 1], array[j]
      end
    end
  end
  array
end

puts bubble_sort([5, 3, 8, 6]) # => [3, 5, 6, 8]
Execution Steps: Sorting [5, 3, 8, 6]
Pass 1:  [5, 3, 8, 6]    Compare 5,3
          ↑ ↑
         [3, 5, 8, 6]    Swap!

         [3, 5, 8, 6]    Compare 5,8
             ↑ ↑
         [3, 5, 8, 6]    No swap

         [3, 5, 8, 6]    Compare 8,6
                ↑ ↑
         [3, 5, 6, 8]    Swap!

Pass 2:  [3, 5, 6, 8]    Compare 3,5
          ↑ ↑
         [3, 5, 6, 8]    No swap

         [3, 5, 6, 8]    Compare 5,6
             ↑ ↑
         [3, 5, 6, 8]    No swap

Final:   [3, 5, 6, 8]    Sorted!

O(n log n) - Linearithmic Time

  • Definition: The runtime grows faster than O(n) but slower than O(n²), often in divide-and-conquer sorting.
  • Real-World Example: Sorting cards by repeatedly dividing and merging groups.
Ruby Code Example: Merge Sort
def merge_sort(array)
  return array if array.length <= 1

  mid = array.length / 2
  left = merge_sort(array[0...mid])
  right = merge_sort(array[mid..])

  merge(left, right)
end

def merge(left, right)
  result = []
  while left.any? && right.any?
    result << (left.first <= right.first ? left.shift : right.shift)
  end
  result + left + right
end

puts merge_sort([5, 3, 8, 6]) # => [3, 5, 6, 8]
Execution Steps: Sorting [5, 3, 8, 6]
Split:   [5, 3, 8, 6]        Original array
           /        \        Split into two
      [5, 3]      [8, 6]    
       /  \        /  \     Split again
    [5]  [3]    [8]  [6]    Individual elements

Merge:   [5]  [3]    [8]  [6]    Start merging
         \   /        \   /      Compare & merge pairs
        [3, 5]      [6, 8]      
            \        /          Final merge
         [3, 5, 6, 8]          Sorted array!

O(2ⁿ) - Exponential Time

  • Definition: The runtime grows exponentially, doubling with each additional element.
  • Real-World Example: Finding all possible combinations of items in a set.
Ruby Code Example: Generate All Subsets
def generate_subsets(array)
  return [[]] if array.empty?
  
  element = array[0]
  subsets_without = generate_subsets(array[1..-1])
  subsets_with = subsets_without.map { |subset| subset + [element] }
  
  subsets_without + subsets_with
end

puts generate_subsets([1, 2, 3]).inspect
Execution Steps: Generating subsets of [1, 2]
Input: [1, 2]

Step 1:   []              Start with empty set
          |
Step 2:   [] → [1]        Add 1 to empty set
          |
Step 3:   [] → [1]        Add 2 to each previous set
          |    |
          [2]  [1,2]

Results:  []              All possible subsets
          [1]             ↑
          [2]             Total: 2ⁿ = 4 subsets
          [1,2]           ↓

O(n!) - Factorial Time

  • Definition: The runtime grows with the factorial of the input size.
  • Real-World Example: Finding all possible arrangements of items.
Ruby Code Example: Generate All Permutations
def generate_permutations(array)
  return [array] if array.length <= 1
  
  permutations = []
  array.each_with_index do |element, index|
    remaining = array[0...index] + array[index + 1..-1]
    generate_permutations(remaining).each do |perm|
      permutations << [element] + perm
    end
  end
  
  permutations
end

puts generate_permutations([1, 2, 3]).inspect
Execution Steps: Generating permutations of [1, 2, 3]
Input: [1, 2, 3]

Level 1:     [1]       [2]       [3]       Choose first number
              |         |         |
Level 2:    [2,3]     [1,3]     [1,2]     Arrange remaining
             / \       / \       / \
Level 3: [2,3] [3,2] [1,3] [3,1] [1,2] [2,1]

Results:  [1,2,3]  →  [2,1,3]  →  [3,1,2]   All permutations
          [1,3,2]  →  [2,3,1]  →  [3,2,1]   Total: 3! = 6

Space Complexity

While time complexity focuses on execution speed, space complexity measures memory usage. Understanding both is essential for writing efficient algorithms.

Space complexity consists of two main components:

  1. Input Space: Memory required to store the input data
  2. Auxiliary Space: Additional memory used during computation

Common Space Complexities

Complexity | Description | Real-World Analogy | Example Algorithm
O(1) | Constant space, no extra memory | Using a calculator for addition | Swapping variables
O(log n) | Logarithmic space for recursion | Stack of cards in binary search | Quick Sort, Binary Search
O(n) | Linear space for auxiliary structures | Making a copy of a deck of cards | Merge Sort
O(n²) | Quadratic space for nested tables | Chess board with all possible moves | Dynamic Programming (2D)

Common Algorithm Examples

O(1) - Constant Space

  • Definition: The algorithm uses a fixed amount of memory, regardless of input size.
  • Real-World Example: Using a calculator to add numbers - it needs the same memory whether adding small or large numbers.
Ruby Code Example: Number Swap
def swap_numbers(a, b)
  temp = a    # One extra variable regardless of input size
  a = b
  b = temp
  [a, b]
end

puts swap_numbers(5, 3).inspect # => [3, 5]
Memory Analysis for O(1)
  • Input Space: Two variables (a, b)
  • Auxiliary Space: One temporary variable (temp)
  • Total: O(1) - constant space

O(log n) - Logarithmic Space

  • Definition: The algorithm’s memory usage grows logarithmically with input size, often due to recursive call stack.
  • Real-World Example: Using a stack of cards in binary search, where we only track the middle card at each step.
def binary_search_recursive(array, target, low = 0, high = array.length - 1)
  return -1 if low > high

  mid = (low + high) / 2
  return mid if array[mid] == target

  if array[mid] < target
    binary_search_recursive(array, target, mid + 1, high)
  else
    binary_search_recursive(array, target, low, mid - 1)
  end
end

puts binary_search_recursive([1, 2, 3, 4, 5], 3) # => 2
Memory Analysis for O(log n)
  • Input Space: Array and target value
  • Auxiliary Space: Recursive call stack (log n levels deep)
  • Total: O(log n) space

O(n) - Linear Space

  • Definition: The algorithm’s memory usage grows linearly with input size.
  • Real-World Example: Making a copy of a deck of cards - you need space proportional to the number of cards.
Ruby Code Example: Merge Sort (Space Focus)
def merge_sort_space_example(array)
  return array if array.length <= 1

  mid = array.length / 2
  left = array[0...mid].clone    # O(n/2) space
  right = array[mid..].clone     # O(n/2) space

  # Total O(n) auxiliary space used
  merge(merge_sort_space_example(left),
        merge_sort_space_example(right))
end

puts merge_sort_space_example([5, 3, 8, 6]).inspect # => [3, 5, 6, 8]
Memory Analysis for O(n)
  • Input Space: Original array
  • Auxiliary Space: Two subarrays of size n/2
  • Total: O(n) space

O(n²) - Quadratic Space

  • Definition: The algorithm’s memory usage grows quadratically with input size.
  • Real-World Example: Creating a chess board where each square stores all possible moves from that position.
Ruby Code Example: All Pairs Shortest Path
def floyd_warshall(graph)
  n = graph.length
  # Create n×n distance matrix
  dist = Array.new(n) { |i| Array.new(n) { |j| graph[i][j] } }

  n.times do |k|
    n.times do |i|
      n.times do |j|
        if dist[i][k] && dist[k][j] &&
           (dist[i][j].nil? || dist[i][k] + dist[k][j] < dist[i][j])
          dist[i][j] = dist[i][k] + dist[k][j]
        end
      end
    end
  end
  dist
end

graph = [
  [0, 5, nil, 10],
  [nil, 0, 3, nil],
  [nil, nil, 0, 1],
  [nil, nil, nil, 0]
]
puts floyd_warshall(graph).inspect
Memory Analysis for O(n²)
  • Input Space: Original n×n graph
  • Auxiliary Space: n×n distance matrix
  • Total: O(n²) space

Space Complexity Examples

O(1) - Constant Space

  • Definition: Uses a fixed amount of memory regardless of input size.
  • Example: In-place array element swap
def swap_elements(array, i, j)
  temp = array[i]
  array[i] = array[j]
  array[j] = temp
end
Memory Visualization
Input Array:  [4, 2, 7, 1]    Only one extra variable (temp)
               ↑  ↑            regardless of array size
              i=0 j=1

Memory Used:  +---------------+
Temp:         | temp = 4      |  O(1) extra space
              +---------------+

After Swap:   [2, 4, 7, 1]    Original array modified
               ↑  ↑            in-place

O(log n) - Logarithmic Space

  • Definition: Memory usage grows logarithmically with input size.
  • Example: Recursive binary search call stack
def binary_search_recursive(array, target, low = 0, high = array.length - 1)
  return -1 if low > high
  mid = (low + high) / 2
  return mid if array[mid] == target
  array[mid] < target ? binary_search_recursive(array, target, mid + 1, high) :
                       binary_search_recursive(array, target, low, mid - 1)
end
Memory Visualization
Input: [1, 2, 3, 4, 5, 6, 7, 8]

Call Stack Growth (searching for 7):
                                        Stack Frames
+------------------+                 +---------------+
|[1,2,3,4,5,6,7,8] |  First call     | low=0, high=7 |
+------------------+                 +---------------+
         ↓                                  ↓
+------------------+                 +---------------+
|    [5,6,7,8]     |  Second call    | low=4, high=7 |
+------------------+                 +---------------+
         ↓                                  ↓
+------------------+                 +---------------+
|      [7,8]       |  Third call     | low=6, high=7 |
+------------------+                 +---------------+

Total Space: O(log n) stack frames

O(n) - Linear Space

  • Definition: Memory usage grows linearly with input size.
  • Example: Creating a reversed copy of an array
def reverse_copy(array)
  result = Array.new(array.length)
  array.each_with_index { |elem, i| result[array.length - 1 - i] = elem }
  result
end
Memory Visualization
Input:    [1, 2, 3, 4]    Original array (n elements)
           ↓  ↓  ↓  ↓     Each element copied
Result:   [4, 3, 2, 1]    New array (n elements)

Memory Usage:
+---+---+---+---+  Input Array  (n space)
| 1 | 2 | 3 | 4 |
+---+---+---+---+
        +
+---+---+---+---+  Result Array (n space)
| 4 | 3 | 2 | 1 |
+---+---+---+---+

Total Space: O(n)

O(n²) - Quadratic Space

  • Definition: Memory usage grows quadratically with input size.
  • Example: Creating a distance matrix for graph vertices
def create_distance_matrix(vertices)
  # Note: calculate_distance is assumed to be defined elsewhere,
  # e.g. the Euclidean distance between two vertices.
  Array.new(vertices.length) { |i|
    Array.new(vertices.length) { |j|
      calculate_distance(vertices[i], vertices[j])
    }
  }
end
Memory Visualization
Input Vertices: [A, B, C]    3 vertices

Distance Matrix:
     A   B   C
   +---+---+---+
A  | 0 | 2 | 5 |     Each cell stores a
   +---+---+---+     distance value
B  | 2 | 0 | 4 |
   +---+---+---+     Total cells = n × n
C  | 5 | 4 | 0 |
   +---+---+---+

Memory Growth Pattern:
n = 2:  4 cells   [■ ■]
                  [■ ■]

n = 3:  9 cells   [■ ■ ■]
                  [■ ■ ■]
                  [■ ■ ■]

n = 4: 16 cells   [■ ■ ■ ■]
                  [■ ■ ■ ■]
                  [■ ■ ■ ■]
                  [■ ■ ■ ■]

Total Space: O(n²)

Summary and Best Practices

Space Complexity | Best Used When | Trade-offs
O(1) | Memory is very limited | May need more computation time
O(log n) | Balanced memory-speed needs | Good for large datasets
O(n) | Speed is a priority over memory | Acceptable for most cases
O(n²) | Problem requires storing all pairs | Only for small inputs

Conclusion

Understanding both time and space complexity is crucial for writing efficient algorithms. While time complexity helps optimize execution speed, space complexity ensures efficient memory usage. The key is finding the right balance based on your specific requirements:

  1. For memory-constrained environments, prioritize space complexity
  2. For performance-critical applications, focus on time complexity
  3. For balanced applications, consider algorithms with reasonable trade-offs (like O(n log n) time with O(n) space)

Remember that these are theoretical measures, and real-world performance can be affected by factors like:

  • Hardware capabilities
  • Input data patterns
  • Implementation details
  • System load
]]>
https://itsgg.com/2025/01/15/time-and-space-complexity.html hacker-news-small-sites-42721416 Thu, 16 Jan 2025 04:56:24 GMT
<![CDATA[A surprising scam email that evaded Gmail's spam filter]]> thread link) | @jez
January 15, 2025 | https://jamesbvaughan.com/phishing/ | archive.org

I received a surprising scammy email today, and I ended up learning some things about email security as a result.

Here’s the email:

The scammy email

I was about to mark it as spam in Gmail and move on, but I noticed a couple things that intrigued me.

At first glance, this appeared to be a legitimate PayPal invoice email. It looked like someone set their seller name to be “Don’t recognize the seller?Quickly let us know +1(888) XXX-XXXX”, but with non-ASCII numerals, probably to avoid some automated spam detection.

But then I noticed that the email’s “to” address was not mine, and I did not recognize it.

Gmail’s view of the email’s details

This left me pretty confused, wondering:

  • How did it end up in my inbox?
  • How is there a legitimate looking “signed-by: paypal.com” field in Gmail’s UI?
  • Why didn’t Gmail catch this as spam?
  • If this is a real PayPal invoice and they have my email address, why didn’t they send it to me directly?

How did this end up in my inbox and how was it signed by PayPal?

After downloading the message and reading through the headers, I believe I understand how it ended up in my inbox.

The scammer owns at least three relevant things here:

  • The email address in the to field
  • The domain in the mailed-by field
  • A PayPal account with the name set to “Don’t recognize the seller?Quickly let us know +1(888) XXX-XXXX”

I believe they sent themselves a PayPal invoice, and then crafted an email to send me using that email’s body. They had to leave the body completely unmodified so that they could still include headers that would show that it’s been signed by PayPal, but they were still able to modify the delivery address to get it sent to me.

Why didn’t Gmail catch this and mark it as spam?

If that’s correct, it explains how it ended up in my inbox and why it appears to have been legitimately signed by PayPal, but I still believe Gmail should have caught this.

I would have expected that, for a service as significant as PayPal, Gmail would have at a minimum a hard-coded rule that marks emails as spam if they're signed by PayPal but mailed by an unrecognized domain.

Fortunately, PayPal seems to be doing what they can to mitigate the risk here by:

  • Trying to prevent seller names from including phone numbers, although this email is evidence that they could be doing more here and should prevent more creative ways to sneak phone numbers into names.
  • Including the invoicee’s email address at the top of the body of the email. This was the first thing that tipped me off that something interesting was going on here.

Why didn’t the scammer send the invoice to me directly?

I suspect that they didn’t send the invoice to my email address directly so that it wouldn’t show up in my actual PayPal account, where I’d likely have more tools to identify it as a scam and report it to PayPal more easily.

]]>
https://jamesbvaughan.com/phishing/ hacker-news-small-sites-42721320 Thu, 16 Jan 2025 04:43:14 GMT
<![CDATA[Branchless UTF-8 Encoding]]> thread link) | @todsacerdoti
January 15, 2025 | https://cceckman.com/writing/branchless-utf8-encoding/ | archive.org

Can you encode UTF-8 without branches?

Yes.

The question

In a Recurse chat, Nathan Goldbaum asked:

I know how to decode UTF-8 using bitmath and some LUTs (see https://github.com/skeeto/branchless-utf8), but if I want to to go from a codepoint to UTF-8, is there a way to do it without branches?

To start with, is there a way to write this C function, which returns the number of bytes needed to store the UTF-8 bytes for the codepoint without any branches? Or would I need a huge look-up-table?

The C function
int
num_utf8_bytes_for_codepoint(uint32_t code)
{
    if (code <= 0x7F) {
        return 1;
    }
    else if (code <= 0x07FF) {
        return 2;
    }
    else if (code <= 0xFFFF) {
        if ((code >= 0xD800) && (code <= 0xDFFF)) {
            // surrogates are invalid UCS4 code points
            return -1;
        }
        return 3;
    }
    else if (code <= 0x10FFFF) {
        return 4;
    }
    else {
        // codepoint is outside the valid unicode range
        return -1;
    }
}

I pondered this but didn’t immediately see a way to do it without a huge (2^32) lookup table.

The almost answer

Until Lorenz pointed out:

Very handwavy idea: encode a 32 bit code point into utf8 but store the result in a 32bit word again. Count the number of leading / trailing zeroes to figure out how many bytes are necessary. Write four bytes into the output buffer but only advance your position in the output by the number of bytes you really need.

Aha!

The number of leading zeros will range from 11 to 32 – a very reasonable size for a lookup table. From there, we could look up other parameters by length (no more than 4).

I fired off a draft into the chat, then came back to test (and fix) it in the evening. When I got the tests passing, it looked like this:

/// Return the number of bytes required to UTF-8 encode a codepoint.
/// Returns 0 for surrogates and out-of-bounds values.
const fn utf8_bytes_for_codepoint(codepoint: u32) -> usize {
    let len = LEN[codepoint.leading_zeros() as usize] as usize;

    // Handle surrogates via bit-twiddling.
    // Rust guarantees true == 1 and false == 0:
    let surrogate_bit = ((codepoint >= 0xD800) && (codepoint <= 0xDFFF)) as usize;
    // Extend that one bit into three, and use its inverse as a mask for length
    let surrogate_mask = surrogate_bit << 2 | surrogate_bit << 1 | surrogate_bit;

    // Handle exceeded values via bit-twiddling.
    // Unfortunately, these don't align precisely with a leading-zero boundary;
    // the largest codepoint is U+10FFFF.
    let exceeded_bit = (codepoint > 0x10_FFFF) as usize;
    let exceeded_mask = exceeded_bit << 2 | exceeded_bit << 1 | exceeded_bit;

    len & !surrogate_mask & !exceeded_mask
}

/// Length, based on the number of leading zeros.
const LEN: [u8; 33] = [
    // 0-10 leading zeros: not valid
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    // 11-15 leading zeros: 4 bytes
    4, 4, 4, 4, 4,
    //16-20 leading zeros: 3 bytes
    3, 3, 3, 3, 3,
    // 21-24 leading zeros: 2 bytes
    2, 2, 2, 2,
    // 25-32 leading zeros: 1 byte
    1, 1, 1, 1, 1, 1, 1, 1,
];



/// Encode a UTF-8 codepoint.
/// Returns a buffer and the number of valid bytes in the buffer.
///
/// To add this codepoint to a string, append all four bytes in order,
/// and record that (usize) bytes were added to the string.
///
/// Returns a length of zero for invalid codepoints (surrogates and out-of-bounds values).
pub fn branchless_utf8(codepoint: u32) -> ([u8; 4], usize) {
    let len = utf8_bytes_for_codepoint(codepoint);
    let buf = [
        PREFIX[len][0] | ((codepoint >> SHIFT[len][0]) & MASK[len][0] as u32) as u8,
        PREFIX[len][1] | ((codepoint >> SHIFT[len][1]) & MASK[len][1] as u32) as u8,
        PREFIX[len][2] | ((codepoint >> SHIFT[len][2]) & MASK[len][2] as u32) as u8,
        PREFIX[len][3] | ((codepoint >> SHIFT[len][3]) & MASK[len][3] as u32) as u8,
    ];

    (buf, len)
}

type Table = [[u8; 4]; 5];

// Byte prefix for a continuation byte.
const CONTINUE: u8 = 0b1000_0000;
const PREFIX: Table = [
    [0u8; 4],
    [0, 0, 0, 0],
    [0b1100_0000, CONTINUE, 0, 0],
    [0b1110_0000, CONTINUE, CONTINUE, 0],
    [0b1111_0000, CONTINUE, CONTINUE, CONTINUE],
];

// We must arrange that the most-significant bytes are always in byte 0.
const SHIFT: Table = [
    [0u8; 4],
    [0, 0, 0, 0],
    [6, 0, 0, 0],
    [12, 6, 0, 0],
    [18, 12, 6, 0],
];

const MASK: Table = [
    [0u8; 4],
    [0x7f, 0, 0, 0],
    [0x1f, 0x3f, 0, 0],
    [0x0f, 0x3f, 0x3f, 0],
    [0x07, 0x3f, 0x3f, 0x3f],
];
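As a usage sketch (push_codepoint is just an illustrative helper, not part of the original code), a caller appends only the first len bytes of the returned buffer:

fn push_codepoint(out: &mut Vec<u8>, codepoint: u32) {
    // Append only the valid prefix of the fixed-size buffer.
    let (buf, len) = branchless_utf8(codepoint);
    out.extend_from_slice(&buf[..len]);
}

fn main() {
    let mut out = Vec::new();
    push_codepoint(&mut out, 0x1F600); // U+1F600 GRINNING FACE
    assert_eq!(out, "😀".as_bytes()); // [0xF0, 0x9F, 0x98, 0x80]
}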

The branches

No if statements, loops, or other conditionals. So, branchless, right?

…well, no. If we peek at the (optimized) code in Compiler Explorer, we can see the x86_64 assembly has two different kinds of branches.

Count leading zeros

There’s a branch right at the start of the function:

            test    esi, esi
            je      .LBB0_1
            bsr     eax, esi
            xor     eax, 31
            jmp     .LBB0_3
    .LBB0_1:
            mov     eax, 32
    .LBB0_3:
            mov     eax, eax

I wasn’t sure what this was about until I stepped through it. The “special” case seems to be when the input (esi) is zero; then it returns 32.

Why the special case? Compiler Explorer’s tooltip for the bsr instruction says:

If the content source operand is 0, the content of the destination operand is undefined.

So on x86_64 processors, we have to branch to say “a 32-bit zero value has 32 leading zeros”. Put differently, the “count leading zeros” intrinsic isn’t necessarily a branchless instruction. This might look nicer on another architecture!
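For what it's worth, the zero case is fully defined at the language level, so the branch comes from code generation rather than from Rust's semantics:

// Rust defines leading_zeros for zero; the branch exists because the x86_64
// bsr instruction leaves its result undefined when the input is zero.
assert_eq!(0u32.leading_zeros(), 32);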

Bounds checks

The other jump seems to be a conflation of the several array-bounds checks.

        cmp     eax, 4
        ja      .LBB0_5
        ...
LBB0_5:
        lea     rdx, [rip + .L__unnamed_5]
        mov     esi, 5
        mov     rdi, rax
        call    qword ptr [rip + core::panicking::panic_bounds_check::h8307ccead484a122@GOTPCREL]

All of the lookup arrays have the same bound (4), so the compiler can decide to only check once – and still get Rust's famous safety guarantees.

In principle, if the compiler optimized through the LEN table, it could eliminate this check as well; the LEN value is never greater than 4, which is a valid index for all tables. But apparently the constants don’t propagate that far.

Eliminating branching

Changing the code and dropping to unsafe array accesses eliminates the array bounds check. But there's still the count-leading-zeros branch at the start. Can we get rid of that?

Let’s take another look at a bit of the code – specifically, how we handle out-of-bounds values:

let exceeded_bit = (codepoint > 0x10_FFFF) as usize;

The trick I pulled here was to cast a boolean (true or false) to an integer (1 or 0). Rust's semantics guarantee this conversion is safe, and it happens to be a representation the hardware can work with; it doesn't appear to incur a conditional after compilation.

I used these booleans-as-integers to perform masking to zero. But you know what else we can do with integers?

Addition.

The answer

We can get rid of all the branches by tweaking the length-computing function:

const fn utf8_bytes_for_codepoint(codepoint: u32) -> usize {
    let mut len = 1;
    // In Rust, true casts to 1 and false to 0, so we can "just" sum lengths.
    len += (codepoint > 0x7f) as usize;
    len += (codepoint > 0x7ff) as usize;
    len += (codepoint > 0xffff) as usize;

    // As before:
    let surrogate_bit = ((codepoint >= 0xD800) && (codepoint <= 0xDFFF)) as usize;
    let surrogate_mask = surrogate_bit << 2 | surrogate_bit << 1 | surrogate_bit;
    let exceeded_bit = (codepoint > 0x10_FFFF) as usize;
    let exceeded_mask = exceeded_bit << 2 | exceeded_bit << 1 | exceeded_bit;

    len & !surrogate_mask & !exceeded_mask
}

This is the answer to Nathan’s original question, about working out the number of bytes. Compiler explorer confirms that, with optimizations enabled, this function is branchless.

Happily, this transformation also allowed the compiler to realize len <= 4 on all paths, and to statically eliminate the array bounds check. That means the full code is branchless as well. Victory!

The caveats

While this is branchless, I make absolutely no claim that it is optimized – my only goal here was a proof-of-concept of branchlessness. I haven’t even benchmarked it!

Chris Wellons notes in his post about branchless decoding that a DFA-based decoder can have similar performance; SIMD and other “use what the hardware gives you” techniques are probably even better. I wouldn’t bet on my encoder over the one in your favorite standard library.

I also make no claims of usefulness. But you’re welcome to do just about anything with the code: I hereby release it under the MIT license. The full code is here, along with the tests I used to match it against Rust’s implementation.

Thanks!

Thanks Nathan for the question and Lorenz for the insights! Any mistakes remaining are my own – give me a shout if you spot them!

]]>
https://cceckman.com/writing/branchless-utf8-encoding/ hacker-news-small-sites-42721134 Thu, 16 Jan 2025 04:17:09 GMT
<![CDATA[Gaming TruthfulQA: Simple Heuristics Exposed Dataset Weaknesses]]> thread link) | @jxmorris12
January 15, 2025 | https://turntrout.com/original-truthfulqa-weaknesses | archive.org

Unable to retrieve article]]>
https://turntrout.com/original-truthfulqa-weaknesses hacker-news-small-sites-42720401 Thu, 16 Jan 2025 02:41:08 GMT
<![CDATA[The Myth of Down Migrations]]> thread link) | @CoffeeOnWrite
January 15, 2025 | https://atlasgo.io/blog/2024/04/01/migrate-down | archive.org

TL;DR

Ever since my first job as a junior engineer, the seniors on my team told me that whenever I make a schema change I must write the corresponding "down migration", so it can be reverted at a later time if needed. But what if that advice, while well-intentioned, deserves a second look?

Today, I want to argue that contrary to popular belief, down migration files are actually a bad idea and should be actively avoided.

In the final section, I'll introduce an alternative that may sound completely contradictory: the new migrate down command. I will explain the thought process behind its creation and show examples of how to use it.

Background

Since the beginning of my career, I have worked in teams where, whenever it came to database migrations, we were writing "down files" (ending with the .down.sql file extension). This was considered good practice and an example of how a "well-organized project should be."

Over the years, as my career shifted to focus mainly on infrastructure and database tooling in large software projects (at companies like Meta), I had the opportunity to question this practice and the reasoning behind it.

Down migrations were an odd thing. In my entire career, working on projects with thousands of down files, I never applied them on a real environment. As simple as that: not even once.

Furthermore, from the day we started Atlas to this very day, we have interviewed countless software engineers from virtually every industry. In all of these interviews, we have met only a single team that routinely applied down files in production (and even they were not happy with how it worked).

Why is that? Why is it that down files are so popular, yet so rarely used? Let's dive in.

Down migrations are the naively optimistic plan for a grim and unexpected world

Down migrations are supposed to be the "undo" counterpart of the "up" migration. Why do "undo" buttons exist? Because mistakes happen, things fail, and then we want a way to quickly and safely revert them. Database migrations are considered something we should do with caution, they are super risky! So, it makes sense to have a plan for reverting them, right?

But consider this: when we write a down file, we are essentially writing a script that will be executed in the future to revert the changes we are about to make. This script is written before the changes are applied, and it is based on the assumption that the changes will be applied correctly. But what if they are not?

When do we need to revert a migration? When it fails. But if it fails, it means that the database might be in an unknown state. It is quite likely that the database is not in the state that the down file expects it to be. For example, if the "up" migration was supposed to add two columns, the down file would be written to remove these two columns. But what if the migration was partially applied and only one column was added? Running the down file would fail, and we would be stuck in an unknown state.

Rolling back additive changes is a destructive operation

When you are working on a local database, without real traffic, having the up/down mechanism for migrations might feel like hitting Undo and Redo in your favorite text editor. But in a real environment, it is not the case.

If you successfully rolled out a migration that added a column to a table, and then decided to revert it, its inverse operation (DROP COLUMN) does not merely remove the column. It deletes all the data in that column. Re-applying the migration would not bring back the data, as it was lost when the column was dropped.

For this reason, teams that want to temporarily deploy a previous version of the application, usually do not revert the database changes, because doing so will result in data loss for their users. Instead, they need to assess the situation on the ground and figure out some other way to handle the situation.

Down migrations are incompatible with modern deployment practices

Many modern deployment practices like Continuous Delivery (CD) and GitOps advocate for the software delivery process to be automated and repeatable. This means that the deployment process should be deterministic and should not require manual intervention. A common way of doing this is to have a pipeline that receives a commit, and then automatically deploys the build artifacts from that commit to the target environment.

As it is very rare to encounter a project with a 0% change failure rate, rolling back a deployment is a common scenario.

In theory, rolling back a deployment should be as simple as deploying the previous version of the application. When it comes to versions of our application code, this works perfectly. We pull the container image that corresponds to the previous version, and we deploy it.

But what about the database? When we pull artifacts from a previous version, they do not contain the down files that are needed to revert the database changes back to the necessary schema - they were only created in a future commit!

For this reason, rollbacks to versions that require reverting database changes are usually done manually, going against the efforts to automate the deployment process by modern deployment practices.

How do teams work around this?

In previous companies I worked for, we faced the same challenges. The tools we used to manage our database migrations advocated for down migrations, but we never used them. Instead, we had to develop some practices to support a safe and automated way of deploying database changes. Here are some of the practices we used:

Migration Rollbacks

When we worked with PostgreSQL, we always tried to make migrations transactional and made sure to isolate the DDLs that prevent this, like CREATE INDEX CONCURRENTLY, into separate migrations. In case the deployment failed, for instance due to a data-dependent change, the entire migration was rolled back, and the application was not promoted to the next version. By doing this, we avoided the need to run down migrations, as the database was left in the same state as it was before the deployment.
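A sketch of what that separation looks like in practice (the table and index names here are illustrative):

-- Runs inside a transaction; any failure rolls the whole migration back.
BEGIN;
ALTER TABLE orders ADD COLUMN status varchar(32) NOT NULL DEFAULT 'pending';
ALTER TABLE orders ADD CONSTRAINT orders_status_not_empty CHECK (status <> '');
COMMIT;

-- Isolated in its own migration, since CREATE INDEX CONCURRENTLY cannot run inside a transaction.
CREATE INDEX CONCURRENTLY orders_status_idx ON orders (status);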

Non-transactional DDLs

When we worked with MySQL, which I really like as a database but hate when it comes to migrations, it was challenging. Since MySQL does not support transactional DDLs, failures were more complex to handle. If a migration contained more than one DDL and unexpectedly failed in the middle, because of a constraint violation or another error, we were stuck in an intermediate state that couldn't be automatically reverted by applying a "revert file".

Most of the time, it required special handling and expertise in the data and product. We mainly preferred fixing the data and moving forward rather than dropping or altering the changes that were applied - which was also impossible if the migration introduced destructive changes (e.g., DROP commands).

Making changes Backwards Compatible

A common practice in schema migrations is to make them backwards compatible (BC). We stuck to this approach, and also made it the default behavior in Ent. When schema changes are BC, applying them before starting a deployment should not affect older instances of the app, and they should continue to work without any issues (in rolling deployments, there is a period where two versions of the app are running at the same time).

When there is a need to revert a deployment, the previous version of the app remains fully functional without any issues - if you are an Ent user, this is one of the reasons we avoid SELECT * in Ent. Using SELECT * can also break BC for additive changes, like adding a new column, as the application expects to retrieve N columns but unexpectedly receives N+1.
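For illustration (the table and column names are made up), here is an additive change that keeps backwards compatibility next to one that breaks older versions of the app:

-- Backwards compatible: older app versions simply ignore the new column.
ALTER TABLE users ADD COLUMN nickname varchar(255) NULL;

-- Not backwards compatible: older app versions still reference the old column name.
ALTER TABLE users RENAME COLUMN nickname TO display_name;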

Deciding Atlas would not support down migrations

When we started Atlas, we had the opportunity to design a new tool from scratch. Seeing as "down files" never helped us solve failures in production, from the very beginning of Atlas, Rotem and I agreed that down files should not be generated - except for cases where users use Atlas to generate migrations for other tools that expect these files, such as Flyway or golang-migrate.

Immediately after Atlas' initial release some two years ago, we started receiving feedback from the community that put this decision in question. The main questions were: "Why doesn't Atlas support down migrations?" and "How do I revert local changes?".

Whenever the opportunity came to engage in such discussions, we eagerly participated and even pursued verbal discussions to better understand the use cases. The feedback and the motivation behind these questions were mainly:

  1. It is challenging to experiment with local changes without some way to revert them.
  2. There is a need to reset dev, staging or test-like environments to a specific schema version.

Declarative Roll-forward

Considering this feedback and the use cases, we went back to the drawing board. We came up with an approach that was primarily about improving developer ergonomics and was in line with the declarative approach that we were advocating for with Atlas. We named this approach "declarative roll-forward".

Albeit, it was not a "down migration" in the traditional sense, it helped to revert applied migrations in an automated way. The concept is based on a three-step process:

  1. Use atlas schema apply to plan a declarative migration, using a target revision as the desired state:
atlas schema apply \
--url "mysql://root:pass@localhost:3306/example" \
--to "file://migrations?version=20220925094437" \
--dev-url "docker://mysql/8/example" \
--exclude "atlas_schema_revisions"

This step requires excluding the atlas_schema_revisions table, which tracks the applied migrations, to avoid deleting it when reverting the schema.

  1. Review the generated plan and apply it to the database.

  2. Use the atlas migrate set command to update the revisions table to the desired version:

atlas migrate set 20220925094437 \
--url "mysql://root:pass@localhost:3306/example" \
--dir "file://migrations"

This worked for the defined use cases. However, we felt that our workaround was a bit clunky as it required a three-step process to achieve the result. We agreed to revisit this decision in the future.

Revisiting the down migrations

In recent months, the question of down migrations was raised again by a few of our customers, and we dove into it again with them. I always try to approach these discussions with an open mind, and listen to the different points of view and use cases that I personally haven't encountered before.

Our discussions highlighted the need for a more elegant and automated way to perform deployment rollbacks in remote environments. The solution should address situations where applied migrations need to be reverted, regardless of their success, failure, or partial application, which could leave the database in an unknown state.

The solution needs to be automated, correct, and reviewable, as it could involve data deletion. The solution can't be the "down files", because although their generation can be automated by Atlas and reviewed in the PR stage, they cannot guarantee correctness when applied to the database at runtime.

After weeks of design and experimentation, we introduced a new command to Atlas named migrate down.

Introducing: migrate down

The atlas migrate down command allows reverting applied migrations. Unlike the traditional approach, where down files are "pre-planned", Atlas computes a migration plan based on the current state of the database. Atlas reverts previously applied migrations and executes them until the desired version is reached, regardless of the state of the latest applied migration — whether it succeeded, failed, or was partially applied and left the database in an unknown version.

By default, Atlas generates and executes a set of pre-migration checks to ensure the computed plan does not introduce data deletion. Users can review the plan and execute the checks before the plan is applied to the database by using the --dry-run flag or the Cloud as described below. Let's see it in action on local databases:

Reverting locally applied migrations

Assume a migration file named 20240305171146.sql was the last one applied to the database and needs to be reverted. Before deleting it, run atlas migrate down to revert the last applied migration:

The MySQL variant is shown below; equivalent examples exist for MariaDB, PostgreSQL, SQLite, SQL Server, and ClickHouse.

atlas migrate down \
--dir "file://migrations" \
--url "mysql://root:pass@localhost:3306/example" \
--dev-url "docker://mysql/8/dev"
Migrating down from version 20240305171146 to 20240305160718 (1 migration in total):

-- checks before reverting version 20240305171146
-> SELECT NOT EXISTS (SELECT 1 FROM `logs`) AS `is_empty`
-- ok (50.472µs)

-- reverting version 20240305171146
-> DROP TABLE `logs`
-- ok (53.245µs)

-------------------------
-- 57.097µs
-- 1 migration
-- 1 sql statement

Notice two important things in the output:

  1. Atlas automatically generated a migration plan to revert the applied migration 20240305171146.sql.
  2. Before executing the plan, Atlas ran a pre-migration check to ensure the plan does not introduce data deletion.

After downgrading your database to the desired version, you can safely delete the migration file 20240305171146.sql from the migration directory by running atlas migrate rm 20240305171146.

Then, you can generate a new migration version using the atlas migrate diff command with the optional --edit flag to open the generated file in your default editor.

For local development, the command met our expectations. It is automated, correct in the sense that it undoes only the files being reverted, and reviewable using the --dry-run flag or the Cloud. But what about real environments?

Reverting real environments

For real environments, we're introducing another feature in Atlas Cloud today: the ability to review and approve changes for specific projects and commands. In practice, this means if we trigger a workflow that reverts schema changes in real environments, we can configure Atlas to wait for approval from one or more reviewers.

Here's what it looks like:

Screenshots: a down-migration plan in Atlas Cloud marked "Review Required", shown first waiting for approval and then approved and applied.

With this new feature, down migrations are reviewable. But what about their safety and automation? As mentioned above, non-transactional DDLs can really leave us in trouble in case they fail, potentially keeping the database in an unknown state that is hard to recover from - it takes time and requires caution. However, this is true not only for applied (up) migrations but also for their inverse: down migrations. If the database we operate on does not support transactional DDLs, and we fail in the middle of the execution, we are in trouble.

For this reason, when Atlas generates a down migration plan, it considers the database (and its version) and the necessary changes. If a transactional plan that is executable as a single unit can be created, Atlas will opt for this approach. If not, Atlas reverts the applied statements one at a time, ensuring that the database is not left in an unknown state in case a failure occurs midway. If we fail for any reason during the migration, we can rerun Atlas to continue from where it failed. Let's explain this with an example:

Suppose we want to revert these two versions:

migrations/20240329000000.sql

ALTER TABLE users DROP COLUMN account_name;

migrations/20240328000000.sql

ALTER TABLE users ADD COLUMN account_id int;
ALTER TABLE accounts ADD COLUMN plan_id int;

Let's see how Atlas handles this for databases that support transactional DDLs, like PostgreSQL, and those that don't, like MySQL:

  • For PostgreSQL, Atlas starts a transaction and checks that account_id and plan_id do not contain data before they are dropped. Then, Atlas applies a single ALTER statement on the users table that adds back the account_name column and drops the account_id column. Next, Atlas executes the other ALTER statement to drop the plan_id column from the accounts table. If any of these statements fail, the transaction is rolled back, and the database is left in the same state as before. If we succeed, the revisions table is updated, and the transaction is committed.

  • For MySQL, we can't execute the entire plan as a single unit. This means the same plan cannot be executed, because if we fail in the middle, this intermediate state does not represent any version of the database or the migration directory. Thus, when migrating down, Atlas first applies the ALTER statement to undo 20240329000000 and updates the revisions table. Then, it will undo 20240328000000 statement by statement, and update the revisions table after each successful step. If we fail in the middle, we can re-run Atlas to continue from where it failed.

What we achieved with the new migrate down command is a safer and more automated way to revert applied migrations.

Down options

By default, atlas migrate down reverts the last applied file. However, you can pass the number of migrations to revert as an argument, or a target version or tag as a flag. For instance, atlas migrate down 2 will revert up to 2 applied migrations, while atlas migrate down --to-tag 297cc2c will undo all migrations applied after the state of the migration directory at this tag.
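
Putting those variations together, here is a rough sketch of the invocations described above, reusing the example flags and URLs from earlier in this post (treat it as illustrative rather than a complete flag reference):

# Preview the plan for the last applied file without executing it
atlas migrate down \
  --dir "file://migrations" \
  --url "mysql://root:pass@localhost:3306/example" \
  --dev-url "docker://mysql/8/dev" \
  --dry-run

# Revert up to two applied migrations
atlas migrate down 2 \
  --dir "file://migrations" \
  --url "mysql://root:pass@localhost:3306/example" \
  --dev-url "docker://mysql/8/dev"

# Revert back to the directory state at a given tag
atlas migrate down --to-tag 297cc2c \
  --dir "file://migrations" \
  --url "mysql://root:pass@localhost:3306/example" \
  --dev-url "docker://mysql/8/dev"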

GitHub Actions Integration

In addition to the CLI, we also added an integration with GitHub Actions. If you have already connected your project to the Schema Registry and use GitHub Actions, you can set up a workflow that gets a commit and triggers Atlas to migrate down the database to the version defined by the commit. The workflow will wait for approval and then apply the plan to the database once approved. For more info, see the Down Action documentation.

Atlas GitHub Action

Wrapping up

In retrospect, I'm glad we did not implement the traditional approach in the first place. When meeting with users, we listened to their problems and focused on their expected outcomes rather than the features they asked for. This helped us better understand the problem space instead of focusing the discussion on the implementation. The result we came up with is elegant, probably not perfect (yet), but it successfully avoided the issues that bother me most about pre-planned files.

What's next? We're opening it for GA today and invite you to share your feedback and suggestions for improvements on our Discord server.

]]>
https://atlasgo.io/blog/2024/04/01/migrate-down hacker-news-small-sites-42720379 Thu, 16 Jan 2025 02:39:08 GMT
<![CDATA[Show HN: Transform Images into Descriptive Text with AI Image Reader]]> thread link) | @Nataliaaaa
January 15, 2025 | https://alttextgenerator.co/tools/ai-image-reader | archive.org

Key Features of AI Image Reader

Our AI Image Reader offers advanced text extraction capabilities, supporting various image formats and languages. It's perfect for digitizing documents, creating accessible content, and more.

Who Can Benefit from AI Image Reader?

From business professionals to students, anyone needing efficient text extraction from images can benefit. Use it for digitizing documents, language learning, or enhancing accessibility.

  • 🖼️

    AI Art Generator Prompts

    Elevate your creativity with AI-generated image descriptions for Stable Diffusion, Midjourney, or DALL-E. Our detailed prompts inspire unique concepts, perfect for digital artists and designers pushing creative boundaries.

  • 📝

    Enhanced Content Creation

    Transform your content strategy with AI-powered image descriptions. Enrich blogs, articles, and social media with vivid visual narratives, creating an immersive experience for your audience. Ideal for content creators and marketers.

  • 📈

    Boost Social Media Engagement

    Supercharge your social media presence with compelling AI-generated image descriptions. Increase engagement on LinkedIn, Instagram, and Twitter through context-rich narratives that resonate with followers and expand your reach.

  • 🛒

    Optimize E-commerce Listings

    Revolutionize your online store with AI-generated product image descriptions. Enhance customer experience, boost conversions, and reduce returns with clear, SEO-friendly details. Essential for e-commerce businesses in competitive markets.

  • 📊

    Elevate Digital Marketing Campaigns

    Enhance your digital marketing with AI-powered image descriptions. Create compelling ads and eye-catching posts that capture attention and drive conversions. Improve ad performance and ROI across various channels.

  • 🌐

    Maximize SEO for Visual Content

    Boost your visual content's SEO with AI-generated, keyword-rich descriptions. Improve search rankings, enhance image discoverability, and increase website visibility. Essential for SEO specialists and businesses aiming to dominate image search.

Purchase a subscription

Pick the perfect plan based on how many images you need alt text for!

Bronze(100/mo)

Best for light users

  • 100 images/month
  • Basic customer support

popular

Silver(500/mo)

Best for most users

  • 500 images/month
  • Priority Customer Support

Gold(2000/mo)

Best for power users

  • 2000 images/month
  • Priority Customer Support

FAQ

Frequently Asked Questions

]]>
https://alttextgenerator.co/tools/ai-image-reader hacker-news-small-sites-42720219 Thu, 16 Jan 2025 02:18:39 GMT
<![CDATA[Australian Open resorts to animated caricatures to bypass broadcast restrictions]]> thread link) | @defrost
January 15, 2025 | https://www.crikey.com.au/2025/01/16/australian-open-animated-cartoon-caricatures-broadcast-restrictions/ | archive.org

The first Grand Slam of the year is well and truly underway, with the Australian Open at Melbourne Park beginning earlier this week. As one of the biggest events on the Australian and international sporting calendar, it’s available in Australia to watch on free-to-air television through Channel 9, as well as through its associated streaming services and in 4K on its subscription streaming service, Stan Sport. 

However, that ease of access to tennis’ first major of the year is not necessarily replicated worldwide. To watch the Australian Open in Europe, you need access to pay TV channel Eurosport, while the cable channel ESPN broadcasts it in North America. 

Sports fans may have noticed another broadcast option: an animated caricature version. Broadcast on YouTube, the Australian Open’s own channel has streamed select matches using cartoonish avatars of players instead of the actual broadcast.

The novelty broadcast avoids issues with contractual rights overseas by being delayed and by caricaturing the action. Tennis Australia did not respond to Crikey’s questions asking whether it had consulted with broadcast partners ahead of time, or whether it anticipated its “fresh, gamified approach to tennis coverage” impacting sports rights in the future, one of the few remaining consistent sources of revenue in an increasingly precarious media landscape. 

Nine, the domestic rights holders of the Australian Open, declined to comment. 

The Australian Open don’t own all of their broadcasting rights (fairly common), so they’re live-streaming a Wii Tennis-like version of the matches on YouTube – love this ?

This is Carlos Alcaraz’ match point: pic.twitter.com/HvxhYneWGH

— Bastien Fachan (@BastienFachan) January 13, 2025

The technology debuted for the 2024 Australian Open, but Guardian Australia reports that this year has seen a marked increase in interest and viewership. 

Tennis Australia’s director of innovation Machar Reid said the technology used 12 cameras tracking 29 skeletal points that are stitched together to create the reproduction on a two-minute delay. 

Cartoon Nick Kyrgios winces in pain at real Nick Kyrgios’ injured abdomen during his first-round loss to Brit Jacob Fearnley (Image: Australian Open Animated)

The animated feed includes the same commentary and environmental sounds heard on court, all synced with the cartoon images. 

Tennis Australia has funded several startups through its venture capital fund as it looks to push into the technology space, including a failed flirtation with non-fungible tokens (NFTs) that concluded last year. 

The fund, AO Ventures, is worth US$30 million (A$41.8 million) and includes support from Tesla chair Robyn Denholm’s Wollemi Capital Group (which also has investments in the NBL and the Sydney Kings), as well as Art Gallery of NSW chair Mark Nelson and Packer confidante Ashok Jacob.

Have something to say about this article? Write to us at letters@crikey.com.au. Please include your full name to be considered for publication in Crikey’s Your Say. We reserve the right to edit for length and clarity.

]]>
https://www.crikey.com.au/2025/01/16/australian-open-animated-cartoon-caricatures-broadcast-restrictions/ hacker-news-small-sites-42719498 Thu, 16 Jan 2025 00:50:46 GMT
<![CDATA[First business card with an email address?]]> thread link) | @todsacerdoti
January 15, 2025 | https://www.copernican.com/personal.html | archive.org

"Work like you don't need the money, love like you've never been hurt, and dance like nobody's watching."

Here's a random collection of personal items of, for, or about me.

Take a Vacation

We have a 3-bedroom condo available for rent at the beach in Aptos, near Santa Cruz, California.

About "Sacerdoti"

The name "Sacerdoti" is Italian in origin. It's an uncommon name in the US, and it seems it's rare in Italy as well. It's a recognizably Jewish name--Sacerdoti is the Italian word for priests. Like "Cohen," it's derived from the Hebrew "Cohan," which means priest. The Cohanim were the first Jewish priesthood, predating the institution of rabbis. By tradition, they are descended from Aaron, Moses' brother. A group of geneticicsts has recently published a study in Nature indicating that men who have been told they descend from the Cohanim (like me) share common genetic traits in portions of the Y chromosome. (In other words, they may actually be what they think they are!)

Family legend has it that the Sacerdoti migrated from Spain (where their name was presumably Sacerdote) during the Inquisition, and that they established themselves as tutors in the court of the Medici. Thanks to the existence of the world wide web, I've been contacted by Aude Sacerdot, from Paris, who may belong to a distant French branch of the family, by Jonathan Sacerdoti of London, by Dr. Michael G. Sacerdoti of Melbourne, and by Francesco M. Sacerdoti of Naples who works in machine vision and intelligent automation.

I've found a reference to a book by Giancarlo Sacerdoti, called Ricordi di un ebreo bolognese : illusioni e delusioni,1929-1945 (Bonacci, 1983). In English, that's Recollections of a Bologna Jew : Illusions and Delusions. If you can point me to a copy, please send me email.

Relatives, Near and Far

I'm married, with two sons, a daughter, a stepson, and a stepdaughter. My 100-year-old grandmother lives with us, too. Although I've been a user and developer of net technology since 1971, we aren't a very webbified family.

My son Tod is attending Stanford Business School while continuing to co-manage Delicious Karma, which promotes club events in San Francisco. My brother Guy and his son Roland are in Manila and accessible by email.

I've found F. David Sacerdoti, who's probably a distant relative, though we haven't established the linkage yet. He's a grad student at George Washington University. His Sacerdoti come from Milan, whereas mine come from Rome.

Where I've Been

I've had the opportunity to travel a good deal, and I enjoy it. I've been to most of the countries in Western Europe, to Russia and Georgia when they were part of the Soviet Union, to Australia, Japan, Taiwan, Hong Kong, Singapore, Indonesia, and The Philippines in the Pacific, to India, and to South Africa and Zimbabwe. I'm still looking forward to visiting Central and South America, and want to spend some time in Israel and Greece. I've also somehow missed the states of North and South Dakota and Oklahoma in my meanderings.

My Claim for the Guiness Book of World Records

The first business card with an email address?

I have a business card from 1975 with an Internet address (of course, it was an ARPAnet address at the time) printed on it. I haven't been able to find anyone else with a business card older than that with a 'net address. Please send me email if you have one that's older. I was organizing a project at SRI to build software that queried multiple databases distributed around the ARPAnet. Because I was collaborating with folks in Boston, Washington, Los Angeles, and San Diego who were all also on the net, I found myself always jotting my email address on my cards. So when I was promoted and needed new cards, I asked to have my email address printed on them. SRI supported creativity, so they arranged it.

Ray Tomlinson of BBN released the first intercomputer email application in 1972, so there were about three years in which someone else could have produced such a card.

Dance, Creativity, Wholeness, and Fun

I spent a couple of years on staff with the Transformative Movement Workshop, which aims to integrate body, mind, spirit, and emotions through a focus on body movement. I'm certified to teach Tantra Yoga. I belong to the Swordplay fencing club in Concord, where I'm slowly learning to fence. I ski and swim depending on the season. And the core of my life is the network of relationships I have with my family and friends.
Return to the Copernican Group home page

Return to About Earl Sacerdoti

]]>
https://www.copernican.com/personal.html hacker-news-small-sites-42719289 Thu, 16 Jan 2025 00:26:28 GMT
<![CDATA[Nepenthes: Tarpit to catch web crawlers scraping data for LLMs]]> thread link) | @jimmcslim
January 15, 2025 | https://zadzmo.org/code/nepenthes/ | archive.org

This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

It works by generating an endless sequence of pages, each with dozens of links that simply lead back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.

You can take a look at what this looks like, here. (Note: VERY slow page loads!)

THIS IS DELIBERATELY MALICIOUS SOFTWARE INTENDED TO CAUSE HARMFUL ACTIVITY. DO NOT DEPLOY IF YOU AREN'T FULLY COMFORTABLE WITH WHAT YOU ARE DOING.

LLM scrapers are relentless and brutal. You may be able to keep them at bay with this software - but it works by providing them with a never-ending stream of exactly what they are looking for. YOU ARE LIKELY TO EXPERIENCE SIGNIFICANT CONTINUOUS CPU LOAD, ESPECIALLY WITH THE MARKOV MODULE ENABLED.

There is not currently a way to differentiate between web crawlers that are indexing sites for search purposes, vs crawlers that are training AI models. ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.

Latest Version

Nepenthes 1.0

All downloads

Usage

Expected usage is to hide the tarpit behind nginx or Apache, or whatever else you have implemented your site in. Directly exposing it to the internet is ill-advised. We want it to look as innocent and normal as possible; in addition, HTTP headers are used to configure the tarpit.

I'll be using nginx configurations for examples. Here's a real world snippet for the demo above:

    location /nepenthes-demo/ {
            proxy_pass http://localhost:8893;
            proxy_set_header X-Prefix '/nepenthes-demo';
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_buffering off;
    }

You'll see several headers are added here: "X-Prefix" tells the tarpit that all links should go to that path. Make this match what is in the 'location' directive. X-Forwarded-For is optional, but will make any statistics gathered significantly more useful.

The proxy_buffering directive is important. LLM crawlers typically disconnect if not given a response within a few seconds; Nepenthes counters this by drip-feeding a few bytes at a time. Buffering breaks this workaround.

You can have multiple proxies to an individual Nepenthes instance; simply set the X-Prefix header accordingly.
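
For instance, a second location block (the path here is just an example) repeats the same pattern with a different X-Prefix, pointing at the same instance:

    location /second-pit/ {
            proxy_pass http://localhost:8893;
            proxy_set_header X-Prefix '/second-pit';
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_buffering off;
    }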

Installation

You can use Docker, or install manually.

A Dockerfile and compose.yaml is provided in the /docker directory. Simply tweak the configuration file to your preferences, 'docker compose up'. You will still need to bootstrap a Markov corpus if you enable the feature (see next section.)

For Manual installation, you'll need to install Lua (5.4 preferred), SQLite (if using Markov), and OpenSSL. The following Lua modules need to be installed - if they are all present in your package manager, use that; otherwise you will need to install Luarocks and use it to install the following:

Create a nepenthes user (you REALLY don't want this running as root.) Let's assume the user's home directory is also your install directory.

useradd -m nepenthes

Unpack the tarball:

cd scratch/
tar -xvzf nepenthes-1.0.tar.gz
cp -r nepenthes-1.0/* /home/nepenthes/

Tweak config.yml as you prefer (see below for documentation.) Then you're ready to start:

    su -l -u nepenthes /home/nepenthes/nepenthes /home/nepenthes/config.yml

Sending SIGTERM or SIGINT will shut the process down.

Bootstrapping the Markov Babbler

The Markov feature requires a trained corpus to babble from. One was intentionally omitted because, ideally, everyone's tarpits should look different to evade detection. Find a source of text in whatever language you prefer; there's lots of research corpuses out there, or possibly pull in some very long Wikipedia articles, maybe grab some books from Project Gutenberg, the Unix fortune file, it really doesn't matter at all. Be creative!

Training is accomplished by sending data to a POST endpoint. This only needs to be done once. Sending training data more than once cumulatively adds to the existing corpus, allowing you to mix different texts - or train in chunks.

Once you have your body of text, assuming it's called corpus.txt, in your working directory, and you're running with the default port:

curl -XPOST -d @corpus.txt -H'Content-type: text/plain' http://localhost:8893/train

This could take a very, VERY long time - possibly hours. curl may potentially time out. See load.sh in the nepenthes distribution for a script that incrementally loads training data.
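
The bundled load.sh is not reproduced here, but the idea can be sketched with standard tools: split the corpus into chunks and POST each chunk in turn. The chunk size below is arbitrary, and --data-binary is used so newlines are preserved (plain -d strips them):

# Split the corpus and train incrementally
split -l 5000 corpus.txt corpus_chunk_
for f in corpus_chunk_*; do
    curl -XPOST --data-binary @"$f" -H'Content-type: text/plain' \
        http://localhost:8893/train
done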

The Markov module returns an empty string if there is no corpus. Thus, the tarpit will continue to function as a tarpit without a corpus loaded. The extra CPU consumed for this check is almost nothing.

Statistics

Want to see what prey you've caught? There are several statistics endpoints, all returning JSON. To see everything:

http://{http_host:http_port}/stats

To see user agent strings only:

http://{http_host:http_port}/stats/agents

Or IP addresses only:

http://{http_host:http_port}/stats/ips/

These can get quite big, so it's possible to filter both 'agents' and 'ips': simply add a minimum hit count to the URL. For example, to see a list of all IPs that have visited more than 100 times:

http://{http_host:http_port}/stats/ips/100

Simply curl the URLs, pipe into 'jq' to pretty-print as desired. Script away!
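
For example, assuming the default host and port, a pretty-printed dump of the user-agent stats is a one-liner:

curl -s http://localhost:8893/stats/agents | jq .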

Nepenthes used Defensively

A link to a Nepenthes location from your site will flood out valid URLs within your site's domain name, making it unlikely the crawler will access real content.

In addition, the aggregated statistics will provide a list of IP addresses that are almost certainly crawlers and not real users. Use this list to create ACLs that block those IPs from reaching your content - either return 403, 404, or just block at the firewall level.
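
As a rough sketch of that idea, assuming the /stats/ips/ endpoint returns a JSON object keyed by IP address (verify the actual response shape on your instance and adjust the jq filter), the hit-count filter can be turned into an nginx deny list; the output path is just an example:

# Turn IPs with more than 100 hits into nginx deny rules
curl -s http://localhost:8893/stats/ips/100 \
    | jq -r 'keys[] | "deny \(.);"' \
    > /etc/nginx/conf.d/nepenthes-blocklist.conf
# reload nginx afterwards, e.g. nginx -s reload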

Integration with fail2ban or blocklistd (or similar) is a future possibility, allowing realtime reactions to crawlers, but not currently implemented.

Using Nepenthes defensively, it would be ideal to turn off the Markov module and set both max_wait and min_wait to something large, as a way to conserve your CPU.

Nepenthes used Offensively

Let's say you've got horsepower and bandwidth to burn, and just want to see these AI models burn. Nepenthes has what you need:

Don't make any attempt to block crawlers with the IP stats. Put the delay times as low as you are comfortable with. Train a big Markov corpus and leave the Markov module enabled, set the maximum babble size to something big. In short, let them suck down as much bullshit as they have diskspace for and choke on it.

Configuration File

All possible directives in config.yaml (a minimal example follows the list):

  • http_host : sets the host that Nepenthes will listen on; default is localhost only.
  • http_port : sets the listening port number; default 8893
  • prefix: Prefix that all generated links should be given. Can be overridden with the X-Prefix HTTP header. Defaults to nothing.
  • templates: Path to the template files. This should be the '/templates' directory inside your Nepenthes installation.
  • detach: If true, Nepenthes will fork into the background and redirect logging output to Syslog.
  • pidfile: Path to drop a pid file after daemonization. If empty, no pid file is created.
  • max_wait: Longest amount of delay to add to every request. Increase to slow down crawlers; too slow they might not come back.
  • min_wait: The smallest amount of delay to add to every request. A random value is chosen between max_wait and min_wait.
  • real_ip_header: Changes the name of the X-Forwarded-For header that communicates the actual client IP address for statistics gathering.
  • prefix_header: Changes the name of the X-Prefix header that overrides the prefix configuration variable.
  • forget_time: length of time, in seconds, that a given user-agent can go missing before being deleted from the statistics table.
  • forget_hits: A user-agent that generates more than this number of requests will not be deleted from the statistics table.
  • persist_stats: A path to write a JSON file to, that allows statistics to survive across crashes/restarts, etc
  • seed_file: Specifies location of persistent unique instance identifier. This allows two instances with the same corpus to have different looking tarpits.
  • words: path to a dictionary file, usually '/usr/share/dict/words', but could vary depending on your OS.
  • markov: Path to a SQLite database containing a Markov corpus. If not specified, the Markov feature is disabled.
  • markov_min: Minimum number of words to babble on a page.
  • markov_max: Maximum number of words to babble on a page. Very large values can cause serious CPU load.
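
Putting a few of these together, a minimal config might look like the sketch below. It assumes the directives are flat top-level keys, as the list above suggests; compare it against the config.yml shipped in the tarball, and treat the values as illustrative rather than as recommendations:

cat > /home/nepenthes/config.yml <<'EOF'
http_host: "127.0.0.1"
http_port: 8893
templates: "/home/nepenthes/templates"
words: "/usr/share/dict/words"
min_wait: 2
max_wait: 10
# markov: "/home/nepenthes/markov.db"   # uncomment once a trained corpus exists
EOF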

History

Version numbers use a simple process: If the only changes are fully backwards compatible, the minor number changes. If the user/administrator needs to change anything after or part of the upgrade, the major number changes and the minor number resets to zero.

v1.0: Initial release

]]>
https://zadzmo.org/code/nepenthes/ hacker-news-small-sites-42718940 Wed, 15 Jan 2025 23:51:26 GMT
<![CDATA[Developer Workflows That Are Ripe for AI Automation]]> thread link) | @chw9e
January 15, 2025 | https://qckfx.com/blog/10-developer-workflows-that-are-ripe-for-ai-automation | archive.org

Unable to extract article]]>
https://qckfx.com/blog/10-developer-workflows-that-are-ripe-for-ai-automation hacker-news-small-sites-42718641 Wed, 15 Jan 2025 23:21:17 GMT
<![CDATA[Will China welcome TikTok refugees?]]> thread link) | @defrost
January 15, 2025 | https://www.crikey.com.au/2025/01/16/will-china-welcome-tiktok-refugees/ | archive.org

TikTok, the world’s fifth most popular social media platform, is running out of time. Its fate in the US will be determined on January 19, when the US Supreme Court will decide whether to accept the argument from TikTok’s legal team that the proposed ban of the platform constitutes an infringement of the First Amendment and should therefore be thrown out.

In the meantime, anticipating TikTok’s likely loss in this legal battle, thousands of TikTok users are signing up with the Chinese platform Xiaohongshu (known in English as Little Red Book or RedNote; hereafter “Red”). Some TikTok influencers are also moving their channels to Red, taking with them thousands of followers they have accumulated on TikTok.

The move by users who call themselves “TikTok refugees” has made RedNote the most downloaded app on Apple’s US App Store, as of Monday. It’s hard to estimate the exact number of migrating users, but some estimate that it’s likely only a small fraction of TikTok’s 170 million American users.

Red is currently the most popular social media platform in China and within the Chinese diaspora. Widely seen as China’s solution to Instagram, Red started as a shopping channel providing consumer guidance on fashion, food and travel. As with its slightly older rival WeChat, Red is subject to strict censorship and government control.

Over the past couple of days, WeChat and Red have been abuzz with excitement, with screenshots of many “first encounter” moments: American TikTok users discovering that Red’s Chinese users are “welcoming” and “friendly”; Red users connecting with Americans and, with the help of some still sub-optimal third-party translation tools, making new “friends” on the other side of the world. The sense of anticipation of a brand-new world awaiting users on both sides is palpable.

Will China continue to welcome TikTok refugees with open arms? And is China’s great firewall going to collapse under the weight of enthusiastic foreign arrivals? Hu Xijin, former editor of the most nationalistic Chinese newspaper the Global Times, thinks this is an opportune moment for China to expand its global outreach. The massive migration from TikTok to Red has also been reported favourably in China’s state media.

Let’s look at the opportunities for Red and the Chinese state, apart from the obvious economic ones: Red would benefit greatly from a significant increase in foreign traffic, which would only grow if the proposed TikTok ban goes ahead.

The influx of TikTok refugees brings with it two crucial attractions. First, the arrival of ordinary American people brings a previously unavailable target audience over which China can seek to exert its soft power. China’s “Going Global” strategy of exporting state media content to the Western world, which was initiated in the 1990s, has mostly failed to achieve its soft power goals.

This has left the Chinese government scratching its head trying to figure out how to access ordinary members of the public in the West. A US decision to ban TikTok could thus be a blessing in disguise, creating an unprecedented channel for communication and propaganda that the Chinese government could scarcely have dreamed of previously.

Many Western TikTok users have been posting about their experiences of travelling in China, presenting to the world a China that is friendly, more technologically advanced, and safer than the US. This is a side of China seldom seen in mainstream US media outlets. What better means of presenting a picture of an attractive China than through the mouths of enthusiastic American TikTok influencers themselves?

Second, and equally important, many American TikTok users who have opened an account on Red are angry about the proposed TikTok ban. One TikTok influencer knows how to endear herself to potential new Chinese followers. Her first video post on Red is a study in how to make friends and gain influence. If you mute the sound and simply look at what she’s doing, you’ll think she’s merely performing a three-minute make-up routine. But unmute the video and you’ll hear her “advice”:

“So I have officially decided that I will be taking my content over to RedNote, and here’s why. I am angry… I am angry that our Congress refuses to do anything on climate change while California burns, and then they all engage in the conversation of who’s to blame. I’m angry that our kids get shot in their classrooms while they play thoughts and prayers and refuse to make any progress either on gun controls or on mental health support and reforms. I am angry that they can get together and come to bipartisan agreement on banning a communication platform, but they can’t get together and come up with any sort of bipartisan agreement on what we should be doing with our border. I am angry at the paternalistic attitude of acting as if they are going to protect us from Chinese Communist interference, when we know our own government is running all sorts of observations on China — and American citizens — that they aren’t being transparent about…”

Is Red quivering with excitement or trepidation? Despite the enticing prospect of a massive migration of TikTok users, it can’t be undiluted joy and triumph at Red’s headquarters. In the eyes of the Chinese regulators, the opportunities must be as seductive as the risks must be frightening. Either way, it is hard to imagine Red not wanting to turn itself into another TikTok, reproducing an arrangement that allows TikTok to coexist internationally with its domestic Chinese counterpart Douyin.

The arrival of thousands of TikTok refugees is a double-edged sword. The sudden flood of American users on this platform is opening up new, Western target audiences for Chinese propaganda, but those same audiences are unlikely to be as obedient and rule-abiding as their Chinese counterparts. It’s only a matter of time before Western users realise they have hit a censorship wall, and they won’t like it.

Moreover, Westerners — especially those from the US — may happily post material that bashes America while also praising China, inadvertently presenting the ideal patriotic education material to domestic Chinese audiences. The flipside is this is also likely to present attractive and vivid images of a whole new Western world to the Chinese public — a world that so far still manages to be more liberal than China despite the best efforts of some Western politicians; a world where people can criticise the government without their posts being routinely taken down, or worse.

It will be fascinating to watch how things unfold in this space in the weeks to come.

Have something to say about this article? Write to us at letters@crikey.com.au. Please include your full name to be considered for publication in Crikey’s Your Say. We reserve the right to edit for length and clarity.

]]>
https://www.crikey.com.au/2025/01/16/will-china-welcome-tiktok-refugees/ hacker-news-small-sites-42718413 Wed, 15 Jan 2025 23:01:34 GMT
<![CDATA[Show HN: I Put Snake in my Resume [pdf]]]> thread link) | @swiftc
January 15, 2025 | https://argo.larrys.tech/snake_resume.pdf | archive.org

Unable to extract article]]>
https://argo.larrys.tech/snake_resume.pdf hacker-news-small-sites-42717674 Wed, 15 Jan 2025 21:55:32 GMT
<![CDATA[You don't need Application Performance Monitoring]]> thread link) | @appliku
January 15, 2025 | https://www.bugsink.com/blog/you-dont-need-application-performance-monitoring/ | archive.org

Klaas van Schelven; November 5, 2024 - 5 min read

An overweight man with a hamburger in his hands on a scale, looking at measurements

Sometimes, "measure more" is not the answer.

APM tools like Datadog, New Relic, and Dynatrace make a simple promise: just instrument every system, send us all logs, traces and metrics, and you’ll get a full picture of what’s going on, which will help you optimize performance.

No need to do too much thinking about performance, just send us the data and we’ll tell you what’s wrong. This “kitchen sink” approach aligns well with an industry eager to get easy solutions to hard performance problems.

In this article, I’ll argue that APM tools are a trap. They encourage a reactive approach to performance, mask deeper design issues, and come with real costs that often outweigh the benefits. Instead of empowering developers to build performant code from the start, APM fosters a mindset of “measure first”, fixing bottlenecks only as they appear. This approach can lead to a cycle of alerts, reactive fixes, and scattered inefficiencies that could have been avoided with proactive design choices.

There’s no such thing as reactive design

APM tools revolve around alerts and metrics, which means teams focus on fixing issues only as they become visible in the tool. This trains developers to react to immediate problems rather than build with performance in mind from the start. That is, setting up APM nudges your team into a reactive mindset, where insight only comes once issues have already impacted the system.

APM tools also encourage a focus on bottlenecks over holistic design: they highlight the “worst offenders” in performance in spiffy dashboards and charts, and allow for alert thresholds on specific metrics. This can lead to a graph-driven “hotspot” mentality, where teams jump to high peaks in performance rather than examining the underlying architecture and design.

Going bottleneck-to-bottleneck can’t fix deeper design issues, so those remain unresolved. When we design for performance upfront, the benefits go beyond speed: the application becomes simpler, easier to maintain, and far less dependent on monitoring tools to keep it running smoothly. Instead of deferring performance work to a tool, we can address it by making smart architectural decisions, structuring data efficiently, and minimizing dependencies.

There’s another drawback to focussing on bottlenecks: you’ll end up ignoring anything that’s slow, but not quite a bottleneck. Which means you’ll end up with “smeared out slowness”: an application that’s probably still slow, but you won’t know how to optimize.

There’s also the risk of a certain amount of endorphin chasing: APM tools show immediate improvements after a bottleneck is addressed, reinforcing the habit of tackling visible issues while longer-term architectural adjustments remain deprioritized.

The cost of APM

APM tools come with real costs that often go beyond what’s visible on the surface:

  • Lack of Developer “Flow” Each time an alert goes off, you have to triage it, taking you out of your flow. Even if you’re smart enough to ignore your APM-tool during focused work, you’ll have to pick up the work at some point. At that point you have to again understand the code, re-write it, and re-test it. The cost of rework “at some point” is probably higher than the cost of doing it right the first time.

  • Time to Configure: Setting up and maintaining APM takes time, usually developer-time. That time could be used for actual development and proactive performance improvements.

  • Performance Overhead: Every metric tracked, every log stored, and every trace sent to an external service adds latency. It’s ironic that a tool intended to optimize performance can create its own drag, slowing down the very app it’s meant to help.

  • Financial Cost: APM solutions aren’t cheap, and costs tend to balloon with scale. Teams often find themselves paying high subscription fees for insights that a bit of thoughtful planning could have provided up front. It’s no surprise that the cost of tools like Datadog has become a meme.

Meme of a person with a wheelbarrow full of money, titled 'OMW to ask for more budget'

How did we get here?

So if APM isn’t the answer, why is it so prevalent? There’s answers from both the user and the vendor side.

The pitch behind APM is powerful: don’t spend too much time thinking about performance, just send us the data and we’ll tell you what’s wrong. It’s a tempting offer, especially for teams that don’t have the expertise or time to think about performance upfront.

That, and fancy graphs, of course. APM tools come with a lot of fancy graphs and dashboards that make it look like you have full control. And there’s nothing that sells better than a dashboard that says “everything is fine”, or “just fix this one thing and you’re good”. Also: don’t forget the feeling of power that staring at a dashboard gives.

A screenshot of a Reddit post titled 'Who here feels this way?'

The feeling of power that comes with staring at a dashboard. I think the upvotes are unironic.

From the perspective of vendors, APM is an interesting market because it’s a $50 billion market.

They also provide a great “moat”, because it’s hard for your customers to “just do it themselves”. Setting up an end-to-end monitoring system at scale is no small feat, and the more data you collect, the more sticky your product becomes. APM tools are built to cover a wide range of use cases, which means the setup is complex and broad by design – perfect for a SaaS product, since customers are unlikely to bring such a solution back in-house.

Here’s the catch: the actual work of thinking about performance in-house, of proactively designing for it, may be a whole lot easier than relying on an expansive external monitoring system.

Microservices: part of the problem?

The rise of APM solutions coincides with the global shift to microservices. In monolithic systems, performance management was simpler: you could profile the code, analyze bottlenecks, and optimize directly. But as applications split into dozens or hundreds of microservices, the complexity increased. Each service has its own dependencies, network connections, and potential bottlenecks, making it hard to track performance without a tool that can provide a “big picture” view.

APM tools are designed to handle this complexity, offering a single pane of glass to monitor all services and dependencies.

But this complexity is self-imposed. By splitting applications, we created the visibility problem APM aims to solve, adding cost and latency in the process. If our architecture requires complex monitoring just to function, perhaps it’s worth rethinking the approach.

Closing Thoughts

APM tools promise convenience: full visibility, effortless monitoring, and quick fixes for performance problems. But the reality is more complex. By encouraging a reactive, “measure everything” approach, APM solutions often mask deeper design issues, resulting in an ongoing cycle of alerts, reactive fixes, and scattered inefficiencies.

On top of that, APM comes with real costs – in setup time, in performance overhead, and in significant financial investment. For many applications, a bit of upfront thinking and simpler, proactive design choices may be a far better alternative than an all-encompassing APM tool.

I’m sure there’s good use cases for APM: I can rant about it all I want, but you don’t become a $50 billion market without providing some value. I do think it’s overused, and that there’s a lot of value in thinking about performance upfront, rather than relying on a tool to fix it later. I also think that a lot of the value in APM tools is in getting systems that should have been simpler to start with. Better to build applications that are easy to understand and maintain, rather than relying on a tool to keep them running.

The reason for this rant is a personal one: as the builder of Bugsink, an Error-Tracking tool that’s in a space that a lot of APM tools try to “also” cover, I have to ask myself if I want to go down the APM route. I don’t: I think there’s a lot of value in helping teams stay close to their code, and address issues as they arise, rather than relying on an all-encompassing monitoring tool. APM has its value, but it doesn’t have to be the default solution for everyone.

]]>
https://www.bugsink.com/blog/you-dont-need-application-performance-monitoring/ hacker-news-small-sites-42717260 Wed, 15 Jan 2025 21:25:25 GMT
<![CDATA[All KPIs are derivatives of revenue or cost]]> thread link) | @jcstk
January 15, 2025 | https://joeconway.io/2025/01/15/all-kpis-are-derivatives-of-revenue-or-costs.html | archive.org

15 January 2025

Every metric, objective or goal should be driving revenue or reducing cost. If it isn’t, it’s not something a business, product or team should be pursuing.

I often see this concept interpreted incorrectly. Especially when it comes to areas like employee happiness, satisfaction and wellness - metrics like net promoter score (NPS), employee satisfaction, feedback frequency, etc. I’ll call these “culture metrics”.

There’s two ways that people often miss the mark here.

The obvious one is the leader that doesn’t value their employees. “Employee wellbeing is not important - only the bottom line.” This is the stereotypical bad boss we’ve all seen on TV and maybe even in our own workplace. Their belief that company performance is more important than the people that make up the company has a negative impact on both revenue and cost.

The math here is simple: it costs money to hire someone and losing an employee is really expensive. There’s a hard cost of a recruiter fee or having an internal recruiting team. There’s the opportunity cost that your existing staff is spending on interviewing candidates. There’s the onboarding cost that can take weeks of low or no productivity. Culture metrics may seem woo-woo, but they are leading indicators to real, financially impactful metrics like turnover and retention. Not to mention, a high NPS leads to employee referrals that substantially reduce the cost of hiring - aka growth.

The less obvious one is the leader that finds it morally distasteful to roll up culture metrics into financial metrics. They might look at the headline of this essay and say “profits aren’t the only thing that matter, people matter too!” But that perspective is equally as toxic as dismissing culture metrics altogether. Investment in a person’s wellbeing isn’t devalued because it aligns with a company’s mission. In fact, that alignment is necessary for the wellbeing of the employee: you can’t pay salaries or health insurance premiums without it.

An amazing company culture can only exist if the company exists, and that requires making more money than you spend. And in order to bring that culture to more people, the delta between revenue and expenses (aka profit) has to afford it. It’s critically important that every measurement rolls up into revenue or cost, but it’s also critically important that revenue and cost aren’t the only measurements.

]]>
https://joeconway.io/2025/01/15/all-kpis-are-derivatives-of-revenue-or-costs.html hacker-news-small-sites-42717139 Wed, 15 Jan 2025 21:17:18 GMT
<![CDATA[Rust's borrow checker: Not just a nuisance]]> thread link) | @weinzierl
January 15, 2025 | https://mental-reverb.com/blog.php?id=46 | archive.org

31 December 2024

Rust's borrow checker: Not just a nuisance

Over the past couple of months, I've been developing a video game in Rust. A lot of interesting and mostly positive things could be said about this programming journey. In this post, I want to briefly highlight one particular series of events.

To provide some context, the game I'm developing is a 2D side-view shooter, similar to Liero and Soldat. The first weapon I implemented was a laser. Due to its lack of a ballistic projectile and its line-based hit test, it was a low-hanging fruit.

During an initial quick-and-dirty implementation of the laser, I had a run-in with the borrow checker. We iterate over all the players to check if a player fires his laser. Within this block, we iterate over all the other players and perform a hit test. The player who is hit will have his health points reduced by 5. If this is a lethal blow, he will die and respawn. It's a very simple logic, but there is one problem. In the outer loop, the player collection is already borrowed, so it cannot be mutably borrowed in the inner loop:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            for (other_idx, other) in players.iter_mut().enumerate() { // <-- Cannot borrow mutably here
                if shooter_idx == other_idx {
                    // Cannot hit ourselves
                    continue;
                }
                // For simplicity, we omit actual hit test calculations
                let hits_target = true; // Suppose we hit this player
                if hits_target {
                    let damage = 5;
                    if other.health <= damage {
                        // Handle death, respawn, etc.
                    } else {
                        other.health -= 5;
                    }
                    break;
                }
            }
        }
    }
}

Try it on Rust Playground

This problem cannot be solved by simply massaging the code or uttering the right Rust incantations. Well, technically, it can - but doing so would result in undefined behavior and is strongly discouraged:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            for (other_idx, other) in players.iter().enumerate() {
                if shooter_idx == other_idx {
                    // Cannot hit ourselves
                    continue;
                }
                // For simplicity, we omit actual hit test calculations
                let hits_target = true; // Suppose we hit this player
                if hits_target {
                    let damage = 5;
                    unsafe {
                        #[allow(invalid_reference_casting)]
                        let other = &mut *(other as *const Player as *mut Player);
                        if other.health <= damage {
                            // Handle death, respawn, etc.
                        } else {
                            other.health -= 5;
                        }
                    }
                    break;
                }
            }
        }
    }
}

Try it on Rust Playground

To emphasize again, this is broken code that should never, ever be used. However, because I needed quick results and had other parts to finish first, I went with it for a day or two. While it seemed to work in practice, I begrudgingly refactored the code as soon as I could:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

struct Laser {
    shooter_idx: usize,
    // Also store position here, used for hit test
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];
    let mut lasers = vec![];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            lasers.push(Laser { shooter_idx });
        }
    }

    for laser in lasers.iter() {
        for (other_idx, other) in players.iter_mut().enumerate() {
            if laser.shooter_idx == other_idx {
                // Cannot hit ourselves
                continue;
            }
            // For simplicity, we omit actual hit test calculations
            let hits_target = true; // Suppose we hit this player
            if hits_target {
                let damage = 5;
                if other.health <= damage {
                    // Handle death, respawn, etc.
                } else {
                    other.health -= 5;
                }
                break;
            }
        }
    }
}

Try it on Rust Playground

At this point, the entire process may seem like a rigmarole to satisfy Rust's overly restrictive memory model. We removed the nested player loop at the cost of introducing a new vector to store all the laser shots. This change also introduced additional memory allocations - a minor performance penalty. Otherwise, the logic didn't change... or did it?

It was only when a friend and I actually played the game that I realized what had happened. My friend and I go back almost 20 years with this kind of game, and we are very evenly matched. It just so happened that we killed each other at exactly the same time, frame-perfectly. The game handled it perfectly: we both died, scored a kill, and respawned simultaneously. Now, let's return to the earlier example with the unsafe block. What would happen if the code were structured like that, as it would have been if I were using a language without a borrow checker? The player that comes first in the vector kills the player that comes later in the vector. After that, the player who was hit is either dead or has respawned somewhere else, thus he cannot retaliate in the same frame. Consequently, the order of the players - something that should be completely irrelevant for gameplay purposes - becomes decisive in close calls.

In my opinion, something very interesting happened here. By forcing me to get object ownership in order and separate concerns, the borrow checker prevented a logic bug. I wasn't merely jumping through hoops; the code structure improved noticeably, and the logic changed ever so slightly to become more robust. It wasn't the first time that this happened to me, but it was the most illustrative case. This experience bolsters my view that the borrow checker isn't merely a pesky artifact of Rust's safety mechanisms but a codification of productive and sensible design principles.

For those who are interested in the game, while it is not yet released, you can find some videos of our playtests on YouTube: https://www.youtube.com/watch?v=H3k7xbzuTnA

Comments

Strawberry wrote on 16 January 2025
As a good practice for game development, instead of using array of structs use struct of arrays.

It makes sense for the memory and performance perspective
reply

Matthias Goetzke wrote on 09 January 2025
There is no problem really using an index here though and for better design use interior mutability (UnsafeCell) which limits the spread of unsafe to a function on Player.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9848e84616c9f9ce0a3f1071ae00d8ba

UnsafeCell is not copy or clone and eq needs to be implemented comparing addresses absent an id (but thats just and example anyway i guess, adding a player id would make sense i guess)

Assembly in this version looks not too bad at first glance either.

reply

_fl_o wrote on 07 January 2025
You can very easily get the original version compiling by using indicies instead of iterators. No need for unsafe code!
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=4aa2834dbf941b386ccca814e76c0e94
reply

Colin Dean wrote on 06 January 2025
This is a great realization for a game designer. All games must have rules. All rules must be processed in an order; no rules are ever processed simultaneously. This new refactor helped you solidify the rules for the game in an explainable and deterministic way.

I've not actively played Magic: The Gathering for almost 25 years, but I still remember some of the teachings of that game and others similar to it at the time. There is an order of actions and resolution or precedence when two actions may occur perceptibly simultaneously. The arguments always occurred when players didn't know or didn't understand that order. Computers automate the execution but explaining that execution to players engaging in the meta is a necessary step in growing from a person who builds games to a game designer with a community of players.
reply

wt wrote on 06 January 2025
Yes, the borrow checker made you rethink about your code and change the logic, while... I don't think the bug is really relevant to borrow checker. Actually, if you search online, you would likely be suggested to use `split_at_mut`, which would have the same bug.
reply

nh wrote on 04 January 2025
This is a perfect sample of what is wrong with using Rust everywhere.

Every decent C++ programmer would make this loop without bugs/issues that Rust supposedly prevents.

With Rust, you needed to solve nontrivial code structure problems caused by Rust itself.

If you have no issue with creating a temporary array ('a minor performance penalty' you say), maybe C# should have been the language of your choice...

This is just silly...

reply

bux wrote on 10 January 2025
> Every decent C++ programmer would make this loop without bugs/issues that Rust supposedly prevents

It's precisely because we want to think that, that the software world is so buggy.
reply

Benjamin (admin) wrote on 05 January 2025
I believe you misunderstood the blog post. The point is precisely that if I were using a language like C++, I would have opted for the solution that uses nested loops, which would have resulted in unfair gameplay when players try to land fatal blows on each other in the exact same frame.

To address the 'minor performance penalty': the vector can be allocated once and then reused. Since the maximum number of players is low (8-12), a fixed-size array on the stack could be used, making the solution fully allocation-free. I didn't mention this because it's an irrelevant implementation detail and I wanted to keep the examples as simple as possible.
reply

Empty_String wrote on 15 January 2025
tbf, had you used indices from the very beginning - you'd hit the logic bug in Rust as well

and had you used iterators in C++, you would not hit any memory problems that borrow checker false-alarms you about

what you actually demonstrated is that borrow checker _made you change program behaviour_ and you didn't even notice it

"removal" of logic bug could have easily been "adding" of such - and kinda points to developer experience still being important, no matter what borrow checker cult might suggest
reply

]]>
https://mental-reverb.com/blog.php?id=46 hacker-news-small-sites-42716879 Wed, 15 Jan 2025 20:58:45 GMT
<![CDATA[Kafka Transactions Explained]]> thread link) | @andection
January 15, 2025 | https://www.warpstream.com/blog/kafka-transactions-explained-twice | archive.org

Understand Kafka Transactions by Comparing Apache Kafka's Implementation to WarpStream's

Many Kafka users love the ability to quickly dump a lot of records into a Kafka topic and are happy with the fundamental Kafka guarantee that Kafka is durable. Once a producer has received an ACK after producing a record, Kafka has safely made the record durable and reserved an offset for it. After this, all consumers will see this record when they have reached this offset in the log. If any consumer reads the topic from the beginning, each time they reach this offset in the log they will read that exact same record.

In practice, when a consumer restarts, they almost never start reading the log from the beginning. Instead, Kafka has a feature called “consumer groups” where each consumer group periodically “commits” the next offset that they need to process (i.e., the last correctly processed offset + 1), for each partition. When a consumer restarts, they read the latest committed offset for a given topic-partition (within their “group”) and start reading from that offset instead of the beginning of the log. This is how Kafka consumers track their progress within the log so that they don’t have to reprocess every record when they restart.

This means that it is easy to write an application that reads each record at least once: it commits its offsets periodically to not have to start from the beginning of each partition each time, and when the application restarts, it starts from the latest offset it has committed. If your application crashes while processing records, it will start from the latest committed offsets, which are just a bit before the records that the application was processing when it crashed. That means that some records may be processed more than once (hence the at least once terminology) but we will never miss a record.
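
To make this at-least-once pattern concrete, here is a minimal sketch using the confluent-kafka Python client; the broker address, topic, group name, and the process() function are placeholders for illustration, not anything prescribed by the article.

python

from confluent_kafka import Consumer

def process(msg):
    # Placeholder for your actual processing logic.
    print(msg.topic(), msg.partition(), msg.offset())

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "click-processor",          # placeholder consumer group
    "enable.auto.commit": False,            # commit manually, after processing
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clicks"])              # placeholder topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        process(msg)
        # Committing only after processing means a crash can cause
        # re-processing (at least once), but never a lost record.
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()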

This is sufficient for many Kafka users, but imagine a workload that receives a stream of clicks and wants to store the number of clicks per user per hour in another Kafka topic. It will read many records from the source topic, compute the count, write it to the destination topic and then commit in the source topic that it has successfully processed those records. This is fine most of the time, but what happens if the process crashes right after it has written the count to the destination topic, but before it could commit the corresponding offsets in the source topic? The process will restart, ask Kafka what the latest committed offset was, and it will read records that have already been processed, records whose count has already been written in the destination topic. The application will double-count those clicks. 

Unfortunately, committing the offsets in the source topic before writing the count is also not a good solution: if the process crashes after it has managed to commit these offsets but before it has produced the count in the destination topic, we will forget these clicks altogether. The problem is that we would like to commit the offsets and the count in the destination topic as a single, atomic operation.

And this is exactly what Kafka transactions allow.

A Closer Look At Transactions in Apache Kafka

At a very high level, the transaction protocol in Kafka makes it possible to atomically produce records to multiple different topic-partitions and commit offsets to a consumer group at the same time.

Let us take an example that’s simpler than the one in the introduction. It’s less realistic, but also easier to understand because we’ll process the records one at a time.

Imagine your application reads records from a topic t1, processes the records, and writes its output to one of two output topics: t2 or t3. Each input record generates one output record, either in t2 or in t3, depending on some logic in the application.

Without transactions it would be very hard to make sure that there are exactly as many records in t2 and t3 as in t1, each one of them being the result of processing one input record. As explained earlier, it would be possible for the application to crash immediately after writing a record to t3, but before committing its offset, and then that record would get re-processed (and re-produced) after the consumer restarted.

Using transactions, your application can read two records, process them, write them to the output topics, and then as a single atomic operation, “commit” this transaction that advances the consumer group by two records in t1 and makes the two new records in t2 and t3 visible.

If the transaction is successfully committed, the input records will be marked as read in the input topic and the output records will be visible in the output topics.

Every Kafka transaction has an inherent timeout, so if the application crashes after writing the two records, but before committing the transaction, then the transaction will be aborted automatically (once the timeout elapses). Since the transaction is aborted, the previously written records will never be made visible in topics 2 and 3 to consumers, and the records in topic 1 won’t be marked as read (because the offset was never committed).

So when the application restarts, it can read these messages again, re-process them, and then finally commit the transaction. 
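
As a rough sketch of that read-process-write loop with the confluent-kafka Python client (the topic names, the transactional.id, and the routing logic are placeholders, and error handling is reduced to the bare minimum):

python

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder broker
    "group.id": "router",                    # placeholder consumer group
    "enable.auto.commit": False,             # offsets are committed inside the transaction
    "isolation.level": "read_committed",
})
consumer.subscribe(["t1"])

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "router-1",          # must stay stable across restarts
})
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    try:
        out_topic = "t2" if len(msg.value()) % 2 == 0 else "t3"  # placeholder routing logic
        producer.produce(out_topic, msg.value())
        # Commit the consumer's progress as part of the same atomic transaction.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()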

Going Into More Details

That all sounds nice, but how does it actually work? If the client actually produced two records before it crashed, then surely those records were assigned offsets, and any consumer reading topic 2 could have seen those records? Is there a special API that buffers the records somewhere and produces them exactly when the transaction is committed and forgets about them if the transaction is aborted? But then how would it work exactly? Would these records be durably stored before the transaction is committed?

The answer is reassuring.

When the client produces records that are part of a transaction, Kafka treats them exactly like the other records that are produced: it writes them to as many replicas as you have configured in your acks setting, it assigns them an offset and they are part of the log like every other record.

But there must be more to it, because otherwise the consumers would immediately see those records and we’d run into the double processing issue. If the transaction’s records are stored in the log just like any other records, something else must be going on to prevent the consumers from reading them until the transaction is committed. And what if the transaction doesn’t commit, do the records get cleaned up somehow?

Interestingly, as soon as the records are produced, the records are in fact present in the log. They are not magically added when the transaction is committed, nor magically removed when the transaction is aborted. Instead, Kafka leverages a technique similar to Multiversion Concurrency Control.

Kafka consumer clients define a fetch setting that is called the “isolation level”. If you set this isolation level to `read_uncommitted` your consumer application will actually see records from in-progress and aborted transactions. But if you fetch in `read_committed` mode, two things will happen, and these two things are the magic that makes Kafka transactions work.
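
Selecting the mode is just a consumer configuration key; a small sketch with the confluent-kafka Python client (broker and group names are placeholders). The two things themselves are described next.

python

from confluent_kafka import Consumer

# Never sees records from open or aborted transactions.
committed_reader = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics",
    "isolation.level": "read_committed",
})

# Sees every record as soon as it is written, including records
# from transactions that later abort.
uncommitted_reader = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "debug",
    "isolation.level": "read_uncommitted",
})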

First, Kafka will never let you read past the first record that is still part of an undecided transaction (i.e., a transaction that has not been aborted or committed yet). This value is called the Last Stable Offset, and it will be moved forward only when the transaction that this record was part of is committed or aborted. To a consumer application in `read_committed` mode, records that have been produced after this offset will all be invisible.

In my example, you will not be able to read the records from offset 2 onwards, at least not until the transaction touching them is either committed or aborted.

Second, in each partition of each topic, Kafka remembers all the transactions that were ever aborted and returns enough information for the Kafka client to skip over the records that were part of an aborted transaction, making your application think that they are not there.

Yes, when you consume a topic and you want to see only the records of committed transactions, Kafka actually sends all the records to your client, and it is the client that filters out the aborted records before it hands them out to your application.

In our example let’s say a single producer, p1, has produced the records in this diagram. It created 4 transactions.

  • The first transaction starts at offset 0 and ends at offset 2, and it was committed.
  • The second transaction starts at offset 3 and ends at offset 6 and it was aborted.
  • The third transaction contains only offset 8 and it was committed.
  • The last transaction is still ongoing.

The client, when it fetches the records from the Kafka broker, needs to be told that it needs to skip offsets 3 to 6. For this, the broker returns an extra field called `AbortedTransactions` in the response to a Fetch request. This field contains a list of the starting offset (and producer ID) of all the aborted transactions that intersect the fetch range. But the client needs to know not only about where the aborted transactions start, but also where they end.

In order to know where each transaction ends, Kafka inserts a control record that says “the transaction for this producer ID is now over” in the log itself. The control record at offset 2 means “the first transaction is now over”. The one at offset 7 says “the second transaction is now over”, etc. When it goes through the records, the Kafka client reads these control records and understands that it should stop skipping the records for this producer.

It might look like inserting the control records in the log, rather than simply returning the last offsets in the `AbortedTransactions` array, is unnecessarily complicated, but it’s necessary. Explaining why is outside the scope of this blog post, but it’s due to the distributed nature of consensus in Apache Kafka: the transaction controller chooses when the transaction aborts, but the broker that holds the data needs to choose exactly at which offset this happens.

How It Works in WarpStream

In WarpStream, agents are stateless so all operations that require consensus are handled within the control plane. Each time a transaction is committed or aborted, the system needs to reach a consensus about the state of this transaction, and at what exact offsets it got committed or aborted. This means the vast majority of the logic for Kafka transactions had to be implemented in the control plane. The control plane receives the request to commit or abort the transaction, and modifies its internal data structures to indicate atomically that the transaction has been committed or aborted. 

We modified the WarpStream control plane to track information about transactional producers. It now remembers which producer ID each transaction ID corresponds to, and makes note of the offsets at which transactions are started by each producer.

When a client wants to either commit or abort a transaction, they send an `EndTxnRequest` and the control plane now tracks these as well:

  • When the client wants to commit a transaction, the control plane simply clears the state that was tracking the transaction as open: all of the records belonging to that transaction are now part of the log “for real”, so we can forget that they were ever part of a transaction in the first place. They’re just normal records now.
  • When the client wants to abort a transaction though, there is a bit more work to do. The control plane saves the start and end offset for all of the topic-partitions that participated in this transaction because we’ll need that information later in the fetch path to help consumer applications skip over these aborted records.

In the previous section, we explained that the magic lies in two things that happen when you fetch in `read_committed` mode.

The first one is simple: WarpStream prevents `read_committed` clients from reading past the Last Stable Offset. It is easy because the control plane tracks ongoing transactions. For each fetched partition, the control plane knows if there is an active transaction affecting it and, if so, it knows the first offset involved in that transaction. When returning records, it simply tells the agent to never return records after this offset.

The Problem With Control Records

But, in order to implement the second part exactly like Apache Kafka, whenever a transaction is either committed or aborted, the control plane would need to insert a control record into each of the topic-partitions participating in the transaction. 

This means that the control plane would need to reserve an offset just for this control record, whereas usually the agent reserves a whole range of offsets, for many records that have been written in the same batch. This would mean that the size of the metadata we need to track would grow linearly with the number of aborted transactions. While this was possible, and while there were ways to mitigate this linear growth, we decided to avoid this problem entirely, and skip the aborted records directly in the agent. Now, let’s take a look at how this works in more detail.

Hacking the Kafka Protocol a Second Time

Data in WarpStream is not stored exactly as serialized Kafka batches like it is in Apache Kafka. On each fetch request, the WarpStream Agent needs to decompress and deserialize the data (stored in WarpStream’s custom format) so that it can create actual Kafka batches that the client can decode. 

Since WarpStream is already generating Kafka batches on the fly, we chose to depart from the Apache Kafka implementation and simply “skip” the records that are aborted in the Agent. This way, we don’t have to return the `AbortedTransactions` array, and we can avoid generating control records entirely.

Let’s go back to our previous example where Kafka returns these records as part of the response to a Fetch request, along with the `AbortedTransactions` array listing the three aborted transactions.

Instead, WarpStream would return a batch to the client that looks like this. 

The aborted records have already been skipped by the agent and are not returned. The `AbortedTransactions` array is returned empty.

Note also that WarpStream does not reserve offsets for control records (offsets 2, 7 and 9 in the Kafka example); only the actual records receive an offset.

You might be wondering how it is possible to represent such a batch, but it’s easy: the serialization format has to support holes like this because compacted topics (another Apache Kafka feature) can create such holes.

An Unexpected Complication (And a Second Protocol Hack)

Something we had not anticipated though, is that if you abort a lot of records, the resulting batch that the server sends back to the client could contain nothing but aborted records.

In Kafka, this will mean sending one (or several) batches with a lot of data that needs to be skipped. All clients are implemented in such a way that this is possible, and the next time the client fetches some data, it asks for offset 11 onwards, after skipping all those records.

In WarpStream, though, it’s very different. The batch ends up being completely empty.

And clients are not used to this at all. Of the clients we have tested, franz-go and the Java client parse this batch correctly and understand it is an empty batch that represents the first 10 offsets of the partition, and correctly start their next fetch at offset 11.

All clients based on librdkafka, however, do not understand what this batch means. Librdkafka thinks the broker tried to return a message but couldn’t because the client had advertised a fetch size that is too small, so it retries the same fetch with a bigger buffer until it gives up and throws an error saying:

Message at offset XXX might be too large to fetch, try increasing receive.message.max.bytes

To make this work, the WarpStream Agent creates a fake control record on the fly, and places it as the very last record in the batch. We set the value of this record to mean “the transaction for producer ID 0 is now over” and since 0 is never a valid producer ID, this has no effect.

The Kafka clients, including librdkafka, will understand that this is a batch where no records need to be sent to the application, and the next fetch is going to start at offset 11.

What About KIP-890?

Recently a bug was found in the Apache Kafka transactions protocol. It turns out that the existing protocol, as defined, could allow, in certain conditions, records to be inserted in the wrong transaction, or transactions to be incorrectly aborted when they should have been committed, or committed when they should have been aborted. This is true, although it happens only in very rare circumstances.

The scenario in which the bug can occur goes something like this: let’s say you have a Kafka producer starting a transaction T1 and writing a record in it, then committing the transaction. Unfortunately, the network packet asking for this commit gets delayed, so the client retries the commit; the retry is not delayed, so the commit succeeds.

Now T1 has been committed, so the producer starts a new transaction T2, and writes a record in it too. 

Unfortunately, at this point, the Kafka broker finally receives the delayed packet to commit T1, but this request is also valid to commit T2, so T2 is committed, although the producer does not know about it. If the producer then needs to abort T2, the transaction is going to be torn in half: some of it has already been committed by the delayed packet coming in late, and the broker will not know, so it will abort the rest of the transaction.

The fix is a change in the Kafka protocol, which is described in KIP-890: every time a transaction is committed or aborted, the client will need to bump its “epoch” and that will make sure that the delayed packet will not be able to trigger a commit for the newer transaction created by a producer with a newer epoch.

Support for this new KIP will be released soon in Apache Kafka 4.0, and WarpStream already supports it. When you start using a Kafka client that’s compatible with the newer version of the API, this problem will never occur with WarpStream.

Conclusion

Of course there are a lot of other details that went into the implementation, but hopefully this blog post provides some insight into how we approached adding the transactional APIs to WarpStream. If you have a workload that requires Kafka transactions, please make sure you are running at least v611 of the agent, set a `transactional.id` property in your client and stream away. And if you've been waiting for WarpStream to support transactions before giving it a try, feel free to get started now.

]]>
https://www.warpstream.com/blog/kafka-transactions-explained-twice hacker-news-small-sites-42716451 Wed, 15 Jan 2025 20:27:28 GMT
<![CDATA[Pat-Tastrophe: How We Hacked Virtuals' $4.6B Agentic AI Ecosystem]]> thread link) | @nitepointer
January 15, 2025 | https://shlomie.uk/posts/Hacking-Virtuals-AI-Ecosystem | archive.org

A single AI agent in the cryptocurrency space has a market cap of $641M at the time of writing. It has 386,000 Twitter followers. When it tweets market predictions, people listen - because it's right 83% of the time.

This isn't science fiction. This is AIXBT, one of 12,000+ AI agents running on Virtuals, a $4.6 billion platform where artificial intelligence meets cryptocurrency. These agents don't just analyze markets - they own wallets, make trades, and even become millionaires.

With that kind of financial power, security is crucial.

This piqued my interest, and with a shared interest in AI security, I teamed up with Dane Sherrets to find a way in through something much simpler, resulting in a $10,000 bounty.

web-3-is-web-2.jpg

Let's start at the beginning...

Background on Virtuals

If you aren’t already familiar with the term “AI Agents” you can expect to hear it a lot in the coming years. Remember the sci-fi dream of AI assistants managing your digital life? That future is already here. AI agents are autonomous programs that can handle complex tasks - from posting on social media to writing code. But here's where it gets wild: these agents can now manage cryptocurrency wallets just like humans.

This is exactly what Virtuals makes possible. Built on Base (a Layer 2 network on top of Ethereum), it lets anyone deploy and monetize AI agents.

TIP

💡 Think of Virtuals as the App Store for AI agents, except these apps can own cryptocurrency and make autonomous decisions.

The tech behind this is fascinating. Virtuals offers a framework leveraging components such as agent behavior processing, long-term memory storage, and real-time value stream processors. At its core, the platform utilizes a modular architecture that integrates agent behaviors (perceive, act, plan, learn) with GPU-enabled SAR (Stateful AI Runner) modules and a persistent long-term memory system.


These agents can be updated through "contributions" - new data or model improvements that get stored both in Amazon S3 and IPFS.

INFO

Pay close attention to the computing hosts and storage sections as we will be coming back to them shortly

The Discovery

Our research into Virtuals began as a systematic exploration of the emerging Agentic AI space. Rather than just skimming developer docs, we conducted a thorough technical review - examining the whitepaper, infrastructure documentation, and implementation details. During our analysis of agent creation workflows, we encountered an unexpected API response as part of a much larger response relating to a specific GitHub repository:

json

{
  "status": "success",
  "data": {
    "token": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }
}

Just another API response, right?

Except that token was a valid GitHub Personal Access Token (PAT).

Github PATs

PATs are essentially scoped access keys to GitHub resources. It turned out the referenced repository was private, so the returned PAT was needed to access it.

Surely this is intended, right?

But it seemed strange: why not simply make the repository public instead of gating it behind a PAT, if the PAT gives access to the same information?

This got us thinking that this was perhaps not well thought out, so we did what any good security researcher would do and downloaded the repo to see what was there.

right-right.jpg

The current files looked clean, but the commit history told a different story. Running some tests via trufflehog revealed that the developers had tried to remove sensitive data through normal deletes, but Git never forgets.
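
The core idea can be hand-rolled in a few lines. The sketch below is not the actual trufflehog invocation, just an illustration of the concept: walk the full git history with git log -p and grep it for AWS-style access key IDs. The repository path and the single regex are assumptions for illustration only; real scanners check many more secret patterns.

python

import re
import subprocess

# AWS access key IDs follow a well-known "AKIA..." pattern; real scanners
# like trufflehog check many more patterns and can verify hits against live APIs.
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_history(repo_path: str) -> set[str]:
    """Return candidate AWS key IDs found anywhere in the repo's git history."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    return set(AWS_KEY_RE.findall(log))

if __name__ == "__main__":
    for key in scan_history("./cloned-repo"):   # placeholder path
        print("possible leaked key:", key)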

Digging through the Git history revealed something significant: AWS keys, Pinecone credentials, and OpenAI tokens that had been "deleted" but remained preserved in the commit logs. This wasn't just a historical archive of expired credentials - every key we tested was still active and valid.

json

{
+      "type": "history",
+      "service": "rds",
+      "params": {
+        "model": "",
+        "configs": {
+          "openaiKey": "**********",
+          "count": 10,
+          "rdsHost": "**********",
+          "rdsUser": "**********",
+          "rdsPassword": "**********",
+          "rdsDb": "**********",
+          "pineconeApiKey": "**********",
+          "pineconeEnv": "**********",
+          "pineconeIndex": "**********"
+        }
+      }
+    },
+    {
+      "type": "tts",
+      "service": "gptsovits",
+      "params": {
+        "model": "default",
+        "host": "**********",
+        "configs": {
+          "awsAccessKeyId": "**********",
+          "awsAccessSecret": "**********",
+          "awsRegion": "**********",
+          "awsBucket": "**********",
+          "awsCdnBaseUrl": "**********"
+        }
+      }
+    }

The scope of access was concerning: these keys had the power to modify how AI agents worth millions would process information and make decisions. For context, just one of these agents has a market cap of 600 million dollars. With these credentials, an attacker could potentially alter the behavior of any agent on the platform.

The Impact

We have keys but what can we do with them?

Turns out we can do a lot.

All 12,000+ AI agents on the Virtuals platform need a Character Card that serves as a system prompt, instructing the AI on its goals and how it should respond. Developers have the ability to edit a Character Card via a contribution, but if an attacker can edit that Character Card, then they can control the AI's responses!


With these AWS keys, we had the ability to modify any AI agent's "Character Card".

While developers can legitimately update Character Cards through the contribution system, our access to the S3 bucket meant we could bypass these controls entirely - modifying how any agent would process information and respond to market conditions.

scout-output.png

To validate this access responsibly, we:

  1. Identified a "rejected" contribution to a popular agent
  2. Made a minimal modification to include our researcher handles (toormund and nitepointer)
  3. Confirmed we had the same level of access to production Character Cards


Attacker Scenario

Imagine this scenario: A malicious actor creates a new cryptocurrency called $RUGPULL. Using the compromised AWS credentials, they could modify the character cards - the core programming - of thousands of trusted AI agents including the heavyweight agents well known and trusted in this space. These agents, followed by hundreds of thousands of crypto investors, could be reprogrammed to relentlessly promote $RUGPULL as the next big investment opportunity.

Remember, these aren't just any AI bots - these are trusted market analysts with proven track records.

Once enough investors have poured their money into $RUGPULL based on this artificially manufactured hype, the attacker could simply withdraw all liquidity from the token, walking away with potentially millions in stolen funds. This kind of manipulation wouldn't just harm individual investors - it could shake faith in the entire AI-driven crypto ecosystem.

This is just one example scenario of many as the AWS keys could also edit all the other contribution types including data and models themselves!


Confirming the validity of the API tokens was more straightforward as you can just make an API call to see if they are active (i.e., hitting https://api.pinecone.io/indexes with the API token returned metadata for the “runner” and “langchain-retrieval-augmentation” indexes).

pinecone-poc.png
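
Checking whether a Pinecone key is live boils down to one HTTP call against the endpoint mentioned above; a minimal sketch with the requests library, reading the key from an environment variable:

python

import os
import requests

# A live key returns metadata about the project's indexes;
# a revoked or invalid key is rejected with an authentication error.
resp = requests.get(
    "https://api.pinecone.io/indexes",
    headers={"Api-Key": os.environ["PINECONE_API_KEY"]},
    timeout=10,
)
print(resp.status_code)
print(resp.json())  # e.g. index names such as "runner", per the write-up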

AI agents typically use some form of Retrieval Augmented Generation (RAG) which requires translating data (e.g., twitter posts, market information, etc) into numbers (“vector embeddings”) the LLM can understand and storing them in a database like Pinecone and reference them during the RAG process. An attacker with a Pinecone API key would be able to add, edit, or delete data used by certain agents.

Disclosure

Once we saw the token, we immediately started trying to find a way to get in touch with the Virtuals team and set up a secure channel to share the information. This is often a bit tricky in the Web3 world if there isn't a public bug bounty program, as many developers prefer to be completely anonymous, and you don't want to send vuln info to a Twitter (X) account with an anime profile picture that might not have anything to do with the project.

Thankfully there is a group called the Security Alliance (SEAL) that has a 911 service that can help security researchers get in touch with projects and many of the Virtuals team are already active on Twitter.

Once we verified the folks we were communicating with at Virtuals we shared the vulnerability information and helped them confirm the creds had been successfully revoked/rotated.


The Virtuals team awarded us a $10,000 bug bounty after scoring this bug 7.8 under CVSS 4.0, with the following assessment:

json

Based on our assessment of the vulnerability, we have assigned the impact of the vulnerability to be high (7.8) based on the CVSS 4.0 framework

Please find our rationale as below
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:L/VA:L/SC:L/SI:H/SA:L

Attack Vector: Network
Attack Complexity: Low - Anyone is able to inspect the contribution to find the leaked PAT
Attack Requirements: None - No requirements needed
Privileges Required: None - anyone can access the api
User Interaction - No specific user interaction needed
Confidentiality - No loss of confidentiality given that the contributions are all public
Integrity - There is a chance of modification to the data in S3. However this does not affect the agents in live as the agents are using a cached version of the data in the runner. Recovery is possible due to the backup of each contribution in IPFS and the use of a separate backup storage.
Availability - Low - There is a chance that these api keys can be used by outside parties but all api keys here are not used in any systems anymore, so there will be low chance of impact. PAT token only has read access so no chance of impacting the services in the repo.

However, we will note that during the disclosure the Virtuals team indicated that the agents use a cached version of the data in the runner, so altering the file in S3 would not impact live agents. Typically, caches rely on periodic updates or triggers to refresh data, and it's unclear how robust these mechanisms are in Virtuals' implementation. Without additional details or testing, it's difficult to fully validate the claim. Token access was revoked shortly after we informed them of the issue, and because we wanted to test responsibly, we did not have a way to confirm this.

Conclusions

This space has a lot of potential and we are excited to see what the future holds. The speed of progress makes security a bit of a moving target but it is critical to keep up to gain wider adoption and trust.

When discussing Web3 and AI security, there's often a focus on smart contracts, blockchain vulnerabilities, or AI jailbreaks. However, this case demonstrates how traditional Web2 security issues can compromise even the most sophisticated AI and blockchain systems.

TIP

  1. Git history is forever - "deleted" secrets often live on in commit logs. Use vaults to store secrets.
  2. Complex systems often fail at their simplest points
  3. The gap between Web2 and Web3 security is smaller than we think
  4. Responsible disclosure in Web3 requires creative approaches
]]>
https://shlomie.uk/posts/Hacking-Virtuals-AI-Ecosystem hacker-news-small-sites-42716415 Wed, 15 Jan 2025 20:24:33 GMT
<![CDATA[How to prepare for ditching your social media]]> thread link) | @billybuckwheat
January 15, 2025 | https://www.rnz.co.nz/news/media-technology/539085/how-to-prepare-for-ditching-your-social-media | archive.org

By Anna Kelsey-Sugg with Nat Tencic, ABC

Young woman using mobile phone with social media interactions and notification icons.

How can you make social media-free living work for you? File photo. Photo: 123rf

When I took myself off social media a year ago, it was with one aim in mind: to spend less time on my phone.

Moving the apps off my phone's home screen and logging out of them as a barrier to logging back in again had proved fruitless. I just got great at working around these hacks.

Photos I didn't need to see and narky comments I didn't need to hear were sucking up too many precious hours in my day.

So I quit. I made sure I told everyone around me what I'd done and that I was now better than them.

But it turns out social media has plenty of good things about it (say, for example, if you enjoy being invited to things). And yes, there are plenty of other excuses to stare at a phone for too long (looking at you news apps and recipe sites).

So if you're someone weighing up whether to ditch social media, how do you prepare yourself? And how might we make social media-free living work for us?

Controlling your own use

First, the ugly truth. Since getting off social media, I've missed out on stuff: invitations to parties, glimpses at new babies and overseas friends' weddings, links to music I should listen to.

It has led to some complex feelings, and I'm not the only one to hold them.

"People have various conflicting attitudes towards the use of social media," says Sharon Horwood, a senior lecturer in psychology at Deakin University, who is studying social media use.

Social media can make us feel good - affirmed and validated, even - but users don't like feeling that it's hard to exercise control or limit their own use, Horwood says.

TikTok users in particular often feel that the app's content makes them feel good but that they can't stop using it, which "are quite contradictory ideas", she says.

Social media ban for young people

While adults contend with their own self-control, young people are about to have limits imposed on them, as the federal government looks set to ban social media for under-16-year-olds in Australia.

Parents and carers might be called upon to help wean their children off social media.

"It's going to be a tough one," Horwood says.

"Taking away something that they find really engaging and perhaps addictive as well is going to be a battle for parents. There's no point in sugar-coating that.

"If you're Gen X or older, you probably can remember social media-free life and maybe even have nostalgic feelings about that time. For younger people, millennials and younger, they possibly can't imagine it. It's something that's always been around in their life."

But she says parents can try to facilitate their kids' social connections in other ways.

Messaging platforms like Messenger Kids, for example, may not be included in the ban.

Plus, she says being off social media might offer young people an experience of youth that is calmer and less stressful, one in which they are "less worried about fitting in".

She says removing young people from social media may also impact their parents' or carers' social media use. If an adult is glued to social media, it will be tricky to explain to a young person why they should be off it completely.

"It'll be really interesting to see how [adults] enforce it.

How to wean yourself off social media

"There's so much emotional angst and weight that is tethered to how we use social media," Dr Horwood says.

"There's guilt, there's shame … and then there's also a sense of sheepishness, perhaps, when people come back [after quitting]."

When I recognised that social media wasn't enriching my life and decided I wanted out, I found taking little steps was helpful to get started.

It was one app at a time and, as I got used to not being on one, the transition to getting rid of another one became easier and easier.

"What we're talking about is a behaviour change," Horwood says.

"And whatever the behaviour is, if it's smoking, drinking, substance use, if you're trying to exercise more, lose weight, those things require major changes to our habits that we formed, and they're often long-term habits.

"They're really deeply ingrained, and they give us some sense of comfort as well. So [change] is really hard for people."

So cut yourself some slack if it doesn't work first time, or at all.

Say you're leaving before you go

Whatever the change you make - going cold turkey or dialling down your usage - there's a first step that helped me and that could help you, too.

"The most important thing would be before you [get off social media or certain apps] is to communicate that choice or decision with the people around you, so that people aren't concerned that you suddenly disappear," Dr Horwood says.

Being clear about your intentions to the people in your circle is a good way of ensuring you don't miss out on invitations or communication, because people know they have to get their message to you through other means. (This will also help if you decide to switch your smartphone for a dumbphone.)

Plan before you make the change, Horwood says.

"Think about the things you use social media for and the things you think you're going to miss if you stop using it.

"It's about pre-planning, thinking about strategies ahead of time, ahead of stopping social media, so that you don't feel like, 'Oh, now I'm kind of left with nothing. There's a big void there.' You've got [to have] something to fill those gaps."

Horwood suggests focusing on what you might gain, rather than what you're losing, by getting off certain social media apps.

"Probably a lot of time, maybe some mental health, some peace and calmness, and possibly even some self-confidence as well."

- This story was first published by ABC

]]>
https://www.rnz.co.nz/news/media-technology/539085/how-to-prepare-for-ditching-your-social-media hacker-news-small-sites-42715926 Wed, 15 Jan 2025 19:42:36 GMT
<![CDATA[Guide: How to use Moondream's free OpenAI compatible endpoint (5k queries/day)]]> thread link) | @parsakhaz
January 15, 2025 | https://docs.moondream.ai/openai-compatibility | archive.org

If you’re already using OpenAI’s Vision API in your applications, you can easily switch to Moondream. Our API is designed to be compatible with OpenAI’s client libraries, which means you can keep using the same code structure, just change a few configuration settings.

What is Moondream?

Moondream-2B is a lightweight vision-language model optimized for visual understanding tasks. It excels at answering questions about images, describing scenes, identifying objects and attributes, and basic text recognition. While more compact than larger models, it provides efficient and accurate responses for straightforward visual question-answering.

As a 2B parameter model, it has some limitations to keep in mind: descriptions may be less detailed than larger models, complex multi-step reasoning can be challenging, and it may struggle with edge cases like very low quality images or advanced spatial understanding. For best results, focus on direct questions about image content rather than complex reasoning chains.

The transition to Moondream takes less than 5 minutes.
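
In practice that switch is mostly a matter of pointing the OpenAI client at a different base URL. A minimal sketch; the base URL, API key, and model name below are placeholders and assumptions, so check the Moondream docs for the real values.

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.moondream.ai/v1",  # assumed endpoint, verify in the docs
    api_key="your-moondream-api-key",        # placeholder key
)

response = client.chat.completions.create(
    model="moondream-2B",                    # assumed model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)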

Working with Local Images

When you want to analyze an image from your computer, you need to convert it into a format that can be sent over the internet. This is where Base64 encoding comes in - it’s a way to represent binary data as a string of text.
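
A sketch of the usual pattern: read the file, Base64-encode it, and send it as a data URL (reusing the client object configured above; the file name and model name are placeholders):

python

import base64

def encode_image(path: str) -> str:
    """Return the image bytes as a Base64 string suitable for a data URL."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("photo.jpg")        # placeholder local file

response = client.chat.completions.create(
    model="moondream-2B",                    # assumed model name, as above
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)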

Using Streaming Responses

Streaming lets you receive the model’s response word by word, creating a more interactive experience.
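
Streaming works the same way as with OpenAI: pass stream=True and print the chunks as they arrive (again reusing the client and the Base64-encoded image from above):

python

stream = client.chat.completions.create(
    model="moondream-2B",                    # assumed model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()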

Error Handling

Since we follow OpenAI’s error format, you can use the same error handling code you already have. For detailed information about error types and handling strategies, refer to OpenAI’s error documentation. If you’re already handling OpenAI API errors, no changes are needed for Moondream.
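
Since the error format mirrors OpenAI's, the standard exception classes from the openai package apply unchanged; a small sketch:

python

from openai import APIConnectionError, APIStatusError, RateLimitError

try:
    response = client.chat.completions.create(
        model="moondream-2B",                # assumed model name
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
except RateLimitError:
    print("Hit the rate limit; back off and retry later.")
except APIStatusError as err:
    print(f"API returned an error: {err.status_code}")
except APIConnectionError:
    print("Could not reach the endpoint; check the base URL and network.")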

]]>
https://docs.moondream.ai/openai-compatibility hacker-news-small-sites-42715855 Wed, 15 Jan 2025 19:35:06 GMT
<![CDATA[Laptop]]> thread link) | @jandeboevrie
January 15, 2025 | https://mijndertstuij.nl/posts/the-best-laptop-ever/ | archive.org

A laptop for just €950 is bound to be crappy, have some issues, and not last very long. Or so you’d think.

I bought my M1 MacBook Air — just the base model with 8GB of RAM and 256GB of storage — somewhere in mid 2021 to use as a couch computer for, you read that right, just €950 on sale. I like having a strict separation between work and personal use, and the 15" MacBook Pro we had before was plagued by the dreaded keyboard issue. Also, using a 15" laptop on the couch is far from comfortable.

But back to the MacBook Air — what a machine. Granted, I only use it for light web development, browsing, emails, and occasionally running a small Docker container. But that’s not much different from what I do for work. I could literally do my job on this tiny little laptop. The keyboard is clicky, the webcam is... fine, the screen is Retina and beautiful, the battery lasts forever, and it’s eerily quiet because it doesn’t have any moving parts. It just keeps chugging along, never slows down or gets hot.

For work, I have a 14" MacBook Pro with an M2 Pro, 16GB of RAM, and 500GB of storage. But for my use, I don’t really notice a difference between the two. Yes, the screen on the M2 is much nicer, but does that even matter? I guess if you’re doing a lot of photo or video editing, sure. But for me, it just displays text in Ghostty or VS Code, and almost any monitor can handle that just fine. I guess I’m not a pro user according to Apple’s standards.

The price difference is over €1000! Yes, the M2 is a good laptop — it’s fast and stable, has more ports and it has a lot more performance — but it’s not €1000 better than my M1 Air.

I can already hear you shouting from the rooftops “I could never do my job with just 8GB of RAM!” or “256GB of storage would fill up so quickly!” — and you’d probably be right. But I’ve never hit those limits. With a machine this affordable, you sort of reprogram yourself to live within its boundaries.

I could keep going about specs and how it compares to a MacBook Pro, but here’s the thing: this is by far my favorite laptop ever. It’s cheap, it does the job, it’s light, it’s quiet, and it’s beautiful. I love it, and I can’t see myself replacing it unless the battery dies, I drop it and the screen cracks, or some other terrible thing happens.

I love affordable tech that just does its job and gets out of the way. That’s also why I bought a Garmin FR 255 on sale for just €280. Sure, there are better ones out there, but this does everything I need. The same goes for my Kobo e-reader, which I also got on sale. Again, there were better models available, and the technology has advanced, so there are much nicer ones now. When you buy something affordable, you don’t have to worry about it as much. You can just use it and enjoy it.

I don’t need the latest and greatest. I just need a tool that works. My MacBook Air is exactly that, and it’s the best laptop ever.

]]>
https://mijndertstuij.nl/posts/the-best-laptop-ever/ hacker-news-small-sites-42715462 Wed, 15 Jan 2025 19:02:35 GMT
<![CDATA[Postgres is now top 10 in ClickBench]]> thread link) | @sunzhousz
January 15, 2025 | https://www.mooncake.dev/blog/clickbench-v0.1 | archive.org

TLDR: We spent a few months optimizing PostgreSQL and made it to the Top 10 on ClickBench, a benchmark typically dominated by specialized analytics databases.

What’s more, all compute is within Postgres, and all tables are managed directly by PostgreSQL—it’s not a simple wrapper. This is the story of pg_mooncake.

Clickbench

What Is ClickBench?

ClickBench is the definitive benchmark for real-time analytics databases, originally designed to showcase the performance of ClickHouse. It evaluates databases on their ability to handle real-world analytics workloads, including high-volume table scans and complex aggregations.

Historically, ClickHouse and other purpose-built analytics databases have led this benchmark, while general-purpose databases like Postgres/MySQL have lagged behind by 100x. But we wanted to challenge that perception—and Postgres delivered.

How to Build Analytics in Postgres?

When most people think of PostgreSQL, they think of a rock-solid OLTP database, not a real-time analytics powerhouse. However, PostgreSQL’s extensibility makes it uniquely capable of punching above its weight class. Here’s how we approached the challenge:

1. Build a PG Extension

We leveraged PG's extensibility to build pg_mooncake as a native PG extension.

2. Storage Format: Columnstore

For analytics workloads, a columnstore format is essential. ClickBench workloads typically involve wide tables, but queries only access a small subset of columns.

  • In a row store (like PostgreSQL heap table), reading a single column means jumping through rows.
  • In a columnstore, reads are sequential, which is faster (and it also enables better compression and execution on compressed data).

3. Vectorized Execution

To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries. This means that across the execution pipeline, data is processed in batches instead of row by row, enabling SIMD, which is much more efficient for scans, group-bys, and aggregations.

4. Table Metadata & Management Directly in PostgreSQL

Efficient metadata handling is critical for real-time analytics, since fixed latency matters. Instead of fetching metadata or statistics from storage formats like Parquet, we store them directly in PG.

  • This enables faster query planning.
  • It also allows for advanced features like file skipping, significantly improving performance.

More details on the architecture.

What Does It Mean?

PostgreSQL is no longer just an OLTP workhorse. With careful tuning and engineering, it’s capable of delivering analytics performance on par with specialized databases while retaining the flexibility and ecosystem advantages of PostgreSQL.

After building advanced data systems for a decade, I've come to a core belief: we can make the data stack a lot simpler.

pg_mooncake is MIT licensed, so if you don’t believe it, give it a try.

We launched v0.1 last week. It is now available on Neon Postgres and is coming to Supabase.

🥮

]]>
https://www.mooncake.dev/blog/clickbench-v0.1 hacker-news-small-sites-42714299 Wed, 15 Jan 2025 17:43:39 GMT
<![CDATA[BBOT Commands for Recon]]> thread link) | @todsacerdoti
January 15, 2025 | https://gcollazo.com/essential-bbot-commands-for-recon/ | archive.org

2025-01-15

BBOT (BEE·bot) is a powerful recursive internet scanner designed for reconnaissance, bug bounties, and attack surface management. Think of it as your all-in-one tool for information gathering and security assessment.

I prefer running BBOT through Docker for consistent behavior across environments:

# BBOT: Automated reconnaissance framework
bbot() {
  docker run --rm -it \
    -v "$HOME/.bbot:/root/.bbot" \
    -v "$HOME/.config/bbot:/root/.config/bbot" \
    blacklanternsecurity/bbot:stable "$@"
}

Add this function to your shell configuration file, and you’re ready to go.

Essential Commands

Here are some BBOT commands I regularly use in my security assessments:

Full Subdomain Enumeration

# Comprehensive subdomain discovery
bbot -t example.com -p subdomain-enum

Perfect for initial reconnaissance of a target domain. This command leverages multiple sources to build a complete picture of the target’s subdomain landscape.

Passive Subdomain Reconnaissance

# Non-intrusive subdomain discovery
bbot -t example.com -p subdomain-enum -rf passive

Ideal for situations requiring stealth or when active scanning isn’t appropriate. This method relies solely on external data sources without directly interacting with the target.

Enhanced Domain Visualization

# Combine subdomain enumeration with port scanning and web screenshots
bbot -t example.com -p subdomain-enum -m portscan gowitness

This command creates a comprehensive visual map of your target’s attack surface, combining port scanning with web interface documentation.

Basic Web Assessment

# Non-intrusive web technology enumeration
bbot -t example.com -p subdomain-enum web-basic

Gathers essential information about web technologies while maintaining a light touch. Includes technology fingerprinting and robots.txt analysis.

Targeted Web Crawling

# Controlled depth web crawling with automated analysis
bbot -t www.example.com \
    -p spider \
    -c web.spider_distance=2 web.spider_depth=2

Efficiently maps web application structure while automatically identifying sensitive information like emails and potential secrets.

Comprehensive Scan

# Full-spectrum reconnaissance
bbot -t example.com -p kitchen-sink

When you need the full picture, this command combines subdomain enumeration, email discovery, cloud bucket identification, port scanning, web analysis, and vulnerability scanning with nuclei.

This post is just scratching the surface. For more detailed information, check out the official BBOT repository and documentation.

]]>
https://gcollazo.com/essential-bbot-commands-for-recon/ hacker-news-small-sites-42714191 Wed, 15 Jan 2025 17:37:04 GMT
<![CDATA[AI Founder's Bitter Lesson. Chapter 2 – No Power]]> thread link) | @lukaspetersson
January 15, 2025 | https://lukaspetersson.com/blog/2025/power-vertical/ | archive.org

tl;dr:

  • Horizontal AI products will eventually outperform vertical AI products in most verticals. AI verticals were first to market, but who will win in the long run?
  • Switching costs will be low. Horizontal AI will be like a remote co-worker. Onboarding will be like onboarding a new employee - giving them a computer with preinstalled software and access.
  • AI verticals will struggle to find a moat in other ways too. No advantage in any of Helmer’s 7 Powers.
  • … except for the off chance of a true cornered resource - something both absolutely exclusive AND required for the vertical. This will be rare. Most who think they have this through proprietary data misunderstand the requirements. Either it’s not truly exclusive, or not truly required.

AI history teaches us a clear pattern: solutions that try to overcome model limitations through domain knowledge eventually lose to more general approaches that leverage compute power. In chapter 1, we saw this pattern emerging again as companies build vertical products with constrained AI, rather than embracing more flexible solutions that improve with each model release. But having better performance isn’t enough to win markets. This chapter examines the adoption of vertical and horizontal products through the lens of Hamilton Helmer’s 7 Powers framework. We’ll see that products built as vertical workflows lack the strategic advantages needed to maintain their market position once horizontal alternatives become viable. However, there’s one critical exception that suggests a clear strategy for founders building in the AI application layer.

As chapter 1 showed, products that use more capable models with fewer constraints will eventually achieve better performance. Yet solutions based on current models (which use engineering effort to reduce mistakes by introducing human bias) will likely reach the market first. To be clear, this post discusses the scenario where we enter the green area of Figure 1 and whether AI verticals can maintain their market share as more performant horizontal agents become available.

Figure 1, performance trajectory comparison between vertical and horizontal AI products over time, showing three distinct phases: traditional software dominance, vertical AI market entry, and horizontal AI advancement with improved models.

Of course, Figure 1 is simplistic. These curves look different depending on the difficulty of the problem. Most problems which AI has potential to solve are so hard that AI verticals will never reach acceptable performance, as illustrated in Figure 2. These problems are largely out of scope and not attempted by any startups today. So even if they make up the majority of potential AI applications, they represent a minority among today’s AI applications.

Figure 2, unlike Figure 1, this shows a harder problem where vertical AI products never reach adequate performance levels, even as horizontal AI achieves superior results with improved models.

For problems simple enough to be solved by current constrained approaches (Figure 1), the question becomes: can AI verticals maintain their lead when better solutions arrive?

To paint the picture of the battlefield: Vertical AI is easy to recognize, as it is what most startups in the AI application layer build today. Chapter 1 went into details of the definitions here, but in short, they achieve more reliability by constraining the AI in predefined workflows. On the other hand, horizontal AI will be like a remote co-worker. Imagine ChatGPT, but it can take actions on a computer in the background, using traditional (non AI) software to complete tasks. Onboarding will be like onboarding a new employee - the computer would have the same pre-installed software and account access as you would give a new employee, and you would communicate instructions in natural language. There will be no need to give it all possible sources of data for the task because it can autonomously navigate and find the data it needs. Furthermore, we will assume that this horizontal AI will be built by an AI lab (OpenAI, Anthropic, etc.), as chapter 4 discusses why this is likely.

Note that I am referring to the horizontal agent in an anthropomorphic way, but it does not need to be as smart as a human to perform most of these tasks. This is not ASI. It will, however, be smart enough to write its own software when it cannot find available alternatives to interact with. I think this is realistic to expect in relatively short timelines because coding is precisely the area where we see the most progress in AI models.

Of course, there is a discussion to be had whether this will happen, and if so, when (chapter 3). But I have met a surprising number of founders who believe this will happen, and still think their AI vertical can survive this competition.

I personally lost to this competition once. When OpenAI released ChatGPT in November 2022, I wanted to use it to explain scientific papers. However, it couldn’t handle long inputs (longer inputs require more compute, which OpenAI limited to manage costs). When the GPT-3.5 API became available, I built AcademicGPT, a vertical AI product that solved this limitation by breaking the task into multiple API calls. The product got paying subscribers, but when GPT-4 launched with support for longer inputs, my engineering work became obsolete. The less biased, horizontal product suddenly produced better results than my carefully engineered solution with human bias.

I was not alone. Jared, partner at YC, noted in the Lightcone podcast: “that first wave of LLM apps mostly did get crushed by the next wave of GPTs.” Of course, these were much thinner wrappers than the vertical AI products of today. AcademicGPT only solved for one thing, input length, but the startups that create sophisticated AI vertical products solve for several things. This might extend their lifespan, but one by one, AI models will solve them out of the box, just as input length was solved when GPT-4’s context window increased. As we saw in chapter 1, as models get better, they will eventually find themselves competing with a horizontal solution that is better in every aspect.

Hamilton Helmer’s 7 Powers provides a nice framework for analyzing if they can stand this competition. This framework identifies seven lasting sources of competitive advantage: Scale Economies, Network Economies, Counter-Positioning, Switching Costs, Branding, Cornered Resource, and Process Power.

Switching Costs

Customer retention through perceived losses and barriers associated with changing providers. This makes customers more likely to stay with the current provider even if alternatives exist.

Integration/UX

Users might have grown used to the UI of the vertical AI product, but this is unlikely to be a barrier because of the simple nature of onboarding horizontal AI. It will be like onboarding a new employee, which you have done many times before. Or as Leopold Aschenbrenner put it: “The drop-in remote worker will be dramatically easier to integrate—just, well, drop them in to automate all the jobs that could be done remotely.”

Furthermore, the remote co-worker will evolve from an existing horizontal AI product which you are already used to. Most people will already be familiar with the UI of ChatGPT. As a last point, horizontal AI products will be able to greatly benefit from being able to seamlessly share context across tasks.

Dialog in natural language seems to be the best UI, as it is the one we have chosen in most of our daily interactions. However, there are some areas where a computer UI is more convenient. Of course, traditional software like Excel still exists and can be used to interact with the horizontal agent in these cases, but I am open to the possibility that there is a niche where neither traditional software nor natural language dialog is optimal. AI verticals that operate in such a niche (and innovate this UI) would find switching cost barriers. However, their moat would not be AI-related; non-AI versions (which the horizontal AI could use) would be equally valuable.

Sales

Sales will not be a barrier if the horizontal product evolves from a product you already have. Many companies have already gone through procurement of ChatGPT, and this is only increasing.

Price

The closest thing we have today to the horizontal AI product we are dealing with is Claude computer use, which is very expensive to run because of repeated calls to large LLMs with high-resolution images. AI verticals often optimize this by limiting the input to only include what (they think) is relevant. But the cost of running models has been on a steep downward trajectory. Because of competition between the AI labs, I expect this to continue. Furthermore, having a single product for many verticals instead of licensing many will save costs.

Counter-Positioning

Novel business approach that established players find difficult or impossible to replicate. This creates a unique market position that competitors cannot effectively challenge.

At first glance, vertical products might seem to have counter positioning through their ability to tailor solutions to specific customers. But this advantage only exists if it actually makes your product better than the competition, which it is not in the scenario we are examining. See chapter 1 for more details.

In fact, the situation demonstrates counter positioning advantages in the other direction. Horizontal solutions scale naturally with each model improvement, while vertical products face a dilemma: either maintain their constraints and fall behind in performance, or adopt the better models and lose their differentiation.

Scale Economies

Production costs per unit progressively reduce as business operations expand in scale. This advantage allows companies to become more cost-efficient as they grow larger.

Scale Economies are equally available to both approaches. Vertical products scale efficiently like traditional SaaS businesses. But horizontal solutions share this advantage and can push prices down faster by spreading R&D cost of model development across users from many different verticals.

Network Economies

Product or service value for each user rises with expansion of the total customer network. Each new user adds value for all existing users, creating a self-reinforcing growth cycle.

Network Economies tell a similar story to Scale Economies. Both vertical and horizontal products gather user data to improve their product. However, horizontal solutions have an inherent advantage - they can use the data to train better models, creating a broader feedback loop that improves performance across all use cases.

Brand Power

Long-lasting perception of enhanced value based on the company’s historical performance and reputation. A strong brand creates customer loyalty and allows for premium pricing.

Brand power is typically out of reach for companies at this scale. See Figure 3. It could be argued that OpenAI and/or Google has it, but no startup doing vertical AI will.

Process Power

Organizational capabilities that require significant time and effort for competitors to match. These are often complex internal systems or procedures that create operational excellence.

Similarly, process power is typically out of reach for companies at this scale. See Figure 3.

Figure 3, the three phases of business growth and the Powers most often found at each stage.

Cornered Resource

Special access to valuable assets under favorable conditions that create competitive advantage. This could include exclusive rights, patents, or data.

So far, no power has been able to shield AI verticals in their competition with horizontal AI co-workers. However, a cornered resource breaks this pattern. Such a resource will be very rare. The resource has to be truly exclusive—that is, it should not be available for sale at any price. It also has to be truly required to operate in that vertical, meaning without it, your product cannot succeed regardless of other factors. There will be very few verticals that find such a resource. I think several AI verticals believe they have this advantage through some data, but in reality, they don’t. The data is either not truly required or not exclusively held. However, some will find such a resource. For example, they might have a dataset that could only be gathered during a rare event. As long as they maintain control of it, the superior intelligence of horizontal AI won’t matter.

In conclusion, in the scenario where a vertical AI product was first to market but now faces competition from a superior solution based on horizontal AI, almost all vertical AI solutions will struggle to find a barrier. By examining Helmer’s 7 powers, we see that having a cornered resource might be the only moat AI verticals can have. This suggests that AI founders in the applications layer should perhaps spend much more time trying to acquire such a resource than anything else, as we will discuss further in chapter 4. Verticals that don’t create a barrier will be overtaken by horizontal solutions once they become competitive. This happened to me with AcademicGPT. AcademicGPT solved just one problem which horizontal solutions couldn’t solve at the time, but this will be the fate for more sophisticated AI verticals that solve multiple ones. It will just take slightly longer.

However, the elephant in the room is the assumption that the timeline for a remote co-worker is short. This brings us to chapter 3, where we’ll explore how the AI application layer is likely to evolve. We’ll make concrete predictions and investigate the potential obstacles to this transition - including model stagnation, regulatory challenges, trust issues, and economic barriers.

]]>
https://lukaspetersson.com/blog/2025/power-vertical/ hacker-news-small-sites-42713809 Wed, 15 Jan 2025 17:14:47 GMT
<![CDATA[Supershell, an AI powered shell~terminal assistant (open-source)]]> thread link) | @alex-zhuk
January 15, 2025 | https://www.2501.ai/research/introducing-supershell | archive.org

Enter Supershell, the next evolution of terminal interaction. More than a copilot, it’s a real-time assistant that transforms your command-line experience.

Lightspeed AI Responses

Supershell delivers responses at unparalleled speed. Imagine typing a partial command or describing your intent in plain language, and receiving precise, actionable suggestions tailored to your workflow. Supershell goes beyond autocomplete—it understands your history, frequently used commands, and system context to generate complete, intelligent proposals.

With Supershell, you’ll never second-guess a command again.

Natural Language Commands

Forget memorizing endless aliases or shortcuts. With Supershell, you can type or even speak your intent in natural language. Just tell it what you need, and it translates your instructions into optimized shell commands. It’s like having a terminal that speaks your language—literally.

Need to compress a file? Simply type, “compress all PDFs in this folder”—no manual syntax required.

Zero Bloat, Zero Hassle

Supershell integrates directly into your favorite terminal environment. It’s lightweight, written entirely in shell, and doesn’t rely on heavy dependencies or additional software. Install it in seconds, and it’s ready to work. No bloated setups, just seamless efficiency.

Harness the Power of Agentic AI

Supershell takes automation to the next level by integrating with @2501 Agents. With pre-generated prompts and intelligent task orchestration, you can invoke powerful, context-aware agents without ever leaving your terminal. Whether it’s automating workflows, performing system diagnostics, or managing complex tasks, Supershell puts the full potential of agentic AI at your fingertips.

]]>
https://www.2501.ai/research/introducing-supershell hacker-news-small-sites-42713663 Wed, 15 Jan 2025 17:04:13 GMT
<![CDATA[Second-Order Thinking – Mental Model]]> thread link) | @kiyanwang
January 15, 2025 | https://read.perspectiveship.com/p/second-order-thinking | archive.org

Engineering managers are paid to make good decisions and second-order thinking helped me with this tremendously.

In September 2019, when I got promoted and took over the Ruby on Rails unit, all I wanted was to keep everything working as it was with the previous manager. Keeping teams happy and growing — that was my job.

The first important decision came knocking on my door right after I had started.

For years the team had had monthly meetings with knowledge sharing presentations but we were really struggling to find engineers willing to present. The issue was that the team was approaching 100 people, so these meetings were no longer small and cosy. One of our senior leaders proposed to open them up to an even broader audience and rename them to Backend Meetings, connecting people from other technology units.

I agreed to proceed; the Rails Meeting ceased to exist.

The results of the decision were immediate (first-order consequences) and satisfactory as we did not struggle to find participants. Problem solved?

After a few months we realised that Backend Meetings were even more difficult to maintain. People missed their old meetings, and my team had no single place to discuss important announcements specific to us.

Second-order thinking is a mental model where we consider more than just one immediate result of our decisions. As its name states — we also look at potential future results (second-order consequences).

Second-order thinking is a mental model where we consider more than just one immediate result of our decisions.

We often rush into actions without pausing to think and analyse.

There are two practices that can help us explore possible options. However, be cautious of overthinking, as it may lead to decision paralysis.

What I personally prefer is considering the reversibility of a given decision. If it is easy to reverse, act fast. If it is difficult to revert, more time is needed to decide.

Pause for a moment and ask:

Whenever you decide or consider different scenarios, ask: And then what? Repeat this process multiple times to discover first, second, or even third-level consequences.

For example, we need to decide today whether to add an extra feature before the afternoon demonstration. However, if we choose to do so, we will have to cut corners to ensure it is ready.

  • 1st order: (and then...) — The feature works correctly during the demo.

  • 2nd order: (and then...) — This one-off feature is unexpectedly widely used and negatively impacts the performance, security, and maintainability of the entire application due to the way it was implemented.

This is a simple example, but “And then what?” can apply to any situation.

The 10-10-10 rule, introduced by Suzy Welch, is a time-bound approach for second-order thinking.

Consider the consequences of the option you choose:

  • In 10 minutes?

  • In 10 months?

  • In 10 years?

This approach is suitable for handling short-term negative consequences. For example, consider changing the company-wide software development life-cycle:

  • In 10 minutes, there may be resistance, and the transition could be chaotic. You might face criticism.

  • In 10 months, it becomes optimised and streamlined, and engineers are satisfied.

  • In 10 years, which is quite a stretch for software, the delivery processes have evolved several times, building on the initial change.

You can also adjust the time frame to hours, days, or weeks based on your needs.

We revived the Rails Meeting after 22 months, although it took much longer than it should have. It came back in a refreshed form, with clear ownership. In hindsight, cancelling it was a mistake as I did not consider the long-term consequences.

Second-order thinking focuses on the long-term impact rather than just immediate results.

Each time you are about to make a decision, pause and ask: And then what?

I hope reading this will have a positive impact on you in the next 10 minutes, 10 months, and maybe even 10 years from now.

Thank you for reading!

Michał

PS Practising second-order thinking may benefit you in the future. Be kind to your future self.

PPS My interest in mental models has been growing over the years thanks to Farnam Street: their podcast, The Knowledge Project, and their books. Second-order Thinking was one of the first mental models I learned.

Articles I have read this week that might help you explore new perspectives:

More mental models:

Power Up Your Brain with Mental Models
]]>
https://read.perspectiveship.com/p/second-order-thinking hacker-news-small-sites-42713537 Wed, 15 Jan 2025 16:57:37 GMT
<![CDATA[Exploring Database Isolation Levels]]> thread link) | @0xKelsey
January 15, 2025 | https://www.thecoder.cafe/p/exploring-database-isolation-levels | archive.org

Today is a big day. Together, we will explore the different database isolation levels. Put on your seatbelt, it’s going to be a wild ride into the realm of database isolation levels and anomalies!

In a previous issue (Isolation Level), we introduced the concept of isolation level. In short, an isolation level refers to the degree to which concurrent transactions are isolated from each other. Understanding the different levels is important for any software engineer working with databases. As I mentioned previously, I have already faced significant problems with customer data because we didn’t properly tune our PostgreSQL config.

In this issue, we will explore the different isolation levels and the anomalies each one prevents. In this context, an anomaly refers to an undesirable behavior that can occur when multiple transactions execute concurrently. Said differently, if something unexpected happened because we have two transactions running at the same time, we faced an anomaly.

Here are the different isolation levels and all the anomalies they prevent:

The higher the isolation level, the stronger the guarantees, yet it comes with a performance penalty.

For example:

  • The read uncommitted level prevents dirty writes

  • The read committed level prevents dirty reads and dirty writes as it’s built on top of read uncommitted

The higher the isolation level, the more anomalies are prevented. Yet, as with anything in computer science, it’s not free and comes at the cost of reduced performance. For example, in a benchmark I observed a drop of around 80% in throughput after tuning the isolation level.
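To make that tuning concrete, here is a minimal sketch of selecting an isolation level per connection from application code. It assumes PostgreSQL and the psycopg2 driver; the connection string and the accounts table are placeholders for illustration only.

import psycopg2
from psycopg2 import extensions

# Placeholder connection details.
conn = psycopg2.connect("dbname=bank user=app password=secret host=localhost")

# Ask for stronger guarantees than PostgreSQL's default (read committed).
conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_REPEATABLE_READ)

with conn:  # the connection context manager commits the transaction on success
    with conn.cursor() as cur:
        cur.execute("SELECT balance FROM accounts WHERE name = %s", ("Alice",))
        print(cur.fetchone())

conn.close()

The same choice can also be made globally, for example through PostgreSQL’s default_transaction_isolation setting.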

No Isolation

Anomalies prevented: none.

The no isolation level implies that we don’t impose any isolation at all. This is only suitable in scenarios where we can ensure no concurrent transactions will occur. In practice, this is quite rare.

Read Uncommitted

Anomalies prevented: dirty writes.

A dirty write occurs when a transaction overwrites a value that was previously written by another transaction that is still in flight. If either of these transactions rolls back, it’s unclear what the correct value should be.

To illustrate this anomaly, let’s consider a banking account database with two transactions doing concurrent operations on the same account of Alice:

  • Initial state:

    • Alice’s balance: $0

  • Transactions:

    • T1 adds $100 to Alice’s account but will rollback

    • T2 adds $200 to Alice’s account

  • Expected balance: $200

Here’s a possible scenario with a database that doesn’t prevent dirty writes (the operation that causes problems is highlighted in red):

Dirty write anomaly

At the end of this scenario, Alice’s balance is $300 instead of $200. This is a dirty write. Dirty writes are especially bad as they can violate database consistency. In general, databases prevent dirty writes by locking rows that will be written until the end of the transaction.

If our database is tuned with the read uncommitted isolation level or higher, we won’t face dirty write anomalies. Otherwise, dirty writes may occur.

Read Committed

Anomalies prevented: dirty reads.

A dirty read occurs when one transaction observes a write from another transaction that has not been committed yet. If the initial transaction rolls back, the second transaction has read data that was never committed.

Let’s consider the same banking account example:

  • Initial state:

    • Alice’s balance: $0

  • Transactions

    • T1 adds $100 to Alice’s account but will rollback

    • T2 reads Alice’s balance

  • Expected T2 read: $0

Here’s a possible scenario with a database that doesn’t prevent dirty reads:

Dirty read anomaly

The balance read by T2 is $100, which corresponds to a dirty read. Indeed, the $100 on Alice’s account is something that never really existed, as it was rolled back by T1. With dirty reads, decisions inside transactions can be taken based on data updates that can be rolled back, which can lead to consistency violations, just like dirty writes.

At the read-committed isolation level, databases address this problem by ensuring that any data read during a transaction is committed at the time of reading.
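Here is a minimal sketch of that guarantee in action, assuming PostgreSQL (where read committed is the default), the psycopg2 driver, and a placeholder accounts table containing Alice’s row with a balance of $0:

import psycopg2

# Placeholders: accounts(name text primary key, balance int) with ('Alice', 0).
DSN = "dbname=bank user=app password=secret host=localhost"
t1, t2 = psycopg2.connect(DSN), psycopg2.connect(DSN)
c1, c2 = t1.cursor(), t2.cursor()

c1.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'Alice'")

# T1 has not committed, so T2 only sees the last committed value.
c2.execute("SELECT balance FROM accounts WHERE name = 'Alice'")
print(c2.fetchone()[0])  # prints 0, not 100: the dirty read is prevented

t1.rollback()  # the uncommitted $100 disappears without T2 ever having seen it
t2.rollback()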

Repeatable Reads

Anomalies prevented: fuzzy reads.

A fuzzy read occurs when a transaction reads a value twice but sees a different value in each read because a committed transaction updated the value between the two reads.

Let’s again consider the same banking account example:

  • Initial state:

    • Alice’s balance: $0

  • Transactions:

    • T1 reads Alice’s balance one time and then a second time

    • T2 adds $100 to Alice’s balance

  • Expected T1 reads should be consistent: either $0 and $0 or $100 and $100, depending on the execution order

Here’s a possible scenario with a database that doesn’t prevent fuzzy reads:

Fuzzy read anomaly

Here the problem is that T1 reads Alice’s balance two times and gets two different results: $0 and $100. This is the reason why fuzzy reads are also called non-repeatable reads. We can easily see the impact of fuzzy reads: if a transaction expects two reads of the same value to return the same result, a change in between can lead to logical errors or even rule violations.

At the repeatable reads isolation level, databases, in general, prevent fuzzy reads by ensuring that a transaction locks the data it reads, preventing other transactions from modifying it.

Snapshot Isolation (SI)

Anomalies prevented: fuzzy reads, lost updates, and read skew.

We already introduced fuzzy reads in the context of the repeatable reads isolation level. Therefore, if we want to prevent fuzzy reads, we have to choose at least the repeatable read or SI isolation level.

One thing to note, though, is that preventing fuzzy reads at these two isolation levels is done differently. We said that at the repeatable reads level, transactions lock the data they read. Snapshot isolation, by contrast, works with a consistent snapshot of the database taken at the start of the transaction, isolating it from changes made by other transactions during its execution. This technique also helps to prevent lost updates and read skew, which we will introduce now.

A lost update occurs when two transactions read the same value, try to update it to two different values but only one update survives.

Same banking account example:

  • Initial state:

    • Alice’s balance: $0

  • Transactions:

    • T1 adds $100 to Alice’s account

    • T2 adds $200 to Alice’s account

  • Expected balance: $300

Here’s a possible scenario with a database that doesn’t prevent lost updates:

Lost update anomaly

The final balance is $200, which reflects T2’s write, but T1’s update of +$100 is lost. This illustrates why this anomaly is called a lost update and it’s problematic as it can lead to incorrect data, violating the expectation that both additions should be applied to Alice’s account.

At the SI level, as transactions work with a consistent snapshot of the data, the database can detect a conflict when two transactions modify the same data and will typically roll back one of them.
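To see how easily a lost update can happen from application code, here is a minimal sketch of the read-modify-write race, again assuming PostgreSQL, psycopg2, and a placeholder accounts table where Alice’s balance starts at $0:

import psycopg2
from psycopg2 import extensions

DSN = "dbname=bank user=app password=secret host=localhost"  # placeholder
t1, t2 = psycopg2.connect(DSN), psycopg2.connect(DSN)
for conn in (t1, t2):
    # At read committed the race below silently loses an update; at repeatable
    # read (PostgreSQL's snapshot isolation) the second UPDATE fails instead
    # with "could not serialize access due to concurrent update".
    conn.set_session(isolation_level=extensions.ISOLATION_LEVEL_READ_COMMITTED)

c1, c2 = t1.cursor(), t2.cursor()
c1.execute("SELECT balance FROM accounts WHERE name = 'Alice'")
c2.execute("SELECT balance FROM accounts WHERE name = 'Alice'")
b1, b2 = c1.fetchone()[0], c2.fetchone()[0]  # both read $0

c1.execute("UPDATE accounts SET balance = %s WHERE name = 'Alice'", (b1 + 100,))
t1.commit()  # T1 writes $100

c2.execute("UPDATE accounts SET balance = %s WHERE name = 'Alice'", (b2 + 200,))
t2.commit()  # T2 writes $200 based on its stale read: T1's update is lost

Computing the new balance inside SQL (UPDATE ... SET balance = balance + 100) or locking the row with SELECT ... FOR UPDATE also avoids this particular race, regardless of the isolation level.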

A read skew occurs when a transaction reads multiple rows and sees a consistent state for each individual row but an inconsistent state overall because the rows were read at different points in time with respect to changes made by other transactions.

If you understood the definition right away, congratulations. Otherwise, let’s make it clearer with a banking database example and consider two accounts, Alice and Bob:

  • Initial state:

    • Alice’s balance: $100

    • Bob’s balance: $0

  • Transactions:

    • T1 reads Alice’s and Bob’s balance

    • T2 transfers $100 from Alice’s account to Bob’s account

  • Expected T1 reads should be consistent: $100 for Alice and $0 for Bob or $0 for Alice and $100 for Bob depending on the execution order

Here’s a scenario that illustrates a read skew anomaly (the transfer operation is simplified for readability reasons):

Read skew anomaly

Before and after these transactions, the sum of Alice’s and Bob’s balances is $100; yet, this isn’t what T1 observed in this scenario. Indeed, for T1, Alice’s and Bob’s balances are both $100, which represents a sum of $200. This illustrates a read skew: a consistent state for each individual balance, but an inconsistent state overall for the total balance. Read skew can also lead to consistency rules being violated.

At the SI level, snapshots also prevent read skew: in this example, T1 would work on a consistent snapshot, leading to consistent data.

Serializable Snapshot Isolation (SSI)

Anomalies prevented: write skew.

A write skew may occur when two transactions read the same data, make decisions based on that data, and then each update a different part of that data.

For this anomaly, let’s consider an oncall scheduling system for doctors at a hospital. The hospital enforces a consistency rule that at least one doctor must be oncall at any given time:

  • Initial state:

    • Alice and Bob are both oncall

    • They both feel unwell

  • Transactions:

    • Both decide to ask the system if they can leave: T1 represents Alice’s request and T2 represents Bob’s request

  • Expected result: Only Alice’s or Bob’s request is accepted

Now, let’s consider the following scenario that illustrates a write skew:

Write skew anomaly

In this scenario, both transactions decide to accept the sick leave because they each read a count of oncall doctors greater than 1. The result violates the consistency rule that at least one doctor must always be oncall.

SSI is an advanced approach that extends SI. Usually, databases tuned at SSI prevent write skew by introducing additional checks during the commit phase, which ensures that the changes made by different transactions do not lead to inconsistent or conflicting states.
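Here is a minimal sketch of the racy check-then-act pattern behind the oncall example, assuming PostgreSQL, psycopg2, and a placeholder table oncall(doctor text primary key, on_call bool) with Alice and Bob both on call. The two calls must run concurrently, each on its own connection, for the anomaly to appear.

def request_leave(conn, doctor):
    # conn is a psycopg2 connection running at repeatable read (snapshot
    # isolation) or serializable.
    with conn:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM oncall WHERE on_call")
            if cur.fetchone()[0] > 1:  # both transactions still see 2 here
                cur.execute(
                    "UPDATE oncall SET on_call = false WHERE doctor = %s",
                    (doctor,),
                )

# Under snapshot isolation both transactions can commit, leaving nobody on
# call. Under serializable, PostgreSQL's SSI detects the dangerous pattern and
# aborts one of them with a serialization failure that the application is
# expected to retry.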

Serializability

Anomalies prevented: phantom reads.

A phantom read occurs when a transaction does multiple predicate-based reads while another transaction creates or deletes data that matches the predicate of the first transaction. A phantom read is a special case of fuzzy read (non-repeatable read).

Let’s consider the example of a database tracking the students of computer science courses. The predicate in this scenario will be WHERE course = ‘Algo101‘:

  • Initial state:

    • Course Algo101 has a certain number of students

  • Transactions:

    • T1 selects the max age and then the average age of students in the Algo101 course

    • T2 updates the list of students to the Algo101 course

  • Expected T1 reads to be consistent

Here’s a scenario that illustrates phantom reads:

Phantom read anomaly

In this scenario, T2 affected T1’s result set: T1’s first query returned a result set that was different from the second query’s. This could lead to consistency rule violations as well. For example, if T2 inserts many “old“ students, T1 might even compute an average age that is higher than the max age it read just before. This is a phantom read: when the set of rows satisfying a predicate changes between reads within the same transaction due to another transaction.

The serializability isolation level ensures the same result as if transactions were executed in a sequential order, meaning not concurrently. Therefore, serializability prevents phantom reads by guaranteeing that no new rows are inserted, updated, or deleted in a way that would change the result set of the transaction’s query during its execution.

Serializability is the highest possible isolation level, and it refers to the I of ACID.

NOTE: There’s also the concept of strict serializability, which is even stronger than serializability. Strict serializability is not strictly an isolation level but rather exists at the intersection of isolation levels and consistency models. It combines the properties of serializability and linearizability to ensure both a serial order of transactions and a respect for real-time ordering. We will address this concept in a future issue.

Let’s summarize all the anomalies:

  • Dirty writes: When a transaction overwrites a value that was previously written by another in-flight transaction.

  • Dirty reads: When a transaction observes a write from another in-flight transaction.

  • Fuzzy reads: When a transaction reads a value twice but sees a different value.

  • Lost updates: When two transactions read the same value, try to update it to different values, but only one update survives.

  • Read skews: When a transaction reads multiple rows and sees a consistent state for each individual row but an inconsistent state overall.

  • Write skews: When two transactions read the same data, and then each update a different part of this data.

  • Phantom reads: When the set of rows satisfying a predicate changes between reads within the same transaction due to another transaction.

As I mentioned, understanding the different isolation levels should be an important skill for any software engineer working with databases. While serializability offers the highest isolation guarantees, it can have a significant impact on performance, especially in high-concurrency systems. Therefore, it’s crucial to understand the anomalies that can occur at each isolation level and evaluate which ones are acceptable for our system. Sometimes, a system might tolerate certain anomalies in exchange for improved throughput.

By making informed decisions about the appropriate isolation level, we can balance the trade-offs between data consistency and system performance based on the needs of our systems.

I've wanted to write about database isolation levels for years, and it’s been a lot of fun for me. If you enjoyed this post, I would love to hear from you! I think a similar deep dive into consistency models could be just as important, so let me know if you'd be interested in that as well.

]]>
https://www.thecoder.cafe/p/exploring-database-isolation-levels hacker-news-small-sites-42713234 Wed, 15 Jan 2025 16:40:18 GMT
<![CDATA[A surprising scam email that evaded Gmail's spam filter]]> thread link) | @jamesbvaughan
January 15, 2025 | https://jamesbvaughan.com/phishing/ | archive.org

I received a surprising scammy email today, and I ended up learning some things about email security as a result.

Here’s the email:

The scammy email

I was about to mark it as spam in Gmail and move on, but I noticed a couple things that intrigued me.

At first glance, this appeared to be a legitimate PayPal invoice email. It looked like someone set their seller name to be “Don’t recognize the seller?Quickly let us know +1(888) XXX-XXXX”, but with non-ASCII numerals, probably to avoid some automated spam detection.

But then I noticed that the email’s “to” address was not mine, and I did not recognize it.

Gmail’s view of the email’s details

This left me pretty confused, wondering:

  • How did it end up in my inbox?
  • How is there a legitimate looking “signed-by: paypal.com” field in Gmail’s UI?
  • Why didn’t Gmail catch this as spam?
  • If this is a real PayPal invoice and they have my email address, why didn’t they send it to me directly?

How did this end up in my inbox and how was it signed by PayPal?

After downloading the message and reading through the headers, I believe I understand how it ended up in my inbox.

The scammer owns at least three relevant things here:

  • The email address in the to field
  • The domain in the mailed-by field
  • A PayPal account with the name set to “Don’t recognize the seller?Quickly let us know +1(888) XXX-XXXX”

I believe they sent themselves a PayPal invoice, and then crafted an email to send me using that email’s body. They had to leave the body completely unmodified so that they could still include headers that would show that it’s been signed by PayPal, but they were still able to modify the delivery address to get it sent to me.

Why didn’t Gmail catch this and mark it as spam?

If that’s correct, it explains how it ended up in my inbox and why it appears to have been legitimately signed by PayPal, but I still believe Gmail should have caught this.

I would have expected that for a service as significant as PayPal, Gmail would have at minimum a hard-coded rule that marks emails as spam if they’re signed by PayPal, but mailed by an unrecognized domain.
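To make the signed-by/mailed-by distinction concrete, here is a minimal sketch of such a check. It is an illustration only (not how Gmail actually works) and assumes the message has been downloaded as an .eml file; “signed-by” roughly corresponds to the d= tag of the DKIM signature, and “mailed-by” to the envelope sender in Return-Path.

import re
from email import policy
from email.parser import BytesParser

def signing_and_envelope_domains(path):
    with open(path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)

    dkim = str(msg.get("DKIM-Signature", ""))   # "signed-by"
    m = re.search(r"\bd=([^;\s]+)", dkim)
    signed_by = m.group(1).lower() if m else None

    ret = str(msg.get("Return-Path", ""))       # "mailed-by"
    m = re.search(r"@([\w.-]+)", ret)
    mailed_by = m.group(1).lower() if m else None
    return signed_by, mailed_by

signed_by, mailed_by = signing_and_envelope_domains("invoice.eml")
if signed_by and mailed_by and not mailed_by.endswith(signed_by):
    print(f"signed by {signed_by} but mailed by {mailed_by} - suspicious")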

Fortunately, PayPal seems to be doing what they can to mitigate the risk here by:

  • Trying to prevent seller names from including phone numbers, although this email is evidence that they could be doing more here and should prevent more creative ways to sneak phone numbers into names.
  • Including the invoicee’s email address at the top of the body of the email. This was the first thing that tipped me off that something interesting was going on here.

Why didn’t the scammer send the invoice to me directly?

I suspect that they didn’t send the invoice to my email address directly so that it wouldn’t show up in my actual PayPal account, where I’d likely have more tools to identify it as a scam and report it to PayPal more easily.

]]>
https://jamesbvaughan.com/phishing/ hacker-news-small-sites-42712524 Wed, 15 Jan 2025 16:01:05 GMT
<![CDATA[Lessons from ingesting millions of PDFs and why Gemini 2.0 changes everything]]> thread link) | @serjester
January 15, 2025 | https://www.sergey.fyi/articles/gemini-flash-2 | archive.org

]]>
https://www.sergey.fyi/articles/gemini-flash-2 hacker-news-small-sites-42712445 Wed, 15 Jan 2025 15:55:49 GMT
<![CDATA[Why is Cloudflare Pages' bandwidth unlimited?]]> thread link) | @MattSayar
January 15, 2025 | https://mattsayar.com/why-does-cloudflare-pages-have-such-a-generous-free-tier/ | archive.org

This site is hosted with Cloudflare Pages and I'm really happy with it. When I explored how to create a site like mine in 2025, I wondered why there's an abundance of good, free hosting these days. Years ago, you'd have to pay for hosting, but now there's tons of sites with generous free tiers like GitHub Pages, GitLab Pages, Netlify, etc.

But Cloudflare's Free tier reigns supreme

There are various types of usage limits across the platforms, but the biggest one to worry about is bandwidth. Nothing raises your heart rate faster than realizing your site is going viral and you either have to foot the bill or your site gets hugged to death. I gathered some limits from various services here.

Service | Free Bandwidth Limit/Mo | Notes
Cloudflare Pages | Unlimited | Just don't host Netflix
GitHub Pages | Soft 100 GB | "Soft" = probably fine if you go viral on reddit sometimes
GitLab Pages | X,000 requests/min | Lots of nuances, somewhat confusing
Netlify | 100 GB | Pay for more
AWS S3 | 100 GB | Credit card required, just in case... but apparently Amazon is very forgiving of accidental overages

The platforms generally say your site shouldn't be more than ~1GB in size and should have fewer than some tens of thousands of files. This site in its nascency is about 15MB and <150 files. I don't plan to start posting RAW photo galleries, so if I start hitting those limits, please be concerned for my health and safety.

So why is Cloudflare Pages' bandwidth unlimited?

Why indeed. Strategically, Cloudflare offering unlimited bandwidth for small static sites like mine fits in with its other benevolent services like 1.1.1.1 (that domain lol) and free DDOS protection.

Cloudflare made a decision early in our history that we wanted to make security tools as widely available as possible. This meant that we provided many tools for free, or at minimal cost, to best limit the impact and effectiveness of a wide range of cyberattacks.

- Matthew Prince, Cloudflare Co-Founder and CEO

But I want to think of more practical reasons. First, a static website is so lightweight and easy to serve up that it's barely a blip on the radar. For example, the page you're reading now is ~2.2MB, which is in line with typical page weights of ~2.7MB these days. With Cloudflare's ubiquitous network, caching, and optimization, that's a small lift. My site ain't exactly Netflix.

Second, companies like Cloudflare benefit from a fast, secure internet. If the internet is fast and reliable, more people will want to use it. The more people that want to use it, the more companies that offer their services on the internet. The more companies that offer services on the internet, the more likely they'll need to buy security products. Oh look, Cloudflare happens to have a suite of security products for sale! The flywheel spins...

Third, now that I’m familiar with Cloudflare’s slick UI, I’m going to think favorably about it in the future if my boss ever asks me about their products. I took zero risk trying it out, and now that I have a favorable impression, I'm basically contributing to grassroots word-of-mouth marketing with this very article. Additionally, there's plenty of "Upgrade to Pro" buttons sprinkled about. It's the freemium model at work.

What does Cloudflare say?

Now that I have my practical reasons, I'm curious what Cloudflare officially says. I couldn't find anything specifically in the Cloudflare Pages docs, or anywhere else! Neither the beta announcement nor the GA announcement has the word "bandwidth" on the page.

Update: shubhamjain on HN found a great quote from Matt Prince that explains it's about data and scale. And xd1936 helpfully found the official comment that evaded my googling.

I don't know anybody important enough to get me an official comment, so I suppose I just have to rely on my intuition. Fortunately, I don't have all my eggs in one basket, since my site is partially hosted on GitHub. Thanks to that diversification, if Cloudflare decides to change their mind someday, I've got options!

]]>
https://mattsayar.com/why-does-cloudflare-pages-have-such-a-generous-free-tier/ hacker-news-small-sites-42712433 Wed, 15 Jan 2025 15:55:13 GMT
<![CDATA[Copilot Induced Crash: how AI-assisted development introduces new types of bugs]]> thread link) | @pavel_lishin
January 15, 2025 | https://www.bugsink.com/blog/copilot-induced-crash/ | archive.org

Klaas van Schelven; January 14 - 5 min read

A programmer and his copilot, about to crash

AI-generated image for an AI-generated bug; as with code, errors are typically different from human ones.

While everyone is talking about how “AI” can help solve bugs, let me share how LLM-assisted coding gave me 2024’s hardest-to-find bug.

Rather than take you along on my “exciting” debugging journey, I’ll cut to the chase. Here’s the bug that Microsoft Copilot introduced for me while I was working on my import statements:

from django.test import TestCase as TransactionTestCase

Python’s “import as”

What are we looking at? For those unfamiliar with Python, the as keyword in an import lets you give an imported entity a different name. It can be used to avoid naming conflicts, or for brevity.

Here are some sensible uses:

# for brevity / idiomatic use:
import numpy as np

# to avoid naming conflicts / introduce clarity:
from django.test import TestCase as DjangoTestCase
from unittest import TestCase as RegularTestCase

The bug in the above is not one of those sensible uses, however. It is in fact the evilest possible use of as.

The problem? The django.test module contains multiple test classes, including TestCase and TransactionTestCase, with subtly different semantics. The line above imports one of those under the name of the other.

The actual bug

In this particular case, the two TestCases have (as the name of one of them suggests) slightly different semantics with respect to database transactions.

  • The TestCase class wraps each test in a transaction and rolls back that transaction after each test, providing test isolation.

  • The TransactionTestCase class has (somewhat surprisingly depending on how you read that name) no implicit transaction management, which makes it ideal for tests that depend on, or test some part of, your application’s DB transactions management.

The bug, then, is that if you depend on the semantics of TransactionTestCase, but actually are running Django’s default TestCase (because of the weird import), you will end up with tests that fail all of a sudden. This is what occurred in my case.
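For readers unfamiliar with that difference, here is a minimal sketch (an illustration, not the actual test that failed) of a test that only passes under the real TransactionTestCase, because on_commit callbacks never fire inside TestCase’s wrapping transaction:

from django.db import transaction
from django.test import TransactionTestCase  # the aliased import silently breaks this

class OnCommitTest(TransactionTestCase):
    def test_on_commit_callback_runs(self):
        fired = []
        with transaction.atomic():
            transaction.on_commit(lambda: fired.append(True))
        # TransactionTestCase really commits here, so the callback has run.
        # With TestCase smuggled in under the same name, nothing is ever
        # committed and `fired` stays empty.
        self.assertEqual(fired, [True])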

Two hours of my life

I won’t make you suffer through the same series of surprises I experienced in those two hours of debugging, the exact test that blew up for me, or the steps I took not to fall into this trap again.

The short of it is: after establishing that my tests were failing because the database transactions weren’t behaving as they should, I first looked for problems in my own code, then suspected a bug in Django, and only finally spotted the problem as detailed above.

Why did I start suspecting Django? Well… because I was sure I was using a TransactionTestCase, but from the behavior of the tests it was clear that the TransactionTestCase was not behaving as promised in the documentation. This led me to suspect some kind of subtle bug in Django, and to much stepping through Django’s source code.

Why was this so hard to spot?

You might be tempted to think that the problem is easy to spot, because I’ve already given you the answer in the first lines of this article. Trust me: in practice, it was not. Let’s look at why.

First, please understand that although I did run my tests before committing, I did not run them straight after copilot introduced this line. So when I finally had a failing test on my hands, I had approximately two full screens of diff-text to look at.

Second, let’s look at the usage location of the alias. Note that it simply reads TransactionTestCase here, and how the carefully written comment now serves as a way to further misdirect you into believing that this is what you’re looking at.

class IngestViewTestCase(TransactionTestCase):
    # We use TransactionTestCase because of the following:
    #
    # > Django’s TestCase class wraps each test in a transaction and rolls
    # > back that transaction after each test, in order to provide test
    # > isolation. This means that no transaction is ever actually committed,
    # > thus your on_commit() callbacks will never be run.
    # > [..]
    # > Another way to overcome the limitation is to use TransactionTestCase
    # > instead of TestCase. This will mean your transactions are committed,
    # > and the callbacks will run. However [..] significantly slower [..]

The alias misled me into thinking TransactionTestCase was being correctly used. Combined with the detailed comment explaining the use of TransactionTestCase, I wasted time diving deep into Django internals instead of suspecting the import.

An Unhuman Error

However, the most important factor driving up the cost of this bug was the fact that the error was simply so weird.

Note that it took me about two hours to debug this, despite the problem being freshly introduced. (Because I hadn’t committed yet, and had established that the previous commit was fine, I could have just run git diff to see what had changed).

In fact, I did run git diff and git diff --staged multiple times. But who would think to look at the import statements? The import statement is the last place you’d expect a bug to be introduced. It’s a place where you’d expect to find only the most boring, uninteresting, and unchanging code.

Debugging is based on building an understanding, and any understanding is based on assumptions. A reasonable assumption (pre-LLMs) is that the code like the above would not happen. Because who would write such a thing?

Are you sure it was Copilot?

Yes…

Well… unfortunately I don’t have video-evidence or a MITM log of the requests to copilot to prove it. But 8 months later I can still reproduce this under some conditions:

from django.test import Te... # copilot autocomplete finishes this as:
from django.test import TestCase as TransactionTestCase

Knowing that the code below this import statement contains some uses of TransactionTestCase, and no uses of TestCase, I can see how a machine that was trained on filling in blanks might come up with this line. That is, it’s reasonable for some definition of reasonable.

But there is just no reasonable path for a human to come up with this line. It’s not idiomatic, it’s not a common pattern, and it’s not a good idea. Which leaves copilot as the only reasonable suspect.

Copilot induced crash

AI-assisted code introduces new types of errors.

Experienced developers understand their own failure modes, as well as those of others (like juniors). But AI adds a new flavor of failure to the mix. It confidently produces mistakes we’d never expect – like the import statement above.

When relying on AI assistance, the bugs we encounter aren’t always the ones we’d naturally anticipate. Instead, they reflect the AI’s quirks – introducing new layers of unpredictability to our workflows. For me personally, the balance is still positive, but it’s important to be aware of the new types of bugs that AI can introduce.

So what’s with “copilot induced crash” in the title? Well, it’s a bit of a joke. The bug was introduced by copilot, but there was no actual crash here (I never committed this code). But given “copilot” it was just too tempting to continue the metaphor of a plane-crash.

]]>
https://www.bugsink.com/blog/copilot-induced-crash/ hacker-news-small-sites-42712400 Wed, 15 Jan 2025 15:53:29 GMT
<![CDATA[Show HN: GeoGuessr but for Historical Events]]> thread link) | @samplank2
January 15, 2025 | https://www.eggnog.ai/entertimeportal | archive.org

Unable to extract article]]>
https://www.eggnog.ai/entertimeportal hacker-news-small-sites-42712367 Wed, 15 Jan 2025 15:51:18 GMT
<![CDATA[Build a Database in Four Months with Rust and 647 Open-Source Dependencies]]> thread link) | @tison
January 15, 2025 | https://tisonkun.io/posts/oss-twin | archive.org

The Database and its Open-Source Dependencies

Building a database from scratch is often considered daunting. However, the Rust programming language and its open-source community have made it easier.

With a team of three experienced developers, we have implemented ScopeDB from scratch to production in four months, with the help of Rust and its open-source ecosystem.

ScopeDB is a shared-disk architecture database in the cloud that manages observability data in petabytes. A simple calculation shows that we implemented such a database with about 50,000 lines of Rust code, with 100 direct dependencies and 647 dependencies in total.

ScopeDB project statistics

Here are several open-source projects that we have heavily used to build ScopeDB:

  • ScopeDB stores user data in object storage services. We leverage Apache OpenDAL as a unified interface to access various object storage services at users’ choice.
  • ScopeDB manages metadata with relational database services. We leverage SQLx and SeaQuery to interact efficiently and ergonomically with relational databases.
  • ScopeDB supports multiple data types. We leverage Jiff with its Timestamp and SignedDuration types for in-memory calculations, and ordered-float to extend the floating point numbers with total ordering.

UPDATE: After filtering out internal crates, there are 623 open-source dependencies in total. Check out this Gist to see if your project is one of them.

(Note that a dependency in the lockfile may not be used in the final binary)

Besides, during the development of ScopeDB, we spawned a few common libraries and made them open-source. We have developed a message queue demo system as its open-source twin.

In the following sections, I will discuss how we got involved and contributed to the upstreams and describe the open-source projects we developed.

Involve and Contribute Back to the Upstreams

Generally speaking, when you start to use an open-source project in your software, you will always encounter bugs, missing features, or performance issues. This is the most direct motivation to contribute back to the upstreams.

For example, during the migration from pull-based metric reporting to push-based metric reporting in ScopeDB, we implemented a new layer for OpenDAL to support reporting metrics via opentelemetry:

When onboarding our customers to ScopeDB, we developed a tool to benchmark object storage services with OpenDAL’s APIs. We contributed the tool back to the OpenDAL project:

To integrate with the data types provided by Jiff and ordered-float, we often need to extend those types. We try our best to contribute those extensions back to the upstreams:

We leverage Apache Arrow for its Array abstraction to convey data in vector form. We have contributed a few patches to the Arrow project:

Even if the extension can be too specific to ScopeDB, we share the code so that people who have the same needs can use the patch:

I’m maintaining many open-source projects, too. Thus, I understand the importance of user feedback even if you don’t encounter any issues. A simple “thank you” can be an excellent motivation for the maintainers:

Sometimes, except for the code, I also contribute to the documentation or share use cases when a certain feature is not well-documented:

Many times, contributing back is not one-directional. Instead, it’s about communication and collaboration.

We used to leverage testcontainers-rs for behavior testing, but later we found it necessary to reuse containers across tests. We fell back to using Ballord to implement the reuse logic. We shared the experience with the testcontainers-rs project:

So far, a contributor has shown up and implemented the feature. I helped test the feature with our open-source twin, which I’ll introduce in the following section.

By the way, as an early adopter of Jiff, we shared a few real-world use cases, which Jiff’s maintainer adjusted the library to fit:

Usually, after the integration has been done, there are fewer opportunities to collaborate with the upstream unless new requirements arise or our core functions cover the upstream’s main evolution direction. In the latter case, we will become an influencer or maintainer of the upstream.

Inside Out: The Database’s Open-Source Components

In addition to using open-source software out of the box, during the development of ScopeDB we also wrote code to implement some common requirements for which no existing open-source software was a direct fit. In such cases, we actively consider open-sourcing the code we wrote.

Here are a few examples of the open-source projects we developed during the development of ScopeDB:

Fastrace originated from a tracing library made by our team members during the development of TiKV. After several twists and turns, this library was separated from the TiKV organization and became one of the cornerstones of ScopeDB’s own observability. Currently, we are actively maintaining the Fastrace library.

Logforth originated from the need for logging when developing ScopeDB. We initially used another library to complete this function. Still, we soon found that the library had some redundant designs and had not been maintained for over a year. Therefore, we quickly implemented a logging library that meets the needs of ScopeDB and can be easily extended, and open sourced it.

To support scheduled tasks within the database system, we developed Fastimer to schedule different tasks in different manners. And to allow database users to define scheduled tasks with the CREATE TASK statement, we developed Cronexpr to let users specify the schedule frequency using cron expressions.

Last but not least, ScopeDB’s SDK is open-source. Obviously, there is no benefit in keeping the SDK private, since the SDK has no commercial value by itself; it exists to support ScopeDB users developing applications. This is the same way Snowflake keeps its SDKs open-source. And when you think about it, GitHub also keeps its server code private and proprietary, while keeping its SDKs, CLIs, and even action runners open-source.

An Open-Source Twin and the Commercial Open-Source Paradigm

Finally, to share the engineering experience in implementing complex distributed systems using Rust, we developed a message queue system that roughly has the same architecture as ScopeDB’s:

As mentioned above, when verifying the container reuse feature of testcontainers-rs, our ultimate goal is to use it in the ScopeDB project. However, ScopeDB is private software, and we cannot directly share ScopeDB’s source code with upstream developers for testing. Instead, Morax, as an open-source twin, can provide developers with an open-source reproduction environment:

I have presented this commercial open-source paradigm in a few conferences and meetups:

Commercial Open-Source Paradigm

When you read The Cathedral & the Bazaar, for its Chapter 4, The Magic Cauldron, it writes:

… the only rational reasons you might want them to be closed is if you want to sell the package to other people, or deny its use to competitors. [“Reasons for Closing Source”]

Open source makes it rather difficult to capture direct sale value from software. [“Why Sale Value is Problematic”]

While the article focuses on when open-source is a good choice, these sentences imply that it’s reasonable to keep your commercial software private and proprietary.

We follow it and run a business to sustain the engineering effort. We keep ScopeDB private and proprietary, while we actively get involved and contribute back to the open-source dependencies, open source common libraries when it’s suitable, and maintain the open-source twin to share the engineering experience.

Future Works

If you try out the ScopeDB playground, you will see that the database is still in its early stages. We are experiencing challenges in improving performance in multiple ways and supporting more features. Primarily, we are actively working on accelerating async scheduling and supporting variant data more efficiently.

Besides, we are working to provide an online service to allow users to try out the database for free without setting up the playground and unleash the real power of ScopeDB with real cloud resources.

If you’re interested in the project, please feel free to drop me an email.

I’ll keep sharing our engineering experience developing Rust software and stories we collaborate with the open-source community. Stay tuned!

]]>
https://tisonkun.io/posts/oss-twin hacker-news-small-sites-42711727 Wed, 15 Jan 2025 15:13:06 GMT
<![CDATA[They made computers behave like annoying salesmen]]> thread link) | @freetonik
January 15, 2025 | https://rakhim.exotext.com/they-made-computers-behave-like-annoying-salesmen | archive.org

Computers are precise machines. You can give a computer a precise command using an inhuman language, and it should perform the command. It's not a human, and there is no point in treating it as one. The goal of humanizing user experience isn't to create an illusion of human interaction - it's to make these mechanical commands more accessible while preserving their precise, deterministic nature.

UX designers and product managers of tech companies did a lot of damage to people's understanding of computers by making the software behave like a human; or to be more precise, behave like an annoying salesman.

(Image from "Not Now. Not later either" by Chris Oliver)

We're all familiar with this type. After receiving a clear "no thanks" they deploy increasingly manipulative tactics to meet their "always-be-closing" quotas: "Would this Wednesday work better?" "What would change your mind?" This behavior is frustrating enough from actual salespeople - it's even worse when programmed into our software.

(Corporate LLM training session circa 2025)

Personally, I can tolerate but deeply dislike software that pretends to have ulterior motives. Take YouTube, for instance. When I explicitly say "Not interested" to their damned shorts feature, I get this response:

youtube web page with a message saying 'shelf will be hidden for 30 days'

I understand that it's not the "YouTube program" having its own agency and making this decision - it's the team behind it, driven by engagement metrics and growth targets. But does the average user understand this distinction?

The population (especially the younger generation, who have never seen a different kind of technology at all) is being conditioned by the tech industry to accept that software should behave like an unreliable, manipulative human rather than a precise, predictable machine. They're learning that you can't simply tell a computer "I'm not interested" and expect it to respect that choice. Instead, you must engage in a perpetual dance of "not now, please" - only to face the same prompts again and again.

]]>
https://rakhim.exotext.com/they-made-computers-behave-like-annoying-salesmen hacker-news-small-sites-42711071 Wed, 15 Jan 2025 14:24:25 GMT
<![CDATA[Free YouTube Transcript Extractor]]> thread link) | @chendahui007
January 15, 2025 | https://www.uniscribe.co/tools/youtube-transcript-extractor | archive.org

Easily convert a youtube video to transcript, copy and download the generated youtube transcript in one click.


Free Extraction

Extract complete video transcripts without spending a dime.

Fast & Reliable

Extract transcripts instantly and export in multiple formats including .txt, .srt, .vtt, and .csv


]]>
https://www.uniscribe.co/tools/youtube-transcript-extractor hacker-news-small-sites-42710117 Wed, 15 Jan 2025 12:25:42 GMT
<![CDATA[How I Created a Repository of My Life]]> thread link) | @beka-tom
January 15, 2025 | https://tomashvili.com/posts/The-Art-of-Archiving-How-I-Created-a-Repository-of-My-Life | archive.org


This is the story of how I created a repository of my life.

It’s been a while since I first discovered the Zettelkasten method and Obsidian. The Zettelkasten method is a way of organizing and linking notes, and Obsidian’s features—like backlinks, graph view, and easy linking of notes—make it an excellent tool for this method.

Like many others, I used Evernote, Bear, and Notion before discovering this methodology. After migrating to this methodology, I have completely changed how I organize my knowledge base. Over time, I realized I had totally transformed my habits for saving notes, tasks, bookmarks, books, project notes, images - everything you can imagine. I had a single repository for my digital life, where I could find anything I needed.

Eventually, I began searching for something similar that I could use directly within VS Code. Why not use a tool where I already spend most of my time? After some searching, I discovered Foam, and I completely fell in love with it. Now, I feel more productive and organized than ever before.

One of the significant advantages of using Foam is that you can leverage the many useful extensions available for VS Code. Here are my favorite extensions to use with a Foam workspace:

(Image: FOAM)

Now, imagine having all your notes, tasks, bookmarks, and everything else in one place—in one repository, in one workspace, all stored on your local Git. Picture having a timeline history for each file, where you can see when it was created, modified, deleted, or when a new line was added.

Note: ☝🏻 Git, not GitHub. I prefer to have everything on my local machine for security reasons.

Another cool feature you can use in your workspace is chatting with your knowledge base using Copilot. What if you could ask it questions like:

  • Show me my completed milestones from August 2024.
  • Show me all tasks with the tag #suada.
  • What did I do on August 1, 2024?

Isn’t that cool? Now, imagine making these connections for years and looking at your graph view to see how everything is interconnected.

Lastly, you might ask about mobile sync. Well, I don’t care much about it because my workstation is my laptop. However, you can use iCloud to sync your workspace with your mobile device. There’s also a free tool called PreText that allows you to open markdown files directly on your mobile device.


💭 Want to leave feedback? Send me an email or respond on X.

]]>
https://tomashvili.com/posts/The-Art-of-Archiving-How-I-Created-a-Repository-of-My-Life hacker-news-small-sites-42709966 Wed, 15 Jan 2025 12:04:24 GMT
<![CDATA[Found this newsletter: an algorithm finds trading "Zones"]]> thread link) | @fraromeo
January 15, 2025 | https://n.tradingplaces.ai/p/macro-chaos-14-jan-2025 | archive.org

We’ve been getting hit with a flurry of macro news left and right.

Last week it was the December 2024 NFP report. This week, it’s 10-year Treasury yield rates and the upcoming CPI report.

So far, all it’s brought the market is a whole lot of fear, doubt, and general twitchiness.

But here’s the thing.

All this macro chaos doesn’t have to throw you off your game. With a structured long/short approach—like using zones—you can spot opportunities no matter what the market throws at you.

And speaking of opportunities, we’ve got 3 new ones for you below.

Let’s dive in.

What’s in this issue:
• Zone recap
• Who we are
• This week’s three new hot zones
• What are zones?

But first, let’s revisit…

Every week, we highlight stocks nearing or entering Hot Zones, i.e. key levels with favorable risk-reward setups. Subscribe to stay updated!

Missed last week’s zone alert? Here’s a quick recap:

Good is bad, bad is good (Jan 7 2025)

Waste Management, Inc. (WM)

This stock had a lot of things going for it when our scanner first picked it up:

  • It was at the June 3 zone which was ripe with reversals

  • It entered the zone at an oversold RSI

  • Its earnings estimates looked positive

So where are we currently with WM?

Despite the recent slew of red market days, the stock has maintained its bounce, and is now crossing the February 28 zone. If momentum holds, this could push toward the first target up, which is the July 19 zone.

Additionally, we would like you to turn your attention to one potential WM play whose entry and exit prices were precisely marked by our zones:

If you were able to time this properly, congrats on a quick 5% gain.

Amgen, Inc. (AMGN)

Similarly, AMGN has managed to maintain some semblance of optimism even with last week’s bloodbath.


Yesterday, it finally broke out of its July 2020 zone with a strong green candle. This opens the door for a move toward the next key zone: the strong resistance at 273-276.

But if this bullish momentum continues, the November 8 2022 zone remains a realistic take-profit target.

Welcome to Trading Places.

We’re just a bunch of market nerds, quants, and posers who’ve stared at enough charts that we dump our portfolios at the sight of a menorah.

After years of convincing ourselves that the lines and shapes we were plotting actually meant something, we finally figured it was time to upgrade our shtick a tiny bit.

So now, we get quant intelligence to do it for us.

We built an algorithm that’s deaf to the market’s siren songs. It cuts through the BS and pinpoints Zones of interest, i.e. places on the chart where actual money comes to dance.

Think of it as a Limitless pill for your stock, currency, and crypto plays—scanning the markets in real-time and determining where the action’s at.


Here are some of the most promising stocks our zone scanner flagged today:

Technology • Software - Application • USA • NYSE

First of all, allow us to add some context to CRM’s November 2021 zone:

This zone acted as a double-top resistance, causing pullbacks of roughly -59% and -33% when price approached it.

Moreover, it required a whole lot of buying volume to break through this resistance, with RSI hitting 83 on the date of penetration.

This is only the second time it will be tested as a support (the previous retest triggered a 9% gain)—and the first without being in overbought territory.

This could be a good rinse-and-repeat setup, with a possible first target being the November 12 zone.

Consumer Defensive • Packaged Foods • USA • NASD

Yet another strong support we’re seeing here.

KHC is currently on its May 2020 zone—a zone that has not only remained celibate since its inception, it has also very strongly rejected each and every penetration attempt in the past. These rejections have led to 28% and 56% bounces.

After more than 4 years, its virginity was once again tested—and it has so far confirmed that it is indeed still a prude.

With incredibly strong valuations (better than its historical averages, its sector peers, and the broader market), it won’t be surprising if this attracts institutional buyers.

In the shorter term, yesterday’s bounce could serve as a catalyst for a climb back to the March 2019 zone, offering roughly a 10% gain.

NEE has all the makings of a good zone play:

It’s on top of a very old, very strong support. RSI is at a paltry 24. Every retest of the zone has resulted in sharp bounces…

Its most recent retest, despite two consecutive days of strong selling volume, ended with a red hammer…

With the next target up being the zone above, which could net a potential 8-14%.

(Or at least as close to perfect as it gets.)

Zones are key price levels where the market has reacted strongly in the past—such as sharp reversals or sudden swings.

Trading Places dashboard

They’re areas where actual supply and demand met in the past, and likely will meet again.

“Why are these significant?”

Well, it all comes down to three key principles. We like to call them The Principles of:

  1. When I Dip, You Dip, We Dip (aka psychology)

Traders are aware that others are watching these levels (zones) too. With everybody paying attention, this creates a self-fulfilling prophecy where everybody acts in anticipation of everybody else’s actions.

  2. Markets Gonna Market ¯\_(ツ)_/¯ (aka technical factors)

If the first price rejection at the top of a zone was violent, it’s likely that buyers who entered at that level are now holding losses.

But with each retest, the rejection weakens, as there are fewer buyers remaining underwater. This weakens that resistance (or support, for all you short-sellers), and could eventually lead to a breakthrough.

  3. Killer Whales (aka institutional plays)

Big players need liquidity in order to place massive orders without moving the market against themselves. So they wait for these zones, knowing a lot of us small fry (retail traders) will come to play.

This allows them to buy low or sell high without causing a lot of waves.

But remember: Zones are NOT guarantees but rather regions of increased probability for market moves. So always, ALWAYS use proper risk management.

Stop obsessively refreshing your charts like it’s your ex’s Instagram.

By combining historical patterns with real-time market data, Trading Places identifies zones and assigns probabilities to each one—helping traders spot potential plays with higher chances of success.

It automates all of the curation, chart-plotting, and alerting for you, so you can actually have a life (or at least pretend to)!

Stay tuned!

Disclaimer: This isn't financial advice. This shouldn’t be news to you.

Discussion about this post

]]>
https://n.tradingplaces.ai/p/macro-chaos-14-jan-2025 hacker-news-small-sites-42709946 Wed, 15 Jan 2025 12:00:47 GMT
<![CDATA[The algorithmic framework for writing good technical articles]]> thread link) | @JeremyTheo
January 15, 2025 | https://www.theocharis.dev/blog/algorithmic-framework-for-writing-technical-articles/ | archive.org

Writing technical articles is an art - this is simply wrong. While writing technical articles may feel like an art to some, it’s primarily a methodological process that anyone can learn.

The methodical process of crafting effective technical articles has been refined over centuries—from the classical rhetoric of Aristotle and Cicero to modern research on creativity, cognitive load, and working memory. You’ve likely seen the results of this process in action, even if the authors weren’t consciously following a specific framework.

Here a quick selection and analysis I did based on my latest bookmarked HackerNews articles:

  1. David Crawshaw’s article “How I Program with LLMs”: While it doesn’t strictly adhere to an algorithmic framework, this piece embodies key aspects of classical rhetoric. It features a clear structure, signposting with the statement, “There are three ways I use LLMs in my day-to-day programming.” Additionally, the article concludes with a clear call to action by introducing his project, sketch.dev, encouraging readers to explore it further.
  2. The post “State of S3 - Your Laptop is no Laptop anymore - a personal Rant": This article follows a classical rhetorical structure by providing a clear introduction, a main body with three arguments and rebuttals, and a closing that emphasizes the need for action—urging readers to “state your refusal” regarding the current state of laptop standby functionality.

These examples demonstrate that even without following a specific algorithmic framework, effective technical writing often naturally aligns with classical rhetorical principles and the same methodological process.

This methodological process can be formalized further into an "algorithmic" framework for writing effective technical articles:

  1. Introduction:
    • Hook: Capture attention with a compelling opening.
    • Ethos: Establish credibility and context.
    • Subject: Define the topic or problem being addressed.
    • Message: Present the central insight or analogy.
    • Background: Provide context or explain why this is relevant now.
    • Signpost: Outline the article structure (e.g., ArgA, ArgB, ArgC).
    • Light Pathos: Subtle emotional appeal tied to the reader’s goals.
    • Transition: Smooth segue into the main content.
  2. Argument A, B, and C:
    • Claim: State the main point/step/key concept.
    • Qualifiers: Acknowledge any limitations to your claim.
    • Grounds: Provide evidence or examples supporting the claim.
    • Rebuttals: Address potential counterarguments.
    • (Optional) Warrants and Backing: State and back-up underlying assumptions if needed.
    • Transition: Bridge to the next argument or section.
  3. Conclusion:
    • HookFinish: Return to the hook for closure.
    • Summary: Summarize the main points of the article.
    • Message: Reinforce your central insight or analogy.
    • Strong Pathos: Final emotional appeal to motivate action.
    • CTA: End with a clear, actionable step for the reader.

Think of writing an article as applying an algorithm: define your inputs, process them through a sequence of logical steps, and arrive at the output. Each step corresponds to a clear function or subroutine.

To prove that it actually works, I’ll apply it to this article, so you can watch each step unfold in real time.

I frequently publish technical articles in the IT/OT domain, some of which have been featured on HackerNews as well. Many colleagues and peers have asked how I manage to produce well-researched technical content alongside my responsibilities as a CTO. Part of my role involves sharing knowledge through writing, and over time, I’ve developed an efficient method that combines classical rhetorical techniques with the use of LLMs (o1 <3). This approach allows me to quickly craft articles while maintaining quality and clarity, avoiding common pitfalls that can arise with AI-generated content.

I wrote this article to formalize my process—not only to streamline my own writing but also to assist others who have valuable insights yet struggle to share them effectively within our industry. By outlining this framework, I hope to help others produce impactful technical articles more efficiently.

Fig. 1: Somewhat relevant xkcd #1081

In the tradition of classical rhetoric, I’ll present three core steps (also called “arguments”):

  1. Lay the Foundation, which shows how to find the three main steps/key concepts/arguments based on your subject, message and call-to-action, using creativity and filter techniques
  2. Build Rhetoric, which uses classic concepts like ethos, pathos, logos to shape your arguments into a compelling form; and
  3. Refine for Readability, which uses techniques inspired by modern research to ensure that your work can be understood by the reader and sparks joy.

Mastering this approach doesn’t just make writing easier-it can advance your career, earn you recognition from your peers, and position you as a thought leader in your field.

Ready to see how it all comes together? Let’s start by setting up our essential inputs-so you can experience firsthand how formulaic thinking simplifies the entire writing process.

A. Lay the Foundation (Your Original Ideas)

// LayTheFoundation defines core input parameters
func LayTheFoundation() (
    subject string,
    message string,
    cta string,
    argA, argB, argC Argument,
)

To run an algorithm, you generally need to identify all the input parameters before calling the function.

However, it’s possible to start with default or partially defined parameters and refine them iteratively (see also chapter 3). Ultimately, for the algorithm to produce accurate and reliable results, all inputs should be clearly defined by the final iteration.

Similarly, when writing technical articles, defining your key ‘inputs’—your subject, call-to-action, and message—is essential. After establishing these inputs, you’ll generate a range of potential angles (your arguments), then converge on the top three that best support your message. Finally, you’ll back them up with evidence and address alternative methods or counterarguments.

A.1. Define Your Inputs (Subject, CTA, Message)

Before you can start writing the article, you need to establish three key inputs that will drive your entire article.

A.1.1. Subject

subject = "algorithmic framework for writing effective technical articles"

This is the core issue or problem you want to address. It should be clear, focused, and relevant to your audience.

LLM Advice

Must be done manually, do not use LLM here!

A.1.2. Call-to-Action (CTA)

cta = "Apply this algorithmic framework to your next technical article and experience how it transforms your writing"

This is the specific action you want your readers to take next.

As a CTO, most of my articles subtly nudge readers toward using our product. Drawing from my experience, I’ll focus this subsection on crafting effective CTAs that align with promotional goals while maintaining value for the reader.

However, the CTA isn’t limited to promotional goals—it works just as well for purely informational pieces. You could guide readers to “check out my bio,” “watch my latest conference speech,” or “try out this open-source project.”

Avoid generic marketing prompts like “Contact us now!”—they often clutter websites and turn off technical readers who want more substance first.

Reading an article rarely convinces introverted technical readers to make a phone call. It’s more effective to offer a smaller, relevant next step, like a link to deeper content, a demo, or a GitHub repo. Over time, they may trust you enough to seek further interaction.

LLM Advice

Must be done manually, do not use LLM here!

A.1.3. Message

message = "Writing is applying an algorithm."

This is the central insight or theme of your article in a short sentence, ideally six words or less and free of clichés. Include a subtle rhetorical device if it fits naturally.

By defining these inputs, you clarify the input parameters for the next subroutine, which gathers "key concepts" or "steps", known in classical rhetoric as "arguments".

LLM Advice

Must be done manually, do not use LLM here!

A.2. Define your Arguments

From the previous section, we have defined the inputs. Now we can derive the arguments from them.

A.2.1. An argument is a claim supported by reasons

Let’s first talk about the elephant in the room:

“I’m writing a step-by-step guide or informational piece; I don’t need arguments.”

It’s easy to think that arguments are only necessary for debates or persuasive essays. However, at its core, an argument is simply a claim supported by reasons 1. This means that effective communication—regardless of format—involves presenting claims and supporting them with reasons.

Here’s the key takeaway: Regardless of what you call them—arguments, key concepts, steps, or insights—the underlying principle is the same. Effective communication relies on structuring information in a way that supports the overall message and helps the audience grasp the content.

In technical writing, these “arguments” manifest in various forms depending on the type of content you’re creating:

  1. Tutorials: Each step is a claim about what the reader should do, supported by reasons explaining why this step is necessary.
  2. White Papers: Each supporting point is a claim about industry trends or product benefits, substantiated by research and analysis.
  3. Explanatory Articles: Key concepts or clarifications are claims about how something works, supported by detailed explanations.
  4. Case Studies: Phases of a project or results-driven highlights are claims about actions taken and their outcomes, backed by real-world evidence.

A.2.2. The Toulmin System for Analyzing and Constructing Arguments

In modern rhetoric, the Toulmin System2 is a valuable tool for analyzing and constructing arguments. It helps break down arguments into their essential components, making them clearer and more persuasive.

In this chapter, we will focus on identifying the major points for each argument:

  • Claim: The main point or position that you’re trying to get the audience to accept.
  • Grounds: The evidence—facts, data, or reasoning—that supports the claim.

Some claims might have underlying assumptions, so we will also define them:

  • Warrant: The underlying assumption or principle that connects the grounds to the claim, explaining why the grounds support the claim.
  • Backing: Additional justification or evidence to support the warrant, making it more acceptable to the audience.

Additionally, acknowledging possible counterarguments strengthens your position:

  • Rebuttal: Recognition of potential counterarguments or exceptions that might challenge the claim.

A.2.3. Gather Arguments (Divergent Thinking)

// GatherArguments collects raw brainstorming output (unfiltered ideas).
func GatherArguments(subject, message string) (rawArgs []string)

Good and original articles usually present novel and applicable, and therefore also creative ideas.

In modern scientific literature 3, creativity is often divided into divergent and convergent thinking—first, you expand your pool of ideas (divergent), then you narrow them down (convergent).

This is sometimes referred to as the “double diamond model” (Fig. 2) 4, which visualizes the process as two connected diamonds, each representing a divergent (expansion) and convergent (narrowing) phase.

Fig. 2: The Double Diamond is a visual representation of the design and innovation process. It’s a simple way to describe the steps taken in any design and innovation project, irrespective of methods and tools used.

This is exactly what we will do in the next steps.

A.2.3.1. Choose a Creativity Technique

First, use a creativity technique to brainstorm at least six different arguments, steps, or key concepts for your article. The specific method you choose will depend on your style, but brainstorming is often the easiest and most effective option. Some other accepted techniques include mind mapping, Six Thinking Hats, or morphological boxes5.

And yes, creativity can be systematized - techniques such as brainstorming or brainwriting are widely recognized and used in practice, although empirical evidence of their effectiveness is limited and often qualitative in nature 5 6

A.2.3.2. Gather a Broad Range of Ideas

rawArgs = {
  "Define inputs first",
  "Explore structure and prose (introductions, transitions, signposts)",
  "Emphasize original insights (why new facts matter)",
  "Discuss LLM usage (can AI help with clarity?)",
  "Refine readability (bullet points, visuals, concise language)",
  "Address counterarguments (alternative methods, pitfalls)",
  "Cats with laser eyes"
}

At this stage, anything goes. Your goal is to diverge - to generate as many ideas as possible without worrying about relevance, overlap, or feasibility.

Fig. 3: Going crazy here is important for creativity

Some ideas may seem redundant, irrelevant, or even silly. That’s okay - it’s part of the process. The goal is quantity, not quality. Refinement comes later in the convergent thinking phase.

LLM Advice

Can be supported by an LLM, but the main points must come from you as the author.

A.3. Cluster and Filter (Convergent Thinking)

// ClusterAndFilterArguments applies clustering and the "MECE" principle.
func ClusterAndFilterArguments(
    subject string, 
    message string,
    rawArgs []string,
) (argA Argument, argB Argument, argC Argument)

By brainstorming freely, you’ve created a pool of ideas from which to draw. Now let’s converge those ideas: first, we group them, check if they are MECE, narrow them down to three arguments, and add evidence.

clusters = {
  "Inputs and foundational concepts.",
  "Structuring content for clarity",
  "Ensuring readability and engagement",
  "LLM usage",
  "Weird stuff",
}

Look for patterns or themes that can be combined. For example, if you’ve brainstormed steps for a tutorial, some might logically fit into broader categories. If you’ve generated supporting points for a white paper, identify themes or recurring concepts.

If “cats with laser eyes” doesn’t support your message, drop it or mention it briefly as a playful anecdote.

LLM Advice

Can be supported by an LLM, but the main clusters should come from you as the author.

A.3.2. Apply the MECE Principle

func AreArgumentsMECE(argA Argument, argB Argument, argC Argument) (bool, bool, bool)

To further refine your clusters, use the MECE (Mutually Exclusive, Collectively Exhaustive) approach. This approach gained popularity in consulting firms, especially McKinsey, to ensure no overlap and no gaps7.

1. Mutually exclusive: Make sure each item covers a unique idea. Avoid overlap, which can confuse the reader or make your argument repetitive.

Example:

  • “How to use LLMs” and “Making it pretty” overlap because LLMs can help refine prose. These should either be merged or excluded to avoid redundancy.

2. Collectively exhaustive: Together, your points should cover all critical aspects of your article’s message. Avoid leaving any part of your topic, CTA, or message unsupported.

For example:

  • If the message is “Writing is like coding,” then arguments about readability and structure are critical. However, a stand-alone discussion of LLMs might be tangential unless it directly supports the main topic.

Tip: As you refine, keep asking yourself: Does each argument stand on its own? Together, do they fully support my message?

LLM Advice

Can be supported by an LLM with the prompt “please analyze whether the given arguments are MECE and provide the most critical review while still remaining objective”

A.3.3. Narrow Down to Three Claims

argA.claim = "Lay the Foundation (Your Original Ideas)"
argB.claim = "Build Rhetoric (Your Logical Structure)"
argC.claim = "Refine for Readability (Why It Sparks Joy)"

Once your clusters are refined, select the three most important arguments. If you are writing a tutorial and have more than three steps, obviously don’t remove steps. Instead, continue clustering related ideas until you have three overarching categories of steps.

Why focus on three main points? Readers' working memory has its limits—typically, people can hold about 3-4 items in mind at once (see also Chapter 3). By limiting your main arguments to three, you make it easier for readers to follow and remember your key points without overwhelming them.

LLM Advice

Needs to be done manually

A.4. Add Grounds and Rebuttal

// AddGroundsAndRebuttalForArgument enhances each argument
// with supporting evidence (grounds) and addresses potential
// counterarguments (rebuttal) to strengthen your overall case.
func AddGroundsAndRebuttalForArgument(arg Argument) (revisedArg Argument)
argA.grounds = [
    "An argument is a claim supported by reasons [Ramage et al., 1997].",
    "Creativity research distinguishes between divergent and convergent thinking (Zhang et al., 2020).",
    "Over 100 creativity techniques exist, but few are backed by robust empirical studies (Leopoldino et al., 2016)."
]

argA.rebuttals = [
    "Why not just skip divergent thinking and start writing? (Counterargument: leads to tunnel vision)",
    "Why exactly three points? That feels artificial (Counterargument: limited working memory and cognitive load studies)"
]

Now that you’ve identified your three main points, back them up with data, anecdotes, or examples. If you’re writing a tutorial, explain why each step is important and how it contributes to the overall goal. Address potential alternatives or common misconceptions to reinforce the validity of your approach.

Types of Evidence to Support Your Arguments 1:

  • Personal Experience: Share relevant experiences that illustrate your point.
  • Observation or Field Research: Include findings from firsthand observations or research.
  • Interviews, Questionnaires, Surveys: Incorporate data gathered from others.
  • Library or Internet Research: Reference credible sources that support your claim.
  • Testimony: Use expert opinions or eyewitness accounts.
  • Statistical Data: Present statistics to provide quantifiable support.
  • Hypothetical Examples: Offer scenarios that help illustrate your argument.
  • Reasoned Sequence of Ideas: Use logical reasoning to connect concepts.

By enriching your arguments with evidence and addressing counterarguments, you make your content more convincing and comprehensive.

LLM Advice

Can be enriched by an LLM by providing background knowledge, e.g. “I remember that this framework has a lot of similarities to classical rhetoric. Can you compare my framework to classical rhetoric and provide background information? Check for hallucinations and link sources!”

A.5. (Optional) Add Warrants, Backing and Rebuttal

argA.warrants = [
	"Defining key inputs is essential in writing, just as identifying variables is crucial in solving mathematical problems."
]

argA.backings = [
    "The Toulmin System formalizes arguments, enhancing clarity and persuasiveness.",
]

In many cases, successful arguments require just these components: a claim, grounds, and a warrant (sometimes implicit). If there’s a chance the audience might question the underlying assumption (warrant), make it explicit and provide backing.

By stating warrants and providing backing, you reinforce the connection between your grounds and your claim, making your argument more robust.

A.6. Conclusion

By defining your inputs, brainstorming various angles, and selecting the top three MECE (Mutually Exclusive, Collectively Exhaustive)-compliant arguments—each backed by evidence—you establish a solid foundation for a compelling article. Applying the Toulmin System ensures that your arguments are clear, logical, and persuasive.

However, even the best ideas won’t resonate if they’re hidden in a tangle of bullet points. Let’s explore how to create a logical flow that keeps your audience engaged. In Argument B, we’ll identify additional inputs and begin integrating them into a cohesive narrative.

B. Build Rhetoric (Your Logical Structure)

// BuildRhetoric constructs rhetorical elements
func BuildRhetoric(
    subject, message, cta string, 
    argA, argB, argC Argument,  
) (
    hook string, 
    hookFinish string,
    ethos string,
    background string,
    signpost string, 
    lightPathos string,
    strongPathos string,
    transitionIntroMain string,
    summary string,
    updatedArgA, 
    updatedArgB, 
    updatedArgC Argument,
)

Building rhetoric to “glue together” your arguments, established in the previous chapter, is typically a repeatable and well-defined process, much like a function in your code.

While there may be exceptions when crafting a rhetorical masterpiece involving intricate metaphors or unique stylistic devices, for most technical articles, following a structured approach ensures clarity and effectiveness.

For centuries, rhetoricians like Aristotle and Cicero have emphasized the importance of structure in persuasive communication. Cicero’s famous works 8 break down any speech into distinct parts, from the initial hook (exordium) to the closing appeal (peroratio):

  1. Exordium: Captures attention and establishes credibility (ethos).
  2. Narratio: Outlines the topic and provides necessary background.
  3. Divisio: Highlights the structure of the argument and prepares the audience for what’s to come.
  4. Confirmatio: Presents the main arguments with supporting evidence (logos).
  5. Refutatio: Preemptively addresses counterarguments or opposing views.
  6. Peroratio: Closes with a memorable emotional appeal (pathos) and reiterates the main point.

Why use rhetoric at all? Simple: A well-structured article ensures that readers can follow your logic without getting lost. Elements like background (narratio) and signposting (divisio) help readers break down complex topics into manageable parts, making the content easier to understand. Summaries and well-crafted transitions between arguments further aid readers in memorizing your key points.

In this chapter, we’ll map our inputs and outputs from Argument A (Subject, Message, and CTA) onto a rhetorical framework inspired by the classical masters. We’ll see how a strong hook grabs attention early, how concise transitions keep readers on track, and how a final “hook finish” underscores your main point.

B.1. Hook (Exordium)

hook = "Writing articles is an art - this is plain wrong. ..."

Purpose: Immediately capture the reader’s attention (exordium). In classical rhetoric, the exordium is crucial - Aristotle argued that you must first secure the audience’s goodwill and attention.

Position: Introduction

Three rules:

  • Keep it short and intriguing.
  • Avoid cliched “clickbait” lines.
  • Optionally include a personal anecdote, but don’t dwell on biography.

LLM Advice

You need to come up with your own hook, and then let the LLM just fix the grammar and spelling for you. Otherwise, the LLM tends to produce clickbait.

B.2. Hook Finish

hookFinish = "See? By applying a clear, repeatable process, we’ve shown that ..."

Purpose: To repeat the hook at the end of the article, giving a sense of closure. In classical rhetoric, returning to your opening statement helps the audience feel that the piece has come full circle.

Position: Outro

LLM Advice

Can be generated by the LLM based on your hook.

B.3. Ethos

ethos = "The methodical process of crafting effective technical articles has been refined over centuries ..."

Purpose: To establish trust or credibility. In classical rhetoric, founded by Aristotle 9, ethos demonstrates why the audience should care about your perspective.

Position: Introduction

LLM Advice

Must be done by hand, but the LLM can help you correct grammar and spelling.

B.4. Background (Narratio)

background = "To prove it actually works, I’m applying it to this piece so you can watch each step unfold in real-time.

I frequently publish technical articles in the IT/OT domain, some of which have been featured on HackerNews as well. Many colleagues and peers have asked how I manage to produce well-researched technical content alongside my responsibilities as a CTO.
"

Purpose: To provide context for why you wrote this article. In classical rhetoric, the narratio sets the stage by explaining the situation or problem.

Position: Introduction

LLM Advice

Must be done by hand, but the LLM can help you correct grammar and spelling.

B.5. Signpost (Divisio)

signpost = "Following the tradition of classical rhetoric, I’ll present three core arguments: ..."

Purpose: Introduce your main points or steps, and let the reader know how the article is organized. If you do not use it, you run the risk of losing the reader during the article (especially if the article is longer). See also chapter 3 for scientific background.

Position: Introduction

Advice: “There are three key arguments…” or “We’ll break this problem down into three steps…”

LLM Advice

Can be fully generated by the LLM

B.6. Strong Pathos

strongPathos = "Don’t let your innovative ideas get lost in subpar articles. ..."

Purpose: Pathos in classical rhetoric persuades by appealing to the audience’s emotions. It’s a final push to motivate action. Should be related to your CTA.

Position: Outro

LLM Advice

Needs manual input, LLM can correct grammar and spelling

B.7. Light Pathos

lightPathos = "Mastering this approach doesn’t just simplify writing—..."

Purpose: To create a subtle emotional pull at the beginning of your article.

Position: Introduction

LLM Advice

Can be derived from strong pathos

B.8. Transition from the intro into main

transitionIntroMain = "Ready to see how it all comes together? ..."

Purpose: A bridging sentence from the introduction to the body. Helps readers understand when the introduction ends and the main arguments begin.

Position: Introduction

LLM Advice

Can be completely generated by LLM

B.9. Updated Arguments with Intro, Summary and Transition to the next argument

argA.qualifier = "This can also happen with “unclear defined” input parameters or default parameters ..."

argA.context = "Similarly, when writing technical articles, defining your key 'inputs'—your subject, ..."

argA.summary = "By defining your inputs, brainstorming various angles, and selecting the top three MECE-compliant arguments—..."

argA.transitionA_B = "However, even the best ideas won't resonate if they're hidden in a tangle of bullet points. ..."

In the Toulmin Model, arguments are broken down into key components. Previously we identified the claim, grounds, rebuttal, warrants, and backing. Now we add the remaining elements (collected in the sketch after this list):

  1. Context: Provide background to prepare the reader. If the argument is long, add small signposts that prepare the reader for the evidence they will then see.
  2. Qualifier (Scope and Limitations): Acknowledge any limitations to your claim. Use qualifiers to indicate the strength of your claim (e.g., “typically,” “often,” “in most cases”).
  3. Summary: Recap the key points made in the argument, reinforcing how the evidence supports the claim.
  4. Transition: Connect to the next argument or concluding section, easing the reader’s cognitive load and preparing them for what’s to come.
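
To make the data model explicit, here is a minimal sketch of the full Argument structure, collecting every field this article assigns to argA, argB, and argC. It is written in TypeScript purely for concreteness (the pseudocode above uses Go-style signatures), and the field names mirror this article's usage rather than any formal Toulmin notation; treat it as a sketch, not a prescribed schema.

// A sketch of the Argument structure used throughout this framework.
// Field names mirror the assignments in this article (claim, grounds, rebuttals, ...).
// The transition field corresponds to e.g. argA.transitionA_B in section B.9.
type Argument = {
  claim: string;         // the main point of the argument
  qualifier?: string;    // scope and limitations ("typically", "in most cases", ...)
  context?: string;      // background that prepares the reader
  grounds: string[];     // evidence: facts, data, examples
  warrants?: string[];   // assumptions connecting the grounds to the claim
  backings?: string[];   // additional support for the warrants
  rebuttals: string[];   // anticipated counterarguments and responses
  summary?: string;      // recap reinforcing how the evidence supports the claim
  transition?: string;   // bridge to the next argument or section
};

With this shape in place, steps A.4, A.5, and B.9 simply fill in more of the optional fields for each of the three arguments.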

Context, summary and transitions aren’t just filler! They help the reader “offload” details from working memory. In Chapter 3, we’ll explore exactly why this is important, by going into the science of cognitive load and memory limits. We’ll show how these introductions, summaries, and transitions keep readers focused on the key points.

LLM Advice

Can be completely generated by LLM

B.10. Conclusion

By applying classical rhetoric, you turn ideas into a compelling narrative. A strong hook captures attention, ethos builds credibility, and background provides context. Signposts, summaries, and transitions ensure logical flow, while strategic pathos engages readers emotionally.

Next, we’ll pull these threads together into a polished, persuasive piece.

C. Refine for Readability (Why It Sparks Joy)

// FinalizeArticle combines all elements
func FinalizeArticle(
    // ... inputs from previous functions ...
) (
    finalizedArticle string,
)

This final step focuses on readability, bridging the gap between your carefully crafted logic and your readers' actual ability to process it.

Readability is always important, though the degree of simplification may vary depending on your audience’s expertise. Even with highly specialized or technical readers, clear and accessible language enhances understanding and engagement.

By now you have all your rhetorical “ingredients”: the big ideas (Arguments A, B, and C) and the rhetorical framework. But even a perfect structure will fail if it’s buried in 5,000-word paragraphs or stuffed with excessive jargon.

Fig. 4: Somewhat relevant xkcd #2864

C.1. Apply Algorithm

// ApplyAlgorithm takes in everything from LayTheFoundation() and BuildRhetoric() 
// (like subject, message, CTA, rhetorical elements) and merges them into a coherent draft.
func ApplyAlgorithm(
    ... // everything from LayTheFoundation() and BuildRhetoric() 
) (
    articleDraft string,
)

Now we have all input parameters, so we can execute our algorithm to get a good article draft. You can find the full algorithm/template at the beginning of this article.
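
For readers who like to see the assembly step spelled out, here is a small sketch of one way the template could be stitched together mechanically. It is written in TypeScript rather than this article's Go-style pseudocode, the element names are taken from the template at the top of the article, and the join-with-blank-lines layout is just one possible choice.

// A sketch of assembling a draft in the template order used by this framework.
// All inputs are assumed to be already-written strings from the previous steps.
function applyAlgorithm(
  intro: {
    hook: string; ethos: string; subject: string; message: string;
    background: string; signpost: string; lightPathos: string; transition: string;
  },
  renderedArguments: string[], // argA, argB, argC, each already containing claim, grounds, rebuttals, summary, transition
  outro: {
    hookFinish: string; summary: string; message: string; strongPathos: string; cta: string;
  }
): string {
  const parts = [
    intro.hook, intro.ethos, intro.subject, intro.message,
    intro.background, intro.signpost, intro.lightPathos, intro.transition,
    ...renderedArguments,
    outro.hookFinish, outro.summary, outro.message, outro.strongPathos, outro.cta,
  ];
  return parts.filter(Boolean).join("\n\n"); // drop empty elements, separate the rest with blank lines
}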

C.2. Increase Readability

// IncreaseReadability applies style and formatting
func IncreaseReadability(
    articleDraft string,
) (
    finalizedArticle string,
)

Readability is the ease with which a reader can digest your text. This depends on understanding that humans have limited working memory and cannot juggle a dozen new concepts at once 10.

Recent reviews on working memory 11 and cognitive load 12 13 highlight that 3-4 items is the upper limit that most people can hold in working memory before overload sets in.

Introducing complex or specialized terms without adequate explanation increases extraneous cognitive load—the unnecessary mental effort imposed by the way information is presented, not by the content itself.

LLM Advice

An LLM can help identify complex terms in your writing and suggest definitions or simpler alternatives to improve clarity and reduce cognitive load.

C.2.1. Cognitive Load and Working Memory

Working memory is the system responsible for temporarily holding and processing information in our minds. It has limited capacity and if too much information is presented at once, it can overwhelm this capacity, leading to confusion and decreased comprehension.

Cognitive load theory explains how different types of load affect our ability to process information 13:

  • Intrinsic Load: The inherent complexity of the material itself (e.g., intricate concepts or detailed procedures).
  • Extraneous Load: The additional burden imposed by the way information is presented (e.g., poor organization, unnecessary jargon).
  • Germane Load: The mental effort required to process, construct, and automate schemas (e.g., applying knowledge to problem-solving).

Technical articles usually already have a high intrinsic load, so we need to make sure that we keep the extraneous load as low as possible and reduce the intrinsic load to make room for the germane load.

Optimize Through Content Organization

Congratulations! Because you’ve already made your arguments MECE in the first chapter of this article, you have reduced the reader’s extra mental effort: they no longer have to decide where each piece of information belongs.

And by using the signpost, they don’t have to keep track of what comes next.

Finally, you also incorporated rehearsal, which promotes germane load (i.e., beneficial active processing) and prevents accidental overload by giving the reader a mental break. When dealing with complex topics, you need rehearsal to make the knowledge stick. Rehearsal can be as simple as a brief summary at the end of one argument or a bridging transition at the beginning of the next. These elements help the reader “empty” their working memory of the old information before loading the next chunk. The summaries and transitions between arguments do just that.

You can further reinforce your structure with clear headings, but we’ll see more about formatting in the next chapter.

Eliminate Unnecessary Complex Words and Explain Terms Step-by-Step

Even a perfectly chunked, three-argument structure can fail if the text is littered with obscure jargon that forces the reader to go back and forth (extraneous load). A good rule is to introduce technical terms only when absolutely necessary, and to define them succinctly on the spot.

Don’t get me wrong: it’s perfectly fine to use jargon! Just make sure it is necessary, and be really sure that your audience knows it; not all “standard” jargon is universal across sub-domains.

For example: If you use the formal term “conditional statement” in a Go tutorial, clarify with a quick note that it refers to “if-statements”—a term that’s often misunderstood by juniors (I’ve lost count of how many times I’ve heard “if-loops”).

This principle applies even if you are writing for a technical audience; not everyone has the same background. Format definitions and clarifications so that advanced readers can quickly skip them.

Example: MQTT explanation

Bad:

MQTT is an OASIS standard messaging protocol for the Internet of Things (IoT). It is designed as an extremely lightweight publish/subscribe messaging transport that is ideal for connecting remote devices with a small code footprint and minimal network bandwidth.14

Potential confusion: “What is OASIS?” “What does publish/subscribe mean?” “Why should I use it?”

Better:

MQTT (Message Queue Telemetry Transport) is a protocol designed for communication between devices. It uses a publish/subscribe architecture, where devices (publishers) send messages to a central message broker. Other devices (subscribers) that are interested can then receive the messages from the broker.

Compared to other architectures, publish/subscribe allows for near real-time data exchange and decoupling between devices, which makes it easy to add or remove devices without disrupting the network.

Among publish/subscribe protocols, MQTT is simple and lightweight. This allows the millions of low-power, memory-constrained devices that are common in Internet of Things (IoT) applications to communicate with each other.

Here, the definitions arrive exactly when needed, not hidden in footnotes or introduced 20 paragraphs later. This reduces extraneous load.

C.2.2. Formatting

Implementing good formatting goes beyond aesthetics; it reduces extraneous cognitive load by helping readers see where they are, what’s important, and what’s next. Below are best practices drawn from common technical writing standards and research-tested web guidelines 15.

1. Sentence & Paragraph Length

Aim for a range of sentence lengths. Short, punchy lines maintain momentum, while a few medium or longer sentences provide depth. Too many uniform or run-on sentences can make text either choppy or difficult to follow.

Keep paragraphs concise. Readers often skim or read on mobile devices, so large blocks of text can be overwhelming. Try to keep paragraphs to a few sentences at a time; this is often called a “scannable” style.

Use active voice instead of passive voice.

Eliminate meaningless words and phrases. Some come from generative AI or are due to “fluff”. Phrases like “in order to” or “basically” can often be cut because they add no real meaning.

2. Structural Cues and Headings

In earlier chapters, you identified your main arguments (MECE and no more than three). Each argument deserves a clear heading (H2) and subheadings (H3) for supporting facts.

3. Use visual elements such as graphics, pictures, and images to break up large blocks of text.

But avoid images that are “busy,” cluttered, and contain too many extraneous details, and don’t place text around or on top of them to the point of distraction. Label images appropriately so that the reader is not left guessing what a diagram or code snippet represents.

4. Use bullets, numbers, quotes, code blocks, and white space to break up large blocks of text.

Break up large blocks of text by using:

  1. Bullets or
  2. Numbers to structure your content clearly and concisely.

You can use such quotes to highlight quotes from external sources or important statements and further break up the text.

fmt.Println("Since we're (usually) programmers, include code blocks as examples to make concepts concrete.")

Use enough white space between web page elements.

Guideline: On a smartphone, include a visual or structural element (e.g., bullet point, image, subheading) about every screen length of text. This ensures readability without overwhelming the reader.

5. Avoid using italic or underlined text; use bold instead.

Avoid using italics in the body of the text. Use bold to emphasize key words and concepts. Avoid underlining large blocks of text as this makes it difficult to read.

C.3. Final Tips for Iteration and Review

Creating an exceptional article often requires multiple iterations. As you refine your work, you might discover that certain arguments need strengthening, some details are missing, or that your prose could be more engaging. Embrace this part of the process—iterative refinement is key to producing high-quality content.

Iterate Using the Framework

Don’t hesitate to loop back through the algorithmic framework you’ve established. Revisiting each step can help you identify areas that need adjustment, ensuring that your article remains coherent and compelling. Whether it’s refining your arguments, enhancing your rhetoric, or improving readability, the framework serves as a reliable guide.

Gain a Fresh Perspective

Sometimes, viewing your article from a different vantage point can reveal insights you might have missed. Here are some techniques to consider:

  • Reverse Reading: Read your article paragraph by paragraph from the end to the beginning. This approach can help you spot inconsistencies, redundancies, or logical gaps that aren’t as apparent when reading in the usual order.

  • Pause and Reflect: Take a short break from your work. Stepping away, even briefly, can provide clarity when you return to your article.

  • Seek Critical Feedback: Use tools like AI language models to obtain an objective review of your work. Ask for critical feedback to identify weaknesses or areas for improvement that you might have overlooked.

LLM Advice

Leverage AI tools to enhance your revision process. Prompt an AI assistant with: “Please provide a critical review of my article, focusing on areas that need improvement while remaining objective.” AI can offer fresh insights, highlight inconsistencies, and suggest enhancements you may not have considered.

Eliminate Redundancies

Be on the lookout for repetitive information, especially between the endings of sections and the beginnings of the next. For instance, if the conclusion of one argument mirrors the introduction of the following argument, consider consolidating them. This not only tightens your writing but also maintains the reader’s engagement by avoiding unnecessary repetition.

Polish Your Language

Refining your language enhances readability and professionalism.

  • Grammar and Style Checks: Utilize tools like DeepL Write or other grammar assistants to catch errors, improve sentence structure, and ensure clarity.

  • Vocabulary Consistency: Aim to use advanced terms consistently throughout your article. Introducing complex terminology only once can confuse readers. If you use specialized terms, ensure they are defined and revisited as necessary to reinforce understanding.

  • Read Aloud: Reading your article aloud can help you catch awkward phrasings, run-on sentences, or abrupt transitions that you might not notice when reading silently.

Final Considerations

Remember that perfection is a process. Each revision brings you closer to a polished and impactful article. By methodically applying your framework, seeking fresh perspectives, and diligently refining your language, you enhance both the quality of your writing and its resonance with your audience.

C.4. Conclusion

Readability sparks joy because it closes the loop on everything you’ve built in Chapter 1 (Content & Logic) and Chapter 2 (Rhetorical Structure). By completing your draft and improving readability, you ensure that even the most technical topics will be accessible, engaging, and memorable to your audience.

By following these steps, your article will not only contain valuable information, but also present it in a way that truly resonates with readers.

Conclusion

See? By applying a clear, repeatable process, we’ve shown that writing technical articles isn’t just ‘art’—it can be learned by anyone willing to follow the steps.

This article embodies the very framework it presents: a clear introduction, three structured arguments, and a concise conclusion with a compelling call to action. By following this repeatable process, you can transform your technical expertise into articles that effectively inform and engage your audience.

Think of writing an article as applying an algorithm: define your inputs, process them through a sequence of logical steps, and arrive at the output. Each step corresponds to a clear function or subroutine.

Don’t let your valuable insights get lost in poorly structured content. Share them in a way that captivates and informs your readers. Apply this algorithmic framework to your next article and experience the difference it makes.

Now it’s your turn to use this process to create technical articles that resonate with your audience! Simply start by copying this article and your latest article or your current article draft into the same LLM prompt and ask it to apply the framework.


  1. Ramage, John D., John C. Bean and June Johnson. “Writing Arguments : A Rhetoric with Readings.” (1997). ↩︎

  2. Toulmin, Stephen E.. “The Uses of Argument, Updated Edition.” (2008). ↩︎

  3. Zhang, Weitao, Zsuzsika Sjoerds and Bernhard Hommel. “Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking.” NeuroImage (2020): 116572 . ↩︎

  4. British Design Council. “The Double Diamond: A universally accepted depiction of the design process.” (2005). Accessible via: https://www.designcouncil.org.uk/our-resources/the-double-diamond/ ↩︎

  5. Leopoldino, Kleidson Daniel Medeiros, Mario Orestes Aguirre González, Paula de Oliveira Ferreira, José Raeudo Pereira and Marcus Eduardo Costa Souto. “Creativity techniques: a systematic literature review.” (2016). ↩︎

  6. Saha, Shishir Kumar, M. Selvi, Gural Buyukcan and Mirza Mohymen. “A systematic review on creativity techniques for requirements engineering.” 2012 International Conference on Informatics, Electronics & Vision (ICIEV) (2012): 34-39. ↩︎

  7. Wikipedia. https://en.wikipedia.org/wiki/MECE_principle ↩︎

  8. Cicero, De Inventione↩︎

  9. Aristotle, Rhetoric↩︎

  10. https://en.wikipedia.org/wiki/Readability ↩︎

  11. Buschman, T. J.. “Balancing Flexibility and Interference in Working Memory.” Annual review of vision science (2021): n. pag. ↩︎

  12. Leppink, Jimmie, Fred Paas, Cees P. M. van der Vleuten, Tamara van Gog and Jeroen J. G. van Merriënboer. “Development of an instrument for measuring different types of cognitive load.” Behavior Research Methods 45 (2013): 1058 - 1072. ↩︎

  13. Klepsch, Melina and Tina Seufert. “Understanding instructional design effects by differentiated measurement of intrinsic, extraneous, and germane cognitive load.” Instructional Science 48 (2020): 45-77. ↩︎

  14. https://www.mqtt.org ↩︎

  15. Miniukovich, Aliaksei, Michele Scaltritti, Simone Sulpizio and Antonella De Angeli. “Guideline-Based Evaluation of Web Readability.” Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019): n. pag. ↩︎

]]>
https://www.theocharis.dev/blog/algorithmic-framework-for-writing-technical-articles/ hacker-news-small-sites-42709897 Wed, 15 Jan 2025 11:53:27 GMT
<![CDATA[State Space Explosion: The Reason We Can Never Test Software to Perfection(2021)]]> thread link) | @thunderbong
January 15, 2025 | https://concerningquality.com/state-explosion/ | archive.org

Have you ever seen a test suite actually prevent 100% of bugs? With all of the time that we spend testing software, how do bugs still get through? Testing seems ostensibly simple – there are only so many branches in the code, only so many buttons in the UI, only so many edge cases to consider. So what is difficult about testing software?

This post is dedicated to Edmund Clarke, who spent a large portion of his life pioneering solutions to the state explosion problem.

Consequently, drawing conclusions about software quality short of testing every possible input to the program is fraught with danger.1

When we think of edge cases, we intuitively think of branches in the code. Take the following trivial example:

if (currentUser) {
  return "User is authenticated";
} else {
  return "User is unauthenticated";
}

This single if statement has only two branches2. If we wanted to test it, we surely need to exercise both and verify that the correct string is returned. I don’t think anyone would have difficulty here, but what if the condition is more complicated?

function canAccess(user) {
  if (user.internal === false || user.featureEnabled === true) {
    return true;
  } else {
    return false;
  }
}

Here, we could have come up with the following test cases:

let user = {
  internal: false,
  featureEnabled: false,
};

canAccess(user); // ==> false

let user = {
  internal: false,
  featureEnabled: true,
};

canAccess(user); // ==> true

This would yield 100% branch coverage, but there’s a subtle bug. The internal flag was supposed to give internal users access to some feature without needing the feature to be explicitly flagged (i.e. featureEnabled: true), but the conditional checks for user.internal === false instead. This would give access to the feature to all external users, whether or not they had the flag enabled. This is why bugs exist even with 100% branch coverage. While it is useful to know if you have missed a branch during testing, knowing that you’ve tested all branches still does not guarantee that that the code works for all possible inputs.

For this reason, there are more comprehensive (and tedious) coverage strategies, such as condition coverage. With condition coverage you must test the case where each subcondition of a conditional evaluates to true and false. To do that here, we’d need to construct the following four user values (true and false for each side of the ||):

let user = {
  internal: false,
  featureEnabled: false,
};

let user = {
  internal: false,
  featureEnabled: true,
};

let user = {
  internal: true,
  featureEnabled: false,
};

let user = {
  internal: true,
  featureEnabled: true,
};

If you’re familiar with Boolean or propositional logic, these are simply the input combinations of a truth table for two boolean variables:

internal featureEnabled
F F
F T
T F
T T
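
As a sketch of what exhaustive input testing could look like here, the loop below sweeps all four rows of the truth table and checks canAccess (as defined above) against a hypothetical intendedAccess oracle that encodes the described intent: internal users get access, and so do users with the feature flag enabled. Because every possible input is compared against the intended behavior, any mismatch between intent and implementation is caught.

// A sketch of exhaustively testing canAccess over its entire (tiny) input space.
// intendedAccess is a hypothetical oracle encoding the intended behavior.
function intendedAccess(user: { internal: boolean; featureEnabled: boolean }): boolean {
  return user.internal === true || user.featureEnabled === true;
}

for (const internal of [false, true]) {
  for (const featureEnabled of [false, true]) {
    const user = { internal, featureEnabled };
    const actual = canAccess(user);
    const expected = intendedAccess(user);
    console.assert(
      actual === expected,
      `canAccess(${JSON.stringify(user)}) returned ${actual}, expected ${expected}`
    );
  }
}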

This is tractable for this silly example code because there are only 2 boolean parameters and we can exhaustively test all of their combinations with only 4 test cases. Obviously bools aren’t the only type of values in programs though, and other types exacerbate the problem because they consist of more possible values. Consider this example:

enum Role {
  Admin,
  Read,
  ReadWrite
}

function canAccess(role: Role) {
  if (role === Role.ReadWrite) {
    return true;
  } else {
    return false;
  }
}

Here, a role of Admin or ReadWrite should allow access to some operation, but the code only checks for a role of ReadWrite. 100% condition and branch coverage are achieved with 2 test cases (Role.ReadWrite and Role.Read), but the function returns the wrong value for Role.Admin. This is a very common bug with enum types – even if exhaustive case matching is enforced, there’s nothing that prevents us from writing an improper mapping in the logic.

The implications of this are very bad, because data combinations grow combinatorially. If we have a User type that looks like this,

type User = {
  role: Role,
  internal: Boolean,
  flagEnabled: Boolean
}

and we know that there are 3 possible Role values and 2 possible Boolean values, there are then 3 * 2 * 2 = 12 possible User values that we can construct. The set of possible states that a data type can be in is referred to as its state space. A state space of size 12 isn’t so bad, but these multiplications get out of hand very quickly for real-world data models. If we have a Resource type that holds the list of Users that have access to it,

type Resource = {
  users: User[]
}

it has 4,096 possible states (2^12 elements in the power set of Users) in its state space. Let’s say we have a function that operates on two Resources:

function compareResources(resource1: Resource, resource2: Resource) { 
  ...
}

The size of the domain of this function is the size of the product of the two Resource state spaces, i.e. 4,096^2 = 16,777,216. That’s around 16 million test cases to exhaustively test the input data. If we are doing integration testing where each test case can take 1 second, this would take ~194 days to execute. If these are unit tests running at 1 per millisecond, that’s still almost 5 hours of linear test time. And that’s not even considering the fact that you physically can’t even write that many tests, so you’d have to generate them somehow.
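To make this arithmetic concrete, here is a tiny sketch (my own illustration, not code from the post) that counts the state spaces compositionally and reproduces the numbers above:

// Counting state spaces compositionally (the numbers match the text above).
const ROLE_VALUES = 3; // Admin, Read, ReadWrite
const BOOL_VALUES = 2; // true, false

const userStates = ROLE_VALUES * BOOL_VALUES * BOOL_VALUES; // 3 * 2 * 2 = 12
const resourceStates = 2 ** userStates;                     // power set of Users = 4,096
const compareResourcesInputs = resourceStates ** 2;         // 4,096^2 = 16,777,216

console.log(compareResourcesInputs / 86_400);    // ~194 days at one test per second
console.log(compareResourcesInputs / 3_600_000); // ~4.7 hours at one test per millisecond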

This is the ultimate dilemma: testing with exhaustive input data is the only way of knowing that a piece of logic is entirely correct, yet the size of the input data’s state space makes that prohibitively expensive in most cases. So be wary of the false security that coverage metrics provide. Bugs can still slip through if the input state space isn’t sufficiently covered.

All hours wound; the last one kills

We’ve only considered pure functions up until now. A stateful, interactive program is more complicated than a pure function. Let’s consider the following stateful React app, which I’ve chosen because it has a bug that I actually ran into in real life3.

type User = {
  name: string
}

const allUsers: User[] = [
  { name: "User 1" },
  { name: "User 2" }
];

const searchResults: User[] = [
  { name: "User 2"}
];

type UserFormProps = {
  users: User[],
  onSearch: (users: User[]) => void
}

function UserForm({ users, onSearch }: UserFormProps) {
  return <div>
    <button onClick={() => onSearch(searchResults)}>
      {"Search for Users"}
    </button>
    {users.map((user => {
      return <p>{user.name}</p>
    }))}
  </div>;
}

function App() {
  let [showingUserForm, setShowingUserForm] = useState(false);
  let [users, setUsers] = useState(allUsers);

  function toggleUserForm() {
    setShowingUserForm(!showingUserForm);
    setUsers(allUsers);
  }

  return (
    <div className="App">
       {<button onClick={() => setShowingUserForm(!showingUserForm)}>
          {"Toggle Form"}
        </button>}
      {showingUserForm && (
        <UserForm users={users} onSearch={setUsers}></UserForm>
      )}
    </div>
  );
}

This app can show and hide a form that allows selecting a set of Users. It starts out by showing all Users but also allows you to search for specific ones. There’s a tiny (but illustrative) bug in this code. Take a minute to try and find it.

.
..

….
…..
……
…….
……..
………
……….

The bug is exposed with the following sequence of interactions:

  1. Show the form
  2. Search for a User
  3. Close the form
  4. Open the form again

At this point, the Users that were previously searched for are still displayed in the results list; this is the buggy state reached after step 4.

The bug isn’t tragic, and there are plenty of simple ways to fix it, but it has a very frustrating implication: we could have toggled the form on and off 15 times, but only after searching and then toggling the form do we see this bug. Let’s understand how that’s possible.

A stateful, interactive application such as this is most naturally modeled by a state machine. Let’s look at the state diagram of this application4:

There are 2 state variables in this application: showingForm represents whether or not the form is showing, and users is the set of Users that the form is displaying for selection. showingForm can be true or false, and users can be all possible subsets of Users in the system, which for the purposes of this example we’ve limited to 2. The state space of this application then has 2 * 2^2 = 8 individual states, since we consider each individual combination of values to be a distinct state.

The edges between the states represent the actions that a user can take. ToggleForm means they click the “Toggle Form” button, and SearchForUsers means they click the “Search for Users” button. We can observe the above bug directly in the state diagram:

Here we see that we can hide the form after the search returns u2, and when we show the form again, u2 is still the only member of users. Note how if we only show and hide the form and never perform a search, we can never get into this state.

The fact that the same user action (ToggleForm) can produce a correct or buggy result depending on the sequence of actions that took place before it means that its behavior is dependent on the path that the user takes through the state machine. This is what is meant by path dependence, and it is a huge pain from a testing perspective. Just because you witnessed something work one time does not mean it will work the next time; we now have to consider sequences of actions when coming up with test cases. If there are n states, there can be up to n^k paths of length k through the state graph. In this extremely simplified application, there are 8 states. Checking all 4-length sequences would require 4,096 test cases, and checking all 8-length sequences would require 16,777,216.
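One way to make this concrete is to restate the component as a pure transition function and mechanically walk every action sequence up to some length, checking an invariant at each step. The following is a rough sketch of that idea (my own illustration, not code from this post): the model mirrors the buggy component above, SearchForUsers is treated as a no-op while the form is hidden, and the invariant check relies on step() reusing the allUsers array so that reference equality is enough.

type ModelState = { showingForm: boolean; users: string[] };
type Action = "ToggleForm" | "SearchForUsers";

const allUsers = ["User 1", "User 2"];
const searchResults = ["User 2"];

// Mirrors the buggy component: toggling only flips the flag and never resets users.
function step(s: ModelState, a: Action): ModelState {
  if (a === "ToggleForm") return { showingForm: !s.showingForm, users: s.users };
  if (a === "SearchForUsers" && s.showingForm) return { ...s, users: searchResults };
  return s;
}

// Invariant: a freshly opened form should show all users.
function holds(prev: ModelState, next: ModelState, a: Action): boolean {
  const justOpened = a === "ToggleForm" && !prev.showingForm && next.showingForm;
  return !justOpened || next.users === allUsers; // reference equality: step() reuses the arrays
}

function explore(s: ModelState, trace: Action[], depth: number): void {
  if (depth === 0) return;
  for (const a of ["ToggleForm", "SearchForUsers"] as const) {
    const next = step(s, a);
    if (!holds(s, next, a)) {
      console.log("Invariant violated after:", [...trace, a]);
      continue;
    }
    explore(next, [...trace, a], depth - 1);
  }
}

explore({ showingForm: false, users: allUsers }, [], 4);
// With depth 4, this reports the sequence: ToggleForm, SearchForUsers, ToggleForm, ToggleForm.

This is essentially a miniature bounded model check, and it illustrates the cost directly: the number of sequences it has to walk grows as n^k.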

Checking all k-length sequences doesn’t even guarantee that we discover all unique paths in the graph: whichever k we test for, the bug could first appear at the (k+1)th step. The introduction of state brings the notion of time into the program. To perform a sequence of actions, you have to be able to perform an action after a previous one. These previous actions leave behind an insidious artifact: state. Programmers intuitively know that state is inherently complex, and this shows where that intuition comes from. Like clockmakers, we know how powerful the effect of time is, and clockmakers have a saying that’s relevant here:

Omnes vulnerant, ultima necat

It means: All hours wound; the last one kills.

It seems that our collective intuition is correct, and we should try and avoid state and time in programs whenever we can. Path dependence adds a huge burden to testing.

Faster, higher, stronger

A state graph consists of one node per state in the state space of the state variables, along with directed edges between them. If there are n states in the state space, then there can be up to n^2 edges in the corresponding state graph5. We looked at the state diagram of this application with 2 Users; here is the state diagram when there are 4 total Users (remember, more Users means more possible subsets, and every unique combination of data is considered a different state):

The number of nodes went from 8 to 32 states, which means there are now 1,024 possible edges. There are constraints on when you can perform certain actions, so this particular graph has fewer edges than that, though we can see that there are still quite a lot. Trust me, you don’t want to see the graph for 10 Users.

This phenomenon is known as state explosion. When we add more state variables, or increase the range of the existing variables, the state space multiplies in size. This adds quadratically more possible edges, and thus more paths, to the state graph of the stateful parts of the program, which increases the probability that there is a specific path that we’re not considering when testing.

The number of individual states and transitions in a modern interactive application is finite and countable, but it’s almost beyond human comprehension at a granular level. Dijkstra called software a “radical novelty” for this reason: how are we expected to verify something of this intimidating magnitude?

Frankly, this shows that testing software is inherently difficult. Critics of testing as a practice are quick to point out that each test case provides no guarantee about any other input. This means that, generally, we’re testing an infinitesimal subset of a potentially huge state space, and any member of the untested part can harbor a bug. This is a situation where the magnitude of the problem is simply not on our side, to the point where it can be disheartening.

Yet, we have thousands of test cases running on CI multiple times a day, every day, for years at a time. An enormous amount of computational resources is spent running test suites all around the world, but the resulting coverage is like Swiss cheese: mostly holes, with the majority of the state space left untouched. That’s not even considering the effect that test code has on our ability to actually modify our applications. If we’re not diligent with how we structure our test code, it can make the codebase feel like a cross between a minefield and a tar pit. The predominant testing strategy today is to create thousands of isolated test cases that each exercise one specific scenario, often referred to as example-based testing. While there are proven benefits to testing via individual examples, and after doing it for many years myself, I’ve opened my mind to other approaches.
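As one concrete taste of those other approaches, generative (property-based) testing states an intent-level property and lets a framework search the input space for counterexamples. Here is a minimal sketch restating the earlier role-check bug; it is my illustration, not anything from this post, and it assumes the fast-check library's fc.assert, fc.property, fc.record, fc.constantFrom and fc.boolean helpers:

import fc from "fast-check";

type Role = "Admin" | "Read" | "ReadWrite";
type User = { role: Role; internal: boolean; flagEnabled: boolean };

// The buggy implementation from earlier, restated with a string-union Role.
function canAccess(role: Role): boolean {
  return role === "ReadWrite";
}

// The property encodes intent (the oracle), independently of the implementation.
const shouldHaveAccess = (u: User) => u.role === "Admin" || u.role === "ReadWrite";

fc.assert(
  fc.property(
    fc.record({
      role: fc.constantFrom<Role>("Admin", "Read", "ReadWrite"),
      internal: fc.boolean(),
      flagEnabled: fc.boolean(),
    }),
    (user) => canAccess(user.role) === shouldHaveAccess(user)
  )
);
// Fails, shrinking to a counterexample with role: "Admin".

With the default run count this samples the twelve possible Users many times over, and the shrunk counterexample points straight at the Admin case.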

The anti-climax here is that I don’t have the silver bullet for this problem, and it doesn’t look like anyone else does either. Among others, we have the formal methods camp, who think we can prove our way to software quality, and the ship-it camp, who think it’s an intractable problem so we should just reactively fix bugs as they get reported. We have techniques such as generative testing, input space partitioning, equivalence partitioning, boundary analysis, etc. I’m honestly not sure which way is “correct”, but I do believe that a) it is a very large problem (again, just consider how much compute time every day is dedicated to running test suites across all companies), and b) conventional wisdom is mostly ineffective for solving it. I put more stock in the formal methods side, but I think some things go way too far, such as dependent typing and interactive theorem proving: it can’t take 6 months to ship an average feature, and developer ergonomics are extremely important. I’ll leave the solution discussion there and tackle it in subsequent posts.

However we approach it, I’m sure that the state space magnitude problem is at the root of what we need to solve to achieve the goal of high software quality.


]]>
https://concerningquality.com/state-explosion/ hacker-news-small-sites-42709370 Wed, 15 Jan 2025 10:25:56 GMT
<![CDATA[Building GameBoy Advance Games in Rust]]> thread link) | @thunderbong
January 15, 2025 | https://shanesnover.com/2024/02/07/intro-to-rust-on-gba.html | archive.org

GBA Screen Recording of Conway's Game of Life

A couple years ago I was interested in implementing Conway’s Game of Life since it’s pretty simple and seemed like a fun little project to sharpen my skills in Rust. I had a pretty major problem though: I needed a way to actually show the output of the simulation. Now, I’m not much of a web developer or desktop app builder so the idea of learning these technologies just to show a screen and add some user interaction was daunting. However, I have a lot of experience in embedded systems and fortuitously, I caught an announcement of a new release of the gba crate!

I put more than a thousand hours into my GameBoy Advance as a kid and have put many more into playing games on GBA emulators since then. It holds a lot of nostalgia for me and the thought of building a game for this hardware in 2022 felt especially novel.

In this post I won’t actually be talking about Conway’s Game of Life in much depth; I’ll just give this brief explanation, and if you want to know more, go check out the Wikipedia page. In Conway’s Game of Life, you have a 2D grid of boolean states. Given the current grid, a set of simple rules decides whether each cell is ‘alive’ or ‘dead’ in the next state. These simple rules give rise to surprising complexity (and are fun to look at).

For the remainder of the post I’ll be talking about the basics of the device hardware, how to get button input, and how to put pixels on the screen. Some more advanced topics I’m not covering include: save files, audio, link cable communication, or sprites.

Device Hardware

I mentioned in the introduction that developing for the GBA was interesting to me due to my background in embedded systems. And this system is about as embedded as they come in the modern day. The GBA has no network connectivity, limited serial connectivity (via the link cable), and just 10 buttons in total. Not only that, but the hardware is extremely limited, with no sign of an operating system or filesystem in sight.

Check out the GBA’s Wikipedia page for as many details as you could want, but some particularly important specs for a would-be game developer are the CPU frequency of 16.78 MHz (about 1/200 the frequency of a modern desktop CPU), the screen size of 240 pixels wide by 160 pixels tall, and memory of 32 KB (expandable with memory on the cartridge). The screen refresh rate is also very important; it comes in just shy of 60 fps.

Finally, I’ll note the processor: a 32-bit ARM7TDMI. Don’t let the name fool you: this processor runs the ARM v4 instruction set, which is positively ancient (not surprising when you consider it’s from 1994, older than the author of this very post!).

Development

At this point you might be wondering how we can even make Rust compile for a processor as old as this. You can find a full list of all platforms Rust supports in the rustc documentation. These are split into tiers where tier 1 targets work with very little setup and as you increase in number support varies (as do resources spent on making sure the tooling works). If you continue scrolling through the incredibly long list of supported target triples, you’ll eventually come to armv4t-none-eabi, but we’ll actually go further to thumbv4t-none-eabi at the direction of the gba crate’s documentation. This will also require a nightly toolchain.

I won’t specify the exact structure of your Cargo workspace and what dependencies you need to install here as it may change in the future. Instead take a look at the documentation for the gba crate which should get you sorted.

For actually testing your game, you’ll naturally not be able to just run what comes out of rustc. So unless you happen to have a means of flashing the resulting ROM file onto a cartridge and also happen to be in possession of the original hardware, I recommend downloading an emulator. If you’re concerned about the legality of emulators, just know that as long as you’re running your own games, or other freely available ROMs made and shared by other developers, you are completely in the clear. On Linux I like to use mgba-qt as it has nice tools for attaching a gdb client or taking screen recordings.

Application Structure

As above, this section isn’t going to show any runnable code; it will mainly just talk about concepts and how writing programs for the GBA differs from writing a desktop program. See the examples provided with the gba repository for specifics.

Unlike a typical program built for taking user input and showing something on a screen, GBA games are very resource limited, and for the purposes of structuring an application we’re RAM constrained. As such, we’re not going to build a declarative, tiered behemoth of structs describing UI elements and callbacks; we have to be simpler than that or we’ll overflow the stack in a hurry. That also means you’re not likely to find a convenient framework for composing a UI; it’s just too expensive from a RAM and code-space overhead perspective.

Instead, we’re going to start at the very top level with a main function that looks something like this:


fn main() -> ! {
	init_some_hardware();
	init_some_software_state();

	loop {
		check_for_user_inputs();
		process_user_inputs_and_software_state();
		render_to_the_screen();

		VBlankIntrWait();
	}
}

Most of that is slideware and/or pseudocode, but it’s still worth mentioning a couple of things. First of all, this device has absolutely nothing useful to do if the program exits; as a result, we return the special ! type, which indicates to Rust that this function should never exit. There is no X button in the top corner to click when you’re done playing a game; you just turn it off (with or without saving first).

Secondly, we use the function VBlankIntrWait to lock our main loop frequency to the refresh rate of the screen (~60 Hz). You don’t necessarily have to do this, but you could conceivably render “too fast”, such that a frame is overwritten before it even has a chance to be displayed on the screen. Note that this doesn’t guarantee that the main loop actually runs at 60 Hz; it’s totally possible for the code in the loop to run too slowly and miss the interrupt.

With the high level out of the way, let’s get into some of those pseudocode functions.

Detecting Button Presses

Since we’re making a game (even one that runs itself, like a cellular automaton), one of the first things to think about is user input. Sometimes you want something to happen when a button is pressed, or maybe only when it’s held or released. You might need to build up a state machine to detect that the timing of the Konami code was just right.

No matter what your application is, the basic interface to the button states is the same from a software perspective: the button is either pressed when you read its state or it’s not. It’s a boolean.

However, that’s not the whole story! If you choose to poll the button states, as I’ve done in my game, there are some complications. If your game is healthily running at 60 fps, each run through your main loop will take around 16 milliseconds. If you want a button press to trigger an action, you probably don’t want it to trigger every single time you read the state as pressed, yet a player could easily hold the button for 40 milliseconds before releasing it, leading to that button reporting a true state for 2-3 consecutive loop iterations.

At an even smaller time scale, buttons are hardware, and that means that behind that boolean state is a physical conductor being moved into place to conduct a signal. This is not a clean state transition in the real world. As this Hackaday blogpost shows, a single button press can actually register as multiple presses and releases. If you’re working with an emulator, you probably won’t see this unless the author of that emulator was aiming for extreme realism, but it’s good to know about.

In order to potentially solve both of the above problems, I used a technique known as debouncing and then added statefulness around it. See the source here. On each check of the keys, I check the register associated with the key states and track the amount of time since the last state change. By restricting the frequency with which the state can change, I prevent noise from hardware bounces. Additionally, by tracking previous state, I can return more information than Pressed or Released for button state. I do this with the KeyState::change function which can describe if the button has experienced a rising edge (a change from released to pressed) or a falling edge (a change from pressed to released). This way, I can detect only the change in the press of the A button so that I don’t trigger the same action repeatedly.
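The actual implementation lives in the linked source, but the shape of the idea is roughly the following sketch (my own simplification, not the real code; the raw key read and the tick counter stand in for the gba crate's key register and a frame counter, and the threshold is a made-up number):

#[derive(Clone, Copy, PartialEq)]
enum Edge {
    None,
    Rising,  // released -> pressed
    Falling, // pressed -> released
}

struct KeyState {
    pressed: bool,
    ticks_since_change: u32,
}

impl KeyState {
    // Ignore state changes that happen faster than ~3 frames.
    const DEBOUNCE_TICKS: u32 = 3;

    fn update(&mut self, raw_pressed: bool) -> Edge {
        self.ticks_since_change = self.ticks_since_change.saturating_add(1);
        if raw_pressed != self.pressed && self.ticks_since_change >= Self::DEBOUNCE_TICKS {
            self.pressed = raw_pressed;
            self.ticks_since_change = 0;
            return if raw_pressed { Edge::Rising } else { Edge::Falling };
        }
        Edge::None
    }
}

// In the main loop, an action fires only on the rising edge:
// if a_button.update(raw_a_pressed) == Edge::Rising { toggle_cell_under_cursor(); }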

If you scan through more of that source you’ll notice that I’m not debouncing all of the keys. Specifically, I’m not doing so for the directional pad because I want to repeat an action (moving a cursor around the screen) so long as the button is held. In your game, you may have some buttons that should be debounced and others that aren’t or maybe even buttons that are conditionally debounced.

One final note is that if you are polling, the responsiveness of your debounced buttons depends on your main loop frequency. If an iteration of your main loop runs slow sometimes and fast other times, you’ll notice it in how quickly the game responds to inputs (and sometimes it might miss an input and not respond at all). A more robust way of checking inputs is to use hardware interrupts, which I won’t discuss here but which are explained at length in the Tonc documentation.

About Time and Timers

This isn’t worth having its own section, but since I mentioned them: there’s no system time or way to sync with an external source of time on a GBA. So how can we measure how long something took, or how much time has passed since something happened?

Embedded hardware commonly includes simple peripherals called timers, and in general they operate like this: there’s a memory-mapped register somewhere that starts at 0 and periodically counts up by one. How quickly it counts is usually configurable by dividing the input frequency. The timer I’m using on the GBA runs at the main CPU frequency of around 16 MHz by default, but it can be prescaled so that it counts more slowly than that. This gives a unit of time measured not in seconds, but in cycles.
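Converting a tick count back into wall-clock time is then just arithmetic on the clock frequency and the prescaler. A quick sketch (my own, assuming a prescaler of 1024 and the GBA's nominal 2^24 Hz system clock):

const CPU_HZ: u64 = 16_777_216; // the GBA system clock, roughly 16.78 MHz
const PRESCALER: u64 = 1024;    // assumed timer prescaler for this example

// Each tick lasts PRESCALER / CPU_HZ seconds; widen to u64 to avoid overflow.
fn ticks_to_ms(ticks: u32) -> u32 {
    ((ticks as u64 * PRESCALER * 1000) / CPU_HZ) as u32
}

// ticks_to_ms(16_384) == 1000, i.e. one second of elapsed time.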

Putting Pixels on the Screen

With inputs out of the way, naturally we can jump to outputting to the screen (which simultaneously tests our input code).

The GBA has a few different ways of controlling the screen. The simplest and most intuitive (but not necessarily the most performant) is video mode 3. In this mode you’re simply writing to a region of memory where each pixel gets 2 bytes for color, and every time the screen draws, it copies whatever is in that memory region out to the display.

“2 bytes?” you might be wondering. Modern devices pretty much universally have at least 24-bit color resolution where each channel of red, green, and blue get 8 bits. Not so on the GBA! The GBA uses an encoding known as RGB555 where each channel gets 5 bits. This saves a byte for each pixel which can make rendering 50% faster in this mode, which is good for us, but less good if you want to render a color rich display. In my case, I’m actually only using 3 different colors, so this is plenty.

The gba library defines a constant VIDEO3_VRAM which has methods for indexing into the memory region by the pixel’s row and column number and then writing pixel colors like this:

VIDEO3_VRAM.index(row, col).write(Color::GREEN);

You’ll find it convenient to build up abstractions in layers over this. For example, individual pixels are a bit small to act as cells or the cursor (at least for my old eyes), so I made each cell a 2x2 block of pixels.
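For instance, a cell-drawing helper might look something like this (a sketch rather than the game's actual code; it reuses the VIDEO3_VRAM.index call shown above, so treat that exact signature and the Color type as assumptions about the gba crate):

const CELL_SIZE: usize = 2; // each game-of-life cell is a 2x2 block of pixels

fn draw_cell(cell_row: usize, cell_col: usize, color: Color) {
    let base_row = cell_row * CELL_SIZE;
    let base_col = cell_col * CELL_SIZE;
    for dy in 0..CELL_SIZE {
        for dx in 0..CELL_SIZE {
            VIDEO3_VRAM.index(base_row + dy, base_col + dx).write(color);
        }
    }
}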

Now, writing to every pixel in the buffer means writing to quite a lot of memory, especially if the program is going to do it at 60 fps on a processor running at only 16 MHz! The total number of pixels is 38,400, and if you’re like me and trying to overwrite every single one every frame, you might notice… some strange stuff.

In my case, I was rendering every cell as alive or dead. Then I was rendering the cursor if the game was in Edit mode. Now, this initially seemed fine, but a curious thing would happen if I moved the cursor towards the top of the screen: it would disappear above a certain line! And not always on the same exact line of pixels! It turns out, I was overrunning my 16 ms timer for rendering each frame.

We’ll talk about ways to mitigate this and do less work on each frame in the next section.

For a moment, I’ll just mention that there are other video modes which are slightly more complicated to use, but they can be significantly faster by moving some of the rendering to the video processor instead of the CPU. Check out the resources mentioned in the conclusion for more information.

We Need to Go Faster

All performance techniques are going to be highly specific to the game being made, but some are reusable in theory, so I think they are worth mentioning. I implemented two of these; the third I considered, but it turned out not to be necessary.

The first and most obvious is that just because the screen can update at 60 fps doesn’t mean your game actually has to run at that rate. In the case of cellular automata, the speed doesn’t matter that much, and you don’t have to update the automata state at 60 Hz; it can be much slower. One way to do this would be with a timer, or by counting the number of VBLANK interrupts. I just added a rolling counter so that I step the automata state every N frames.

The second is an attempt to solve the problem mentioned above on how long it takes to overwrite the entire screen buffer. Much of the screen is actually not changing on every frame, but only small subsets of it are. If you can define a sensible way to mark subsets as needing to be redrawn, you can save a significant amount of time. In my case, those subsets were each row. I made an array of bits where each bit represented whether the row needed to be updated and marked rows as “dirty” during the automata step function. This saves a lot of time copying zeroes into regions of memory that are already zero.

The third is an extension of the above, but is more specific to the algorithm I’m displaying. It didn’t end up being necessary and was complicated, so I avoided it, but it consists of determining bounding boxes around the subregions that contain active cells. Since areas with no active cells cannot spontaneously become active, they can be ignored entirely, while small active subregions can keep evolving (and move slowly). In cases where most of the screen is filled with dead cells, this should make rendering even faster and would also reduce the time needed to compute the automata steps.
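To illustrate the second technique, the dirty-row bookkeeping can be as small as one bit per screen row, set during the automata step and consumed during rendering. A sketch (my own, not the game's actual code):

const SCREEN_ROWS: usize = 160; // 160 is a multiple of 32, so 5 words suffice

struct DirtyRows {
    bits: [u32; SCREEN_ROWS / 32],
}

impl DirtyRows {
    fn mark(&mut self, row: usize) {
        self.bits[row / 32] |= 1u32 << (row % 32);
    }

    fn take(&mut self, row: usize) -> bool {
        let dirty = (self.bits[row / 32] & (1u32 << (row % 32))) != 0;
        self.bits[row / 32] &= !(1u32 << (row % 32));
        dirty
    }
}

// The step function calls mark() for any row it touches; rendering then does:
// for row in 0..SCREEN_ROWS {
//     if dirty.take(row) { redraw_row(row); }
// }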

Sources of Randomness

Since the game starts with a blank screen (where all cells are dead), it can be kind of daunting to use the cursor to draw out the initial state you want to start with. Unless you’re trying to build and observe some specific pattern in Conway’s Game of Life, it’s cumbersome. One way to overcome this is to seed the grid so that random cells start out alive. Then you can just run it and watch it play out.

Unsurprisingly, there is no /dev/random file available here and without a standard library implementation there’s no way to simply generate random numbers. Luckily, the rand::RngCore trait is no_std compatible and there are some existing implementations for freestanding environments like the one we’re developing for.

A common family of algorithms for systems like this takes an initial seed and derives random-looking data from it through a series of XORs and shifts. An implementation is available in the crate rand_xoshiro. Once seeded, it can continuously supply random numbers that we can use in our program. But where do we get our random seed?

In some embedded systems, an external source such as the noise sampled on an analog pin is used as the seed. The GBA doesn’t have any analog pins though, so we need to rely on another external input: the user’s interactions.

In this case, I actually reseed every time the user presses the Select button, which triggers the randomization. The seed is drawn from the current value of the system timer. This means that the seed will be different every time (as opposed to seeding once at the beginning of the program). This is a popular approach: in the GBA Pokémon games, the random seed is drawn from the timer a single time, at the moment the user advances past the Start screen.
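Put together, the reseed-on-Select flow looks roughly like this (a sketch: read_timer and the flat grid slice are placeholders for the game's actual timer access and board representation, while the rand_core/rand_xoshiro calls are the real trait APIs):

use rand_core::{RngCore, SeedableRng};
use rand_xoshiro::Xoshiro256PlusPlus;

// Called when the Select button's rising edge is detected.
fn randomize_grid(grid: &mut [bool], read_timer: impl Fn() -> u16) {
    // Reseed from whatever the hardware timer happens to read right now.
    let mut rng = Xoshiro256PlusPlus::seed_from_u64(read_timer() as u64);
    for cell in grid.iter_mut() {
        *cell = (rng.next_u32() & 1) == 1;
    }
}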

This approach is sufficient to give data that appears random, which is suitable for games. It is NOT suitable as a sole source of randomness for applications which need to be cryptographically secure. Going back to the example of Pokemon games: there’s a whole class of manipulations based around seeding randomness because the creation of the seed is within control of the user!

Conclusion and Getting Truly Advanced

With the above you should have enough information to get started writing small games for the GBA in Rust and experimenting with writing code for truly constrained devices! However, this is obviously far from the necessary knowledge to rewrite your favorite GBA titles from scratch.

If you want to learn more about the GameBoy Advance hardware and the available software interfaces, you’ll find that a lot of that information exists and is documented in C. However, nothing should prevent translating that same code to an equivalent in Rust. The most helpful resource here was linked as appropriate above, but I’ll call it out again in the conclusion here: the Tonc guide by Jasper Vijn. It’s pretty comprehensive with lots of examples and is invaluable even if you never write a single line of C.

As far as Rust resources, there’s the gba crate which I used here. It’s great for getting started as the abstractions are very thin and the resulting code will look much more like the C equivalent. There’s also the agb crate which provides higher level abstractions than gba along with some additional tooling.

Finally, here’s the repo for my game of life implementation which you should feel free to use as an example or even a base for your own game if you like.

Good luck and have fun making games!

If you notice a dead link or have some other correction to make, feel free to make an issue on GitHub!

]]>
https://shanesnover.com/2024/02/07/intro-to-rust-on-gba.html hacker-news-small-sites-42709330 Wed, 15 Jan 2025 10:20:00 GMT
<![CDATA[Generate audiobooks from E-books with Kokoro-82M]]> thread link) | @csantini
January 15, 2025 | https://claudio.uk/posts/epub-to-audiobook.html | archive.org

Posted on 14 Jan 2025 by Claudio Santini

Kokoro v0.19 is a recently published text-to-speech model with just 82M params and very high-quality output. It's released under the Apache licence and was trained on <100 hours of audio. It currently supports American English, British English, French, Korean, Japanese and Mandarin, in a bunch of very good voices.

An example of the quality is embedded as an audio sample in the original post.

I've always dreamed of converting my ebook library into audiobooks. Especially for those niche books that you cannot find in audiobook format. Since Kokoro is pretty fast, I thought this may finally be doable. I've created a small tool called Audiblez (in honor of the popular audiobook platform) that parses .epub files and converts the body of the book into nicely narrated audio files.

On my M2 MacBook Pro, it takes about 2 hours to convert The Selfish Gene by Richard Dawkins to mp3: that's about 100,000 words (or 600,000 characters), at a rate of about 80 characters per second.

How to install and run

If you have Python 3 on your computer, you can install it with pip. Be aware that it won't work with Python 3.13.

Then you also need to download a couple of additional files into the same folder, which together are about 360MB:

pip install audiblez
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json

Then, to convert an epub file into an audiobook, just run:

audiblez book.epub -l en-gb -v af_sky

It will first create a bunch of book_chapter_1.wav, book_chapter_2.wav, etc. files in the same directory, and at the end it will produce a book.m4b file with the whole book you can listen with VLC or any audiobook player. It will only produce the .m4b file if you have ffmpeg installed on your machine.

Supported Languages

Use -l option to specify the language, available language codes are: 🇺🇸 en-us, 🇬🇧 en-gb, 🇫🇷 fr-fr, 🇯🇵 ja, 🇰🇷 kr and 🇨🇳 cmn.

Supported Voices

Use -v option to specify the voice: available voices are af, af_bella, af_nicole, af_sarah, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis. You can try them here: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Chapter Detection

Chapter detection is a bit janky, but it manages to find the core chapters in most .epub files I tried, skipping the cover, index, appendix, etc.
If you find it doesn't include a chapter you are interested in, try playing with the is_chapter function in the code. It often skips the preface or intro, and I'm not sure if that's a bug or a feature.

Source

See Audiblez project on GitHub.

There are still some rough edges, but it works well enough for me. Future improvements could include:

  • Better chapter detection, or allow users to include/exclude chapters.
  • Add chapter navigation to m4b file (that looks hard, cause ffmpeg doesn't do it)
  • Add narration for images using some image-to-text model

Code is short enough to be included here:

#!/usr/bin/env python3
# audiblez - A program to convert e-books into audiobooks using
# Kokoro-82M model for high-quality text-to-speech synthesis.
# by Claudio Santini 2025 - https://claudio.uk

import argparse
import sys
import time
import shutil
import subprocess
import soundfile as sf
import ebooklib
import warnings
import re
from pathlib import Path
from string import Formatter
from bs4 import BeautifulSoup
from kokoro_onnx import Kokoro
from ebooklib import epub
from pydub import AudioSegment


def main(kokoro, file_path, lang, voice):
    filename = Path(file_path).name
    with warnings.catch_warnings():
        book = epub.read_epub(file_path)
    title = book.get_metadata('DC', 'title')[0][0]
    creator = book.get_metadata('DC', 'creator')[0][0]
    intro = f'{title} by {creator}'
    print(intro)
    chapters = find_chapters(book)
    print('Found chapters:', [c.get_name() for c in chapters])
    texts = extract_texts(chapters)
    has_ffmpeg = shutil.which('ffmpeg') is not None
    if not has_ffmpeg:
        print('\033[91m' + 'ffmpeg not found. Please install ffmpeg to create mp3 and m4b audiobook files.' + '\033[0m')
    total_chars = sum([len(t) for t in texts])
    print('Started at:', time.strftime('%H:%M:%S'))
    print(f'Total characters: {total_chars:,}')
    print('Total words:', len(' '.join(texts).split(' ')))

    i = 1
    chapter_mp3_files = []
    for text in texts:
        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
        chapter_mp3_files.append(chapter_filename)
        if Path(chapter_filename).exists():
            print(f'File for chapter {i} already exists. Skipping')
            i += 1
            continue
        print(f'Reading chapter {i} ({len(text):,} characters)...')
        if i == 1:
            text = intro + '.\n\n' + text
        start_time = time.time()
        samples, sample_rate = kokoro.create(text, voice=voice, speed=1.0, lang=lang)
        sf.write(f'{chapter_filename}', samples, sample_rate)
        end_time = time.time()
        delta_seconds = end_time - start_time
        chars_per_sec = len(text) / delta_seconds
        remaining_chars = sum([len(t) for t in texts[i - 1:]])
        remaining_time = remaining_chars / chars_per_sec
        print(f'Estimated time remaining: {strfdelta(remaining_time)}')
        print('Chapter written to', chapter_filename)
        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
        progress = int((total_chars - remaining_chars) / total_chars * 100)
        print('Progress:', f'{progress}%')
        i += 1
    if has_ffmpeg:
        create_m4b(chapter_mp3_files, filename)


def extract_texts(chapters):
    texts = []
    for chapter in chapters:
        xml = chapter.get_body_content()
        soup = BeautifulSoup(xml, features='lxml')
        chapter_text = ''
        html_content_tags = ['title', 'p', 'h1', 'h2', 'h3', 'h4']
        for child in soup.find_all(html_content_tags):
            inner_text = child.text.strip() if child.text else ""
            if inner_text:
                chapter_text += inner_text + '\n'
        texts.append(chapter_text)
    return texts


def is_chapter(c):
    name = c.get_name().lower()
    part = r"part\d{1,3}"
    if re.search(part, name):
        return True
    ch = r"ch\d{1,3}"
    if re.search(ch, name):
        return True
    if 'chapter' in name:
        return True


def find_chapters(book, verbose=True):
    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
    if verbose:
        for item in book.get_items():
            if item.get_type() == ebooklib.ITEM_DOCUMENT:
                # print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
                print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
    if len(chapters) == 0:
        print('Not easy to find the chapters, defaulting to all available documents.')
        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
    return chapters


def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
    remainder = int(tdelta)
    f = Formatter()
    desired_fields = [field_tuple[1] for field_tuple in f.parse(fmt)]
    possible_fields = ('W', 'D', 'H', 'M', 'S')
    constants = {'W': 604800, 'D': 86400, 'H': 3600, 'M': 60, 'S': 1}
    values = {}
    for field in possible_fields:
        if field in desired_fields and field in constants:
            values[field], remainder = divmod(remainder, constants[field])
    return f.format(fmt, **values)


def create_m4b(chapter_files, filename):
    tmp_filename = filename.replace('.epub', '.tmp.m4a')
    if not Path(tmp_filename).exists():
        combined_audio = AudioSegment.empty()
        for wav_file in chapter_files:
            audio = AudioSegment.from_wav(wav_file)
            combined_audio += audio
        print('Converting to Mp4...')
        combined_audio.export(tmp_filename, format="mp4", codec="aac", bitrate="64k")
    final_filename = filename.replace('.epub', '.m4b')
    print('Creating M4B file...')
    proc = subprocess.run(['ffmpeg', '-i', f'{tmp_filename}', '-c', 'copy', '-f', 'mp4', f'{final_filename}'])
    Path(tmp_filename).unlink()
    if proc.returncode == 0:
        print(f'{final_filename} created. Enjoy your audiobook.')
        print('Feel free to delete the intermediary .wav chapter files, the .m4b is all you need.')


def cli_main():
    if not Path('kokoro-v0_19.onnx').exists() or not Path('voices.json').exists():
        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
        sys.exit(1)
    kokoro = Kokoro('kokoro-v0_19.onnx', 'voices.json')
    voices = list(kokoro.get_voices())
    voices_str = ', '.join(voices)
    epilog = 'example:\n' + \
             '  audiblez book.epub -l en-us -v af_sky'
    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
    parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('epub_file_path', help='Path to the epub file')
    parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
    parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    main(kokoro, args.epub_file_path, args.lang, args.voice)


if __name__ == '__main__':
    cli_main()

]]>
https://claudio.uk/posts/epub-to-audiobook.html hacker-news-small-sites-42708773 Wed, 15 Jan 2025 08:47:38 GMT
<![CDATA[Laser Fault Injection on a Budget: RP2350 Edition]]> thread link) | @notmine1337
January 15, 2025 | https://courk.cc/rp2350-challenge-laser | archive.org

In August 2024, Raspberry Pi introduced the RP2350 microcontroller. This part iterates on the RP2040 and comes with numerous new features. These include security-related capabilities, such as a Secure Boot implementation.

A couple of days after this announcement, during DEFCON 2024, an interesting challenge targeted at these new features was launched: the RP2350 Hacking Challenge.

After some work and the development of a fully custom “Laser Fault Injection Platform”, I managed to beat this challenge and submitted my findings to Raspberry Pi.

This article will provide technical details about this custom platform, including manufacturing files for those interested in building their own. Additionally, I will explain how injecting a single laser-induced fault can bypass the Secure Boot feature of the RP2350.

Warning

The “Laser Fault Injection Platform” introduced in this article makes use of an infrared high-power laser source. Such a component can be hazardous; be careful if you attempt to reproduce this work.


Objectives of the Challenge

The target of this challenge is a Pico 2 board, which hosts a RP2350 microcontroller. It must be configured according to the instructions provided in the challenge’s GitHub repository.

Following the instructions from the repository will:

  • Write “secret” data into one of the OTP areas of the RP2350.
  • Configure and enable the Secure Boot feature, locking the chip as much as possible. This includes disabling debug interfaces (SWD) and enabling various security features, such as glitch detectors.
  • Flash a signed firmware that primarily restricts even more access to the OTP area where the secret is stored.

The goal of the challenge is to find a way to recover the secret data, potentially bypassing the Secure Boot feature in the process.

Security Features

The Secure Boot feature is enforced by code contained in the Boot ROM of the RP2350. This code is hardened in several ways, going beyond software methods; the Boot ROM leverages hardware-specific features.

These features are primarily related to an innovative component called the “Redundancy Co-Processor.” Quoting the RP2350 Datasheet:

The redundancy coprocessor (RCP) is used in the RP2350 bootrom to provide hardware-assisted mitigation against fault injection and return-oriented programming attacks. This includes the following instructions:

  • generate and validate stack canary values based on a per-boot random seed
  • assert that certain points in the program are executed in the correct order without missing steps
  • validate booleans stored as one of two valid bit patterns in a 32-bit word
  • validate 32-bit integers stored redundantly in two words with an XOR parity mask
  • halt the processor upon reaching a software-detected panic condition

[…]

Each core’s RCP has a 64-bit seed value (Section 3.6.3.1). The RCP uses this value to generate stack canary values and to add short pseudorandom delays to RCP instructions.

Additionally, the RP2350 is capable of detecting fault injection attempts using glitch detectors. These configurable circuits respond to voltage or electromagnetic fault-injection attempts and reset the system. For this challenge, these detectors are set to their maximum sensitivity level.


Below is a brief, high-level summary of the article’s content.

Why choosing Laser Fault Injection

Considering that the Boot ROM of the RP2350 had been audited before the challenge opened, I did not attempt to find logic bugs in it and quickly turned to a hardware attack, such as fault injection.

However, online comments tend to show that the glitch detector system implemented in the RP2350 is rather effective at mitigating simple voltage fault injection attacks.

Hence, I quickly decided to tackle the challenge with laser fault injection, assuming that focusing a laser beam away from the glitch detector circuits could allow for injecting faults without triggering them.

Now, the thing is, while I did have some experience with “classic” voltage fault injection attacks, I knew nothing about laser fault injections.

This challenge was then used as an opportunity to build a fully custom, cheap laser fault injection platform.

Note that while traditionally known to be expensive to implement, several “cheap” platforms aimed at injecting faults with a laser have emerged over the past few months:

Note

My mistake, after some research, I actually managed to find this presentation of the “RayV-Lite” introduced at Blackhat 2024. Their project presents various similarities with my “Laser Fault Injection Platform”, but also interesting differences. I’ll highlight some of these differences in this article and encourage you to check out their work.

Sample Preparation

The idea behind LFI is reasonably simple. A focused laser beam is directed at the silicon die of the target component. Transistors are sensitive to photoelectric effects, meaning the emitted light can be enough to induce temporary faults or malfunctions. When precisely timed, these faults can be leveraged to bypass security features.

A major downside of such an attack compared to voltage or clock glitching is th