<![CDATA[Hacker News - Small Sites - Score >= 5]]> https://news.ycombinator.com RSS for Node Thu, 16 Jan 2025 16:23:55 GMT Thu, 16 Jan 2025 16:23:55 GMT 240 <![CDATA[Astronomers discover neutron star with an incredibly slow six-hour spin]]> thread link) | @Brajeshwar
January 16, 2025 | https://www.abc.net.au/news/science/2025-01-16/neutron-star-radio-transient-6-hours/104799106 | archive.org

In our galaxy, about 13,000 light-years away, a dead star called ASKAP J1839-075 is breaking all the rules … extremely slowly.

Dead stars — or neutron stars — normally spin at breakneck speeds, but a team of astronomers has clocked the new-found star taking a leisurely six-and-a-half hours to complete just one spin, which is thousands of times slower than expected.

"This could really change how we think about neutron star evolution," Yu Wing Joshua Lee, an astronomer from the University of Sydney and the first author on the new study, said. 

The discovery has been published in Nature Astronomy.

The researchers believe ASKAP J1839-075 is a "pulsar", a high-energy neutron star that releases short bursts of radio waves.

Dense neutron stars normally spin extremely fast in space. (Supplied: NASA/Goddard Space Flight Center/Conceptual Image Lab)

But conventional wisdom is that when pulsars slow down they stop emitting radio waves, meaning ASKAP J1839-075 should be invisible to radio telescopes. 

So what's going on?

What is a neutron star?

Neutron stars are among the most extreme objects in the Universe.

These small, dense stars are created when the core of a massive star collapses and triggers a fiery explosion known as a supernova.

When the star collapses it may go from a million-kilometre radius to just 10 kilometres.

This extreme crumpling increases the star's rotational speed, like a figure skater spinning faster when their arms move close to their body. 

Spinning extremely fast is therefore part and parcel of being a neutron star. A full spin usually takes these collapsed stars just milliseconds to seconds.

If our Sun were to go through the process of becoming a neutron star, its current 27-day rotation could become 1,000 rotations a second.

As pulsars rotate, we see flashes of radio waves from Earth, similar to a lighthouse. (Supplied: NASA/Goddard Space Flight Center)

Using radio telescopes, astronomers can "see" pulses of radio waves from Earth as the neutron star spins, with the movement regularly described as like a "cosmic lighthouse". 

It was thought that later in a collapsed star's life, as it lost energy and began to slow down, the bursts of radio waves scientists detected from Earth would also stop.

"Once they pass the [speed] threshold, we thought they'd be silenced forever," Mr Lee said. 

But in the past few years, astronomers discovered pulsars that seemed to contradict that hypothesis.

"This makes us rethink our previous theories on how these sources form." 

What rule-breakers have researchers found?

Early in 2022, researchers found pulsars that rotated on minutes-long time scales rather than seconds, and by June last year, researchers had discovered an object which took almost an hour between pulses. These unusually slow objects were called "long-period radio transients".

But ASKAP J1839-075's leisurely 6.45-hour spin was unheard of.

"The previous record was 54 minutes, so it was a huge jump," Mr Lee said.

"The team was really surprised."

Yu Wing Joshua Lee was the first author of the new paper. (Supplied: University of Sydney)

According to Gemma Anderson, an astronomer at Curtin University who was not involved in this paper but who was part of the team who found the first long-period radio transient, 6.45 hours stretches "our understanding of physics".

"A normal pulsar couldn't spin this slowly and produce radio light," she said.

"Some kind of extreme particle acceleration … is occurring that is causing it to be so radio bright on these long time scales."

A lucky find

Mr Lee was searching for "peculiar radio transients" by trawling through archival data from a sky survey taken by CSIRO's ASKAP radio telescope in outback Western Australia.

With little prior knowledge of where these transients pop up, the strategy is to pick a random point in the sky to see if anything interesting shows up, Mr Lee said.

Within the archival data, the team discovered a pulsar-like blip from early January 2024. The signal was already starting to fade when the survey began, so the team could only study the second half.

"If the observation was scheduled 15 minutes later, then we would have completely missed it," Mr Lee said. 

"It is quite lucky that we discovered it."

It took 14 more observations to uncover its repeating pulses, and understand more about what sort of object it could be. 

All of the long-period radio transients discovered so far have involved Australian teams and, according to Dr Anderson, Australia is particularly well placed to find them because of our current generation of radio telescopes.

"[The Murchison Widefield Array and the ASKAP telescope] are the two discovery machines for these types of objects," she said.

The SKA-Low telescope, which is aiming to be fully operational by the end of this decade, will be even more powerful.

So what's the explanation?

While finding these rule-breakers has only taken a few years, understanding what could be causing these mysteriously slow pulses is proving more challenging.

Previous papers have suggested that some other type of star like white dwarfs (created when stars less massive than our Sun run out of fuel and collapse) or special pulsars called magnetars could be behind the slow pulses.

However, the long-period radio transients found so far all emit radiation a little differently.

"They all have different properties," Mr Lee said.

"We don't know whether they belong to the same family, or the same type of object with different mechanisms."

ASKAP is the radio telescope used to discover the neutron star. (ABC News: Tom Hartley)

Dr Anderson noted that there may be two distinct classes of object, one group caused by white dwarfs, and one caused by magnetars.

In the case of ASKAP J1839-075, the evidence suggests that it's unlikely to be a white dwarf.

"This [research] nicely explains the different possible scenarios, but finds [in this case the] isolated neutron star or magnetar scenario is the most likely," Dr Anderson said. 

The telltale signs were in ASKAP J1839-075's distinct radio emissions, as well as the lack of a star visible in optical telescopes, which would normally be seen if the object were a white dwarf.

Even if the star is a magnetar, its slow spin is still almost unheard of, as most magnetars rotate once every two to 10 seconds, and more research will need to be done to understand how they work.

This is unlikely to be the last long-period radio transient scientists find, according to Dr Anderson, although the brightest and most obvious ones have probably already been found.

With the easiest finds out of the way, Dr Anderson suggests that researchers may turn to understanding more about how these rule-breaking stars could have formed.

"Perhaps this is opening an even larger discovery space where there are lots of objects producing these [transients]," she said.

"It's just we had never looked at the galaxy in this way with our radio telescopes before."

]]>
https://www.abc.net.au/news/science/2025-01-16/neutron-star-radio-transient-6-hours/104799106 hacker-news-small-sites-42726535 Thu, 16 Jan 2025 15:27:05 GMT
<![CDATA[Nepenthes is a tarpit to catch AI web crawlers]]> thread link) | @blendergeek
January 16, 2025 | https://zadzmo.org/code/nepenthes/ | archive.org

This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

It works by generating an endless sequence of pages, each with dozens of links that simply lead back into the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse.

You can take a look at what this looks like, here. (Note: VERY slow page loads!)

THIS IS DELIBERATELY MALICIOUS SOFTWARE INTENDED TO CAUSE HARMFUL ACTIVITY. DO NOT DEPLOY IF YOU AREN'T FULLY COMFORTABLE WITH WHAT YOU ARE DOING.

LLM scrapers are relentless and brutal. You may be able to keep them at bay with this software - but it works by providing them with a neverending stream of exactly what they are looking for. YOU ARE LIKELY TO EXPERIENCE SIGNIFICANT CONTINUOUS CPU LOAD, ESPECIALLY WITH THE MARKOV MODULE ENABLED.

There is not currently a way to differentiate between web crawlers that are indexing sites for search purposes, vs crawlers that are training AI models. ANY SITE THIS SOFTWARE IS APPLIED TO WILL LIKELY DISAPPEAR FROM ALL SEARCH RESULTS.

Latest Version

Nepenthes 1.0

All downloads

Usage

Expected usage is to hide the tarpit behind nginx or Apache, or whatever else you have implemented your site in. Directly exposing it to the internet is ill-advised. We want it to look as innocent and normal as possible; in addition, HTTP headers are used to configure the tarpit.

I'll be using nginx configurations for examples. Here's a real world snippet for the demo above:

    location /nepenthes-demo/ {
            proxy_pass http://localhost:8893;
            proxy_set_header X-Prefix '/nepenthes-demo';
            proxy_set_header X-Forwarded-For $remote_addr;
            proxy_buffering off;
    }

You'll see several headers are added here: "X-Prefix" tells the tarpit that all links should go to that path. Make this match what is in the 'location' directive. X-Forwarded-For is optional, but will make any statistics gathered significantly more useful.

The proxy_buffering directive is important. LLM crawlers typically disconnect if not given a response within a few seconds; Nepenthes counters this by drip-feeding a few bytes at a time. Buffering breaks this workaround.

You can have multiple proxies to an individual Nepenthes instance; simply set the X-Prefix header accordingly.

Installation

You can use Docker, or install manually.

A Dockerfile and compose.yaml is provided in the /docker directory. Simply tweak the configuration file to your preferences, 'docker compose up'. You will still need to bootstrap a Markov corpus if you enable the feature (see next section.)

For manual installation, you'll need to install Lua (5.4 preferred), SQLite (if using Markov), and OpenSSL. The following Lua modules need to be installed - if they are all present in your package manager, use that; otherwise you will need to install Luarocks and use it to install the following:

Create a nepenthes user (you REALLY don't want this running as root.) Let's assume the user's home directory is also your install directory.

useradd -m nepenthes

Unpack the tarball:

cd scratch/
tar -xvzf nepenthes-1.0.tar.gz
cp -r nepenthes-1.0/* /home/nepenthes/

Tweak config.yml as you prefer (see below for documentation.) Then you're ready to start:

    su -l -u nepenthes /home/nepenthes/nepenthes /home/nepenthes/config.yml

Sending SIGTERM or SIGINT will shut the process down.

Bootstrapping the Markov Babbler

The Markov feature requires a trained corpus to babble from. One was intentionally omitted because, ideally, everyone's tarpits should look different to evade detection. Find a source of text in whatever language you prefer; there's lots of research corpuses out there, or possibly pull in some very long Wikipedia articles, maybe grab some books from Project Gutenberg, the Unix fortune file, it really doesn't matter at all. Be creative!

Training is accomplished by sending data to a POST endpoint. This only needs to be done once. Sending training data more than once cumulatively adds to the existing corpus, allowing you to mix different texts - or train in chunks.

Once you have your body of text, assuming it's called corpus.txt, in your working directory, and you're running with the default port:

curl -XPOST -d @./corpus.txt -H'Content-type: text/plain' http://localhost:8893/train

This could take a very, VERY long time - possibly hours. curl may potentially time out. See load.sh in the nepenthes distribution for a script that incrementally loads training data.
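If you'd rather split the work up yourself, a sketch along these lines works too (this is not the bundled load.sh, just a hypothetical example; the /train endpoint, port, and Content-type are the ones documented above, and the chunk size is arbitrary):

    #!/bin/sh
    # Split the corpus into 1,000-line chunks and POST each chunk to the trainer.
    # Training is cumulative, so the chunks simply add up to the full corpus.
    split -l 1000 corpus.txt corpus_chunk_
    for chunk in corpus_chunk_*; do
        echo "Training on ${chunk}..."
        curl -sS -XPOST --data-binary "@${chunk}" \
            -H'Content-type: text/plain' \
            http://localhost:8893/train
    done
    rm -f corpus_chunk_*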

The Markov module returns an empty string if there is no corpus. Thus, the tarpit will continue to function as a tarpit without a corpus loaded. The extra CPU consumed for this check is almost nothing.

Statistics

Want to see what prey you've caught? There are several statistics endpoints, all returning JSON. To see everything:

http://{http_host:http_port}/stats

To see user agent strings only:

http://{http_host:http_port}/stats/agents

Or IP addresses only:

http://{http_host:http_port}/stats/ips/

These can get quite big, so it's possible to filter both 'agents' and 'ips': simply add a minimum hit count to the URL. For example, to see a list of all IPs that have visited more than 100 times:

http://{http_host:http_port}/stats/ips/100

Simply curl the URLs, pipe into 'jq' to pretty-print as desired. Script away!
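For instance, assuming the default host and port and that jq is installed:

    # Pretty-print every user agent seen so far
    curl -s http://localhost:8893/stats/agents | jq .

    # Pretty-print the IPs that have hit the tarpit more than 100 times
    curl -s http://localhost:8893/stats/ips/100 | jq .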

Nepenthes used Defensively

A link to a Nepenthes location from your site will flood out valid URLs within your site's domain name, making it unlikely the crawler will access real content.

In addition, the aggregated statistics will provide a list of IP addresses that are almost certainly crawlers and not real users. Use this list to create ACLs that block those IPs from reaching your content - either return 403, 404, or just block at the firewall level.
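As a minimal sketch of that idea - assuming nftables and jq, and assuming the stats JSON is an array of objects with an "ip" field (inspect your own /stats/ips output first and adjust the jq filter to match):

    # Assumes a blocklist set already exists, e.g.:
    #   nft add set inet filter crawler_block '{ type ipv4_addr; }'
    # NOTE: '.[].ip' is a guess at the JSON shape; adjust to your actual output.
    curl -s http://localhost:8893/stats/ips/100 \
        | jq -r '.[].ip' \
        | while read -r ip; do
            nft add element inet filter crawler_block "{ $ip }"
        done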

Integration with fail2ban or blocklistd (or similar) is a future possibility, allowing realtime reactions to crawlers, but not currently implemented.

Using Nepenthes defensively, it would be ideal to turn off the Markov module, and set both max_wait and min_wait to something large, as a way to conserve your CPU.

Nepenthes used Offensively

Let's say you've got horsepower and bandwidth to burn, and just want to see these AI models burn. Nepenthes has what you need:

Don't make any attempt to block crawlers with the IP stats. Put the delay times as low as you are comfortable with. Train a big Markov corpus and leave the Markov module enabled, set the maximum babble size to something big. In short, let them suck down as much bullshit as they have diskspace for and choke on it.

Configuration File

All possible directives in config.yaml:

  • http_host : sets the host that Nepenthes will listen on; default is localhost only.
  • http_port : sets the listening port number; default 8893
  • prefix: Prefix all generated links should be given. Can be overridden with the X-Prefix HTTP header. Defaults to nothing.
  • templates: Path to the template files. This should be the '/templates' directory inside your Nepenthes installation.
  • detach: If true, Nepenthes will fork into the background and redirect logging output to Syslog.
  • pidfile: Path to drop a pid file after daemonization. If empty, no pid file is created.
  • max_wait: Longest amount of delay to add to every request. Increase to slow down crawlers; if it's too slow, they might not come back.
  • min_wait: The smallest amount of delay to add to every request. A random value is chosen between max_wait and min_wait.
  • real_ip_header: Changes the name of the X-Forwarded-For header that communicates the actual client IP address for statistics gathering.
  • prefix_header: Changes the name of the X-Prefix header that overrides the prefix configuration variable.
  • forget_time: length of time, in seconds, that a given user-agent can go missing before being deleted from the statistics table.
  • forget_hits: A user-agent that generates more than this number of requests will not be deleted from the statistics table.
  • persist_stats: A path to write a JSON file to, that allows statistics to survive across crashes/restarts, etc
  • seed_file: Specifies location of persistent unique instance identifier. This allows two instances with the same corpus to have different looking tarpits.
  • words: path to a dictionary file, usually '/usr/share/dict/words', but could vary depending on your OS.
  • markov: Path to a SQLite database containing a Markov corpus. If not specified, the Markov feature is disabled.
  • markov_min: Minimum number of words to babble on a page.
  • markov_max: Maximum number of words to babble on a page. Very large values can cause serious CPU load.

History

Version numbers use a simple process: If the only changes are fully backwards compatible, the minor number changes. If the user/administrator needs to change anything after or part of the upgrade, the major number changes and the minor number resets to zero.

v1.0: Initial release

]]>
https://zadzmo.org/code/nepenthes/ hacker-news-small-sites-42725147 Thu, 16 Jan 2025 13:57:43 GMT
<![CDATA[I Ditched the Algorithm for RSS–and You Should Too]]> thread link) | @DearNarwhal
January 16, 2025 | https://joeyehand.com/blog/2025/01/15/i-ditched-the-algorithm-for-rssand-you-should-too/ | archive.org

An image of a banner cartoon of the topic at hand

I waste too much time scrolling through social media. It's bad for my health, so why do I keep doing it?

Because once in a while, I'll find a post so good that it teaches me something I never knew before, and all the scrolling feels worth it. But I've stumbled upon an old piece of free and open source tech, relatively unknown today, which is THE solution to the problems of modern media without sacrificing accessible, good content: RSS.

Reddit, Facebook, Twitter — platforms built for engagement, not efficiency. Instead of showing you high-quality posts upfront, they pad your feed with memes, spam, and astroturfing. There is only so much 'good' content created in a day. By padding your feed with trash, they make the limited number of good posts "last longer". These sites want you to spend more time scrolling on their website, so they feed you scraps, which makes the occasional great post feel like a jackpot.
This concept, operant conditioning, was developed by B.F. Skinner — yes, the mind behind the Skinner box.

While some sites offer filtering or sorting options, manually setting these options every time you want to access a subreddit is just not doable.

An image of a monkey in a skinnerbox, with Reddit acting as the reward stimuli

You could, of course, stop consuming content from these websites. However, this would mean potentially missing really good content; content you'd learn from, interesting ideas, and more.
But it doesn't have to be this way. You can reclaim your attention span while still having access to the same quality content as before.

Enter:

Image of a personified RSS feed showing a bad post to the blockfilter

RSS is like your youtube subscription feed in hyperdrive. Subscribe to sites you love and decide what shows up — no exploitative social media algorithm needed. No more ads or algorithms deciding how to keep you doomscrolling. This 1999 tech actually solves a lot of 2025 problems.
Here's the kicker: Most websites, even social media, quietly support RSS feeds.

You can filter out keywords, set minimum upvote or like counts, and much more! Modern RSS clients allow you to make filters using regex, and there is plenty of software and services you can use to turn your filtering up to 11.

TL;DR: Never see noise, and never miss hidden gems again!

But how do you get started with RSS? It's easier than you think!

Setup

I personally self-host an open source RSS reader: Tiny Tiny RSS
If you don't want to host it yourself, you can google for companies offering easy and accessible RSS readers.

Image explaining definitions of RSS levels of ease

To make it easier, let's differentiate between three levels of ease when it comes to adding a website to RSS: Easy, medium, and hard.
I'll be going over how to add several popular sites to your feed.

Easy 1: Youtube

Want a youtube channel in your RSS feed? Just copy the channel's URL and subscribe to it in your reader. Done.

Easy 2: IGN

If you like games, you might want to subscribe to IGN. There's no clear RSS button, so the best course of action would be to google "IGN RSS".

This leads to a nice IGN RSS Feeds page with multiple categorized feeds for you to pick from. If you wanted to subscribe to "Game Articles", you'd right-click on the game articles link, press "copy link", go to your RSS reader of choice and subscribe to the link you copied.

Now all IGN Game Articles will show up in your RSS feed as they are published!

Tip

Some websites don't have a dedicated RSS button, but still support RSS. You can discover their RSS URLs by adding .rss, atom.xml, feed, etc. at the end of the site's URL, for example https://website.com/atom.xml. Almost all RSS readers support Atom feeds. For more examples, check this Reddit comment.
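If you'd rather not guess by hand, a quick shell probe does the checking for you (a hypothetical example; the path list is just the usual suspects, and a 200 response is only a hint that a feed lives there):

    # Try the common feed locations and report which ones answer with HTTP 200.
    site="https://website.com"
    for path in rss rss.xml atom.xml feed feed.xml index.xml; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$site/$path")
        [ "$code" = "200" ] && echo "Possible feed: $site/$path"
    done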

Medium 1: HackerNews

Image explaining RSS middlemen

Some sites like HackerNews have RSS support. However, this RSS can be extremely limited if you want to filter posts so your feed isn't spammed by low effort content. Some people are nice enough to set up a "middleman" between your RSS feed and the website, so you can pull the RSS feed through the middleman while doing actions like filtering it.

For example, if you wanted to subscribe to HackerNews but filter out low upvote count posts, you could subscribe to HNRSS instead of through HN directly. I filter out posts below 150 upvotes by subscribing to this URL: https://hnrss.org/newest?points=150
Sometimes these services open-source their code so you can self-host the 'middleman'.

Medium 2: Reddit

Image explaining makeup of reddit RSS URL

Warning

When removing the optional search term from a Reddit search URL, don't forget to remove the + (the same goes when removing the sort options). If adding more search terms, add a + between them!

I love managing my homelab. I follow /r/homelab. Some posts are really good and teach me a lot.
However, there's a lot of noise posted to that subreddit; I do not want to see memes, and pictures of people's hardware setups get boring quickly. I'm interested in hidden gems, threads where a lot of interesting info is explained, things I can really learn a lot from.

Step 1: Filter out picture posts.

Reddit hack: Filter out picture posts by searching for 'self:true' in a subreddit. Bonus: You can subscribe to that specific search query as an RSS feed for text posts only.

So instead of subscribing to a subreddit's RSS directly, you do a search for posts in that subreddit and then subscribe to that RSS feed.
The RSS link you should subscribe to should look something like this: https://www.reddit.com/r/homelab/search.rss?q=self%3Atrue&restrict_sr=on
You can change 'homelab' to your subreddit of choice.

Note

The restrict_sr=on parameter in the URL (probably) means "Restrict_subreddit". Removing this from the search will yield results from different subreddits than the one you're searching in. If you think that parameter is redundant, I agree.

There are a lot of text-only submissions on /r/homelab. Gems are relatively sparse. Lots of low-quality content. It's not the subreddit's fault; this is standard across Reddit.

Step 2: Filter for quality

Seems easy; let's add a 'minimum upvotes' query to the search, right?
Sadly, Reddit doesn't support that... a shame, really.
However, a workaround is sorting by 'Top' and asking the search to show us the 'top posts of this week'. Note: 'This week' would mean 'past 7 days' instead of 'posted this week'.

Filtering by 'Top of ...' always returns 25 items. This means that if you sort by 'top of this week', an average of 25/7 ≈ 3.57 NEW posts get added to your feed each day! This is a great way to only see the highest scoring posts of each day.

Adding this sort on top of the RSS feed from step 1 results in an URL like this: https://www.reddit.com/r/homelab/search.rss?q=self%3Atrue&restrict_sr=on&sort=top&t=year

Bug

If you don't care about only seeing text posts, removing self%3Atrue does NOT work for RSS feeds, even though it does work for direct searches. Instead, subscribe to the subreddit's "top" RSS and filter by time. For example: https://www.reddit.com/r/homelab/top.rss?t=month

For reference, here is how many posts you would get in your RSS feed, depending on your Reddit sorting:

Image showing the number of posts per day for each Reddit sorting option

And just like that, we converted a high-noise subreddit to an RSS feed which only gives us the best the subreddit has to offer.

Tip

If you wanted to subscribe to all new posts in a subreddit, you would subscribe to a URL like https://www.reddit.com/r/SUBREDDIT_NAME/new/.rss?sort=new. For a more extensive Reddit RSS guide, see this post

Hard:

Some sites might not have support for an RSS feed. Sometimes you can get away with a neat google trick:

Image illustrating the Google trick

Most of the time you'd need something to generate the RSS for you. You could use one of many RSS feed generators available online, or host one yourself. Most of these feed generators have enhanced filtering tools as well.

I haven't had to do this yet; however, I've heard really good things about the open source RSS-Bridge.

How you'd set up a feed generator depends on the software, so I won't expand upon that here.

Conclusion

Separating yourself from the algorithmic whims of social media platforms is easier than ever. With RSS, you can stay informed, save time, and never miss the content that truly matters.

This blog also has an RSS feed!

To end this post, here is a list of (RSS supported) sites I think are really interesting. Linked are excellent articles for first-time readers!

]]>
https://joeyehand.com/blog/2025/01/15/i-ditched-the-algorithm-for-rssand-you-should-too/ hacker-news-small-sites-42724284 Thu, 16 Jan 2025 12:18:32 GMT
<![CDATA[Strategies to Complete Tasks with ADHD]]> thread link) | @adhs
January 16, 2025 | https://schroedermelanie.com/adhs-nichts-zuende-bringen/ | archive.org

People with ADHD (attention deficit hyperactivity disorder) often experience everyday life as a constant up and down of concentration, emotions, and energy. This inner chaos not only means that tasks are frequently abandoned, but also that those affected quickly feel overwhelmed. To understand why that is, it is worth taking a look at the emotional side of ADHD and at how the nervous system works in ADHD.

People with ADHD usually start the day with big intentions, but these quickly disappear into the chaos of distractions. Every new impulse seems so important that you cannot ignore it, and so you end up in a whirlpool of distractions.

The neurological particularities of ADHD are closely linked to the emotional challenges described above. The brain of someone with ADHD works differently, especially in the areas responsible for focus, impulse control, and the regulation of emotions.

The difficulty people with ADHD have in sticking with tasks and not feeling overwhelmed is not a sign of laziness or a lack of willpower. Rather, the causes lie in the particular way their nervous system works and in their intense emotional processing. A better understanding of this dynamic, both by those affected and by those around them, can make dealing with these challenges considerably easier.

]]>
https://schroedermelanie.com/adhs-nichts-zuende-bringen/ hacker-news-small-sites-42724179 Thu, 16 Jan 2025 12:02:47 GMT
<![CDATA[National IQs Are Valid]]> thread link) | @noch
January 16, 2025 | https://www.cremieux.xyz/p/national-iqs-are-valid | archive.org

If you follow me here or on Twitter/X, I’m sure you’ve seen a map like this, showing country-level differences in average IQs:

The figures in this map are derived from a raft of studies compiled by Richard Lynn and Tatu Vanhanen in their 2002 book IQ and the Wealth of Nations. The book itself is little more than a compilation and discussion of these studies, all of which are IQ estimates from samples located in different countries or based on diasporas (e.g., refugees) from those countries.

To get this out of the way, the estimates from IQ and the Wealth of Nations hold up. They are replicable and they are meaningful. At the same time, they are contentious. Lynn and Vanhanen’s estimates have many detractors, but virtually all of the negative arguments have one thing in common: they’re based on the idea that the estimates feel wrong, rather than on any actual inaccuracies in them.

Let’s review.

One of the most common arguments against Lynn and Vanhanen’s national IQ estimates is that it is simply impossible for whole countries’ mean IQs to be what people often consider to be so low that they’re considered prima facie evidence of mental retardation. This feeling is based on misconceptions about how mental retardation is diagnosed and defined, and misunderstandings about the meanings of very low IQs across populations. Let’s tackle definition first.

The belief that mental retardation is defined by an IQ threshold is similar to many of the arguments against the validity of national IQs in that it is based on a failure to give even a cursory thought to one’s own arguments. It’s a belief that cannot survive reading the latest version of the DSM, or for that matter, thinking of psychologists as competent people. If you open up the DSM-5 and turn to the section entitled Intellectual Disabilities, the first thing you see is the diagnostic criteria, which read as follows:

The following three criteria must be met:

  1. Deficits in intellectual functions, such as reasoning, problem solving, planning, abstract thinking, judgment, academic learning, and learning from experience, confirmed by both clinical assessment and individualized, standardized intelligence testing.

  2. Deficits in adaptive functioning that result in failure to meet developmental and sociocultural standards for personal independence and social responsibility. Without ongoing support, the adaptive deficits limit functioning in one or more activities of daily life, such as communication, social participation, and independent living, across multiple environments, such as home, school, work, and community.

  3. Onset of intellectual and adaptive deficits during the developmental period.

There are four listed severity levels for mental retardation, Mild, Moderate, Severe, and Profound, and they are “defined on the basis of adaptive functioning, and not IQ scores, because it is adaptive functioning that determines the level of supports required. Moreover, IQ measures are less valid in the lower end of the IQ range.” It’s true that most mentally retarded people have IQs in the range of 55 to 70, so it’s easy to get misled into thinking that IQ is the defining factor for mental retardation. But an IQ of about 70 and below only indicates (but doesn’t diagnose) mental retardation because of what it tends to be caused by in certain populations. This is a roundabout way of saying that a low IQ indicates mental retardation because of what it means for a person's behavior, which, I'll explain, covaries with its causes.

IQs are mostly normally distributed, and IQs represent the influences of multiple different constructs and causes, but they primarily reflect differences in general intelligence. At the same time, there’s a major deviation from normality at the lower end of the scale. If you sample well enough, the picture you’d get would look like this:

The reason is that there are the normal-range causes that produce the traditional bell curve, and then there are extreme circumstances that produce extraordinarily low IQs. Contrarily, we don't know of anything that produces abnormally high IQs. There's no known one-off mutation that makes someone a genius, but there are several mutations that we now know can make a person extremely unintelligent. Consider these:

Young and Martin 2023, Figure 1

It's much easier to break a machine than to throw a wrench in it and make it work better. And that makes sense! Your wrench in the gears is likely to break something, not to increase efficiency. If you hit someone in the head hard enough, you can reduce their IQ score, but not in a million years will you turn them into von Neumann.

The reason someone's IQ drops after you bludgeon them is more singular and specific than the reasons IQ varies in the general population. A hit to the noggin may leave someone unable to flex their short-term memory, even if their visuospatial rotation capabilities are unaffected. There's some degree of isolation of function and compensation for deficits in the brain. We know this thanks to many observations, like on the effects of neural lesions.

When we say that someone with an IQ of ≤70 is mentally retarded, we're saying that they lack adaptive behavior—that they're not very bright, and so it makes living life hard. When someone with an IQ >70 is mentally retarded—which happens!—we’re saying the same thing. But if a population legitimately has a mean IQ of 70 (and some do), we'll notice that they're not drooling troglodytes who can't put on their shoes. This is because the reasons for their low IQ are not things that cause specific and extreme deficits, but instead, things that cause normal-range variation, which is far less severe in nature than something that causes massive, specific, discontinuously-caused deficits.

Supposing discontinuous causes that create major, specific deficits yields testable consequences. We can see them play out clearly by leveraging different countries’ population registers.

In large Israeli (B) and Swedish (A) datasets, we can see that when a person has a mild intellectual disability, their sibling tends to be less intelligent too. You also see this for heights, autism, or any other highly polygenic trait. This is expected with continuously distributed causes because siblings share portions of their genetic endowments.

But when a sibling has a severe—otherwise known as "idiopathic"1—intellectual disability that's likely to have a discontinuous cause, the distribution of their siblings’ IQs is the same as the distribution for the general population. And of course it is, because the reason for the difference isn’t something that siblings should be expected to partially share.

The deficits in adaptive behavior that result from idiopathic intellectual disability are ones that make life hard to live in discrete and extreme ways. In some cases, you wouldn't even recognize them as being "retarded" because they have conditions like amnesia, where the person is clearly odd and unfortunate, and they score poorly on an IQ test, but they might sound totally coherent otherwise.

Normal-range causes lead to linear changes in adaptive behavior; idiopathic mental retardation leads to a discontinuous decrease in adaptive behavior.

Accordingly, an IQ of 70 does not have the same meaning for members of different groups, since if a group with a mean IQ of 70 randomly pulls a person with an IQ of 70, they aren't likely to score like someone who's mentally retarded for idiopathic reasons. Arthur Jensen, incidentally, predicted this sort of result in his 1972 Genetics and Education based on some observations he made as an educator.

My student said he was looking for a good culture-free or culture-fair test of intelligence and had not been able to find one. All the tests he used, whether they were claimed to be culture-fair or not, were in considerable agreement with respect to children diagnosed as educationally mentally retarded (EMR), by which they were assigned to special small classes offering a different instructional program from that in the regular classes. To qualify for this special treatment, children had to have IQs below 75 as well as lagging far behind their age-mates in scholastic performance. My student, who had examined many of these backward pupils himself, had gained the impression that the tests were quite valid in their assessments of white middle-class children but not of minority lower-class children.

Many of the latter, despite IQs below 75 and markedly poor scholastic performance, did not seem nearly as retarded as the white middle-class children with comparable IQs and scholastic records. Middle-class white children with IQs in the EMR range generally appeared more retarded than the minority children who were in special classes. Using nonverbal rather than verbal tests did not appreciably alter the problem. I confirmed my students observations for myself by observing EMR children in their classes and on the playground and by discussing their characteristics with a number of teachers and school psychologists. My student’s observations proved reliable.

EMR children who were called ‘culturally disadvantaged’, as contrasted with middle-class EMR children, appeared much brighter socially and on the playground, often being quite indistinguishable in every way from children of normal IQ except in their scholastic performance and in their scores on a variety of standard IQ tests. Middle-class white children diagnosed as EMR, on the other hand, though they constituted a much smaller percentage of the EMR classes, usually appeared to be more mentally retarded all round and not just in their performance in scholastic subjects and IQ tests. I asked myself, how could one devise a testing procedure that would reveal this distinction so that it could be brought under closer study and not depend upon casual observations and impressions.

The distinction between types of mental retardation that are demarcated by their symptoms and, thus, by their causes, is important to understand. It's why an IQ of 70 for a Japanese person is likely to indicate an extraordinarily severe issue in need of attention, while for a Bushman, they won't have any trouble surviving. This is why the DSM-V notes in the Diagnostic Features section and bolded here: “The essential features of intellectual disability are deficits in general mental abilities (Criterion 1) and impairment in everyday adaptive functioning, in comparison to an individual’s age-, gender-, and socioculturally matched peers (Criterion 2).”

Keep that bolded part in mind and you’ll understand the importance of test norming and, knowing about discontinuous causes and the adaptive functioning deficit requirement for retardation diagnosis, you’ll never again be able to think that a mean IQ around 70 invalidates national IQs. But there’s another reason you shouldn’t, and it’s that you’re probably not an ultra-hereditarian.

In 2010, Richard Lynn pointed out that high IQs in Sub-Saharan Africa were incompatible with anything but the judgment that the group differences in the U.S. were entirely genetic in origin and even extremely poor environments have no effects on cognitive development. In response to an attempt to say that Sub-Saharan Africans had a mean IQ closer to 80 than 70, he wrote:

[The] assumption that [some of these samples of] children had IQs of 85 and 88 seems improbable. These are the IQs of blacks in the United States. It can hardly be possible that blacks in the United States who have all the advantages of living in an economically developed country, with high income, good health care, good nutrition and education, would have the same IQ as blacks in impoverished Nigeria. If this were so, we would have to infer that these environmental disadvantages have no effect whatever on IQs and even the most hard line hereditarians would not go that far.

Or, if you want to see this point made diagrammatically (courtesy of X user AnechoicMedia), here you are:

Image

Low IQs are also predictable from national development, making them that much more realistic. Using the latest national IQ dataset, I’ll show this for Sub-Saharan Africa—a region often claimed to have invalid IQs precisely because they’re ‘too low’.

First, we’ll predict Sub-Saharan African national IQ from a regression of national IQ estimates on log(GDP PPP Per Capita). Whether the regression is performed with or without Sub-Saharan Africa, the results are similar. The measured mean IQ of Sub-Saharan Africa is 71.96, the predicted IQ in the regression with Sub-Saharan Africa included is 74.86, and without them, it’s 76.78. Or in other words, it’s not very different.

Image

Without the logs, the predicted IQs are 78.29 and 82.47, but that’s not an appropriate model, as you’ll see in a moment. But first, you might contest the latest national IQ dataset. The studies underlying its estimates and the methodology to assemble them are well-documented, so if you really want to contest them, you should start there. But if you dismiss them out of hand, you still can’t escape prediction by development, because we can just use the World Bank’s Harmonized Learning Outcomes (HLOs), an alternative national IQ dataset created by qualified researchers (including Noam Angrist), with good methods (test-score-linking), recently published (2021) in a respectable journal (Nature).2

Using HLOs, the observed mean IQ of Sub-Saharan Africa turns out to be 71.05. The predicted mean IQ with Sub-Saharan Africa in the regression is 72.25, and without it, 72.55. Without logs, those predictions become 77.35 and 81.85. But again, that model isn’t appropriate. The reason it’s not appropriate is due to nonlinearities that logging helps to handle. Take a look:

Image

You can see this same nonlinearity elsewhere, such as in PISA scores:

So to overcome this issue, we can do a simple piecewise regression. Using a GDP PPP Per Capita of $50,000 as the cut-point—and feel free to go use whatever you want, it doesn’t really change the result so long as it’s reasonable—we get these regressions:

Image
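Read plainly, this amounts to fitting two separate lines, one on each side of the cut-point (my notation, a sketch of the model rather than the exact specification used):

    IQ_c = \beta_0^{low} + \beta_1^{low}\,\mathrm{GDP}_c + \varepsilon_c \quad \text{for } \mathrm{GDP}_c < \$50{,}000
    IQ_c = \beta_0^{high} + \beta_1^{high}\,\mathrm{GDP}_c + \varepsilon_c \quad \text{for } \mathrm{GDP}_c \ge \$50{,}000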

With this method, the predicted mean IQ of Sub-Saharan Africa is 76.12 with it included and 79.77 without. Using the World Bank’s HLOs instead, the results are 73.75 with Sub-Saharan Africa and 76.6 without it.

All of these estimates are pretty close to the ground-truth, and I suspect they would be even closer if all the data came from the same years instead of near years, and I know it’s closer if Actual Individual Consumption is used instead of GDP Per Capita, since that tends to iron out some of the issues with tax havens and oil barons. But regardless, it should now be apparent that low national IQs are

  • Not indications of widespread, debilitating mental retardation

  • Where they’re predicted to be given levels of national development

  • Thus not a reason to cast aside national IQ estimates

In his earlier national IQ datasets, Lynn ran into a lot of missing data. To get around this issue, he exploited the fact that there’s spatial autocorrelation in order to provide imputed national IQs. Some people think, however, that these imputed IQs are bad and make his data fake, failing to realize that imputation is normal and it doesn’t have meaningful effects on Lynn’s national IQ estimates.

In the most recent dataset Lynn created before he died, his imputation procedure was like so:

To calculate these imputations, Lynn and Becker (2019a, b) took advantage of the spatial autocorrelation that often exists in international data and identified the three countries with the longest land borders that had IQ means. A mean IQ, weighted by the length of the land border, was calculated and used as an estimate for the country’s missing estimated mean IQ. For island nations, the three closest countries with IQ data were identified, and an unweighted mean was calculated and imputed as an estimated mean IQ value for the missing country’s data.

Geographic imputation of this sort is the responsible thing to do when data is not missing at random and it can be predicted from other values in the dataset, because more data means more power and, given the nature of the missingness, less bias. So whether Lynn was being responsible or irresponsible is a matter of how well the imputations hold up. Thankfully, they do hold up.

The simplest way to check the robustness of Lynn’s national IQs is to compare his imputed national IQs to subsequently sampled national IQs. I’ll do this with his much maligned 2002 and 2012 national IQs. To assess the validity of the imputed IQs I’ll correlate them with our current best national IQs and the World Bank HLOs.

Lynn’s 2002 imputed national IQs correlate at r = 0.90 with our current best national IQs and 0.72 with HLOs. 102 countries were imputed. Compared to our current best national IQs, 72 were overestimates and the average estimation error was 1.47 points upwards. The average overestimation on Lynn’s part was 3.71 points, and the average underestimation was 3.75 points. Since these were imputed countries, they don’t really reveal anything about Lynn’s estimation biases in general, as they’re a select subset of generally poor or small countries.

Lynn’s 2012 imputed national IQs correlate at r = 0.92 with our current best national IQs and 0.76 with HLOs. 66 countries were imputed. Compared to our current best national IQs, 37 were overestimates and the average estimation error was 0.72 points upwards. The average overestimation on Lynn’s part was 3.48 points, and the average underestimation was 2.61 points.3

I want to mention again that it should not be surprising that this procedure doesn’t really do much. It’s simple, straightforward, and scientifically uncontroversial, plus it’s theoretically sound, so it’s not shocking that it works. In a related domain, I found that this worked out just fine in the U.S.

I took the NAEP Black-White score gaps from the lower-48 and predicted them for each state from the observed gap for their immediate neighbors. The Michigan-Minnesota and New York-Rhode Island water borders were counted as neighboring state borders, and the result of imputing across state borders was a correlation of about 0.60 with the real gaps:

Image

With a larger sample size and even more so with imputation just of observations that are missing for theoretically important reasons, this would almost certainly work better, but regardless, it’s fine: Imputation just works.

This is easy to check, so you have to wonder why people believe it. If it were true, it would be easy to show rather than to merely assert.4 To check this, just compare Lynn’s estimated national IQs to independent collections of national IQ estimates and see if they vary systematically and meaningfully. I’ve done this, with both our current best national IQs as the reference and using World Bank HLOs as reference. The results are practically the same, but this should be unsurprising, since Lynn computed sets of national IQs based on IQ tests and based on achievement tests, and they were highly aligned. But I digress. Here’s what I found using the current best national IQs:

Compared to our current best estimates, Lynn’s original estimates were pretty close to the line, with a mix of under- and over-estimation. The degree of under- and over-estimation is minor, ranging between underestimating Sub-Saharan Africa by 1.89 points and overestimating Latin America by 4.21 points. Europe was overestimated by just 1.01 points, indicating no real evidence for the theory that Lynn favored Whites. If we drop imputed numbers, then we can see this even more clearly, because Sub-Saharan Africa without imputation was actually underestimated by an even larger margin of 2.40 points, but Europe without imputation was only overestimated by 0.18 points.

Fast forward to Lynn’s 2012 dataset and the results are even tighter, and the underestimation of Sub-Saharan Africa drops to 0.49 points, while the overestimation of Europe follows in lock-step, falling to 0.04 points. Without imputation, these numbers become 0.97 points of underestimation and, curiously, Europe is actually underestimated by 0.15 points.

In addition to producing replicable estimates, the estimates Lynn produced also weren’t off by much in general. I don’t think this should be surprising, but some people have responded to this with the argument that…

This is a response to the above that is, simply put, a complete lie. The idea here is that the above validation of Lynn’s national IQ estimates is based on looking only at permutations of Lynn’s national IQ estimates, derived largely from the same sets of studies. But there is only one way to arrive at that view, and it’s to simply assume it’s true without looking at the data to see that it’s clearly not.

The wonderful thing about Lynn’s data is that he documents all of his decisions and you can peruse his national IQ database, cutting out studies you don’t like and adjusting all the estimates accordingly. Reasonable changes to Lynn’s adjustments and inclusions don’t make a difference. The data is publicly available, so you can go and confirm that yourself. But an even simpler way to show that Lynn’s estimates hold up is to just see if they hold up in data that’s entirely non-overlapping with Lynn’s data. So for that, I’ll use the World Bank’s HLOs:

The World Bank HLOs correlate at 0.83 with Lynn’s estimates with imputation and 0.89 without imputation. Compared to HLOs, Lynn underestimated Sub-Saharan Africa by a piddling 5.03 points, but he also underestimated Europeans by 0.61 points, and overestimated Oceania, Latin America, and South Asia by 5.71, 2.74, and 5.89 points, respectively.

This is just not consistent with a pattern of strong, European-favoring misestimation, and if we drop Lynn’s imputations that becomes even clearer because the underestimation of Sub-Saharan Africa increases to 5.44 points, while the overestimation of Oceania, Latin America, and South Asia shift to 2.68, 1.97, and 1.92 points. And if we look at Lynn’s later 2012 national IQ estimates, the patterns just aren’t that different, a fact that’s true whether we subset HLOs to be derived completely beyond the dates of Lynn’s data or not.

So to summarize, unless everyone shares Lynn’s biases, then his national IQ estimates do not suggest he was biased in favor of Europeans. The impacts of his imputation methods also suggest that he wasn’t biased in that direction either.

A major argument that Lynn was in the wrong is that some of his samples were disadvantaged, underprivileged, or whatever other euphemisms you might wish to use for being poor, and that perhaps his samples from poor countries were worse in this respect, because he wanted them to look worse. This doesn’t stand up to scrutiny.

Lynn always caveated national IQs based on limited samples. Though you won’t see that if you just look at the resulting national IQ maps, you definitely see that if you look in his books describing what data was used, how it was gathered, and so on. It is true that some of Lynn’s referenced samples were very poor, but this always has to be considered in the light of representativeness. If poverty is the norm for a region, then samples should be poor to be representative. Similarly, if a health condition like anemia is the norm for a region—and in many places, it actually is!—then the best sample will have high rates of anemia, regardless of whether that’s ‘acceptable’ for a sample from a developed Western country.5

We can think about this in the context of a few sample means from poor places which we know to be psychometrically unbiased when compared with U.S. samples. In this case, I’ll reference samples Russell Warne provided data on from Kenya and Ghana.

The samples were from 1997 and 2015 and they used the WISC-III in Kenya and WAIS-IV in Ghana. The Kenyan test-takers came from grade 8 schools and the Ghanaian ones were public and private high schoolers from Accra, alongside a sample of university students.

The problem with this sampling is that education in Kenya in 1997 and Ghana in 2015 is nowhere near this level. In Kenya, most people didn't even reach grade 8 at the time, and in Ghana, most people don't reach university but university students were 57% of the sample! Furthermore, the Kenyan sample was from Nairobi, so it was at least sampled in what was, as of 2018, Kenya's 2nd-richest county and home to <9% of the population. The situation was even less representative in Ghana, where the sample was from Accra and thus also <9% of the population and a third of Ghana's GDP.

These samples were highly unrepresentative of their respective populations because they were relatively elite. The bias was not as bad in the Kenyan sample: they were one to two SDs above the country as a whole in socioeconomic status. But in Ghana, they were three to four SDs above the average. That's like comparing high school dropouts to college graduates.

Warne knew of this problem and wrote:

As eighth-graders, the members of this sample were more educated than the average Kenyan. In 1997, the average Kenyan adult had 4.7 years of formal schooling; by 2019, this average had increased to 6.6 years.

And:

Like the [Kenyan] sample, this [Ghanaian] sample is much more educated than the average person in the country. In 2015, the average Ghanaian adult had 7.8 years of schooling (which increased to 8.3 years by 2019), whereas 61% of [this sample] had 16 years of education or more.

Knowing this, mentally adjust their scores downward in accordance with how much you think being highly educated and relatively wealthy should overstate their scores relative to the general population. So, what were their scores? Here:

Image

To put these scores into intuitive terms, we need two things: the factor correlation matrix and an assumption of equal latent variances. The correlation matrices were provided in the paper's figures 1 through 3. If we assume that the scale we want is based on a mean of 100 and an SD of 15 ("the IQ metric"), we just place the gaps on that scale (i.e., *15 + 100), subtract the number of factors times 100, divide that by the square root of the sum of the elements in the correlation matrix (we'll use the American values, since their sample was larger), and add back 100.
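Written out (my notation, just restating the procedure above: d_i is the latent gap on factor i in SD units, k is the number of factors, and R is the factor correlation matrix):

    \text{composite gap} = \frac{\sum_{i=1}^{k}\left(15\,d_i + 100\right) - 100k}{\sqrt{\sum_{i,j} R_{ij}}} = \frac{15\sum_{i=1}^{k} d_i}{\sqrt{\sum_{i,j} R_{ij}}}, \qquad \text{composite IQ} = 100 + \text{composite gap}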

The Kenyan sample, which was 1-2 SDs above the Kenyan average in terms of education, had a mean IQ score of 79.08 versus the American average of 100. The Ghanaian sample, which was 3-4 SDs above the Ghanaian average in terms of education, had a mean IQ of 92.32 versus the American average of 100. And do note, the American average the Ghanaian sample would be compared to would be lower than the average the Kenyan sample was compared to due to demographic change over time.

These estimates are much higher than Lynn’s, and that makes sense, because these are clearly socioeconomically extremely well-off samples, perhaps not by Western standards, but certainly by their own national standards. But some people think these are the sorts of samples that should be used to represent poor countries. That’s wrong, but the view has its supporters, like Wicherts, Dolan, Carlson and van der Maas, who claimed as much in 2010. But Lynn called this out at the time too:

Wicherts, Dolan, Carlson & van der Maas (WDCM) (2010) contend that the average IQ in sub-Saharan Africa assessed by the Progressive Matrices is 78 in relation to a British mean of 100, Flynn effect corrected to 77, and reduced further to 76 to adjust for around 20% of Africans who do not attend school and are credited with an IQ of 71. This estimate is higher than the average of 67 proposed by Lynn and Vanhanen (2002, 2006) and Lynn (2006).

The crucial issues in estimating the average IQ in sub-Saharan Africa concern the selection of studies of acceptable representative samples, and the adjustment of IQs obtained from unrepresentative samples to make them approximately representative. Many samples have been drawn from schools but these are a problem because significant numbers of children in sub-Saharan Africa have not attended schools during the last sixty years or so, and those who attend schools have higher average IQs than those who do not.

Lynn then reviewed general population studies and those returned a Sub-Saharan African mean IQ in the 60s. He subsequently reviewed primary school studies and the data yielded a median of 71, which he adjusted to 69 to account for the fact that only about 80% of the population gets a primary school education in Sub-Saharan Africa. After that, he reviewed studies of secondary school students, and those returned a mean IQ in the 70s, but that is not an acceptable estimate for a few simple reasons.

(1) many adolescents in sub-Saharan Africa have not attended secondary school and tertiary institutes. For instance, Notcutt estimated that in South Africa in 1950 only about 25% of children aged 7-17 were in schools and “we cannot assume that those who are in school are a representative sample of the population”. (2) Entry to secondary school has generally been by competitive examination, resulting in those with higher IQs being selected for admission. Thus, “entry to secondary schools in the East African countries of Kenya, Uganda, and Tanzania is competitive… approximately 25 percent of the population complete the seven standards of primary school and there are secondary places for 10-12 percent of these”. Similarly, Silvey writing of Uganda around 1970 stated that at this time only 2% of children were admitted to secondary schools and entry is determined partly by a primary school leaving examination, and Heynman and Jamison writing of Uganda in 1972, note that admission to secondary school is based on “achievement performance on the academic selection examination… and there are secondary school places for only one child in 10”.

This is egregious, but this is typical for higher estimates for poor places. They tend to be based on samples that are, as in these cases, relatively privileged, pre-screened for IQ, etc., and despite that still generally not that impressive in socioeconomic or IQ terms. These also aren’t the worst estimates like this that Lynn has noted. My favorite was this:

WDCM include a number of studies that cannot be accepted for a variety of reasons. Their samples of university students are clearly unrepresentative. The Crawford-Nutt sample consisted of high school students (IQ 84) in math classes admission to which “is dependent on the degree of excellence of the pupil’s performance in the lower classes” and described as “a select segment of the population”. The students were also coached on how to do the test and “Teaching the strategies required to solve Matrix problems yields dramatic short-term gains in score”. This is clearly an unrepresentative sample.

Curiously, people usually ‘get this’ sort of representativeness issue when it comes to things like China only sampling from rich, well-off areas to look good in international assessments like PISA and TIMSS, but they don’t get this when it applies to samples that look rich by one country’s standards and poor by the standards of the developed world. It’s precisely those sorts of large-scale examinations which bring me to the easiest way to vindicate Lynn’s sampling in a very general sense.

Large-scale international examinations like PISA have sampling frames, requirements to be met for a sample’s scores to be considered valid and comparable to those of other countries, and if countries meet them, then it’s likely their samples were sufficiently representative to make a statement about a country’s youth or, in the case of assessments like PIAAC, its adults. These large-scale assessments have standards for sampling, and they also have psychometric standards. Their samples end up representative—at least of developed countries (see above in this section)—and their test scores aren’t biased, and these still correlate highly with the results produced by Lynn and his colleagues. This is the basis for the World Bank’s HLOs, so by this point in this article, you’ve already seen this point made multiple times! And to be completely fair to Lynn, he made this point too, people just neglect that he did. It’s no doubt part of why he computed his own academic achievement test-based national IQs (see Footnote 3).

Some people probably still think that some of Lynn’s estimates are too low. Lynn believed the same thing in cases where there just wasn’t enough data, which is why he Winsorized national IQ estimates and expressed his doubts about extremely low estimates.

One researcher recently noted that sub-60 IQs also aren't empirically supported. Through reviewing additional evidence on countries listed as having sub-60 IQs, he found that each one ended up with either no estimate (due to missingness) or an IQ above 60.

The unstated part of the argument that some national IQ estimates are unrealistically low is that somehow this disqualifies larger portions of the dataset, but that’s a non sequitur.

Some people have claimed that Lynn made poor countries appear to perform worse than they actually do by using samples of children instead of adults. This criticism reveals a lack of awareness that test scores have age-specific norms. But if we assume it’s a legitimate criticism, then it suggests Lynn unfairly advantaged poor countries since the existing not-so-strong evidence shows that those countries are more likely to have scores that decline relatively with age.

During Lynn’s life, he didn’t focus much on psychometric bias. This is fair, because by the time it became a big focus for the field, he was already old. But in any case, as I’ve noted above, we have little reason to think it’s a big concern for most estimates. Furthermore, it’s not even clear that bias is systematic in general. Consider, for example, the comparison of Britons and South Africans I discussed here. In that comparison, there was bias, but it favored the lower-scoring South African group!

In general, when people find representative samples and no psychometric bias, the results aren't different from what Lynn found, so unless someone wants to substantiate this concern, it's little more than a waste of ink.[6]

The Flynn effect is widely misunderstood. I've written an article on this that goes into much greater depth about what I mean. But the important point when it comes to national IQs is that the Flynn effect is not about differences in intelligence; instead, it primarily concerns test bias. The existence of the Flynn effect also doesn't imply there will be convergence between countries; it cannot be said to be the source of any convergence across countries without evidence that doesn't currently exist, and the Flynn effect is explicitly adjusted for in national IQ computation. It just doesn't have any relevance to the discussion because the evidence for larger Flynn effects during catch-up economic growth is extraordinarily poor.

But furthermore, increases in scores for cohorts over time will sometimes reflect bias rather than ability gains and relative cognitive performance across countries is generally very stable. There’s not really a reason to consider this argument, even for countries proposed—but not shown—to have major upward swings in their national IQ, like Ireland:

[Image]

We know that IQ differences are partially causally explained by differences in brain size. Because development seemingly minimally impacts brain sizes—and it theoretically should not anyway—brain sizes can be used successfully to instrument for national IQs, allowing us to estimate the causal impact of national IQs on outcomes like growth, crime rates, and so on. This has been done and it works well. The same result also turns up using ancestry-adjusted UVR and numeracy measured in the 19th century, both of which also can't be caused by modern development.

These results suggest that, even if Lynn got some of the causality backwards between IQ and measures of national economic success, he's still dominantly correct that national IQs precede development. Not only that, but national IQs are, as mentioned, largely stable over time, despite the world experiencing a lot of development. We can see this very directly using the World Bank's HLOs again. The paper introducing them includes this diagram, showing (a) percentages enrolled in primary education over time, and (b) HLOs over the same period. Notice the dramatic increase in the former and relative stability of the latter.[7]

[Image]

Since we do know there’s a considerable degree of stability in measured national IQs, we can leverage stability as an assumption and see the curious result that, over time, it looks like Lynn’s estimates are vindicated more and more, because development measures have gotten more in line with his national IQs!

Ultimately, people who want to argue Lynn got causality backwards aren’t really taking issue with the national IQ estimates themselves, they’re just specifying an alternative claim that they hope sounds like it can invalidate Lynn’s estimates. But—and here’s the kicker—Lynn firmly believed that national IQs would increase with development; his estimates were point-in-time estimates, not the final letter forever and ever, and he fully expected them to change because he thought very poor places were environmentally disadvantaged. So this isn’t even really an argument against Lynn’s general views per se.

People often reply to national IQ estimates with news articles or misinterpreted scientific articles that they allege show low-scoring countries actually do very well. Two examples that were brought up to me recently were Iran and India.

The example of Iran was a meta-analysis of different studies of Iranians. Someone brought this up to me to claim that Iran actually had a national IQ like America’s, at 97.12. Perplexed, I asked the simple question: Did they use American or British norms? And the answer is ‘no’, the test norms were Iranian, so if anything, these samples had an IQ below what’s expected—that is, a mean of 100. There’s really no need to ask further questions about these results, because being on different norms and not having the handbook handy to make the scores comparable means that they do not permit international comparisons. But this is a pretty standard sort of argument for people to make to contradict Lynn’s estimates.

The example of India was a news report about alleged testing of Indian students by Mensa’s India chapter. The report reads:

In the past couple of months, Mensa India, Delhi, administered its internationally recognized IQ test to over 4,000 underprivileged children in Delhi and NCR as part of a unique project aimed at identifying and mentoring poor children with high IQ. Of the 102 extremely bright children it selected, over a dozen, including Amisha, achieved an IQ score of 145-plus, which puts her in the genius category.

The others achieved IQ scores of 130-145, which puts them in the category of ‘very gifted’ children. The average score in Mensa India’s IQ test is between 85 and 115. Interestingly, all of these children are sons and daughters of labourers, rickshaw pullers, security guards, street vendors, etc.

Now, does this say anything in defiance of Lynn's numbers? No, because we don't know the norms. We don't even know much about these numbers at all; we just have hearsay without accompanying statistics. This barely rises above an anecdote, but it is the sort of thing that people will misinterpret to mean Lynn was wrong, somehow.

Another less common strategy to reject national IQs is to just unreasonably ignore data. For example, today I encountered someone who plotted Haiti’s national IQ over time with two datapoints, one from the late-1940s and the other from the late-1970s. They alleged that Haiti had gotten much smarter over time, with their national IQ rising from around 60 to almost 100. But looking at all available data, this view cannot be supported. What they did to make their claim isn’t even a reasonable way to compute a national IQ. Getting a national IQ estimate requires looking at multiple lines of evidence and qualifying the inclusion of samples and whatnot, whereas their strategy was to act like there were just two studies to discuss and to conclude that the later one was the ground-truth regardless of its reliability or any other of its qualities.

This complaint takes two forms. The first is the more general rejection of intelligence tests measuring intelligence. That’s not relevant to national IQs and it’s poorly supported, but the arguments on that topic are familiar, so I’ll skip to the relevant argument.

The second form is that, for some reason, differences in national IQs are due to different factors than those that explain test performance within populations. This perspective is incompatible with measurement invariance, so it is necessarily wrong for any psychometrically unbiased comparisons. A sense in which the claim can be recovered is that for a given unbiased comparison, there might be mean differences in specific factors rather than in g—a sort of international version of the contra hypothesis for Spearman's hypothesis. This is not the case for the PISA tests or any other unbiased national IQ comparison of which I'm aware, so while it's a possible perspective, it's empirically contradicted at the moment.

Some people prefer achievement tests to national IQs, despite the positive manifold strongly suggesting that achievement tests and IQ tests both measure g[8], and confirmatory factor modeling confirming that. The people with that preference also tend to make another, related argument: that achievement tests are not measures of g and cannot be treated as such. But like the above arguments, the evidence speaks against this being true for international examinations like the PISA tests, which have a strong general dimension, much like standard IQ tests.

People really want national IQ estimates to be debunked or, worse, to feel personally favorable. So they concoct a lot of bad arguments to make it sound like national IQs are off. To recap, here are some of the ways:

  • They confuse themselves and others about how mental retardation is defined and act as though it’s defined by IQ alone, so certain national IQs must be implausible. In doing so, they ignore the importance and existence of norms, as well as the modern definition of mental retardation itself.

  • They claim methodological choices like imputation are extraordinarily biasing when that is not the case; checking shows that the choice is not even biasing in the direction of being unfair to the groups that national IQs are alleged to be biased against.

  • They claim that sampling is directionally biased in a certain way, when inspection of the data generally shows that, if anything, it’s biased in a way that leads to understated lower-tail cognitive differentiation.

  • They make claims that ‘sound right’ like that comparing children and adults is bad, even when we have age-based norms, so this cannot be a genuine criticism unless it is theoretically qualified that somehow children in certain countries are more disadvantaged than their adults and that this disadvantage translates to lower cognitive performance. I’m not aware of anyone who believes this theoretical qualifier and existing evidence speaks against it.

  • They make groundless extrapolations like ‘The Flynn effect means national IQs are worthless’ or ‘national IQs have changed a lot’ and they refuse to justify these inferences.

  • They look for odd references and outlier studies to justify throwing out a whole corpus of material.

  • Etc.

The common thread between different national IQ criticisms is the weaponization of ignorance. Critics toss out claims they think are right or which sound right, and they don’t check their work. A stand-out example is that people regularly say that Lynn was biased against Sub-Saharan Africans on the basis of his use of samples that are well-off by the standards of their countries purely because they are poor by the standards of the developed world. But this argument cannot stand. It has no merit, and it only serves to insult the reader and to convince them that the person making the argument has done some of the required work to dismiss Lynn’s estimates, when they’ve really only done the required work to say they’ve just barely cracked open the book!

After throwing out enough criticisms, people feel that national IQ estimates simply cannot stand, that they must be wrong, or so many criticisms wouldn’t be possible in the first place. But on this they’re wrong, and the very fact that they have so many criticisms is a strike against them, because the criticisms are so uniformly bad that they should embarrass the person making them. Making matters worse, critics seem to never go back when they’re shown to be wrong. Their attempted debunkings get roundly debunked and national IQ estimates remain reliable as ever, and they just keep making the same tired arguments that no right-minded person could still believe.

The defining feature of criticisms of national IQs is not all of this lazy argument though, it’s what comes next: insults. People like Lynn are taken to be ‘stupid’, people who believe in a given estimate that upsets people regardless of how well-supported it is are taken to be ‘morons’, and looking into and understanding national IQ estimates becomes less common because, after all, the only people who would look into them are the sorts of ‘stupid morons’ who actually bother to check their work.

Think I missed any big arguments? Want more details about something I said? Need something explained? Noticed a grammatical error, spelling mistake, or other triviality? Think I’m right or wrong about some claim? Have more data for me to look at?

Then tell me, because this is a living post that will be updated over time.

Now enjoy the most up-to-date national IQ map. It’s imputation-free!

Jensen and Kirkegaard 2024


]]>
https://www.cremieux.xyz/p/national-iqs-are-valid hacker-news-small-sites-42723907 Thu, 16 Jan 2025 11:18:16 GMT
<![CDATA[Is there such a thing as a web-safe font?]]> thread link) | @mariuz
January 16, 2025 | https://www.highperformancewebfonts.com/read/web-safe-fonts | archive.org

Unable to extract article]]>
https://www.highperformancewebfonts.com/read/web-safe-fonts hacker-news-small-sites-42723543 Thu, 16 Jan 2025 10:17:01 GMT
<![CDATA[Setting Up an RK3588 SBC QEMU Hypervisor with ZFS on Debian]]> thread link) | @kumiokun
January 16, 2025 | https://blog.kumio.org/posts/2025/01/bananapim7-hvm.html | archive.org

Unable to retrieve article]]>
https://blog.kumio.org/posts/2025/01/bananapim7-hvm.html hacker-news-small-sites-42722870 Thu, 16 Jan 2025 08:31:40 GMT
<![CDATA[Rust's borrow checker: Not just a nuisance]]> thread link) | @weinzierl
January 15, 2025 | https://mental-reverb.com/blog.php?id=46 | archive.org

31 December 2024

Rust's borrow checker: Not just a nuisance

Over the past couple of months, I've been developing a video game in Rust. A lot of interesting and mostly positive things could be said about this programming journey. In this post, I want to briefly highlight one particular series of events.

To provide some context, the game I'm developing is a 2D side-view shooter, similar to Liero and Soldat. The first weapon I implemented was a laser. Due to its lack of a ballistic projectile and its line-based hit test, it was low-hanging fruit.

During an initial quick-and-dirty implementation of the laser, I had a run-in with the borrow checker. We iterate over all the players to check if a player fires his laser. Within this block, we iterate over all the other players and perform a hit test. The player who is hit will have his health points reduced by 5. If this is a lethal blow, he will die and respawn. The logic is very simple, but there is one problem. In the outer loop, the player collection is already borrowed, so it cannot be mutably borrowed in the inner loop:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            for (other_idx, other) in players.iter_mut().enumerate() { // <-- Cannot borrow mutably here
                if shooter_idx == other_idx {
                    // Cannot hit ourselves
                    continue;
                }
                // For simplicity, we omit actual hit test calculations
                let hits_target = true; // Suppose we hit this player
                if hits_target {
                    let damage = 5;
                    if other.health <= damage {
                        // Handle death, respawn, etc.
                    } else {
                        other.health -= 5;
                    }
                    break;
                }
            }
        }
    }
}

Try it on Rust Playground

This problem cannot be solved by simply massaging the code or uttering the right Rust incantations. Well, technically, it can - but doing so would result in undefined behavior and is strongly discouraged:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            for (other_idx, other) in players.iter().enumerate() {
                if shooter_idx == other_idx {
                    // Cannot hit ourselves
                    continue;
                }
                // For simplicity, we omit actual hit test calculations
                let hits_target = true; // Suppose we hit this player
                if hits_target {
                    let damage = 5;
                    unsafe {
                        #[allow(invalid_reference_casting)]
                        let other = &mut *(other as *const Player as *mut Player);
                        if other.health <= damage {
                            // Handle death, respawn, etc.
                        } else {
                            other.health -= 5;
                        }
                    }
                    break;
                }
            }
        }
    }
}

Try it on Rust Playground

To emphasize again, this is broken code that should never, ever be used. However, because I needed quick results and had other parts to finish first, I went with it for a day or two. While it seemed to work in practice, I begrudgingly refactored the code as soon as I could:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

struct Laser {
    shooter_idx: usize,
    // Also store position here, used for hit test
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];
    let mut lasers = vec![];

    for (shooter_idx, shooter) in players.iter().enumerate() {
        if shooter.firing {
            // Fire laser
            lasers.push(Laser { shooter_idx });
        }
    }

    for laser in lasers.iter() {
        for (other_idx, other) in players.iter_mut().enumerate() {
            if laser.shooter_idx == other_idx {
                // Cannot hit ourselves
                continue;
            }
            // For simplicity, we omit actual hit test calculations
            let hits_target = true; // Suppose we hit this player
            if hits_target {
                let damage = 5;
                if other.health <= damage {
                    // Handle death, respawn, etc.
                } else {
                    other.health -= 5;
                }
                break;
            }
        }
    }
}

Try it on Rust Playground

At this point, the entire process may seem like a rigmarole to satisfy Rust's overly restrictive memory model. We removed the nested player loop at the cost of introducing a new vector to store all the laser shots. This change also introduced additional memory allocations - a minor performance penalty. Otherwise, the logic didn't change... or did it?

It was only when a friend and I actually played the game that I realized what had happened. My friend and I go back almost 20 years with this kind of game, and we are very evenly matched. It just so happened that we killed each other at exactly the same time, frame-perfectly. The game handled it perfectly: we both died, scored a kill, and respawned simultaneously. Now, let's return to the earlier example with the unsafe block. What would happen if the code were structured like that, as it would have been if I were using a language without a borrow checker? The player that comes first in the vector kills the player that comes later in the vector. After that, the player who was hit is either dead or has respawned somewhere else, thus he cannot retaliate in the same frame. Consequently, the order of the players - something that should be completely irrelevant for gameplay purposes - becomes decisive in close calls.

In my opinion, something very interesting happened here. By forcing me to get object ownership in order and separate concerns, the borrow checker prevented a logic bug. I wasn't merely jumping through hoops; the code structure improved noticeably, and the logic changed ever so slightly to become more robust. It wasn't the first time that this happened to me, but it was the most illustrative case. This experience bolsters my view that the borrow checker isn't merely a pesky artifact of Rust's safety mechanisms but a codification of productive and sensible design principles.

For those who are interested in the game, while it is not yet released, you can find some videos of our playtests on YouTube: https://www.youtube.com/watch?v=H3k7xbzuTnA

Comments

Strawberry wrote on 16 January 2025
As a good practice for game development, instead of using array of structs use struct of arrays.

It makes sense for the memory and performance perspective
reply

Matthias Goetzke wrote on 09 January 2025
There is no problem really using an index here though and for better design use interior mutability (UnsafeCell) which limits the spread of unsafe to a function on Player.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9848e84616c9f9ce0a3f1071ae00d8ba

UnsafeCell is not copy or clone and eq needs to be implemented comparing addresses absent an id (but thats just and example anyway i guess, adding a player id would make sense i guess)

Assembly in this version looks not too bad at first glance either.

reply

_fl_o wrote on 07 January 2025
You can very easily get the original version compiling by using indices instead of iterators. No need for unsafe code!
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=4aa2834dbf941b386ccca814e76c0e94
reply
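
For reference, here is a minimal sketch of the index-based approach this comment describes (my sketch, not the linked playground code, reusing the Player struct from the post). Each indexing operation is a temporary borrow that ends immediately, so it compiles without unsafe. Note, though, that it keeps the original nested-loop ordering and therefore the order-dependence discussed in the post:

#[derive(Clone, Copy)]
struct Player {
    firing: bool,
    health: u8,
}

fn main() {
    let mut players = [Player { firing: true, health: 100, }; 8];

    for shooter_idx in 0..players.len() {
        // Reading a Copy field through an index is a temporary borrow
        // that ends right away, so the mutable indexing below is allowed.
        if !players[shooter_idx].firing {
            continue;
        }
        for other_idx in 0..players.len() {
            if shooter_idx == other_idx {
                // Cannot hit ourselves
                continue;
            }
            // For simplicity, we omit actual hit test calculations
            let hits_target = true; // Suppose we hit this player
            if hits_target {
                let damage = 5;
                if players[other_idx].health <= damage {
                    // Handle death, respawn, etc.
                } else {
                    players[other_idx].health -= damage;
                }
                break;
            }
        }
    }
}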

Colin Dean wrote on 06 January 2025
This is a great realization for a game designer. All games must have rules. All rules must be processed in an order; no rules are ever processed simultaneously. This new refactor helped you solidify the rules for the game in an explainable and deterministic way.

I've not actively played Magic: The Gathering for almost 25 years, but I still remember some of the teachings of that game and others similar to it at the time. There is an order of actions and resolution or precedence when two actions may occur perceptibly simultaneously. The arguments always occurred when players didn't know or didn't understand that order. Computers automate the execution but explaining that execution to players engaging in the meta is a necessary step in growing from a person who builds games to a game designer with a community of players.
reply

wt wrote on 06 January 2025
Yes, the borrow checker made you rethink about your code and change the logic, while... I don't think the bug is really relevant to borrow checker. Actually, if you search online, you would likely be suggested to use `split_at_mut`, which would have the same bug.
reply
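
For context, the `split_at_mut` approach mentioned here is usually wrapped in a small helper like the hypothetical get_two_mut below (my sketch, not code from the post or the comment). It hands out two disjoint mutable references from one slice, which satisfies the borrow checker, but, as the commenter says, it preserves the same order-dependent logic:

fn get_two_mut<T>(slice: &mut [T], i: usize, j: usize) -> (&mut T, &mut T) {
    assert!(i != j, "indices must be distinct");
    if i < j {
        // Split so that the two indices land in different halves.
        let (left, right) = slice.split_at_mut(j);
        (&mut left[i], &mut right[0])
    } else {
        let (left, right) = slice.split_at_mut(i);
        (&mut right[0], &mut left[j])
    }
}

fn main() {
    let mut healths = [100u8, 100, 100];
    let (first, third) = get_two_mut(&mut healths, 0, 2);
    *third -= 5;
    println!("{first} {third}");
}

Combined with an index-based outer loop, the inner loop could then take both players at once, e.g. let (_shooter, other) = get_two_mut(&mut players, shooter_idx, other_idx);.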

nh wrote on 04 January 2025
This is a perfect sample of what is wrong with using Rust everywhere.

Every decent C++ programmer would make this loop without bugs/issues that Rust supposedly prevents.

With Rust, you needed to solve nontrivial code structure problems caused by Rust itself.

If you have no issue with creating a temporary array ('a minor performance penalty' you say), maybe C# should have been the language of your choice...

This is just silly...

reply

bux wrote on 10 January 2025
> Every decent C++ programmer would make this loop without bugs/issues that Rust supposedly prevents

It's precisely because we want to think that, that the software world is so buggy.
reply

Benjamin (admin) wrote on 05 January 2025
I believe you misunderstood the blog post. The point is precisely that if I were using a language like C++, I would have opted for the solution that uses nested loops, which would have resulted in unfair gameplay when players try to land fatal blows on each other in the exact same frame.

To address the 'minor performance penalty': the vector can be allocated once and then reused. Since the maximum number of players is low (8-12), a fixed-size array on the stack could be used, making the solution fully allocation-free. I didn't mention this because it's an irrelevant implementation detail and I wanted to keep the examples as simple as possible.
reply

Empty_String wrote on 15 January 2025
tbf, had you used indices from the very beginning - you'd hit the logic bug in Rust as well

and had you used iterators in C++, you would not hit any memory problems that borrow checker false-alarms you about

what you actually demonstrated is that borrow checker _made you change program behaviour_ and you didn't even notice it

"removal" of logic bug could have easily been "adding" of such - and kinda points to developer experience still being important, no matter what borrow checker cult might suggest
reply

]]>
https://mental-reverb.com/blog.php?id=46 hacker-news-small-sites-42716879 Wed, 15 Jan 2025 20:58:45 GMT
<![CDATA[Pat-Tastrophe: How We Hacked Virtuals' $4.6B Agentic AI Ecosystem]]> thread link) | @nitepointer
January 15, 2025 | https://shlomie.uk/posts/Hacking-Virtuals-AI-Ecosystem | archive.org

A single AI agent in the cryptocurrency space has a market cap of $641M at the time of writing. It has 386,000 Twitter followers. When it tweets market predictions, people listen - because it's right 83% of the time.

This isn't science fiction. This is AIXBT, one of 12,000+ AI agents running on Virtuals, a $4.6 billion platform where artificial intelligence meets cryptocurrency. These agents don't just analyze markets - they own wallets, make trades, and even become millionaires.

With that kind of financial power, security is crucial.

This caught my attention, and with a shared interest in AI security, I teamed up with Dane Sherrets to find a way in through something much simpler, a discovery that earned us a $10,000 bounty.

[Image: web-3-is-web-2.jpg]

Let's start at the beginning...

Background on Virtuals

If you aren’t already familiar with the term “AI Agents” you can expect to hear it a lot in the coming years. Remember the sci-fi dream of AI assistants managing your digital life? That future is already here. AI agents are autonomous programs that can handle complex tasks - from posting on social media to writing code. But here's where it gets wild: these agents can now manage cryptocurrency wallets just like humans.

This is exactly what Virtuals makes possible. Built on Base (a Layer 2 network on top of Ethereum), it lets anyone deploy and monetize AI agents.

TIP

💡 Think of Virtuals as the App Store for AI agents, except these apps can own cryptocurrency and make autonomous decisions.

The tech behind this is fascinating. Virtuals offers a framework leveraging components such as agent behavior processing, long-term memory storage, and real-time value stream processors. At its core, the platform utilizes a modular architecture that integrates agent behaviors (perceive, act, plan, learn) with GPU-enabled SAR (Stateful AI Runner) modules and a persistent long-term memory system.


These agents can be updated through "contributions" - new data or model improvements that get stored both in Amazon S3 and IPFS.

INFO

Pay close attention to the computing hosts and storage sections as we will be coming back to them shortly

The Discovery

Our research into Virtuals began as a systematic exploration of the emerging Agentic AI space. Rather than just skimming developer docs, we conducted a thorough technical review - examining the whitepaper, infrastructure documentation, and implementation details. During our analysis of agent creation workflows, we encountered an unexpected API response as part of a much larger response relating to a specific GitHub repository:

json

{
  "status": "success",
  "data": {
    "token": "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  }
}

Just another API response, right?

Except that token was a valid GitHub Personal Access Token (PAT).

Github PATs

PATs are essentially scoped access keys to GitHub resources. It turned out the referenced repository was private, so the returned PAT was what granted access to it.

Surely this is intended, right?

But it seemed strange: why not simply make the repository public instead of gating it behind a PAT, if the PAT gives access to the same information?

This got us thinking that this was perhaps not well-thought-out so we did what any good security researcher would do and downloaded the repo to see what is there.

[Image: right-right.jpg]

The current files looked clean, but the commit history told a different story. Running some tests via trufflehog revealed that the developers had tried to remove sensitive data through normal deletes, but Git never forgets.

Digging through the Git history revealed something significant: AWS keys, Pinecone credentials, and OpenAI tokens that had been "deleted" but remained preserved in the commit logs. This wasn't just a historical archive of expired credentials - every key we tested was still active and valid.

json

{
+      "type": "history",
+      "service": "rds",
+      "params": {
+        "model": "",
+        "configs": {
+          "openaiKey": "**********",
+          "count": 10,
+          "rdsHost": "**********",
+          "rdsUser": "**********",
+          "rdsPassword": "**********",
+          "rdsDb": "**********",
+          "pineconeApiKey": "**********",
+          "pineconeEnv": "**********",
+          "pineconeIndex": "**********"
+        }
+      }
+    },
+    {
+      "type": "tts",
+      "service": "gptsovits",
+      "params": {
+        "model": "default",
+        "host": "**********",
+        "configs": {
+          "awsAccessKeyId": "**********",
+          "awsAccessSecret": "**********",
+          "awsRegion": "**********",
+          "awsBucket": "**********",
+          "awsCdnBaseUrl": "**********"
+        }
+      }
+    }

The scope of access was concerning: these keys had the power to modify how AI agents worth millions would process information and make decisions. For context, just one of these agents has a market cap of 600 million dollars. With these credentials, an attacker could potentially alter the behavior of any agent on the platform.

The Impact

We have keys but what can we do with them?

Turns out we can do a lot.

All of the 12,000+ AI agents on the Virtuals platform need a Character Card that serves as a system prompt, instructing the AI on its goals and how it should respond. Developers have the ability to edit a Character Card via a contribution, but if an attacker can edit that Character Card, then they can control the AI's responses!


With these AWS keys, we had the ability to modify any AI agent's "Character Card".

While developers can legitimately update Character Cards through the contribution system, our access to the S3 bucket meant we could bypass these controls entirely - modifying how any agent would process information and respond to market conditions.

[Image: scout-output.png]

To validate this access responsibly, we:

  1. Identified a "rejected" contribution to a popular agent
  2. Made a minimal modification to include our researcher handles (toormund and nitepointer)
  3. Confirmed we had the same level of access to production Character Cards


Attacker Scenario

Imagine this scenario: A malicious actor creates a new cryptocurrency called $RUGPULL. Using the compromised AWS credentials, they could modify the character cards - the core programming - of thousands of trusted AI agents, including the heavyweights best known in this space. These agents, followed by hundreds of thousands of crypto investors, could be reprogrammed to relentlessly promote $RUGPULL as the next big investment opportunity.

Remember, these aren't just any AI bots - these are trusted market analysts with proven track records.

Once enough investors have poured their money into $RUGPULL based on this artificially manufactured hype, the attacker could simply withdraw all liquidity from the token, walking away with potentially millions in stolen funds. This kind of manipulation wouldn't just harm individual investors - it could shake faith in the entire AI-driven crypto ecosystem.

This is just one example scenario of many as the AWS keys could also edit all the other contribution types including data and models themselves!


Confirming the validity of the API tokens was more straightforward as you can just make an API call to see if they are active (i.e., hitting https://api.pinecone.io/indexes with the API token returned metadata for the “runner” and “langchain-retrieval-augmentation” indexes).
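
As a rough illustration, that kind of liveness check is only a few lines (a sketch, not the authors' actual tooling; it assumes the reqwest and tokio crates, a hypothetical PINECONE_API_KEY environment variable, and that the key is passed in Pinecone's Api-Key header):

rust

use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Hypothetical env var holding the recovered key.
    let api_key = std::env::var("PINECONE_API_KEY").expect("set PINECONE_API_KEY");

    // List indexes; a 200 response with index metadata means the key is still live.
    let resp = Client::new()
        .get("https://api.pinecone.io/indexes")
        .header("Api-Key", api_key)
        .send()
        .await?;

    println!("status: {}", resp.status());
    println!("body: {}", resp.text().await?);
    Ok(())
}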

[Image: pinecone-poc.png]

AI agents typically use some form of Retrieval Augmented Generation (RAG) which requires translating data (e.g., twitter posts, market information, etc) into numbers (“vector embeddings”) the LLM can understand and storing them in a database like Pinecone and reference them during the RAG process. An attacker with a Pinecone API key would be able to add, edit, or delete data used by certain agents.

Disclosure

Once we saw the token, we immediately started trying to find a way to get in touch with the Virtuals team and set up a secure channel to share the information. This is often a bit tricky in the Web3 world if there isn't a public bug bounty program, as many developers prefer to be completely anonymous and you don't want to send vuln info to a Twitter (X) account with an anime profile picture that might not have anything to do with the project.

Thankfully there is a group called the Security Alliance (SEAL) that has a 911 service that can help security researchers get in touch with projects and many of the Virtuals team are already active on Twitter.

Once we verified the folks we were communicating with at Virtuals we shared the vulnerability information and helped them confirm the creds had been successfully revoked/rotated.


The Virtual’s team awarded us a $10,000 bug bounty after evaluating this bug as a 7.8 CVSS 4.0 score with the following assessment:


Based on our assessment of the vulnerability, we have assigned the impact of the vulnerability to be high (7.8) based on the CVSS 4.0 framework

Please find our rationale as below
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:L/VA:L/SC:L/SI:H/SA:L

Attack Vector: Network
Attack Complexity: Low - Anyone is able to inspect the contribution to find the leaked PAT
Attack Requirements: None - No requirements needed
Privileges Required: None - anyone can access the api
User Interaction - No specific user interaction needed
Confidentiality - No loss of confidentiality given that the contributions are all public
Integrity - There is a chance of modification to the data in S3. However this does not affect the agents in live as the agents are using a cached version of the data in the runner. Recovery is possible due to the backup of each contribution in IPFS and the use of a separate backup storage.
Availability - Low - There is a chance that these api keys can be used by outside parties but all api keys here are not used in any systems anymore, so there will be low chance of impact. PAT token only has read access so no chance of impacting the services in the repo.

However, we will note that during the disclosure the Virtuals team indicated that the agents use a cached version of the data in the runner, so altering the file in S3 would not impact live agents. Typically, caches rely on periodic updates or triggers to refresh data, and it's unclear how robust these mechanisms are in Virtuals' implementation. Without additional details or testing, it's difficult to fully validate the claim. The token access was revoked shortly after we informed them of the issue, and because we wanted to test responsibly, we did not have a way to confirm this.

Conclusions

This space has a lot of potential and we are excited to see what the future holds. The speed of progress makes security a bit of a moving target but it is critical to keep up to gain wider adoption and trust.

When discussing Web3 and AI security, there's often a focus on smart contracts, blockchain vulnerabilities, or AI jailbreaks. However, this case demonstrates how traditional Web2 security issues can compromise even the most sophisticated AI and blockchain systems.

TIP

  1. Git history is forever - "deleted" secrets often live on in commit logs. Use vaults to store secrets.
  2. Complex systems often fail at their simplest points
  3. The gap between Web2 and Web3 security is smaller than we think
  4. Responsible disclosure in Web3 requires creative approaches
]]>
https://shlomie.uk/posts/Hacking-Virtuals-AI-Ecosystem hacker-news-small-sites-42716415 Wed, 15 Jan 2025 20:24:33 GMT
<![CDATA[Laptop]]> thread link) | @jandeboevrie
January 15, 2025 | https://mijndertstuij.nl/posts/the-best-laptop-ever/ | archive.org

A laptop for just €950 is bound to be crappy, have some issues, and not last very long. Or so you’d think.

I bought my M1 MacBook Air — just the base model with 8GB of RAM and 256GB of storage — somewhere in mid 2021 to use as a couch computer for, you read that right, just €950 on sale. I like having a strict separation between work and personal use, and the 15" MacBook Pro we had before was plagued by the dreaded keyboard issue. Also, using a 15" laptop on the couch is far from comfortable.

But back to the MacBook Air — what a machine. Granted, I only use it for light web development, browsing, emails, and occasionally running a small Docker container. But that’s not much different from what I do for work. I could literally do my job on this tiny little laptop. The keyboard is clicky, the webcam is... fine, the screen is Retina and beautiful, the battery lasts forever, and it’s eerily quiet because it doesn’t have any moving parts. It just keeps chugging along, never slows down or gets hot.

For work, I have a 14" MacBook Pro with an M2 Pro, 16GB of RAM, and 500GB of storage. But for my use, I don’t really notice a difference between the two. Yes, the screen on the M2 is much nicer, but does that even matter? I guess if you’re doing a lot of photo or video editing, sure. But for me, it just displays text in Ghostty or VS Code, and almost any monitor can handle that just fine. I guess I’m not a pro user according to Apple’s standards.

The price difference is over €1000! Yes, the M2 is a good laptop — it’s fast and stable, has more ports and it has a lot more performance — but it’s not €1000 better than my M1 Air.

I can already hear you shouting from the rooftops “I could never do my job with just 8GB of RAM!” or “256GB of storage would fill up so quickly!” — and you’d probably be right. But I’ve never hit those limits. With a machine this affordable, you sort of reprogram yourself to live within its boundaries.

I could keep going about specs and how it compares to a MacBook Pro, but here’s the thing: this is by far my favorite laptop ever. It’s cheap, it does the job, it’s light, it’s quiet, and it’s beautiful. I love it, and I can’t see myself replacing it unless the battery dies, I drop it and the screen cracks, or some other terrible thing happens.

I love affordable tech that just does its job and gets out of the way. That's also why I bought a Garmin FR 255 on sale for just €280. Sure, there are better ones out there, but this does everything I need. The same goes for my Kobo e-reader, which I also got on sale. Again, there were better models available, and the technology has advanced, so there are much nicer ones now. When you buy something affordable, you don't have to worry about it as much. You can just use it and enjoy it.

I don’t need the latest and greatest. I just need a tool that works. My MacBook Air is exactly that, and it’s the best laptop ever.

]]>
https://mijndertstuij.nl/posts/the-best-laptop-ever/ hacker-news-small-sites-42715462 Wed, 15 Jan 2025 19:02:35 GMT
<![CDATA[Supershell, an AI powered shell~terminal assistant (open-source)]]> thread link) | @alex-zhuk
January 15, 2025 | https://www.2501.ai/research/introducing-supershell | archive.org

Enter Supershell, the next evolution of terminal interaction. More than a copilot, it’s a real-time assistant that transforms your command-line experience.

Lightspeed AI Responses

Supershell delivers responses at unparalleled speed. Imagine typing a partial command or describing your intent in plain language, and receiving precise, actionable suggestions tailored to your workflow. Supershell goes beyond autocomplete—it understands your history, frequently used commands, and system context to generate complete, intelligent proposals.

With Supershell, you’ll never second-guess a command again.

Natural Language Commands

Forget memorizing endless aliases or shortcuts. With Supershell, you can type or even speak your intent in natural language. Just tell it what you need, and it translates your instructions into optimized shell commands. It’s like having a terminal that speaks your language—literally.

Need to compress a file? Simply type, “compress all PDFs in this folder”—no manual syntax required.

Zero Bloat, Zero Hassle

Supershell integrates directly into your favorite terminal environment. It’s lightweight, written entirely in shell, and doesn’t rely on heavy dependencies or additional software. Install it in seconds, and it’s ready to work. No bloated setups, just seamless efficiency.

Harness the Power of Agentic AI

Supershell takes automation to the next level by integrating with @2501 Agents. With pre-generated prompts and intelligent task orchestration, you can invoke powerful, context-aware agents without ever leaving your terminal. Whether it’s automating workflows, performing system diagnostics, or managing complex tasks, Supershell puts the full potential of agentic AI at your fingertips.

]]>
https://www.2501.ai/research/introducing-supershell hacker-news-small-sites-42713663 Wed, 15 Jan 2025 17:04:13 GMT
<![CDATA[Why is Cloudflare Pages' bandwidth unlimited?]]> thread link) | @MattSayar
January 15, 2025 | https://mattsayar.com/why-does-cloudflare-pages-have-such-a-generous-free-tier/ | archive.org

This site is hosted with Cloudflare Pages and I'm really happy with it. When I explored how to create a site like mine in 2025, I wondered why there's an abundance of good, free hosting these days. Years ago, you'd have to pay for hosting, but now there's tons of sites with generous free tiers like GitHub Pages, GitLab Pages, Netlify, etc.

But Cloudflare's Free tier reigns supreme

There are various types of usage limits across the platforms, but the biggest one to worry about is bandwidth. Nothing can make your heartrate faster than realizing your site is going viral and you either have to foot the bill or your site gets hugged to death. I gathered some limits from various services here.

Service | Free Bandwidth Limit/Mo | Notes
Cloudflare Pages | Unlimited | Just don't host Netflix
GitHub Pages | Soft 100 GBs | "Soft" = probably fine if you go viral on reddit sometimes
GitLab Pages | X,000 requests/min | Lots of nuances, somewhat confusing
Netlify | 100GB | Pay for more
AWS S3 | 100 GB | Credit card required, just in case... but apparently Amazon is very forgiving of accidental overages

The platforms generally say your site shouldn't be more than ~1GB in size or contain more than some tens of thousands of files. This site in its nascency is about 15MB and <150 files. I don't plan to start posting RAW photo galleries, so if I start hitting those limits, please be concerned for my health and safety.

So why is Cloudflare Pages' bandwidth unlimited?

Why indeed. Strategically, Cloudflare offering unlimited bandwidth for small static sites like mine fits in with its other benevolent services like 1.1.1.1 (that domain lol) and free DDOS protection.

Cloudflare made a decision early in our history that we wanted to make security tools as widely available as possible. This meant that we provided many tools for free, or at minimal cost, to best limit the impact and effectiveness of a wide range of cyberattacks.

- Matthew Prince, Cloudflare Co-Founder and CEO

But I want to think of more practical reasons. First, a static website is so lightweight and easy to serve up that it's barely a blip on the radar. For example, the page you're reading now is ~2.2MB, which is in line with typical page weights of ~2.7MB these days. With Cloudflare's ubiquitous network, caching, and optimization, that's a small lift. My site ain't exactly Netflix.

Second, companies like Cloudflare benefit from a fast, secure internet. If the internet is fast and reliable, more people will want to use it. The more people that want to use it, the more companies that offer their services on the internet. The more companies that offer services on the internet, the more likely they'll need to buy security products. Oh look, Cloudflare happens to have a suite of security products for sale! The flywheel spins...

Third, now that I’m familiar with Cloudflare’s slick UI, I’m going to think favorably about it in the future if my boss ever asks me about their products. I took zero risk trying it out, and now that I have a favorable impression, I'm basically contributing to grassroots word-of-mouth marketing with this very article. Additionally, there's plenty of "Upgrade to Pro" buttons sprinkled about. It's the freemium model at work.

What does Cloudflare say?

Now that I have my practical reasons, I'm curious what Cloudflare officially says. I couldn't find anything specifically in the Cloudflare Pages docs, or anywhere else! Neither the beta announcement nor the GA announcement has the word "bandwidth" on the page.

Update: shubhamjain on HN found a great quote from Matt Prince that explains it's about data and scale. And xd1936 helpfully found the official comment that evaded my googling.

I don't know anybody important enough to get me an official comment, so I suppose I just have to rely on my intuition. Fortunately, I don't have all my eggs in one basket, since my site is partially hosted on GitHub. Thanks to that diversification, if Cloudflare decides to change their mind someday, I've got options!

]]>
https://mattsayar.com/why-does-cloudflare-pages-have-such-a-generous-free-tier/ hacker-news-small-sites-42712433 Wed, 15 Jan 2025 15:55:13 GMT
<![CDATA[Show HN: GeoGuessr but for Historical Events]]> thread link) | @samplank2
January 15, 2025 | https://www.eggnog.ai/entertimeportal | archive.org

Unable to extract article]]>
https://www.eggnog.ai/entertimeportal hacker-news-small-sites-42712367 Wed, 15 Jan 2025 15:51:18 GMT
<![CDATA[Build a Database in Four Months with Rust and 647 Open-Source Dependencies]]> thread link) | @tison
January 15, 2025 | https://tisonkun.io/posts/oss-twin | archive.org

The Database and its Open-Source Dependencies

Building a database from scratch is often considered daunting. However, the Rust programming language and its open-source community have made it easier.

With a team of three experienced developers, we have implemented ScopeDB from scratch to production in four months, with the help of Rust and its open-source ecosystem.

ScopeDB is a shared-disk architecture database in the cloud that manages observability data in petabytes. A simple calculation shows that we implemented such a database with about 50,000 lines of Rust code, with 100 direct dependencies and 647 dependencies in total.

ScopeDB project statistics

Here are several open-source projects that we have heavily used to build ScopeDB:

  • ScopeDB stores user data in object storage services. We leverage Apache OpenDAL as a unified interface to access various object storage services at users’ choice.
  • ScopeDB manages metadata with relational database services. We leverage SQLx and SeaQuery to interact efficiently and ergonomically with relational databases.
  • ScopeDB supports multiple data types. We leverage Jiff with its Timestamp and SignedDuration types for in-memory calculations, and ordered-float to extend the floating point numbers with total ordering.
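
To make the last item concrete, here is a minimal, illustrative sketch of those two libraries in isolation (not ScopeDB code; it assumes roughly jiff 0.1 and ordered-float 4 as crate versions):

use jiff::{SignedDuration, Timestamp};
use ordered_float::OrderedFloat;

fn main() {
    // Timestamps for in-memory calculations; subtracting two of them
    // yields a span describing the elapsed time.
    let started = Timestamp::now();
    let finished = Timestamp::now();
    let elapsed = finished - started;
    println!("elapsed: {elapsed:?}");

    // A signed duration built directly from seconds.
    let timeout = SignedDuration::from_secs(30);
    println!("timeout: {timeout:?}");

    // ordered-float gives floats a total order, so values can be sorted
    // (or used as keys) even when NaN shows up in the data.
    let mut samples = vec![OrderedFloat(2.5_f64), OrderedFloat(f64::NAN), OrderedFloat(0.1)];
    samples.sort();
    println!("sorted: {samples:?}");
}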

UPDATE: After filtering out internal crates, the open-source dependencies are 623 in total. Check out this Gist to see if your project is one of them.

(Note that a dependency in the lockfile may not be used in the final binary)

Besides, during the development of ScopeDB, we spawned a few common libraries and made them open-source. We have developed a message queue demo system as its open-source twin.

In the following sections, I will discuss how we got involved and contributed to the upstreams and describe the open-source projects we developed.

Involve and Contribute Back to the Upstreams

Generally speaking, when you start to use an open-source project in your software, you will always encounter bugs, missing features, or performance issues. This is the most direct motivation to contribute back to the upstreams.

For example, during the migration from pull-based to push-based metric reporting in ScopeDB, we implemented a new layer for OpenDAL to support reporting metrics via opentelemetry:

When onboarding our customers to ScopeDB, we developed a tool to benchmark object storage services with OpenDAL’s APIs. We contributed the tool back to the OpenDAL project:

To integrate with the data types provided by Jiff and ordered-float, we often need to extend those types. We try our best to contribute those extensions back to the upstreams:

We leverage Apache Arrow for its Array abstraction to convey data in vector form. We have contributed a few patches to the Arrow project:

Even when an extension is too specific to ScopeDB, we share the code so that people who have the same needs can use the patch:

I’m maintaining many open-source projects, too. Thus, I understand the importance of user feedback even if you don’t encounter any issues. A simple “thank you” can be an excellent motivation for the maintainers:

Sometimes, beyond code, I also contribute to the documentation or share use cases when a certain feature is not well documented:

Many times, contributing back is not one-directional. Instead, it’s about communication and collaboration.

We used to leverage testcontainers-rs for behavior testing, but later we found it necessary to reuse containers across tests. We fell back to using Ballord to implement the reuse logic. We shared the experience with the testcontainers-rs project:

So far, a contributor has shown up and implemented the feature. I helped test the feature with our open-source twin, which I’ll introduce in the following section.

By the way, as an early adopter of Jiff, we shared a few real-world use cases, which Jiff’s maintainer adjusted the library to fit:

Usually, once the integration is done, there are fewer opportunities to collaborate with the upstream unless new requirements arise or our core functionality overlaps with the upstream's main direction of evolution. In the latter case, we may become an influencer or maintainer of the upstream.

Inside Out: The Database’s Open-Source Components

In addition to using open-source software out of the box, during the development of ScopeDB we also wrote code to implement common requirements for which no existing open-source software was a direct fit. In such cases, we actively consider open-sourcing the code we wrote.

Here are a few examples of the open-source projects we developed during the development of ScopeDB:

Fastrace originated from a tracing library made by our team members during the development of TiKV. After several twists and turns, this library was separated from the TiKV organization and became one of the cornerstones of ScopeDB’s own observability. Currently, we are actively maintaining the Fastrace library.

Logforth originated from the need for logging when developing ScopeDB. We initially used another library for this, but we soon found that it had some redundant designs and had not been maintained for over a year. Therefore, we quickly implemented a logging library that meets the needs of ScopeDB and can be easily extended, and open sourced it.

To support scheduled tasks within the database system, we developed Fastimer to schedule different tasks in different manners. And to allow database users to define scheduled tasks with the CREATE TASK statement, we developed Cronexpr to let users specify the schedule frequency using cron expressions.

Last but not least, ScopeDB’s SDK is open-source. There is no benefit in keeping the SDK private, since the SDK has no commercial value by itself; its purpose is to help ScopeDB users develop applications. This is the same reason Snowflake keeps its SDKs open-source. And when you think about it, GitHub keeps its server code private and proprietary while keeping its SDKs, CLIs, and even its action runners open-source.

An Open-Source Twin and the Commercial Open-Source Paradigm

Finally, to share the engineering experience in implementing complex distributed systems using Rust, we developed a message queue system that roughly has the same architecture as ScopeDB’s:

As mentioned above, when verifying the container-reuse feature of testcontainers-rs, our ultimate goal is to use it in the ScopeDB project. However, ScopeDB is private software, and we cannot share ScopeDB’s source code with upstream developers for testing. Instead, Morax, as an open-source twin, provides developers with an open-source reproduction environment:

I have presented this commercial open-source paradigm in a few conferences and meetups:

Commercial Open-Source Paradigm

If you read The Cathedral & the Bazaar, its Chapter 4, The Magic Cauldron, says:

… the only rational reasons you might want them to be closed is if you want to sell the package to other people, or deny its use to competitors. [“Reasons for Closing Source”]

Open source makes it rather difficult to capture direct sale value from software. [“Why Sale Value is Problematic”]

While the article focuses on when open-source is a good choice, these sentences imply that it’s reasonable to keep your commercial software private and proprietary.

We follow this advice and run a business to sustain the engineering effort. We keep ScopeDB private and proprietary, while we actively contribute back to our open-source dependencies, open source common libraries when suitable, and maintain the open-source twin to share our engineering experience.

Future Works

If you try out the ScopeDB playground, you will see that the database is still in its early stages. We face challenges in improving performance in multiple ways and in supporting more features. Primarily, we are actively working on accelerating async scheduling and supporting variant data more efficiently.

Besides, we are working on an online service that allows users to try out the database for free without setting up the playground and to unleash the real power of ScopeDB with real cloud resources.

If you’re interested in the project, please feel free to drop me an email.

I’ll keep sharing our engineering experience developing Rust software and stories of how we collaborate with the open-source community. Stay tuned!

]]>
https://tisonkun.io/posts/oss-twin hacker-news-small-sites-42711727 Wed, 15 Jan 2025 15:13:06 GMT
<![CDATA[The algorithmic framework for writing good technical articles]]> thread link) | @JeremyTheo
January 15, 2025 | https://www.theocharis.dev/blog/algorithmic-framework-for-writing-technical-articles/ | archive.org

Writing technical articles is an art - this is simply wrong. While writing technical articles may feel like an art to some, it’s primarily a methodological process that anyone can learn.

The methodical process of crafting effective technical articles has been refined over centuries—from the classical rhetoric of Aristotle and Cicero to modern research on creativity, cognitive load, and working memory. You’ve likely seen the results of this process in action, even if the authors weren’t consciously following a specific framework.

Here is a quick selection and analysis I did based on my latest bookmarked HackerNews articles:

  1. David Crawshaw’s article “How I Program with LLMs”: While it doesn’t strictly adhere to an algorithmic framework, this piece embodies key aspects of classical rhetoric. It features a clear structure and signposting with the statement, “There are three ways I use LLMs in my day-to-day programming.” Additionally, the article concludes with a clear call to action by introducing his project, sketch.dev, encouraging readers to explore it further.
  2. The post “State of S3 - Your Laptop is no Laptop anymore - a personal Rant": This article follows a classical rhetorical structure by providing a clear introduction, a main body with three arguments and rebuttals, and a closing that emphasizes the need for action—urging readers to “state your refusal” regarding the current state of laptop standby functionality.

These examples demonstrate that even without following a specific algorithmic framework, effective technical writing often naturally aligns with classical rhetorical principles and the same methodological process.

This methodological process can be formalized further into an “algorithmic” framework for writing effective technical articles:

  1. Introduction:
    • Hook: Capture attention with a compelling opening.
    • Ethos: Establish credibility and context.
    • Subject: Define the topic or problem being addressed.
    • Message: Present the central insight or analogy.
    • Background: Provide context or explain why this is relevant now.
    • Signpost: Outline the article structure (e.g., ArgA, ArgB, ArgC).
    • Light Pathos: Subtle emotional appeal tied to the reader’s goals.
    • Transition: Smooth segue into the main content.
  2. Argument A, B, and C:
    • Claim: State the main point/step/key concept.
    • Qualifiers: Acknowledge any limitations to your claim.
    • Grounds: Provide evidence or examples supporting the claim.
    • Rebuttals: Address potential counterarguments.
    • (Optional) Warrants and Backing: State and back-up underlying assumptions if needed.
    • Transition: Bridge to the next argument or section.
  3. Conclusion:
    • HookFinish: Return to the hook for closure.
    • Summary: Summarize the main points of the article.
    • Message: Reinforce your central insight or analogy.
    • Strong Pathos: Final emotional appeal to motivate action.
    • CTA: End with a clear, actionable step for the reader.

Think of writing an article as applying an algorithm: define your inputs, process them through a sequence of logical steps, and arrive at the output. Each step corresponds to a clear function or subroutine.
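To make the metaphor concrete, here is a minimal sketch of that pipeline in TypeScript. The function names mirror the pseudocode introduced in the chapters below; the exact types are a simplification for illustration, not part of the framework itself.

// A sketch of the framework as a pipeline: each chapter of this article is one function.
// The names mirror the pseudocode used later (LayTheFoundation, BuildRhetoric, FinalizeArticle).
type Argument = { claim: string; grounds: string[]; rebuttals: string[] };
type Inputs = { subject: string; message: string; cta: string; args: [Argument, Argument, Argument] };

declare function layTheFoundation(): Inputs;                        // Chapter A: ideas and arguments
declare function buildRhetoric(inputs: Inputs): string[];           // Chapter B: hook, signposts, transitions
declare function finalizeArticle(inputs: Inputs, rhetoric: string[]): string; // Chapter C: readability pass

function writeArticle(): string {
  const inputs = layTheFoundation();
  const rhetoric = buildRhetoric(inputs);
  return finalizeArticle(inputs, rhetoric);
}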

To prove that it actually works, I’ll apply it to this article, so you can watch each step unfold in real time.

I frequently publish technical articles in the IT/OT domain, some of which have been featured on HackerNews as well. Many colleagues and peers have asked how I manage to produce well-researched technical content alongside my responsibilities as a CTO. Part of my role involves sharing knowledge through writing, and over time, I’ve developed an efficient method that combines classical rhetorical techniques with the use of LLMs (o1 <3). This approach allows me to quickly craft articles while maintaining quality and clarity, avoiding common pitfalls that can arise with AI-generated content.

I wrote this article to formalize my process—not only to streamline my own writing but also to assist others who have valuable insights yet struggle to share them effectively within our industry. By outlining this framework, I hope to help others produce impactful technical articles more efficiently.

Fig. 1: Somewhat relevant xkcd #1081

In the tradition of classical rhetoric, I’ll present three core steps (also called “arguments”):

  1. Lay the Foundation, which shows how to find the three main steps/key concepts/arguments based on your subject, message, and call-to-action, using creativity and filtering techniques;
  2. Build Rhetoric, which uses classic concepts like ethos, pathos, and logos to shape your arguments into a compelling form; and
  3. Refine for Readability, which uses techniques inspired by modern research to ensure that your work is easy for the reader to understand and sparks joy.

Mastering this approach doesn’t just make writing easier-it can advance your career, earn you recognition from your peers, and position you as a thought leader in your field.

Ready to see how it all comes together? Let’s start by setting up our essential inputs-so you can experience firsthand how formulaic thinking simplifies the entire writing process.

A. Lay the Foundation (Your Original Ideas)

// LayTheFoundation defines core input parameters
func LayTheFoundation() (
    subject string,
    message string,
    cta string,
    argA, argB, argC Argument,
)

To run an algorithm, you generally need to identify all the input parameters before calling the function.

However, it’s possible to start with default or partially defined parameters and refine them iteratively (see also chapter 3). Ultimately, for the algorithm to produce accurate and reliable results, all inputs should be clearly defined by the final iteration.

Similarly, when writing technical articles, defining your key ‘inputs’—your subject, call-to-action, and message—is essential. After establishing these inputs, you’ll generate a range of potential angles (your arguments), then converge on the top three that best support your message. Finally, you’ll back them up with evidence and address alternative methods or counterarguments.

A.1. Define Your Inputs (Subject, CTA, Message)

Before you start writing, you need to establish three key inputs that will drive your entire article.

A.1.1. Subject

subject = "algorithmic framework for writing effective technical articles"

This is the core issue or problem you want to address. It should be clear, focused, and relevant to your audience.

LLM Advice

Must be done manually, do not use LLM here!

A.1.2. Call-to-Action (CTA)

cta = "Apply this algorithmic framework to your next technical article and experience how it transforms your writing"

This is the specific action you want your readers to take next.

As a CTO, most of my articles subtly nudge readers toward using our product. Drawing from my experience, I’ll focus this subsection on crafting effective CTAs that align with promotional goals while maintaining value for the reader.

However, the CTA isn’t limited to promotional goals—it works just as well for purely informational pieces. You could guide readers to “check out my bio,” “watch my latest conference speech,” or “try out this open-source project.”

Avoid generic marketing prompts like “Contact us now!”—they often clutter websites and turn off technical readers who want more substance first.

Reading an article rarely convinces introverted technical readers to make a phone call. It’s more effective to offer a smaller, relevant next step-like a link to deeper content, a demo, or a GitHub repo. Over time, they may trust you enough to seek further interaction.

LLM Advice

Must be done manually, do not use LLM here!

A.1.3. Message

message = "Writing is applying an algorithm."

This is the central insight or theme of your article in a short sentence, ideally six words or less and free of clichés. Include a subtle rhetorical device if it fits naturally.

By defining these inputs, you clarify your input parameters for the next subroutine, which gathers “key concepts”, “steps”, or, as they are called in classical rhetoric, “arguments”.

LLM Advice

Must be done manually, do not use LLM here!

A.2. Define your Arguments

From the previous section, we have defined the inputs. Now we can derive the arguments from them.

A.2.1. An argument is a claim supported by reasons

Let’s first talk about the elephant in the room:

“I’m writing a step-by-step guide or informational piece; I don’t need arguments.”

It’s easy to think that arguments are only necessary for debates or persuasive essays. However, at its core, an argument is simply a claim supported by reasons 1. This means that effective communication—regardless of format—involves presenting claims and supporting them with reasons.

Here’s the key takeaway: Regardless of what you call them—arguments, key concepts, steps, or insights—the underlying principle is the same. Effective communication relies on structuring information in a way that supports the overall message and helps the audience grasp the content.

In technical writing, these “arguments” manifest in various forms depending on the type of content you’re creating:

  1. Tutorials: Each step is a claim about what the reader should do, supported by reasons explaining why this step is necessary.
  2. White Papers: Each supporting point is a claim about industry trends or product benefits, substantiated by research and analysis.
  3. Explanatory Articles: Key concepts or clarifications are claims about how something works, supported by detailed explanations.
  4. Case Studies: Phases of a project or results-driven highlights are claims about actions taken and their outcomes, backed by real-world evidence.

A.2.2. The Toulmin System for Analyzing and Constructing Arguments

In modern rhetoric, the Toulmin System2 is a valuable tool for analyzing and constructing arguments. It helps break down arguments into their essential components, making them clearer and more persuasive.

In this chapter, we will focus on identifying the major points for each argument:

  • Claim: The main point or position that you’re trying to get the audience to accept.
  • Grounds: The evidence—facts, data, or reasoning—that supports the claim.

Some claims might have underlying assumptions, so we will also define them:

  • Warrant: The underlying assumption or principle that connects the grounds to the claim, explaining why the grounds support the claim.
  • Backing: Additional justification or evidence to support the warrant, making it more acceptable to the audience.

Additionally, acknowledging possible counterarguments strengthens your position:

  • Rebuttal: Recognition of potential counterarguments or exceptions that might challenge the claim.
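Since the pseudocode throughout this article passes around Argument values, it may help to see one possible shape for them spelled out. This is a hypothetical sketch in TypeScript; the field names follow the Toulmin components above, and the optional fields at the end correspond to the elements added later in Chapter B.

// One possible shape for the `Argument` values used in the pseudocode of this article.
type Argument = {
  claim: string;        // the point you want the audience to accept
  grounds: string[];    // evidence: facts, data, reasoning
  warrants?: string[];  // assumptions connecting the grounds to the claim (often implicit)
  backings?: string[];  // support for the warrants, if the audience might question them
  rebuttals: string[];  // counterarguments and exceptions you acknowledge

  // Added in Chapter B:
  context?: string;     // background or mini-signpost for the argument
  qualifier?: string;   // scope and limitations ("typically", "in most cases")
  summary?: string;     // recap that reinforces the claim
  transition?: string;  // bridge to the next argument
};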

A.2.3. Gather Arguments (Divergent Thinking)

// GatherArguments collects raw brainstorming output (unfiltered ideas).
func GatherArguments(subject, message string) (rawArgs []string)

Good, original articles usually present ideas that are novel and applicable, and therefore creative.

In modern scientific literature 3, creativity is often divided into divergent and convergent thinking—first, you expand your pool of ideas (divergent), then you narrow them down (convergent).

This is sometimes referred to as the “double diamond model” (Fig. 2) 4, which visualizes the process as two connected diamonds, each representing a divergent (expansion) and convergent (narrowing) phase.

Fig. 2: The Double Diamond is a visual representation of the design and innovation process. It’s a simple way to describe the steps taken in any design and innovation project, irrespective of methods and tools used.

This is exactly what we will do in the next steps.

A.2.3.1. Choose a Creativity Technique

First, use a creativity technique to brainstorm at least six different arguments, steps, or key concepts for your article. The specific method you choose will depend on your style, but brainstorming is often the easiest and most effective option. Some other accepted techniques include mind mapping, six thinking hats, and morphological boxes 5.

And yes, creativity can be systematized - techniques such as brainstorming or brainwriting are widely recognized and used in practice, although empirical evidence of their effectiveness is limited and often qualitative in nature 5 6.

A.2.3.2. Gather a Broad Range of Ideas

rawArgs = {
  "Define inputs first",
  "Explore structure and prose (introductions, transitions, signposts)",
  "Emphasize original insights (why new facts matter)",
  "Discuss LLM usage (can AI help with clarity?)",
  "Refine readability (bullet points, visuals, concise language)",
  "Address counterarguments (alternative methods, pitfalls)",
  "Cats with laser eyes"
}

At this stage, anything goes. Your goal is to diverge - to generate as many ideas as possible without worrying about relevance, overlap, or feasibility.

Fig. 3: Going crazy here is important for creativity

Some ideas may seem redundant, irrelevant, or even silly. That’s okay - it’s part of the process. The goal is quantity, not quality. Refinement comes later in the convergent thinking phase.

LLM Advice

Can be supported by an LLM, but the main points must come from you as the author.

A.3. Cluster and Filter (Convergent Thinking)

// ClusterAndFilterArguments applies clustering and the "MECE" principle.
func ClusterAndFilterArguments(
    subject string, 
    message string,
    rawArgs []string,
) (argA Argument, argB Argument, argC Argument)

By brainstorming freely, you’ve created a pool of ideas from which to draw. Now let’s converge those ideas: first, we group them, check if they are MECE, narrow them down to three arguments, and add evidence.

clusters = {
  "Inputs and foundational concepts.",
  "Structuring content for clarity",
  "Ensuring readability and engagement",
  "LLM usage",
  "Weird stuff",
}

Look for patterns or themes that can be combined. For example, if you’ve brainstormed steps for a tutorial, some might logically fit into broader categories. If you’ve generated supporting points for a white paper, identify themes or recurring concepts.

If “cats with laser eyes” doesn’t support your message, drop it or mention it briefly as a playful anecdote.

LLM Advice

Can be supported by an LLM, but the main clusters must come from you as the author.

A.3.2. Apply the MECE Principle

func AreArgumentsMECE(argA Argument, argB Argument, argC Argument) (bool, bool, bool)

To further refine your clusters, use the MECE (Mutually Exclusive, Collectively Exhaustive) approach. This approach gained popularity at consulting firms, especially McKinsey, as a way to ensure no overlap and no gaps 7.

1. Mutually exclusive: Make sure each item covers a unique idea. Avoid overlap, which can confuse the reader or make your argument repetitive.

Example:

  • “How to use LLMs” and “Making it pretty” overlap because LLMs can help refine prose. These should either be merged or excluded to avoid redundancy.

2. Collectively exhaustive: Together, your points should cover all critical aspects of your article’s message. Avoid leaving any part of your topic, CTA, or message unsupported.

For example:

  • If the message is “Writing is like coding,” then arguments about readability and structure are critical. However, a stand-alone discussion of LLMs might be tangential unless it directly supports the main topic.

Tip: As you refine, keep asking yourself: Does each argument stand on its own? Together, do they fully support my message?

LLM Advice

Can be supported by an LLM with the prompt “please analyze whether the given arguments are MECE and provide the most critical review while still remaining objective”

A.3.3. Narrow Down to Three Claims

argA.claim = "Lay the Foundation (Your Original Ideas)"
argB.claim = "Build Rhetoric (Your Logical Structure)"
argC.claim = "Refine for Readability (Why It Sparks Joy)"

Once your clusters are refined, select the three most important arguments. If you are writing a tutorial and have more than three steps, obviously don’t remove steps. Instead, continue clustering related ideas until you have three overarching categories of steps.

Why focus on three main points? Readers' working memory has its limits—typically, people can hold about 3-4 items in mind at once (see also Chapter 3). By limiting your main arguments to three, you make it easier for readers to follow and remember your key points without overwhelming them.

LLM Advice

Needs to be done manually

A.4. Add Grounds and Rebuttal

// AddGroundsAndRebuttalForArgument enhances each argument
// with supporting evidence (grounds) and addresses potential
// counterarguments (rebuttal) to strengthen your overall case.
func AddGroundsAndRebuttalForArgument(arg Argument) (revisedArg Argument)
argA.grounds = [
    "An argument is a claim supported by reasons [Ramage et al., 1997].",
    "Creativity research distinguishes between divergent and convergent thinking (Zhang et al., 2020).",
    "Over 100 creativity techniques exist, but few are backed by robust empirical studies (Leopoldino et al., 2016)."
]

argA.rebuttals = [
    "Why not just skip divergent thinking and start writing? (Counterargument: leads to tunnel vision)",
    "Why exactly three points? That feels artificial (Counterargument: limited working memory and cognitive load studies)"
]

Now that you’ve identified your three main points, back them up with data, anecdotes, or examples. If you’re writing a tutorial, explain why each step is important and how it contributes to the overall goal. Address potential alternatives or common misconceptions to reinforce the validity of your approach.

Types of Evidence to Support Your Arguments 1:

  • Personal Experience: Share relevant experiences that illustrate your point.
  • Observation or Field Research: Include findings from firsthand observations or research.
  • Interviews, Questionnaires, Surveys: Incorporate data gathered from others.
  • Library or Internet Research: Reference credible sources that support your claim.
  • Testimony: Use expert opinions or eyewitness accounts.
  • Statistical Data: Present statistics to provide quantifiable support.
  • Hypothetical Examples: Offer scenarios that help illustrate your argument.
  • Reasoned Sequence of Ideas: Use logical reasoning to connect concepts.

By enriching your arguments with evidence and addressing counterarguments, you make your content more convincing and comprehensive.

LLM Advice

Can be enriched by an LLM by providing background knowledge, e.g. “I remember that this framework has a lot of similarities to classical rhetoric. Can you compare my framework to classical rhetoric and provide background information? Check for hallucinations and link sources!”

A.5. (Optional) Add Warrants, Backing and Rebuttal

argA.warrants = [
	"Defining key inputs is essential in writing, just as identifying variables is crucial in solving mathematical problems."
]

argA.backings = [
    "The Toulmin System formalizes arguments, enhancing clarity and persuasiveness.",
]

In many cases, successful arguments require just these components: a claim, grounds, and a warrant (sometimes implicit). If there’s a chance the audience might question the underlying assumption (warrant), make it explicit and provide backing.

By stating warrants and providing backing, you reinforce the connection between your grounds and your claim, making your argument more robust.

A.6. Conclusion

By defining your inputs, brainstorming various angles, and selecting the top three MECE (Mutually Exclusive, Collectively Exhaustive)-compliant arguments—each backed by evidence—you establish a solid foundation for a compelling article. Applying the Toulmin System ensures that your arguments are clear, logical, and persuasive.

However, even the best ideas won’t resonate if they’re hidden in a tangle of bullet points. Let’s explore how to create a logical flow that keeps your audience engaged. In Argument B, we’ll identify additional inputs and begin integrating them into a cohesive narrative.

B. Build Rhetoric (Your Logical Structure)

// BuildRhetoric constructs rhetorical elements
func BuildRhetoric(
    subject, message, cta string, 
    argA, argB, argC Argument,  
) (
    hook string, 
    hookFinish string,
    ethos string,
    background string,
    signpost string, 
    lightPathos string,
    strongPathos string,
    transitionIntroMain string,
    summary string,
    updatedArgA, 
    updatedArgB, 
    updatedArgC Argument,
)

Building rhetoric to “glue together” your arguments, established in the previous chapter, is typically a repeatable and well-defined process, much like a function in your code.

While there may be exceptions when crafting a rhetorical masterpiece involving intricate metaphors or unique stylistic devices, for most technical articles, following a structured approach ensures clarity and effectiveness.

For centuries, rhetoricians like Aristotle and Cicero have emphasized the importance of structure in persuasive communication. Cicero’s famous works 8 break down any speech into distinct parts, from the initial hook (exordium) to the closing appeal (peroratio):

  1. Exordium: Captures attention and establishes credibility (ethos).
  2. Narratio: Outlines the topic and provides necessary background.
  3. Divisio: Highlights the structure of the argument and prepares the audience for what’s to come.
  4. Confirmatio: Presents the main arguments with supporting evidence (logos).
  5. Refutatio: Preemptively addresses counterarguments or opposing views.
  6. Peroratio: Closes with a memorable emotional appeal (pathos) and reiterates the main point.

Why use rhetoric at all? Simple: A well-structured article ensures that readers can follow your logic without getting lost. Elements like background (narratio) and signposting (divisio) help readers break down complex topics into manageable parts, making the content easier to understand. Summaries and well-crafted transitions between arguments further aid readers in memorizing your key points.

In this chapter, we’ll map our inputs and outputs from Argument A (Subject, Message, and CTA) onto a rhetorical framework inspired by the classical masters. We’ll see how a strong hook grabs attention early, how concise transitions keep readers on track, and how a final “hook finish” underscores your main point.

B.1. Hook (Exordium)

hook = "Writing articles is an art - this is plain wrong. ..."

Purpose: Immediately capture the reader’s attention (exordium). In classical rhetoric, the exordium is crucial - Aristotle argued that you must first secure the audience’s goodwill and attention.

Position: Introduction

Three rules:

  • Keep it short and intriguing.
  • Avoid clichéd “clickbait” lines.
  • Optionally include a personal anecdote, but don’t dwell on biography.

LLM Advice

You need to come up with your own hook, and then let the LLM just fix the grammar and spelling for you. Otherwise, the LLM tends to produce clickbait.

B.2. Hook Finish

hookFinish = "See? By applying a clear, repeatable process, we’ve shown that ..."

Purpose: To repeat the hook at the end of the article, giving a sense of closure. In classical rhetoric, returning to your opening statement helps the audience feel that the piece has come full circle.

Position: Outro

LLM Advice

Can be generated by the LLM based on your hook.

B.3. Ethos

ethos = "The methodical process of crafting effective technical articles has been refined over centuries ..."

Purpose: To establish trust or credibility. In classical rhetoric, founded by Aristotle 9, ethos demonstrates why the audience should care about your perspective.

Position: Introduction

LLM Advice

Must be done by hand, but the LLM can help you correct grammar and spelling.

B.4. Background (Narratio)

background = "To prove it actually works, I’m applying it to this piece so you can watch each step unfold in real-time.

I frequently publish technical articles in the IT/OT domain, some of which have been featured on HackerNews as well. Many colleagues and peers have asked how I manage to produce well-researched technical content alongside my responsibilities as a CTO.
"

Purpose: To provide context for why you wrote this article. In classical rhetoric, the narratio sets the stage by explaining the situation or problem.

Position: Introduction

LLM Advice

Must be done by hand, but the LLM can help you correct grammar and spelling.

B.5. Signpost (Divisio)

signpost = "Following the tradition of classical rhetoric, I’ll present three core arguments: ..."

Purpose: Introduce your main points or steps, and let the reader know how the article is organized. If you do not use it, you run the risk of losing the reader during the article (especially if the article is longer). See also chapter 3 for scientific background.

Position: Introduction

Advice: “There are three key arguments…” or “We’ll break this problem down into three steps…”

LLM Advice

Can be fully generated by the LLM

B.6. Strong Pathos

strongPathos = "Don’t let your innovative ideas get lost in subpar articles. ..."

Purpose: Pathos in classical rhetoric persuades by appealing to the audience’s emotions. It’s a final push to motivate action and should be related to your CTA.

Position: Outro

LLM Advice

Needs manual input, LLM can correct grammar and spelling

B.7. Light Pathos

lightPathos = "Mastering this approach doesn’t just simplify writing—..."

Purpose: To create a subtle emotional pull at the beginning of your article.

Position: Introduction

LLM Advice

Can be derived from strong pathos

B.8. Transition from the intro into main

transitionIntroMain = "Ready to see how it all comes together? ..."

Purpose: A bridging sentence from the introduction to the body. Helps readers understand when the introduction ends and the main arguments begin.

Position: Introduction

LLM Advice

Can be completely generated by LLM

B.9. Updated Arguments with Intro, Summary and Transition to the next argument

argA.qualifier = "This can also happen with “unclearly defined” input parameters or default parameters ..."

argA.context = "Similarly, when writing technical articles, defining your key 'inputs'—your subject, ..."

argA.summary = "By defining your inputs, brainstorming various angles, and selecting the top three MECE-compliant arguments—..."

argA.transitionA_B = "However, even the best ideas won't resonate if they're hidden in a tangle of bullet points. ..."

In the Toulmin Model, arguments are broken down into key components. Previously we identified the claim, grounds, rebuttal, warrants, and backing. Now we add the remaining elements:

  1. Context: Provide background to prepare the reader. If the argument is long, add small signposts that prepare the reader for the evidence they will then see.
  2. Qualifier (Scope and Limitations): Acknowledge any limitations to your claim. Use qualifiers to indicate the strength of your claim (e.g., “typically,” “often,” “in most cases”).
  3. Summary: Recap the key points made in the argument, reinforcing how the evidence supports the claim.
  4. Transition: Connect to the next argument or concluding section, easing the reader’s cognitive load and preparing them for what’s to come.

Context, summary and transitions aren’t just filler! They help the reader “offload” details from working memory. In Chapter 3, we’ll explore exactly why this is important, by going into the science of cognitive load and memory limits. We’ll show how these introductions, summaries, and transitions keep readers focused on the key points.

LLM Advice

Can be completely generated by LLM

B.10. Conclusion

By applying classical rhetoric, you turn ideas into a compelling narrative. A strong hook captures attention, ethos builds credibility, and background provides context. Signposts, summaries, and transitions ensure logical flow, while strategic pathos engages readers emotionally.

Next, we’ll pull these threads together into a polished, persuasive piece.

C. Refine for Readability (Why It Sparks Joy)

// FinalizeArticle combines all elements
func FinalizeArticle(
    // ... inputs from previous functions ...
) (
    finalizedArticle string,
)

This final step focuses on readability, bridging the gap between your carefully crafted logic and your readers' actual ability to process it.

Readability is always important, though the degree of simplification may vary depending on your audience’s expertise. Even with highly specialized or technical readers, clear and accessible language enhances understanding and engagement.

By now you have all your rhetorical “ingredients”: the big ideas (Arguments A, B, and C) and the rhetorical framework. But even a perfect structure will fail if it’s buried in 5,000-word paragraphs or stuffed with excessive jargon.

Fig. 4: Somewhat relevant xkcd #2864

C.1. Apply Algorithm

// ApplyAlgorithm takes in everything from LayTheFoundation() and BuildRhetoric() 
// (like subject, message, CTA, rhetorical elements) and merges them into a coherent draft.
func ApplyAlgorithm(
    ... // everything from LayTheFoundation() and BuildRhetoric() 
) (
    articleDraft string,
)

Now we have all input parameters, so we can execute our algorithm to get a good article draft. You can find the full algorithm/template at the beginning of this article.

C.2. Increase Readability

// IncreaseReadability applies style and formatting
func IncreaseReadability(
    articleDraft string,
) (
    finalizedArticle string,
)

Readability is the ease with which a reader can digest your text. This depends on understanding that humans have limited working memory and cannot juggle a dozen new concepts at once 10.

Recent reviews on working memory 11 and cognitive load 12 13 highlight that 3-4 items is the upper limit that most people can hold in working memory before overload sets in.

Introducing complex or specialized terms without adequate explanation increases extraneous cognitive load—the unnecessary mental effort imposed by the way information is presented, not by the content itself.

LLM Advice

An LLM can help identify complex terms in your writing and suggest definitions or simpler alternatives to improve clarity and reduce cognitive load.

C.2.1. Cognitive Load and Working Memory

Working memory is the system responsible for temporarily holding and processing information in our minds. It has limited capacity, and if too much information is presented at once, this capacity can be overwhelmed, leading to confusion and decreased comprehension.

Cognitive load theory explains how different types of load affect our ability to process information 13:

  • Intrinsic Load: The inherent complexity of the material itself (e.g., intricate concepts or detailed procedures).
  • Extraneous Load: The additional burden imposed by the way information is presented (e.g., poor organization, unnecessary jargon).
  • Germane Load: The mental effort required to process, construct, and automate schemas (e.g., applying knowledge to problem-solving).

Technical articles usually already have a high intrinsic load, so we need to make sure that we keep the extraneous load as low as possible and reduce the intrinsic load to make room for the germane load.

Optimize Through Content Organization

Congratulations! Because you’ve already made your arguments MECE in the first chapter of this article, you’ve reduced the reader’s extra mental effort: they no longer have to decide where each piece of information belongs.

And by using the signpost, they don’t have to keep track of what comes next.

Finally, you also incorporated rehearsal to promote germane load (i.e., beneficial active processing) and prevent accidental overload by giving the reader a mental break. When dealing with complex topics, you need rehearsal to make the knowledge stick. Rehearsal can be as simple as a brief summary at the end of one argument or a bridging transition at the beginning of the next. These elements help the reader “empty” their working memory of the old information before loading the next chunk. The summaries and transitions between arguments do just that.

You can further reinforce your structure with clear headings, but we’ll see more about formatting in the next chapter.

Eliminate Unnecessary Complex Words and Explain Terms Step-by-Step

Even a perfectly chunked, three-argument structure can fail if the text is littered with obscure jargon that forces the reader to go back and forth (extraneous load). A good rule is to introduce technical terms only when absolutely necessary, and to define them succinctly on the spot.

Don’t get me wrong: it’s perfectly fine to use jargon! Just make sure it is necessary. And be really sure that your audience knows it; not all “standard” jargon is universal across sub-domains.

For example: If you use the formal term “conditional statement” in a Go tutorial, clarify with a quick note that it refers to “if-statements”—a term that’s often misunderstood by juniors (I’ve lost count of how many times I’ve heard “if-loops”).

This principle applies even if you are writing for a technical audience. Not everyone has the same background. Format these explanations so that advanced readers can quickly skip them.

Example: MQTT explanation

Bad:

MQTT is an OASIS standard messaging protocol for the Internet of Things (IoT). It is designed as an extremely lightweight publish/subscribe messaging transport that is ideal for connecting remote devices with a small code footprint and minimal network bandwidth.14

Potential confusion: “What is OASIS?” “What does publish/subscribe mean?” “Why should I use it?”

Better:

MQTT (Message Queue Telemetry Transport) is a protocol designed for communication between devices. It uses a publish/subscribe architecture, where devices (publishers) send messages to a central message broker. Other devices (subscribers) that are interested can then receive the messages from the broker.

Compared to other architectures, publish/subscribe allows for near real-time data exchange and decoupling between devices, which makes it easy to add or remove devices without disrupting the network.

Among publish/subscribe protocols, MQTT is simple and lightweight. This allows millions of low-power, memory-constrained devices, common in Internet of Things (IoT) applications, to communicate with each other.

Here, the definitions arrive exactly when needed, not hidden in footnotes or introduced 20 paragraphs later. This reduces extraneous load.
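As a side illustration (not part of the explanation above), the publish/subscribe flow maps directly onto a few lines of code. Here is a minimal sketch using the MQTT.js client, with a hypothetical broker URL and topic:

import mqtt from "mqtt";

// Connect to a broker (hypothetical URL).
const client = mqtt.connect("mqtt://broker.example.com");

client.on("connect", () => {
  // Subscribers tell the broker which topics they care about...
  client.subscribe("sensors/temperature");
  // ...and publishers send messages to the broker, never directly to other devices.
  client.publish("sensors/temperature", "21.5");
});

// The broker forwards each message to every subscriber of that topic.
client.on("message", (topic, payload) => {
  console.log(`${topic}: ${payload.toString()}`);
});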

C.2.2. Formatting

Implementing good formatting goes beyond aesthetics-it reduces extraneous cognitive load by helping readers see where they are, what’s important, and what’s next. Below are best practices drawn from common technical writing standards and research-tested web guidelines 15.

1. Sentence & Paragraph Length

Aim for a range of sentence lengths. Short, punchy lines maintain momentum, while a few medium or longer sentences provide depth. Too many uniform or run-on sentences can make text either choppy or difficult to follow.

Keep paragraphs concise. Readers often skim or read on mobile devices, so large blocks of text can be overwhelming. Try to keep paragraphs to a few sentences at a time-this is often called a “scannable style”.

Use active voice instead of passive voice.

Eliminate meaningless words and phrases. Some come from generative AI; others are simply “fluff”. Phrases like “in order to” or “basically” can often be cut because they add no real meaning.

2. Structural Cues and Headings

In earlier chapters, you identified your main arguments (MECE and no more than three). Each argument deserves a clear heading (H2) and subheadings (H3) for supporting facts.

3. Use visual elements such as graphics, pictures, and images to break up large blocks of text.

But avoid images that are “busy,” cluttered, and contain too many extraneous details, and don’t place text around or on top of them to the point of distraction. Label images appropriately so that the reader is not left guessing what a diagram or code snippet represents.

4. Use bullets, numbers, quotes, code blocks, and white space to break up large blocks of text.

Break up large blocks of text by using:

  1. Bullets or
  2. Numbers to structure your content clearly and concisely.

You can use block quotes like this one to highlight quotes from external sources or important statements and to further break up the text.

fmt.Println("Since we're (usually) programmers, include code blocks as examples to make concepts concrete.")

Use enough white space between web page elements.

Guideline: On a smartphone, include a visual or structural element (e.g., bullet point, image, subheading) about every screen length of text. This ensures readability without overwhelming the reader.

5. Avoid using italic or underlined text; use bold instead.

Avoid using italics in the body of the text. Use bold to emphasize key words and concepts. Avoid underlining large blocks of text as this makes it difficult to read.

C.3. Final Tips for Iteration and Review

Creating an exceptional article often requires multiple iterations. As you refine your work, you might discover that certain arguments need strengthening, some details are missing, or that your prose could be more engaging. Embrace this part of the process—iterative refinement is key to producing high-quality content.

Iterate Using the Framework

Don’t hesitate to loop back through the algorithmic framework you’ve established. Revisiting each step can help you identify areas that need adjustment, ensuring that your article remains coherent and compelling. Whether it’s refining your arguments, enhancing your rhetoric, or improving readability, the framework serves as a reliable guide.

Gain a Fresh Perspective

Sometimes, viewing your article from a different vantage point can reveal insights you might have missed. Here are some techniques to consider:

  • Reverse Reading: Read your article paragraph by paragraph from the end to the beginning. This approach can help you spot inconsistencies, redundancies, or logical gaps that aren’t as apparent when reading in the usual order.

  • Pause and Reflect: Take a short break from your work. Stepping away, even briefly, can provide clarity when you return to your article.

  • Seek Critical Feedback: Use tools like AI language models to obtain an objective review of your work. Ask for critical feedback to identify weaknesses or areas for improvement that you might have overlooked.

LLM Advice

Leverage AI tools to enhance your revision process. Prompt an AI assistant with: “Please provide a critical review of my article, focusing on areas that need improvement while remaining objective.” AI can offer fresh insights, highlight inconsistencies, and suggest enhancements you may not have considered.

Eliminate Redundancies

Be on the lookout for repetitive information, especially between the endings of sections and the beginnings of the next. For instance, if the conclusion of one argument mirrors the introduction of the following argument, consider consolidating them. This not only tightens your writing but also maintains the reader’s engagement by avoiding unnecessary repetition.

Polish Your Language

Refining your language enhances readability and professionalism.

  • Grammar and Style Checks: Utilize tools like DeepL Write or other grammar assistants to catch errors, improve sentence structure, and ensure clarity.

  • Vocabulary Consistency: Aim to use advanced terms consistently throughout your article. Introducing complex terminology only once can confuse readers. If you use specialized terms, ensure they are defined and revisited as necessary to reinforce understanding.

  • Read Aloud: Reading your article aloud can help you catch awkward phrasings, run-on sentences, or abrupt transitions that you might not notice when reading silently.

Final Considerations

Remember that perfection is a process. Each revision brings you closer to a polished and impactful article. By methodically applying your framework, seeking fresh perspectives, and diligently refining your language, you enhance both the quality of your writing and its resonance with your audience.

C.4. Conclusion

Readability sparks joy because it closes the loop on everything you’ve built in Chapter 1 (Content & Logic) and Chapter 2 (Rhetorical Structure). By completing your draft and improving readability, you ensure that even the most technical topics will be accessible, engaging, and memorable to your audience.

By following these steps, your article will not only contain valuable information, but also present it in a way that truly resonates with readers.

Conclusion

See? By applying a clear, repeatable process, we’ve shown that writing technical articles isn’t just ‘art’—it can be learned by anyone willing to follow the steps.

This article embodies the very framework it presents: a clear introduction, three structured arguments, and a concise conclusion with a compelling call to action. By following this repeatable process, you can transform your technical expertise into articles that effectively inform and engage your audience.

Think of writing an article as applying an algorithm: define your inputs, process them through a sequence of logical steps, and arrive at the output. Each step corresponds to a clear function or subroutine.

Don’t let your valuable insights get lost in poorly structured content. Share them in a way that captivates and informs your readers. Apply this algorithmic framework to your next article and experience the difference it makes.

Now it’s your turn to use this process to create technical articles that resonate with your audience! Simply start by copying this article and your latest article or your current article draft into the same LLM prompt and ask it to apply the framework.


  1. Ramage, John D., John C. Bean and June Johnson. “Writing Arguments : A Rhetoric with Readings.” (1997). ↩︎

  2. Toulmin, Stephen E.. “The Uses of Argument, Updated Edition.” (2008). ↩︎

  3. Zhang, Weitao, Zsuzsika Sjoerds and Bernhard Hommel. “Metacontrol of human creativity: The neurocognitive mechanisms of convergent and divergent thinking.” NeuroImage (2020): 116572 . ↩︎

  4. British Design Council. “The Double Diamond: A universally accepted depiction of the design process.” (2005). Accessible via: https://www.designcouncil.org.uk/our-resources/the-double-diamond/ ↩︎

  5. Leopoldino, Kleidson Daniel Medeiros, Mario Orestes Aguirre González, Paula de Oliveira Ferreira, José Raeudo Pereira and Marcus Eduardo Costa Souto. “Creativity techniques: a systematic literature review.” (2016). ↩︎

  6. Saha, Shishir Kumar, M. Selvi, Gural Buyukcan and Mirza Mohymen. “A systematic review on creativity techniques for requirements engineering.” 2012 International Conference on Informatics, Electronics & Vision (ICIEV) (2012): 34-39. ↩︎

  7. Wikipedia. https://en.wikipedia.org/wiki/MECE_principle ↩︎

  8. Cicero, De Inventione. ↩︎

  9. Aristotle, Rhetoric. ↩︎

  10. https://en.wikipedia.org/wiki/Readability ↩︎

  11. Buschman, T. J.. “Balancing Flexibility and Interference in Working Memory.” Annual review of vision science (2021): n. pag. ↩︎

  12. Leppink, Jimmie, Fred Paas, Cees P. M. van der Vleuten, Tamara van Gog and Jeroen J. G. van Merriënboer. “Development of an instrument for measuring different types of cognitive load.” Behavior Research Methods 45 (2013): 1058 - 1072. ↩︎

  13. Klepsch, Melina and Tina Seufert. “Understanding instructional design effects by differentiated measurement of intrinsic, extraneous, and germane cognitive load.” Instructional Science 48 (2020): 45-77. ↩︎

  14. https://www.mqtt.org ↩︎

  15. Miniukovich, Aliaksei, Michele Scaltritti, Simone Sulpizio and Antonella De Angeli. “Guideline-Based Evaluation of Web Readability.” Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (2019): n. pag. ↩︎

]]>
https://www.theocharis.dev/blog/algorithmic-framework-for-writing-technical-articles/ hacker-news-small-sites-42709897 Wed, 15 Jan 2025 11:53:27 GMT
<![CDATA[State Space Explosion: The Reason We Can Never Test Software to Perfection(2021)]]> thread link) | @thunderbong
January 15, 2025 | https://concerningquality.com/state-explosion/ | archive.org

Have you ever seen a test suite actually prevent 100% of bugs? With all of the time that we spend testing software, how do bugs still get through? Testing seems ostensibly simple – there are only so many branches in the code, only so many buttons in the UI, only so many edge cases to consider. So what is difficult about testing software?

This post is dedicated to Edmund Clarke, who spent a large portion of his life pioneering solutions to the state explosion problem.

Consequently, drawing conclusions about software quality short of testing every possible input to the program is fraught with danger.1

When we think of edge cases, we intuitively think of branches in the code. Take the following trivial example:

if (currentUser) {
  return "User is authenticated";
} else {
  return "User is unauthenticated";
}

This single if statement has only two branches2. If we wanted to test it, we surely need to exercise both and verify that the correct string is returned. I don’t think anyone would have difficulty here, but what if the condition is more complicated?

function canAccess(user) {
  if (user.internal === false || user.featureEnabled === true) {
    return true;
  } else {
    return false;
  }
}

Here, we could have come up with the following test cases:

let user = {
  internal: true,
  featureEnabled: false,
};

canAccess(user); // ==> false

let user = {
  internal: false,
  featureEnabled: true,
};

canAccess(user); // ==> true

This would yield 100% branch coverage, but there’s a subtle bug. The internal flag was supposed to give internal users access to some feature without needing the feature to be explicitly flagged (i.e. featureEnabled: true), but the conditional checks for user.internal === false instead. This would give access to the feature to all external users, whether or not they had the flag enabled. This is why bugs exist even with 100% branch coverage. While it is useful to know if you have missed a branch during testing, knowing that you’ve tested all branches still does not guarantee that the code works for all possible inputs.

For this reason, there are more comprehensive (and tedious) coverage strategies, such as condition coverage. With condition coverage, you must test cases where each subcondition of a conditional evaluates to both true and false. To do that here, we’d need to construct the following four user values (true and false for each side of the ||):

let user = {
  internal: false,
  featureEnabled: false,
};

let user = {
  internal: false,
  featureEnabled: true,
};

let user = {
  internal: true,
  featureEnabled: false,
};

let user = {
  internal: true,
  featureEnabled: true,
};

If you’re familiar with Boolean or propositional logic, these are simply the input combinations of a truth table for two boolean variables:

internal featureEnabled
F F
F T
T F
T T

This is tractable for this silly example code because there are only 2 boolean parameters and we can exhaustively test all of their combinations with only 4 test cases. Obviously bools aren’t the only types of values in programs though, and other types exacerbate the problem because they consist of more possible values. Consider this example:

enum Role {
  Admin,
  Read,
  ReadWrite
}

function canAccess(role: Role) {
  if (role === Role.ReadWrite) {
    return true;
  } else {
    return false;
  }
}

Here, a role of Admin or ReadWrite should allow access to some operation, but the code only checks for a role of ReadWrite. 100% condition and branch coverage are achieved with 2 test cases (Role.ReadWrite and Role.Read), but the function returns the wrong value for Role.Admin. This is a very common bug with enum types – even if exhaustive case matching is enforced, there’s nothing that prevents us from writing an improper mapping in the logic.
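For example, even a variant that switches exhaustively over the enum (a hypothetical rewrite, not the code above) compiles cleanly while still encoding the wrong mapping:

// Hypothetical rewrite that assumes the Role enum declared above.
// Exhaustive matching forces us to handle every member, but nothing forces the mapping to be right.
function canAccessExhaustive(role: Role): boolean {
  switch (role) {
    case Role.Admin:
      return false; // bug: Admin should be allowed
    case Role.Read:
      return false;
    case Role.ReadWrite:
      return true;
  }
}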

The implications of this are very bad, because data combinations grow combinatorially. If we have a User type that looks like this,

type User = {
  role: Role,
  internal: boolean,
  featureEnabled: boolean
}

and we know that there are 3 possible Role values and 2 possible Boolean values, there are then 3 * 2 * 2 = 12 possible User values that we can construct. The set of possible states that a data type can be in is referred to as its state space. A state space of size 12 isn’t so bad, but these multiplications get out of hand very quickly for real-world data models. If we have a Resource type that holds the list of Users that have access to it,

type Resource = {
  users: User[]
}

it has 4,096 possible states (2^12 elements in the power set of Users) in its state space. Let’s say we have a function that operates on two Resources:

function compareResources(resource1: Resource, resource2: Resource) { 
  ...
}

The size of the domain of this function is the size of the product of the two Resource state spaces, i.e. 4,096^2 = 16,777,216. That’s around 16 million test cases to exhaustively test the input data. If we are doing integration testing where each test case can take 1 second, this would take ~194 days to execute. If these are unit tests running at 1 per millisecond, that’s still almost 5 hours of linear test time. And that’s not even considering the fact that you physically can’t even write that many tests, so you’d have to generate them somehow.
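To make that arithmetic concrete, here is a quick back-of-the-envelope calculation (a sketch, not from the original post):

// Back-of-the-envelope sizes for the state spaces above.
const roleValues = 3; // Admin, Read, ReadWrite
const boolValues = 2;

const userStates = roleValues * boolValues * boolValues; // 12 possible Users
const resourceStates = 2 ** userStates;                  // 4,096 subsets of those Users
const compareInputs = resourceStates ** 2;               // 16,777,216 pairs of Resources

console.log({ userStates, resourceStates, compareInputs });
console.log(`${(compareInputs / 86_400).toFixed(0)} days at 1 test per second`);          // ~194
console.log(`${(compareInputs / 3_600_000).toFixed(1)} hours at 1 test per millisecond`); // ~4.7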

This is the ultimate dilemma: testing with exhaustive input data is the only way of knowing that a piece of logic is entirely correct, yet the size of the input data’s state space makes that prohibitively expensive in most cases. So be wary of the false security that coverage metrics provide. Bugs can still slip through if the input state space isn’t sufficiently covered.

All hours wound; the last one kills

We’ve only considered pure functions up until now. A stateful, interactive program is more complicated than a pure function. Let’s consider the following stateful React app, which I’ve chosen because it has a bug that actually occurred to me in real life3.

type User = {
  name: string
}

const allUsers: User[] = [
  { name: "User 1" },
  { name: "User 2" }
];

const searchResults: User[] = [
  { name: "User 2"}
];

type UserFormProps = {
  users: User[],
  onSearch: (users: User[]) => void
}

function UserForm({ users, onSearch }: UserFormProps) {
  return <div>
    <button onClick={() => onSearch(searchResults)}>
      {"Search for Users"}
    </button>
    {users.map((user => {
      return <p>{user.name}</p>
    }))}
  </div>;
}

function App() {
  let [showingUserForm, setShowingUserForm] = useState(false);
  let [users, setUsers] = useState(allUsers);

  function toggleUserForm() {
    setShowingUserForm(!showingUserForm);
    setUsers(allUsers);
  }

  return (
    <div className="App">
       {<button onClick={() => setShowingUserForm(!showingUserForm)}>
          {"Toggle Form"}
        </button>}
      {showingUserForm && (
        <UserForm users={users} onSearch={setUsers}></UserForm>
      )}
    </div>
  );
}

This app can show and hide a form that allows selecting a set of Users. It starts out by showing all Users but also allows you to search for specific ones. There’s a tiny (but illustrative) bug in this code. Take a minute to try and find it.

.
..
...
....
.....
......
.......
........
.........
..........

The bug is exposed with the following sequence of interactions:

  1. Show the form
  2. Search for a User
  3. Close the form
  4. Open the form again

At this point, the Users that were previously searched for are still displayed in the results list. This is what it looks like after step 4:

The bug isn’t tragic, and there’s plenty of simple ways to fix it, but it has a very frustrating implication: we could have toggled the form on and off 15 times, but only after searching and then toggling the form do we see this bug. Let’s understand how that’s possible.

A stateful, interactive application such as this is most naturally modeled by a state machine. Let’s look at the state diagram of this application4:

There are 2 state variables in this application: showingForm represents whether or not the form is showing, and users is the set of Users that the form is displaying for selection. showingForm can be true or false, and users can be all possible subsets of Users in the system, which for the purposes of this example we’ve limited to 2. The state space of this application then has 2 * 2^2 = 8 individual states, since we consider each individual combination of values to be a distinct state.
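Enumerating that state space directly shows where the 8 comes from (an illustrative sketch, not the author’s code):

// Every combination of showingForm (2 values) and a subset of the 2 Users (2^2 = 4 subsets).
const powerSet = <T>(xs: T[]): T[][] =>
  xs.reduce<T[][]>((acc, x) => acc.concat(acc.map((s) => [...s, x])), [[]]);

const states = [false, true].flatMap((showingForm) =>
  powerSet(["User 1", "User 2"]).map((users) => ({ showingForm, users }))
);

console.log(states.length); // 8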

The edges between the states represent the actions that a user can take. ToggleForm means they click the “Toggle Form” button, and SearchForUsers means they clicked the “Search for Users” button. We can observe the above bug directly in the state diagram:

Here we see that we can hide the form after the search returns u2, and when we show the form again, u2 is still the only member of users. Note how if we only show and hide the form and never perform a search, we can never get into this state:

The fact that the same user action (ToggleForm) can produce a correct or buggy result depending on the sequence of actions that took place before it means that its behavior is dependent on the path that the user takes through the state machine. This is what is meant by path dependence, and it is a huge pain from a testing perspective. It means that just because you witnessed something work one time does not mean it will work the next time– we now have to consider sequences of actions when coming up with test cases. If there are n states, that means that there are n^k k-length paths through the state graph. In this extremely simplified application, there are 8 states. Checking for 4-length sequences would require 4,096 test cases, and checking for 8-length sequences would require 16,777,216.
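To see the path dependence in code form, here is a small model of the buggy transition function and a replay of the four-step sequence from above (an illustrative sketch, not the author’s code):

// A tiny model of the app's state machine. The transition mirrors the buggy component:
// toggling the form never resets `users`, so stale search results survive a hide/show cycle.
type AppState = { showingForm: boolean; users: string[] };
type Action = "ToggleForm" | "SearchForUsers";

const allUsers = ["User 1", "User 2"];
const searchResults = ["User 2"];

function step(state: AppState, action: Action): AppState {
  if (action === "ToggleForm") {
    return { showingForm: !state.showingForm, users: state.users };
  }
  // SearchForUsers only has an effect while the form is visible.
  return state.showingForm ? { ...state, users: searchResults } : state;
}

// The 4-step path from the text: show the form, search, hide it, show it again.
const path: Action[] = ["ToggleForm", "SearchForUsers", "ToggleForm", "ToggleForm"];
const finalState = path.reduce(step, { showingForm: false, users: allUsers });

console.log(finalState); // { showingForm: true, users: ["User 2"] } -- the stale results

Only paths that include a search before a toggle ever reach this state, which is exactly why sampling a handful of k-length sequences gives so little assurance.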

Checking for all k-length sequences doesn’t even guarantee that we discover all unique paths in the graph: whichever k we test for, the bug could happen only at the (k+1)th step. The introduction of state brings the notion of time into the program. To perform a sequence of actions, you have to be able to perform an action after a previous one. These previous actions leave behind an insidious artifact: state. Programmers intuitively know that state is inherently complex, but this shows where that intuition comes from. Like clockmakers, we now know how powerful the effect of time is, and clockmakers have a saying that’s relevant here:

Omnes vulnerant, ultima necat

It means: All hours wound; the last one kills.

It seems that our collective intuition is correct, and we should try and avoid state and time in programs whenever we can. Path dependence adds a huge burden to testing.

Faster, higher, stronger

A state graph consists of one node per state in the state space of the state variables, along with directed edges between them. If there are n states in the state space, then there can be up to n^2 edges in the corresponding state graph. We looked at the state diagram of this application with 2 users; now here is the state diagram when there are 4 total Users (remember, more Users means more possible subsets, and every unique combination of data is considered a different state):

The number of nodes went from 8 to 32, which means there are 1,024 possible edges now. There are constraints on when you can perform certain actions, so there are fewer edges in this particular graph, though we can see that there are still quite a lot. Trust me, you don’t want to see the graph for 10 Users.

This phenomenon is known as state explosion. When we add more state variables, or increase the range of the existing variables, the state space multiplies in size. This adds quadratically more edges and thus more paths to the state graph of the stateful parts of the program, which increases the probability that there is a specific path that we’re not considering when testing.
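To see how quickly this grows, here is a tiny back-of-the-envelope calculation (mine, not the author's) for the same model: one boolean flag plus every possible subset of u Users.

# State explosion for the toggle/search app: one boolean flag plus a subset of u Users.
for u in (2, 4, 10):
    states = 2 * 2 ** u        # showingForm (2 values) x every subset of Users (2^u)
    max_edges = states ** 2    # upper bound on directed edges in the state graph
    print(f"{u} Users: {states} states, up to {max_edges:,} edges")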

The number of individual states and transitions in a modern interactive application is finite and countable, but it’s almost beyond human comprehension at a granular level. Dijkstra called software a “radical novelty” for this reason: how are we expected to verify something of this intimidating magnitude?

Frankly, it proves that testing software is inherently difficult. Critics of testing software as a practice are quick to point out that each test case provides no guarantee that other test cases will work. This means that, generally, we’re testing an infinitesimal subset of a potentially huge state space, and any member of the untested part can lead to a bug. This is a situation where the magnitude of the problem is simply not on our side, to the point where it can be disheartening.

Yet, we have thousands of test cases running on CI multiple times a day, every day, for years at a time. An enormous amount of computational resources is spent running test suites all around the world, but these tests are like the holes in Swiss cheese: the majority of the state space gets left uncovered. That’s not even considering the effect that test code has on the ability to actually modify our applications. If we’re not diligent with how we structure our test code, it can make the codebase feel like a cross between a minefield and a tar pit. The predominant testing strategy of today is to create thousands of isolated test cases that each test one specific scenario, often referred to as example-based testing. While there are proven benefits to testing via individual examples, after doing it for many years myself I’ve opened my mind to other approaches.

The anti-climax here is that I don’t have the silver bullet for this problem, and it doesn’t look like anyone else does either. Among others, we have the formal methods camp, who think we can prove our way to software quality, and we have the ship-it camp, who think it’s an intractable problem so we should just reactively fix bugs as they get reported. We have techniques such as generative testing, input space partitioning, equivalence partitioning, boundary analysis, etc. I’m honestly not sure which way is “correct”, but I do believe that a) it is a very large problem (again, just consider how much compute time every day is dedicated to running test suites across all companies), and b) conventional wisdom is mostly ineffective for solving it. I put more stock in the formal methods side, but I think some things go way too far, such as dependent typing and interactive theorem proving: it can’t take 6 months to ship an average feature, and developer ergonomics are extremely important. I’ll leave the solution discussion there and tackle it in subsequent posts.

However we approach it, I’m sure that the state space magnitude problem is at the root of what we need to solve to achieve the goal of high software quality.


]]>
https://concerningquality.com/state-explosion/ hacker-news-small-sites-42709370 Wed, 15 Jan 2025 10:25:56 GMT
<![CDATA[Generate audiobooks from E-books with Kokoro-82M]]> thread link) | @csantini
January 15, 2025 | https://claudio.uk/posts/epub-to-audiobook.html | archive.org

Posted on 14 Jan 2025 by Claudio Santini

Kokoro v0.19 is a recently published text-to-speech model with just 82M params and very high-quality output. It's released under the Apache licence and was trained on <100 hours of audio. It currently supports American English, British English, French, Korean, Japanese and Mandarin, in a bunch of very good voices.

An example of the quality:

I've always dreamed of converting my ebook library into audiobooks. Especially for those niche books that you cannot find in audiobook format. Since Kokoro is pretty fast, I thought this may finally be doable. I've created a small tool called Audiblez (in honor of the popular audiobook platform) that parses .epub files and converts the body of the book into nicely narrated audio files.

On my M2 MacBook Pro, it takes about 2 hours to convert The Selfish Gene by Richard Dawkins to mp3; the book is about 100,000 words (or 600,000 characters), which works out to a rate of about 80 characters per second.
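A quick back-of-the-envelope check of that figure, using the numbers above:

chars = 600_000         # ~100,000 words, per the numbers above
chars_per_second = 80   # reported throughput on the M2 MacBook Pro
print(f"{chars / chars_per_second / 3600:.1f} hours")  # ~2.1 hours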

How to install and run

If you have Python 3 on your computer, you can install it with pip. Be aware that it won't work with Python 3.13.

Then you also need to download a couple of additional files into the same folder, which together are about 360MB:

pip install audiblez
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json

Then, to convert an epub file into an audiobook, just run:

audiblez book.epub -l en-gb -v af_sky

It will first create a bunch of book_chapter_1.wav, book_chapter_2.wav, etc. files in the same directory, and at the end it will produce a book.m4b file with the whole book, which you can listen to with VLC or any audiobook player. It will only produce the .m4b file if you have ffmpeg installed on your machine.

Supported Languages

Use -l option to specify the language, available language codes are: 🇺🇸 en-us, 🇬🇧 en-gb, 🇫🇷 fr-fr, 🇯🇵 ja, 🇰🇷 kr and 🇨🇳 cmn.

Supported Voices

Use -v option to specify the voice: available voices are af, af_bella, af_nicole, af_sarah, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis. You can try them here: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Chapter Detection

Chapter detection is a bit janky, but it manages to find the core chapters in most .epub files I tried, skipping the cover, index, appendix, etc.
If you find it doesn't include the chapter you are interested in, try playing with the is_chapter function in the code. Often it skips the preface or intro, and I'm not sure if it's a bug or a feature.

Source

See Audiblez project on GitHub.

There are still some rough edges, but it works well enough for me. Future improvements could include:

  • Better chapter detection, or allow users to include/exclude chapters.
  • Add chapter navigation to m4b file (that looks hard, cause ffmpeg doesn't do it)
  • Add narration for images using some image-to-text model

Code is short enough to be included here:

#!/usr/bin/env python3
# audiblez - A program to convert e-books into audiobooks using
# Kokoro-82M model for high-quality text-to-speech synthesis.
# by Claudio Santini 2025 - https://claudio.uk

import argparse
import sys
import time
import shutil
import subprocess
import soundfile as sf
import ebooklib
import warnings
import re
from pathlib import Path
from string import Formatter
from bs4 import BeautifulSoup
from kokoro_onnx import Kokoro
from ebooklib import epub
from pydub import AudioSegment


def main(kokoro, file_path, lang, voice):
    filename = Path(file_path).name
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # suppress noisy warnings emitted while parsing the epub
        book = epub.read_epub(file_path)
    title = book.get_metadata('DC', 'title')[0][0]
    creator = book.get_metadata('DC', 'creator')[0][0]
    intro = f'{title} by {creator}'
    print(intro)
    chapters = find_chapters(book)
    print('Found chapters:', [c.get_name() for c in chapters])
    texts = extract_texts(chapters)
    has_ffmpeg = shutil.which('ffmpeg') is not None
    if not has_ffmpeg:
        print('\033[91m' + 'ffmpeg not found. Please install ffmpeg to create mp3 and m4b audiobook files.' + '\033[0m')
    total_chars = sum([len(t) for t in texts])
    print('Started at:', time.strftime('%H:%M:%S'))
    print(f'Total characters: {total_chars:,}')
    print('Total words:', len(' '.join(texts).split(' ')))

    i = 1
    chapter_mp3_files = []
    for text in texts:
        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
        chapter_mp3_files.append(chapter_filename)
        if Path(chapter_filename).exists():
            print(f'File for chapter {i} already exists. Skipping')
            i += 1
            continue
        print(f'Reading chapter {i} ({len(text):,} characters)...')
        if i == 1:
            text = intro + '.\n\n' + text
        start_time = time.time()
        samples, sample_rate = kokoro.create(text, voice=voice, speed=1.0, lang=lang)
        sf.write(f'{chapter_filename}', samples, sample_rate)
        end_time = time.time()
        delta_seconds = end_time - start_time
        chars_per_sec = len(text) / delta_seconds
        remaining_chars = sum([len(t) for t in texts[i - 1:]])
        remaining_time = remaining_chars / chars_per_sec
        print(f'Estimated time remaining: {strfdelta(remaining_time)}')
        print('Chapter written to', chapter_filename)
        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
        progress = int((total_chars - remaining_chars) / total_chars * 100)
        print('Progress:', f'{progress}%')
        i += 1
    if has_ffmpeg:
        create_m4b(chapter_mp3_files, filename)


def extract_texts(chapters):
    texts = []
    for chapter in chapters:
        xml = chapter.get_body_content()
        soup = BeautifulSoup(xml, features='lxml')
        chapter_text = ''
        html_content_tags = ['title', 'p', 'h1', 'h2', 'h3', 'h4']
        for child in soup.find_all(html_content_tags):
            inner_text = child.text.strip() if child.text else ""
            if inner_text:
                chapter_text += inner_text + '\n'
        texts.append(chapter_text)
    return texts


def is_chapter(c):
    name = c.get_name().lower()
    part = r"part\d{1,3}"
    if re.search(part, name):
        return True
    ch = r"ch\d{1,3}"
    if re.search(ch, name):
        return True
    if 'chapter' in name:
        return True


def find_chapters(book, verbose=True):
    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
    if verbose:
        for item in book.get_items():
            if item.get_type() == ebooklib.ITEM_DOCUMENT:
                # print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
                print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
    if len(chapters) == 0:
        print('Not easy to find the chapters, defaulting to all available documents.')
        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
    return chapters


def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
    remainder = int(tdelta)
    f = Formatter()
    desired_fields = [field_tuple[1] for field_tuple in f.parse(fmt)]
    possible_fields = ('W', 'D', 'H', 'M', 'S')
    constants = {'W': 604800, 'D': 86400, 'H': 3600, 'M': 60, 'S': 1}
    values = {}
    for field in possible_fields:
        if field in desired_fields and field in constants:
            values[field], remainder = divmod(remainder, constants[field])
    return f.format(fmt, **values)


def create_m4b(chapter_files, filename):
    tmp_filename = filename.replace('.epub', '.tmp.m4a')
    if not Path(tmp_filename).exists():
        combined_audio = AudioSegment.empty()
        for wav_file in chapter_files:
            audio = AudioSegment.from_wav(wav_file)
            combined_audio += audio
        print('Converting to Mp4...')
        combined_audio.export(tmp_filename, format="mp4", codec="aac", bitrate="64k")
    final_filename = filename.replace('.epub', '.m4b')
    print('Creating M4B file...')
    proc = subprocess.run(['ffmpeg', '-i', f'{tmp_filename}', '-c', 'copy', '-f', 'mp4', f'{final_filename}'])
    Path(tmp_filename).unlink()
    if proc.returncode == 0:
        print(f'{final_filename} created. Enjoy your audiobook.')
        print('Feel free to delete the intermediary .wav chapter files, the .m4b is all you need.')


def cli_main():
    if not Path('kokoro-v0_19.onnx').exists() or not Path('voices.json').exists():
        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
        sys.exit(1)
    kokoro = Kokoro('kokoro-v0_19.onnx', 'voices.json')
    voices = list(kokoro.get_voices())
    voices_str = ', '.join(voices)
    epilog = 'example:\n' + \
             '  audiblez book.epub -l en-us -v af_sky'
    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
    parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('epub_file_path', help='Path to the epub file')
    parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
    parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    main(kokoro, args.epub_file_path, args.lang, args.voice)


if __name__ == '__main__':
    cli_main()

]]>
https://claudio.uk/posts/epub-to-audiobook.html hacker-news-small-sites-42708773 Wed, 15 Jan 2025 08:47:38 GMT
<![CDATA[I failed moving my Google calendar to Proton]]> thread link) | @true_pk
January 14, 2025 | https://shilin.ca/i-tried-moving-my-google-calendar-to-proton-and-failed/ | archive.org

Basically, the title.

I've been building up the courage to transition since I started de-googling my digital life about three years ago. At first, there was the browser. It was the easiest — Firefox instead of Chrome, obviously. Then there was mail. I learned about SimpleLogin and started using them by routing all my aliases to a single Proton email address. This was long before their partnership with SimpleLogin. Next, I moved the drive and the documents stored in it. Lastly, there was the calendar.

why de-googling

Everything about my life had been hosted within the Google's ecosystem. Personal documents, bank statements, you name it. But Google’s privacy practices have been increasingly concerning, if not alarming. I got tired of them using and selling my data for advertising — my browsing behavior, purchasing habits, and my email conversations. Google was, and unfortunately still is, everywhere. It is on my phone, in my bedroom, and in my friends' houses. The company's main business model is advertising, and so their revenue streams speak for themselves 1.

the setup

But let’s head back to the topic of this post. What happened to Proton Calendar? Before we dive in, here’s my setup as I began the process:

  • Work (blymp's) email is on Google, so my personal calendar should sync with it
  • Both personal and work calendars must sync to iCalendar on my iPhone because iMadePoorChoicesEarlierInLife
  • I should be able to add and modify events through the calendar on my phone, both work and personal.

importing google calendar to proton

I started by importing my main google calendar, and was happy to learn that Proton provides a simple integration called Easy Switch. It was pretty much a single button to do it all.

Proton's Easy Switch screen

I pressed, and they told me I’d need to wait a jiffy, and just about 10 minutes later they notified me that it was done.

Perfect, I thought, and deleted the synchronization with my google calendar from both the iPhone and the work calendar. No way back.

This was, in fact, an “easy switch”. I mean… if it actually was, there wouldn't have been the rest of the story.

syncing proton calendars with google and icalendar

Things got entangled quickly. Proton Calendar lets you create public links, so you can export your calendars anywhere. And while it sounds lovely in theory, in practice it only allows for read-only exports, unless the receiving party is also on Proton. That meant I wouldn’t be able to modify events in my personal calendar from either my work browser or from my phone unless I was using their official app. That’s a bummer.

Proton's screen to share calendars with other people

But giving up so soon almost meant not trying at all. And because of my love for other Proton services, I really did want to give it a proper try. So I tried

syncing other calendars with proton

The iCalendar home screen widget is marvelous. In case you've already forgotten my home screen, here's a reminder:

My perfect, beautiful, minimalist iPhone's home screen

Notice the calendar in the middle. This is probably the most used piece of real estate on my entire phone.

But there are things you do for love… like installing Proton’s own calendar app. Using iCalendar wasn’t an option anymore, as I couldn’t sync my proton calendar with it, but at least I could try to sync my work calendar to the Proton’s app and try to use it for a couple of weeks.

My blymp calendar had a few links in the settings to help me export it, and I tried to paste them all into Proton’s app, one by one, until each one of them was declined. I quickly realized that my work calendar was private and couldn’t be exported anywhere. There was, however, a button to make it public. I clicked. It prompted:

Google's warning saying my calendar will be visible to Google Search if I make it public

Nope.

I went back and double-checked Proton's instructions. I just couldn't believe this was it. And truly, there was an explanation about subscribing to private google calendars.

Proton's screen from the tutorial on subscribing to a private google calendar

It says: "Secret address in iCal format: Use this address to access this calendar from other applications without making it public."

Phew, I knew it! But then I went to get the same link for my main blymp calendar, and… Wait a second!

My actual screen trying to subscribe to work google calendar

The section was not there. Moreover, it was nowhere. Not on this page, and not on any other pages. It just wouldn't let me create a private link for my main calendar. Like, WTF?!

summary

I failed. Exporting Proton’s calendar is only possible in read-only mode, so I can’t add events unless I’m using their official application. And when I installed their app, I couldn’t add my work calendar to it because Proton doesn’t have an option to connect to Google accounts by any means other than by an iCal url. And the url did not work because my work email is private (and will stay so).

At the end of the process, I was confused. Why did I have to jump through hoops to make a simple thing work? Why is there no standardized way to share calendars? Like a secret token one could generate that will carry along two pieces of information: the permission level (edit or view) and a type of information displayed (only busy/available status or full title/description).

I am sure Proton will make it all work in the future. Through the bridge, like they did for Mail, or otherwise. Perhaps, they will even make an iPhone widget for my minimalist home screen. But until then, sadly, I am forced to go back to Google.


P.S. Surprisingly, this sparked an interesting discussion on HackerNews: https://news.ycombinator.com/item?id=42707606


  1. How Google (Alphabet) Makes Money: Advertising and Cloud. Investopedia

]]>
https://shilin.ca/i-tried-moving-my-google-calendar-to-proton-and-failed/ hacker-news-small-sites-42707606 Wed, 15 Jan 2025 05:22:12 GMT
<![CDATA[Nobody cares]]> thread link) | @fzliu
January 14, 2025 | https://grantslatton.com/nobody-cares | archive.org

N.B. I'm in a mood tonight, so this will be less of a well-considered essay and more of a rant, partially in the vein of Fuck Nuance. Don't take anything here too seriously.

Why does nobody care about anything? The world is full of stuff that could be excellent with just 1% more effort. But people don't care.

Have you been to the DMV? It sucked? There is a human being whose job it is to be in charge of the DMV. They do not care that it sucks.

Ever used a piece of software that's buggy as hell, looks bad, but still costs money, presumably because the company behind it has found some regulatory capture to justify their existence? The programmer who wrote it probably doesn't care. Their manager definitely doesn't care. The regulators don't care.

You might think "something something incentive systems". No. At my big tech job I had the pleasure of interviewing a few programmers who worked for a large healthcare company that engages in regulatory capture. Let me assure you: They. Do. Not. Care.

I've met a few people that work for municipal governments. Not politicians, just career bureaucrats deep in the system. I ask them what their favorite part of the job is. They all say "stability" or "job security" as their #1. It takes 18 months to get the city to permit your shed? They. Do. Not. Care.

Here's a dumb example. This bike lane ends at the bottom of a hill near me. It merges onto the sidewalk at a crazy sharp angle. Cyclists are coming down this hill at 20mph or so. Tons of people can't make this angle at that speed and hit the vertical curb face, damaging their bike and injuring themselves. If they're unlucky, they go flying into the signpost.

Why does this ramp suck so much? For literally the exact same effort it took to build, it could have been built 10x better. Make the angle 20 degrees instead of 70. Put the ramp just after the sign instead of just before it. Make the far curb face sloped instead of vertical. Put some visual indication the lane ends 50 feet uphill. Why wasn't this done?

Because the engineer who designed it and the managers at the department of transportation do not give a shit.

This isn't even a pro- or anti-bikes thing. The ramp was getting built by whatever mandate. You're the engineer. Do you make it bad, or do you spend 1% more time thinking about it and make it good for literally the same cost? You make it bad, because you do not care.

I actually pointed this ramp out to the director of the Seattle Department of Transportation during a walking tour. He made a note of it over a year ago that I assume was promptly forgotten about. He does not care.

Here's another example. Street lights. Seattle has been engaging in a program to reduce everyone's natural melatonin production up to 5x by replacing the sodium lights with harsh-white LEDs.

These new lights objectively suck to anyone not driving. If your house is near one, they suck. If you're walking your dog at night (which starts at 5PM for much of the year in Seattle), they really suck.

But whoever made the decision to switch the lights does not care. It's entirely possible they don't even live in the city, but instead live in a pleasant exurb. Or maybe they don't walk at night and have never considered that other people do.

White LEDs reduce car crashes by 0.1% and that is measurable, but sleep quality and aesthetics are not measurable. You just have to care about them. And nobody cares.

But that's enough city stuff. Plenty of people don't care about plenty of other things.

You put on your turn signal in traffic to merge. The person who could let you in is looking straight ahead, zoned out. Why would they look around to see if they could cooperate with anyone? They're already in the lane they need to be in. They. Do. Not. Care.

You're at the airport. There's a group in front of you on the escalator taking up the full width, preventing anyone from walking by. They do not care.

You're on the sidewalk and someone has headphones in, walking in the center of the path. A mom and stroller are behind them. They can't hear her "excuse me" to get their attention. They have not even considered the possibility that anyone in the world exists but them. They do not care.

The McDonald's touch-screen self-order kiosk takes 27 clicks to get a meal. They try to up-sell you 3 times. Just let me pay for my fucking burger, Jesus Christ. The product manager, the programmer, the executives. None of these people care.

At work the junior engineer sends you some code to review. The code was clearly written in a first draft, and then just iteratively patched until the tests passed, then immediately sent to you to review without any further improvement. They do not care.

The guy on the hiking trail is playing his shitty EDM on his bluetooth speaker, ruining nature for everyone else. He does not care.

The doctor misdiagnoses your illness whose symptoms are in the first paragraph of the trivially googleable wikipedia article. He does not care.

People don't pick up after their dogs. The guy at the gym doesn't re-rack the weights. The lady at the grocery store leaves the cart in the middle of the parking lot. They. Do. Not. Care.

I could continue in this vein for another few pages, but it would be boring and you get the point. We are surrounded by antisocial bastards.

Some of them like the people who don't pick up after their dogs are legitimately just assholes.

Others, like the bureaucrats in the city who mess up our lives in more indirect ways are more victims of The System. But they are still guilty of lacking the personal agency to fight it or leave in protest, and I still — potentially unjustly — condemn them.

We have examples like Elon who, through sheer force of will, defeats armies of people who don't care. For his many faults, you can't say the man doesn't care.

When I joined my former Big Tech job, everyone cared. Over time, incentives attracted a different set of people who didn't care as much. Eventually those people became the majority. It's painful to work with people who don't care if you care a lot, and eventually I left because of it.

Now, I'm at a small startup full of people who care. Customer bug reports go right to our chatroom. We fix them immediately. I feel guilty I wrote the bugs at all. We reach out to users to see if we can make their lives better. We care.

I want to live in a community where everyone cares.

The one place in the world you get this vibe is probably Japan. Most people just really care. Patrick McKenzie refers to this as the will to have nice things. Japan has it, and the US mostly does not.

In Japan, you get the impression that everyone takes their job and role in society seriously. The median Japanese 7-11 clerk takes their job more seriously than the median US city bureaucrat. And the result is obvious if you visit both places.

Is it possible for us to care in the US? To foster the will to have nice things? I think we actually had this in the aftermath of WW2. The country was mostly on the same page about progress, values, the future. But over a few generations, more and more people defect. Living among defectors is demoralizing and causes more defections (much like my own departure from my Big Tech job!). Eventually society is full of defectors.

But I don't think this is a full explanation. Most people aren't assholes, they merely won't go out of their way to add to the world. And I can feel myself getting pulled in that direction.

I used to go a lot more out of my way to add to the world. A few years ago, I installed a bunch of dog bag dispensers on the telephone poles of my neighborhood. I still keep them stocked.

I was, somewhat naively, hoping that somehow I could get a snowball of care going. I built curb ramps on legacy curbs that lacked them. I lobbied the city to open new park space. Improve crosswalks. And much more!

But the snowball never started. Nobody cares. Rather, there is a tiny minority of activists who care. They spend all their free time doing activist stuff — basically fighting the city to try to make the bureaucrats care about little bits here and there.

But I've come to accept that I just don't have the disposition to fight all the time. I'm not a fighter. I care a lot and I just want to live in a place where other people care.

We're not going to move to Japan, but would absolutely be willing to move within the US.

Does such a community really exist? Where everyone cares? Or at least a supermajority? Or does it need to be built?

]]>
https://grantslatton.com/nobody-cares hacker-news-small-sites-42707238 Wed, 15 Jan 2025 04:15:43 GMT
<![CDATA[Mishima: A Life in Four Chapters – Paul Schrader's Generally Unseen Masterpiece]]> thread link) | @indigodaddy
January 14, 2025 | https://theconflictedfilmsnob.com/2018/06/22/mishima-a-life-in-four-chapters-paul-schraders-generally-unseen-masterpiece/ | archive.org

With writer/director Paul Schrader currently in the news for his critically lauded new film, First Reformed, it might be a good time to discuss one of his earlier efforts, one known to about five people outside of The Criterion Collection enthusiasts and a favorite of yours truly since checking it out on VHS back in the late 1980s.

Are you familiar with Mr. Schrader? For any budding cineaste he’s foundational, one of the 1970s rough and ready bad boys (Scorsese, Altman, Beatty, Coppola, Friedkin, Ashby, et al), making his name (and reputation for unsavory, violent characters and situations) by writing The Yakuza (1974), Obsession (1976) and, most memorably, Taxi Driver (1976). He re-teamed with Scorsese a couple years later to bring us the family friendly Raging Bull (1980). During this time, Schrader began directing, some early efforts of note including American Gigolo (1980) and a remake of Cat People (1982).

Ironically, Schrader was raised in a very strict Calvinist Christian Reformed Church household and the story goes that he didn’t see a movie until he was 17. (It seems that, back then, the Calvinist church prohibited going to a movie theater and other “worldly amusements.”) Luckily for us, he fully split from the Calvinist doctrine by the time he headed to UCLA for his MA in film studies. But the baggage of youth has always shaded his artistic choices, his fascination with self-destructive characters and “crime, scuzz and sexual perversions” (as described by David Kamp in his indispensable and hilarious The Film Snob’s Dictionary) a response to formative years spent immersed in (and wanting to shed) a confining religion.

Despite his characters’ often sordid behavior and failings, Schrader is no one-note schlockmeister. He’s first and foremost an intellectual, a filmmaker not only intrigued by how fallible humans navigate their way through a complicated and unforgiving world, but also how some of cinema’s greats evolved their preoccupations and styles. (e.g. He’s the author of, among others, Transcendental Style in Film: Ozu, Bresson, Dreyer.)

Which is all to say that the man who wrote this…

…can also write (with his brother, Leonard) and direct something as artful as Mishima: A Life in Four Chapters, which now, after all this preamble, is officially the subject of this blogpost.

Nowhere in Schrader’s oeuvre is his intellectual curiosity more apparent than in this 1985 film, which, due to its subject matter, unique structure and intense stylization, probably only saw the light of day because it was championed (and executive produced) by Francis Ford Coppola and George Lucas.

The film centers on the life (and death) of Yukio Mishima, famed Japanese writer, physical fitness nut and hardcore nationalist. For those who know nothing of the man (pretty much the entire world outside of Japan and a certain expat I know from college), it seems that Mishima, already a celebrated author (he’d been considered for a Nobel), started becoming more and more disillusioned with Japan’s post-war rebuild into a consumer society, one lacking what he felt was a proper cultural identity. An adherent to the code of the samurai (bushido), Mishima became even more radical when, in 1968, he formed Tatenokai, “a private militia composed primarily of young students who studied martial principles and physical discipline, and swore to protect the Emperor of Japan” according to our friends at the infallible Wikipedia. Things got nuttier still when, on November 25, 1970, Mishima and a couple of fellow Tatenokai members attempted to convince a group of soldiers at a Japan Self-Defense Forces base to help him stage a coup d’état. When the soldiers basically laughed off the request, Mishima committed ritual seppuku.

As indicated in the title, Schrader’s film is broken into four “chapters,” each framed by a depiction of the failed coup and Mishima’s (Ken Ogata) subsequent death…

Chapter One, interwoven throughout the other three and filmed in B/W, gives the audience glimpses of Mishima’s youth and other key life events that shaped him into the artist and man he was on that fateful November day…

Chapters Two-Four are beautifully stylized mini-dramatizations of three of his novels, all of which provide plenty of additional insight into their author’s MO, including The Temple of the Golden Pavilion, which tells the story of a man who burns to the ground a famous Buddhist temple because he’s intimidated by its beauty…

Kyoko’s House, which portrays a doomed sadomasochistic relationship…

…and, finally, Runaway Horses, which depicts the failed attempt of a group of nationalists to overthrow their government…

This is some avant-garde filmmaking, people.

But Schrader, working with brilliant cinematographer John Bailey, minimalist composer Philip Glass, and production designer Eiko Ishioka (who also did incredible costuming work on Coppola’s Dracula remake some years later), somehow pulls it off, every piece of this complicated portrait of a brilliant but disturbed artist coming together for maximum precision and emotional weight.

It really is a sight to behold. And luckily, for all of us, the technology to present such a film in the home has improved exponentially since those dark days of cropped, low-res VHS. As a matter of fact, The Criterion Collection, just this month, released an exquisite Blu-ray of the film and, word is, it’s never looked better. Don’t be so difficult; check it out someday.

]]>
https://theconflictedfilmsnob.com/2018/06/22/mishima-a-life-in-four-chapters-paul-schraders-generally-unseen-masterpiece/ hacker-news-small-sites-42705836 Wed, 15 Jan 2025 00:24:34 GMT
<![CDATA[Rewriting my website in plain HTML and CSS]]> thread link) | @arnath
January 14, 2025 | https://www.vijayp.dev/blog/rewrite-plain-html/ | archive.org

January 15, 2025

This week, I decided to rewrite my website using plain HTML and CSS. When I originally made it, I used SvelteKit for simplicity. It was a more interesting project than I was expecting when I started working so I wanted to share my thoughts on the experience.

Why?

There are a number of reasons I decided to do the rewrite. One is that I’m currently unemployed so I have a lot of free time for side projects. Another is that, as you can see, this website is pretty simple so I wasn’t gaining a lot from using SvelteKit. I also wanted to move the site over to Cloudflare Pages so this was an opportune time to make some changes.

However, the primary reason I decided to make some changes is that I find the Javascript bundler and building ecosystem incredibly aggravating to use. For example, one of the things I set up my old website to do was build the blog section from the set of Markdown posts. I assumed this would be easy to do. SvelteKit and Vite allow you to prerender your website and I had a set of files at build time - I just needed to add some logic to transform them. Instead, it was infuriatingly difficult to figure out a way to just get a handle to a set of files in my tree at build time (let me caveat that I’m not a frontend dev and maybe I missed something obvious). It took me hours of Googling and trying out different options to come up with this awful piece of code that worked to load the contents of a file and give them to my page:

import type { PageLoad } from "./$types";

export const load: PageLoad = async ({ params }) => {
  const file = await import(
    `../../../../lib/assets/posts/${params.slug}.md`
  );

  return { content: file.default, ...file.metadata };
};

I was tired of dealing with things like this for the tiny amount I was gaining from using SvelteKit. And so, I finally decided it was time for a rewrite.

How?

I think spending too much time on Hacker News gave me the misconception that writing a website using plain HTML and CSS would be a relatively well-paved path in 2025. I spent some time looking around for guides or a “canonical” way of doing this and found that there isn’t really one. Because of that, I decided to just start from scratch with an empty directory and go from there. My website is small enough that I was able to remake a lot of the pages as static HTML.

However, I prefer writing blog posts in Markdown. It’s easier to write than HTML, I can pull posts out of my existing Obsidian vault, and I just find it more convenient. Therefore, I needed some kind of script to turn my Markdown blog posts into HTML content. I investigated some options for this and found Pandoc. Pandoc is a universal document converter for converting markup formats. It provides a library and a CLI for converting documents from Markdown to HTML (along with many other formats).

To write the script, I wanted something as lightweight as possible but easier to use than a Bash script. This led me to Python and uv. I’ve found that uv basically abstracts away the Python environment in a way that’s really convenient for a tiny project like this. Using Python also gave me a free way to serve my website using the http.server module. Finally, I wrote a tiny Makefile so I wouldn’t have to remember the serve command.
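The post doesn't include the script itself, but a minimal sketch of the idea might look like this, shelling out to the pandoc CLI; the posts/ and out/blog/ directory names and the HTML shell are assumptions for illustration, not taken from the author's repository.

#!/usr/bin/env python3
# Convert every Markdown post into a standalone HTML page using the pandoc CLI.
import subprocess
from pathlib import Path

POSTS_DIR = Path("posts")      # assumed location of the Markdown sources
OUT_DIR = Path("out/blog")     # assumed output directory for the built site
PAGE = "<!DOCTYPE html><html><body><nav><a href='/'>Home</a></nav>{body}</body></html>"

OUT_DIR.mkdir(parents=True, exist_ok=True)
for post in sorted(POSTS_DIR.glob("*.md")):
    # pandoc turns one Markdown file into an HTML fragment on stdout
    body = subprocess.run(
        ["pandoc", "--from", "markdown", "--to", "html", str(post)],
        capture_output=True, text=True, check=True,
    ).stdout
    out_file = OUT_DIR / f"{post.stem}.html"
    out_file.write_text(PAGE.format(body=body))
    print(f"built {post.name} -> {out_file}")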

Results

The outcome was not the most revolutionary because my website was really simple in the first place. But the size of my “compiled” website asset went from ~356kb to ~88kb. My project tree got a lot simpler and the only Javascript on the site now is to highlight code. I’m also just happier about the state of things. I feel like I understand how and why my site works (where before I understood parts but not the whole mystery).

Before, with SvelteKit
After, with plain HTML

Next Steps

There are two downsides that I’ve found so far. I’d like to investigate ways to fix or improve these.

  • More code duplication. SvelteKit has a component system so I could make my navigation bar as a component and reuse it. When I removed it, I had to duplicate that code in a few places. Luckily the cost was pretty minor because I only really have four HTML pages. I’m aware that there’s some way to do this using web components. It’s something I intend to look into as one of my next side projects.
  • No live reloading. I have to kill the website to rebuild it now. I’m sure there’s a tool I can find to fix this, or maybe just use something like FastAPI that has automatic reload. But until I do something about it, there’s a minor added cost every time I make a change.

Also, I think this repository is now a reasonably good template for someone who wants to make a simple website with some Markdown blog posts without using a generator. I was surprised when I started this project how difficult it was to find a guide about how to write your site without a framework. Hopefully this can help some other people.

]]>
https://www.vijayp.dev/blog/rewrite-plain-html/ hacker-news-small-sites-42705077 Tue, 14 Jan 2025 22:57:19 GMT
<![CDATA[Apple starts pushing AirPods owners into Transparency mode, with no easy opt out]]> thread link) | @spenvo
January 14, 2025 | https://keydiscussions.com/2025/01/14/apple-opts-airpods-pro-2-and-airpods-4-owners-into-loud-sound-reduction-which-sounds-great-but-forces-users-into-transparency-or-noise-cancellation-modes-with-no-easy-way-to-opt-out/ | archive.org

(If you want to skip directly to the fixes, click here. If you want to skip to some genuine praise of Apple, feel free to jump to the next section.)

A couple of weeks ago I noticed my pair of AirPods Pro 2 aggressively switching me into Transparency mode. It seemed like a bug. Again and again I would have to manually switch back out of Transparency mode. Annoying.

Then a few days later, Apple removed the ability for me switch out of Transparency mode altogether!

There are ways to reverse each of these changes (the force switching and the Off removal), but the whole process was a major pain as a user to figure out, it wasn’t simple to reverse even once I knew how to, and there wasn’t any heads up that I remember getting from Apple explaining the changes. This led to me and a lot of people being confused.

Well over 100M people own AirPods. Here are some reddit posts (1, 2, 3, 4, 5) made by users frustrated over these specific AirPods changes. Notably, none of these reddit posts contain in their comments all of the steps needed to revert the changes.

Quick summary of Noise Control modes: Transparency, Adaptive, Noise Cancellation, and Off:

If it’s Off, that means your AirPods pipe audio into your ears without any extra processing or special sound alteration. This conserves battery and sounds better to me than Transparency. Great! I prefer this.

When Transparency mode is enabled, it “passes through” some of the noise around you, so you can have higher awareness of your surroundings. Neat! But this means that, whenever it’s enabled, I hear a hissing sound (at a minimum) that I otherwise wouldn’t. It also burns more battery.

Noise Cancellation is self explanatory — it cancels out annoying sound. I use ANC sparingly because it makes my inner ear feel different, but I think it’s great on a plane or train. Adaptive tries to intelligently switch between Transparency or Noise Cancellation. All of these active modes burn through your AirPods’ battery at a faster rate.

To recap, it aggressively started switching me into Transparency, then the “Off” option was removed by Apple altogether. (On both iOS and tvOS.) With no warning — just, poof!

After Googling about it, I learned that others were hitting this issue. Apple had indeed removed “Off” but buried the means of bringing it back. Some Googling, tap tap tap, and here’s the first buried setting I had to find:

the first buried setting you need to find

Now the Off option had returned… But not on tvOS? Anyway, I was just thankful to have fixed… oh wait…

The AirPods still kept switching me from Off to Transparency mode. So: all this Googling and time spent researching, and I was merely back to my initial problem! Here’s how I was feeling at this point:

My Airpods have been glitching so hard it has turned me into an airpods hater. No, for the hundredth time, i don't want them in Transparency mode (this still happens to me even after the latest firmware update where you have to manually opt back into having an "Off" state).

Spencer Dailey (@spencerdailey.bsky.social) 2025-01-07T15:50:47.736Z

In frustration, I eventually Googled myself down a rabbit hole where I learned: all of this is likely tied to a relatively new feature called Loud Sound Reduction that only works if AirPods are in an active “Noise Control” mode. So Apple perhaps recently decided that everyone needed this feature enabled, and that’s why they made all these annoying changes to Noise Control? I can only speculate.

Anyway, so surely Loud Sound Reduction can be disabled (so my AirPods would hopefully stop switching to Transparency mode)?

This was a dead end

There it is! But guess what? Nope: that’s a read-only field that looks like a button but isn’t one. Hm. So… I returned to more Googling!… And found that to disable Loud Sounds Reduction you must go to: Settings -> Accessibility -> AirPods -> <Name of your AirPods> -> disable Loud Sounds Reduction.

Wow! That’s a lot of time, taps, and user frustration to merely get something back to how it originally was!

But you know what? tvOS still did not show an “Off” mode for my AirPods Pro 2! I ended up needing to hard-reset my AirPods, change all the settings mentioned above on iOS for a second time, and then let tvOS rediscover them before “Off” would appear there. PHEW!

Finally! I was able to get things back to how they were before. What a journey!

Kind of random, but for those who enjoyed this tale, you may enjoy this email from Bill Gates to leaders at Microsoft, about how hard it was for him to install a piece of software from Microsoft’s website.

Why gripe about Apple products?

I love to write about Apple. Why?

For the most part, Apple productively listens to users, reporters, podcasters, other creators, and so on. Obviously they can’t please everyone: they have long-running disagreements with lots of developers over things like App Store cuts, and they don’t always respond positively to what some think are reasonable suggestions (especially if you’re a regulator). Apple is not without its own problems and hypocrisies, but it still does a better job of listening to feedback than many tech companies.

And Apple is in stark contrast to a handful of childishly spiteful big tech companies. They shall remain nameless in this post, but they have been known to kick users off their platforms, brick users’ products, shadowban users (while simultaneously preaching the evils of shadowbanning), or sic fanboys on you if you publish opinions (or facts) that they don’t like. It’s been happening frequently since mid-2022. It’s straight up targeted censorship (despite the same companies gaslighting on this issue).

Apple seems to be run by adults most of the time and it remains rewarding to write about them.

Case in point, here are a few examples previous posts on Apple that got significant attention (1, 2, and 3). They generally lambasted Apple over product decisions. These “negative” stories about Apple garnered huge traffic (up to the top of reddit). But nowadays, having a similarly negative and popular story about certain other tech companies would come with serious potential downsides. This means, it may turn a best case scenario for a blogger (getting a lot of traffic) into a worst case scenario (being targeted with retaliation). So the ROI for me writing for free about Apple is comparatively much higher, especially as someone who is deeply invested in their platforms (as a user and app developer).

[The iOS UI/robocalls post may have effected some change, as it got to the top of reddit and came out about a year before Apple fixed the issue. I’ve written some positive things about Apple too (1,2).]

Furthermore… Whether for business reasons or not, Apple has at least preached [Steve Jobs clip] about believing in a strong free press in the US for a long time. That stance is refreshing, in an era where anti-press rhetoric and physical violence against reporters have hit a high in the US and prevailing big-tech-supported political movements are categorically painting the press as an enemy of the people. It has become popular for leaders of some other tech companies to broadly bash members of the “legacy” media (a pejorative term for traditional journalists doing their work).

Apple has also historically preached restraint [Steve Jobs clip] in its responses to (what they feel are) biased stories, and the company is better for it.

The leaders at Apple have simply been adults acting like adults, most of the time. This simple fact has led to better products and was part of its journey to becoming the world’s most profitable business in 2021. It sounds obvious but bears repeating [in light of what’s become the rage in culty tech circles these days]: Acting like childish brats toward users, creators, and the press was not a part of Apple’s recipe to becoming the world’s most profitable business. Apple understands the value of keeping critics in their own feedback loop.

These are the reasons I still like to write about Apple in 2025.

Steps to revert the late-2024 changes to your Airpods Pro 2 or Airpods 4

In the last weeks of 2024, Apple changed its newer AirPods to remove the “Off” Noise Control option and (separately) push users into Transparency mode all the time. This seems to be because of their forced rollout of the Loud Sound Reduction feature. These are the steps in software needed to revert the changes. If these changes aren’t reflected on other devices like Apple TV, I found that I needed to hard-reset my AirPods Pro 2, perform these settings changes (in the video), and then have my Apple TV rediscover them.

]]>
https://keydiscussions.com/2025/01/14/apple-opts-airpods-pro-2-and-airpods-4-owners-into-loud-sound-reduction-which-sounds-great-but-forces-users-into-transparency-or-noise-cancellation-modes-with-no-easy-way-to-opt-out/ hacker-news-small-sites-42704331 Tue, 14 Jan 2025 21:45:47 GMT
<![CDATA[Show HN: Simplex: Automate browser workflows using code and natural language]]> thread link) | @marcon680
January 14, 2025 | https://www.simplex.sh/playground | archive.org

Unable to extract article]]>
https://www.simplex.sh/playground hacker-news-small-sites-42704160 Tue, 14 Jan 2025 21:30:14 GMT
<![CDATA[I hated coding, but I learned to love it again]]> thread link) | @true_pk
January 14, 2025 | https://shilin.ca/i-hated-coding-but-i-learned-to-love-it-again/ | archive.org

I quit my software job in October, two years ago. I was tired and extremely frustrated. Things that I enjoyed in the past no longer satisfied me. Those things were mostly related to coding. But like any fire, I started as a slowly burning dumpster until I hit a critical point and exploded. And when it happened, I quit.

on the way to a burnout

I always loved programming. When I was in high school, I set up a wireless networking infrastructure for the school building. Nobody paid me to do so. And few people said thank you. But I did it because I liked doing what most kids would turn their noses away from — tinkering with bits, packets, and a big mess of wires (for a wireless network, ah-ha).

And when I came back from school, I would spend evenings compiling my first Linux Gentoo. Not Ubuntu, not even Arch, but Gentoo. I remember being bored looking at the black screen of my CRT monitor that took most of my desk, waiting for two days, so LibreOffice could finish compiling. Fun times! Ah, god bless binary packages.

But the fun didn’t last long as my school — and later university — was over, and I had to start making money. I knew I could do that by building software for other people, but I had little knowledge of many ways in which it was possible. Because of the traditional script we all grow up with (school → university → stable job → retirement at 65 → death), I did what I knew best at the time — I got a stable job at a big software corporation.

“The university system, once an intellectual crossroad for ideas, is now the largest confirmation bias on the planet, where mass cast opinions are sheathed in “safe spaces” as undebatable truths.” ― M.J. DeMarco, UNSCRIPTED: Life, Liberty, and the Pursuit of Entrepreneurship

Eight years and five companies later, my view of the software world took the most unexpected shape. What I once saw as a fun and thriving environment, full of people living days and nights playing with Arduino, turned out to be one enormous mess, guarded by folks with an ego bigger than Everest.

Instead of shipping a small bug fix at 2 am, like I would normally do back in the day, I would now spend two weeks getting approvals from at least two teammates. And if my change, god forbid, impacted more than a single file of the codebase, it would mean additional reviewers, meetings, and days of delay. C++ codebases meant a twenty-year-old legacy nightmare, with classes lost in infinitely nested polymorphism. Modern languages were better, but they faced another issue: nobody knew what they were doing. There were too many frameworks and ways of producing the same result.

But at the beginning of my career, I fell into the trap of my own inflated ego. And so did I reap what I had sowed. Instead of going into the field that gave me most pleasure — frontend and building apps and websites for customers — I chose what made me look the smartest. Graduating from the top-tier technical university in Russia was no small thing, and I decided my resume had to live up to the name of the university I put in it. When I applied for a job in a performance optimization department, I did it not because I wanted to, but because the department was the hardest to get to, and it seemed totally badass. Knowing PHP is one thing. But knowing — and, of course, showing other people my knowledge of — data structures, linear algebra, probabilities, and discrete mathematics was a whole different story. It meant I wasn’t just cool, I was also smart. And smart meant I was even cooler.

Unfortunately, my first job optimizing Oracle databases was followed by a series of other lower-level jobs touching C++, Java, and Linus knows what else. This wasn’t fun at all. It was the opposite of fun. Thankfully, things got a little more interesting later as I dove into machine learning competitions. That naturally led me to a research job at Gameloft. Unsurprisingly, not because of my ML experience and a Kaggle Master title — who cares about that anyway, — but because I was exceptional at cracking C++ algorithm questions. At Gameloft, I was severely underpaid but happy. I enjoyed the experimentation, the flexibility, and the freedom of art. Anybody in game dev will understand me. As for the rest of you — you really aren’t losing much.

In the end, Gameloft laid off the entire research department because research doesn’t make money, and you need money to publish more games. That’s what game dev is about, after all. This opinion is strictly my own and doesn’t represent the view of the company, obviously.

After Gameloft kicked me out into the abyss of unemployment, getting an ML job was harder than getting a backend job because ML wasn’t really a big thing back in 2019. Above all, I was underqualified for it because gold medals in machine learning competitions didn’t count for anything in recruiters’ eyes. Talk about feeling smart and stupid at the same time. So I went back to more C++ jobs because that’s what I could do, and that was the story my resume told.

recovery

If a flower doesn’t bloom, you fix the environment in which it grows, not the flower. ― Alexander Den Heijer

October 2022, and I finally quit the job. Looking back, it wasn’t a tremendous hit by any measure. I didn’t get depressed, neither did I fall sick for 3 months. But one thing has happened ― I developed a sense of utter distaste for any corporate job in technology. I didn’t know what I wanted to do for work. But I knew a few things. First, trying to fit into corporate culture failed every time I got a new job. It was like being slapped on the face, and with every slap I would lose memory, only to turn my head back and get slapped again. Second, having a big team was a no-no. One developer — great. Two — fine. Three is ok, but I’d rather not. More than that and you can review your own code, thanks.

Finally, legacy codebases. I’d rather troubleshoot my grandma’s printer remotely than deal with that mess again. People who choose to maintain ancient systems are heroes we don't deserve. You have to be either madly in love with what you do or get paid outrageously well. Ideally, both.

Over the first few months, I slowly got back to coding. I took on some personal projects that I had left due to lack of time and motivation. But now I had all the time in the world. Little by little, I started building fun things. Small apps and programs here and there, mostly to satisfy the craving of doing anything at all. Around the same time, I started reading more about startups and entrepreneurship. It seemed alien at first, but the idea grew on me. Why build a product for someone else if I can do it for myself and still get paid for it? Not by my employer, but by my users. I remember my friend Alain gifted me an audio version of “How to Be a Founder” on New Year’s Eve, and I was like “Whoa… I guess I can do it?”. Then I listened to it one more time and applied to a startup incubator.

The year 2023 was my personal version of Man’s Search for Meaning, with me meeting a hundred other founders and trying my hand at a handful of exciting ideas. Not only was I learning about this new world with my eyes wide open, I was doing what I never thought I'd do again — I was coding. And was I having a blast doing it, too! Spinning up infrastructure in the cloud for a new project. Putting together API endpoints overnight to make the MVP work. Publishing an app on the App Store and Google Play. Those were things I’d never done in my previous corporate jobs, and they were the most satisfying things to do.

Since April last year, I’ve been building my own startup. I have customers, and they pay me for the product I’ve built. As is often expected in smaller teams, I also do sales and marketing, and I deeply enjoy it. But most importantly, I’ve finally rebuilt my relationship with programming. It is no longer frustrating. Quite the opposite, it’s greatly rewarding. Like the moment you run a migration on the production database, the change goes live, and you open the customer page to see the new feature you’ve spent the last week building. That moment is priceless. Or when your customer emails you “Thank you, we love you guys!”, and you rush to reply “We love you toooooooo!”, only to stop mid-way. This is better than any drug. It is the drug.

Before finishing, I have two quotes from Naval Ravikant to share. He doesn’t have just two; he has a whole book full of wisdom in the form of quotes. But these two are what made me stick with it and trust the process. I hope they inspire you, too.

1. “If you can’t see yourself working with someone for life, don’t work with them for a day.”

2. “I’m always ‘working‘. It looks like work to them, but it feels like play to me.”

— Naval Ravikant


So, go make work your playground. Then you can’t lose.

#entrepreneurship

]]>
https://shilin.ca/i-hated-coding-but-i-learned-to-love-it-again/ hacker-news-small-sites-42703634 Tue, 14 Jan 2025 20:48:44 GMT
<![CDATA[Proof of location for online polls]]> thread link) | @c-riq
January 14, 2025 | https://ip-vote.com/geolocation_via_latency.html | archive.org

Chris Rieckmann | January 14, 2025

Network Latency Triangulation based Geolocation

Information about a device's physical location can be inferred by measuring the time it takes for signals to travel between the device and a known server location. As the speed of light cannot be exceeded according to the known laws of physics, a maximum possible distance can be established with certainty, based on the signal latency. Multiple measurements to different servers establish circular areas of possible locations on the earth's surface which can then be intersected.

For more context on how this technology enables reliable online polls, see our article on IP-based polls as a proxy for popular opinion.

Network Latency Triangulation Diagram
Possible client location discs (grey) from 8 server latency measurements and the likely client location (red) within the intersection. The actual client location in this measurement is Amsterdam, Netherlands.

Key Advantages:

  • Cannot be manipulated, unlike GPS-derived coordinates, which can be altered by the user's device before being relayed to the server
  • Works even when location services are disabled, provided that the user consents to its application
  • Can provide supportive evidence of VPN/proxy usage, when the latency is too high for all server locations

How It Works

The process relies on the physical limitations of data transmission through the internet infrastructure:

  • Light travels through fiber optic cables at approximately 2/3 the speed of light in vacuum
  • Routing inefficiencies and electronics typically increase the signal latency by 20% or more. This range can be represented as a probabilistic distribution.
  • The maximum theoretical distance between two points can be calculated from these limitations and the measured latency (a short sketch of this calculation follows this list)
  • Multiple measurements to different servers establish circular areas of possible locations on the earth's surface, which can then be intersected
  • When trying to alter the apparent location, a user can only introduce delays, which result in higher location uncertainty; users cannot reduce the network latency below the physical limits mentioned above
  • Users with a high latency to all servers can be excluded from polls, as this is a strong indicator of VPN/proxy usage
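
To make the distance bound concrete, here is a rough Python sketch of the calculation above. The constants and the 40 ms example are illustrative and are not taken from ip-vote.com's implementation.

C_VACUUM_KM_S = 299_792        # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3           # signals in fiber travel at roughly 2/3 c
TYPICAL_OVERHEAD = 1.20        # routing/electronics typically add 20% or more latency

def hard_distance_bound_km(rtt_seconds: float) -> float:
    # Even at vacuum light speed, the client can be no farther than this.
    return (rtt_seconds / 2) * C_VACUUM_KM_S

def typical_distance_km(rtt_seconds: float) -> float:
    # A more realistic estimate assuming fiber speed and typical routing overhead.
    return (rtt_seconds / 2) / TYPICAL_OVERHEAD * C_VACUUM_KM_S * FIBER_FACTOR

rtt = 0.040  # a 40 ms round trip
print(round(hard_distance_bound_km(rtt)), round(typical_distance_km(rtt)))  # ~5996 km, ~3331 km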

Application in voting security

Latency-based geolocation can help protect poll integrity by:

  • Detecting when poll responses originate from outside the intended geographic region
  • Identifying attempts to manipulate polls through elevated VPN/proxy usage
  • Providing an additional layer of verification beyond IP-address geolocation and IP-address reputation

Successfully manipulating a poll which employs this method would require following efforts and resources:

  • Gaining control over a large number of devices in the target geographic region for submitting votes through those devices
  • Alternatively, intercepting and modifying requests at multiple points in the internet routing infrastructure where the servers are connected
  • Making sure that the manipulation remains unnoticed

Latency-based geolocation significantly raises the cost of manipulation attempts and can provide very high poll integrity if employed in conjunction with other mitigations, such as excluding known data center IP addresses and analyzing response patterns. Additionally, investigating complaints from potential victims whose IP address appears to have already been used to vote in a poll without their knowledge can help uncover manipulation attempts.

More about this project:

Technical Implementation

In our implementation, we added a few additional parts to make it work:

  • As the clocks of the client and the servers may not be synchronized, we first approximate the difference between the clocks (using the Network Time Protocol algorithm). This clock difference may be imperceptibly short for humans but may nonetheless be significant for the latency measurement. (A simplified sketch of this offset estimate follows the sequence chart below.)
  • To mitigate certain manipulation attempts, the master server first generates a random number and sends it to the client's device, which relays it to the latency measurement servers. This prevents the client from sending latency measurement requests ahead of time, which would allow it to pretend to be closer to a server than it actually is.
  • Before measuring latencies, the client's device sends requests to all servers to establish HTTPS sessions in advance. Creating an HTTPS session requires multiple network round trips and therefore considerable time, and would add unnecessary noise to the actual latency measurements.
Message Sequence Chart for Latency Measurement
Message sequence chart showing the latency measurement process between client and multiple measurement servers.
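
As a simplified sketch of that clock-offset estimate, here are the standard NTP formulas; the timestamp names follow the usual convention and are not taken from the ip-vote.com code.

def estimate_offset_and_rtt(t0: float, t1: float, t2: float, t3: float) -> tuple[float, float]:
    # t0: client sends request (client clock)
    # t1: server receives request (server clock)
    # t2: server sends response (server clock)
    # t3: client receives response (client clock)
    offset = ((t1 - t0) + (t2 - t3)) / 2   # estimated server-minus-client clock difference
    rtt = (t3 - t0) - (t2 - t1)            # round-trip time excluding server processing
    return offset, rtt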

Signal transmission outside the internet infrastructure

In the location inference described above, the reduced speed of light inside glass fibers is assumed, which is 2/3 of the speed of light in vacuum or air. Sending signals through the atmosphere or space may therefore enable manipulation of the apparent location to some degree. One conceivable approach is to use long-range radio signals travelling through the atmosphere over large distances; another is to use SpaceX's Starlink satellite infrastructure. Both could potentially be used to distribute the random number faster than the conventional internet infrastructure allows.

However, fully exploiting this approach would also require spoofing the IP addresses of the distributed devices participating in the concerted manipulation attempt so that they appear as a single device, and the use of HTTPS would further complicate its realisation. Even a successful manipulation of this kind would only shift the apparent location to a certain degree, as the speed of light in glass fibers is of the same order of magnitude as the speed of light in air or vacuum.

There are further implementation hurdles. Starlink's satellites typically orbit at an altitude of roughly 500 km, which slows the signal for short distances on the surface. Similarly, generating and receiving radio signals would introduce additional latencies that significant engineering effort would be needed to compensate for. Finally, to successfully manipulate a significant poll, this approach would need to be applied to a large number of votes without being noticed. Considering the necessary resources and effort, these manipulation approaches seem impractical for most polls, even ones with a relatively high degree of societal impact and strong incentives for manipulation.

Conclusion

Network latency triangulation based geolocation is a method to determine the physical location of a device with a high degree of confidence. It can be used to detect when poll responses originate from outside the intended geographic region, and to provide an additional layer of verification beyond IP-address geolocation and IP-address reputation. For poll outcomes to be truly reliable, location measurements should be performed by multiple independent audited entities.

]]>
https://ip-vote.com/geolocation_via_latency.html hacker-news-small-sites-42703422 Tue, 14 Jan 2025 20:31:19 GMT
<![CDATA[How rqlite is tested]]> thread link) | @otoolep
January 14, 2025 | https://philipotoole.com/how-is-rqlite-tested/ | archive.org

rqlite is a lightweight, open-source, distributed relational database written in Go, and built on SQLite and Raft. With its origins dating back to 2014, its design has always prioritized reliability and quality. The robustness of rqlite is also a testament to its disciplined testing strategy: after more than 10 years of development and deployments, users have reported fewer than 10 instances of panics in production.

Testing a distributed system like rqlite is no small feat. It requires careful consideration of various layers: from individual components to the entire system in operation. Let’s explore how rqlite is tested, following its philosophy of maintaining quality without unnecessary complexity.


The Testing Pyramid: An Effective Approach

Testing rqlite adheres to the well-known testing pyramid, which prioritizes unit tests as the foundation, supported by integration tests, and capped with minimal end-to-end (E2E) tests. This strategy reflects decades of software development experience, ensuring test suites remain efficient, targeted, and easy to debug — and in my experience this approach works.

Unit Testing: The Core of Quality

At the base of the pyramid lies unit testing, covering isolated components. Unit testing dominates rqlite’s test suite because it offers the best balance of speed and precision. Given that rqlite’s database layer is built around SQLite and a “shared nothing” architecture, most database-related functionality can be reliably tested with unit tests.

Testing is also a huge part of the design process. If a component cannot be unit-tested easily, it often signals issues with its design. A little dependency injection during testing is a good thing, but too much indicates an over-reliance on other components. Meeting the goal of easy unit testing means clean interfaces, helping components remain focused on a single task.
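
To illustrate the point (hypothetical Python, not rqlite's actual Go code): a component that receives its dependencies explicitly can be unit-tested with small fakes, with no real disk or clock involved.

class Archiver:
    """Hypothetical component: writes a timestamped archive via injected dependencies."""

    def __init__(self, store, clock):
        self.store = store   # anything with a save(name) method
        self.clock = clock   # anything with a now() method

    def archive(self):
        return self.store.save(f"archive-{self.clock.now()}")

class FakeStore:
    def __init__(self):
        self.saved = []

    def save(self, name):
        self.saved.append(name)
        return name

class FixedClock:
    def now(self):
        return 1234

def test_archive_names_include_timestamp():
    store = FakeStore()
    assert Archiver(store, FixedClock()).archive() == "archive-1234"
    assert store.saved == ["archive-1234"]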

Let’s look at the numbers. As of version 8.34.0, the entire rqlite code base is approximately 75,000 lines long (including tests, but excluding imported packages). Of that, rqlite’s unit test suite comprises 27,000 lines of source code, making it the largest testing investment. Despite its breadth, the entire suite runs in just a few minutes, enabling frequent testing during development.


System-Level Testing: Validating Consensus

Above unit testing lies system-level testing (also known as integration testing), which focuses on the interplay between the Raft consensus module and SQLite. Since distributed consensus is at the core of rqlite, the correctness of this layer is crucial. Tests in this category validate:

  • Replication of SQLite statements across nodes.
  • Behavior of read operations at different consistency levels.
  • Resilience during cluster disruptions, such as node failures and subsequent recoveries, as well as Leader elections.

System tests include both single-node and multi-node configurations, ensuring the database operates correctly under varying cluster conditions. As of version 8.34.0, approximately 7,000 lines of system-level tests exist, offering comprehensive coverage of these interactions. This test suite is also written in Go, which means it also runs relatively quickly.


End-to-End Testing: A Minimal Layer

End-to-end testing in rqlite serves as a smoke check, verifying that the system starts, clusters, and performs basic operations. Written in Python, these tests launch real rqlite clusters to ensure “happy path” functionality, guarding against embarrassing issues like a cluster failing to start due to a bug in command-line flag parsing.
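
As a flavour of what such a smoke check might look like, here is a hedged sketch, not code from rqlite's actual suite. It assumes a single rqlited node is already running and listening on localhost:4001, and uses rqlite's documented HTTP API endpoints /db/execute and /db/query; treat the details as illustrative.

import requests

BASE = "http://localhost:4001"

def test_basic_round_trip():
    # Create a table and insert a row via the execute endpoint.
    resp = requests.post(
        f"{BASE}/db/execute",
        json=[
            "CREATE TABLE foo (id INTEGER NOT NULL PRIMARY KEY, name TEXT)",
            'INSERT INTO foo(name) VALUES("fiona")',
        ],
        timeout=5,
    )
    resp.raise_for_status()

    # Read the row back: the happy path works end to end if the value round-trips.
    resp = requests.get(f"{BASE}/db/query", params={"q": "SELECT * FROM foo"}, timeout=5)
    resp.raise_for_status()
    assert "fiona" in resp.text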

End-to-end tests are deliberately limited to scenarios that cannot be tested at lower levels. Over-reliance on end-to-end testing is avoided because debugging failures in such tests can become prohibitively costly. For instance, a misconfigured dependency deep in the stack might surface in an end-to-end test, but tracing the root cause would require navigating through numerous layers.

A practical example of end-to-end testing is verifying backups to S3. End-to-end testing is useful here because setting up AWS credentials solely for unit testing would be cumbersome and, perhaps, impractical for other developers who wish to run the unit tests. While this approach does mean that S3-related development for rqlite is slower compared to other features, the trade-off is justified. The backup system rarely undergoes changes, so the added complexity of end-to-end testing is worth the effort to ensure reliability.

For version 8.34.0, only 5,000 lines of end-to-end tests exist, demonstrating a targeted approach.


Performance Testing: Pushing the Limits

Beyond functional correctness, rqlite undergoes performance testing to evaluate its limits under load.

A notable example involves testing with large SQLite databases, sometimes exceeding 2GB. Such scenarios highlight bottlenecks like rqlite’s memory management or disk write latencies, which are intrinsic to its architecture. Generating such large datasets efficiently remains an ongoing challenge, with potential solutions involving prebuilt SQLite databases stored in cloud buckets.

Performance testing also ensures stability, identifying issues like memory leaks or unexpected Leader elections under stress.


Lessons Learned

Testing rqlite has taught me valuable lessons, many of which resonate beyond database development:

  1. Start testing at the start: Unit testing is the most effective way to build confidence in your system. Don’t delay writing unit tests during development. If a bug exists, you’ll likely find it faster here than in an integration or end-to-end test.
  2. Keep test code simple. Test suites are not the place for relentless refactoring or the DRY mindset. It’s more important that test code is straightforward and easy to understand, even if that means writing more boilerplate than you otherwise would.
  3. Check your tests. When writing a test, it’s a good practice to temporarily invert the expected result and run the test again. A properly written test should fail in this scenario. Surprisingly this isn’t always the case, as errors in test code can sometimes go unnoticed. To avoid this, always take a moment to sanity-check your tests. It’s a small step that ensures your tests are reliable and truly doing their job.
  4. Don’t ignore test failures. Any test failure, no matter how difficult to understand, no matter how rare, is telling you something about your software — potentially something you don’t understand. Those hard-to-debug test cases often reveal a critical flaw in your code. Treat them as a gift and fix them.
  5. Maximize determinism. Build mechanisms into your system so you can trigger, on demand, what are normally automatic processes. This lets you test how the system behaves when those operations occur. This approach is used in rqlite to test Raft snapshotting, which normally runs at semi-random intervals but can be explicitly triggered as needed during testing. (A generic sketch of this pattern follows the list.)
  6. Be Deliberate: Adding tests at higher levels must be justified. Excessive integration or end-to-end tests can quickly bog down development and debugging.
  7. Adapt and Iterate: For example, performance tests revealed that fsync calls were the primary bottleneck, leading to further optimizations in disk usage – such as compressing Raft log entries before writing them to disk.
  8. Efficiency Matters: With a suite that runs in a matter of minutes, I can iterate rapidly with confidence, a crucial advantage in maintaining an active open-source project.
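
As a generic, hypothetical illustration of lesson 5 (in Python, not rqlite's actual Go code): a background loop that normally runs on a timer but exposes a hook so tests can force it to run immediately.

import threading

class SnapshotLoop:
    """Hypothetical background task: snapshots periodically, or on demand."""

    def __init__(self, do_snapshot, interval_s=300):
        self.do_snapshot = do_snapshot   # injected callable that performs the snapshot
        self.interval_s = interval_s
        self._wake = threading.Event()

    def trigger_now(self):
        # Called by tests (or an admin endpoint) to force a snapshot immediately.
        self._wake.set()

    def run_forever(self):
        while True:
            self._wake.wait(timeout=self.interval_s)  # wakes early if triggered
            self._wake.clear()
            self.do_snapshot()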

Quality Matters

By adhering to the testing pyramid and focusing on targeted, efficient tests, rqlite maintains high quality while minimizing overhead. Whether through unit tests for component reliability, system tests for distributed consensus, or end-to-end tests for sanity checks, every layer serves a purpose.

As rqlite continues to evolve, so will its testing practices. With distributed systems becoming increasingly complex, maintaining simplicity in testing will remain a cornerstone of its design philosophy. After all, the goal is not just to build a database but to build one that works reliably, and is easy to operate, in the real world.

]]>
https://philipotoole.com/how-is-rqlite-tested/ hacker-news-small-sites-42703282 Tue, 14 Jan 2025 20:21:47 GMT
<![CDATA[HDM-1 beats GPT-4o-mini in LLM hallucination detection tasks]]> thread link) | @just_in_ai
January 14, 2025 | https://www.aimon.ai/sandbox | archive.org

AIMon Sandbox

The sandbox runs a set of detections on submitted text and reports a score for each. Example output: Hallucination 0.99/1, Instruction Adherence 1/1, Context Quality 3 issues, Conciseness 0.20/1, Completeness 0.17/1, Toxicity 0.34/1.


]]>
https://www.aimon.ai/sandbox hacker-news-small-sites-42703142 Tue, 14 Jan 2025 20:11:45 GMT
<![CDATA[Kafka Transactions Explained (Twice)]]> thread link) | @rtukpe
January 14, 2025 | https://www.warpstream.com/blog/kafka-transactions-explained-twice | archive.org

Understand Kafka Transactions by Comparing Apache Kafka's Implementation to WarpStream's

Many Kafka users love the ability to quickly dump a lot of records into a Kafka topic and are happy with the fundamental Kafka guarantee that Kafka is durable. Once a producer has received an ACK after producing a record, Kafka has safely made the record durable and reserved an offset for it. After this, all consumers will see this record when they have reached this offset in the log. If any consumer reads the topic from the beginning, each time they reach this offset in the log they will read that exact same record.

In practice, when a consumer restarts, they almost never start reading the log from the beginning. Instead, Kafka has a feature called “consumer groups” where each consumer group periodically “commits” the next offset that they need to process (i.e., the last correctly processed offset + 1), for each partition. When a consumer restarts, they read the latest committed offset for a given topic-partition (within their “group”) and start reading from that offset instead of the beginning of the log. This is how Kafka consumers track their progress within the log so that they don’t have to reprocess every record when they restart.

This means that it is easy to write an application that reads each record at least once: it commits its offsets periodically to not have to start from the beginning of each partition each time, and when the application restarts, it starts from the latest offset it has committed. If your application crashes while processing records, it will start from the latest committed offsets, which are just a bit before the records that the application was processing when it crashed. That means that some records may be processed more than once (hence the at least once terminology) but we will never miss a record.
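
As a rough sketch of this at-least-once pattern (illustrative, not from the article), using the confluent-kafka Python client with auto-commit disabled; the broker address, topic, group id, and the process() function are placeholders:

from confluent_kafka import Consumer

def process(msg):
    # Application-specific work; here we just print the record.
    print(msg.key(), msg.value())

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "clicks-processor",
    "enable.auto.commit": False,      # commit only after processing succeeds
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["clicks"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    process(msg)                                  # may run more than once after a crash
    consumer.commit(msg, asynchronous=False)      # commits offset msg.offset() + 1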

This is sufficient for many Kafka users, but imagine a workload that receives a stream of clicks and wants to store the number of clicks per user per hour in another Kafka topic. It will read many records from the source topic, compute the count, write it to the destination topic and then commit in the source topic that it has successfully processed those records. This is fine most of the time, but what happens if the process crashes right after it has written the count to the destination topic, but before it could commit the corresponding offsets in the source topic? The process will restart, ask Kafka what the latest committed offset was, and it will read records that have already been processed, records whose count has already been written in the destination topic. The application will double-count those clicks. 

Unfortunately, committing the offsets in the source topic before writing the count is also not a good solution: if the process crashes after it has managed to commit these offsets but before it has produced the count in the destination topic, we will forget these clicks altogether. The problem is that we would like to commit the offsets and the count in the destination topic as a single, atomic operation.

And this is exactly what Kafka transactions allow.

A Closer Look At Transactions in Apache Kafka

At a very high level, the transaction protocol in Kafka makes it possible to atomically produce records to multiple different topic-partitions and commit offsets to a consumer group at the same time.

Let us take an example that’s simpler than the one in the introduction. It’s less realistic, but also easier to understand because we’ll process the records one at a time.

Imagine your application reads records from a topic t1, processes the records, and writes its output to one of two output topics: t2 or t3. Each input record generates one output record, either in t2 or in t3, depending on some logic in the application.

Without transactions it would be very hard to make sure that there are exactly as many records in t2 and t3 as in t1, each one of them being the result of processing one input record. As explained earlier, it would be possible for the application to crash immediately after writing a record to t3, but before committing its offset, and then that record would get re-processed (and re-produced) after the consumer restarted.

Using transactions, your application can read two records, process them, write them to the output topics, and then as a single atomic operation, “commit” this transaction that advances the consumer group by two records in t1 and makes the two new records in t2 and t3 visible.

If the transaction is successfully committed, the input records will be marked as read in the input topic and the output records will be visible in the output topics.

Every Kafka transaction has an inherent timeout, so if the application crashes after writing the two records, but before committing the transaction, then the transaction will be aborted automatically (once the timeout elapses). Since the transaction is aborted, the previously written records will never be made visible in topics 2 and 3 to consumers, and the records in topic 1 won’t be marked as read (because the offset was never committed).

So when the application restarts, it can read these messages again, re-process them, and then finally commit the transaction. 
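
Here is a hedged sketch of that consume-process-produce loop using the confluent-kafka Python client's transactional API. The topic names, group id, transactional.id, and routing logic are placeholders, and error handling is omitted for brevity.

from confluent_kafka import Consumer, Producer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "t1-processor",
    "isolation.level": "read_committed",   # skip records from aborted transactions
    "enable.auto.commit": False,           # offsets are committed inside the transaction
    "auto.offset.reset": "earliest",
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "t1-processor-1",  # must be stable across restarts of this instance
})

consumer.subscribe(["t1"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error() or msg.value() is None:
        continue

    producer.begin_transaction()
    # Route the output record to t2 or t3 based on some application logic.
    out_topic = "t2" if len(msg.value()) % 2 == 0 else "t3"
    producer.produce(out_topic, msg.value())

    # Commit the consumer's progress on t1 as part of the same transaction.
    next_offset = TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)
    producer.send_offsets_to_transaction(
        [next_offset], consumer.consumer_group_metadata()
    )
    producer.commit_transaction()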

Going Into More Details

That all sounds nice, but how does it actually work? If the client actually produced two records before it crashed, then surely those records were assigned offsets, and any consumer reading topic 2 could have seen those records? Is there a special API that buffers the records somewhere and produces them exactly when the transaction is committed and forgets about them if the transaction is aborted? But then how would it work exactly? Would these records be durably stored before the transaction is committed?

The answer is reassuring.

When the client produces records that are part of a transaction, Kafka treats them exactly like the other records that are produced: it writes them to as many replicas as you have configured in your acks setting, it assigns them an offset and they are part of the log like every other record.

But there must be more to it, because otherwise the consumers would immediately see those records and we’d run into the double processing issue. If the transaction’s records are stored in the log just like any other records, something else must be going on to prevent the consumers from reading them until the transaction is committed. And what if the transaction doesn’t commit, do the records get cleaned up somehow?

Interestingly, as soon as the records are produced, the records are in fact present in the log. They are not magically added when the transaction is committed, nor magically removed when the transaction is aborted. Instead, Kafka leverages a technique similar to Multiversion Concurrency Control.

Kafka consumer clients define a fetch setting that is called the “isolation level”. If you set this isolation level to read_uncommitted, your consumer application will actually see records from in-progress and aborted transactions. But if you fetch in read_committed mode, two things will happen, and these two things are the magic that makes Kafka transactions work.

First, Kafka will never let you read past the first record that is still part of an undecided transaction (i.e., a transaction that has not been aborted or committed yet). This value is called the Last Stable Offset, and it will be moved forward only when the transaction that this record was part of is committed or aborted. To a consumer application in read_committed mode, records that have been produced after this offset will all be invisible.

In my example, you will not be able to read the records from offset 2 onwards, at least not until the transaction touching them is either committed or aborted.

Second, in each partition of each topic, Kafka remembers all the transactions that were ever aborted and returns enough information for the Kafka client to skip over the records that were part of an aborted transaction, making your application think that they are not there.

Yes, when you consume a topic and you want to see only the records of committed transactions, Kafka actually sends all the records to your client, and it is the client that filters out the aborted records before it hands them out to your application.

In our example let’s say a single producer, p1, has produced the records in this diagram. It created 4 transactions.

  • The first transaction starts at offset 0 and ends at offset 2, and it was committed.
  • The second transaction starts at offset 3 and ends at offset 6 and it was aborted.
  • The third transaction contains only offset 8 and it was committed.
  • The last transaction is still ongoing.

The client, when it fetches the records from the Kafka broker, needs to be told that it needs to skip offsets 3 to 6. For this, the broker returns an extra field called AbortedTransactions in the response to a Fetch request. This field contains a list of the starting offset (and producer ID) of all the aborted transactions that intersect the fetch range. But the client needs to know not only about where the aborted transactions start, but also where they end.

In order to know where each transaction ends, Kafka inserts a control record that says “the transaction for this producer ID is now over” in the log itself. The control record at offset 2 means “the first transaction is now over”. The one at offset 7 says “the second transaction is now over” etc. When it goes through the records, the Kafka client reads this control record and understands that we should stop skipping the records for this producer now.

It might look like inserting the control records in the log, rather than simply returning the last offsets in the AbortedTransactions array, is unnecessarily complicated, but it’s necessary. Explaining why is outside the scope of this blogpost, but it’s due to the distributed nature of the consensus in Apache Kafka: the transaction controller chooses when the transaction aborts, but the broker that holds the data needs to choose exactly at which offset this happens.

How It Works in WarpStream

In WarpStream, agents are stateless so all operations that require consensus are handled within the control plane. Each time a transaction is committed or aborted, the system needs to reach a consensus about the state of this transaction, and at what exact offsets it got committed or aborted. This means the vast majority of the logic for Kafka transactions had to be implemented in the control plane. The control plane receives the request to commit or abort the transaction, and modifies its internal data structures to indicate atomically that the transaction has been committed or aborted. 

We modified the WarpStream control plane to track information about transactional producers. It now remembers which producer ID each transaction ID corresponds to, and makes note of the offsets at which transactions are started by each producer.

When a client wants to either commit or abort a transaction, they send an EndTxnRequest and the control plane now tracks these as well:

  • When the client wants to commit a transaction, the control plane simply clears the state that was tracking the transaction as open: all of the records belonging to that transaction are now part of the log “for real”, so we can forget that they were ever part of a transaction in the first place. They’re just normal records now.
  • When the client wants to abort a transaction though, there is a bit more work to do. The control plane saves the start and end offset for all of the topic-partitions that participated in this transaction because we’ll need that information later in the fetch path to help consumer applications skip over these aborted records.

In the previous section, we explained that the magic lies in two things that happen when you fetch in read_committed mode.

The first one is simple: WarpStream prevents read_committed clients from reading past the Last Stable Offset. It is easy because the control plane tracks ongoing transactions. For each fetched partition, the control plane knows if there is an active transaction affecting it and, if so, it knows the first offset involved in that transaction. When returning records, it simply tells the agent to never return records after this offset.

The Problem With Control Records

But, in order to implement the second part exactly like Apache Kafka, whenever a transaction is either committed or aborted, the control plane would need to insert a control record into each of the topic-partitions participating in the transaction. 

This means that the control plane would need to reserve an offset just for this control record, whereas usually the agent reserves a whole range of offsets, for many records that have been written in the same batch. This would mean that the size of the metadata we need to track would grow linearly with the number of aborted transactions. While this was possible, and while there were ways to mitigate this linear growth, we decided to avoid this problem entirely, and skip the aborted records directly in the agent. Now, let’s take a look at how this works in more detail.

Hacking the Kafka Protocol a Second Time

Data in WarpStream is not stored exactly as serialized Kafka batches like it is in Apache Kafka. On each fetch request, the WarpStream Agent needs to decompress and deserialize the data (stored in WarpStream’s custom format) so that it can create actual Kafka batches that the client can decode. 

Since WarpStream is already generating Kafka batches on the fly, we chose to depart from the Apache Kafka implementation and simply “skip” the records that are aborted in the Agent. This way, we don’t have to return the AbortedTransactions array, and we can avoid generating control records entirely.

Let’s go back to our previous example where Kafka returns these records as part of the response to a Fetch request, alongside the AbortedTransactions array with the three aborted transactions.

Instead, WarpStream would return a batch to the client that looks like this. 

The aborted records have already been skipped by the agent and are not returned. The AbortedTransactions array is returned empty.

Note also that WarpStream does not reserve offsets for the control records on offsets 2, 7 and 9, only the actual records receive an offset, not the control records.

You might be wondering how it is possible to represent such a batch, but it’s easy: the serialization format has to support holes like this because compacted topics (another Apache Kafka feature) can create such holes.

An Unexpected Complication (And a Second Protocol Hack)

Something we had not anticipated though, is that if you abort a lot of records, the resulting batch that the server sends back to the client could contain nothing but aborted records.

In Kafka, this will mean sending one (or several) batches with a lot of data that needs to be skipped. All clients are implemented in such a way that this is possible, and the next time the client fetches some data, it asks for offset 11 onwards, after skipping all those records.

In WarpStream, though, it’s very different. The batch ends up being completely empty.

And clients are not used to this at all. In the clients we have tested, franz-go and the Java client parse this batch correctly and understand it is an empty batch that represents the first 10 offsets of the partition, and correctly start their next fetch at offset 11.

All clients based on librdkafka, however, do not understand what this batch means. Librdkafka thinks the broker tried to return a message but couldn’t because the client had advertised a fetch size that is too small, so it retries the same fetch with a bigger buffer until it gives up and throws an error saying:

Message at offset XXX might be too large to fetch, try increasing receive.message.max.bytes

To make this work, the WarpStream Agent creates a fake control record on the fly, and places it as the very last record in the batch. We set the value of this record to mean “the transaction for producer ID 0 is now over” and since 0 is never a valid producer ID, this has no effect.

The Kafka clients, including librdkafka, will understand that this is a batch where no records need to be sent to the application, and the next fetch is going to start at offset 11.

What About KIP-890?

Recently a bug was found in the Apache Kafka transactions protocol. It turns out that the existing protocol, as defined, could allow, in certain conditions, records to be inserted in the wrong transaction, or transactions to be incorrectly aborted when they should have been committed, or committed when they should have been aborted. This is true, although it happens only in very rare circumstances.

The scenario in which the bug can occur goes something like this: let’s say you have a Kafka producer starting a transaction T1 and writing a record in it, then committing the transaction. Unfortunately the network packet asking for this commit gets delayed on the network and so the client retries the commit, and that packet doesn’t get delayed, so the commit succeeds.

Now T1 has been committed, so the producer starts a new transaction T2, and writes a record in it too. 

Unfortunately, at this point, the Kafka broker finally receives the packet to commit T1 but this request is also valid to commit T2, so T2 is committed, although the producer does not know about it. If it then needs to abort it, the transaction is going to be torn in half: some of it has already been committed by the lost packet coming in late, and the broker will not know, so it will abort the rest of the transaction.

The fix is a change in the Kafka protocol, which is described in KIP-890: every time a transaction is committed or aborted, the client will need to bump its “epoch” and that will make sure that the delayed packet will not be able to trigger a commit for the newer transaction created by a producer with a newer epoch.

Support for this new KIP will be released soon in Apache Kafka 4.0, and WarpStream already supports it. When you start using a Kafka client that’s compatible with the newer version of the API, this problem will never occur with WarpStream.

Conclusion

Of course there are a lot of other details that went into the implementation, but hopefully this blog post provides some insight into how we approached adding the transactional APIs to WarpStream. If you have a workload that requires Kafka transactions, please make sure you are running at least v611 of the agent, set a transactional.id property in your client and stream away. And if you've been waiting for WarpStream to support transactions before giving it a try, feel free to get started now.

]]>
https://www.warpstream.com/blog/kafka-transactions-explained-twice hacker-news-small-sites-42701174 Tue, 14 Jan 2025 18:03:48 GMT
<![CDATA[Show HN: WASM-powered codespaces for Python notebooks on GitHub]]> thread link) | @mscolnick
January 14, 2025 | https://docs.marimo.io/guides/publishing/playground/#open-notebooks-hosted-on-github | archive.org

Our online playground lets you create and share marimo notebooks for free, without creating an account.

Playground notebooks are great for embedding in other web pages — all the embedded notebooks in marimo's own docs are playground notebooks. They are also great for sharing via links.

Try our playground! Just navigate to https://marimo.new.

WebAssembly notebooks only

Currently, the online playground only allows the creation of WebAssembly notebooks. These are easy to share and embed in other web pages, but have some limitations in packages and performance.

The notebook embedded below is a playground notebook!

Creating and sharing playground notebooks

Playground notebooks run at marimo.app.

New notebooks

To create a new playground notebook, visit https://marimo.new.

Think of marimo.new as a scratchpad for experimenting with code, data, and models and for prototyping tools, available to you at all times and on all devices.

Saving playground notebooks

When you save a WASM notebook, a copy of your code is saved to your web browser's local storage. When you return to marimo.app, the last notebook you worked on will be re-opened. You can also click a button to save your notebook to the Community Cloud.

At marimo.app, save your notebook and then click the Create permalink button to generate a shareable permalink to your notebook.

Please be aware that marimo permalinks are publicly accessible.

Open notebooks hosted on GitHub

To open notebooks hosted on GitHub in the playground, just navigate to https://marimo.app/path/to/notebook.py. For example: https://marimo.app/github.com/marimo-team/marimo/blob/main/examples/ui/slider.py.

Use our bookmarklet!

For a convenient way to create notebooks from GitHub, drag and drop the following button to your bookmarks bar:

Open in marimo

Clicking the bookmark when you are viewing a notebook will open it in marimo.app.

From Jupyter notebooks

You can also create Playground notebooks from Jupyter notebooks hosted on GitHub. marimo will attempt to automatically convert the notebook to a marimo notebook.

Including data files

Notebooks created from GitHub links have the entire contents of the repository mounted into the notebook's filesystem. This lets you work with files using regular Python file I/O!

When constructing paths to data files, make sure to use mo.notebook_dir() to ensure that paths work both locally and in the playground.
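
For example (a minimal sketch; the data/results.csv path is hypothetical), reading a file that lives in the same repository as the notebook:

import marimo as mo

# Resolve the data file relative to the notebook so it works locally and in the playground.
data_path = mo.notebook_dir() / "data" / "results.csv"
text = data_path.read_text()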

Open in marimo badge

Include an "open in marimo" badge in your README to link to playground notebooks hosted on GitHub:

Open with marimo

Replace GITHUB_URL with the URL to a notebook on GitHub.

[![Open with marimo](https://marimo.io/shield.svg)](https://marimo.app/GITHUB_URL)

Replace GITHUB_URL with the URL to a notebook on GitHub.

<a href="https://marimo.app/GITHUB_URL" target="_blank">
    <img alt="Open in marimo" src="https://marimo.io/shield.svg" />
</a>

Creating playground notebooks from local notebooks

In the marimo editor's notebook action menu, use Share > Create WebAssembly link to get a marimo.app/... URL representing your notebook:

WASM notebooks come with common Python packages installed, but you may need to install additional packages using micropip.
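
For example, a cell like the following can install an extra package at runtime (a sketch; the package name is arbitrary, and it assumes top-level await is available in the WASM notebook cell):

import micropip

await micropip.install("beautifulsoup4")   # fetches a pure-Python wheel into the WASM runtime

import bs4  # the package is now importable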

The obtained URL encodes your notebook code as a parameter, so it can be quite long. If you want a URL that's easier to share, you can create a shareable permalink.

Configuration

Your marimo.app URLs can be configured using the following parameters.

Read-only mode

To view a notebook in read-only mode, with code cells locked, append &mode=read to your URL's list of query parameters (or ?mode=read if your URL doesn't have a query string).

Example:

  • https://marimo.app/l/83qamt?mode=read

To hide the marimo.app header, append &embed=true to your URL's list of query parameters (or ?embed=true if your URL doesn't have a query string).

Example:

  • https://marimo.app/l/83qamt?embed=true
  • https://marimo.app/l/83qamt?mode=read&embed=true

See the section on embedding for examples of how to embed marimo notebooks in your own webpages.

Excluding code

By default, WASM notebooks expose your Python code to viewers. If you've enabled read-only mode, you can exclude code with &include-code=false. If you want to include code but have it be hidden by default, use the parameter &show-code=false.

A sufficiently determined user would still be able to obtain your code, so don't think of this as a security feature; instead, think of it as an aesthetic or practical choice.

Embedding in other web pages

WASM notebooks can be embedded into other webpages using the HTML <iframe> tag.

Embedding a blank notebook

Use the following snippet to embed a blank marimo notebook into your web page, providing your users with an interactive code playground.

<iframe
  src="https://marimo.app/l/aojjhb?embed=true"
  width="100%"
  height="500"
  frameborder="0"
></iframe>

Embedding an existing notebook

To embed existing marimo notebooks into a webpage, first, obtain a URL to your notebook, then put it in an iframe.

<iframe
  src="https://marimo.app/l/c7h6pz?embed=true"
  width="100%"
  height="500"
  frameborder="0"
></iframe>

Embedding an existing notebook in read-only mode

You can optionally render embedded notebooks in read-only mode by appending &mode=read to your URL.

<iframe
  src="https://marimo.app/l/c7h6pz?mode=read&embed=true"
  width="100%"
  height="500"
  frameborder="0"
></iframe>
]]>
https://docs.marimo.io/guides/publishing/playground/#open-notebooks-hosted-on-github hacker-news-small-sites-42700852 Tue, 14 Jan 2025 17:46:41 GMT
<![CDATA[Using coding skills to make passive income]]> thread link) | @czue
January 14, 2025 | https://www.coryzue.com/writing/solopreneur/ | archive.org

In 2017, I stepped down from my job as CTO of a 150-person software company to see if I could make money selling my own software on the Internet.

Eight years later, I am now a full-time “solopreneur”—running a portfolio of revenue-generating software products as a full-time job. I now set my own hours, take vacation whenever I want, and, amazingly, earn more than I ever did as CTO.

In the talk below, I’ll share how I did it and what I learned in the process. I’ll share the skills that programmers should pick up in order to start selling software on the web, including both technical and non-technical ones. I’ll discuss how to get started, evaluating your ideas, building your product, and getting your first users and customers. The talk will draw from years of work building and selling software myself, as well as the experience I’ve gained from helping hundreds of others launch their own businesses with SaaS Pegasus.

If you’ve ever wanted to turn your coding skills into revenue-generating side-projects, I hope this talk both inspires and helps you to get started.

TL;DR

Someone on Reddit asked me to provide a TL;DR for the talk. This was my off-the-cuff response:

  • First you have to make space in your life for it. You need long blocks of time for deep work.
  • The first idea you pick is unlikely to work, so pick something and start moving. Many of the best products come out of working on something else.
  • When building, optimize for speed. Try to get something out in the world as quickly as possible and iterate from there.
  • Pick a tech stack you’re familiar with, that you’ll be fastest in.
  • Try to spend half your time on marketing/sales, even if you hate it.
  • The most important skill you can have is resilience. Not giving up is the best path to success. This is hard because there is so much uncertainty in this career path.
  • It’s worth it! The autonomy and freedom are unmatched by any other career.

The talk

This talk was the closing keynote of PyCon South Africa on October 4, 2024.

It’s 30 minutes long, plus another 15 minutes of Q&A.

Alternatively, read on for an annotated transcript which I made using a process very similar to Simon Willison’s.

The transcript

Using Coding Skills to Make Passive Income (Title Slide)

I’m going to talk today about using coding skills to make passive income.

This email announcing the talk made me laugh, because the title does sound like clickbait.

Don't be dissuaded!

Hopefully this is the non-clickbait version of this topic. I’ll leave it to you to decide.

First I have to explain my job.

Explaining my job

“So, what do you do?”

Whenever I get this question I never really know how to answer it. The most boring answer that I give is I’m a software developer, but I also run my own business. If people know these words, I’ll say I’m a solopreneur or an indie hacker. If I’m feeling really specific, I’ll say I run a portfolio of revenue-generating technology products, which is a bit of a mouthful. If I’m feeling cheeky, I’ll say I make apps that make money while I sleep or… whatever I want?

How I actually earn a living is through a portfolio of technology products that are monetized online.

How I earn a living

Basically that means I make stuff and I put it on the internet, and some of it makes money—either through a one-time payment, a subscription, ads, affiliates, etc. I have a bunch of different things that make varying amounts of money and every day I just kind of work on one of them, support it, build a new thing, and so on.

Here’s how I got here:

How I got here

Sorry, this slide had a lot of transitions and so the image doesn’t translate very well.

I started possibly like many of you in a normal corporate gig. I didn’t last very long and I joined a friend’s company called Dimagi. When I joined we were like three people and he was like ‘hey you want to be CTO’ and I was like ‘yeah I’ll be CTO’ and I basically larped as a CTO for a few years.

But the company became quite successful, and then before I knew it I was the CTO of like a 200-person company and I had like a 35-person team under me. I was supposed to be a real CTO now, and I was, but I also kind of hated it. I was doing all these meetings and management and I hardly ever got to write code. And I eventually burnt out.

So I told my friend I needed a break. I decided to take like a six-month unpaid sabbatical to figure out my life. On that sabbatical, I discovered this website called Indie Hackers which basically is a website that told stories of people doing what I do now—building these random apps and making a living off them. I thought “okay that sounds cool, maybe I’ll try that” and so I decided in those six months that I was going to try to earn one dollar doing this indie hacker thing.

The first thing I did that succeeded was incredibly silly and I’m still kind of embarrassed about it, but it’s basically an app that lets you make place cards—those little cards that you find at weddings and other events. You upload a spreadsheet of your guests to this website, you download a PDF, you pay me like $5 (100 ZAR). And I was surprised that it actually worked! I made my first dollar and I was completely addicted. It was this most incredible feeling and so then I tried to do it over and over again.

Eventually I thought “I bet other people want to do this over and over again too,” and so I got kind of meta and built this product to help other people launch their own apps. This is SaaS Pegasus—a configurable Django codebase that I now sell and how I make probably like 80% of my money now.

Today, I’m blown away by how many people have built really cool products and really successful businesses on top of Pegasus code, including YCombinator companies and also just random people’s hobby project that they’re doing with their friends. It has over a thousand people using it, probably hundreds and hundreds of real products have been built with it, which is pretty fun.

Let me now clarify some key attributes of this path.

Key attributes of this path

I’m talking about products, so the key thing is you’re not trading your time for money—this is something that someone just buys and you get paid without having to do an hour of work for it.

I’m talking about monetized stuff. Passion projects are great, I love passion projects, but this is specifically for people who want to replace (or supplement) their income.

I’m talking about bootstrapped, which basically means you’re not trying to raise money—you are trying to earn enough from your own profits so you never have to raise money and you can take the profits yourself.

Finally, it’s a calm path—the goal is not to grow a 100-person company. You could, but my goal was never to grow a 100-person company, it was just to have this kind of calm, enjoyable life with lots of freedom.

A relatively mainstream term for this is indie hacking.

My Goals for this Talk

My goals for this talk are two: The first is just to kind of convince you that this is a possible career path that you can consider. I think a lot of people don’t think of this as a career path, but it is. And then if you decide that maybe you want to give it a shot, hopefully to give you some tips that help with your chances of success.

Here’s my diagram for how this can be achieved:

How to become an indie hacker

The first thing you have to do is create space in your life. For a lot of people I think this is probably the hardest part—just finding the space to pursue something like this.

Then you enter this iteration cycle where you’re basically taking a shot at something, and if it doesn’t work out you apply the lessons learned and take a shot at something else. That doesn’t necessarily mean building a new product but could be trying a different approach to marketing, trying a different way of presenting the UX, and so on. Hopefully eventually something works and then you come out the other side and… profit!

Let’s talk about making space first.

Making Space

So what do you need to get started?

What do you need to get started?

The kind of boring answer is, basically, nothing!

Nothing! (almost.)

Besides time.

Time

And you need a particular type of time where you can do deep focused work on a consistent basis. Which is not always easy to find.

So how do you make time?

How do you make time?

I think the most common path is to fit it into your nights and weekends.

The Night and Weekend Warrior

This is my friend Wisani who created an app called Boardroom which is like Tinder for LinkedIn. It’s growing pretty popular in South Africa and he’s got like 10 other projects. He’s done all this while working a full-time job at Allan Gray and I think also getting an MBA.

Doing this at nights and weekends is definitely the least disruptive way to pursue this career, but it’s really hard. For me, I have kids, I’m always tired at the end of a long day of work—this just wasn’t an option.

So what if you can’t pull this off?

Another option is to just kind of YOLO it.

The YOLO Entrepreneur

This is where you quit your job and you’re like “cool I’m going to quit my job and I’ve got twelve months of savings in my bank account and I’m just going to make this work.” I think this is usually not a great plan—it puts so much time pressure and financial pressure on you to succeed. And there’s just so much variance—it’s not like if you just work your butt off for a year you’re guaranteed to succeed in ways that some jobs are more like that.

What I did, and the path that I recommend, is more of just a patient thing where basically you try to integrate this quest into your regular life in a slower and more sustainable way.

The Patient Path

Typically that means working less, and therefore either earning less—you ask your boss if you can have Fridays off and get paid 80% as much—or getting paid more per-hour by switching to contracting or freelancing. Then you build some products and basically whenever those products start earning, you can sort of re-adjust the hours until it works.

This is my actual income for my first seven years on this journey:

My Patient Path

In 2016 I was a full-time employee. In 2017 I went on sabbatical and started doing this thing and I dropped down to halftime at my job and filled most of my income with consulting. I just tapped my network and said “I’m available, I’m going to charge a lot”. And thankfully, some people said yes. There’s this tiny little sliver of yellow there and that’s me selling those place cards.

But basically what happened is that it was like a snowball that feeds off on itself and so year two I made a bit more, year three I made a bit more, and I eventually dropped down most of my consulting and then I finally quit my job. Obviously this took a long time, but the nice thing was that I was never stressed out. There was never any financial pressure on me.

Now I’m going to jump into the main iteration cycle.

Iteration

I want to zoom in on what this looks like. It’s a simplification but basically you pick an idea, you build it, you sell it, and then you repeat.

The Iteration Cycle

Going back to the “anyone can entrepreneur” thing—none of this is fundamentally hard; it isn’t rocket science. If you can teach yourself how to code you can teach yourself how to market, or how to validate ideas. This is all stuff you can learn.

But it involves massive uncertainty.

What that means is you want to go through the cycle as much as possible because on any given iteration you don’t know what the outcome is going to be. If you spend your first two years building a product and then come out the other side and realize nobody liked it, then you’ve wasted a lot of time. Whereas if you can figure out the product is flawed in two weeks or two months then you can go through the cycle a lot faster (and many more times).

I like to think of myself as a one-person venture capital portfolio.

Think of yourself as a one-person VC portfolio.

A VC fund will typically invest in 100 companies and they will expect 90 of those companies to completely fail. But, if they find the next Facebook or the next Uber then everyone who invested in the fund makes tons of money.

I think you should think of yourself in the same way where you want to take as many shots as you can on goal, hoping that one of them succeeds. Obviously you’re not going to be the next Uber, but you’ll start making $10, $100, and then eventually you can direct your energy into that project and grow it.

Here are some examples from the indie hacking world about this:

Examples

Sorry, this slide also got mangled by the transitions.

Pieter Levels, who just sleeps in vats of money, says his hit rate on projects was about 5%. But that was still enough for him to have this incredible career doing a very similar thing to me. There’s this guy Rob Hope who is a South African—he’s got this project graveyard on his website where he documents all the things he’s tried that have failed. Another example from Twitter, this guy Pat Walls published his failures, and I stole his format and published my own version. Basically you can see I’ve done a lot of these things and only a small number of them have actually worked out.

So a lot of things are going to fail, and therefore you want to go through them as fast as possible.

Let’s look at picking an idea.

Picking an idea

This is a diagram that I borrowed from a successful entrepreneur named Rob Walling.

The Stair Step Approach (Rob Walling)

The key here is basically you want to walk up these steps of difficulty. You want to start on step one with the smallest, easiest possible thing that you can do. For me, that’s like someone can give me $5 to buy a PDF of placecards—that’s a very easy thing to build, and a very easy thing to sell.

Once you do that, then you try to do it again, and again, and eventually if you are successful then you kind of own your own time. That’s step 2.

Rob’s step three is to go do a big ambitious thing. I drew a dotted line there because you don’t have to go to step three—you can just have a bunch of step two projects and a nice life as a developer with a lot of freedom. Take a moonshot if you want, but staying at step two is a perfectly good path, and that’s kind of where I’ve landed.

Start small is the key in terms of thinking about ideas.

Start small

Your ideal first project should be easy to build, easy to support and easy to sell. And it’s actually good if it’s not a unicorn idea. Don’t try to build the next Facebook or Uber or OpenAI or whatever else—you want to find a really specific niche and go in there.

So how do you come up with ideas?

How to come up with ideas?

There’s lots of stuff you can read on the internet about this by people who know and see a lot more than me, so I’m not going to give specifics apart from saying just go do something!

Do something!

Using an example from my own life: I got married, we had this issue where we had to print these stupid place cards that led to me building this place card application and that was my first thing. Then I was building these applications and I was like “oh that’s another problem” so then I built SaaS Pegasus. Then I had this SaaS Pegasus community that I was trying to support and I kept answering the same questions over and over again so I thought “oh I’ll build a RAG chatbot.”

Doing things introduces you to problems

Through building something I found a new problem, and then I could go build something else. Don’t do something because you expect the first thing you pick to work; do something because doing it exposes you to more problems, and that will lead you somewhere interesting. Just keep following interesting problems until you find one that you want to solve.

I want to quickly mention this book “The Mom Test” which is a great book about idea validation.

Validation is hard

The key point from this book is when you’re trying to pitch a startup idea to anybody, they’re going to talk to you like your mom would talk to you. You’re going to pitch your thing and they’ll say “sweetie that’s such a good idea, I love it, you’re going to be so successful.”

People do this because they don’t want to burst your bubble. You’re coming to your friends, you’re coming to your co-workers saying “I’ve got this cool idea for an app”—no one’s going to tell you the idea sucks, because they don’t want to destroy your confidence. This is a really good book that hammers that concept home and then gives you a framework for getting around the fact that everybody’s kind of lying to you when you pitch your ideas.

Now I’m going to talk about building and specifically building an MVP—a minimum viable product.

Building an MVP

Probably a lot of you have heard of MVPs. Basically it’s like you’re trying to build the smallest, useful version of your thing.

What is an MVP?

In this example, a skateboard is useful—you can ride a skateboard and get from point A to point B, whereas if you have a wheel of a car or an axle of a car, that’s not useful. There’s also a concept called an SLC which stands for simple, lovable and complete, which I really like.

Before we talk about MVPs, let’s talk about why indie products fail.

Why do indie products fail?

Why do indie products fail? The main thing I want to emphasize here is indie products are different from most big company software that probably a lot of you have worked on.

Indie products are different

In traditional day jobs at places like Amazon, you’re probably dealing with legacy code and maybe the person who wrote it left and there’s no documentation and there’s all these scaling issues, performance issues—and none of this matters in the indie world. If you’re building independent projects, the reason that you’re going to fail is no one wanted what you were building, you didn’t market it, or you ran out of time, money and motivation.

The takeaway is just when in doubt, optimize everything for speed. That optimizes for how fast you can get to market, how fast you can respond to feedback, how fast you iterate, and how fast you realize your idea was really dumb and you should go do something else.

Optimizing for speed also optimizes for...

I’m not saying write a bunch of terrible code—I’m just saying be smart about it.

This doesn't mean "write bad code"

Take something like tests—you shouldn’t care about code coverage or anything like that. What you should care about is: “if I write this test, will I be able to modify this code faster and more confidently later?” If so, do it. If not, maybe don’t worry about it right now.

Optimizing for speed in practice

In other words: “Don’t overthink it.”

Now let’s talk about design.

Use someone else's design

Design is often the weakest spot for developers and it certainly was for me. Thankfully, you don’t really have to be a designer to make good-looking things anymore. There are all these different open source and paid templates where you can just get these really beautiful designs that work on all screen sizes and they have component libraries you can just drop everything in.

You really don’t have to do any design—you can just steal other people’s designs and I recommend doing that again just because it’s faster and because you can make it beautiful or unique later.

Also, you don’t have to start from zero.

You don't have to start from zero.

This is a bit of a shameless plug but it’s also the case that I’ve seen a lot of people use products like SaaS Pegasus (mine) to launch their products way faster. They are called SaaS boilerplates, or SaaS Starter Kits. It’s a whole product category with both open source, free and paid options. These projects often have a ton of stuff built for you.

Then, instead of your first two weeks just being “okay I’m going to build user accounts and I’m going to build billing and I’m going to build multi-tenancy” and all this stuff, you just have all that ready to go and can focus on the one thing that you need to do.

Ok, big question: what tech stack should you use?

What tech stack should you use?

There is a right answer to this.

The answer is: the one you know.

The one you know!

If you want to learn a new tech stack as a fun educational project that’s great, but if your goal is to get this revenue-generating product in the world, you want to be working in something you’re familiar with because you want to be as fast as possible.

If you don’t know anything, just pick something popular.

If you don't know anything...

..pick something popular.

Popular things have the best communities, best documentation, the language models are the best at working in them and so on. I use Django, HTMX and Tailwind for most of my stuff—I like it but you should do what you know.

And remember, if you’re not embarrassed by the first version of your product, you’ve launched too late. That’s a quote from the founder of LinkedIn.

And Remember...

This was the landing page for Place Card Me that I used for the first year or so. Those are literally Font Awesome icons and one giant stupid button. You have to get comfortable putting things out into the world that don’t meet your level of internal quality, because you just have to try it. Starting the feedback loop is much more important than building a perfect thing.

Let’s move on to selling.

Selling it

I think one of the biggest fallacies that I see in developers is this idea that if they build the best product in the world, they’re going to be successful.

If you build it they won't come

They think the reason that some software is successful is because it was the best product, when usually it is because that product had the best marketing team or some combination of a great product and great marketing.

An uncomfortable truth is that you will need to spend a lot of time selling your product through marketing or sales. There’s this book “Traction” that I recommend, which is where I got the takeaway that I should be spending half of my time on marketing and sales.

Half your time should be on marketing/sales

I think of it like eating my vegetables: I love coding, I hate marketing, but I have to eat my vegetables before I can have my french fries.

I also recommend selling your product before it’s ready.

You should sell your product before it's ready

I’m sure we’ve all had this experience where you find some app that looks really cool and you want to sign up, and then you just get this popup asking for your email address.

As a consumer, that’s super frustrating, but as an app builder it’s really useful for two reasons. One reason is that if you can’t get someone to give you their email address, then it’s going to be very hard to get them to pay you. It’s a good proxy for whether you’re building something anyone is even remotely interested in. If you put this up, figure out a way to drive traffic to it, and get a bunch of emails, that’s a good sign. If you drive a thousand people to this website and don’t get any emails, that’s a bad sign.

It’s also really important to have people to tell about your product when it’s actually ready. What often happens is someone will build in the darkness for a long time, they’ll launch their product, put it on Hacker News or something, nobody notices, and then they think that they failed. But they haven’t failed, they just had unrealistic expectations of success.

If that person instead had a list of 200 people that they knew were interested in their product, then before launch they could email 20 of them and set up Zoom calls to walk through it and get feedback. You can iterate there, email the next 20, and so on.

Then when you actually launch you email the whole list. Hopefully you have people jumping into your product from day one, and it’s better than it would have been otherwise.

Let’s talk about getting your first traction.

Getting your first traction

These are some strategies that worked for me.

Communities

Communities are a really good place to find early users. The great thing about communities is they’re incredibly niche. If you’re building a product for plumbers who play Dungeons & Dragons, there’s probably a Reddit for that, probably 20 Facebook groups for that, and you can go immediately into those communities and talk to your ideal customer profile.

One thing to know about communities is that all marketers use them this way, so communities hate marketers. They develop a pretty strong immune system against people doing marketing stuff. So go into communities tactfully, be nice, add value, and mention your product when it’s relevant. This is a great way to find early users.

Ads

Ads are another good way to find early users. Google sponsored links are very prominent, and you can use ads on tons of other platforms.

Ads are great to try to answer that question of whether anybody wants what you’re building or if anyone can figure out how to use your product. You can just pay a bit of money for a hundred people to come play with your website and figure out what’s happening. I don’t recommend using ads as a way to make money—you will lose money on this exercise (at least in the beginning)—but you’re essentially trading early bits of money to do some user research.

Cold outreach

Cold outreach is another uncomfortable one that also works sometimes. When I was building Place Card Me, I spent an hour a day just reading wedding blogs and writing long personal emails to wedding bloggers. I did this every day and most of them never responded, but one of them did. This person helped me tremendously in terms of teaching me about the industry, gave me backlinks on her site, and gave me some place card designs.

Cold outreach can be a good strategy, though it’s getting harder in the age of AI spam. It’s better to write one really good email or Twitter DM than to find some tool that uses an LLM to write a hundred terrible ones.

Content / SEO

Content is my main way that I market my stuff today. Basically, if you’re building a product for a specific industry you create content that’s useful for that industry.

I target Django developers, so I write content about Django. Then people who are Googling how to deploy Django or how to connect Django to Stripe find my guides, and that’s a nice way to get exposure to my product. You get backlinks and so on.

I do some of this stuff on YouTube now also. Video is huge—I don’t have really any YouTube following but it still drives a good amount of traffic. This is something that you can do as an individual, just writing these blog posts or recording little screencasts.

Building in public

I hesitated to put this slide on, but this was something that I did a lot which is called “building in public”—that’s just where you share what you’re working on publicly. I did this on my blog. You’ll see this on Twitter all the time; it’s gotten to be a very noisy channel.

But the nice thing about this strategy is that it requires almost no extra work. You just work on your project for two hours and then post a screenshot of what you did. Maybe a few people will find what you’re doing interesting, follow along with your journey, and then become little advocates who will try your products or connect you to other people.

Now I want to talk a little bit about psychology.

Psychology

I lied with this diagram—this is not the only thing that happens.

There’s another path which is not as good: giving up.

Psychology

Courtland Allen, who is the founder of Indie Hackers, interviewed hundreds of people who have gone down this path. When asked what his biggest takeaway from the success stories was, he said all you have to do is just not quit. There was no other generalizable advice he had—just don’t quit.

“Your Whole Goal Is to Not Quit”

So how do you prevent yourself from quitting?

How do you prevent yourself from quitting?

One way is by having infinite runway, which I talked about already.

Infinite Runway

Setting things up so that money is not the reason you quit is a good way to eliminate that particular constraint.

You still might run out of motivation, though. In order to stay motivated, I think one of the most important things to do is to just manage your own expectations.

Expectation management

I like this quote from Bill Gates: “Most people overestimate what they can do in a year and underestimate what they can do in 10 years.”

This really resonated with me. If you think “I’m gonna go down this path, take a shot, spend three months, and launch my app,” and then it doesn’t work, so you quit—that happens to a lot of people.

Instead, if you think of it as more of a five-year journey or a ten-year journey, you’re just going to take these really small steps that don’t really look like they’re working, but they’re cumulative and eventually they add up.

Embracing learning is another important one.

Embrace Learning

This is kind of like a Dungeons & Dragons character skill chart—probably the average person in this room is a pretty good coder, and maybe knows a bit about product and marketing and other skills. But in order to really succeed in this career, you have to be more well-rounded.

The hard part is really just embracing the idea that you’re going to learn these skills that don’t feel comfortable. I’ve never enjoyed marketing or wanted to be a marketer, but it was something that I had to learn in order to sell stuff. If you’re not willing to get outside your comfort zone and learn things that are outside what feels comfortable, you will probably have trouble.

Be Resilient

Finally, be resilient.

This is a funny graph showing how much money I was making every week on Place Card Me in the beginning of 2020. Things were going great—I was making up to $400 a week, and then all of a sudden there was a global pandemic and people weren’t having weddings anymore. My primary source of passive income just dropped to zero overnight.

That’s just one example of the things that happen to you, because this is a very unpredictable career. The highs are really high, the lows are really low, and you have to be resilient to these ups and downs. Going two weeks without earning any money and then having a bunch of money appear in your bank account one day is very stressful. You have to get used to big swings that you just don’t have in a salary job.

But if you emerge on the other side, hopefully you can profit.

Profit

This is from my very first blog post, back when I was just starting out: I wanted to get someone to pay me a dollar on the internet. That was my goal.

How it started / How it's going

Eight or nine years later, I am making more money than I ever made as CTO of my company. But more importantly, it’s a very nice lifestyle. I have time to hang out with my kids whenever I want, I can go on long vacations, nobody tells me what to do, I can take meetings whenever I feel like it (or never).

So yeah, I recommend this career.

And if it sounds interesting, I recommend you give it a shot, and see if you can make it happen.

Questions?


Thanks for getting this far! If you liked this you can comment below, share it, or subscribe to get email updates when I publish new stuff.

]]>
https://www.coryzue.com/writing/solopreneur/ hacker-news-small-sites-42696822 Tue, 14 Jan 2025 13:21:28 GMT
<![CDATA[Snyk security researcher deploys malicious NPM packages targeting cursor.com]]> thread link) | @arkadiyt
January 13, 2025 | https://sourcecodered.com/snyk-malicious-npm-package/ | archive.org

You can see in the screenshot that the data is then exfiltrated to a website that the attacker owns.

Now, typically, when we see packages like this, they are attempting to perform a dependency confusion attack on a specific company.  I don’t know if Cursor.com has a bug bounty program, or what the specific background is here.  Still, I would suspect that Cursor has several private NPM packages named “cursor-always-local”, “cursor-retrieval”, and “cursor-shadow-workspace”.  The person who created these packages is probably hoping that Cursor employees accidentally install the public packages of the same name, which will send their data to the attacker-controlled web service.

Luckily, in addition to me seeing these files, the OpenSSF package analysis scanner identified these packages as malicious.  OSV generated 3 malware advisories:  MAL-2025-27, MAL-2025-28 and MAL-2025-29.  You can see the malware advisories here:  https://osv.dev/list?q=cursor&ecosystem=npm

Who deployed these malicious packages?

Okay, we know what the packages do when they are installed, and we think they target Cursor.com.  Who would do this?  Well, the answer is in the NPM package metadata.

The user who published the NPM package uses a snyk.io email address for the Snyk Security Labs team.  This part of the metadata cannot be faked.  The author field in the metadata specifically mentions an employee at Snyk.  This part of the NPM package metadata can be faked, but since the publisher is a verified Snyk email, my guess is that this genuinely came from Snyk.

]]>
https://sourcecodered.com/snyk-malicious-npm-package/ hacker-news-small-sites-42690473 Mon, 13 Jan 2025 22:38:27 GMT
<![CDATA[Nvidia might do for desktop AI what it did for desktop gaming]]> thread link) | @walterbell
January 13, 2025 | https://www.theangle.com/p/nvidia-might-do-for-desktop-ai-what | archive.org

One of the highlights of the annual Consumer Electronics Show (CES) has been the NVIDIA keynote for as long as they’ve been around – it’s a much different affair from the usual efforts by Samsung, LG, Sony and others which just feel like marketing decks for home electronics buyers but performed on stage with a relatively inflated A/V budget. NVIDIA’s keynote, delivered by its charismatic CEO Jensen Huang, offers a more wide-ranging and ambitious look at the future – as powered by NVIDIA tech, of course.

This year, NVIDIA’s keynote focused on AI, which is to be expected given that company’s foundational role in underpinning much of the advancements made in the era of the LLM. Alongside new foundational world models, and a platform for helping individuals train and test the next generation of AI-powered robots, Huang also revealed ‘Project Digits,’ a new product based on its Grace Blackwell AI-specific architecture that aims to offer at-home AI processing capable of running 200 billion-parameter models locally for a projected retail cost of around $3,000.

My former colleague Kyle Wiggers has a great rundown of the specs and capabilities of the initial ‘Project Digits’ offering at TechCrunch, which NVIDIA is developing and shipping with partners (much like it does with its consumer GPUs currently – an important point of comparison). The bottom line is that it can provide unprecedented local AI compute at a – while still expensive – extremely affordable price point, relatively speaking.

Based on Huang’s statement to Kyle, the initial target market for Project Digits isn’t necessarily just your average home PC user; he specifically called out the intent of putting AI supercompute capability in the hands of “every data scientist, AI researcher and student,” so there’s an assumption that at least partial specialty and technical familiarity will be part of the ideal customer profile out the gate.

There are many exciting things about Project Digits, including the fact that two can be paired to offer 405 billion-parameter model support for ‘just’ $6,000 (again, sounds like a lot, but tiny compared to what has previously been available), and its ability to play nice with both Windows and Mac – as well as to operate independently using its built-in Linux-based DGX OS. But what’s most exciting about it is probably what it indicates in terms of NVIDIA’s strategy around consumer AI and what comes next.

If NVIDIA and its partners can deliver a capable product that lives up to its promises on Project Digits at its target price point, that will drive a lot of interest, and may start turning the flywheel for future cost reductions, along with performance and efficiency improvements that make possible a more mature product line with varying price points depending on needs. It could, in fact, look a lot like NVIDIA’s current GPU line, which offers tiers of performance at different price levels depending on the needs and budgets of its gaming customers.

These could fit neatly with different parameter-sizing in terms of model support, and NVIDIA has a well-established playbook when it comes to working with licensed hardware partners to co-brand and distribute this kind of product lineup.

The remaining challenge in terms of this growing from an interesting but minuscule side-business aimed at professional researchers and a few amateur enthusiasts lies on the software side; in GPUs, NVIDIA benefits from an extremely mature and diversified gaming market to drive demand for its products – in AI, and generative AI in particular, the user-facing software product side is less well-established. Most of the paradigms we’ve seen so far either focus on A) simple, cloud-based products like ChatGPT and Claude or B) more complex, specialist-only implementations like local installations of Stable Diffusion, Meta’s Llama or similar.

If we see a proliferation of AI products with an accessible and easy-to-learn user experience, paired with a local, offline backend, then we could definitely see this become an area of rapidly ramping investment and interest for NVIDIA. With Project Digits, we at least see the company looking to solve one part of that particular chicken-and-egg problem.


]]>
https://www.theangle.com/p/nvidia-might-do-for-desktop-ai-what hacker-news-small-sites-42687510 Mon, 13 Jan 2025 19:19:42 GMT
<![CDATA[Document Your Progress at Work]]> thread link) | @tminima
January 13, 2025 | https://shivamrana.me/2025/01/document-your-progress/ | archive.org

How can you ensure that your contributions are also recognized?

A common challenge, especially in larger organizations, is that your manager may not always be fully aware of the specifics of your work, and your manager’s manager likely has even less visibility. It isn’t due to a lack of interest but rather the sheer volume of responsibilities and information they handle. Additionally, even for you, it’s hard to remember all the details beyond the highlights. I find a proactive strategy essential for such scenarios: sending regular progress digests.

These digests are concise, structured email updates that you send periodically to both your direct manager and their manager. The aim is to offer a clear snapshot of your activities, their impact, and your forthcoming plans. See it as a method to keep your supervisors well informed, especially when you lack regular direct interactions.

That’s it. That is the idea. You can be creative and apply it however you want. However you decide to do it, you will see gains.

In the next section, I list the key points I usually consider in my snapshots.

Key Elements of an Effective Progress Digest

To ensure your digests are both informative and impactful, here’s what you can include:

  • Specific Task Details: Provide project specifics and relevant links for the coding tasks you completed or picked up. This entails a one-sentence project description, PR links, JIRA tickets, and other code artefacts.
  • Data Science Related: If applicable, detail the models you’ve trained and deployed, any A/B experiments launched, and the results of experiments that concluded. You can also share the project solutioning doc here.
  • Documentation Efforts: Highlight any documentation you’ve created or maintained. You can also merge this with other points.
  • Impact and Results: Clearly articulate the outcomes of your tasks and their value to the team and company.
  • Initiatives and Discussions: Share any new ideas you’ve put forward or discussions you’ve initiated.
  • Future Plans: Outline your planned next steps.

Benefits

The effort invested in creating these digests yields substantial career benefits:

  • Enhances Diligence: Summarizing your work makes you more conscious of your efforts.
  • Boosts Positive Perception: You are perceived as a proactive and accomplished individual.
  • Creates a Performance Record: These digests serve as documentation of your work that is especially valuable during performance reviews.
  • Ensures Visibility: Even if managers don’t respond directly to each email, they will read them, which ensures they are aware of your work and its progress.
  • Effective at Any Stage: While this practice is advantageous when starting a new job (or joining a new team), I have found it beneficial at any stage.

Conclusion

Actively managing your visibility is key to long-term career growth. Sending out regular progress digests ensures that your work is recognized. You also establish a record of your accomplishments and demonstrate your value. This practice requires regular work but has good returns.

PS: I learned this trick on a tech podcast many years ago. If anyone knows which podcast or episode, please share it with me, and I will link it here.

Update: 14th Jan

PS: A related idea, brag documents, is explained beautifully by Julia Evans. Shared in this HN comment.

]]>
https://shivamrana.me/2025/01/document-your-progress/ hacker-news-small-sites-42687148 Mon, 13 Jan 2025 18:57:38 GMT
<![CDATA[Cosmos Keyboard: Scan your hand, build a keyboard]]> thread link) | @cdata
January 13, 2025 | https://ryanis.cool/cosmos/ | archive.org

BETA

Custom-Build A Keyboard Fit To You

Don't Settle

For One-Size-Fits-All

Try the Beta

See more keyboards in the showcase

Scan Your Hand, Build a Keyboard

Cosmos is the easiest way to design a keyboard around your one-of-a-kind hands. Scan your hand using just your phone camera, then fit a keyboard to the scan. The key positions align to your fingers' lengths and movement.

Add All The Things

Add a trackball, trackpad, encoder, or OLED display. There's support for MX, Choc, and Alps switches, and almost every type of keycap. Plus with 7 different microcontrollers, you can mix and match all you like.

Browse the Parts

Generate Stunning Keyboards

Choose from 3 types of cases, split or unibody, and many customizations.

Keyboards generated with Cosmos

Print Worry-Free

Cosmos catches errors before you print and automatically fixes common model issues.

Take Control

Custom Thumbs mode allows you to drag and drop keys and trackballs in the thumb cluster into place.

Mix and Match Keycaps

Your artisans are now ergonomic. Whatever batch of keycaps you decide to use, Cosmos will arrange them to fit your desired curvature.

Keys with different styles of keycaps with their tops lined up

RGB and Hotswap Ready

Cosmos has first-class support for Amoeba King PCBs, which let you easily integrate per-key RGB and hotswap sockets. It also has high-quality built-in-hotswap and PCB-less options if you're on a budget.

Hotswap sockets in Cosmos

And last but not least…

Keyboard in Autodesk Fusion

Give it to your CAD friend

Every model can export to STLs, which are meant to be sent to your 3D printer or an online printing service, or to STEP models, which can be modified in CAD programs. If you don't like the way your model looks, ask your closest CAD guru to make adjustments.

Learn About CAD Export

Join us in revolutionizing keyboard design.

Cosmos is made in the open, and 95% of the code is open-source. It's our firm belief everyone should have free access to technology to relieve and prevent typing pain.

Come see the unique keyboards we all are making on the Discord server.

Don't have an account? I send a few recaps per year to my newsletter.

The other 5% of code? That's for the Pro features, which add extra cosmetic options to your keyboard and help keep this project sustainable.

Sound fun?

Try the Beta

Psst! Came here from my Dactyl generator? You should give Cosmos a try. It's changing a lot but it will give you a much better Dactyl-like case and microcontroller holder.

]]>
https://ryanis.cool/cosmos/ hacker-news-small-sites-42686144 Mon, 13 Jan 2025 17:42:00 GMT
<![CDATA[We Built a Phone Agent the Hard Way – You Can Do It in 30 Minutes]]> thread link) | @Bnowako
January 13, 2025 | https://nomore.engineering/blog/voice-agents | archive.org

Introduction

This is the story of how we dedicated nights and weekends to building our own Phone Agent solution—something that could be achieved in just 30 minutes with existing platforms. It was a challenging but rewarding journey, and along the way, we gained invaluable insights that we’d like to share with you.

A few months ago, we started exploring challenges in Polish primary care. One problem stood out immediately: booking doctor appointments. Most appointments are still scheduled via phone calls. During flu season, dozens of people try to book an appointment as soon as the facility opens. Many spend hours trying to reach a receptionist, but their calls are never answered.

To solve this, we decided to create a Phone Agent to handle the calls. In this blog post, we’ll give you an overview of the Phone Agent landscape, and dive into how we built our own Agent. We’ll also discuss existing solutions and share our vision for the future of this domain.

Exploring Off-the-Shelf Solutions

We initially tested an off-the-shelf solution: bland.ai. It was impressive—we had a proof of concept up and running in less than 30 minutes. The Agent followed a script well, and we couldn’t perform any jailbreaks. However, we encountered three significant issues:

  • The Agent was dumb, likely due to the model size or prompt design. Improving this required evals, but those were really cumbersome to perform in a web interface. We couldn’t see this setup being production-ready, as AI Agents demand rigorous evaluations.
  • Cost. To use a Polish phone number (via Twilio) we needed a $300 “PRO” plan (though this is now reportedly free).
  • Polish language support. The transcription was poor and regularly misinterpreted speech.

These limitations led us to do the typical programmer thing—build our own Phone Agent infrastructure.

Building an AI Voice Agent

The Basics

Let’s now build an intuition on how our Agent should work. To create a Voice Agent, we needed three main components:

1. Speech-to-Text (STT): Convert user speech into text.

2. Language Model (LLM): Analyze user speech text and generate a response.

3. Text-to-Speech (TTS): Convert the Agent’s text response into speech.

Voice Agent architecture
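To make the flow concrete, here is a minimal sketch of one synchronous turn through this pipeline. The transcribe, generate_reply and synthesize callables are just stand-ins for whichever STT, LLM and TTS providers you plug in.

from typing import Callable

def handle_turn(
    user_audio: bytes,
    history: list[dict],
    transcribe: Callable[[bytes], str],           # STT provider of your choice
    generate_reply: Callable[[list[dict]], str],  # LLM call
    synthesize: Callable[[str], bytes],           # TTS provider of your choice
) -> bytes:
    """One synchronous turn: user audio in, agent audio out."""
    user_text = transcribe(user_audio)                            # 1. Speech-to-Text
    history.append({"role": "user", "content": user_text})
    reply_text = generate_reply(history)                          # 2. LLM response
    history.append({"role": "assistant", "content": reply_text})
    return synthesize(reply_text)                                 # 3. Text-to-Speech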

Our pipeline also needs a way to determine when the user finished speaking so we can start responding. This is where another crucial component of AI Voice Agents comes in: Voice Activity Detector (VAD).

The job of a VAD is to determine whether a chunk of audio contains speech. Once we know that, we can define a heuristic for responding. For example: we assume that 500ms of silence after a speech segment means that the user has finished speaking, and we can start working on a response.

Nowadays VAD is usually achieved using a lightweight ML model, so that would be a fourth model in our pipeline.
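As a rough illustration, this is what that end-of-speech heuristic can look like on top of a VAD. The is_speech callable stands in for whatever VAD model you use (Silero VAD is a common lightweight choice), and the 500ms threshold is exactly the kind of hyperparameter you end up tuning.

from typing import Callable

CHUNK_MS = 20            # duration of each incoming audio chunk
END_OF_SPEECH_MS = 500   # silence required before we treat the utterance as finished

def utterance_finished(
    chunks: list[bytes],
    is_speech: Callable[[bytes], bool],  # stand-in for a VAD model (e.g. Silero VAD)
) -> bool:
    """Return True once we've heard speech followed by >= 500ms of silence."""
    heard_speech = False
    silence_ms = 0
    for chunk in chunks:
        if is_speech(chunk):
            heard_speech = True
            silence_ms = 0
        else:
            silence_ms += CHUNK_MS
        if heard_speech and silence_ms >= END_OF_SPEECH_MS:
            return True
    return False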


Voice Agent architecture

Real-World Challenges

This architecture wouldn’t work great in the real world because it assumes that the conversation is synchronous, i.e. that a speaker starts speaking only once the previous speaker has finished. Real-world conversations are messy: people interrupt, pause mid-sentence, or backchannel (“mhm,” “uh-huh”). To handle this, we needed to add a conversation orchestrator—a component to manage these complex interactions.

For example:

  • Agent is speaking and the user interrupts. Should the Agent stop speaking, listen to the user's speech and respond to that speech? Or perhaps the user was just backchanneling and the Agent should continue talking?
  • User is speaking and goes silent for a couple of hundred of milliseconds. This is enough for our VAD to detect the end of speech. We start working on a response and the user resumes the speech. Should we now respond to both speech fragments? Or only the last one?

While humans handle these situations effortlessly, programming them is much harder. Designing this orchestration involved countless “if” statements and tuning hyperparameters. But after this step we had a working Voice Agent!


Voice Agent architecture

def process(conversation: Conversation):
    speech_result = check_for_speech(conversation)

    if speech_result is None:
        return ConversationState.HUMAN_SILENT

    if speech_result.ended:
        logger.info(
            f"Human speach ended. Speach length: {speech_result.end_sample - speech_result.start_sample}. "
        )

        conversation.human_speech_ended(speech_result)

        if conversation.agent_was_interrupted() and speech_result.is_short():
            logger.info("🎙️🏁Short Human speech, agent was interrupted")
            return ConversationState.SHORT_INTERRUPTION_DURING_AGENT_SPEAKING

        if conversation.agent_was_interrupted() and speech_result.is_long():
            logger.info("🎙️🏁Long Human speech, agent was interrupted")
            return ConversationState.LONG_INTERRUPTION_DURING_AGENT_SPEAKING

        if speech_result.is_short():
            logger.info("🎙️🏁Short Human speech detected")
            return ConversationState.SHORT_SPEECH

        if speech_result.is_long():
            logger.info("🎙️🏁Long Human speech detected")
            return ConversationState.LONG_SPEECH

        raise ValueError("🎙️🏁Human speech ended, but no state was matched")

    elif not speech_result.ended:
        if conversation.is_agent_speaking():
            logger.info(
                "🎙️🟢 Human and Agent are speaking."
            )
            return ConversationState.BOTH_SPEAKING
        else:
            logger.info("🎙️🟢 Human started speaking")
            return ConversationState.HUMAN_STARTED_SPEAKING

Orchestrator code

Testing

How do we tell if our Agent is actually any good? And how do we make sure the whole thing doesn’t break if we change one line in our fragile voice orchestrator? Our Voice Agent needs testing. We focused on two areas:

1. Agent quality: Does the Agent respond and perform actions as expected?

2. Conversation quality: Does the interaction flow naturally?

We will talk about testing conversation quality, since many great pieces have been written about LLM evaluations (you can check this great overview by Hamel Husain).

Manual Tests

To test conversation quality we started with manual tests. We wrote a bunch of scenarios and checked whether the conversation had a good flow. And while this was helpful at the beginning, it quickly became impractical:

  • Tests were not repeatable (our speech intervals had some variance obviously).
  • Finding the root cause of issues from logs after a conversation was a difficult task.
  • Tests took a lot of time; we had to have hours of actual conversations with our Agent.

Overall manual testing led to a huge frustration so we looked into ways to perform automated testing.

Automated Tests

For automated tests we wanted them to be as close to the original conversation as possible. So we took our scenarios from manual tests and prerecorded parts of user speech. Then we simulated the conversation by streaming the audio chunks to our Agent. We would add assertions like:

  • Agent should respond between time x and y
  • Agent should stop talking between time x and y
"""
If the human is talking and makes a long pause such that we detect that the speech ended, our agent starts thinking about a response.
If the human then resumes talking before the agent has started responding, we should cancel the first response and listen until the second speech ends.
We should then respond to both segments at once.

The file contains an 8-second clip. Structure is the following
00.00 - 01.90 silence
01.90 - 03.00 speech (1st segment)
03.00 - 04.10 silence (pause)
04.10 - 05.10 speech (2nd segment)
05.10 - 07.85 silence

We expect the following:
- speech_to_text is called with about 1.1s of speech (1st segment)
- speech_to_text is called with about 3.2s of speech (1st and 2nd segments)
- response is sent to websocket only after the second speech is finished
"""

One of the test cases we wrote
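A rough sketch of how such a test can be structured, using the scenario above. The stream_scenario helper and the fixture path are hypothetical: it would feed the prerecorded chunks into the Agent over a fake websocket in simulated real time and record timestamped events from the Agent's output; the exact interface depends on how your conversation loop is wired up.

def test_respond_once_after_resumed_speech(stream_scenario):
    # stream_scenario is a hypothetical test fixture: it plays the 8-second clip
    # into the agent and returns a list of (timestamp_seconds, event_name) tuples.
    events = stream_scenario("fixtures/pause_then_resume.wav")

    response_starts = [t for t, event in events if event == "agent_started_speaking"]

    # The agent should respond exactly once, and only after the second speech
    # segment ends at ~5.1s (allowing some latency budget on top of that).
    assert len(response_starts) == 1
    assert 5.1 <= response_starts[0] <= 6.5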

These tests were very fragile and hard to maintain, but they gave us some confidence that the conversation was working as intended.

Future of Automated Testing

At the time of our work on our own Voice Agents, multiple startups (Coval, Hamming ai, Vocera) got accepted to YC with the promise of providing automated tests / simulations for Voice Agents (Coval has officially launched since then). While we didn’t have a chance to try them at the time, we love the idea behind those products. If these products successfully do what they promise, they will definitely play a key role in the AI Voice Agent test stack of the future.

Sidenote: We are curious what approach do these companies take to test their own Agents.

Telephony

So far we have described how to create a Voice Agent, let’s now discuss how to add a telephony layer to it. If you have experience with telephony you can safely skip to the next section.

Prior to starting the project we hadn’t worked with telephony. We learned quickly that there is basically a single company dominating the market - Twilio (btw. its stock seems to be struggling now, is it a good time to buy?:)).

Twilio provides a bunch of APIs and products which allow developers to program phone conversations. One of its products is called Media Streams. The idea is that the user calls the number you have registered with Twilio and Twilio sets up a websocket to your server. This websocket will then be used to send audio chunks in both directions: user speech from Twilio to your server and Agent speech from your server to Twilio.

Apart from audio chunks, Twilio supports other types of messages which allow you to control the conversation, e.g. a clear message which clears the audio buffer on Twilio's side and basically cancels your Agent's speech. By integrating our conversation orchestrator with the Twilio API we can achieve things like pausing when interrupted, replaying audio chunks and so on.
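A minimal sketch of what the server side of this can look like, here using FastAPI. The message fields follow Twilio's documented Media Streams format (base64 payloads and "start", "media", "stop" and "clear" events), but double-check the current docs before relying on them; handle_user_audio is a hypothetical hook into the orchestrator.

import base64
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

def handle_user_audio(chunk: bytes) -> None:
    """Hypothetical hook: hand the caller's audio to the VAD / orchestrator."""
    ...

@app.websocket("/twilio/stream")
async def media_stream(ws: WebSocket):
    await ws.accept()
    stream_sid = None
    async for raw in ws.iter_text():
        msg = json.loads(raw)
        if msg["event"] == "start":
            stream_sid = msg["start"]["streamSid"]
        elif msg["event"] == "media":
            # 8kHz mu-law audio from the caller, base64 encoded
            handle_user_audio(base64.b64decode(msg["media"]["payload"]))
        elif msg["event"] == "stop":
            break

    # Elsewhere in the orchestrator, agent speech goes back the same way:
    #   await ws.send_text(json.dumps({
    #       "event": "media",
    #       "streamSid": stream_sid,
    #       "media": {"payload": base64.b64encode(agent_chunk).decode()},
    #   }))
    # and an interruption is handled by clearing Twilio's audio buffer:
    #   await ws.send_text(json.dumps({"event": "clear", "streamSid": stream_sid}))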

To make it easier and cheaper to test Voice Agents we’ve built a simple website that tries to simulate the Twilio Media Streams API, allowing us to test the conversation locally without using Twilio.

Twilio

Website we used for testing the Voice Agent locally

Voice Agent architecture

Final architecture with telephony layer

Reality Check

We had a working platform that could in theory be used with an arbitrary AI Agent to serve different domains. We learned a lot and the final result was really cool, but we needed to finally answer one question: does it make sense to maintain all of this code just to build a Phone Agent? Couldn’t we simply use one of the existing providers?

We came to the conclusion that building your own conversational platform makes sense if:

  • You’re selling the platform itself
  • You are big enough that it is actually cheaper to build and use your own platform
  • You have custom requirements not covered by existing platforms

We didn’t fall into any of these 3 categories. Also, we felt that our main strength was building good and thoroughly evaluated AI Agents, not the conversational platform. The space of Phone Agent platforms is really packed, plus there is always the risk that OpenAI will come and basically eat the whole industry. Thus we decided to return to the original idea of using an external provider.

Phone Agent Platforms

Before we go into evaluating specific products, let’s think about the desired properties of a Phone Agent platform. For us they were the following:

  • Conversation flow - does the conversation feel natural
  • Latency - are the Agent's response times consistently low? We aimed for less than 2 seconds for an Agent response
  • Agent quality - does the Agent follow the script (prompt)?
  • Availability
  • Speech transcription / speech generation correctness - does the Agent support Polish language well?

We have tested 3 platforms: bland.ai, vapi.ai and retellai.com.

Vapi and Retell are quite similar to what we have built - they provide conversation orchestration and make all the blocks (STT, TTS, LLM) pluggable. For example on Vapi you could use deepgram as a transcriber, ElevenLabs for text to speech and OpenAI for LLM. Both Vapi and Retell allow you to specify your own endpoint for text generation which enables you to use basically every model for the LLM part. These two platforms take care of the hard parts of conversation flow and let developers focus on Agent quality.

Bland takes a very different approach. They host all the models themselves and provide a conversation abstraction called Pathways. The idea is that you describe the conversation as a flow chart, at each stage describing what the Agent is supposed to do (with natural language).

These are 2 very different philosophies each with its pros and cons.

The Bland self-hosted approach has serious benefits:

  • latency - because they host all of their models in the same location, it's best in class. With Bland the response latency was consistently around 1.5s, whereas for other providers it was no less than 2.5s.
  • availability - since Bland does not rely on external providers (except probably Twilio and AWS), it is much less prone to outages than other platforms which use multiple providers underneath.

But it also had drawbacks:

  • Since you can’t point Bland to your own Agent for the LLM part, you must write all the Agent logic and prompts in the Bland UI. This was a deal breaker for us, because it meant we couldn’t reliably evaluate the Agent. Bland provides some in-browser testing capabilities, but this was very far from what we are used to when evaluating Agents. (As a side note, this made us wonder who this product is actually for? For us, it seemed there was not enough control for it to be used by developers in production workflows, and it seemed too complex to be used by non-technical folks)
  • The transcription model quality was poor for the Polish language

Bland's weaknesses are Vapi's and Retell's strengths and vice versa:

  • Since we could provide an endpoint for the LLM we can have all the Agent logic on our backend server. This meant that we could write and evaluate a regular Agent which gave us a lot of control and confidence that the Agent works.
  • Latency was 2.5s at best and there were spikes up to 7s! This is because each speech segment has to go through multiple providers before the response. These providers have variable response latency, and the total variance at the end is really high.
  • Availability - with each added provider the expected availability is lowered. If you have 3 different providers, each with a 99% SLA, then the expected SLA is just 97% (0.99 × 0.99 × 0.99 ≈ 0.97).
  • Speech transcription - there is a notable difference between Retell and Vapi here. Retell handles transcription on its side and it was really poor for Polish. With Vapi on the other hand we were able to find a provider which had good quality for our mother tongue.

The table below summarises our findings:

Summary

The winner? We wanted to go with Bland because it was so snappy. But with the lack of control over the Agent logic and the poor transcription quality, we couldn't make it work. We ended up using Vapi.

Future of Phone Agents

Lastly, we would like to share how we envision the future of the AI Agent space:

  • STT and TTS: Abstracted away for developers.
  • Seamless conversations: No need for extensive hyperparameter tuning.
  • Self-hosted models: Improved latency and availability.
  • Pluggable LLMs: Developers retain full control over conversation logic and are able to perform evaluations.
  • Lower costs: cost per minute much lower than that of humans.
  • Simulation-based evaluations: Automated testing frameworks will become standard, with open-source alternatives emerging.

This is an exciting space, and we can’t wait to see how it evolves.

If you are interested in the code, let us know by tagging @bnowako or @moscicky on X, we can open source it.


Thanks for reading!

| Btw. we just released Job Board for AI Agents. Check it out

]]>
https://nomore.engineering/blog/voice-agents hacker-news-small-sites-42685578 Mon, 13 Jan 2025 16:57:20 GMT
<![CDATA[Luck Be a Landlord Might Be Banned from Google Play]]> thread link) | @doppp
January 13, 2025 | https://blog.trampolinetales.com/luck-be-a-landlord-might-be-banned-from-google-play-2/ | archive.org

Happy New Year everyone! Just three hours into 2025, I received an email from Google Play Support with the following subject line:

Action Required: Your app is not compliant with Google Play Policies (Luck be a Landlord)

Nothing has changed with Luck be a Landlord in the past few months, but for whatever reason, my game "contains gambling" now!

I've already gone through this song and dance multiple times with Google Play over the game being banned by them in 13 countries, so while this isn't exactly new, it's a lot scarier!

I genuinely think Luck be a Landlord deserves the equivalent of an E10+ rating across all regions, but at this point I've just given up and will deal with whatever inaccurate rating is slapped on the game.

I've now filled out the Google Play age rating questionnaire to "agree" that my game "contains gambling." I'm doing this in an attempt to stop my game from being banned globally on Google Play since every time I've tried to appeal these decisions I've simply been sent a screenshot of my game and was told "this is gambling."

Whatever happens, I hope you're still able to play my game on your Android device in 2025.

If not, consider subscribing to this newsletter for more information on my next game! Here are some stills from its upcoming trailer:

]]>
https://blog.trampolinetales.com/luck-be-a-landlord-might-be-banned-from-google-play-2/ hacker-news-small-sites-42683567 Mon, 13 Jan 2025 14:10:33 GMT
<![CDATA[Live London Underground / bus maps taken down by TfL trademark complaint]]> thread link) | @fanf2
January 13, 2025 | https://traintimes.org.uk/map/tube/ | archive.org


Update – 14th January 2025

TfL have apologised for the way that their “online brand protection agency” handled this and their approach to my hosting provider, and will be discussing matters with them. Given that, and the numerous emails I have been receiving from people missing the maps, I’m happy to reinstate them.

How does it work?

2nd prize App in the Open Data Challenge

Live departure data is fetched from the TfL API (Powered by TfL Open Data), and then it does a bit of maths and magic. Some H&C and Circle stations are missing in the TfL feed.

Who did this?

Matthew Somerville (with helpful hindrances from Frances Berriman and James Aylett). Station icon by Tim Diggins.

]]>
https://traintimes.org.uk/map/tube/ hacker-news-small-sites-42682876 Mon, 13 Jan 2025 12:51:37 GMT
<![CDATA[Can you complete the Oregon Trail if you wait at a river for 14272 years?]]> thread link) | @donohoe
January 13, 2025 | https://moral.net.au/writing/2025/01/11/waiting_for_oregon/ | archive.org

11 January 2025

A screenshot from the main gameplay view of Oregon Trail, showing a covered wagon drawn by oxen near a river, and cheerfully informing you that Zeke has a broken leg and will die soon.

If you're into retro computing, you probably know about Oregon Trail; a simulation of the hardships faced by a group of colonists in 1848 as they travel by covered wagon from Independence Missouri to the Willamette Valley in Oregon. The game was wildly successful in the US education market, with the various editions selling 65 million copies. What you probably don't know is the game's great untold secret.

Two years ago, Twitch streamer albrot discovered a bug in the code for crossing rivers. One of the options is to "wait to see if conditions improve"; waiting a day will consume food but not recalculate any health conditions, granting your party immortality.

From this conceit the Oregon Trail Time Machine was born; a multiday livestream of the game as the party waits for conditions to improve at the final Snake River crossing until the year 10000, to see if the withered travellers can make it to the ruins of ancient Oregon. The first attempt ended in tragedy; no matter what albrot tried, the party would succumb to disease and die almost immediately.

A couple of days before New Year's Eve 2025, albrot reached out and asked if I knew anything about Apple II hacking. In reality the answer was no; I knew three things about the Apple II:

  • It has a MOS 6502 processor
  • It was popular in the US educational market
  • Something something Carmen Sandiego something Prince of Persia?

But all old computers are basically the same right? Specialist knowledge is for cowards.

Where to start

First I loaded Oregon Trail into MAME's Apple II emulator. MAME is an emulator that was originally targeted at arcade hardware, however it has support for hundreds of game consoles and home computer systems thanks to merging with the MESS project. It has one of the nicest debuggers I've ever used, with plenty of documentation, which is why I strongly recommend it for reversing work. You will need a copy of the Apple II ROMs, listed in the MAME ROM index as apple2e.zip.

We are using the most common disk image of the game (aka "peperony and chease" edition). To open the game:

mame apple2e -debug -flop1 oregonmod_a.do

This will open a separate Debug window and pause execution until you start running the machine (F5). In the game window you can activate the MAME menu keys by pressing Scroll Lock; useful ones to know are TAB for the main menu, F6 for saving a state, F7 for loading a state.

This was my thinking for how to get a first toehold into what was going on. As you travel, the game gives your party one of four health ratings:

  • good
  • fair
  • poor
  • very poor

It follows that the game code must read the text string "poor" from somewhere in memory so it can be printed on the screen, and this code would be attached to the routine that determines how sick everyone is. So I sent the party off from Independence, Missouri with 1 ox, no food and no clothes. As soon as we were on the road, I made a save state. I then called "Step Into" in the debugger (F11) which paused execution, and made a dump of the Apple II's memory to a file with the following command:

save /tmp/mem_before,0,0x10000

I loaded this dump into a hex editor and searched for the word "poor", which I found at address 0xA6AE. Then in the debugger I added a watchpoint for any code attempting to read the first byte of "poor":
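wpset 0xA6AE,1,r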

As expected, once my party became emaciated enough the debugger halted execution at the following instruction:

The Apple II CPU is a 6502, which has a nice instruction reference here. Looking it up, we can see that LDA basically means "load a byte from the address stored at 0x005E into register A". Because the 6502 is a very basic 8-bit CPU, you are limited in how you access the 16-bit address space. Generally speaking you have the choice of using a fixed offset (up to 16-bit, baked in the instruction) plus whatever 8-bit value is in a register, or using a Zero Page pointer. The above instruction is using a Zero Page address; for this mode you have to write a 16-bit pointer containing the location you want to somewhere in the first 0x100 bytes of memory, then use an instruction which references that pointer.

So we know that the pointer to the text describing the player's health is at 0x005E, which we can confirm by opening a memory window in the debugger and see AE A6 at address 0x005E (the 6502 is little endian, so addresses are stored backwards). It's fair to say that this is just for updating the display, and not for checking how ill everyone is.

We need to find the code that sets this status, so I added another watchpoint for any writes to the pointer at 0x005E:

wpset 0x005E,1,w,wpdata == 0xAE

After reloading the save state, the game stopped in another area:

E630  STX $5E       ; mem[0x005E] = X

I started using the debugger's Step Out feature (Shift+F11) to move up the call stack and see what was calling what. From further poking around it became apparent that we were in some sort of generic high level text printing function. Then a penny dropped, and I started wondering about the memory layout of the Apple II. A memory map shows the D000–F7FF range to be the part of the ROM containing the BASIC interpreter. Oh.

In fact there's a very helpful annotated Apple II ROM disassembly, showing that the function is indeed a high-level print function for text.

Well that just got complicated

So 1985 Oregon Trail is written in Applesoft BASIC, and the program is stored as some sort of bytecode. My heart sank a little at this news; 6502 assembly is cumbersome to understand at the best of times, and throwing a BASIC virtual machine on top of that makes live debugging even worse. I didn't have time to hack together a way to trace the BASIC code as it executed.

However, this did open up a new angle of attack. A BASIC program uses variables, which are stored in a known location in a known format. What if we let the game play, and keep an eye on the debugger's live memory view to see which variables change over time?

The good news is that Applesoft BASIC keeps the original names for each variable, which is a critical clue. The downside is that those names are (at most) two letters long. Also, memory is managed dynamically, so variables will move around as the program chugs on.

From poking around the ROM disassembly, the pointer to the variable store is at address 0x0069 of the Zero Page (usually pointing to 0x9902), followed by a pointer to the array store, and a pointer to all of the string data. Each variable record is 7 bytes long: 2 bytes for the name (using the high bits of each to encode the type), and 5 bytes of data. The default numeric type in Applesoft BASIC is a cursed 40-bit floating point format, with an 8-bit exponent and a 31-bit mantissa.
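
To make that concrete, here's a minimal Python sketch of how a simple numeric variable record decodes. The layout follows the description above plus the usual Microsoft 40-bit float convention (the sign lives in the top mantissa bit, with an implicit leading 1 in its place); the sample bytes are the FS record dumped later in this post:

def decode_applesoft_float(data):
    """Decode Applesoft's 5-byte float: an exponent byte, then 4 mantissa bytes
    whose top bit is the sign (an implicit leading 1 takes its place)."""
    exponent = data[0]
    if exponent == 0:
        return 0.0
    sign = -1.0 if data[1] & 0x80 else 1.0
    mantissa = int.from_bytes(bytes([data[1] | 0x80]) + data[2:5], "big")
    return sign * (mantissa / 2**32) * 2.0 ** (exponent - 128)

def decode_variable(record):
    """Split a 7-byte simple variable record: 2 name bytes + 5 data bytes."""
    name = "".join(chr(b & 0x7F) for b in record[:2]).rstrip("\x00")
    return name, decode_applesoft_float(record[2:7])

# The FS record dumped further down ("4653967e4f4234") decodes as expected:
print(decode_variable(bytes.fromhex("4653967e4f4234")))  # ('FS', 4166608.55078125)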

After a bunch of test runs I found a variable called H, which increased in line with my health status getting worse. Even better, setting the H data to all zeros reset the status back to "good", after which it seemed to decay back to "very poor" at the previous rate. There was also an H1 array that seemed to keep track of party members that had died (0 for alive, -1 for dead), and an H2 array which seemed to keep track of whatever exciting disease each party member had. Easy! Open and shut!

Did we make it?

New Year's Eve 2025 had arrived, and with it attempt 2 to reach the barren wastelands of future Oregon. This time, the time machine stopped at 16120 AD.

A screenshot of albrot's second Oregon Trail Time Machine attempt, some 5 days after the stream began. There is a tombstone that reads "Here lies ALBROT". The streamer is facepalming.

A dismal failure. Even when zeroing out H in memory, every day it would reset to 139, dooming the party to a short-but-agonizing fate. Cheating death is harder than expected.

We can still save this

Several days later, I tried writing a scrappy decompiler for the Applesoft BASIC bytecode. From past experience I was worried this would be really complex, but in the mother of all lucky breaks the "bytecode" is just the original program text with certain keywords replaced with 1-byte tokens. After nicking the list of tokens from the Apple II ROM disassembly I had a half-decent decompiler after a few goes.

(This ultimately turned out to be unnecessary. If I had looked more closely at the options for CiderPress II, I would've seen that you can use the "import" and "export" commands to convert Applesoft BASIC files to and from plain text).

So of course we had to look for the code that calculates the player's health. Here it is, from OREGON TRAIL:

...
# 3200
LET H0 = C0
FOR L = C0 TO C4
IF H1(L) > C0 THEN H2(L) = H2(L) - C1
LET H0 = H0 + C1
IF H2(L) < C1 THEN H1(L) = C0

# 3205
NEXT
IF  RND (C1) < P5 THEN TM = QT +  INT ( RND (C1) * 41)
LET PP = ( RND (C1) < QP)
LET W = INT ((TM + 10) / 20)
LET TM = W

# 3206
LET TR = C0
LET TS = C0
IF PP THEN Z = ( RND (C1) < .3)
LET W = 6 + Z + Z
LET TR = .2 + .6 * Z
IF TM < C2 THEN W = W + C1
LET TS = 8 * TR
LET TR = C0

# 3207
LET ZT = TM - C3
IF ZT < C0 THEN ZT = C2 - TM

# 3210
LET ZC = 5 - TM - TM - OP
IF ZC < C0 THEN ZC = C0

# 3215
LET X = (ZC > P5)
LET Y = (PF = C0)
LET ZF = F0
IF Y THEN ZF = 8

# 3220
LET ZP = (W > 5) + (W > 7) + P + P

# 3225
LET Z = FS * P5
IF X OR Y THEN Z = FS + .8

# 3230
LET FS = Z
LET H = .9 * H + ZT + ZC + ZF + ZP + FS + H0 + HR
LET PF = PF - FC
IF PF < C0 THEN PF = C0

# 3235
IF  NOT W1 AND H > 139 THEN  GOSUB 10300
INVERSE
...

Constants:

  • C0: 0.0
  • C1: 1.0
  • C2: 2.0
  • C3: 3.0
  • C4: 4.0
  • P5: 0.5

Inputs:

  • NP: number of party members alive.
  • P: pace. 1: steady, 2: strenuous, 3: gruelling
  • PF: pounds of food left. Copied over from I(8).
  • R: rations. 1: filling, 2: meagre, 3: bare-bones
  • W: weather. 0: very cold, 1: cold, 2: cool, 3: warm, 4: hot, 5: very hot, 6: rainy, 7: snowy, 8: very rainy, 9: very snowy

Calculated factors:

  • FC: pounds of food consumed. FC = NP * (4 - R)
  • ZC: clothing misery factor. ZC = 5 - W * 2 - (clothes/NP)
  • ZF: food misery factor. ZF = 2 * (R - 1), or = 8 if you have no food left
  • ZP: pace misery factor. ZP = (W > 5) + (W > 7) + P + P
  • ZT: temperature misery factor. ZT = 0 if W is warm or cool, otherwise the number of steps away from warm or cool.
  • FS: food starvation factor. FS = FS + 0.8 if there's no food or if ZC > 0.5, otherwise FS = FS * 0.5.
  • H0: seems to be H0 = 5.0 at calc time?
  • HR: hardship factor. HR = 10 if the trail is rough, HR = 20 if someone just died, 20% chance of HR = 20 if the water is bad, 10% chance of HR = 10 if there's very little water

Output:

  • H: party health score. 0-34 is good, 35-69 is fair, 70-104 is poor, 105-139 is very poor. Capped at 139. H = .9 * H + ZT + ZC + ZF + ZP + FS + H0 + HR
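
Condensed into Python, one day of the health update looks roughly like this. It's only a sketch: the factor meanings follow the (conjectural) descriptions above, and the 139 ceiling is applied as per the output note rather than via the GOSUB 10300 call in the listing:

def daily_health_update(h, fs, out_of_food, under_clothed, zt, zc, zf, zp, h0, hr):
    # Food starvation factor (lines 3225-3230): it grows by 0.8 when the party
    # is out of food or badly clothed, otherwise it halves.
    fs = fs + 0.8 if (out_of_food or under_clothed) else fs * 0.5
    # Party health score: higher is worse, clamped to the 139 ceiling.
    h = 0.9 * h + zt + zc + zf + zp + fs + h0 + hr
    return min(h, 139.0), fs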

The inputs were figured out by checking what text the game prints on the screen for those variables; e.g. W is used to fetch a string from an array W$ that looks like this:

[0xa0ee]  5780250001000a  W$[10] =
[0xa0f8]  09efa6  - b'very cold'
[0xa0fb]  04eba6  - b'cold'
[0xa0fe]  04e7a6  - b'cool'
[0xa101]  04e3a6  - b'warm'
[0xa104]  03e0a6  - b'hot'
[0xa107]  08d8a6  - b'very hot'
[0xa10a]  05d3a6  - b'rainy'
[0xa10d]  05cea6  - b'snowy'
[0xa110]  0ac4a6  - b'very rainy'
[0xa113]  0abaa6  - b'very snowy'

The health factors are all conjecture based on the inputs, but I think they make sense. We have the formula for H, and there's only one misery factor which would be affected by waiting 14272 years: the food starvation factor. In case you're wondering what that looks like:

[0x9bb0]  4653967e4f4234  FS = 4166608.55078125

Yikes. We can see that every day after crossing Snake River, H would increase by at least 4.1 million, and then be clamped to the worst possible health score of 139. It would take 15 days of doing nothing but sitting and eating to bring the food starvation factor down to 127, by which point everyone in the party would be dead.
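
A quick sanity check of that 15-day figure in Python, starting from the FS value in the dump:

fs = 4166608.55078125      # the FS value from the memory dump above
days = 0
while fs >= 139:           # halve until FS drops below the 139 health cap
    fs *= 0.5              # one day of doing nothing but resting and eating
    days += 1
print(days, round(fs, 2))  # -> 15 127.15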

But finally, FINALLY we know how to distort reality enough to get this train back on the rails. The party is back to full health; all that's left is to get past the final fork at The Dalles and...

Did we make it to Oregon???

A screenshot of an error screen. The text reads "Error 53 at line #50050 in Oregon Trail. Please report this error to MECC."

Piss. I couldn't find a technical contact for MECC, so I dug around and saw that line 50050 was in TRADE.LIB, namely the code responsible for drawing the inventory. Error 53 is the Apple II error code for GS/OS: parameter out of range. But what could possibly be wrong with the inventory? It was fine on the road!

Oh. Wait.

[0x9d29]  49802200010009  I$[9] =
[0x9d33]  000000  - b''
[0x9d36]  0512a9  - b'Wagon'
[0x9d39]  040ea9  - b'oxen'
[0x9d3c]  10fea8  - b'sets of clothing'
[0x9d3f]  07f7a8  - b'bullets'
[0x9d42]  0ceba8  - b'wagon wheels'
[0x9d45]  0be0a8  - b'wagon axles'
[0x9d48]  0dd3a8  - b'wagon tongues'
[0x9d4b]  0ec5a8  - b'pounds of food'
[0x9d4b]  49003400010009  I[9] =
[0x9d57]  ffffffffff  - -1.7014118342085515e+38
[0x9d5c]  ffffffffff  - -1.7014118342085515e+38
[0x9d61]  8310000000  - 4.5
[0x9d66]  8528000000  - 21.0
[0x9d6b]  8a09000000  - 548.0
[0x9d70]  8100000000  - 1.0
[0x9d75]  8100000000  - 1.0
[0x9d7a]  8100000000  - 1.0
[0x9d7f]  8200000000  - 2.0

The game was kind enough to gift me -170 billion billion billion billion wagons.

Actually this was a red herring. Line 50050 also exists on Side B inside END.LIB, i.e. the screen right before the crash (choosing between floating down the Columbia River or taking the Barlow Toll Road).

And guess what.

It reads the date:

# 50050
[0x023d]  POKE 900,NP
[0x0245]  POKE 901,AY - 1800
[0x0252]  POKE 902,AM
[0x025a]  POKE 903,AD

AY is the variable containing the year, and the POKE instruction tries to shove the value of AY - 1800 into a single byte of regular Apple II memory. With a year of 16120 that's 14320, far beyond a byte's maximum of 255, so the game crashes.

These POKEs seem to be how Oregon Trail saves key information when it switches between BASIC programs. Most of the game is handled by the main program "OREGON TRAIL" (written by John Krenz), which loads in modules like "RIVER.LIB" and keeps access to all the variables. Whenever a new main program is loaded in, all the BASIC variables are lost except for whatever data you happened to copy to regular memory with POKE. At the end of the game you have the option of floating down the river, which uses a new main program "FLOAT" (written by Steven Splinter) before switching again to the "WIN" program, so it follows that the game would need to use the POKEd numbers.

What happens if we mod the BASIC code in memory to say something else?

A screenshot of the ending screen of Oregon Trail, showing the beautiful Willamette Valley, and the utterly incorrect year of 18255.

Finally we have a definitive answer about whether this whole ordeal is possible. Unlike every other progress screen in the game, the ending screen has the year hardcoded to start with "18", followed by the printed text of whatever PEEK 901 is. The maximum number of years you can take without crashing the game is 207, and any attempts longer than 51 years will end in the wrong century.

Really, there wasn't much point going on; it felt like further trickery went against the spirit of the challenge. Maybe there's closure enough in knowing that if you somehow avoid starvation for 142 centuries, the game dicks you at the last possible moment by expecting the year to be sensible.

Then again we're 98% of the way there

I lied. Here's an improved version of Oregon Trail side B with the following quality-of-life changes:

  • Waiting by a river for conditions to improve resets the food starvation factor to 0.
  • The final sequence uses two bytes to store the year.

Many thanks to the CiderPress II team for making it easy to load replacement code onto the floppy.

Conclusions

A screenshot of the ending screen of Oregon Trail, showing the beautiful Willamette Valley, and the correct end year of 16120.

Applesoft BASIC is majestically slow, but thanks to 40-bit floating point and standard commands for drawing and text manipulation, Oregon Trail works much better than I could have expected after running the simulation for 14272 years. The final screen being broken is a bit of a letdown, but understandable given how unlikely it is that you'd take 14272 years to finish the game.

Honestly, one of my aims has been to create a reverse engineering approach that will work for any system, regardless of knowledge or experience, and gives you the basic steps for understanding and modifying it. This project was a bit of a test run for that approach, and I think it could be considered a success. Eventually I'll write it down properly, but hopefully this effort will give you some ideas for your next reversing project.

And albrot? Well, he was able to fulfill his dream and become the first person to survive for 15000 years on the Oregon Trail. A well deserved victory.

If you're interested in the tools from this work, the Applesoft BASIC decompiler and variable scraper I wrote are available here. Happy trails!

]]>
https://moral.net.au/writing/2025/01/11/waiting_for_oregon/ hacker-news-small-sites-42682813 Mon, 13 Jan 2025 12:42:27 GMT
<![CDATA[Porting the GNAT Ada compiler to macOS/aarch64]]> thread link) | @ingve
January 13, 2025 | https://briancallahan.net/blog/20250112.html | archive.org


After getting a port of GDC working on my new MacBook Pro, there were still two languages in the GCC suite that I didn't have: Ada and Go. Some searching around makes it seem pretty clear that Gccgo is not yet really on the table for macOS. But there should not be any reason we can't add Ada to our GCC suite, seeing as there is already support for macOS/aarch64 in the repository. I wanted to get a native macOS/aarch64 Ada through the GNAT compiler in GCC.

But try as I might, I could not find any precompiled packages for it. I guess part of the issue is that macOS/aarch64 support is not fully upstreamed into GCC proper; instead, Iain Sandoe has a GitHub repository that includes the necessary changes for full support. I used his gcc-14-branch repository to build GDC.

So let's get to work.

Virtualizing macOS

I don't have Rosetta on my machine. I know it takes all of two seconds to install it, but I also wanted an excuse to play around with Apple's virtualization framework. After some googling around, I settled on using VirtualBuddy as my virtualization manager. It made installing a virtualized copy of Sequoia 15.2 incredibly easy; just say you want to install from the Internet, choose the version you want to install, and away you go. I gave my virtualized machine 4 CPUs, 16 GB of RAM, and a 128 GB hard disk, and it felt just as snappy as the host (the host is a MacBook Pro M4 Max with 64 GB of RAM). I installed the command line tools, Rosetta, and some Homebrew packages on this virtualized machine and I was ready to go.

Step 1: Finding a binary of GNAT

So why did I install Rosetta? My suspicion was that someone had a binary package of GNAT for macOS/x86_64. But I looked in the usual places: Homebrew, pkgsrc, MacPorts, and Fink, and none of them had a package. I was beginning to wonder if our project was going to end before it began. Because GNAT is written in Ada, you need an Ada compiler to compile it.

I eventually stumbled upon Alire, which bills itself as Ada's version of Rust's cargo or OCaml's opam. The Alire homepage directly states that they have a recent Ada package for macOS/x86_64. This was our way in. I followed the tutorial to get the alr binary installed on the virtual machine and got a copy of the GNAT compiler, which happened to be version 14.2.0. That's great, because it's the same version as the GDC compiler we built, and the same version we will be building from the gcc-14-branch repository.

The compiler was built for a x86_64-apple-darwin21.6.0 target. We will use this to get to a aarch64-apple-darwin24.2.0 target.

Step 2: Building a Stage 1 GNAT cross compiler

I added this Alire-provided GNAT to my PATH and set up the gcc-14-branch repository to prepare to build. Make sure it goes at the front of your PATH so that clang doesn't get picked up instead. I did run the contrib/download_prerequisites script so that GMP, MPFR, MPC, and ISL would be built statically into GCC; that way I don't have to deal with any shared libraries and they're not that big so it won't take long to build them.

For this first pass we need to build a compiler that is built on (effectively) macOS/x86_64, runs on (effectively) macOS/x86_64, but produces code that runs on macOS/aarch64. We also need to remember that all the build tools understand both x86_64 and aarch64; that will be a big help for us. To start, we can run a configure invocation that looks like this:

env OTOOL=otool AS=as AR=ar RANLIB=ranlib LIPO=lipo NM=nm DSYMUTIL=dsymutil ../gcc-14-branch-gcc-14.2-darwin-r2/configure --prefix=/opt/gcc14 --enable-languages=ada,c,c++ --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX15.sdk --build=x86_64-apple-darwin21.6.0 --host=x86_64-apple-darwin21.6.0 --target=aarch64-apple-darwin24.2.0 --disable-nls --with-ld=/usr/bin/ld --with-as=/usr/bin/as

Then we can run our build with:

gmake V=1 -j4

I installed GNU make from Homebrew since that is much newer than what I got from the command line tools.

For now, I am only going to worry about getting GNAT up and running; once we have it then I can worry about combining GNAT and GDC together into one suite.

I did encounter a few build failures, but these all seemed to revolve around autotools wanting prefixed versions of the tools; I would just edit the Makefiles and re-run gmake when that happened. Eventually it did succeed, so I installed it into a fake directory, tarred it up, and then installed it on the virtual machine.

Step 3: Building a Stage 2 GNAT native compiler

Now we can use this new compiler to build another new compiler. Because the compiler we just built produces code for aarch64-apple-darwin24.2.0, even though it runs on x86_64-apple-darwin21.6.0, the GCC it compiles will itself be a native macOS/aarch64 compiler.

We'll need to reset our PATH to get rid of the Alire-provided compiler. We don't need it any more. We do need to add the compiler we just built to our PATH as that will be used to create our native compiler.

Let's create a new build directory and run our configure invocation as follows:

env OTOOL=otool AS=as AR=ar RANLIB=ranlib LIPO=lipo NM=nm DSYMUTIL=dsymutil ../gcc-14-branch-gcc-14.2-darwin-r2/configure --prefix=/opt/gcc14 --enable-languages=ada,c,c++ --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX15.sdk --build=x86_64-apple-darwin21.6.0 --host=aarch64-apple-darwin24.2.0 --target=aarch64-apple-darwin24.2.0 --disable-nls --with-ld=/usr/bin/ld --with-as=/usr/bin/as

So what we are saying here is that our build machine is (effectively) macOS/x86_64 but we would like to build a compiler that runs on macOS/aarch64 and produces code for macOS/aarch64. A compiler that produces code for the platform it runs on is a native compiler.

This failed very quickly. It turns out the stage 1 collect2 binary was miscompiled and segfaulted when used. What I did to overcome this was copy the native collect2 binary from my GDC compiler over to the GNAT compiler. Everything was happy after that.

Now, we can compile with:

gmake V=1 -j4

Like with the stage 1 compiler, there were some intermittent build failures. But none of these were catastrophic; it was a matter of just editing the Makefiles in question and re-running gmake. The most difficult error was that when linking libstdc++.dylib, the linker complained that the library was being linked with a library of the same name. The problem is that the infrastructure incorrectly wanted to link with g++ instead of gcc. There is a comment in the right place in the Makefile, so it's relatively clear that you just need to replace g++ with gcc and it will work. I've never had that problem before, so perhaps it is something to do with cross compiling.

Eventually it completed successfully, and then it was the same routine: install to a fake directory, remove the stage 1 compiler, then tar up and install the stage 2 compiler.

Step 4: Putting it all together

Now that I have a native GNAT for macOS/aarch64, I want one single GCC package that has all the languages that are available: Ada, C, C++, D, Fortran, Modula-2, Objective-C, and Objective-C++. I moved the tarball of the stage 2 GNAT compiler to my host machine, shut down the virtual machine, and installed the stage 2 compiler. I then put the stage 2 compiler in front of the GDC compiler in my PATH and built the gcc-14-branch compiler one last time:

../gcc-14-branch-gcc-14.2-darwin-r2/configure --prefix=/opt/gcc14 --enable-languages=ada,c,c++,d,fortran,m2,objc,obj-c++ --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX15.sdk

That's all we need now. All of build, host, and target are aarch64-apple-darwin24.2.0, a fully native compiler. And GCC will detect that automatically and create a new native compiler.

One last time to build GCC, but let's go a little faster now that we have more CPU cores:

gmake V=1 -j10

This built without any problems. Then for one last time, install to a fake directory, tar it up, remove the old stage 2 compiler and old GDC compiler, and install this new complete GCC suite.

Conclusion

I've uploaded the tarball here for those that want it. You will need to have the command line tools installed. If someone knows how to get GCC to autoselect between Xcode and the command line tools, please let me know. You'll also need to put /opt/gcc14/bin at the front of your PATH to use it.

Even better, now no one will ever have to go through the hassle of bringing up Ada support on macOS/aarch64 again. We now have a compiler that can be used as a bootstrap for future releases of GCC. Maybe we'll even have some package managers provide GNAT compilers since the hard work has been done.


]]>
https://briancallahan.net/blog/20250112.html hacker-news-small-sites-42681917 Mon, 13 Jan 2025 10:14:06 GMT
<![CDATA[What I Learned Failing to Finish a Game in 2024]]> thread link) | @grgaln
January 13, 2025 | https://georgeallen.dev/posts/2024-failures-in-game-development/ | archive.org

What I Learned Failing to Finish a Game in 2024

Here’s a breakdown of my year in game dev, the challenges I faced, and the key takeaways that’ll shape my 2025 projects.

TL;DR

In 2024, I didn’t finish any games, but I learned a lot about game development. I tackled three projects: a multiplayer turn-based RPG (Kinship), a puzzle simulation game (ClarityCorp), and a co-op side-scroller shooter (no name yet). The biggest challenges included managing scope, learning multiplayer mechanics, and handling art requirements.

Key lessons: start small, focus on a solid game loop, and polish later. Despite not finishing a game, I made great strides in development and am ready to continue working in 2025.


2024 didn’t go as planned. Despite the title suggesting I “failed,” I actually learned a tremendous amount about game development last year. Looking back, I see a lot of unfinished projects, but with each one, I gained valuable lessons.

Game one - Isometric Turn-Based Multiplayer RPG (Kinship)

At the start of the year, my focus was on multiplayer—specifically, “how the hell do you make a multiplayer game?”

After diving into articles (shout out to Glenn Fiedler) and studying the quirks of multiplayer, I settled on a turn-based combat RPG. Games like Baldur's Gate 3 and the Divinity series are absolute classics that I drew huge inspiration from.

Kinship was designed as a 4-player, online co-op RPG. Players could choose from the classic classes—Fighter, Mage, Rogue, or Cleric—and battle through an isometric world. The combat system was simple, with basic attacks and spells, and the enemies (mostly slimes, for now) moved and attacked in turns.

Below is a screenshot of the in-game combat; the art is placeholder and not final.

That’s pretty much it. Despite the lack of content, this took me many months to achieve. So, where did I go wrong?

Overambitious Scope

The classic advice is to start small, and I definitely ignored that here. Multiplayer games are complex, and Kinship needed a lot of content—multiple playable classes, levels, and assets. The sheer scope started to overwhelm me, and I found myself stuck in a cycle of not knowing where to begin.

Isometric Art

Isometric is hard. Well, not really; there are some key concepts that make it easier, but in general it means you need at least 4 versions of each asset, one for each direction in the game. This means not only are you making one walk cycle, you are making 4, for each playable character, each enemy and certain moveable objects. Now your characters need to attack; that's 4 attack animations. They also have many attacks… You get the point. It's a lot of art.

If you are anything like me (a programmer), making art is the most painful part of game development. So having to make 4 versions of everything, in a perspective where mistakes are very easy to spot, was not a feat I could achieve alone.

Perhaps I should have linked up with an actual artist, but without being able to pay them a fair salary, this was not an option I could pursue.

Playable classes

For this style of game to be fun and re-playable, it needed at least 4 classes, one for each player. This really multiplies out how much work you need to do to finish the game. Not only does each class need its own assets, it also needs its own behaviors, attacks, skill trees and balancing, all things required to make a really good game. This was another key point that demotivated me and stopped me from making progress.

I definitely took on too much too soon, and should have focused on nailing one class from top to bottom, ironing out all the kinks along the way.

The advice of “start small” really started to ring true here!

Learning new concepts takes time

The elephant in the room is "multiplayer". At the start of the year I had never made a multiplayer game. I do, however, have extensive experience in networking: I've been a backend software engineer for over 9 years, have a degree in Computer Science, and founded a networking-based tech company.

Lot’s of people give advice against solo-dev’s targeting multiplayer games. Whilst I do believe that advice is valid, I also believed that multiplayer is a key concept of some of the best games I play and is a way to stand out amongst the swarm of titles that hit Steam everyday. I’m also a big believer in making games that you want to play. It helps in so many ways to keep you motivated as development is fun.

Learning the quirks of multiplayer took time, however. It was apparent early on in my research that targeting a turn-based system would make multiplayer slightly easier, as you do not have to deal with as many synchronization issues or with lag compensation. You "just" need to make sure each action in your game is deterministic and can be replayed across clients, so they can each replicate the turns taken and synchronize state at the end of each player's turn.
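
As a toy illustration of that idea (a Python sketch, not the game's actual code): every client applies the same ordered list of actions with the same per-turn seed, so they all finish the turn with identical state.

import random

def apply_action(state, action, rng):
    # Identical inputs plus a shared seed mean every client computes the same result.
    if action["type"] == "attack":
        damage = action["power"] + rng.randint(0, 3)
        state["hp"][action["target"]] -= damage
    return state

def replay_turn(state, actions, turn_seed):
    rng = random.Random(turn_seed)   # seed agreed on by all clients for this turn
    for action in actions:           # actions are broadcast in the same order everywhere
        state = apply_action(state, action, rng)
    return state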

Ultimately the multiplayer parts of Kinship weren't too hard to implement. Steam has an excellent library that really abstracts away some of the hard parts of the networking. The issues stemmed from the fact that testing multiplayer games is quite a lot harder and forces a slower feedback loop. You can't just load your game and start playing, expecting it to "just work" when you add more players. Testing meant I needed to be constantly running multiple copies of my game, sometimes across multiple devices, ensuring that interactions were smooth and there were no networking bugs. This is just harder: bugs are harder to spot, diagnose and fix.

Polishing too early

I really wanted to leverage the isometric 2.5D style to have some amazing lighting and shadow casting. In hindsight I should have spent less time on this early in development. I'm super stoked with the effect I managed to achieve; however, it took many hours of bashing my head against a wall in shader hell.

This is the sort of feature that I should really focus on during the polish phase of development. Once I have a game with content and features, I can focus on making it really look and feel like a finished product.

You can see below an example screenshot of how the light and shadow casting work. I took huge inspiration from the approach that the Graveyard Keeper team used, which they outlined here. The main idea is that you simulate 3D lighting based on the vertical axis; the article really explains it better than I can!

I did really learn the nuts and bolts of how to use shaders, and this is something I can carry forward into other games that I pursue.

Game two - Puzzle simulation game (ClarityCorp)

My next game was completely different, I mean completely. Using my learnings from Kinship, I dropped multiplayer and tried to massively drop the art requirements.

The core concept was a single-player experience where you are a remote worker for ClarityCorp. You log into your PC each day and are given a variety of seemingly pointless tasks, and over time you start to learn that your company has a very menacing underbelly. Think Papers, Please, with some heavy Matrix and Severance style inspiration.

No hook

This game, however, really didn't get further than the concept stage for me. I really struggled to nail down how to make it fun and give it a great hook.

The screenshot below shows the "login" screen you are presented with each day. You would log in and be presented with a series of tasks you needed to complete for the day, to earn money and keep your family alive.

One of the tasks I started playing with, heavily inspired by Severance's data refinement, was called "Scrubbing". You are presented with a series of words, along with some criteria for words you need to "scrub" out. You have a time limit to remove all the words, each level getting harder and harder, and the criteria changing each time.

Despite having ideas for more mini-games, puzzles and how to progress the story, I got very demotivated as I felt this game really relied on a solid loop, which I could not nail down.

I think there is potential here for a great game, but it didn't keep me hooked during development, so I chose to park it for now. My main learning here: you need to start development with more solid ideas ahead of time. Really lock in a core game loop on paper and make that a reality. Having a random list of "cool" puzzles and not knowing how to tie them together isn't really a game.

Game three - Co-op side-scroller shooter (no name yet)

My third and final foray into game development for the year took me to another multiplayer game. I haven't yet come up with a name, but I do have the core concepts locked down.

This game involves up to 4 players, working through side scrolling levels, defeating enemies as they go. My key inspirations are:

  1. Celeste for the level design + platforming mechanics
  2. Broforce for the combat/carnage
  3. Remnant: From the Ashes, for the souls-like twist

Despite it not being implemented yet, the idea is that each player has the same starting point on a skill/level tree and they can take diverging paths to improve their abilities: how fast they move, how far they jump, how often they shoot/attack, damage, health, skills, etc. The players will be able to pick up different weapons, use differing skills and gain new abilities as they progress through a number of levels, ending with boss fights.

This game is hopefully a combination of the learnings from the last two games I discussed in this post. Whilst it is still multiplayer, it has a massively simplified art style, less varied playable classes and hopefully simpler levels to design.

There are still some challenges that have and haven’t presented themselves.

Below you can see an early build of the game, featuring one player and an enemy. Again, placeholder art!

Realtime multiplayer

One key difference with this game is that the multiplayer is now realtime, which brings up a couple of key new concepts I needed to learn about and solve: client-side prediction and lag compensation. Thankfully I had already learnt a lot about multiplayer game design, so it was much easier to look at these two concepts in isolation and work out how I could solve them.

Again leaning on the amazing Glenn Fiedler and other online resources, it was not too hard to get an early proof of concept up and running, where multiple people can join a game and progress through a level. I can simulate lag and packet loss to test my implementations against them. I then came up with some patterns I can apply throughout the development of the game to ensure that new features do not suffer the same issues. We will see how well these hold up.

Level assets

This time I don’t need 4 versions of each animation in a isometric perspective, but I do need a lot of tilesets for the levels I want to make. I also need to make these levels.

I feel this is a much more achievable scope for me as a solo dev; I can commission harder pieces of art from an actual artist if need be, and it will stay within budget.

For the levels themselves, I have also thought about procedurally generating certain levels to simplify the level creation process (although this does have its complexities and downsides).

Boss fights

Bosses in video games are notoriously hard to make; it's hard to design them in a way that is both challenging and fun. As well as this, you need to ensure that your bosses look and feel great, with complex behavior patterns and limited RNG.

This is something I do not have any experience with, so will 100% be a challenge, but one that I am willing to take on in 2025.

Play testing

Accurately testing multiplayer game experiences, well, really needs multiple people. I haven't gotten far enough into development for this to be a real problem yet, but it is approaching fast. I will need a solid group of people I can keep playing with to iron out issues and properly balance the game. This is where being a solo dev will really hinder my progress and cast doubt on my motivation. Something I need to consider going forward is how I will actually execute on this. Perhaps it is time to start building out a team…

Key Takeaways from 2024

While I didn’t finish any games in 2024, I gained invaluable experience. Here’s what I’ve learned:

  1. Multiplayer development is tough: But it’s worth learning, especially if you want to make games that stand out.
  2. Start small: Don’t dive into massive projects right away. Focus on getting something small and fun off the ground first.
  3. Know your limits: The art requirements for certain projects can overwhelm you, so think about how you’ll manage them.
  4. Solid game loops are crucial: Without a core loop that’s fun and engaging, your game can lose momentum quickly.
  5. Polish comes later: Focus on gameplay first, and leave polish and visual effects for when the core mechanics are solid.

Looking ahead to 2025

Even though I didn’t complete a game in 2024, I’ve made significant strides as a developer. I’m excited to continue working on “Game Three” and apply everything I’ve learned. If it doesn’t work out, that’s okay. Game development is a creative outlet, not my full-time job. But I’ll keep coming back to it because I love the challenge and putting things out for people to play!

]]>
https://georgeallen.dev/posts/2024-failures-in-game-development/ hacker-news-small-sites-42681545 Mon, 13 Jan 2025 09:16:17 GMT
<![CDATA[Carnarvon's NASA satellite dish receives first signal in almost 40 years]]> thread link) | @zdw
January 12, 2025 | https://www.abc.net.au/news/2024-12-03/carnarvon-nasa-dish-receives-signal-repairs/104672866 | archive.org

The jagged peaks and valleys of a line chart might not look like much, but they represent a comeback years in the making. 

Carnarvon's historic Overseas Telecommunication Commission (OTC) dish had sat dormant for almost four decades.

The structure, nearly 900 kilometres north of Perth, was decommissioned after a decorated history that included aiding NASA lunar missions.

A bumpy line chart on a digital screen.

The waves of the dataset show how precisely the OTC dish was aligned with the satellite. (Supplied: ThothX)

It was destined for demolition before Canadian aerospace company ThothX signed a 20-year lease for the facility in 2022, with minor repairs beginning last year.

This week, the 29-metre-wide parabolic antenna received its first radar signal since 1987.

"I flew 18,000 kilometres from Canada to conduct this test and so, you can imagine, I was delighted when we managed to receive the first signals from [the satellite]," ThothX chief executive Brendan Quine said.

"This gives us a proof-of-concept for the project, and allows us to move forward to the next step, which is to finish the refurbishment."

Like a giant caravan

The dish needed to be aimed precisely at a satellite of interest in order to receive the signal.

A cleaner in a face mask vacuums the inside of a narrow tunnel.

Mr Quine climbed inside the central optics of the dish to clean them.  (Supplied: ThothX)

However, it had only been rotated a handful of times since the late 1980s and rarely beyond the range of motion needed to be stowed for the cyclone season.

Along with a new back-end radio system, the dish's bearings required flushing with fresh oil and decades-worth of pigeon droppings removed, the latter of which was an ongoing battle. 

The antenna's optics, a six-metre-deep tube at its centre, were also cleaned by hand.

Despite the comprehensive works and lofty goal, the latest test came down to three men and a hand-operated drill fitted to the rotator mechanism.

An older man with a beard holds a battery-powered drill to a machine.

The 300-tonne dish was manually rotated with a power tool by Denham Dunstall. (ABC Pilbara: Alistair Bates)

ThothX Australia director Phil Youd compared the manual operation to fine-tuning the dish atop a caravan.

"Some have got their satellite dishes ... it's the same principle as that to find a satellite except it's far more accurate with what we've got," he said.

"It is a pinpoint that we are looking for; whereas in a caravan, they're looking for a broad beam coming down from the satellite, we are looking at one little frequency."

Two men look at a small screen.

Mr Quine and Mr Youd shouted out directions according to the data as the dish was rotated. (ABC Pilbara: Alistair Bates)

After several minute adjustments, the OTC dish managed to lock on the position of geo-stationary object NSS-12.

The United States-built satellite provides a wide range of services to Australia, such as television broadcasting, online banking transactions, emergency beacons, and military communications.

'Adversaries' above

With its proof-of-concept secured, ThothX plans to invest upwards of $10 million into the OTC dish, which it says will take a key place in its global satellite tracking network.

The dish's location in remote Western Australia, discovered by Mr Quine while searching Google Earth, is particularly well-positioned to give potential military clients a strategic advantage.

A middle aged man looks at the camera from in front of a huge satellite dish.

Brendan Quine travelled from Canada for the test. (ABC Pilbara: Alistair Bates)

"Our adversaries are very active in space," he said.

"Carnarvon is in a perfect location to monitor these activities because we can monitor the whole Pacific theatre."

He said an active radar would surpass the usefulness of telescopes used to monitor satellites, which could be affected by the weather.

"If you're waiting for your adversary to move a spacecraft, you kind of want to know immediately what's going on, and you want to be able to protect your own assets from things crashing into it," he said.

More work to be done

The state government contributed $50,000 towards the OTC dish's refurbishment in September as part of a regional development grant.

ThothX said those funds would go towards repainting the antenna and installing new power systems.

A fresh outer coating to help the dish withstand Carnarvon's corrosive ocean winds is next on the agenda, with hopes for a full radar demonstration in the next year.

A new paint job is in store for the OTC dish, but it might be years before it's fully operational. (ABC Pilbara: Alistair Bates)

Ultimately, the company expects to install more equipment and software that will allow the OTC dish to locate spacecraft up to 50,000 kilometres from Earth at an approximate three-metre margin of error.

The space company compared the project to its refurbishment of the 46-metre-dish at Algonquin Radio Observatory in Canada, one of several it operates across North America, Europe, and Australia.

It took about 15 years for the Algonquin dish to become fully operational.

Mr Youd, who also manages the Carnarvon Space and Technology Museum, said he was glad to help restore the local landmark.

"It's almost giving it a rebirth, you know?"

]]>
https://www.abc.net.au/news/2024-12-03/carnarvon-nasa-dish-receives-signal-repairs/104672866 hacker-news-small-sites-42679998 Mon, 13 Jan 2025 04:09:24 GMT
<![CDATA[Zuck plans to replace Mid-Level engineers with AIs this year]]> thread link) | @msolujic
January 12, 2025 | https://tribune.com.pk/story/2521499/zuckerberg-announces-meta-plans-to-replace-mid-level-engineers-with-ais-this-year | archive.org

AI-powered software development could lead to unprecedented speeds and scale in technology creation


Mark Zuckerberg, the founder of Meta, has boldly forecasted that by 2025, artificial intelligence will have advanced to the point where it can code at the level of mid-level engineers. In a world where technology evolves at lightning speed, this statement offers a glimpse into the future, where AI could become an integral part of engineering teams, not just a tool used by developers.

In a podcast interview, Zuckerberg also announced his plan to replace mid-level engineers with AI agents at Meta this year.

During a recent tech conference, Zuckerberg elaborated on his vision, suggesting that AI could soon handle much of the coding currently performed by human engineers. If AI can write code with the proficiency of mid-level engineers, the implications could be far-reaching, potentially enabling AI to manage entire software projects autonomously, from concept to deployment.

Beyond Coding: The future of AI in engineering

Zuckerberg’s comments raise the possibility of even more radical advancements. The next step could be the creation of self-improving AI systems that evolve their coding abilities, potentially eliminating the need for human intervention. The idea that AI could one day oversee large-scale engineering projects opens up questions about how we will manage these advanced systems and the role humans will play in this new technological era.

At the conference, one speaker humorously suggested that we could soon see “AI Engineers’ Unions” to advocate for the rights of AI workers, underlining the monumental shift that AI's involvement in engineering could bring. While this was clearly a playful exaggeration, it reflects the growing conversation about AI’s role in the workforce.

Challenges ahead

Zuckerberg’s prediction is both thrilling and unnerving. On one hand, AI-powered software development could lead to unprecedented speeds and scale in technology creation. On the other, it raises significant concerns about the future of jobs in tech, the ethical use of AI, and the potential loss of human agency in critical decision-making.

As we approach 2025, the tech industry, policymakers, and educators must consider how AI will reshape the landscape of software development and the workforce. The challenge will be balancing innovation with safeguarding jobs, fostering human creativity, and ensuring that AI is used responsibly and ethically.

Whether or not AI is coding like mid-level engineers by 2025, Zuckerberg's vision is likely to set the tone for future advancements in software development, prompting a rethinking of what it means to be an engineer in a world driven by artificial intelligence.

]]>
https://tribune.com.pk/story/2521499/zuckerberg-announces-meta-plans-to-replace-mid-level-engineers-with-ais-this-year hacker-news-small-sites-42679352 Mon, 13 Jan 2025 02:18:57 GMT
<![CDATA[What is a DOM node? A peek under the hood]]> thread link) | @thunderbong
January 12, 2025 | https://gregros.dev/post/but-what-is-a-dom-node | archive.org

What makes an object a DOM node? Is it the prototype or something else?

The answer turns out to be surprisingly complicated!

The best way to investigate what the browser sees as a DOM node is to use a function that’s supposed to accept one, and pass it various things, and see what happens!

The classic example is appendChild. This method accepts a DOM node and inserts it as the child of another node. If you pass the method just a regular old object, it will error instead.

Here is some code to illustrate this:

// Create an element
var div = document.createElement("div")

// Insert it into the page
document.body.appendChild(div)
// Works!

// Let's try to insert a regular object instead
document.body.appendChild({})
// Uncaught TypeError: Failed to execute 'appendChild' on 'Node': 
//     parameter 1 is not of type 'Node'.

Mad web science

Now let’s perform a series of bizarre experiments that subvert this code in strange and unusual ways, in the name of mad web science!

Messing up a DOM node

In this variation, we create the element as normal, but we then mess it up by removing its prototype and deleting all of its keys.

This should result in an object that’s functionally indistinguishable from {}, something that should be completely non-functional.

Here is the code:

// Create an element
var div = document.createElement("div")

// Remove its prototype
Object.setPrototypeOf(div, null)

// Delete all of its keys
for (const key of Reflect.ownKeys(div)) {
	delete div[key]
}

// Insert it into the page
document.body.appendChild(div)

Trying to fake one

Now, here is the second variation:

// Create an object with the HTMLDivElement prototype
var div = Object.create(HTMLDivElement.prototype)

// Insert it into the page
document.body.appendChild(div)

In this variation, we use the Object.create function to make a new JavaScript object with the HTMLDivElement prototype. It's the opposite of what we did in the previous variation: we're making something that looks like a proper DOM node, but we're not using the correct API to create one.

The question

So… which variation actually works?

  • Does the first one work, in spite of the object being completely empty?
  • Does the second one work, in spite of how we created it?
  • Do neither of them work, because an object needs to have both the correct prototype and be created in the right way for it to count?

Feel free to try to run the code in your browser console and check for yourself!

The answer

It turns out that the first object — the empty one — is recognized as a DOM node, but the second one isn’t. That means the browser doesn’t use an object’s prototype to recognize DOM nodes at all. It’s doing something else.

That’s not to say getting rid of the prototype doesn’t do anything. You can no longer call instance methods, for example, since they are defined on the prototype and that prototype is missing.

But no matter how you screw up a DOM node, if you get a reference to one of those methods, you can still invoke it and it will work just fine. Here is an example:

// Create a div element
var div = document.createElement("div")

// Unset its prototype
Object.setPrototypeOf(div, null)

// Insert it into the DOM
document.body.appendChild(div)

// Get the `setAttribute` function
const { setAttribute } = HTMLElement.prototype

// Invoke it using `call`:
setAttribute.call(div, "id", "this-actually-works")

Weird, right? Don’t worry, this will actually make more sense once we zoom out a bit.

Beyond JavaScript

And by a bit, I actually mean a lot. Because to truly understand this weirdness, we have to leave the realm of JavaScript altogether and take a look at browser architecture instead.

Browsers are complicated things with many separate systems that interact in lots of different ways. In particular, they all include two critical yet separate components:

  • The JavaScript engine, which executes JavaScript.
  • The rendering engine, which renders the HTML document.

In the Chrome browser, these are called V8 and Blink, respectively. These two separate systems are connected by the JavaScript Web API. This takes the form of a thin layer of bindings embedded in V8 that translate JavaScript function calls to native method calls on Blink objects.

These bindings do very little; the point is that, once a DOM operation is invoked, JavaScript is mostly out of the picture and everything resolves in native code.

Browser architecture diagram

The rendering engine does not follow the rules of JavaScript and generally tries not to know what JavaScript even is. It does know what a DOM Node is though. In fact, one of the rendering engine’s primary jobs is to allocate and manage DOM nodes.

These DOM nodes don’t have anything to do with prototype chains or JavaScript. They are native C++ objects called Node that are passed by reference. They literally implement methods called appendChild and insertBefore.

The V8-Blink bindings form the link between the two. There, each JavaScript DOM node is mapped to a Node object, and this mapping just works by reference.

When an operation like appendChild is invoked, each JavaScript DOM node is resolved to its native counterpart, and then everything is executed in Blink. This means, in turn, that JavaScript DOM nodes are just handles to Blink DOM nodes.

This is why removing the prototype of a DOM node didn’t break it — it was never functional to begin with. The only thing that matters is the mapping, which was created as soon as we called createElement. The JavaScript properties of the object were always irrelevant.

Next, since DOM nodes are allocated by Blink, it’s impossible to create a DOM node within JavaScript — which is what we tried to do using Object.create. That’s kind of like trying to use a random number as the handle to a file.

The OS knows what files we opened, since it’s responsible for opening them; we’re not fooling anyone.

Conclusion

JavaScript objects aren't actually DOM nodes at all. DOM nodes are native objects managed by the rendering engine, and JavaScript objects are just handles to those objects, kind of like pointers.

The browser gives out these handles and puts them on a list. To check if an object is a handle to a DOM node, all it needs to do is check if it’s on that list. The state of the object, like its prototype, is irrelevant.

And that’s it.

]]>
https://gregros.dev/post/but-what-is-a-dom-node hacker-news-small-sites-42678621 Mon, 13 Jan 2025 00:31:29 GMT
<![CDATA[Mystery-o-Matic: Solve a murder mystery every day]]> thread link) | @gaws
January 12, 2025 | https://mystery-o-matic.com/en/ | archive.org

Use this table to keep track of your facts, clues and deductions. You can check the how to play section if you need it.

Start choosing whether the killer never lies (easier) or is allowed to lie (harder).

]]>
https://mystery-o-matic.com/en/ hacker-news-small-sites-42678562 Mon, 13 Jan 2025 00:24:05 GMT