<![CDATA[Hacker News - Small Sites - Score >= 20]]> https://news.ycombinator.com RSS for Node Mon, 03 Mar 2025 04:23:19 GMT Mon, 03 Mar 2025 04:23:19 GMT 240 <![CDATA[Made a scroll bar buddy that walks down the page when you scroll]]> (thread link) | @hello12343214
March 2, 2025 | https://focusfurnace.com/scroll_buddy.html | archive.org

( Look at your scroll bar when you scroll )

Instead of a boring scrollbar, I thought it would be fun to have an animated stick figure that walks up and down the side of your page when you scroll.

This is the first prototype I made.

Going to make a skateboarder, rock climber, or squirrel next. What other kinds of scroll buddies should I make?

Get a scroll buddy for your website

Warning: An embedded example on the side of this page has an animation / movement that may be problematic for some readers. Readers with vestibular motion disorders may wish to enable the reduce-motion feature on their device before viewing the animation. If you have reduce-motion settings turned on, Scroll Buddy should be hidden on most browsers.
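The post doesn't include its source, but the core mechanic is simple enough to sketch. Below is a minimal TypeScript approximation, not the actual Scroll Buddy code: it assumes a fixed-position element with the id scroll-buddy pinned to the viewport edge (the id and styling are my own assumptions), maps scroll progress to a vertical offset, and hides the element when the reduced-motion media query matches.

// A minimal sketch of the general idea (not the author's actual code). It
// assumes a fixed-position element with id "scroll-buddy" pinned to the
// right edge of the viewport.
const buddy = document.getElementById("scroll-buddy");
const reduceMotion = window.matchMedia("(prefers-reduced-motion: reduce)");

function updateBuddy(): void {
  if (!buddy) return;
  // Respect the user's reduced-motion preference by hiding the buddy entirely.
  if (reduceMotion.matches) {
    buddy.style.display = "none";
    return;
  }
  buddy.style.display = "block";
  // Fraction of the page scrolled, from 0 (top) to 1 (bottom).
  const maxScroll = document.documentElement.scrollHeight - window.innerHeight;
  const progress = maxScroll > 0 ? window.scrollY / maxScroll : 0;
  // Walk the buddy down the edge of the viewport in step with the scrollbar thumb.
  const track = window.innerHeight - buddy.offsetHeight;
  buddy.style.transform = `translateY(${progress * track}px)`;
}

window.addEventListener("scroll", updateBuddy, { passive: true });
reduceMotion.addEventListener("change", updateBuddy);
updateBuddy();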

Ignore all the text below; it's just lorem ipsum to provide content for scrolling.
Ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat. Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium.

Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem.

Made with simple JavaScript

]]>
https://focusfurnace.com/scroll_buddy.html hacker-news-small-sites-43237581 Mon, 03 Mar 2025 02:13:00 GMT
<![CDATA[Losing a 5-year-long Illinois FOIA lawsuit for database schemas]]> (thread link) | @chaps
March 2, 2025 | https://mchap.io/losing-a-5yr-long-illinois-foia-lawsuit-for-database-schemas.html | archive.org

Losing a 5-year-long Illinois FOIA lawsuit for database schemas

March 2, 2025 — Matt Chapman

Thomas Ptacek, a friend and expert witness in this lawsuit, summed it up best in the court's hallway while walking within three feet of opposing counsel: "This is fucking stupid".

His companion post explains why.

Intro

Working with the City of Chicago's parking ticket data—which I've received through FOIA—has always been a pain, especially in terms of knowing what exactly to request. In August 2018, I attempted to generally solve that problem, by submitting a request for the following:

An index of the tables and columns within each table of CANVAS.
Please include the column data type as well.

Per the CANVAS specification, the database in question is Oracle, 
so the below SQL query will likely yield the records pursuant to this request:

select utc.column_name as colname, uo.object_name as tablename, utc.data_type as type
from user_objects uo
join user_tab_columns utc on uo.object_name = utc.table_name
where uo.object_type = 'TABLE'

CANVAS Database Schema request on Muckrock

After the City initially denied the request with an argument that the records' release would compromise network security, I took the denial to court, where we initially won at trial. The City then appealed, and we won the appeal as well. The case ultimately went up to the Illinois Supreme Court, where we lost unanimously. Better Government Association did a good explainer of the consequences of that loss, which boil down to a significant broadening of public agencies' leeway to apply exemptions (i.e., withhold records or redact information) in response to FOIA requests.

Why Go Through All of This?

Under Illinois FOIA case law, if a request's responsive documents—the set of records or information within the scope of that request—are stored in a queryable database, a query must be written. The requester is not required to write the query. The law even requires the agency to give you the data in a format of your choice (protip: "excel format"). When it works, it's freaking great. Reality makes it difficult for a number of reasons, though:

  • The FOIA officer will likely need to defer any querying to a colleague who is a "data person."
  • You can't just ask a question: "FOIA does not compel the agency to provide answers to questions posed by the inquirer."
  • From the requester's perspective, "Is X column requestable?" isn't answerable without first attempting to request that column's data.
  • Requesting too many columns will likely lead to time-consuming back-and-forth, or a flat-out denial.
  • Even though Illinois FOIA requires that a requester be given a chance to narrow their request, FOIA officers sometimes just stop responding during this "conferral" process.

To generally work through this problem, many folks will spend hours surfing through PDFs, reports, contracts, work products, etc., just to get a sense of what data might exist. This process is frustrating and often yields incomplete results. Let's walk through my attempt with CANVAS.

First Attempts for Parking Ticket Data

My very first FOIA request was pretty narrow and sought the City's towing data. The City was unable to get me what I requested for reasons I can't seem to find, but it painted a picture that Chicago doesn't really track how cars are towed.

A month later, the project began shifting towards parking ticket data in addition to towing data, so I requested:

all raw towing and parking violation records available in the CANVAS system and any records that are from imported/interpolated from non-CANVAS systems.

This request was denied. The Department of Finance argued that the request would take anywhere between 280 to 400 hours to complete:

There are 55 million ticket records and 928K seizure records in CANVAS. As far as tow information, we only have knowledge of when a vehicle is towed due to a boot and released. The Department of Finance's application support vender estimates a minimum of 60-80 hours to design/develop/test and run the program.

In addition, since this is like a conversion to another system, we are not sure how long it would take to transfer so much data, a rough estimate would be an additional 80-120 hours to design a solution to get all the data on some kind of media for retrieval. Compliance with this request as currently written would take approximately 140-200 hours utilizing our vendor's resources to the exclusion of other work assignments.

A couple months and some phone calls later, I submitted a narrower request, which was successfully fulfilled because I included an explicit list of fields. After honing the request language a bit more, I was eventually able to get the data used in the analysis of my first blog post.

But Wait, Is There More?

Despite getting the limited information I had requested, I still wanted to expand my analysis, which required knowing what other information exists within CANVAS. So, I submitted another request for high-level and low-level system information:


1. Code for CANVAS
2. Logs of CANVAS and/or CANVAS log analysis. 
3. Documentation for the operation of CANVAS, including how information is stored, what kind of database is used, along with any other technical documentation or generic documentation.
4. Any Wiki page related to CANVAS.
5. Any analysis of City parking ticket levels or trends.

The only record the City sent in response was a lackluster spreadsheet with just 100 rows, broken down by ward. I'm still not sure if this was the only analysis ever done at the time, but let's get back to the meat of this blog post.

1, 2, and 3 were denied because:

[The records] could be used in a security breach against CANVAS and jeopardize the security of the system, therefore it is being withheld.

But with the goal of just figuring out what information exists, the request was extremely wide and could have been narrowed to something more akin to a "data dictionary". To this day, I've never been able to get anything like a data dictionary from the City, though there is a contractual obligation—as described in the RFP spec for this $200 million system—for the City to maintain something like that! But alas, at least in 2018, the City claimed they don't have anything like it.

https://www.documentcloud.org/documents/25537825-document/#document/p180/a2624483
—Professional Services Agreement Between the City of Chicago Department of Finance and Department of Administrative Hearings and IBM Corporation: City of Chicago Violation, Noticing and Adjudication Business Process and System Support, p. 180 (2012)

Requesting Database Records from All City Databases

Sensing a pattern of a general failure to maintain data dictionaries, despite the City's public support for launching one, I submitted a FOIA request to every City agency for the following:

1. A short description of the database.
2. The names of the applications that are able to run queries/inserts.
3. All usernames and permissions
4. All database table names.
5. All column names in each table.
6. A description of each column.
7. Number of rows in each table.

A couple weeks later, Chicago's Department of Law sent me a letter on behalf of every agency and denied all parts, 1 through 7, of that request.

First, they argued that they would need to "create a new document":

First, no City Department is in possession of a document which contains the information you seek. The only way to compile the requested information, to the extent it is not exempt for one or more of the reasons noted below, would be to create a document.

Then, they requested a pedantic clarification about what "database" means:

Your request does not provide a definition of the term database. A commonly accepted definition of "database" is a collection of pieces of information that is organized and used on a computer. http://www.merriam-webster.com/dictionary/database. Such a broad definition would include Excel spreadsheets. It would be unduly burdensome to the operations of each of the City's Departments to search every computer in use by its personnel in order to identify, open, review and catalogue each database and every Excel spreadsheet in the manner you request.

But even with all of that, they offered a helpful suggestion, and pointed to the City's "data dictionary":

Please note that in late 2013, the City of Chicago launched a publically available Data Dictionary which can be found at http://datadictionary.cityofchicago.org/. It is described as “a resource for anyone who is interested in understanding what data is held by City agencies and departments, how and if it may be accessed, and in what formats it may be accessed.”

Cool! It's a damn shame the system shut down less than a year later, though.

"Metalicious": Chicago's Failed Data Dictionary

A lot of government agencies have absolutely recognized the problem of the public not knowing what information exists, including Chicago. One such attempt at fixing this problem is to voluntarily make the columns and table names of their databases open to the public, like the Department of Justice's PDFs of table names, column names, and descriptions of both. There's even an open specification for government database schemas!

But even with agencies voluntarily making schema information public, such releases are effectively discretionary and are outside of the realm of FOIA.

One such release of discretionary information, as the Department of Law mentioned in their denial letter, is the 2013-released city-wide data dictionary project called "Metalicious". That's the actual name.

Metalicious was funded by a $300,000 John D. and Catherine T. MacArthur Foundation grant to UChicago's Chapin Hall, with the intended purpose of making table names, column names and descriptions of both publicly accessible. It's the City's "data dictionary".

CANVAS!

Schema information of the Chicago Budget System on Metalicious (2016)

One example of a system whose database schema information was released is the Chicago Budget System (CBS). A total of 110 tables are listed, with descriptions and a link to each table's columns. An interesting table worth investigating on its own is BOOK_ALDERMANIC_PAYRATE, which is described as "data used for creating pay schedule for aldermanic staff published in the Budget Book". Good to know!

Metalicious received some attention in civic data circles:

Journalists and civic inquisitors can use it to determine what information is available when composing Freedom of Information Act requests. Based on my own experience, knowing what to even ask for has been a challenge. All that is over.

All That Is Over: Its Inevitable Shutdown

Within a few short years, the project ostensibly shut down and its front page was replaced with a message about being down for "temporary maintenance". That temporary maintenance has been ongoing for about nine years now.

Down For Maintenance

Back in 2018, I asked the City's now-former Chief Data Officer Tom Schenk why it was shut down, and he explained:

Metalicious was retired because of lack of resources to expand it (originally grant funded). It had some, but very, very small proportion of databases. There was security review of any published data and some information was withheld if we felt it could undermine the application security. By Info Sec policy, it is confidential information until a review deems it appropriate for public release--same as the open data workflow which mirrors the FOIA workflow.

RIP.

Down For Maintenance | Last-Known Running | Metalicious GitHub

Requesting Metalicious

Okay, that's not surprising, but since the first goal here was to figure out whether column and table names are requestable, I submitted my request for the MySQL dump of Metalicious. As these things go, that request was also denied:

Please be advised the Department of Innovation and Technology neither maintains nor possesses any records that are responsive to your FOIA request.

So, I submitted another request and was sure to include a quote from a press release that was explicit about the Department's ownership of Metalicious.

They eventually sent me a copy of a MySQL dump with about 150 databases' columns and table names, including their descriptions. Neat! Progress!

To me, this reasonably shows that the City can provide table names and column names of City databases under IL FOIA.

The CANVAS Request and Trial

This brings us back to the FOIA request for the CANVAS database schema, which was twice appealed and died at the Illinois Supreme Court.

The request included a SQL statement for the City to run in order to fulfil the request. I made some small mistakes that bit me later, which is ripe for another whole post. Essentially, the City denied the request by arguing that the release of this information would jeopardize the security of Chicago's systems:

Your request seeks a copy of tables or columns within each table of CANVAS. The dissemination of these pieces of network information could jeopardize the security of the systems of the City of Chicago. Please be advised that even if you were to narrow your request, certain records may be withheld from disclosure under the exemptions enumerated in the FOIA, including but not limited to the exemption set forth in 5 ILCS 140/7(1)(o).

I disagree wholeheartedly and Thomas Ptacek goes into more detail in his companion post.

Upon receiving this denial, I reached out to my attorneys at Loevy & Loevy, who agreed to sue.

"Civic Hacker"

Eventually there was a trial in January 2020. During the trial, the City's attorneys argued that my intent was nefarious:

They are seeking the ability to have information that helps Mr. Chapman, civic hacker, go into the system and manipulate the data for whatever means he sees fit. That is not something that FOIA requires the City to do.

I have no idea where they came up with the idea that I wanted to manipulate their data, especially considering that just four months earlier, I was asked to help the City with parking ticket reform.

While we were waiting for the trial date, Kate LeFurgy, Director of Comms for the Office of the Mayor, reached out to me and asked if I could help with some parking ticket analysis (for free). I agreed, and compiled a spreadsheet detailing how a large number of vehicles received a disproportionate number of tickets—groupings that highlight, for example, one vehicle which received at least three tickets per week for 41 continuous weeks.

This is incredible. I can't thank you enough as to how helpful this was. I truly appreciate your time and talents on this work. It has been invaluable in shaping the reform measures we hope to put in place later this year.
-Kate LeFurgy | Fri, Aug 23, 2019

Those good spirits did not last long, and LeFurgy did not respond to my emails asking for thoughts on the CANVAS litigation.

Privacy When It's Convenient

Chicago's expert witness, Bruce Coffing, said in court:

In this particular case we are saying, I'm saying that from defending this, our constituents' information, their private information, one of the things that helps us defend that system is not making this [schema information] available.

It is not the only thing we do. We do many things. But I don't want to make it easier for the bad guys and bad gals out there to attack our system and let— put our constituents' private data at risk.

This argument is striking to me, because the City has already shared so much private data through FOIA.

For instance, in 2018, when I requested parking ticket data from the Department of Finance, their FOIA officer told me that they could not include both license plates and the vehicles' registered addresses. To resolve this issue, they offered to remove license plate data and only provide addresses.

However, they had already given me the license plate data of millions of ticketed vehicles, in response to a different, earlier FOIA request. So, I received registered home addresses from one request, and license plates from another.

The responsive records from these two separate FOIA requests can easily be paired.

To demonstrate the extent of this problem, I created this visualization which shows the scale of private information disclosed by the Department of Finance: vehicle addresses from every U.S. state, including 11,057 unique addresses of Texas vehicles and 48,707 from Michigan.

I've been told by a reliable source that the Department of Finance no longer sends license plates or registered addresses in response to FOIA requests.

Next Steps

The whole point of this entire thing was to make it easier to request data through FOIA. Ultimately, the goal is to simply send a SQL statement to an agency for them to run, and avoid so much of the usual nonsense. Basically, an API.

Relatedly, these two bills from last year were interesting, and sought to fix the IL Supreme Court's bad decision. But they didn't go anywhere during last year's session.

Fortunately this year, a new bill was filed with the addition of this language:

[...] and shall include the identification and a plain-text description of each of the types or categories of information of each field of each database of the public body. [...] and shall provide a sufficient description of the structures of all databases under the control of the public body to allow a requester to request the public body to perform specific database queries.

That's pretty neat! I hope it passes.

]]>
https://mchap.io/losing-a-5yr-long-illinois-foia-lawsuit-for-database-schemas.html hacker-news-small-sites-43237352 Mon, 03 Mar 2025 01:30:12 GMT
<![CDATA[The "strategic reserve" exposes crypto as the scam it always was]]> thread link) | @kolchinski
March 2, 2025 | https://alexkolchinski.com/2025/03/03/the-strategic-reserve-exposes-crypto-as-the-scam-it-always-was/ | archive.org

Today, President Trump announced that the US Government would begin using taxpayer dollars to systematically buy up a variety of cryptocurrencies. Crypto prices shot up on the news.

This is revealing, as crypto boosters have argued for years that cryptocurrency has legitimate economic value as a payment system outside of the government’s purview.

Instead, those same crypto boosters are now tapping the White House for money — in US Dollars, coming from US taxpayers.

Why?

Crypto has been one of the biggest speculative bubbles of all time, maybe the single biggest ever. Millions of retail investors have piled into crypto assets in the hope and expectation that prices will continue to go up. (Notice how much of the chatter around crypto is always around prices, as opposed to non-speculative uses.)

However, every bubble bursts once it runs out of gamblers to put new money in, and it may be that the crypto community believes that that time is near for crypto, as they are now turning to the biggest buyer in the world — the US Government — for help.

This shows that all the claims that crypto leaders have made for years about crypto’s value as a currency outside of government control have been self-serving lies all along: the people who have most prominently argued that position are now begging the White House to hand them USD for their crypto.

It also reveals how much crypto has turned into a cancer on our entire society.

In previous Ponzi schemes, the government has often stepped in to defuse bubbles and protect retail investors from being taken in by scammers.

But in this wave, not only has the government not stepped in to stop the scam, it has now been captured by people with a vested interest in keeping it going as long as possible.

Our president and a number of members of his inner circle hold large amounts of cryptocurrency and have a vested interest in seeing its value rise — Trump's personal memecoin being a particularly notable example. And many other people in the corridors of power in Washington and Silicon Valley are in the same boat. “It is difficult to get a man to understand something, when his salary depends on his not understanding it”, and so some of the most prominent people in the country are now prepared to make any argument and implement any policy decision to boost the value of their crypto holdings.

How does this end?

Once the US taxpayer is tapped out, there’s not going to be any remaining larger pool of demand to keep crypto prices up, and in every previous speculative bubble, once confidence evaporates, prices will fall, probably precipitously. Unfortunately, as millions of people now have significant crypto holdings, and stablecoins have entangled crypto with fiat currency, the damage to the economy may be widespread.

The end of the crypto frenzy would, in the end, be a good thing. Cryptocurrency has a few legitimate uses, like helping citizens of repressive regimes avoid currency controls and reducing fees on remittances. But it has also enabled vast evil in the world. Diverting trillions of dollars away from productive investments into gambling is bad enough, but the untraceability of crypto has also enabled terrorist organizations, criminal networks, and rogue states like North Korea to fund themselves far more effectively than ever before. I’ve been hearing from my friends in the finance world that North Korea now generates a significant fraction, if not a majority, of its revenues by running crypto scams on Westerners, and that the scale of scams overall has grown by a factor of 10 since crypto became widely used (why do you think you’re getting so many calls and texts from scammers lately?)

I hope that the end of this frenzy of gambling and fraud comes soon. But in the meantime, let’s hope that not too much of our tax money goes to paying the scammers, and that when the collapse comes it doesn’t take down our entire economy with it.

Thanks to Alec Bell for helping edit this essay.

]]>
https://alexkolchinski.com/2025/03/03/the-strategic-reserve-exposes-crypto-as-the-scam-it-always-was/ hacker-news-small-sites-43236752 Mon, 03 Mar 2025 00:08:56 GMT
<![CDATA[Rotors: A practical introduction for 3D graphics (2023)]]> (thread link) | @bladeee
March 2, 2025 | https://jacquesheunis.com/post/rotors/ | archive.org

When putting 3D graphics on a screen, we need a way to express rotations of the geometry we’re rendering. To avoid the problems that come with storing rotations as axes & angles, we could use quaternions. However quaternions require that we think in 4 distinct spatial dimensions, something humans are notoriously bad at. Thankfully there is an alternative that some argue is far more elegant and simpler to understand: Rotors.

Rotors come from an area of mathematics called geometric algebra. Over the past few years I’ve seen a steady increase in the number of people claiming we should bin quaternions entirely in 3D graphics and replace them with rotors. I know nothing about either so I figured I’d try out rotors. I struggled to find educational materials online that clicked well with how I think about these things though, so this post is my own explanation of rotors and the surrounding mathematical concepts. It’s written with the specific intention of implementing rotation for 3D graphics and is intended to be used partly as an educational text and partly as a reference page.

There are two sections: The first half is purely theoretical, where we’ll look at where rotors “come from”, investigate how they behave and see how we can use them to perform rotations. The second half will cover practical applications and includes example code for use-cases you’re likely to encounter in 3D graphics.

A word on notation

\(\global\def\v#1{\mathbf{#1}}\)

In this post we will write vectors, bivectors and trivectors in bold and lower-case (e.g \(\v{v}\) is a vector). Rotors will be written in bold and upper-case (e.g \(\v{R}\) is a rotor).

The basis elements of our 3D space are denoted \(\v{e_1, e_2, e_3}\), so for example \(\v{v} = v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}\).

Where multiplication tables are given, the first argument is the entry on the far left column of the table and the second argument is the entry on the top row of the table.

Since this post is primarily concerned with 3D graphics & simulation, we will restrict our examples to 3 dimensions of space. Rotors (unlike quaternions) can easily be extended to higher dimensions but this is left as an exercise for the reader.

Introducing: The wedge product

We begin our journey by defining a new way to combine two vectors: the so-called “wedge product”, written as \(\v{a \wedge b}\). We define the wedge product of two vectors as an associative product that distributes over addition and which is zero when both arguments are the same:

\[\begin{equation} \v{v \wedge v} = 0 \tag{ 1 } \end{equation} \]

From this we can show that the wedge product is also anticommutative:

\(\v{(a \wedge b) = -(b \wedge a)}\)

Given vectors \(\v{a}\) and \(\v{b}\): \[ \begin{aligned} (\v{a + b}) \wedge (\v{a + b}) &= 0 \\ (\v{a \wedge a}) + (\v{a \wedge b}) + (\v{b \wedge a}) + (\v{b \wedge b}) &= 0 \\ 0 + (\v{a \wedge b}) + (\v{b \wedge a}) + 0 &= 0 \\ (\v{a \wedge b}) &= -(\v{b \wedge a}) \end{aligned} \]

We have yet to specify how to actually “compute” a wedge product though. We know that it produces zero when both arguments are equivalent, but what if they aren't? In this case we “compute” the wedge product by expressing the arguments in terms of their basis elements and multiplying out.

When it comes down to a pair of basis vectors we just leave them be. So for example we don't simplify \(\v{e_1} \wedge \v{e_2}\) any further. This is because \(\v{e_1} \wedge \v{e_2}\) is not a vector. It's a new kind of entity called a bivector. If you think of ordinary vectors as points (offset from the origin), then the bivector produced by applying the wedge product to two vectors can be visualised as the infinite plane containing the origin and those two points. Equivalently, you can think of a bivector as the direction that is normal to the plane formed by the two vectors that we wedged together. The bivector \(\v{e_1 \wedge e_2}\) is in some sense the normal going in the same direction as the vector \(\v{e_3}\).

In the same way that we have basis vectors (\(\v{e_1}, \v{e_2}, \v{e_3})\), we also have basis bivectors: \(\v{e_{12}}, \v{e_{23}}, \v{e_{31}}\). Conveniently, these bivector basis elements are simple wedge products of the vector basis elements: \[ \v{e_{12}} = \v{e_1} \wedge \v{e_2} \\ \v{e_{23}} = \v{e_2} \wedge \v{e_3} \\ \v{e_{31}} = \v{e_3} \wedge \v{e_1} \]

Note that (as with vectors) we’re not restricted to a specific set of basis bivectors. Some texts prefer to use \(\v{e_{12}}, \v{e_{13}}, \v{e_{23}}\). The calculations work out a little differently but the logic is the same. For this post we’ll use \(\v{e_{12}}, \v{e_{23}}, \v{e_{31}}\) throughout. An important thing to note is that the 3-dimensional case is a little misleading here. It is very easy to confuse vectors with bivectors because they have the same number of basis elements. This is not true in higher dimensions. In 4-dimensional space, for example, there are 4 basis vectors but 6 basis bivectors so we should always explicitly state what basis elements we’re using in our calculations.

One last realisation is that in 3D we can go one step further. On top of vectors (representing lines) and bivectors (representing planes), we also have trivectors, which represent volumes. Trivectors are as far as we can go in 3D, though: because the space itself is 3-dimensional, there's no room for more dimensions! Trivectors in 3D are sometimes referred to as “pseudoscalars” since they have only 1 basis element: \(\v{e_{123}}\). Trivectors in 3D are oriented (in the sense that the coefficient of the trivector basis element can be negative) but otherwise contain no positional information.

Below is a multiplication table for the wedge product of our 3D basis vectors:

\[\begin{array}{c|c:c:c} \wedge & \v{e_1} & \v{e_2} & \v{e_3} \\ \hline \v{e_1} & 0 & \v{e_{12}} & -\v{e_{31}} \\ \v{e_2} & -\v{e_{12}} & 0 & \v{e_{23}} \\ \v{e_3} & \v{e_{31}} & -\v{e_{23}} & 0 \\ \end{array}\]

Wedge product of non-basis vectors

Let us see what happens if we wedge together two arbitrary 3D vectors in the above manner:

\(\v{v \wedge u} = (v_1u_2 - v_2u_1)\v{e_{12}} + (v_2u_3 - v_3u_2)\v{e_{23}} + (v_3u_1 - v_1u_3)\v{e_{31}}\)

Given vectors \(\v{v} = v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}\) and \(\v{u} = u_1\v{e_1} + u_2\v{e_2} + u_3\v{e_3}\):

\[ \begin{align*} \v{v \wedge u} &= (v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \wedge (u_1\v{e_1} + u_2\v{e_2} + u_3\v{e_3}) \\ \v{v \wedge u} &= (v_1\v{e_1} \wedge u_1\v{e_1}) + (v_1\v{e_1} \wedge u_2\v{e_2}) + (v_1\v{e_1} \wedge u_3\v{e_3}) \tag{distribute over +}\\ &+ (v_2\v{e_2} \wedge u_1\v{e_1}) + (v_2\v{e_2} \wedge u_2\v{e_2}) + (v_2\v{e_2} \wedge u_3\v{e_3}) \\ &+ (v_3\v{e_3} \wedge u_1\v{e_1}) + (v_3\v{e_3} \wedge u_2\v{e_2}) + (v_3\v{e_3} \wedge u_3\v{e_3}) \\ \v{v \wedge u} &= v_1u_1(\v{e_1 \wedge e_1}) + v_1u_2(\v{e_1 \wedge e_2}) + v_1u_3(\v{e_1 \wedge e_3}) \tag{pull out coefficients}\\ &+ v_2u_1(\v{e_2 \wedge e_1}) + v_2u_2(\v{e_2 \wedge e_2}) + v_2u_3(\v{e_2 \wedge e_3}) \\ &+ v_3u_1(\v{e_3 \wedge e_1}) + v_3u_2(\v{e_3 \wedge e_2}) + v_3u_3(\v{e_3 \wedge e_3}) \\ \v{v \wedge u} &= 0 + v_1u_2\v{e_{12}} - v_1u_3\v{e_{31}} \\ &- v_2u_1\v{e_{12}} + 0 + v_2u_3\v{e_{23}} \\ &+ v_3u_1\v{e_{31}} - v_3u_2\v{e_{23}} + 0 \\ \v{v \wedge u} &= (v_1u_2 - v_2u_1)\v{e_{12}} + (v_2u_3 - v_3u_2)\v{e_{23}} + (v_3u_1 - v_1u_3)\v{e_{31}} \\ \end{align*} \]

Well now, those coefficients look awfully familiar, don't they? They're exactly the coefficients of the usual 3D cross-product.1 This lines up with our earlier claim that bivectors function as normals: If you look at which coefficients go with which bivector basis elements, you'll see that the coefficient for \(\v{e_{23}}\) is the same as the coefficient of \(\v{x}\) in the usual 3D cross-product.

By virtue of “sharing” the equation for 3D vector cross product, we can conclude that the magnitude of the bivector \(\v{v \wedge u}\) is equal to the area of the parallelogram formed by \(\v{v}\) and \(\v{u}\). A neat geometric proof of this (with diagrams) can be found on the mathematics Stack Exchange. The sign of the area indicates the winding order of the parallelogram, although which direction is positive and which is negative will depend on the handedness of your coordinate system.
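To make this concrete in code, here is a small TypeScript sketch (my own representation, not something prescribed by the post) that stores a bivector by its \(\v{e_{12}}, \v{e_{23}}, \v{e_{31}}\) coefficients and computes the wedge product of two vectors using the formula above.

type Vec3 = { x: number; y: number; z: number };          // coefficients of e1, e2, e3
type Bivec3 = { e12: number; e23: number; e31: number };  // coefficients of e12, e23, e31

// Wedge product of two vectors: the coefficients are exactly those of the
// familiar cross product, just attached to bivector basis elements.
function wedge(v: Vec3, u: Vec3): Bivec3 {
  return {
    e12: v.x * u.y - v.y * u.x,
    e23: v.y * u.z - v.z * u.y,
    e31: v.z * u.x - v.x * u.z,
  };
}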

Like vectors, bivectors can be written as the sum of some basis elements each multiplied by some scalar. It may not come as a surprise then that as with vectors, adding two bivectors together is simply a matter of adding each of the constituent components.

So we have vector addition and bivector addition. Can we add a vector to a bivector? Yes, but we leave them as separate terms. In the same way that we don’t try to “simplify” \(\v{e_1 + e_2}\), we also don’t try to simplify \(\v{e_1 + e_{12}}\). We just leave them as the sum of these two different entities. The resulting object is neither a vector nor a bivector, but is a more general object called a “multivector”. A multivector is just a sum of scalars, vectors, bivectors, trivectors etc. All scalars, vectors, bivectors etc are also multivectors, except that they only have non-zero coefficients on one “type” of basis element. So for example you could write the vector \(\v{e_1}\) as the multivector \(\v{e_1} + 0\v{e_{12}}\).
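A multivector in 3D can be stored as one coefficient per basis element. The sketch below (an assumed representation, not something from the post) shows that addition is just component-wise, so a vector plus a bivector simply keeps both parts side by side.

// A sketch of a general 3D multivector: one coefficient per basis element.
type Multivector3 = {
  s: number;                              // scalar part
  e1: number; e2: number; e3: number;     // vector part
  e12: number; e23: number; e31: number;  // bivector part
  e123: number;                           // trivector (pseudoscalar) part
};

// Addition is component-wise; a vector plus a bivector keeps both parts.
function add(a: Multivector3, b: Multivector3): Multivector3 {
  return {
    s: a.s + b.s,
    e1: a.e1 + b.e1, e2: a.e2 + b.e2, e3: a.e3 + b.e3,
    e12: a.e12 + b.e12, e23: a.e23 + b.e23, e31: a.e31 + b.e31,
    e123: a.e123 + b.e123,
  };
}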

Multivectors are particularly relevant to the discussion of our second new operation and the protagonist of this post:

Geometric product

The geometric product is defined for arbitrary multivectors, is associative, and distributes over addition. Somewhat annoyingly (in an environment involving several types of products), it is denoted with no symbol, as just \(\v{ab}\). If we have two vectors \(\v{a}\) and \(\v{b}\), we can calculate their geometric product as:

\[\begin{equation} \v{ab = (a \cdot b) + (a \wedge b)} \tag{ 2 } \end{equation} \]

Where \(\cdot\) is the usual dot product we know from traditional linear algebra. Note that if both inputs are the same then by equation 1 we get:

\[\begin{equation} \v{aa} = \v{a \cdot a} \tag{ 3 } \end{equation} \]

This, and the fact that our basis vectors are all unit-length and perpendicular to one another leads us to: \(\v{e_ie_i} = \v{e_i} \cdot \v{e_i} = 1\) and \(\v{e_ie_j} = 0 + \v{e_i} \wedge \v{e_j} = -\v{e_je_i} ~~~\forall i \neq j\).

In particular this means that \(\v{e_1e_2} = \v{e_1 \wedge e_2} = \v{e_{12}}\) (and similarly for the other basis bivectors). Indeed basis bivectors being the wedge product of basis vectors is now revealed to be a special case of being the geometric product of basis vectors. This leads us to an analogous definition for trivectors: \(\v{e_{123}} = \v{e_1e_2e_3}\).

At this point we can compute a complete multiplication table for the geometric product with basis elements in 3D:

\[\begin{array}{c|c:c:c:c:c:c:c:c} \cdot\wedge & \v{e_1} & \v{e_2} & \v{e_3} & \v{e_{12}} & \v{e_{31}} & \v{e_{23}} & \v{e_{123}} \\ \hline \v{e_1} & 1 & \v{e_{12}} & -\v{e_{31}} & \v{e_2} & -\v{e_3} & \v{e_{123}} & \v{e_{23}} \\ \v{e_2} & -\v{e_{12}} & 1 & \v{e_{23}} & -\v{e_1} & \v{e_{123}} & \v{e_3} & \v{e_{31}} \\ \v{e_3} & \v{e_{31}} & -\v{e_{23}} & 1 & \v{e_{123}} & \v{e_1} & -\v{e_2} & \v{e_{12}} \\ \v{e_{12}} & -\v{e_2} & \v{e_1} & \v{e_{123}} & -1 & \v{e_{23}} & -\v{e_{31}} & -\v{e_3} \\ \v{e_{31}} & \v{e_3} & \v{e_{123}} & -\v{e_1} & -\v{e_{23}} & -1 & \v{e_{12}} & -\v{e_2} \\ \v{e_{23}} & \v{e_{123}} & -\v{e_3} & \v{e_2} & \v{e_{31}} & -\v{e_{12}} & -1 & -\v{e_1} \\ \v{e_{123}} & \v{e_{23}} & \v{e_{31}} & \v{e_{12}} & -\v{e_3} & -\v{e_2} & -\v{e_1} & -1 \\ \end{array}\]

Some multiplication table entries derived: In case it's not clear how we can arrive at some of the values in the table above, here are some worked examples: \[ \v{e_1e_3} = \v{e_1 \wedge e_3} = -(\v{e_3 \wedge e_1}) = -\v{e_{31}} \\ \v{e_1e_{12}} = \v{e_1(e_1e_2)} = \v{(e_1e_1)e_2} = 1\v{e_2} = \v{e_2} \\ \v{e_3e_{12}} = \v{e_3e_1e_2} = -\v{e_1e_3e_2} = \v{e_1e_2e_3} = \v{e_{123}} \\ \v{e_{12}e_{12}} = (\v{e_1e_2})(\v{e_1e_2}) = (-\v{e_2e_1})(\v{e_1e_2}) = -\v{e_2}(\v{e_1e_1})\v{e_2} = -\v{e_2e_2} = -1 \\ \v{e_{123}e_{2}} = \v{(e_1e_2e_3)e_2} = -(\v{e_1e_3e_2})\v{e_2} = -\v{e_1e_3}(\v{e_2e_2}) = -\v{e_1e_3} = \v{e_3e_1} = \v{e_{31}}\\ \]

To compute the geometric product of two arbitrary multivectors, we can break the arguments down into their constituent basis elements and manipulate only those (using the multiplication table above). We need to do this because equations 2 and 3 above apply only to vectors, and do not apply to bivectors (or trivectors etc).
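As a sketch of the simplest case, here is the vector-times-vector geometric product from equation 2 in TypeScript (again using my own component representation): the result has a scalar part and a bivector part. A general multivector-times-multivector product would instead expand both arguments over the basis elements and combine them using the table above.

type Vec3 = { x: number; y: number; z: number };
// The product of two vectors has a scalar part and a bivector part (equation 2).
type ScalarPlusBivec3 = { s: number; e12: number; e23: number; e31: number };

// Geometric product of two *vectors* only, per equation 2: ab = (a . b) + (a ^ b).
function vectorGeometricProduct(a: Vec3, b: Vec3): ScalarPlusBivec3 {
  return {
    s: a.x * b.x + a.y * b.y + a.z * b.z,  // dot product part
    e12: a.x * b.y - a.y * b.x,            // wedge product part
    e23: a.y * b.z - a.z * b.y,
    e31: a.z * b.x - a.x * b.z,
  };
}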

Inverses under the geometric product

Under the geometric product all non-zero vectors \(\v{v}\) have an inverse:

\[\begin{equation} \v{v}^{-1} = \frac{\v{v}}{|\v{v}|^2} \tag{ 4 } \end{equation} \]

Proof: \(\v{v}^{-1} = \frac{\v{v}}{|\v{v}|^2}\) Given a vector \(\v{v} \neq 0\) let's take \(\v{v^\prime} = \frac{\v{v}}{|\v{v}|^2}\), then: \[\begin{aligned} \v{vv^\prime} &= \v{v \cdot v^\prime + v \wedge v^\prime} \\ \v{vv^\prime} &= \frac{1}{|\v{v}|^2}(\v{v \cdot v}) + \frac{1}{|\v{v}|^2}(\v{v \wedge v}) \\ \v{vv^\prime} &= \frac{1}{|\v{v}|^2}|\v{v}|^2 + \frac{1}{|\v{v}|^2}0 \\ \v{vv^\prime} &= \frac{|\v{v}|^2}{|\v{v}|^2} \\ \v{vv^\prime} &= 1 \\ \end{aligned}\] and similarly if we multiply on the left side: \[\begin{aligned} \v{v^\prime v} &= \v{v^\prime \cdot v + v^\prime \wedge v} \\ \v{v^\prime v} &= \frac{1}{|\v{v}|^2}(\v{v \cdot v}) + \frac{1}{|\v{v}|^2}(\v{v \wedge v}) \\ \v{v^\prime v} &= \frac{1}{|\v{v}|^2}|\v{v}|^2 + \frac{1}{|\v{v}|^2}0 \\ \v{v^\prime v} &= \frac{|\v{v}|^2}{|\v{v}|^2} \\ \v{v^\prime v} &= 1 \\ \end{aligned}\] So \(\v{v^\prime} = \frac{\v{v}}{|\v{v}|^2} = \v{v^{-1}}\), the inverse of \(\v{v}\).

Similarly, the geometric product of two vectors \(\v{a}\) and \(\v{b}\) also has an inverse:

\[\begin{equation} (\v{ab})^{-1} = \v{b}^{-1}\v{a}^{-1} \tag{ 5 } \end{equation} \]

Proof: \((\v{ab})^{-1} = \v{b}^{-1}\v{a}^{-1}\) Given any two vectors \(\v{a}\) and \(\v{b}\) then we can multiply on the right: \[ (\v{ab})(\v{b}^{-1}\v{a}^{-1}) = \v{a}(\v{bb}^{-1})\v{a}^{-1} = \v{a}(1)\v{a}^{-1} = \v{aa}^{-1} = 1 \] and on the left: \[ (\v{b}^{-1}\v{a}^{-1})(\v{ab}) = \v{b}^{-1}(\v{a}^{-1}\v{a})\v{b} = \v{b}^{-1}(1)\v{b} = \v{b}^{-1}\v{b} = 1 \] and conclude that \(\v{b}^{-1}\v{a}^{-1} = (\v{ab})^{-1}\), the (left and right) inverse of \(\v{ab}\).
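In code, equation 4 is a one-liner. A minimal sketch, assuming the same struct-of-coefficients vector representation as the earlier sketches:

type Vec3 = { x: number; y: number; z: number };

// Equation 4: the inverse of a non-zero vector under the geometric product
// is the vector divided by its squared length.
function inverse(v: Vec3): Vec3 {
  const lenSq = v.x * v.x + v.y * v.y + v.z * v.z;
  return { x: v.x / lenSq, y: v.y / lenSq, z: v.z / lenSq };
}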

Since every vector has an inverse, for any two vectors \(\v{a}\) and \(\v{b}\) we can write: \[\begin{aligned} \v{a} &= \v{a} \\ \v{a} &= \v{abb}^{-1} \\ \v{a} &= \frac{1}{|\v{b}|^2} \v{(ab)b} \\ \v{a} &= \frac{1}{|\v{b}|^2} \v{(a \cdot b + a \wedge b) b} \\ \v{a} &= \frac{\v{a \cdot b}}{|\v{b}|^2} \v{b} + \frac{\v{a \wedge b}}{|\v{b}|^2} \v{b} \\ \end{aligned}\]

From this we conclude that for two arbitrary (non-zero) vectors \(\v{a}\) and \(\v{b}\), we can write one in terms of components parallel and perpendicular to the other:

\[\begin{equation} \v{a} = \v{a}_{\parallel b} + \v{a}_{\perp b} \tag{ 6 } \end{equation} \]

Where \(\v{a_{\parallel b}}\) is the component of \(\v{a}\) parallel to \(\v{b}\) (the projection of \(\v{a}\) onto \(\v{b}\)) and \(\v{a_{\perp b}}\) is the component of \(\v{a}\) perpendicular to \(\v{b}\) (the rejection of \(\v{a}\) from \(\v{b}\)). We know from linear algebra that

\[\begin{equation} \v{a_{\parallel b}} = \frac{\v{a \cdot b}}{|\v{b}|^2}\v{b} \tag{ 7 } \end{equation} \]

Substituting into the calculation above we get \(\v{a} = \v{a_{\parallel b}} + \frac{\v{a \wedge b}}{|\v{b}|^2} \v{b} \) from which we conclude that

\[\begin{equation} \v{a_{\perp b}} = \frac{\v{a \wedge b}}{|\v{b}|^2}\v{b} \tag{ 8 } \end{equation} \]
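A small code sketch of equations 7 and 8 (again using an assumed Vec3 representation). Because \(\v{a} = \v{a}_{\parallel b} + \v{a}_{\perp b}\), the rejection can be computed as \(\v{a}\) minus its projection, which sidesteps multiplying a bivector by a vector:

type Vec3 = { x: number; y: number; z: number };

const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const scale = (v: Vec3, k: number): Vec3 => ({ x: v.x * k, y: v.y * k, z: v.z * k });
const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });

// Equation 7: the component of a parallel to b (the projection of a onto b).
function project(a: Vec3, b: Vec3): Vec3 {
  return scale(b, dot(a, b) / dot(b, b));
}

// Equation 8 expresses the rejection as (a ^ b) b / |b|^2; since a is the sum
// of its parallel and perpendicular parts, subtracting the projection from a
// gives the same vector without needing a bivector-times-vector product.
function reject(a: Vec3, b: Vec3): Vec3 {
  return sub(a, project(a, b));
}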

Reflections with the geometric product

Recall from linear algebra that given two non-zero vectors \(\v{a}\) and \(\v{v}\), we can write the reflection of \(\v{v}\) over \(\v{a}\) (that is, across the line spanned by \(\v{a}\)) as:

\[ \v{v^\prime} = \v{v} - 2\v{v_{\perp a}} = \v{v}_{\parallel a} - \v{v}_{\perp a} \]

If we substitute equations 7 and 8 we get:

\[\begin{aligned} \v{v^\prime} &= \v{v}_{\parallel a} - \v{v}_{\perp a} \\ &= \frac{\v{v \cdot a}}{|\v{a}|^2} \v{a} - \frac{\v{v \wedge a}}{|\v{a}|^2} \v{a} \\ &= (\v{v \cdot a})\frac{\v{a}}{|\v{a}|^2} - (\v{v \wedge a})\frac{\v{a}}{|\v{a}|^2} \\ &= \v{(v \cdot a)a}^{-1} - (\v{v \wedge a})\v{a}^{-1} \\ &= \v{(a \cdot v)a}^{-1} + (\v{a \wedge v})\v{a}^{-1} \\ &= (\v{a \cdot v + a \wedge v})\v{a}^{-1} \\ &= \v{(av)a}^{-1} \\ \v{v^\prime} &= \v{ava}^{-1} \\ \end{aligned}\]

So we can reflect vectors using only the geometric product. \(\v{ava}^{-1}\) is a form we will see quite often, and is sometimes referred to as a “sandwich product”.

The first and last lines of the calculation above together demonstrate an important property: Since \(\v{ava}^{-1} = \v{v}_{\parallel a} - \v{v}_{\perp a}\), we know that \(\v{ava}^{-1}\) is just a vector and contains no scalar, bivector (or trivector etc) components. This means we can use the output of such a sandwich product as the input to another sandwich product, which we will do shortly.

For our own convenience, we can also produce an equation for the output of such a sandwich product:

Equation for 3D sandwich product \[\begin{align*} & \v{ava}^{-1} \\ =~& (\v{av})\v{a}^{-1} \\ =~& |a|^{-2} (\v{av})\v{a} \\ =~& |a|^{-2} (\v{(a \cdot v) + (a \wedge v)})\v{a} \\ =~& |a|^{-2} \lbrack \\ & (a_1v_1 + a_2v_2 + a_3v_3) \\ & + (a_1v_2 - a_2v_1)\v{e_{12}} \\ & + (a_2v_3 - a_3v_2)\v{e_{23}} \\ & + (a_3v_1 - a_1v_3)\v{e_{31}} \\ & \rbrack (a_1 \v{e_1} + a_2 \v{e_2} + a_3 \v{e_3}) \\ =~& |a|^{-2} \lbrack \\ & (a_1v_1 + a_2v_2 + a_3v_3)(a_1 \v{e_1} + a_2 \v{e_2} + a_3 \v{e_3}) \\ & + (a_1v_2 - a_2v_1)\v{e_{12}}(a_1 \v{e_1} + a_2 \v{e_2} + a_3 \v{e_3}) \\ & + (a_2v_3 - a_3v_2)\v{e_{23}}(a_1 \v{e_1} + a_2 \v{e_2} + a_3 \v{e_3}) \\ & + (a_3v_1 - a_1v_3)\v{e_{31}}(a_1 \v{e_1} + a_2 \v{e_2} + a_3 \v{e_3}) \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{multiply out the brackets on the right} \\ & (a_1v_1 + a_2v_2 + a_3v_3)a_1\v{e_1} + (a_1v_1 + a_2v_2 + a_3v_3)a_2\v{e_2} + (a_1v_1 + a_2v_2 + a_3v_3)a_3\v{e_3} \\ & + (a_1v_2 - a_2v_1)a_1\v{e_{12}}\v{e_1} + (a_1v_2 - a_2v_1)a_2\v{e_{12}}\v{e_2} + (a_1v_2 - a_2v_1)a_3\v{e_{12}}\v{e_3} \\ & + (a_2v_3 - a_3v_2)a_1\v{e_{23}}\v{e_1} + (a_2v_3 - a_3v_2)a_2\v{e_{23}}\v{e_2} + (a_2v_3 - a_3v_2)a_3\v{e_{23}}\v{e_3} \\ & + (a_3v_1 - a_1v_3)a_1\v{e_{31}}\v{e_1} + (a_3v_1 - a_1v_3)a_2\v{e_{31}}\v{e_2} + (a_3v_1 - a_1v_3)a_3\v{e_{31}}\v{e_3} \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{simplify the basis element products} \\ & (a_1v_1 + a_2v_2 + a_3v_3)a_1\v{e_1} + (a_1v_1 + a_2v_2 + a_3v_3)a_2\v{e_2} + (a_1v_1 + a_2v_2 + a_3v_3)a_3\v{e_3} \\ & - (a_1v_2 - a_2v_1)a_1\v{e_2} + (a_1v_2 - a_2v_1)a_2\v{e_1} + (a_1v_2 - a_2v_1)a_3\v{e_{123}} \\ & + (a_2v_3 - a_3v_2)a_1\v{e_{123}} - (a_2v_3 - a_3v_2)a_2\v{e_3} + (a_2v_3 - a_3v_2)a_3\v{e_2} \\ & + (a_3v_1 - a_1v_3)a_1\v{e_3} + (a_3v_1 - a_1v_3)a_2\v{e_{123}} - (a_3v_1 - a_1v_3)a_3\v{e_1} \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{multiply out the remaining brackets} \\ & a_1a_1v_1\v{e_1} + a_1a_2v_2\v{e_1} + a_3a_1v_3\v{e_1} \\ & + a_1a_2v_1\v{e_2} + a_2a_2v_2\v{e_2} + a_2a_3v_3\v{e_2} \\ & + a_3a_1v_1\v{e_3} + a_2a_3v_2\v{e_3} + a_3a_3v_3\v{e_3} \\ & - a_1a_1v_2\v{e_2} + a_1a_2v_1\v{e_2} + a_1a_2v_2\v{e_1} - a_2a_2v_1\v{e_1} + a_3a_1v_2\v{e_{123}} - a_2a_3v_1\v{e_{123}} \\ & + a_1a_2v_3\v{e_{123}} - a_3a_1v_2\v{e_{123}} - a_2a_2v_3\v{e_3} + a_2a_3v_2\v{e_3} + a_2a_3v_3\v{e_2} - a_3a_3v_2\v{e_2} \\ & + a_3a_1v_1\v{e_3} - a_1a_1v_3\v{e_3} + a_2a_3v_1\v{e_{123}} - a_1a_2v_3\v{e_{123}} - a_3a_3v_1\v{e_1} + a_3a_1v_3\v{e_1} \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{group by basis vector} \\ & a_1a_1v_1\v{e_1} + a_1a_2v_2\v{e_1} + a_3a_1v_3\v{e_1} + a_1a_2v_2\v{e_1} - a_2a_2v_1\v{e_1} - a_3a_3v_1\v{e_1} + a_3a_1v_3\v{e_1} \\ & + a_1a_2v_1\v{e_2} + a_2a_2v_2\v{e_2} + a_2a_3v_3\v{e_2} - a_1a_1v_2\v{e_2} + a_1a_2v_1\v{e_2} + a_2a_3v_3\v{e_2} - a_3a_3v_2\v{e_2} \\ & + a_3a_1v_1\v{e_3} + a_2a_3v_2\v{e_3} + a_3a_3v_3\v{e_3} - a_2a_2v_3\v{e_3} + a_2a_3v_2\v{e_3} + a_3a_1v_1\v{e_3} - a_1a_1v_3\v{e_3} \\ & + a_3a_1v_2\v{e_{123}} - a_2a_3v_1\v{e_{123}} + a_1a_2v_3\v{e_{123}} - a_3a_1v_2\v{e_{123}} + a_2a_3v_1\v{e_{123}} - a_1a_2v_3\v{e_{123}} \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{pull out the basis element factors} \\ & (a_1a_1v_1 + a_1a_2v_2 + a_3a_1v_3 + a_1a_2v_2 - a_2a_2v_1 - a_3a_3v_1 + a_3a_1v_3)\v{e_1} \\ & + (a_1a_2v_1 + a_2a_2v_2 + a_2a_3v_3 - a_1a_1v_2 + a_1a_2v_1 + a_2a_3v_3 - a_3a_3v_2)\v{e_2} \\ & + (a_3a_1v_1 + a_2a_3v_2 + a_3a_3v_3 - a_2a_2v_3 + a_2a_3v_2 + a_3a_1v_1 - a_1a_1v_3)\v{e_3} \\ & + (a_3a_1v_2 - a_2a_3v_1 + a_1a_2v_3 - a_3a_1v_2 + a_2a_3v_1 - a_1a_2v_3)\v{e_{123}} \\ \rbrack \\ =~& |a|^{-2} \lbrack \tag{simplify 
coefficients} \\ & (a_1a_1v_1 - a_2a_2v_1 - a_3a_3v_1 + 2a_1a_2v_2 + 2a_3a_1v_3)\v{e_1} \\ & + (2a_1a_2v_1 - a_1a_1v_2 + a_2a_2v_2 - a_3a_3v_2 + 2a_2a_3v_3)\v{e_2} \\ & + (2a_3a_1v_1 + 2a_2a_3v_2 - a_2a_2v_3 - a_1a_1v_3 + a_3a_3v_3)\v{e_3} \\ & + 0\v{e_{123}} \\ \rbrack \\ =~& |a|^{-2} \lbrack (a_1^2v_1 - a_2^2v_1 - a_3^2v_1 + 2a_1a_2v_2 + 2a_3a_1v_3)\v{e_1} \\ & + (2a_1a_2v_1 - a_1^2v_2 + a_2^2v_2 - a_3^2v_2 + 2a_2a_3v_3)\v{e_2} \\ & + (2a_3a_1v_1 + 2a_2a_3v_2 - a_2^2v_3 - a_1^2v_3 + a_3^2v_3)\v{e_3} \rbrack \\ \end{align*}\] That's a bit of a mouthful, but if we name the coefficients of each basis vector:

\[\begin{equation} \rho_1 = a_1^2v_1 - a_2^2v_1 - a_3^2v_1 + 2a_1a_2v_2 + 2a_3a_1v_3 \tag{ 9 } \end{equation} \]

\[\begin{equation} \rho_2 = 2a_1a_2v_1 - a_1^2v_2 + a_2^2v_2 - a_3^2v_2 + 2a_2a_3v_3 \tag{ 10 } \end{equation} \]

\[\begin{equation} \rho_3 = 2a_3a_1v_1 + 2a_2a_3v_2 - a_2^2v_3 - a_1^2v_3 + a_3^2v_3 \tag{ 11 } \end{equation} \]

then we're left with

\[\begin{equation} \v{ava}^{-1} = \frac{1}{|a|^2} (\rho_1 \v{e_1} + \rho_2 \v{e_2} + \rho_3 \v{e_3}) \tag{ 12 } \end{equation} \]
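Equations 9 through 12 translate directly into code. The sketch below uses an assumed Vec3 struct whose x, y, z fields hold the \(\v{e_1}, \v{e_2}, \v{e_3}\) coefficients:

type Vec3 = { x: number; y: number; z: number };

// The sandwich product a v a^-1 of equation 12, with rho_1..rho_3 taken
// straight from equations 9, 10 and 11. Geometrically this is the reflection
// of v across the line spanned by a.
function sandwich(a: Vec3, v: Vec3): Vec3 {
  const lenSq = a.x * a.x + a.y * a.y + a.z * a.z;
  const rho1 = a.x * a.x * v.x - a.y * a.y * v.x - a.z * a.z * v.x
             + 2 * a.x * a.y * v.y + 2 * a.z * a.x * v.z;             // equation 9
  const rho2 = 2 * a.x * a.y * v.x - a.x * a.x * v.y + a.y * a.y * v.y
             - a.z * a.z * v.y + 2 * a.y * a.z * v.z;                 // equation 10
  const rho3 = 2 * a.z * a.x * v.x + 2 * a.y * a.z * v.y - a.y * a.y * v.z
             - a.x * a.x * v.z + a.z * a.z * v.z;                     // equation 11
  return { x: rho1 / lenSq, y: rho2 / lenSq, z: rho3 / lenSq };       // equation 12
}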

Rotors as a combination of two reflections

Now that we can safely achieve reflection of one vector over another by way of a geometric sandwich product, rotations are right around the corner: We just reflect twice.

Let \(\v{v}\) be our input vector (the one we’d like to rotate) and say we’d like to reflect over the vectors \(\v{a}\) and then \(\v{b}\). This is just a pair of sandwich products: \(\v{v}^{\prime\prime} = \v{bv}^\prime\v{b}^{-1} = \v{bava}^{-1}\v{b}^{-1}\). If we let \(\v{R = ba}\) then by equation 5 this can be conveniently written as: \(\v{v}^{\prime\prime} = \v{RvR}^{-1}\) and \(\v{R}\) is our rotor.
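As a minimal sketch (not code from the post), rotating with the rotor \(\v{R = ba}\) amounts to two successive reflections:

type Vec3 = { x: number; y: number; z: number };

const dot = (p: Vec3, q: Vec3): number => p.x * q.x + p.y * q.y + p.z * q.z;

// a v a^-1: the reflection of v across the line spanned by a, written here
// using the equivalent classic formula 2 (a . v) / |a|^2 a - v instead of
// equations 9-12.
function reflectOver(a: Vec3, v: Vec3): Vec3 {
  const k = 2 * dot(a, v) / dot(a, a);
  return { x: k * a.x - v.x, y: k * a.y - v.y, z: k * a.z - v.z };
}

// Applying the rotor R = ba to v is just the two reflections composed:
// v'' = R v R^-1 = b (a v a^-1) b^-1. The result is v rotated by twice the
// angle from a to b, in the plane spanned by a and b.
function rotate(a: Vec3, b: Vec3, v: Vec3): Vec3 {
  return reflectOver(b, reflectOver(a, v));
}

With the vectors from the worked example below (\(\v{a}\) at 30 degrees and \(\v{b}\) at 105 degrees in the XY plane), rotate maps \((1, 0, 1)\) to \((-\frac{\sqrt{3}}{2}, \frac{1}{2}, 1)\): a 150 degree rotation about the Z axis, i.e. twice the 75 degree angle between \(\v{a}\) and \(\v{b}\).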

To see how this works, consider the following example and corresponding diagrams:

Rotation calculation example

Let \(\v{R = ba}\) with \(\v{a} = (\frac{\sqrt{3}}{2}, \frac{1}{2}, 0)\) (which is \((1,0,0)\) rotated 30 degrees counter-clockwise around the Z-axis) and \(\v{b} = (\frac{1 - \sqrt{3}}{2\sqrt{2}}, \frac{1 + \sqrt{3}}{2\sqrt{2}}, 0)\) (which is \((1,0,0)\) rotated 105 degrees counter-clockwise around the Z-axis). We’re rotating around the Z-axis because in the diagrams, positive Z is up.

Let \(\v{v} = (1,0,1)\).

Our rotated vector is therefore \(\v{v}^{\prime\prime} = \v{bv}^\prime\v{b}^{-1} = \v{b}(\v{ava}^{-1})\v{b}^{-1}\).

Let’s start with \(\v{v}^\prime\), and apply equations 9, 10, 11:

\[\begin{aligned} \rho_{a,1} &= a_1^2v_1 - a_2^2v_1 - a_3^2v_1 + 2a_1a_2v_2 + 2a_3a_1v_3 \\ \rho_{a,1} &= \left(\frac{\sqrt{3}}{2}\right)^2(1) - \left(\frac{1}{2}\right)^2(1) - (0)^2(1) + 2\left(\frac{\sqrt{3}}{2}\right)\left(\frac{1}{2}\right)(0) + 2(0)\left(\frac{\sqrt{3}}{2}\right)(1) \\ \rho_{a,1} &= \left(\frac{\sqrt{3}}{2}\right)^2 - \left(\frac{1}{2}\right)^2 \\ \rho_{a,1} &= \frac{3}{4} - \frac{1}{4} = \frac{1}{2}\\ \\ \rho_{a,2} &= 2a_1a_2v_1 - a_1^2v_2 + a_2^2v_2 - a_3^2v_2 + 2a_2a_3v_3 \\ \rho_{a,2} &= 2\left(\frac{\sqrt{3}}{2}\right)\left(\frac{1}{2}\right)(1) - \left(\frac{\sqrt{3}}{2}\right)^2(0) + \left(\frac{1}{2}\right)^2(0) - (0)^2(0) + 2\left(\frac{1}{2}\right)(0)(1) \\ \rho_{a,2} &= 2\left(\frac{\sqrt{3}}{2}\right)\left(\frac{1}{2}\right) \\ \rho_{a,2} &= \frac{\sqrt{3}}{2} \\ \\ \rho_{a,3} &= 2a_3a_1v_1 + 2a_2a_3v_2 - a_2^2v_3 - a_1^2v_3 + a_3^2v_3 \\ \rho_{a,3} &= 2(0)\left(\frac{\sqrt{3}}{2}\right)(1) + 2\left(\frac{1}{2}\right)(0)(0) - \left(\frac{1}{2}\right)^2(1) - \left(\frac{\sqrt{3}}{2}\right)^2(1) + (0)^2(1) \\ \rho_{a,3} &= -\frac{1}{4} - \frac{3}{4} \\ \rho_{a,3} &= -1 \\ \end{aligned}\]

With this done, equation 12 gets us to: \[\begin{aligned} \v{ava}^{-1} &= \frac{1}{|a|^2} (\rho_1 \v{e_1} + \rho_2 \v{e_2} + \rho_3 \v{e_3}) \\ &= (1) \left(\frac{1}{2} \v{e_1} + \frac{\sqrt{3}}{2} \v{e_2} + (-1) \v{e_3}\right) \\ \v{ava}^{-1} = \v{v}^\prime &= \frac{1}{2} \v{e_1} + \frac{\sqrt{3}}{2} \v{e_2} - \v{e_3} \\ \end{aligned}\]

Moving to our second reflection, we repeat the same process (although this time with rather more inconvenient numbers): \[\begin{aligned} \rho_{b,1} &= b_1^2v^\prime_1 - b_2^2v^\prime_1 - b_3^2v^\prime_1 + 2b_1b_2v^\prime_2 + 2b_3b_1v^\prime_3 \\ \rho_{b,1} &= \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{1}{2}\right) - \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{1}{2}\right) - (0)^2\left(\frac{1}{2}\right) \\ & + 2\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{\sqrt{3}}{2}\right) + 2(0)\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)(-1) \\ \rho_{b,1} &= \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{1}{2}\right) - \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{1}{2}\right) + 2\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{\sqrt{3}}{2}\right) \\ \rho_{b,1} &= \frac{(1-\sqrt{3})^2}{16} - \frac{(1+\sqrt{3})^2}{16} + \frac{\sqrt{3}(1-\sqrt{3})(1+\sqrt{3})}{8} \\ \rho_{b,1} &= \frac{(4 - 2\sqrt{3}) - (4 + 2\sqrt{3})}{16} + \frac{\sqrt{3}(-2)}{8} \\ \rho_{b,1} &= \frac{-4\sqrt{3}}{16} - \frac{\sqrt{3}}{4} \\ \rho_{b,1} &= -\frac{\sqrt{3}}{2} \\ \\ \rho_{b,2} &= 2b_1b_2v^\prime_1 - b_1^2v^\prime_2 + b_2^2v^\prime_2 - b_3^2v^\prime_2 + 2b_2b_3v^\prime_3 \\ \rho_{b,2} &= 2\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1}{2}\right) - \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{\sqrt{3}}{2}\right) + \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{\sqrt{3}}{2}\right) \\ & - (0)^2\left(\frac{\sqrt{3}}{2}\right) + 2\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)(0)(-1) \\ \rho_{b,2} &= 2\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1}{2}\right) - \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{\sqrt{3}}{2}\right) + \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2\left(\frac{\sqrt{3}}{2}\right) \\ \rho_{b,2} &= \frac{(1-\sqrt{3})(1+\sqrt{3})}{8} - \frac{\sqrt{3}(1-\sqrt{3})^2}{16} + \frac{\sqrt{3}(1+\sqrt{3})^2}{16} \\ \rho_{b,2} &= \frac{-2}{8} - \frac{\sqrt{3}(4 - 2\sqrt{3})}{16} + \frac{\sqrt{3}(4 + 2\sqrt{3})}{16} \\ \rho_{b,2} &= -\frac{1}{4} - \frac{\sqrt{3}(- 4\sqrt{3}))}{16} \\ \rho_{b,2} &= -\frac{1}{4} + \frac{12}{16} \\ \rho_{b,2} &= \frac{1}{2} \\ \\ \rho_{b,3} &= 2b_3b_1v^\prime_1 + 2b_2b_3v^\prime_2 - b_2^2v^\prime_3 - b_1^2v^\prime_3 + b_3^2v^\prime_3 \\ \rho_{b,3} &= 2(0)\left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)\left(\frac{1}{2}\right) + 2\left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)(0)\left(\frac{\sqrt{3}}{2}\right) - \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2(-1) \\ & - \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2(-1) + (0)^2(-1) \\ \rho_{b,3} &= - \left(\frac{1+\sqrt{3}}{2\sqrt{2}}\right)^2(-1) - \left(\frac{1-\sqrt{3}}{2\sqrt{2}}\right)^2(-1) \\ \rho_{b,3} &= \frac{(1+\sqrt{3})^2}{8} + \frac{(1-\sqrt{3})^2}{8} \\ \rho_{b,3} &= \frac{4 + 2\sqrt{3}}{8} + \frac{4 - 2\sqrt{3}}{8} \\ \rho_{b,3} &= \frac{(4 + 2\sqrt{3}) + (4 - 2\sqrt{3})}{8} \\ \rho_{b,3} &= 1 \\ \end{aligned}\]

Leading us finally to:

\[ \v{v}^{\prime\prime} = \v{bava}^{-1}\v{b}^{-1} = \v{bv}^\prime\v{b}^{-1} = \frac{-\sqrt{3}}{2}\v{e_1} + \frac{1}{2}\v{e_2} + \v{e_3} \]

Of course, you usually wouldn’t do it this way: you’d have \(\v{ba}\) precomputed (since that’s the rotor) and you’d just sandwich \(\v{v}\) with that. The calculation can also be simplified significantly because you know that the coefficient of \(\v{e_{123}}\) turns out to be zero. An example of this is given in the practical section below.

Reflection of \(\v{v}\) across \(\v{a}\) and \(\v{b}\) to produce \(\v{v}^{\prime\prime}\), shown in 3D (left) and a 2D top-down view (right).

You can see in the 2D diagram on the right, how each reflection inverts the angle between the input vector and the vector it’s being reflected across. In doing so twice, we have produced a total rotation by twice the angle between the two reflection vectors.

If you were to look only at the 2D diagram on the right, however, you might be thinking that we only needed a single reflection. You could indeed get from one point on the circle to any other point on the circle by reflecting over just one appropriately selected vector, but this wouldn’t actually be a rotation. The 3D diagram on the left demonstrates one of the reasons why this is not sufficient: We’d end up on the wrong side of the plane of reflection. Having two reflections allows us to “rotate” part of the way with the first reflection, flipping over to the other side of the plane of rotation, and the second reflection “rotates” us the rest of the way around while getting us back across the plane of rotation to our intended rotated vector.

Does that mean that a single vector would be sufficient in 2D? Well, no, we still need two, because there’s another problem: Reflection is simply not the same transformation as rotation and will, well…reflect…the relative positions of the vectors it’s applied to. Here’s the same example, but with two extra initial vectors, offset slightly from \(\v{v}\):

The same transformation applied to \(\v{v}\) and two vectors offset slightly from it

You can see how our 3 input vectors are in the wrong “order” (if you imagine going around the circle) after the first reflection, but that is fixed by the second reflection.

I confess that this is a slightly hand-wavey geometric justification that leans on one’s intuition for what reflections and rotations should look like. For the stout of heart, Jaap Suter provides a more rigorous algebraic derivation.

The identity rotor

When using rotors for rotation, you are likely to very quickly run into a situation where you want a “no-op” rotation. A rotation which transforms any input vector into itself. You want an identity rotor.

Any rotor that contains only a scalar component is an identity rotor. To see this, recall that we constructed our rotors as the geometric product of two vectors (\(\v{R} = \v{ba}\)). The rotor \(\v{R}\) produces a rotation by twice the angle between \(\v{a}\) and \(\v{b}\). If that angle is zero then twice that angle is still zero and the rotor will produce no rotation. If the angle between the two vectors is zero then we can express one of those vectors as a scalar multiple of the other (\(\v{b} = s\v{a}\) for \(\v{s \in \mathbb{R}}\)). Applying equation 2 then gives \[\begin{aligned} \v{R} &= \v{b \cdot a + b \wedge a} \\ &=(s\v{a}) \cdot \v{a} + (s\v{a}) \wedge \v{a} \\ &=s(\v{a} \cdot \v{a}) + s(\v{a} \wedge \v{a}) \\ &=s|a|^2 + s(0) \\ &=s|a|^2 \end{aligned}\]

Since we’ve placed no restrictions on \(s\) or \(\v{a}\), we may choose \(s = 1\) and \(\v{a} = (1, 0, 0) \implies |a| = 1 \implies \v{R} = 1\).
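
In code, using the rotor3 struct from the practical section below, a sketch of the identity rotor is simply:

// Sandwiching any vector with this rotor leaves the vector unchanged.
const rotor3 rotor3_identity = {1.0f, 0.0f, 0.0f, 0.0f}; // scalar = 1, bivector components = 0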

Axis-angle representation for rotors

Recall from regular vector algebra that \(\v{a \cdot b} = |a| |b| cos\theta\) and \(|\v{a \times b}| = |\v{a \wedge b}| = |a| |b| sin\theta\). With this we can modify equation 2 to get an “axis-angle-like” representation:

\[\begin{aligned} \v{R} &= \v{ba} \\ &= \v{b \cdot a + b \wedge a} \\ &= \v{b \cdot a} + |b \wedge a| \left(\frac{\v{b \wedge a}}{|b \wedge a|}\right) \\ &= |b||a|cos\theta + |b||a|sin\theta \left(\frac{\v{b \wedge a}}{|b \wedge a|}\right) \\ \end{aligned}\]

If we consider just the case where \(\v{a}\) and \(\v{b}\) are unit vectors separated by an angle \(\theta\) then \(|b||a| = 1\) and we can change variables to \(\v{n} = \frac{\v{b \wedge a}}{|b \wedge a|}\) the unit bivector “plane” spanning \(\v{a}\) and \(\v{b}\), to get:

\[ \v{R} = cos\theta + sin\theta \v{n} \]

Finally, recall that the rotor will produce a rotation equal to twice the angle between its constituent vectors and so we should actually use only half of the input angle:

\[\begin{equation} \v{R} = cos\left(\frac{\theta}{2}\right) + sin\left(\frac{\theta}{2}\right) \v{n} \tag{ 13 } \end{equation} \]
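
As a code sketch (using the vec3 and rotor3 types from the practical section below; the name from_axis_angle and the axis-to-bivector mapping x→\(\v{e_{23}}\), y→\(\v{e_{31}}\), z→\(\v{e_{12}}\) are my assumptions, chosen to match the example that follows, where the axis \((0,0,1)\) corresponds to the \(\v{e_{12}}\) plane):

rotor3 from_axis_angle(vec3 axis, float angle)
{
    // Equation 13: cos(theta/2) for the scalar part, sin(theta/2) times the
    // unit bivector of the plane of rotation for the bivector part
    axis = axis.normalized();
    const float half_angle = angle * 0.5f;
    const float sin_half = sinf(half_angle);

    rotor3 result = {};
    result.scalar = cosf(half_angle);
    result.yz = sin_half * axis.x; // plane perpendicular to the x-axis
    result.zx = sin_half * axis.y; // plane perpendicular to the y-axis
    result.xy = sin_half * axis.z; // plane perpendicular to the z-axis
    return result;
}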

Which direction this rotation goes in (clockwise or counter-clockwise) depends on the handedness of your coordinate system, as seen in the example below:

Example axis-angle calculations

Taking equation 13 and substituting \(\theta = 60\) degrees and \(\v{n} = (0,0,1)\) gives us:

\[ \v{R} = \frac{\sqrt{3}}{2} + \frac{1}{2}\v{e_{12}} \]

and if we use this to rotate the vector \(\v{v} = (1, 0, 0)\) we get:

\[\begin{aligned} \v{v^\prime} &= \v{RvR^{-1}} \\ &= \left(\frac{\sqrt{3}}{2} + \frac{1}{2}\v{e_{12}}\right)\v{e_1}\left(\frac{\sqrt{3}}{2} - \frac{1}{2}\v{e_{12}}\right) \\ &= \left(\frac{1}{4}\right)(\sqrt{3} + \v{e_{12}})\v{e_1}(\sqrt{3} - \v{e_{12}}) \\ &= \left(\frac{1}{4}\right)[(\sqrt{3} + \v{e_{12}})\v{e_1}](\sqrt{3} - \v{e_{12}}) \\ &= \left(\frac{1}{4}\right)(\sqrt{3}\v{e_1} + \v{e_{12}}\v{e_1})(\sqrt{3} - \v{e_{12}}) \\ &= \left(\frac{1}{4}\right)(\sqrt{3}\v{e_1} - \v{e_{2}})(\sqrt{3} - \v{e_{12}}) \\ &= \left(\frac{1}{4}\right)[\sqrt{3}\v{e_1}(\sqrt{3} - \v{e_{12}}) - \v{e_{2}}(\sqrt{3} - \v{e_{12}})] \\ &= \left(\frac{1}{4}\right)[(\sqrt{3}\v{e_1}\sqrt{3}) - (\sqrt{3}\v{e_1}\v{e_{12}}) - (\v{e_{2}}\sqrt{3}) + (\v{e_{2}}\v{e_{12}})] \\ &= \left(\frac{1}{4}\right)[3\v{e_1} - \sqrt{3}\v{e_2} - \sqrt{3}\v{e_{2}} - \v{e_{1}}] \\ &= \left(\frac{1}{4}\right)[2\v{e_1} - 2\sqrt{3}\v{e_2}] \\ &= \left(\frac{1}{2}\right)[\v{e_1} - \sqrt{3}\v{e_2}] \\ &= \frac{1}{2}\v{e_1} - \frac{\sqrt{3}}{2}\v{e_2} \\ &= \left(\frac{1}{2}, -\frac{\sqrt{3}}{2}, 0\right) \\ \end{aligned}\]

This is indeed \(v\) rotated 60 degrees around the \(z\)-axis. Notice how we did not need to know (or decide) the handedness of our coordinate system in order to compute this. The calculation is the same; it just looks different when you draw/render it.

The same rotation (from the calculation above), shown in left-handed (left) and right-handed (right) coordinate systems

If you want to claim that a rotation is clockwise or counter-clockwise you need to give a reference viewpoint. If your reference is “looking along the axis” then the rotation in left-handed coordinates is going clockwise, while in the right-handed coordinates it’s counter-clockwise.

Applications: Putting rotors to work

Now that we’ve seen the theory of rotors, let’s turn our attention to more practical concerns. Below is a small collection of answers to questions I encountered myself when implementing rotors, with C++ code for reference.

How do I store a rotor in memory?

A rotor is just the geometric product of the two vectors that form the plane of rotation. In 3D it contains a scalar component and 3 bivector components, so we just store it as a tuple of 4 numbers (as we would a 4D vector in the usual homogeneous coordinates setup):

struct rotor3
{
    float scalar; // scalar (dot product) part
    float xy;     // e_12 bivector component
    float yz;     // e_23 bivector component
    float zx;     // e_31 bivector component
};

How do I represent an orientation (as opposed to a rotation)?

Rotors (like quaternions) encode rotations: transforms that, when applied to an orientation, produce a new orientation. There is no such thing as “a rotor pointing along the X-axis”, for example. This is great when we have something with a particular orientation (e.g. a player character facing down the X axis) and want to transform it to some other orientation (e.g. you want your player character to instead face down the Z axis), but it doesn’t immediately help us encode “the player character is facing down the X axis” in the first place.

Thankfully we can select a convention for a “default” orientation (“facing down the X axis” for example) and then encode all orientations as rotations away from that default orientation.
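
A hedged sketch of what that convention can look like in code (the rotate() helper stands in for the "apply a rotor to a vector" routine covered at the end of this article, and all of the names here are mine):

// Convention: an orientation is stored as a rotor away from a fixed default facing.
const vec3 default_forward = {1.0f, 0.0f, 0.0f};

struct oriented_thing
{
    rotor3 orientation; // rotation away from default_forward
};

// The direction the thing is currently facing.
vec3 facing_direction(oriented_thing thing)
{
    return rotate(thing.orientation, default_forward);
}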

How do I produce a rotor representing a rotation from orientation A to orientation B?

Let’s represent an orientation as a unit vector along the “forward” direction of the orientation. Now we have two vectors representing the initial and final orientations and we want to rotate from the initial vector to the final vector.

We could create a rotor just from those two vectors directly, but while that would produce a rotation in the correct plane in the correct direction, it would rotate twice as far as we’d like (since the rotation you get by applying a rotor to a vector is twice the angle between the two vectors that defined the rotor). The naive approach would be to compute the angle between our two vectors and then use an existing axis-angle rotation function to produce a “half-way” vector and then construct our rotor from that:

vec3 axis_angle_rotate(vec3 axis, float angle, vec3 vector_to_rotate);

rotor3 from_to_naive(vec3 from_dir, vec3 to_dir)
{
    // Calculations below assume the input directions are normalised
    from_dir = from_dir.normalized();
    to_dir = to_dir.normalized();

    // Get the angle between the input directions
    const float theta = acosf(dot(from_dir, to_dir));

    // Get the axis of rotation/normal of the plane of rotation
    const vec3 axis = cross(from_dir, to_dir).normalized();

    // Compute the second vector for our rotor, half way between from_dir and to_dir
    const vec3 halfway = axis_angle_rotate(axis, theta*0.5f, from_dir);

    const vec3 wedge = {
        (halfway.x * from_dir.y) - (halfway.y * from_dir.x),
        (halfway.y * from_dir.z) - (halfway.z * from_dir.y),
        (halfway.z * from_dir.x) - (halfway.x * from_dir.z),
    };
    rotor3 result = {};
    result.scalar = dot(from_dir, halfway);
    result.xy = wedge.x;
    result.yz = wedge.y;
    result.zx = wedge.z;
    return result;
}

Of course this assumes the existence of an axis_angle_rotate() function, but thankfully equation 13 provides exactly that. If we normalise the from- and to-vectors and denote the resulting directions as \(\v{a}\) and \(\v{b}\) respectively then we can get the angle between them as \(\theta = cos^{-1}(\v{a \cdot b})\) and our from-to rotor is:

\[\begin{equation} \v{R} = cos\left(\frac{\theta}{2}\right) + sin\left(\frac{\theta}{2}\right)\left(\frac{\v{b \wedge a}}{|b \wedge a|}\right) \tag{ 14 } \end{equation} \]

rotor3 from_to_rotor(vec3 from_dir, vec3 to_dir)
{
    // Calculations below assume the input directions are normalised
    from_dir = from_dir.normalized();
    to_dir = to_dir.normalized();

    // Get the angle between the input directions
    const float theta = acosf(dot(from_dir, to_dir));
    const float cos_half_theta = cosf(theta * 0.5f);
    const float sin_half_theta = sinf(theta * 0.5f);

    // Compute the normalized "to_dir wedge from_dir" product
    const vec3 wedge = vec3 {
        (to_dir.x * from_dir.y) - (to_dir.y * from_dir.x),
        (to_dir.y * from_dir.z) - (to_dir.z * from_dir.y),
        (to_dir.z * from_dir.x) - (to_dir.x * from_dir.z),
    }.normalized();

    rotor3 result = {};
    result.scalar = cos_half_theta;
    result.xy = sin_half_theta * wedge.x;
    result.yz = sin_half_theta * wedge.y;
    result.zx = sin_half_theta * wedge.z;
    return result;
}

This will be correct, but it requires us to do a bunch of trigonometry, and if we could achieve the same thing without trigonometry then that might be faster (but as with all performance-motivated changes, you should measure it).

Recall that a rotor defined as the product of two vectors will produce a rotation from one toward the other. The problem is that it will produce a rotation by twice the angle between the input vectors, so it will “rotate past” our destination vector if we just used the product of our input vectors. Naturally then, we could swap out one of the arguments for a vector that is half-way between from and to such that twice the rotation will be precisely what we’re looking for!

rotor3 from_to_rotor_notrig(vec3 from_dir, vec3 to_dir)
{
    from_dir = from_dir.normalized();
    to_dir = to_dir.normalized();

    // halfway sits half-way between from_dir and to_dir, so the rotor
    // (halfway)(from_dir) rotates by twice that half-angle: exactly from from_dir to to_dir
    const vec3 halfway = (from_dir + to_dir).normalized();

    const vec3 wedge = {
        (halfway.x * from_dir.y) - (halfway.y * from_dir.x),
        (halfway.y * from_dir.z) - (halfway.z * from_dir.y),
        (halfway.z * from_dir.x) - (halfway.x * from_dir.z),
    };
    rotor3 result = {};
    result.scalar = dot(from_dir, halfway);
    result.xy = wedge.x;
    result.yz = wedge.y;
    result.zx = wedge.z;
    return result;
}

I should note, however, that both of these implementations have at least one downside: They fail at (or very close to) from_dir == -to_dir. In the trigonometry-free version, this is because at that point the “halfway” vector will be zero and can’t be normalized, so you’ll get garbage. You’d need to either be sure this will not happen, or check for it and do something else in that case.
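
One possible guard is sketched below. The fallback (a 180-degree rotation about an arbitrary axis perpendicular to from_dir, built as the geometric product of from_dir with a perpendicular unit vector) and the names are my own choices, not from the original implementation:

rotor3 from_to_rotor_guarded(vec3 from_dir, vec3 to_dir)
{
    from_dir = from_dir.normalized();
    to_dir = to_dir.normalized();

    if (dot(from_dir, to_dir) > -0.9999f)
    {
        return from_to_rotor_notrig(from_dir, to_dir);
    }

    // The inputs are (nearly) opposite: pick any unit vector perpendicular to from_dir
    // and build the rotor (perp)(from_dir). Its scalar part is zero because the two
    // vectors are perpendicular, so it rotates by twice 90 degrees: a 180-degree
    // rotation taking from_dir to -from_dir.
    const vec3 helper = (fabsf(from_dir.x) < 0.9f) ? vec3{1.0f, 0.0f, 0.0f} : vec3{0.0f, 1.0f, 0.0f};
    const vec3 perp = cross(from_dir, helper).normalized();

    rotor3 result = {};
    result.scalar = 0.0f;
    result.xy = (perp.x * from_dir.y) - (perp.y * from_dir.x);
    result.yz = (perp.y * from_dir.z) - (perp.z * from_dir.y);
    result.zx = (perp.z * from_dir.x) - (perp.x * from_dir.z);
    return result;
}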

How do I append/combine/multiply two (or more) rotors?

Rotors can be combined by just multiplying them together with the geometric product. We know that a rotor \(\v{R}\) is applied to a vector \(\v{v}\) by way of the sandwich product \(\v{v}^\prime = \v{RvR}^{-1}\) so if we had two rotors \(\v{R}_1\) and \(\v{R}_2\) we’d just apply them in order: \(\v{v}^\prime = \v{R}_2\v{R}_1\v{v}\v{R}_1^{-1}\v{R}_2^{-1} = (\v{R}_2\v{R}_1)\v{v}(\v{R}_2\v{R}_1)^{-1}\) and we see that the combined rotor \(\v{R}_3 = \v{R}_2\v{R}_1\).

Of course this only works if the product of two rotors is again a rotor. In order to convince ourselves that this is the case we can just do the multiplication:

Geometric product of two 3D rotors

We’d like to verify that the product of two 3D rotors (each of which consists of one scalar component and 3 bivector components) is another rotor consisting of one scalar component and 3 bivector components.

Say we have two rotors: \[ \v{S} = s_0 + s_{12}\v{e_{12}} + s_{23}\v{e_{23}} + s_{31}\v{e_{31}} \\ \v{T} = t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}} \\ \]

We just multiply them out as usual:

\[\begin{aligned} \v{ST} &= (s_0 + s_{12}\v{e_{12}} + s_{23}\v{e_{23}} + s_{31}\v{e_{31}})(t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}}) \\ \v{ST} &= (s_0)(t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}}) \\ &+ (s_{12}\v{e_{12}})(t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}}) \\ &+ (s_{23}\v{e_{23}})(t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}}) \\ &+ (s_{31}\v{e_{31}})(t_0 + t_{12}\v{e_{12}} + t_{23}\v{e_{23}} + t_{31}\v{e_{31}}) \\ \v{ST} &= (s_0 t_0) + (s_0 t_{12}\v{e_{12}}) + (s_0 t_{23}\v{e_{23}}) + (s_0 t_{31}\v{e_{31}}) \\ &+ (s_{12}\v{e_{12}} t_0) + (s_{12}\v{e_{12}}t_{12}\v{e_{12}}) + (s_{12}\v{e_{12}}t_{23}\v{e_{23}}) + (s_{12}\v{e_{12}}t_{31}\v{e_{31}}) \\ &+ (s_{23}\v{e_{23}}t_0) + (s_{23}\v{e_{23}}t_{12}\v{e_{12}}) + (s_{23}\v{e_{23}}t_{23}\v{e_{23}}) + (s_{23}\v{e_{23}}t_{31}\v{e_{31}}) \\ &+ (s_{31}\v{e_{31}}t_0) + (s_{31}\v{e_{31}}t_{12}\v{e_{12}}) + (s_{31}\v{e_{31}}t_{23}\v{e_{23}}) + (s_{31}\v{e_{31}}t_{31}\v{e_{31}}) \\ \v{ST} &= (s_0 t_0) + (s_0 t_{12}\v{e_{12}}) + (s_0 t_{23}\v{e_{23}}) + (s_0 t_{31}\v{e_{31}}) \\ &+ (s_{12}t_0\v{e_{12}}) + (s_{12}t_{12}\v{e_{12}}\v{e_{12}}) + (s_{12}t_{23}\v{e_{12}}\v{e_{23}}) + (s_{12}t_{31}\v{e_{12}}\v{e_{31}}) \\ &+ (s_{23}t_0\v{e_{23}}) + (s_{23}t_{12}\v{e_{23}}\v{e_{12}}) + (s_{23}t_{23}\v{e_{23}}\v{e_{23}}) + (s_{23}t_{31}\v{e_{23}}\v{e_{31}}) \\ &+ (s_{31}t_0\v{e_{31}}) + (s_{31}t_{12}\v{e_{31}}\v{e_{12}}) + (s_{31}t_{23}\v{e_{31}}\v{e_{23}}) + (s_{31}t_{31}\v{e_{31}}\v{e_{31}}) \\ \v{ST} &= s_0 t_0 + s_0 t_{12}\v{e_{12}} + s_0 t_{23}\v{e_{23}} + s_0 t_{31}\v{e_{31}} \\ &+ s_{12}t_0\v{e_{12}} - s_{12}t_{12} - s_{12}t_{23}\v{e_{31}} + s_{12}t_{31}\v{e_{23}} \\ &+ s_{23}t_0\v{e_{23}} + s_{23}t_{12}\v{e_{31}} - s_{23}t_{23} - s_{23}t_{31}\v{e_{12}} \\ &+ s_{31}t_0\v{e_{31}} - s_{31}t_{12}\v{e_{23}} + s_{31}t_{23}\v{e_{12}} - s_{31}t_{31} \\ \v{ST} &= (s_0 t_0 - s_{12}t_{12} - s_{23}t_{23} - s_{31}t_{31}) \\ &+ (s_0 t_{12} + s_{12}t_0 - s_{23}t_{31} + s_{31}t_{23})\v{e_{12}} \\ &+ (s_0 t_{23} + s_{12}t_{31} + s_{23}t_0 - s_{31}t_{12})\v{e_{23}} \\ &+ (s_0 t_{31} - s_{12}t_{23} + s_{23}t_{12} + s_{31}t_0)\v{e_{31}} \\ \end{aligned}\]

So clearly \(\v{ST}\) has only scalar and bivector components, and we can use the product as a new rotor.

This multiplication also translates fairly directly into code:

rotor3 combine(rotor3 lhs, rotor3 rhs)
{
    rotor3 result = {};
    result.scalar = lhs.scalar*rhs.scalar - lhs.xy*rhs.xy - lhs.yz*rhs.yz - lhs.zx*rhs.zx;
    result.xy = lhs.scalar*rhs.xy + lhs.xy*rhs.scalar - lhs.yz*rhs.zx + lhs.zx*rhs.yz;
    result.yz = lhs.scalar*rhs.yz + lhs.xy*rhs.zx + lhs.yz*rhs.scalar - lhs.zx*rhs.xy;
    result.zx = lhs.scalar*rhs.zx - lhs.xy*rhs.yz + lhs.yz*rhs.xy + lhs.zx*rhs.scalar;
    return result;
}
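
One detail worth spelling out is the argument order: combine(lhs, rhs) computes the geometric product lhs*rhs, so (matching the \(\v{R}_2\v{R}_1\) ordering above) the rotor passed as rhs is the one applied to the vector first. A small hypothetical usage sketch:

// Rotate by `first`, then by `second`: the right-hand argument acts first.
const rotor3 first  = from_to_rotor(vec3{1.0f, 0.0f, 0.0f}, vec3{0.0f, 1.0f, 0.0f});
const rotor3 second = from_to_rotor(vec3{0.0f, 1.0f, 0.0f}, vec3{0.0f, 0.0f, 1.0f});
const rotor3 both   = combine(second, first);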

How do I invert or reverse a rotor to produce the same rotation in the opposite direction?

Since the rotor produced by the geometric product of vectors \(\v{ba}\) is a rotation in the plane formed by those two vectors, by twice the angle between those vectors (in the direction from a to b), we can produce a rotation in the same plane by the same angle in the opposite direction by just swapping \(\v{a}\) and \(\v{b}\) to get: \(\v{R}^\prime = \v{ab} = \v{a \cdot b + a \wedge b} = \v{b \cdot a - b \wedge a}\) which we can produce with very little computation from \(\v{R}\) by just negating the bivector components:

rotor3 reverse(rotor3 r)
{
    rotor3 result = {};
    result.scalar = r.scalar;
    result.xy = -r.xy;
    result.yz = -r.yz;
    result.zx = -r.zx;
    return result;
}

Given a particular rotor, how do I actually apply it to a vector directly?

Earlier when we showed how to apply a rotor, we did it in two steps as two separate reflection calculations. While mathematically equivalent, this requires that we store the vectors that make up our rotor (rather than just the scalar & bivector components) and requires us to do far more arithmetic. Instead we’ll sandwich the input vector directly with the entire, pre-computed rotor:

Direct rotor sandwich

Let \(\v{R} = r_0 + r_{12}\v{e_{12}} + r_{23}\v{e_{23}} + r_{31}\v{e_{31}}\) and \(\v{v} = v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}\).

Now \(\v{R = ba}\) for vectors \(\v{a}\) and \(\v{b}\), so by equation 5 we have that \(\v{R^{-1} = (ba)^{-1} = a^{-1}b^{-1}}\), and applying equation 4 to each factor gives us \(\v{R^{-1}} = \frac{1}{|a|^2|b|^2}\v{ab}\).

Our full sandwich product is therefore:

\[ \v{v^\prime} = \v{RvR}^{-1} = \v{(ba)v}\left(\frac{1}{|a|^2|b|^2}\v{ab}\right) = \frac{1}{|a|^2|b|^2}\v{(ba)v(ab)} \]

To keep our equations a little shorter, let’s start by just computing the first product \(\v{S = Rv}\):

\[\begin{align*} \v{S} =~& \v{Rv} = \v{(ba)v} \\ =~& (r_0 + r_{12}\v{e_{12}} + r_{23}\v{e_{23}} + r_{31}\v{e_{31}})(v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \\ =~& r_0(v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \\ & + r_{12}\v{e_{12}}(v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \\ & + r_{23}\v{e_{23}}(v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \\ & + r_{31}\v{e_{31}}(v_1\v{e_1} + v_2\v{e_2} + v_3\v{e_3}) \\ =~& r_0v_1\v{e_1} + r_0v_2\v{e_2} + r_0v_3\v{e_3} \\ & + r_{12}v_1\v{e_{12}}\v{e_1} + r_{12}v_2\v{e_{12}}\v{e_2} + r_{12}v_3\v{e_{12}}\v{e_3} \\ & + r_{23}v_1\v{e_{23}}\v{e_1} + r_{23}v_2\v{e_{23}}\v{e_2} + r_{23}v_3\v{e_{23}}\v{e_3} \\ & + r_{31}v_1\v{e_{31}}\v{e_1} + r_{31}v_2\v{e_{31}}\v{e_2} + r_{31}v_3\v{e_{31}}\v{e_3} \\ =~& r_0v_1\v{e_1} + r_0v_2\v{e_2} + r_0v_3\v{e_3} \\ & - r_{12}v_1\v{e_2} + r_{12}v_2\v{e_1} + r_{12}v_3\v{e_{123}} \\ & + r_{23}v_1\v{e_{123}} - r_{23}v_2\v{e_3} + r_{23}v_3\v{e_2} \\ & + r_{31}v_1\v{e_3} + r_{31}v_2\v{e_{123}} - r_{31}v_3\v{e_1} \\ =~& (r_0v_1 + r_{12}v_2 - r_{31}v_3)\v{e_1} \\ & + (r_0v_2 - r_{12}v_1 + r_{23}v_3)\v{e_2} \\ & + (r_0v_3 - r_{23}v_2 + r_{31}v_1)\v{e_3} \\ & + (r_{12}v_3 + r_{23}v_1 + r_{31}v_2)\v{e_{123}} \\ \end{align*}\]

To compute the final product we’ll write our calculations in terms of \(\v{S}\) rather than in terms of \(\v{R}\) and \(\v{v}\). This makes the equations shorter and also translates more easily into code.

Before we can do that though, we need a value for \(\v{ab}\). Since we already have \(\v{ba}\) in our original definition of \(\v{R}\), we can save ourselves having to compute \(\v{ab}\) by realising that its dot product term is commutative while its wedge product term is anti-commutative (equation 2), so we can produce one from the other just by negating the bivector component:

\[\v{ab} = r_0 - r_{12}\v{e_{12}} - r_{23}\v{e_{23}} - r_{31}\v{e_{31}}\]

Now we can complete the calculation:

\[\begin{align*} \v{v^\prime} =~& \v{(ba)v}\left(\frac{1}{|a|^2|b|^2}\v{ab}\right) = \frac{1}{|a|^2|b|^2}\v{S(ab)} \\ =~& \frac{1}{|a|^2|b|^2} [ \\ & s_1\v{e_1}(r_0 - r_{12}\v{e_{12}} - r_{23}\v{e_{23}} - r_{31}\v{e_{31}}) \\ & + s_2\v{e_2}(r_0 - r_{12}\v{e_{12}} - r_{23}\v{e_{23}} - r_{31}\v{e_{31}}) \\ & + s_3\v{e_3}(r_0 - r_{12}\v{e_{12}} - r_{23}\v{e_{23}} - r_{31}\v{e_{31}}) \\ & + s_{123}\v{e_{123}}(r_0 - r_{12}\v{e_{12}} - r_{23}\v{e_{23}} - r_{31}\v{e_{31}}) \\ & ] \\ =~& \frac{1}{|a|^2|b|^2} [ \tag{multiply out} \\ & s_1r_0\v{e_1} - s_1r_{12}\v{e_1}\v{e_{12}} - s_1r_{23}\v{e_1}\v{e_{23}} - s_1r_{31}\v{e_1}\v{e_{31}} \\ & + s_2r_0\v{e_2} - …
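
The remaining expansion continues in the full article. For a unit rotor (which every constructor in this section produces from normalised inputs, so the leading factor is 1), carrying the expansion through and grouping by basis vector lands at code along these lines; this is my sketch, written in terms of the components of \(\v{S}\) computed above:

// Apply a (unit) rotor to a vector: v' = R v R^-1
vec3 rotate(rotor3 r, vec3 v)
{
    // S = Rv, as computed above: a vector part (s1, s2, s3) plus a trivector part s123
    const float s1   = r.scalar*v.x + r.xy*v.y - r.zx*v.z;
    const float s2   = r.scalar*v.y - r.xy*v.x + r.yz*v.z;
    const float s3   = r.scalar*v.z - r.yz*v.y + r.zx*v.x;
    const float s123 = r.xy*v.z + r.yz*v.x + r.zx*v.y;

    // v' = S(ab): the e_123 terms cancel, leaving only vector components
    vec3 result = {};
    result.x = s1*r.scalar + s2*r.xy - s3*r.zx + s123*r.yz;
    result.y = s2*r.scalar - s1*r.xy + s3*r.yz + s123*r.zx;
    result.z = s3*r.scalar + s1*r.zx - s2*r.yz + s123*r.xy;
    return result;
}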


Article truncated for RSS feed. Read the full article at https://jacquesheunis.com/post/rotors/

]]>
https://jacquesheunis.com/post/rotors/ hacker-news-small-sites-43234510 Sun, 02 Mar 2025 20:10:55 GMT
<![CDATA[The Era of Solopreneurs Is Here]]> thread link) | @QueensGambit
March 2, 2025 | https://manidoraisamy.com/developer-forever/post/the-era-of-solopreneurs-is-here.anc-52867368-2029-4dc5-a7da-ece853a648b5.html | archive.org

DeepSeek just dropped a bombshell: $200M in annual revenue with a 500%+ profit margin—all while charging 25x less than OpenAI. But DeepSeek didn't just build another AI model. They wrote their own parallel file system (3FS) to optimize costs—something that would have been unthinkable for a company of their size. This was possible because AI helped write the file system. Now, imagine what will happen in a couple of years—AI will be writing code, optimizing infrastructure, and even debugging itself. An engineer with an AI tool can now outbuild a 100-person engineering team.

Disappearing Pillars


For years, the freemium business model, cloud computing, and AI have been converging. First, the internet killed the need for sales teams (distribution moved online). Then, serverless computing eliminated IT teams (AWS, Firebase, you name it). And now, AI is breaking the last barrier—software development itself. This shift has been happening quietly for 15 years, but AI is the final straw that breaks the camel’s back.

This kind of disruption was previously limited to narrow consumer products like WhatsApp, where a 20-member team built a product that led to a $19 billion exit. But now, the same thing is happening in business applications that require breadth. AI will be able to build complex applications that were previously impossible for small teams. Take our own experience: Neartail competes with Shopify and Square, and it's built by one person. Formfacade is a CRM that competes with HubSpot—built by one person. A decade ago, this wouldn't have been possible for us. But today, AI handles onboarding, customer support, and even parts of development itself. So, what does this mean for SaaS? It won't disappear, but it's about to get a whole lot leaner.

Double Threat to Big Companies

For large incumbents, this shift isn’t just about new competition—it’s a fundamental restructuring of how software businesses operate. They face a double threat:

  1. They must cut down their workforce, even if employees are highly skilled, creating a moral dilemma.
  2. They have to rebuild their products from scratch for the AI era - a challenge for elephants that can't dance.

Look at payments, for example. Stripe charges 3% per transaction. We’re rolling out 2% fees for both payments and order forms because we use AI to read the seller’s SMS and automate the payment processing. It won’t hurt Stripe now—they make billions off Shopify’s transaction fees alone. But it’s a slow rug pull. First, AI-first companies like us will nibble away Shopify's revenue. Then, a few will break through and topple Shopify. And, only then will incumbents like Stripe feel the pinch as a second order effect.

Summary

This is a massive opportunity for startups right now. While the giants are trapped in their own complexity, nimble teams can build and launch AI-native solutions that directly challenge established players. Target a bloated SaaS vertical, rebuild it from the ground up with AI at its core, and position it as the next-generation alternative.

For example, the future of CRM isn’t just software—it’s software + sales team. Startups that don’t want to hire salespeople will eagerly adopt AI-driven CRMs that automate outreach, and follow-ups. Meanwhile, large companies will hesitate to fire their sales teams or switch from legacy CRMs due to vendor lock-in. But over time, startups using AI-native CRMs will scale into large companies themselves, forcing the laggards to transition or fall behind.

This is why we say, “The future is here, but not evenly distributed.” The AI-native solutions of today will become the default for the next wave of large enterprises. The opportunity isn’t in building software for existing companies—it’s in building it for the startups that will replace them. For founders starting companies today, this is Day Zero in the truest sense. The AI-native companies being built now are the ones that will define the next decade of SaaS. It’s not just disruption—it’s a complete reset.

]]>
https://manidoraisamy.com/developer-forever/post/the-era-of-solopreneurs-is-here.anc-52867368-2029-4dc5-a7da-ece853a648b5.html hacker-news-small-sites-43232999 Sun, 02 Mar 2025 17:52:48 GMT
<![CDATA[Speedrunners are vulnerability researchers, they just don't know it yet]]> thread link) | @chc4
March 2, 2025 | https://zetier.com/speedrunners-are-vulnerability-researchers/ | archive.org


Thousands of video game enthusiasts are developing experience in the cybersecurity industry by accident. They have a fun hobby, poring over the details of their favorite games, and they don't know they could be doing something very similar… by becoming a vulnerability researcher.

That probably requires some backstory, especially from a cybersecurity company's blog!

What's a speedrun?

Basically as soon as video games were released, people have been trying to beat them faster than their friends (or enemies) can. Gamers will do this for practically any game on the planet – but the most popular games, or the ones with the most cultural weight and cultish following, naturally end up with the fiercest competition. Speedrunners will run through their favorite game hundreds or thousands of times in order to get to get to the top of community-driven leaderboards for the fastest time… which puts incentives on that video game's community to find the absolute fastest way to clear the game, no matter how strange.

"Any percent" speedruns, or "any%" more commonly, are usually one of the most popular categories of speedrun for any given game. In it, all rules are off and no weird behavior is disallowed: intentionally triggering bugs in the game, which the developers never intended for the players to be able to perform, often have the potential to shave double-digit percentages of time off existing routes by cutting out entire swathes of the game from having to be played at all. Why do 1 -> 2 -> 3 if you can do a cool trick and skip from 1 -> 3 directly?

A lot of these glitches revolve around extremely precise movement… but for the most dedicated fans, they'll go even further.

Glitch hunting is reverse engineering

Entire groups will spring up inside a game's speedrunning community dedicated to discovering new glitches, and oftentimes they'll apply engineering to it.

These enthusiasts won't just try weird things in the game over and over (although that definitely helps!) – they'll use tools that are standard in the cybersecurity industry to pull apart how software works internally, such as IDA Pro or Ghidra, to discover exactly what makes their target video game tick. On top of static analysis, they'll leverage dynamic analysis as well: glitch hunters will use dynamic introspection and debugging tools, like the Dolphin Emulator’s memory viewer or Cheat Engine, to get a GDB-like interface for figuring out the program's internal data structures and how information is recorded.

And even further, they'll develop entirely new tooling: I've seen groups like the Paper Mario: The Thousand Year Door community reverse engineer game file formats and create Ghidra program loaders, or other groups completely re-implement Ghidra disassembled code in C so they can stick it under a fuzzer in isolation. Some of the speedrun glitch hunters are incredibly technically competent, using the exact same tooling and techniques that people in the cybersecurity industry use for reverse engineering every day.

…And it’s vulnerability research

Not only do these groups do reverse engineering, but they also are doing vulnerability research. Remember, they don't only try to figure out how games work, but they try to break the game in any way possible. These glitches end up looking stunningly similar to how memory corruption exploits work for any other computer program: they'll find buffer overflows, use-after-frees, and incorrect state machine transitions in their target games.

And perhaps most impressively, they'll productize their exploits, unlike a lot of people in the cybersecurity industry. Some vulnerability researchers will develop a proof-of-concept to demonstrate a bug – but never actually develop the technical chops on how that exploit would need to be developed further for an adversary to use it. They might intellectually know how to weaponize a buffer overflow, or a use-after-free, but speedrunning groups by necessity are actually doing it. Oftentimes, actually using these glitches requires working through extremely restrictive constraints, both for what inputs they have control over and what parts of the program they can influence.

Super Mario World runners will place items in extremely precise locations so that the X,Y coordinates form shellcode they can jump to with a dangling reference. Legend of Zelda: Ocarina of Time players will do heap grooming and write a function pointer using the IEEE-754 floating point number bit representation so the game “wrong warps” directly to the end credit sequence... with nothing more than a game controller and a steady hand.

Screenshot from an in-depth technical explanation of a Super Mario 64 glitch. Watch on YouTube.

Some of the game communities will even take it a step further! Tool-assisted speedruns, or "TAS" runs, will perform glitches so precise that they can't reliably be performed by human beings at all. They'll leverage frame-by-frame input recordings in order to hit the right angle on a game controller's stick, every time; they'll hit buttons on the exact video game tick, every time.

And because they have such precise control over their actions in games, they'll likewise be able to consider game glitches with exacting precision. TAS authors are able to leverage inspecting the video game with memory debuggers to craft a use-after-free with the perfect heap spray, or write multiple stages of shellcode payload in their player inventory with button presses.

There's even an entire event at the most popular speedrunning conference, Awesome Games Done Quick/AGDQ, called "TASbot." During it, a robot does all the inputs via a hard-wired controller to perform a tool-assisted speedrun in real time – so it can do things like get arbitrary code execution and use that to replace the video game with an entirely new one, using nothing but controller inputs.

An industry exists!

The fact that these people are so technically competent only throws into stark relief how disconnected some of them are from the larger cybersecurity industry. Speedrun glitch hunters will develop heap use-after-free exploits, with accompanying technical write-ups on the level of Google Project Zero… and in doing so, refer to it as an "item storage" glitch, because they developed the knowledge from first principles without ever reading a Phrack article. They'll re-implement disassembled code from Ghidra in C for automated glitch discovery, but without any exposure to American Fuzzy Lop or the large academic body of work driving fuzzer research.

And, critically for us here at Zetier, they don't know you can get paid to do a job very close to finding video game glitches, and so they don't know to apply to our reverse engineering or vulnerability research job postings. A lot of these video game glitch hunters, even the ones writing novel Ghidra loaders or runtime memory analysis scripts, don't think of what they're doing as anything more than a fun hobby; they might go become a normal software engineer, if that. Some of them will look up "IDA Pro" on LinkedIn and see a million malware analysis job postings. No offense to my malware analysis friends, but malware reverse engineering and vulnerability research are two very different roles!

Vulnerability research in industry, unlike more “normal” malware analysis jobs, is usually in the form of an engineer spending significant time investigating exactly how a program works. Like video game glitch discovery, they don’t just care about what it does, but how it does it – and why the authors implemented it in that way, along with how that behavior may affect other parts of the program. Oftentimes, you end up building up a repertoire of small, innocuous “huh that’s weird”-style bugs that are individually useless… until you find some missing piece. And like game glitches, the most satisfying of discoveries on the job are from realizations that there’s a fundamental gap in thinking by the authors, where you don’t just find one glitch but an entire family of glitches, all from the same root cause.

A glimpse of an arbitrary code execution (ACE) exploit walk-through. See the video.

I personally love reading the technical game glitch write-ups that come out of speedrunning communities. Lots of my coworkers, and other people in the industry, likewise enjoy them. I love glitch write-ups because they remind me of the great parts of my job: extremely deep dives into the internals of how programs work, and working around odd constraints. Exploiting vulnerabilities requires performing mental gymnastics in order to chain seemingly innocuous primitives, like walking around out-of-bounds in Pokemon, together to do the thing in a way that allows the author to express their creativity and mastery over a piece of software.

Talking to people in speedrunning communities who love poring over Assembly, or figuring out exactly what the implications are for a 1-byte buffer overflow in a textbox, only for them to shrug and explain they're reluctantly working in a non-technical industry, comes across to me as a shame. If any of these descriptions speak to you, or bring to mind one of your friends, reach out to hello@zetier.com. We'd love to chat.


]]>
https://zetier.com/speedrunners-are-vulnerability-researchers/ hacker-news-small-sites-43232880 Sun, 02 Mar 2025 17:40:36 GMT
<![CDATA[Falsehoods programmers believe about languages (localization)]]> thread link) | @zdw
March 2, 2025 | https://www.lexiconista.com/falsehoods-about-languages/ | archive.org

This is what we have to put up with in the software localisation industry.

I can’t believe nobody has done this list yet. I mean, there is one about names, one about time and many others on other topics, but not one about languages yet (except one honorable mention that comes close). So, here’s my attempt to list all the misconceptions and prejudices I’ve come across in the course of my long and illustrious career in software localisation and language technology. Enjoy – and send me your own ones!


  • Sentences in all languages can be templated as easily as in English: {user} is in {location} etc.

  • Words that are short in English are short in other languages too.

  • For any text in any language, its translation into any other language is approximately as long as the original.

  • For every lower-case character, there is exactly one (language-independent) upper-case character, and vice versa.

  • The lower-case/upper-case distinction exists in all languages.

  • All languages have words for exactly the same things as English.

  • Every expression in English, however vague and out-of-context, always has exactly one translation in every other language.

  • All languages follow the subject-verb-object word order.

  • When words are to be converted into Title Case, it is always the first character of the word that needs to be capitalized, in all languages.

  • Every language has words for yes and no.

  • In each language, the words for yes and no never change, regardless of which question they are answering.

  • There is always only one correct way to spell anything.

  • Each language is written in exactly one alphabet.

  • All languages (that use the Latin alphabet) have the same alphabetical sorting order.

  • All languages are written from left to right.

  • Even in languages written from right to left, the user interface still “flows” from left to right.

  • Every language puts spaces between words.

  • Segmenting a sentence into words is as easy as splitting on whitespace (and maybe punctuation).

  • Segmenting a text into sentences is as easy as splitting on end-of-sentence punctuation.

  • No language puts spaces before question marks and exclamation marks at the end of a sentence.

  • No language puts spaces after opening quotes and before closing quotes.

  • All languages use the same characters for opening quotes and closing quotes.

  • Numbers, when written out in digits, are formatted and punctuated the same way in all languages.

  • No two languages are so similar that it would ever be difficult to tell them apart.

  • Languages that have similar names are similar.

  • Icons that are based on English puns and wordplay are easily understood by speakers of other languages.

  • Geolocation is an accurate way to predict the user’s language.

  • Country flags are accurate and appropriate symbols for languages.

  • Every country has exactly one “national” language.

  • Every language is the “national” language of exactly one country.

]]>
https://www.lexiconista.com/falsehoods-about-languages/ hacker-news-small-sites-43232841 Sun, 02 Mar 2025 17:36:20 GMT
<![CDATA[New battery-free technology can power devices using ambient RF signals]]> thread link) | @ohjeez
March 2, 2025 | https://news.nus.edu.sg/nus-researchers-develop-new-battery-free-technology/ | archive.org

In a breakthrough for green energy, researchers demonstrated a novel technique to efficiently convert ambient radiofrequency signals into DC voltage that can power electronic devices and sensors, enabling battery-free operation.

Ubiquitous wireless technologies like Wi-Fi, Bluetooth, and 5G rely on radio frequency (RF) signals to send and receive data. A new prototype of an energy harvesting module – developed by a team led by scientists from the National University of Singapore (NUS) – can now convert ambient or ‘waste’ RF signals into direct current (DC) voltage. This can be used to power small electronic devices without the use of batteries.

RF energy harvesting technologies, such as this, is essential as they reduce battery dependency, extend device lifetimes, minimise environmental impact, and enhance the feasibility of wireless sensor networks and IoT devices in remote areas where frequent battery replacement is impractical.

However, RF energy harvesting technologies face challenges due to low ambient RF signal power (typically less than -20 dBm), where current rectifier technology either fails to operate or exhibits a low RF-to-DC conversion efficiency. While improving antenna efficiency and impedance matching can enhance performance, this also increases on-chip size, presenting obstacles to integration and miniaturisation.

To address these challenges, a team of NUS researchers, working in collaboration with scientists from Tohoku University (TU) in Japan and University of Messina (UNIME) in Italy, has developed a compact and sensitive rectifier technology that uses nanoscale spin-rectifier (SR) to convert ambient wireless radio frequency signals at power less than -20 dBm to a DC voltage.

The team optimised SR devices and designed two configurations: 1) a single SR-based rectenna operational between -62 dBm and -20 dBm, and 2) an array of 10 SRs in series achieving 7.8% efficiency and zero-bias sensitivity of approximately 34,500 mV/mW. Integrating the SR-array into an energy harvesting module, they successfully powered a commercial temperature sensor at -27 dBm.

“Harvesting ambient RF electromagnetic signals is crucial for advancing energy-efficient electronic devices and sensors. However, existing Energy Harvesting Modules face challenges operating at low ambient power due to limitations in existing rectifier technology,” explained Professor Yang Hyunsoo from the Department of Electrical and Computer Engineering at the NUS College of Design and Engineering, who spearheaded the project.

Prof Yang added, “For example, gigahertz Schottky diode technology has remained saturated for decades due to thermodynamic restrictions at low power, with recent efforts focused only on improving antenna efficiency and impedance-matching networks, at the expense of bigger on-chip footprints. Nanoscale spin-rectifiers, on the other hand, offer a compact technology for sensitive and efficient RF-to-DC conversion.”

Elaborating on the team’s breakthrough technology, Prof Yang said, “We optimised the spin-rectifiers to operate at low RF power levels available in the ambient, and integrated an array of such spin-rectifiers to an energy harvesting module for powering the LED and commercial sensor at RF power less than -20 dBm. Our results demonstrate that SR-technology is easy to integrate and scalable, facilitating the development of large-scale SR-arrays for various low-powered RF and communication applications.”

The experimental research was carried out in collaboration with Professor Shunsuke Fukami and his team from TU, while the simulation was carried out by Professor Giovanni Finocchio from UNIME. The results were published in the prestigious journal, Nature Electronics, on 24 July 2024.

Spin-rectifier-based technology for the low-power operation

State-of-the-art rectifiers (Schottky diodes, tunnel diodes and two-dimensional MoS2) have reached efficiencies of 40–70% at Prf ≥ -10 dBm. However, the ambient RF power available from RF sources such as Wi-Fi routers is less than -20 dBm. Developing high-efficiency rectifiers for low-power regimes (Prf < -20 dBm) is difficult due to thermodynamic constraints and high-frequency parasitic effects. Additionally, on-chip rectifiers require an external antenna and impedance-matching circuit, impeding on-chip scaling. Therefore, designing a rectifier for an Energy Harvesting Module (EHM) that is sensitive to ambient RF power with a compact on-chip design remains a significant challenge.

The nanoscale spin-rectifiers can convert the RF signal to a DC voltage using the spin-diode effect. Although the SR-based technology surpassed the Schottky diode sensitivity, the low-power efficiency is still low (< 1%). To overcome the low-power limitations, the research team studied the intrinsic properties of SR, including the perpendicular anisotropy, device geometry, and dipolar field from the polarizer layer, as well as the dynamic response, which depends on the zero-field tunnelling magnetoresistance and voltage-controlled magnetic anisotropy (VCMA). Combining these optimised parameters with an external antenna impedance-matched to a single SR, the researchers designed an ultralow-power SR-rectenna.

To improve output and achieve on-chip operation, the SRs were coupled in an array arrangement, with the small co-planar waveguides on the SRs employed to couple RF power, resulting in compact on-chip area and high efficiency. One of the key findings is that the self-parametric effect driven by well-known VCMA in magnetic tunnel junctions-based spin-rectifiers significantly contributes to the low-power operation of SR-arrays, while also enhancing their bandwidth and rectification voltage. In a comprehensive comparison with Schottky diode technology in the same ambient situation and from previous literature assessment, the research team discovered that SR-technology might be the most compact, efficient, and sensitive rectifier technology.

Commenting on the significance of their results, Dr Raghav Sharma, the first author of the paper, shared, “Despite extensive global research on rectifiers and energy harvesting modules, fundamental constraints in rectifier technology remain unresolved for low ambient RF power operation. Spin-rectifier technology offers a promising alternative, surpassing current Schottky diode efficiency and sensitivity in low-power regime. This advancement benchmarks RF rectifier technologies at low power, paving the way for designing next-generation ambient RF energy harvesters and sensors based on spin-rectifiers.”

Next steps

The NUS research team is now exploring the integration of an on-chip antenna to improve the efficiency and compactness of SR technologies. The team is also developing series-parallel connections to tune impedance in large arrays of SRs, utilising on-chip interconnects to connect individual SRs. This approach aims to enhance the harvesting of RF power, potentially generating a significant rectified voltage of a few volts, thus eliminating the need for a DC-to-DC booster.

The researchers also aim to collaborate with industry and academic partners for the advancement of self-sustained smart systems based on on-chip SR rectifiers. This could pave the way for compact on-chip technologies for wireless charging and signal detection systems.

]]>
https://news.nus.edu.sg/nus-researchers-develop-new-battery-free-technology/ hacker-news-small-sites-43232724 Sun, 02 Mar 2025 17:25:49 GMT
<![CDATA[An ode to TypeScript enums]]> thread link) | @disintegrator
March 2, 2025 | https://blog.disintegrator.dev/posts/ode-to-typescript-enums/ | archive.org

It's official, folks. TypeScript 5.8 is out, bringing with it the --erasableSyntaxOnly flag and the nail in the coffin for many of the near-primordial language features like Enums and Namespaces. Node.js v23 joined Deno and Bun in adding support for running TypeScript files without a build step. The one true limitation is that only files containing erasable TypeScript syntax are supported, and Enums and Namespaces (ones holding values) violate that rule since they are transpiled to JavaScript objects. So the TypeScript team made it possible to ban those features with the new compiler flag and make it easy for folks to ensure their TS code is directly runnable.

But the issues with Enums didn't start here. Over the last few years, prominent TypeScript content creators have been making the case against enums on social media, in blog posts, and in short video essays. Let me stop here and say it out loud:

In almost all ways that matter, literal unions provide better ergonomics than enums and you should consider them first.

The problem is that these statements, like the articles I linked to there and many others out there, are not interested in making a case for any of the strengths of enums. While I maintain my position above, I want to spend a minute eulogizing an old friend. Remember, as const assertions, which were introduced in TypeScript 3.4, were necessary to supplant enums. That's nearly 6 years of using enums since TypeScript 0.9!

Probably my favorite argument in steelmanning enums is that you can document their members, and that documentation is available anywhere you access them. This includes deprecating them, which can be so useful if you are building APIs that evolve over time.

enum PaymentMethod {
  CreditCard = "credit-card",
  DebitCard = "debit-card",
  Bitcoin = "bitcoin",
  /**
   * Use an electronic check to pay your bills. Please note that this may take
   * up to 3 business days to go through.
   *
   * @deprecated Checks will no longer be accepted after 2025-04-30
   */
  Check = "check",
}

const method = PaymentMethod.Check;

There have been many instances where a union member's value on its own is not perfectly self-explanatory, or is at least ambiguous when living alongside similar unions in a large codebase. The documentation has to be combined into the TSDoc comment of the union type, which cannot reflect deprecations and is not shown when hovering over a union member.

type PaymentMethod =
  | "credit-card"
  | "debit-card"
  | "bitcoin"
  /**
   * Use an electronic check to pay your bills. Please note that this may
   * take up to 3 business days to go through.
   *
   * @deprecated Checks will no longer be accepted after 2025-04-30
   */
  | "check";

const method: PaymentMethod = "check";

There are ways to get around this limitation by using object literals with a const assertion, but the reality is that these literals aren't typically imported and used by users of a library. They tend to be built up by library authors to have an iterable/indexable mapping around when validating unknown values or to enumerate in a UI, e.g. in error messages or to build a <select> dropdown.

There are a couple more quality-of-life features that enums possess, but I'm choosing not to go through them here. For me personally, the degraded inline documentation is by far the toughest pill to swallow in moving to literal unions, and I wanted to focus on that. I'm really hoping the TypeScript team finds a way to support TSDoc on union members as the world moves away from enums.

]]>
https://blog.disintegrator.dev/posts/ode-to-typescript-enums/ hacker-news-small-sites-43232690 Sun, 02 Mar 2025 17:23:12 GMT
<![CDATA[Understanding Smallpond and 3FS]]> thread link) | @mritchie712
March 2, 2025 | https://www.definite.app/blog/smallpond | archive.org

March 2, 2025 · 10 minute read

Mike Ritchie

I didn't have "DeepSeek releases distributed DuckDB" on my 2025 bingo card.

You may have stumbled across smallpond from Twitter/X/LinkedIn hype. From that hype, you might have concluded Databricks and Snowflake are dead 😂. Not so fast. The reality is, although this is interesting and powerful open source tech, it's unlikely to be widely used in analytics anytime soon. Here's a concise breakdown to help you cut through the noise.

We'll cover:

  1. what smallpond and its companion, 3FS, are
  2. if they're suitable for your use case and if so
  3. how you can use them

What is smallpond?

smallpond is a lightweight, distributed data processing framework recently introduced by DeepSeek. It extends DuckDB (typically a single-node analytics database) to handle larger datasets across multiple nodes. smallpond enables DuckDB to manage distributed workloads by using a distributed storage and compute system.

Key features:

  • Distributed Analytics: Allows DuckDB to handle larger-than-memory datasets by partitioning data and running analytics tasks in parallel.
  • Open Source Deployment: If you can manage to get it running, 3FS would give you powerful and performant storage at a fraction of the cost of alternatives.
  • Manual Partitioning: Data is manually partitioned by users, and smallpond distributes these partitions across nodes for parallel processing.

What is 3FS?

3FS, or Fire-Flyer File System, is a high-performance parallel file system also developed by DeepSeek. It's optimized specifically for AI and HPC workloads, offering extremely high throughput and low latency by using SSDs and RDMA networking technology. 3FS is the high-speed, distributed storage backend that smallpond leverages to get its breakneck performance. 3FS achieves a remarkable read throughput of 6.6 TiB/s on a 180-node cluster, which is significantly higher than many traditional distributed file systems.

How Can I Use It?

To start, install it the same as any other Python package: uv pip install smallpond. Remove uv if you like pain.

But to actually get the benefits of smallpond, it'll take much more work and depends largely on your data size and infrastructure:

  • Under 10TB: smallpond is likely unnecessary unless you have very specific distributed computing needs. A single-node DuckDB instance or a simpler storage solution will be easier to run and possibly more performant. To be candid, using smallpond at a smaller scale, without Ray / 3FS, is likely slower than vanilla DuckDB and a good bit more complicated.
  • 10TB to 1PB: smallpond begins to shine. You'd set up a cluster (see below) with several nodes, leveraging 3FS or another fast storage backend to achieve rapid parallel processing.
  • Over 1PB (Petabyte-Scale): smallpond and 3FS were explicitly designed to handle massive datasets. At this scale, you'd need to deploy a larger cluster with substantial infrastructure investments.

Deployment typically involves:

  1. Setting up a compute cluster (AWS EC2, Google Compute Engine, or on-prem).
  2. Deploying 3FS on nodes with high-performance SSDs and RDMA networking.
  3. Installing smallpond via Python to run distributed DuckDB tasks across your cluster.

Steps #1 and #3 are really easy. Step #2 is very hard. 3FS is new, so there's no guide on how you would set it up on AWS or any other cloud (maybe DeepSeek will offer this?). You could certainly deploy it on bare metal, but you'd be descending into a lower level of DevOps hell.

Note: if you're in the 95% of companies in the under 10TB bucket, you should really try Definite.

I experimented with running smallpond with S3 swapped in for 3FS here, but it's unclear what, if any, performance gains you'd get over scaling up a single node for moderate-sized data.

Is smallpond for me?

tl;dr: probably not.

Whether you'd want to use smallpond depends on several factors:

  • Your Data Scale: If your dataset is under 10TB, smallpond adds unnecessary complexity and overhead. For larger datasets, it provides substantial performance advantages.
  • Infrastructure Capability: smallpond and 3FS require significant infrastructure and DevOps expertise. Without a dedicated team experienced in cluster management, this could be challenging.
  • Analytical Complexity: smallpond excels at partition-level parallelism but is less optimized for complex joins. For workloads requiring intricate joins across partitions, performance might be limited.

How Smallpond Works (Under the Hood)

Lazy DAG Execution
Smallpond uses lazy evaluation for operations like map(), filter(), and partial_sql(). It doesn't run these immediately. Instead, it builds a logical execution plan as a directed acyclic graph (DAG), where each operation becomes a node (e.g., SqlEngineNode, HashPartitionNode, DataSourceNode).

Nothing actually happens until you trigger execution explicitly with actions like:

  • write_parquet() — Writes data to disk
  • to_pandas() — Converts results to a pandas DataFrame
  • compute() — Forces computation explicitly
  • count() — Counts rows
  • take() — Retrieves a subset of rows

This lazy evaluation is efficient because it avoids unnecessary computations and optimizes the workflow.

From Logical to Execution Plan
When you finally trigger an action, the logical plan becomes an execution plan made of specific tasks (e.g., SqlEngineTask, HashPartitionTask). These tasks are the actual work units distributed and executed by Ray.

Ray Core and Distribution
Smallpond’s distribution leverages Ray Core at the Python level, using partitions for scalability. Partitioning can be done manually, and Smallpond supports:

  • Hash partitioning (based on column values)
  • Even partitioning (by files or row counts)
  • Random shuffle partitioning

Each partition runs independently within its own Ray task, using DuckDB instances to process SQL queries. This tight integration with Ray emphasizes horizontal scaling (adding more nodes) rather than vertical scaling (larger, more powerful nodes). To use it at scale, you’ll need a Ray cluster. You can run one on your own infrastructure on a cloud provider (e.g. AWS), but if you just want to test this out, it'll be easier to get started with Anyscale (founded by Ray creators).

Conclusion

smallpond and 3FS offer powerful capabilities for scaling DuckDB analytics across large datasets. However, their complexity and infrastructure demands mean they're best suited for scenarios where simpler solutions no longer suffice. If you're managing massive datasets and already have robust DevOps support, smallpond and 3FS could significantly enhance your analytics capabilities. For simpler scenarios, sticking with a single-node DuckDB instance or using managed solutions remains your best option.

]]>
https://www.definite.app/blog/smallpond hacker-news-small-sites-43232410 Sun, 02 Mar 2025 17:00:30 GMT
<![CDATA[Why do we have both CSRF protection and CORS?]]> thread link) | @smagin
March 2, 2025 | https://smagin.fyi/posts/cross-site-requests/ | archive.org

Hello, Internet. I thought about cross-site requests and realised we have both CSRF protection and CORS and it doesn’t make sense from the first glance. It does generally, but I need a thousand words to make it so.

CSRF stands for Cross-Site Request Forgery. It was rather popular in the early days of the internet, but now it’s almost a non-issue thanks to standard prevention mechanisms built into most popular web frameworks. The forgery is to trick a user into clicking a form that will send a cross-site request. The protection is to check that the request didn’t come from a third-party site.

CORS stands for Cross-Origin Resource Sharing. It’s a part of the HTTP specification that describes how to permit certain cross-site requests. This includes preflight requests and response headers that state which origins are allowed to send requests.

So, by default, are cross-origin requests allowed and we need CSRF protection, or are they forbidden and we need CORS to allow them? The answer is both.

The default behaviour

The default behaviour is defined by the Same-origin policy and is enforced by browsers. The policy states that, generally speaking, cross-site writes are allowed and cross-site reads are not. You can send a POST request by submitting a form, but your browser won’t let you read the response.

There is a newer part of this spec that sort of solves CSRF. In 2019, there was an initiative to change the default cookie behaviour. Before that, cookies were always sent in cross-site requests. The default was changed to not send cookies in cross-site POST requests. To do that, a new SameSite attribute for the set-cookie header was introduced. The attribute value that keeps the old behaviour is None, and the new default is Lax.
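As a minimal illustration (the cookie name, value and tiny Node server are made up for the example), this is what spelling the attribute out explicitly looks like when setting a cookie:

import { createServer } from "node:http";

createServer((req, res) => {
  // SameSite=Lax: the cookie is sent on top-level navigations, but not on cross-site POSTs.
  // The old behaviour has to be requested explicitly with SameSite=None; Secure.
  res.setHeader("Set-Cookie", "sessionid=abc123; SameSite=Lax; HttpOnly; Secure");
  res.end("ok");
}).listen(3000);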

In 2025, 96% of browsers support the SameSite attribute, and 75% support the new default. Notably, Safari hasn’t adopted the new default, and UCBrowser doesn’t support any nice things.

Sidenote: I can’t understand how UCBrowser remains relatively popular among users, given that JS build tools have settings to target the top N% of users and next to nobody sets that to 99%.

Sidenote 2: Origin is not the same as Site. An origin is a combination of a scheme, a hostname, and a port. A site is a combination of the scheme and the effective top-level domain plus one label (eTLD+1). Subdomains and ports don’t matter for sites: for example, https://app.example.com:8443 and https://example.com are different origins but the same site.

Links: Same-origin policy | caniuse SameSite cookie attribute

CORS

CORS is a way to override the same origin policy per origin.

The spec describes a browser-server interaction: the browser sends a preflight request of type OPTIONS before the actual request, and the server replies with rules for that origin. The rules come in the form of response headers whose names start with Access-Control, and they may specify whether the response can be read, which headers can be sent and received, and which HTTP methods are allowed. The browser then follows the rules.
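As a rough sketch (the URLs are made up), here is a cross-origin request that triggers a preflight; the comments show the OPTIONS exchange the browser performs before letting the PATCH through:

// The browser first sends a preflight:
//   OPTIONS /widgets/1
//   Origin: https://app.example
//   Access-Control-Request-Method: PATCH
//   Access-Control-Request-Headers: content-type
// and only proceeds if the server answers with something like:
//   Access-Control-Allow-Origin: https://app.example
//   Access-Control-Allow-Methods: PATCH
//   Access-Control-Allow-Headers: Content-Type
async function renameWidget(): Promise<void> {
  const res = await fetch("https://api.example/widgets/1", {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name: "new name" }),
  });
  console.log(res.status);
}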

CORS applies for several types of the requests:

  • js-initiated fetch and XMLHttpRequest
  • web fonts
  • webgl textures
  • images/video frames drawn to a canvas using drawImage
  • css shapes from images

What is notoriously not in this list is form submissions, otherwise known as simple requests. This is part of the internet being backward-compatible:

The motivation is that the <form> element from HTML 4.0 (which predates cross-site fetch() and XMLHttpRequest) can submit simple requests to any origin, so anyone writing a server must already be protecting against cross-site request forgery (CSRF). Under this assumption, the server doesn’t have to opt-in (by responding to a preflight request) to receive any request that looks like a form submission, since the threat of CSRF is no worse than that of form submission. However, the server still must opt-in using Access-Control-Allow-Origin to share the response with the script.

From the CORS page on MDN.

Question to readers: How is that in line with the SameSite initiative?

CSRF protection

So, cross-site write requests are allowed, but responses won’t be shared. At the same time, as website developers, we mostly don’t want to allow that.

The standard protection is to include in a write request a user-specific token that is only available via a read, which a third-party site cannot perform:

  • for forms this token is put into a hidden input,
  • for js-initiated requests the token can be stored in a cookie or in a meta tag, and is put into params or request headers.

JS-initiated requests are not allowed cross-site by default anyway, but they are allowed same-site. Adding a CSRF token to JS requests lets us do the check the same way for all requests.
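A minimal sketch of the JS-initiated case (the meta tag and header names follow a common convention, for example Rails’ csrf-token / X-CSRF-Token, and may differ per framework):

async function createPost(): Promise<void> {
  // Read the per-user token that the server rendered into the page...
  const token =
    document.querySelector<HTMLMetaElement>('meta[name="csrf-token"]')?.content ?? "";

  // ...and send it back with the same-site write request so the server can verify it.
  await fetch("/posts", {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-CSRF-Token": token },
    body: JSON.stringify({ title: "hello" }),
  });
}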

This way we still depend on the browser to prevent cross-site reads of responses by default, but a bit less than if we were relying on something like the Origin request header instead of checking for the token.

Question to readers: In some of the frameworks CSRF tokens are rotated. Why?

Browser is important

I want to emphasise how important browsers are in this whole security scheme. All the client state for all the sites is stored in the browser, and the browser decides which parts to expose and when. It’s browsers that enforce the Same-origin policy; it’s browsers that don’t let responses be read when the server doesn’t allow it; it’s browsers that decide whether to adopt the new SameSite=Lax default; and it’s browsers that implement CORS and send safe preflight requests before an actual PATCH or DELETE.

We really have to trust browsers that we use.

Conclusion

What I learned

The internet will become more secure, and maybe a bit less backward-compatible, when the SameSite=Lax default is adopted by 100% of browsers. Until then, we will have to live with a situation where simple POST requests are special and allowed cross-site, while other requests fall into the CORS bucket.

Thanks Nikita Skazki for reviewing this post more times than I care to admit.

This post on Hackernews

Sources

  1. Same-origin policy
  2. caniuse SameSite cookie attribute
  3. OWASP CSRF cheatsheet
  4. CORS wiki with requirements
  5. CORS spec
  6. CORS on MDN
  7. Preflight request
  8. Origin request header
  9. Origin and Site
]]>
https://smagin.fyi/posts/cross-site-requests/ hacker-news-small-sites-43231411 Sun, 02 Mar 2025 15:32:46 GMT
<![CDATA[Elon Musk backs US withdrawal from NATO alliance]]> thread link) | @dtquad
March 2, 2025 | https://ukdefencejournal.org.uk/elon-musk-backs-us-withdrawal-from-nato-alliance/ | archive.org

Elon Musk, a key figure in President Donald Trump’s administration and head of the United States Department of Government Efficiency, has backed calls for the United States to leave the North Atlantic Treaty Organisation (NATO).

Musk voiced his support on X (formerly Twitter) on Saturday night when he responded “I agree” to a post stating, “It’s time to leave NATO and the UN.” His endorsement aligns with growing calls from some Republican lawmakers, including Senator Mike Lee, to reconsider the US commitment to the alliance.

Lee, a long-time critic of NATO, has described it as a “Cold War relic” and argued that the alliance “has to come to a halt.” He claims NATO is a “great deal for Europe” but a “raw deal for America”, suggesting that US resources are being stretched to protect Europe while offering little direct benefit to American security.

Musk’s comments come amid broader discussions within the Trump administration over the future of America’s role in NATO and international alliances.

While Trump has not explicitly stated his intent to withdraw from NATO, he has repeatedly pressured European nations to increase their defence spending, warning that the US should not bear the financial burden of the alliance alone.

As a key figure in the administration, Musk’s influence on Trump’s policy decisions is significant. His endorsement of a NATO withdrawal could signal growing momentum within the White House for a shift towards a more isolationist foreign policy, focusing on domestic defence priorities over international commitments.

With the war in Ukraine ongoing and NATO playing a critical role in supplying military aid, any US withdrawal would drastically reshape the global security landscape. European leaders have already expressed concerns over Trump’s stance on NATO, particularly as the alliance works to counter Russian aggression and maintain stability in Eastern Europe.

Despite Musk and Lee’s calls for withdrawal, Trump has continued to engage with NATO leaders, recently hosting UK Prime Minister Keir Starmer in Washington for discussions on European security. However, with Trump’s administration pushing for major shifts in US foreign policy, NATO’s future role in American defence strategy remains uncertain.

]]>
https://ukdefencejournal.org.uk/elon-musk-backs-us-withdrawal-from-nato-alliance/ hacker-news-small-sites-43230324 Sun, 02 Mar 2025 13:31:26 GMT
<![CDATA[Crossing the uncanny valley of conversational voice]]> thread link) | @monroewalker
March 1, 2025 | https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice | archive.org

February 27, 2025

Brendan Iribe, Ankit Kumar, and the Sesame team

How do we know when someone truly understands us? It is rarely just our words—it is in the subtleties of voice: the rising excitement, the thoughtful pause, the warm reassurance.

Voice is our most intimate medium as humans, carrying layers of meaning through countless variations in tone, pitch, rhythm, and emotion.

Today’s digital voice assistants lack essential qualities to make them truly useful. Without unlocking the full power of voice, they cannot hope to effectively collaborate with us. A personal assistant who speaks only in a neutral tone has difficulty finding a permanent place in our daily lives after the initial novelty wears off.

Over time this emotional flatness becomes more than just disappointing—it becomes exhausting.

Achieving voice presence

At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued. We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding.

Key components

  • Emotional intelligence: reading and responding to emotional contexts.
  • Conversational dynamics: natural timing, pauses, interruptions and emphasis.
  • Contextual awareness: adjusting tone and style to match the situation.
  • Consistent personality: maintaining a coherent, reliable and appropriate presence.

We’re not there yet

Building a digital companion with voice presence is not easy, but we are making steady progress on multiple fronts, including personality, memory, expressivity and appropriateness. This demo is a showcase of some of our work in conversational speech generation. The companions shown here have been optimized for friendliness and expressivity to illustrate the potential of our approach.

Conversational voice demo

  1. Microphone permission is required.
  2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days.
  3. By using this demo, you are agreeing to our Terms of Use and Privacy Policy.
  4. We recommend using Chrome (audio quality may be degraded in iOS/Safari 17.5).

Technical post

Authors

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang

To create AI companions that feel genuinely interactive, speech generation must go beyond producing high-quality audio—it must understand and adapt to context in real time. Traditional text-to-speech (TTS) models generate spoken output directly from text but lack the contextual awareness needed for natural conversations. Even though recent models produce highly human-like speech, they struggle with the one-to-many problem: there are countless valid ways to speak a sentence, but only some fit a given setting. Without additional context—including tone, rhythm, and history of the conversation—models lack the information to choose the best option. Capturing these nuances requires reasoning across multiple aspects of language and prosody.

To address this, we introduce the Conversational Speech Model (CSM), which frames the problem as an end-to-end multimodal learning task using transformers. It leverages the history of the conversation to produce more natural and coherent speech. There are two key takeaways from our work. The first is that CSM operates as a single-stage model, thereby improving efficiency and expressivity. The second is our evaluation suite, which is necessary for evaluating progress on contextual capabilities and addresses the fact that common public evaluations are saturated.

Background

One approach to modeling audio with transformers is to convert continuous waveforms into discrete audio token sequences using tokenizers. Most contemporary approaches ([1], [2]) rely on two types of audio tokens:

  1. Semantic tokens: Compact speaker-invariant representations of semantic and phonetic features. Their compressed nature enables them to capture key speech characteristics at the cost of high-fidelity representation.
  2. Acoustic tokens: Encodings of fine-grained acoustic details that enable high-fidelity audio reconstruction. These tokens are often generated using Residual Vector Quantization (RVQ) [2]. In contrast to semantic tokens, acoustic tokens retain natural speech characteristics like speaker-specific identity and timbre.

A common strategy first models semantic tokens and then generates audio using RVQ or diffusion-based methods. Decoupling these steps allows for a more structured approach to speech synthesis—the semantic tokens provide a compact, speaker-invariant representation that captures high-level linguistic and prosodic information, while the second-stage reconstructs the fine-grained acoustic details needed for high-fidelity speech. However, this approach has a critical limitation; semantic tokens are a bottleneck that must fully capture prosody, but ensuring this during training is challenging.

RVQ-based methods introduce their own set of challenges. Models must account for the sequential dependency between codebooks in a frame. One method, the delay pattern (figure below) [3], shifts higher codebooks progressively to condition predictions on lower codebooks within the same frame. A key limitation of this approach is that the time-to-first-audio scales poorly because an RVQ tokenizer with N codebooks requires N backbone steps before decoding the first audio chunk. While suitable for offline applications like audiobooks, this delay is problematic in a real-time scenario.

Example of delayed pattern generation in an RVQ tokenizer with 4 codebooks

Conversational Speech Model

CSM is a multimodal, text and speech model that operates directly on RVQ tokens. Inspired by the RQ-Transformer [4], we use two autoregressive transformers. Different from the approach in [5], we split the transformers at the zeroth codebook. The first multimodal backbone processes interleaved text and audio to model the zeroth codebook. The second audio decoder uses a distinct linear head for each codebook and models the remaining N – 1 codebooks to reconstruct speech from the backbone’s representations. The decoder is significantly smaller than the backbone, enabling low-latency generation while keeping the model end-to-end.

CSM model inference process. Text (T) and audio (A) tokens are interleaved and fed sequentially into the Backbone, which predicts the zeroth level of the codebook. The Decoder then samples levels 1 through N – 1 conditioned on the predicted zeroth level. The reconstructed audio token (A) is then autoregressively fed back into the Backbone for the next step, continuing until the audio EOT symbol is emitted. This process begins again on the next inference request, with the interim audio (such as a user utterance) being represented by interleaved audio and text transcription tokens.

Both transformers are variants of the Llama architecture. Text tokens are generated via a Llama tokenizer [6], while audio is processed using Mimi, a split-RVQ tokenizer, producing one semantic codebook and N – 1 acoustic codebooks per frame at 12.5 Hz. [5] Training samples are structured as alternating interleaved patterns of text and audio, with speaker identity encoded directly in the text representation.

Compute amortization

This design introduces significant infrastructure challenges during training. The audio decoder processes an effective batch size of B × S and N codebooks autoregressively, where B is the original batch size, S is the sequence length, and N is the number of RVQ codebook levels. This high memory burden even with a small model slows down training, limits model scaling, and hinders rapid experimentation, all of which are crucial for performance.

To address these challenges, we use a compute amortization scheme that alleviates the memory bottleneck while preserving the fidelity of the full RVQ codebooks. The audio decoder is trained on only a random 1/16 subset of the audio frames, while the zeroth codebook is trained on every frame. We observe no perceivable difference in audio decoder losses during training when using this approach.

Amortized training process. The backbone transformer models the zeroth level across all frames (highlighted in blue), while the decoder predicts the remaining N – 1 levels, but only for a random 1/16th of the frames (highlighted in green). The top section highlights the specific frames modeled by the decoder for which it receives loss.

Experiments

Dataset: We use a large dataset of publicly available audio, which we transcribe, diarize, and segment. After filtering, the dataset consists of approximately one million hours of predominantly English audio.

Model Sizes: We trained three model sizes, delineated by the backbone and decoder sizes:

  • Tiny: 1B backbone, 100M decoder
  • Small: 3B backbone, 250M decoder
  • Medium: 8B backbone, 300M decoder

Each model was trained with a 2048 sequence length (~2 minutes of audio) over five epochs.

Samples

The original post embeds audio samples for the following categories: paralinguistics and foreign words (sentences from Base TTS), contextual expressivity (samples from Expresso, continuation after a chime), pronunciation correction (the correction sentence is a recording, all other audio is generated), and conversations with multiple speakers (a single generation using audio prompts from two speakers).

Evaluation

Our evaluation suite measures model performance across four key aspects: faithfulness to text, context utilization, prosody, and latency. We report both objective and subjective metrics—objective benchmarks include word error rate and novel tests like homograph disambiguation, while subjective evaluation relies on a Comparative Mean Opinion Score (CMOS) human study using the Expresso dataset.

Objective metrics

Traditional benchmarks, such as word error rate (WER) and speaker similarity (SIM), have become saturated—modern models, including CSM, now achieve near-human performance on these metrics.

Objective metric results for Word Error Rate (top) and Speaker Similarity (bottom) tests, showing the metrics are saturated (matching human performance).

To better assess pronunciation and contextual understanding, we introduce a new set of phonetic transcription-based benchmarks.

  • Text understanding through Homograph Disambiguation: Evaluates whether the model correctly pronounced different words with the same orthography (e.g., “lead” /lɛd/ as in “metal” vs. “lead” /liːd/ as in “to guide”).
  • Audio understanding through Pronunciation Continuation Consistency: Evaluates whether the model maintains pronunciation consistency of a specific word with multiple pronunciation variants in multi-turn speech. One example is “route” (/raʊt/ or /ruːt/), which can vary based on region of the speaker and context.

Objective metric results for Homograph Disambiguation (left) and Pronunciation Consistency (right) tests, showing the accuracy percentage for each model’s correct pronunciation. Play.ht, Elevenlabs, and OpenAI generations were made with default settings and voices from their respective API documentation.

The graph above compares objective metric results across three model sizes. For Homograph accuracy we generated 200 speech samples covering 5 distinct homographs—lead, bass, tear, wound, row—with 2 variants for each and evaluated pronunciation consistency using wav2vec2-lv-60-espeak-cv-ft. For Pronunciation Consistency we generated 200 speech samples covering 10 distinct words that have common pronunciation variants—aunt, data, envelope, mobile, route, vase, either, adult, often, caramel.

In general, we observe that performance improves with larger models, supporting our hypothesis that scaling enhances the synthesis of more realistic speech.

Subjective metrics

We conducted two Comparative Mean Opinion Score (CMOS) studies using the Expresso dataset to assess the naturalness and prosodic appropriateness of generated speech for CSM-Medium. Human evaluators were presented with pairs of audio samples—one generated by the model and the other a ground-truth human recording. Listeners rated the generated sample on a 7-point preference scale relative to the reference. Expresso’s diverse expressive TTS samples, including emotional and prosodic variations, make it a strong benchmark for evaluating appropriateness to context.

In the first CMOS study we presented the generated and human audio samples with no context and asked listeners to “choose which rendition feels more like human speech.” In the second CMOS study we also provide the previous 90 seconds of audio and text context, and ask the listeners to “choose which rendition feels like a more appropriate continuation of the conversation.” Eighty people were paid to participate in the evaluation and rated on average 15 examples each.

Subjective evaluation results on the Expresso dataset. No context: listeners chose “which rendition feels more like human speech” without knowledge of the context. Context: listeners chose “which rendition feels like a more appropriate continuation of the conversation” with audio and text context. 50:50 win–loss ratio suggests that listeners have no clear preference.

The graph above shows the win-rate of ground-truth human recordings vs CSM-generated speech samples for both studies. Without conversational context (top), human evaluators show no clear preference between generated and real speech, suggesting that naturalness is saturated. However, when context is included (bottom), evaluators consistently favor the original recordings. These findings suggest a noticeable gap remains between generated and human prosody in conversational speech generation.

Open-sourcing our work

We believe that advancing conversational AI should be a collaborative effort. To that end, we’re committed to open-sourcing key components of our research, enabling the community to experiment, build upon, and improve our approach. Our models will be available under an Apache 2.0 license.

Limitations and future work

CSM is currently trained on primarily English data; some multilingual ability emerges due to dataset contamination, but it does not perform well yet. It also does not take advantage of the information present in the weights of pre-trained language models.

In the coming months, we intend to scale up model size, increase dataset volume, and expand language support to over 20 languages. We also plan to explore ways to utilize pre-trained language models, working towards large multimodal models that have deep knowledge of both speech and text.

Ultimately, while CSM generates high quality conversational prosody, it can only model the text and speech content in a conversation—not the structure of the conversation itself. Human conversations are a complex process involving turn taking, pauses, pacing, and more. We believe the future of AI conversations lies in fully duplex models that can implicitly learn these dynamics from data. These models will require fundamental changes across the stack, from data curation to post-training methodologies, and we’re excited to push in these directions.

Join us

If you’re excited about building the most natural, delightful, and inspirational voice interfaces out there, reach out—we’re hiring. Check our open roles.

]]>
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice hacker-news-small-sites-43227881 Sun, 02 Mar 2025 06:13:01 GMT
<![CDATA[Knowing CSS is mastery to front end development]]> thread link) | @tipiirai
March 1, 2025 | https://helloanselm.com/writings/knowing-css-is-mastery-to-frontend-development | archive.org

There are countless articles about why developers should not focus on frameworks too much and should instead learn to understand the underlying languages. But rarely do they give good reasons beyond the fact that frameworks come and go. To me, the main reason is different: you won’t be a master at frontend development if you don’t understand the underlying mechanisms of a language.

A usual stack today is React together with countless layers between the language and the framework itself. CSS is not written natively but through JavaScript tooling that translates it into native CSS. For JavaScript we nowadays write an opinionated framework-language mix using TypeScript, which itself is translated back to native JavaScript in the end. And while we all know the comfort of these tools and languages, many things become easier if you understand a browser’s ecosystem:

  • Debug JavaScript errors easier and also in foreign environments without a debugging browser extension installed
  • Debug CSS
  • Write custom CSS (and every project I’ve seen so far needs it somewhere)
  • Understand why errors occur that you may not find locally and only in client’s browsers

In the past years I have had various situations where TypeScript developers (as they called themselves) approached me and asked whether I could help them out with CSS. I expected to solve a complex problem, but for me, knowing CSS very well, it was always a simple, straightforward solution or code snippet:

  • A multi-colored footer bar should not be an image, it’s a simple CSS background multi-step gradient with one line of code. No need to scale an image, create an SVG, just CSS.
  • Custom icons for an input field? Welp, it’s not that easy for privacy reasons to add a pseudo-class here in certain cases. But there are many simple solutions and no need to include another bloated npm dependency that nobody understands what it does.
  • Webfonts: Dev: We can’t add another webfont style, we already serve 4MB of webfonts.
    → Me: Alright, why don’t we serve it as Variable Font?
    → Dev: Oh, what’s this?
    → Check it out, we now load 218kb async, only one file and have all our styles we have and will ever need inside.

Nowadays people can write great React and TypeScript code. Most of the time a component library like MUI, Tailwind and others are used for styling. However, nearly no one is able to judge whether the CSS in the codebase is good or far from optimal. It is magically applied by our toolchain into the HTML and we struggle to understand why the website is getting slower and slower.

Most of the performance basics I learned ten years ago are still the most relevant ones today. Yet, most developers don’t know about them because we use create-react-web-app or similar things. Put Cloudflare on top to boost performance and reduce costs. Yes, that works for your website and little project.

What companies expect when they ask for a web dashboard serving real time data for their customers is different: It should be a robust, well working application that is easy to maintain. That means we need to combine the developer experience (React, TypeScript, all the little helpers) with the knowledge of how browsers and networks work. And only then we can boost performance, write accessible code, load dynamic data in a proper and safe way and provide fallbacks in case something goes wrong.

In an emergency, like an incident with the service, I’ve seen the difference often enough between people who know exactly where to look, start debugging and dig deeper, and those who panic trying to figure out what’s going on, hoping that a restart or a re-deployment with reinstalled dependencies will bring the service back to life.

And that means in the end again: If you know CSS, you also know the style framework. If you understand JavaScript, TypeScript is not a big problem for you. And that makes you a Senior or Principal.

]]>
https://helloanselm.com/writings/knowing-css-is-mastery-to-frontend-development hacker-news-small-sites-43227303 Sun, 02 Mar 2025 04:32:06 GMT
<![CDATA[Learning C# and .NET after two decades of programming]]> thread link) | @Kerrick
March 1, 2025 | https://kerrick.blog/articles/2025/why-i-am-learning-c-sharp-and-dot-net-after-two-decades-of-programming/ | archive.org

A photo of a net

I’ve been programming for over two decades, and I can’t make a full-stack enterprise web application.


The first lines of code I wrote were in GW-BASIC. When I was in eighth grade, I enrolled in a typing class. Students who finished their typing practice before class ended were given an extra credit opportunity: copying program source code. It was a fantastic test of applied accuracy, and I gladly participated. Eventually I started to pick up on some of the patterns I saw in those BASIC programs. I came up with my own programs—mad libs and simple calculators—and fell in love. I still couldn’t make a web site.

In high school, the library had a book about HTML. I made my first web pages, and my math teacher helped me put them online. I got a job bagging groceries to pay for a laptop, and used that laptop to develop simple web sites for local businesses. These were the first times I was ever paid to write code, and I was hooked. I still couldn’t make a rich web site.

When I got to college I learned JavaScript from another book, and CSS from blog posts and documentation web sites. Before I left college I took a job with the Web Design & Support department, implementing a major redesign of the school’s entire web site in HTML and CSS, with a splash of jQuery for interactivity. I still couldn’t make a web application.

After I left college I scraped together a meager living making Chrome extensions, writing Ruby for freelance clients, and working part-time at Best Buy. I still couldn’t make an enterprise web application.

By 2013 I had my first career job as a front-end developer at an enterprise Software as a Service business. Thanks to EmberJS, an amazing product team, a top-notch architect, and leadership that understood lean software, I built the front-end of our new platform that, over the next seven years, would become so successful that I’d take on brilliant apprentices, build a team, grow to Engineering Manager, and become Director of Software Engineering. But I still couldn’t make a full-stack enterprise web application.

When that company got acquired, I laid off half of my team and lost a part of myself. I could no longer stomach working in management, so I left. I had my mid-life crisis: I moved to the country, bought a farm, went back to college online, and tried to create a startup. I realized I was drifting, and that what I wanted was a steady stream of programming work on a great team. I found exactly that, thanks to the CTO of my previous employer. I am now responsible for improving and maintaining an enterprise Angular application powered by a C# / .NET back-end. It’s a bit rough around the edges, but I tidy as I go. I’m the only purely-front-end programmer on a team of twelve. I ship features our customers love, I help the team improve our processes, and I improve the existing legacy Angular application. But I still can’t make a full-stack enterprise web application.


Last quarter, I learned that our next front-end will use Blazor, not Angular. This means it will use C#, not TypeScript. This quarter, my manager gave the gift of time. Every hour I’m not fixing urgent bugs or implementing important features, he encouraged me to spend learning C#, .NET, and Blazor. The company paid for an O’Reilly Learning Platform subscription, and I’ve collected a list of books to study at work. I’ll still spend my nights and weekends improving at my craft, but instead of learning Ruby on Rails, I’ll be reading generally-applicable books: Patterns of Enterprise Application Architecture, Domain-Driven Design, Working Effectively with Legacy Code, Object-Oriented Analysis & Design with Applications, Data Modeling Essentials, and Designing Data-Intensive Applications.

I’ll blog and toot about what I learn as I go, and I hope you’ll join me. I’m learning C# and .NET, but starting from two decades of programming experience and a decade of software engineering experience. I’m learning web development, but starting from a deep knowledge of HTTP, browsers, and the front-end. I’m learning architecture and object-orientation, but starting from a background in structured and functional programming.

The only thing I love more than learning is my wife. I can’t wait for this learning journey, and I’m excited to share what I learn. Subscribe to my email list and perhaps you’ll learn something too.

]]>
https://kerrick.blog/articles/2025/why-i-am-learning-c-sharp-and-dot-net-after-two-decades-of-programming/ hacker-news-small-sites-43226462 Sun, 02 Mar 2025 02:15:18 GMT
<![CDATA[Mozilla site down due to "overdue hosting payments" [fixed]]]> thread link) | @motownphilly
March 1, 2025 | https://linuxmom.net/@vkc/114089626244932902 | archive.org

Unable to extract article]]>
https://linuxmom.net/@vkc/114089626244932902 hacker-news-small-sites-43226089 Sun, 02 Mar 2025 01:20:17 GMT
<![CDATA[I'm done with coding]]> thread link) | @neelc
March 1, 2025 | https://www.neelc.org/2025/03/01/im-done-with-coding/ | archive.org

In my high school days, I was a huge server and networking person. My homelab was basically my identity, and not even a good one: consumer-level networking gear running Tomato and a then-7-year-old homebuilt desktop PC running FreeBSD.

Then I joined NYU’s Tandon School of Engineering for Computer Science. It was a full 180 into software engineering. I didn’t just code for assignments, I started with toy projects and went to major Tor contributions writing very complex patches, had two internships and ultimately a job at Microsoft.

Primarily due to “Big Data” experience at NYU CUSP, Microsoft placed me on the Viva Insights team. I’ve always hated the product, feeling it was unnecessary surveillance. I wanted out.

In fact, my disdain for Viva Insights was big enough to make me lose my passion for coding and slip into obsessive browsing and shopping, because facing the music of working on a surveillance product would have bothered me even more. Open source work outside of package maintenance went to zero.

I’ve tried to discuss this with my mom, and she kept telling me how “lucky” I am for working at Microsoft saying “it’s big tech” and “you’re neurodivergent” and “you won’t survive at a smaller company.” She even bought into the marketing material telling me how it’s “not surveillance.”

I’ve decided that in the shitty job market, it’s not worth being a software engineer even if I make much less. Part of it is being “specialized” in over-glorified surveillance so even if I change employers, what’s the guarantee I won’t be working on another surveillance product. Assuming I can even get another job.

In fact, I’ll just live off dividend income and try to get my new IT startup Fourplex off the ground. Sure, I won’t be able to buy shiny homelab equipment as often as I did in the past, but I at least have the guarantee I’m not working on an unethical product.

While six figures is certainly nice, it’s only nice if it’s ethically done. I’d much rather flip burgers or bag groceries than work on surveillance for six figures. After all, Edward Snowden had a “stable” federal government job (not so stable now thanks to “DOGE”) and he gave it up to stand up for the right to privacy.

And I care more for my values than the name or salary. It’s not like I use Windows at home, I haven’t since 2012. I kept self-hosting email despite having worked at Microsoft 365 and still do even now. And I sacrificed job performance for my values of strong privacy.

Little did I know that my father (who was previously a big Big Data and AI advocate) would come to hate Viva Insights. He says it’s “bullshit” and nobody uses it. Even when I worked at Microsoft I never used it. Not even once. It’s bloatware. Microsoft is 100% better off porting Office apps to Linux (despite me using a Mac now) or beefing up cybersecurity.

]]>
https://www.neelc.org/2025/03/01/im-done-with-coding/ hacker-news-small-sites-43225901 Sun, 02 Mar 2025 00:49:27 GMT
<![CDATA[Norwegian fuel supplier refuses U.S. warships over Ukraine]]> thread link) | @hjjkjhkj
March 1, 2025 | https://ukdefencejournal.org.uk/norwegian-fuel-supplier-refuses-u-s-warships-over-ukraine/ | archive.org

Norwegian fuel company Haltbakk Bunkers has announced it will cease supplying fuel to U.S. military forces in Norway and American ships docking in Norwegian ports, citing dissatisfaction with recent U.S. policy towards Ukraine.

In a strongly worded statement, the company criticised a televised event involving U.S. President Donald Trump and Vice President J.D. Vance, referring to it as the “biggest shitshow ever presented live on TV.”

Haltbakk Bunkers praised Ukrainian President Volodymyr Zelensky for his restraint, accusing the U.S. of “putting on a backstabbing TV show” and declaring that the spectacle “made us sick.”

As a result, the company stated: “We have decided to immediately STOP as fuel provider to American forces in Norway and their ships calling Norwegian ports. No Fuel to Americans!” Haltbakk Bunkers also urged Norwegians and Europeans to follow their lead, concluding their statement with the slogan “Slava Ukraina” in support of Ukraine.

Who is Haltbakk Bunkers?

Haltbakk Bunkers is a Norwegian fuel supplier that provides marine fuel for shipping and military operations. Based in Kristiansund, Norway, the company specialises in bunkering services for vessels operating in Norwegian waters, offering fuel logistics and distribution for both civilian and military customers.

Haltbakk Bunkers plays a significant role in Norway’s maritime industry, supplying fuel to vessels calling at Norwegian ports, including NATO and allied forces.

The decision to cut off the U.S. military could have logistical implications for American naval operations in the region. Norway is a key NATO member and frequently hosts U.S. and allied forces for joint exercises and Arctic defence operations.

This announcement raises questions about the broader European stance on U.S. policy towards Ukraine and whether other businesses or governments might take similar actions. It also highlights how private companies in Europe are responding independently to geopolitical developments.

The U.S. has not yet responded to the decision, and it remains to be seen whether this will affect fuel supply chains for American forces operating in Norway and the North Atlantic region.


]]>
https://ukdefencejournal.org.uk/norwegian-fuel-supplier-refuses-u-s-warships-over-ukraine/ hacker-news-small-sites-43223872 Sat, 01 Mar 2025 21:29:36 GMT
<![CDATA[Abusing C to implement JSON parsing with struct methods]]> thread link) | @ingve
March 1, 2025 | https://xnacly.me/posts/2025/json-parser-in-c-with-methods/ | archive.org

Idea

  1. Build a JSON parser in C
  2. Instead of standalone functions, attach functions to a struct and use them as methods
  3. Make it free of the usual C issues (segfaults, leaks, stack overflows, etc…)
  4. Provide an ergonomic API

Usage

C

#include "json.h"
#include <stdlib.h>

int main(void) {
  struct json json = json_new(JSON({
    "object" : {},
    "array" : [[]],
    "atoms" : [ "string", 0.1, true, false, null ]
  }));
  struct json_value json_value = json.parse(&json);
  json_print_value(&json_value);
  puts("");
  json_free_value(&json_value);
  return EXIT_SUCCESS;
}

Tip - Compiling C projects the easy way

Don’t take this as a guide for using make, in my projects I just use it as a command runner.

Compiler flags

These flags can be specific to gcc; I use gcc (GCC) 14.2.1 20250207, so take this with a grain of salt.

I use these flags in almost every C project I ever started.

SH

gcc -std=c23 \
	-O2 \
	-Wall \
	-Wextra \
	-Werror \
	-fdiagnostics-color=always \
	-fsanitize=address,undefined \
	-fno-common \
	-Winit-self \
	-Wfloat-equal \
	-Wundef \
	-Wshadow \
	-Wpointer-arith \
	-Wcast-align \
	-Wstrict-prototypes \
	-Wstrict-overflow=5 \
	-Wwrite-strings \
	-Waggregate-return \
	-Wswitch-default \
	-Wno-discarded-qualifiers \
	-Wno-aggregate-return \
	main.c
  • -std=c23: set language standard, I use ISO C23
  • -O2: optimize more than -O1
  • -Wall: enable a list of warnings
  • -Wextra: enable more warnings than -Wall
  • -Werror: convert all warnings to errors
  • -fdiagnostics-color=always: use color in diagnostics
  • -fsanitize=address,undefined: enable AddressSanitizer and UndefinedBehaviorSanitizer
  • -fno-common: place uninitialized global variables in the BSS section
  • -Winit-self: warn about uninitialized variables
  • -Wfloat-equal: warn if floating-point values are used in equality comparisons
  • -Wundef: warn if an undefined identifier is evaluated
  • -Wshadow: warn whenever a local variable or type declaration shadows another variable, parameter, or type
  • -Wpointer-arith: warn about anything that depends on the “size of” a function type or of void
  • -Wcast-align: warn whenever a pointer is cast such that the required alignment of the target is increased
  • -Wstrict-prototypes: warn if a function is declared or defined without specifying the argument types
  • -Wstrict-overflow=5: warn about cases where the compiler optimizes based on the assumption that signed overflow does not occur
  • -Wwrite-strings: give string constants the type const char[length], warns on copy into non-const char*
  • -Wswitch-default: warn whenever a switch statement does not have a default case
  • -Wno-discarded-qualifiers: do not warn if type qualifiers on pointers are being discarded
  • -Wno-aggregate-return: do not warn if any functions that return structures or unions are defined or called

Sourcing source files

I generally keep my header and source files in the same directory as the makefile, so I use find to find them:

SHELL

$(shell find . -name "*.c")

Make and Makefiles

I don’t define the build target as .PHONY because I generally never have a build directory.

Putting it all together as a makefile:

MAKE

CFLAGS := -std=c23 \
	-O2 \
	-Wall \
	-Wextra \
	-Werror \
	-fdiagnostics-color=always \
	-fsanitize=address,undefined \
	-fno-common \
	-Winit-self \
	-Wfloat-equal \
	-Wundef \
	-Wshadow \
	-Wpointer-arith \
	-Wcast-align \
	-Wstrict-prototypes \
	-Wstrict-overflow=5 \
	-Wwrite-strings \
	-Waggregate-return \
	-Wcast-qual \
	-Wswitch-default \
	-Wno-discarded-qualifiers \
	-Wno-aggregate-return

FILES := $(shell find . -name "*.c")

build:
	$(CC) $(CFLAGS) $(FILES) -o jsoninc

Variadic macros to write inline raw JSON

This doesn’t really deserve its own section, but I use #<expression> to stringify C expressions in conjunction with __VA_ARGS__:

C

#define JSON(...) #__VA_ARGS__

To enable:

C

char *raw_json = JSON({ "array" : [ [], {}] });

Inlines to:

C

char *raw_json = "{ \"array\" : [ [], {}] }";

Representing JSON values in memory

I need a structure to hold a parsed JSON value, its type and its contents.

Types of JSON values

A JSON value can be one of:

  1. null
  2. true
  3. false
  4. number
  5. string
  6. array
  7. object

In C I use an enum to represent this:

C

// json.h
enum json_type {
  json_number,
  json_string,
  json_boolean,
  json_null,
  json_object,
  json_array,
};

extern char *json_type_map[];

And I use json_type_map to map all json_type values to their char* representation:

C

char *json_type_map[] = {
    [json_number] = "json_number",   [json_string] = "json_string",
    [json_boolean] = "json_boolean", [json_null] = "json_null",
    [json_object] = "json_object",   [json_array] = "json_array",
};

json_value & unions for atoms, array elements or object values and object keys

The json_value struct holds the type defined above; a union sharing memory space for either a boolean, a string, or a number; a list of json_value structures acting as array children or object values; a list of strings that are object keys; and the length for the three aforementioned fields.

C

struct json_value {
  enum json_type type;
  union {
    bool boolean;
    char *string;
    double number;
  } value;
  struct json_value *values;
  char **object_keys;
  size_t length;
};

Tearing values down

Since some of the fields in json_value are heap allocated, we have to destroy / free the structure upon either no longer using it or exiting the process. json_free_value does exactly this:

C

void json_free_value(struct json_value *json_value) {
  switch (json_value->type) {
  case json_string:
    free(json_value->value.string);
    break;
  case json_object:
    for (size_t i = 0; i < json_value->length; i++) {
      free(json_value->object_keys[i]);
      json_free_value(&json_value->values[i]);
    }
    if (json_value->object_keys != NULL) {
      free(json_value->object_keys);
      json_value->object_keys = NULL;
    }
    if (json_value->values != NULL) {
      free(json_value->values);
      json_value->values = NULL;
    }
    break;
  case json_array:
    for (size_t i = 0; i < json_value->length; i++) {
      json_free_value(&json_value->values[i]);
    }
    if (json_value->values != NULL) {
      free(json_value->values);
      json_value->values = NULL;
    }
    break;
  case json_number:
  case json_boolean:
  case json_null:
  default:
    break;
  }
  json_value->type = json_null;
}

As simple as that, we ignore stack allocated JSON value variants, such as json_number, json_boolean and json_null, while freeing allocated memory space for json_string, each json_array child and json_object keys and values.

Printing json_values

A memory representation alone, with no way to inspect it, is of little value to us, so I dumped json_print_value into main.c:

C

void json_print_value(struct json_value *json_value) {
  switch (json_value->type) {
  case json_null:
    printf("null");
    break;
  case json_number:
    printf("%f", json_value->value.number);
    break;
  case json_string:
    printf("\"%s\"", json_value->value.string);
    break;
  case json_boolean:
    printf(json_value->value.boolean ? "true" : "false");
    break;
  case json_object:
    printf("{");
    for (size_t i = 0; i < json_value->length; i++) {
      printf("\"%s\": ", json_value->object_keys[i]);
      json_print_value(&json_value->values[i]);
      if (i < json_value->length - 1) {
        printf(", ");
      }
    }
    printf("}");
    break;
  case json_array:
    printf("[");
    for (size_t i = 0; i < json_value->length; i++) {
      json_print_value(&json_value->values[i]);
      if (i < json_value->length - 1) {
        printf(", ");
      }
    }
    printf("]");
    break;
  default:
    ASSERT(0, "Unimplemented json_value case");
    break;
  }
}

Calling this function:

C

int main(void) {
  struct json_value json_value = {
      .type = json_array,
      .length = 4,
      .values =
          (struct json_value[]){
              (struct json_value){.type = json_string, .value.string = "hi"},
              (struct json_value){.type = json_number, .value.number = 161},
              (struct json_value){
                  .type = json_object,
                  .length = 1,
                  .object_keys =
                      (char *[]){
                          "key",
                      },
                  .values =
                      (struct json_value[]){
                          (struct json_value){.type = json_string,
                                              .value.string = "value"},
                      },
              },
              (struct json_value){.type = json_null},
          },
  };
  json_print_value(&json_value);
  puts("");
  return EXIT_SUCCESS;
}

Results in:

TEXT

1["hi", 161.000000, {"key": "value"}, null]

json Parser struct, Function pointers and how to use them (they suck)

As contrary as it sounds, one can attach functions to structures in C very easily: just define a field of a struct as a function pointer, assign a function to it, and you have a method, as you would in Go or Rust.

C

struct json {
  char *input;
  size_t pos;
  size_t length;
  char (*cur)(struct json *json);
  bool (*is_eof)(struct json *json);
  void (*advance)(struct json *json);
  struct json_value (*atom)(struct json *json);
  struct json_value (*array)(struct json *json);
  struct json_value (*object)(struct json *json);
  struct json_value (*parse)(struct json *json);
};

Of course you have to define a function the C way (<return type> <name>(<list of params>);) and assign it to your method field, but it is not that complicated:

C

struct json json_new(char *input) {
  ASSERT(input != NULL, "corrupted input");
  struct json j = (struct json){
      .input = input,
      .length = strlen(input) - 1,
  };

  j.cur = cur;
  j.is_eof = is_eof;
  j.advance = advance;
  j.parse = parse;
  j.object = object;
  j.array = array;
  j.atom = atom;

  return j;
}

cur, is_eof and advance are small helper functions:

C

static char cur(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  return json->is_eof(json) ? -1 : json->input[json->pos];
}

static bool is_eof(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  return json->pos > json->length;
}

static void advance(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  json->pos++;
  skip_whitespace(json);
}

ASSERT is a simple assertion macro:

C

#define ASSERT(EXP, context)                                                   \
  if (!(EXP)) {                                                                \
    fprintf(stderr,                                                            \
            "jsoninc: ASSERT(" #EXP "): `" context                             \
            "` failed at %s, line %d\n",                                       \
            __FILE__, __LINE__);                                               \
    exit(EXIT_FAILURE);                                                        \
  }

Failing for instance if the argument to the json_new function is a null pointer:

C

int main(void) {
  struct json json = json_new(NULL);
  return EXIT_SUCCESS;
}

Even with a descriptive error message:

TEXT

jsoninc: ASSERT(input != NULL): `corrupted input` failed at ./json.c, line 16

Parsing JSON with methods

Since we now have the whole setup out of the way, we can start with the crux of the project: parsing JSON. Normally I would have written a separate lexer and parser, but for the sake of simplicity I combined both passes into a single parser.

Ignoring Whitespace

As far as we are concerned, whitespace in JSON carries no meaning - so we just use the skip_whitespace function to ignore any and all whitespace:

C

static void skip_whitespace(struct json *json) {
  while (!json->is_eof(json) &&
         (json->cur(json) == ' ' || json->cur(json) == '\t' ||
          json->cur(json) == '\n')) {
    json->pos++;
  }
}

Parsing Atoms

Since JSON has five kinds of atoms, we need to parse them into our json_value struct using the json->atom method:

C

static struct json_value atom(struct json *json) {
    ASSERT(json != NULL, "corrupted internal state");

    skip_whitespace(json);

    char cc = json->cur(json);
    if ((cc >= '0' && cc <= '9') || cc == '.' || cc == '-') {
        return number(json);
    }

    switch (cc) {
        // ... all of the atoms ...
    default:
        printf("unknown character '%c' at pos %zu\n", json->cur(json), json->pos);
        ASSERT(false, "unknown character");
        return (struct json_value){.type = json_null};
    }
}

numbers

Info

Technically, numbers in JSON should include scientific notation and other fun stuff, but let's just remember the project's simplicity and my sanity, see json.org.

C

static struct json_value number(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  size_t start = json->pos;
  // i don't give a fuck about scientific notation <3
  for (char cc = json->cur(json);
       ((cc >= '0' && cc <= '9') || cc == '_' || cc == '.' || cc == '-');
       json->advance(json), cc = json->cur(json))
    ;

  char *slice = malloc(sizeof(char) * json->pos - start + 1);
  ASSERT(slice != NULL, "failed to allocate slice for number parsing")
  memcpy(slice, json->input + start, json->pos - start);
  slice[json->pos - start] = 0;
  double number = strtod(slice, NULL);
  free(slice);

  return (struct json_value){.type = json_number, .value = {.number = number}};
}

We keep track of the start of the number and advance as long as the current character is still considered part of a number (any of 0-9 | _ | . | -). Once we hit the end, we allocate a temporary string, copy the chars containing the number from the input string and terminate the string with \0. strtod is used to convert this string to a double. Once that is done we free the slice and return the result as a json_value.

null, true and false

null, true and false are unique atoms that are easy to reason about, since they have a constant size and known characters - as such we can simply assert each expected character:

C

static struct json_value atom(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");

  skip_whitespace(json);

  char cc = json->cur(json);
  if ((cc >= '0' && cc <= '9') || cc == '.' || cc == '-') {
    return number(json);
  }

  switch (cc) {
  case 'n': // null
    json->pos++;
    ASSERT(json->cur(json) == 'u', "unknown atom 'n', wanted 'null'")
    json->pos++;
    ASSERT(json->cur(json) == 'l', "unknown atom 'nu', wanted 'null'")
    json->pos++;
    ASSERT(json->cur(json) == 'l', "unknown atom 'nul', wanted 'null'")
    json->advance(json);
    return (struct json_value){.type = json_null};
  case 't': // true
    json->pos++;
    ASSERT(json->cur(json) == 'r', "unknown atom 't', wanted 'true'")
    json->pos++;
    ASSERT(json->cur(json) == 'u', "unknown atom 'tr', wanted 'true'")
    json->pos++;
    ASSERT(json->cur(json) == 'e', "unknown atom 'tru', wanted 'true'")
    json->advance(json);
    return (struct json_value){.type = json_boolean,
                               .value = {.boolean = true}};
  case 'f': // false
    json->pos++;
    ASSERT(json->cur(json) == 'a', "invalid atom 'f', wanted 'false'")
    json->pos++;
    ASSERT(json->cur(json) == 'l', "invalid atom 'fa', wanted 'false'")
    json->pos++;
    ASSERT(json->cur(json) == 's', "invalid atom 'fal', wanted 'false'")
    json->pos++;
    ASSERT(json->cur(json) == 'e', "invalid atom 'fals', wanted 'false'")
    json->advance(json);
    return (struct json_value){.type = json_boolean,
                               .value = {.boolean = false}};
  // ... strings ...
  default:
    printf("unknown character '%c' at pos %zu\n", json->cur(json), json->pos);
    ASSERT(false, "unknown character");
    return (struct json_value){.type = json_null};
  }
}

strings

Info

Again, similarly to JSON numbers, JSON strings should include escapes for quotation marks and other fun stuff, but let's again just remember the project's simplicity and my sanity, see json.org.

C

static char *string(struct json *json) {
  json->advance(json);
  size_t start = json->pos;
  for (char cc = json->cur(json); cc != '\n' && cc != '"';
       json->advance(json), cc = json->cur(json))
    ;

  char *slice = malloc(sizeof(char) * json->pos - start + 1);
  ASSERT(slice != NULL, "failed to allocate slice for a string")

  memcpy(slice, json->input + start, json->pos - start);
  slice[json->pos - start] = 0;

  ASSERT(json->cur(json) == '"', "unterminated string");
  json->advance(json);
  return slice;
}

Pretty easy stuff: as long as we are inside of the string (before ", \n and EOF) we advance; after that we copy the contents into a new slice and return that slice (this function is especially useful for object keys - that's why it is a standalone function).

Parsing Arrays

Since arrays are just any number of JSON values between [ and ], separated by , - this one is not that hard to implement either:

C

struct json_value array(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  ASSERT(json->cur(json) == '[', "invalid array start");
  json->advance(json);

  struct json_value json_value = {.type = json_array};
  json_value.values = malloc(sizeof(struct json_value));

  while (!json->is_eof(json) && json->cur(json) != ']') {
    if (json_value.length > 0) {
      if (json->cur(json) != ',') {
        json_free_value(&json_value);
      }
      ASSERT(json->cur(json) == ',',
             "expected , as the separator between array members");
      json->advance(json);
    }
    struct json_value member = json->parse(json);
    json_value.values = realloc(json_value.values,
                                sizeof(json_value) * (json_value.length + 1));
    json_value.values[json_value.length++] = member;
  }

  ASSERT(json->cur(json) == ']', "missing array end");
  json->advance(json);
  return json_value;
}

We start with space for a single element and reallocate for every new child we find. We also check for the , between each child.

A geometrically growing array would probably be better to minimize allocations, but here we are, writing unoptimized C code - still, it works :)
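
For illustration, a minimal sketch of what that could look like - the grow helper and the capacity bookkeeping are only a sketch and not part of the parser above:

C

// Double the capacity whenever the array is full, so the number of realloc
// calls is O(log n) instead of one per appended element. Error handling for a
// failed realloc is omitted to keep the sketch short.
static struct json_value *grow(struct json_value *values, size_t length,
                               size_t *capacity) {
  if (length < *capacity)
    return values;
  *capacity = *capacity ? *capacity * 2 : 4;
  return realloc(values, sizeof(struct json_value) * *capacity);
}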

Parsing Objects

C

struct json_value object(struct json *json) {
  ASSERT(json != NULL, "corrupted internal state");
  ASSERT(json->cur(json) == '{', "invalid object start");
  json->advance(json);

  struct json_value json_value = {.type = json_object};
  json_value.object_keys = malloc(sizeof(char *));
  json_value.values = malloc(sizeof(struct json_value));

  while (!json->is_eof(json) && json->cur(json) != '}') {
    if (json_value.length > 0) {
      if (json->cur(json) != ',') {
        json_free_value(&json_value);
      }
      ASSERT(json->cur(json) == ',',
             "expected , as separator between object key value pairs");
      json->advance(json);
    }
    ASSERT(json->cur(json) == '"',
           "expected a string as the object key, did not get that")
    char *key = string(json);
    ASSERT(json->cur(json) == ':', "expected object key and value separator");
    json->advance(json);

    struct json_value member = json->parse(json);
    json_value.values = realloc(json_value.values, sizeof(struct json_value) *
                                                       (json_value.length + 1));
    json_value.values[json_value.length] = member;
    json_value.object_keys = realloc(json_value.object_keys,
                                     sizeof(char **) * (json_value.length + 1));
    json_value.object_keys[json_value.length] = key;
    json_value.length++;
  }

  ASSERT(json->cur(json) == '}', "missing object end");
  json->advance(json);
  return json_value;
}

Same as arrays, only instead of a single atom we have a string as the key, : as a separator and a json_value as the value. Each pair is separated with ,.
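
Putting it all together, a minimal usage sketch could look like this - it assumes the definitions from above (json_new, the parse method and json_print_value) plus the usual headers are in scope:

C

int main(void) {
  // Build a parser over a raw JSON string and parse it into a json_value tree.
  struct json json = json_new("{\"name\": \"jsoninc\", \"ok\": true, \"nums\": [1, 2, 3]}");
  struct json_value value = json.parse(&json);

  // Print the resulting tree back out as JSON.
  json_print_value(&value);
  puts("");
  return EXIT_SUCCESS;
}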

]]>
https://xnacly.me/posts/2025/json-parser-in-c-with-methods/ hacker-news-small-sites-43222344 Sat, 01 Mar 2025 18:53:20 GMT
<![CDATA[Making o1, o3, and Sonnet 3.7 hallucinate for everyone]]> thread link) | @hahahacorn
March 1, 2025 | https://bengarcia.dev/making-o1-o3-and-sonnet-3-7-hallucinate-for-everyone | archive.org

A quick-fun story.

My (ops-but-sometimes-writes-scripts-to-help-out) coworker just tapped on my shoulder and asked me to look at his code that wasn't working. It was a bit something like this:

User.includes(investments: -> { where(state: :draft) })...

This is not a feature of ActiveRecord or any libraries that I'm aware of. I asked him why he thought this was valid syntax, and he pulled up his ChatGPT history. It looked something like this:

Ask: How can I dynamically preload an association with conditions in rails? (Potentially followed up with - no custom has_many associations, no preloader object, don't filter the base query, etc.)

Sometimes, you're routed to the correct answer, which is to add the filter you want on the associated record as a standard where clause and also add a .references(:association) call to the query chain, like so:

User.includes(:investments).where(investments: { state: :draft }).references(:investments) 

However, with just a few tests, you're usually routed to that bizarre, non-existent syntax of including a lambda as a keyword argument value to the association you want it applied to. I recreated this a few times below:

o3-mini
Sonnet 3.7
Sonnet 3.5

I was confused why the syntax "felt" familiar though, until my coworker pointed out I invented it while asking a question on the Rails forum two years ago.

Exploring APIs

Funny enough, my other "idea" in that thread is the other solution most LLMs hallucinate - accessing the Preloader object directly.

This doesn't work either

I didn't realize this when posting originally, but this still requires you to loop through the posts and load the query returned by the preloader into each post's association target. I didn't include that, and LLMs seem to be confused too.

As far as I'm aware, that forum post is the only place that you'll find that specific syntax exploration. As my comment above denotes, it would not work anyway. Why I included it in the first place is beyond me - I'm working on making my writing more concise (which is why I carved out a section to explain that, and then this, and now this explanation of that....)

Conclusion

LLMs are really smart most of the time. But once they reach niche topics and don't have sufficient context, they begin to resemble me early in my career. Open StackOverflow, Ctrl+C, Ctrl+V, Leeroy Jenkins style. I can't help but find it endearing.

]]>
https://bengarcia.dev/making-o1-o3-and-sonnet-3-7-hallucinate-for-everyone hacker-news-small-sites-43222027 Sat, 01 Mar 2025 18:24:22 GMT
<![CDATA[Magic isn't real]]> thread link) | @SchwKatze
February 28, 2025 | https://pthorpe92.dev/magic/ | archive.org

Any sufficiently advanced technology is indistinguishable from magic.

  • Arthur C. Clarke

This quote applies just as much to developers as it does to non-tech people, sometimes more. I remember, towards the beginning of my programming journey (both the first time I learned, 18+ years ago, and again ~15 years later), the feeling that is the root cause of what they call tutorial hell (I personally loathe tutorials and always chose to try to build things myself instead, and I attribute a great deal of the relative success I have achieved to this).

The situation:

You feel like you understand perfectly how to properly swing a hammer, lay brick, frame drywall, and you learned the right way to measure and cut beams with a saw, yet you still look at buildings and architecture and stand completely baffled that those tools you have learned were the same ones used to build these great structures. With no idea where to start, you stare at your tools, supplies and materials wondering if they must have some kind of special equipment or secret freemason knowledge that you don't have access to. You don't know how someone ended up with that result, using the same tools you see in front of you, and you definitely cannot imagine cutting the first board or laying the first brick.

Many know that this is the exact feeling of learning how to program, and fully grasping the concepts of loops, variables, data structures, trees, stacks, linked-lists, arrays, control flow, etc, etc, etc... then looking at a compiler, a video game, an operating system, or a web browser and thinking yeah right.... Those devs must all have started programming C and x86 assembly while they were in diapers, and all attended Stanford where they were taught secret knowledge passed down from Ken Thompson, by Brian Kernighan himself.

Assuming you don't take the strict path of the JS frameworker vercel user: eventually after enough time, you start to recognize patterns. You 'go to definition' on enough methods from libraries you use to see how they are implemented and you build enough side projects and watch enough 'tsoding daily', 'sphaerophoria', and 'awesomekling' to begin to demystify at least how things like web/network protocols, or image/video encodings, syscalls/file IO operations work at some level. You no longer would feel completely lost if you had to write a shell or a lisp interpreter: you would at the very least know that to begin, you would probably have to read the source file into memory and break it up into tokens before trying to parse it to build the syntax tree needed so you can traverse and analyze it before stepping through it to execute the code. Previously, what now feels so obvious to you, would have seemed some kind of sorcery reserved only for the aforementioned programming elite.

I'm sure I'm not alone, in that each time you pull the curtain off a piece of 'magic', you have the same thought:

Oooooh yeah. I mean, well duh.. how else would you do that? I can't believe I couldn't see it.

As time goes on, there are fewer and fewer things I run into where I cannot mentally parse, at least from a very broad and high level, what an implementation might look like. Now I definitely don't claim to know how kernel internals, 3d rendering, or GPU drivers work, but what I mean is that most things have lost the shadowy mystique, and feel more like something I can get excited to learn about, rather than scary forbidden knowledge I will never be allowed to possess. Although for those things, that may as well be the case ;)

The other day, after a long day's work managing and synchronizing different environments/k8s clusters, I decided to browse HN as I normally do at that time. I ran into a post referencing comptime for Go that linked to a GitHub repo. It immediately caught my attention: although I have not written Zig myself, Andrew Kelley is one of my programming idols and I definitely follow Zig's development. Comptime is one of Zig's most envied language features, and although it is achievable via metaprogramming or constexpr in other languages, Zig's straightforward procedural approach/API makes it particularly unique and admired.

This was when I came upon that familiar feeling:

How tf

Confused..

^^^^^^ Me if you had told me I had to implement comptime in go without touching the compiler

So I decided that I had to know how this was done, and I had a few hours to spare so I decided I would maybe try to contribute, or at least add some kind of feature of any level of value, just to force myself to understand what was going on here.

Then after a brief peruse through the code...

Turns out, you can use the source file information you get through a flag you can pass at build time in Go called -toolexec, which allows you to invoke toolchain programs - in this case the prep binary, which is called with the absolute path of the program. By using a combination of another one of the author's packages, goinject, and the yaegi (yet another elegant Go interpreter) library, you can get the AST, file decorator and import restorer by implementing Modifier. That then allows you to collect the variables from the relevant function in the tree, output them each to a temporary file, and run the interpreter on it, giving you the computed result of foo in prep.Comptime(foo()), which you then use to replace the values in the DST during the Modify pass. Voilà, you have achieved compile time computations.

Oh, well yeah. That makes perfect sense. I mean how else did I think it was gonna work?

After a couple of hours, I had added variable scoping and global const declarations, which I concluded was actually not a useful feature at all, because each function is evaluated on its own, leaving essentially a 0% chance of actual naming/scope conflicts. But the point is, I didn't discover that until I had finished writing it with some tests, and although the 'feature' is useless, the whole process was a very valuable learning experience and an all-around good use of my time.

This is just a reminder to everyone at different levels of their developer journey, that the "magic" is not real and the overwhelming majority of the time, you are simply lacking the necessary context and it will likely make perfect sense to you as soon as you have it.

It's always worth your time to learn parts of the stack that you might not work in daily. As you build your fundamental understanding, it demystifies other pieces of the puzzle that you would never would have put together otherwise. Even if it doesn't feel important now, I guarantee the knowledge pays off at some point in the future.

Keep learning every day, strive for deeper understanding, and spend time building or hacking on even things that are considered 'solved problems'. Even if you are only paid to write React, it is very much of value to you and your career to understand how the internals work, or how your one-click 'serverless' auto-scaling deployments work...

(hint: servers)

]]>
https://pthorpe92.dev/magic/ hacker-news-small-sites-43214353 Sat, 01 Mar 2025 01:09:22 GMT
<![CDATA[Self-Hosting a Firefox Sync Server]]> thread link) | @shantara
February 28, 2025 | https://blog.diego.dev/posts/firefox-sync-server/ | archive.org

After switching from Firefox to LibreWolf, I became interested in the idea of self-hosting my own Firefox Sync server. Although I had seen this was possible before, I had never really looked into it—until now. I embarked on a journey to set this up, and while it wasn’t completely smooth sailing, I eventually got it working. Here’s how it went.

Finding the Right Sync Server

Initial Search: Mozilla’s Sync Server Repo

I started by searching for “firefox sync server github” and quickly found Mozilla’s syncserver repo. This is an all-in-one package designed for self-hosting a Firefox Sync server. It bundles both the tokenserver for authentication and syncstorage for storage, which sounded like exactly what I needed.

However, there were two red flags:

  1. The repository had “failed” tags in the build history.
  2. A warning was prominently displayed stating that the repository was no longer being maintained and pointing to a new project in Rust.

Switching to Rust: syncstorage-rs

With that in mind, I followed the link to syncstorage-rs, which is a modern, Rust-based version of the original project. It seemed like the more viable option, so I decided to move forward with this one. But first, I wanted to check if there was a ready-to-go Docker image to make deployment easier. Unfortunately, there wasn’t one, but the documentation did mention running it with Docker.

This is where things started to get complicated.

Diving Into Docker: Confusion and Complexity

Documentation Woes

The Docker documentation had some strange parts. For example, it mentioned:

  • Ensuring that grpcio and protobuf versions matched the versions used by google-cloud-rust-raw. This sounded odd—shouldn’t Docker handle version dependencies automatically?
  • Another confusing part was the instruction to manually copy the contents of mozilla-rust-sdk into the top-level root directory. Again, why wasn’t this step automated in the Dockerfile?

At this point, I was feeling a bit uneasy but decided to push forward. I reviewed the repo, the Dockerfile, the Makefile, and the circleci workflows. Despite all that, I was still unsure how to proceed.

A Simpler Solution: syncstorage-rs-docker

I then stumbled upon dan-r’s syncstorage-rs-docker repo, which had a much simpler Docker setup. The description explained that the author had also encountered issues with the original documentation and decided to create a Docker container for their own infrastructure.

At this point, I felt reassured that I wasn’t alone in my confusion, and decided to give this setup a try.

Setting Up the Server: Docker Compose and MariaDB

Docker Compose Setup

I copied the following services into my docker-compose.yaml:

  firefox_mariadb:
    container_name: firefox_mariadb
    image: linuxserver/mariadb:10.6.13
    volumes:
      - /data/ffsync/dbdata:/config
    restart: unless-stopped
    environment:
      MYSQL_DATABASE: syncstorage
      MYSQL_USER: sync
      MYSQL_PASSWORD: syncpass
      MYSQL_ROOT_PASSWORD: rootpass

  firefox_syncserver:
    container_name: firefox_syncserver
    build:
      context: /root/ffsync
      dockerfile: Dockerfile
      args:
        BUILDKIT_INLINE_CACHE: 1
    restart: unless-stopped
    ports:
      - "8000:8000"
    depends_on:
      - firefox_mariadb
    environment:
      LOGLEVEL: info
      SYNC_URL: https://mydomain/sync
      SYNC_CAPACITY: 5
      SYNC_MASTER_SECRET: mastersecret
      METRICS_HASH_SECRET: metricssecret
      SYNC_SYNCSTORAGE_DATABASE_URL: mysql://sync:usersync@firefox_mariadb:3306/syncstorage_rs
      SYNC_TOKENSERVER_DATABASE_URL: mysql://sync:usersync@firefox_mariadb:3306/tokenserver_rs

A few tips:

  • Be cautious with the database passwords. Avoid using special characters like "/|%" as they can cause issues during setup.
  • I added the BUILDKIT_INLINE_CACHE argument to the Docker Compose file to make better use of caching, which reduced build time while testing.

Initializing the Database

I cloned the repository and copied the Dockerfile and initdb.sh script to my server. After making some tweaks, I ran the following steps to get the database up and running:

  1. Bring up the MariaDB container:
    docker-compose up -d firefox_mariadb
    
  2. Make the initialization script executable and run it:
    chmod +x initdb.sh
    ./initdb.sh
    

Bringing the Stack Online

Finally, I brought up the entire stack with:

docker-compose up -d

Configuring Reverse Proxy with Caddy

Next, I needed to update my Caddy reverse proxy to point to the new Sync server. I added the following configuration:

mydomain:443 {
    reverse_proxy firefox_syncserver:8000
}

After updating Caddy with the DNS entry, I restarted the proxy and the sync server was up and running.

Challenges Faced

While I eventually got everything working, there were a few notable challenges along the way:

  1. Database persistence: I had issues with persistent data when restarting the MariaDB container. Make sure to clear out old data if needed.
  2. Server storage: My server ran out of space during the build process due to the size of the Docker images and intermediate files.
  3. Following the right steps: It took me a while to figure out the right steps, and much of the time was spent experimenting with the Docker setup.

Final Thoughts

Setting up a self-hosted Firefox Sync server is not the easiest task, especially if you’re not very familiar with Docker or database management. The official documentation is confusing, but thanks to community efforts like the syncstorage-rs-docker repo, it’s doable.

In the end, it took me about two hours to get everything running, but it was worth it. If you’re looking to control your own Firefox Sync server, this guide should help you avoid some of the pitfalls I encountered.

Happy syncing!

]]>
https://blog.diego.dev/posts/firefox-sync-server/ hacker-news-small-sites-43214294 Sat, 01 Mar 2025 01:03:48 GMT
<![CDATA[Virtual museum of socialist era graphic design in Bulgaria]]> thread link) | @NaOH
February 28, 2025 | http://socmus.com/en/ | archive.org

Unable to extract article]]>
http://socmus.com/en/ hacker-news-small-sites-43209046 Fri, 28 Feb 2025 18:58:13 GMT
<![CDATA[Misusing police database now over half of all cybercrime prosecutions in the UK [pdf]]]> thread link) | @luu
February 28, 2025 | https://www.cl.cam.ac.uk/~ah793/papers/2025police.pdf | archive.org

Unable to extract article]]>
https://www.cl.cam.ac.uk/~ah793/papers/2025police.pdf hacker-news-small-sites-43207171 Fri, 28 Feb 2025 16:06:29 GMT
<![CDATA[AI is killing some companies, yet others are thriving – let's look at the data]]> thread link) | @corentin88
February 28, 2025 | https://www.elenaverna.com/p/ai-is-killing-some-companies-yet | archive.org

AI is quietly upending the business models of major content sites. Platforms like WebMD, G2, and Chegg - once fueled by SEO and ad revenue - are losing traffic as AI-powered search and chatbots deliver instant answers. Users no longer need to click through pages when AI summarizes everything in seconds. Brian Balfour calls this phenomenon Product-Market Fit Collapse, a fitting term, marking it as the next big shift in tech.

Key milestones accelerating this shift:
📅 Nov 30, 2022 – ChatGPT launches
📅 Mar 14, 2023 – GPT-4 released
📅 May 14, 2024 – Google rolls out AI Overviews

❗Disclaimer: I'm simply observing traffic trends from an external perspective and don’t have insight into the exact factors driving them. The timing aligns with AI, but like any business, multiple factors are at play and each case is unique.

→ The data comes from SEMRush. If you want access to trend reports like the one below, you can try it for free.

WebMD: Where every symptom leads to cancer. They're crashing and burning and the timing aligns with major AI releases. If they don’t launch AI agents (like yesterday), they’re in trouble. That said, they still pull in ~90M visits a month.

Quora: Once the go-to platform where user-generated questions got a mix of expert insights and absolute nonsense - is struggling. And it’s no surprise. AI now delivers faster, (usually) more reliable answers. Yet, despite the challenges, Quora still pulls in just under 1 billion visits a month.

Stack Overflow: The Q&A platform for developers, is now facing seemingly direct competition from ChatGPT, which can generate and debug code instantly. As AI takes over, the community is fading - but they still attract around 200M visits a month.

Chegg: A popular platform for students - now getting schooled by AI. Weirdly, they’re fighting back by suing Google over AI snippets. Not sure what they expect… Google controls the traffic and that’s the risk of relying on someone else’s distribution.

G2: A software review platform, is experiencing a huge drop in traffic. This one is so rough.

CNET: A technology news and reviews website, is experiencing a 70% traffic drop from 4 years ago. They still pull in 50 million visits per month - an impressive volume - but a steep drop from the 150 million they once had.

Just look at Reddit. Many say they are impacted, but traffic says otherwise - they are CRUSHING it. Probably because people are gravitating toward authentic content and a sense of community. I know I cannot go a day without a Reddit scroll (/r/LinkedInLunatics alone is worth visiting on the daily). And look at the y-axis: their traffic is in the billions!

And even Wikipedia is managing to stay afloat (although research AI tools will probably hit it pretty hard). Also, over 5B visits a month - consider me impressed.

And you know who else is growing? Substack. User-generated content FTW.

Edited by Melissa Halim

]]>
https://www.elenaverna.com/p/ai-is-killing-some-companies-yet hacker-news-small-sites-43206491 Fri, 28 Feb 2025 15:12:54 GMT
<![CDATA[Write to Escape Your Default Setting]]> thread link) | @kolyder
February 28, 2025 | https://kupajo.com/write-to-escape-your-default-setting/ | archive.org

For those of us with woefully average gray matter, our minds have limited reach. For the past, they are enthusiastic but incompetent archivists. In the present, they reach for the most provocative fragments of ideas, often preferring distraction over clarity.

Writing provides scaffolding. Structure for the unstructured, undisciplined mind. It’s a practical tool for thinking more effectively. And sometimes, it’s the best way to truly begin to think at all.

Let’s call your mind’s default setting ‘perpetual approximation mode.’  A business idea, a scrap of gossip, a trivial fact, a romantic interest, a shower argument to reconcile something long past. We spend more time mentally rehearsing activities than actually doing them. You can spend your entire life hopping among these shiny fragments without searching for underlying meaning until tragedy, chaos, or opportunity slaps you into awareness.

Writing forces you to tidy that mental clutter. To articulate things with a level of context and coherence the mind alone can’t achieve. Writing expands your working memory, lets you be more brilliant on paper than you can be in person.

While some of this brilliance comes from enabling us to connect larger and larger ideas, much of it comes from stopping, uh, non-brilliance. Writing reveals what you don’t know, what you can’t see when an idea is only held in your head. Biases, blind spots, and assumptions you can’t grasp internally.

At its best, writing (and reading) can reveal the ugly, uncomfortable, or unrealistic parts of your ideas. It can pluck out parasitic ideas burrowed so deeply that they imperceptibly steer your feelings and beliefs. Sometimes this uprooting will reveal that the lustrous potential of a new idea is a mirage, or that your understanding of someone’s motives was incomplete, maybe projected.

If you’re repeatedly drawn to a thought, feeling, or belief, write it out. Be fast, be sloppy. Just as children ask why, why, why, you can repeat the question “why do I think/feel/believe this?” a few times. What plops onto the paper may surprise you. So too will the headspace that clears from pouring out the canned spaghetti of unconnected thoughts.

“Writing about yourself seems to be a lot like sticking a branch into clear river-water and roiling up the muddy bottom.”

~Stephen King, Different Seasons (Book)

“I write entirely to find out what I’m thinking, what I’m looking at, what I see and what it means. What I want and what I fear.”

~Joan Didion, Why I Write (Article)

]]>
https://kupajo.com/write-to-escape-your-default-setting/ hacker-news-small-sites-43206174 Fri, 28 Feb 2025 14:45:36 GMT
<![CDATA[CouchDB Prevents Data Corruption: Fsync]]> thread link) | @fanf2
February 28, 2025 | https://neighbourhood.ie/blog/2025/02/26/how-couchdb-prevents-data-corruption-fsync | archive.org

Programming can be exciting when the underlying fundamentals you’ve been operating under suddenly come into question. Especially when it comes to safely storing data. This is a story of how the CouchDB developers had a couple of hours of excitement making sure their fundamentals were solid (and your data was safe).

Modern software projects are large enough that it is unlikely that a single person can fit all of its constituent parts in their working memory. As developers we have to be okay with selectively forgetting how the program we are working on at the moment works in some parts to make progress on others.

Countless programming techniques as old as time itself (01.01.1970) help with this phenomenon and are commonly categorised as abstractions. As programmers we build ourselves abstractions in order to be able to safely forget how some parts of a program work.

An abstraction is a piece of code, or module, or library, that has a public API that we can use and remember that tells us what we can do with the piece of code, and that we can remember to have certain guarantees. Say a module has a function makeBlue(thing): you don’t necessarily have to remember how the function makes thing blue, all you need to know is that it does.

CouchDB is not a particularly large piece of software, but it is a relatively long-running one, having been started in 2005. Certain parts of CouchDB are relatively old, meaning they solve a specific problem, we worked hard at the time to make sure we solved that problem good and proper, and now all we, the CouchDB developers, remember is that we did solve it and that we can trust it. After that we don't have much need to reevaluate the code in the module on an ongoing basis, so we are prone to forget specific details of how it works.

One consequence of this is that if new information appears that might affect the design of the old and trusted module, you have to scramble to re-understand all the details to see how the module fares in light of the new information.

This happened the other week when the CouchDB developers came across Justin Jaffray’s second part of his “NULL BITMAP Builds a Database” series: “#2: Enter the Memtable”. In it, Justin describes three scenarios for how data is written to disk under certain failure situations and evaluates what that means for writing software that does not want to lose any data (you know, a database).

CouchDB has long prided itself on doing everything in its power to not lose any data by going above and beyond to make sure your data is safe, even in rare edge-cases. Some other databases do not go as far as CouchDB goes.

For a moment, the CouchDB development team had so collectively expunged the details of how CouchDB keeps data safe on disk that we could not immediately evaluate whether CouchDB was susceptible to data loss in the specific scenario outlined by Justin.

To understand the scenario, we have to explain how Unix systems — and especially Linux — reads and writes data to disk. Before we go there though, rest assured this had us sweating for a hot minute. The CouchDB dev team literally stopped any other work and got together to sort out whether there was something we had to do. Data safety truly is a top priority.

The Art of Reading and Writing Data to Disk

For Unix programs to operate on files, they have to acquire a file handle with the syscall open. Once acquired, the program can use the file handle to read from or to write to any data it likes by specifying an offset and a length, both in bytes, that describes where in the file and how much of the file should be read from or written to.
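
As a small standalone illustration (a sketch in C, not CouchDB code - CouchDB itself is written in Erlang), reading and writing at an explicit offset and length with the POSIX calls looks roughly like this:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  // Acquire a file handle (file descriptor); "example.db" is a made-up name.
  int fd = open("example.db", O_RDWR | O_CREAT, 0644);
  if (fd < 0) { perror("open"); return 1; }

  // Write 5 bytes at byte offset 4096 ...
  if (pwrite(fd, "hello", 5, 4096) != 5) { perror("pwrite"); return 1; }

  // ... and read the same 5 bytes back from the same offset.
  char buf[6] = {0};
  if (pread(fd, buf, 5, 4096) != 5) { perror("pread"); return 1; }
  printf("read back: %s\n", buf);

  close(fd);
  return 0;
}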

The Unix kernel will respond to these syscalls by accessing the filesystem the file lives on. A filesystem’s job is to organise an operating system’s files onto a storage mechanism (NVMe, SSDs, hard drives, block storage etc.) and provide fast and safe access to those files.

All file systems define a block size. That is a chunk of bytes that are always read or written in bulk. Common block sizes are 4096 or multiples thereof, like 8192 or 16384, sometimes even 128k. These block sizes, or pages, exist so file systems can make efficient use of all the available storage space.

A consequence of this is that if you just want to read a single byte from storage, the kernel and file system will read at least a page of data and then only return the one byte. Even with the lowest page size of 4096, that’s 4095 bytes read from disk in vain.

As a result, most programs try to avoid reading one byte at a time and instead aim for aligning their data in a way that maps directly to the page size or multiples thereof. For example, CouchDB uses a 4096 byte page, PostgreSQL uses 8192.

The fundamental trade-off that is made with the various options for page sizes is latency vs. throughput at the cost of I/O amplification. In our example earlier, reading a single byte is fastest (i.e. happens with the lowest latency) from a 4096 byte page, at a ~4000x read amplification cost. On the opposite end, reading 1GB of data for a movie stream in 4096 byte chunks has no direct amplification (all bytes read are actually needed), but that will require 250,000 read requests to the file system. A larger page size like 1M will greatly improve streaming throughput.

So there’s a value to getting the page size right for the kind of application. For databases this usually means making it as small as possible, as individual records should be returned quickly, without sacrificing too much streaming performance for larger pieces of data.

The final piece of the puzzle is the page cache. This is the Unix kernel keeping file system pages in memory so it can serve them faster the next time they are requested.

Say you read the page (0,4096) once, the kernel will instruct the filesystem to load the bytes from storage into a kernel memory buffer. When you then read that same page again, the kernel will respond with the in-memory bytes instead of talking to the file system and storage again. And since storage is ~800,000 times slower than main memory, your second read is going to be a lot faster.

The same is happening for writing pages: if you write a new page (4097,8192) and then immediately read it again, that read will be very fast indeed, thanks to the page cache.

So far so good. How could this go wrong?

When writing a new page, the Unix kernel can choose to write it into the page cache and then return the write call as a success. At that point, the data only lives in kernel memory and if the machine this runs on has a sudden power outage or kernel panic or other catastrophic failure, that data will be gone by the time the system has rebooted.

That’s a problem for databases. When a database like CouchDB writes new data to storage, it must make sure the data actually fully made it to storage in a way that it can guarantee to read again later, even if the machine crashes. For that purpose, the Unix kernel provides another syscall: fsync, which tells the kernel to write the data actually onto storage and not just into the page cache.

However, because the page cache provides a ludicrous speed improvement, databases aim to not fsync every single page. Instead they try to fsync as little as possible, while making sure data makes it safely to storage.

But what happens if nobody ever calls fsync? Will the data be lost for good? Not quite: the kernel will decide when to flush the block to disk if the CPU and disk aren't otherwise busy. If that never happens, eventually the kernel pauses processes that are writing to disk, so it can safely flush the cached blocks to disk.

Heads up: we are going to gloss over a lot of details here to keep this under 50,000 words.

CouchDB database files consist of one or more B+-trees and a footer. On startup a database file is opened and read backwards until it finds a valid footer. That footer contains, among some metadata, a pointer to each of the B+-trees, which are then used to fulfil whatever request for reading or writing data needs to be handled.

When writing new data, CouchDB adds pages with B+-tree nodes to the end of the database file and then writes a new footer after that, which includes a pointer to the newly written B+-tree nodes.

To recap, the steps for reading are:

  1. Open the database.
  2. Read backwards until a valid footer is found.
  3. Traverse the relevant B+-tree to read the data you are looking for.

For writing:

  1. Open the database.
  2. Read backwards until a valid footer is found.
  3. Add new B+-tree nodes to the end of the file.
  4. Add a new footer.
  bt = B+-tree node, f = footer
┌──┬──┬──┬──┬──┬──┬──┬──┐
│  │ ◄┼─ │  │ ◄┼─ │  │  │
│ ◄┼─ │  │  │  │  │ ◄┼─ │               db file
│  │  │  │ ◄┼──┼─ │  │  │
└──┴──┴──┴──┴──┴──┴──┴──┘
 bt bt f  bt bt f  bt f

A database file with three footers, i.e. a file that has received
three writes. The footer includes pointers to B+-tree nodes.

 bt = B+-tree node, f = footer
┌──┬──┬──┬──┬──┬──┬──┬──┌──┌──┌──┐
│  │  │  │  │  │  │  │  │  │ ◄┼─ │
│  │  │  │  │  │  │  │  │ ◄┼──┼─ │      db file
│  │  │  │  │  │  │  │  │  │  │  │
└──┴──┴──┴──┴──┴──┴──┴──└──└──└──┘
 bt bt f  bt bt f  bt f  bt bt f

 The same database file, with two more B+-tree nodes and footer

With all this information we can revisit The Sad Path in Justin’s post:

I do a write, and it goes into the log, and then the database crashes before we fsync. We come back up, and the reader, having not gotten an acknowledgment that their write succeeded, must do a read to see if it did or not. They do a read, and then the write, having made it to the OS's in-memory buffers, is returned. Now the reader would be justified in believing that the write is durable: they saw it, after all. But now we hard crash, and the whole server goes down, losing the contents of the file buffers. Now the write is lost, even though we served it!

Let’s translate this to our scenario:

  • “The log” is just “the database file” in CouchDB.
  • A “hard crash“ is a catastrophic failure as outlined above.
  • The “file buffers” are the page cache.

In the sad path scenario, we go through the 4 steps of writing data to storage. Without any fsyncs in place, CouchDB would behave as outlined. But CouchDB does not, as it does use fsyncs strategically. But where exactly?

CouchDB calls fsync after step 3 and again after step 4. This is to make sure that data referenced in the footer actually ends up in storage before the footer. That’s because storage is sometimes naughty and reorders writes for performance or just chaos reasons.
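
Sketched in C (CouchDB itself is written in Erlang, so this only illustrates the ordering, with made-up node and footer buffers), the write path described above looks roughly like this:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Append new B+-tree nodes, fsync, then append the footer and fsync again.
int append_with_footer(int fd, const void *nodes, size_t nodes_len,
                       const void *footer, size_t footer_len) {
  // Step 3: append the new B+-tree nodes to the end of the file.
  if (write(fd, nodes, nodes_len) != (ssize_t)nodes_len) return -1;
  // First fsync: the nodes must be on storage before the footer that points
  // at them, because the disk may otherwise reorder the writes.
  if (fsync(fd) != 0) return -1;

  // Step 4: append the new footer referencing those nodes.
  if (write(fd, footer, footer_len) != (ssize_t)footer_len) return -1;
  // Second fsync: only after this returns is the write considered durable.
  return fsync(fd);
}

int main(void) {
  int fd = open("example.db", O_WRONLY | O_APPEND | O_CREAT, 0644);
  if (fd < 0) { perror("open"); return 1; }
  const char nodes[] = "btree-nodes";
  const char footer[] = "footer";
  if (append_with_footer(fd, nodes, sizeof nodes, footer, sizeof footer) != 0) {
    perror("append_with_footer");
    return 1;
  }
  close(fd);
  return 0;
}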

If CouchDB is terminated before the first fsync, no data has been written. On restart, the previously existing footer will be found and any data it points to can be read. This will not include the write that was just interrupted, as none of that made it to memory or storage yet and the request has not returned with a success to the original caller.

If CouchDB is terminated after the first but before the second fsync, data will have made it both to the page cache and disk, but the footer might not have made it yet. If it did not, same as before: the previously existing footer will be found on restart, and the current writer will not have received a successful response. If it did make it, we know because of the first fsync that any data it points to will be safely on disk, so we can load it as a valid footer.

But what if the footer makes it to the page cache and not storage and we restart CouchDB, read the footer and retrieve its data from the page cache? The writer could issue a read to see if its data made it and if it does, not retry the write: Boom, we are in the sad path and if the machine now crashes that footer is gone. For good. And with it, any pointer to the data that was just written.

However, CouchDB is not susceptible to the sad path. Because it issues one more fsync: when opening the database. That fsync causes the footer page to be flushed to storage and only if that is successful, CouchDB allows access to the data in the database file (and page cache) because now it knows all data to be safely on disk.

After working out these details, the CouchDB team could return to their regularly scheduled work items as CouchDB has proven, once again, that it keeps your data safe. No matter what.

« Back to the blog post overview

]]>
https://neighbourhood.ie/blog/2025/02/26/how-couchdb-prevents-data-corruption-fsync hacker-news-small-sites-43205512 Fri, 28 Feb 2025 13:46:49 GMT
<![CDATA[Netboot Windows 11 with iSCSI and iPXE]]> thread link) | @terinjokes
February 28, 2025 | https://terinstock.com/post/2025/02/Netboot-Windows-11-with-iSCSI-and-iPXE/ | archive.org

A fictitious screenshot of a permanent ban from a game, in the Windows 95 installer style, with a 90s-era PC and a joystick in the left banner. The text is titled "Permanent Suspension" and reads "Your account has been permanently suspended due to the use of unauthorized Operating Systems or unauthorized virtual machines. This type of behavior causes damage to our community and the game's competitive integrity. This action will not be reversed."

Purposefully ambiguous and fictitious permanent ban.

(created with @foone’s The Death Generator)

My primary operating system is Linux: I have it installed on my laptop and desktop. Thanks to the amazing work of the WINE, CodeWeavers, and Valve developers, it’s also where I do PC gaming. I can spin up Windows in a virtual machine for the rare times I need to use it, and even pass through a GPU if I want to do gaming.

There is one pretty big exception: playing the AAA game ████████████████ with friends. Unfortunately, the developer only allows Windows. If you attempt to run the game on Linux or they detect you’re running in a virtual machine, your device and account are permanently banned. I would prefer not to be permanently banned.

For the past several years my desktop has also had a disk dedicated to maintaining a Windows install. I'd prefer to use the space in my PC case for disks for Linux. Since I already run a home NAS, and my Windows usage is infrequent, I wondered if I could offload the Windows install to my NAS instead. This led me down the course of netbooting Windows 11 and writing up these notes on how to do a simplified "modern" version.

My first task was determining how to get a computer to boot from a NAS. My experience with network block devices is with Ceph RBD, where a device is mounted into an already running operating system. For booting over an Ethernet IP network the standard is iSCSI. A great way to boot from an iSCSI disk is with iPXE. To avoid any mistakes during this process, I removed all local drives from the system.

I didn't want to run a TFTP server on my home network, or reconfigure DHCP to provide TFTP configuration. Even if I did, the firmware for my motherboard is designed for "gamers": there's no PXE ROM. I can enable UEFI networking and a network boot option appears in the boot menu, but no DHCP requests are made. Fortunately, iPXE is available as a bootable USB image, which loaded and started trying to fetch configuration from the network.

Hitting ctrl-b as directed on screen to drop to the iPXE shell, I could verify basic functionality was working.

iPXE 1.21.1+ (e7585fe) -- Open Source Network Boot Firmware -- https://ipxe.org
Features: DNS FTP HTTP HTTPS iSCSI NFS TFTP VLAN SRP AoE EFI Menu
iPXE> dhcp
Configuring (net0 04:20:69:91:C8:DD)...... ok
iPXE> show ${net0/ip}
192.0.2.3

I decided to use tgt as the iSCSI target daemon on my NAS, as the configuration seemed the least complicated. In /etc/tgt/targets.conf I configured it with two targets: one as the block device I wanted to install Windows onto and the other being the installation ISO.

<target iqn.2025-02.com.example:win-gaming>
    backing-store /dev/zvol/zroot/sans/win-gaming
    params thin-provisioning=1
</target>

<target iqn.2025-02.com.example:win11.iso>
    backing-store /opt/isos/Win11_24H2_English_x64.iso
    device-type cd
    readonly 1
</target>

Back on the PC, I could tell iPXE to use these iSCSI disks, then boot onto the DVD. As multiple network drives are being added, each must be given a different drive ID starting from 0x80.

iPXE> sanhook --drive 0x80 iscsi:nas.example.com:::1:iqn.2025-02.com.example:win-gaming
Registered SAN device 0x80
iPXE> sanhook --drive 0x81 iscsi:nas.example.com:::1:iqn.2025-02.com.example:win11.iso
Registered SAN device 0x81
iPXE> sanboot --drive 0x81
Booting from SAN device 0x81

After a minute of the Windows 11 logo and a spinner, the Windows 11 setup appears. In an ideal situation, I could immediately start installing. Unfortunately, the Windows 11 DVD does not ship drivers for my network card, and the iSCSI connection information passed to the booted system from iPXE couldn’t be used. I’m a bit impressed the GUI loaded at all, instead of just crashing.

To rectify this, I would need to build a Windows PE image that included my networking drivers. WinPE is the minimal environment used when installing Windows. Fortunately, Microsoft has made this pretty easy nowadays. I downloaded and installed the Windows Assessment and Deployment Kit and the Windows PE add-on. After running “Deployment and Imaging Tools Environment” as an administrator, I could make a folder containing a base WinPE image.

> mkdir C:\winpe
> copype amd64 C:\winpe\amd64

After mounting the image, I was able to slipstream the Intel drivers. I searched through the inf files to find the folder that supported my network card.

> imagex /mountrw C:\winpe\amd64\media\sources\boot.wim C:\winpe\amd64\mount
> dism /image:C:\winpe\amd64\mount /add-driver /driver:C:\temp\intel\PRO1000\Winx64\W11\
> imagex /unmount /commit C:\winpe\amd64\mount

This new image is what we need to boot into to install Windows. As my NAS is also running an HTTP server, I copied over the files relevant to netbooting: from “C:‍\winpe\amd64\media” I copied “boot/BCD”, “boot/boot.sdi”, and “sources/boot.wim”, preserving the folders. I also downloaded wimboot to the same directory.

You can use iPXE to execute a script fetched with HTTP, which I took advantage of to reduce the amount of typing I’ll need to do at the shell. I saved the following script as “install.ipxe” in the same HTTP directory.

#!ipxe

sanhook --drive 0x80 iscsi:nas.example.com:::1:iqn.2025-02.com.example:win-gaming
sanhook --drive 0x81 iscsi:nas.example.com:::1:iqn.2025-02.com.example:win11.iso
kernel wimboot
initrd boot/BCD BCD
initrd boot/boot.sdi boot.sdi
initrd sources/boot.wim boot.wim
boot

Rebooting back to the iPXE prompt I could then boot using this script.

iPXE> dhcp
iPXE> chain http://nas.example.com/ipxe/install.ipxe

After a few seconds I was booted into WinPE with a Command Prompt. The command “wpeinit” ran automatically, configuring the network card and mounting the iSCSI disks. I found the DVD had been mounted as drive “D”, and could start the Windows Setup with “D:‍\setup.exe”.

However, after reaching the “Searching for Disks” screen the installer closed itself without any error. This seems to be a bug with the new version of setup, as restarting it and selecting the “Previous Version of Setup” on an earlier page used a version of the installer that worked.

The installation was spread across several restarts. Fortunately, once the installation files are copied over, nothing but the main disk image is required, reducing what I needed to type in the iPXE shell. The HTTP server could also be cleaned up at this point.

iPXE> dhcp
iPXE> sanboot iscsi:nas.example.com:::1:iqn.2025-02.com.example:win-gaming

After several more minutes, and a forced installation of a Windows zero-day patch, I was greeted by a Windows 11 desktop, booted over iSCSI. Task Manager even reports the C drive as being “SDD (iSCSI)”.

Booting from a USB stick and typing into an iPXE prompt every time I want to boot into Windows isn’t a great user experience. Fortunately, iPXE is also available as an EFI application which can be installed to the local EFI System Partition. I also discovered that iPXE will execute commands provided on the command line.

I reinstalled the disks used for Linux, copied over ipxe.efi to the EFI System Partition, and added a new entry to systemd-boot by creating “$ESP/loader/entries/win11.conf”

title Windows 11 (iPXE)
efi /ipxe/ipxe.efi
options prompt && dhcp && sanboot iscsi:nas.example.com:::1:iqn.2025-02.com.example:win-gaming

There seems to be a bug where the first word in the options field is ignored. I used a valid iPXE command, prompt, which also provides a clear signal should it ever start being interpreted in a future version.

After a little bit of extra setup (installing Firefox and switching to dark mode), I was able to install Steam and the game. The game took a little bit longer to install due to the slower disk speed over my network (time to upgrade to 10GbE?), but there was no noticeable delay during normal gameplay. I didn't see any network saturation or high disk latencies in Task Manager during loading.

]]>
https://terinstock.com/post/2025/02/Netboot-Windows-11-with-iSCSI-and-iPXE/ hacker-news-small-sites-43204604 Fri, 28 Feb 2025 11:47:52 GMT
<![CDATA[Turning my ESP32 into a DNS sinkhole to fight doomscrolling]]> thread link) | @venusgirdle
February 28, 2025 | https://amanvir.com/blog/turning-my-esp32-into-a-dns-sinkhole | archive.org

Unable to extract article]]>
https://amanvir.com/blog/turning-my-esp32-into-a-dns-sinkhole hacker-news-small-sites-43204091 Fri, 28 Feb 2025 10:39:01 GMT
<![CDATA[Video encoding requires using your eyes]]> thread link) | @zdw
February 27, 2025 | https://redvice.org/2025/encoding-requires-eyes/ | archive.org

In multimedia, the quality engineers are optimizing for is perceptual. Eyes, ears, and the brain processing their signals are enormously complex, and there’s no way to replicate everything computationally. There are no “objective” metrics to be had, just various proxies with difficult tradeoffs. Modifying video is particularly thorny, since like I’ve mentioned before on this blog there are various ways to subtly bias perception that are nonetheless undesirable, and are impossible to correct for.

This means there’s no substitute for actually looking at the results. If you are a video engineer, you must look at sample output and ask yourself if you like what you see. You should do this regularly, but especially if you’re considering changing anything, and even more so if ML is anywhere in your pipeline. You cannot simply point at metrics and say “LGTM”! In this particular domain, if the metrics and skilled human judgement are in conflict, the metrics are usually wrong.

Netflix wrote a post on their engineering blog about a “deep downscaler” for video, and unfortunately it’s rife with issues. I originally saw the post due to someone citing it, and was incredibly disappointed when I clicked through and read it. Hopefully this post offers a counter to that!

I’ll walk through the details below, but they’re ultimately all irrelevant; the single image comparison Netflix posted looks like this (please ‘right-click -> open image in new tab’ so you can see the full image and avoid any browser resampling):

Downscaler comparison

Note the ringing, bizarre color shift, and seemingly fake “detail”. If the above image is their best example, this should not have shipped – the results look awful, regardless of the metrics. The blog post not acknowledging this is embarrassing, and it makes me wonder how many engineers read this and decided not to say anything.

The Post

Okay, going through this section by section:

How can neural networks fit into Netflix video encoding?

There are, roughly speaking, two steps to encode a video in our pipeline:

1. Video preprocessing, which encompasses any transformation applied to the high-quality source video prior to encoding. Video downscaling is the most pertinent example herein, which tailors our encoding to screen resolutions of different devices and optimizes picture quality under varying network conditions. With video downscaling, multiple resolutions of a source video are produced. For example, a 4K source video will be downscaled to 1080p, 720p, 540p and so on. This is typically done by a conventional resampling filter, like Lanczos.

Ignoring the awful writing[1], it’s curious that they don’t clarify what Netflix was using previously. Is Lanczos an example, or the current best option[2]? This matters because one would hope they establish a baseline to later compare the results against, and that baseline should be the best reasonable existing option.

2. Video encoding using a conventional video codec, like AV1. Encoding drastically reduces the amount of video data that needs to be streamed to your device, by leveraging spatial and temporal redundancies that exist in a video.

I once again wonder why they mention AV1, since in this case I know it’s not what the majority of Netflix’s catalog is delivered as; they definitely care about hardware decoder support. Also, this distinction between preprocessing and encoding isn’t nearly as clean as this last sentence implies, since these codecs are lossy, and in a way that is aware of the realities of perceptual quality.

We identified that we can leverage neural networks (NN) to improve Netflix video quality, by replacing conventional video downscaling with a neural network-based one. This approach, which we dub “deep downscaler,” has a few key advantages:

I’m sure that since they’re calling it a deep downscaler, it’s actually going to use deep learning, right?

1. A learned approach for downscaling can improve video quality and be tailored to Netflix content.

Putting aside my dislike of the phrase “a learned approach” here, I’m very skeptical of the “tailored to Netflix content” claim. Netflix’s catalog is pretty broad, and video encoding has seen numerous attempts at content-based specialization that turned out to be worse than focusing on improving things generically and adding tuning knobs. The encoder that arguably most punched above its weight class, x264, was mostly developed on Touhou footage.

2. It can be integrated as a drop-in solution, i.e., we do not need any other changes on the Netflix encoding side or the client device side. Millions of devices that support Netflix streaming automatically benefit from this solution.

Take note of this for later: Netflix has many different clients and this assumes no changes to them.

3. A distinct, NN-based, video processing block can evolve independently, be used beyond video downscaling and be combined with different codecs.

Doubt

Of course, we believe in the transformative potential of NN throughout video applications, beyond video downscaling. While conventional video codecs remain prevalent, NN-based video encoding tools are flourishing and closing the performance gap in terms of compression efficiency. The deep downscaler is our pragmatic approach to improving video quality with neural networks.

“Closing the performance gap” is a rather optimistic framing of that, but I’ll save this for another post.

Our approach to NN-based video downscaling

The deep downscaler is a neural network architecture designed to improve the end-to-end video quality by learning a higher-quality video downscaler. It consists of two building blocks, a preprocessing block and a resizing block. The preprocessing block aims to prefilter the video signal prior to the subsequent resizing operation. The resizing block yields the lower-resolution video signal that serves as input to an encoder. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.

Downscaler comparison

I’m not sure exactly what they mean by the adaptive network design here. A friend has suggested that maybe this just means fixed weights on the preprocessing block? I am, however, extremely skeptical of their claim that the results will generalize to a wide variety of resolutions. Avoiding overfitting here would be fairly challenging, and there’s nothing in the post that inspires confidence they managed to overcome those difficulties. They hand-wave this away, but it seems critical to the entire project.

During training, our goal is to generate the best downsampled representation such that, after upscaling, the mean squared error is minimized. Since we cannot directly optimize for a conventional video codec, which is non-differentiable, we exclude the effect of lossy compression in the loop. We focus on a robust downscaler that is trained given a conventional upscaler, like bicubic. Our training approach is intuitive and results in a downscaler that is not tied to a specific encoder or encoding implementation. Nevertheless, it requires a thorough evaluation to demonstrate its potential for broad use for Netflix encoding.

Finally some details! I was curious how they’d solve the lack of a reference when training a downscaling model, and this sort of explains it; they optimized for PSNR when upscaled back to the original resolution, post-downscaling. My immediate thoughts upon reading this:

  1. Hrm, PSNR isn’t great[3].
  2. Which bicubic are we actually talking about? This is not filling me with confidence that the author knows much about video.
  3. So this is like an autoencoder, but with the decoder replaced with bicubic upscaling?
  4. Doesn’t that mean the second your TV decides to upscale with bilinear this all falls apart?
  5. Does Netflix actually reliably control the upscaling method on client devices[4]? They went out of their way to specify earlier that the project assumed no changes to the clients, after all!
  6. I wouldn’t call this intuitive. To be honest, it sounds kind of dumb and brittle.
  7. Not tying this to a particular encoder is sensible, but their differentiability reason makes no sense.

The weirdest part here is that the problem, formulated in this way, actually has a closed-form solution, and I bet it’s a lot faster to run than a neural net! ML is potentially good in more ambiguous scenarios, but here you’ve simplified things to the point that you can just do some math and write some code instead[5]!
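
To make the closed-form point concrete, here’s a minimal numpy sketch. It assumes the upscaler is a fixed linear operator U (bilinear here for brevity; bicubic upscaling is also a linear operator, just with different weights), in which case the MSE-optimal downscaled signal is simply the least-squares solution — the pseudoinverse of U applied to the source. PSNR is monotone in MSE, so minimizing MSE is the same as maximizing the PSNR the post reports. This is a sketch of the general idea, not Netflix’s pipeline.

import numpy as np

def linear_upscaler_matrix(n_low, n_high):
    """Build a fixed linear-interpolation upscaling operator U (n_high x n_low).
    Bicubic would just use different (still fixed, still linear) weights."""
    U = np.zeros((n_high, n_low))
    for i in range(n_high):
        p = i * (n_low - 1) / (n_high - 1)   # position in low-res coordinates
        lo, w = int(np.floor(p)), p - np.floor(p)
        hi = min(lo + 1, n_low - 1)
        U[i, lo] += 1 - w
        U[i, hi] += w
    return U

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                  # stand-in for a row of source pixels
U = linear_upscaler_matrix(32, 64)

d_naive = x[::2]                             # naive decimation as a baseline
d_star = np.linalg.pinv(U) @ x               # closed-form MSE-optimal downscale

mse = lambda d: np.mean((U @ d - x) ** 2)
print(f"naive decimation MSE: {mse(d_naive):.4f}")
print(f"least-squares MSE:    {mse(d_star):.4f}")   # never worse than the naive MSE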

Improving Netflix video quality with neural networks

The goal of the deep downscaler is to improve the end-to-end video quality for the Netflix member. Through our experimentation, involving objective measurements and subjective visual tests, we found that the deep downscaler improves quality across various conventional video codecs and encoding configurations.

Judging from the example at the start, the subjective visual tests were conducted by the dumb and blind.

For example, for VP9 encoding and assuming a bicubic upscaler, we measured an average VMAF Bjøntegaard-Delta (BD) rate gain of ~5.4% over the traditional Lanczos downscaling. We have also measured a ~4.4% BD rate gain for VMAF-NEG. We showcase an example result from one of our Netflix titles below. The deep downscaler (red points) delivered higher VMAF at similar bitrate or yielded comparable VMAF scores at a lower bitrate.

Again, what’s the actual upscaling filter being used? And while I’m glad the VMAF is good, the result looks terrible! This means the VMAF is wrong. But also, the whole reason they’re following up with VMAF is because PSNR is not great and everyone knows it; it’s just convenient to calculate. Finally, how does VP9 come into play here? I’m assuming they’re encoding the downscaled video before upscaling, but the details matter a lot.

Besides objective measurements, we also conducted human subject studies to validate the visual improvements of the deep downscaler. In our preference-based visual tests, we found that the deep downscaler was preferred by ~77% of test subjects, across a wide range of encoding recipes and upscaling algorithms. Subjects reported a better detail preservation and sharper visual look. A visual example is shown below. [note: example is the one from above]

And wow, coincidentally, fake detail and oversharpening are common destructive behaviors from ML-based filtering that unsophisticated users will “prefer” despite making the video worse. If this is the bar, just run Warpsharp on everything and call it a day[6]; I’m confident you’ll get a majority of people to say it looks better.

This example also doesn’t mention what resolution the video was downscaled to, so it’s not clear if this is even representative of actual use-cases. Once again, there are no real details about how the tests were conducted, so I have no way to judge whether the experiment structure made sense.

We also performed A/B testing to understand the overall streaming impact of the deep downscaler, and detect any device playback issues. Our A/B tests showed QoE improvements without any adverse streaming impact. This shows the benefit of deploying the deep downscaler for all devices streaming Netflix, without playback risks or quality degradation for our members.

Translating out the jargon, this means they didn’t have a large negative effect on compressibility. This is unsurprising.

How do we apply neural networks at scale efficiently?

Given our scale, applying neural networks can lead to a significant increase in encoding costs. In order to have a viable solution, we took several steps to improve efficiency.

Yes, which is why the closed-form solution almost certainly is faster.

The neural network architecture was designed to be computationally efficient and also avoid any negative visual quality impact. For example, we found that just a few neural network layers were sufficient for our needs. To reduce the input channels even further, we only apply NN-based scaling on luma and scale chroma with a standard Lanczos filter.

OK cool, so it’s not actually deep. Why should words have meaning, after all? Only needing a couple layers is not too shocking when, again, there’s a closed-form solution available.

Also, while applying this to only the luma is potentially a nice idea, if it’s shifting the brightness around you can get very weird results. I imagine this is what causes the ‘fake detail’ in the example above.

We implemented the deep downscaler as an FFmpeg-based filter that runs together with other video transformations, like pixel format conversions. Our filter can run on both CPU and GPU. On a CPU, we leveraged oneDnn to further reduce latency.

OK sure, everything there runs on FFmpeg so why not this too.

Integrating neural networks into our next-generation encoding platform

The Encoding Technologies and Media Cloud Engineering teams at Netflix have jointly innovated to bring Cosmos, our next-generation encoding platform, to life. Our deep downscaler effort was an excellent opportunity to showcase how Cosmos can drive future media innovation at Netflix. The following diagram shows a top-down view of how the deep downscaler was integrated within a Cosmos encoding microservice.

Downscaler comparison

Buzzword buzzword buzzword buzzword buzzword. I especially hate “encoding stratum function”.

A Cosmos encoding microservice can serve multiple encoding workflows. For example, a service can be called to perform complexity analysis for a high-quality input video, or generate encodes meant for the actual Netflix streaming. Within a service, a Stratum function is a serverless layer dedicated to running stateless and computationally-intensive functions. Within a Stratum function invocation, our deep downscaler is applied prior to encoding. Fueled by Cosmos, we can leverage the underlying Titus infrastructure and run the deep downscaler on all our multi-CPU/GPU environments at scale.

Why is this entire section here? This should all have been deleted. Also, once again, buzzword buzzword buzzword buzzword buzzword.

What lies ahead

The deep downscaler paves the path for more NN applications for video encoding at Netflix. But our journey is not finished yet and we strive to improve and innovate. For example, we are studying a few other use cases, such as video denoising. We are also looking at more efficient solutions to applying neural networks at scale. We are interested in how NN-based tools can shine as part of next-generation codecs. At the end of the day, we are passionate about using new technologies to improve Netflix video quality. For your eyes only!

I’m not sure a downscaler that takes a problem with a closed-form solution and produces terrible results paves the way for much of anything except more buzzword spam. I look forward to seeing what they will come up with for denoising!


Thanks to Roger Clark and Will Overman for reading a draft of this post. Errors are of course my own.

]]>
https://redvice.org/2025/encoding-requires-eyes/ hacker-news-small-sites-43201720 Fri, 28 Feb 2025 04:33:26 GMT
<![CDATA[macOS Tips and Tricks (2022)]]> thread link) | @pavel_lishin
February 27, 2025 | https://saurabhs.org/macos-tips | archive.org

Unable to extract article]]>
https://saurabhs.org/macos-tips hacker-news-small-sites-43201417 Fri, 28 Feb 2025 03:34:14 GMT
<![CDATA[Putting Andrew Ng's OCR models to the test]]> thread link) | @ritvikpandey21
February 27, 2025 | https://www.runpulse.com/blog/putting-andrew-ngs-ocr-models-to-the-test | archive.org

February 27, 2025

3 min read

Putting Andrew Ng’s OCR Models to The Test

Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X (link here). At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming to say the least, and suffer from many of the same issues we see when simply dumping documents into GPT or Claude.

Our engineering team, along with many X users, discovered alarming issues when testing complex financial statements:

  • Over 50% hallucinated values in complex financial tables
  • Missing negative signs and currency markers
  • Completely fabricated numbers in several instances
  • 30+ second processing times per document

Ground Truth

Andrew Ng OCR Output

Pulse Output

When financial decisions worth millions depend on accurate extraction, these errors aren't just inconvenient – they're potentially catastrophic.

Let’s run through some quick math: in a typical enterprise scenario with 1,000 pages containing 200 elements per page (usually repeated over tens of thousands of documents), even 99% accuracy still means 2,000 incorrect entries. That's 2,000 potential failure points that can completely compromise a data pipeline. Our customers have consistently told us they need over 99.9% accuracy for mission-critical operations. With probabilistic LLM models, each extraction introduces a new chance for error, and these probabilities compound across thousands of documents, making the failure rate unacceptably high for real-world applications where precision is non-negotiable.
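
The arithmetic behind that claim is easy to check. A quick back-of-the-envelope sketch (assuming, for simplicity, independent per-element errors — real error patterns are messier):

import math

pages, elements_per_page, accuracy = 1_000, 200, 0.99

expected_errors = pages * elements_per_page * (1 - accuracy)
p_clean_page = accuracy ** elements_per_page          # every element on a page correct
log10_p_clean_batch = pages * elements_per_page * math.log10(accuracy)

print(f"expected incorrect entries: {expected_errors:,.0f}")      # 2,000
print(f"chance a single page is error-free: {p_clean_page:.1%}")  # ~13.4%
print(f"chance the whole batch is error-free: ~1e{log10_p_clean_batch:.0f}")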

As we've detailed in our previous viral blog post, using LLMs alone for document extraction creates fundamental problems. Their nondeterministic nature means you'll get different results on each run. Their low spatial awareness makes them unsuitable for complex layouts in PDFs and slides. And their processing speed presents serious bottlenecks for large-scale document processing.

At Pulse, we've taken a different approach that delivers:

  • Accurate extraction with probability of errors slowly approaching 0
  • Complete table, chart and graph data preservation
  • Low-latency processing time per document

Our solution combines proprietary table transformer models built from the ground up with traditional computer vision algorithms. We use LLMs only for specific, controlled tasks where they excel – not as the entire extraction pipeline. 

If your organization processes financial, legal, or healthcare documents at scale and needs complete reliability (or really any industry where accuracy is non-negotiable), we'd love to show you how Pulse can transform your workflow.

Book a demo here to see the difference for yourself.

]]>
https://www.runpulse.com/blog/putting-andrew-ngs-ocr-models-to-the-test hacker-news-small-sites-43201001 Fri, 28 Feb 2025 02:24:04 GMT
<![CDATA[Crossing the uncanny valley of conversational voice]]> thread link) | @nelwr
February 27, 2025 | https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo | archive.org

February 27, 2025

Brendan Iribe, Ankit Kumar, and the Sesame team

How do we know when someone truly understands us? It is rarely just our words—it is in the subtleties of voice: the rising excitement, the thoughtful pause, the warm reassurance.

Voice is our most intimate medium as humans, carrying layers of meaning through countless variations in tone, pitch, rhythm, and emotion.

Today’s digital voice assistants lack essential qualities to make them truly useful. Without unlocking the full power of voice, they cannot hope to effectively collaborate with us. A personal assistant who speaks only in a neutral tone has difficulty finding a permanent place in our daily lives after the initial novelty wears off.

Over time this emotional flatness becomes more than just disappointing—it becomes exhausting.

Achieving voice presence

At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued. We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding.

Key components

  • Emotional intelligence: reading and responding to emotional contexts.
  • Conversational dynamics: natural timing, pauses, interruptions and emphasis.
  • Contextual awareness: adjusting tone and style to match the situation.
  • Consistent personality: maintaining a coherent, reliable and appropriate presence.

We’re not there yet

Building a digital companion with voice presence is not easy, but we are making steady progress on multiple fronts, including personality, memory, expressivity and appropriateness. This demo is a showcase of some of our work in conversational speech generation. The companions shown here have been optimized for friendliness and expressivity to illustrate the potential of our approach.

Conversational voice demo

1. Microphone permission is required. 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days. 3. By using this demo, you are agreeing to our Terms of Use and Privacy Policy. 4. We recommend using Chrome (Audio quality may be degraded in iOS/Safari 17.5).

Technical post

Authors

Johan Schalkwyk, Ankit Kumar, Dan Lyth, Sefik Emre Eskimez, Zack Hodari, Cinjon Resnick, Ramon Sanabria, Raven Jiang

To create AI companions that feel genuinely interactive, speech generation must go beyond producing high-quality audio—it must understand and adapt to context in real time. Traditional text-to-speech (TTS) models generate spoken output directly from text but lack the contextual awareness needed for natural conversations. Even though recent models produce highly human-like speech, they struggle with the one-to-many problem: there are countless valid ways to speak a sentence, but only some fit a given setting. Without additional context—including tone, rhythm, and history of the conversation—models lack the information to choose the best option. Capturing these nuances requires reasoning across multiple aspects of language and prosody.

To address this, we introduce the Conversational Speech Model (CSM), which frames the problem as an end-to-end multimodal learning task using transformers. It leverages the history of the conversation to produce more natural and coherent speech. There are two key takeaways from our work. The first is that CSM operates as a single-stage model, thereby improving efficiency and expressivity. The second is our evaluation suite, which is necessary for evaluating progress on contextual capabilities and addresses the fact that common public evaluations are saturated.

Background

One approach to modeling audio with transformers is to convert continuous waveforms into discrete audio token sequences using tokenizers. Most contemporary approaches ([1], [2]) rely on two types of audio tokens:

  1. Semantic tokens: Compact speaker-invariant representations of semantic and phonetic features. Their compressed nature enables them to capture key speech characteristics at the cost of high-fidelity representation.
  2. Acoustic tokens: Encodings of fine-grained acoustic details that enable high-fidelity audio reconstruction. These tokens are often generated using Residual Vector Quantization (RVQ) [2]. In contrast to semantic tokens, acoustic tokens retain natural speech characteristics like speaker-specific identity and timbre.

A common strategy first models semantic tokens and then generates audio using RVQ or diffusion-based methods. Decoupling these steps allows for a more structured approach to speech synthesis—the semantic tokens provide a compact, speaker-invariant representation that captures high-level linguistic and prosodic information, while the second stage reconstructs the fine-grained acoustic details needed for high-fidelity speech. However, this approach has a critical limitation: semantic tokens are a bottleneck that must fully capture prosody, but ensuring this during training is challenging.

RVQ-based methods introduce their own set of challenges. Models must account for the sequential dependency between codebooks in a frame. One method, the delay pattern (figure below) [3], shifts higher codebooks progressively to condition predictions on lower codebooks within the same frame. A key limitation of this approach is that the time-to-first-audio scales poorly because an RVQ tokenizer with N codebooks requires N backbone steps before decoding the first audio chunk. While suitable for offline applications like audiobooks, this delay is problematic in a real-time scenario.

Example of delayed pattern generation in an RVQ tokenizer with 4 codebooks
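
As a rough illustration of the delay pattern described above (a sketch in the spirit of [3], not the exact layout any particular model uses): codebook k is shifted right by k steps, so a frame’s higher codebooks are only predicted after its zeroth codebook, at the cost of extra backbone steps before the first complete frame.

import numpy as np

N_CODEBOOKS, T, PAD = 4, 6, -1
tokens = np.arange(N_CODEBOOKS * T).reshape(N_CODEBOOKS, T)  # dummy RVQ tokens

# Delay layout: codebook k is shifted right by k positions and padded elsewhere.
delayed = np.full((N_CODEBOOKS, T + N_CODEBOOKS - 1), PAD)
for k in range(N_CODEBOOKS):
    delayed[k, k:k + T] = tokens[k]

print(delayed)
# Decoding frame t requires steps t .. t+N-1, so time-to-first-audio grows with
# the number of codebooks -- the latency problem the post points out.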

Conversational Speech Model

CSM is a multimodal, text and speech model that operates directly on RVQ tokens. Inspired by the RQ-Transformer [4], we use two autoregressive transformers. Different from the approach in [5], we split the transformers at the zeroth codebook. The first multimodal backbone processes interleaved text and audio to model the zeroth codebook. The second audio decoder uses a distinct linear head for each codebook and models the remaining N – 1 codebooks to reconstruct speech from the backbone’s representations. The decoder is significantly smaller than the backbone, enabling low-latency generation while keeping the model end-to-end.

CSM model inference process. Text (T) and audio (A) tokens are interleaved and fed sequentially into the Backbone, which predicts the zeroth level of the codebook. The Decoder then samples levels 1 through N – 1 conditioned on the predicted zeroth level. The reconstructed audio token (A) is then autoregressively fed back into the Backbone for the next step, continuing until the audio EOT symbol is emitted. This process begins again on the next inference request, with the interim audio (such as a user utterance) being represented by interleaved audio and text transcription tokens.
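
Read as pseudocode, the inference loop described above looks roughly like the sketch below. The two model calls are stubs (the real backbone and decoder are the Llama-style transformers described next); the codebook count, vocabulary size, and stopping rule are placeholders, not Sesame’s actual values.

import numpy as np

rng = np.random.default_rng(0)
N_CODEBOOKS, VOCAB, EOT = 8, 1024, -1   # placeholder sizes and end-of-turn marker

def backbone_step(history):
    """Stub for the multimodal backbone: attends over interleaved text/audio
    tokens and predicts the zeroth (semantic) codebook of the next frame."""
    if len(history) >= 20:              # pretend the model emits EOT here
        return EOT
    return int(rng.integers(0, VOCAB))

def decoder_step(zeroth_token):
    """Stub for the smaller audio decoder: samples codebooks 1..N-1,
    conditioned on the backbone's zeroth-codebook prediction."""
    return [int(rng.integers(0, VOCAB)) for _ in range(N_CODEBOOKS - 1)]

history = ["<interleaved text/audio context>"]  # prompt + conversation so far
frames = []
while True:
    zeroth = backbone_step(history)
    if zeroth == EOT:
        break
    frame = [zeroth] + decoder_step(zeroth)     # one full RVQ frame
    frames.append(frame)
    history.append(frame)                       # fed back autoregressively

print(f"generated {len(frames)} frames x {N_CODEBOOKS} codebooks")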

Both transformers are variants of the Llama architecture. Text tokens are generated via a Llama tokenizer [6], while audio is processed using Mimi, a split-RVQ tokenizer, producing one semantic codebook and N – 1 acoustic codebooks per frame at 12.5 Hz. [5] Training samples are structured as alternating interleaved patterns of text and audio, with speaker identity encoded directly in the text representation.

Compute amortization

This design introduces significant infrastructure challenges during training. The audio decoder processes an effective batch size of B × S and N codebooks autoregressively, where B is the original batch size, S is the sequence length, and N is the number of RVQ codebook levels. This high memory burden, even with a small model, slows down training, limits model scaling, and hinders rapid experimentation, all of which are crucial for performance.

To address these challenges, we use a compute amortization scheme that alleviates the memory bottleneck while preserving the fidelity of the full RVQ codebooks. The audio decoder is trained on only a random 1/16 subset of the audio frames, while the zeroth codebook is trained on every frame. We observe no perceivable difference in audio decoder losses during training when using this approach.

Amortized training process. The backbone transformer models the zeroth level across all frames (highlighted in blue), while the decoder predicts the remaining N – 1 levels, but only for a random 1/16th of the frames (highlighted in green). The top section highlights the specific frames modeled by the decoder for which it receives loss.
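
A minimal sketch of the frame-subsampling idea follows. The shapes and the 1/16 ratio come from the text; the per-frame losses here are random stand-ins for the real cross-entropy over transformer logits.

import numpy as np

rng = np.random.default_rng(0)
B, S, N = 4, 256, 8                       # batch, frames per sequence, codebooks

# Stand-in per-frame, per-codebook losses (real training would compute these
# from the backbone/decoder logits).
frame_losses = rng.random((B, S, N))

# Backbone: the zeroth codebook gets a loss on every frame.
backbone_loss = frame_losses[:, :, 0].mean()

# Decoder: only a random 1/16 of the frames contribute to the loss on
# codebooks 1..N-1, which shrinks the effective B*S*N memory footprint.
subset = rng.choice(S, size=S // 16, replace=False)
decoder_loss = frame_losses[:, subset, 1:].mean()

print(f"backbone frames used: {S}, decoder frames used: {len(subset)}")
print(f"backbone loss {backbone_loss:.3f}, decoder loss {decoder_loss:.3f}")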

Experiments

Dataset: We use a large dataset of publicly available audio, which we transcribe, diarize, and segment. After filtering, the dataset consists of approximately one million hours of predominantly English audio.

Model Sizes: We trained three model sizes, delineated by the backbone and decoder sizes:

  • Tiny: 1B backbone, 100M decoder
  • Small: 3B backbone, 250M decoder
  • Medium: 8B backbone, 300M decoder

Each model was trained with a 2048 sequence length (~2 minutes of audio) over five epochs.

Samples

Paralinguistics

Sentences from Base TTS

Foreign words

Sentences from Base TTS

Contextual expressivity

Samples from Expresso, continuation after chime

Pronunciation correction

Pronunciation correction sentence is a recording, all other audio is generated.

Conversations with multiple speakers

Single generation using audio prompts from two speakers

Evaluation

Our evaluation suite measures model performance across four key aspects: faithfulness to text, context utilization, prosody, and latency. We report both objective and subjective metrics—objective benchmarks include word error rate and novel tests like homograph disambiguation, while subjective evaluation relies on a Comparative Mean Opinion Score (CMOS) human study using the Expresso dataset.

Objective metrics

Traditional benchmarks, such as word error rate (WER) and speaker similarity (SIM), have become saturated—modern models, including CSM, now achieve near-human performance on these metrics.

Objective metric results for Word Error Rate (top) and Speaker Similarity (bottom) tests, showing the metrics are saturated (matching human performance).

To better assess pronunciation and contextual understanding, we introduce a new set of phonetic transcription-based benchmarks.

  • Text understanding through Homograph Disambiguation: Evaluates whether the model correctly pronounced different words with the same orthography (e.g., “lead” /lɛd/ as in “metal” vs. “lead” /liːd/ as in “to guide”).
  • Audio understanding through Pronunciation Continuation Consistency: Evaluates whether the model maintains pronunciation consistency of a specific word with multiple pronunciation variants in multi-turn speech. One example is “route” (/raʊt/ or /ruːt/), which can vary based on region of the speaker and context.

Objective metric results for Homograph Disambiguation (left) and Pronunciation Consistency (right) tests, showing the accuracy percentage for each model’s correct pronunciation. Play.ht, Elevenlabs, and OpenAI generations were made with default settings and voices from their respective API documentation.

The graph above compares objective metric results across three model sizes. For Homograph accuracy we generated 200 speech samples covering 5 distinct homographs—lead, bass, tear, wound, row—with 2 variants for each and evaluated pronunciation consistency using wav2vec2-lv-60-espeak-cv-ft. For Pronunciation Consistency we generated 200 speech samples covering 10 distinct words that have common pronunciation variants—aunt, data, envelope, mobile, route, vase, either, adult, often, caramel.
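
For a sense of how such a benchmark can be scored, here is a toy sketch. It assumes you already have a phonetic transcription of each generated sample (the post uses wav2vec2-lv-60-espeak-cv-ft for that step); the expected phoneme strings and the substring-matching rule are illustrative, not Sesame’s exact procedure.

# Expected pronunciations for one homograph pair (IPA strings are illustrative).
EXPECTED = {
    ("lead", "metal"): "lɛd",
    ("lead", "guide"): "liːd",
}

# Each sample: the target word/sense plus a phonetic transcription of the
# generated audio (in practice produced by a phoneme-recognition model).
samples = [
    {"word": "lead", "sense": "metal", "phonemes": "ðə lɛd paɪp"},
    {"word": "lead", "sense": "guide", "phonemes": "liːd ðə weɪ"},
    {"word": "lead", "sense": "guide", "phonemes": "lɛd ðə weɪ"},  # a miss
]

correct = sum(
    EXPECTED[(s["word"], s["sense"])] in s["phonemes"] for s in samples
)
print(f"homograph accuracy: {correct}/{len(samples)} = {correct / len(samples):.0%}")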

In general, we observe that performance improves with larger models, supporting our hypothesis that scaling enhances the synthesis of more realistic speech.

Subjective metrics

We conducted two Comparative Mean Opinion Score (CMOS) studies using the Expresso dataset to assess the naturalness and prosodic appropriateness of generated speech for CSM-Medium. Human evaluators were presented with pairs of audio samples—one generated by the model and the other a ground-truth human recording. Listeners rated the generated sample on a 7-point preference scale relative to the reference. Expresso’s diverse expressive TTS samples, including emotional and prosodic variations, make it a strong benchmark for evaluating appropriateness to context.

In the first CMOS study we presented the generated and human audio samples with no context and asked listeners to “choose which rendition feels more like human speech.” In the second CMOS study we also provide the previous 90 seconds of audio and text context, and ask the listeners to “choose which rendition feels like a more appropriate continuation of the conversation.” Eighty people were paid to participate in the evaluation and rated on average 15 examples each.

Subjective evaluation results on the Expresso dataset. No context: listeners chose “which rendition feels more like human speech” without knowledge of the context. Context: listeners chose “which rendition feels like a more appropriate continuation of the conversation” with audio and text context. 50:50 win–loss ratio suggests that listeners have no clear preference.

The graph above shows the win-rate of ground-truth human recordings vs CSM-generated speech samples for both studies. Without conversational context (top), human evaluators show no clear preference between generated and real speech, suggesting that naturalness is saturated. However, when context is included (bottom), evaluators consistently favor the original recordings. These findings suggest a noticeable gap remains between generated and human prosody in conversational speech generation.

Open-sourcing our work

We believe that advancing conversational AI should be a collaborative effort. To that end, we’re committed to open-sourcing key components of our research, enabling the community to experiment, build upon, and improve our approach. Our models will be available under an Apache 2.0 license.

Limitations and future work

CSM is currently trained on primarily English data; some multilingual ability emerges due to dataset contamination, but it does not perform well yet. It also does not take advantage of the information present in the weights of pre-trained language models.

In the coming months, we intend to scale up model size, increase dataset volume, and expand language support to over 20 languages. We also plan to explore ways to utilize pre-trained language models, working towards large multimodal models that have deep knowledge of both speech and text.

Ultimately, while CSM generates high quality conversational prosody, it can only model the text and speech content in a conversation—not the structure of the conversation itself. Human conversations are a complex process involving turn taking, pauses, pacing, and more. We believe the future of AI conversations lies in fully duplex models that can implicitly learn these dynamics from data. These models will require fundamental changes across the stack, from data curation to post-training methodologies, and we’re excited to push in these directions.

Join us

If you’re excited about building the most natural, delightful, and inspirational voice interfaces out there, reach out—we’re hiring. Check our open roles.

]]>
https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo hacker-news-small-sites-43200400 Fri, 28 Feb 2025 00:55:00 GMT
<![CDATA[World-first experimental cancer treatment paves way for clinical trial]]> thread link) | @femto
February 27, 2025 | https://www.wehi.edu.au/news/world-first-experimental-cancer-treatment-paves-way-for-clinical-trial/ | archive.org

An Australian-led international clinical trial will scientifically investigate the efficacy of the approach within a large cohort of eligible glioblastoma patients and could commence within a year.

The study will trial the use of double immunotherapy. In some patients, double immunotherapy will be combined with chemotherapy.

The trial will be led by The Brain Cancer Centre, which has world-leading expertise in glioblastoma.

“I am delighted to be handing the baton to Dr Jim Whittle, a leading Australian neuro-oncologist at Peter MacCallum Cancer Centre, The Royal Melbourne Hospital and Co-Head of Research Strategy at The Brain Cancer Centre, to commence a broader scientific study to scientifically determine if – and how – this process might work in treating glioblastoma,” said Prof Long, who also secured drug access for the clinical trial.

“While we are buoyed by the results of this experimental treatment so far, a clinical trial in a large group of patients would need to happen before anyone could consider it a possible breakthrough.”

Dr Whittle, also a laboratory head at WEHI, said: “We are pleased to be able to build on this exciting work by diving into the process of designing a clinical trial, which takes time, care and accuracy.

“When that process is complete, the result will be a world first clinical trial that enables us to thoroughly test the hypothesis against a representative sample of patients.”

The Brain Cancer Centre was founded by Carrie’s Beanies 4 Brain Cancer and established in partnership with WEHI with support from the Victorian Government.

The centre brings together a growing network of world-leading oncologists, immunologists, neurosurgeons, bioinformaticians and cancer biologists.

Commencement of recruitment for the clinical trial will be announced by The Brain Cancer Centre at a later date and will be limited to eligible patients.

]]>
https://www.wehi.edu.au/news/world-first-experimental-cancer-treatment-paves-way-for-clinical-trial/ hacker-news-small-sites-43199210 Thu, 27 Feb 2025 22:24:22 GMT
<![CDATA[Logs Matter More Than Metrics]]> thread link) | @mathewpregasen
February 27, 2025 | https://www.hyperdx.io/blog/logs-matter-more-than-metrics | archive.org

Disclosure: I run an observability company, so this post is subject to some (heavy) bias. However, it also underscores why I wanted to work on HyperDX.

Metrics matter. Logs matter more.

But that’s not how most developers see it. Developers love metrics. It’s something that they put care and attention into. Developers call meetings to figure out how to implement and interpret metrics. They are readily shown to new hires—colorful dashboards with sprawling statistics measuring CPU, memory, and network health. Once, when demoing my product, I was told by an engineering director, “This is cool, but where are the fancy charts?”

Logs get none of that hype. They are the ugly stepchild of observability. They get implemented, but with the attitude that you’d treat a necessary evil. They don’t get meetings dedicated to them. They’re never flaunted to new hires. They just exist, quietly recording events in the background.

Here’s the irony: while metrics might have the aesthetic of a complex system, logs are more useful 80% of the time. When an incorrect behavior emerges, logs are more likely to explain what happened than any metrics. Logs—particularly logs with high cardinality—provide a detailed recollection. They feature no dimension reduction. And metrics, by definition, do. They are just a crude read of a bug’s effect on an application.

Not All Logs Are Created Equal

The importance of logs is partially diminished because they are poorly implemented in many organizations. The difference between a good log and a great log is striking.

Great logs are those with attributes that can tie an event to the source of the issue (e.g. a user_id, payment, host, etc.). This is often framed as logs with high cardinality. High cardinality means that the log includes multiple fields containing unique values. For example, a front-end logged event might include a session ID, a request ID, a user ID, an organization ID, a payment ID, a timestamp, and a network trace. High cardinality like this is a heuristic for a log actually being useful in the case of an error.
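
For illustration, a high-cardinality log event in that spirit might be emitted as one JSON object per line, with identifiers attached as structured fields rather than baked into the message string. This is a minimal sketch; the field names are illustrative, not a schema.

import json
import logging
import sys
import time
import uuid

logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

def log_event(message, **fields):
    # One JSON object per line keeps every field individually queryable later.
    log.info(json.dumps({"ts": time.time(), "msg": message, **fields}))

log_event(
    "payment captured",
    request_id=str(uuid.uuid4()),
    session_id="sess_8f2c1b",
    user_id=42,
    org_id="org_713",
    payment_id="pay_a91d",
    host="api-03",
    duration_ms=187,
)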

Tricky Bugs Where Logs Are the Saving Grace

I have two contrasting examples that illustrate the value of logs.

The Socket Timeout

A while ago, we had a weird issue with sockets—customers reported certain queries would unpredictably time-out. On our dashboard, there were no reports of failed ClickHouse queries—however, customers failed to get data that originated in ClickHouse. Looking through our traces associated with those specific customers and timestamps, we discovered the error: The ClickHouse query succeeded, but the load balancer’s socket timed out before ClickHouse could reply. This was obvious by comparing the timestamps of the socket and the ClickHouse response, as well as observing the corresponding error returned within our API.

Using the logs, we were able to correlate the types of requests that would lead to the same behavior. Additionally, on the ClickHouse side, we could determine what query properties caused sluggish performance. None of these details could have been recovered from a spurious failure metric.

Glofox Fake “DDoS”

Pierre Vincent has a fantastic developer talk (opens in a new tab) at InfoQ’s Dev Summit (opens in a new tab) where he discusses logs versus metrics. Pierre works at Glofox (opens in a new tab), a gym management software company. A few years ago, they experienced an incident that highlighted how metrics could be misleading in the absence of great logs.

Because Glofox creates gym software, the pandemic significantly impacted their product’s usage. Gyms suddenly closed (and subsequently opened) on government orders. On one of these reopening dates, Glofox experienced a massive surge in requests, which lit up metrics.

Through metrics, Glofox appeared to be suffering from a DDoS attack originating in Singapore. The easy remedy would be blocking all the IPs dispatching thousands of requests. Singapore was also reopening gyms that day, and Pierre suspected the incident wasn’t actually an attack. But it also wasn’t just returning users; the requests were overwhelming.

By diving through logs, Glofox’s engineering team nailed the culprit: Glofox’s front-end had a bug where lengthy sessions would dispatch more and more requests due to an unintentional JS loop. Many of Glofox’s Singaporean customers had been shut down for months but had minimized tabs. By reopening these tabs, Glofox’s back end was inundated by months of quarantined requests, which imitated a DDoS attack.

Only because of logs was Glofox able to diagnose the problem and devise a quick remedy that enabled their application to persist on one of the most important days of the year.

Developer Religions

I’ll admit this debate hinges on some concept of developer religions—the idea that developers, myself included, have strong beliefs because of some hypothetical ideal. Some developers swear by the importance of metrics; I care more about capturing high cardinality data through logs.

But to be clear, it is ridiculous to believe one should exist at the demise of the other. It’s more a matter of foundations. In my worldview, high cardinality should be the north star for building a good observability stack; metrics should follow.

Funnily enough, I hold the opposite belief regarding our marketing strategy. For marketing, I care more about metrics than individual stories. That’s because marketing is an optimizing-outcomes problem—strategies succeed or fail on the basis of an aggregate. That mindset doesn’t hold when it comes to development, where the goal is to eliminate issues that any user is facing.

A Closing Thought

Logs matter. They matter in the same vein that testing matters, CI/CD matters, security matters. Without good logs, errors turn from nuisances to headaches. So the next time your team brings up the importance of metrics, push aside the hype of fancy charts and spend time improving your logs. Of course, you can take my opinion with a grain of salt—I run an observability company that’s built on good logs—but there’s a reason that I ended up in this space.

]]>
https://www.hyperdx.io/blog/logs-matter-more-than-metrics hacker-news-small-sites-43199096 Thu, 27 Feb 2025 22:11:47 GMT
<![CDATA[OpenCloud 1.0]]> thread link) | @doener
February 27, 2025 | https://opencloud.eu/en/news/opencloud-now-available-new-open-source-alternative-microsoft-sharepoint | archive.org

Unable to retrieve article]]>
https://opencloud.eu/en/news/opencloud-now-available-new-open-source-alternative-microsoft-sharepoint hacker-news-small-sites-43198572 Thu, 27 Feb 2025 21:13:42 GMT
<![CDATA[Show HN: Prompting LLMs in Bash scripts]]> thread link) | @chilipepperhott
February 27, 2025 | https://elijahpotter.dev/articles/prompting_large_language_models_in_bash_scripts | archive.org

I’ve been experimenting with using LLMs locally for generating datasets to test Harper against. I might write a blog post about the technique (which I am grandiosely calling “LLM-assisted fuzzing”), but I’m going to make you wait.

I’ve written a little tool called ofc that lets you insert Ollama into your bash scripts. I think it’s pretty neat, since it (very easily) lets you do some pretty cool things.

For example, you can swap out the system prompt, so if you want to compare behavior across prompts, you can just toss it in a loop:

#!/bin/bash

subreddits=("r/vscode" "r/neovim" "r/wallstreetbets")


for subreddit in "${subreddits[@]}"; do
  echo "++++++++ BEGIN $subreddit ++++++++"
  ofc --system-prompt "Assume the persona of a commenter of $subreddit" "What is your opinion on pepperjack cheese."
  cat
done

Or, you can instruct a model to prompt itself:

ofc --system-prompt "$(ofc "Write a prompt for a large language model that makes it think harder. ")" "What is a while loop?"

ofc is installable from either crates.io or its repository.

cargo install ofc --locked


cargo install --git https://github.com/elijah-potter/ofc --locked
]]>
https://elijahpotter.dev/articles/prompting_large_language_models_in_bash_scripts hacker-news-small-sites-43197752 Thu, 27 Feb 2025 19:46:55 GMT
<![CDATA[Accessing region-locked iOS features, such as EU app stores]]> thread link) | @todsacerdoti
February 27, 2025 | https://downrightnifty.me/blog/2025/02/27/eu-features-outside.html | archive.org

The European Union's Digital Markets Act obligates Apple to provide certain features to iOS users in the EU, such as third party app stores. I live in the US and was able to develop a relatively-straightforward method to spoof your location on iOS and access these features, as well as any other region-locked iOS features you might be interested in experimenting with, even if you aren't in the required region.

If you look at the reverse engineered documentation, it would seem to be difficult to fool Apple's countryd service, since it uses almost all available hardware radios to determine your location – GPS, Wi-Fi, Bluetooth, and cellular. However, Apple has developed a "priority" system, roughly ranking the relative reliability of each location determination method. Since Location Services has the highest priority value, if it returns a location result, the results from the other methods seem to be ignored. Location Services relies solely on GPS and nearby Wi-Fi access points if Airplane Mode is enabled (and Wi-Fi re-enabled). Therefore, if you can spoof Wi-Fi geolocation (or if you can spoof GPS), then you can access region-locked features from anywhere, even on the iPhone with its wide array of radios.

On non-cellular iPad models, it has the potential to be even easier, because they only use Location Services (which can be disabled), or Wi-Fi country codes (which can be trivially spoofed). I was able to get this spoofing method working as well. However, it's not covered here.

I tested this with:

  • 2 ESP32 units creating 25 spoofed networks each (total 50)
  • iOS 18.2.1 on an iPhone 15, and an iPad Mini 6th gen

I was amazed at how consistent and reliable spoofing is, especially accounting for the low cost of the hardware involved and the simplicity of the spoofing software and method.

Most of the work was already done by Lagrange Point and Adam Harvey, developer of the Skylift tool. I was inspired by Lagrange Point's article to experiment with this and to reproduce their results. Check out their article on enabling Hearing Aid mode on AirPods in unsupported regions!

Please note that Apple could make the checks more difficult to bypass in the future through iOS updates. They don't have much of a reason to, since the current system is most likely more than sufficient to deter the average user from doing this, but it's very possible.

Contents

Procedure

What you'll need

  • Some experience with the command line
  • An iOS/iPadOS device with a USB-C port (recent iPads, or iPhone 15+)
    • You might be able to make it work on a Lightning iPhone, but it's much easier with a USB-C port + hub
  • A USB-C hub with Ethernet, HDMI out, and several USB ports
  • A USB keyboard and mouse
  • A USB-C extension cable
  • A display with HDMI input
  • One or two "faraday pouches"; make sure one is large enough to hold your device, and if buying a second make sure it's large enough to hold the other one
    • Any other faraday cage setup allowing only the tip of a single USB-C cable to break through the cage will work too, but these pouches make it easy
    • In my testing, using two pouches did reduce the number of external Wi-Fi networks appearing on the Wi-Fi list to zero, but I was still able to make it work with only one pouch – WMMV
  • A router that you can install a VPN on
    • You'll need to plug the router directly in to the device via an Ethernet cable, so a secondary/portable router is preferred
  • Access to a VPN service with an option to place yourself in an EU country
  • One or more ESP32 dev modules (preferably at least two)
  • A small battery power supply for the ESP32 modules (a small USB power bank works)
  • A free WiGLE account

These instructions assume you're using a Unix shell, so you might have to modify some of the commands slightly if you're on Windows.

Preparing the router

  1. Install a VPN on your router placing you in your chosen target country.
  2. Select an EU member state supported by your VPN as a spoof target. I chose the Netherlands.

Preparing the device

Creating a secondary Apple ID

You can't easily change the region on your Apple ID, and you probably don't want to do that anyway. But you can create a secondary Apple ID for use only while your device thinks that it's in the EU.

  1. Enable Airplane Mode and disable Bluetooth and Wi-Fi.
  2. Connect the device to the USB-C hub, and the USB-C hub to the router via Ethernet.
  3. Change your device region to your target country in Settings → General → Language & Region → Region.
  4. Sign out of your Apple ID: Settings → Your Account → Sign Out.
    • You'll need to sign out completely (including iCloud) in order to create a new account. Your data will not be lost. When you switch accounts again in the future, you only need to sign out of the App Store ("Media & Purchases"), not iCloud as well.
  5. Create a new Apple ID.
    • You can use the same phone number that's attached to your other Apple ID, or a Google Voice number.
    • For email, you'll need to either create an iCloud email, or use a "plus-style address".
  6. Make sure the Apple ID region is correct: App Store → Your Account → Your Account → Country/Region.
  7. Install at least one free app from the App Store to initialize the account.

Getting Wi-Fi data

  1. Find a popular indoor public attraction offering free Wi-Fi within the target country using Google Maps or similar software. I chose the Rijksmuseum. Note down the GPS coordinates of the center of the building.
  2. Imagine a rectangle surrounding the building and note down the GPS coordinates of the top-left and bottom-right points.
  3. Create a free account on WiGLE.
  4. Query the WiGLE database using the online API interface with these parameters:
    1. latrange1: lesser of two latitudes you noted
    2. latrange2: greater of two latitudes you noted
    3. longrange1: lesser of two longitudes you noted
    4. longrange2: greater of two longitudes you noted
    5. closestLat: latitude of center of building
    6. closestLong: longitude of center of building
    7. resultsPerPage: 25*n where n is the number of ESP32 units you have (e.g. 50 for 2 units)
  5. Execute the request, then download the response as JSON (a scripted version of this request is sketched after this list)
  6. Clone the skylift repository:
    git clone https://github.com/DownrightNifty/skylift
    
  7. Set up skylift:
    cd skylift/
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    pip install setuptools
    python setup.py develop
    
  8. Convert the JSON data to the format used by skylift:
    # replace $PATH_TO_JSON, $TARGET_LAT, and $TARGET_LONG
    python ./extras/wigle_to_skylift.py $PATH_TO_JSON ./w2s_out $TARGET_LAT $TARGET_LONG
    
  9. Create the arduino sketch(es):
    c=1
    for file in ./w2s_out/*; do
        skylift create-sketch -i "$file" -o ./out_"$c" --max-networks 25 --board esp32
        ((c++))
    done
    
  10. Use the Arduino IDE to upload each sketch to each ESP32 unit.
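
If you’d rather script step 5 than click through the web interface, here is a sketch of the same query using Python’s requests library. The endpoint URL and the Basic-auth API name/token scheme are assumptions based on WiGLE’s public v2 API, the "results" key in the response is likewise assumed, and the coordinates are an illustrative box around the Rijksmuseum — check everything against the current WiGLE docs and substitute your own values.

import requests

# Assumed WiGLE v2 search endpoint; verify against the official API docs.
WIGLE_SEARCH = "https://api.wigle.net/api/v2/network/search"

params = {
    "latrange1": 52.3590, "latrange2": 52.3610,   # bounding box (lesser/greater lat)
    "longrange1": 4.8800, "longrange2": 4.8880,   # bounding box (lesser/greater long)
    "closestLat": 52.3600, "closestLong": 4.8852, # centre of the building
    "resultsPerPage": 50,                         # 25 * number of ESP32 units
}

resp = requests.get(
    WIGLE_SEARCH,
    params=params,
    auth=("YOUR_API_NAME", "YOUR_API_TOKEN"),     # WiGLE account API credentials
    timeout=30,
)
resp.raise_for_status()

with open("wigle.json", "w") as f:
    f.write(resp.text)                            # input for wigle_to_skylift.py
print("saved", len(resp.json().get("results", [])), "networks to wigle.json")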

Pre-generated Wi-Fi data

If you're having trouble with acquiring the data yourself, you could try using the sample data that I generated. If a large number of people start using it, I don't know if it will continue to work indefinitely, so please use your own data if possible.

The sample data can be found under the generated/ directory in my fork of Skylift.

Placing the device in the faraday pouch

  1. Before you continue, check the device's settings:
    1. Enable Airplane Mode, disable Bluetooth, and re-enable Wi-Fi.
    2. [Optional] Disable your lock screen (this makes controlling the device externally easier).
    3. [Optional] Make sure Apple Maps is allowed to access your location "while using the app": Settings → Privacy & Security → Location Services → Maps. Required because ask-for-permission prompts annoyingly don't appear on external displays.
    4. [iPhone only] Enable AssistiveTouch: Settings → Accessibility → Touch → AssistiveTouch. Required to enable mouse support on iPhone.
    5. Make sure you're signed in to the App Store with the EU Apple ID you created earlier: Settings → Your Account → Media & Purchases. Signing in to iCloud as well is unnecessary.
  2. Connect the USB-C extension cable to the device.
  3. [⚠️ Important] Insulate the ESP32 units from the metallic faraday pouch using plastic bags or something.
  4. Connect the ESP32 units to the battery.
  5. Place the device into a faraday pouch, along with the ESP32 units and their battery. Seal it as well as possible with only the tip of the cable sticking out (reducing its ability to let in radio signals).
    • If one pouch doesn't work, try using two pouches (placing one inside the other)
  6. Connect the USB-C hub to the cable. Connect the router via Ethernet, and a keyboard, mouse, and display via HDMI.

Spoofing location and unlocking EU features

Your iOS device should now only see the spoofed Wi-Fi networks, and cannot receive a GPS signal. Since we have a cable sticking out, this isn't a perfect faraday cage and it's possible that especially strong signals such as cell signals will make their way through, but that's okay.

  1. Make sure that you can control the device inside the faraday pouch using the connected keyboard, mouse, and display, and that the device has internet access through Ethernet.
  2. [Optional] Check the nearby Wi-Fi list to make sure you can only see fake Wi-Fi networks.
    • If you see one or two nearby networks, that should still be okay; the geolocation service seems to ignore irregularities like this and returns the most likely location result, considering all nearby networks.
    • 5GHz Wi-Fi is stronger than 2.4GHz. You could temporarily disable 5GHz on your main router if that helps.
  3. Disable Location Services and then re-enable it.
  4. Open Apple Maps and check to make sure it places you inside your target country.
  5. You should now have access to EU features such as third party app stores. Try installing AltStore PAL at: https://altstore.io/download

If it doesn't work the first time around, disable Location Services and re-enable it, then try again.

Caveats

"Third party app stores" != "sideloading"

I've written at length about why third party app stores aren't the same as "sideloading". Check out my new project, "Does iOS have sideloading yet?", below!

https://www.doesioshavesideloadingyet.com/

The 30 day grace period

Once you take your device out of the faraday pouch and it realizes that it's no longer in the EU, a 30-day grace period begins during which you can use EU features freely. After the grace period, certain features will become restricted. You'll still be able to use any apps from alternative app stores you downloaded, but they'll no longer receive updates.

However, you can simply repeat the location spoof process again once each month, if you want to continue to access these features.

Acknowledgements

Appendix: Notes on Apple's Eligibility system

Apple's Eligibility system has been mostly reverse engineered and documented, but I wanted to add some of my notes here for future reference.

As noted in the Lagrange Point article, you can monitor the activity of the eligibility service by monitoring the device's system logs, either through Console.app on macOS, or libimobiledevice on other platforms. This command is especially helpful:

idevicesyslog | grep RegulatoryDomain

Here's a sample output:

Here's the different location estimate methods, sorted by priority from lowest to highest:

  • WiFiAP (1): Uses the two-digit country codes of nearby Wi-Fi access points
  • ServingCellMCC (2): Uses the MCC code of the cell tower that the device is currently connected to(?)
  • NearbyCellsMCC (3): Uses the MCC codes of nearby cell towers
  • LatLonLocation (4): Uses coordinates from Location Services (GPS/Wi-Fi)

According to the Apple Wiki article:

countryd uses a mix of all signals to decide which country is the most likely physical location of the device.

However, I found that, in practice, if conflicting information is available, countryd will simply use the estimate with the highest priority.
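
In other words, the observed behaviour is closer to a simple “highest priority wins” rule than to a weighted mix of signals. A toy model of that rule is below; the method/priority table mirrors the list above, and everything else is illustrative.

# Priority values as documented above; higher wins when estimates conflict.
PRIORITY = {
    "WiFiAP": 1,
    "ServingCellMCC": 2,
    "NearbyCellsMCC": 3,
    "LatLonLocation": 4,
}

def estimate_country(estimates):
    """estimates: dict of method -> ISO country code for whatever signals were
    available. Returns the country from the highest-priority method."""
    method = max(estimates, key=lambda m: PRIORITY[m])
    return estimates[method], method

# With GPS/Wi-Fi geolocation spoofed to the Netherlands, it wins even though a
# stray cell signal leaking into the pouch still reports "US".
print(estimate_country({"NearbyCellsMCC": "US", "LatLonLocation": "NL"}))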

]]>
https://downrightnifty.me/blog/2025/02/27/eu-features-outside.html hacker-news-small-sites-43197163 Thu, 27 Feb 2025 18:45:18 GMT
<![CDATA[Show HN: Compiler Playground for energy-efficient embedded dataflow processor]]> thread link) | @keyi
February 27, 2025 | https://www.efficient.computer/resources/effcc-compiler-playground-launch | archive.org

We’re excited to announce the official launch of our effcc Compiler Playground, a new, interactive software ecosystem for our processor. For the first time, developers can now sign up to experience the performance of our breakthrough processor architecture and see first-hand how it can offer orders of magnitude greater energy efficiency.

As developers ourselves, we have experienced the frustrations of compiler friction and complexity. The effcc Compiler Playground was created to offer an interactive first look at our compiler and demonstrate how code is distributed and executed on the Fabric architecture. Just write or copy and paste C code into the Playground and the effcc Compiler automatically maps your code onto the Efficient dataflow architecture, identifying the most energy-efficient and performant representation for the Efficient Fabric. The visualization shows how your code is distributed to the tiles of the Fabric - the core architecture of the processor. Additionally, the Playground highlights the execution flow of the code, cycle-to-cycle, illuminating each operation tile-to-tile. 

The Playground also presents a debugger feature, which enables developers to see placement and routes at a more granular level. By zeroing in on specific tiles, users are provided with a more detailed look at how operations will function on the Efficient processor. This enables developers to quickly and intuitively optimize the performance of their code for the processor.

Finally, the Playground also offers visual energy estimates of battery life for a given application. This allows users to get a sense of the orders of magnitude improvement in energy efficiency when running an application on our Fabric processor compared to other processors available on the market today. The populated graph illustrates how much longer an application can run (in years) with our processor versus the alternatives.

We’re thrilled to share this first look at the Efficient processor architecture’s class-defining efficiency and exceptional developer experience. We’re committed to empowering our community with the necessary tools to push the boundaries of efficiency, while creating innovative, general-purpose computing applications. Please apply for our Early Access Program for the Playground to experience the benefits first hand. 

]]>
https://www.efficient.computer/resources/effcc-compiler-playground-launch hacker-news-small-sites-43197100 Thu, 27 Feb 2025 18:38:47 GMT
<![CDATA[ADHD Guide to Spaced Repetition]]> thread link) | @lakesare
February 27, 2025 | https://page.brick.do/adhd-guide-to-spaced-repetition-4ja9ZO4DXLM2 | archive.org

Unable to retrieve article]]>
https://page.brick.do/adhd-guide-to-spaced-repetition-4ja9ZO4DXLM2 hacker-news-small-sites-43196788 Thu, 27 Feb 2025 18:04:15 GMT
<![CDATA[Goodbye K-9 Mail]]> thread link) | @todsacerdoti
February 27, 2025 | https://cketti.de/2025/02/26/goodbye-k9mail/ | archive.org

TL;DR: I quit my job working on Thunderbird for Android and K-9 Mail at MZLA.

My personal journey with K-9 Mail started in late 2009, shortly after getting my first Android device. The pre-installed Email app didn’t work very well with my email provider. When looking for alternatives, I discovered K-9 Mail. It had many of the same issues. But it was an active open source project that accepted contributions. I started fixing the problems I was experiencing and contributed these changes to K-9 Mail. It was a very pleasant experience and so I started fixing bugs reported by other users.

In February 2010, Jesse Vincent, the founder of the K-9 Mail project, offered me commit access to the Subversion repository. According to my email archive, I replied with the following text:

Thank you! I really enjoyed writing patches for K-9 and gladly accept your offer. But I probably won’t be able to devote as much time to the project as I do right now for a very long time. I hope that’s not a big problem.

My prediction turned out to be not quite accurate. I was able to spend a lot of time working on K-9 Mail and quickly became one of the most active contributors.

In 2012, Jesse hired me to work on Kaiten Mail, a commercial closed-source fork of K-9 Mail. The only real differences between the apps were moderate changes to the user interface. So most of the features and bug fixes we created for Kaiten Mail also went into K-9 Mail. This was important to me and one of the reasons I took the job.

In early 2014, Jesse made me the K-9 Mail project leader. With Kaiten Mail, end-user support was eating up a lot of time and eventually motivation to work on the app. So we stopped working on it around the same time and the app slowly faded away.

To pay the bills, I started working as a freelance Android developer. Maybe not surprisingly, more often than not I was contracted to work on email clients. Whenever I was working on a closed source fork of K-9 Mail, I had a discounted hourly rate that would apply when working on things that were contributed to K-9 Mail. This was mostly bug fixes, but also the odd feature every now and then.

After a contract ended in 2019, I decided to apply for a grant from the Prototype Fund to work on adding JMAP support to K-9 Mail. This allowed me to basically work full-time on the project. When the funding period ended in 2020, the COVID-19 pandemic was in full swing. At that time I didn’t feel like looking for a new contract. I filled my days working on K-9 Mail to mute the feeling of despair about the world. I summarized my 2020 in the blog post My first year as a full-time open source developer.

Eventually I had to figure out how to finance this full-time open source developer lifestyle. I ended up asking K-9 Mail users to donate so I could be paid to dedicate 80% of my time to work on the app. This worked out quite nicely and I wrote about it here: 2021 in Review.

I first learned about plans to create a Thunderbird version for Android in late 2019. I was approached because one of the options considered was basing Thunderbird for Android on K-9 Mail. At the time, I wasn’t really interested in working on Thunderbird for Android. But I was more than happy to help turn the K-9 Mail code base into something that Thunderbird could use as a base for their own app. However, it seemed the times where we had availability to work on such a project never aligned. And so nothing concrete happened. But we stayed in contact.

In December 2021, it seemed to have become a priority to find a solution for the many Thunderbird users asking for an Android app. By that time, I had realized that funding an open source project via donations requires an ongoing fundraising effort. Thunderbird was already doing this for quite some time and getting pretty good at it. I, on the other hand, was not looking forward to the idea of getting better at fundraising.
So, when I was asked again whether I was interested in K-9 Mail and myself joining the Thunderbird project, I said yes. It took another six months for us to figure out the details and announce the news to the public.

Once under the Thunderbird umbrella, we worked on adding features to K-9 Mail that we wanted an initial version of Thunderbird for Android to have. The mobile team slowly grew to include another Android developer, then a manager. While organizationally the design team was its own group, there was always at least one designer available to work with the mobile team on the Android app. And then there were a bunch of other teams to do the things for which you don’t need Android engineers: support, communication, donations, etc.

In October 2024, we finally released the first version of Thunderbird for Android. The months leading up to the release were quite stressful for me. All of us were working on many things at the same time to not let the targeted release date slip too much. We never worked overtime, though. And we got additional paid time off after the release ❤️

After a long vacation, we started 2025 with a more comfortable pace. However, the usual joy I felt when working on the app didn’t return. I finally realized this at the beginning of February, while being sick in bed and having nothing better to do than contemplate life.
I don’t think I was close to a burnout – work wasn’t that much fun anymore, but it was far from being unbearable. I’ve been there before. And in the past it never was a problem to step away from K-9 Mail for a few months. However, it’s different when it’s your job. But since I am in the very fortunate position of being able to afford taking a couple of months off, I decided to do just that. So the question was whether to take a sabbatical or to quit.
Realistically, permanently walking away from K-9 Mail never was an option in the past. There was no one else to take over as a maintainer. It would have most likely meant the end of the project. K-9 Mail was always too important to me to let that happen.
But this is no longer an issue. There’s now a whole team behind the project and me stepping away no longer is an existential threat to the app.

I want to explore what it feels like to do something else without going back to the project being a foregone conclusion. That is why I quit my job at MZLA.

It was a great job and I had awesome coworkers. I can totally recommend working with these people and will miss doing so 😢


I have no idea what I’ll end up doing next. A coworker asked me whether I’ll stick to writing software or do something else entirely. I was quite surprised by this question. Both because in hindsight it felt like an obvious question to ask and because I’ve never even considered doing something else. I guess that means I’m very much still a software person and will be for the foreseeable future.

During my vacation I very much enjoyed being a beginner and learning about technology I haven’t worked with as a developer before (NFC smartcards, USB HID, Bluetooth LE). So I will probably start a lot of personal projects and finish few to none of them 😃

I think there’s a good chance that – after an appropriately sized break – I will return as a volunteer contributor to K-9 Mail/Thunderbird for Android.

But for now, I say: Goodbye K-9 Mail 👋


This leaves me with saying thank you to everyone who contributed to K-9 Mail and Thunderbird for Android over the years. People wrote code, translated the app, reported bugs, helped other users, gave money, promoted the app, and much more. Thank you all 🙏


]]>
https://cketti.de/2025/02/26/goodbye-k9mail/ hacker-news-small-sites-43196436 Thu, 27 Feb 2025 17:26:21 GMT
<![CDATA[Distributed systems programming has stalled]]> thread link) | @shadaj
February 27, 2025 | https://www.shadaj.me/writing/distributed-programming-stalled | archive.org

Over the last decade, we’ve seen great advancements in distributed systems, but the way we program them has seen few fundamental improvements. While we can sometimes abstract away distribution (Spark, Redis, etc.), developers still struggle with challenges like concurrency, fault tolerance, and versioning.

There are lots of people (and startups) working on this. But nearly all focus on tooling to help analyze distributed systems written in classic (sequential) programming languages. Tools like Jepsen and Antithesis have advanced the state-of-the-art for verifying correctness and fault tolerance, but tooling is no match for programming models that natively surface fundamental concepts. We’ve already seen this with Rust, which provides memory safety guarantees that are far richer than C++ with AddressSanitizer.

If you look online, there are tons of frameworks for writing distributed code. In this blog post, I’ll make the case that they only offer band-aids and sugar over three fixed underlying paradigms: external-distribution, static-location, and arbitrary-location. We’re still missing a programming model that is native to distributed systems. We’ll walk through these paradigms then reflect on what’s missing for a truly distributed programming model.


External-distribution architectures are what the vast majority of “distributed” systems look like. In this model, software is written as sequential logic that runs against a state management system with sequential semantics:

  • Stateless Services with a Distributed Database (Aurora DSQL, Cockroach)
  • Services using gossiped CRDT state (Ditto, ElectricSQL, Redis Enterprise) [1]
  • Workflows and Step Functions

These architectures are easy to write software in, because none of the underlying distribution is exposed [2] to the developer! Although this architecture results in a distributed system, we do not have a distributed programming model.

There is little need to reason about fault-tolerance or concurrency bugs (other than making sure to opt into the right consistency levels for CRDTs). So it’s clear why developers opt for this option, since it hides the distributed chaos under a clean, sequential semantics. But this comes at a clear cost: performance and scalability.

Serializing everything is tantamount to emulating a non-distributed system, but with expensive coordination protocols. The database forms a single point of failure in your system; you either hope that us-east-1 doesn’t go down or switch to a multi-writer system like Cockroach that comes with its own performance implications. Many applications are at sufficiently low scale to tolerate this, but you wouldn’t want to implement a counter like this.


Static-location architectures are the classic way to write distributed code. You compose several units—each written as local (single-machine) code that communicates with other machines using asynchronous network calls:

  • Services communicating with API calls, possibly using async / await (gRPC, REST)
  • Actors (Akka, Ray, Orleans)
  • Services polling and pushing to a shared pub/sub (Kafka)

These architectures give us full, low-level control. We’re writing a bunch of sequential, single-machine software with network calls. This is great for performance and fault-tolerance because we control what gets run where and when.

But the boundaries between networked units are rigid and opaque. Developers must make one-way decisions on how to break up their application. These decisions have a wide impact on correctness; retries and message ordering are controlled by the sender and unknown to the recipient. Furthermore, the language and tooling have limited insight into how units are composed. Jump-to-definition is often unavailable, and serialization mismatches across services can easily creep in.

Most importantly, this approach to distributed systems fundamentally eliminates semantic co-location and modularity. In sequential code, things that happen one after the other are textually placed one after the other and function calls encapsulate entire algorithms. But with static-location architectures, developers are coerced to modularize code on machine boundaries, rather than on semantic boundaries. In these architectures there is simply no way to encapsulate a distributed algorithm as a single, unified semantic unit.

Although static-location architectures offer developers the most low-level control over their system, in practice they are difficult to implement robustly without distributed systems expertise. There is a fundamental mismatch between implementation and execution: static-location software is written as single-machine code, but the correctness of the system requires reasoning about the fleet of machines as a whole. Teams building such systems often live in fear of concurrency bugs and failures, leading to mountains of legacy code that are too critical to touch.


Arbitrary-location architectures are the foundation of most “modern” approaches to distributed systems. These architectures simplify distributed systems by letting us write code as if it were running on a single machine, but at runtime the software is dynamically executed across several machines [3]:

  • Distributed SQL Engines
  • MapReduce Frameworks (Hadoop, Spark)
  • Stream Processing (Flink, Spark Streaming, Storm)
  • Durable Execution (Temporal, DBOS, Azure Durable Functions)

These architectures elegantly handle the co-location problem since there are no explicit network boundaries in the language/API to split your code across. But this simplicity comes at a significant cost: control. By letting the runtime decide how the code is distributed, we lose the ability to make decisions about how the application is scaled, where the fault domains lie, and when data is sent over the network.

Just like the external-distribution model, arbitrary-location architectures often come with a performance cost. Durable execution systems typically snapshot their state to a persistent store between every step [4]. Stream processing systems may dynamically persist data and are free to introduce asynchrony across steps. SQL users are at the mercy of the query optimizer, to which they at best can only give “hints” on distribution decisions.

We often need low-level control over where individual logic is placed for performance and correctness. Consider implementing Two-Phase Commit. This protocol has explicit, asymmetric roles for a leader that broadcasts proposals and workers that acknowledge them. To correctly implement such a protocol, we need to explicitly assign specific logic to these roles, since quorums must be determined on a single leader and each worker must atomically decide to accept or reject a proposal. It’s simply not possible to implement such a protocol in an arbitrary-location architecture without introducing unnecessary networking and coordination overhead.
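
Here is a toy sketch in Python (my own illustration, not code from any of the systems above) of that role asymmetry: the vote count lives only on the leader, and each worker's accept-or-reject decision is local and atomic. send, recv, and the state object are placeholders for whatever RPC and storage layer a real system would use.

def leader(proposal, workers, send, recv):
    # Phase 1: broadcast the proposal and tally votes on a single machine.
    for w in workers:
        send(w, ("prepare", proposal))
    votes = [recv(w) for w in workers]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # Phase 2: broadcast the decision.
    for w in workers:
        send(w, (decision, proposal))
    return decision

def worker(state, message, reply):
    # Each worker decides using only its local state, atomically.
    kind, proposal = message
    if kind == "prepare":
        reply("yes" if state.can_apply(proposal) else "no")
    else:  # "commit" or "abort"
        state.finish(proposal, kind)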

If you’ve been following the “agentic” LLM space, you might be wondering: “Are any of these issues relevant in a world where my software is being written by an LLM?” If the static-location model is sufficiently rich to express all distributed systems, who cares if it’s painful to program in!

I’d argue that LLMs actually are a great argument why we need a new programming model. These models famously struggle under scenarios where contextually-relevant information is scattered across large bodies of text [5]. LLMs do best when semantically-relevant information is co-located.

The static-location model forces us to split up our semantically-connected distributed logic across several modules. LLMs aren’t great yet at correctness on a single machine; it is well beyond their abilities to compose several single-machine programs that work together correctly. Furthermore, LLMs make decisions sequentially; splitting up distributed logic across several networked modules is inherently challenging to the very structure of AI models.

LLMs would do far better with a programming model that retains “semantic locality”. In a hypothetical programming model where code that spans several machines can be co-located, this problem becomes trivial. All the relevant logic for a distributed algorithm would be right next to each other, and the LLM can generate distributed logic in a straight-line manner.

The other piece of the puzzle is correctness. LLMs make mistakes, and our best bet is to combine them with tools that can automatically find them [6]. Sequential models have no way to reason about the ways distributed execution might cause trouble. But a sufficiently rich distributed programming model could surface issues arising from network delays and faults (think a borrow-checker, but for distributed systems).

Although the programming models we’ve discussed each have several limitations, they also demonstrate desirable features that a native programming model for distributed systems should support. What can we learn from each model?

I’m going to skip over external-distribution, which as we discussed is not quite distributed. For applications that can tolerate the performance and semantic restrictions of this model, this is the way to go. But for a general distributed programming model, we can’t keep networking and concurrency hidden from the developer.

The static-location model seems like the right place to start, since it is at least capable of expressing all the types of distributed systems we might want to implement, even if the programming model offers us little help in reasoning about the distribution. We were missing two things that the arbitrary-location model offered:

  • Writing logic that spans several machines right next to each other, in a single function
  • Surfacing semantic information on distributed behavior such as message reordering, retries, and serialization formats across network boundaries

Each of these points has a dual, something we don’t want to give up:

  • Explicit control over placement of logic on machines, with the ability to perform local, atomic computations
  • Rich options for fault tolerance guarantees and network semantics, without the language locking us into global coordination and recovery protocols

It’s time for a native programming model—a Rust-for-distributed systems, if you will—that addresses all of these.

Thanks to Tyler Hou, Joe Hellerstein, and Ramnivas Laddad for feedback on this post!

  1. This may come as a surprise. CRDTs are often marketed as a silver bullet for all distributed systems, but another perspective is they simply accelerate distributed transactions. Software running over CRDTs is still sequential.

  2. Well that’s the idea, at least. Serializability typically isn’t the default (snapshot isolation is), so concurrency bugs can sometimes be exposed.

  3. Actor frameworks don’t really count even if they support migration, since the developer still has to explicitly define the boundaries of an actor and specify where message passing happens

  4. With some optimizations when a step is a pure, deterministic function

  5. See the Needle in a Haystack Test; reasoning about distributed systems is even harder.

  6. Lean is a great example of this in action. Teams including Google and Deepseek have been using it for some time.

]]>
https://www.shadaj.me/writing/distributed-programming-stalled hacker-news-small-sites-43195702 Thu, 27 Feb 2025 16:12:42 GMT
<![CDATA[Is It an AWS EC2 Instance or a US Visa?]]> thread link) | @alach11
February 27, 2025 | https://rahmatashari.com/app/ec2-visa-quiz | archive.org

Unable to extract article]]>
https://rahmatashari.com/app/ec2-visa-quiz hacker-news-small-sites-43195517 Thu, 27 Feb 2025 15:54:40 GMT
<![CDATA[Solitaire]]> thread link) | @goles
February 27, 2025 | https://localthunk.com/blog/solitaire | archive.org

I have cited a few games as inspiration for Balatro in the past, but I wanted to talk about one in particular that hasn’t been mentioned much that arguably is the most important.

I think if I had some kind of Balatro vision board, solitaire (Klondike) would be right in the middle of it with a big red circle around it. You can probably see some of the similarities between my game and the classic solo card game. I wanted my game to have the same vibe.

If you’re somehow unfamiliar, solitaire is a group of card games characterized by solo play. Klondike is usually the variant that most people in the west associate with solitaire, but one could argue even Balatro is technically a solitaire game. Traditional solitaire games exist at the peak of game culture for me. These games are so ubiquitous and accepted by society that almost everyone has some memory of playing them. They have transcended gaming culture more than even the biggest IPs (like Tetris or Mario), and they occupy this very interesting wholesome niche. Solitaire is almost viewed as a positive pastime more than a game. That feeling interests me greatly as a game designer.

As Balatro 1.0 development drew nearer to a close in early 2024, I found myself picturing the type of person that might play my game and what a typical play session might look like for them. My fantasy was that I was playing this weird game many years later on a lazy Sunday afternoon; I play a couple of runs, enjoy my time for about an hour, then set it down and continue the rest of my day. I wanted it to feel evergreen, comforting, and enjoyable in a very low-stakes way. I think that’s one of the reasons why there isn’t a player character, health, or classic ‘enemies’ in the game as well. I wanted this game to be as low stakes as a crossword or a sudoku puzzle while still exercising the problem solving part of the brain.

Essentially I wanted to play Balatro like people play solitaire.

One of the main ways that the vibe of solitaire and my own game differ is in the meta-game Balatro has that solitaire does not. Things like achievements, stake levels, unlocks, and challenges certainly can be looked at as a way to artificially inflate playtime, but those things were added for 2 other reasons I was more concerned about:

  1. To force players to get out of their comfort zone and explore the design of the game in a way they might not if this were a fully unguided gaming experience. In solitaire this probably isn’t super useful because the game has far fewer moving parts, so the player can figure everything out by themselves, but I don’t think that’s the case with a game like Balatro. I feel like even I learned a lot from these guiding goals that I wasn’t anticipating many months after the game launched.

  2. To give the players that already enjoy the game loop a sort of checklist to work through if they so choose. They can come up with a list of goals on their own (as I see many from the community have) but I do really appreciate when I play other games and they give me tasks to accomplish and shape my long-form play around while I enjoy the shorter play sessions individually.

It’s now been over a year since launch and I am still playing Balatro almost daily. I play a couple runs before I go to bed, and I feel like I just might have accomplished the task of recreating the feeling of playing solitaire for myself. Seeing the discourse around my game has me fairly convinced that this is decidedly not how the average player has been interacting with my game, but I’m still thrilled that people are having a great time with it and I’m even more happy that I feel like this game turned out how I wanted as a player myself.

This is why you might have seen me refer to this game as ‘jazz solitaire’ in the past. I wanted to bring the old feeling of solitaire into a game with modern design bells and whistles, creating something new and yet familiar. Only time will tell if I actually accomplished that.

]]>
https://localthunk.com/blog/solitaire hacker-news-small-sites-43195516 Thu, 27 Feb 2025 15:54:36 GMT
<![CDATA[RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning (2023)]]> thread link) | @bemmu
February 27, 2025 | https://kzakka.com/robopianist/#demo | archive.org

@ Conference on Robot Learning (CoRL) 2023

1UC Berkeley

2Google DeepMind

3Stanford University

4Simon Fraser University

TLDR We train anthropomorphic robot hands to play the piano using deep RL
and release a simulated benchmark and dataset to advance high-dimensional control.

Interactive Demo

This is a demo of our simulated piano playing agent trained with reinforcement learning. It runs MuJoCo natively in your browser thanks to WebAssembly. You can use your mouse to interact with it, for example by dragging down the piano keys to generate sound or pushing the hands to perturb them. The controls section in the top right corner can be used to change songs and the simulation section to pause or reset the agent. Make sure you click the demo at least once to enable sound.

Overview

Simulation

We build our simulated piano-playing environment using the open-source MuJoCo physics engine. It consists of a full-size 88-key digital keyboard and two Shadow Dexterous Hands, each with 24 degrees of freedom.

Musical representation

We use the Musical Instrument Digital Interface (MIDI) standard to represent a musical piece as a sequence of time-stamped messages corresponding to "note-on" or "note-off" events. A message carries additional pieces of information such as the pitch of a note and its velocity.

We convert the MIDI file into a time-indexed note trajectory (also known as a piano roll), where each note is represented as a one-hot vector of length 88 (the number of keys on a piano). This trajectory is used as the goal representation for our agent, informing it which keys to press at each time step.
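
As a rough sketch of this representation (my own illustration, not the paper's preprocessing code), a piano roll can be built from (onset, duration, key) note events at a fixed control rate:

import numpy as np

def piano_roll(notes, dt=0.05, horizon=10.0, n_keys=88):
    # notes: list of (onset_sec, duration_sec, key_index) with key_index in [0, 87]
    n_steps = int(horizon / dt)
    roll = np.zeros((n_steps, n_keys), dtype=np.float32)
    for onset, duration, key in notes:
        start = int(onset / dt)
        end = min(n_steps, int((onset + duration) / dt) + 1)
        roll[start:end, key] = 1.0  # this key should be held down during these steps
    return roll

# e.g. two quarter notes at 120 bpm, starting from middle C (key index 39)
goal = piano_roll([(0.0, 0.5, 39), (0.5, 0.5, 41)])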

The interactive plot below shows the song Twinkle Twinkle Little Star encoded as a piano roll. The x-axis represents time in seconds, and the y-axis represents musical pitch as a number between 1 and 88. You can hover over each note to see what additional information it carries.

A synthesizer can be used to convert MIDI files to raw audio:

Musical evaluation

We use precision, recall and F1 scores to evaluate the proficiency of our agent. If at a given instance of time there are keys that should be "on" and keys that should be "off", precision measures how good the agent is at not hitting any of the keys that should be "off", while recall measures how good the agent is at hitting all the keys that should be "on". The F1 score combines the precision and recall into a single metric, and ranges from 0 (if either precision or recall is 0) to 1 (perfect precision and recall).
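
For reference, the standard definition (not specific to this paper) is the harmonic mean of precision and recall: \( F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \).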

Piano fingering and dataset

Piano fingering refers to the assignment of fingers to notes in a piano piece (see figure below). Sheet music will typically provide sparse fingering labels for the tricky sections of a piece to help guide pianists, and pianists will often develop their own fingering preferences for a given piece.

In RoboPianist, we found that the agent struggled to learn to play the piano with a sparse reward signal due to the exploration challenge associated with the high-dimensional action space. To overcome this issue, we added human priors in the form of the fingering labels to the reward function to guide its exploration.

Since fingering labels aren't available in MIDI files by default, we used annotations from the Piano Fingering Dataset (PIG) to create 150 labeled MIDI files, which we call Repertoire-150 and release as part of our environment.

Finger numbers (1 to 9) annotated above each note. Source: PianoPlayer

MDP Formulation

We model piano-playing as a finite-horizon Markov Decision Process (MDP) defined by a tuple \( (\mathcal{S}, \mathcal{A}, \mathcal{\rho}, \mathcal{p}, r, \gamma, H) \), where \( \mathcal{S} \) is the state space, \( \mathcal{A} \) is the action space, \( \mathcal{\rho}(\cdot) \) is the initial state distribution, \( \mathcal{p} (\cdot | s, a) \) governs the dynamics, \( r(s, a) \) is the reward function, \( \gamma \) is the discount factor, and \( H \) is the horizon. The goal of the agent is to maximize its total expected discounted reward over the horizon \( \mathbb{E}\left[\sum_{t=0}^{H} \gamma^t r(s_t, a_t) \right] \).

At every time step, the agent receives proprioceptive (i.e, hand joint angles), exteroceptive (i.e., piano key states) and goal observations (i.e., piano roll) and outputs 22 target joint angles for each hand. These are fed to proportional-position actuators which convert them to torques at each joint. The agent then receives a weighted sum of reward terms, including a reward for hitting the correct keys, a reward for minimizing energy consumption, and a shaping reward for adhering to the fingering labels.

For our policy optimizer, we use a state-of-the-art model-free RL algorithm DroQ and train our agent for 5 million steps with a control frequency of 20 Hz.

Quantitative Results

With careful system design, we improve our agent's performance significantly. Specifically, adding an energy cost to the reward formulation, providing a few seconds worth of future goals rather than just the current goal, and constraining the action space helped the agent learn faster and achieve a higher F1 score. The plot below shows the additive effect of each of these design choices on three different songs of increasing difficulty.

When compared to a strong derivative-free model predictive control (MPC) baseline, Predictive Sampling, our agent achieves a much higher F1 score, averaging 0.79 across Etude-12 versus 0.43 for Predictive Sampling.

Qualitative Results

Each video below is playing real-time and shows our agent playing every song in the Etude-12 subset. In each video frame, we display the fingering labels by coloring the keys according to the corresponding finger color. When a key is pressed, it is colored green.

Debug dataset

This dataset contains "entry-level" songs (e.g., scales) and is useful for sanity checking an agent's performance. Fingering labels in this dataset were manually annotated by the authors of this paper. It is not part of the Repertoire-150 dataset.

C Major Scale

D Major Scale

Twinkle Twinkle Little Star

Etude-12 subset

Etude-12 is a subset of the full 150-large dataset and consists of 12 songs of varying difficulty. It is a subset of the full benchmark reserved for more moderate compute budgets.

Piano Sonata D845 1st Mov (F1=0.72)

Partita No. 2 6th Mov (F1=0.73)

Bagatelle Op. 3 No. 4 (F1=0.75)

French Suite No. 5 Sarabande (F1=0.89)

Waltz Op. 64 No. 1 (F1=0.78)

French Suite No. 1 Allemande (F1=0.78)

Piano Sonata No. 2 1st Mov (F1=0.79)

Kreisleriana Op. 16 No. 8 (F1=0.84)

Golliwoggs Cakewalk (F1=0.85)

Piano Sonata No. 23 2nd Mov (F1=0.87)

French Suite No. 5 Gavotte (F1=0.77)

Piano Sonata K279 1st Mov (F1=0.78)

Common failure modes

Since the Shadow Hand forearms are thicker than a human's, the agent sometimes struggles to nail down notes that are really close together. Adding full rotational and translational degrees of freedom to the hands could give them the ability to overcome this limitation, but would pose additional challenges for learning.

The agent struggles with songs that require stretching the fingers over many notes, sometimes more than 1 octave.

Acknowledgments

This work is supported in part by ONR #N00014-22-1-2121 under the Science of Autonomy program.

This website was heavily inspired by Brent Yi's.

]]>
https://kzakka.com/robopianist/#demo hacker-news-small-sites-43192751 Thu, 27 Feb 2025 09:41:23 GMT
<![CDATA[Python as a second language empathy (2018)]]> thread link) | @luu
February 26, 2025 | https://ballingt.com/python-second-language-empathy/ | archive.org

abstract


It’s different! Let’s talk about how.

Because as Python experts (you did choose to come to a Python conference so likely you’re either an expert already or in time you’re going to become one if you keep going to Python conferences) we have a responsibility to help our colleagues and collaborators who don’t know Python as well we do. The part of that responsibility I want to focus on today is when other people have experience with other programming languages but are new to Python.

I work at Dropbox, which as Guido said earlier today is a company that uses a fair bit of Python. But a lot of programmers come to Dropbox without having significant Python experience. Do these people take a few months off when they join to really focus on learning and figure out exactly how Python works, having a lot of fun while they do it? That would great (briefly shows slide of Recurse Center logo) but that’s not what usually happens. Instead they learn on the job, they start making progress right away. They’ll read some books (my favorite is Python Essential Reference, but I hear Fluent Python is terrific), watch some Python talks, read some blog posts, ask questions at work, and Google a lot. That last one is the main one, lots of Google and lots of Stack Overflow.

Learning primarily by Googling can leave you with certain blind spots. If the way that you’re learning a language is by looking up things that are confusing to you, things that aren’t obviously confusing aren’t going to come up.

We ought to be trying to understand our colleagues’ understandings of Python. This is a big thing whenever you’re teaching, whenever you’re trying to communicate with another person: trying to figure out their mental model of a situation and providing just the right conceptual stepping stones to update that model to a more useful state.

We should try to understand the understandings of Python of people coming to Python as a new language. I’m going to call this “Python-as-a-second-language empathy.”

How do we build this PaaSL empathy thing?

The best thing you can do is learn another language first, and then learn Python. Who here has another language that they knew pretty well before learning Python? (most hands go up) Great! Terrific! That’s a superpower you have that I can never have. I can never unlearn Python, become fluent in another language, and then learn Python again. You have this perspective that I can’t have. I encourage you to use that superpower to help others with backgrounds similar to your own. I’d love to see “Django for Salesforce Programmers” as a talk at a Python conference because it’s very efficient when teaching to be able to make connections to a shared existing knowledge base.

Another thing you can do to build this PAASL empathy (I’m still deciding on an acronym) is to learn language that are different than the ones you know. Every time you learn a new language you’re learning new dimensions on which someone could have a misconception.

Consider the following:

a = b + c

Depending on the languages you know, you might make different assumptions about the answers to the following questions:

  • Will a always be equivalent to the sum of b and c from now on, or will that only be true right after we run this code?
  • Will b + c be evaluated right now, or when a is used later?
  • Could b and c be function calls with side effects?
  • Which will be evaluated first?
  • What does plus mean, and how do we find out?
  • Is a a new variable, and if so is it global now?
  • Does the value stored in a know the name of that variable?

These are questions you can have and ways that someone might be confused, but if you’re not familiar with languages that answer these questions in different ways you might not be able to conceive of these misunderstandings.

Another you thing you can do to build PSL empathy is listen. Listen to questions and notice patterns in them. If you work with grad students who know R and are learning Python, try to notice what questions repeatedly come up.

In a general sense, this is what my favorite PyCon speaker Ned Batchelder does a wonderful job of. Ned is a saint who spends thousands of hours in the #python irc channel repeatedly answering the same questions about Python. He does a bunch of other things like run the Boston Python Users Meetup group, and he coalesces all this interaction into talks which concisely hit all the things that are confusing about whatever that year’s PyCon talk is.

The final idea for building Py2ndLang empathy I’ll suggest is learning the language that your collaborator knows better so you can better imagine what their experience might be like. If your colleague is coming from Java, go learn Java! For this talk I did a mediocre job of learning C++ and Java. I did some research so I could try to present to you some of the things that could be tricky if you’re coming to Python from one of these languages. I chose these languages because they’re common languages for my colleagues. It’s very reasonable to assume that a programming language will work like a language you already know, because so often they do! But then when there’s a difference it’s surprising.

C++ and Java are not my background! While Python was the first language I really got deep into, I had previous exposure to programming that colored my experience learning Python. My first programming language was TI-81 Basic, then some Excel that my mom taught me. In the Starcraft scenario editor you could write programs with a trigger language, so I did some of that. In middle school I got to use Microworlds Logo, which was pretty exciting. I did a little Visual Basic, got to college and did some MATLAB and some Mathematica, and then I took a CS course where they taught us Python.

My misconceptions about Python were so different than other students’, some of whom had taken AP Computer Science with Java in high school. The languages I learned were all dynamically typed languages with function scope, so I didn’t have the “where are my types?” reaction of someone coming from Java.

Java and C++ are good languages to focus on because they’re often taught in schools, so when interviewing or working with someone right out of undergrad it can be useful to try to understand these languages.

Before we get to a list of tricky bits, there are some thinks I won’t talk about because I don’t call then “tricky.” Not that they aren’t hard, but they aren’t pernicious, they’re misunderstandings that will be bashed down pretty quickly instead of dangerously lingering on. New syntax like colons and whitespace, new keywords like yield; Python gives you feedback in the form of SyntaxErrors about the first group, and there’s something to Google for with the second. When you first see a list comprehension in Python, you know there’s something not quite normal about this syntax, so you know to research it or ask a question about it.

Let’s split things that are tricky about Python for people coming from Java or C++ into three categories: things that look similar to Java or C++ but behave differently, things that behave subtly differently, and “invisible” things that leave no trace. The first category is tricky because you might not think to look up any differences, the second because you might test for differences and at a shallow level observe none when in fact some lurk deeper. The third is tricky because there’s no piece of code in the file you’re editing that might lead you to investigate. These are pretty arbitrary categories.

Look similar, behave differently

Decorators

There’s a think in Java called an annotation that you can stick on a method or a class or some other things. It’s a way of adding some metadata to a thing. And then maybe you could do some metaprogramming-ish stuff where you look at that metadata later and make decisions about what code to run based on them. But annotations are much less powerful than Python decorators.

>>> @some_decorator
... def foo():
...     pass
... 
>>> foo
<quiz.FunctionQuestion object at 0x10ab14e48>

Here (in Python) a python decorator is above a function, but what comes out is an instance of a custom class “FunctionQuestion” - it’s important to remember that decorators are arbitrary code and they can do anything. Somebody coming from Java might miss this, thinking this is an annotation adding metadata that isn’t transforming the function at definition time.
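
To make this concrete, here is a sketch of what a decorator like that might look like (this implementation of FunctionQuestion is my own guess, not the real quiz class):

class FunctionQuestion:
    """Wraps a function so it can be asked as a quiz question."""
    def __init__(self, func):
        self.func = func

def some_decorator(func):
    # A decorator is arbitrary code: it can return anything,
    # including something that isn't a function at all.
    return FunctionQuestion(func)

@some_decorator
def foo():
    pass

print(foo)  # <__main__.FunctionQuestion object at 0x...>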

Class body assignments create class variables

I’ve seen some interesting cool bugs before because of this. The two assignments below are two very different things:

class AddressForm:
    questions = ['name', 'address']

    def __init__(self):
        self.language = 'en'

questions is a class attribute, and language is an instance attribute. These are ideas that exist in Java and C++ with slightly different names (questions might be called a “static” variable, and language called a “member” variable), but if you see something like the top in one of those languages people might assume you’re initializing attributes on an instance; they might think the first thing is another way of doing the second.
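
One way the difference bites (a minimal illustration using the class above): the class attribute is shared by every instance, so mutating it through one instance is visible through all of them.

form_a = AddressForm()
form_b = AddressForm()
form_a.questions.append('phone')  # mutates the shared class attribute
print(form_b.questions)           # ['name', 'address', 'phone']
form_a.language = 'fr'            # rebinds an instance attribute
print(form_b.language)            # 'en' -- other instances unaffected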

Run-time errors, not compile-time

Here I’ve slightly misspelled the word “print:”

if a == 2:
    priiiiiiiiiiiiint("not equal")

This is valid Python code, and I won’t notice anything unusual about it until a happens to be 2 when this code runs. I think people coming from languages like Java and C++ with more static checks will get bitten by this before too long and get scared of it, but there are a lot of cases for them to think about.

try:
    foo()
except ValyooooooooooError:
    print('whoops')

Here’s I’ve slightly misspelled ValueError, but I won’t find out until foo() raises an exception.

try:
    foo()
except ValueError:
    priiiiiiiiiiiiiiint('whoops')

Here ValueError is fine, but the code below it won’t run until foo() raises an exception.

Conditional and Run-Time Imports

Particularly scary examples of the above issue feature imports because people may think imports work like they do in Java or C++: something that happens before a program runs.

try:
    foo()
except ValueError:
    import bar
    bar.whoops()

It’s not until foo() raises a ValueError that we’ll find out whether the bar module is syntactically valid because we hadn’t loaded it yet, or whether a file called bar.py exists at all!

Block Scope

This might blow your mind if you’re mostly familiar with Python: there’s this idea called block scope. Imagine that every time you indented you got a new set of local variables, and each time you dedented those variables went away. People who use Java or C++ are really used to this idea, they really expect that when they go out of a scope (which they use curly brackets to denote, not indentation) that those variables will go away. As Python users, we might know that in the below,

def foo():
    bunch = [1, 2, 3, 4]
    for apple in bunch:
       food = pick(apple)

    print(apple)
    print(food)

the variables apple and food “escape” the for loop, because Python has function scope, not block scope! But this sneaks up on people a lot.

Introducing Bindings

This above is sort of a special case of something Ned Batchelder has a great talk about, which is that all the statements below introduce a new local variable X:

X = ...
for X in ...
[... for X in ...]
(... for X in ...)
{... for X in ...}
class X(...):
def X(...):
def fn(X): ... ; fn(12)
with ... as X:
except ... as X:
import X
from ... import X
import ... as X
from ... import ... as X

(these examples taken from the talk linked above)

import in a function introduces a new local variable only accessible in that function! Importing in Python isn’t just telling the compiler where to find some code, but rather to run some code, stick the result of running that code in a module object, and create a new local variable with a reference to this object.

Subtle behavior differences

Assignment

Python’s = behaves like Java’s: it always binds a reference and never makes a copy (copying is the default in C++).
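
For example (a minimal illustration):

xs = [1, 2, 3]
ys = xs        # another name for the same list object, not a copy
ys.append(4)
print(xs)      # [1, 2, 3, 4]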

Closures

A closure is a function that has references to outer scopes. (mostly - read more) C++ and Java have things like this. Lambdas in C++ require their binding behavior to be specified very precisely, so each variable might be captured by value or by reference or something else. So a C++ programmer will at least know to ask the question in Python, “how is this variable being captured?” But in Java the default behavior is to make the captured variable final, which is a little scarier because a Java programmer might assume the same about Python closures.
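
The classic gotcha (my example, not one from the talk) is that Python closures capture variables, not values, so all of these lambdas see the final value of i unless you capture it explicitly:

fns = [lambda: i for i in range(3)]
print([f() for f in fns])       # [2, 2, 2]

fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])       # [0, 1, 2] -- the default argument captures the value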

GC

It’s different! We have both reference counting and garbage collection in Python. This makes it sort of like smart pointers in C++ and sort of like garbage collection in Java. And __del__ finalizer doesn’t do what you think it does in Python 2!

Explicit super()

In Java and C++ there exist cases where the parent constructor for an object will get called for you, but in Python it’s necessary to call the parent method implementation yourself with super() if a class overrides a parent class method. Super is a very cooperative sort of thing in Python; a class might have a bunch of superclasses in a tree and to run all of them requires a fancy method resolution order. This works only so long as every class calls super.

I’ll translate this one to the transcript later - for now you’ll have to watch it because the visual is important: explicit super.

Invisible differences

Properties and other descriptors

It can feel odd to folks coming from C++ or Java that we don’t write methods for getters and setters in Python; we don’t have to because ordinary attribute get and set syntax can cause arbitrary code to run.

obj.attr
obj.attr = value

This is in the invisible category because unless you go to the source code of the class it’s easy to assume code like this only reads or writes a variable.
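
For example, a property lets plain attribute syntax run arbitrary code (a minimal sketch):

class Account:
    def __init__(self):
        self._balance = 0

    @property
    def balance(self):
        print("running arbitrary code on read")
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

acct = Account()
acct.balance       # looks like a field read, but calls the getter
acct.balance = 10  # looks like a field write, but calls the setter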

Dynamic Attribute Lookup

Attribute lookup is super dynamic in Python! Especially when writing tests and mocking out behavior it’s important to know (for instance) that a data descriptor on a parent class will shadow an instance variable with the same name.
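
Here's a contrived example of that rule: a property (a data descriptor) defined on a parent class wins over an entry with the same name in the instance's __dict__.

class Parent:
    @property
    def name(self):
        return "from the property"

class Child(Parent):
    pass

c = Child()
c.__dict__['name'] = "from the instance"
print(c.name)  # "from the property" -- the data descriptor shadows the instance attribute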

Monkeypatching

Swapping out implementations on a class or an instance is going to be new to people. It could happen completely on the other side of your program (or you test suite) but affect an object in your code.
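
For example (a minimal illustration): replacing a method on a class changes the behavior of every instance, including ones created long before the patch, possibly in a faraway test module.

class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
Greeter.greet = lambda self: "goodbye"  # the monkeypatch
print(g.greet())                        # "goodbye"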

Metaprogramming

It takes fewer characters in Python!

get_user_class("employee")("Tom", 1)

The code above returns a class object based on the string “employee” and then creates an instance of it. It might be easy to miss this if you expect metaprogramming to take up more lines of code.
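
That get_user_class function isn't shown in the talk; one hypothetical way it could work is a small registry of classes keyed by string:

USER_CLASSES = {}

def register(cls):
    USER_CLASSES[cls.__name__.lower()] = cls
    return cls

@register
class Employee:
    def __init__(self, name, level):
        self.name, self.level = name, level

def get_user_class(kind):
    return USER_CLASSES[kind]

print(get_user_class("employee")("Tom", 1))  # an Employee instance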

Python 2 Whitespace Trivia

A tab is 8 spaces in Python 2 for the purposes of parsing significant whitespace, but is usually formatted as 4!

Should we try to teach everyone all these things right now? Maybe not! If someone is interested, sure. But I think it’s hard to hit all of these without much context. And be careful not to assume people don’t know these things, maybe they do know 80% of them. I think this cheat sheet presents things that are important to be aware of while teaching whatever other topic is most pedagogically appropriate.

I don’t have time to talk much about teaching, so I’ll point to Sasha Laundy’s talk (embedded above) which I love, and quickly quote Rose Ames and say that “knowledge is power; it’s measured in wats.” I think a great way to broach a misunderstanding is to present someone with a short code sample “wat” that demonstrates a misconception exists without necessarily explaining it because often all someone needed was to have the a flaw in their model pointed out to them.

Code review is a great impetus for sending someone such a wat. I don’t have time to talk about code review so I’ll point to this terrific post by Sandya Sankarram about it.

Another thing we can do with this information is to write it in code comments. I think of comments as the place to explain why code does a thing, not to explain what that code is doing. But if you know what’s happening in your code might surprise someone less familiar with Python, maybe you should say what it’s doing? Or maybe you should write simpler code and not do that interesting Python-specific thing.

In the same way Python library authors sometimes write code that straddles Python 2 and 3 by behaving the same in each, imagine writing Python code that, if it were Java or C++, would do the same thing. Perhaps you’d have quite unidiomatic code, but perhaps it’d be quite clear.

image from this Stack Overflow blog post

Python is becoming more popular. Maybe this means more people will understand it, and we’ll get to use all our favorite Python-specific features all the time! Maybe this will mean Python becomes the lingua franca which ought to be as simple and clear as possible. I imagine it will depend on the codebase. I think as a code base grows tending toward code that is less surprising to people who do not know Python well probably makes more sense.

One final use for this cheat sheet is interviewing: interviewing is a high time pressure communication exercise where it really can help to try to anticipate another person’s understanding of a thing. Candidates often interview with Python, but know C++ or Java better. If I can identify a misunderstanding like initializing instance variables in the class statement, I can quickly identify it, clarify with the candidate, and we can move on. Or perhaps I don’t even need to if the context is clear enough. And when I’m interviewing at companies, it’s helpful to remember what parts of my Python code I might need to explain to someone not as familiar with the language.

]]>
https://ballingt.com/python-second-language-empathy/ hacker-news-small-sites-43191696 Thu, 27 Feb 2025 05:46:32 GMT
<![CDATA[Calling Rust from Cursed Go]]> thread link) | @dvektor
February 26, 2025 | https://pthorpe92.dev/cursed-go/ | archive.org

My experience with CGO can be expressed well with a story about when I had tried to get mattn/go-sqlite3 drivers to work on a Windows machine a couple years ago around version 1.17, and CGO would not build properly because my $GOPATH or $CC was in C:\Program Files\ and it split the path on the whitespace. Although I personally find the use of whitespace in system paths especially (and just windows in general) to be a tragedy, I was shocked that this was a real problem.

Other than dealing with some random cross platform issues that teammates would have before we switched to using the modernc/sqlite3 driver, I didn't have any experience writing CGO. I'll admit that I was maybe a bit ignorant to the details of exactly why it sucked, but the bad user experiences and many anecdotes I'd heard were enough for me to look elsewhere for solutions when it came time to needing to call foreign symbols in dynamically linked libraries from a Go program.

So why does CGO suck?

==============================

There have been many posts and videos that document this far better than I could, probably the most popular being this post from Dave Cheney. A quick summary would be:

  1. For performance, Go using CGO is going to be closer to Python than to plain Go (which doesn't mean it's not still significantly faster than Python).
  2. CGO is slow and often still painful to build cross platform.
  3. Go is no longer able to be built into a single static binary.

This being said, if I can find a solution that solves even 1 of these three things, I would consider that a W.

The program in question:

==============================

Limbo is an open source, modern Rust reimplementation of sqlite/libsql that I have become heavily involved in during the last few months in my spare time.

One of my primary contributions has been developing the extension library, where I am focused on providing (by way of many procedural macros) the most intuitive and simple API possible for users who wish to extend limbo's functionality with things like functions, virtual tables and virtual filesystems in safe Rust (and eventually other languages), without users having to write any of the ugly and confusing unsafe/FFI code that is currently required to extend sqlite3 in any way.

Since limbo is written in Rust and the extensions need to be able to be loaded dynamically at runtime, even though both the extension library and core are in Rust, unfortunately this means that Rust must fall back to adhering to the C ABI to call itself, and most of the goodies (traits, smart pointers, etc) and memory safety that Rust gives us by default are all out the window.

However with Rust, aside from that unfortunate situation, once you get used to it, FFI in general is a rather pleasant experience that I can equate to writing a better C with lots of features, or I can imagine something similar to Zig (shot in the dark, I haven't yet written zig).

What this has to do with Go:

==============================

A github issue for limbo was created, asking if a Go driver would be available. The most common theme between almost all the relevant comments was: avoid CGO (including my own, due to my previous interactions with it).

As a Go developer by profession, after a couple weeks of no activity on the issue, I decided that I would take on that task as well.

Besides a jank, half finished, speed-run programming language, I have almost exclusively used Go to write backends and web services, which basically everyone agrees is where it really shines. Most of the systems or general programming outside of web that I have done, has all been in Rust or C/C++ (mostly Rust). I say that to highlight that my knowledge and experience with Go and the ecosystem outside of web was/is minimal, so I was more or less starting from scratch.

All I knew at first was that I didn't want to use CGO, and I had heard of other packages that were supposedly 'Pure Go' implementations. So I naively figured there must be a way to just ignore whatever API the Go team wants you to use, and perhaps there is some 'unsafe' platform specific os package that lets you dlopen that people just avoid for one reason or another.

I started to look at how other drivers managed this, and realized that all of Go's current options for sqlite3 drivers, fall into one of the following categories:

(note: although there are a few more drivers out there that I have not named, they would still be included in one of the categories below).

  1. CGO: github.com/mattn/go-sqlite3

The defacto-standard and the first search result on Google for 'go sqlite driver'.

  1. Code-gen: modernc.org/sqlite

An extremely impressive project that generates Go code for the entire sqlite codebase, as well as some supporting libc code, for many different platforms.

  3. WASM: github.com/ncruces/go-sqlite3

Uses a wasm build of sqlite3 and packages the wazero runtime, as well as a Go vfs implementation wrapping sqlite3's OS interface. Has some minor caveats, but is production ready and well implemented.

  4. Pipes: github.com/cvilsmeier/sqinn-go

"reads requests from stdin, forwards the request to SQLite, and writes a response to stdout."

Why can't I just dlopen()?

When I was unable to find anything that remotely resembles dlopen in the Go standard library (except for windows.LoadLibrary() surprisingly, but more on that in my next post), I recalled a conversation I had with my boss/CTO at work.

She has been writing Go since the first open source release, and one of the things we originally bonded over was the desire to move the company from PHP to Go. I remembered her telling me about a game engine written in Go whose developers wrote their own CGO replacement because of how annoyed they were with CGO itself. I was now curious and decided to check it out.

Enter: purego

A library that allows you to dlopen()/dlsym() in Go without CGO. After looking at the API, I quickly realized that this is exactly what I was looking for.
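
To give a sense of what that API looks like, here is a minimal sketch of my own (not limbo code), assuming a Linux box where libc lives at libc.so.6; purego.Dlopen is Unix-only, and Windows goes through LoadLibrary instead:

// Minimal purego usage sketch: dlopen() libc and call puts() without CGO.
// Assumes Linux; on macOS you would open libSystem instead.
package main

import "github.com/ebitengine/purego"

func main() {
	libc, err := purego.Dlopen("libc.so.6", purego.RTLD_NOW|purego.RTLD_GLOBAL)
	if err != nil {
		panic(err)
	}
	// Bind a Go function pointer to the C symbol; purego converts the
	// Go string argument to a NUL-terminated C string for us.
	var puts func(string) int32
	purego.RegisterLibFunc(&puts, libc, "puts")
	puts("hello from purego, no CGO involved")
}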

But how did they do it?

syscall.a1, syscall.a2, _ = syscall_syscall15X(cfn, sysargs[0], sysargs[1], sysargs[2], sysargs[3], sysargs[4],
    sysargs[5], sysargs[6], sysargs[7], sysargs[8], sysargs[9], sysargs[10], sysargs[11],
    sysargs[12], sysargs[13], sysargs[14])
syscall.f1 = syscall.a2 // on amd64 a2 stores the float return. On 32bit platforms floats aren't supported

rawdogging syscalls, apparently...

Calling Rust from cursed Go

==============================

Purego makes registering foreign symbols very simple. When the driver is registered, I dlopen() the library and register each of the function symbols:

// Adapted for brevity/demonstration

var (
	libOnce     sync.Once
	limboLib    uintptr
	dbOpen      func(string) uintptr
	dbClose     func(uintptr) uintptr
	connPrepare func(uintptr, string) uintptr
	// ... all the function pointers at global scope
)

// Register all the symbols on library load
func ensureLibLoaded() error {
	libOnce.Do(func() {
		// assign to the package-level limboLib; := here would shadow it inside the closure
		var err error
		limboLib, err = purego.Dlopen(libPath, purego.RTLD_NOW|purego.RTLD_GLOBAL)
		if err != nil {
			return
		}
		// RegisterLibFunc(functionPointer, library handle, symbol name string)
		purego.RegisterLibFunc(&dbOpen, limboLib, FfiDbOpen)
		purego.RegisterLibFunc(&dbClose, limboLib, FfiDbClose)
		purego.RegisterLibFunc(&connPrepare, limboLib, FfiDbPrepare)
		purego.RegisterLibFunc(&connGetError, limboLib, FfiDbGetError)
		// ...
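
Zooming out for a second: the reason ensureLibLoaded() exists is that all of this ultimately has to hang off the standard database/sql plumbing. A hedged sketch of that wiring, where the driver name "limbo" and the openConn helper are my own illustrative stand-ins rather than the package's real API:

// Hypothetical sketch of how the driver plugs into database/sql.
package limbo

import (
	"database/sql"
	"database/sql/driver"
)

type limboDriver struct{}

// Open loads the shared library on first use, then opens a database handle
// through the registered FFI functions (openConn is a stand-in wrapper).
func (d *limboDriver) Open(name string) (driver.Conn, error) {
	if err := ensureLibLoaded(); err != nil {
		return nil, err
	}
	return openConn(name)
}

func init() {
	sql.Register("limbo", &limboDriver{})
}

After which a user would simply call sql.Open("limbo", "my.db") as with any other driver.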

After playing around with it a bit, and deciphering what my types needed to look like to properly pass values back and forth from Rust, I ended up with something like this:


type valueType int32

const (
	intVal  valueType = 0
	textVal valueType = 1
	blobVal valueType = 2
	realVal valueType = 3
	nullVal valueType = 4
)

// struct to send values over FFI
type limboValue struct {
	Type  valueType
	_     [4]byte // padding
	Value [8]byte
}

type Blob struct {
	Data uintptr
	Len  int64
}

I had to use a fixed-size byte array instead of a uintptr so I could represent a union: depending on the type tag, those same 8 bytes are interpreted as an int64, a float64, or a pointer to a TextValue or BlobValue struct that stores the bytes plus a length.

With the accompanying Rust type:


#[repr(C)]
pub enum ValueType {
    Integer = 0,
    Text = 1,
    Blob = 2,
    Real = 3,
    Null = 4,
}

#[repr(C)]
pub struct LimboValue {
    value_type: ValueType,
    value: ValueUnion,
}

#[repr(C)]
union ValueUnion {
    int_val: i64,
    real_val: f64,
    text_ptr: *const c_char,
    blob_ptr: *const c_void,
}
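
Going the other direction, values coming back from Rust get the same treatment: the 8-byte payload is reinterpreted based on the type tag. Roughly like this, as a sketch rather than the driver's actual code (the method name is mine, and it assumes the package's existing unsafe/fmt/driver imports):

// Hypothetical sketch: turn a limboValue received over FFI back into a Go
// driver.Value by reinterpreting the 8-byte payload according to the tag.
func (v *limboValue) toDriverValue() (driver.Value, error) {
	switch v.Type {
	case nullVal:
		return nil, nil
	case intVal:
		return *(*int64)(unsafe.Pointer(&v.Value)), nil
	case realVal:
		return *(*float64)(unsafe.Pointer(&v.Value)), nil
	case textVal:
		// copy the NUL-terminated C string owned by the Rust side
		p := *(*uintptr)(unsafe.Pointer(&v.Value))
		var out []byte
		for {
			b := *(*byte)(unsafe.Pointer(p))
			if b == 0 {
				break
			}
			out = append(out, b)
			p++
		}
		return string(out), nil
	case blobVal:
		blob := *(**Blob)(unsafe.Pointer(&v.Value))
		if blob == nil || blob.Len == 0 {
			return []byte{}, nil
		}
		// copy the bytes out before the Rust side frees them
		src := unsafe.Slice((*byte)(unsafe.Pointer(blob.Data)), blob.Len)
		return append([]byte(nil), src...), nil
	default:
		return nil, fmt.Errorf("unknown value type: %d", v.Type)
	}
}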

This is how, for example, I convert the slice of driver.Value arguments in order to implement the statement's Query method from database/sql/driver:

// convert a Go slice of driver.Value to a slice of limboValue that can be sent over FFI
// for Blob types, we have to pin them so they are not garbage collected before they can be copied
// into a buffer on the Rust side, so we return a function to unpin them that can be deferred after this call
func buildArgs(args []driver.Value) ([]limboValue, func(), error) {
	// I was unaware that runtime.Pinner was a thing, prior to this
	pinner := new(runtime.Pinner)
	argSlice := make([]limboValue, len(args))
	for i, v := range args {
		limboVal := limboValue{}
		switch val := v.(type) {
		case nil:
			limboVal.Type = nullVal
		case int64:
			limboVal.Type = intVal
			limboVal.Value = *(*[8]byte)(unsafe.Pointer(&val))
		case float64:
			limboVal.Type = realVal
			limboVal.Value = *(*[8]byte)(unsafe.Pointer(&val))
		case string:
			limboVal.Type = textVal
			cstr := CString(val)
			pinner.Pin(cstr)
			*(*uintptr)(unsafe.Pointer(&limboVal.Value)) = uintptr(unsafe.Pointer(cstr))
		case []byte:
			limboVal.Type = blobVal
			blob := makeBlob(val)
			pinner.Pin(blob)
			*(*uintptr)(unsafe.Pointer(&limboVal.Value)) = uintptr(unsafe.Pointer(blob))
		default:
			return nil, pinner.Unpin, fmt.Errorf("unsupported type: %T", v)
		}
		argSlice[i] = limboVal
	}
	return argSlice, pinner.Unpin, nil
}

// convert a byte slice to a Blob type that can be sent over FFI
func makeBlob(b []byte) *Blob {
	if len(b) == 0 {
		return nil
	}
	return &Blob{
		Data: uintptr(unsafe.Pointer(&b[0])),
		Len:  int64(len(b)),
	}
}
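
For context, a caller uses buildArgs roughly like this. This is a simplified, hypothetical sketch: stmtQuery stands in for a registered FFI function, and the stmt/rows fields are illustrative, not necessarily the driver's real names:

// Hypothetical sketch of a Query-style method using buildArgs.
func (st *stmt) Query(args []driver.Value) (driver.Rows, error) {
	ffiArgs, unpin, err := buildArgs(args)
	// keep the pinned Go memory alive until the Rust side has copied it
	// (buildArgs returns Unpin even on error, so deferring first is safe)
	defer unpin()
	if err != nil {
		return nil, err
	}
	var argPtr uintptr
	if len(ffiArgs) > 0 {
		argPtr = uintptr(unsafe.Pointer(&ffiArgs[0]))
	}
	rowsPtr := stmtQuery(st.ptr, argPtr, uint64(len(ffiArgs)))
	if rowsPtr == 0 {
		return nil, errors.New("limbo: query failed")
	}
	return &rows{ptr: rowsPtr}, nil
}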

Looking at purego's source code gave me some inspiration and helped me get the general idea of how to manipulate and work with pointers and types received over FFI. For instance, this is the function they use to convert a Go string to a C string:

/*
Credit (Apache2 License) to:
      https://github.com/ebitengine/purego/blob/main/internal/strings/strings.go
*/
func CString(name string) *byte {
	if hasSuffix(name, "\x00") {
		return &(*(*[]byte)(unsafe.Pointer(&name)))[0]
	}
	b := make([]byte, len(name)+1)
	copy(b, name)
	return &b[0]
}

And I was able to adapt everything else around these concepts. There were a few things that I wasn't super pleased with that I still have to figure out. For instance, sending back an array of strings from Rust was such a pain in the ass that the rows.Columns() method calls this function:

#[no_mangle]
pub extern "C" fn rows_get_columns(rows_ptr: *mut c_void) -> i32 {
    if rows_ptr.is_null() {
        return -1;
    }
    let rows = LimboRows::from_ptr(rows_ptr);
    rows.stmt.num_columns() as i32
}

to get the number of result columns for the prepared statement, then calls rows_get_column_name with the index of the column name to return.
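
On the Go side that ends up as a count-then-index loop, roughly like the sketch below; rowsGetColumns/rowsGetColumnName are what I'm calling the registered FFI function pointers here, and goString is a hypothetical helper that copies a C string into a Go string:

// Sketch of rows.Columns(): ask Rust for the column count, then fetch each
// column name by index.
func (r *rows) Columns() []string {
	n := rowsGetColumns(r.ptr)
	if n < 0 {
		return nil
	}
	cols := make([]string, n)
	for i := int32(0); i < n; i++ {
		namePtr := rowsGetColumnName(r.ptr, i) // *const c_char from Rust
		cols[i] = goString(namePtr)
	}
	return cols
}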

It's all pretty cursed huh?

================================

But it works, and so far it's a decent start to bindings that don't fit into any of the categories used by the existing sqlite drivers :)

I'll follow this post up soon with another explaining some of the caveats, the workarounds for them, whether or not we actually solved any of the issues we described with CGO, and maybe some benchmarks.

But for now, thanks for reading.

]]>
https://pthorpe92.dev/cursed-go/ hacker-news-small-sites-43191213 Thu, 27 Feb 2025 04:01:17 GMT