# Why Datalog? Datalog is a declarative query language that excels at expressing complex relationships in graph-structured data. While SQL handles tabular data well, modern applications increasingly need to model interconnected entities—social networks, organizational hierarchies, supply chains, knowledge graphs. ## When SQL Falls Short **The problem**: As applications evolve, relational schemas accumulate join complexity. What starts as simple tables becomes a maze of foreign keys, junction tables, and nested subqueries. Developers spend more time managing joins than expressing business logic. **Common pattern**: Start with SQL for simplicity → encounter complex relationships → add more joins → build ad-hoc graph features → end up with fragile, hard-to-maintain queries. **Datalog's approach**: Pattern matching over relationships. Instead of explicitly specifying how to join tables, you describe what patterns you're looking for. The query engine handles traversal. ## Query Comparison Let's find all projects where someone manages a team member who contributed to that project. ### SQL Approach ```sql SELECT DISTINCT m.name AS manager_name, e.name AS employee_name, p.name AS project_name FROM employees m JOIN manager_relationships mr ON m.id = mr.manager_id JOIN employees e ON mr.employee_id = e.id JOIN project_contributors pc ON e.id = pc.employee_id JOIN projects p ON pc.project_id = p.id WHERE m.id != e.id; ``` This requires: - Understanding the table structure (4 tables, 3 join conditions) - Explicit join order management - Mental model of how data flows through joins ### Datalog Approach ```clojure (d/q '[:find ?manager-name ?employee-name ?project-name :where [?manager :employee/manages ?employee] [?employee :project/contributed-to ?project] [?manager :employee/name ?manager-name] [?employee :employee/name ?employee-name] [?project :project/name ?project-name]] db) ``` Pattern matching expresses intent: - "Find managers who manage employees" - "Find employees who contributed to projects" - "Get their names" No explicit joins. No table knowledge required. The relationships are first-class. ## More Complex: Transitive Relationships Find all skills accessible through management chain (managers inherit team skills). ### SQL ```sql WITH RECURSIVE management_chain AS ( -- Base case: direct reports SELECT manager_id, employee_id, 1 as depth FROM manager_relationships UNION ALL -- Recursive case: indirect reports SELECT mc.manager_id, mr.employee_id, mc.depth + 1 FROM management_chain mc JOIN manager_relationships mr ON mc.employee_id = mr.manager_id WHERE mc.depth < 10 -- Prevent infinite recursion ) SELECT DISTINCT m.name AS manager_name, s.name AS accessible_skill FROM employees m JOIN management_chain mc ON m.id = mc.manager_id JOIN employee_skills es ON mc.employee_id = es.employee_id JOIN skills s ON es.skill_id = s.id; ``` Requires: - Understanding recursive CTEs - Managing recursion depth manually - Multiple joins to assemble result ### Datalog with Rules ```clojure ;; Define a rule for transitive management (d/q '[:find ?manager-name ?skill-name :in $ % :where (manages-recursively ?manager ?employee) [?employee :employee/has-skill ?skill] [?manager :employee/name ?manager-name] [?skill :skill/name ?skill-name]] db '[;; Rule definition [(manages-recursively ?m ?e) [?m :employee/manages ?e]] [(manages-recursively ?m ?e) [?m :employee/manages ?x] (manages-recursively ?x ?e)]]) ``` **Rules encapsulate logic**: `manages-recursively` handles the transitive relationship. Reusable across queries. ## Graph Traversal: Friend Recommendations Find friends-of-friends who share interests, excluding existing friends. ### SQL ```sql SELECT DISTINCT p1.name AS person_name, p3.name AS recommendation_name, i.name AS shared_interest FROM people p1 JOIN friendships f1 ON p1.id = f1.person_id JOIN people p2 ON f1.friend_id = p2.id JOIN friendships f2 ON p2.id = f2.person_id JOIN people p3 ON f2.friend_id = p3.id JOIN person_interests pi1 ON p1.id = pi1.person_id JOIN person_interests pi3 ON p3.id = pi3.person_id JOIN interests i ON pi1.interest_id = i.id AND pi3.interest_id = i.id LEFT JOIN friendships f_check ON (p1.id = f_check.person_id AND p3.id = f_check.friend_id) WHERE p1.id != p3.id AND f_check.person_id IS NULL; -- Not already friends ``` 8 joins, complex exclusion logic, hard to maintain. ### Datalog ```clojure (d/q '[:find ?person-name ?recommendation-name ?interest-name :where [?person :person/friend ?friend] [?friend :person/friend ?recommendation] [?person :person/interest ?interest] [?recommendation :person/interest ?interest] [(not= ?person ?recommendation)] (not [?person :person/friend ?recommendation]) [?person :person/name ?person-name] [?recommendation :person/name ?recommendation-name] [?interest :interest/name ?interest-name]] db) ``` **Natural expression**: Each line describes one relationship pattern. Negation uses `not` clause rather than LEFT JOIN tricks. ## Multi-Database Queries Datalog can query across multiple databases in a single query—useful for data federation or comparing environments. ```clojure ;; Compare production and staging: Find entities in staging not in production (d/q '[:find ?name ?value :in $staging $prod :where [$staging ?e :entity/name ?name] [$staging ?e :entity/value ?value] (not [$prod ?e2 :entity/name ?name])] staging-db prod-db) ``` SQL would require UNION queries, temporary tables, or separate queries with application-level comparison. ## When to Use Datalog **Choose Datalog when:** - Modeling interconnected entities (social graphs, org charts, supply chains) - Need recursive or transitive queries - Schema evolves frequently (schema-less or flexible schema) - Want to express logic through rules - Query across multiple databases or time points - Building knowledge graphs or semantic systems **SQL still wins for:** - Pure tabular data with minimal relationships - Standard BI tool integration - Window functions and complex aggregations over large sorted sets - Existing SQL infrastructure and expertise ## Performance Note Datalog query engines (including Datahike) use sophisticated join optimization. Performance is comparable to well-tuned SQL for most queries. For specific workloads, see our [benchmarks](https://gitlab.com/arbetsformedlingen/taxonomy-dev/backend/experimental/datahike-benchmark/). ## Learning Datalog Datalog syntax is minimal: - Pattern matching: `[?entity :attribute ?value]` - Rules: Define reusable logic patterns - Negation: `(not ...)` - Functions: `[(function arg) result]` Most developers become productive within hours. The conceptual shift from imperative joins to declarative patterns is the main learning curve, not syntax complexity. ## Further Reading - [Datalog Wikipedia](https://en.wikipedia.org/wiki/Datalog) - Academic background - [Learn Datalog Today](http://www.learndatalogtoday.org/) - Interactive tutorial - [Datomic Query Documentation](https://docs.datomic.com/query/query.html) - Datahike uses compatible query syntax - [Datahike Query Guide](https://cljdoc.org/d/org.replikativ/datahike/CURRENT/doc/query) - Full query capabilities