Refactor the RequestTags migration (202601121700-migrate-hostinfo-request-tags)
to use PolicyManager.NodeCanHaveTag() instead of reimplementing tag validation.
Changes:
- NewHeadscaleDatabase now accepts *types.Config to allow migrations
access to policy configuration
- Add loadPolicyBytes helper to load policy from file or DB based on config
- Add standalone GetPolicy(tx *gorm.DB) for use during migrations
- Replace custom tag validation logic with PolicyManager
Benefits:
- Full HuJSON parsing support (not just JSON)
- Proper group expansion via PolicyManager
- Support for nested tags and autogroups
- Works with both file and database policy modes
- Single source of truth for tag validation
Co-Authored-By: Shourya Gautam <shouryamgautam@gmail.com>
Add GetAPIKeyByID method to the state layer, delegating to the existing
database layer function. This enables API key lookup by ID in addition
to the existing prefix-based lookup.
Updates #2986
Make DeleteUser call updatePolicyManagerUsers() to refresh the policy
manager's cached user list after user deletion. This ensures consistency
with CreateUser, UpdateUser, and RenameUser which all update the policy
manager.
Previously, DeleteUser only removed the user from the database without
updating the policy manager. This could leave stale user references in
the cached user list, potentially causing issues when policy is
re-evaluated.
The gRPC handler now uses the change returned from DeleteUser instead of
manually constructing change.UserRemoved().
Fixes#2967
HandleNodeFromPreAuthKey assumed pak.User was always set, but
tags-only PreAuthKeys have nil User. This caused nil pointer
dereference when registering nodes with tags-only keys.
Also updates integration tests to use GetTags() instead of the
removed GetValidTags() method.
Updates #2977
When SetNodeTags changed a node's tags, the node's self view wasn't
updated. The bug manifested as: the first SetNodeTags call updates
the server but the client's self view doesn't update until a second
call with the same tag.
Root cause: Three issues combined to prevent self-updates:
1. SetNodeTags returned PolicyChange which doesn't set OriginNode,
so the mapper's self-update check failed.
2. The Change.Merge function didn't preserve OriginNode, so when
changes were batched together, OriginNode was lost.
3. generateMapResponse checked OriginNode only in buildFromChange(),
but PolicyChange uses RequiresRuntimePeerComputation which
bypasses that code path entirely and calls policyChangeResponse()
instead.
The fix addresses all three:
- state.go: Set OriginNode on the returned change
- change.go: Preserve OriginNode (and TargetNode) during merge
- batcher.go: Pass isSelfUpdate to policyChangeResponse so the
origin node gets both self info AND packet filters
- mapper.go: Add includeSelf parameter to policyChangeResponse
Fixes#2978
When a node re-authenticates via OIDC/web auth with empty RequestTags
(from `tailscale up --advertise-tags= --force-reauth`), remove all tags
and return ownership to the authenticating user.
This allows nodes to transition from any tagged state (including nodes
originally registered with a tagged pre-auth key) back to user-owned.
Fixes#2979
Extends #2971 fix to also cover nodes that authenticate as users but
become tagged immediately via --advertise-tags. When RequestTags are
approved by policy, the node's expiry is now disabled, consistent with
nodes registered via tagged PreAuthKeys.
Nodes registered with tagged PreAuthKeys now have key expiry disabled,
matching Tailscale's behavior. User-owned nodes continue to use the
client-requested expiry.
On re-authentication, tagged nodes preserve their disabled expiry while
user-owned nodes can update their expiry from the client request.
Fixes#2971
This commit replaces the ChangeSet with a simpler bool based
change model that can be directly used in the map builder to
build the appropriate map response based on the change that
has occured. Previously, we fell back to sending full maps
for a lot of changes as that was consider "the safe" thing to
do to ensure no updates were missed.
This was slightly problematic as a node that already has a list
of peers will only do full replacement of the peers if the list
is non-empty, meaning that it was not possible to remove all
nodes (if for example policy changed).
Now we will keep track of last seen nodes, so we can send remove
ids, but also we are much smarter on how we send smaller, partial
maps when needed.
Fixes#2389
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
This commit changes so that node changes to the policy is
calculated if any of the nodes has changed in a way that might
affect the policy.
Previously we just checked if the number of nodes had changed,
which meant that if a node was added and removed, we would be
in a bad state.
Signed-off-by: Kristoffer Dalby <kristoffer@dalby.cc>
This PR investigates, adds tests and aims to correctly implement Tailscale's model for how Tags should be accepted, assigned and used to identify nodes in the Tailscale access and ownership model.
When evaluating in Headscale's policy, Tags are now only checked against a nodes "tags" list, which defines the source of truth for all tags for a given node. This simplifies the code for dealing with tags greatly, and should help us have less access bugs related to nodes belonging to tags or users.
A node can either be owned by a user, or a tag.
Next, to ensure the tags list on the node is correctly implemented, we first add tests for every registration scenario and combination of user, pre auth key and pre auth key with tags with the same registration expectation as observed by trying them all with the Tailscale control server. This should ensure that we implement the correct behaviour and that it does not change or break over time.
Lastly, the missing parts of the auth has been added, or changed in the cases where it was wrong. This has in large parts allowed us to delete and simplify a lot of code.
Now, tags can only be changed when a node authenticates or if set via the CLI/API. Tags can only be fully overwritten/replaced and any use of either auth or CLI will replace the current set if different.
A user owned device can be converted to a tagged device, but it cannot be changed back. A tagged device can never remove the last tag either, it has to have a minimum of one.
This PR changes tags to be something that exists on nodes in addition to users, to being its own thing. It is part of moving our tags support towards the correct tailscale compatible implementation.
There are probably rough edges in this PR, but the intention is to get it in, and then start fixing bugs from 0.28.0 milestone (long standing tags issue) to discover what works and what doesnt.
Updates #2417Closes#2619
Changed UpdateUser and re-registration flows to use Updates() which only
writes modified fields, preventing unintended overwrites of unchanged fields.
Also updated UsePreAuthKey to use Model().Update() for single field updates
and removed unused NodeSave wrapper.
Fixes a regression introduced in v0.27.0 where node expiry times were
being reset to zero when tailscaled restarts and sends a MapRequest.
The issue was caused by using GORM's Save() method in persistNodeToDB(),
which overwrites ALL fields including zero values. When a MapRequest
updates a node (without including expiry information), Save() would
overwrite the database expiry field with a zero value.
Changed to use Updates() which only updates non-zero values, preserving
existing database values when struct pointer fields are nil.
In BackfillNodeIPs, we need to explicitly update IPv4/IPv6 fields even
when nil (to remove IPs), so we use Select() to specify those fields.
Added regression test that validates expiry is preserved after MapRequest.
Fixes#2862
Skip auth key validation for existing nodes re-registering with the same
NodeKey. Pre-auth keys are only required for initial authentication.
NodeKey rotation still requires a valid auth key as it is a security-sensitive
operation that changes the node's cryptographic identity.
Fixes#2830
This PR addresses some consistency issues that was introduced or discovered with the nodestore.
nodestore:
Now returns the node that is being put or updated when it is finished. This closes a race condition where when we read it back, we do not necessarily get the node with the given change and it ensures we get all the other updates from that batch write.
auth:
Authentication paths have been unified and simplified. It removes a lot of bad branches and ensures we only do the minimal work.
A comprehensive auth test set has been created so we do not have to run integration tests to validate auth and it has allowed us to generate test cases for all the branches we currently know of.
integration:
added a lot more tooling and checks to validate that nodes reach the expected state when they come up and down. Standardised between the different auth models. A lot of this is to support or detect issues in the changes to nodestore (races) and auth (inconsistencies after login and reaching correct state)
This PR was assisted, particularly tests, by claude code.
- tailscale client gets a new AuthUrl and sets entry in the regcache
- regcache entry expires
- client doesn't know about that
- client always polls followup request а gets error
When user clicks "Login" in the app (after cache expiry), they visit
invalid URL and get "node not found in registration cache". Some clients
on Windows for e.g. can't get a new AuthUrl without restart the app.
To fix that we can issue a new reg id and return user a new valid
AuthUrl.
RegisterNode is refactored to be created with NewRegisterNode() to
autocreate channel and other stuff.
the client will send a lot of fields as `nil` if they have
not changed. NetInfo, which is inside Hostinfo, is one of those
fields and we often would override the whole hostinfo meaning that
we would remove netinfo if it hadnt changed.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Initial work on a nodestore which stores all of the nodes
and their relations in memory with relationship for peers
precalculated.
It is a copy-on-write structure, replacing the "snapshot"
when a change to the structure occurs. It is optimised for reads,
and while batches are not fast, they are grouped together
to do less of the expensive peer calculation if there are many
changes rapidly.
Writes will block until commited, while reads are never
blocked.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
There was a bug in HA subnet router handover where we used stale node data
from the longpoll session that we handed to Connect. This meant that we got
some odd behaviour where routes would not be deactivated correctly.
This commit changes to the nodeview is used through out, and we load the
current node to be updated in the write path and then handle it all there
to be consistent.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
This commit changes most of our (*)types.Node to
types.NodeView, which is a readonly version of the
underlying node ensuring that there is no mutations
happening in the read path.
Based on the migration, there didnt seem to be any, but the
idea here is to prevent it in the future and simplify other
new implementations.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
this commit moves all of the read and write logic, and all different parts
of headscale that manages some sort of persistent and in memory state into
a separate package.
The goal of this is to clearly define the boundry between parts of the app
which accesses and modifies data, and where it happens. Previously, different
state (routes, policy, db and so on) was used directly, and sometime passed to
functions as pointers.
Now all access has to go through state. In the initial implementation,
most of the same functions exists and have just been moved. In the future
centralising this will allow us to optimise bottle necks with the database
(in memory state) and make the different parts talking to eachother do so
in the same way across headscale components.
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>