Fixes#1712
Disclosure: Coded with help of Copiot. (description wrtten by me)
So this does indeed loosen the encoding a bit. However, IMHO, it was neither better before though. You could always bypass the encoding for `args{0]` when you just include `<a` (or the other tag) somewhere or so.
**One important notice:** This was (due to the exceptions before and afterwards) valid before and also now: Translators **could** (and can) if they have malicious intent, inject/do "XSS attacks".
Thus, translations PRs (also from Crowdin) should be reviewed for wild HTML code inside translations. I suppose this is easy to fix, but anyway a valid risk.
But IMHO, we should teat the JSON files being part of our source code as a "trusted source". In the end, such an attak is basicaly just ends up being injecting malicious code. I hope such contributors would be detected.
References I explicitly checked again to not introduce an XSS here: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html and the PHP doc for he HTML encoding.
I feel the safter way obviously would be encoding the _whole_ string _after_ translation (just like you should apply DOMPurify after everything), but as explained it was not done before and would break compatibility. Also, I looked through the sources and I see no risk described by doing it only for the "dangerous" "untrusted" inputs.
Only here is a notice that `%s` shall not be used in some contexts, for example to define a tag: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html#dangerous-contexts (obviously in such a case, attacks may be possible even with encoding; but again; this is nothing new)
The basic "problem" of it all is: We want HTML to be translated/be usable in our translation. If we'd get rid of that, we would get for sure rid of all such XSS attack possibilities. But that woud be a bigger refactoring, so IMHO, this here is fine for a fix for the issue at hand.
Ah another point: I think the `is_int` check is harmless, but it's also kinda useless. Maybe it is some kind of obscure performance optimisation. (Yeah ints have nothing to encode as they have nothing that could be used for XSS, but they could also just be passed through that function.)
Instead of automatically adding custom templates, we log an error if
that template is missing in the available templates. Still mitigates
arbitrary file inclusion, as the string is now checked against a fixed
allow list.
The path is only optional when it is / and the very last element, otherwise it is required. As soon as it is in the middle of a URL it helps the parser to identify which part is the username and domain and what is path and GET parameters. The @ sign is legitimate, if unusual, in the latter two.
Some of the references to "paste" in code or comments got changed as well, but to clarify the intended usage of the terms:
- A PrivateBin document can consist of a paste text (key "paste" in the encrypted payload) and one or several attachments and discussion entries.
- Internally the root document is called a "Paste" and each discussion entry is called a "Discussion".
- When referring to a whole document with one paste and optional discussion(s), we call it just "document".
- When talking about a particular JSON payload type in the internal logic, i.e. during storage or transmission, we call them a paste or discussion to distinguish which type we refer to.
closes#397
- document removed unused columns in database schema of tables `paste` & `comment`
- amended misleading comments
- nickname is part of the encrypted payload in v2 comments and therefore there is nothing to store separately