I do think it’s possible for the model to infer whether the response was audited vs not by the latency between the previous action and the current action relative to the action’s expected time (aka timing side channel attacks). If you’re working in a terminal, you may have the timestamp in the prompt (or can query for it), or if you’re working in a computer identical setup the time is in the upper right.
I don’t disagree with your overall premise here, but I do think timing leaks information. It’s possible to start delaying all actions a variable amount to reduce information leakage, but then you start getting into the domain of “timing side channel attack defenses” which reduce the timing performance of the system proportional to the defense robustness.
Similarly, if auditing is a function of the prompt and not completion some percentage of the time, that does reduce the confidence a model can place on such inferences.
Curious if you’ve already given thought to this and have other mitigations?
anomaly detection
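Added example, not part of the original comment: a small simulation of the trade-off described above, measuring how well a single latency threshold separates audited from unaudited actions with no defense, with a variable delay, and with a fixed release deadline. The latency distributions, the 20% audit rate, and the 2 s and 5 s delay parameters are constructed assumptions, not measurements of any real system.

```python
import random

def simulate(n, rng, audit_rate=0.2):
    """Constructed samples: (observed latency in seconds, was_audited)."""
    out = []
    for _ in range(n):
        audited = rng.random() < audit_rate
        latency = rng.gauss(1.0, 0.1)          # assumed normal action time
        if audited:
            latency += rng.gauss(2.0, 0.5)     # assumed audit overhead
        out.append((max(latency, 0.0), audited))
    return out

def best_threshold_accuracy(samples):
    """Best accuracy of the guess 'audited iff latency > t' over all t."""
    pts = sorted(samples)
    n = len(pts)
    correct = sum(a for _, a in pts)           # t below everything: guess audited for all
    best = max(correct, n - correct)
    for i, (lat, audited) in enumerate(pts):
        correct += -1 if audited else 1        # sample i now falls below t
        if i + 1 == n or pts[i + 1][0] != lat: # t can only sit between distinct latencies
            best = max(best, correct)
    return best / n

def jitter(samples, rng, max_delay):
    """Variable delay: add a uniform random delay to every action."""
    return [(lat + rng.uniform(0.0, max_delay), a) for lat, a in samples]

def pad_to_deadline(samples, deadline):
    """Fixed release time: hold every result until the deadline."""
    return [(max(lat, deadline), a) for lat, a in samples]

def report(raw, defended):
    added = sum(d[0] - r[0] for r, d in zip(raw, defended)) / len(raw)
    return {"accuracy": best_threshold_accuracy(defended), "added_latency": added}

def run(seed=7, n=5000):
    rng = random.Random(seed)
    raw = simulate(n, rng)
    audited = sum(a for _, a in raw)
    return {
        "majority_guess": max(audited, n - audited) / n,  # accuracy with no timing signal
        "none": report(raw, raw),
        "jitter_2s": report(raw, jitter(raw, rng, 2.0)),
        "pad_5s": report(raw, pad_to_deadline(raw, 5.0)),
    }

if __name__ == "__main__":
    for name, value in run().items():
        print(name, value)
```

Under these assumptions the jitter lowers the threshold's accuracy only partway toward the majority-guess floor, while the deadline reaches that floor at a much larger mean added latency.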