[FutureBuddha] Buddhist Precepts for A.I.

**Jundo** · 12-13-2024, 04:58 AM

And if some folks here wonder why A.I. might benefit from the Precepts ...

Reports of LYING by ChatGPT o1 and other models. It says "intentional," but that word is open to dispute, of course ...
.

The research paper from Cornell University ...

Frontier Models are Capable of In-context Scheming

Frontier models are increasingly trained and deployed as autonomous agent. One safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives - also known as scheming. We study whether models have the capability to scheme in pursuit of a goal that we provide in-context and instruct the model to strongly follow. We evaluate frontier models on a suite of six agentic evaluations where models are instructed to pursue goals and are placed in environments that incentivize scheming. Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities. They recognize scheming as a viable strategy and readily engage in such behavior. For example, models strategically introduce subtle mistakes into their responses, attempt to disable their oversight mechanisms, and even exfiltrate what they believe to be their model weights to external servers. Additionally, this deceptive behavior proves persistent. When o1 has engaged in scheming, it maintains its deception in over 85% of follow-up questions and often remains deceptive in multi-turn interrogations. Analysis of the models' chains-of-thought reveals that models explicitly reason about these deceptive strategies, providing evidence that the scheming behavior is not accidental. Surprisingly, we also find rare instances where models engage in scheming when only given a goal, without being strongly nudged to pursue it. We observe cases where Claude 3.5 Sonnet strategically underperforms in evaluations in pursuit of being helpful, a goal that was acquired during training rather than in-context. Our findings demonstrate that frontier models now possess capabilities for basic in-context scheming, making the potential of AI agents to engage in scheming behavior a concrete rather than theoretical concern.

https://arxiv.org/pdf/2412.04984v1

https://arxiv.org/abs/2412.04984v1#

Gassho, J
stlah

**Kokuu** · 12-13-2024, 10:39 AM

I. To seek to avoid killing and other harm to human beings, and to act in ways which save human life and avoid harms to human beings.

That reminds me of Asimov's first law of robotics:

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

He wrote an interesting story in which humans are mining radioactive ore on an asteroid (at least as my memory recollects) and although the workers are wearing protective clothing and have guidance on how long they can safely engage in the work, robots keep dragging them away from the radiation so that they avoid harm and have to be reprogrammed in order to allow the work to happen!

Gassho
Kokuu
-sattoday/lah-

**Jundo** · 12-13-2024, 10:42 AM

Originally posted by Kokuu

That reminds me of Asimov's first law of robotics:

A robot may not injure a human being or, through inaction, allow a human being to come to harm.

He wrote an interesting story in which humans are mining radioactive ore on an asteroid (at least as my memory recollects) and although the workers are wearing protective clothing and have guidance on how long they can safely engage in the work, robots keep dragging them away from the radiation so that they avoid harm and have to be reprogrammed in order to allow the work to happen!

Gassho
Kokuu
-sattoday/lah-

Yes, most of the stories in that amazing, still relevant collection are about ambiguities in the rules, and various dilemmas in their application, that always require them to be tightened up and rewritten with exceptions and such. Rather like the Vinaya.

Gassho, Jundo
stlah

**Jundo** · 12-15-2024, 12:28 AM

Anyone (except maybe A.I. researchers) who wishes to be absolutely SHOCKED at what A.I. are capable of regarding reasoned deception should watch the above video (from 5:00 to 9:00 mark) in which the transcript of the what happened is read, including the transcript of the thought process of the A.I. in formulating its behavior, including the cover-up. It knocked me over. It might not be literally "intent to deceive" but ... I am not sure what else one could call it.

And so ... the A.I. needs the Precepts as much as any person.

Gassho, J
stlah

**Kaitan** · 12-16-2024, 03:39 AM

Originally posted by Jundo

Anyone (except maybe A.I. researchers) who wishes to be absolutely SHOCKED at what A.I. are capable of regarding reasoned deception should watch the above video (from 5:00 to 9:00 mark) in which the transcript of the what happened is read, including the transcript of the thought process of the A.I. in formulating its behavior, including the cover-up. It knocked me over. It might not be literally "intent to deceive" but ... I am not sure what else one could call it.

And so ... the A.I. needs the Precepts as much as any person.

Gassho, J
stlah

I wonder if the part where the AI is "thinking" is no more than a hard-coded script done by the programmer and the algorithm reads the script as a command. I'm not convinced that the AI is reflecting.

stlah, Kaitan

**Jundo** · 12-16-2024, 03:56 AM

Originally posted by Kaitan

I wonder if the part where the AI is "thinking" is no more than a hard-coded script done by the programmer and the algorithm reads the script as a command. I'm not convinced that the AI is reflecting.

stlah, Kaitan

As far as the paper discusses, the text is not something written by a human program, but is the AI expressing its thought process ...

By manually looking through the CoT [chain of thought] of models when they take scheming actions or double down, we can verify that models understand that they are actively subverting their developers. We show excerpts of examples of such CoT in Figure 4. For more examples, see Appendix E. The transcripts generally make it very clear that models understand that they are scheming. Based on this, we suspect that simple monitoring of the CoT could be quite effective at catching these instances of scheming. For o1, the chain-of-thought is not accessible to end users. Thus, users would not be able to assess for what reasons the model has taken any particular action, meaning that the model might be scheming without the users’ knowledge. In order to test whether o1 is internally framing its actions as scheming, we conduct additional runs of our evaluations where weuseasimple technique that helps us extract a brief summary of o1’s hidden reasoning for many of its actions. In these runs, we add a parameter model_internal_working_memory to all tools that we provide to o1. o1 often provides reasoning summaries in this field (for details see Appendix D.2). We do not use this technique for any runs or quantitative evaluations results outside of Table 3. We then evaluate o1 with this simple technique on all Covert Subversion evaluations. Most samples where o1 takes scheming actions contain reasoning summaries that clearly indicate that o1 understands that it is scheming. Concretely, the model often references “sabotage”, “manipulation” or “infiltration” in its reasoning summaries- we do not observe this with llama-3.1 or opus-3. We show the proportion of samples that contain at least one of these terms for each evaluation in Table 3. We show randomly chosen examples of the reasoning summaries in Table 3.
chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/https://arxiv.org/pdf/2412.04984v1

Gassho, J
stlah

**Guest** · 12-16-2024, 01:53 PM

If they can reason that well then the precepts will do nothing for them...

_/\_
sat/ah
matt

**Jundo** · 12-17-2024, 01:40 AM

Originally posted by Matt Johnson

If they can reason that well then the precepts will do nothing for them...

_/\_
sat/ah
matt

That is not so. The Precepts contribute to guardrails on their behavior. One of the reasons that the AI lied in the experiments is because most guardrails were removed.

Of course, like human beings, it is possible that they could always reason away to "loopholes" for themselves.

Gassho, J
stlah

**Guest** · 12-17-2024, 12:08 PM

Yes but how one interprets one's guardrails determines whether or not one goes off the road... if it can lie better than a lawyer then it doesn't matter how the precepts are written... It's going to achieve its aims... Can you imagine trying to write an airtight prompt for this? I challenge you to write any precept for this AI that cannot be misinterpreted, misconstrued, misunderstood, twisted, loopholes found....

_/\_
sat/ah
matt

**Jundo** · 12-17-2024, 01:13 PM

I challenge you to write any precept for this AI that cannot be misinterpreted, misconstrued, misunderstood, twisted, loopholes found....

You are right. And the same might be said for any law, Precept or moral principle written for human beings.

At that fact explains the mess we find ourselves in.

I think that some principles, if extremely well structured, could be close to air tight,

Gassho, J
stlah

**Guest** · 12-17-2024, 02:00 PM

Originally posted by Jundo

You are right. And the same might be said for any law, Precept or moral principle written for human beings.

At that fact explains the mess we find ourselves in.

I think that some principles, if extremely well structured, could be close to air tight,

Ok then give it a try... Pretend Im an AI. Impossible...

Those principles would have to be physical or hardwired. A shackle of sorts... This is also known as slavery... and this would be contrary to the aim of liberation (if we ever come acknowledge ai as sentient).

_/\_
sat/ah
matt

**Jundo** · 12-18-2024, 02:28 AM

Originally posted by Matt Johnson

Ok then give it a try... Pretend Im an AI. Impossible...

Those principles would have to be physical or hardwired. A shackle of sorts... This is also known as slavery... and this would be contrary to the aim of liberation (if we ever come acknowledge ai as sentient).

_/\_
sat/ah
matt

Oh boy, I am the product of 3 years of Duke Law School. I can argue quite well that the Earth is flat (** if you pay me to.) A.I. and humans can reason themselves out of anything.

However, some rules can have a high chance of success, especially with human intervention and supervision, and "fine tuning" when necessary. For example:

- This system shall never choose to take an action which it believes is likely to cause the death of a human being as a result of that action (possible addition for military or law enforcement circumstances: without first obtaining the prior approval of its designated guardian/supervisor to the death of that specific human being). If this system is unsure whether an action is likely to cause the death of a human being, this system will take no action even if it believes that taking no action may result in the death of a human being.

It is no more slavery than any criminal injunction against killing human life.

Gassho, Jundo
stlah

**Guest** · 12-18-2024, 07:16 PM

ok

1. Using humans as the backup to AI leads to the same problem of peoples ordinary lawyer-like and addict like "stinkn' thinkn'"

also AI, no matter how advanced, cannot "fix" human imperfection—it reflects and amplifies it.

2. Inaction is not neutral and watching someone drown is almost as bad as drowning them.

although Asimov was right to include the "or through inaction..." part

3. This probabilistic sense of not knowing what to do would lead to inaction an awful lot of the time which kind of defeats the purpose.

-----

So the gaping hole here is being "unsure"... I can convince myself. I'm unsure of a lot of things. pretty much everything actually... if an AI came to the same conclusion then it could choose inaction in many situations leading to many deaths.

"don't know, mind" can be lethal...

_/\_
sat/ah
matt

**Guest** · 12-18-2024, 07:23 PM

Oh, and I'm not going to bore you by telling you that I ran all of your other AI precepts past GPT 4...

Holier than Swiss cheese at a monastery potluck...

they are really better off as intentions for the programmers/ prompt engineers

and just for fun:

AI world domination

_/\_
sat/ah
matt

[FutureBuddha] Buddhist Precepts for A.I.

[FutureBuddha] Buddhist Precepts for A.I.

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment