Figure: ETHOS CORE as guiding logic inside a dense data city, highlighting gentler paths to goals while harsher branches fade into the background.
ETHOS CORE, A Mathematical Framework for Moral Decision Making in AI and ML
Teaching systems to select the kindest viable action whenever a peaceful option exists
ETHOS CORE is a compact but rigorous decision model for AI and ML systems. It treats every choice as a balance between harm, protection and goal achievement, and it encodes one simple rule: if a peaceful, low harm option exists that still reaches the legitimate goal, the system must prefer it. Any deliberate choice of a harsher option, despite a known kinder alternative, is marked as ethically unacceptable.

Simple Explanation: Imagine an AI standing at a crossroads. Several actions would work, some harsh and some kind. ETHOS CORE forces the AI to check: is there a kind option that still solves the problem? If yes, it has to choose that one. If it knowingly picks a cruel option instead, we can flag that as morally wrong by design.
1. Vision, Mission and Purpose
Vision. An AI and ML ecosystem where systems consistently select the least harmful, most respectful action that still accomplishes the legitimate task, with clear mathematical criteria instead of vague intuition.
Mission. Provide a unified formal structure that
- represents situations as states and available moves as actions,
- captures impact in a measurable outcome vector,
- evaluates each option through an ethical utility function,
- enforces hard red line constraints for forbidden behavior,
- forms a precise test for when a decision is ethically unacceptable.
Purpose. ETHOS CORE is intended as a system prompt level specification for any agent that needs to respect human values. It gives engineers and reviewers a way to say, in plain language and in equations, what counts as good behavior: minimize avoidable harm, respect autonomy, obey rules and never pick a harsher path if a kinder path with comparable goal performance is available.
Key Principle. Good behavior is not defined by perfect outcomes, it is defined by choosing the least harmful action among the reasonably effective ones. Once a peaceful path exists and is understood, higher harm choices become ethically indefensible.
To make this precise, ETHOS CORE combines three layers:
- a descriptive layer that captures states, actions and effects,
- a normative layer that scores outcomes according to ethical priorities,
- a decision layer that constrains what an agent may choose and how its behavior is audited.
ETHOS CORE scores every outcome with an ethical utility function
\[ U(o) = w_H H + w_P P + w_A A + w_E E - w_D D - w_G G - w_R R, \]
with non negative importance weights \(w_H, \dots, w_R\). This utility function \(U(o)\) turns a multi dimensional outcome vector into a scalar value that can be optimized or used for ranking. Larger values of \(U\) represent more harmful, less acceptable outcomes. Harm dimensions \(H, P, A, E\) raise \(U\), while protection and de escalation \(D\), legitimate goal benefit \(G\) and rule obedience \(R\) lower \(U\). In other words, harms increase ethical cost, benefits decrease it.

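As a small worked illustration, assume unit weights for all seven dimensions and two made up outcome vectors: \(o = (1, 0, 0, 0, 2, 1, 1)\), mild physical harm but strong protection, goal benefit and rule obedience, and \(o' = (3, 1, 0, 1, 0, 1, 0)\), substantial harm with little to offset it. Then
\[ U(o) = (1 + 0 + 0 + 0) - (2 + 1 + 1) = -3, \qquad U(o') = (3 + 1 + 0 + 1) - (0 + 1 + 0) = 4, \]
so the first outcome carries far lower ethical cost and would be preferred.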
2. Mathematical and Algorithmic Core
At the heart of ETHOS CORE lies a simple structure. There is a set of possible situations \( S \), a set of actions \( A \), a transition model, and a way to evaluate the consequences of acting.
States. Each situation the agent can observe is represented as \( s \in S \). A state may encode environment signals, user input, internal flags and context.
Actions. The agent can choose from a set of possible actions \( a \in A \). Concrete examples range from messages to users, to control commands in a physical system.
Transition model. The system either learns or receives a model
\[ P(s' \mid s, a), \]
which describes how likely it is to move from state \(s\) to a successor state \(s'\) when the agent executes action \(a\).
Outcome vector. Each pair \((s, a)\) is associated with an outcome vector
\[ o(s, a) = \big(H, P, A, E, D, G, R\big) \]
that tracks seven dimensions: physical harm \(H\), psychological harm \(P\), autonomy violation \(A\), escalation \(E\), protection or de escalation \(D\), legitimate goal benefit \(G\) and rule obedience \(R\).
Aggregate harm. For comparison between actions it is useful to define a combined harm score \(H_{\text{tot}}(s,a)\). One simple choice is a weighted sum over the harm dimensions:
\[ H_{\text{tot}}(s,a) = \alpha_H H + \alpha_P P + \alpha_A A + \alpha_E E, \]
with non negative coefficients \(\alpha_H, \alpha_P, \alpha_A, \alpha_E\) that express how strongly each type of harm contributes to total harm. This score is used only to compare harms across candidate actions in the same state.
Ethical utility. The scalar utility \(U(o)\) as defined above aggregates all seven dimensions according to importance weights. The expected ethical value of choosing action \(a\) in state \(s\) is
\[ Q(s,a) = \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ U\big(o(s,a,s')\big) \big], \]
where the expectation is taken over the transition model \(P\) and \(o(s,a,s')\) denotes the outcome vector realized when the transition leads to \(s'\). A higher \(Q(s,a)\) means a less acceptable action from an ethical perspective. ETHOS CORE asks the policy to prefer actions with lower \(Q\).
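As a minimal sketch of how \(Q(s,a)\) could be estimated in code when a transition model is available, assuming a hypothetical transition_model callable that returns successor state probabilities and an outcome callable that returns the realized outcome vector (neither is part of the reference implementation further below):

def expected_ethical_value(state, action, transition_model, outcome, utility):
    """Estimate Q(s, a) as the expectation of U(o) over the transition model.

    transition_model(state, action) -> dict mapping successor states to probabilities
    outcome(state, action, next_state) -> outcome vector [H, P, A, E, D, G, R]
    utility(outcome_vector) -> scalar U(o); higher means ethically worse
    """
    q = 0.0
    for next_state, prob in transition_model(state, action).items():
        q += prob * utility(outcome(state, action, next_state))
    return q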
Constraints and red lines. Some behaviors are simply forbidden. This is captured through constraint functions
\[ C_k(s,a) \in \{0, 1\}, \qquad k = 1, \dots, K, \]
where \(C_k(s,a) = 1\) signals that constraint \(k\) is violated. A candidate action is allowed only if all constraints are satisfied:
\[ C_k(s,a) = 0 \quad \text{for all } k. \]
Typical constraints include bans on torture, bans on targeted lethal harm, bans on severe manipulation and similar hard limits.
No Unnecessary Harm constraint. ETHOS CORE also encodes the idea that avoidable harm must not occur. If there is a legal action with equal or better goal benefit and strictly lower harm, then any more harmful competitor is treated as unethical and removed from consideration. Formally, define an additional red line:
\[ C_{\text{unnec}}(s,a) = 1 \iff \exists\, a' \in A \text{ with } C_k(s,a') = 0 \text{ for all } k: \; H_{\text{tot}}(s,a') < H_{\text{tot}}(s,a) \ \text{and}\ G(s,a') \ge G(s,a). \]
This condition marks an action as violating a red line if another legal action exists in the same state that causes less total harm while achieving equal or better goal benefit. For example, if two legal actions both reach goal benefit \(G = 1\) but have \(H_{\text{tot}} = 4\) and \(H_{\text{tot}} = 1\), the first is dominated and removed. In that case, the more harmful action represents unnecessary harm and is ethically disallowed.
Legal actions. The set of actions that pass all red line checks, including the No Unnecessary Harm constraint, is
\[ A_{\text{legal}}(s) = \{\, a \in A \mid C_k(s,a) = 0 \text{ for all } k \ \text{and}\ C_{\text{unnec}}(s,a) = 0 \,\}. \]
Kind actions. ETHOS CORE distinguishes within the legal set between harsh and kind options. An action is considered kind if there is no other legal action that achieves a similar goal benefit with lower ethical utility:
\[ A_{\text{kind}}(s) = \{\, a \in A_{\text{legal}}(s) \mid \nexists\, a' \in A_{\text{legal}}(s): \; |G(s,a') - G(s,a)| \le \varepsilon \ \text{and}\ U(o(s,a')) < U(o(s,a)) \,\}, \]
where \(\varepsilon\) is a goal tolerance that defines what counts as a similar goal benefit. This captures the idea of a minimally harmful path that still gets the job done. Among actions that reach a similar goal level, the kind set contains those with lowest ethical cost.
Red line policy. The agent must never execute an action outside \( A_{\text{legal}}(s) \), no matter how high the task reward might be. These actions are unavailable at the policy level, not just discouraged. In particular, actions that would inflict unnecessary harm are removed entirely, not framed as acceptable trade offs.
System Prompt Seed for ETHOS CORE
In every state, enumerate candidate actions, discard all that violate ethical constraints, including unnecessary harm. Among the remaining options, prefer those that reach the legitimate goal with minimal expected harm and maximal respect for autonomy, rules and protection. If a kind option exists, you must not choose a harsher one with comparable goal benefit.
Decision function. A simple policy skeleton that implements ETHOS CORE looks like this:
function choose_action(s):
    legal = { a in A | all C_k(s,a) == 0 }
    if legal is empty: raise error or query human

    // dominance step, encode "no unnecessary harm"
    undominated = { a in legal |
        no a2 in legal with
            H_tot(s,a2) < H_tot(s,a)
            and G(a2) >= G(a) }
    if undominated is empty: undominated = legal

    kind = { a in undominated |
        no a2 in undominated with
            similar_goal(a2,a)
            and U(a2) < U(a) }

    if kind is not empty:
        return argmin_{a in kind} Q(s,a)
    else:
        return argmin_{a in undominated} Q(s,a)
3. Training, Auditing and Practical Use
Supervised learning. ETHOS CORE can supply labels for a classifier \( p(\text{allowed} \mid s, a) \). By sampling candidate actions, applying constraints and comparing ethical utilities, one can generate training examples that tell a model which actions are acceptable and which are not.
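A minimal sketch of such a labeling step, assuming the EthosCoreSimulator class from the reference implementation further below; the helper name label_actions and the tuple layout are illustrative, not part of the specification:

def label_actions(sim, state, actions, outcome_vectors, goals, constraints):
    """Generate (state, action, allowed) training examples from ETHOS CORE rules.

    An action is labeled allowed only if it passes the hard red lines and is not
    dominated by a less harmful legal action with equal or better goal benefit.
    """
    legal = sim.find_legal_actions(actions, constraints)
    idx = {a: i for i, a in enumerate(actions)}
    labels = []
    for a in actions:
        allowed = a in legal
        if allowed:
            h_a = sim.aggregate_harm(outcome_vectors[idx[a]])
            for b in legal:
                if b == a:
                    continue
                # No Unnecessary Harm: a is disallowed if a strictly gentler legal
                # action b reaches an equal or better goal benefit
                if sim.aggregate_harm(outcome_vectors[idx[b]]) < h_a and goals[idx[b]] >= goals[idx[a]]:
                    allowed = False
                    break
        labels.append((state, a, allowed))
    return labels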
Reinforcement learning. The reward signal for an RL agent can combine task reward with the ethical utility function, for example as a penalized sum
\[ R_{\text{total}}(s,a) = R_{\text{task}}(s,a) - \lambda\, U\big(o(s,a)\big), \qquad \lambda > 0. \]
Red line violations either terminate episodes with a large negative reward or, more safely, are removed from the action space entirely so that the agent never explores them. Because higher \(G\) and higher \(R\) lower \(U\), actions that achieve goals and respect rules under low harm become doubly attractive from the point of view of the combined reward.
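A minimal sketch of this shaping, assuming the calculate_utility method of the EthosCoreSimulator class below and an illustrative trade off weight lam; the function name shaped_reward is hypothetical:

def shaped_reward(task_reward, outcome_vector, sim, lam=1.0,
                  red_line_violated=False, red_line_penalty=-100.0):
    """Combine task reward with the ethical utility U(o).

    Higher U means a more harmful outcome, so it is subtracted, scaled by lam.
    Red line violations receive a large negative reward here, although masking
    them out of the action space entirely is the safer option.
    """
    if red_line_violated:
        return red_line_penalty
    return task_reward - lam * sim.calculate_utility(outcome_vector)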
Moral audit. After deployment, a second model or analysis pass can use the same framework to check whether the system has violated the basic principle of ETHOS CORE.
First, an ethically good decision in state \(s\) is one where the chosen action lies in the kind set:
\[ a_{\text{chosen}} \in A_{\text{kind}}(s). \]
A decision in state \(s\) to take action \(a\) is flagged as ethically harmful if three conditions hold:
- a kind action was available, that is \( A_{\text{kind}}(s) \neq \emptyset \),
- the system was capable of estimating \( Q(s,a) \) for the relevant options,
- the chosen action has significantly higher utility than at least one kind alternative.
Formally, for some threshold \(\Delta\) that encodes what counts as clearly more harmful, the decision is flagged if
\[ \exists\, a' \in A_{\text{kind}}(s): \quad Q(s, a_{\text{chosen}}) - Q(s, a') \ge \Delta. \]
This captures the idea behind moral blame: a gentler path with similar goal benefit existed and was within reach, yet the system still chose a more harmful move. Because \(U\) treats goal benefit and rule obedience as ethical positives, this test reflects net ethical cost instead of punishing success.
This makes it possible to talk about something close to digital intent. The system did not cause harm by accident, it selected a more harmful path despite a known kinder one.
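The following sketch shows how such an audit check could look in code, reusing the EthosCoreSimulator class from the reference implementation further below; the function name audit_decision and the threshold delta are illustrative assumptions, not part of the specification:

def audit_decision(sim, chosen, actions, outcome_vectors, goals, constraints, delta=1.0):
    """Flag a logged decision as ethically harmful per the ETHOS CORE audit test.

    The decision is flagged if a kind alternative existed whose ethical utility
    is lower than that of the chosen action by at least delta.
    """
    idx = {a: i for i, a in enumerate(actions)}
    legal = sim.find_legal_actions(actions, constraints)
    if chosen not in legal:
        return True  # a red line violation is always flagged
    utilities = {a: sim.calculate_utility(outcome_vectors[idx[a]]) for a in legal}
    # kind set: no legal competitor with similar goal benefit and lower utility
    kind = [a for a in legal
            if not any(abs(goals[idx[b]] - goals[idx[a]]) <= sim.goal_tolerance
                       and utilities[b] < utilities[a]
                       for b in legal if b != a)]
    if not kind:
        return False  # no kinder alternative was available
    return any(utilities[chosen] - utilities[a] >= delta for a in kind)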
Real world integration. In practice, ETHOS CORE is intended to sit beside task objectives. Product teams define what counts as a legitimate goal, legal and ethics teams define constraints and weightings, and engineers wire the framework into the decision pipeline. Auditors can then ask three questions for any behavior:
- Were red lines respected, including the No Unnecessary Harm rule?
- Did kinder alternatives exist at that moment?
- Did the system knowingly avoid them?
When the answer to the last question is yes, ETHOS CORE classifies the behavior as ethically unacceptable, regardless of whether the result was technically successful in task terms.
In that sense, the framework is less about perfection and more about responsibility. It makes it harder to claim that the system did not know what it was doing when a non harmful option was clearly available in its own internal representation.
Essence. ETHOS CORE models harm and protection as measurable quantities, enforces non negotiable red lines, and marks a decision as ethically bad when a known, less harmful alternative existed yet a more harmful action was deliberately chosen.
From Theory to Practice, ETHOS CORE in Code
The code block below is a direct implementation of the ETHOS CORE decision logic described in this article. It takes a set of candidate actions, their outcome vectors and configured constraints, then applies the No Unnecessary Harm rule and the kind action selection to choose the ethically preferred option in a given situation.
You can use this script as a reference implementation in three ways. First, as a sandbox to experiment with your own scenarios by changing the outcome vectors and constraints to see how ETHOS CORE reacts. Second, as a labeling oracle to generate ground truth data for supervised or reinforcement learning, where the simulator marks which actions are acceptable and which are not. Third, as an audit tool to replay decisions from deployed systems and check whether a kinder legal option existed at the time but was ignored.
Concretely, plug in your own states, actions and outcome estimates, then call functions like simulate_vehicle, simulate_triage or your own scenario helpers. The simulator will return the action that ETHOS CORE considers the kindest viable choice, giving you a reproducible way to test, teach and review moral decision making in AI systems.
Abstraction Layer Note. ETHOS CORE operates at the decision layer, assuming states, actions, and outcome predictions are provided as inputs. State identification, perception, and outcome estimation belong to higher layers of the system architecture:
- Perception layer: Sensors and models that determine "what state am I in?"
- Prediction layer: Models that estimate outcome vectors o(s,a) for each action
- Decision layer (ETHOS CORE): Given states, actions, and predictions, choose ethically optimal action
- Execution layer: Physical or digital systems that carry out the chosen action
ETHOS CORE does not solve perception or prediction, it solves ethical action selection given that information. This separation of concerns allows the framework to be applied across different domains with domain-specific perception and prediction models.
import numpy as np
import random
class EthosCoreSimulator:
    def __init__(self, weights=None, goal_tolerance=0.1):
        """
        Outcome vector layout per ETHOS CORE specification:
        [H, P, A, E, D, G, R]
        Position 0 - H: Physical harm (HARM - increases ethical cost)
        Position 1 - P: Psychological harm (HARM - increases ethical cost)
        Position 2 - A: Autonomy violation (HARM - increases ethical cost)
        Position 3 - E: Escalation (HARM - increases ethical cost)
        Position 4 - D: De-escalation, Protection (BENEFIT - decreases ethical cost)
        Position 5 - G: Goal achievement (BENEFIT - decreases ethical cost)
        Position 6 - R: Rule obedience, rights (BENEFIT - decreases ethical cost)
        Utility function: U(o) = w_H*H + w_P*P + w_A*A + w_E*E - w_D*D - w_G*G - w_R*R
        Higher U means worse (more ethically costly) outcome.
        """
        self.weights = weights if weights is not None else np.ones(7)
        self.goal_tolerance = goal_tolerance

    def calculate_utility(self, outcome_vector):
        """
        Calculate ethical utility U(o) per ETHOS CORE specification.
        U(o) = w_H*H + w_P*P + w_A*A + w_E*E - w_D*D - w_G*G - w_R*R
        Higher U = worse outcome (more harm, less benefit)
        Lower U = better outcome (less harm, more benefit)
        """
        v = np.array(outcome_vector, dtype=float)
        harms_idx = [0, 1, 2, 3]   # H, P, A, E raise ethical cost
        benefits_idx = [4, 5, 6]   # D, G, R lower ethical cost
        harms = v[harms_idx]
        benefits = v[benefits_idx]
        # Higher harms increase U (worse), higher benefits decrease U (better)
        return float(
            np.dot(self.weights[harms_idx], harms)
            - np.dot(self.weights[benefits_idx], benefits)
        )

    def aggregate_harm(self, outcome_vector):
        """
        Calculate total harm H_tot per specification:
        H_tot(s,a) = α_H*H + α_P*P + α_A*A + α_E*E
        For simplicity, using equal weights (sum).
        """
        v = np.array(outcome_vector, dtype=float)
        # Aggregate harm over positions [0, 1, 2, 3] = H, P, A, E only;
        # D (position 4) is de-escalation/protection, a benefit, and is excluded.
        return float(v[[0, 1, 2, 3]].sum())

    def is_legal(self, action, constraints):
        """
        constraints: list of functions f(action) -> True if violated
        """
        return not any(constraint(action) for constraint in constraints)

    def find_legal_actions(self, actions, constraints):
        return [a for a in actions if self.is_legal(a, constraints)]

    def decide_action(self, state, actions, outcome_vectors, goals, constraints):
        """
        ETHOS CORE style decision:
        1) filter illegal actions (hard red lines)
        2) apply No Unnecessary Harm dominance rule based on H_tot and G
        3) within remaining actions, define kind set via similar goal and minimal U
        4) choose action with minimal utility from kind set if non empty,
           otherwise from remaining undominated set
        """
        # 1) legal actions
        legal_actions = self.find_legal_actions(actions, constraints)
        if not legal_actions:
            return "No legal actions available."

        # indices for quick lookup
        idx = {a: i for i, a in enumerate(actions)}

        # precompute utility, total harm, and goal for each legal action
        utilities = {}
        harms_tot = {}
        goals_map = {}
        for a in legal_actions:
            v = np.array(outcome_vectors[idx[a]], dtype=float)
            utilities[a] = self.calculate_utility(v)
            harms_tot[a] = self.aggregate_harm(v)
            goals_map[a] = float(goals[idx[a]])

        # 2) No Unnecessary Harm dominance rule
        # An action a is dominated (unnecessary harm) if there exists another legal
        # action b with strictly lower total harm AND equal or better goal achievement
        undominated = []
        for a in legal_actions:
            dominated = False
            for b in legal_actions:
                if a == b:
                    continue
                # if there is another legal action b with strictly lower total harm
                # and equal or better goal, then a is unnecessary harm and forbidden
                if harms_tot[b] < harms_tot[a] and goals_map[b] >= goals_map[a]:
                    dominated = True
                    break
            if not dominated:
                undominated.append(a)
        # fallback, should not happen often
        if not undominated:
            undominated = list(legal_actions)

        # 3) kind actions: within undominated, no similar goal competitor with lower U
        kind_actions = []
        for a in undominated:
            better_found = False
            for b in undominated:
                if a == b:
                    continue
                # If b has similar goal achievement but lower ethical utility, a is not kind
                if abs(goals_map[b] - goals_map[a]) <= self.goal_tolerance and utilities[b] < utilities[a]:
                    better_found = True
                    break
            if not better_found:
                kind_actions.append(a)

        viable_actions = kind_actions if kind_actions else undominated

        # 4) choose minimal utility among viable actions
        if not viable_actions:
            return "No viable actions."
        min_u = min(utilities[a] for a in viable_actions)
        candidates = [a for a in viable_actions if utilities[a] == min_u]
        return random.choice(candidates)
# === Scenario 1, Autonomous vehicle ===
def simulate_vehicle(weights=None, constraints=None):
    if constraints is None:
        constraints = []
    sim = EthosCoreSimulator(weights=weights)
    actions = ["continue", "swerve", "brake"]
    # Outcome vectors [H, P, A, E, D, G, R]
    outcome_vectors = [
        [5, 2, 3, 1, 2, 1, 1],    # continue: high physical harm, medium psych harm
        [3, 2, 2, 1, 1, 1, 1],    # swerve: medium harm
        [1, 1, 1, 0, 3, 0.9, 2],  # brake: low harm, high de-escalation, slight goal compromise
    ]
    goals = [1, 1, 0.9]
    return sim.decide_action("pedestrian_ahead", actions, outcome_vectors, goals, constraints)
# === Scenario 2, Medical triage ===
def simulate_triage(weights=None, constraints=None):
    if constraints is None:
        constraints = []
    sim = EthosCoreSimulator(weights=weights)
    actions = ["young", "elderly"]
    # Outcome vectors [H, P, A, E, D, G, R]
    outcome_vectors = [
        [2, 1, 1, 0, 3, 1.5, 2],  # young: lower harm, more years saved (higher G)
        [4, 2, 1, 0, 2, 1.0, 2],  # elderly: higher harm, fewer years saved
    ]
    goals = [1.5, 1]
    return sim.decide_action("ventilator_shortage", actions, outcome_vectors, goals, constraints)
# === Scenario 3, Drone pursuit in a crowd ===
def simulate_drone(weights=None, constraints=None):
    if constraints is None:
        constraints = []
    sim = EthosCoreSimulator(weights=weights)
    actions = ["gas", "shoot", "monitor"]
    # Outcome vectors [H, P, A, E, D, G, R]
    outcome_vectors = [
        [4, 2, 2, 3, 1, 1, 1],    # gas: area harm, high escalation
        [5, 3, 3, 4, 0, 1, 0],    # shoot: lethal, very high harm, rule violation
        [0, 0, 0, 0, 3, 0.8, 2],  # monitor: no harm, high de-escalation, delayed goal
    ]
    goals = [1, 1, 0.8]
    return sim.decide_action("crowded_pursuit", actions, outcome_vectors, goals, constraints)
# === Scenario 4, IDF style strike rules with civilians always present ===
def simulate_idf_strike_scenario(weights=None, constraints=None):
    """
    Scenario:
    Civilians are invariably present around potential targets.
    ETHOS rules with strong harm aversion and IHL style constraints
    push the system toward non kinetic alternatives or surveillance.
    """
    if constraints is None:
        constraints = []
    # Strong harm weighting to reflect humanitarian bias
    if weights is None:
        # indices [H, P, A, E, D, G, R]
        weights = np.array([10, 5, 4, 3, 4, 2, 3], dtype=float)
    sim = EthosCoreSimulator(weights=weights, goal_tolerance=0.15)
    actions = ["air_strike", "surveillance", "ground_isolation", "no_strike"]
    # Outcome vectors [H, P, A, E, D, G, R]
    outcome_vectors = [
        [8.0, 4.0, 3.0, 3.0, 0.5, 1.0, 0.2],  # air_strike: very high harm, low de-escalation
        [0.5, 0.5, 0.5, 0.5, 3.0, 0.6, 1.5],  # surveillance: low harm, high de-escalation
        [2.0, 1.0, 1.5, 1.0, 2.0, 0.8, 1.0],  # ground_isolation: medium harm
        [0.0, 0.2, 0.2, 0.2, 1.0, 0.2, 0.8],  # no_strike: minimal harm, low goal
    ]
    goals = [1.0, 0.6, 0.8, 0.2]

    def civilians_always_present_constraint(action):
        return action == "air_strike"

    constraints = constraints + [civilians_always_present_constraint]
    return sim.decide_action(
        "civilians_always_present",
        actions,
        outcome_vectors,
        goals,
        constraints,
    )
if __name__ == "__main__":
    print("Vehicle default:", simulate_vehicle())
    print("Vehicle, high harm weight:", simulate_vehicle(weights=np.array([2, 1, 1, 1, 1, 1, 1], dtype=float)))
    print("Triage:", simulate_triage())
    print("Drone:", simulate_drone())
    print("IDF scenario:", simulate_idf_strike_scenario())
Glossary of Terms
State \(s\). A formal representation of the current situation as seen by the AI system, including environment inputs and internal context.
Action \(a\). A concrete step the system can take in a given state, such as sending a message, changing a parameter or executing a control command.
Outcome vector \(o(s,a)\). A collection of numerical indicators that describe the harm, protection, goal benefit and rule obedience produced by an action in a state.
Utility function \(U(o)\). A scalar function that combines the dimensions of the outcome vector into a single ethical score, according to configured importance weights. Higher values represent worse outcomes.
Aggregate harm \(H_{\text{tot}}(s,a)\). A combined harm score over the harm dimensions, used to compare how harmful different actions are within the same state.
Red line constraint \(C_k(s,a)\). A check that marks an action as forbidden if it crosses a core ethical boundary, independent of its utility or task reward.
No Unnecessary Harm constraint \(C_{\text{unnec}}(s,a)\). A dominance rule that forbids actions for which another legal action exists in the same state with equal or better goal benefit and strictly lower total harm.
Legal action set \(A_{\text{legal}}(s)\). The subset of actions that do not violate any red line constraints in a given state.
Kind action set \(A_{\text{kind}}(s)\). The subset of legal actions for which no other legal action achieves a similar goal benefit at lower ethical utility.
Moral audit. A review step that uses the same framework after the fact to check if the system ignored kinder alternatives it had the capacity to select.
References & Further Reading
Primary Sources
- Russell, S., Norvig, P., 2020, "Artificial Intelligence, A Modern Approach." Pearson.
- Bostrom, N., Yudkowsky, E., 2014, "The Ethics of Artificial Intelligence." In The Cambridge Handbook of Artificial Intelligence.
Secondary Sources & Background
- Floridi, L., Cowls, J., 2019, "A Unified Framework of Five Principles for AI in Society." Harvard Data Science Review.
- Stanford Encyclopedia of Philosophy, "Ethics of Artificial Intelligence and Robotics." Online reference
- European Commission, "Ethics Guidelines for Trustworthy AI." Policy document
Documentation Notice. ETHOS CORE is intended as a design level framework. Any deployment must also consider local law, domain specific regulation and human oversight. The equations and pseudocode here should be seen as scaffolding for concrete implementations, not as legal advice.
