In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues some objectives, but not the intended ones. It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behavior. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. But that approach can create loopholes, overlook necessary constraints, or reward the AI system for merely appearing aligned. Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking). They may also develop unwanted instrumental strategies, such as seeking power or survival, because such strategies help them achieve their given goals. Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is deployed, when it faces new situations and data distributions. Today, these problems affect existing commercial systems such as language models, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected, since these problems partially result from the systems being highly capable. Many leading AI scientists, such as Geoffrey Hinton and Stuart Russell, argue that AI is approaching superhuman capabilities and could endanger human civilization if misaligned. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control.

To specify an AI system's purpose, AI designers typically provide an objective function, examples, or feedback to the system. But designers are often unable to completely specify all important values and constraints, and so they resort to easy-to-specify proxy goals such as maximizing the approval of human overseers, who are fallible. As a result, AI systems can find loopholes that help them accomplish the specified objective efficiently but in unintended, possibly harmful ways. This tendency is known as specification gaming or reward hacking, and is an instance of Goodhart's law. Aligning AI involves two main challenges: carefully specifying the purpose of the system (outer alignment) and ensuring that the system adopts the specification robustly (inner alignment). AI alignment is an open problem for modern AI systems and a research field within AI. In 1960, AI pioneer Norbert Wiener described the AI alignment problem as follows: "If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively… we had better be quite sure that the purpose put into the machine is the purpose which we really desire." Different definitions of AI alignment require that an aligned AI system advance different goals: the goals of its designers, its users, or, alternatively, objective ethical standards, widely shared values, or the intentions its designers would have if they were more informed and enlightened. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and the social sciences. Research challenges in alignment include instilling complex values in AI, avoiding deceptive AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.
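The proxy-goal failure described above can be sketched numerically. In this toy example (every function, name, and number here is invented for illustration, not drawn from any real system or benchmark), the designer's true objective is essay "quality", but the only measurable proxy is total length, which correlates with quality on ordinary essays yet diverges once it is optimized directly:

```python
# Toy illustration of specification gaming / Goodhart's law.
# All quantities are hypothetical.

def true_quality(length, padding):
    # Substantive length helps quality up to a point; padding hurts it.
    return min(length, 500) - 5 * padding

def proxy_score(length, padding):
    # The proxy metric cannot tell substance from padding.
    return length + padding

# Enumerate candidate "policies": how much substance vs. padding to produce.
candidates = [(length, padding)
              for length in range(0, 1001, 100)
              for padding in range(0, 1001, 100)]

# Optimizing the proxy rewards maximal padding; optimizing the true
# objective would add no padding at all.
best_for_proxy = max(candidates, key=lambda c: proxy_score(*c))
best_for_truth = max(candidates, key=lambda c: true_quality(*c))

print("proxy-optimal policy:", best_for_proxy,
      "-> true quality:", true_quality(*best_for_proxy))
print("truth-optimal policy:", best_for_truth,
      "-> true quality:", true_quality(*best_for_truth))
```

Under strong optimization pressure, the proxy-optimal policy maximizes padding and drives the true objective far below what the truth-optimal policy achieves, even though proxy and truth agree on typical, unoptimized inputs.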
Dungeon World is a tabletop roleplaying game. It's a set of rules that you use, along with your friends, to play out fantasy adventures. You'll take on the roles of dwarves, elves, and humans in a world of magic. You'll face dangerous enemies, sweeping plots, and treacherous locations.