让AI玩一千万次贪吃蛇-天翼云

贪吃蛇实现

在深入探讨人工智能如何掌握贪吃蛇游戏之前，让我们先回顾一下贪吃蛇游戏的基本设计和规则。贪吃蛇是一款经典的电子游戏，其简单的规则和直观的游戏玩法使其成为了历史上最受欢迎的游戏之一。

游戏规则

在贪吃蛇游戏中，玩家控制一条不断移动的蛇，游戏目标是吃掉出现在屏幕上的食物，每吃掉一个食物，蛇的长度就会增加。游戏的挑战在于蛇不能触碰到屏幕边缘或自己的身体，否则游戏结束。随着蛇的不断增长，避免撞到自己变得越来越难，这就需要玩家有较高的策略和反应能力。

游戏实现

在我们的Python实现中，游戏界面使用Pygame库创建，设置了固定的宽度和高度，以及基本的颜色定义，如黑色背景、绿色蛇身和红色食物。游戏中的蛇和食物都以方块的形式表示，每个方块的大小由block_size变量定义。游戏的主体是SnakeGame类，它封装了游戏的所有基本逻辑，包括蛇的初始化、移动、食物的生成以及碰撞检测等。

width, height = 640, 480  # 游戏窗口的宽度和高度
block_size = 20  # 每个方块的大小，包括蛇的每一部分和食物
class SnakeGame:
    def __init__(self, width, height, block_size):
        self.width = width
        self.height = height
        self.block_size = block_size
        self.reset()

在reset方法中，游戏被重置到初始状态，蛇回到屏幕中心，长度为3个方块，方向向上。同时，食物被随机放置在屏幕上的某个位置。

def reset(self):
    initial_position = [self.width // 2, self.height // 2]
    self.snake = [
        initial_position,
        [initial_position[0] - self.block_size, initial_position[1]],
        [initial_position[0] - 2 * self.block_size, initial_position[1]]
    ]
    self.direction = 'UP'
    self.food = [random.randrange(1, self.width // self.block_size) * self.block_size,
                 random.randrange(1, self.height // self.block_size) * self.block_size]
    self.score = 0
    self.done = False
    return self.get_state()

蛇的移动是通过在蛇头前面添加一个新的方块，并在蛇尾去掉一个方块来实现的。如果蛇吃到了食物，就不去掉蛇尾的方块，从而使蛇的长度增加。

        在游戏的每一步，step方法都会被调用，它接收一个动作（上、下、左、右），然后更新游戏状态，包括蛇的位置、游戏得分和游戏是否结束。这个方法是连接游戏逻辑和AI代理的桥梁，AI代理将根据游戏的当前状态决定下一步的最佳动作。

完整的游戏类：

class SnakeGame:
    def __init__(self, width, height, block_size):
        self.width = width
        self.height = height
        self.block_size = block_size
        self.reset()

    def reset(self):
        initial_position = [self.width // 2, self.height // 2]
        self.snake = [
            initial_position,
            [initial_position[0] - self.block_size, initial_position[1]],
            [initial_position[0] - 2 * self.block_size, initial_position[1]]
        ]
        self.direction = 'UP'
        self.food = [random.randrange(1, self.width // self.block_size) * self.block_size,
                     random.randrange(1, self.height // self.block_size) * self.block_size]
        self.score = 0
        self.done = False
        return self.get_state()

    def step(self, action):
        directions = ['UP', 'DOWN', 'LEFT', 'RIGHT']
        if self.direction != 'UP' and action == 1:
            self.direction = 'DOWN'
        elif self.direction != 'DOWN' and action == 0:
            self.direction = 'UP'
        elif self.direction != 'LEFT' and action == 3:
            self.direction = 'RIGHT'
        elif self.direction != 'RIGHT' and action == 2:
            self.direction = 'LEFT'

        # 计算食物的距离以判断是否靠近食物
        distance_to_food_before = self.distance(self.snake[0], self.food)

        # 移动蛇
        x, y = self.snake[0]
        if self.direction == 'UP':
            y -= block_size
        elif self.direction == 'DOWN':
            y += block_size
        elif self.direction == 'LEFT':
            x -= block_size
        elif self.direction == 'RIGHT':
            x += block_size
        new_head = [x, y]


        if x < 0 or x >= width or y < 0 or y >= height:
            self.done = True
            reward = -500  # 撞墙的惩罚
            return self.get_state(), reward, self.done
        if new_head in self.snake:
            self.done = True
            reward = -200  # 撞到自身的严重惩罚
            return self.get_state(), reward, self.done


        self.snake.insert(0, new_head)
        if new_head == self.food:
            self.score += 1
            reward = 500  # 吃到食物的奖励
            self.food = [random.randrange(1, width // block_size) * block_size,
                         random.randrange(1, height // block_size) * block_size]
        else:
            self.snake.pop()


            distance_to_food_after = self.distance(new_head, self.food)
            if distance_to_food_after < distance_to_food_before:
                reward = 30  # 靠近食物的奖励
            else:
                reward = -15  # 无进展的小惩罚

        # 检查是否长时间直线移动
        if len(set([part[0] for part in self.snake])) == 1 or len(set([part[1] for part in self.snake])) == 1:
            if len(self.snake) > 5:  # 如果蛇的长度超过一定阈值
                reward -= 5  # 长时间直线移动的小惩罚

        return self.get_state(), reward, self.done

    def distance(self, point1, point2):
        return ((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2) ** 0.5

    def get_state(self):
        head = self.snake[0]
        point_l = [head[0] - self.block_size, head[1]]
        point_r = [head[0] + self.block_size, head[1]]
        point_u = [head[0], head[1] - self.block_size]
        point_d = [head[0], head[1] + self.block_size]

        state = [
            head[0] / self.width, head[1] / self.height,  # 蛇头的相对位置
            self.food[0] / self.width, self.food[1] / self.height,  # 食物的相对位置
            int(self.direction == 'LEFT'), int(self.direction == 'RIGHT'),
            int(self.direction == 'UP'), int(self.direction == 'DOWN'),  # 蛇头的方向
            int(self.is_collision(point_l)), int(self.is_collision(point_r)),
            int(self.is_collision(point_u)), int(self.is_collision(point_d))  # 周围是否会发生碰撞
        ]

        return np.array(state, dtype=float)

    def is_collision(self, point):
        return point[0] < 0 or point[0] >= self.width or point[1] < 0 or point[1] >= self.height or point in self.snake

    def display(self, screen):
        screen.fill(black)
        for part in self.snake:
            pygame.draw.rect(screen, green, pygame.Rect(part[0], part[1], self.block_size, self.block_size))
        pygame.draw.rect(screen, red, pygame.Rect(self.food[0], self.food[1], self.block_size, self.block_size))
        pygame.display.flip()

Q学习算法实现

Q学习是一种无模型的强化学习算法，广泛应用于各种决策过程，包括我们的贪吃蛇游戏。它使代理能够在与环境交互的过程中学习，从而确定在给定状态下采取哪种动作以最大化未来的奖励。接下来，我们将深入探讨Q学习的核心概念和工作原理，并结合前文提到的贪吃蛇游戏代码进行说明。

Q学习简介

Q学习的目标是学习一个策略，告诉代理在特定状态下应该采取什么动作以获得最大的长期奖励。这种策略通过一个叫做Q函数的东西来表示，它为每个状态-动作对分配一个值（Q值）。这个值代表了在给定状态下采取某个动作，并遵循当前策略的情况下，预期能获得的总奖励。

Q表和Q值

在实践中，Q函数通常通过一个表格（Q表）来实现，表中的每一行对应一个可能的状态，每一列对应一个可能的动作，表中的每个元素表示该状态-动作对的Q值。在我们的贪吃蛇游戏中，状态可以是蛇头的位置、食物的位置以及蛇头的方向等信息的组合，动作则是蛇的移动方向（上、下、左、右）。

Q学习更新规则

Q学习的核心是其更新规则，它定义了如何根据代理的经验逐步调整Q值。更新规则如下：

让AI玩一千万次贪吃蛇

Q学习在贪吃蛇游戏中的应用

在贪吃蛇游戏的实现中，我们首先定义了一个QLearningAgent类，它包含了学习率、折扣率和探索率等参数，以及一个空的Q表来存储Q值。

class QLearningAgent:
    def __init__(self, learning_rate=0.5, discount_rate=0.8, exploration_rate=0.45):
        self.learning_rate = learning_rate
        self.discount_rate = discount_rate
        self.exploration_rate = exploration_rate
        self.q_table = {}

代理通过get_action方法根据当前状态选择动作，这里使用了ϵ-贪婪策略来平衡探索和利用：

def get_action(self, state):
    state_key = str(state)
    if state_key not in self.q_table:
        self.q_table[state_key] = np.zeros(4)

    if np.random.rand() < self.exploration_rate:
        return random.randint(0, 3)  # 探索
    else:
        return np.argmax(self.q_table[state_key])  # 利用

在每一步之后，update_q_table方法会根据Q学习的更新规则来调整Q值：

def update_q_table(self, state, action, reward, next_state, done):
    state_key = str(state)
    next_state_key = str(next_state)

    if next_state_key not in self.q_table:
        self.q_table[next_state_key] = np.zeros(4)

    next_max = np.max(self.q_table[next_state_key])
    self.q_table[state_key][action] = (1 - self.learning_rate) * self.q_table[state_key][action] + \
                                      self.learning_rate * (reward + self.discount_rate * next_max)

通过这种方式，Q学习代理能够在与环境的交互中逐渐学习到在各种状态下应该采取的最佳动作，从而在贪吃蛇游戏中获得尽可能高的分数。随着训练的进行，代理会变得越来越熟练，最终能够展示出高水平的游戏技巧。

整体项目完整代码

 import time
import pygame
import random
import numpy as np


width, height = 640, 480


black = (0, 0, 0)
green = (0, 255, 0)
red = (255, 0, 0)


block_size = 20
speed = 15

class SnakeGame:
    def __init__(self, width, height, block_size):
        self.width = width
        self.height = height
        self.block_size = block_size
        self.reset()

    def reset(self):
        initial_position = [self.width // 2, self.height // 2]
        self.snake = [
            initial_position,
            [initial_position[0] - self.block_size, initial_position[1]],
            [initial_position[0] - 2 * self.block_size, initial_position[1]]
        ]
        self.direction = 'UP'
        self.food = [random.randrange(1, self.width // self.block_size) * self.block_size,
                     random.randrange(1, self.height // self.block_size) * self.block_size]
        self.score = 0
        self.done = False
        return self.get_state()

    def step(self, action):
        directions = ['UP', 'DOWN', 'LEFT', 'RIGHT']
        if self.direction != 'UP' and action == 1:
            self.direction = 'DOWN'
        elif self.direction != 'DOWN' and action == 0:
            self.direction = 'UP'
        elif self.direction != 'LEFT' and action == 3:
            self.direction = 'RIGHT'
        elif self.direction != 'RIGHT' and action == 2:
            self.direction = 'LEFT'

        # 计算食物的距离以判断是否靠近食物
        distance_to_food_before = self.distance(self.snake[0], self.food)

        # 移动蛇
        x, y = self.snake[0]
        if self.direction == 'UP':
            y -= block_size
        elif self.direction == 'DOWN':
            y += block_size
        elif self.direction == 'LEFT':
            x -= block_size
        elif self.direction == 'RIGHT':
            x += block_size
        new_head = [x, y]


        if x < 0 or x >= width or y < 0 or y >= height:
            self.done = True
            reward = -500  # 撞墙的惩罚
            return self.get_state(), reward, self.done
        if new_head in self.snake:
            self.done = True
            reward = -200  # 撞到自身的严重惩罚
            return self.get_state(), reward, self.done


        self.snake.insert(0, new_head)
        if new_head == self.food:
            self.score += 1
            reward = 500  # 吃到食物的奖励
            self.food = [random.randrange(1, width // block_size) * block_size,
                         random.randrange(1, height // block_size) * block_size]
        else:
            self.snake.pop()


            distance_to_food_after = self.distance(new_head, self.food)
            if distance_to_food_after < distance_to_food_before:
                reward = 30  # 靠近食物的奖励
            else:
                reward = -15  # 无进展的小惩罚

        # 检查是否长时间直线移动
        if len(set([part[0] for part in self.snake])) == 1 or len(set([part[1] for part in self.snake])) == 1:
            if len(self.snake) > 5:  # 如果蛇的长度超过一定阈值
                reward -= 5  # 长时间直线移动的小惩罚

        return self.get_state(), reward, self.done

    def distance(self, point1, point2):
        return ((point1[0] - point2[0]) ** 2 + (point1[1] - point2[1]) ** 2) ** 0.5

    def get_state(self):
        head = self.snake[0]
        point_l = [head[0] - self.block_size, head[1]]
        point_r = [head[0] + self.block_size, head[1]]
        point_u = [head[0], head[1] - self.block_size]
        point_d = [head[0], head[1] + self.block_size]

        state = [
            head[0] / self.width, head[1] / self.height,  # 蛇头的相对位置
            self.food[0] / self.width, self.food[1] / self.height,  # 食物的相对位置
            int(self.direction == 'LEFT'), int(self.direction == 'RIGHT'),
            int(self.direction == 'UP'), int(self.direction == 'DOWN'),  # 蛇头的方向
            int(self.is_collision(point_l)), int(self.is_collision(point_r)),
            int(self.is_collision(point_u)), int(self.is_collision(point_d))  # 周围是否会发生碰撞
        ]

        return np.array(state, dtype=float)

    def is_collision(self, point):
        return point[0] < 0 or point[0] >= self.width or point[1] < 0 or point[1] >= self.height or point in self.snake

    def display(self, screen):
        screen.fill(black)
        for part in self.snake:
            pygame.draw.rect(screen, green, pygame.Rect(part[0], part[1], self.block_size, self.block_size))
        pygame.draw.rect(screen, red, pygame.Rect(self.food[0], self.food[1], self.block_size, self.block_size))
        pygame.display.flip()

class QLearningAgent:
    def __init__(self, learning_rate=0.5, discount_rate=0.8, exploration_rate=0.45):
        self.learning_rate = learning_rate
        self.discount_rate = discount_rate
        self.exploration_rate = exploration_rate
        self.q_table = {}

    def get_action(self, state):
        state_key = str(state)
        if state_key not in self.q_table:
            self.q_table[state_key] = np.zeros(4)

        if np.random.rand() < self.exploration_rate:
            return random.randint(0, 3)
        else:
            return np.argmax(self.q_table[state_key])

    def update_q_table(self, state, action, reward, next_state, done):
        state_key = str(state)
        next_state_key = str(next_state)

        if next_state_key not in self.q_table:
            self.q_table[next_state_key] = np.zeros(4)

        next_max = np.max(self.q_table[next_state_key])
        self.q_table[state_key][action] = (1 - self.learning_rate) * self.q_table[state_key][action] + \
                                          self.learning_rate * (reward + self.discount_rate * next_max)

        if done:
            self.exploration_rate *= 0.9999
            if self.exploration_rate < 0.1:
                self.exploration_rate = 0.3

game = SnakeGame(width, height, block_size)
agent = QLearningAgent()
num_episodes = 10000000
visualization_interval = 10000
for episode in range(num_episodes):
    if episode % visualization_interval == 1:
        pygame.init()
        screen = pygame.display.set_mode((width, height))
        visualize = True
    else:
        visualize = False

    state = game.reset()
    done = False
    total_reward = 0

    while not done:
        if visualize:
            game.display(screen)
            clock = pygame.time.Clock()
            clock.tick(12)

        action = agent.get_action(state)
        next_state, reward, done = game.step(action)
        agent.update_q_table(state, action, reward, next_state, done)

        state = next_state
        total_reward += reward

    if visualize:
        pygame.quit()

    print(f"轮次: {episode}, 奖励分数: {total_reward}, 探索率: {agent.exploration_rate}, {agent}")

运行过程截图

让AI玩一千万次贪吃蛇

代码分析

环境设置

游戏环境是通过Pygame库来创建和管理的。首先，我们初始化Pygame并设置游戏窗口的尺寸。接着，定义了几种基本颜色用于绘制游戏元素，如蛇身和食物。

import pygame
import random

width, height = 640, 480  # 游戏窗口的宽度和高度
black = (0, 0, 0)  # 背景颜色
green = (0, 255, 0)  # 蛇的颜色
red = (255, 0, 0)  # 食物的颜色
block_size = 20  # 每个方块的大小

蛇的行为定义

SnakeGame类封装了蛇的行为和游戏逻辑。蛇的初始状态是位于屏幕中心，长度为3个方块，方向向上。蛇的每一部分都是一个方块，我们通过一个列表来维护这些方块的坐标。

Q学习代理实现

QLearningAgent类实现了Q学习算法。它使用一个Q表来存储和更新状态-动作对的Q值，并根据ϵ-贪婪策略来选择动作。

Q学习代理实现

QLearningAgent类实现了Q学习算法。它使用一个Q表来存储和更新状态-动作对的Q值，并根据ϵ-贪婪策略来选择动作。

game = SnakeGame(width, height, block_size)
agent = QLearningAgent(learning_rate=0.5, discount_rate=0.8, exploration_rate=0.45)

for episode in range(num_episodes):
    state = game.reset()
    done = False

    while not done:
        action = agent.get_action(state)
        next_state, reward, done = game.step(action)
        agent.update_q_table(state, action, reward, next_state, done)
        state = next_state

通过这个过程，Q学习代理逐渐学会了如何在贪吃蛇游戏中做出更好的决策，从而获得更高的分数。随着训练的进行，我们可以观察到代理的性能逐渐提高，这表明它正在学习和适应游戏环境。

小结

本文通过结合Python和Pygame库实现了一个经典的贪吃蛇游戏，并引入了Q学习算法，使得人工智能代理能够自主学习如何玩游戏并提高其性能。通过详细解析游戏设计、Q学习的核心原理以及代码实现的每个步骤，我们展示了如何将强化学习应用于实际问题中，快把代码拿走试试看吧

活动

智算服务

应用商城

合作伙伴

开发者

支持与服务

了解天翼云

让AI玩一千万次贪吃蛇

让AI玩一千万次贪吃蛇

贪吃蛇实现

游戏规则

游戏实现

Q学习算法实现

Q学习简介

Q表和Q值

Q学习更新规则

Q学习在贪吃蛇游戏中的应用

整体项目完整代码

运行过程截图

代码分析

环境设置

蛇的行为定义

Q学习代理实现

Q学习代理实现

小结

相关文章

【强化学习】强化学习的基本概念与应用

AIGC的底层技术：底层逻辑代码分析与原理实现

计算机初级选手的成长历程——扫雷详解

全局变量_基本操作

shell基础_shell简介

变量基础_变量场景

编程语言逻辑

shell基础_开发规范解读

全局变量_嵌套shell

内置变量_默认值相关

作者介绍

最新文章

【强化学习】强化学习的基本概念与应用

全局变量_基本操作

JDK动态代理笔记整理

装饰者设计模式（二）番外篇 装饰者设计模式和静态代理设计模式区别

JDK动态代理和CGLIB动态代理的区别

量子机器学习：颠覆性的前沿技术

热门文章

react18-学习笔记14-枚举（Enum）

java学习第一天笔记-hello world小案例8

react项目实战学习笔记-学习38-滚动条样式

java202302java学习笔记第四天-用户交互scanner之3

react18-学习笔记5-安装和使用ts

java202302java学习笔记第九天-数组的遍历和三道综合学习6小案例

热门标签

相关产品

弹性云主机

天翼云电脑（公众版）

对象存储

云硬盘

随机文章

如何解决爬虫重复率高的问题

【人工智能基础一】深度学习基础

react18-学习笔记9-interface初探

react项目实战学习笔记-学习28-表格结构搭建

java202302java学习笔记第四天-用户交互scanner之3

仅用pygame+python实现植物大战僵尸-----完成比完美更重要

装饰者设计模式（二）番外篇装饰者设计模式和静态代理设计模式区别