Lox

笔记,未整理...

跟随着 crafting interpreter , 使用 go 写了一遍 lox 的解释器

分别是手写的词法 scanner , 语法 parser , 每个AST 节点的解释 , 类的实现 , 环境变量的实现 , 简单的语义分析(解决 shadow variable) , 使对变量的操作关联到正确的 scope 上

scanner 部分

使用了一个简单的状态机实现 , 基本是 by case 地写就行 , 还可以用 flex 生成

context-free grammer compare to REGULAR EXP

if i want to add new grammar , how ?

why we need scanner (abstraction , separation of concern, some impl combine all , and become hard to read,maintain) what is scanner

how many methods to implement (easy -> hard) by case scan , and sub-case (most easy to maintain , clean code , which hcl use) regular expression (build your own , and CS164 ) DFA (by hand , or lexical grammar (LEX auto generate a DFA machine)) the way I do in go , and some small technique (like go generate) what problem i solved and by what tech in the process what i learned? (go error handle https://lailin.xyz/post/go-training-03.html , go generate) can this use in production , daily work ? (custom script/conf lang , document / web page parse)

parser 部分

职责: token stream -> ast

是几个里面比较复杂的一个 , 主要分为表达式和语句 , 对表达式 eval 会产生值 , 对语句 eval 会产生副作用

表达式需要解决关联和优先级 , 左递归等问题 , 目前除了赋值是右关联的,其他都是左关联 , 也就是 a=b=c 等价于 a = (b = c)

左递归问题通过改写产生式解决 , 如 additive | additive + INTEGER => additive | INTEGER + additive , 但是这样改写后, 关联性被改变了

还需要继续改写 , additive | unary + unary ?

优先级参考表格

问自己一个问题, 如何加新的语法(如数组,字典) 到上下文无关文法里 , 优先级要怎么确定

除了手写外, 可以用 bison 生成

resolver

这部分只做了一件事, 遍历 AST 生成各个遍历的 scope step , 使得诸如

var a =1;

if true { var a = a +1; //第一个 a 在内部作用域 (声明并赋值) , 第二个 a 是外层作用域 }

能够符合正确的作用域

之后语义分析可以加在这里 , 修饰 AST declaration binding , shadow scope problem

resolver 是通过 scope stack 入栈出栈完成的分析

interpreter

树的遍历, visitor 模式

各个表达式节点, 语句节点的eval函数编写

表达式节点只需要 DFS 然后返回值就行了

decl 节点 stmt 节点的一种 , 和env 相关的定义操作

语句节点有不同的副作用 , 赋值语句需要环境/作用域的支持 , 每个代码块是一个作用域 , 如果在本作用域找不到就往父级作用域找 , 这里的作用域层级是静态的 , 也就是编译期确定的 , 有一门热门语言是奇葩的动态作用域(JavaScript) 还有一类可归类为 IO 副作用的语句 , 如 print

class

class 作为一个新功能添加到解释器中 , 提供了一个新功能添加步骤的参考

类其实是把数据和代码捆绑在一起

1.类提供的功能:

构造函数创建和初始化新实例存储和访问实例的字段定义一组所有类实例共享的非法 , 并在各自实例的状态下操作

2.类的语法:

类和函数一样, 需要 decl

classDecl -> "class" IDENTIFIER "{" function* "}" ;

由于 class 内部的 method 不需要 fun 关键字 , 因此不用 funDecl , 而是另外定义了

function -> IDENTIFIER "(" parameters? ")" block ;

在 Stmt 节点生成器处添加Class 节点的生成语句

前面说了 class 是把数据和方法绑定在一起 , 对应地就需要有地方保存 class 的方法和变量和 class 的名字

这里可以服用 Stmt.Function decl , 因为节点的父 env 是通过参数传入的 , 完全可以复用

classDecl 的优先级就和其他 decl 优先级一致,放一起

scanner 和 parser 部分修改完毕了

接着是 interpreter , 在这里把 classDecl 作为一个环境变量定义在当前作用域下 , 先声明后定义避免 shadow scope

定义一个 toString 方法, 尝试 print class 变量 , 可以跑到所有改动的代码

class Cream { serveOn() { return "Scones"; } }

print Cream;

定义的部分完成了.

接着是创建实例的部分

lox 没有类的静态方法 , 所以如果没有实例将毫无用处

为了不引入新语法 , lox 把实例化的方式设计为 class Bagel {} Bagel(); 复用 CallExpr

需要把 LoxClass 实现为 LoxCallable 的 , 和 funcdecl 一样

实现 call 接口 , 并产生一个 LoxInstance

每次完成一部分 , 作者就会尝试 print 出来以跑过全部改动的代码

class LoxInstance { private LoxClass klass;

LoxInstance(LoxClass klass) { this.klass = klass; }

@Override public String toString() { return klass.name + " instance"; } }

LoxInstance 应当能够调用 LoxClass 里的方法 (所以 loxClass 需要传入作为自己的一个属性) , 并带上自己的状态 , 但目前还没有实现状态

接着实现实例属性 : properties on instances lox 不实现私有属性 , 所有属性都是公开的 , 并且可以不声明直接使用

属性访问语法的设计: someObject.someProperty

由于该语法的点号和function call ( someObject() ) 的括号有相同的优先级 (找相同优先级的) , 所以修改并加在 call 中

call -> primary ( "(" arguments? ")" | "." IDENTIFIER )* );

并将 .IDENTIFIER parse 为GetExpression

GetExpression 是左关联递归的 , 使用一个 while (true) { expr = new Expr.Get(expr,name); 嵌套

egg.scramble(3).with(cheddar);

              ()
          .           cheddar
    ()       with

. 3 egg scramble

新增节点类型 , 需要在 resolver 里增加 resolve 方法 , getExpr 的 resolve 不需要做任何事 , 因为属性是动态 look up 的 , resolver 只负责编译期静态分析/检查

然后是解释 GetExpr 节点 , 节点需要做的事则是从 loxinstance 中获取属性并返回

如果属性不存在, 不像 js 返回 nil (容易造成 bug) , 而是直接报错

property 和 field 有些许的不同

接着是对属性进行赋值 , 找到之前赋值的语法 , 扩展它

assignment -> ( call ".")? IDENTIFIER "=" assignment | logic_or

和 get expr 不同的是 , set expr 没有链式调用 , 只有最后一个才是 setter

breakfast.omelette.filling.meat = ham

                           expr.set .=
                       meat              ham
            expr.get .
expr.get .              filling

breakfast omelette

解析的方式和想的不一样 , 先把左侧当做一个任意大的 get expr , 然后碰到 = 的时候,再转换成 set expr

return new Expr.Assign(name, value); } else if (expr instanceof Expr.Get) { Expr.Get get = (Expr.Get)expr; return new Expr.Set(get.object, get.name, value); } lox/Parser.java, in assignment()

同样 set expr 也需要加到 resolver

method on class

由于 getter (链式调用) 和 function call (函数执行) 的语法 parser 已经实现了,这里不需要添加新的语法

这里有个 pulled apart 的问题,跳过 var m = object.method; m(argument);

resolveFunction 在静态分析这里需要 resolve this expr , 后面讲到

最终函数调用是通过 LoxInstance 调用的, 在 instance 里有 loxklass , 各个 method 的ast node 就保存在 loxclass 的method map 里

object.method , method 是 field , 但不是属性 , 这是property 和 field 的区别

当 get expr 返回 loxFunction 后, Bacon().eat() , 遇到() call expr , function 会被 eval

最后是 this 关键字

有了 this 才能在 method 内和instance 的属性关联到一起

有了前面的设计 , this expr 就很简单了 , this expr eval to current LoxInstance

但是要怎么做到 this 和当前的 lox instance 绑定呢

this 就像个挂在函数周围的额外数据 , 听起来像闭包 , 这里的非法是将 this 定义为method 作用域的一个隐藏变量

关键的工作都在 resolver 里 , 在 resolver 中确定 this 绑定的 loxinstance

这里需要为 this 在原先 env 基础上 , 创建一个新的 env scope , 保存在里面, 为什么呢?

保存在原来的 env 里有什么问题吗, 因为解释的时候是分成两步的 , 一部是 cake.taste , eval get expr 并产生 env 包含 this , 一步是 call expr , () , 又产生一个新的 env 执行

我觉得合到一起也没什么问题

array

为了检验学习效果 , 尝试着添加数组的功能 , 跟随 class 的步骤

closure 的实现

收获

对编译原理的前端有了初步的认识

资料

go 自带的 AST 生成工具

cs143 cool 编程语言实验指导

reference source clox scanner optimization https://craftinginterpreters.com/scanning-on-demand.html#identifiers-and-keywords go standard ast interface and go generate https://lailin.xyz/post/41140.html jlox source code crafting-inter... related page cs164 https://inst.eecs.berkeley.edu/~cs164/fa20/ hashicorp configuration language go-workwx ast json https://calebschoepp.com/blog/2020/adding-a-list-data-type-to-lox/ Adding A List Data Type To Lox - Caleb Schoepp

后续

添加新功能数组和字典 , 词法,语法设计

把 byte code machine 看了

Lox