

Repository Details

Semantic analyzer library for compilers, written in Rust, that performs semantic analysis of a programming language's AST


mrLSD/semantic-analyzer-rs

Semantic analyzer is an open-source semantic analyzer for programming languages that makes it easy to build your own efficient compilers.

🌀 What is the library for and what tasks does it solve

Creating a compiler for a programming language is a process that involves several key stages. Most commonly, these are:

โ–ถ๏ธ Lexical Analysis (Lexer): This stage involves breaking down the input stream of characters into a series of tokens. Tokens are the atomic elements of the programming language, such as identifiers, keywords, operators, etc.

โ–ถ๏ธ Syntax Analysis (Parsing): At this stage, the tokens obtained in the previous stage are grouped according to the grammar rules of the programming language. The result of this process is an Abstract Syntax Tree (AST), which represents a hierarchical structure of the code.

โฉ Semantic Analysis: This stage involves checking the semantic correctness of the code. This can include type checking, scope verification of variables, etc.

โ–ถ๏ธ Intermediate Code Optimization: At this stage, the compiler tries to improve the intermediate representation of the code to make it more efficient. This can include dead code elimination, expression simplification, etc.

โ–ถ๏ธ Code Generation: This is the final stage where the compiler transforms the optimized intermediate representation (IR) into machine code specific to the target architecture.

This library implements the Semantic Analysis stage.

🌻 Features

✅ Name Binding and Scope Checking: The analyzer verifies that all variables, constants, and functions are declared before they're used, and that they're used within their scope. It also checks for name collisions, where variables, constants, functions, or types in the same scope have the same name.

✅ Checking Function Calls: The analyzer verifies that functions are called with the correct number of parameters and that the types of the arguments match the types expected by the function.

✅ Scope Rules: The analyzer checks that variables, functions, constants, and types are used within their scope and are available in the visibility scope.

✅ Type Checking: The analyzer checks that operations are performed on compatible types in expressions, functions, constants, and bindings. For operations in expressions, it verifies that the types of the expressions are consistent with their usage in context.

✅ Flow Control Checking: The analyzer checks that the control flow statements (if-else, loop, return, break, continue) are used correctly. Condition expressions are supported and checked for correctness.

✅ Building the Symbol Table: The analyzer uses a symbol table, a data structure that keeps track of symbols (variables, functions, constants) in the source code. Each entry in the symbol table contains the symbol's name, type, scope (related to the block state), and other relevant information.
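
To make the name binding, scope, and symbol table checks more concrete, here is a minimal sketch of a scoped symbol table. It is not the library's API: `SymbolTable`, `Symbol`, `SymbolKind`, and `SemanticError` are hypothetical names invented for this illustration, and the real implementation may be organized differently.

```rust
use std::collections::HashMap;

/// Kind of a declared symbol (hypothetical, for illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum SymbolKind {
    Variable,
    Constant,
    Function,
}

/// One symbol table entry: the data stored for a declared name.
#[derive(Debug, Clone, PartialEq)]
struct Symbol {
    kind: SymbolKind,
    type_name: String,
}

#[derive(Debug, PartialEq)]
enum SemanticError {
    AlreadyDeclared(String),
    NotDeclared(String),
}

/// A stack of scopes; the last element is the innermost scope.
struct SymbolTable {
    scopes: Vec<HashMap<String, Symbol>>,
}

impl SymbolTable {
    /// Starts with a single global scope.
    fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    fn enter_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    fn exit_scope(&mut self) {
        // The global scope is never removed.
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Declares a symbol in the innermost scope.
    /// A second declaration of the same name in that scope is a collision.
    fn declare(&mut self, name: &str, symbol: Symbol) -> Result<(), SemanticError> {
        let scope = self.scopes.last_mut().expect("the global scope always exists");
        if scope.contains_key(name) {
            return Err(SemanticError::AlreadyDeclared(name.to_string()));
        }
        scope.insert(name.to_string(), symbol);
        Ok(())
    }

    /// Resolves a name by searching from the innermost scope outwards.
    fn resolve(&self, name: &str) -> Result<&Symbol, SemanticError> {
        self.scopes
            .iter()
            .rev()
            .find_map(|scope| scope.get(name))
            .ok_or_else(|| SemanticError::NotDeclared(name.to_string()))
    }
}
```

In this sketch, a name declared in an inner scope stops resolving once `exit_scope` is called, while names from enclosing scopes stay visible, which is the "declared before use, used within scope" rule in its simplest form.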

🌳 Semantic State Tree

The result of running all stages of the semantic analyzer is the Semantic State Tree.

It can be used for intermediate code generation, further optimization passes over the semantic tree, linting, and backend codegen (such as LLVM) for the target machine.

🌲 Structure of Semantic State Tree

  • Block state and its related child block state branches. It is the basic entity for scopes: variables and blocks (function, if, loop). It is especially meaningful for expressions. It lets you separate visibility scopes and their limits at a fine-grained level. In particular, all child elements can access parent elements. However, parent elements cannot access child elements, which effectively limits the visibility scope and entity usage.

    • Variables state: a block state entity that contains the properties of a variable in the current state, such as name, type, mutability, allocation, and mallocation.

    • Inner variables state: a block state entity that contains the inner names of variables. It is useful for the Intermediate Representation in codegen backends like LLVM, where variables with shadowed names must have different inner names. This means inner variable names are always unique.

    • Labels state: a block state entity that contains all information about control flow labels.

  • Global state: contains the global state of constants, declared functions, and types.

  • State entity: contains:

    • Global State
    • Error results
    • Semantic tree results

All of this is source data that can be used for the Intermediate Representation, for subsequent optimizations, and for compiler codegen.
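
The block state rules described above (children can see parent variables, parents cannot see child variables, and shadowed variables get unique inner names) can be sketched as follows. This is an illustration under stated assumptions, not the library's data model: `BlockTree`, `Block`, `Variable`, and the `name.N` inner naming scheme are hypothetical.

```rust
use std::collections::HashMap;

/// Properties of a variable inside a block (a reduced, hypothetical set).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Variable {
    inner_name: String,
    type_name: String,
    mutable: bool,
}

/// One block state: its own variables plus a link to the parent block.
struct Block {
    parent: Option<usize>,
    variables: HashMap<String, Variable>,
}

/// Arena of block states; blocks refer to each other by index.
struct BlockTree {
    blocks: Vec<Block>,
    /// How many times each source name was declared so far,
    /// used to build unique inner names.
    name_counters: HashMap<String, usize>,
}

impl BlockTree {
    fn new() -> Self {
        Self { blocks: Vec::new(), name_counters: HashMap::new() }
    }

    /// Adds a block (root if `parent` is `None`) and returns its index.
    fn add_block(&mut self, parent: Option<usize>) -> usize {
        self.blocks.push(Block { parent, variables: HashMap::new() });
        self.blocks.len() - 1
    }

    /// Declares (or shadows) a variable in `block` and returns its unique inner name.
    fn declare(&mut self, block: usize, name: &str, type_name: &str, mutable: bool) -> String {
        let counter = self.name_counters.entry(name.to_string()).or_insert(0);
        let inner_name = format!("{}.{}", name, *counter);
        *counter += 1;
        let variable = Variable {
            inner_name: inner_name.clone(),
            type_name: type_name.to_string(),
            mutable,
        };
        self.blocks[block].variables.insert(name.to_string(), variable);
        inner_name
    }

    /// Looks a name up in `block`, then in its parents; never in its children.
    fn lookup(&self, mut block: usize, name: &str) -> Option<&Variable> {
        loop {
            if let Some(variable) = self.blocks[block].variables.get(name) {
                return Some(variable);
            }
            // Stop with `None` once the root block has been searched.
            block = self.blocks[block].parent?;
        }
    }
}
```

An index-based arena is used here only to keep the sketch free of reference counting; any tree representation with a parent link would express the same visibility rule.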

🧺 Subset of programming languages

The input parameter for the analyzer is a predefined AST (abstract syntax tree). The only dependency, used for building the AST, is nom_locate, which provides all the necessary information about the source code for further semantic analysis and for generating relevant and informative error messages. For now, the AST is a fixed structure because it is a fundamental element that defines the lexical representation of a programming language.

On the other hand, it allows you to implement any subset of a programming language that matches the syntax tree. It also implies a subset of lexical representations from which an AST can be generated that meets the initial requirements of the semantic analyzer. For lexical analysis and source code parsing, the recommended library is nom, a parser combinators library.

The AST represents a Turing-complete programming language and contains all the necessary elements for this.
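
To illustrate how a fixed AST serves as the analyzer's input, the sketch below runs a type check and a function call check over a deliberately tiny AST. The `Expr`, `Type`, `Signature`, `TypeError`, and `check` names are hypothetical and far smaller than the library's actual AST; they only show the shape of the technique.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    I32,
    Bool,
}

/// A tiny, hypothetical expression AST (not the library's AST).
#[allow(dead_code)]
enum Expr {
    Int(i32),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

/// Declared function signature: parameter types and return type.
struct Signature {
    params: Vec<Type>,
    ret: Type,
}

#[derive(Debug, PartialEq)]
enum TypeError {
    Mismatch { expected: Type, found: Type },
    UnknownFunction(String),
    WrongArgCount { expected: usize, found: usize },
}

/// Returns the type of `expr`, or the first semantic error found in it.
fn check(expr: &Expr, functions: &HashMap<String, Signature>) -> Result<Type, TypeError> {
    match expr {
        Expr::Int(_) => Ok(Type::I32),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Add(lhs, rhs) => {
            // Both operands of `+` must be integers in this sketch.
            for side in [lhs, rhs] {
                let found = check(side, functions)?;
                if found != Type::I32 {
                    return Err(TypeError::Mismatch { expected: Type::I32, found });
                }
            }
            Ok(Type::I32)
        }
        Expr::Call { name, args } => {
            // The function must be declared before it is called.
            let signature = functions
                .get(name)
                .ok_or_else(|| TypeError::UnknownFunction(name.clone()))?;
            // The number of arguments must match the number of parameters.
            if signature.params.len() != args.len() {
                return Err(TypeError::WrongArgCount {
                    expected: signature.params.len(),
                    found: args.len(),
                });
            }
            // Each argument type must match the corresponding parameter type.
            for (arg, expected) in args.iter().zip(&signature.params) {
                let found = check(arg, functions)?;
                if found != *expected {
                    return Err(TypeError::Mismatch { expected: *expected, found });
                }
            }
            Ok(signature.ret)
        }
    }
}
```

Any front end (for example, a parser written with nom) that can produce such a tree can reuse the same checks, which is the practical benefit of keeping the AST fixed.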

๐Ÿ›‹๏ธ Examples

  • 🔎 There is an example implementation in a separate project, 💾 Toy Codegen. The project takes the SemanticStack results and converts them into code generation logic, which clearly shows how the semantic-analyzer-rs SemanticStackContext results can be used. LLVM is used as the backend and inkwell as the library for LLVM codegen, and the output is compiled into an executable program. The source of data is the AST structure itself.

MIT LICENSE
