Marsha AI Language
Describe Logic ⴲ Provide Examples ⴲ Run Reliably
Marsha is an LLM-based programming language. Describe what you want done with a simple syntax, provide examples of usage, and the Marsha compiler will guide an LLM to produce tested Python software.
Usage
The Marsha compiler is installed as a pip module and can be run from a terminal or a Jupyter Notebook:
pip install git+https://github.com/alantech/marsha
python -m marsha data_mangling.mrsh
Syntax
The Marsha syntax looks a lot like markdown and is a mixture of English and mathematical notation. It has its own file format, .mrsh, that houses function definition(s). The syntax is subject to change as Marsha is currently in an alpha state. If you have a legitimate use case for Marsha, please let us know.
Data Types
Data types provide function type safety which helps improve the accuracy of the code generation. The data type format is almost identical to the CSV format.
# type EmployeeSkills
name, skill
Bob, math
Jake, spreadsheets
Lisa, coding
Sue, spreadsheets
Marsha can also infer the data type from a CSV file:
# type EmployeesByDepartment employees_by_department.csv
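To illustrate why a CSV-like block carries enough information to act as a type, here is a minimal sketch that turns the inline EmployeeSkills block above into a Python dataclass plus example rows. It is not Marsha's actual implementation: the helper name parse_type_block is hypothetical, every column is assumed to be a string, and only the inline form (not the CSV-file form) is handled.

```python
import csv
import io
from dataclasses import make_dataclass


def parse_type_block(text):
    """Hypothetical helper: turn an inline '# type Name' block into (class, rows)."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # The first line is the declaration, e.g. '# type EmployeeSkills'.
    header = lines[0].split()
    if header[:2] != ["#", "type"] or len(header) != 3:
        raise ValueError("expected an inline '# type Name' declaration")
    if len(lines) < 2:
        raise ValueError("expected a CSV header row after the declaration")
    # The remaining lines are CSV: a header row followed by example rows.
    reader = csv.reader(io.StringIO("\n".join(lines[1:])), skipinitialspace=True)
    column_names = [name.strip() for name in next(reader)]
    # Assumption: every column is treated as a plain string.
    cls = make_dataclass(header[2], [(name, str) for name in column_names])
    rows = [cls(*[cell.strip() for cell in row]) for row in reader]
    return cls, rows


EXAMPLE_BLOCK = """# type EmployeeSkills
name, skill
Bob, math
Jake, spreadsheets
Lisa, coding
Sue, spreadsheets
"""

EmployeeSkills, example_rows = parse_type_block(EXAMPLE_BLOCK)
```

The header row becomes the field names and each later row becomes one example instance, which is the kind of structure an LLM prompt or a generated test could then rely on.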
Functions
Functions are the bread and butter of Marsha and can easily define transformations between different data types. There are three sections to a Marsha function: the declaration, the description, and the examples.
The declaration is a Markdown heading section prefixed with func, followed by a name, parentheses containing the input type(s), and finally a colon followed by the output type. The name must be a single word, but the types don't need to be classic software types, or even the explicit data types defined above. They can themselves be simple descriptions of what the type is meant to be. E.g.,
# func get_employee_skills(list of EmployeesByDepartment, list of DepartmentSkills): list of EmployeeSkills
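The three parts of a declaration can be pulled apart mechanically. The sketch below is an assumption about the shape of the heading, not the parser Marsha ships; the names DECLARATION and parse_declaration are hypothetical, and splitting the inputs on commas is deliberately naive.

```python
import re

# Assumed shape: '# func name(input types): output type'
DECLARATION = re.compile(
    r"^#\s*func\s+(?P<name>\w+)\((?P<inputs>.*)\)\s*:\s*(?P<output>.+)$"
)


def parse_declaration(line):
    """Hypothetical helper: split a func heading into name, inputs, and output."""
    match = DECLARATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a func declaration: {line!r}")
    # Naive split: a free-form type description containing a comma would break this.
    inputs = [part.strip() for part in match["inputs"].split(",") if part.strip()]
    return match["name"], inputs, match["output"].strip()


name, inputs, output = parse_declaration(
    "# func get_employee_skills(list of EmployeesByDepartment, "
    "list of DepartmentSkills): list of EmployeeSkills"
)
```

Because the name must be a single word, `\w+` is enough to capture it, while the input and output types are kept as free text since they may be plain descriptions rather than declared types.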
The next section is the description of the function. Here you explain what the function should do. Being more explicit here will reduce variability in the generated output and improve reliability in behavior, but it's up to you just how explicit you will be and how much you leave to the LLM to figure out. This is similar to declarative languages like SQL and HTML where there are defaults for things you do not specify, like the sort order of select statements or the default styling of a <div>. E.g.,
This function receives a list of EmployeesByDepartment and a list of DepartmentSkills. The function should be able to create a response of EmployeeSkills merging the 2 list by department. Use the pandas library.
The final section is the example section. Here you provide examples of calling the function and what its output should be. Marsha uses these to give the LLM more information about the logic you want, and also to generate a test suite that validates that the generated code actually does what you want. This feedback loop makes Marsha more reliable than using the LLM directly. In some ways this is similar to constraint-based programming languages, where you validate and verify the behavior of your function in the definition of the function itself. It is less stringent than those, however: Marsha allows incomplete constraints where a constraint-based language would fail to compile in the face of that ambiguity. E.g.,
* get_employee_skills() = throws an error
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')]) = throws an error
* get_employee_skills([], []) = []
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')], []) = []
* get_employee_skills([], [DepartmentSkills('Accounting', 'math')]) = []
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')], [DepartmentSkills('Accounting', 'math')]) = [EmployeeSkills('Joe', 'math')]
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting'), EmployeesByDepartment('Jake', 'Engineering')], [DepartmentSkills('Accounting', 'math')]) = [EmployeeSkills('Joe', 'math')]
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting'), EmployeesByDepartment('Jake', 'Engineering')], [DepartmentSkills('Accounting', 'math'), DepartmentSkills('Engineering', 'coding')]) = [EmployeeSkills('Joe', 'math'), EmployeeSkills('Jake', 'coding')]
Altogether this produces:
# func get_employee_skills(list of EmployeesByDepartment, list of DepartmentSkills): list of EmployeeSkills
This function receives a list of EmployeesByDepartment and a list of DepartmentSkills. The function should be able to create a response of EmployeeSkills merging the 2 list by department. Use the pandas library.
* get_employee_skills() = throws an error
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')]) = throws an error
* get_employee_skills([], []) = []
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')], []) = []
* get_employee_skills([], [DepartmentSkills('Accounting', 'math')]) = []
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting')], [DepartmentSkills('Accounting', 'math')]) = [EmployeeSkills('Joe', 'math')]
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting'), EmployeesByDepartment('Jake', 'Engineering')], [DepartmentSkills('Accounting', 'math')]) = [EmployeeSkills('Joe', 'math')]
* get_employee_skills([EmployeesByDepartment('Joe', 'Accounting'), EmployeesByDepartment('Jake', 'Engineering')], [DepartmentSkills('Accounting', 'math'), DepartmentSkills('Engineering', 'coding')]) = [EmployeeSkills('Joe', 'math'), EmployeeSkills('Jake', 'coding')]
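For a sense of how the examples double as tests, here is a hand-written Python version of the function together with the eight examples expressed as checks. This is an illustration, not actual Marsha output: real output varies between runs, the field names (name, department, skill) are assumptions since the source does not list the columns of EmployeesByDepartment or DepartmentSkills, and plain Python is used instead of pandas to keep the sketch self-contained.

```python
from dataclasses import dataclass


@dataclass
class EmployeesByDepartment:
    name: str
    department: str  # assumed field name


@dataclass
class DepartmentSkills:
    department: str  # assumed field name
    skill: str


@dataclass
class EmployeeSkills:
    name: str
    skill: str


def get_employee_skills(employees, department_skills):
    """Join employees to skills on department, keeping employee order."""
    result = []
    for employee in employees:
        for entry in department_skills:
            if entry.department == employee.department:
                result.append(EmployeeSkills(employee.name, entry.skill))
    return result


def run_examples():
    """Each bullet in the examples section becomes one check."""
    joe = EmployeesByDepartment("Joe", "Accounting")
    jake = EmployeesByDepartment("Jake", "Engineering")
    math = DepartmentSkills("Accounting", "math")
    coding = DepartmentSkills("Engineering", "coding")

    # 'throws an error': in Python, missing arguments raise TypeError.
    for bad_args in [(), ([joe],)]:
        try:
            get_employee_skills(*bad_args)
        except TypeError:
            pass
        else:
            raise AssertionError("expected an error")

    assert get_employee_skills([], []) == []
    assert get_employee_skills([joe], []) == []
    assert get_employee_skills([], [math]) == []
    assert get_employee_skills([joe], [math]) == [EmployeeSkills("Joe", "math")]
    assert get_employee_skills([joe, jake], [math]) == [EmployeeSkills("Joe", "math")]
    assert get_employee_skills([joe, jake], [math, coding]) == [
        EmployeeSkills("Joe", "math"),
        EmployeeSkills("Jake", "coding"),
    ]
    return True
```

Note how the seventh example pins down a choice the description leaves open: an employee whose department has no skills is dropped rather than reported with an empty skill.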
Goals
The Marsha syntax is meant to:
- be minimal and "obvious", while discouraging lax or incomplete information that could lead to unpredictable behavior
- be mechanically parseable, for syntax highlighting and quick feedback to the user on correctness issues
- make it easy to define examples, both to reduce the probability of generating faulty code and to allow generating tests that the application code can be tested against
Compiler
Marsha is compiled by an LLM into tested software that meets the requirements described, but implementation details can vary greatly across runs, much as if different developers had implemented it for you; there is typically more than one way to write software that fulfills a set of requirements. The compiler is also best-effort, and sometimes it will fail to generate the described program. We aim for 80%+ accuracy on our examples. In general, the more detailed the description and the more examples provided, the more likely the output will work.
In order to use the compiler, the following environment variables must be set:
OPENAI_ORG
OPENAI_SECRET_KEY
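If you drive the compiler from a script or notebook, it can help to check these variables up front rather than discover a missing one partway through a run. A minimal sketch; the helper name missing_marsha_env is hypothetical and not part of Marsha:

```python
import os

# The two variable names come from the list above.
REQUIRED_VARS = ("OPENAI_ORG", "OPENAI_SECRET_KEY")


def missing_marsha_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling missing_marsha_env() with no arguments inspects the current process environment; an empty list means both variables are present.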
Support for other LLMs, including running something locally, is planned but not yet implemented.
There are also a few flags on how to use Marsha:
$ marsha --help
usage: marsha [-h] [-d] [-q] [-a ATTEMPTS] [-n N_PARALLEL_EXECUTIONS] [--exclude-main-helper] [-s] source
Marsha AI Compiler
positional arguments:
  source
options:
  -h, --help            show this help message and exit
  -d, --debug           Turn on debug logging
  -q, --quick-and-dirty
                        Code generation with no correction stages run
  -a ATTEMPTS, --attempts ATTEMPTS
  -n N_PARALLEL_EXECUTIONS, --n-parallel-executions N_PARALLEL_EXECUTIONS
  --exclude-main-helper
                        Skips addition of helper code for running as a script
  -s, --stats           Save stats and write them to a file
-d: Adds a significant amount of debug information to the screen. Probably not useful if you're not working on Marsha itself.
-q: Runs only the initial code generation phase without any of the corrective feedback stages. This is significantly cheaper, but more likely to generate code that doesn't quite work. This could be useful if you're using Marsha like GitHub Copilot or directly asking ChatGPT for code, with the Marsha syntax providing more structure, and so a better result, than you might get from a blank screen to write into.
-a: The number of times Marsha should attempt to compile your program, defaulting to just once. If set to more than 1, it will try again on a failure. For some trickier programs this might improve the chance of getting working code, at the cost of more LLM calls.
-n: The number of parallel LLM threads of "thought" to pursue per attempt. This defaults to 3. When a path succeeds, all of the other paths are cancelled.
-s: Saves the stats that are printed by default to a file instead. Probably not useful if you're not working on Marsha itself.
--exclude-main-helper: Turns off the automatically generated code that makes using your compiled Marsha code from the CLI easier, which is included by default.
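When calling the compiler from Python rather than a shell, the flags can be assembled programmatically. The sketch below uses only flags that appear in the help output; the helper name build_marsha_command is hypothetical, and it assumes the module form of the command accepts the same flags as the marsha entry point.

```python
import sys


def build_marsha_command(source, attempts=1, parallel=3, quick=False, stats=False):
    """Assemble an argv list for the compiler from the documented flags."""
    command = [sys.executable, "-m", "marsha"]
    if quick:
        command.append("-q")  # skip the corrective feedback stages
    if stats:
        command.append("-s")  # write stats to a file
    # Defaults mirror the documented ones: one attempt, three parallel paths.
    command += ["-a", str(attempts), "-n", str(parallel), source]
    return command


command = build_marsha_command("data_mangling.mrsh", attempts=2)
```

Passing the result to subprocess.run(command, check=True) would start a compile, which requires Marsha to be installed and the OpenAI variables to be set, and which makes paid LLM calls.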
Using compiled Marsha code
By default, Marsha appends logic to the generated Python code to make usage simpler, allowing you to invoke it from the CLI and potentially start a REST server.
$ python -m duckduckgo --help
usage: duckduckgo.py [-h] [-c {BeautifulSoup,duckduckgo}] [-j] [-t] [-i] [-f INFILE] [-o OUTFILE] [-s SERVE] [params ...]
Marsha-generated CLI options
positional arguments:
  params                Arguments to be provided to the function being run. Optimistically converted to simple python types by default, and left as strings if not possible
options:
  -h, --help            show this help message and exit
  -c {BeautifulSoup,duckduckgo}, --func {BeautifulSoup,duckduckgo}
                        Specifies the function to call. Defaults to the last defined function
  -j, --force-json      Forces arguments, files, or stdin to be parsed as JSON
  -t, --force-text      Forces arguments, files, or stdin to be parsed as raw text
  -i, --stdin           Ignores CLI parameters in favor of stdin (as a single parameter)
  -f INFILE, --infile INFILE
                        Ignores CLI parameters in favor of reading the specified file (as a single parameter)
  -o OUTFILE, --outfile OUTFILE
                        Saves the result to a file instead of stdout
  -s SERVE, --serve SERVE
                        Spins up a simple REST web server on the specified port. When used all other options are ignored
-c: Lets you choose which function within the generated code you wish to invoke. By default it selects the last function defined, as that is usually a "main-like" function.
params: All non-option arguments provided, in order, to the function you are invoking.
-j and -t: Let you choose whether the param(s) provided will be parsed as JSON or kept as plain text. By default it will opportunistically parse the arguments, but if parsing fails it will keep them as text.
-i, -f, and -o: Let you choose how input and output are managed. By default the inputs are the params arguments and the output goes to stdout, but you can use -i to ignore all params and treat stdin as the single input param for your function. Similarly, -f does the same for the file you specify, and -o writes the result to a file you specify instead of to stdout.
-s: A flag to instead run a simple REST server. Using this flag causes all other flags to be ignored. The various function names become /func_name endpoints that you can POST to and get a response body back. If you set the Content-Type header to application/json, the input and output will be JSON; otherwise they will be plain text. If your function takes multiple arguments, it must be called in JSON mode with each argument being an element of a top-level array.
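A client for the REST mode might look like the following sketch, which uses only the Python standard library. The host, the port 8080, and the function name my_func are placeholders, and the sketch covers the multiple-argument JSON case described above; how a single argument should be encoded is not stated here, so that case is left out.

```python
import json
import urllib.request


def build_call(func_name, args, port=8080, host="localhost"):
    """Build a JSON POST for /func_name; args become a top-level array."""
    body = json.dumps(list(args)).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/{func_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_function(func_name, args, port=8080):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_call(func_name, args, port)) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical two-argument function served with: python -m <module> -s 8080
request = build_call("my_func", ["first argument", 2])
```

Only build_call runs here; call_function needs a server started with -s to be listening on the chosen port.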
Roadmap
- Improve average accuracy for our test bed above 90%
- Support for visualizations and data storage (geek mode: handle side-effect logic better in general)
- Syntax highlighting (vim, vscode, etc)
- Support for different types of LLM
- Bootstrap the Marsha compiler with a Marsha program
- More target languages other than Python
- A module system
- Edits to Marsha mutating existing Python code instead of regenerating
- "Decompiler" from source code into Marsha syntax
- "Debugger" meta mode to take an existing Marsha definition and an example of an unexpected failure and recommend what to update in the Marsha definition
- Optimization "levels" (spend more time on more iterations with the LLM improving performance, security, etc)
- Marsha GUI mode: visual editor baked into the compiler (eventually with the decompiler/debugger/etc features), and able to generate a GUI wrapper for generated code, enabling end-to-end non-terminal usage
- Better support for a mixed environment (Marsha functions can be used by Python, but how to get Marsha to use hand-written Python functions)
- Better "web scraping" behavior (LLM likes to assume the internet still looks like it did in November 2021, but HTML structure has often changed for the largest websites; automatically correcting that assumption would be nice)