• This repository has been archived on 06/Jul/2023
  • Stars
    star
    146
  • Rank 244,525 (Top 5 %)
  • Language
    C++
  • Created over 1 year ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

S6: A standalone JIT compiler library for CPython

S6 was a project that started within DeepMind in 2019 to accelerate CPython with just-in-time compilation (“JIT”). These features would be provided as a regular Python library, and no changes to the CPython interpreter itself would be necessary. S6 was aiming to do for Python what V8 did for Javascript (the name is an homage to V8). Work was based on CPython verison 3.7. Depending on the workload, we saw speedups as high as 9.5x on common benchmarks.

import s6

@s6.jit
def foo(x):
  return x + 1

Project goals

Python is slow, and a lot of researchers use Python as their primary interface to build models. This can create friction for researchers, and speeding up the model building process can improve iteration and development time. The network effect of Python has made moving to a different language very difficult. So with Python here to stay, we looked to improve upon it.

The requirements we set ourselves were:

  • Increase the speed of common programs x-fold: not 10%, but 10x.
  • Be fully compatible with CPython. Most tools, libraries, and infrastructure that researchers use only work with CPython.
  • Not require any code changes. Users should be able to import S6, wrap their top level function in a JIT annotation, and that’s it.

Status and support

We have stopped working on S6 internally. As such, this repository has been archived and we are not accepting pull requests or issues. We open-sourced the code and provided a design overview below to spur conversations within the Python community and inspire future work on improving Python.

Setting it up locally

We set up a Docker container and some Jupyter notebooks to allow you to build and experiment with S6. To do so, follow these steps.

Set up the S6 Docker container:

  1. Install docker.

  2. Build the S6 docker container:

    docker build -t deepmind-s6:latest .

You can run the docker container in two different ways:

  1. By running a notebook server:

    docker run -p 8888:8888 deepmind-s6:latest jupyter lab --port 8888 --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.token=SECRET_TOKEN

    Then open localhost:8888/lab?token=SECRET_TOKEN in a web browser.

    Demo notebooks are found in the notebooks directory.

  2. By running an interactive shell session:

    docker run -it deepmind-s6:latest /bin/bash

    The test suite for S6 can be run from within a Docker shell session with:

    (cd /open-s6/s6/build; ninja test)

Benchmark results

Overall, S6 provides a speed of up to 10x over CPython for pure Python benchmarks. Workloads that heavily rely on C APIs, like those that use NumPy, are bottlenecked by the C API calltime. Future work could include targeted optimisations in the C code to improve this; for example, use the profiling metadata to skip all the type checks normally done by C APIs.

Most testing during development was against internal benchmarks, or private variants of external benchmarks. Here we report results on three public pyperformance benchmarks; their docstrings detail minor modifications.

Richards

Richards is a common benchmark that best represents pure Python code that researchers write. For the Richards benchmark located under the benchmarks directory, if you run with 100 iterations, depending on your hardware, you should see a 7x speedup over the default Python 3.7 interpreter.

  • On our internal dev machines, we saw speedups of 9.5x on average.

Raytrace

Raytrace is a pure-Python mathematical benchmark.

  • On our internal dev machines, we saw speedups of 3-4.5x on average.

Unpack Sequence

Unpack sequence is the worst performing benchmark we ran on S6. The cause is that S6 has its own ceval.c interpreter that is not as well optimized as the standard CPython one. We had to write a separate ceval.c-like interpreter in order to add hooks for S6 to compile functions and collect type feedback. With the standard unpack sequence benchmark you can see a 5x slowdown as the main do_unpacking function is called once and with that one call, we use our own, slower interpreter.

Even then, UNPACK_SEQUENCE is a difficult bytecode to handle from a type checking perspective. Even if the input to the function is an array/tuple, an array/tuple can contain any different type and in different orderings. So we need to type check every element to make sure that the JIT'd function is being passed the correct input types. Because of all this type checking, if a function either unpacks a giant array/tuple or does a lot of unpacking (like the benchmark), our compiled code can end up being slower than interpreted CPython. As such, if we detect there are many or a large unpack, we avoid compiling the function and bail out to the optimized ceval.c interpreter in CPython everytime we come back to this function.

With the issues of UNPACK_SEQUENCE in mind, we included two version of the unpack sequence benchmark. The super slow one (0.2x speedup) is where the for-loop is in do_unpacking so we never bail out to ceval.c and then a faster one (0.97x speedup) where we took the for-loop out of do_unpacking and instead call do_unpacking loops number of times.

Known limitations

S6 exists as a standalone dependency and does not require any code changes in CPython. However, it was developed for an environment with two unusual features:

  1. Native code needs to store a frame pointer, which typically requires the -fno-omit-frame-pointer compiler flag when building CPython and native extensions. Without this change, jit compilation might crash in some cases.
  2. In the CPython interpreter, the function PyFrame_FastToLocalsWithError needs to be intercepted by S6. We used to do this by linking Python and Python extensions with the linker flag --wrap=PyFrame_FastToLocalsWithError, but this is not part of the current release, and jit compilation might misbehave in some cases.

As discussed in the unpack_sequence benchmark section, S6 does not run well on giant functions. To mitigate this, break up your function into smaller pieces.

S6 only supports CPython 3.7, but with some changes it could be made to support later CPython versions.

There is further work to be done optimising S6's strongjit. Further performance improvements would be expected from additional compiler optimisations, such as inlining.

Design

S6 replaces the existing CPython interpreter/bytecode handler ceval.c with our own interpreter.cc. For every PyCodeObject, S6 does the following:

  • Check to see if the function has already been compiled; if the code has, S6 looks up and runs the compiled code.

  • If the code is not compiled, S6 runs the S6 bytecode interpreter, which is very similar to the CPython interpreter.

Similarly to V8, we maintain a counter that counts how many interpreter instructions have been executed. When this reaches a threshold, the PyCodeObject at the top of the call stack is considered for compilation.

To determine what code to compile, we use the Oracle which asynchronously compiles the function after a certain number of bytecode instructions and function visits. When compiling code, S6 translates it to strongjit: an intermediate representation suitable for various optimization passes. CPython bytecode implementations can be complex, and translation may create many strongjit instructions per Python bytecode.

Given type feedback, we may choose to make assumptions about inputs to a function in order to generate optimisation opportunities (for instance that an input is always an integer). We refer to this as function specialization. Guards are put in place to check that these assumptions hold. If they do not, then the function must be de-optimized and the bytecode is interpreted as before. Because checking the type of all arguments at every function call may be costly, and to avoid producing huge numbers of specializations, we only specialize on the common case of the type of the first argument (usually self).

When deoptimizing, we convert the optimized function state into CPython state (PyFrameObject) and continue with the normal interpreter. However in many cases we are not at a bytecode boundary when we deoptimize. In this case we interpret the strongjit IR until we get to a boundary point. After writing and optimizing the strongjit IR, S6 lowers it to x86 machine code using the asmjit library.

Hidden classes

In its initial design, S6 used inline caching keyed on an object’s Map. A Map comes from prototype-based languages (introduced in SELF). It is also termed a Shape (in V8). It attempts to describe the location of attributes within an object. All objects with the same Map have the same attributes in the same locations. An inline cache memoizes the last result of an operation based on the Map of the input.

Frequently we used the Map as a cache key, as any modification to the behavior of an object should result in a Map transition (Maps are immutable).

For example, if we have the following in Python:

c.x

The pseudo code of lookup would look like this:

if map(c) == M:
  return c.__attributes__[0]
else:
  return __getattr__(c, "x")

Or for example, as shown in the diagram below, to get the x attribute we:

  1. Obtain the tp_dictptr field from the object’s type.
  2. Use the tp_dictptr to get the __dict__ field’s address from the object.
  3. Follow the __dict__ to get the PyDictObject itself.
  4. Look at the values array. The Map from step 2 tells you how to map from key to value.

(below, the red arrows are direct pointers, while purple arrows are indirect using the offset int value)

Attribute access diagram.

However, there are at least three issues with this approach:

In almost all instances a cache keyed on a Map is insufficient, as Python’s type system allows modifications to an object’s Type to modify attribute lookup on the object, even if the object has overridden a given attribute already. The example of this is data descriptors. A modification to a Type potentially invalidates all associated Maps. A Map could be specialized on the type version of its type, but this loses all the benefits of structural typing. So when we should have a monomorphic cache lookup since two types look the same (duck typing), we still require a polymorphic lookup.

class C(object):
  def __init__(self):
    self.x = 2
class D(object):
  def __init__(self):
    self.x = 3
def f(a : Union[C or D]):
  return a.x  # Structurally monomorphic, nominally polymorphic.

Inline caches reference mutable data. This can be a problem when dealing with functions. Holding borrowed references to function objects is unsafe, and acquiring references to function objects can be costly.

def f(l, a):
  def g(x): ...
  for x in l:
    a += g(x) # inline cache holds reference to "g". Unsafe if borrowed, costly
              # if increffed every time, and changes the lifetime of any object
              # captured by `g`.
  return a

We needed to add an extra field to PyObject to store the map pointer, but this would have broken ABI guarantees and made the possibility of upstreaming S6 into CPython very challenging.

So instead we implemented hidden classes. Hidden classes are a parallel type system to Python’s types. They are a subcategorization of types - objects of a unique type may have different hidden classes, but objects of a unique hidden class always have the same type. In this way they are distinct from Maps or Shapes that provide uniqueness based only on structural typing.

Hidden classes provide:

  • Behavioral stability; if an object’s hidden class has not changed, its behavior has not changed.
  • A dense numbering; for efficient lookup in a global dispatch table.
  • A compile-time-inspectable enumeration of members and methods.
  • Type information about member variables and methods.
  • Type information is provided as a lattice in which information becomes weaker but never stronger.

Given that hidden classes contain a lattice of type information, they are not immutable, but they do change in well-defined and observable ways.

CPython never uses hidden classes. They are invisible to CPython and no type queries, subclass queries will never return different information based on hidden classes.

Obtaining a class from a PyObject

We must be able to store a mapping between PyObjects and hidden classes. The mapping must be incredibly fast.

One of the goals of S6 is to be import-compatible with existing CPython. Modifications to the PyObject class were not allowed, and as PyObject only contains two members (ob_refcnt and ob_type), there is nowhere to hide a class ID. Instead, we hide the hidden class ID inside PyDictObjects and PyTypeObjects.

We observe that there are two categories of object - objects that behave the same as a newly created object of their type, and those that behave differently.

class A(object): pass
a = A()
b = A()  # a and b behave the same, and are of the same type, so same class ID.
b.x = 2  # Now b behaves differently from A, they must have differing class IDs.

For objects that behave the same a a newly created object of the same type, we can store the class ID in the type object.

For objects that behave differently: in order to behave differently they must have a __dict__. This is the only means of changing behaviour. Therefore we can store the class ID in the PyDictObject used as __dict__.

Specifically: Class identifiers are dense integers <= 20 bits in size.

If the PyObject has a __dict__, the class identifier uses the uppermost 20 bits of the dict’s ma_version_tag field.

Otherwise the class identifier uses the uppermost 20 bits of PyTypeObject’s tp_flags field. This field is 64-bits long and is already zeroed.

Class identifier storage diagram.

This has implications:

  • We don’t support hidden classes for objects with custom attribute getters or setters, because modifications to an object’s attributes require a dict for us to store the class ID.
  • If an object’s dict is modified outside the visibility of S6, a new ma_version_tag will be set by CPython. This will have the upper 20 bits zeroed, setting the effective class ID to 0. This allowed us to notice if our assumptions have been broken behind our back. Note that wrap-around for the other 44 bits would take 5 years if dicts were modified at 100kHz.
  • We don’t support objects with tp_dictoffset < 0. This means the dict is offset from the end of the object in memory, which is a more complex calculation. This is rare and only really used for str or long subclasses.

The ability to store class IDs inside the PyTypeObject allows us to store a hidden class for all objects even if they do not have a dict. An example of this is builtin types, which are generally immutable. Objects that have custom attribute setters are still excluded however, because tracking whether an object still has the behaviors of its base type or not is impossible.

Behavioral changes

There are two classes of behavioral change: modifications to the type or a supertype and modifications to an object.

Type modification

Modification of a type potentially changes the behavior of all hidden classes indirectly derived from that type. Therefore, in order to use a hidden class with a type, we must ensure that we can detect all behavioral changes of the type and its supertypes.

This is achieved by wrapping the modifiable members of the type and all its supertypes (tp_setattro, __bases__, __class__) with a wrapper function that informs S6. If the type has a custom setattro, we do not optimize the type.

When a type is modified, the world is stopped (easy because of the GIL), compilations are halted, all affected classes are modified and if any attribute of a class was changed that generated code relied upon, the generated code is thrown away.

Object modification

If an attribute is added or deleted from an object, the object must take a class transition and have a new class. This is so that all attribute loads and stores can load and store without nullptr or out of bounds guards.

If an existing attribute is modified, then it is possible the behavior of an object changes. Examples include modifying a method. The method table for the object is then incorrect; the object must undergo a class transition. Attributes for which modification requires a class transition are marked as ‘behavioral’.

class C(object):
  def f(self): pass

a = C()  # Has hidden class H
a.x = 2  # New hidden class, H -> H2 because new attribute `x`.
a.x = 3  # Same class H2 - attribute merely changed.
a.f = 4  # New hidden class, H2 -> H3, because `f` was a member function before
         # and thus was marked "behavioral".
Cacheable attributes

Attributes that do not change value across all object instances of a class are termed ‘cacheable’. These attributes have a stable value and the code generator may emit code that relies upon this value being stable. Simple cases include a boolean attribute. An attribute-write to an attribute that is currently marked ‘cacheable’ to a different value simply removes the ‘cacheable’ flag and any code generated that relied on the attribute is thrown away.

Globals

Python code contains a large number of lookups to the __globals__ dict. Even a function call requires consulting the globals dict to obtain the PyFunctionObject for a symbol name.

We can consider this dict in the same way as we treat object __dict__ fields; we store a class ID in the globals dict, and can use all of the fast attribute tricks implemented for objects on global values.

In particular, the cacheable property described above implicitly allows us to optimize away code hidden behind global booleans that are never modified:

DEBUG_MODE = False  # Set this to True to enable debug statements.

def f(a):
  if DEBUG_MODE:  # This is a load of globals.DEBUG_MODE, which is known to be
    print(a)      # always False at runtime.
  return a + 1

Adoption of new types and objects

Adoption of a type or object is where S6 determines the correct hidden class and sets it. Adoption of an object involves ensuring that its type is adopted and setting its hidden class. This may involve inspecting the dictionary to see which attributes have already been set. Adoption of a type involves creating a new hidden class, setting the Py_TPFLAGS_S6_Adopted flag and overriding tp_new to a wrapper function that allows us to adopt all new instances of the object.

When S6 is initialized, we eagerly scan the type hierarchy to adopt all types. We override PyTypeObject::tp_new to a wrapper that lets us detect and adopt newly created types.

This should allow us to catch all new instances of all objects except some extension objects (Extension types that call Py_Ready() for the first time after we have greedily scanned the type hierarchy). These will be small in number and we can add mitigations if needed (like adopting in tactical places like GetAttr).

Oracle

The Oracle class determines which functions are worth compiling and manages the compilation queue. For development/debugging, you can configure whether functions should be compiled asynchronously/synchronously, how often functions are profiled and compiled, and more.

Compilation in the Oracle is a two step process. We don’t immediately compile the function we first step into. Every profile_bytecode_instructions_interval instructions, the current PyCodeObject is inspected and hotness_threshold decremented; if hotness_threshold reaches zero, the object will be optimized. This two-stage process reduces the frequency of accessing code metadata.

Arithmetic Unboxing

Since bools/floats/longs in Python are all objects, they are boxed types. Doing arithmetic operations on these boxed types can be very expensive compared to a single assembly instruction on values in registers. And since we collect type information, we can unbox the values from the CPython boxed types to do much faster arithmetic operations. Refer to UnboxInst and BoxInst for more information.

API

@s6.jit

Decorator to mark the function as JITable to S6. When the callable is invoked, just-in-time compilation mode is enabled if it was not enabled already. All functions called transitively by the decorated function are considered for just-in-time compilation, but only one function is compiled at a time. Note that a function in the call stack is not necessarily compiled on the first call, but only after some number of calls. See the Oracle section for more details.

Developer APIs

A number of internal APIs are exposed by calling s6.inspect(foo) for any function foo that is decorated with @s6.jit or have been marked for tracing if the decorated function called the function. These are completely unnecessary during normal operation, and are made accessible for users to inspect and introspect the JIT process.

strongjit

Returns a string version of the intermediate representation used by S6 of fn, which is named “strongjit”. The version returned here is the version after optimizations before it is compiled to machine code. The strongjit can be interpreted by using the _evaluate method of the s6.jit object.

is_compiled

Returns True if fn is currently compiled, and False otherwise.

force_compile()

Compiles fn, if it has not already been compiled. Throws an exception if the compilation fails.

deoptimize()

Deoptimize fn. This only deoptimizes the main specialization. Throws NotCompiledError if not compiled

_evaluate(...)

Evaluates fn with the S6 evaluator. "Evaluating" means interpreting the S6 strongjit of fn.

The function fn needs to be compiled, otherwise execution will fall back to using the CPython interpreter and compilation will never be triggered. This function has no purpose other than debugging. If calling a compiled function with _evaluate doesn't result in the same behavior as when calling it will the normal call (of a jitted function), then S6 code generation has a bug.

This will apply to all functions called by fn recursively except if another explicit call to the S6 API changes it. If a called function wasn't compiled, execution will revert to using the plain CPython interpreter.

Strongjit IR

Strongjit is a static single-assignment form intermediate representation (SSA IR) that uses block arguments instead of PHI instructions for control flow. To produce optimized code, Python bytecode is translated to strongjit. Successive optimisation passes improve the strongjit. The improved strongjit is then translated to assembly. The design goals of the IR were:

  • Textually round-trippable for testing.
  • Compact, cache efficient. This is a JIT compiler, and we manipulate existing instructions/ops much less than frameworks like MLIR. Strongjit is, however, designed to be easily translatable to MLIR.
  • Amenable to standard operations and traversals found in optimizing compilers
  • Certain operations should be cheap:
    • Instruction insertion.
    • Instruction deletion.
    • Block insertion, deletion, splitting.
    • RPO traversal.
    • Traversing def-use chains (at least from use to def; traversing from a def to all uses is less common).
    • Holding a handle to an instruction that is invariant to instruction insertion/deletion.

Blocks and Instructions

  • Values are the base of the hierarchy. Basic blocks, instructions and block arguments are all values. They can use other Values and be used in turn. Values have an operand list.

  • Instructions are Values that reside inside a Block. Their address is constant and is never reused.

  • Blocks are Values. Instructions can refer to Blocks as operands, for example jmp &0. Blocks contain a doubly linked list of Instructions. Blocks do not own the storage for Instructions - this is owned by Function.

  • Functions are the outermost unit of the hierarchy. They maintain a list of Blocks and own the storage for all Blocks and Instructions. The list of Blocks is again a doubly linked list.

  • Offset: a bytecode instruction offsets. Gives a location within interpreted code. Used for exception/tracing location information and for deoptimization out of generated code.

For cache locality, all instructions reside in a single deque-like container that uses tombstones when instructions are erased to keep all instruction addresses constant. This allows pointers to instructions to be stored.

There is a lightweight class hierarchy that uses the instruction opcode as a discriminator (See: casts) and allows rich, type safe accessors/mutators.

Instructions

See implementation in instructions.h.

  • ConstantInst: Materializes an int64 immediate as a Value.

  • JmpInst: An unconditional jump to a single successor.

  • BrInst: A conditional branch to two successors.

  • CompareInst: Compares two operands.

  • ExceptInst: Unconditional jump to the exception handler.

  • DeoptimizeIfInst: A conditional deoptimize. The true_successor is a deoptimized block. Maintains a set of values that are required in order to run the deoptimized code.

  • ReturnInst: Unconditional return.

  • UnreachableInst: Terminate the current control flow arc with an error.

  • NegateInst: Unary arithmetic negation.

  • NotInst: Unary boolean not.

  • AddInst: Binary arithmetic addition.

  • SubtractInst: Binary arithmetic subtraction.

  • MultiplyInst: Binary arithmetic multiplication.

  • DivideInst: Binary arithmetic division.

  • RemainderInst: Binary arithmetic modulo operation.

  • AndInst: Binary boolean and operation.

  • OrInst: Binary boolean or operation.

  • XorInst: Binary boolean xor operation.

  • ShiftLeftInst: Binary shift-left operation.

  • ShiftRightSignedInst: Binary signed shift-right operation.

  • IntToFloatInst: integer to float conversion.

  • SextInst: Sign-extension operation.

  • IncrefInst: Increase reference count.

  • DecrefInst: Decrease reference count.

  • LoadInst: Loads from a pointer location with optional extension.

  • LoadGlobalInst: Loads a global by name, given an index into the kNames tuple. Lookup follows the Python convention, looking in the frame globals dict and then in builtins if not found in globals. The result is a borrowed reference to a PyObject* or 0 (with an appropriate exception set) if the lookup failed.

  • StoreInst: Stores to a pointer location with optional truncation.

  • FrameVariableInst: Obtains the address of a local variable. This instruction also takes an index i, used with the kConsts, kNames, kFastLocals and kFreeVars kinds.

  • CallNativeInst: Calls a C function by symbol. The callee must be in a known allowlist.

  • RematerializeInst: A lazily called CallNativeInst. It exists to allow eliding work on the fast path that needs to be done if we deoptimize. An example here is eliding an attribute load for a method call. If we are forced to deoptimize before the call, for example while computing arguments, we are forced to materialize the attribute load. A RematerializeInst is only run during deoptimization, if the de-optimizer finds a RematerializeInst in the value stack it will call the runtime function to materialize the true value.

  • CallPythonInst: Calls a Python value.

  • CallAttributeInst: Calls an attribute on a value. This is equivalent to (call_python (call_native s6::GetAttr $x, "attr"), ...). The attribute is stored as a string inside the function's string table.

  • CallNativeIndirectInst: Calls a native function through a function pointer.

  • SafepointInst: A BytecodeBeginInst, YieldValueInst or a DeoptimizeIfSafepointInst. It contains a bytecode offset, value stack and try handler stack; enough information to construct an accurate interpreter frame.

  • BytecodeBeginInst: Marks the beginning of a bytecode instruction. This contains all the information required to reconstruct a CPython interpreter state: bytecode offset, value stack contents and try-block stack contents.

  • YieldValueInst: Yields within a generator function. The function's state is saved, and can be resumed after the yield. YieldValueInst takes a single operand that is yielded to its caller, and its result is the value the caller sends back. Because generators may be deoptimized while paused, YieldValueInst also holds enough information to reconstruct an interpreter frame (it is a SafepointInst).

  • DeoptimizeIfSafepointInst: Deoptimizes if a condition is true or optionally false. This instruction has enough information to materialize an interpreter frame (it is a SafepointInst), so it does not need to act as control flow - if deoptimization occurs then no more strongjit IR code is run.

  • In this way it is distinguished from DeoptimizeIfInst, which deoptimizes but is not at a Safepoint boundary and needs to run more strongjit code (in the evaluator, usually) to get to a boundary.

  • AdvanceProfileCounterInst: Adds amount to the profile counter. This indicates that approximately the amount of CPython bytecodes of work has been performed.

  • IncrementEventCounterInst: Increments a named event counter managed by the EventCounters singleton. Each event counter counts the number of times that it has been incremented by reaching a trace point during the execution of compiled code.

  • TraceBeginInst: Starts a trace event with a named argument in the trace category TraceCategories::kFunctionExecution. The name is stored in the global intern table. Deoptimization is handled implicitly. Each TraceBeginInst must be balanced by a TraceEndInst. This means that a TraceBeginInst inserted at the beginning of a function must be balanced by a TraceEndInst at all exit points, which include:

    • ReturnInst
    • ExceptInst, where the instruction has no successors.
    • YieldInst, with a matching TraceBeginInst after the YieldInst to resume the event.
  • TraceEndInst: Ends a trace event with a named argument in the trace category TraceCategories::kFunctionExecution. The trace event must have been started by a corresponding TraceBeginInst.

  • BoxInst: Constructs a Python Long/Float/Boolean from the operand.

  • UnboxInst: Extracts a native Long/Float/Boolean from a Python object. If the object is not of the correct type (e.g. PyFloat where PyLong is expected) or, in the case of PyLong, its value exceeds 64 bits, then the overflow flag is set. This can be detected using overflowed? as the very next instruction.

  • OverflowedInst: Determines whether the operand overflowed. The operand must be an arithmetic or unbox instruction, and must occur immediately before this instruction in the same block.

  • FloatZeroInst: Determines whether the operand is floating point zero, either positive or negative.

  • GetClassIdInst: Given an object, returns its class ID.

  • GetObjectDictInst: Given an object, returns its dict or zero on failure.

    • If the dictoffset or type are known, this instruction can operate more efficiently.
  • GetInstanceClassIdInst: Given an object's __dict__, returns its class ID.

  • CheckClassIdInst: Checks if an object has a particular class ID.

  • LoadFromDictInst: Given an object's __dict__, loads __dict__.ma_values[index].

  • StoreToDictInst: Given an object's __dict__, stores to __dict__.ma_values[index] and returns the value replaced.

  • CallVectorcallInst: Calls a native function with CPython's vectorcall calling convention.

  • SetObjectClassInst: Sets the class ID of an object. The object and its instance dictionary are passed, and the ma_version_tag of the instance dictionary is written.

More Repositories

1

deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications
Jupyter Notebook
12,817
star
2

alphafold

Open source code for AlphaFold.
Python
11,700
star
3

sonnet

TensorFlow-based neural network library
Python
9,691
star
4

pysc2

StarCraft II Learning Environment
Python
7,904
star
5

mujoco

Multi-Joint dynamics with Contact. A general purpose physics simulator.
Jupyter Notebook
7,202
star
6

lab

A customisable 3D platform for agent-based AI research
C
7,012
star
7

graph_nets

Build Graph Nets in Tensorflow
Python
5,325
star
8

graphcast

Python
4,242
star
9

learning-to-learn

Learning to Learn in TensorFlow
Python
4,063
star
10

open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
C++
4,019
star
11

alphageometry

Python
3,580
star
12

dm_control

Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Python
3,473
star
13

acme

A library of reinforcement learning components and agents
Python
3,372
star
14

trfl

TensorFlow Reinforcement Learning
Python
3,139
star
15

dm-haiku

JAX-based neural network library
Python
2,697
star
16

alphatensor

Python
2,616
star
17

dnc

A TensorFlow implementation of the Differentiable Neural Computer.
Python
2,478
star
18

mctx

Monte Carlo tree search in JAX
Python
2,209
star
19

gemma

Open weights LLM from Google DeepMind.
Jupyter Notebook
2,061
star
20

code_contests

C++
2,010
star
21

kinetics-i3d

Convolutional neural network model for video classification trained on the Kinetics dataset.
Python
1,639
star
22

mathematics_dataset

This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
Python
1,621
star
23

optax

Optax is a gradient processing and optimization library for JAX.
Python
1,492
star
24

bsuite

bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent
Python
1,465
star
25

penzai

A JAX research toolkit for building, editing, and visualizing neural networks.
Python
1,405
star
26

educational

Jupyter Notebook
1,382
star
27

jraph

A Graph Neural Network Library in Jax
Python
1,306
star
28

rc-data

Question answering dataset featured in "Teaching Machines to Read and Comprehend
Python
1,285
star
29

rlax

Python
1,185
star
30

tapnet

Tracking Any Point (TAP)
Python
1,033
star
31

scalable_agent

A TensorFlow implementation of Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
Python
972
star
32

neural-processes

This repository contains notebook implementations of the following Neural Process variants: Conditional Neural Processes (CNPs), Neural Processes (NPs), Attentive Neural Processes (ANPs).
Jupyter Notebook
966
star
33

android_env

RL research on Android devices.
Python
946
star
34

mujoco_menagerie

A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
Jupyter Notebook
926
star
35

dramatron

Dramatron uses large language models to generate coherent scripts and screenplays.
Jupyter Notebook
904
star
36

tree

tree is a library for working with nested data structures
Python
891
star
37

xmanager

A platform for managing machine learning experiments
Python
796
star
38

mujoco_mpc

Real-time behaviour synthesis with MuJoCo, using Predictive Control
C++
771
star
39

materials_discovery

Python
770
star
40

chex

Python
716
star
41

reverb

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research
C++
692
star
42

alphadev

Python
662
star
43

pycolab

A highly-customisable gridworld game engine with some batteries included. Make your own gridworld games to test reinforcement learning agents!
Python
654
star
44

ferminet

An implementation of the Fermionic Neural Network for ab-initio electronic structure calculations
Python
643
star
45

hanabi-learning-environment

hanabi_learning_environment is a research platform for Hanabi experiments.
Python
614
star
46

funsearch

Jupyter Notebook
611
star
47

ai-safety-gridworlds

This is a suite of reinforcement learning environments illustrating various safety properties of intelligent agents.
Python
577
star
48

dqn

Lua/Torch implementation of DQN (Nature, 2015)
Lua
546
star
49

ithaca

Restoring and attributing ancient texts using deep neural networks
Jupyter Notebook
540
star
50

meltingpot

A suite of test scenarios for multi-agent reinforcement learning.
Python
516
star
51

distrax

Python
509
star
52

recurrentgemma

Open weights language model from Google DeepMind, based on Griffin.
Python
505
star
53

surface-distance

Library to compute surface distance based performance metrics for segmentation tasks.
Python
493
star
54

tracr

Python
467
star
55

dsprites-dataset

Dataset to assess the disentanglement properties of unsupervised learning methods
Jupyter Notebook
463
star
56

alphamissense

Python
455
star
57

narrativeqa

This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Shell
432
star
58

lab2d

A customisable 2D platform for agent-based AI research
C++
415
star
59

open_x_embodiment

Jupyter Notebook
409
star
60

dqn_zoo

DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.
Python
406
star
61

clrs

Python
376
star
62

spriteworld

Spriteworld: a flexible, configurable python-based reinforcement learning environment
Python
366
star
63

dm_pix

PIX is an image processing library in JAX, for JAX.
Python
363
star
64

concordia

A library for generative social simulation
Python
351
star
65

mathematics_conjectures

Jupyter Notebook
348
star
66

alphastar

Python
346
star
67

spiral

We provide a pre-trained model for unconditional 19-step generation of CelebA-HQ images
C++
327
star
68

dm_env

A Python interface for reinforcement learning environments
Python
326
star
69

dm_robotics

Libraries, tools and tasks created and used at DeepMind Robotics.
Python
315
star
70

uncertain_ground_truth

Dermatology ddx dataset, Jax implementations of Monte Carlo conformal prediction, plausibility regions and statistical annotation aggregation from our recent work on uncertain ground truth (TMLR'23 and ArXiv pre-print).
Python
315
star
71

long-form-factuality

Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
Python
314
star
72

launchpad

Python
305
star
73

leo

Implementation of Meta-Learning with Latent Embedding Optimization
Python
302
star
74

streetlearn

A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a TensorFlow implementation of goal-driven navigation agents solving the task published in “Learning to Navigate in Cities Without a Map”, NeurIPS 2018
C++
279
star
75

gqn-datasets

Datasets used to train Generative Query Networks (GQNs) in the ‘Neural Scene Representation and Rendering’ paper.
Python
269
star
76

enn

Python
265
star
77

multi_object_datasets

Multi-object image datasets with ground-truth segmentation masks and generative factors.
Python
247
star
78

AQuA

A algebraic word problem dataset, with multiple choice questions annotated with rationales.
238
star
79

card2code

A code generation dataset for generating the code that implements Hearthstone and Magic The Gathering card effects.
236
star
80

grid-cells

Implementation of the supervised learning experiments in Vector-based navigation using grid-like representations in artificial agents, as published at https://www.nature.com/articles/s41586-018-0102-6
Python
236
star
81

arnheim

Jupyter Notebook
235
star
82

synjax

Python
233
star
83

torch-hdf5

Torch interface to HDF5 library
Lua
231
star
84

dm_memorytasks

A set of 13 diverse machine-learning tasks that require memory to solve.
Python
220
star
85

Temporal-3D-Pose-Kinetics

Exploiting temporal context for 3D human pose estimation in the wild: 3D poses for the Kinetics dataset
Python
214
star
86

opro

official code for "Large Language Models as Optimizers"
Python
199
star
87

dm_alchemy

DeepMind Alchemy task environment: a meta-reinforcement learning benchmark
Python
197
star
88

neural_testbed

Jupyter Notebook
187
star
89

kfac-jax

Second Order Optimization and Curvature Estimation with K-FAC in JAX.
Python
187
star
90

pg19

179
star
91

xquad

173
star
92

jmp

JMP is a Mixed Precision library for JAX.
Python
171
star
93

spectral_inference_networks

Implementation of Spectral Inference Networks, ICLR 2019
Python
165
star
94

abstract-reasoning-matrices

Progressive matrices dataset, as described in: Measuring abstract reasoning in neural networks (Barrett*, Hill*, Santoro*, Morcos, Lillicrap), ICML2018
162
star
95

neural_networks_chomsky_hierarchy

Neural Networks and the Chomsky Hierarchy
Python
162
star
96

xitari

This is the 0.4 release of the Arcade Learning Environment (ALE), a platform designed for AI research. ALE is based on Stella, an Atari 2600 VCS emulator.
C++
159
star
97

tensor_annotations

Annotating tensor shapes using Python types
Python
158
star
98

symplectic-gradient-adjustment

A colab that implements the Symplectic Gradient Adjustment optimizer from "The mechanics of n-player differentiable games"
Jupyter Notebook
150
star
99

mc_gradients

Jupyter Notebook
149
star
100

interval-bound-propagation

This repository contains a simple implementation of Interval Bound Propagation (IBP) using TensorFlow: https://arxiv.org/abs/1810.12715
Python
148
star