Builder

The Builder is a submodule of the ProtoSyn.Core module. It introduces new types and methods that work together, largely independently of the rest of the module, and therefore warrants a dedicated section of its own.

The following sections offer a more in-depth view of the available types and methods for building molecular structures from template libraries:

Defining and loading a Stochastic L-Grammar
Building a molecular structure
Manipulating a molecular structure by adding new residues from templates

Defining and loading a Stochastic L-Grammar

A core feature of ProtoSyn is the generation of structures from scratch, using residue templates as building blocks for complex structures. The Builder submodule introduces this functionality by providing support for Stochastic L-grammars. As a succinct summary, L-grammar systems provide a simple syntax to encode rather complex structures, supporting ramifications (branches, as in carbohydrates and glycoproteins) and random generation of compositions by stochastic rules.

In ProtoSyn, several different L-Grammar systems are provided, based on the type of variables. For example, in the Peptides module, a peptide-based L-Grammar is made available, where the alphabet/variables are the 20 natural amino acids. In this case, since peptides are linear chains, there is no stochastic rule: each amino acid, when expanded, simply returns itself. Finally, the main operator (α) is a peptide bond generator, connecting two consecutive residues. As an example of the Peptides L-Grammar in use, the string "GME" can be expanded and built into a 3D structure of 3 amino acids: Glycine-Methionine-Glutamic Acid, connected by peptide bonds.
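The expansion described above can be sketched in plain Julia. This is a toy illustration only: `RESIDUE_NAMES` and `with_default_operator` are hypothetical names introduced here, not part of ProtoSyn, and the sketch assumes that a default operator is simply placed between consecutive variables when the derivation names none.

```julia
# Toy illustration only: these names are hypothetical and not part of ProtoSyn.
# Each one-letter code "expands" to itself, and the default operator "α"
# is placed between every pair of consecutive residues.

const RESIDUE_NAMES = Dict(
    "G" => "Glycine",
    "M" => "Methionine",
    "E" => "Glutamic Acid",
)

# Interleave the default operator between consecutive variables.
function with_default_operator(derivation::Vector{String}, defop::String = "α")
    expanded = String[]
    for (i, code) in enumerate(derivation)
        i > 1 && push!(expanded, defop)
        push!(expanded, code)
    end
    return expanded
end

derivation = [string(c) for c in "GME"]       # one code per character
expanded   = with_default_operator(derivation) # codes joined by the operator
chain      = join((RESIDUE_NAMES[c] for c in derivation), "-")
```

Here `expanded` holds the codes with the operator between each pair, and `chain` spells out the residue names in order. An actual build also needs the template coordinates and bonding logic that the L-Grammar variables and operators carry.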

Note:

Since L-grammars are specific to a given type/family of molecules, no default grammar is provided by the ProtoSyn.Core module. The following examples and details are discussed using the Peptides default L-grammar.

ProtoSyn L-grammar

Figure 1 | A diagram representation of the Peptides default L-grammar. Any L-grammar in ProtoSyn is composed of 3 elements: [1] a variables library containing the templates of all the available building blocks. Each variable is a complete description of all internal associations between Atom instances (Bonds and Parenthood relationships), as well as all internal coordinates and charges. This information, once loaded, forms an independent Fragment object and is indexed by a :name or a :code. In the case of the Peptides L-Grammar, there are 20 variables, one for each of the 20 natural amino acids; [2] one or more operators, describing bridging connections between 2 of the L-Grammar variables. These, once loaded, return a function that bonds the requested Atom instances (applying the correct Parenthood relationships), while also applying specific internal coordinates to the involved Atom instances. In the case of the Peptides L-Grammar, the only available operator describes a peptide bond (in reality, 2 extra operators are available for generating internal coordinates for linear peptides and for the special case of prolines); [3] optionally, a set of stochastic rules for choosing an operator. ProtoSyn employs stochastic rules for choosing which operator to apply to any 2 given templates, meaning that different operators can be randomly applied based on a set of weights, generating complex structures in a random way, if desired. In the case of the Peptides L-grammar, such rules are not applied, since there is only 1 operator to be applied linearly.

As previously explored, ProtoSyn supports Stochastic L-Grammar structures for defining semi-random and ramified molecular structures. The following types and methods explore how this is achieved in more detail. In addition, loading an LGrammar also adds any additional information (such as available residue types for mutation) to all relevant global variables in ProtoSyn.

ProtoSyn.LGrammar — Type

LGrammar{T <: AbstractFloat, K, V}(rules::Dict{K, Vector{StochasticRule{K,V}}}, variables::Dict{K, Fragment}, operators::Dict{K, Function}, defop::Opt{Function})

An LGrammar instance. Holds information regarding a stochastic L-Grammar system, made up of a set of variables connectable by one or more operators. Optionally, stochastic rules can randomly pick the operator to apply, based on a set of weights.

LGrammar{T, K, V}() where {T <: AbstractFloat, K, V}

Return an empty LGrammar instance.

Fields:

rules::Dict{K, Vector{StochasticRule{K,V}}} - A dictionary of StochasticRule instances indexed by the variable key over which the given rule will operate;
variables::Dict{K, Fragment} - A dictionary of variables (Fragment templates) indexed by the corresponding code;
operators::Dict{K, Function} - A dictionary of operator Function instances indexed by a named String;
defop::Opt{Function} - Default operator. If no operator is described in the given derivation (during the build process), uses this operator.

Note:

As a general rule, LGrammar instances are loaded from a .YML file (using the load_grammar_from_file method). Check the load_grammar_from_file entry for a more in-depth description of the file format.

Examples

julia> grammar = LGrammar{Float64, String, Vector{String}}()
LGrammar{Float64, String, Vector{String}}:
 Rules: None.
 Variables: None.
 Operators: None.

julia> grammar = ProtoSyn.Peptides.grammar

ProtoSyn.StochasticRule — Type

StochasticRule(p::T, rule::Pair{K, V}) where {T <: AbstractFloat, K, V}

Return a new StochasticRule instance with the given probability of occurrence p. The rule is a Pair{K, V} where in most cases K is an instance of type String (i.e.: a key) and V is an instance of type Vector{String} (i.e.: a vector of instructions). These are also called "production instructions" and define the result of deriving the given key in any derivation. As an example, when the pair "A" => ["A", "ɑ", "A"] is interpreted upon derivation, the entry A is expanded to AA, where both new A instances are joined by the ɑ operator, with a probability of occurrence of p.

Fields

p::T - The probability of occurrence;
source::K - The key of the rule on this StochasticRule instance;
production::V - The resulting vector of the derivation of this StochasticRule instance on the given source.

See also

LGrammar

Examples

julia> sr = StochasticRule(1.0, "A" => ["A", "ɑ", "A"])
A(p=1.0) -> ["A", "ɑ", "A"]
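
To make the idea of a stochastic derivation concrete, the following is a minimal sketch in plain Julia. It does not use ProtoSyn: `ToyRule`, `pick_rule` and `derive` are hypothetical stand-ins, and the weighted sampling shown is only one plausible way to choose among rules, not necessarily how ProtoSyn does it.

```julia
using Random

# Hypothetical stand-in for a stochastic rule; not ProtoSyn's StochasticRule type.
struct ToyRule
    p::Float64                  # probability of occurrence (used as a weight)
    source::String              # key the rule applies to
    production::Vector{String}  # result of deriving the key
end

# Pick one rule by weighted sampling over the rule probabilities.
function pick_rule(rng::AbstractRNG, rules::Vector{ToyRule})
    r = rand(rng) * sum(rule.p for rule in rules)
    acc = 0.0
    for rule in rules
        acc += rule.p
        r <= acc && return rule
    end
    return last(rules)  # guard against floating-point round-off
end

# Apply one derivation step: keys with rules are expanded, any other
# symbol (such as an operator) is copied unchanged.
function derive(rng::AbstractRNG, rules::Dict{String, Vector{ToyRule}},
                derivation::Vector{String})
    result = String[]
    for symbol in derivation
        if haskey(rules, symbol)
            append!(result, pick_rule(rng, rules[symbol]).production)
        else
            push!(result, symbol)
        end
    end
    return result
end

rules = Dict("A" => [ToyRule(1.0, "A", ["A", "ɑ", "A"])])
rng   = MersenneTwister(1)
step1 = derive(rng, rules, ["A"])   # one expansion of "A"
step2 = derive(rng, rules, step1)   # every "A" is expanded again
```

With a single rule of probability 1.0 the outcome is deterministic. Adding a second `ToyRule` for the same key with a different production would make each step choose between them at random, in proportion to their weights.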

ProtoSyn.load_grammar_from_file — Function

load_grammar_from_file([::Type{T}], filename::AbstractString, key::String) where {T <: AbstractFloat}

Create an LGrammar instance from the contents of a grammar file (in .YML format) under the key entry. The file contents are parsed by the lgfactory method. Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided). Return the parsed LGrammar instance. This method automatically calls load_grammar_extras_from_file!.

See also

LGrammar lgfactory

Examples

julia> lgrammar = load_grammar_from_file(Float64, filename, "peptide")

julia> lgrammar = load_grammar_from_file(filename, "peptide")

ProtoSyn L-grammar

Figure 2 | An exploration of the .YML file format describing a new LGrammar instance (and loaded by the load_grammar_from_file method). Templates for the variables entry can be in any of the formats supported by ProtoSyn (such as .YML and .PDB). Usually .YML formats are employed, since extra information such as the Parenthood relationships between intra-residue atoms can be easily included. ProtoSyn is able to parse certain unit symbols, such as the degree symbol (°). Otherwise, values are interpreted in the default units (radians).
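
The unit handling mentioned in the caption can be illustrated with a small, self-contained sketch. `parse_angle` is a hypothetical helper written for this example. ProtoSyn's own parser is not shown in this section and may behave differently. The sketch only assumes that a trailing degree symbol marks a value in degrees and that anything else is already in radians.

```julia
# Hypothetical helper; ProtoSyn's own parser may behave differently.
# Values ending in "°" are read as degrees and converted to radians;
# any other value is assumed to already be in radians.
function parse_angle(::Type{T}, value::AbstractString) where {T <: AbstractFloat}
    s = strip(value)
    if endswith(s, "°")
        # chop removes the trailing degree symbol before parsing the number
        return deg2rad(parse(T, chop(s)))
    end
    return parse(T, s)
end

# Convenience method using Float64 when no type is given.
parse_angle(value::AbstractString) = parse_angle(Float64, value)

in_degrees = parse_angle("180°")   # converted to radians
in_radians = parse_angle("3.14")   # taken as-is
```

The optional leading type argument mirrors the `[::Type{T}]` convention used by the loading methods documented here, so numerical entries can be parsed to a chosen floating-point type.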

ProtoSyn.lgfactory — Function

lgfactory([::Type{T}], template::Dict) where {T <: AbstractFloat}

Create an LGrammar instance from the contents of a template Dict (normally read from a grammar file). Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided). The operators entry is parsed by the opfactory method. Return the parsed LGrammar instance.

Note:

This is an internal method of ProtoSyn and shouldn't normally be used directly.

ProtoSyn.opfactory — Function

opfactory(args::Any)

Return the operation function (as a closure) given the input arguments args (normally read from a grammar file).

See also

lgfactory

Note:

The resulting operation function is responsible for setting the internal coordinates of residues in the system when connecting, building and manipulating poses.

Note:

This is an internal method of ProtoSyn and shouldn't normally be used directly.

ProtoSyn.load_grammar_extras_from_file! — Function

load_grammar_extras_from_file!([::Type{T}], filename::AbstractString, key::String) where {T <: AbstractFloat}

Load the extras found under the key entry of the given LGrammar .YML file (filename) into the correct global variables in ProtoSyn. Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided).

The extra info loaded by this method is:

Any alt entry is added to ProtoSyn.alt_residue_names

Note:

Other modules (such as Peptides) may retrieve extra information from the LGrammar file. As such, these modules often include an expanded method for load_grammar_extras_from_file!.

Note:

This method is automatically called from load_grammar_from_file. This is the recommended way to load an LGrammar (this method shouldn't be called as a standalone for most applications).

Examples

julia> ProtoSyn.load_grammar_extras_from_file!(ProtoSyn.resource_dir*"/Peptides/grammars.yml", "default")

Building a molecular structure

One of the main goals of an L-Grammar in ProtoSyn is to facilitate building a molecular structure from a sequence by joining together template variables as building blocks. A vector of codes describes the desired structure. In the case of Peptides, for example, this is simply a linear sequence of amino acids, while more complex structures, such as ramified carbohydrates or glycoproteins, might have an equally more complex vector of codes. The following methods explore how to use ProtoSyn's L-Grammar system to build new molecular structures from template libraries.
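
As an illustration of how a sequence string can be turned into such a vector of codes, the sketch below defines a toy non-standard string literal in plain Julia. `@toyseq_str` is a hypothetical macro written for this example and is not ProtoSyn's implementation. It assumes every code is exactly one character long.

```julia
# Hypothetical toy macro, not ProtoSyn's @seq_str: split a string literal
# into a Vector{String} holding one single-character code per character.
macro toyseq_str(s)
    return :(string.(collect($s)))
end

codes = toyseq"GME"
```

In Julia, a macro named `@name_str` can be invoked with the short `name"..."` syntax, which is the mechanism behind literals of this kind.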

ProtoSyn.@seq_str — Macro

@seq_str(s::String)

Construct a vector of strings from the provided string. Helpful when providing a derivation to any building method (such as build).

Short syntax

seq"..."

Examples

julia> seq"ABC"
3-element Vector{String}:
 "A"
 "B"
 "C"

ProtoSyn.fragment — Method

fragment(grammar::LGrammar{T, K, V}, derivation) where {T <: AbstractFloat, K, V}

Create and return a new Fragment (Pose instance with just a single Segment) using the given derivation sequence on the provided LGrammar grammar instructions. The main purpose of fragments is to be temporary carriers of information, such as during the building process of a new peptide from a sequence. Therefore, these structures often don't have any real meaning and are, as such, deprived of a root/origin for the graph. Actual structures should instead be of the slightly more complete type Pose.

See also

build

Examples

julia> frag = fragment(res_lib, seq"AAA")
Fragment(Segment{/UNK:63875}, State{Float64}:
 Size: 30
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

ProtoSyn.build — Function

build(grammar::LGrammar{T}, derivation)

Build a new Pose instance using the given derivation sequence on the provided LGrammar grammar instructions. Return the generated Pose after syncing (using the sync! method).

See also

fragment

Examples

julia> res_lib = ProtoSyn.Peptides.grammar;

julia> pose = ProtoSyn.build(res_lib, seq"GME")
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
 Size: 39
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

Sometimes, an LGrammar may provide multiple tautomers for a single Residue type. By default, when building a peptide from a sequence, ProtoSyn will use the first found tautomer, so the list order is important.

ProtoSyn.find_tautomer — Function

find_tautomer(tautomer::Tautomer, target::Residue)

Given a target Residue, search the provided Tautomer tautomer list for the corresponding template Residue, based on the Graph (employs the travel_graph method).

Examples

julia> tautomer = Peptides.grammar.variables["H"]
Fragment(Segment{/HIE:22535}, State{Float64}:
 Size: 17
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)(And 1 other tautomer(s) available.)

julia> ProtoSyn.find_tautomer(tautomer, pose.graph[1][72])
Fragment(Segment{/HID:3247}, State{Float64}:
 Size: 17
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

Manipulating a molecular structure by adding new residues from templates

Once built (or loaded), a molecular structure can be manipulated and changed in various ways. Several methods available to add, modify and remove Residue instances from a molecular structure are discussed in the Methods section (see Appending, inserting and removing Atom and Residue instances). The Builder submodule also includes methods allowing the insertion of template residues from a sequence or vector of codes.

ProtoSyn.append_fragment! — Method

append_fragment!(pose::Pose{Topology}, residue::Residue, grammar::LGrammar, derivation; op::Any = "α")

Based on the provided grammar, add the residue sequence from derivation to the given Pose pose, appending it after the given Residue residue. This residue and the new Fragment will be connected using operation op ("α" by default). Request internal to cartesian coordinate conversion and return the altered Pose pose.

See also

insert_fragment!

Examples

julia> ProtoSyn.append_fragment!(pose, pose.graph[1][36], res_lib, seq"MMM")
Pose{Topology}(Topology{/2a3d:532}, State{Float64}:
 Size: 628
 i2c: true | c2i: false
 Energy: Dict(:Total => Inf)
)

ProtoSyn.insert_fragment! — Method

insert_fragment!(pose::Pose{Topology}, residue::Residue, grammar::LGrammar, derivation; op::Any = "α", connect_upstream::Bool = true)

Based on the provided grammar, add the residue sequence from derivation to the given pose, inserting it at the position of the given Residue instance residue (the residue gets shifted downstream). The first downstream Residue and the new Fragment will be connected using operation op ("α" by default). If connect_upstream is set to true (the default), also connect to the upstream Residue instances using the same operation op. Request internal to cartesian coordinate conversion and return the altered Pose pose.

See also

append_fragment!

Examples

julia> ProtoSyn.unbond!(pose, pose.graph[1][1]["C"], pose.graph[1, 2, "N"])
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
 Size: 343
 i2c: true | c2i: false
 Energy: Dict(:Total => Inf)
)

julia> ProtoSyn.insert_fragment!(pose, pose.graph[1][2], res_lib, seq"A")
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
 Size: 353
 i2c: true | c2i: false
 Energy: Dict(:Total => Inf)
)

ProtoSyn Manipulation

Figure 3 | Some examples of the application of molecular manipulation methods: [1] Appending Residue instances at the end of a Segment using the append_fragment! method; [2] Adding Residue instances at the center and [3] at the beginning of an existing Segment, using the insert_fragment! method. In the schematic representation of the molecular structure, R denotes the Topology root.
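
The positional difference between appending and inserting can be sketched on a plain vector of residue codes. This is a toy model under stated assumptions: `toy_append` and `toy_insert` are hypothetical helpers, residues are plain strings, and no bonding, Parenthood relationships or coordinates are handled, all of which the real methods take care of.

```julia
# Toy model only: residues are plain strings and these helpers are hypothetical.

# Append `new` right after the residue at index `i`.
function toy_append(chain::Vector{String}, i::Int, new::Vector{String})
    return vcat(chain[1:i], new, chain[i+1:end])
end

# Insert `new` at index `i`; the residue previously at `i` shifts downstream.
function toy_insert(chain::Vector{String}, i::Int, new::Vector{String})
    return vcat(chain[1:i-1], new, chain[i:end])
end

chain    = ["G", "M", "E"]
appended = toy_append(chain, 3, ["M", "M", "M"])  # [1] at the end of the chain
centered = toy_insert(chain, 2, ["A"])            # [2] in the middle
leading  = toy_insert(chain, 1, ["A"])            # [3] at the beginning
```

The three results correspond to cases [1], [2] and [3] in the figure. In the insert cases the residue originally at the target index ends up immediately after the new fragment.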