Builder
The Builder is a submodule of the ProtoSyn.Core module. As such, the following section introduces new types and methods that work together, largely independently from the rest of the module, and that warrant a dedicated section of their own.
The following sections offer a more in-depth view of the available types and methods for building molecular structures from template libraries:
- Defining and loading a Stochastic L-Grammar
- Building a molecular structure
- Manipulating a molecular structure by adding new residues from templates
Defining and loading a Stochastic L-Grammar
A core feature of ProtoSyn is the generation of structures from scratch, using residue templates as building blocks for complex structures. The Builder submodule introduces this functionality by providing support for Stochastic L-grammars. As a succinct summary, L-grammar systems provide a simple syntax to encode rather complex structures, supporting ramifications (as in carbohydrates and glycoproteins) and random generation of compositions by stochastic rules.
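The rewriting idea behind an L-grammar can be sketched in a few lines of plain Julia. This is a minimal, deterministic illustration and not ProtoSyn's implementation: the names SKETCH_RULES, expand_once and expand are hypothetical, each key has a single production, and the stochastic choice between productions is left out.

```julia
# Hypothetical, simplified rule table: each key maps to a single production.
# Symbols without a rule (such as the operator "ɑ") expand to themselves.
const SKETCH_RULES = Dict(
    "A" => ["A", "ɑ", "A"],
)

# Rewrite every symbol of `derivation` once.
function expand_once(derivation::Vector{String}, rules::Dict{String, Vector{String}})
    expanded = String[]
    for symbol in derivation
        # Fall back to the symbol itself when no rule is defined for it.
        append!(expanded, get(rules, symbol, [symbol]))
    end
    return expanded
end

# Apply `n` rewriting passes, one after the other.
function expand(derivation::Vector{String}, rules::Dict{String, Vector{String}}, n::Integer)
    for _ in 1:n
        derivation = expand_once(derivation, rules)
    end
    return derivation
end

println(join(expand(["A"], SKETCH_RULES, 2), " "))
```

Starting from a single A, two passes yield four A symbols joined by three ɑ operators. A derivation made only of symbols without rules, such as G, M and E, is returned unchanged.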
In ProtoSyn, several different L-Grammar systems are provided, based on the type of variables. For example, in the Peptides module, a peptide-based L-Grammar is made available, where the alphabet/variables are the 20 natural amino acids. In this case, since peptides are linear chains, there is no stochastic rule: each amino acid, when expanded, simply returns itself. Finally, the main operator (α) is a peptide bond generator, connecting two residues in a row. Therefore, as an example of using the Peptides L-Grammar, the string "GME" could be easily expanded and built into a 3D structure of 3 amino acids: Glycine-Methionine-Glutamic Acid, connected by peptide bonds.
Since L-grammars are specific to a given type/family of molecules, no default grammar is provided by the ProtoSyn.Core module. The following examples and details are discussed using the Peptides default L-grammar.
Figure 1 | A diagram representation of the Peptides default L-grammar. Any L-grammar in ProtoSyn is composed of 3 elements: [1] a variables library containing the templates of all the building blocks available. Each variable is a complete description of all internal associations between Atom instances (Bonds and Parenthood relationships), as well as all internal coordinates and charges. This information, once loaded, forms an independent Fragment object and is indexed by a :name or a :code. In the case of the Peptides L-Grammar, there are 20 variables, one for each of the 20 natural amino acids; [2] one or more operators, describing bridging connections between 2 of the L-Grammar variables. These, once loaded, return a function that bonds the requested Atom instances (applying the correct Parenthood relationships), while also applying specific internal coordinates to the involved Atom instances. In the case of the Peptides L-Grammar, the only available operator describes a peptide bond (in reality, 2 extra operators are available for generating internal coordinates for linear peptides and for the special case of prolines); [3] optionally, a set of stochastic rules for choosing an operator. ProtoSyn employs stochastic rules for choosing which operator to apply to any 2 given templates, meaning that different operators can be randomly applied based on a set of weights, generating complex structures in a random way, if desired. In the case of the Peptides L-grammar, such rules are not applied, since there is only 1 operator to be applied linearly.
As previously explored, ProtoSyn supports Stochastic L-Grammar structures for defining semi-random and ramified molecular structures. The following types and methods explore how this is achieved in more detail. In addition, loading an LGrammar also adds any additional information (such as available residue types for mutation) to all relevant global variables in ProtoSyn.
ProtoSyn.LGrammar — Type
LGrammar{T <: AbstractFloat, K, V}(rules::Dict{K, Vector{StochasticRule{K,V}}}, variables::Dict{K, Fragment}, operators::Dict{K, Function}, defop::Opt{Function})
An LGrammar instance. Holds information regarding a stochastic L-Grammar system, made up of a set of variables connectable by one or more operators. Optionally, stochastic rules can randomly pick the operator to apply, based on a set of weights.
LGrammar{T, K, V}() where {T <: AbstractFloat, K, V}
Return an empty LGrammar instance.
Fields:
- rules::Dict{K, Vector{StochasticRule{K,V}}} - A dictionary of StochasticRule instances indexed by the variable key over which the given rule will operate;
- variables::Dict{K, Fragment} - A dictionary of variables (Fragment templates) indexed by the corresponding code;
- operators::Dict{K, Function} - A dictionary of operator Function instances indexed by a named String;
- defop::Opt{Function} - Default operator. If no operator is described in the given derivation (during the build process), uses this operator.
See also
StochasticRule
build
load_grammar_from_file
As a general rule, LGrammar instances are loaded from a .YML file (using the load_grammar_from_file method). Check this entry for a more in-depth description of the file format.
Examples
julia> grammar = LGrammar{Float64, String, Vector{String}}()
LGrammar{Float64, String, Vector{String}}:
Rules: None.
Variables: None.
Operators: None.
julia> grammar = ProtoSyn.Peptides.grammar
ProtoSyn.StochasticRule — Type
StochasticRule(p::T, rule::Pair{K, V}) where {T <: AbstractFloat, K, V}
Return a new StochasticRule instance with the given probability of occurrence p. The rule is a Pair{K, V} where, in most cases, K is an instance of type String (i.e.: a key) and V is an instance of type Vector{String} (i.e.: a vector of instructions). These are also called "production instructions" and define the result of deriving the given "key" in any derivation. As an example, the pair "A" => ["A", "ɑ", "A"] would be interpreted upon derivation so that the entry A is expanded to AA, where both new A instances are joined by the ɑ operator, with a probability of occurrence of p.
Fields
- p::T - The probability of occurrence;
- source::K - The key of the rule on this StochasticRule instance;
- production::V - The resulting vector of the derivation of this StochasticRule instance on the given source.
See also
Examples
julia> sr = StochasticRule(1.0, "A" => ["A", "ɑ", "A"])
A(p=1.0) -> ["A", "ɑ", "A"]
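When several rules share the same key, one of them has to be chosen according to its probability of occurrence. The following plain-Julia sketch shows one common way of doing this (cumulative-weight selection). It is an assumption-laden illustration rather than ProtoSyn's own code: SketchRule and pick_production are hypothetical names that only mirror the p and production fields described above.

```julia
using Random

# Hypothetical stand-in for a stochastic rule: a weight and a production.
struct SketchRule
    p::Float64
    production::Vector{String}
end

# Pick one production, with a chance proportional to each rule's weight `p`.
function pick_production(rng::AbstractRNG, rules::Vector{SketchRule})
    total = sum(r.p for r in rules)
    total > 0 || throw(ArgumentError("at least one rule needs a positive weight"))
    threshold = rand(rng) * total   # uniform draw in [0, total)
    accumulated = 0.0
    for r in rules
        accumulated += r.p
        if threshold < accumulated
            return r.production
        end
    end
    return rules[end].production    # guard against floating-point round-off
end

rng = MersenneTwister(1234)
options = [SketchRule(0.75, ["A", "ɑ", "A"]), SketchRule(0.25, ["A"])]
println(pick_production(rng, options))
```

Because the draw is scaled by the sum of the weights, the weights in this sketch do not need to add up to 1. Whether ProtoSyn normalizes weights in the same way is not stated here.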
ProtoSyn.load_grammar_from_file — Function
load_grammar_from_file([::Type{T}], filename::AbstractString, key::String) where {T <: AbstractFloat}
Create an LGrammar instance from the contents of a grammar file (in .YML format) under the key entry. The file contents are parsed by the lgfactory method. Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided). Return the parsed LGrammar instance. Automatically calls load_grammar_extras_from_file!.
See also
Examples
julia> lgrammar = load_grammar_from_file(Float64, filename, "peptide")
julia> lgrammar = load_grammar_from_file(filename, "peptide")
Figure 2 | An exploration of the .YML file format describing a new LGrammar instance (and loaded by the load_grammar_from_file method). Templates for the variables entry can be in any of the formats supported by ProtoSyn (such as .YML and .PDB). Usually, the .YML format is employed, since extra information, such as the Parenthood relationships between intra-residue atoms, can be easily included. ProtoSyn is able to parse certain unit symbols, such as the degree symbol (°). Otherwise, the default unit is radians.
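The unit convention in the caption above (a trailing degree symbol marks degrees, anything else is taken as radians) can be illustrated with a small plain-Julia helper. The function parse_angle is hypothetical and is not part of ProtoSyn; it only sketches the convention under the assumption that angle values arrive as strings.

```julia
# Hypothetical helper: convert an angle string to radians.
# A trailing "°" marks a value in degrees; otherwise radians are assumed.
function parse_angle(::Type{T}, value::AbstractString) where {T <: AbstractFloat}
    text = strip(value)
    if endswith(text, "°")
        number = parse(T, strip(replace(text, "°" => "")))
        return deg2rad(number)
    end
    return parse(T, text)
end

# Convenience method defaulting to Float64 (an assumption of this sketch).
parse_angle(value::AbstractString) = parse_angle(Float64, value)

println(parse_angle("180°"))
println(parse_angle("1.5"))
```

The first call converts from degrees to radians, while the second value is returned as given, since it carries no unit symbol.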
ProtoSyn.lgfactory — Function
lgfactory([::Type{T}], template::Dict) where {T <: AbstractFloat}
Create an LGrammar instance from the contents of a template Dict (normally read from a grammar file). Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided). The operators entry is parsed by the opfactory method. Return the parsed LGrammar instance.
See also
LGrammar
load_grammar_from_file
opfactory
This is an internal method of ProtoSyn and shouldn't normally be used directly.
ProtoSyn.opfactory — Function
opfactory(args::Any)
Return the operation function (as a closure) given the input arguments args (normally read from a grammar file).
See also
The resulting operation function is responsible for setting the internal coordinates of residues in the system when connecting, building and manipulating poses.
This is an internal method of ProtoSyn and shouldn't normally be used directly.
ProtoSyn.load_grammar_extras_from_file! — Function
load_grammar_extras_from_file!([::Type{T}], filename::AbstractString, key::String) where {T <: AbstractFloat}
Load the extras under the key entry of the given LGrammar .YML file (filename) into the correct global variables in ProtoSyn. Any numerical entry is parsed to the provided type T (or Units.defaultFloat if no type is provided).
The extra info loaded by this method is:
- Any alt entry is added to ProtoSyn.alt_residue_names
Other modules (such as Peptides) may retrieve extra information from the LGrammar file. As such, these modules often include an expanded method for load_grammar_extras_from_file!.
This method is automatically called from load_grammar_from_file, which is the recommended way to load an LGrammar (this method shouldn't be called as a standalone for most applications).
Examples
julia> ProtoSyn.load_grammar_extras_from_file!(ProtoSyn.resource_dir*"/Peptides/grammars.yml", "default")
Building a molecular structure
One of the main goals of an L-Grammar in ProtoSyn is to facilitate building a molecular structure from a sequence by joining together template variables as building blocks. A vector of codes describes the desired structure. In the case of Peptides, for example, this is simply a linear sequence of amino acids, while more complex structures, such as ramified carbohydrates or glycoproteins, might have an equally more complex vector of codes. The following methods explore further how to use ProtoSyn's L-Grammar system to build new molecular structures from template libraries.
ProtoSyn.@seq_str — Macro
@seq_str(s::String)
Construct a vector of strings from the provided string. Helpful when providing a derivation to any building method (such as build).
Short syntax
- seq"..."
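For readers unfamiliar with Julia's non-standard string literals, a macro named with a _str suffix is what enables the prefix"..." syntax. The sketch below defines a comparable macro in plain Julia. The names split_codes and @codes_str are hypothetical and chosen to avoid clashing with ProtoSyn's own macro; this is an illustration of the mechanism, not ProtoSyn's source.

```julia
# Hypothetical helper: split a string into one single-character code per element.
split_codes(s::AbstractString) = [string(c) for c in s]

# Defining `@codes_str` enables the short syntax codes"...".
macro codes_str(s)
    return :(split_codes($s))
end

println(codes"GME")
```

Since every character becomes its own element, a literal of this kind can only express single-character codes. Longer codes would have to be given as an explicit vector of strings.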
Examples
julia> seq"ABC"
3-element Vector{String}:
"A"
"B"
"C"
ProtoSyn.fragment — Method
fragment(grammar::LGrammar{T, K, V}, derivation) where {T <: AbstractFloat, K, V}
Create and return a new Fragment (Pose instance with just a single Segment) using the given derivation sequence on the provided LGrammar grammar instructions. The main purpose of fragments is to be temporary carriers of information, such as during the building process of a new peptide from a sequence. Therefore, these structures often don't have any real meaning and are, as such, deprived of a root/origin for the graph. Actual structures should instead be of the slightly more complete type Pose.
See also
Examples
julia> frag = fragment(res_lib, seq"AAA")
Fragment(Segment{/UNK:63875}, State{Float64}:
Size: 30
i2c: false | c2i: false
Energy: Dict(:Total => Inf)
)
ProtoSyn.build — Function
build(grammar::LGrammar{T}, derivation)
Build a new Pose instance using the given derivation sequence on the provided LGrammar grammar instructions. Return the generated Pose after syncing (using the sync! method).
See Also
Examples
julia> res_lib = ProtoSyn.Peptides.grammar;
julia> pose = ProtoSyn.build(res_lib, seq"GME")
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
Size: 39
i2c: false | c2i: false
Energy: Dict(:Total => Inf)
)
Sometimes, an LGrammar may provide multiple tautomers for a single Residue type. By default, when building a peptide from a sequence, ProtoSyn will use the first found tautomer, so the list order is important.
ProtoSyn.find_tautomer — Function
find_tautomer(tautomer::Tautomer, target::Residue)
Given a target Residue, search the provided Tautomer tautomer list for the corresponding template Residue, based on the Graph (employs the travel_graph method).
Examples
julia> tautomer = Peptides.grammar.variables["H"]
Fragment(Segment{/HIE:22535}, State{Float64}:
Size: 17
i2c: false | c2i: false
Energy: Dict(:Total => Inf)
)(And 1 other tautomer(s) available.)
julia> ProtoSyn.find_tautomer(tautomer, pose.graph[1][72])
Fragment(Segment{/HID:3247}, State{Float64}:
Size: 17
i2c: false | c2i: false
Energy: Dict(:Total => Inf)
)
Manipulating a molecular structure by adding new residues from templates
Once built (or loaded), a molecular structure can be manipulated and changed in various ways. Several methods available to add, modify and remove Residue instances from a molecular structure are discussed in the Methods section (see Appending, inserting and removing Atom and Residue instances). The Builder submodule also includes methods allowing the insertion of template residues from a sequence or vector of codes.
ProtoSyn.append_fragment! — Method
append_fragment!(pose::Pose{Topology}, residue::Residue, grammar::LGrammar, derivation; op::Any = "α")
Based on the provided grammar, add the residue sequence from derivation to the given Pose pose, appending it after the given Residue residue. This residue and the new Fragment will be connected using operation op ("α" by default). Request internal to cartesian coordinate conversion and return the altered Pose pose.
See also
Examples
julia> ProtoSyn.append_fragment!(pose, pose.graph[1][36], res_lib, seq"MMM")
Pose{Topology}(Topology{/2a3d:532}, State{Float64}:
Size: 628
i2c: true | c2i: false
Energy: Dict(:Total => Inf)
)
ProtoSyn.insert_fragment! — Method
insert_fragment!(pose::Pose{Topology}, residue::Residue, grammar::LGrammar, derivation; op::Any = "α", connect_upstream::Bool = true)
Based on the provided grammar, add the residue sequence from derivation to the given pose, inserting it at the position of the given Residue instance residue (the residue gets shifted downstream). The first downstream Residue and the new Fragment will be connected using operation op ("α" by default). If connect_upstream is set to true (the default), also connect to the upstream Residue instances using the same operation op. Request internal to cartesian coordinate conversion and return the altered Pose pose.
See also
Examples
julia> ProtoSyn.unbond!(pose, pose.graph[1][1]["C"], pose.graph[1, 2, "N"])
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
Size: 343
i2c: true | c2i: false
Energy: Dict(:Total => Inf)
)
julia> ProtoSyn.insert_fragment!(pose, pose.graph[1][2], res_lib, seq"A")
Pose{Topology}(Topology{/UNK:1}, State{Float64}:
Size: 353
i2c: true | c2i: false
Energy: Dict(:Total => Inf)
)
Figure 3 | Some examples of the application of molecular manipulation methods: [1] Appending Residue instances at the end of a Segment using the append_fragment! method; [2] Adding Residue instances at the center and [3] at the beginning of an existing Segment, using the insert_fragment! method. In the schematic representation of the molecular structure, R denotes the Topology root.
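The positional difference between appending and inserting can be mimicked on a plain vector of residue codes. The sketch below uses only base Julia; append_codes and insert_codes are hypothetical names, and no bonds, operators or coordinates are handled, so it shows only how the sequence order changes.

```julia
# Append `new_codes` right after the element at `position`.
function append_codes(codes::Vector{String}, position::Integer, new_codes::Vector{String})
    return vcat(codes[1:position], new_codes, codes[position+1:end])
end

# Insert `new_codes` at `position`; the element previously at `position`
# (and everything after it) is shifted downstream.
function insert_codes(codes::Vector{String}, position::Integer, new_codes::Vector{String})
    return vcat(codes[1:position-1], new_codes, codes[position:end])
end

sequence = ["G", "M", "E"]
println(append_codes(sequence, 3, ["M", "M", "M"]))  # new codes at the end
println(insert_codes(sequence, 2, ["A"]))            # new code at the center
println(insert_codes(sequence, 1, ["A"]))            # new code at the beginning
```

The three calls loosely correspond to cases [1], [2] and [3] of the figure, with the original vector left untouched because both helpers return a new vector.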