General Input Files

SPECI uses structured input files to define chemical systems for automated speciation analysis. The main input files are in CSV format; templates are provided in the inputs_template/ directory and worked examples in the examples/ folder.

Input File Types

  • Component Data (`components-data.csv`): lists all components (e.g., metals, ligands, solvents) and their relevant properties. Required columns:

    • components — Unique name/label for each fragment. Define fragments so that no bonds are broken or formed within a fragment. For multidentate ligands, give each donor atom its own entry named Donor + ligand name + bonding atom (e.g., DonorLN2N1). The code treats a component as part of a multidentate ligand if its name contains Donor.

    • charge — Net charge of the component, or (for multidentate ligands) the charge contributed by the bonding atom.

    • connectivity allowed — Comma-separated list of the allowed numbers of connections between the bonding atom and other fragments (e.g., “0, 1, 2”). If 0 is included, the component may be omitted from some molecules.

    • donor atom — The element type of the bonding atom (e.g., N, O, C).

    • type — The role of the component (e.g., metal, ligand). If a ligand is multidentate, type it as ligand. Neutral monodentate ligands should also be marked as donor ligands.

  • Structure Files (.ct): each component requires a ChemDraw .ct file that shows explicit H atoms. Bonding atoms in ChemDraw must be labeled Node for metals/ligands, and Nod1, Nod2, etc. for multidentate donor ligands.

Example: Component Data CSV

components,charge,connectivity allowed,donor atom,type
Mg,2,"0, 1, 2, 3, 4",Mg,metal
OEt2,0,"0, 1",O,ligand
R,-1,"0, 1",C,ligand
DonorLN2N1,-1,"0, 1",N,ligand
DonorLN2N2,-1,"0, 1, 2",N,ligand

(Your file may contain additional components and columns as needed.)
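Before running the notebook, it can help to check that a component table has the required columns and that each "connectivity allowed" entry parses cleanly. The following is a minimal sketch using only the Python standard library; `load_components` and the keys of the returned dictionaries are hypothetical names for illustration and are not part of SPECI. It assumes charges are integers, as in the example above.

```python
import csv
import io

REQUIRED_COLUMNS = ["components", "charge", "connectivity allowed", "donor atom", "type"]

def load_components(handle):
    """Parse a component table and check the required columns (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "name": row["components"],
            "charge": int(row["charge"]),  # assumes integer charges
            # "0, 1, 2" -> [0, 1, 2]; int() tolerates the surrounding spaces
            "connectivity": [int(x) for x in row["connectivity allowed"].split(",")],
            "donor_atom": row["donor atom"],
            "type": row["type"],
            # Names containing "Donor" mark entries of multidentate ligands
            "multidentate": "Donor" in row["components"],
        })
    return rows

example_csv = '''components,charge,connectivity allowed,donor atom,type
Mg,2,"0, 1, 2, 3, 4",Mg,metal
OEt2,0,"0, 1",O,ligand
R,-1,"0, 1",C,ligand
DonorLN2N1,-1,"0, 1",N,ligand
DonorLN2N2,-1,"0, 1, 2",N,ligand
'''

components = load_components(io.StringIO(example_csv))
for entry in components:
    print(entry["name"], entry["connectivity"], entry["multidentate"])
```

To check a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open('components-data.csv', newline='')`.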

Tips for Creating Input Files

  • Copy an example CSV from the inputs_template/ or examples/ folder.

  • Ensure all required columns are filled for each component.

  • Check for correct naming, charge, and donor atom specification, especially for multidentate ligands.

  • Save all .ct structure files with the exact same names as the fragments in the CSV.
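A quick way to catch naming mismatches is to list the components that have no matching structure file. This is a minimal sketch, assuming one file named `<component>.ct` per name passed in; `missing_ct_files` is a hypothetical helper, not a SPECI function. How the per-donor entries of a multidentate ligand map to .ct files is not specified here, so the demo only uses simple fragments and a temporary folder.

```python
import tempfile
from pathlib import Path

def missing_ct_files(component_names, structure_dir):
    """Return the component names that have no <name>.ct file in structure_dir."""
    folder = Path(structure_dir)
    return [name for name in component_names if not (folder / f"{name}.ct").is_file()]

# Demo in a temporary folder: create Mg.ct and OEt2.ct, but not R.ct
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Mg", "OEt2"):
        (Path(tmp) / f"{name}.ct").touch()
    missing = missing_ct_files(["Mg", "OEt2", "R"], tmp)

print("Components without a .ct file:", missing)
```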

Location of Input Files

  • Place input files in your working directory or reference them in your scripts/notebooks.

  • Example input sets and templates are in the examples/ and inputs_template/ folders.

Input Settings

The following Python variables and functions control aspects of graph generation. These should be edited directly in the Jupyter notebook (see SPECI.ipynb).

Graph Generation Parameters

import pandas as pd

charge_specified = 0              # Total charge of the species to generate
monomers = 2                      # Maximum number of repeats per fragment/component in the species
donor_M_connection = 6            # Maximum number of connections between metals (M) and donor atoms
donor_as_bridgeing = False        # Set True if using donor ligand to describe a charged multidentate ligand
center = ['Mg', 'Na']             # Metal centers for filtering out 3D structures with close metal–atom contacts
df = pd.read_csv('components-data.csv')   # Path to fragment property CSV file
user_specified_atomcount_controll = []    # Example: [('component1', 2), ('component2', 1)]
bonds_not_constructed = [('Na', 'R')]     # Prevents certain bonds from being constructed
num_cores = 7                     # Number of CPU cores to use (e.g., 7 on an 8-core machine, leaving one core free)
dummy_atom_needed = ['Mg', 'Li']  # Specify if dummy atoms are needed in the structure
  • Adjust these parameters according to your chemical system and computational resources.

  • user_specified_atomcount_controll lets you limit the number of certain fragments in each generated species.

  • bonds_not_constructed prevents the program from forming bonds between specific components.
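To build intuition for how charge_specified, monomers, and user_specified_atomcount_controll narrow the search, the sketch below enumerates fragment counts whose charges sum to the target. It only illustrates the bookkeeping and is not SPECI's graph generation: `candidate_compositions` is a hypothetical function, the per-fragment entries are treated as upper bounds (an assumption), and connectivity and bond rules are ignored.

```python
from itertools import product

def candidate_compositions(charges, charge_specified=0, monomers=2, max_counts=()):
    """List fragment counts (0..monomers each) whose total charge matches the target."""
    names = list(charges)
    limits = dict(max_counts)  # e.g. [('OEt2', 1)] -> {'OEt2': 1}
    # Each fragment may appear up to `monomers` times, or fewer if explicitly limited
    ranges = [range(min(monomers, limits.get(n, monomers)) + 1) for n in names]
    results = []
    for counts in product(*ranges):
        if not any(counts):
            continue  # skip the empty composition
        total = sum(c * charges[n] for c, n in zip(counts, names))
        if total == charge_specified:
            results.append(dict(zip(names, counts)))
    return results

charges = {"Mg": 2, "OEt2": 0, "R": -1}
compositions = candidate_compositions(
    charges, charge_specified=0, monomers=2, max_counts=[("OEt2", 1)]
)
for composition in compositions:
    print(composition)
```

Raising monomers widens the count range for every fragment at once, so the number of candidates grows quickly; per-fragment limits are a cheaper way to keep the search small.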

Additional Settings for Advanced Users

The following Python variables and functions control advanced aspects of structure embedding and output. Most are set directly in the Jupyter notebook (see SPECI.ipynb).

2D to 3D Structure Embedding Settings

You may need to adjust the atomic distance thresholds for filtering out poor 3D structures:

# Function signatures only; edit the default values in the notebook.
def passes_non_H_threshold(atoms, coords, threshold=1.6):   # Change threshold as needed
    ...
def passes_all_thresholds(atoms, coords, non_H_threshold=1.6, all_atom_threshold=0.9): # Adjust thresholds
    ...
  • Default thresholds (in Å) are typically sufficient, but you can tighten or loosen them for challenging systems.
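The effect of the two thresholds can be illustrated with a small distance check. This is a sketch under stated assumptions, not the notebook's implementation: it assumes the filter looks at contacts between the metal centers listed in center and the other atoms, applies the larger threshold to non-H atoms, and applies the smaller one to all atoms. The names `close_contacts` and `passes_thresholds_sketch` are hypothetical.

```python
from math import dist

def close_contacts(atoms, coords, center, threshold, include_hydrogen=True):
    """Return (i, j, distance) for metal-atom pairs closer than threshold (in Å)."""
    contacts = []
    for i, (el_i, xyz_i) in enumerate(zip(atoms, coords)):
        if el_i not in center:
            continue  # only measure from metal centers
        for j, (el_j, xyz_j) in enumerate(zip(atoms, coords)):
            if i == j or (el_j == "H" and not include_hydrogen):
                continue
            d = dist(xyz_i, xyz_j)
            if d < threshold:
                contacts.append((i, j, d))
    return contacts

def passes_thresholds_sketch(atoms, coords, center,
                             non_H_threshold=1.6, all_atom_threshold=0.9):
    """True if no metal contact violates either threshold."""
    return (not close_contacts(atoms, coords, center, non_H_threshold, include_hydrogen=False)
            and not close_contacts(atoms, coords, center, all_atom_threshold))

atoms = ["Mg", "O", "H"]
good = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.9, 0.3, 0.0)]   # Mg-O = 2.0 Å
bad = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (2.1, 0.3, 0.0)]    # Mg-O = 1.2 Å

ok_good = passes_thresholds_sketch(atoms, good, center=["Mg"])
ok_bad = passes_thresholds_sketch(atoms, bad, center=["Mg"])
print(ok_good, ok_bad)
```

Raising a threshold rejects more embedded structures; lowering it keeps more, including some with unphysically short contacts.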

XYZ Structure Optimization and Gaussian Input

To optimize 3D structures using quantum chemistry (e.g., Gaussian):

totla_structure_num = 100     # Total number of structures to model (should be at least 3x the number of groups)
# Custom Gaussian input file generation (example).
# collected_indices is expected to be defined earlier in the notebook.
name = 'speciation'
for index in collected_indices:
    new_comfile = f'{name}{index}.com'
    # Skip the first two lines of the XYZ file (atom count and comment line)
    with open(f'{name}{index}.xyz', 'r') as xyz:
        coord_lines = xyz.readlines()[2:]
    # 'w' overwrites an existing file, so rerunning the cell does not duplicate content
    with open(new_comfile, 'w') as com:
        com.write(f'%chk={name}{index}.chk\n'
                  '%nprocshared=16\n'
                  '%mem=32GB\n'
                  '#p opt=loose PM7 freq scf=xqc\n'
                  '\n speciation study\n'
                  '\n-1 1\n')    # Charge and multiplicity; set these to match your species
        com.writelines(coord_lines)
        com.write('\n')          # Gaussian input must end with a blank line
  • Note: Users must adapt these templates to their own cluster/scheduler and level of theory. Advanced computational chemistry experience is required.

Gaussian Output Interpretation & Data Export

directory_path = "/path/to/your/PM7-log"    # Specify your Gaussian log file directory
df.to_csv("/path/to/DFT_Energies.csv", index=False)   # Path to save energies/results
  • Make sure you update the paths for your own system.
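For orientation, energy extraction from a Gaussian log can be sketched as follows. This is an illustrative sketch, not the notebook's parser: it assumes the logs contain the usual "SCF Done:" lines, takes the last one as the final energy, and runs on a synthetic text snippet with made-up numbers. `last_scf_energy` is a hypothetical helper.

```python
import re

# Matches lines such as " SCF Done:  E(RPM7) =  -0.123456789012     A.U. after ..."
SCF_PATTERN = re.compile(r"SCF Done:\s+E\(\S+\)\s*=\s*(-?\d+\.\d+(?:[DE][+-]?\d+)?)")

def last_scf_energy(log_text):
    """Return the last SCF energy (Hartree) found in the log text, or None."""
    matches = SCF_PATTERN.findall(log_text)
    if not matches:
        return None
    # Gaussian may print exponents with 'D'; Python expects 'E'
    return float(matches[-1].replace("D", "E"))

# Synthetic example text (values are invented for illustration)
sample_log = """\
 SCF Done:  E(RPM7) =  -0.123456789012     A.U. after   12 cycles
 Optimization step finished
 SCF Done:  E(RPM7) =  -0.130000000000     A.U. after    3 cycles
"""

energy = last_scf_energy(sample_log)
print(energy)
```

In practice you would loop over the files in directory_path, read each log with `Path.read_text()`, and collect the results into the DataFrame that is written to CSV.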

Interacting Ion Pair (IIP) Generation

If using the IIP feature for ion pair generation:

  • Cation and anion XYZ files must be pre-generated and placed in folders named cat and ani.

def random_placement_all(cation, anion, initial_distance=50.0, final_distance=2.5):
    # Adjust final_distance as needed for closest approach
    ...
rotation_angles = [0, 120, 240]   # Customize rotation sampling
sorted_finals = sorted(all_final_structs, key=lambda x: x[1])[:10]   # Keep the first 10 structures after sorting; change 10 to keep more or fewer
# Usage:
base_dir = '/path/to/IIP/cat2ani2'    # Set your working directory
generate_combinations(base_dir, cation_count=1, anion_count=1)    # Specify number of cations/anions
  • Output IIP XYZs will be saved to the generated_xyz folder.
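To show what rotation sampling does geometrically, the sketch below rotates a set of coordinates by each angle in rotation_angles. The rotation axis used by the IIP code is not stated here, so rotation about the z axis is an assumption, and `rotate_about_z` and the two-atom `anion` are hypothetical names for illustration only.

```python
from math import cos, sin, radians

def rotate_about_z(coords, angle_deg):
    """Rotate (x, y, z) coordinates about the z axis by angle_deg degrees."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

rotation_angles = [0, 120, 240]
anion = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # toy coordinates, not a real ion

# One rotated copy of the coordinates per sampled angle
orientations = {angle: rotate_about_z(anion, angle) for angle in rotation_angles}
for angle, coords in orientations.items():
    print(angle, [tuple(round(v, 3) for v in xyz) for xyz in coords])
```

Adding more angles samples more relative orientations at the cost of more candidate structures to rank.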

Important Notes

  • Advanced features require prior experience with computational chemistry software (e.g., Gaussian).

  • Always review and adapt Python scripts and computational job settings for your hardware and chemical system.

  • For help and troubleshooting, consult the [GitHub issue tracker](https://github.com/Manting-Mu/OLIGO/issues) or the example notebooks.