ISD Architecture
2026-02-14
Data availability and user experience had become very poor








Researchers and Studies
Researchers have Agreements and Training Status
Studies have Owners, Assets and Contracts



Two Python ETL Scripts
researchers.py to merge researcher training & agreements
studies.py to build study/asset/contract hierarchy
Load all training and agreement records:

def load_training() -> dict[str, datetime | None]:
    training = {}
    with open(TRAINING_FILE, newline="") as f:
        reader = csv.DictReader(f)
        clean_headers(reader)
        ...

def load_agreements() -> dict[str, bool]:
    agreements = {}
    with open(AGREEMENT_FILE, newline="") as f:
        reader = csv.DictReader(f)
        clean_headers(reader)
        ...
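clean_headers is called in both loaders but never shown. A minimal sketch, assuming it only strips a UTF-8 BOM and stray whitespace from the header row of a csv.DictReader (the name and exact behaviour are assumptions):

```python
import csv
import io

def clean_headers(reader: csv.DictReader) -> None:
    # Strip a UTF-8 BOM and surrounding whitespace from each header, in place
    reader.fieldnames = [name.lstrip("\ufeff").strip() for name in reader.fieldnames]

# Hypothetical CSV with a BOM and padded headers
reader = csv.DictReader(io.StringIO("\ufeffUsername , Completed\nabc123,yes\n"))
clean_headers(reader)
```

Rows read afterwards are keyed by the cleaned header names.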
def normalise_username(value: str) -> str:
    value = value.strip().lower()
    if "@" in value:
        value = value.replace("@", "_")
        return f"{value}#EXT#@liveuclac.onmicrosoft.com"
    return f"{value}@ucl.ac.uk"
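Exercising both branches with hypothetical usernames (the function is repeated so the sketch runs standalone):

```python
def normalise_username(value: str) -> str:
    # Internal usernames become UCL addresses; external e-mail
    # addresses become Azure AD guest-style UPNs.
    value = value.strip().lower()
    if "@" in value:
        value = value.replace("@", "_")
        return f"{value}#EXT#@liveuclac.onmicrosoft.com"
    return f"{value}@ucl.ac.uk"

internal = normalise_username(" ABC123 ")
external = normalise_username("jane@example.com")
```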
Load training and agreements, then combine on username:

def merge_records() -> list[Record]:
    training = load_training()
    agreements = load_agreements()
    # All users who appear in either CSV, sorted alphabetically
    all_users = sorted(set(training) | set(agreements))
    merged_records: list[Record] = []
    for user in all_users:
        has_agreed = agreements.get(user, False)
        training_date = training.get(user, None)
        merged_records.append(Record(...))
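The union step can be illustrated with toy data (all names and values hypothetical): a user present in only one file still gets a row, with defaults filled in for the missing side.

```python
from datetime import date

training = {"alice@ucl.ac.uk": date(2026, 1, 10)}
agreements = {"alice@ucl.ac.uk": True, "bob@ucl.ac.uk": True}

# Union both dicts so every user appears exactly once, sorted alphabetically
all_users = sorted(set(training) | set(agreements))
merged = [(u, agreements.get(u, False), training.get(u)) for u in all_users]
```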
Union both datasets so all unique users are captured
Build the complete study hierarchy with nested assets and contracts:

def build_import_json(
    studies: Dict[str, dict],
    assets_by_case: Dict[str, List[dict]],
    study_contracts_by_case: Dict[str, List[dict]],
) -> List[dict]:
    output: List[dict] = []
    for case_ref, study in studies.items():
        study["contracts"] = study_contracts_by_case.get(case_ref, [])
        assets = assets_by_case.get(case_ref, [])
        study["assets"] = assets
        output.append(study)
    return output

def read_studies(filename: str) -> Dict[str, dict]:
    studies: Dict[str, dict] = {}
    with open(filename, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        clean_headers(reader)  # Strip BOM & normalize headers
        for i, row in enumerate(reader, start=2):
            case_ref = (row.get("CaseRef") or "").strip()
            if not case_ref:
                raise ValueError(f"Study row missing CaseRef (line {i})")
            ...

Strict validation catches missing or malformed data early
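The fail-fast behaviour can be sketched against an in-memory CSV (hypothetical rows; read_studies is simplified here so the sketch runs standalone):

```python
import csv
import io

def read_case_refs(text: str) -> list[str]:
    # Simplified read_studies: stop immediately on a missing CaseRef
    reader = csv.DictReader(io.StringIO(text))
    refs = []
    for i, row in enumerate(reader, start=2):  # start=2: line 1 is the header
        case_ref = (row.get("CaseRef") or "").strip()
        if not case_ref:
            raise ValueError(f"Study row missing CaseRef (line {i})")
        refs.append(case_ref)
    return refs
```

Reporting the CSV line number makes bad source rows easy to locate and fix.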
Parse SharePoint dates (DD/MM/YYYY) to ISO format (YYYY-MM-DD):
Ensures consistent date handling across all entities
Both scripts output normalised data:
researchers.py
studies.py
Ready for consumption by the import services in the portal codebase
IG Migration | https://finleybacon.github.io/presentations/