BGSTM Data Model Architecture¶
Last Updated: February 2026
Version: 2.0.0
Related Files:
- Models: backend/app/models/*.py
- Schema: database/schema.sql
- Sample Data: backend/app/db/sample_data.py
Table of Contents¶
- Overview
- Entity Relationship Diagram
- Core Entities
- Design Principles
- AI Integration Points
- Data Flow Diagrams
- Common Queries and Use Cases
- Validation Rules
- Migration and Evolution
- Performance Considerations
Overview¶
The BGSTM data model is designed to support AI-powered requirement-to-test case traceability linking. It enables organizations to:
- Track relationships between requirements and test cases
- Leverage AI/ML to suggest potential links based on semantic similarity
- Maintain a complete audit trail of all traceability activities
- Generate comprehensive traceability matrices and coverage reports
- Support flexible metadata and extensibility for custom workflows
Key Design Goals¶
- Traceability First: Every requirement-to-test case relationship is explicitly tracked with full audit trails
- AI-Ready: Fields and structures designed for machine learning integration
- Flexible: JSONB fields and tag arrays allow custom metadata without schema changes
- Extensible: Easy to add new types, statuses, and relationships
- Performant: Indexed for common query patterns and large datasets
Entity Relationship Diagram¶
Enhanced ERD with Full Field Details¶
erDiagram
REQUIREMENT ||--o{ REQUIREMENT_TEST_CASE_LINK : "has many"
TEST_CASE ||--o{ REQUIREMENT_TEST_CASE_LINK : "has many"
REQUIREMENT ||--o{ LINK_SUGGESTION : "generates many"
TEST_CASE ||--o{ LINK_SUGGESTION : "generates many"
REQUIREMENT {
uuid id PK "Primary Key"
varchar_100 external_id UK "Unique, Indexed, Nullable"
varchar_500 title "NOT NULL, Indexed"
text description "NOT NULL"
enum type "functional|non_functional|technical"
enum priority "critical|high|medium|low"
enum status "draft|approved|implemented|tested|closed"
varchar_100 module "Nullable, Indexed"
text_array tags "Nullable"
jsonb custom_metadata "Nullable, Flexible"
varchar_50 source_system "Nullable"
text source_url "Nullable"
varchar_100 created_by "Nullable"
integer version "Default 1"
timestamp created_at "Default NOW()"
timestamp updated_at "Auto-updated"
}
TEST_CASE {
uuid id PK "Primary Key"
varchar_100 external_id UK "Unique, Indexed, Nullable"
varchar_500 title "NOT NULL, Indexed"
text description "NOT NULL"
enum type "functional|integration|performance|security|ui|regression"
enum priority "critical|high|medium|low"
enum status "draft|ready|executing|passed|failed|blocked|deprecated"
jsonb steps "Nullable, Structured"
text preconditions "Nullable"
text postconditions "Nullable"
jsonb test_data "Nullable"
varchar_100 module "Nullable, Indexed"
text_array tags "Nullable"
enum automation_status "manual|automated|automatable"
integer execution_time_minutes "Nullable"
jsonb custom_metadata "Nullable, Flexible"
varchar_50 source_system "Nullable"
text source_url "Nullable"
varchar_100 created_by "Nullable"
integer version "Default 1"
timestamp created_at "Default NOW()"
timestamp updated_at "Auto-updated"
}
REQUIREMENT_TEST_CASE_LINK {
uuid id PK "Primary Key"
uuid requirement_id FK "NOT NULL, ON DELETE CASCADE, Indexed"
uuid test_case_id FK "NOT NULL, ON DELETE CASCADE, Indexed"
enum link_type "covers|verifies|validates|related"
float confidence_score "Nullable, 0.0-1.0"
enum link_source "manual|ai_suggested|ai_confirmed|imported"
timestamp created_at "Default NOW()"
varchar_100 created_by "Nullable"
timestamp confirmed_at "Nullable"
varchar_100 confirmed_by "Nullable"
text notes "Nullable"
constraint uq_requirement_test_case "UNIQUE(requirement_id, test_case_id)"
}
LINK_SUGGESTION {
uuid id PK "Primary Key"
uuid requirement_id FK "NOT NULL, ON DELETE CASCADE, Indexed"
uuid test_case_id FK "NOT NULL, ON DELETE CASCADE, Indexed"
float similarity_score "NOT NULL, 0.0-1.0"
enum suggestion_method "semantic_similarity|keyword_match|heuristic|hybrid"
text suggestion_reason "Nullable"
jsonb suggestion_metadata "Nullable, Stores embeddings"
enum status "pending|accepted|rejected|expired, Indexed"
timestamp created_at "Default NOW()"
timestamp reviewed_at "Nullable"
varchar_100 reviewed_by "Nullable"
text feedback "Nullable, User feedback"
}
Enum Type Reference¶
graph TD
subgraph "Requirement Enums"
RT[RequirementType]
RT --> RT1[functional]
RT --> RT2[non_functional]
RT --> RT3[technical]
RS[RequirementStatus]
RS --> RS1[draft]
RS --> RS2[approved]
RS --> RS3[implemented]
RS --> RS4[tested]
RS --> RS5[closed]
end
subgraph "TestCase Enums"
TCT[TestCaseType]
TCT --> TCT1[functional]
TCT --> TCT2[integration]
TCT --> TCT3[performance]
TCT --> TCT4[security]
TCT --> TCT5[ui]
TCT --> TCT6[regression]
TCS[TestCaseStatus]
TCS --> TCS1[draft]
TCS --> TCS2[ready]
TCS --> TCS3[executing]
TCS --> TCS4[passed]
TCS --> TCS5[failed]
TCS --> TCS6[blocked]
TCS --> TCS7[deprecated]
AS[AutomationStatus]
AS --> AS1[manual]
AS --> AS2[automated]
AS --> AS3[automatable]
end
subgraph "Link Enums"
LT[LinkType]
LT --> LT1[covers]
LT --> LT2[verifies]
LT --> LT3[validates]
LT --> LT4[related]
LS[LinkSource]
LS --> LS1[manual]
LS --> LS2[ai_suggested]
LS --> LS3[ai_confirmed]
LS --> LS4[imported]
end
subgraph "Suggestion Enums"
SM[SuggestionMethod]
SM --> SM1[semantic_similarity]
SM --> SM2[keyword_match]
SM --> SM3[heuristic]
SM --> SM4[hybrid]
SS[SuggestionStatus]
SS --> SS1[pending]
SS --> SS2[accepted]
SS --> SS3[rejected]
SS --> SS4[expired]
end
subgraph "Shared Enums"
PL[PriorityLevel]
PL --> PL1[critical]
PL --> PL2[high]
PL --> PL3[medium]
PL --> PL4[low]
end
Relationship Cardinality¶
| Relationship | Type | Description | Cascade Behavior |
|---|---|---|---|
| Requirement → RequirementTestCaseLink | One-to-Many | One requirement can have many test case links | DELETE CASCADE |
| TestCase → RequirementTestCaseLink | One-to-Many | One test case can link to many requirements | DELETE CASCADE |
| Requirement → LinkSuggestion | One-to-Many | One requirement can have many AI suggestions | DELETE CASCADE |
| TestCase → LinkSuggestion | One-to-Many | One test case can have many AI suggestions | DELETE CASCADE |
| Requirement ↔ TestCase | Many-to-Many | Via RequirementTestCaseLink junction table | Through junction |
Database Support¶
The data model supports both:
- PostgreSQL: Full-featured with native UUID, JSONB, ENUM, and ARRAY types
- SQLite: Development-friendly with TEXT-based workarounds via custom SQLAlchemy type decorators
SQLAlchemy custom types (GUID, JSON, ArrayType) automatically adapt based on the database dialect.
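The idea behind these decorators can be sketched as plain functions (a simplified illustration of the dialect-adaptation behavior, not the project's actual GUID implementation):

```python
import uuid

def bind_uuid(value, dialect_name):
    """Adapt a UUID for the target dialect, as a GUID TypeDecorator would
    on the way into the database."""
    if value is None:
        return None
    if dialect_name == "postgresql":
        return value          # native UUID column type
    return str(value)         # SQLite: store as TEXT

def result_uuid(value, dialect_name):
    """Convert a fetched value back to uuid.UUID regardless of dialect."""
    if value is None or isinstance(value, uuid.UUID):
        return value
    return uuid.UUID(value)
```

The real decorators hook the same conversions into SQLAlchemy's `process_bind_param` / `process_result_value`, so application code always sees `uuid.UUID` objects.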
Core Entities¶
1. Requirement Entity¶
Purpose: Captures software requirements from various sources (Jira, Azure DevOps, custom systems). Requirements represent the "what" of the system—functional, non-functional, or technical specifications that need to be implemented and validated.
Use Cases:
- Import requirements from external systems
- Track requirement status through the development lifecycle
- Link requirements to test cases for traceability
- Generate coverage reports
Field Specifications¶
| Field | Type | Constraints | Description | Business Rules |
|---|---|---|---|---|
id | UUID | PK, NOT NULL, Default: uuid4() | Unique identifier for the requirement | Auto-generated |
external_id | VARCHAR(100) | UNIQUE, Indexed, Nullable | External system identifier (e.g., "JIRA-1234") | Must be unique if provided |
title | VARCHAR(500) | NOT NULL, Indexed | Short descriptive title | Max 500 chars, required |
description | TEXT | NOT NULL | Detailed requirement description | No length limit, required |
type | ENUM | NOT NULL | Type of requirement | One of: functional, non_functional, technical |
priority | ENUM | NOT NULL | Business priority | One of: critical, high, medium, low |
status | ENUM | NOT NULL, Default: draft | Current lifecycle status | One of: draft, approved, implemented, tested, closed |
module | VARCHAR(100) | Nullable, Indexed | Module or component name | Used for organizing requirements |
tags | TEXT[] | Nullable | Array of custom tags | Flexible categorization |
custom_metadata | JSONB | Nullable | Extensible metadata | Store any custom fields as JSON |
source_system | VARCHAR(50) | Nullable | Origin system name | e.g., "Jira", "Azure DevOps", "Custom" |
source_url | TEXT | Nullable | URL to source requirement | Deep link to original requirement |
created_by | VARCHAR(100) | Nullable | Username who created it | For audit trail |
version | INTEGER | Default: 1 | Version number | Increment on major changes |
created_at | TIMESTAMP | NOT NULL, Default: NOW() | Creation timestamp | Auto-set on insert |
updated_at | TIMESTAMP | NOT NULL, Default: NOW() | Last update timestamp | Auto-updated on change |
RequirementType Enum¶
| Value | Description | Use Case |
|---|---|---|
functional | Functional requirements describing system behavior | "User can log in with email/password" |
non_functional | Quality attributes (performance, security, usability) | "System must support 10,000 concurrent users" |
technical | Technical constraints or implementation requirements | "Must use PostgreSQL 14+" |
RequirementStatus Enum¶
| Value | Description | Typical Transition |
|---|---|---|
draft | Initial state, not yet reviewed | → approved |
approved | Reviewed and approved for implementation | → implemented |
implemented | Code implemented but not tested | → tested |
tested | Testing complete and passed | → closed |
closed | Requirement fully delivered | Terminal state |
Relationships¶
- One-to-Many with RequirementTestCaseLink: requirement.links (back_populates="requirement")
  - Cascade: all, delete-orphan (deleting a requirement removes all its links)
- One-to-Many with LinkSuggestion: requirement.suggestions (back_populates="requirement")
  - Cascade: all, delete-orphan (deleting a requirement removes all its suggestions)
Example Data¶
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"external_id": "REQ-001",
"title": "User Authentication System",
"description": "Implement secure user authentication with email/password login, OAuth support (Google, Facebook), password reset functionality, and session management. Must include 2FA option.",
"type": "functional",
"priority": "critical",
"status": "approved",
"module": "Authentication",
"tags": ["security", "user-management", "login"],
"custom_metadata": {
"complexity": "high",
"estimated_hours": 40,
"sprint": 5
},
"source_system": "Jira",
"source_url": "https://company.atlassian.net/browse/REQ-001",
"created_by": "product_manager",
"version": 1,
"created_at": "2026-01-15T10:00:00Z",
"updated_at": "2026-01-20T14:30:00Z"
}
2. TestCase Entity¶
Purpose: Represents test scenarios that validate requirements. Test cases describe the "how" of validation—specific steps, conditions, and expected outcomes to verify system behavior.
Use Cases:
- Create manual or automated test cases
- Link test cases to requirements for coverage tracking
- Track test execution status
- Generate test reports and metrics
Field Specifications¶
| Field | Type | Constraints | Description | Business Rules |
|---|---|---|---|---|
id | UUID | PK, NOT NULL, Default: uuid4() | Unique identifier for test case | Auto-generated |
external_id | VARCHAR(100) | UNIQUE, Indexed, Nullable | External system identifier (e.g., "TC-5678") | Must be unique if provided |
title | VARCHAR(500) | NOT NULL, Indexed | Descriptive test case title | Max 500 chars, required |
description | TEXT | NOT NULL | Detailed test case description | No length limit, required |
type | ENUM | NOT NULL | Type of test | One of: functional, integration, performance, security, ui, regression |
priority | ENUM | NOT NULL | Test priority | One of: critical, high, medium, low |
status | ENUM | NOT NULL, Default: draft | Current execution status | One of: draft, ready, executing, passed, failed, blocked, deprecated |
steps | JSONB | Nullable | Structured test steps | JSON object with step numbers as keys |
preconditions | TEXT | Nullable | Prerequisites for execution | What must be true before test |
postconditions | TEXT | Nullable | Expected state after test | What should be true after test |
test_data | JSONB | Nullable | Test input data | JSON with test data values |
module | VARCHAR(100) | Nullable, Indexed | Module or component name | Used for organizing test cases |
tags | TEXT[] | Nullable | Array of custom tags | Flexible categorization |
automation_status | ENUM | Default: manual | Automation state | One of: manual, automated, automatable |
execution_time_minutes | INTEGER | Nullable | Expected execution time | In minutes |
custom_metadata | JSONB | Nullable | Extensible metadata | Store any custom fields as JSON |
source_system | VARCHAR(50) | Nullable | Origin system name | e.g., "TestRail", "Zephyr", "Custom" |
source_url | TEXT | Nullable | URL to source test case | Deep link to original test case |
created_by | VARCHAR(100) | Nullable | Username who created it | For audit trail |
version | INTEGER | Default: 1 | Version number | Increment on major changes |
created_at | TIMESTAMP | NOT NULL, Default: NOW() | Creation timestamp | Auto-set on insert |
updated_at | TIMESTAMP | NOT NULL, Default: NOW() | Last update timestamp | Auto-updated on change |
TestCaseType Enum¶
| Value | Description | Use Case |
|---|---|---|
functional | Validates functional requirements | Feature testing |
integration | Tests interaction between components | API integration tests |
performance | Validates performance requirements | Load/stress testing |
security | Security and vulnerability testing | Penetration testing |
ui | User interface and UX testing | Visual regression tests |
regression | Ensures existing features still work | After changes/updates |
TestCaseStatus Enum¶
| Value | Description | Typical Transition |
|---|---|---|
draft | Being written, not ready for execution | → ready |
ready | Ready to be executed | → executing |
executing | Currently being executed | → passed, failed, blocked |
passed | Test passed successfully | Terminal (until re-run) |
failed | Test failed | → ready (after fix) |
blocked | Cannot execute due to dependency | → ready (when unblocked) |
deprecated | No longer relevant/maintained | Terminal state |
AutomationStatus Enum¶
| Value | Description | Typical Use |
|---|---|---|
manual | Must be executed manually | UI tests, exploratory tests |
automated | Fully automated test | CI/CD pipeline tests |
automatable | Can be automated but not yet done | Candidate for automation |
Relationships¶
- One-to-Many with RequirementTestCaseLink: test_case.links (back_populates="test_case")
  - Cascade: all, delete-orphan (deleting a test case removes all its links)
- One-to-Many with LinkSuggestion: test_case.suggestions (back_populates="test_case")
  - Cascade: all, delete-orphan (deleting a test case removes all its suggestions)
Example Data¶
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"external_id": "TC-001",
"title": "Verify User Login with Valid Credentials",
"description": "Test successful login flow with valid email and password. Verify session creation, user dashboard access, and proper token generation.",
"type": "functional",
"priority": "critical",
"status": "passed",
"steps": {
"1": "Navigate to login page",
"2": "Enter valid email address",
"3": "Enter valid password",
"4": "Click 'Login' button",
"5": "Verify redirection to dashboard"
},
"preconditions": "User account exists in the database",
"postconditions": "User is logged in and session is active",
"test_data": {
"email": "test@example.com",
"password": "Test@123"
},
"module": "Authentication",
"tags": ["login", "smoke-test", "critical"],
"automation_status": "automated",
"execution_time_minutes": 5,
"custom_metadata": {
"last_execution": "2026-02-10T15:30:00Z",
"test_suite": "regression"
},
"source_system": "TestRail",
"source_url": "https://company.testrail.io/index.php?/cases/view/1",
"created_by": "qa_engineer",
"version": 2,
"created_at": "2026-01-15T11:00:00Z",
"updated_at": "2026-02-10T15:30:00Z"
}
3. RequirementTestCaseLink Entity¶
Purpose: Explicit traceability links between requirements and test cases. This junction table enables many-to-many relationships and tracks the nature and confidence of each link.
Use Cases:
- Create manual traceability links
- Convert AI suggestions to confirmed links
- Generate traceability matrices
- Track test coverage per requirement
Field Specifications¶
| Field | Type | Constraints | Description | Business Rules |
|---|---|---|---|---|
id | UUID | PK, NOT NULL, Default: uuid4() | Unique identifier for link | Auto-generated |
requirement_id | UUID | FK, NOT NULL, Indexed, ON DELETE CASCADE | Reference to requirement | Must exist in requirements table |
test_case_id | UUID | FK, NOT NULL, Indexed, ON DELETE CASCADE | Reference to test case | Must exist in test_cases table |
link_type | ENUM | NOT NULL, Default: covers | Nature of the relationship | One of: covers, verifies, validates, related |
confidence_score | FLOAT | Nullable, Range: 0.0-1.0 | AI confidence (if AI-created) | 1.0 for manual, 0.0-1.0 for AI |
link_source | ENUM | NOT NULL | How link was created | One of: manual, ai_suggested, ai_confirmed, imported |
created_at | TIMESTAMP | NOT NULL, Default: NOW() | When link was created | Auto-set on insert |
created_by | VARCHAR(100) | Nullable | Username who created link | For audit trail |
confirmed_at | TIMESTAMP | Nullable | When link was confirmed | Set when AI suggestion accepted |
confirmed_by | VARCHAR(100) | Nullable | Username who confirmed | For AI-suggested links |
notes | TEXT | Nullable | Additional notes or context | Free-form notes |
Unique Constraint¶
- uq_requirement_test_case: UNIQUE(requirement_id, test_case_id)
- Prevents duplicate links between same requirement and test case pair
LinkType Enum¶
| Value | Description | Use Case |
|---|---|---|
covers | Test case covers requirement | Direct coverage relationship |
verifies | Test case verifies requirement implementation | Validation testing |
validates | Test case validates requirement completeness | Acceptance testing |
related | Indirect or supporting relationship | Related but not direct coverage |
LinkSource Enum¶
| Value | Description | confidence_score |
|---|---|---|
manual | Created by user directly | 1.0 (full confidence) |
ai_suggested | AI generated, awaiting approval | N/A (use suggestion table) |
ai_confirmed | AI suggested, user accepted | From original suggestion |
imported | Imported from external system | 1.0 or from import |
Relationships¶
- Many-to-One with Requirement: link.requirement (back_populates="links")
- Many-to-One with TestCase: link.test_case (back_populates="links")
Example Data¶
{
"id": "770e8400-e29b-41d4-a716-446655440002",
"requirement_id": "550e8400-e29b-41d4-a716-446655440000",
"test_case_id": "660e8400-e29b-41d4-a716-446655440001",
"link_type": "covers",
"confidence_score": 1.0,
"link_source": "manual",
"created_at": "2026-01-22T09:00:00Z",
"created_by": "qa_engineer",
"confirmed_at": null,
"confirmed_by": null,
"notes": "Direct coverage of login functionality"
}
4. LinkSuggestion Entity¶
Purpose: AI-generated suggestions for potential requirement-test case links. Represents the "pending" state before human review and approval. This enables a human-in-the-loop workflow for AI-assisted traceability.
Use Cases:
- AI generates suggestions based on semantic similarity
- Users review and accept/reject suggestions
- Track AI model performance and accuracy
- Improve models based on user feedback
Field Specifications¶
| Field | Type | Constraints | Description | Business Rules |
|---|---|---|---|---|
id | UUID | PK, NOT NULL, Default: uuid4() | Unique identifier for suggestion | Auto-generated |
requirement_id | UUID | FK, NOT NULL, Indexed, ON DELETE CASCADE | Reference to requirement | Must exist in requirements table |
test_case_id | UUID | FK, NOT NULL, Indexed, ON DELETE CASCADE | Reference to test case | Must exist in test_cases table |
similarity_score | FLOAT | NOT NULL, Range: 0.0-1.0 | AI confidence score | Higher = more confident match |
suggestion_method | ENUM | NOT NULL | Algorithm used | One of: semantic_similarity, keyword_match, heuristic, hybrid |
suggestion_reason | TEXT | Nullable | Human-readable explanation | Why AI suggests this link |
suggestion_metadata | JSONB | Nullable | Algorithm-specific data | Store embeddings, matched keywords, etc. |
status | ENUM | NOT NULL, Default: pending, Indexed | Review status | One of: pending, accepted, rejected, expired |
created_at | TIMESTAMP | NOT NULL, Default: NOW() | When suggestion was created | Auto-set on insert |
reviewed_at | TIMESTAMP | Nullable | When user reviewed | Set on accept/reject |
reviewed_by | VARCHAR(100) | Nullable | Username who reviewed | For audit trail |
feedback | TEXT | Nullable | User feedback on suggestion | Why accepted/rejected |
SuggestionMethod Enum¶
| Value | Description | Typical Use Case |
|---|---|---|
semantic_similarity | Uses embeddings and cosine similarity | Most accurate for natural language |
keyword_match | Matches specific keywords/terms | Fast, simple matching |
heuristic | Rule-based matching | Custom business rules |
hybrid | Combines multiple methods | Best overall accuracy |
SuggestionStatus Enum¶
| Value | Description | Next Action |
|---|---|---|
pending | Awaiting user review | Display in UI for review |
accepted | User confirmed link | Create RequirementTestCaseLink |
rejected | User declined link | Archive, use for training |
expired | Suggestion too old or outdated | Archive, may regenerate |
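The transitions in this table can be enforced with a small guard; ALLOWED_TRANSITIONS and can_transition are hypothetical helper names, not part of the BGSTM codebase:

```python
# Allowed SuggestionStatus transitions, mirroring the table above.
ALLOWED_TRANSITIONS = {
    "pending": {"accepted", "rejected", "expired"},
    "accepted": set(),  # terminal: a RequirementTestCaseLink is created instead
    "rejected": set(),  # terminal: archived for training
    "expired": set(),   # terminal: archived, may be regenerated as a new row
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a suggestion may move from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```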
Relationships¶
- Many-to-One with Requirement: suggestion.requirement (back_populates="suggestions")
- Many-to-One with TestCase: suggestion.test_case (back_populates="suggestions")
Example Data¶
{
"id": "880e8400-e29b-41d4-a716-446655440003",
"requirement_id": "550e8400-e29b-41d4-a716-446655440000",
"test_case_id": "660e8400-e29b-41d4-a716-446655440001",
"similarity_score": 0.87,
"suggestion_method": "semantic_similarity",
"suggestion_reason": "High semantic similarity between requirement description and test case title. Matched concepts: authentication, login, user credentials.",
"suggestion_metadata": {
"embedding_distance": 0.13,
"matched_keywords": ["authentication", "login", "password", "user"],
"model_version": "sentence-transformers/all-mpnet-base-v2"
},
"status": "pending",
"created_at": "2026-02-15T08:00:00Z",
"reviewed_at": null,
"reviewed_by": null,
"feedback": null
}
Design Principles¶
1. Traceability First¶
The data model is built around explicit traceability as a first-class concept:
- Many-to-Many Relationships: Requirements and test cases can have multiple links (e.g., one requirement covered by multiple tests, one test validating multiple requirements)
- Link Types: Different types of relationships (covers, verifies, validates, related) provide semantic meaning
- Unique Constraints: (requirement_id, test_case_id) uniqueness prevents duplicate links
- Bi-directional Navigation: SQLAlchemy relationships enable easy querying in both directions
Benefits:
- Generate comprehensive traceability matrices
- Calculate test coverage per requirement
- Identify gaps (requirements without tests, orphaned tests)
- Track impact of requirement changes
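Gap identification can be sketched over in-memory records (a simplified illustration; a production implementation would use a LEFT JOIN against requirement_test_case_links instead):

```python
def find_gaps(requirement_ids, test_case_ids, links):
    """Given all IDs plus (requirement_id, test_case_id) link pairs,
    return requirements with no tests and tests linked to nothing."""
    covered_reqs = {req for req, _ in links}
    used_tests = {tc for _, tc in links}
    uncovered = [r for r in requirement_ids if r not in covered_reqs]
    orphaned = [t for t in test_case_ids if t not in used_tests]
    return uncovered, orphaned
```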
2. AI-Ready Design¶
Fields and structures specifically designed for AI/ML integration:
Embedding Storage¶
- suggestion_metadata JSONB: Stores ML model embeddings, feature vectors, and intermediate results
- Example: Store sentence-transformer embeddings as arrays in JSON
- Flexible schema allows different ML models without database changes
Similarity Scoring¶
- similarity_score (0.0-1.0): Normalized confidence metric
  - 0.0 = No confidence
  - 0.5 = Moderate confidence
  - 1.0 = Perfect match (manual links)
- confidence_score in links: Preserves AI confidence even after acceptance
Suggestion Methods¶
- suggestion_method enum: Tracks which algorithm generated the suggestion
- Enables A/B testing and model comparison
- Types: semantic_similarity, keyword_match, heuristic, hybrid
Feedback Loop¶
- feedback field: Captures user reasoning for accept/reject decisions
- reviewed_by and reviewed_at: Track who reviewed and when
- This data trains future models (supervised learning)
Benefits:
- Support multiple ML approaches simultaneously
- Track model performance and accuracy
- Improve models based on user feedback
- Experiment with new algorithms without schema changes
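The feedback loop can be sketched as follows; to_training_examples is a hypothetical helper that turns reviewed suggestions into labeled pairs for supervised retraining:

```python
def to_training_examples(suggestions):
    """Convert reviewed suggestion records (dicts with the fields above)
    into (requirement_id, test_case_id, label) pairs. Rows still 'pending'
    or 'expired' carry no human signal and are skipped."""
    examples = []
    for s in suggestions:
        if s["status"] == "accepted":
            examples.append((s["requirement_id"], s["test_case_id"], 1))
        elif s["status"] == "rejected":
            examples.append((s["requirement_id"], s["test_case_id"], 0))
    return examples
```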
3. Audit Trail¶
Complete audit trail for compliance and accountability:
| Entity | Audit Fields | Purpose |
|---|---|---|
| All Entities | created_at, created_by | Who created and when |
| Requirement & TestCase | updated_at, version | Track changes over time |
| Link | confirmed_at, confirmed_by | Track AI link approvals |
| Suggestion | reviewed_at, reviewed_by | Track review decisions |
Benefits:
- Compliance with regulatory requirements (FDA, ISO)
- Debug and troubleshoot issues
- Understand system usage patterns
- Generate audit reports
4. Flexibility via JSONB and Arrays¶
Extensibility without schema migrations:
JSONB Fields¶
- custom_metadata: Store any custom fields per organization
- steps: Structured test steps (numbered, nested, conditional)
- test_data: Test inputs, expected outputs, configurations
- suggestion_metadata: ML model data, embeddings, features
Example custom_metadata:
{
"complexity": "high",
"estimated_hours": 40,
"sprint": 5,
"epic_id": "EPIC-123",
"acceptance_criteria": ["AC1", "AC2"],
"regulatory_tag": "FDA-21-CFR-Part-11"
}
Array Fields¶
- tags: Flexible categorization without predefined enums
- Example: ["security", "GDPR", "critical-path", "smoke-test"]
Benefits:
- No schema changes for custom fields
- Each organization can extend the model
- Supports evolving requirements
- Fast queries on JSONB (indexed where needed)
5. Extensibility¶
Easy to extend the model for new use cases:
Adding New Enums¶
To add a new requirement type:
1. Add the value to the RequirementType enum in backend/app/models/requirement.py
2. Run an Alembic migration to update the PostgreSQL enum
3. Update the API schemas in backend/app/schemas/requirement.py
Adding New Entities¶
Follow the pattern:
1. Create a new model inheriting from Base and TimestampMixin
2. Add relationships to existing entities
3. Create a migration
4. Add API endpoints
Example: Adding a "Defect" entity linked to test cases
Versioning Strategy¶
- version field: Simple integer versioning for requirements/test cases
- Future: Add a requirement_history table for full change tracking
Benefits:
- Adapt to changing business needs
- Support new testing methodologies
- Integrate with new external systems
- Scale from small teams to enterprises
6. Performance Optimization¶
Indexes on high-frequency query fields:
| Table | Indexed Fields | Purpose |
|---|---|---|
| requirements | external_id, title, status, module | Fast lookups and filtering |
| test_cases | external_id, title, status, module | Fast lookups and filtering |
| requirement_test_case_links | requirement_id, test_case_id | Fast join queries |
| link_suggestions | requirement_id, test_case_id, status | Fast pending suggestions query |
Query Patterns:
- Get all test cases for a requirement: Index on requirement_id
- Find requirements without tests: Left join with count
- Get pending suggestions: Index on status
- Search by title: Index on title (can add full-text search)
Benefits:
- Sub-second queries even with 100,000+ entities
- Efficient traceability matrix generation
- Fast AI suggestion retrieval
- Scales to enterprise datasets
AI Integration Points¶
How AI Uses the Data Model¶
The BGSTM data model is designed to support AI-assisted requirement-test case linking through a human-in-the-loop workflow.
1. Embedding Storage¶
Where: LinkSuggestion.suggestion_metadata JSONB field
What to Store:
{
"embedding_distance": 0.13,
"requirement_embedding": [0.123, -0.456, 0.789, ...], // 768-dim vector
"test_case_embedding": [0.234, -0.567, 0.890, ...], // 768-dim vector
"matched_keywords": ["authentication", "login", "password"],
"model_version": "sentence-transformers/all-mpnet-base-v2",
"model_timestamp": "2026-02-15T08:00:00Z"
}
Recommended Models:
- Sentence Transformers: all-mpnet-base-v2 (768-dim, general purpose)
- OpenAI: text-embedding-ada-002 (1536-dim, high quality)
- Custom: Fine-tuned models on domain-specific data
Storage Considerations:
- JSONB is efficient for storing arrays (embeddings)
- Can be indexed with GIN indexes for metadata queries
- For vector similarity search at scale, consider the pgvector extension
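Packing model output into the suggestion_metadata column can be sketched as follows (field names follow the examples in this document; pack_suggestion_metadata is a hypothetical helper, and the rounding is an illustrative space-saving choice):

```python
import json

def pack_suggestion_metadata(embedding, keywords, model_version):
    """Serialize model output into a JSON document suitable for the
    suggestion_metadata JSONB column."""
    return json.dumps({
        "requirement_embedding": [round(x, 6) for x in embedding],
        "matched_keywords": keywords,
        "model_version": model_version,
    })
```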
2. Similarity Scoring¶
Field: similarity_score (FLOAT, 0.0-1.0)
Interpretation:
| Score Range | Confidence Level | UI Display | Action |
|---|---|---|---|
| 0.90 - 1.00 | Very High | 🟢 Green | Auto-suggest with high priority |
| 0.75 - 0.89 | High | 🟡 Yellow | Strong suggestion |
| 0.60 - 0.74 | Moderate | 🟠 Orange | Review recommended |
| 0.00 - 0.59 | Low | 🔴 Red | Low priority or skip |
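The banding in this table can be expressed as a small mapping function (a hypothetical helper; the band names are illustrative):

```python
def confidence_level(score: float) -> str:
    """Map a similarity/confidence score onto the UI bands in the table above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within 0.0-1.0")
    if score >= 0.90:
        return "very_high"
    if score >= 0.75:
        return "high"
    if score >= 0.60:
        return "moderate"
    return "low"
```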
Calculation Methods:
- Cosine Similarity (most common): score = dot(a, b) / (||a|| * ||b||)
- Euclidean Distance (convert to 0-1): score = 1 / (1 + distance(a, b))
- Dot Product (normalized): normalize vectors to unit length first; the dot product then equals cosine similarity
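The first two methods can be implemented with the standard library alone (a sketch; production code would typically use numpy for vectorized computation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; lies in [-1, 1],
    and in [0, 1] for non-negative embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_score(a, b):
    """Convert Euclidean distance to a 0-1 score: 1 / (1 + distance)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)
```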
3. Suggestion Methods¶
Field: suggestion_method enum
semantic_similarity¶
- Description: Uses NLP embeddings and cosine similarity
- Best For: Natural language requirements and test cases
- Algorithm:
- Generate embeddings for requirement description
- Generate embeddings for test case description
- Calculate cosine similarity
- Threshold at 0.60+ for suggestions
keyword_match¶
- Description: TF-IDF or keyword extraction with overlap scoring
- Best For: Structured requirements with specific terminology
- Algorithm:
- Extract keywords from requirement (noun phrases, technical terms)
- Extract keywords from test case
- Calculate Jaccard similarity or weighted overlap
- Threshold at 0.50+ for suggestions
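The Jaccard step above can be sketched as (a minimal illustration over keyword sets):

```python
def jaccard_similarity(keywords_a, keywords_b):
    """Jaccard similarity of two keyword sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```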
heuristic¶
- Description: Rule-based matching (module, tags, external_id patterns)
- Best For: Structured data with consistent naming conventions
- Algorithm:
- Match on module field (same module: +0.3 score)
- Match on tags overlap (percentage of shared tags)
- Match on priority (same priority: +0.1 score)
- Combine scores with weights
hybrid¶
- Description: Combines multiple methods with weighted average
- Best For: Maximum accuracy across diverse data
- Algorithm:
- Run all three methods
- Weighted average: 0.6 * semantic + 0.3 * keyword + 0.1 * heuristic
- Threshold at 0.65+ for suggestions
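The weighted combination can be expressed directly (weights and the 0.65 threshold taken from the list above; the function names are illustrative):

```python
def hybrid_score(semantic: float, keyword: float, heuristic: float) -> float:
    """Weighted average of the three method scores: 0.6 / 0.3 / 0.1."""
    return 0.6 * semantic + 0.3 * keyword + 0.1 * heuristic

def should_suggest(semantic, keyword, heuristic, threshold=0.65):
    """True if the combined score clears the suggestion threshold."""
    return hybrid_score(semantic, keyword, heuristic) >= threshold
```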
4. Suggestion Workflow¶
stateDiagram-v2
[*] --> Generate: AI runs batch job
Generate --> Pending: Create LinkSuggestion
Pending --> Accepted: User approves
Pending --> Rejected: User declines
Pending --> Expired: TTL exceeded (30 days)
Accepted --> CreateLink: Create RequirementTestCaseLink
CreateLink --> [*]
Rejected --> Archive: Store for training
Archive --> [*]
Expired --> Archive: Remove from active suggestions
Step-by-Step Process:

1. Generation Phase (Batch Job):

        # Pseudo-code
        for requirement in requirements:
            req_embedding = model.encode(requirement.description)
            for test_case in test_cases:
                tc_embedding = model.encode(test_case.description)
                score = cosine_similarity(req_embedding, tc_embedding)
                if score >= threshold:
                    suggestion = LinkSuggestion(
                        requirement_id=requirement.id,
                        test_case_id=test_case.id,
                        similarity_score=score,
                        suggestion_method="semantic_similarity",
                        status="pending",
                    )
                    db.add(suggestion)

2. Review Phase (User Interface):
   - Display suggestions sorted by similarity_score DESC
   - Show requirement and test case side-by-side
   - User clicks "Accept" or "Reject"
   - Optionally collect feedback text

3. Acceptance Phase:
   - Update suggestion: status = "accepted", reviewed_at = NOW(), reviewed_by = user
   - Create a RequirementTestCaseLink (link_source = "ai_confirmed", carrying the suggestion's similarity score as confidence_score)

4. Rejection Phase:
   - Update suggestion: status = "rejected", reviewed_at = NOW(), reviewed_by = user
   - Store feedback if provided
   - Use for model retraining (negative examples)

5. Expiration Phase (Cleanup Job):
   - Find suggestions with status = "pending" AND created_at < NOW() - 30 days
   - Update: status = "expired"
   - Optionally archive to a separate table
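The expiration phase can be sketched over plain dicts (a real cleanup job would issue a bulk UPDATE instead; expire_stale is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone

def expire_stale(suggestions, ttl_days=30, now=None):
    """Mark pending suggestions older than ttl_days as expired, in place.
    Returns the number of rows expired."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=ttl_days)
    expired = 0
    for s in suggestions:
        if s["status"] == "pending" and s["created_at"] < cutoff:
            s["status"] = "expired"
            expired += 1
    return expired
```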
5. Confidence Levels for UI Display¶
Visual Indicators:
🟢 High Confidence (0.85+)
"Strong match based on semantic similarity"
→ Show at top of list
→ Consider "quick accept" button
🟡 Medium-High (0.75-0.84)
"Good match, review recommended"
→ Show in main list
→ Standard accept/reject flow
🟠 Medium (0.65-0.74)
"Possible match, manual review needed"
→ Show in expanded list
→ Add warning icon
🔴 Low (0.60-0.64)
"Weak match, may not be relevant"
→ Show in "More suggestions" section
→ De-emphasize visually
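The bucketing behind these indicators can be sketched as a single helper; the function name is an assumption (the UI example that follows calls a similar `getConfidenceLevel` helper), but the thresholds match the ranges above:

```python
def get_confidence_level(score: float) -> str:
    """Maps a similarity score to the UI confidence bucket described above."""
    if score >= 0.85:
        return "high"
    if score >= 0.75:
        return "medium-high"
    if score >= 0.65:
        return "medium"
    return "low"
```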
UI Component Example:
<SuggestionCard
requirement={req}
testCase={tc}
score={0.87}
method="semantic_similarity"
reason="High semantic similarity: matched concepts include authentication, login, user credentials"
onAccept={() => acceptSuggestion(suggestionId)}
onReject={() => rejectSuggestion(suggestionId)}
confidence={getConfidenceLevel(0.87)} // Returns "high"
/>
Batch Actions:
- "Accept all high-confidence suggestions (0.85+)"
- "Reject all low-confidence suggestions (< 0.65)"
- "Review medium suggestions"
6. Model Performance Tracking¶
Metrics to Calculate:
-- Acceptance rate by score range
SELECT
CASE
WHEN similarity_score >= 0.85 THEN 'High'
WHEN similarity_score >= 0.75 THEN 'Medium-High'
WHEN similarity_score >= 0.65 THEN 'Medium'
ELSE 'Low'
END as confidence_level,
COUNT(*) as total_suggestions,
SUM(CASE WHEN status = 'accepted' THEN 1 ELSE 0 END) as accepted,
ROUND(100.0 * SUM(CASE WHEN status = 'accepted' THEN 1 ELSE 0 END) / COUNT(*), 2) as acceptance_rate
FROM link_suggestions
WHERE status IN ('accepted', 'rejected')
GROUP BY confidence_level;
Use Results:
- Adjust score thresholds
- Tune model weights (for the hybrid method)
- Retrain models on accepted/rejected examples
- A/B test different models
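As a cross-check, the same acceptance-rate metric can be computed application-side; a sketch over (score, status) pairs, with bucket labels following the SQL above (the function name is illustrative):

```python
from collections import defaultdict

def acceptance_rate_by_bucket(reviewed):
    """reviewed: iterable of (similarity_score, status) pairs for suggestions
    already accepted or rejected. Returns {bucket: (total, accepted, rate%)}."""
    def bucket(score):
        if score >= 0.85:
            return "High"
        if score >= 0.75:
            return "Medium-High"
        if score >= 0.65:
            return "Medium"
        return "Low"

    totals = defaultdict(lambda: [0, 0])  # bucket -> [total, accepted]
    for score, status in reviewed:
        counts = totals[bucket(score)]
        counts[0] += 1
        if status == "accepted":
            counts[1] += 1
    return {b: (t, a, round(100.0 * a / t, 2)) for b, (t, a) in totals.items()}
```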
Data Flow Diagrams¶
1. Manual Link Creation Flow¶
sequenceDiagram
actor User
participant UI
participant API
participant DB
User->>UI: Select Requirement REQ-001
UI->>API: GET /requirements/REQ-001
API->>DB: SELECT * FROM requirements WHERE id=...
DB-->>API: Requirement data
API-->>UI: Requirement details
User->>UI: Browse/Search Test Cases
UI->>API: GET /test-cases?search=login
API->>DB: SELECT * FROM test_cases WHERE title LIKE...
DB-->>API: Test case list
API-->>UI: Display test cases
User->>UI: Select TC-001, click "Create Link"
UI->>API: POST /links {requirement_id, test_case_id, link_type}
API->>DB: Check unique constraint
DB-->>API: No duplicate found
API->>DB: INSERT INTO requirement_test_case_links
DB-->>API: Link created
API-->>UI: Success response
UI->>User: Show success message + updated traceability
Key Points:
- User has full control
- `link_source = "manual"`
- `confidence_score = 1.0`
- Instant creation (no review needed)
2. AI Suggestion Flow¶
sequenceDiagram
actor Admin
participant BatchJob
participant AI_Service
participant DB
actor User
participant UI
participant API
Admin->>BatchJob: Trigger AI suggestion job
BatchJob->>DB: SELECT * FROM requirements WHERE status != 'closed'
DB-->>BatchJob: Active requirements
BatchJob->>DB: SELECT * FROM test_cases WHERE status != 'deprecated'
DB-->>BatchJob: Active test cases
loop For each requirement
BatchJob->>AI_Service: Generate embedding(requirement.description)
AI_Service-->>BatchJob: req_embedding
loop For each test case
BatchJob->>AI_Service: Generate embedding(test_case.description)
AI_Service-->>BatchJob: tc_embedding
BatchJob->>BatchJob: Calculate similarity score
alt Score >= threshold (0.60)
BatchJob->>DB: INSERT INTO link_suggestions
DB-->>BatchJob: Suggestion created
end
end
end
BatchJob->>Admin: Job complete (X suggestions created)
Note over User,API: Later... User reviews suggestions
User->>UI: Navigate to "Pending Suggestions"
UI->>API: GET /suggestions?status=pending&sort=score:desc
API->>DB: SELECT * FROM link_suggestions WHERE status='pending' ORDER BY similarity_score DESC
DB-->>API: Pending suggestions
API-->>UI: Display suggestions with scores
User->>UI: Review suggestion #1 (score: 0.87)
UI->>User: Show req + test case side-by-side
User->>UI: Click "Accept"
UI->>API: POST /suggestions/{id}/accept {feedback}
API->>DB: UPDATE link_suggestions SET status='accepted', reviewed_at=NOW()
DB-->>API: Suggestion updated
API->>DB: INSERT INTO requirement_test_case_links (link_source='ai_confirmed')
DB-->>API: Link created
API-->>UI: Success
UI->>User: Show success + remove from pending list
Key Points:
- AI job runs asynchronously (batch or scheduled)
- Suggestions remain pending until reviewed
- User makes the final decision
- Accepted suggestions create links with `link_source = "ai_confirmed"`
3. Traceability Matrix Generation Flow¶
sequenceDiagram
actor User
participant UI
participant API
participant DB
User->>UI: Request Traceability Matrix
UI->>API: GET /reports/traceability-matrix?module=Authentication
API->>DB: Query requirements and links
Note over API,DB: SELECT r.*, <br/>COUNT(l.id) as link_count<br/>FROM requirements r<br/>LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id<br/>WHERE r.module = 'Authentication'<br/>GROUP BY r.id
DB-->>API: Requirements with link counts
API->>DB: Query detailed links
Note over API,DB: SELECT l.*, r.external_id as req_id, t.external_id as tc_id, t.status<br/>FROM requirement_test_case_links l<br/>JOIN requirements r ON l.requirement_id = r.id<br/>JOIN test_cases t ON l.test_case_id = t.id<br/>WHERE r.module = 'Authentication'
DB-->>API: Detailed link data
API->>API: Build matrix structure
Note over API: {<br/> "REQ-001": {<br/> "test_cases": ["TC-001", "TC-005"],<br/> "coverage": "covered",<br/> "passed_tests": 2<br/> },<br/> "REQ-002": {<br/> "test_cases": [],<br/> "coverage": "not_covered"<br/> }<br/>}
API-->>UI: Traceability matrix data
UI->>User: Display matrix with coverage status
alt Generate Report
User->>UI: Click "Export to Excel"
UI->>User: Download traceability_matrix.xlsx
end
Key Points:
- Uses LEFT JOIN to find requirements without test coverage
- Aggregates link counts and test statuses
- Can filter by module, priority, status
- Supports export formats (JSON, Excel, PDF)
Common Queries and Use Cases¶
1. Get All Test Cases Linked to a Requirement¶
Use Case: Display all test cases that validate a specific requirement
SQL:
SELECT
tc.id,
tc.external_id,
tc.title,
tc.status,
tc.automation_status,
l.link_type,
l.confidence_score,
l.created_at
FROM test_cases tc
INNER JOIN requirement_test_case_links l ON tc.id = l.test_case_id
WHERE l.requirement_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY l.link_type, tc.priority DESC;
SQLAlchemy (Python):
from sqlalchemy.orm import selectinload
requirement = session.query(Requirement).options(
    selectinload(Requirement.links).selectinload(RequirementTestCaseLink.test_case)
).filter(
    Requirement.id == requirement_id
).first()

for link in requirement.links:
    print(f"{link.test_case.external_id}: {link.test_case.title} [{link.link_type}]")
API Endpoint: GET /requirements/{id}/test-cases
2. Find Requirements with No Test Coverage¶
Use Case: Identify requirements that need test cases (coverage gap analysis)
SQL:
SELECT
r.id,
r.external_id,
r.title,
r.priority,
r.status,
r.module
FROM requirements r
LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id
WHERE l.id IS NULL
AND r.status IN ('approved', 'implemented', 'tested')
ORDER BY r.priority, r.created_at;
SQLAlchemy (Python):
uncovered_requirements = session.query(Requirement).outerjoin(
    RequirementTestCaseLink
).filter(
    RequirementTestCaseLink.id.is_(None),
    Requirement.status.in_(['approved', 'implemented', 'tested'])
).order_by(
    Requirement.priority, Requirement.created_at
).all()
API Endpoint: GET /requirements?coverage=none&status=approved,implemented,tested
3. Get Pending AI Suggestions Sorted by Confidence¶
Use Case: Display pending suggestions for user review, highest confidence first
SQL:
SELECT
s.id,
s.similarity_score,
s.suggestion_method,
s.suggestion_reason,
s.created_at,
r.external_id as req_id,
r.title as req_title,
tc.external_id as tc_id,
tc.title as tc_title
FROM link_suggestions s
INNER JOIN requirements r ON s.requirement_id = r.id
INNER JOIN test_cases tc ON s.test_case_id = tc.id
WHERE s.status = 'pending'
AND s.similarity_score >= 0.65
ORDER BY s.similarity_score DESC, s.created_at DESC
LIMIT 50;
SQLAlchemy (Python):
from sqlalchemy.orm import joinedload
pending_suggestions = session.query(LinkSuggestion).options(
    joinedload(LinkSuggestion.requirement),
    joinedload(LinkSuggestion.test_case)
).filter(
    LinkSuggestion.status == SuggestionStatus.PENDING,
    LinkSuggestion.similarity_score >= 0.65
).order_by(
    LinkSuggestion.similarity_score.desc(),
    LinkSuggestion.created_at.desc()
).limit(50).all()
API Endpoint: GET /suggestions?status=pending&min_score=0.65&sort=score:desc&limit=50
4. Generate Traceability Matrix Data¶
Use Case: Create a comprehensive requirement-to-test case traceability report
SQL:
SELECT
r.external_id as requirement_id,
r.title as requirement_title,
r.priority as requirement_priority,
r.status as requirement_status,
STRING_AGG(tc.external_id, ', ' ORDER BY tc.external_id) as linked_test_cases,
COUNT(l.id) as test_case_count,
COUNT(CASE WHEN tc.status = 'passed' THEN 1 END) as passed_count,
COUNT(CASE WHEN tc.status = 'failed' THEN 1 END) as failed_count,
CASE
WHEN COUNT(l.id) = 0 THEN 'Not Covered'
WHEN COUNT(CASE WHEN tc.status = 'passed' THEN 1 END) = COUNT(l.id) THEN 'Fully Tested'
WHEN COUNT(CASE WHEN tc.status = 'failed' THEN 1 END) > 0 THEN 'Issues Found'
ELSE 'Partial Coverage'
END as coverage_status
FROM requirements r
LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id
LEFT JOIN test_cases tc ON l.test_case_id = tc.id
WHERE r.status != 'closed'
GROUP BY r.id, r.external_id, r.title, r.priority, r.status
ORDER BY r.priority, r.external_id;
API Endpoint: GET /reports/traceability-matrix?format=json
5. Find Orphaned Test Cases (No Requirement Links)¶
Use Case: Identify test cases that aren't linked to any requirements
SQL:
SELECT
tc.id,
tc.external_id,
tc.title,
tc.type,
tc.status,
tc.module,
tc.created_at
FROM test_cases tc
LEFT JOIN requirement_test_case_links l ON tc.id = l.test_case_id
WHERE l.id IS NULL
AND tc.status NOT IN ('deprecated', 'draft')
ORDER BY tc.priority, tc.created_at;
SQLAlchemy (Python):
orphaned_test_cases = session.query(TestCase).outerjoin(
    RequirementTestCaseLink
).filter(
    RequirementTestCaseLink.id.is_(None),
    TestCase.status.notin_(['deprecated', 'draft'])
).order_by(
    TestCase.priority, TestCase.created_at
).all()
API Endpoint: GET /test-cases?linked=false&status=!deprecated,!draft
6. Get Test Coverage Percentage by Module¶
Use Case: Calculate test coverage metrics per module
SQL:
WITH module_stats AS (
SELECT
r.module,
COUNT(DISTINCT r.id) as total_requirements,
COUNT(DISTINCT CASE WHEN l.id IS NOT NULL THEN r.id END) as covered_requirements
FROM requirements r
LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id
WHERE r.status IN ('approved', 'implemented', 'tested')
GROUP BY r.module
)
SELECT
module,
total_requirements,
covered_requirements,
ROUND(100.0 * covered_requirements / NULLIF(total_requirements, 0), 2) as coverage_percentage
FROM module_stats
ORDER BY coverage_percentage, module;
API Endpoint: GET /reports/coverage-by-module
7. Find Duplicate or Similar Links¶
Use Case: Identify potential duplicate links (same requirement-testcase pair)
SQL:
-- This should return 0 rows due to unique constraint, but useful for debugging
SELECT
requirement_id,
test_case_id,
COUNT(*) as link_count
FROM requirement_test_case_links
GROUP BY requirement_id, test_case_id
HAVING COUNT(*) > 1;
8. Get AI Suggestion Acceptance Rate¶
Use Case: Track AI model performance and accuracy
SQL:
SELECT
suggestion_method,
CASE
WHEN similarity_score >= 0.85 THEN 'Very High (0.85+)'
WHEN similarity_score >= 0.75 THEN 'High (0.75-0.84)'
WHEN similarity_score >= 0.65 THEN 'Medium (0.65-0.74)'
ELSE 'Low (< 0.65)'
END as confidence_level,
COUNT(*) as total_suggestions,
SUM(CASE WHEN status = 'accepted' THEN 1 ELSE 0 END) as accepted,
SUM(CASE WHEN status = 'rejected' THEN 1 ELSE 0 END) as rejected,
ROUND(100.0 * SUM(CASE WHEN status = 'accepted' THEN 1 ELSE 0 END) / COUNT(*), 2) as acceptance_rate
FROM link_suggestions
WHERE status IN ('accepted', 'rejected')
GROUP BY suggestion_method, confidence_level
ORDER BY suggestion_method, confidence_level;
API Endpoint: GET /analytics/suggestion-performance
9. Get Recent Activity (Audit Trail)¶
Use Case: Show recent traceability activity for dashboard
SQL:
-- Recent links created
SELECT
'link_created' as activity_type,
l.created_at as activity_time,
l.created_by as "user",
r.external_id || ' → ' || tc.external_id as description,
l.link_source
FROM requirement_test_case_links l
JOIN requirements r ON l.requirement_id = r.id
JOIN test_cases tc ON l.test_case_id = tc.id
WHERE l.created_at >= NOW() - INTERVAL '7 days'
UNION ALL
-- Recent suggestions reviewed
SELECT
'suggestion_reviewed' as activity_type,
s.reviewed_at as activity_time,
s.reviewed_by as "user",
r.external_id || ' → ' || tc.external_id as description,
s.status::text
FROM link_suggestions s
JOIN requirements r ON s.requirement_id = r.id
JOIN test_cases tc ON s.test_case_id = tc.id
WHERE s.reviewed_at >= NOW() - INTERVAL '7 days'
AND s.status IN ('accepted', 'rejected')
ORDER BY activity_time DESC
LIMIT 50;
API Endpoint: GET /activity?days=7&limit=50
10. Search Requirements and Test Cases by Keyword¶
Use Case: Full-text search across requirements and test cases
SQL (with basic LIKE):
SELECT 'requirement' as type, id, external_id, title, description
FROM requirements
WHERE title ILIKE '%authentication%' OR description ILIKE '%authentication%'
UNION ALL
SELECT 'test_case' as type, id, external_id, title, description
FROM test_cases
WHERE title ILIKE '%authentication%' OR description ILIKE '%authentication%'
ORDER BY type, external_id
LIMIT 50;
SQL (with PostgreSQL full-text search):
-- Add tsvector columns and indexes for better performance:
-- ALTER TABLE requirements ADD COLUMN search_vector tsvector;
-- CREATE INDEX idx_requirements_search ON requirements USING GIN(search_vector);
SELECT id, external_id, title, ts_rank(search_vector, query) as rank
FROM requirements, to_tsquery('english', 'authentication & login') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 50;
API Endpoint: GET /search?q=authentication&type=requirement,test_case
Validation Rules¶
UUID Format Validation¶
All id fields must be valid UUIDs (version 4 recommended):
- Format: `xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx`, where x is any hexadecimal digit (0-9, a-f) and y is 8, 9, a, or b
- Example: `550e8400-e29b-41d4-a716-446655440000`
- Validation (Python):
- Database: Enforced by the UUID type in PostgreSQL
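The Python validation referenced above could look like this: a stdlib-only sketch (the project's actual validator may differ) that parses the string and checks the UUID version:

```python
import uuid

def is_valid_uuid(value: str, version: int = 4) -> bool:
    """Returns True if value parses as a UUID of the given version."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    return parsed.version == version
```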
Similarity Score Range¶
Fields: similarity_score (LinkSuggestion), confidence_score (RequirementTestCaseLink)
- Range: 0.0 to 1.0 (inclusive)
- Precision: Typically 2-3 decimal places
- Validation (SQL):
- Validation (Python):
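At the database level this is typically a CHECK constraint (`score >= 0 AND score <= 1`); the Python side could be sketched as a plain validator (the function name and rounding behavior here are assumptions, not the project's actual code):

```python
def validate_score(value: float) -> float:
    """Rejects scores outside [0.0, 1.0]; rounds to 3 decimal places."""
    if not isinstance(value, (int, float)):
        raise TypeError("score must be numeric")
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score {value} outside [0.0, 1.0]")
    return round(float(value), 3)
```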
String Length Limits¶
| Field | Max Length | Validation |
|---|---|---|
| `title` | 500 chars | VARCHAR(500), NOT NULL |
| `external_id` | 100 chars | VARCHAR(100), UNIQUE |
| `module` | 100 chars | VARCHAR(100) |
| `source_system` | 50 chars | VARCHAR(50) |
| `created_by`, `reviewed_by`, `confirmed_by` | 100 chars | VARCHAR(100) |
| `description`, `notes`, `feedback`, `suggestion_reason` | Unlimited | TEXT |
Python Validation:
from pydantic import BaseModel, Field
class RequirementCreate(BaseModel):
title: str = Field(min_length=1, max_length=500)
external_id: str | None = Field(None, max_length=100)
description: str = Field(min_length=1)
Required vs Optional Fields¶
Always Required (NOT NULL)¶
- All `id` primary keys
- `title` and `description` (Requirement, TestCase)
- `type`, `priority`, `status` (Requirement, TestCase)
- `requirement_id`, `test_case_id` (Links, Suggestions)
- `link_type`, `link_source` (RequirementTestCaseLink)
- `similarity_score`, `suggestion_method`, `status` (LinkSuggestion)
- `created_at` (all entities)
Optional (NULL allowed)¶
- `external_id` (can be auto-generated or omitted)
- `module`, `tags`, `custom_metadata` (flexible fields)
- `source_system`, `source_url` (only if from an external source)
- `created_by`, `reviewed_by`, `confirmed_by` (optional user tracking)
- `confidence_score` (only relevant for AI-generated links)
- `notes`, `feedback`, `suggestion_reason` (optional text fields)
- `confirmed_at`, `reviewed_at` (set only when the action occurs)
- Test case specific: `steps`, `preconditions`, `postconditions`, `test_data`, `execution_time_minutes`
Unique Constraints¶
| Table | Constraint | Columns | Purpose |
|---|---|---|---|
| requirements | external_id | UNIQUE | Prevent duplicate imports |
| test_cases | external_id | UNIQUE | Prevent duplicate imports |
| requirement_test_case_links | uq_requirement_test_case | (requirement_id, test_case_id) | Prevent duplicate links |
Database Enforcement:
-- Unique constraint on links
ALTER TABLE requirement_test_case_links
ADD CONSTRAINT uq_requirement_test_case
UNIQUE (requirement_id, test_case_id);
-- Unique index on external_id
CREATE UNIQUE INDEX idx_requirements_external_id_unique ON requirements(external_id);
Application Handling:
from sqlalchemy.exc import IntegrityError
try:
    link = RequirementTestCaseLink(requirement_id=req_id, test_case_id=tc_id)
    session.add(link)
    session.commit()
except IntegrityError:
    session.rollback()
    raise ValidationError("Link already exists between this requirement and test case")
Enum Value Constraints¶
All enum fields must use defined values:
RequirementType¶
- Valid: `functional`, `non_functional`, `technical`
- Invalid: Any other string
PriorityLevel¶
- Valid: `critical`, `high`, `medium`, `low`
- Invalid: Any other string
RequirementStatus¶
- Valid: `draft`, `approved`, `implemented`, `tested`, `closed`
- Invalid: Any other string
TestCaseType¶
- Valid: `functional`, `integration`, `performance`, `security`, `ui`, `regression`
- Invalid: Any other string
TestCaseStatus¶
- Valid: `draft`, `ready`, `executing`, `passed`, `failed`, `blocked`, `deprecated`
- Invalid: Any other string
AutomationStatus¶
- Valid: `manual`, `automated`, `automatable`
- Invalid: Any other string
LinkType¶
- Valid: `covers`, `verifies`, `validates`, `related`
- Invalid: Any other string
LinkSource¶
- Valid: `manual`, `ai_suggested`, `ai_confirmed`, `imported`
- Invalid: Any other string
SuggestionMethod¶
- Valid: `semantic_similarity`, `keyword_match`, `heuristic`, `hybrid`
- Invalid: Any other string
SuggestionStatus¶
- Valid: `pending`, `accepted`, `rejected`, `expired`
- Invalid: Any other string
Database Enforcement (PostgreSQL):
CREATE TYPE requirement_status AS ENUM ('draft', 'approved', 'implemented', 'tested', 'closed');
-- PostgreSQL will reject any value not in the enum
Application Validation (Python):
from enum import Enum
class RequirementStatus(str, Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    IMPLEMENTED = "implemented"
    TESTED = "tested"
    CLOSED = "closed"

# Pydantic automatically validates enum values
class RequirementCreate(BaseModel):
    status: RequirementStatus = RequirementStatus.DRAFT
JSONB Field Validation¶
While JSONB fields are flexible, validate structure when needed:
Test Case Steps¶
# Expected structure: {"1": "step text", "2": "step text", ...}
def validate_steps(steps: dict) -> bool:
    if not isinstance(steps, dict):
        return False
    for key in steps.keys():
        if not key.isdigit():  # Keys should be numeric strings
            return False
    return True
Test Data¶
# Should be a dictionary, but structure is flexible
def validate_test_data(data: dict) -> bool:
    return isinstance(data, dict)
Custom Metadata¶
# Any valid JSON object
def validate_custom_metadata(metadata: dict) -> bool:
    return isinstance(metadata, dict)
Migration and Evolution¶
Adding New Requirement Types¶
When you need to add a new requirement type (e.g., compliance):
1. Update Python Enum¶
File: backend/app/models/requirement.py
class RequirementType(str, enum.Enum):
    FUNCTIONAL = "functional"
    NON_FUNCTIONAL = "non_functional"
    TECHNICAL = "technical"
    COMPLIANCE = "compliance"  # NEW
2. Create Alembic Migration¶
File: backend/alembic/versions/XXX_add_compliance_requirement_type.py
def upgrade():
    # PostgreSQL: Add new enum value
    op.execute("ALTER TYPE requirement_type ADD VALUE IF NOT EXISTS 'compliance'")

def downgrade():
    # Note: PostgreSQL doesn't support removing enum values easily
    # Consider creating a new enum type if rollback is needed
    pass
3. Update API Schemas¶
File: backend/app/schemas/requirement.py
from typing import Literal

class RequirementCreate(BaseModel):
    type: Literal["functional", "non_functional", "technical", "compliance"]
4. Update Documentation¶
- Update this file's enum table
- Update API documentation
- Update UI dropdowns
Adding New Test Case Types¶
Same process as requirement types:
- Update the `TestCaseType` enum in `backend/app/models/test_case.py`
- Create Alembic migration: `ALTER TYPE test_case_type ADD VALUE 'new_type'`
- Update API schemas
- Update documentation
Adding New Link Types¶
Same process for LinkType enum:
class LinkType(str, enum.Enum):
    COVERS = "covers"
    VERIFIES = "verifies"
    VALIDATES = "validates"
    RELATED = "related"
    DEPENDS_ON = "depends_on"  # NEW: Dependency relationship
Versioning Strategy¶
Simple Version Field (Current Implementation)¶
- Field: `version` (INTEGER, default 1)
- Usage: Increment when a requirement/test case changes significantly
- Limitations: No history is kept, just a number
Future: Full History Tracking¶
Add history tables:
CREATE TABLE requirement_history (
history_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
requirement_id UUID NOT NULL REFERENCES requirements(id) ON DELETE CASCADE,
version INTEGER NOT NULL,
title VARCHAR(500) NOT NULL,
description TEXT NOT NULL,
-- ... all other fields ...
changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
changed_by VARCHAR(100),
change_reason TEXT
);
Benefits:
- Full audit trail
- Restore previous versions
- Compare changes over time
- Regulatory compliance (e.g., FDA 21 CFR Part 11)
Handling Breaking Changes¶
Schema Changes That Require Data Migration¶
Example: Change module from VARCHAR(100) to a foreign key to a new modules table
1. Create the new `modules` table
2. Populate `modules` from existing data
3. Add the new foreign key column
4. Migrate data
5. Drop the old column (after validation)
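The five steps can be sketched end to end. This demo runs on SQLite purely for illustration (the real migration would target PostgreSQL, typically via Alembic), and the table shapes are simplified stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Simplified starting point: module is a plain VARCHAR on requirements
cur.execute("CREATE TABLE requirements (id TEXT PRIMARY KEY, title TEXT, module TEXT)")
cur.executemany("INSERT INTO requirements VALUES (?, ?, ?)",
                [("r1", "Login", "Authentication"), ("r2", "Search", "Catalog")])

# 1. Create the new modules table
cur.execute("CREATE TABLE modules (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")

# 2. Populate modules from existing data
cur.execute("""INSERT INTO modules (name)
               SELECT DISTINCT module FROM requirements WHERE module IS NOT NULL""")

# 3. Add the new foreign key column
cur.execute("ALTER TABLE requirements ADD COLUMN module_id INTEGER REFERENCES modules(id)")

# 4. Migrate data: point each requirement at its modules row
cur.execute("""UPDATE requirements
               SET module_id = (SELECT id FROM modules
                                WHERE modules.name = requirements.module)""")

# 5. Drop the old column only after validating the backfill
#    (in PostgreSQL: ALTER TABLE requirements DROP COLUMN module)

# Validation: every requirement's module_id resolves to the matching module name
matched = cur.execute("""SELECT COUNT(*) FROM requirements r
                         JOIN modules m ON r.module_id = m.id
                         WHERE r.module = m.name""").fetchone()[0]
assert matched == 2
```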
API Versioning¶
When making breaking changes to APIs:
- Version the API: `/v1/requirements` → `/v2/requirements`
- Support both versions: Run v1 and v2 endpoints simultaneously
- Deprecation period: Give clients 6-12 months to migrate
- Sunset v1: Remove old endpoints after migration period
Adding Custom Fields via JSONB¶
No migration needed! Just use custom_metadata:
# Organization A adds custom field
requirement.custom_metadata = {
    "regulatory_tag": "FDA-21-CFR-Part-11",
    "validation_status": "IQ/OQ complete"
}

# Organization B adds different custom fields
requirement.custom_metadata = {
    "jira_epic": "EPIC-123",
    "sprint_points": 8
}
Index for Performance (if querying frequently):
-- Index specific JSONB keys
CREATE INDEX idx_requirements_metadata_regulatory
ON requirements USING GIN ((custom_metadata -> 'regulatory_tag'));
Multi-Tenant Strategy (Future)¶
To support multiple organizations:
1. Add an `organizations` table
2. Add `organization_id` to all entities
3. Add row-level security (RLS)
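The first two steps can be demonstrated like the modules migration. RLS itself is PostgreSQL-specific (`CREATE POLICY`), so this SQLite sketch only emulates tenant isolation with a mandatory filter; all table and function names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Organization table
cur.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.executemany("INSERT INTO organizations (name) VALUES (?)", [("acme",), ("globex",)])

# 2. organization_id on every entity (shown for requirements only)
cur.execute("""CREATE TABLE requirements (
                   id TEXT PRIMARY KEY,
                   title TEXT,
                   organization_id INTEGER REFERENCES organizations(id))""")
cur.executemany("INSERT INTO requirements VALUES (?, ?, ?)",
                [("r1", "Login", 1), ("r2", "Search", 2)])

# 3. PostgreSQL would enforce isolation with row-level security policies;
#    SQLite has no RLS, so emulate it with an explicit tenant filter.
def requirements_for_org(org_id):
    return cur.execute(
        "SELECT id, title FROM requirements WHERE organization_id = ?",
        (org_id,)).fetchall()

assert requirements_for_org(1) == [("r1", "Login")]
```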
Performance Considerations¶
Existing Indexes¶
The schema includes strategic indexes for optimal query performance:
Requirements Table Indexes¶
| Index Name | Columns | Type | Purpose |
|---|---|---|---|
| `idx_requirements_external_id` | external_id | B-tree | Fast lookups by external ID |
| `idx_requirements_title` | title | B-tree | Search and autocomplete |
| `idx_requirements_status` | status | B-tree | Filter by status (draft, approved, etc.) |
| `idx_requirements_module` | module | B-tree | Module-based queries and reports |
| Primary Key | id | B-tree | Implicit index on UUID primary key |
Test Cases Table Indexes¶
| Index Name | Columns | Type | Purpose |
|---|---|---|---|
| `idx_test_cases_external_id` | external_id | B-tree | Fast lookups by external ID |
| `idx_test_cases_title` | title | B-tree | Search and autocomplete |
| `idx_test_cases_status` | status | B-tree | Filter by status (passed, failed, etc.) |
| `idx_test_cases_module` | module | B-tree | Module-based queries and reports |
| Primary Key | id | B-tree | Implicit index on UUID primary key |
Requirement-Test Case Links Indexes¶
| Index Name | Columns | Type | Purpose |
|---|---|---|---|
| `idx_links_requirement_id` | requirement_id | B-tree | Find all test cases for a requirement |
| `idx_links_test_case_id` | test_case_id | B-tree | Find all requirements for a test case |
| `uq_requirement_test_case` | (requirement_id, test_case_id) | Unique | Prevent duplicates, also serves as index |
| Primary Key | id | B-tree | Implicit index on UUID primary key |
Link Suggestions Indexes¶
| Index Name | Columns | Type | Purpose |
|---|---|---|---|
| `idx_suggestions_requirement_id` | requirement_id | B-tree | Find suggestions for a requirement |
| `idx_suggestions_test_case_id` | test_case_id | B-tree | Find suggestions for a test case |
| `idx_suggestions_status` | status | B-tree | Filter pending/accepted/rejected |
| Primary Key | id | B-tree | Implicit index on UUID primary key |
Expected Query Patterns¶
High-Frequency Queries (Hot Paths)¶
- Get requirement by `external_id`: O(log n) via index
- Get test cases for a requirement: O(log n + k) via index
- Get pending suggestions: O(log n + k) via index
- Search by title: O(log n + k) via index
Medium-Frequency Queries¶
- Coverage gap analysis (requirements without tests): Uses LEFT JOIN
- Traceability matrix: Multiple joins with aggregation
Optimization for Large Datasets¶
At 10,000 Requirements and 20,000 Test Cases¶
Expected Performance (with indexes):
- Single requirement lookup: < 1 ms
- Get test cases for a requirement: < 5 ms
- Generate traceability matrix: < 100 ms
- Search by keyword: < 50 ms
At 100,000 Requirements and 500,000 Test Cases¶
Recommended Optimizations:
1. Add Composite Indexes

2. Materialized Views for Reports:

CREATE MATERIALIZED VIEW mv_traceability_matrix AS
SELECT
    r.id as requirement_id,
    r.external_id,
    COUNT(l.id) as test_case_count,
    COUNT(CASE WHEN tc.status = 'passed' THEN 1 END) as passed_count
FROM requirements r
LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id
LEFT JOIN test_cases tc ON l.test_case_id = tc.id
GROUP BY r.id, r.external_id;

-- Refresh periodically (CONCURRENTLY requires a unique index on the view)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_traceability_matrix;

3. Partition Large Tables (PostgreSQL 10+):

-- Partition by module
CREATE TABLE requirements (
    -- ... fields ...
) PARTITION BY LIST (module);

CREATE TABLE requirements_auth PARTITION OF requirements
    FOR VALUES IN ('Authentication', 'Authorization');
CREATE TABLE requirements_payment PARTITION OF requirements
    FOR VALUES IN ('Payment', 'Checkout');

4. Full-Text Search (instead of ILIKE):

-- Add tsvector column
ALTER TABLE requirements ADD COLUMN search_vector tsvector;

-- Create GIN index
CREATE INDEX idx_requirements_fts ON requirements USING GIN(search_vector);

-- Trigger to maintain search_vector
CREATE TRIGGER requirements_search_update
BEFORE INSERT OR UPDATE ON requirements
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, description);

-- Query using full-text search
SELECT * FROM requirements
WHERE search_vector @@ to_tsquery('english', 'authentication & login');
Database Configuration Tuning¶
PostgreSQL Configuration (for production)¶
File: postgresql.conf
# Memory Settings
shared_buffers = 4GB # 25% of system RAM
effective_cache_size = 12GB # 75% of system RAM
work_mem = 64MB # Per-query memory
maintenance_work_mem = 1GB # For CREATE INDEX, VACUUM
# Query Planner
random_page_cost = 1.1 # For SSD storage (default is 4.0 for HDD)
effective_io_concurrency = 200 # For SSD storage
# Write Performance
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
# Logging (for query optimization)
log_min_duration_statement = 100 # Log queries > 100ms
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
Connection Pooling¶
Use PgBouncer or Pgpool-II for connection pooling:
# pgbouncer.ini
[databases]
bgstm = host=localhost port=5432 dbname=bgstm
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
Caching Strategy¶
Application-Level Caching¶
Redis for frequently accessed data:
import json

import redis

cache = redis.Redis(host='localhost', port=6379, db=0)

def get_requirement_with_cache(external_id: str):
    # Check cache first
    cached = cache.get(f"req:{external_id}")
    if cached:
        return json.loads(cached)
    # Query database
    requirement = session.query(Requirement).filter(
        Requirement.external_id == external_id
    ).first()
    if requirement is None:
        return None
    # Cache for 5 minutes
    cache.setex(
        f"req:{external_id}",
        300,
        json.dumps(requirement.to_dict())
    )
    return requirement.to_dict()
What to Cache:
- Individual requirements/test cases (by external_id)
- Traceability matrix (refresh every 5 minutes)
- Module lists
- Enum values
- User permissions
What NOT to Cache:
- Pending suggestions (needs real-time updates)
- Active links (changes frequently)
- Audit trail data
Partitioning Considerations for Scale¶
When to Partition¶
Consider partitioning when:
- Table size exceeds 100 GB
- Query performance degrades despite indexes
- Data has natural partitions (by date, module, organization)
Partition Strategies¶
1. List Partitioning (by module):
CREATE TABLE requirements (
id UUID,
module VARCHAR(100),
-- ... other fields
) PARTITION BY LIST (module);
CREATE TABLE requirements_auth PARTITION OF requirements
FOR VALUES IN ('Authentication', 'Authorization', 'Security');
CREATE TABLE requirements_payment PARTITION OF requirements
FOR VALUES IN ('Payment', 'Checkout', 'Orders');
CREATE TABLE requirements_other PARTITION OF requirements
DEFAULT; -- Catch-all for other modules
2. Range Partitioning (by date):
CREATE TABLE link_suggestions (
id UUID,
created_at TIMESTAMP,
-- ... other fields
) PARTITION BY RANGE (created_at);
CREATE TABLE suggestions_2026_01 PARTITION OF link_suggestions
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE suggestions_2026_02 PARTITION OF link_suggestions
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Archive old partitions
-- DROP TABLE suggestions_2025_01; -- Or move to archive storage
3. Hash Partitioning (by id):
CREATE TABLE test_cases (
id UUID,
-- ... other fields
) PARTITION BY HASH (id);
CREATE TABLE test_cases_p0 PARTITION OF test_cases
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE test_cases_p1 PARTITION OF test_cases
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE test_cases_p2 PARTITION OF test_cases
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE test_cases_p3 PARTITION OF test_cases
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Benefits:
- Parallel query execution across partitions
- Faster bulk deletes (DROP PARTITION vs DELETE)
- Better query performance (partition pruning)
- Easier archival and backup strategies
Monitoring and Profiling¶
Key Metrics to Monitor¶
- Query Performance
- Index Usage
- Table Bloat
- Connection Pooling:
- Monitor active connections
- Track connection wait times
- Alert on connection pool exhaustion
Use EXPLAIN ANALYZE¶
Before optimizing, profile your queries:
EXPLAIN ANALYZE
SELECT r.*, COUNT(l.id) as link_count
FROM requirements r
LEFT JOIN requirement_test_case_links l ON r.id = l.requirement_id
WHERE r.status = 'approved'
GROUP BY r.id;
Look for:
- Seq Scan: consider adding an index
- Nested Loop: check whether a Hash Join would be faster
- High cost: the query needs optimization
Scalability Targets¶
| Metric | Target | Notes |
|---|---|---|
| Requirements | 1,000,000+ | With partitioning and indexes |
| Test Cases | 5,000,000+ | With partitioning and indexes |
| Links | 10,000,000+ | Efficient junction table |
| Concurrent Users | 1,000+ | With connection pooling |
| Query Response Time (p95) | < 100ms | For indexed queries |
| Traceability Matrix Generation | < 5s | For 10,000 requirements |
| AI Batch Suggestion Job | < 1 hour | For 10,000 req × 20,000 TC pairs |
Appendix: Reference Implementation¶
Sample Data Loading¶
See backend/app/db/sample_data.py for the ShopFlow E-Commerce sample dataset:
- 5 Requirements (user authentication, product search, shopping cart, checkout, order tracking)
- 4 Test Cases covering various functionalities
- 5 Manual links between requirements and test cases
This sample data demonstrates:
- All requirement and test case types
- Different priority levels
- Various statuses
- JSONB structures (steps, test_data, custom_metadata)
- Manual links with different link types
Code Examples¶
Creating a Requirement¶
from app.models.requirement import Requirement, RequirementType, PriorityLevel, RequirementStatus
requirement = Requirement(
external_id="REQ-123",
title="User Authentication System",
description="Implement secure user authentication...",
type=RequirementType.FUNCTIONAL,
priority=PriorityLevel.CRITICAL,
status=RequirementStatus.APPROVED,
module="Authentication",
tags=["security", "user-management"],
custom_metadata={"complexity": "high", "sprint": 5},
created_by="product_manager"
)
session.add(requirement)
session.commit()
Creating a Test Case¶
from app.models.test_case import TestCase, TestCaseType, TestCaseStatus, AutomationStatus
from app.models.requirement import PriorityLevel
test_case = TestCase(
external_id="TC-456",
title="Verify User Login with Valid Credentials",
description="Test successful login flow...",
type=TestCaseType.FUNCTIONAL,
priority=PriorityLevel.CRITICAL,
status=TestCaseStatus.PASSED,
steps={
"1": "Navigate to login page",
"2": "Enter valid credentials",
"3": "Click login button",
"4": "Verify successful login"
},
automation_status=AutomationStatus.AUTOMATED,
created_by="qa_engineer"
)
session.add(test_case)
session.commit()
Creating a Manual Link¶
from app.models.link import RequirementTestCaseLink, LinkType, LinkSource
link = RequirementTestCaseLink(
requirement_id=requirement.id,
test_case_id=test_case.id,
link_type=LinkType.COVERS,
confidence_score=1.0,
link_source=LinkSource.MANUAL,
created_by="qa_engineer",
notes="Direct coverage of login functionality"
)
session.add(link)
session.commit()
Creating an AI Suggestion¶
from app.models.suggestion import LinkSuggestion, SuggestionMethod, SuggestionStatus
suggestion = LinkSuggestion(
requirement_id=requirement.id,
test_case_id=test_case.id,
similarity_score=0.87,
suggestion_method=SuggestionMethod.SEMANTIC_SIMILARITY,
suggestion_reason="High semantic similarity between requirement and test case descriptions",
suggestion_metadata={
"embedding_distance": 0.13,
"model_version": "sentence-transformers/all-mpnet-base-v2"
},
status=SuggestionStatus.PENDING
)
session.add(suggestion)
session.commit()
Accepting a Suggestion¶
from datetime import datetime

# Update suggestion status
suggestion.status = SuggestionStatus.ACCEPTED
suggestion.reviewed_at = datetime.utcnow()
suggestion.reviewed_by = current_user.username
suggestion.feedback = "Correct match, good suggestion"
# Create corresponding link
link = RequirementTestCaseLink(
requirement_id=suggestion.requirement_id,
test_case_id=suggestion.test_case_id,
link_type=LinkType.COVERS,
confidence_score=suggestion.similarity_score,
link_source=LinkSource.AI_CONFIRMED,
created_by=current_user.username
)
session.add(link)
session.commit()
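Because the example above issues a single `session.commit()`, the status update and the link insert succeed or fail together. The field mapping between a suggestion and its resulting link can also be made explicit with a small pure helper; the function below is hypothetical (names are illustrative, not the BGSTM API) and operates on plain dicts so it is trivially testable.

```python
# Hypothetical pure helper (not part of the BGSTM models): derive the link
# fields from an accepted suggestion so the mapping is explicit and testable.
def link_fields_from_suggestion(suggestion: dict, accepted_by: str) -> dict:
    return {
        "requirement_id": suggestion["requirement_id"],
        "test_case_id": suggestion["test_case_id"],
        "link_type": "covers",                         # default; reviewers may override
        "confidence_score": suggestion["similarity_score"],
        "link_source": "ai_confirmed",
        "created_by": accepted_by,
    }

fields = link_fields_from_suggestion(
    {"requirement_id": "req-1", "test_case_id": "tc-9", "similarity_score": 0.87},
    accepted_by="qa_engineer",
)
print(fields["link_source"], fields["confidence_score"])
```

Keeping this mapping in one place means the "suggestion accepted" rule (confidence carried over from the similarity score, source marked as AI-confirmed) is defined once rather than repeated at each call site.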
Summary¶
The BGSTM data model provides a robust, scalable foundation for AI-powered requirement-to-test case traceability. Key highlights:
✅ Four Core Entities: Requirements, TestCases, Links, Suggestions
✅ AI-Ready: Designed for ML integration with embeddings, scoring, and feedback loops
✅ Flexible: JSONB and array fields allow extensibility without schema changes
✅ Performant: Strategic indexes support sub-second queries at scale
✅ Traceable: Complete audit trails for compliance and accountability
✅ Extensible: Easy to add new types, fields, and relationships
Next Steps:
1. Review the model with your team
2. Load sample data using backend/app/db/sample_data.py
3. Implement API endpoints based on common query patterns
4. Develop the AI suggestion engine
5. Build the user interface for link management
For questions or contributions, see CONTRIBUTING.md.
Document Version: 2.0.0
Last Updated: February 2026
Maintained By: BGSTM Project Team