When a PKI has been untouched for over a decade, the question is often not whether improvements are needed, but where to begin. That was the case for one of the largest rural lifestyle retailers in the U.S., founded in 1938 and now operating more than 2,000 stores across 49 states. Serving millions of customers with critical products for home, land, pet, and animal care, the company relies heavily on its digital backbone to maintain operational trust and service integrity.
Approach to PKI Design
As part of a broader initiative to strengthen its IT security posture, the organization undertook a detailed assessment of its Public Key Infrastructure (PKI). The assessment quickly revealed that the existing infrastructure was exposed to several risks, including unchecked CA validity periods, irregular CRL publication intervals, and CA private keys stored in software rather than in dedicated hardware. These findings highlighted the need for a structured approach.
Rather than patching issues reactively, a custom roadmap was developed to address core risks and reduce the overall attack surface within the existing PKI environment. Instead of jumping straight into implementation, however, the organization chose a more deliberate and strategic first step: designing the PKI. This allowed it to lay out the architecture of the new environment carefully, define the role of the existing PKI, identify the right starting point for the transition, and establish a structured migration plan. The goal was to bring clarity, simplify management, and build a sustainable PKI, one equipped with the right technologies and aligned with modern security standards.
After all, you don’t build a bank vault without blueprints, so why build a PKI without a design for it first?
The PKI design was shaped through multiple in-depth working sessions and technical discussions with key stakeholders from across the organization. These sessions focused on establishing a clear trust hierarchy (Root and Issuing CAs), aligning the architecture with the Active Directory Forest design, and streamlining certificate lifecycle management across users, devices, and application scenarios. The resulting PKI architecture was tailored to the organization's distributed AD domain structure, so that each domain's unique use cases and operational workflows were reflected in the final design.
Before looking at what the PKI design fixed, it is worth understanding why a fix was needed in the first place, and which specific challenges and limitations were addressed in the workshops.
Challenges
The organization’s Public Key Infrastructure (PKI) had been operating on aging infrastructure, primarily hosted on Windows Server 2012 R2. Through in-depth workshop discussions and a comprehensive review of existing PKI policies, procedures, and standards, the following architectural and operational weaknesses were identified:
- The PKI trust chain was built around a single Root Certification Authority (CA) and multiple Issuing CAs. However, there was no formal backup or disaster recovery strategy in place for any of the CAs. In the event of a CA failure or data center outage, the organization had no reliable path to restore certificate issuance.
- The Root CA and Issuing CA private keys were stored without the protection of a Hardware Security Module (HSM). As one Senior Security Engineer put it, “That’s like locking the crown jewels in a filing cabinet.” The absence of HSM protection significantly increased the risk of key theft or compromise.
- The Root CA certificate had a validity period of more than 20 years, longer than is generally recommended today, particularly for a private PKI that must remain crypto-agile. Such long-lived certificates expand the potential blast radius in case of key compromise and limit the organization's ability to adapt cryptographically over time.
- The Active Directory Forest was still operating on a legacy Windows Server version that had reached End of Life (EOL) and End of Support (EOS), which restricted integration with newer PKI capabilities, such as certificate auto-enrollment, modern cryptographic templates, and policy-based issuance controls.
- CRL validity periods were configured inconsistently across the three Issuing CAs, with base CRLs set to more than 7 days and delta CRLs to more than 24 hours. This inconsistency created operational overhead and increased the risk that revocation-related errors would go unnoticed without constant manual oversight.
- Multiple MDM platforms were being utilized to manage different device types, with Android, macOS, and iOS devices spread across distinct systems. This fragmented MDM setup introduced unnecessary complexity and inefficiencies in certificate deployment and endpoint trust.
- Within the security settings of all three Issuing CAs, multiple user accounts and groups had been granted certificate issuance and management rights, but without any clear role mapping or justification. This lack of access control governance introduced risk of misuse or misconfiguration.
- The PKI environment was assessed to have a weak governance framework. Certificate templates lacked proper documentation, defined ownership, and clear purpose mapping. In the absence of centralized oversight, the environment had accumulated redundant, outdated, and unused templates, increasing the likelihood of issuing misconfigured certificates.
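Several of these findings lend themselves to a repeatable check. The sketch below is a minimal illustration, not the assessment tooling used in this engagement. It assumes each CA's settings have already been exported into a hypothetical `CaConfig` record. It then flags an over-long Root CA lifetime (using 3,650 days as a stand-in for a 10-year limit), CA keys that are not HSM-protected, and base or delta CRL periods that differ across Issuing CAs. All names and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


# Hypothetical record of the CA settings relevant to this check; exporting
# these values from real CAs is out of scope for this sketch.
@dataclass(frozen=True)
class CaConfig:
    name: str
    validity: timedelta                          # lifetime of the CA certificate
    key_in_hsm: bool                             # True if the private key is HSM-protected
    base_crl_period: Optional[timedelta] = None  # left as None for a root in this sketch
    delta_crl_period: Optional[timedelta] = None


def audit(root: CaConfig, issuing: List[CaConfig],
          max_root_validity: timedelta = timedelta(days=3650)) -> List[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings: List[str] = []

    # Long-lived roots widen the blast radius of a key compromise.
    if root.validity > max_root_validity:
        findings.append(
            f"{root.name}: validity of {root.validity.days} days exceeds "
            f"{max_root_validity.days} days"
        )

    # Every CA key, root and issuing alike, should be HSM-protected.
    for ca in [root, *issuing]:
        if not ca.key_in_hsm:
            findings.append(f"{ca.name}: private key is not HSM-protected")

    # All Issuing CAs should share one base and one delta CRL schedule.
    if len({ca.base_crl_period for ca in issuing}) > 1:
        findings.append("Issuing CAs use inconsistent base CRL periods")
    if len({ca.delta_crl_period for ca in issuing}) > 1:
        findings.append("Issuing CAs use inconsistent delta CRL periods")

    return findings


# Hypothetical values loosely modelled on the findings above.
root = CaConfig("Root CA", timedelta(days=21 * 365), key_in_hsm=False)
issuing = [
    CaConfig("Issuing CA 1", timedelta(days=5 * 365), False,
             timedelta(days=7), timedelta(hours=24)),
    CaConfig("Issuing CA 2", timedelta(days=5 * 365), False,
             timedelta(days=14), timedelta(hours=48)),
]
for finding in audit(root, issuing):
    print(finding)
```

Keeping the checks as plain comparisons over exported settings means the same function could be rerun after each migration step to confirm that a finding has actually been closed.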
Solution
Following a series of whiteboarding sessions and technical workshops with key stakeholders, a new PKI architecture was designed to address the core challenges and establish a strong foundation for future security, scalability, and operational efficiency. The decisions outlined below reflect the outcomes of those sessions and collectively formed the basis of the organization’s PKI modernization strategy.
- To move away from outdated infrastructure, it was decided that Windows Server 2022 or higher would be used to pilot the new PKI setup. This approach allowed for the evaluation of compatibility with modern security features and validation of the architecture in a controlled environment prior to full-scale implementation.
- Two dedicated Issuing CAs, each aligned to distinct use cases, were planned to be hosted in the data center. This structure provided logical segmentation and supported tailored certificate issuance policies.
- In addition to the production environment, it was also decided to establish corresponding Issuing CAs in a Disaster Recovery (DR) environment so that certificate issuance could continue if the primary site failed. These DR CAs were reserved for failover: they would not issue certificates during normal operations, but would be tested regularly and kept synchronized through manual configuration replication.
- The Root CA certificate validity was reduced to 10 years, and the Issuing CAs were assigned a 5-year validity. End-entity certificate lifetimes were set in line with industry best practices.
- Certificate Revocation List (CRL) publication and certificate status checking were configured according to industry best practices. For example, the Issuing CAs were set to publish base CRLs every 7 days and delta CRLs every 24 hours, with a 2-day overlap period to minimize revocation-related disruptions during outages. CRL Distribution Point (CDP) and Authority Information Access (AIA) locations were published over HTTP (primary) and LDAP (secondary), and OCSP was provided for real-time, reliable certificate status validation.
- A comprehensive overhaul of certificate templates was planned to eliminate redundancy, define clear ownership, and map each template to its intended use. This enabled stricter issuance policies and reduced operational ambiguity. Access to Issuing CAs was also restructured, ensuring that only authorized roles had the ability to issue or manage certificates, effectively closing gaps caused by undocumented or excessive user/group permissions.
- To address fragmentation and simplify certificate lifecycle management, the design recommended leveraging Microsoft Cloud PKI. This allowed seamless integration with existing MDM platforms while reducing dependency on on-premises infrastructure, ensuring a unified and scalable trust model anchored by a single source of truth.
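The effect of the base CRL settings above can be worked through with a small calculation. The sketch below assumes a simplified model in which a CRL's `nextUpdate` equals its issue time plus the publication period plus the overlap period. Windows AD CS also adds a small clock-skew allowance, which is ignored here. `CrlPolicy`, `crl_window`, and `crl_is_current` are hypothetical names for illustration, and the issue date is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class CrlPolicy:
    publish_period: timedelta  # interval between scheduled CRL publications
    overlap: timedelta         # extra lifetime beyond the next scheduled publication


def crl_window(this_update: datetime, policy: CrlPolicy) -> Tuple[datetime, datetime]:
    """Return (next scheduled publish, nextUpdate) for a CRL issued at this_update.

    Simplified model: clock-skew allowances are ignored.
    """
    next_publish = this_update + policy.publish_period
    next_update = next_publish + policy.overlap
    return next_publish, next_update


def crl_is_current(now: datetime, this_update: datetime, policy: CrlPolicy) -> bool:
    """A relying party treats the CRL as current from thisUpdate until nextUpdate."""
    _, next_update = crl_window(this_update, policy)
    return this_update <= now < next_update


# Base CRL settings described above: publish every 7 days, 2-day overlap.
BASE_CRL = CrlPolicy(publish_period=timedelta(days=7), overlap=timedelta(days=2))

issued = datetime(2025, 1, 6)  # arbitrary illustrative issue date
next_publish, next_update = crl_window(issued, BASE_CRL)
print(next_publish, next_update)  # → 2025-01-13 00:00:00 2025-01-15 00:00:00
print(crl_is_current(datetime(2025, 1, 14, 12), issued, BASE_CRL))  # → True
```

Under this model, a base CRL issued on 6 January is due to be replaced on 13 January but remains valid until 15 January. A CA outage of up to two days past the scheduled publication therefore does not break revocation checking. Delta CRLs can be modelled the same way, but the design above does not state their overlap value.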
Business Impact
By addressing longstanding architectural gaps and operational inefficiencies, the organization was able to strengthen its digital trust foundation and align with modern security standards. The key business outcomes included:
- With CA private keys now secured in HSMs, the risk of key compromise was significantly reduced. The environment moved away from outdated infrastructure and legacy dependencies, enabling smoother integration with modern platforms and cloud-native services.
- Operational clarity improved through tighter governance: certificate templates were streamlined, access controls were properly mapped, and issuance processes became far more predictable and auditable. This not only reduced administrative overhead but also minimized the likelihood of errors and misconfigurations.
- Business continuity was strengthened with the introduction of disaster recovery CAs, ensuring that critical certificate issuance could continue even during outages. The overall trust model became more scalable, resilient, and easier to manage, allowing IT and security teams to focus on proactive improvements rather than firefighting issues in a fragmented setup.
- Most importantly, the organization established a future-ready PKI framework capable of supporting a growing number of users, devices, and applications, and of meeting the requirements of its crypto-agility and automation initiatives across its various use cases.
Conclusion
This modernization journey underscores a simple reality: a secure PKI begins with smart design, not reactive fixes. By prioritizing architecture, enforcing access controls, adopting HSM-backed key storage, and streamlining certificate governance, the organization built a scalable, resilient trust foundation. Most importantly, it positioned itself to support future growth, automation, and crypto-agility, proving that a well-planned PKI is not just a security upgrade but a strategic enabler for the digital enterprise.