Sequence alignment is an important tool in bioinformatics and computational biology. It uses a dynamic programming (DP)-based algorithm to obtain optimal scores during sequence homology search. This algorithm guarantees an accurate search, but at the expense of quadratic time complexity. Researchers have therefore implemented the DP algorithm on Field Programmable Gate Array (FPGA)-based platforms. However, the configuration stage still faces several challenges, especially for protein sequence alignment. Prior to the sequence homology search, each processing element (PE) requires frequent memory loads and rapid access to substitution matrix coefficients. An efficient supply of configuration data to the PEs is crucial for reducing the configuration time, which in turn affects the speed performance of the core system. A typical PE configuration scheme uses a serial configuration chain, which configures the look-up tables in the pipeline of PEs sequentially. Consequently, the configuration time increases in proportion to the number of PEs. In this work, a new PE parallel loader architecture with a parallel configuration chain technique is proposed. The parallel loader consists of several circular buffers, designed using n-bit registers, whose contents are transmitted to the PEs via a large data bus. This allows efficient and simultaneous supply of the configuration data to all PEs. The loader has been implemented on a Virtex-5 FPGA and achieves a clock frequency of 480.25 MHz. It utilizes only 52 slices, or 0.3 percent, of the XC5VLX110 Virtex-5 device. Moreover, the parallel loader element length is parameterizable, so it can load substitution matrix scores of any size, from either the BLOSUM or PAM series.
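
The contrast between the serial configuration chain and the parallel loader can be illustrated with a small behavioural model. The following Python sketch is not the proposed hardware design. It assumes, purely for illustration, that each PE holds one look-up table of `lut_depth` entries, that a serial chain writes one entry per clock cycle into one PE at a time, and that the parallel loader keeps one circular buffer per PE and drives one word from every buffer onto a wide bus in each cycle. The function names and the cycle-count model are hypothetical.

```python
from collections import deque


def serial_config_cycles(num_pes, lut_depth):
    """Cycle count for the assumed serial chain: PEs are configured one
    after another, one look-up table entry per clock cycle."""
    return num_pes * lut_depth


def parallel_config(columns):
    """Simulate the assumed parallel loader.

    `columns` holds one list of scores per PE (all of equal length).
    Each list is placed in its own circular buffer; in every cycle all
    buffers emit their head word simultaneously and then rotate by one.
    Returns the look-up table contents received by each PE and the
    number of cycles used.
    """
    depth = len(columns[0])
    if any(len(col) != depth for col in columns):
        raise ValueError("all PE columns must have the same length")

    buffers = [deque(col) for col in columns]
    pe_luts = [[] for _ in columns]
    cycles = 0
    for _ in range(depth):
        # One clock cycle: every PE receives one word at the same time.
        for lut, buf in zip(pe_luts, buffers):
            lut.append(buf[0])
            buf.rotate(-1)  # circular buffer advances to the next word
        cycles += 1
    return pe_luts, cycles


if __name__ == "__main__":
    # Illustrative sizes only: 64 PEs, 20 scores per PE.
    num_pes, lut_depth = 64, 20
    columns = [[pe * 100 + i for i in range(lut_depth)] for pe in range(num_pes)]
    luts, cycles = parallel_config(columns)
    print("serial cycles:  ", serial_config_cycles(num_pes, lut_depth))
    print("parallel cycles:", cycles)
```

Under these assumptions the serial count grows with the number of PEs, whereas the parallel count depends only on the look-up table depth. Because the buffers are circular, they return to their initial state after one full pass and could be reused for a later configuration.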