Optimising a Pipelined RISC-V Core: From Naive Pipeline to Near-Superscalar Performance