On a server or a personal computer, executable applications and libraries reside on storage devices. Before execution starts, they are read from storage, transformed, possibly uncompressed, and loaded into RAM.
The firmware of an embedded device is, in general, a single binary file containing all the software components, which can be transferred to the internal flash memory of the MCU. Since the flash memory is directly mapped to a fixed address in the memory space, the processor can sequentially fetch and execute single instructions from it with no intermediate steps. This mechanism is called execute in place (XIP). The non-modifiable sections of the firmware do not need to be loaded into RAM, and are accessible through direct addressing in the memory space. This includes not only the executable instructions, but also all the variables that are declared constant and that the compiler therefore places in read-only sections. On the other hand, supporting XIP requires a few extra steps in the preparation of the firmware image to be stored in flash, and the linker needs to be instructed about the different memory-mapped areas on the target.
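To make the split between read-only and modifiable sections more concrete, the following is a minimal sketch of the kind of work startup code commonly performs on an XIP target: constant data stays in flash, while the initial values of modifiable variables are copied from flash to RAM and uninitialized variables are zeroed. The names `days_in_month`, `relocate_data`, and `clear_bss` are hypothetical. On a real target the addresses would come from symbols exported by the linker script, whose names depend on the toolchain and project; here they are passed as parameters so the sketch can also be compiled on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Declared const: a typical toolchain emits this table into a read-only
 * section, which the linker script can map to flash so that it is read
 * in place and never copied to RAM. (Hypothetical example data.) */
const uint8_t days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Copy the initial values of modifiable variables from their load address
 * (in flash) to their run address (in RAM), one word at a time. */
void relocate_data(const uint32_t *load_addr,
                   uint32_t *run_start, const uint32_t *run_end)
{
    uint32_t *dst = run_start;

    assert(run_start <= run_end);
    while (dst < run_end)
        *dst++ = *load_addr++;
}

/* Zero the RAM area reserved for variables without an initial value. */
void clear_bss(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end)
        *start++ = 0;
}
```

On an actual MCU, both functions would typically be called from the reset handler, before `main()`, with the boundaries of the data and zero-initialized areas defined in the linker script.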
The internal flash memory mapped in the address space of the microcontroller is not accessible for writing through ordinary memory accesses. Because of the hardware characteristics of flash memory devices, its content can be altered only through block-based access: before the value of a single byte in flash memory can be changed, the whole block containing it must be erased and rewritten. The mechanism offered by most manufacturers to access block-based flash memory for writing is known as In-Application Programming (IAP). Some filesystem implementations abstract write operations on a block-based flash device by creating a temporary copy of the block on which the write operation is performed.
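The erase-and-rewrite cycle described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: the sector is modeled as an array in RAM, it follows the behavior of typical NOR flash (erasing sets every bit to 1, programming can only clear bits), and `SECTOR_SIZE`, `flash_erase_sector`, `flash_program`, and `flash_update_byte` are hypothetical names that do not correspond to any vendor's IAP interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 16u  /* hypothetical; real sectors are much larger */

/* Simulated flash sector (on a real MCU this is memory-mapped flash). */
uint8_t flash_sector[SECTOR_SIZE];

/* Erasing sets every bit of the sector to 1. */
void flash_erase_sector(void)
{
    memset(flash_sector, 0xFF, SECTOR_SIZE);
}

/* Programming can only turn 1 bits into 0 bits, never the opposite. */
void flash_program(uint32_t offset, const uint8_t *src, uint32_t len)
{
    assert(offset <= SECTOR_SIZE && len <= SECTOR_SIZE - offset);
    for (uint32_t i = 0; i < len; i++)
        flash_sector[offset + i] &= src[i];
}

/* Change one byte: copy the sector to RAM, patch the copy, erase the
 * sector, then program the whole copy back. */
void flash_update_byte(uint32_t offset, uint8_t value)
{
    uint8_t copy[SECTOR_SIZE];

    assert(offset < SECTOR_SIZE);
    memcpy(copy, flash_sector, SECTOR_SIZE);
    copy[offset] = value;
    flash_erase_sector();
    flash_program(0, copy, SECTOR_SIZE);
}
```

In this model, calling `flash_program` on a byte that has already been written cannot restore any cleared bit, which is why `flash_update_byte` has to go through the temporary copy and a full sector erase.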
During the selection of the components for a microcontroller-based solution, it is vital to properly match the size of the flash memory to the space required by the firmware. The flash is often one of the most expensive components in the MCU, so for a large-scale deployment, choosing an MCU with a smaller flash could be more cost-effective. Developing software with code size in mind is uncommon in other domains nowadays, but it may be required when trying to fit multiple features into such a small amount of storage. Finally, compiler optimizations may exist on specific architectures to reduce code size when building the firmware and linking its components.
Additional non-volatile memories that reside outside of the MCU silicon can typically be accessed using specific interfaces, such as the Serial Peripheral Interface (SPI). External flash memories use different technologies than the internal flash, which is designed to be fast and to execute code in place. While generally denser and less expensive, external flash memories do not allow direct memory mapping in the physical address space, which makes them unsuitable for storing firmware images. Because read access on these kinds of devices is performed one block at a time, the processor cannot fetch instructions from them sequentially, and the code can be executed only if a mechanism is used to load the executable symbols into RAM first. On the other hand, write access may be faster when compared to IAP, making these kinds of non-volatile memory devices ideal for storing data retrieved at runtime in some designs.
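As a rough illustration of why content stored in external flash has to pass through RAM before it can be used, the sketch below copies an arbitrary byte range into a RAM buffer by reading whole blocks. The external device is simulated with an array, and `EXT_BLOCK_SIZE`, `EXT_NUM_BLOCKS`, `ext_flash_read_block`, and `ext_flash_load` are hypothetical names; a real driver would issue the read commands over the bus used by the specific part.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXT_BLOCK_SIZE 8u   /* hypothetical block size */
#define EXT_NUM_BLOCKS 4u   /* hypothetical device capacity, in blocks */

/* Simulated content of the external flash device. */
uint8_t ext_flash[EXT_NUM_BLOCKS * EXT_BLOCK_SIZE];

/* Hypothetical driver call: reads exactly one block into buf.
 * Returns 0 on success, -1 if the block does not exist. */
int ext_flash_read_block(uint32_t block, uint8_t *buf)
{
    if (block >= EXT_NUM_BLOCKS)
        return -1;
    memcpy(buf, &ext_flash[block * EXT_BLOCK_SIZE], EXT_BLOCK_SIZE);
    return 0;
}

/* Copy len bytes starting at byte offset `offset` into RAM, reading the
 * device one block at a time. Returns 0 on success, -1 on read error. */
int ext_flash_load(uint32_t offset, uint8_t *ram, uint32_t len)
{
    uint8_t block_buf[EXT_BLOCK_SIZE];

    assert(ram != NULL);
    while (len > 0) {
        uint32_t block = offset / EXT_BLOCK_SIZE;
        uint32_t in_block = offset % EXT_BLOCK_SIZE;
        uint32_t chunk = EXT_BLOCK_SIZE - in_block;

        if (chunk > len)
            chunk = len;
        if (ext_flash_read_block(block, block_buf) != 0)
            return -1;
        memcpy(ram, block_buf + in_block, chunk);
        ram += chunk;
        offset += chunk;
        len -= chunk;
    }
    return 0;
}
```

Even a request for a few bytes costs at least one full block read in this model, which is the overhead that rules out fetching instructions directly from such a device.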