Character encoding is the system of assigning numeric values to characters, such as letters and symbols, so computers can store, process, and share them. Encoding issues have become less common in data handling since the widespread adoption of the UTF-8 standard. Still, researchers may run into problems when working with data from legacy systems or older databases. Here, we cover the basics of character encoding standards and offer tips to help researchers avoid potential problems.
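As a minimal illustration, the Python sketch below shows the numeric code point behind each character, how the same text becomes different bytes under UTF-8 and Latin-1 (a common legacy encoding), and what happens when bytes are decoded with the wrong encoding. The `decode_with_fallback` helper and its list of candidate encodings are hypothetical choices for illustration, not a standard recipe: a successful UTF-8 decode strongly suggests, but does not prove, that the data really is UTF-8.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")) -> str:
    """Hypothetical helper: try each candidate encoding in order.

    Assumes the data is most likely UTF-8, then Windows-1252, and finally
    Latin-1, which maps every byte to a character and so never fails.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


text = "café"

# Each character has a numeric code point
code_points = [ord(ch) for ch in text]        # [99, 97, 102, 233]

# The same text becomes different bytes under different encodings
utf8_bytes = text.encode("utf-8")             # b'caf\xc3\xa9' (é takes two bytes)
latin1_bytes = text.encode("latin-1")         # b'caf\xe9'     (é takes one byte)

# Decoding with the wrong encoding garbles the text ("mojibake")
garbled = utf8_bytes.decode("latin-1")        # 'cafÃ©'

# Legacy bytes are invalid UTF-8; errors="replace" inserts U+FFFD
replaced = latin1_bytes.decode("utf-8", errors="replace")  # 'caf\ufffd'

print(code_points, utf8_bytes, latin1_bytes, garbled, replaced)
print(decode_with_fallback(latin1_bytes))     # café
```

Guessing encodings is a last resort; when the source system's encoding is documented, passing it explicitly to `decode` is more reliable.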

Whether you have collected your own data or are reusing existing datasets, you will probably need to clean them before moving on to data analysis. This process includes fixing or removing incorrect, corrupted, improperly formatted, duplicate, or incomplete data. Although the cleaning process varies with the dataset at hand, this handout covers essential tips for completing the task more efficiently while making your data more consistent, more accurate, and of higher quality.
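A minimal sketch of a few of these steps in Python, using only the standard library. The `clean_rows` function and its rules (collapse whitespace, treat a row as incomplete if any listed field is empty, treat rows as duplicates if they match after normalization) are hypothetical choices for illustration; which fixes are appropriate depends on the dataset.

```python
import csv
import io


def clean_rows(rows, required):
    """Hypothetical cleaning pass: normalize whitespace, drop incomplete
    records, and remove duplicates. These rules are illustrative
    assumptions; real datasets need rules chosen for their content."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalize formatting: trim and collapse runs of whitespace in every value
        norm = {key: " ".join((value or "").split()) for key, value in row.items()}
        # Drop incomplete records: a required field is missing or empty
        if any(not norm.get(field) for field in required):
            continue
        # Remove duplicates: rows identical after normalization share a key
        key = tuple(sorted(norm.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


raw_csv = """id,name
1, Ada Lovelace
1,Ada  Lovelace
2,
3,Alan Turing
"""

rows = clean_rows(csv.DictReader(io.StringIO(raw_csv)), required=["id", "name"])
for row in rows:
    print(row)
# {'id': '1', 'name': 'Ada Lovelace'}
# {'id': '3', 'name': 'Alan Turing'}
```

Keying duplicates on normalized values catches near-duplicates that differ only in spacing; rows that differ in substance, such as alternate spellings, still need manual review.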