About the Workshop:
Using language models to embed proteins in high-dimensional space
Please join us for a hands-on, one-hour workshop in which graduate students and postdocs will explore the rapidly evolving field of protein language models.
This session will introduce the basics of protein language models and guide participants through embedding protein sequences from FASTA files using Python and R.
- We will start by covering the foundational concepts behind protein language models: how they function much like natural language models but are specialized for protein sequences.
- We will then proceed with practical work in Jupyter notebooks, using Python and R libraries to generate embeddings from protein sequences with the help of a provided script.
- After creating the embeddings, we will use R to perform Principal Component Analysis (PCA) and visualize the relationships between the sequences.
Participants will work on their own computers and gain experience in both generating and analyzing protein embeddings. By the end of the session, we aim to equip everyone with the skills needed to create and work with protein embeddings, enabling downstream applications in areas such as structure prediction and functional annotation.
Participants should have:
- basic knowledge of Python and/or R
- intermediate experience in bioinformatics or computational biology
Before the workshop, attendees will receive installation instructions and a helper script to prepare for the session. Seating is limited to 50 people.
Registration deadline: Monday, Oct. 21