Explore literary prizes with Positron’s Data Explorer
By Julia Silge
Summary
## Key takeaways
- **Direct CSV to Data Explorer**: Click a CSV file in the workspace to open it straight into the Data Explorer using DuckDB, with no runtime needed, even after restarting the session. [01:45], [02:11]
- **Pin Columns and Rows**: Pin columns like name to keep them fixed on the side while scrolling wide data, and pin rows to keep them at the top for easy navigation and comparison. [05:24], [05:52]
- **Search and Filter Columns**: Search in the summary panel to quickly find columns like year or those containing a P, and filter to see names containing Julia while keeping existing sorts in place. [03:55], [08:25]
- **Convert UI to Code**: Click Convert to Code to get SQL reflecting the UI filters and sorts on the literary prizes data; for data frames, it generates Polars, pandas, or tidyverse code. [09:06], [09:46]
- **Performant up to Millions of Rows**: The Data Explorer handles 952 rows and 23 columns here but is built to be performant up to millions of rows and very wide data. [03:32]
Topics Covered
- CSV Opens Instantly Sans Runtime
- Pin Columns and Rows for Wide Data
- UI Converts to Reproducible Code
- Data Explorer Fuels EDA Reproducibility
Full Transcript
Hi, my name is Julia Silge, and I am a data scientist and engineering manager at Posit. These days I am working on Positron, which is our new, next-generation data science IDE. We just pushed out a new monthly release of Positron that has some features I'm pretty excited about, and in this screencast I'm going to quickly walk through what's new in the Data Explorer. The Data Explorer is one of our flagship features, and I think it makes Positron different from other IDEs you may be using as alternatives. In this screencast I'll use last week's TidyTuesday data set on literary prizes to show how to use the Data Explorer and some of what it can do that you may be interested in.
All right, let's dig into these literary prizes, with an eye especially to what's new in the Data Explorer in Positron. I've got some Polars code here that reads in this CSV, and I'm going to show a couple of different ways of getting into the Data Explorer. The first is if you have objects in your Python or R session, whether you read them in from a CSV or created them some other way: you can either use something like the view magic, or you can come over here and click in, which gives us a view into the data set.
I also want to show you a way to get straight there, actually without R or Python at all. If you have a CSV in your workspace, you can click it, and that will use DuckDB to open that CSV straight into the Data Explorer. So even if I restart Python so that my data frame is gone and nothing is in my session right now, I can click here and it will open with no runtime at all, which I find really helpful when I just want to get a first look at what the data is like.
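Positron does this for you with DuckDB and no code at all. Purely to illustrate the same "first look" idea outside the IDE, here is a minimal Python sketch that uses only the standard library `csv` module to report a file's row and column counts. It is a stand-in under that assumption, not how Positron's DuckDB-backed viewer works, and the file name mentioned in the comment is hypothetical.

```python
import csv
import io
from typing import TextIO


def first_look(stream: TextIO) -> tuple[int, int, list[str]]:
    """Return (row_count, column_count, header) for CSV text.

    A plain standard-library stand-in for a quick first look; it does not
    reflect how Positron's DuckDB-backed Data Explorer works internally.
    """
    reader = csv.reader(stream)
    header = next(reader, [])  # first record holds the column names
    row_count = sum(1 for _ in reader)  # count the remaining data records
    return row_count, len(header), header


# Usage with an in-memory sample; with a real file you would pass
# open("literary_prizes.csv", newline="") instead (hypothetical file name).
sample = io.StringIO("name,prize_year,prize_name\nA,1991,X\nB,1992,Y\n")
rows, cols, header = first_look(sample)
print(f"{rows} rows, {cols} columns")  # 2 rows, 3 columns
```

Because `csv.reader` counts records rather than raw lines, quoted fields that contain newlines do not inflate the row count.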
Actually, I'm going to get rid of the secondary sidebar altogether so that I have a nice big view into my Data Explorer. If I want it back, I can focus into the secondary sidebar and it'll come back like that. But I'm going to get rid of it and just focus on the Data Explorer for a while.

This Data Explorer has a grid here, and then it has a summary panel over here. In the summary panel you see information about each column; for example, this is the prize year, and we can scroll up and down and see the different variables. It tells us how many values are missing and what percentage are missing. If I click in here, I can see some summary stats and a nice little histogram telling me what these values are like. So the summary panel gives me a nice overall view, and of course I can scroll around in the Data Explorer as I want to.
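The missing-value counts and summary stats in the summary panel can be approximated in a few lines. Here is a minimal standard-library sketch, assuming one column's values arrive as strings (as `csv` produces them) and that an empty string means missing; it is not how Positron computes its summaries, and the sample values are toy numbers rather than the real data set.

```python
import statistics
from typing import Optional


def summarize_column(values: list[str]) -> dict[str, Optional[float]]:
    """Missing count/percentage plus simple stats for one CSV column.

    Assumes an empty string marks a missing value; numeric stats are
    only computed when every non-missing value parses as a number.
    """
    present = [v for v in values if v != ""]
    missing = len(values) - len(present)
    summary: dict[str, Optional[float]] = {
        "missing": missing,
        "missing_pct": 100 * missing / len(values) if values else 0.0,
        "min": None,
        "max": None,
        "mean": None,
    }
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return summary  # non-numeric column: only missingness is reported
    if numbers:
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
        summary["mean"] = statistics.fmean(numbers)
    return summary


# Toy values, not taken from the literary prizes data.
print(summarize_column(["1990", "2000", "", "2010"]))
# {'missing': 1, 'missing_pct': 25.0, 'min': 1990.0, 'max': 2010.0, 'mean': 2000.0}
```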
The Data Explorer is built to be performance first; it's built to be super performant. You can see down here at the bottom that it tells me I've got 952 rows and 23 columns, but the Data Explorer is built to be performant up to millions of rows and very wide data as well.
Okay, so let's talk about a few of the things you can do that are new in the release of Positron that just came out. One is that you can search in the summary panel over here. Let's say I'm wondering which of these columns are about a year: I search, and the year column comes up. Or I wonder how many of these have a P in them, and it filters down. So you can quickly search through your columns. We've also got tooltips here that tell you things about the values and what percentage they make up, which I find really helpful for that initial first look at a data set.
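The column search in the summary panel behaves like a substring match on column names. The sketch below assumes case-insensitive substring matching, which is a guess rather than a description of Positron's implementation, and the column names are hypothetical.

```python
def search_columns(columns: list[str], query: str) -> list[str]:
    """Return column names containing `query`, ignoring case.

    Case-insensitive substring matching is an assumption about how such a
    search might behave, not a description of Positron's implementation.
    """
    needle = query.casefold()
    return [name for name in columns if needle in name.casefold()]


# Hypothetical column names for illustration.
columns = ["name", "gender", "prize_year", "prize_name"]
print(search_columns(columns, "year"))  # ['prize_year']
print(search_columns(columns, "P"))     # ['prize_year', 'prize_name']
```

An empty query matches every column, which mirrors an empty search box showing everything.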
I also want to highlight something that isn't new but can be quite nice. What I'm looking at here is a view with specialized UI for dealing with this data. But sometimes, if I want to look at this CSV, I just want to open it literally in an editor so I can see the raw file. Because this is just a plain CSV in an editor, I could even type in it now (I don't really want to, but I could). I can get to that view from here, or I can get to it by choosing Open With and picking the text editor. So that is opening the file as plain text.
Something that is new in the latest release of the Data Explorer is the ability to pin columns and rows. This data is pretty wide, right? There's lots of stuff here. Let's say I want to pin the name column. I can pin this column, and it pops over to this side and stays where it is. Then I can scroll back and forth and compare what's in the name column to these other columns. You can pin more than one column, and you can pin rows as well. Notice that when I pinned a row, it popped up to the top, and it also shows up as pinned over here. So you can use pins on columns and rows to quickly get a sense of what's in your data and navigate it well.
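One way to picture pinning is as a reordering of what the grid displays: pinned columns move to the side and pinned rows move to the top, while everything else keeps its place. The sketch below is a conceptual model under that assumption, not Positron's implementation; in particular, keeping pins in the order they were pinned is an assumption, and the column names are hypothetical.

```python
from collections.abc import Hashable, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def display_order(items: Sequence[T], pinned: Sequence[T]) -> list[T]:
    """Put pinned items (column names or row indices) first.

    Pinned items keep the order in which they were pinned (an assumption)
    and the rest keep their original order. Conceptual model only.
    """
    item_set = set(items)
    pins = list(dict.fromkeys(pinned))  # drop duplicate pins, keep pin order
    unknown = [p for p in pins if p not in item_set]
    if unknown:
        raise ValueError(f"cannot pin items that are not present: {unknown}")
    pin_set = set(pins)
    rest = [item for item in items if item not in pin_set]
    return pins + rest


# Hypothetical column names; rows are addressed by position.
columns = ["gender", "prize_year", "name", "prize_name"]
print(display_order(columns, ["name"]))  # ['name', 'gender', 'prize_year', 'prize_name']
print(display_order(range(5), [3]))      # [3, 0, 1, 2, 4]
```

The same function handles columns and rows because both are just ordered, hashable keys.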
Let me unpin this, so now we're back to the unpinned state. Now let me talk about other things you can do. You can select and copy various parts of the data if you want to. For example, I can copy a column, and now that column is on my clipboard. I'll open a new file, call it "pasting results", save it as a TSV, and open it with the built-in text editor. If I paste, notice that I just pasted that whole column, so it was on my clipboard.

Another thing we can do (let me hide this here): if I'm scrolling around and for some reason I want to select some rectangular region, I can hold Shift and click to get a rectangular selection, copy it, and then paste that in. Again, this is tab-separated, which is a really good fit: you can paste it into a text file like I have here, or into Excel or Google Sheets or something like that.
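Tab-separated text is what makes these selections paste cleanly into spreadsheets: tabs separate cells within a row and newlines separate rows. Here is a minimal sketch of serializing a rectangular block of cells that way, assuming the grid is a list of rows of strings and that no cell contains a tab or newline (no quoting is applied). It illustrates the format only, not Positron's copy code.

```python
def selection_to_tsv(grid: list[list[str]], top: int, left: int,
                     bottom: int, right: int) -> str:
    """Serialize an inclusive rectangular block of cells as TSV text.

    Cells within a row are joined by tabs and rows by newlines. Assumes
    cell values contain no tabs or newlines, so no quoting is applied.
    """
    rows = grid[top:bottom + 1]
    return "\n".join("\t".join(row[left:right + 1]) for row in rows)


grid = [["a", "b", "c"],
        ["d", "e", "f"],
        ["g", "h", "i"]]
print(repr(selection_to_tsv(grid, 0, 1, 1, 2)))  # 'b\tc\ne\tf'
print(repr(selection_to_tsv(grid, 0, 0, 2, 0)))  # 'a\nd\ng' (a whole column)
```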
So we can select, copy, and paste. Now let me talk about sorting. Let's take the prize alias, or rather the prize name, here, and say I want to sort ascending. Now I've got this sorted: it's in alphabetical order, and I can see right here that it is sorted. If I scroll down, I see that the prize names are all grouped together.

So I can sort, and then let's say I also want to filter. My name is Julia, so let's find all the people whose first name contains "Julia" and see how many there are. Here we've got some Julians and some Julias. I sorted first, and this is still sorted, and I've also applied a filter; up here it tells me which filters I have. I can clear them, get rid of this one, or add another filter if I'm interested in that.
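The sort surviving the filter follows from a simple property: filtering an already sorted list keeps its order. Here is a minimal sketch of "sort ascending by prize name, then keep first names containing Julia" over rows shaped like `csv.DictReader` output. The column names `prize_name` and `first_name` are assumptions, the rows are toy values, and matching here is case-sensitive, which may differ from the Data Explorer's "contains" filter.

```python
def sort_then_filter(rows: list[dict[str, str]], sort_key: str,
                     filter_column: str, needle: str) -> list[dict[str, str]]:
    """Sort rows ascending by `sort_key`, then keep rows whose
    `filter_column` contains `needle` (case-sensitive).

    Filtering the sorted list with a comprehension preserves the sorted
    order, mirroring how the sort survives adding a filter in the UI.
    """
    ordered = sorted(rows, key=lambda row: row[sort_key])  # stable sort
    return [row for row in ordered if needle in row[filter_column]]


# Toy rows with assumed column names, not the real data set.
rows = [
    {"first_name": "Julian", "prize_name": "Beta"},
    {"first_name": "Sam", "prize_name": "Alpha"},
    {"first_name": "Julia", "prize_name": "Alpha"},
]
result = sort_then_filter(rows, "prize_name", "first_name", "Julia")
print([row["first_name"] for row in result])  # ['Julia', 'Julian']
```

Note that "Julian" matches because it contains "Julia", which is why a filter for "Julia" turns up both names.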
Something else that's new in this release of Positron that I'm really excited about is the Convert to Code feature. If I click this, it gives me code that reflects what I've done in the UI. I have applied a filter, so it gives me SQL code that shows how to do that filter, and I've also sorted by prize name, so the SQL includes that sort too. I can copy it from here if I want to. This is a CSV file, so of course it's not the best place to paste the code, but it's really exciting to be able to have that. If I were looking at a Polars data frame, I would get Polars Python code; with pandas, I would get pandas Python code. And if I were looking at an R data frame, we currently have support for tidyverse code, and we are going to be adding more pretty soon.
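The transcript doesn't show the exact SQL that Convert to Code produces, so the query below is a hypothetical example of what a "first name contains Julia" filter plus an ascending sort on prize name could look like. It runs against Python's built-in `sqlite3` as a stand-in for DuckDB; the table name `literary_prizes`, the column names, and the toy rows are all assumptions, and the SQL Positron actually generates may differ.

```python
import sqlite3

# Hypothetical SQL for a "contains" filter plus an ascending sort; not the
# literal output of Positron's Convert to Code.
QUERY = """
SELECT *
FROM literary_prizes
WHERE first_name LIKE '%' || ? || '%'
ORDER BY prize_name ASC
"""


def filtered_sorted(conn: sqlite3.Connection, needle: str) -> list[tuple]:
    """Run the filter-and-sort query with the search text as a parameter."""
    return conn.execute(QUERY, (needle,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literary_prizes (first_name TEXT, prize_name TEXT)")
conn.executemany(
    "INSERT INTO literary_prizes VALUES (?, ?)",
    [("Julian", "Beta"), ("Sam", "Alpha"), ("Julia", "Alpha")],  # toy rows
)
print(filtered_sorted(conn, "Julia"))  # [('Julia', 'Alpha'), ('Julian', 'Beta')]
```

Passing the search text as a bound parameter keeps user input out of the SQL string itself. SQLite's `LIKE` is case-insensitive for ASCII letters by default, so this sketch would also match "julia".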
All right, we did it. We opened up this data set about literary prizes in the Data Explorer, and we talked about how to use different features like pinning columns, filtering, and sorting, and how to export the choices we made in the UI as code. When I think about the Data Explorer, what excites me is that it helps you in your exploratory data analysis, that very iterative, exploratory part of the work we do as data practitioners. At the same time, it has tools that help you have reproducible practices and set you up for success in the work you need to do. So I hope this was helpful, and I'll see you next time.