# Solution-1
Apply your skills to filter structures and visualize the hits.

For details see [filters](https://github.com/sbl-sdsc/mmtf-pyspark/tree/master/mmtfPyspark/filters) and [demos](https://github.com/sbl-sdsc/mmtf-pyspark/tree/master/demos/filters).

#### Import pyspark and mmtfPyspark

In [1]:
from pyspark.sql import SparkSession
from mmtfPyspark.io import mmtfReader
from mmtfPyspark.filters import ContainsDnaChain, ContainsLProteinChain, ContainsSequenceRegex, RFree
from mmtfPyspark.structureViewer import view_structure, view_group_interaction

#### Configure Spark

In [2]:
spark = SparkSession.builder.appName("Solution-1").getOrCreate()

#### Read PDB structures

In [3]:
path = "../resources/mmtf_reduced_sample/"
pdb = mmtfReader.read_sequence_file(path).cache()

### TODO-1: Filter structures by R-free in the range [0, 0.2]

In [4]:
pdb = pdb.filter(RFree(0, 0.2))
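
The filter passed to `pdb.filter(...)` is used as a predicate over the `(PDB ID, structure)` pairs in the dataset. The sketch below illustrates that pattern in plain Python, without Spark. `ToyStructure` and `RFreeRange` are hypothetical stand-ins written for this example, not the mmtfPyspark implementation, and the IDs and R-free values are made up. It also assumes an inclusive range and that entries without an R-free value are dropped.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyStructure:
    """Hypothetical stand-in for a structure record; only r_free is modeled."""
    r_free: Optional[float]


class RFreeRange:
    """Hypothetical predicate: keeps (id, structure) pairs with r_free in [lo, hi]."""

    def __init__(self, lo: float, hi: float):
        self.lo = lo
        self.hi = hi

    def __call__(self, pair: Tuple[str, ToyStructure]) -> bool:
        r_free = pair[1].r_free
        # Assumption: entries without an R-free value are dropped.
        return r_free is not None and self.lo <= r_free <= self.hi


# Made-up IDs and values, for illustration only.
records = [
    ("AAAA", ToyStructure(0.18)),
    ("BBBB", ToyStructure(0.25)),
    ("CCCC", ToyStructure(None)),
]

# The built-in filter() plays the role of the dataset's filter method here.
kept = [key for key, _ in filter(RFreeRange(0, 0.2), records)]
print(kept)
```

Because the predicate is an ordinary callable object, its range parameters are fixed once at construction and then applied to every pair.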

### TODO-2: Find the protein-DNA complexes in this dataset

In [5]:
pdb = pdb.filter(ContainsLProteinChain()).filter(ContainsDnaChain())

### TODO-3: Create a list of the PDB IDs of the protein-DNA complexes

In [6]:
complexes = pdb.keys().collect()

### Visualize the protein-DNA complexes

In [7]:
view_structure(complexes);

### TODO-4: Find complexes with a zinc finger motif
Zinc fingers have the following sequence motif, written as a regular expression: "C.{2,4}C.{12}H.{3,5}H".

In [8]:
pdb = pdb.filter(ContainsSequenceRegex("C.{2,4}C.{12}H.{3,5}H"))
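
To see what the motif matches, the same regular expression can be tried with Python's standard `re` module. This is a minimal sketch of the pattern's semantics only; how `ContainsSequenceRegex` applies it internally is not shown here. The helper `has_motif` is a hypothetical name, and both sequences are synthetic examples, not real PDB entries.

```python
import re

# Zinc finger motif from the text above: two cysteines separated by 2-4
# residues, then 12 residues, then two histidines separated by 3-5 residues.
ZINC_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def has_motif(sequence: str) -> bool:
    """Return True if the motif occurs anywhere in the sequence."""
    return ZINC_FINGER.search(sequence) is not None


# Synthetic sequences made up for illustration.
hit = "MK" + "C" + "PE" + "C" + "GKSFSQKSNLQK" + "H" + "QRT" + "H" + "TG"
miss = "MKCPECGKSFSQKSNLQKAQRTATG"  # no histidines, so the motif cannot match

print(has_motif(hit), has_motif(miss))
```

Using `search` rather than `match` lets the motif occur at any position in the chain, not only at its start.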

In [9]:
view_group_interaction(pdb.keys().collect(), "ZN");

In [10]:
spark.stop()